Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert .info files to use YAML #78

Closed
quicksketch opened this issue Sep 15, 2013 · 20 comments

Comments

Projects
None yet
7 participants
@quicksketch
Copy link
Member

commented Sep 15, 2013

The move to standardize on YAML files for human-facing configuration files was a good idea in Drupal 8. The ability to replace a Drupal-specific format (.info files) and replace them with a readable, widely-used standard was a great enhancement.

I think Backdrop should follow-suit here, but that means that we need a YAML parser as part of core. The most widely used YAML parsers are Symfony's YAML parser and spyc.

License-wise, both are options (GPL or MIT). Functionality-wise, it's kind of a coin-toss. spyc seems to be reported as faster but only supports YAML 1.0 standard. Symfony supports "most" of YAML 1.2.

For users moving between Drupal and Backdrop, I think adopting the Symfony YAML parser would cause the least amount of confusion. However that also means we'd need to add a PSR-0 autoloader, as is required by Symfony.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 15, 2013

Benchmarks on Symfony vs. Spyc:

Symfony YAML parsing time: 1.1368360519409
Spyc YAML parsing time: 0.6804358959198
JSON parsing time: 0.023433923721313

The Spyc YAML parser is consistently about twice as fast as Symfony's parser, though not nearly as fast as JSON still. This test parses 183 YML/JSON files, as generated by the Drupal 8 CMI.

Here's the Gist for this script: https://gist.github.com/quicksketch/6567356

@sun

This comment has been minimized.

Copy link

commented Sep 15, 2013

The differences between 1.1 and 1.2 are minimal. Drupal would actually work fine with 1.1. I don't know the differences between 1.0 and 1.1, but it is possible that they can be ignored. I'd recommend to explore that first, since the performance difference appears to be major.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 15, 2013

Yeah, I tested the 183 config files again with both Spyc and Symfony, they were almost 100% compatible. I found two issues:

  • Symfony doesn't encode hyphens correctly. I filed an issue with them: symfony/symfony#9039. Spyc can't read Symfony's incorrect hyphens (and it probably shouldn't) but I filed an issue with them too: mustangostang/spyc#26
  • Spyc reads in new lines as backslash-n \n characters. As opposed to Symfony, which properly interprets them as new lines. I'm trying to see what impact this has, and if it's significant (it probably isn't for .info files).
@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 15, 2013

Spyc reads in new lines as backslash-n \n characters.

Scratch that. Again, Symfony's encoder looks like it may be at fault here. It literally uses \n (not a new line) in the generated YAML files. So Spyc is just interpreting it literally. Symfony automatically converts them to new lines when decoding. I'm not sure if that's Symfony abusing YAML or a difference in the format versions.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 15, 2013

Nope, it's Spyc's fault this time. :)

Quoted \n characters should be converted to proper new lines on intpretation, as noted in the 1.0 spec here: http://yaml.org/spec/1.0/#id2566934

Filed an issue with Spyc here: mustangostang/spyc#27

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 15, 2013

So other than the newline issue, Spyc seems both faster and easier to implement in our current situation (since we can register it with our existing autoloader even before #77 is implemented and skip PSR-0 for the time being). I'm not sure if we'll need a PSR-0 library in the future or not (e.g. Guzzle or other libraries D8 added), but using Spyc certainly would make life easier in the short term. I can't help but notice Guzzle also depends on Symfony2 EventDispatcher, which gets me thinking even that library would start down the dependency chain.

So for the time being I'm pretty much entirely behind Spyc. The one inconsistency I found can be easily fixed, and it doesn't affect our use for .info files anyway (which I wouldn't be using inline \n characters to begin with).

@jenlampton

This comment has been minimized.

Copy link
Member

commented Sep 15, 2013

I'm completely in favor of using yaml for info files. Also, Spyc sounds great to me, twice as fast as Symfony is nothing to sneeze at! Making things sane, and keeping things fast. win-win!

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 16, 2013

Okay, I amended the tests once again just for fun, comparing raw PHP vs. the others. For anyone curious how fast YAML is versus PHP.

https://gist.github.com/quicksketch/6575436

Symfony YAML parsing time: 1.1098148822784
Spyc YAML parsing time: 0.72001695632935
JSON parsing time: 0.015068054199219
PHP include and execute time: 0.018171072006226
PHP execute-only time: 0.0055980682373047

I put the results into a table, taking the left column and dividing by the top column.

So results are basically (from slowest to fastest):

Parser Symfony YAML Spyc YAML JSON PHP include+parse PHP parse-only
Symfony YAML - 1.5x slower 73x slower 61x slower 200x slower
Spyc YAML 1.5x faster - 48x slower 40x slower 131x slower
JSON 73x faster 48x faster - 1.2x faster 2.7x slower
PHP include+execute 61x faster 40x faster 1.2x slower - 3.2x slower
PHP execute only 200x faster 131x faster 2.7x faster 3.2x faster -

It's interesting to me that JSON is just as fast or faster than including and then executing PHP. These tests are with APC enabled. Without it, the PHP include+execute times take roughly twice as long, making JSON a consistently faster option when needing to load files.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 16, 2013

And here's a followup for peak memory usage without storing the parsed YAML as a variable. So this is peak memory usage per approach, not storing the YAML result, and minus the Drupal bootstrap (3.18 MB). That's about a good as test I could think of for getting memory usage of the approach alone, though some things like the looping through the directory structure may still add some overhead.

Symfony YAML: 1.91 MB
Spyc YAML: 0.99 MB
JSON: 0.86 MB
PHP (APC cached): 0.63 MB
PHP (no APC): 23.55 MB

As far as memory goes, PHP is the clear loser here, since PHP can't free back up the memory after parsing has been completed, which is what I think @larowlan was probably hinting at me on Twitter. ;)

@tedbow

This comment has been minimized.

Copy link

commented Sep 16, 2013

So I think I know the answer to this but just asking to be sure.

Symfony and Spyc would follow the same YAML spec so to the module contributor it wouldn't affect how they made YAML files, right?

Also that would mean you could use Spyc now and then swap out for Symfony in the future without changes need for contrib modules, right?

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 16, 2013

@tedbow Right on both counts. Though as it turns out, YAML isn't a very easy-to-implement spec. As far as Drupal uses YAML (for info files, services, configuration files), we should be entirely compatible. As noted above, I found two inconsistencies with Spyc vs. Symfony, but both problems have already been solved.

@lsmith77

This comment has been minimized.

Copy link

commented Sep 16, 2013

Note that Symfony also supports this very useful official YAML extension: http://yaml.org/type/merge.html
This along with the ability to use comments makes YAML nicer for application configuration than JSON. As for performance, maybe there is room for improvement in the Symfony implementation. We do not really feel the pain as its all cached to PHP.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 16, 2013

his along with the ability to use comments makes YAML nicer for application configuration than JSON.

Hi @lsmith77, I agree there. The JSON benchmarks are actually much more relevant to the CMI issue (#2) than they are here. I think what's likely to happen in Backdrop is the following breakdown of formats:

  • PHP: (info hooks) The fastest if code is already parsed anyway, used for class registry, element info, and other situations where values are not cached either because the cache system isn't loaded or it's faster not to use it.
  • JSON: (CMI) The fastest flat-file format. Used in situations where machine is both writing and reading the format, with little human editing.
  • YAML: (Info files) Significantly slower but human-writable format. Used in situations where a human is writing the file by hand, but a machine is reading it.

Any discussion about why we shouldn't use JSON for CMI should go in that issue: #2.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Sep 18, 2013

I've got a branch that gets nearly everything converted and tests almost fully passing https://github.com/quicksketch/backdrop/compare/yaml-info

There are 5-6 tests failing at the moment. Probably just a few YAML files that aren't formatted correctly.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2013

I'm having second thoughts about this conversion. The primary reason for converting to YAML files is to make tests identified in new *.tests.yml files. However as pointed out in #81, we can simply extend the .info file format to support groups. This results in:

  • Fewer changes from D7.
  • A faster solution than either YML parser.
  • Brings info files inline with .ini file format, other than our extension to support nested arrays.

The downsides:

  • YML is not available for other (as of yet unknown) purposes.
  • .info is still a Drupal format, even though it's very similar to INI and not very difficult to learn.

I'm increasingly concerned about the divergence from D7, and changes like this definitely take the option of cross-compatibility off the table. In any case, this conversion is still a "move X to Y", with limited benefit. I'd like to explore .info files more in #81 and come back to this if that option fails.

@mikemccaffrey

This comment has been minimized.

Copy link

commented Oct 10, 2013

+1 for keeping .info files in the current format. If we are going to break from Drupal 7 conventions, we should have a really good reason for making developers learn how to do things differently. Otherwise, folks might as well go learn all the intricacies of Drupal 8.

@quicksketch

This comment has been minimized.

Copy link
Member Author

commented Oct 12, 2013

Thanks @mikemccaffrey, I've found that expanding .info files for testing is not only faster, but a trivial enhancement to our existing implementation. Let's reconsider YML only if we have another legitimate need.

@DirectorHaas

This comment has been minimized.

Copy link

commented Oct 12, 2013

@quicksketch "I'm increasingly concerned about the divergence from D7, and changes like this definitely take the option of cross-compatibility off the table."

2 cents: agreed that preventing too much divergence from D7 will improve adoption rates of Backdrop (of course Overlay can go) IMO. I've harped on this some elsewhere but while Backdrop does not have to be exactly D7, the .info files being in YML is such a perceptually small change that it isn't worth many hours of work. The easier contrib modules and themes can be transferred to Backdrop, the higher the adoption rate, especially if some contrib modules will not be ported to D8, that will make all the difference depending on the needs of the site.

@mikemccaffrey

This comment has been minimized.

Copy link

commented Oct 12, 2013

@QuantumTime: "the .info files being in YML is such a perceptually small change that it isn't worth many hours of work"

It is also such a perceptually small change that no one will expect it to have changed. Imagine someone going to build a backdrop site for the first time, creating a custom module just like they do in Drupal, and having the site error even though they've done everything just like they've done it a hundred times before. Now imagine that happening a dozen times, as they hit each of our "perceptually small changes". How many speedbumps will someone be willing to hit before they give up and go back to the well-traveled road?

@DirectorHaas

This comment has been minimized.

Copy link

commented Oct 12, 2013

@mikemccaffrey Agreed, and if all the changes to documentation come into play, then more API changes also imply needed new documentation.

Nothing stops a project like the D7 version of http://pressflow.org/, for instance, from timing a long-term release statement thus moving many D7 sites (someday) over to PressFlow, or any other D7 compatible fork, before or when D9 is released.

That said, I definitely think the ideas behind Backdrop are still very good since the goal is to get as many of the D8 changes into Backdrop and as much D7 compatibility as possible. Site owners/builder should be able to get the Spark changes for instance with presumably much less cost that going to D8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.