Skip to content
This repository has been archived by the owner on Jul 4, 2023. It is now read-only.

Core Change: Installation Metadata #7784

Merged
merged 0 commits into from Nov 13, 2011
Merged

Conversation

Sharpie
Copy link
Contributor

@Sharpie Sharpie commented Sep 23, 2011

This stack of changes adds metadata support and uses that metadata to preserve installation options through brew upgrade

Summary of Core Changes

  • A new method, filtered_args, is added to the FormulaInstaller class. The installer calls filtered_args during build to create the ARGV that array that is used when the installer forks to compile a formula. filtered_args address issue brew install --HEAD should not install --HEAD deps #7724 but also provides room for future manipulation of the ARGV vector that formula build with.
  • The build method of FormulaInstaller now writes out a HOMEBREW_MANIFEST file that contains:
    • The ARGV options that triggered options from a formula.
    • The unused formula options.
      This installation receipt is a simple YAML file and a new class called Tab is provided to manage the creation of receipts and the extraction of data from existing receipts.
  • Now that some metadata is recorded, filtered_args loads the Tab from previous installs (referencing LinkedKegs) and appends used_options to ARGV. This addresses issue "brew upgrade" should remember install options #5250 and makes installation options persist through brew upgrade

I like this metadata implementation because it requires no change to the way Formula works. From the perspective of formula files, options are being passed through ARGV the same as it ever was.

Recording installation metadata should support other efforts to improve Homebrew options and dependencies:

Comments and suggestions are appreciated.

args.concat previous_install.used_options
args.uniq! # Just in case some dupes were added

args.delete_at args.index '--HEAD' if args.build_head? and not explicitly_requested?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be clearer IMO:

args.delete '--HEAD' if args.build_head? and not explicitly_requested?

Technically it removes all '--HEAD' but as you made it uniq just before it will not change a thing.
It is a bit less efficient (average is n compared to n/2 elements comparisons), but not relevant here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. How did I come up with delete_at instead of delete when reading the Array docs?

Thanks for pointing this out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pleasure :)

@Sharpie
Copy link
Contributor Author

Sharpie commented Sep 30, 2011

@mxcl @MikeMcQuaid @adamv @jacknagel

Thoughts on this? I've been playing around with it for a couple of weeks and it seems to fix #7724 and #5250 and should lay a foundation for future work on dependency resolution.

@Sharpie
Copy link
Contributor Author

Sharpie commented Sep 30, 2011

Allright, I've expanded the range of filtered flags to:

%w[--HEAD --verbose -v --debug -d]

I left --devel out because we haven't implemented it yet and I'm not sure how it will interact with -d.

@Sharpie
Copy link
Contributor Author

Sharpie commented Sep 30, 2011

Also added --interactive to the list. The action of some flags, such as verbose, isn't entirely suppressed as the filtered args only affect the build step---so symlink commands are still printed.

@mxcl
Copy link
Contributor

mxcl commented Oct 2, 2011

Glad this is being worked on. I have not reviewed the code yet, and would like to make some points in case they are not the case etc.

  • The manifest file should be stored in #{HOMEBREW_PREFIX}/Library/Someplace because the installation options are specific to that installation of Homebrew.
  • The manifest file should probably be stored in a versioned filesystem database so that we see what installed versions have which options specified.

Maybe you already did these things, it looks like you were thorough, just thought I'd better mention them.

So maybe: #{HOMEBREW_PREFIX}/Library/Metadata/foo-1.2.yaml.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 2, 2011

Currently the manifest file is written into the installation prefix (Cellar/<formula>/<version>) as INSTALL_RECIPT.yml. I could write it into a separate directory, but right now I'm not convinced that we should create another versioned filesystem when the Cellar already contains one. An added bonus is that LinkedKegs points to the "currently active version".

@mxcl
Copy link
Contributor

mxcl commented Oct 2, 2011

Yes you are right, what you have done is better. Though the capitals seem ungainly.

@jacknagel
Copy link
Contributor

So there isn't any filtering done on the recording side, right? It basically records the entire install command line, and then the filtering is done when loading the options from the manifest?

What about filtering --ignore-dependencies, then?

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 2, 2011

So there isn't any filtering done on the recording side, right? It basically records the entire install command line, and then the filtering is done when loading the options from the manifest?

Not exactly---the entire command line is not recorded. The key code is in the class method Tab.for_install:

  def self.for_install f, args
    arg_options = args.options_only
    formula_options = f.options.map { |o, _| o }

    Tab.new :used_options => formula_options & arg_options,
            :unused_options => formula_options - arg_options,
            :timestamp => Time.now,
            :tabfile => f.prefix + 'INSTALL_RECEIPT.yml'
  end

That pulls the options out of the command line and then extracts the option flags from the formula's options array. The flags passed on the command line are then used to split the extracted option array into two pieces---a list of options used and a list of options that were not used.

So, the bottom line is that if the flag is not included in def options, it won't be recorded in the receipt. I figure we can detect things like --HEAD and --devel by inspecting version numbers.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 2, 2011

Yes you are right, what you have done is better. Though the capitals seem ungainly.

I'm definitely open to a different name for the receipt file. I chose all caps to make it seem important and so that it would show up near the top of ls output.

@jacknagel
Copy link
Contributor

Oh, I see what's happening now. Obviously I just need to RTFcode.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 2, 2011

One thing we need to decide on is what to do when there is no receipt for a formula. Currently, if INSTALL_RECEIPT.yml does not exist, the Tab class just forges a receipt that states the formula was installed with no options.

This isn't very good behavior, but I did it in order to allow this code to be dropped into existing Homebrew installs without causing a ruckus.

@MikeMcQuaid
Copy link
Member

Can't we store the "manifest file" in the Cellar? Does it need to be anything more than a CR separated list in a dot file? Simple solutions win. Need to look at this properly later but thought I'd ask the stupid questions now.

@jacknagel
Copy link
Contributor

FWIW, I like the capitalized filename for the exact reason @Sharpie noted.

@jacknagel
Copy link
Contributor

And I think yaml will ultimately prove to be easier to deal with in the future, e.g. if we add a new field to the metadata, etc.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 3, 2011

One more reason to store the manifest in a file in the installation prefix rather than in another directory or some centralized database such as a file in the Cellar is that it makes it easy to add metadata to bottles---zero coding required.

Also, it seems to me that a central database would require us to implement several things:

  • Implement serialization/deserialization (YAML loading/dumping of Ruby objects is handled by standard libraries---almost zero implementation details and dumping automatically reacts to changes in class definitions)
  • Prune old metadata, ensure new metadata does not already exist (with YAML files, already handled by the same machinery that manages Kegs)
  • With a central database, need some sort of global locking mechanism to ensure multiple brew programs don't write at once...

At first glance, a central database seems more complicated to me---but I'm willing to consider it.

@MikeMcQuaid
Copy link
Member

I do get that we get a lot from a centralised database, it just seems a little over engineered to me. Personally I always think the simplest solution for the current problem is the best and then you can further improve it if need be. I'm not convinced by the need for metadata or global locking. It feels quite non-Unixy to me. Sorry if I sound like a bit of a dick, I'm just really sensitive to complexity. I also need to RTFcode properly.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 3, 2011

I do get that we get a lot from a centralised database, it just seems a little over engineered to me.

I think we may be confusing each other here---I am not advocating a centralized database, but I thought that you were.

Personally I always think the simplest solution for the current problem is the best and then you can further improve it if need be. I'm not convinced by the need for metadata or global locking. It feels quite non-Unixy to me. Sorry if I sound like a bit of a dick, I'm just really sensitive to complexity. I also need to RTFcode properly.

I'm with you on this one---I never liked the idea of storing metadata because I thought it would complicate brew too much. However, it looks like we need some sort of metadata to have smarter dependency resolution and option permanence. So I figured I would take a shot at an implementation to see if I could come up with something simple enough that I wouldn't hate it.

@MikeMcQuaid
Copy link
Member

Apologies, yes, we agree on the database (misread you).

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 11, 2011

@mxcl @MikeMcQuaid @adamv @jacknagel

I would like to merge some sort of metadata support this weekend if possible---it is blocking other important items like optional dependency resolution and multi-repo support.

{
:used_options => used_options,
:unused_options => unused_options,
:timestamp => timestamp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use the file timestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This can be taken out.

@MikeMcQuaid
Copy link
Member

No major objections here, posted a few minor quibbles.

@jacknagel
Copy link
Contributor

Looks good to me, and I will reiterate my support for the single YAML file vs. flat text files.

I see the appeal of newline separated lists but I think if we want to be able to add new metadata (e.g. tagging kegs that were installed from other repositories via brew-tap, &c.), YAML is the way to go so that we (a) can easily support legacy installs that don't have every field and (b) don't end up with ten different metadata files in every keg.

@MikeMcQuaid
Copy link
Member

I don't see the disadvantage of having multiple files, personally. If we have ten suggested pieces of metadata I'd perhaps agree but for 2-3 I don't. I'm not blocking this though, if everyone disagrees, feel free to merge as-is :)

@jacknagel
Copy link
Contributor

If we have ten suggested pieces of metadata I'd perhaps agree but for 2-3 I don't.

Well, we only need to record 2-3 things now, but if we decide we want to record other things in the future, we might wish we had done it differently. And then if we want to change, we still have to support legacy installs. But I'm just repeating myself now so I'll quit harping on it.

On a different note, it would be nice to get at least the options filtering merged soon. --HEAD being passed to deps is starting to get old.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 19, 2011

Well, we only need to record 2-3 things now, but if we decide we want to record other things in the future, we might wish we had done it differently. And then if we want to change, we still have to support legacy installs. But I'm just repeating myself now so I'll quit harping on it.

My feelings exactly. I know that I am planning to use the metadata files in the multi-repository code to flag installs that did not come from the master repository. I don't think flat delimited files would work for long---eventually things will get complicated enough that we will end up writing something that is much, much more complicated than just calling YAML.load_file.

I would be willing to consider an alternate format such as JSON or INI if portability is an issue---but YAML is attractive because it is in the standard library and any Ruby object can be serialized to YAML.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 19, 2011

@adamv @mxcl

Any objections? I'll shoot for merging this Friday if nothing comes up.

@MikeMcQuaid
Copy link
Member

I find YAML unattractive because I don't use Ruby outside of Homebrew and haven't seen many parsers or much use for it outside Ruby development. Newline separated files are very easy to parse in any language and seem more Unixy to me. I don't see the need for a structured language for two types of date. Hell, you could do it on one with:
+an-used-option
-an-unused-option

Personally I'm dubious about the argument that "we might need this later". If you can suggest at least a couple more concrete things we will definitely need in the future then that's valid reasoning, otherwise it strikes me as over engineering.

As said, I'm not overruling or anything here, just want to voice perhaps a different opinion. This is still good work and you guys are still pros but we'll see what @mxcl or @adamv say.

@eregon
Copy link
Contributor

eregon commented Oct 19, 2011

mikemcquaid:
"Personally I'm dubious about the argument that "we might need this later". If you can suggest at least a couple more concrete things we will definitely need in the future then that's valid reasoning, otherwise it strikes me as over engineering."

Well, I guess the last item in the description might be a such a concrete thing:
"Pull request #7643: For multi repo support, it will be necessary to flag kegs that were created by non-standard formula to help with bug reports."

I believe it will soon be a mess if you use a custom format for more than a single information (and you know how annoying unix config files are with all different formats).
I would not like many files either.

The argument about libyaml not-so-widespread would be very strong if it added a dependency. But a YAML parser is included in Ruby and it seems they are numerous YAML implementations (from yaml.org). However, we might get these inconsistencies about dates between syck and psych...

@MikeMcQuaid
Copy link
Member

I don't find unix config files annoying, I usually find them very easy to parse from any language. How do I parse YAML from Bash?

@eregon
Copy link
Contributor

eregon commented Oct 19, 2011

mikemcquaid: If it's a simple Array of String, you could just grep '^- ' | cut -c '3-'
If it is more complicated, it shows the need for a more complete format IMO.

@jacknagel
Copy link
Contributor

We could use the git-config machinery to store settings, since git ignores config settings that it doesn't understand:

# format: brew.FORMULA.setting
$ git config --add brew.python.option "--framework"
$ git config --add brew.python.option "--universal"
$ git config --get-all brew.python.option 
--universal
--framework

# did this come from an -alt repo?
$ git config --bool brew.python.alt false
$ git config --get brew.python.alt
false

This machinery can even be used without touching .git/config; the --file option lets one specify a config file. So we could have a single config file for all of the kegs, or keep the "one file per keg" and the above would become

# format: brew.setting
$ git config --file /path/to/Cellar/python/2.7.2/HOMEBREW_MANIFEST --add brew.option "--framework"
$ git config --file /path/to/Cellar/python/2.7.2/HOMEBREW_MANIFEST --add brew.option "--universal"
$ git config --file /path/to/Cellar/python/2.7.2/HOMEBREW_MANIFEST --get-all brew.option 
--universal
--framework

# did this come from an -alt repo?
$ git config --file /path/to/Cellar/python/2.7.2/HOMEBREW_MANIFEST --bool brew.alt false
$ git config --file /path/to/Cellar/python/2.7.2/HOMEBREW_MANIFEST --get brew.alt
false

Another reason to use --file is that it doesn't rely on the Homebrew git repo having been initialized already.

@Sharpie
Copy link
Contributor Author

Sharpie commented Oct 20, 2011

@MikeMcQuaid

I find YAML unattractive because I don't use Ruby outside of Homebrew and haven't seen many parsers or much use for it outside Ruby development. Newline separated files are very easy to parse in any language and seem more Unixy to me. I don't see the need for a structured language for two types of date. Hell, you could do it on one with:
+an-used-option
-an-unused-option

Personally I'm dubious about the argument that "we might need this later". If you can suggest at least a couple more concrete things we will definitely need in the future then that's valid reasoning, otherwise it strikes me as over engineering.

As said, I'm not overruling or anything here, just want to voice perhaps a different opinion.

As far as parsing config files goes, open('file').read.split for newline delimited files and YAML.load_file 'file' are pretty much the same amount of complexity. However, it seems to me that the newline delimited file could store options prefixed with + or - and then one other piece of data that did not have those prefixes. Adding more pieces of information will quickly make the parsing code for the newline file more complicated than open('file').read.split.

Given two Ruby implementations of equal complexity, I would rather take the one that offers more flexibility as any significant changes to the metadata system could be painful.

However, I do appreciate the value of a format that is easily readable by other tools and would like to set up a solid system the first time around (because future changes will be painful). Perhaps we could do one of the following:

  • Use YAML, but extend a command like brew info such that brew info <formula> --used-options prints the used_options section in newline-delimited format.
  • Use another format like JSON for which a parser exists in nearly every programming language. It looks like multi-repo support may require us to vendor a JSON library in order to talk with the GitHub API---we could get some extra bang for our buck and use the library to serialize the metadata files.
  • Use git config. I was not aware of the --file option and this seems like a very clever way to have Git do some of our work for us once again. I like this idea---the only issue I can think of is that working with metadata would require git to be installed.

@jacknagel
Copy link
Contributor

I like this idea---the only issue I can think of is that working with metadata would require git to be installed.

Yeah, though we already require git for brew update to work anyway, so that covers everyone that has Homebrew installed already, and since git is included in XCode 4, it's becoming somewhat of a rarity for new Homebrew installs to be "git-less".

So we could either skip the metadata recording if git isn't installed or error out and tell the user to install it first.

@Sharpie
Copy link
Contributor Author

Sharpie commented Nov 10, 2011

Just came back to this after letting it sit for a while. I thought about using Git config files but the problem is that encode and decode methods have to be written that translates a Ruby object to system calls and back.

I don't want to write and debug these---I would rather use a library that takes care of the serialization for me.

So, I switched the storage format to JSON and vendored a copy of the multi_json gem to handle the encoding and decoding.

Pros:

  • JSON is so widespread that just about any programing language worth programming in has access to a native parsing library. Even bash.
  • Since GitHub switched to a JSON-only API, it makes sense to add a JSON library to Homebrew's bag of tricks. Multi-repo support in pull request Core Change: Multi-repository support #7643 will require one.
  • Any web frontend that gets written for Homebrew will have easy access to metadata.

Cons:

  • Ruby's YAML emitter produces nicely pretty-printed output. The JSON encoder barfs out one long string. Cmmand line tools like jsonpretty probably makes this a moot point for anyone who cares.
  • Loss of flexibility in which objects can be serialized. For example, the YAML emitter can handle Ruby Time objects but JSON cannot. This may be a good thing as it means the data that can be stored will be simple enough to be portable to other environments.

Let me know what y'all think---I can always roll back to YAML if this doesn't seem like a good idea.

@MikeMcQuaid
Copy link
Member

Seems pretty reasonable to me, go for it!

@Sharpie
Copy link
Contributor Author

Sharpie commented Nov 10, 2011

Allright.

@adamv @mxcl @jacknagel

Ok, last call for comments on this one---I'll pull the trigger this weekend if nothing comes up.

For cereal this time.

@@ -127,6 +129,9 @@ class FormulaInstaller
data = read.read
raise Marshal.load(data) unless data.nil? or data.empty?
raise "Suspicious installation failure" unless $?.success?

# Write an installation recipt (a Tab) to the prefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/recipt/receipt/

@jacknagel
Copy link
Contributor

Gave it a quick test, didn't have any issues.

We should probably teach brew info to display used_options for installed formula.

@Sharpie Sharpie merged commit 7148211 into Homebrew:master Nov 13, 2011
@Sharpie
Copy link
Contributor Author

Sharpie commented Nov 13, 2011

Merged.

Thanks to everyone for the comments and suggestions.

@jacknagel
Copy link
Contributor

@Sharpie urgent

I get this backtrace on every install now

Error: uninitialized constant Yajl::ParseError
Please report this bug:
    https://github.com/mxcl/homebrew/wiki/checklist-before-filing-a-new-issue
/usr/local/Library/Homebrew/vendor/multi_json/engines/yajl.rb:7
/usr/local/Library/Homebrew/vendor/multi_json.rb:50:in `require'
/usr/local/Library/Homebrew/vendor/multi_json.rb:50:in `engine='
/usr/local/Library/Homebrew/vendor/multi_json.rb:10:in `engine'
/usr/local/Library/Homebrew/vendor/multi_json.rb:72:in `encode'
/usr/local/Library/Homebrew/tab.rb:59:in `to_json'
/usr/local/Library/Homebrew/tab.rb:66:in `write'
/usr/local/Library/Homebrew/formula_installer.rb:134:in `build'
/usr/local/Library/Homebrew/utils.rb:225:in `ignore_interrupts'
/usr/local/Library/Homebrew/formula_installer.rb:126:in `build'
/usr/local/Library/Homebrew/formula_installer.rb:54:in `install'
/usr/local/Library/Homebrew/cmd/install.rb:89:in `install_formulae'
/usr/local/Library/Homebrew/cmd/install.rb:86:in `each'
/usr/local/Library/Homebrew/cmd/install.rb:86:in `install_formulae'
/usr/local/Library/Homebrew/cmd/install.rb:24:in `install'
/usr/local/bin/brew:83:in `send'
/usr/local/bin/brew:83

@Sharpie
Copy link
Contributor Author

Sharpie commented Nov 13, 2011

The multi_json gem will try to load a few compiled JSON implementations before falling back on an included copy of ok_json which is a Ruby-only implementation.

I bet what is happening is that you have an older copy of Yajl installed that it is trying to use but failing to load. I'm away from my computer at the moment, but when I get back I will hard-wire the library to ok_json because parser speed isn't an issue for us.

@jacknagel
Copy link
Contributor

Thanks.

@jacknagel
Copy link
Contributor

Hmm. It actually succeeds about 10% of the time, so I'm not sure what is going on.

@Sharpie
Copy link
Contributor Author

Sharpie commented Nov 14, 2011

@jacknagel

Does the patch in pull request #8574 fix the issue?

@jacknagel
Copy link
Contributor

It appears so.

@Sharpie
Copy link
Contributor Author

Sharpie commented Nov 14, 2011

Allright. Patch applied.

@Homebrew Homebrew locked and limited conversation to collaborators Feb 16, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants