Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attribute API 2 #77

Closed
wants to merge 5 commits into from
Closed

Attribute API 2 #77

wants to merge 5 commits into from

Conversation

@danielsdeleo
Copy link
Contributor

@danielsdeleo danielsdeleo commented Dec 15, 2014

NOTE: This RFC is incomplete, the specification of the new feature is TBD.

This RFC describes a new attributes API to be added alongside the current one. The goals of this new API are:

  • Separate desired and observed state.
  • Structure the syntax in a way that "trainwreck" errors
    (NoMethodError for [] on nil) can be avoided.
  • Structure the syntax in a way that attribute tracing can be
    implemented simply.
  • Separate node-specific persistent storage from
    cookbook/role/environment attributes.
  • Structure the syntax in a way that can be implemented efficiently.

This RFC is adapted from #9 which is retracted.

* structure/formatting matches the standard
* rework explanation of the motivations
* Remove all breaking change stuff, this will be released as a new API
  alongside the old one.
* Remove API examples; will debate a few API style options before
  adding.
@danielsdeleo
Copy link
Contributor Author

@danielsdeleo danielsdeleo commented Dec 15, 2014

I created a gist with several syntax possibilities here: https://gist.github.com/danielsdeleo/b942599fffbc9c9df407

`do_something if node[:some_attribute_present]` would break (Ruby does
not allow custom false-y types), and monkey patching `nil` is
error-prone because `nil` is an immediate object (there is only one
instance).
Copy link
Contributor

@lamont-granquist lamont-granquist Dec 15, 2014

Another major issue with the method-missing syntax:

node.foo   # returns node['foo']
node.class # returns the Chef::Node symbol

Copy link
Contributor Author

@danielsdeleo danielsdeleo Dec 15, 2014

This is now addressed in the "Method Missing Method Name Collisions" section.

Copy link
Contributor

@lamont-granquist lamont-granquist Jun 11, 2015

We might want to call this out as the reason why method chaining syntax MUST NOT be used. I actually misunderstood the problem on first reading, and I don't think its obvious to the casual reader exactly what the implications are to the API design. Any kind of node['a']['b']['c'] or node.a.b.c kind of method chaining will recreate this problem and therefore CANNOT be part of the new design.

Copy link
Contributor Author

@danielsdeleo danielsdeleo Jun 11, 2015

I didn't want to clutter this section too much, but I guess that kind of thing is what the rationale section is for.

Copy link
Contributor Author

@danielsdeleo danielsdeleo Jun 12, 2015

Ok, added a discussion in the rationale section.

Also add "method missing method name collisions" to the list of
motivations.
attributes provided by Ohai. Defining a new syntax where Ohai and
user-defined attributes are accessed by different methods makes cookbook
code clearer.

Copy link
Contributor

@tyler-ball tyler-ball Dec 15, 2014

Is there going to be some way of reading the merged attributes? Like merged_attributes[:fqdn]? Otherwise in my recipe I still have to remember which attributes are set in the system hash and which are set in the attributes hash.

EG, if I want to have some branching logic in my cookbook based on the state of a user, I have to remember whether I query that information from the system attributes or my user cookbook attributes.

Copy link
Contributor

@lamont-granquist lamont-granquist Dec 15, 2014

doing node.override[:fqdn] = 'blah' doesn't actually work today since automatic supersedes everything. pulling the system attributes out so that they don't get union'd with the other node attributes makes sense and will avoid people trying to go down that road.

i implemented a little CMDB on top of my node attrs at Rhapsody and had a one-on-one with adam about 4 years ago now where we talked about that and he directed me to using something like node.default[:cmdb][:serial_num] to be an attribute which I would treat as being higher precedence over the dmi serial_num in the automatic attributes since some of them were '0000000000' and others were '1234567890' (companies that either didn't take the time to burn their serial numbers into DMI or on RMA'd boards). that's the correct design pattern to use, and this makes it a bit more explicit that automatic attrs come from ohai and are consumed and you don't think about mutating them.

Copy link
Contributor Author

@danielsdeleo danielsdeleo Dec 16, 2014

Yeah, the argument we are making is that you should have to know whether the data you're using is from Ohai (which implies it was observed and true at the beginning of the chef run), or from user-defined attributes (which are aspirational, and reflect a thing that chef may have made true or might make true in the future). In addition to semantic differences, there are the behavior differences Lamont points out, e.g., the Ohai data is intended to be read-only, which is currently implemented by giving it the highest precedence.

Copy link
Contributor

@tyler-ball tyler-ball Dec 16, 2014

Ah, I didn't know that we were making the user explicitly know which attributes come from Ohai. In that case, I totally understand keeping it separate for writing and reading.

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Dec 16, 2014

What about strings vs. symbols? That's probably a bike-shed, tho. In ruby 2.2 there will be little practical difference between short symbols/strings with symbols being GC'd and perf fixes that have gone in.

I'd argue against symbols because of having to do :'foo-bar' for strings with dashes in them and using strings removes the need to know about that workaround (favoring the newbies over the rubyists).

Providing an attributes API where the full attribute path is known when
new attributes are set would greatly ease the implementation of the
tracing feature.

Copy link
Contributor

@tyler-ball tyler-ball Dec 16, 2014

So does this mean I can no longer do something like:

mysql = attr('mysql')
mysql['version'] = '1.2'

What would attr('mysql', 'version') return then? I only set '1.2' on the hash returned from the node, so my node object wasn't actually updated, correct?

If so I think an example like this could be added. Or a line along the lines of

The object returned from the attr call will not be connected to the node. IE, updating them will not update the node's persistent state.

Or if attr is going to return a Hash/Mash, that could just be detailed when the syntax is detailed below.

Copy link
Contributor Author

@danielsdeleo danielsdeleo Dec 16, 2014

I don't think that kind of hash-like syntax would be something we'd do, and I think also we'll try to make set and get go through different method calls. Again, the options I managed to think of for the syntax are here: https://gist.github.com/danielsdeleo/b942599fffbc9c9df407 but I'm sure there's a few options that didn't occur to me yet.

In addition, the details of the underlying storage could change; for example, we could implement a "flat" key value store, but have the keys be arrays, so under the covers you have:

["a", "b"] => "foo",
["a", "c"] => "bar"

That said I haven't put much thought yet into whether we should do that or just map things back to the nested hashes.

Copy link
Contributor

@coderanger coderanger Dec 16, 2014

Worth keeping RFC 18 in mind too, for the simple cases. node['foo.bar'] could be handled nicely (or node.attr('foo.bar')) but you can still drop down to a lower API that could give you back nils if you want it.

@tyler-ball
Copy link
Contributor

@tyler-ball tyler-ball commented Dec 16, 2014

This doc doesn't mention anything about precedence levels - are those staying the same? Going away entirely?

Can you add a note about how these are being affected?

@coderanger
Copy link
Contributor

@coderanger coderanger commented Dec 16, 2014

If I get a vote about precedence levels, roles and envs should only be allowed to set override, default just becomes "attributes" and is what all cookbooks use.

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Dec 16, 2014

I'd vote against messing with precedence levels again, the transition from Chef 10 to Chef 11 was way too painful.

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Dec 16, 2014

And reminder that the syntax discussion is going on here:

https://gist.github.com/danielsdeleo/b942599fffbc9c9df407

Gists aren't great for discussion because you can't comment on a line or
to a particular person, so moving this in-repo to facilitate discussion.
@danielsdeleo
Copy link
Contributor Author

@danielsdeleo danielsdeleo commented Dec 16, 2014

I moved the syntax examples in-repo so we can more easily discuss them; gists are lacking in the comment-on-specific-line and comment notification departments

@danielsdeleo
Copy link
Contributor Author

@danielsdeleo danielsdeleo commented Dec 16, 2014

@adamhjk @ranjib @lamont-granquist @coderanger if you want to restate your comments about the syntax options here I'd appreciate it. Also, any other syntaxes anyone wants to propose are welcome.

# Mass Default assignment

## Block w/ block variable, method chaining
defaults_for("mysql") do |mysql|
Copy link
Contributor

@coderanger coderanger Dec 16, 2014

All the method chaining ones look okay in isolation, but I think it would be impossibly busy to look at a whole screen of them. Just too many words, it is a lot easier to glance over punctuation than real words.

Copy link
Contributor

@tyler-ball tyler-ball Dec 17, 2014

@coderanger You're saying it is easier to read

attr('first.second.third.fourth.a').set(1)
attr('first.second.third.fourth.b').set(2)
attr('first.second.third.c').set(3)
attr('first.second.third.d).set(4)

than

defaults_for('first') do |first|
  first.defaults_for('second') do |second|
    second.defaults_for('third') do |third|
      third.defaults_for('fourth') do |fourth|
        fourth.set('a').to(1)
        fourth.set('b').to(2)
      end
      third.set('c').to(3)
      third.set('d').to(4)
    end
  end
end

right?

The two examples @danielsdeleo posted don't look different at all to me, but they don't have very deep nesting. Its more logic switches.

Copy link
Contributor

@coderanger coderanger Dec 17, 2014

Yes, I think the first one is better but still not as good as something like attr['first.second.third.fourth.a'] = 1. The set in there is "human" text but isn't part of my data so I find it distracting. I need to context switch a bit. Maybe that's something I would get used to over time, but at least right now it seems bothersome. The = is just easier to ignore, I'm trained to recognize it as syntax instead of data.

Copy link
Contributor Author

@danielsdeleo danielsdeleo Dec 17, 2014

Is there an example of a cookbook that uses deep and variable nesting like that in the wild?

Copy link
Contributor Author

@danielsdeleo danielsdeleo Dec 17, 2014

Also, you could use element assignment along with namespace blocks (using a block variable) like:

defaults_for("nginx") do |n|
  n["foo"] = "bar"
end

I personally disfavor the element reference stuff because it looks too much like the old syntax to me, so I think it makes transitioning to the new thing confusing.

Copy link
Contributor

@coderanger coderanger Dec 17, 2014

See https://github.com/poise/jenkins/blob/master/attributes/default.rb for an example of relatively deep nesting. The block syntax seems nice to me, better than the #tap hack some cookbooks are using the accomplish the same thing for sure. It's just the method chaining/human readable bits that I dislike.

Copy link
Contributor

@coderanger coderanger Dec 17, 2014

Also another monkeywrench is that if we are doing stuff with dotted paths, n[['foo', 'bar']] it terrrrrrible. But I don't think n(['foo', 'bar']) = 1 is allowed syntax?

Copy link
Contributor Author

@danielsdeleo danielsdeleo Dec 17, 2014

You can splat in element reference and assignment, so this will work:

thing['foo', 'bar', 'baz'] = 1

I don't know if that's terrible or not; but it's also ambiguous what n["foo.bar.baz"] = 1 means if we implemented the dots shortcut. I would be 👎 on the obeying the dot shortcut rule if we adopted syntax like that, personally, though.

Anyway, you can't do the assignment-y thing with parens, only square brackets.

Copy link
Contributor

@adamhjk adamhjk Dec 17, 2014

For the record, I'm -1 on the new syntax using square brackets at all. We
need to make it clear it isn't a hash.
On Dec 16, 2014 6:45 PM, "Dan DeLeo" notifications@github.com wrote:

In new/rfcXXX-new-attributes-api-syntax.rb
#77 (diff):

+## Array path option
+default_attr(["mysql", "version"], "1.2.3")
+
+## Multi-arg element assignment operator option:
+
+default_attr["mysql", "version"] = "1.2.3"
+
+## Method chaining:
+
+default_attr("mysql", "version").to("1.2.3")
+
+# Mass Default assignment
+
+## Block w/ block variable, method chaining
+defaults_for("mysql") do |mysql|

You can splat in element reference and assignment, so this will work:

thing['foo', 'bar', 'baz'] = 1

I don't know if that's terrible or not; but it's also ambiguous what n["foo.bar.baz"]
= 1 means if we implemented the dots shortcut. I would be [image: 👎]
on the obeying the dot shortcut rule if we adopted syntax like that,
personally, though.

Anyway, you can't do the assignment-y thing with parens, only square
brackets.


Reply to this email directly or view it on GitHub
https://github.com/opscode/chef-rfc/pull/77/files#r21949584.

@jaymzh
Copy link
Contributor

@jaymzh jaymzh commented Dec 17, 2014

One thing I'd like to see is more around handling arrays. We use a lot of arrays and the biggest problem with the current syntax is using them in arrays.

node.default['foo']['bar'] << 'thing'

Will work if cookbook_default happens to have initialized that as an array, but not otherwise. Or if it was initialized in default, but you use normal or force_default, you're boned.

So a new syntax should make this easy and obvious. Maybe:

default_attr('foo', 'bar').append('thing')

If bar doesn't exist, it's autovivified as an array.

Ideally, it'd be awesome if the autovivication step of the new syntax understood "oh, there's nothing here, but in the only other place I DO have a thing here, it's an array..." - Chef 10 would reason based on the operator and "do the right thing", but the complication of the current syntax meant it got dropped.

I also think it's important to consider that while part of the idea here is to keep people from thinking of these as real hashes, people will want to be able to shove in a hash and have it converted:

default_attr('apache', 'vips', 'www.iamcool.com') = {
  'docroot' => '/var/www',
  'SSLEnabled' => true
  'directories' => {
    '/var/www/secretsauce' => {
      'allow' => ['127.0.0.1'],
    }
  }
}

Also, consider the case where I want on ordered list of "things" that look like hashes - be they actual hashes, or real Attributes:

default_attr('mybla','alltheblas') = [
  {
    'name' => 'bigbla',
    'val' => 'doof',
  },
  {
    'name' => 'smallbla',
    'val' => 'book',
  }
]

What happens here? What does the internal representation looks like? How do I access that with the new syntax?

@jaymzh
Copy link
Contributor

@jaymzh jaymzh commented Dec 17, 2014

Another comment... Huge 👎 on the dotted syntax. It's a HUGE avenue for subtle bugs. Keys will have dots, and this will totally cause massive headaches. This will be like back when you could do node['foo'].key(bar) and because you forgot the ?, it auto-vified key, and returned a truth-y value when you should have gotten a false.

@martinb3
Copy link
Contributor

@martinb3 martinb3 commented Jan 9, 2015

👍 for Block with instance eval, method chaining. I definitely don't like seeing dots that have syntactic meaning inside strings. I also think there's a lot of cognitive dissonance with node data and the Hash data structure, and any compatibility layer will likely mean people will have to learn (a) the new data structures and (b) the quirks of any compat or translation later between the new data structures and regular hashes.

I'd also like to see arrays come back to wider use, and I think distancing the node data structures from Ruby hashes is an opportunity to make arrays as viable data structure again for node data, as @Jaymz mentioned.

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Jan 9, 2015

@danielsdeleo there needs to be some kind of statement added that RFC18 is rejected for this implementation and that RFC18 needs to be amended so that we can do this.

@adamhjk also feels very strongly from a UX perspective that square brackets should be entirely dropped because attributes aren't hashes. from a code perspective I also agree that the method chaining required by square brackets and the you-dont-know-when-it-ends problem that it creates when trying to solve problems that we want to solve with attributes means that square brackets are hard to implement consistently. so the square brackets options should be dropped. the single-square-bracket-operator-with-a-splat options are clever but not very ruby and should also be dropped.

the chef-sugar method_missing alternative also needs to get dropped, nobody wants that one either and it conflicts with one of the stated goals that we're trying to avoid.

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Jan 9, 2015

I'd also like to state that symbols will not be supported in the new API and only strings will be supported.

And the clear tie-breaker there is 'build-essential' just works vs. :build-essential failing and requiring the user to know magic to do :'build-essential'

@adamhjk
Copy link
Contributor

@adamhjk adamhjk commented Feb 19, 2015

General consensus is strings only.

@jaymzh
Copy link
Contributor

@jaymzh jaymzh commented Mar 8, 2015

Any thoughts here on handling arrays, as I mentioned above? It seems an entire usecase dropped by this API at this point, which makes it a no-go for lots of people...

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Mar 8, 2015

In a way i kind of like the idea of just making this work:

node.default['foo']['bar'] = []
node.override['foo']['bar'] << 'thing'

So that when you have to autovivify an attribute it walks back up the precedence tree and creates a new hash or array if it finds one at a lower precedence level. Helps to keep the schema consistent.

Obviously, insert whatever the new syntax turns out to be...

@jaymzh
Copy link
Contributor

@jaymzh jaymzh commented Mar 9, 2015

I agree 100%, @lamont-granquist I'd love something like that. I just don't want to ratify a new syntax that doesn't take this into account, and the current syntax assumes key-values at every level, and nothing array-like.

@adamhjk
Copy link
Contributor

@adamhjk adamhjk commented Mar 9, 2015

Lets get @kallistec to weigh in on arrays

Sent from my iPhone

On Mar 8, 2015, at 6:41 PM, Phil Dibowitz notifications@github.com wrote:

I agree 100%, @lamont-granquist I'd love something like that. I just don't want to ratify a new syntax that doesn't take this into account, and the current syntax assumes key-values at every level, and nothing array-like.


Reply to this email directly or view it on GitHub.

@danielsdeleo
Copy link
Contributor Author

@danielsdeleo danielsdeleo commented Mar 9, 2015

In the method chaining syntax it should be pretty straightforward to make array append do what you mean; e.g., in the given example default_attr('foo', 'bar').append('thing') the DSL owns the object that's getting #append called on it, so it can infer you want an Array. We could alias that as #<<, as well. One question though, is how these should handle conflicts; if I already have a non-array value, does an append operation nuke it, or raise an error? I'm partial to the latter as I think 99% of the time it would be a user mistake, but could be persuaded otherwise.

As for the array of hashes thing, I'll have to think about that. I will note that every example of it that I've seen can be implemented as a map; in this example, for instance:

[
  {
    'name' => 'bigbla',
    'val' => 'doof',
  },
  {
    'name' => 'smallbla',
    'val' => 'book',
  }
]

The "name" attribute is probably unique so you could do this instead:

{
  'bigbla' => {
    'name' => 'bigbla',
    'val' => 'doof',
  },
  'smallbla' => {
    'name' => 'smallbla',
    'val' => 'book',
  }
}

@jaymzh
Copy link
Contributor

@jaymzh jaymzh commented Mar 9, 2015

I think it should raise an error for sure. Data-type conflicts are totally user error.

To your other point, there's no reason to limit what people put in an array. The reason I brought up this example, is because if it's inside an array, given this new syntax, does the hash inside that get treated like a hash or like an attribute? For example, can I do:

attr('foo', 0, 'bar') = 'thing'

Because that's keeping parity with what we have now (and I need). Having to do something like this would suck:

elmt = attr('foo')[0]
elmt['bar'] = 'thing'
attr('foo')[0] = elmt

@timurb
Copy link

@timurb timurb commented Jun 3, 2015

Setting is really great (± rspec-like syntax) but dotted notation for getting is what I’d really miss.

attr('nginx', 'upstart', 'foreground') is much heavier to read and type than attr('nginx.upstart.foreground') especially when you use 3+ components in the attribute name.

By the way what is the point in differentiating keys containing dots to keys not containing them?
Why would you want to have attr('foo', 'bar') meaning something different to attr('foo.bar')?
Is there any reason not to treat them the same?

@danielsdeleo
Copy link
Contributor Author

@danielsdeleo danielsdeleo commented Jun 3, 2015

Whenever the attribute is a hostname, IPv4 address, or used by another system that uses dots to separate keys. For example, on my Mac, everything reported by ohai network/settings is a key with dots.

@tyler-ball
Copy link
Contributor

@tyler-ball tyler-ball commented Jun 3, 2015

An alternative is to write attr(%w{nginx upstart foreground}) - I think this is as easy to read as the dotted notation

@adamhjk
Copy link
Contributor

@adamhjk adamhjk commented Jun 4, 2015

and in case it didn't make it into the thread - I've vetoed the dot
notation. We're removing ambiguity, not keeping it.

Love,
Adam

On Wed, Jun 3, 2015 at 12:26 PM, Tyler Ball notifications@github.com
wrote:

An alternative is to write attr(%w{nginx upstart foreground}) - I think
this is as easy to read as the dotted notation


Reply to this email directly or view it on GitHub
#77 (comment).

@timurb
Copy link

@timurb timurb commented Jun 6, 2015

@danielsdeleo you are right, we'll have a problem with iterating over dotted keys when you need to extract a subkey of that, like attr('my_hostnames').each { |h| h['someattr'] } -- this would be hard to implement when attributes are directly mapped to a dotted notation.

By the way, what will be a way to handle such cases with the new syntax? Will it be like I've written here or is it going to be some different syntax?

@timurb
Copy link

@timurb timurb commented Jun 6, 2015

... or attr(h, 'someattr') to avoid referring to them as hashes?

@lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Sep 11, 2015

Another interesting bug with node attributes is that they get round-tripped to JSON which can be problematic. A foodcritic user has submitted a bug report to add checking for using symbol values in node data and comparing node data against symbol values:

Foodcritic/foodcritic#286

Catching that is actually easier to do in code and not static analysis by not allowing the API to set a value that wasn't valid JSON and couldn't be roundtripped through JSON and return as itself.

lamont-granquist added a commit that referenced this issue Apr 13, 2016
- We've never done most of what we agreed to do.

- We argued out a few points in #77 that we should capture

    1.  we decided that functional arguments foo("bar", "baz") are
        better than method chaining, and this was incorporated into
        the Chef 12 attributes updates.

    2.  we decided on strings over symbols in the arguments.

    3.  we also decided that RFC #18 needed to be amended because
        the implications of foo("bar.baz") were poor.

- This also argues for the addition of an option (-F offhand blindly
  copied from awk, dunno if that'll really work) to deal with
  cases where '.' as a field separator causes issues -- because typing
  arrays on the command line is incredibly poor UX and nobody has
  bothered to implement it.

- This officially demotes the dot-syntax to a commandline workaround,
  and pushes *argv style / array formatting to the forefront for
  ruby APIs.
@tas50 tas50 closed this Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
10 participants