
Support for squashing and pure entries #71

Open
dagwieers opened this issue Sep 15, 2017 · 46 comments

@dagwieers
Contributor

dagwieers commented Sep 15, 2017

Proposal: Redesign Aggregates

Author: Dag Wieers <@dagwieers> IRC: dag

Date: 2017/09/15

  • Status: New
  • Proposal type: core design
  • Targeted Release: 2.5
  • Associated PR: <link to GH PR in ansible/proposals if PR was submitted>
  • Estimated time to implement: ?

Motivation

During AnsibleFest SF 2017 some issues came to light regarding the Aggregates functionality that was designed as part of the ansible-network roadmap.

This proposal does not question the merit of this needed functionality, but some design decisions require improvements before this is ready for wider consumption.

  • Various participants questioned the naming of this functionality, as the name does not seem to adequately describe the purpose of the functionality.

    • Aggregating multiple items in a single run does not clearly indicate that existing entries may be purged
  • The interface was questioned as well, especially the following aspects of the functionality:

    • Some believe that rather than a parameter to the module, it should be a new state.

    • Some believe that this functionality should not be part of the modules, but instead an integral part of Ansible core. There is no need for a whole new interface; it could fit right into with_items, as long as the module indicates it can handle more than one item in a single run (e.g. supports_aggregation). This would then also replace the existing special cases for yum, apt and others that already offer similar functionality.

    • Purging would be optional: you could add/remove multiple items in a single run without purging existing items. This is also something the module would need to indicate (e.g. supports_purging).

  • Handling failures is also something that deserves more consideration. If processing an item fails, should it continue with subsequent items (as with_items does)? Should it still perform the purge when adding/modifying an item fails (probably not)? So this requires that we understand the various use-cases and causes of problems.

  • This functionality is useful in a wider context than just networking, so it should have been discussed with use-cases beyond networking in mind.

Participants wondered whether this functionality would have been acceptable when coming from the community and being reviewed by Ansible core developers. The conclusion was: probably not.

Problems

What problems exist that this proposal will solve?

  • Releasing and documenting this as-is (with the known problems) will make a future implementation harder to get right and will require backward compatibility.

Solution proposal

  • Redesign this functionality with other use-cases in mind
  • Redesign this with all stakeholders involved
  • Redesign this with regard to good user interface design

Documentation (optional)

Hold off on documenting and promoting this as part of v2.4, so we don't have to support it going forward in light of a user-breaking redesign.

Anything else?

The above feedback was provided during the AnsibleFest SF 2017 Network Working Group session and was nearly unanimous among participants.

This proposal was made to collect feedback from other participants in order to design a better user interface for this functionality before it becomes a wider standard.

@jmcgill298

+1

@privateip

One point of clarification: aggregates (or whatever they end up being called) are significantly more than an optimization. Aggregates allow the task to define the aggregate set of configuration, and only that set, that should be present on a target. So if the aggregate is A, B, C and the actual target has A, B, D, then the module should add C and remove D. This is impossible to accomplish using with_* loops today and should be built into the design of the new loop feature.
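For illustration, the reconciliation described here could be sketched as a set difference (reconcile is a hypothetical helper, not an existing Ansible API):

```python
# Hypothetical sketch: given the desired aggregate and the target's
# actual entries, compute what to add and what to remove so the
# target ends up with exactly the desired set.

def reconcile(desired, actual):
    desired, actual = set(desired), set(actual)
    return desired - actual, actual - desired

to_add, to_remove = reconcile({"A", "B", "C"}, {"A", "B", "D"})
# to_add == {"C"}, to_remove == {"D"}
```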

@ganeshrn
Member

ganeshrn commented Sep 22, 2017

@bcoca

net_vlan:
  vlan_id: "{{ item.vlan_id }}"
  name: "{{ item.name }}"
  description: "{{ item.description }}"
  purge: True
loop:  
  - { vlan_id: 100, name: test_vlan_1, description: test vlan-1 }
  - { vlan_id: 200, name: test_vlan_2, description: test vlan-2 }
  - { vlan_id: 300, name: test_vlan_3, description: test vlan-3 }
loop_control:
  optimize: True

In the above example, with the proposed loop and the optimize option set to True, will the task executor send the entire list of dicts under loop as a task arg to the module?

So the validation of the entries within each dict (option validation) and the batch processing of the list need to be handled in the module.

In case the purge/append option is enabled, the module needs to handle deleting
all the vlans and configuring only the vlans given under loop. This essentially requires the module to be called only once with the full list.

@dagwieers
Contributor Author

dagwieers commented Sep 22, 2017

@ganeshrn Please don't delete everything and then add; that could have an impact. IMO it's better to selectively delete, update or add, which obviously requires support by the module (something it should advertise). If a module supports this methodology, I don't see why someone would disable it. It should be the only modus operandi. (i.e. 'absent' skipping the add/update and 'present' skipping the delete)

In my opinion, the purge-option is in fact a state. I don't know yet what the best name would be, but next to absent and present (which only affects the items listed) that third state would affect the existing config. It wouldn't need an absent-alternative because an empty input would mean cleaning everything.
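A selective diff of the kind described above, keyed on a primary identifier, might look like this sketch (plan, the vlan_id key and the dict shapes are illustrative assumptions, not a real module interface):

```python
# Hypothetical sketch of selective add/update/delete planning keyed
# on a primary identifier, instead of deleting everything first.

def plan(desired, existing, key="vlan_id"):
    want = {d[key]: d for d in desired}
    have = {e[key]: e for e in existing}
    add = [want[k] for k in want if k not in have]
    update = [want[k] for k in want if k in have and want[k] != have[k]]
    delete = [have[k] for k in have if k not in want]
    return add, update, delete

# 'present' would apply add/update only, 'absent' delete only, and the
# third state would apply all three.
```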

@bcoca
Member

bcoca commented Sep 22, 2017

@ganeshrn that is one way, we probably need 2 different modes depending on module support, basically an extension of the current squashing but with 'explicit' vs 'implicit' behaviour.

The other way would be to 'pretemplate' the loop and execute all of this remotely (@dagwieers I believe you proposed this already), but this has a couple of caveats, mainly that loop items cannot depend on previous items' execution (loop forks have similar issues).

@dagwieers I'm favoring the additional options because I have not been able to find a 'state' that reflects what you are saying .. but even append does not seem to cover this well either .. best I can come up with is a declared state, as in 'make sure it looks like this and ONLY like this'?

@ganeshrn
Member

@dagwieers Agreed, sequence of add/delete is important.
As the existing configuration is replaced with what is given in the task, the state can also be described as replace.

@bcoca
Member

bcoca commented Sep 22, 2017

state=present supplant=False|True? .. or maybe a purge=first|last|None?

I think we all agree on function ...so we are left with UI .. mostly naming ... one of the hard parts of computing!

@dagwieers
Contributor Author

dagwieers commented Sep 22, 2017

An additional option makes room for 4 different combinations where we only really need 3.

On top of that, we then also have to ensure that purge is not used with any other state that people may provide (e.g. query or list).

This really is a separate state, next to absent/present (which only affects what is specified). Maybe we have to introduce new terminology altogether from a different field. (Like "idempotence" comes from math)

Reading up on idempotence I stumbled on pure

If any existing configuration/state would influence the resulting state, it is not considered pure.

@dagwieers
Contributor Author

dagwieers commented Sep 22, 2017

purge/replace are imperative and definitely not states. purged and replaced are states, but not about what was specified, like absent or present. declared is, but doesn't indicate what happens to existing config.

pure however is a state both about what is specified, but also what remains. And it matches the definition in computer science.

@calfonso

Here are the notes from the IRC working group session where we discussed this proposal. https://meetbot.fedoraproject.org/ansible-network/2017-10-11/network_working_group.2017-10-11-16.01.html

@dagwieers
Contributor Author

dagwieers commented Nov 25, 2017

My proposal for what users interface with in Ansible is exactly the same as the new loop syntax in v2.5:

- net_vlan:
    vlan_id: "{{ item.vlan_id }}"
    name: "{{ item.name }}"
    description: "{{ item.description }}"
    state: pure
  loop:  
    - { vlan_id: 100, name: test_vlan_1, description: test vlan-1 }
    - { vlan_id: 200, name: test_vlan_2, description: test vlan-2 }
    - { vlan_id: 300, name: test_vlan_3, description: test vlan-3 }

So state: pure means only these items need to exist and any existing item not listed will be purged.
And state: present would ensure everything exists, but in a single transaction (if supported).
And state: absent would ensure everything does not exist, but in a single transaction (if supported).

So this clearly is something we have to support in Ansible core, and add to modules over time. The argument spec would add something like: supports_squashing=True if the module supports squashing.

Obviously, this only works if the module supports such declarations in a single transaction and Ansible understands that if the module supports this, it no longer needs to run actions for individual items, but can send the whole list in a single call to the module.

This would work equally well for state: present because the module would have indicated it supports "squashing". So playbooks don't need to be adapted: if modules start supporting squashing, and users have a newer Ansible with squashing support, it would start being used. If not, everything works as before.
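A rough sketch of the controller-side dispatch this implies (supports_squashing, run_module and the items argument are assumptions, not existing Ansible internals):

```python
# Hypothetical controller-side dispatch: batch the whole list into a
# single module call when squashing is supported, otherwise fall back
# to the classic one-invocation-per-item loop.

def execute_task(run_module, module_args, items, supports_squashing):
    if supports_squashing:
        # one call, full dataset: the module handles the batch itself
        return [run_module(dict(module_args, items=items))]
    # legacy behaviour: one module invocation per loop item
    return [run_module(dict(module_args, **item)) for item in items]
```

Existing playbooks keep their syntax either way; only the number of module invocations changes.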

@privateip

@dagwieers i fail to see how you support squashing with this proposal. Each iteration of net_vlan would be a separate call. What argument in the module will receive the list of hashes? Can you elaborate on how this might work?

@dagwieers
Contributor Author

dagwieers commented Nov 27, 2017

@privateip No, if the module advertises supports_squashing=True the core system will send the whole dataset at once to the module (which at that point supports this). If it does not advertise this, it works as before, iterating over every item.

The main advantage is that existing playbooks will start doing it more efficiently as soon as a module (and Ansible) supports this.

Updated to use the new terminology squashing over aggregation. I agree it's better.

@privateip

I'm still not clear what the receiver is in this scenario. Obviously it has to be the module, but what in the module? A new argument?

Also, this would require that all modules are loaded (imported) on the controller side (I'm assuming in the action plugin) prior to passing the arguments. Just trying to clarify my understanding; please correct any assumption here that is wrong.

@dagwieers
Contributor Author

dagwieers commented Nov 27, 2017

Obviously it is the module, and yes, it would be some new internal argument. Those details are part of the implementation. And yes, the module would have to be loaded/inspected to understand what it supports, which it already does for supports_check_mode BTW.

But it wouldn't need a specific action plugin; the default action plugin would do this out of the box. (Modules that require their own action plugin would have to support this too if they want to support squashing, so yes, it would need some supporting call for this, or, if possible, be part of the existing interface.)

@privateip

And yes, the module would have to be loaded/inspected to understand what it supports, which it already does for supports_check_mode BTW.

No, supports_check_mode is provided to the instance of AnsibleModule via _ANSIBLE_ARGS, the module code is not loaded on the controller side.

Basically this proposal moves the functionality of the current aggregates argument from being a per-module option to being part of loop, and then adds a new check for whether the module supports the feature by loading the module code on the controller side before sending it to the remote device. In other words, the functionality isn't altered. This proposal just moves some things around to feel a little more natural, which is not to say it's right, wrong or indifferent; just clarifying the proposal.

@dagwieers
Contributor Author

dagwieers commented Nov 28, 2017

No, supports_check_mode is provided to the instance of AnsibleModule via _ANSIBLE_ARGS, the module code is not loaded on the controller side.

Check plugins/action/__init__.py, it's the controller that will skip running a module in check-mode if the module does not support check-mode. So the controller knows beforehand that the module does not need to run at the remote node. This would not be different.

Update: Alright, I misinterpreted what was going on there. It seems the individual modules are run remotely and return skipped. The plugins/action/__init__.py apparently only relates to action plugins. So I guess that can be optimized now as well. No need to run them remotely in check-mode if we know they don't support check-mode anyway.

@cognifloyd

Or, instead of as we could use remote: yes (defaulting to remote: no). That's simple.

@cognifloyd

Also, something like exclusive: yes or inclusive: yes (not sure which makes more sense) could be used in place of purge: yes. I think that would address this from @dagwieers:

purge/replace are imperative and definitely not states. purged and replaced are states, but not about what was specified, like absent or present. declared is, but doesn't indicate what happens to existing config.

@jmcgill298

I wanted to refrain from using words that could be used to indicate which host is doing the looping (looping happens in the module, not on the host, i.e. connection: local).

@privateip

@dag We are no longer actively developing this feature in core and will move any implementation herein into roles on an as-needed basis. Any objections to closing this proposal?

@dagwieers
Contributor Author

I don't see how roles can solve this.

@privateip

Roles solve this by their implementation. If I pass a set of values into a role, the implementation of the role can perform the necessary adds and removes to normalize the desired data set with the configured data set on the device. So you are correct in the sense that roles do not inherently solve this, but the implementation of a role's tasks does.

Since we are no longer developing modules with an aggregates argument, that was the basis for my suggestion to close this. I realize there may still be a desire to discuss this as a future feature so if you want to keep this open for that reason that makes total sense.

@dagwieers
Contributor Author

dagwieers commented Apr 18, 2018

@privateip Yes, I would like to keep this open, as a role-based implementation does not offer performance/efficiency improvements.

Looping over a task for hundreds of items is in most cases hard to impossible unless it is a core/module-based implementation. I have several customers/users asking for this to make Ansible usable in large setups.

@cognifloyd

cognifloyd commented Aug 29, 2018

Here's a variation of an earlier example. Here I swapped purge for inclusive (see #71 (comment)) and replaced as with layer:

- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    inclusive: yes
    layer: task # defaults to layer: playbook

A loop layer determines the layer that is responsible for looping over a task. By default, the playbook handles the looping (ie layer: playbook), meaning that the task is repeated for each loop item. Alternately, the entire loop can be handed over to the task, allowing the task to loop over each item itself (ie layer: task). This only works if the action or module has documented in their DOCUMENTATION string that they support task layer looping.

In the case of layer: task looping ansible would have some kind of helper utilities to help the task run and report results for each loop item. It would also allow the task (be it a module or an action) to optimize the running of all of the items so that network modules or other api modules can take advantage of batch APIs or running multiple commands in one connection.

So, I think this is easy enough to explain in user facing documentation as I attempted to do above. What does everyone think?
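One way the helper utilities for layer: task looping could look, as a hedged sketch (run_loop and the result shape are assumptions, not an existing Ansible API):

```python
# Hypothetical helper a task-layer-looping module could use: run a
# callback per item, collect per-item results, and roll up 'changed'
# and 'failed' the way a playbook-layer loop reports them.

def run_loop(items, handle_item):
    results, changed, failed = [], False, False
    for item in items:
        try:
            result = handle_item(item)
        except Exception as exc:
            result = {"failed": True, "msg": str(exc)}
        result.setdefault("item", item)
        changed = changed or result.get("changed", False)
        failed = failed or result.get("failed", False)
        results.append(result)
    return {"changed": changed, "failed": failed, "results": results}
```

A batch-capable module could instead inspect the whole items list up front and issue one API call, while still reporting per-item results through the same shape.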

@caphrim007

@cognifloyd do you have any opinions on what

"This only works if the action or module has documented in their DOCUMENTATION string that they support task layer looping."

would look like?

@cognifloyd

cognifloyd commented Aug 30, 2018

The documentation is already parsed for options, so it shouldn't be a big deal to also put some more metadata in the documentation.

DOCUMENTATION = '''
---
module: my_sweet_module
short_description: bla bla bla
version_added: "2.7"
description:
  - Describe it more completely here
supports:
  - task_layer_loop
  - check_mode
options:
  option_a:
    required: true
    type: string
  option_b:
    type: string
'''

or another option:

supports:
  - check_mode: yes
  - loop_layer: task  # defaults to playbook, as everything supports playbook layer looping

or even:

supports:
  check_mode: yes
  loop_layer: task

or maybe, if a supports list isn't appealing, something like

supports_check_mode: yes
supports_task_layer_loop: yes

I like a list/dict because it makes it feel cleaner (less verbose, but more clear) and is more extensible for future features without a lot of supports_* flags.

Note that I also included a check_mode entry, as documenting it like this would allow skipping a task in check_mode without ever running the module. But I'm not asking to add supports_check_mode to the documentation... It's an example of a possible extension of the doc concept. I am asking for an accessible way to internalize looping, allowing tasks (be they modules, actions, or some combination) to optimize the looping internally.
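Once the DOCUMENTATION YAML is parsed (e.g. with PyYAML, which the docs tooling already uses for options), checking the proposed supports mapping could be as simple as this sketch (the key names mirror the examples above and are hypothetical, not an existing Ansible field):

```python
# Hypothetical check against an already-parsed DOCUMENTATION dict:
# every module supports playbook-layer looping, 'task' is opt-in.

def supported_loop_layer(parsed_doc):
    return (parsed_doc.get("supports") or {}).get("loop_layer", "playbook")

parsed_doc = {"module": "my_sweet_module",
              "supports": {"check_mode": True, "loop_layer": "task"}}
# supported_loop_layer(parsed_doc) == "task"
# supported_loop_layer({"module": "legacy"}) == "playbook"
```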

@cognifloyd

cognifloyd commented Aug 30, 2018

I suppose we could call the two layers play and task instead of playbook and task.

loop_control:
  layer: play # the default (supported by ALL modules)
loop_control:
  layer: task
DOCUMENTATION = '''
...
supports:
  loop_layer: play # the default (current behavior)

@dagwieers dagwieers changed the title Hold off and redesign Aggregates Support for squashing and pure entries Sep 1, 2018
@dagwieers
Contributor Author

dagwieers commented Sep 1, 2018

@cognifloyd I think that syntax is too verbose and complex. If we expect that a lot of modules will support squashing and pure over time, having to specify all that is a drag.

Another issue I have with loop: now is the fact that neither the playbook nor the task is looping, so it adds to the confusion. To the user we simply hand off a dataset to the module, so I would even refrain from using the loop keyword here. Bolting this on top of the loop mechanism is wrong. (Major change of opinion since last year)

So rather than use loop or with_items my new proposal is this:

- net_vlan:
    description: Production VLAN
    state: pure
  dataset:  
    - { vlan_id: 100, name: prod_vlan_1 }
    - { vlan_id: 200, name: test_vlan_2, description: Test VLAN }
    - { vlan_id: 300, name: test_vlan_3, description: Test VLAN }

So the fact that dataset is provided means we are sending this as a whole to the module. (How exactly is not relevant now; I would do this internally and not expose it as a parameter.) At this point Ansible will know what it needs to do; there's no need here to check if the module actually supports it. (If it doesn't, we'll have to catch that and make it clear to the user, though.)

This would also work with state: present and state: absent obviously.

@cognifloyd

cognifloyd commented Sep 4, 2018

First, what does it take to move this proposal to an "accepted" state?

An alternative to the loop keyword that implies that the data is handled by the action/module itself would be fine with me (vs creating a loop mode flag that makes loops go through completely different code paths, potentially creating end-user confusion at the behavioral differences). I'm not opposed to using dataset: as that keyword.

As for state: pure I think that's something that each individual module would implement. The first few core-managed modules to adopt that state would add that state to various places in the documentation so that it can become a well-known state name that modules can reuse when they have a relevant use-case. I don't think it would require any core changes to support state: pure.

More thoughts on dataset: implementation in ansible core.

- name: my sweet task name
  my_sweet_module:
    this_is: a top level arg
  dataset:
    - { this_is: "an arg in a dataset entry" }

In the action/module that I'm writing (the one that triggered looking this proposal up last week to see if anything had changed), I ended up creating something like a dataset or aggregate arg (though I didn't use the term aggregate or dataset; I used the plural of the primary identifier, i.e. if the primary identifier were thing I called the aggregate/dataset arg things, or if it were vlan maybe it would be called vlans).

Here are some questions that might have to be answered once we get to implementation (I hope asking now will not delay accepting this proposal).

  • How do we validate the args in the dataset?
  • Do we automatically merge top-level args with each entry in the dataset, or have the module access the top-level args separately from the dataset entries?
  • In merging, which takes precedence: a top-level arg, or the same arg in a dataset entry?
  • How do we handle required args that are provided in the dataset but not as top-level args?

For my action/module, I had my primary identifier arg name and several other args including aliases. If I only specified one name (ie I didn't use the dataset) then using the top-level aliases arg was fine. However, if a list of names was supplied (ie I supplied a dataset) with a top-level aliases arg, I would want to throw a validation error as aliases must be unique (defined per item in the dataset). With a dataset, the aliases had to be supplied in the dataset, or not at all. For the other args, I merged them with my dataset so that each item in the dataset also had all of the top-level args included. Then, I looped over the now-merged-and-validated dataset to generate the external API call.

So, I would imagine that the ansible-core-provided dataset handling would handle merging the top-level task args with the dataset as well as handling basic validation. If it just merged them, and I had to add my extra 'uniqueness' constraint, that would probably be fine.

Also, for my use-case I think it makes more sense for the dataset entries to have precedence over the top-level args. Does that also make sense for the network appliance use case? Or do the network modules need to have top-level args override entries in the dataset? How do network module aggregates currently work with merging top-level and aggregate args?
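For illustration, the entries-take-precedence merge described above could be sketched like this (merge_dataset is a hypothetical name; any core implementation would also layer option validation on top):

```python
# Hypothetical merge of top-level task args into each dataset entry,
# with the entry's own args winning on conflict.

def merge_dataset(top_level_args, dataset):
    return [dict(top_level_args, **entry) for entry in dataset]

merged = merge_dataset(
    {"description": "Production VLAN", "state": "pure"},
    [{"vlan_id": 100},
     {"vlan_id": 200, "description": "Test VLAN"}],
)
# merged[0]["description"] == "Production VLAN"  (inherited)
# merged[1]["description"] == "Test VLAN"        (entry wins)
```

Extra constraints like the uniqueness check for aliases would then be applied by the module after the merge.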

@dagwieers
Contributor Author

@cognifloyd The official decision was that this functionality is implemented using roles: #71 (comment)

But to me that is no solution for the needs I have, so that's why I'd like to first look at a representation in the playbook, then see if it could cover the different needs and look at making a basic implementation. Obviously, this impacts core (a bit), module_utils (mostly) and modules. There's no simple solution, especially for implementing state: pure which requires a complete redesign of the module.

I am not going into the details of your questions, because that's mostly going to add to bikeshedding :-) But I do have some opinions. Most of the heavy lifting will be in module_utils and be as transparent to the module writer as it is today. So parameter handling will just happen, and you'll get an object that's easy and transparent to iterate over, so you can reuse as much of the existing logic as possible. That will be the goal.

@zyv

zyv commented Mar 29, 2019

It would be nice to have something like this for our Bitbucket modules:

The rationale here is that these modules use API calls to manipulate individual entries, and can be significantly sped up, if the proposal by @dagwieers is accepted.
