
Hold off and redesign Declarative Intent #70

Closed
dagwieers opened this issue Sep 15, 2017 · 50 comments
Comments

@dagwieers
Contributor

dagwieers commented Sep 15, 2017

Proposal: Redesign Declarative Intent

Author: Dag Wieers <@dagwieers> IRC: dag

Date: 2017/09/15

Motivation

During AnsibleFest SF 2017 it became clear that the Declarative Intent functionality, designed as part of the ansible-network roadmap, is flawed in a few areas.

This proposal does not question the merit of this needed functionality, but some design decisions require improvements before this is ready for wider consumption.

  • The implementation diffuses the configuration namespace with the expectation declaration, which is going to be very confusing. Not just because some of these parameters are used elsewhere for configuration, but simply because everything is a top-level module parameter. This is bad user interface design. The remark that the documentation will make clear which parameters are part of the configuration declaration and which are expectations does not make this acceptable.

    The example from the documentation:

    - name: configure interface
      net_interface:
        # Declared configuration
        name: GigabitEthernet0/2
        description: public interface configuration
        enabled: yes
    
        # Intended state
        state: connected
        txrate: 10gb

    So state here is an expected state of the interface, and tx_rate is the expected transmit rate, not an actual declarative configuration or even an intention as such. A better interface would be:

    - name: configure interface
      net_interface:
        # Declared configuration
        name: GigabitEthernet0/2
        description: public interface configuration
        enabled: yes
    
        # Expected state
        expect:
          link: connected
          tx_rate: 10gb
  • Declarative intent seems to indicate we are declaring our intentions. But that's not what it does, it declares expectations rather than intentions. If we look at the above examples the items listed are not (as documented) the "Intended state", but rather the "Expected state" an interface is in.

    "Intended state" seems to imply it is something we can control, where "expected state" better describes what it is. An intention is something one contributes to, an expectation is something we consider getting back. So the functionality a misnomer in itself.

    From the documentation:

    Declarative intent modules also take a set of options that allow playbook designers to specify the desired or intended operational state of the resource. These options allow tasks to define the normalized operating state of the resource once the configuration has been completed. State options do not make configuration changes to the resource. They serve to validate that a given resource has normalized its operating environment to reflect the intended state. If the operating state does not reflect the intended (or desired) state of the resource as described in the playbook task, the module is considered failed and remediation actions can be taken in the playbook during the playbook execution.

  • There is another problem with doing both configuration and expectation checking in a single module. If one of the two operations fails, how would we know whether the configuration failed or the expectations were not met? And should a failed expectation be a module failure or not? I can see different use-cases for testing expectations, and as such it would be more versatile if expectations were handled by special tasks rather than being part of the existing modules.

    Or maybe the core Ansible product should understand the difference between config and expectation, and could be told to either ignore/warn on expectation failures or follow a different failure path, rather than diffusing the handling inside the module itself. Maybe you don't want to run expectation tests in some runs (e.g. production updates), but you do want them in other runs (like integration tests or compliance testing).

  • This functionality is useful in a wider context than just networking, so it should have been discussed with use-cases beyond networking alone. And it maps onto the existing assert module combined with registering return values. If we agree that a validation check belongs as part of the call, we could leave it up to Ansible core to do the test (and take the appropriate action):

    - name: configure interface
      net_interface:
        name: GigabitEthernet0/2
        description: public interface configuration
        enabled: yes
      expect:
        link: connected
        tx_rate: 10gb

    This then would make writing integration tests easier, and ensures that other modules would automatically have this functionality without requiring any special modifications. This is a much stronger use-case than the original implementation.

Problems

What problems exist that this proposal will solve?

  • Releasing and documenting this as-is (with the known problems) will make a future implementation harder to get right and will require backward compatibility.

Solution proposal

  • Redesign this functionality with other use-cases in mind
  • Redesign this with all stakeholders involved
  • Redesign this with regard to good user interface design

Documentation (optional)

Hold off with documenting and promoting this as part of v2.4, so we don't have to support this going forward in light of a user-breaking redesign.

Anything else?

The above feedback was provided during the AnsibleFest SF 2017 Network Working Group and was largely unanimous among participants.

This proposal was made to collect feedback from other participants in order to design a better user interface for this functionality before it becomes a wider standard.

Also collecting pros/cons for each solution in https://public.etherpad-mozilla.org/p/ansible-proposal-70 Please add your thoughts here and they will be rolled into this proposal doc by @gundalow

@dagwieers dagwieers changed the title Hold off and redesign declarative intent Hold off and redesign Declarative Intent Sep 15, 2017
@jmcgill298

+1

I agree with the language of intended vs expected and that there needs to be great consideration given to how these two are integrated.

@bcoca
Member

bcoca commented Sep 15, 2017

i think this is a bit too much of a semantic discussion 'expectations' vs 'intentions' ... you 'expect' things to result as you 'intend'.

In any case this already boils down to what the main project has always advocated, 'declare' vs 'mandate' or that we phrase options and develop actions in a way that is declarative over the traditional imperative.

So anything in a 'state=present' is expected AND intended to have those values; i see no need for an 'expect' extra keyword as all modules should already 'declare expected/intended state'.

@itdependsnetworks

@bcoca This is why this discussion needs to happen. Configuration state is not operational state, and thus the confusion. Please do not dismiss this as a semantic discussion.

@dagwieers
Contributor Author

dagwieers commented Sep 15, 2017

@bcoca: Except that in this case state and txrate are outside of our control. We cannot control them; hence it is not an intention. The documentation states "Ansible is not a monitoring tool", but that's what these parameters are really for: to monitor whether these values are exactly as we expect them to be.

(And yes, the use of state is highly confusing in these examples, which makes it even worse overall, but a good example of all the things that are wrong with this functionality.)

@bcoca
Member

bcoca commented Sep 15, 2017

@itdependsnetworks that is the one part I see having merit, as the 'intended' vs 'active' state gap is normally not as prevalent in servers as it is in appliances; i.e. normally we restart services as soon as possible to remove the difference.

@dagwieers if you are using parameters for comparison and not in an effort to affect the actual state, I would argue that is an incorrect usage of state, which is declarative.

@dagwieers
Contributor Author

@bcoca So you agree with me, without knowing it :-)

There are 3 options:

  • "declarative expectations" do not belong inside Ansible (we don't want this functionality)
  • "declarative expectations" do not belong inside configuration modules (we don't want to mangle config and tests)
  • "declarative expectations" require new infrastructure inside Ansible core (we want the functionality, but need to adapt Ansible to understand this as a new concept altogether)

Right now it is bolted on without understanding of the consequences, long-term vision, or cross-domain insights.

@itdependsnetworks

@bcoca I think the networking space is largely different, and your word carries a lot of weight, so I expect "not as prevalent in servers as it is in appliances" to often be taken as true. Whether it is this, or calling filters that I use daily "extremely specialized", such dismissals can ultimately undermine Ansible producing the best product possible.

@dagwieers
Contributor Author

@bcoca From the DI documentation:

Declarative intent modules also take a set of options that allow playbook designers to specify the desired or intended operational state of the resource. These options allow tasks to define the normalized operating state of the resource once the configuration has been completed. State options do not make configuration changes to the resource. They serve to validate that a given resource has normalized its operating environment to reflect the intended state. If the operating state does not reflect the intended (or desired) state of the resource as described in the playbook task, the module is considered failed and remediation actions can be taken in the playbook during the playbook execution.

So it's more like an assert built into the module. We might as well do:

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  assertions:
    link: connected
    tx_rate: 10gb

And make this directive part of every module. Would make writing integration tests easier :-) (i.e. no need to register and program conditionals in failed_when)

@dagwieers
Contributor Author

@itdependsnetworks I don't agree, even on servers there are states that we cannot influence directly. The state of a cluster, the network link or negotiated speed, ... anything that is a given and could be changed outside of our control is worthwhile to validate. The question is if this should be some special parameter, or a return value that we can check as part of the task, or a separate assert-task (as we do with integration tests).

@bcoca
Member

bcoca commented Sep 15, 2017

I think my point was missed, that we are getting stuck in the semantics vs the 2 real differences we have here.

  • The traditional modules can be split into 'declarative' (i.e. file), 'imperative' (shell), and 'information/fact gathering'. Network is trying to introduce another type, 'check against' ... which i would argue the declarative modules in 'check mode', or fact gathering with a local comparison, can already accomplish. I find the 'declarative intent' phrasing unfortunate as it blurs the lines between the traditional modules and the new intended feature.

  • network devices have a much bigger lag between configured state and active state, which is sometimes done on purpose (I don't think this one was intended as part of the discussion above, but I think it required clarification also before going forward).

@jmcgill298

@bcoca I think the people joining this thread so far are in agreement that there is an issue with creating a "check against" type, and are trying to find a better solution that

  1. treats configuration separate from operation
  2. makes it easier for users to not only ensure configuration was made, but ensure that configuration resulted in the correct operational state

Not to give any preference to @dagwieers' suggestion, but I do see where the assertions appended to the task are easier to consume and do not create an additional type.

@bcoca
Member

bcoca commented Sep 15, 2017

my preference is to just use include_role/tasks to do the same operations, as this is basically 'get facts + compare/assert' ... and I know they are not w/o problems, but I would prefer to put in the effort to finish bringing them up to spec vs creating a whole new breed.
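
For readers following the thread: a minimal sketch of what this 'get facts + compare/assert' pattern could look like as plain tasks, assuming a hypothetical net_interface_facts module and made-up fact keys (nothing here is an existing implementation):

- name: gather interface facts
  net_interface_facts:          # hypothetical facts module, for illustration only
    name: GigabitEthernet0/2
  register: iface

- name: assert the expected operational state
  assert:
    that:
      - iface.link == 'connected'    # assumed fact keys
      - iface.tx_rate == '10gb'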

@privateip

Just a quick historical note: DI was conceived and much of it built before we had persistent connections. It was needed to overcome some of the limitations we had with the connection module (pre-persistence). Now that that is no longer the case, we have the opportunity to reconsider this from the ground up. I think I would be more in favor of @bcoca's suggestion of better facts + asserts long term.

@dagwieers
Contributor Author

@privateip This is confusing me, because we announced this at AnsibleFest SF last week to the complete audience as part of the v2.4 innovations in the networking space. So are we announcing things we already know we don't want to do going forward?

@privateip

No the work was done and we decided to move forward with it for networking modules. My comments are in the context of considering a more broadly implementable solution across all of Ansible.

@dagwieers
Contributor Author

That's what I think is not a good idea. I don't think we want different Ansible concepts depending on the related technology domain. It's too confusing IMO.

@privateip

Agreed ... if we adopt an Ansible wide solution to this I'm fully on board to refactoring the network modules to align with the strategy.

@gundalow
Contributor

The Core Network team has agreed that we will not add the technical guide (ansible/ansible#26954) for Declarative Intent into 2.4.0 to avoid confusing the end users.

@ganeshrn
Member

ganeshrn commented Sep 19, 2017

@bcoca Can you please elaborate on the include_role/tasks approach that you proposed in the above comment?

@bcoca
Member

bcoca commented Sep 19, 2017

@ganeshrn basically, just use tasks to do the same thing vs having to encode it into action/module.

@abadger
Contributor

abadger commented Sep 19, 2017

I like both bcoca's ideas and dag's ideas at the moment. bcoca's idea could be done without code changes. But it means we need to start shipping and supporting pre-baked roles. What does that look like?

dag's idea is a much nicer UI than the current declarative intent proposal. It requires changes to the playbook language. However, there's no information on implementation so there's more design work to do before we could spec out just how much work this would be (and the caveats).

@samdoran

The main debate seems to center around three options: creating new functionality built into modules; relying on existing capabilities by registering the output of a task and testing it with another task; or coming up with an elegant way to leverage existing capabilities in Ansible but with a better UI for modules.

Validating the operational state of a system after Ansible makes changes has always been possible, but it's not done as commonly on non-network operating systems. We just trust the module to change the system, and trust the system to change its state.

For the times when that doesn't get the job done (problematic services that don't really start, or DBs that take a while to become ready), we have register, assert, until, wait_for, etc. The downside to these is that it puts the burden on the playbook author to put all the pieces of validation together. This can become repetitive very quickly if you need to validate the operational state after every task, which is much more common on network operating systems.
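
As a rough illustration of that burden, a sketch of the register + until pattern for a single service; the service name, URL, and retry numbers are made up for the example:

- name: restart the service
  service:
    name: myapp
    state: restarted

- name: wait until the service actually reports healthy
  uri:
    url: http://localhost:8080/status   # assumed health endpoint
  register: health
  until: health.status == 200
  retries: 10
  delay: 5

Multiply that second task by every change you need to validate and the repetition adds up quickly.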

We have the trappings of this type of thing already with validate on some modules and waitfor on networking modules (I think that's still a thing). Rather than inventing a third "check against" type of module, how can we unify these existing features and present a better UI to accomplish configuration change and state validation succinctly? We have the technology, as they say.

I think this could be a shorthand/alternative syntax for register + assert/wait_for but reuse existing functionality in Ansible without having to create something new or retool modules (maybe that's far too lofty, but one can dream).


Etherpad for reference/real time collaboration.

@bcoca
Member

bcoca commented Sep 19, 2017

I would say modules already should check that the desired state is achieved; in most cases this is done by checking the 'success' of the tool/api/function call invoked and trusting THAT. What they don't normally do is arbitrary 'extra checks' outside of the commands/API they use, i.e. after systemctl restart tomcat it doesn't make sense that the systemd module also checks http://localhost:8080/status.xml for a 200 http response.

The validate option is not exactly what most think; it is pretty limited. It does NOT check that the module did its work (that copy/template already produced a file), but that the 'work product' is accepted/usable by the destination tool BEFORE putting it into place. This is not the same as the features described above.

I'm wary about adding something like this built into the language as it will always be more limited than what a 'full-fledged task' can do. Even something like an 'assert strategy' that can interject tasks before/after execution of each task would be a lot more flexible to use and implement (openshift/stack has one already).
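
To make that concrete: on bcoca's reading, the 'extra check' stays a separate, explicit task rather than being folded into the module. A minimal sketch of his tomcat example (the URL comes from his comment; the rest is illustrative):

- name: restart tomcat
  systemd:
    name: tomcat
    state: restarted

- name: separately check that tomcat actually serves requests
  uri:
    url: http://localhost:8080/status.xml
    status_code: 200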

@dagwieers
Contributor Author

dagwieers commented Sep 20, 2017

@bcoca I agree with most of what you say; my only objection (and for me it trumps mostly everything else) is convenience and complexity.

Writing a role per task for this seems inconvenient. Even if these roles already exist (for every network device?), there is a burden (selecting, trusting, managing roles) and the complexity of understanding and troubleshooting issues when they arise. If you abstract away through roles and it fails, the error throws you into that complexity.

And the great strength of Ansible is that it makes hard/complex tasks easy through a simple interface. Do we move the complexity into roles (user-facing) or into the core (abstracted away in a single task)?

If my example was too limited compared to assert, that's solved by doing this:

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  assertions:
  - this.link == 'connected'
  - this.tx_rate == '10gb'

True, it doesn't offer the same as one or more tasks could offer, but maybe it fits 80% of the use-cases, and the remaining part can still add the required tasks.

@mikedlr

mikedlr commented Sep 20, 2017

A specific example of a situation where the result of the module may not be what we want:

When we create an RDS instance, we have the option to wait with a timeout. If we do wait, then sometimes the RDS instance is up and running as expected; sometimes it takes longer, up to over 10 minutes.

It would be very useful to have actions like creating a database in the DBMS instance happen at the point when the RDS instance has finally started working. I currently do exactly that: create the instance without waiting, then do some other slow actions (create ec2 instances), then come back and wait for the instance to be up before continuing with further dependent configuration.
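
A sketch of that workflow, assuming the old rds module; the instance name, sizes, and the status return key are assumptions for illustration:

- name: create the RDS instance without waiting
  rds:
    command: create
    instance_name: appdb            # made-up name
    db_engine: postgres
    size: 10
    instance_type: db.t2.micro
    username: admin
    password: "{{ db_password }}"
    wait: no

# ... other slow actions (e.g. creating ec2 instances) go here ...

- name: come back and wait for the instance before dependent configuration
  rds:
    command: facts
    instance_name: appdb
  register: db
  until: db.instance.status == 'available'   # assumed return structure
  retries: 40
  delay: 15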

@mikedlr

mikedlr commented Sep 20, 2017

My proposal would be something like:

- name: check that interface is ready
  assert:
    module: net_interface_facts
      link: connected
      tx_rate: 10gb
    retry_timeout: 200
    assertion:
      - "succeeded|true"

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes

The "assert" (not sure if it should share a name with normal assert, I currently think so) becomes a plugin that runs other modules. The modules can be fact modules or standard modules run in check mode. It keeps retrying until it has success or times out. By running it without retry it simply fails and stops the script.

This can then be used with any existing module which supports check mode (or is a proper facts module) and at the same time solves both the verification and the make sure it's happened cases.

@bcoca
Member

bcoca commented Sep 20, 2017

@abadger, @dagwieers one of the advantages of implementing this in roles/includes is that users don't need a programmer to extend or add these behaviors; a few 'reference implementations' are all that is needed for a much wider audience to start producing coverage for more cases.

That said, looking at dag's example: yes, it is nice and concise, but i would argue that it introduces many 'uncontrolled' questions, such as:

  • which other module do i execute to gather the facts?
  • do i use the same connection/become information?
  • how do i handle errors caused with this 'hidden task'?
  • do i expose the results? how?
  • loops?

I would argue there is a simpler way to do the same thing, which just needs some extra support in modules. I'll use systemd as an example as it already returns 'state after action' information in its response:

- name: restart tomcat
  systemd:
    name: tomcat
    state: restarted
    enabled: yes
  failed_when:   # basically a 'reverse assert'
    - result.ActiveState not in ['Active', 'Activating']
    - result.UnitFileState != "enabled"
    - result.UMask != "0022"

Like @mikedlr's 'assert wrapping modules', this would require minor modifications to modules to return the 'after action state', probably behind a toggle as this might be pretty large and time-consuming in some cases. If we really need to 'build in assert', I believe this is probably the most effective way and should be less onerous on users, as it does not duplicate failed_when with a new assertions keyword.

@privateip

+1 to @bcoca's proposal. We are essentially doing a form of this today with *_command modules (see the wait_for argument here). However, making this a task directive instead of baking it into the module seems to align much more universally across the playbook while still maintaining the visual separation of config vs state.

@dagwieers
Contributor Author

dagwieers commented Sep 20, 2017

@bcoca This is very close to what I am proposing. The difference between failed_when and assertions is that assertions would allow you to use a different failure-path (e.g. ignore them and continue with the playbook). Since we don't control this state, we may not want to fail on them in certain cases.

It would mean you can control how ansible-playbook behaves by default, and on-demand. (See discussion above about this being the state of the cluster, or the state of an interface link).

  • which other module do i execute to gather the facts?

This example expects the information to come from the module. So yes, the module exposes this additional information relating to the object being managed. (In the case of network modules, it would implement the required delay to intercept possible changes related to the task, just as with DI.)

  • do i use the same connection/become information?

It is still one task, so yes I guess.

  • how do i handle errors caused with this 'hidden task'?

That is up to the user if he wants this to be a possible failure case or not. I can see use-cases for both fail on assertion failure, or continue with the playbook.

  • do i expose the results? how?

Either it works or it fails (and the user wants it to fail). If the user does not want to fail on assertions, exposing a warning seems useful.

  • loops?

Would work because it's just a task with some additional handling on exit.

@dagwieers
Contributor Author

dagwieers commented Sep 20, 2017

BTW I always wanted to make the proposal to introduce this as the default registered variable for the task result values. Which you could use with changed_when, failed_when and now assertions.

- command: foo bar
  changed_when: this.rc in [2, 3]
  failed_when: this.rc >= 4

Avoids having to register and is much cleaner if you don't need to expose it to other tasks.

@bcoca
Member

bcoca commented Sep 20, 2017

@dagwieers so i think mixing up 3 features takes us nowhere:

  • i'm fine with adding a this (i believe we used to expose result(s) to when: inside a loop) that does not get registered for conditionals to work.
  • assertions is just a negated failed_when at this point
  • fact gathering after a task to get these results is another, different issue; i prefer to rely on module-supplied data vs having to execute another task. For many modules this could be the default (i.e. systemd); for others it is probably worth a toggle return_full_state=no|yes (bikeshed away!).

@dagwieers
Contributor Author

@bcoca We need a way to influence whether the assertions fail or not at run-time. Sometimes you want to test the operational state as part of the playbook (because you want to verify things are exactly as you like), but at other times you want to perform all tasks regardless of the operational state, because you can fix those after running that playbook.

This means we need a run-time flag to influence the behaviour (fail, warn or ignore?), and a default behaviour.

failed_when is pretty specific to that, it defines when the task must fail. But if we have no direct control over the operational state we are testing, we may not want to fail.

@dagwieers
Contributor Author

dagwieers commented Sep 20, 2017

Talking about use-cases and user interfaces makes perfect sense to get to something useful; it doesn't matter that we are building new concepts that are useful to a wider audience. Not sure why you consider this taking us nowhere.

Kind of condescending, especially if you ignore half of what I am saying and adopt the other half :-)

I will make the proposal for this and we can consider it as a given here.

@bcoca
Member

bcoca commented Sep 20, 2017

This means we need a run-time flag to influence the behaviour (fail, warn or ignore?), and a default behaviour.

ignore_errors?

I'm not sure how that is condescending, I'm just pointing out that if you keep adding features on top of the existing discussion, it will extend infinitely, discussing each feature separately will limit the scope and length of such discussions.

@dagwieers
Contributor Author

dagwieers commented Sep 20, 2017

You can't design an equivalent solution for DI by discussing each "feature" independently. That will bring us nowhere, because we don't know yet what we want. (Look at how this evolved from top to bottom).

The condescending part is that you ignore half of what I am saying.

No, ignore_errors does not help control these assertions at run-time. It does not separate real module failures from operational state checks.

Take the original example:

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  assertions:
  - this.link == 'connected'
  - this.tx_rate == '10gb'

You may want to run it with the behavior that a failed assertion is ignored, or becomes a warning. (Some flag, I don't know how to name it, e.g. --failed-assertions=warn. I am sure people can come up with a better name; I am not married to a specific name either. We called it expectations before.)

In your case:

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  failed_when:
  - this.link != 'connected'
  - this.tx_rate != '10gb'

What if there is no link on one interface, you're running this in production, and a lot of important configuration follows? What do you do if this is an urgent operation in a maintenance window? Sure, the fact that we have a link down is something that needs to be looked at, but does it warrant stopping the complete execution? I don't think so. A bad cable should not block you from performing other actions.

So let's add ignore_errors in the playbook; now it will continue even if the module fails hard on things we do control and that should work. Operational states are a different beast from configuration items we control ourselves, and may need to be handled differently.

@bcoca
Member

bcoca commented Sep 20, 2017

well, ignore_errors is also templatable and can handle complex conditions, but no need to go that far when failed_when can also do the same.

  failed_when:
  - (assertions_fatal|bool and (this.link != 'connected' and this.tx_rate != '10gb')) or this|failed

in any case, your examples did not show an additional 'toggle' for assertions being fatal or not, which should not be limited to a global scope, as you might want to handle that at block/role/include/task level also, so you need another keyword for that too.

@dagwieers
Contributor Author

dagwieers commented Sep 20, 2017

Right, but then it becomes an inconvenient and very custom thing. Again, that's where designing this as a new concept in core makes it easier and more valuable for people to use. That's how we evolved from custom shell scripts, to templated playbooks, to using Jinja2, to proper YAML playbooks and various neat features.

The same reason why I think using roles, while technically possible, is inconvenient and overly complex.

The proposal is still growing as we discuss, and it depends on naming, functionality and use-cases. Whether we want to handle this per block/role/include/task is something to discuss. My use-cases wouldn't require it, but I don't want to force my use-case on everyone and wouldn't mind if this is configurable at every level.

(Remember this was all to find an alternative and more versatile solution for Declarative Intent, so I would like to learn from the Networking people how they would be using this before making any decisions. However I can see myself using this in playbooks and for integration testing.)

@itdependsnetworks

Agreed with @dagwieers it is certainly possible to do this with roles, just inconvenient and overly complex.

@bcoca
Member

bcoca commented Sep 20, 2017

i would argue it is less complex with roles; building it into core just shifts who can handle this from people that can write plays/roles to developers of the modules and core engine.

No matter which avenue we take, we need to write this for ALL the OSs/functions/etc; i find that creating 'reference roles' as I mention above will get us there a lot sooner than 'reference modules' and core features.

@gundalow
Contributor

Also collecting pros/cons for each solution in https://public.etherpad-mozilla.org/p/ansible-proposal-70

@itdependsnetworks

I think one thing to remember is that this functionality already exists. @bcoca, is your intention to remove this functionality?

@gundalow
Contributor

What would be useful is if everyone[1] describes what they believe the problem we are trying to solve is.

[1] Yes, that includes you

@gundalow
Contributor

As a network engineer running a Playbook, how can I ensure that at the end of a Playbook run my network is configured correctly, given that my network contains things that Ansible can not configure (external factors), such as ensuring that a physical cable is plugged in and linked to switches, and that the link is UP?

If a check fails (cable unplugged) I want the ability to either roll back configuration or fail at that point
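
One way that requirement could map onto existing constructs is block/rescue; this is only a sketch, and net_interface_facts plus the link key are assumptions, not an existing module:

- block:
    - name: configure interface
      net_interface:
        name: GigabitEthernet0/2
        enabled: yes

    - name: check that the physical link is UP
      net_interface_facts:           # hypothetical facts module
        name: GigabitEthernet0/2
      register: iface
      failed_when: iface.link != 'connected'
  rescue:
    - name: roll back the configuration
      net_interface:
        name: GigabitEthernet0/2
        enabled: no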

@itdependsnetworks

1 - Update the operational state to be a dict while keeping the same functionality.
2 - Make this an abstract enough feature that it can be applied to other modules.

@gundalow
Contributor

All,
We plan to conclude the discussion on Wednesday 4th October, so please ensure all your thoughts are added by then.

@rcarrillocruz

I'm going to drop my thoughts:

  1. While this can be done with roles, I don't think we should rely on roles. I tend to see roles as a last-resort thing to use when modules don't give me all the options, and this is a feature that is general enough to put into modules IMHO.
  2. I really want this to be a general thing, not a bolt-on networking thing.
  3. I like where the whole 'assertions' thing is going, but as @bcoca said I'm a bit torn on adding a new keyword, as there's overlap with failed_when. I'd rather augment failed_when's expressiveness than add a new one.

How about something like:

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  failed_when:
    stop_on_failure: no
    assert:
      - stop_on_failure: yes
        that: "this.link != 'connected'"
      - that: this.tx_rate != '10gb'
        warn_msg: "ZOMG, your link is not 10Gb, however you can move on"
      - that: this.rx_rate != '10gb'

That way we could keep failed_when as the current 'fail in case this list of assertions fails', but also augment it with finer-grained control to say 'ok, in general i don't want to stop the play if asserts are false, just warn (with a message that can be configured on an assert-by-assert basis), BUT treat each one of the assertions individually as they may change behaviour as needed'.
This also uses the assert/that keywords, which are already available in the Ansible language, making it easier to learn for existing users. I expect we could also reuse/refactor logic from the assert module and librify the functionality to use it here.

  4. While we can template ignore_errors, it becomes complex quickly; we should be able to express things more easily, like in the above example.

@bcoca
Member

bcoca commented Sep 25, 2017

A few things about the 'roles approach':

  • it can be done NOW, no additional core code is needed.
  • compatible with existing Ansible versions.
  • immediate feedback on usage.
  • quick and easy to iterate to what users want/need/use (including by users themselves)
  • With one or 2 reference implementations, users can modify/extend/create their own.
  • it does NOT prevent features being built into the language, which can benefit from lessons learned.

So i would dedicate a short time to creating a couple of these roles, share them on galaxy and see what happens. Even if you want to build this into the core language and update every module to support it, the roles would still work regardless and provide feedback from users in a way that is quick and easy to respond to.
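
For concreteness, a sketch of what one of these reference roles might look like; the role name, variables, and the net_interface_facts module are assumptions for illustration:

# roles/validate_interface/tasks/main.yml (hypothetical reference role)
- name: gather interface facts
  net_interface_facts:
    name: "{{ interface_name }}"
  register: iface

- name: assert the expected operational state
  assert:
    that: "{{ interface_assertions }}"

# used from a play:
- include_role:
    name: validate_interface
  vars:
    interface_name: GigabitEthernet0/2
    interface_assertions:
      - "iface.link == 'connected'"
      - "iface.tx_rate == '10gb'"

Because the assertions are passed in as a variable, each device type only needs its own facts-gathering task; the compare/assert half stays generic.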

@rcarrillocruz

Also, the failed_when could expose a delay param, which would effectively mean 'wait $delay time before gathering facts and checking assertions'.
Like:

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  failed_when:
    stop_on_failure: no
    delay: 30
    assert:
      - stop_on_failure: yes
        that: "this.link != 'connected'"
      - that: this.tx_rate != '10gb'
        warn_msg: "ZOMG, your link is not 10Gb, however you can move on"
      - that: this.rx_rate != '10gb'

@privateip

@dag This is no longer being considered a viable implementation. We are moving to roles to handle this implementation which will be decoupled from Ansible core.

Any objections to closing this proposal at this time?

@dagwieers
Contributor Author

@privateip No, makes perfect sense.
