Skip to content
This repository has been archived by the owner on Oct 30, 2018. It is now read-only.

add vpc nat gateway module. #1687

Closed

Conversation

jonhadfield
Copy link

This adds the functionality to manage a VPC Managed NAT Gateway.

Previous PR #1438 closed as github won't allow me to re-open after a forced push.

required: false
default: 300
region:
description:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add region. extend with 'ec2' doc fragment instead

@jonhadfield
Copy link
Author

@wimnat I've corrected and squashed previous commits.

@robynbergeron
Copy link
Contributor

Thanks @jonhadfield for this new module. When this module receives 'shipit' comments from two community members and any 'needs_revision' comments have been resolved, we will mark for inclusion.

@viper233
Copy link
Contributor

viper233 commented Mar 7, 2016

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "invocation": {"module_args": {"allocation_id": "eipalloc-XXXXXXX", "aws_access_key": "ASIAJR62UYMGF4TYUJTA", "aws_secret_key": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "ec2_url": null, "nat_gateway_id": null, "profile": null, "region": null, "security_token": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "state": "present", "subnet_id": "subnet-XXXXXXX", "validate_certs": true, "wait": true, "wait_timeout": 300}, "module_name": "ec2_vpc_nat_gateway"}, "msg": "allocation: eipalloc-XXXXXXX does not exist."}

I'm not able associate with an eip allocation when assuming a role in another account. I don't have any problem creating the eip and it seems valid, correct id and VPC.

Any idea what could be wrong? I'm getting a similar issue with the #1446 nat gateway facts module not authenticating. Other AWS modules work fine, ec2_vpc_net_facts, ec2_vpc_subnet_facts.

Thanks!

@viper233
Copy link
Contributor

viper233 commented Mar 7, 2016

Nevermind, this has been fixed in ansible/ansible/pull/14633

@khogeland
Copy link

I think this is missing something, or maybe I'm missing something... currently, the method of identification (and therefore idempotency) is EIP allocation. EIPs are by design not identifiable (allocate when needed, or take an available allocated IP if possible), so they are unusable as ID. It seems there needs to be a field like multiple_gateways_allowed which would let you enforce a single NAT gateway per subnet or VPC.

@jonhadfield
Copy link
Author

Thanks for the feedback @khogeland.
To (idempotently) enforce the presence of a NAT gateway in a particular subnet with a specific EIP you must provide at a minimum the subnet-id and the allocation-id of the EIP.
I intentionally don't auto-allocate an EIP when creating a NAT gateway as I believe that's the job of the ec2_eip module. This not only forces idempotence (you have to specify an existing allocation-id) but avoids duplicating functionality.
I managed to get ansible/ansible-modules-core#2753 merged so that the allocation-id would be returned for a successful VPC EIP allocation.

@khogeland
Copy link

Right, it's the "with a specific EIP" part that is a problem. The ec2_eip module is not idempotent, it only returns an unused EIP, allocating a new one if all current EIPs are associated. What you're suggesting requires users to do backflips to not create another EIP (which would cause another NGW to be created). That's backwards - you shouldn't be managing EIPs, the EIPs should only be created if needed. The user flow should like this this:
user requests NGW -> no NGWs in subnet -> module creates NGW and associates new IP
user requests NGW -> already an NGW in subnet -> module returns existing NGW

@jonhadfield
Copy link
Author

It is possible to have more than one NAT gateway per subnet, so to ensure idempotency of ensuring one exists, you must provide a combination of allocation-id and subnet-id. So the second flow would prevent more than one being created.
I imagine the flows should be:
user requests NGW with alloc-id and subnet id -> doesn't exist -> module creates
user requests NGW with alloc-id and subnet id -> does exist -> shows un-changed
I guess there could be a third flow:
user requests NGW with subnet-id -> module allocates an EIP and creates another NAT GW
but this wouldn't be idempotent as it would just keep adding NAT GWs until you hit your EIP limit

@khogeland
Copy link

I don't agree, but I don't have a better solution. Maybe an exact_count parameter similar to the ec2 module? Unsure why Amazon decided that NGWs shouldn't be tagged. 😕

@jonhadfield
Copy link
Author

I guess I should have put 'I imagine the flows can only be' as it's not really a preference but a limitation.
I keep raising the tagging issues: limited to 10, lack of them on sec group rules, etc. with our AWS contact, but I only get shrugs as a response.

@gaieges
Copy link

gaieges commented Apr 22, 2016

Where does this stand? Is it waiting on further review or [tag] modifications from Amazon?

@jonhadfield
Copy link
Author

@gaieges The module has been created and reviewed by myself and @Etherdaemon and tested by ourselves and other community members: #1438
The discussion above was a concern/preference over idempotency but I don't think there is a clear case for further changes.
We're now just waiting for it to be merged.

@aioue
Copy link
Contributor

aioue commented Apr 27, 2016

Related #1882

@aioue
Copy link
Contributor

aioue commented Apr 28, 2016

@jonhadfield @Etherdaemon is it possibly to add this file to my project and reference it someone in order to test/use it before it is merged into 2.1?

@Etherdaemon
Copy link
Contributor

@aioue for sure - I already do this now for modules that I have written and hasn't been merged yet due to reviews etc. Just add it to your library folder and ensure it includes the contributor.

EG:

├── README.md
├── STRUCTURE_GUIDELINE.md
├── ansible.cfg
├── inventory
├── library
  ├── ec2_vpc_nat_gateway.py
├── roles
│   ├── allinone
│   │   ├── defaults
│   │   │   └── main.yml
│   │   └── tasks
│   │       ├── main.yml
└── playbook_a.yml

type: string
sample: "nat-0d1e3a878585988f8"
'''

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some RETURN documentation

@wimnat
Copy link
Contributor

wimnat commented Apr 28, 2016

needs_revision

@jonhadfield
Copy link
Author

Will sort out today. Thanks.

@gregdek
Copy link
Contributor

gregdek commented Jun 22, 2016

Thanks @jonhadfield for this PR. This PR requires revisions, either because it fails to build or by reviewer request. Please make the suggested revisions. When you are done, please comment with text 'ready_for_review' and we will put this PR back into review.

[This message brought to you by your friendly Ansibull-bot.]

@jonhadfield
Copy link
Author

@gregdek It failed to build due to an issue with an unrelated module that's now fixed.
I've rebased which has forced a rebuild and it's now green.

@gregdek
Copy link
Contributor

gregdek commented Jun 23, 2016

Thanks @jonhadfield -- it appears to be related to "old" PRs, for some value of "old" that we can't quite nail down. I hesitate to say "just rebase and it'll probably go away," but that seems to be the answer in most cases.

@wimnat
Copy link
Contributor

wimnat commented Jun 27, 2016

shipit works_for_me

# Check max_count has not been exceeded before attempting to create
if module.params.get('max_count') < num_nat_gateways:
module.fail_json(
msg="The maximum desired number of NAT Gateways specified in "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't fail when the right number exist - that's a desired scenario.

@ryansb
Copy link
Contributor

ryansb commented Jun 29, 2016

I'm not quite happy with the behavior yet. If the right number of instances do exist, the module fails. My play:

  - name: allocate EIP
    ec2_eip:
      in_vpc: true
      region: us-west-2
      state: present
    register: eip
  - name: Test NAT gateway
    ec2_vpc_nat_gateway:
      state: present
      subnet_id: "{{ my-subnet }}"
      region: us-west-2
      allocation_id: "{{ eip.allocation_id }}"
      max_count: 1

The first time it works great, but all subsequent runs fail with:

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "invocation": {"module_args": {"allocation_id": "eipalloc-1cf9d478", "aws_access
_key": null, "aws_secret_key": null, "ec2_url": null, "max_count": 1, "nat_gateway_id": null, "profile": null, "region": "us-west-2", "security_to
ken": null, "state": "present", "subnet_id": "subnet-9f18bdc6", "validate_certs": true, "wait": true, "wait_timeout": 300}, "module_name": "ec2_vp
c_nat_gateway"}, "msg": "The maximum desired number of NAT Gateways specified in max_count (default: 1) has already been met."}

@jonhadfield
Copy link
Author

The logic flow is:

  • Check number of NAT GWs in specified subnet (if max_count exceeded - fail)
  • If number not exceeded, check to see if specified NAT GW already exists (if so, exit with changed=False)
  • If the NAT gateway doesn't already exist, then check that creation of a new one won't exceed max count
    ...

On your second run, you've specified a different EIP, so it will attempt to create a new NAT GW but fail because it's reached max_count. If the EIP was the same, then it would exit unchanged.

check for error code rather than look for matching substring.

combine module exit statements.

return all nat gateway details on creation.

update RETURN documentation.

increase version to 2.2 - modify RETURN output.

implement a max_count parameter on number of nat gateways.

tidy up exception handling.

NoCredentialsError instance does not return a response attribute, so drop.

improve logic.
@jonhadfield
Copy link
Author

Removed 'inline if' as suggested by @ryansb, re-tested scenarios and re-based.

-        num_nat_gateways = len(nat_gateways.get('NatGateways')) if nat_gateways.get('NatGateways') else 0
+        num_nat_gateways = len(nat_gateways.get('NatGateways', []))

@ryansb
Copy link
Contributor

ryansb commented Jul 5, 2016

I disagree with the logic in the module. Since you have to specify an allocation ID, how does having a max_count of greater than one make sense?

To have a playbook run on the first try, I have to:

  1. Provision an EIP as part of the playbook
  2. Store the EIP ID somewhere manually
  3. Provision a NAT Gateway with that ID

I think a sensible tweak would be:

  1. Check number of NAT GWs in specified subnet (if max_count equal to current number - exit changed=False)
  2. If max_count exceeded - exit with fail
  3. Check NAT GW (by EIP allocation ID) if so, exit with changed=False
  4. If the NAT gateway doesn't already exist, and the new one won't exceed max_count, then then check that creation of a new one won't exceed max count, create and exit changed=True

@jonhadfield
Copy link
Author

max_count is to limit the number of NAT Gateways per subnet, not per allocation ID (specified in document string).
My concern with item 1 you put above is that a user is defining a state based on subnet and allocation id combined. So, in a scenario where they said max_count: 1, subnet_id: s1, allocation_id: a1, and the current state was having a single NAT Gateway in the subnet s1 with a allocation_id a2, the desired state has not been met.
I could be more specific by changing the parameter to max_count_per_subnet, or modify as you suggest and keep an eye on issues/feedback if it gets merged? Just worried this might be adding further ambiguity (of what constitutes desired state) for a separate EIP allocation issue.

@gregdek
Copy link
Contributor

gregdek commented Jul 8, 2016

@jonhadfield A friendly reminder: this pull request has been marked as needing your action. If you still believe that this PR applies, and you intend to address the issues with this PR, just let us know in the PR itself and we will keep it open pending your changes. When you do address the issues, please respond with ready_for_review in your comment, so that we can notify the maintainer.

[This message brought to you by your friendly Ansibull-bot.]

@jonhadfield
Copy link
Author

No change made yet, but looking for feedback to previous comment.
ready_for_review

@gregdek
Copy link
Contributor

gregdek commented Jul 9, 2016

Thanks @jonhadfield for this new module. When this module receives 'shipit' comments from two community members and any 'needs_revision' comments have been resolved, we will mark for inclusion.

[This message brought to you by your friendly Ansibull-bot.]

@badmojo76
Copy link

Thanks @jonhadfield for putting this module together - and apologies in advance for the long post.

Just a thought - it appears that the "clientToken" parameter is intended to combat the issues with idempotency... So why not introduce it as a variable and let AWS's API handle any Idempotency mismatch issues (i.e., overlapping clientToken names in the same region)?

Based on AWS docs, the clientToken should only allow the request to be executed the FIRST time, given that no other settings change (i.e., subnet, EIP allocation, etc.), with each subsequent run simply returning the same values, but not actually taking action. Looks accurate, based on my testing:

AWS CLI command:

$ aws ec2 create-nat-gateway --subnet-id subnet-5defc605 --allocation-id eipalloc-3fadd858 --client-token this-is-my-nat-version-1

First run:

{
    "ClientToken": "this-is-my-nat-version-1",
    "NatGateway": {
        "NatGatewayAddresses": [
            {
                "NetworkInterfaceId": "eni-b36459ab",
                "AllocationId": "eipalloc-3fadd858",
                "PrivateIp": "10.200.5.96"
            }
        ],
        "VpcId": "vpc-f26de295",
        "State": "pending",               # Note the state of pending
        "NatGatewayId": "nat-00ee0eb8dc528ed3a",
        "SubnetId": "subnet-5defc605",
        "CreateTime": "2016-07-13T00:13:00.680Z"
    }
}

and once it is out of "pending" state, another run with the exact same configuration results in:

{
    "ClientToken": "this-is-my-nat-version-1",
    "NatGateway": {
        "NatGatewayAddresses": [
            {
                "PublicIp": "52.201.56.124",           # Public IP shows up
                "NetworkInterfaceId": "eni-b36459ab",
                "AllocationId": "eipalloc-3fadd858",
                "PrivateIp": "10.200.5.96"         
            }
        ],
        "VpcId": "vpc-f26de295",
        "State": "available",                          # NGW is available
        "NatGatewayId": "nat-00ee0eb8dc528ed3a",
        "SubnetId": "subnet-5defc605",
        "CreateTime": "2016-07-13T00:13:00.680Z"
    }
}

If an attempt to modify the EIP, Subnet, etc. is made, and the clientToken is not modified, AWS will return (in this example, subnet), or if someone mistakenly names 2 NGWs in the same region the same, they'll get:

A client error (IdempotentParameterMismatch) occurred when calling the CreateNatGateway operation: nat-00ee0eb8dc528ed3a already created with the same token but different vpc/subnet

If the gateway is subsequently deleted via DeleteNatGateway, the clientToken is reset on the AWS side as soon as the NGW state goes to "Deleted". So, for example, if a NGW is accidentally deleted, ansible will fail while the state is in "Deleting":

A client error (IdempotentParameterMismatch) occurred when calling the CreateNatGateway operation: nat-0c14bb00a3f077e92 was created with same token but is in an invalid state

...but will succeed and build a new NGW as soon as the old one goes to "Deleted" (a few mins).

Taking @ryansb example (modified for use with clientToken):

 - name: Test NAT gateway
    ec2_vpc_nat_gateway:
      state: present
      subnet_id: "{{ items.subnet_id }}"
      region: us-west-2
      allocation_id: "{{ items.eip_allocation_id }}"
      client_token: my-nat-gateway-{{ items.gateway_id }}
    with_items:
      - { gateway_id: 'dev', eip_allocation_id: '{{ eip1 }}', subnet_id: '{{ subnet1 }}' }
      - { gateway_id: 'test', eip_allocation_id: '{{ eip2 }}', subnet_id: '{{ subnet2 }}' }
      - { gateway_id: 'prod', eip_allocation_id: '{{ eip3 }}', subnet_id: '{{ subnet3 }}' }

and you'll get exactly 3 NGWs with 3 EIPs in 3 specific subnets, every time.

Delete functionality would need to remain as-is with the nat-id, I think. DescribeNatGateways doesn't return a clientToken, and you wouldn't want to create a NGW if it didn't exist, only to turn around and delete it, of course.

No need for arbitrary counts (max or exact). I would think the clientToken should be required, but I may not be thinking of a case where that's not ideal...

Thoughts?

K

@gregdek
Copy link
Contributor

gregdek commented Jul 13, 2016

@jonhadfield so it looks like we've got a situation where we've got two modules that do more or less the same things, and they're both close to the finish line: this one and #1882. An embarrassment of riches, in this case. Thanks to both you and @linuxdynasty, and apologies to both of you for not asking you to join forces sooner; we're still not very good at identifying potentially duplicated modules.

The one thing in favor of #1882, from my perspective, is that it appears to have tests with it.

@jonhadfield could I ask you to take a look at this and #1882 and let me know if there are substantial differences/improvements with this PR that we could incorporate into #1882? Or, if you think this PR is superior to #1882, I'd like to hear that as well.

@jonhadfield
Copy link
Author

jonhadfield commented Jul 14, 2016

@gregdek I first submitted this module 30th December 2015 and then discovered @Etherdaemon had previously submitted one to perform the same task. We worked together and @Etherdaemon then generously closed their PR so we could expedite the inclusion of this functionality into Ansible.

I've had a brief look at #1882 and I think the main difference is that our submission is less complex. For example, I intentionally don't use a public IP as a parameter as I think it unnecessary, but you could also argue that some users may prefer to specify a public IP. I think both modules achieve the same goal and are well written, so it's more of a subjective choice for yourselves and the community.

I appreciate the inclusion of tests for #1882 is a definite advantage. I'd be happy to write them for this module also if it's decided to continue with my PR.

Also happy to close this and help out with @linuxdynasty's submission as I'm more interested in getting the functionality merged than attribution for my efforts (assuming @Etherdaemon feels the same way as #1882 is based off their original module).

@gregdek
Copy link
Contributor

gregdek commented Jul 25, 2016

After discussion, it's time to make a decision, and the decision I'm making is that we're going with #1882 because it has tests. @jonhadfield and @Etherdaemon, thanks to both of you for your help and your patience throughout this process.

@gregdek gregdek closed this Jul 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet