Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be systematic about parsing and validating hostnames and addresses #12165

Merged
merged 3 commits into from Sep 11, 2015

Conversation

amenonsen
Copy link
Contributor

This adds a parse_address(pattern) utility function that returns
(host,port), and uses it wherever where we accept IPv4 and IPv6
addresses and hostnames (or host patterns).

This is a proof of concept for how I think it should be done. Some changes are needed: for example, we may want to be a bit more restrictive about hostnames (e.g. "foo-" should be disallowed), and a few other details may need to be cleaned up. But this is a solid foundation, and it should be backwards-compatible as well.

See also discussion in #11747

I welcome feedback/reviews/testing.

cc: @abadger @srvg @ryanpetrello

@amenonsen
Copy link
Contributor Author

See also the later discussion in #12108 regarding defining myUser@127.0.0.1:1234 in the inventory instead of setting ansible_ssh_user=myUser. This worked purely by accident earlier (the port was removed, and myUser@127.0.0.1 was passed to ssh(1) as-is.) Because ansible didn't know about the user, it would happily set -o User=otherUser if, for example, there were a conflicting group var setting.

@jimi-c said that we don't want to support this; the code in this PR already rejects the notation.

(Aside: if desired, however, "proper" support could easily be added on the basis of this PR, i.e. to detect the user@ prefix, remove it, and set ansible_ssh_user for the host.)

@jimi-c
Copy link
Member

jimi-c commented Aug 31, 2015

@amenonsen agreed, the support could be added if someone wanted to, in which case we would support it.

@amenonsen
Copy link
Contributor Author

I did a bit more testing and implemented more precise validation of hostname labels (e.g. to enforce the DNS rule that a label can't end with a hyphen, i.e. foo- isn't valid).

At this point, my suggestion would be to merge this during the 2.0 alpha period so that we can see if there are any more obscure/undocumented/broken inventory definitions in the wild (such as user@127.0.0.1:1234, discovered by accident recently) and take a decision about whether to support them, or warn about them in the changelog, or whatever. I'm confident that it's easier to extend this code to support odd cases than it would have been to do with the earlier code, and the test cases will keep us more honest about inventory hostname parsing in general.

@amenonsen amenonsen changed the title Be systematic about parsing and validating hostnames and addresses (proof of concept) Be systematic about parsing and validating hostnames and addresses Sep 3, 2015
@srgvg
Copy link
Contributor

srgvg commented Sep 3, 2015

FYI

I checked this PR on my large inventory by dumping everything into a json file. Dump from

  • ansible 2.0.0 (devel 7ece767) last updated 2015/09/03 19:40:16 (GMT +200)
  • and this PR
    yield an identical export

@amenonsen amenonsen force-pushed the address-parsing branch 2 times, most recently from 78ecc48 to fab850e Compare September 4, 2015 11:39
@amenonsen
Copy link
Contributor Author

@chrrrles I believe you were looking at this PR. Did you come to any conclusions?

range_tests = {
'192.0.2.[3:10]': ['192.0.2.[3:10]', None],
'192.0.2.[3:10]:23': ['192.0.2.[3:10]', 23],
'abcd:ef98::7654:[1:9]': ['abcd:ef98::7654:[1:9]', None],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test ranges of ipv6 addresses with a port, ie:

        '[abcd:ef98::7654:[1:9]]:2222': ['[abcd:ef98::7654:[1:9]]:2222', None]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I've added this test in the latest version.)

@amenonsen amenonsen force-pushed the address-parsing branch 2 times, most recently from 9f4b98a to 9d29bc9 Compare September 8, 2015 12:52
@amenonsen
Copy link
Contributor Author

I've squashed this PR down to three commits (but if it's merged, I request that it not be squashed further; I believe there's value in having the hostname parsing changes separated for better understanding and future debugging).

@amenonsen amenonsen force-pushed the address-parsing branch 2 times, most recently from e2bfc33 to b49f17f Compare September 9, 2015 09:04
@amenonsen
Copy link
Contributor Author

@chrrrles mentioned on IRC that this patch apparently caused an integration test failure. I had a quick look. The test in question is from test/integration/unicode.yml, which does the following:

    - add_host:
        name: '{{hostnames}}.{{item}}'
        groups: 'ĪīĬĭ'
        ansible_connection: local
        host_id: '{{item}}'
      with_sequence: start=1 end={{num_hosts}} format=%d

First, there's a trivial missing import in add_host.py; once that's fixed, we see that the '{{hostnames}}.{{item}}' syntax used for name: above doesn't actually work (and it appears to be the only place in the tests where this notation is used). It expands hostnames as an array, then adds .1, and passes the resulting string to add_host, which quite sensibly rejects it. So I think this is a clearly broken test.

If I change this to use, e.g. hostnames[2], the test passes. But note that the actual values in hostnames are a bit weird:

    hostnames:
        - 'host-#ϬϭϮϯϰ'
        - 'host-ͰͱͲͳʹ͵'
        - 'host-ΙΚΛΜΝΞ'
        - 'host-στυφχψ'
        - 'host-ϬϭϮϯϰϱ'

I can understand supporting Unicode word characters in inventory hostnames for convenience (though it's not very convenient in practice, because you have to set ansible_ssh_host for the names to be usable), but I see no reason to support Unicode punctuation characters in hostnames (e.g. hostnames[1] above, plus the # in hostnames[0]). The last three values all pass the test with a trivial two-line change to my parsing code in this PR (i.e. change a-z to \w and set the re.UNICODE flag).

@amenonsen
Copy link
Contributor Author

@chrrrles @abadger I've rebased, fixed the import error, made the a-z\w change (and added a couple of Unicode unit tests), squashed down to two clean commits, and updated the PR. I have also pushed an additional commit to fix the broken integration test.

(BTW, I can't help but feel a bit pleased that the sum total of changes required to make the new code support unicode inventory hostnames was two lines.)

AFAIK, I've addressed all the substantive comments on this PR, and it's ready to merge now.

This adds a parse_address(pattern) utility function that returns
(host,port), and uses it wherever where we accept IPv4 and IPv6
addresses and hostnames (or host patterns): the inventory parser
the the add_host action plugin.

It also introduces a more extensive set of unit tests that supersedes
the old add_host unit tests (which didn't actually test add_host, but
only the parsing function).
Labels must start with an alphanumeric character, may contain
alphanumeric characters or hyphens, but must not end with a hyphen.
We enforce those rules, but allow underscores wherever hyphens are
accepted, and allow alphanumeric ranges anywhere.

We relax the definition of "alphanumeric" to include Unicode characters
even though such inventory hostnames cannot be used in practice unless
an ansible_ssh_host is set for each of them.

We still don't enforce length restrictions—the fact that we have to
accept ranges makes it more complex, and it doesn't seem especially
worthwhile.
1. The test did "name: '{{hostnames}}.{{item}}'" inside a with_sequence
   loop, which didn't do what was intended: it expanded hostnames into
   an array, appended ".1", and set name to the resulting string. This
   can be converted to a simple with_items loop.

2. Some of the entries in hostnames contained punctuation characters,
   which I see no reason to support in inventory hostnames anyway.

3. Once the add_host failures are fixed, the playbook later fails when
   the unicode hostnames are interpolated into debug output in ssh.py
   due to an encoding error. This is only one of the many places that
   may fail when using unicode inventory hostnames; we work around it
   by providing an ansible_ssh_host setting.
chrrrles added a commit that referenced this pull request Sep 11, 2015
Hi @amenonsen - thanks for fixing up the hunting down the unicode bug and expanding test_addresses.  The code looks good, merging!-- Be systematic about parsing and validating hostnames and addresses
@chrrrles chrrrles merged commit ba7734b into ansible:devel Sep 11, 2015
@amenonsen amenonsen deleted the address-parsing branch December 11, 2015 02:32
@ansibot ansibot added feature This issue/PR relates to a feature request. bug This issue/PR relates to a bug. and removed feature_pull_request labels Mar 4, 2018
@ansible ansible locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue/PR relates to a bug. feature This issue/PR relates to a feature request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants