Be systematic about parsing and validating hostnames and addresses #12165

Merged
merged 3 commits into from Sep 11, 2015

Conversation

Projects
None yet
5 participants
@amenonsen
Contributor

amenonsen commented Aug 31, 2015

This adds a parse_address(pattern) utility function that returns
(host,port), and uses it wherever where we accept IPv4 and IPv6
addresses and hostnames (or host patterns).

This is a proof of concept for how I think it should be done. Some changes are needed: for example, we may want to be a bit more restrictive about hostnames (e.g. "foo-" should be disallowed), and a few other details may need to be cleaned up. But this is a solid foundation, and it should be backwards-compatible as well.

See also discussion in #11747

I welcome feedback/reviews/testing.

cc: @abadger @srvg @ryanpetrello

@amenonsen

This comment has been minimized.

Show comment
Hide comment
@amenonsen

amenonsen Aug 31, 2015

Contributor

See also the later discussion in #12108 regarding defining myUser@127.0.0.1:1234 in the inventory instead of setting ansible_ssh_user=myUser. This worked purely by accident earlier (the port was removed, and myUser@127.0.0.1 was passed to ssh(1) as-is.) Because ansible didn't know about the user, it would happily set -o User=otherUser if, for example, there were a conflicting group var setting.

@jimi-c said that we don't want to support this; the code in this PR already rejects the notation.

(Aside: if desired, however, "proper" support could easily be added on the basis of this PR, i.e. to detect the user@ prefix, remove it, and set ansible_ssh_user for the host.)

Contributor

amenonsen commented Aug 31, 2015

See also the later discussion in #12108 regarding defining myUser@127.0.0.1:1234 in the inventory instead of setting ansible_ssh_user=myUser. This worked purely by accident earlier (the port was removed, and myUser@127.0.0.1 was passed to ssh(1) as-is.) Because ansible didn't know about the user, it would happily set -o User=otherUser if, for example, there were a conflicting group var setting.

@jimi-c said that we don't want to support this; the code in this PR already rejects the notation.

(Aside: if desired, however, "proper" support could easily be added on the basis of this PR, i.e. to detect the user@ prefix, remove it, and set ansible_ssh_user for the host.)

@jimi-c

This comment has been minimized.

Show comment
Hide comment
@jimi-c

jimi-c Aug 31, 2015

Member

@amenonsen agreed, the support could be added if someone wanted to, in which case we would support it.

Member

jimi-c commented Aug 31, 2015

@amenonsen agreed, the support could be added if someone wanted to, in which case we would support it.

@amenonsen

This comment has been minimized.

Show comment
Hide comment
@amenonsen

amenonsen Sep 3, 2015

Contributor

I did a bit more testing and implemented more precise validation of hostname labels (e.g. to enforce the DNS rule that a label can't end with a hyphen, i.e. foo- isn't valid).

At this point, my suggestion would be to merge this during the 2.0 alpha period so that we can see if there are any more obscure/undocumented/broken inventory definitions in the wild (such as user@127.0.0.1:1234, discovered by accident recently) and take a decision about whether to support them, or warn about them in the changelog, or whatever. I'm confident that it's easier to extend this code to support odd cases than it would have been to do with the earlier code, and the test cases will keep us more honest about inventory hostname parsing in general.

Contributor

amenonsen commented Sep 3, 2015

I did a bit more testing and implemented more precise validation of hostname labels (e.g. to enforce the DNS rule that a label can't end with a hyphen, i.e. foo- isn't valid).

At this point, my suggestion would be to merge this during the 2.0 alpha period so that we can see if there are any more obscure/undocumented/broken inventory definitions in the wild (such as user@127.0.0.1:1234, discovered by accident recently) and take a decision about whether to support them, or warn about them in the changelog, or whatever. I'm confident that it's easier to extend this code to support odd cases than it would have been to do with the earlier code, and the test cases will keep us more honest about inventory hostname parsing in general.

@amenonsen amenonsen changed the title from Be systematic about parsing and validating hostnames and addresses (proof of concept) to Be systematic about parsing and validating hostnames and addresses Sep 3, 2015

@srgvg

This comment has been minimized.

Show comment
Hide comment
@srgvg

srgvg Sep 3, 2015

Member

FYI

I checked this PR on my large inventory by dumping everything into a json file. Dump from

  • ansible 2.0.0 (devel 7ece767) last updated 2015/09/03 19:40:16 (GMT +200)
  • and this PR
    yield an identical export
Member

srgvg commented Sep 3, 2015

FYI

I checked this PR on my large inventory by dumping everything into a json file. Dump from

  • ansible 2.0.0 (devel 7ece767) last updated 2015/09/03 19:40:16 (GMT +200)
  • and this PR
    yield an identical export
@amenonsen

This comment has been minimized.

Show comment
Hide comment
@amenonsen

amenonsen Sep 8, 2015

Contributor

@chrrrles I believe you were looking at this PR. Did you come to any conclusions?

Contributor

amenonsen commented Sep 8, 2015

@chrrrles I believe you were looking at this PR. Did you come to any conclusions?

+ range_tests = {
+ '192.0.2.[3:10]': ['192.0.2.[3:10]', None],
+ '192.0.2.[3:10]:23': ['192.0.2.[3:10]', 23],
+ 'abcd:ef98::7654:[1:9]': ['abcd:ef98::7654:[1:9]', None],

This comment has been minimized.

@chrrrles

chrrrles Sep 8, 2015

Contributor

We should also test ranges of ipv6 addresses with a port, ie:

        '[abcd:ef98::7654:[1:9]]:2222': ['[abcd:ef98::7654:[1:9]]:2222', None]
@chrrrles

chrrrles Sep 8, 2015

Contributor

We should also test ranges of ipv6 addresses with a port, ie:

        '[abcd:ef98::7654:[1:9]]:2222': ['[abcd:ef98::7654:[1:9]]:2222', None]

This comment has been minimized.

@amenonsen

amenonsen Sep 8, 2015

Contributor

(I've added this test in the latest version.)

@amenonsen

amenonsen Sep 8, 2015

Contributor

(I've added this test in the latest version.)

@amenonsen

This comment has been minimized.

Show comment
Hide comment
@amenonsen

amenonsen Sep 8, 2015

Contributor

I've squashed this PR down to three commits (but if it's merged, I request that it not be squashed further; I believe there's value in having the hostname parsing changes separated for better understanding and future debugging).

Contributor

amenonsen commented Sep 8, 2015

I've squashed this PR down to three commits (but if it's merged, I request that it not be squashed further; I believe there's value in having the hostname parsing changes separated for better understanding and future debugging).

@amenonsen

This comment has been minimized.

Show comment
Hide comment
@amenonsen

amenonsen Sep 10, 2015

Contributor

@chrrrles mentioned on IRC that this patch apparently caused an integration test failure. I had a quick look. The test in question is from test/integration/unicode.yml, which does the following:

    - add_host:
        name: '{{hostnames}}.{{item}}'
        groups: 'ĪīĬĭ'
        ansible_connection: local
        host_id: '{{item}}'
      with_sequence: start=1 end={{num_hosts}} format=%d

First, there's a trivial missing import in add_host.py; once that's fixed, we see that the '{{hostnames}}.{{item}}' syntax used for name: above doesn't actually work (and it appears to be the only place in the tests where this notation is used). It expands hostnames as an array, then adds .1, and passes the resulting string to add_host, which quite sensibly rejects it. So I think this is a clearly broken test.

If I change this to use, e.g. hostnames[2], the test passes. But note that the actual values in hostnames are a bit weird:

    hostnames:
        - 'host-#ϬϭϮϯϰ'
        - 'host-ͰͱͲͳʹ͵'
        - 'host-ΙΚΛΜΝΞ'
        - 'host-στυφχψ'
        - 'host-ϬϭϮϯϰϱ'

I can understand supporting Unicode word characters in inventory hostnames for convenience (though it's not very convenient in practice, because you have to set ansible_ssh_host for the names to be usable), but I see no reason to support Unicode punctuation characters in hostnames (e.g. hostnames[1] above, plus the # in hostnames[0]). The last three values all pass the test with a trivial two-line change to my parsing code in this PR (i.e. change a-z to \w and set the re.UNICODE flag).

Contributor

amenonsen commented Sep 10, 2015

@chrrrles mentioned on IRC that this patch apparently caused an integration test failure. I had a quick look. The test in question is from test/integration/unicode.yml, which does the following:

    - add_host:
        name: '{{hostnames}}.{{item}}'
        groups: 'ĪīĬĭ'
        ansible_connection: local
        host_id: '{{item}}'
      with_sequence: start=1 end={{num_hosts}} format=%d

First, there's a trivial missing import in add_host.py; once that's fixed, we see that the '{{hostnames}}.{{item}}' syntax used for name: above doesn't actually work (and it appears to be the only place in the tests where this notation is used). It expands hostnames as an array, then adds .1, and passes the resulting string to add_host, which quite sensibly rejects it. So I think this is a clearly broken test.

If I change this to use, e.g. hostnames[2], the test passes. But note that the actual values in hostnames are a bit weird:

    hostnames:
        - 'host-#ϬϭϮϯϰ'
        - 'host-ͰͱͲͳʹ͵'
        - 'host-ΙΚΛΜΝΞ'
        - 'host-στυφχψ'
        - 'host-ϬϭϮϯϰϱ'

I can understand supporting Unicode word characters in inventory hostnames for convenience (though it's not very convenient in practice, because you have to set ansible_ssh_host for the names to be usable), but I see no reason to support Unicode punctuation characters in hostnames (e.g. hostnames[1] above, plus the # in hostnames[0]). The last three values all pass the test with a trivial two-line change to my parsing code in this PR (i.e. change a-z to \w and set the re.UNICODE flag).

@amenonsen

This comment has been minimized.

Show comment
Hide comment
@amenonsen

amenonsen Sep 10, 2015

Contributor

@chrrrles @abadger I've rebased, fixed the import error, made the a-z\w change (and added a couple of Unicode unit tests), squashed down to two clean commits, and updated the PR. I have also pushed an additional commit to fix the broken integration test.

(BTW, I can't help but feel a bit pleased that the sum total of changes required to make the new code support unicode inventory hostnames was two lines.)

AFAIK, I've addressed all the substantive comments on this PR, and it's ready to merge now.

Contributor

amenonsen commented Sep 10, 2015

@chrrrles @abadger I've rebased, fixed the import error, made the a-z\w change (and added a couple of Unicode unit tests), squashed down to two clean commits, and updated the PR. I have also pushed an additional commit to fix the broken integration test.

(BTW, I can't help but feel a bit pleased that the sum total of changes required to make the new code support unicode inventory hostnames was two lines.)

AFAIK, I've addressed all the substantive comments on this PR, and it's ready to merge now.

amenonsen added some commits Aug 31, 2015

Be systematic about parsing and validating hostnames and addresses
This adds a parse_address(pattern) utility function that returns
(host,port), and uses it wherever where we accept IPv4 and IPv6
addresses and hostnames (or host patterns): the inventory parser
the the add_host action plugin.

It also introduces a more extensive set of unit tests that supersedes
the old add_host unit tests (which didn't actually test add_host, but
only the parsing function).
Be stricter about parsing hostname labels
Labels must start with an alphanumeric character, may contain
alphanumeric characters or hyphens, but must not end with a hyphen.
We enforce those rules, but allow underscores wherever hyphens are
accepted, and allow alphanumeric ranges anywhere.

We relax the definition of "alphanumeric" to include Unicode characters
even though such inventory hostnames cannot be used in practice unless
an ansible_ssh_host is set for each of them.

We still don't enforce length restrictions—the fact that we have to
accept ranges makes it more complex, and it doesn't seem especially
worthwhile.
Fix broken integration test with unicode hostnames
1. The test did "name: '{{hostnames}}.{{item}}'" inside a with_sequence
   loop, which didn't do what was intended: it expanded hostnames into
   an array, appended ".1", and set name to the resulting string. This
   can be converted to a simple with_items loop.

2. Some of the entries in hostnames contained punctuation characters,
   which I see no reason to support in inventory hostnames anyway.

3. Once the add_host failures are fixed, the playbook later fails when
   the unicode hostnames are interpolated into debug output in ssh.py
   due to an encoding error. This is only one of the many places that
   may fail when using unicode inventory hostnames; we work around it
   by providing an ansible_ssh_host setting.

chrrrles added a commit that referenced this pull request Sep 11, 2015

Merge pull request #12165 from amenonsen/address-parsing
Hi @amenonsen - thanks for fixing up the hunting down the unicode bug and expanding test_addresses.  The code looks good, merging!-- Be systematic about parsing and validating hostnames and addresses

@chrrrles chrrrles merged commit ba7734b into ansible:devel Sep 11, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

amenonsen added a commit to amenonsen/ansible that referenced this pull request Sep 26, 2015

Update outdated comment
Since #12165 was merged, hostnames are properly validated.

@amenonsen amenonsen deleted the amenonsen:address-parsing branch Dec 11, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment