Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ipaddr() with ips_in_ranges() #4339

Merged
merged 2 commits into from Sep 27, 2019
Merged

Replace ipaddr() with ips_in_ranges() #4339

merged 2 commits into from Sep 27, 2019

Conversation

hjensas
Copy link
Contributor

@hjensas hjensas commented Aug 14, 2019

Replace ipaddr() with ips_in_ranges()

This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.

ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.

Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.

With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.

NOTE:

  • This change also updates the RPM spec file so that
    the filter_plugins are copied to the buildroot.
  • Add a comment include 'ansible' in tests/requirements.txt
    so that netaddr is installed as requiement for the
    ceph-ansible-pr-syntax-check.

[1] commit: d67230b
[2] commit: 20e4852

Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283

Closes: #4333
Please backport to stable-4.0

Signed-off-by: Harald Jensås hjensas@redhat.com

@hjensas hjensas force-pushed the issue#4333 branch 4 times, most recently from 3129a78 to 0503b29 Compare August 19, 2019 14:27
@hjensas
Copy link
Contributor Author

hjensas commented Aug 19, 2019

I'm not sure what to do with this CI error:
https://2.jenkins.ceph.com/job/ceph-ansible-prs-dev-ubuntu-non_container-all_daemons/578/consoleFull

It's failing because netaddr is'nt on the master. How can that be when netaddr is in both requirement.txt and tests/requirements.txt?

@guits
Copy link
Collaborator

guits commented Aug 20, 2019

setting monitor_address_block at the inventory host-level should allow deploying monitors across different subnets.

@hjensas
Copy link
Contributor Author

hjensas commented Aug 21, 2019

setting monitor_address_block at the inventory host-level should allow deploying monitors across different subnets.

I'm investigating the possibility to set this at the inventory host-level in TripleO. I am checking with peers who have more experience with ansible services in TripleO.

AFICT TripleO use host-level vars very sparsely. They are all set (in tripleo_common/inventory.py) outside of tripleo-heat-templates where service configuration is maintained. All host-level vars used are vary node specific, such as the hostnames on each network of a node, ips, uniqe server id's etc. See [1] for the nitty gritty details. There is no per-service vars defined at all.

[1] https://github.com/openstack/tripleo-common/blob/61f60b2b735d76be34be0336405164b71e89ee2a/tripleo_common/inventory.py#L219-L275

@fultonj
Copy link
Contributor

fultonj commented Aug 28, 2019

@guits said:

setting monitor_address_block at the inventory host-level should allow deploying monitors across different subnets.

The ansible inventory is generated by TripleO with a script. We don't insert parameters there and host_vars would be more appropriate.

However, we can't use host_vars for the ceph-ansible run because they are only [1] populated by the node-specific overrides [2]; i.e. we don't want to risk two tasks modifying the same host_vars file. Requiring someone to always use node-specific overrides for multiple monitor address blocks seems a regression however. Because ceph-ansible used to support passing the list like the one that produced the bug report [3], it seems a problem to now ask customers to use the complicated node-specific overrrides procedure [2]. Mapping the appropriate mon_address_block to each host is complicated.

The ceph.conf already support a comma separated list of CIDRs for the public and cluster network options. As ceph-ansible used to support this and we have working code here (though yes it has a new filter) it still seems a better solution to me. Do we know what broke this from working earlier?

Bug 1740283 might be a blocker for downstream Stein.

@gfidente any ideas?

[1] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo-ceph-uuid/tasks/prepare.yml#L48

[2] http://tripleo.org/install/advanced_deployment/node_specific_hieradata.html

[3] https://bugs.launchpad.net/tripleo/+bug/1840030

@hjensas
Copy link
Contributor Author

hjensas commented Sep 4, 2019

@guits @fultonj @gfidente

I have been testing a WIP patch to fix this in tripleo[1], however now the deployment fails in a different place. With the above patch the deployment now fail with:

"TASK [ceph-facts : set grafana_server_addr fact] *******************************",
"task path: /usr/share/ceph-ansible/roles/ceph-facts/tasks/facts.yml:293",
"Wednesday 04 September 2019  15:19:21 +0000 (0:00:00.857)       0:01:36.505 *** ",
"fatal: [overcloud-controller-0]: FAILED! => ",
"  msg: 'ipaddr: unknown filter type: 172.120.3.0/24,172.117.3.0/24,172.118.3.0/24,172.119.3.0/24'",

This happens since this change[2] started to use the ansible ipaddr module to get the first ip address in 'public_network'. I.e we can no longer pass a comma separated list of IP's as the public_network.

This is a regression! We have documented using a comma separated list for the public_network and cluster_network in Red Hat's OSP documentation starting with OSP13. See the CephAnsibleExtraConfig in example[3], step 3 in the procedure.

We used to be able to do:

CephAnsibleExtraConfig:
  public_network: '172.16.0.0/24,172.16.1.0/24,172.16.2.0/24'
  cluster_network: '172.17.0.0/24,172.17.1.0/24,172.17.2.0/24'

Doing so, now fails with the ipaddr: unknown filter type error seen above.
The filter intoduced in this patch can be used by set grafana_server_addr fact as well.

Update: Change [4] also uses ipaddr(public_network) which would fail.

[1] https://review.opendev.org/#/c/679060/
[2] d67230b
[3] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/spine_leaf_networking/index#assigning-routes-for-roles
[4] 20e4852

@hjensas hjensas changed the title Allow multiple monitor address ranges Replace ipaddr() with ips_in_ranges() Sep 4, 2019
@fultonj
Copy link
Contributor

fultonj commented Sep 5, 2019

Talked to @dsavineau and we would probably need to update the following to deal with multiple IPs:

https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-infra/tasks/configure_firewall.yml#L25
https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/templates/ceph-mon.service.j2#L39

Also, we should add a test for this in the CI.

Copy link
Contributor

@dsavineau dsavineau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with some TripleO guys, we will need some changes in that PR.

Out of the box multiple values in public_network doesn't work.
There's at least two cases where we need update the code

  • firewall configuration as John mentioned. Because the source parameter of the firewalld module doesn't support multiple values. TripleO doesn't use this part of the code, that's why you don't see the faillure [1].
  • the public_network is also used by the mon container and I need to take a look at that part of the code [2][3]

On the ansible filter side, I would prefer to have the filter directory located under the plugins/filter directory and reference it in the ansible.cfg file via the filter_plugins key. This will avoid to modify the spec file.

Also it would be great to have tests for this filter.

[1] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-infra/tasks/configure_firewall.yml
[2] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/templates/ceph-mon.service.j2#L39
[3] https://github.com/ceph/ceph-container/blob/master/src/daemon/start_mon.sh#L83-L119

@@ -5,8 +5,8 @@ pytest-xdist==1.28.0
pytest>=4.4,<4.5
notario>=0.0.13
ansible>=2.8,<2.9
netaddr
netaddr # For the ansible filter: ips_in_ranges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to update this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this the CI won't install netaddr. Because a grep is used here:
https://github.com/ceph/ceph-build/blob/08b569d032e8d67643fdbbb33fb6006b3f8d635e/ceph-ansible-pr-syntax-check/build/build#L11

Adding a comment containing the "ansible" string is more like a workaround to ensure that CI job passes. (Adding the "ansible" string include that line in the grep result and thus the netaddr requirement is installed.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we install the deps via tox https://github.com/ceph/ceph-ansible/blob/master/tox.ini#L418 otherwise the ipaddr filter would have never work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for the ceph-ansible-pr-syntax-check job, it installs deps via tox like this:
"$VENV"/pip install -r <(grep ansible "$WORKSPACE"/ceph-ansible/tests/requirements.txt)

Effectively passing only tests/requirements.txt#L7 containing ansible>=2.8,<2.9.

The ceph-ansible-pr-syntax-check never exercice any code that use the ipaddr filter, and since ansible does conditional loading it will work until the filter is actually used.

I do however agree that me adding that comment in requirements.txt is a workaround. I will remove it and then we can solve the problem in ceph/ceph-build instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the CI log showing that dependencies are not properly installed:
https://2.jenkins.ceph.com/job/ceph-ansible-pr-syntax-check/1204/console

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed this PR ceph/ceph-build#1391 which I belive should fix the problem with netaddr not being pulled in as a dependency in the ceph-ansible-pr-syntax-check job.

Copy link
Contributor Author

@hjensas hjensas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can move the plugin and update the config instead. I will try to figure out a test as well.

I was'nt aware of the firewall issue. I would suggest addressing that in a separate commit?

Regarding the mon container, it looks like the code[1] already expect a comma separated list as it replaces whitespace? (-e CEPH_PUBLIC_NETWORK={{ public_network | regex_replace(' ', '') }}) The other pice of code[2] would be good to investigate, but looking at history most of it is 2 years old. If TripleO is using that code then it did'nt fail during OSP13/OSP14 testing.

[1] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/templates/ceph-mon.service.j2#L39
[2] https://github.com/ceph/ceph-container/blob/master/src/daemon/start_mon.sh#L83-L119

@@ -5,8 +5,8 @@ pytest-xdist==1.28.0
pytest>=4.4,<4.5
notario>=0.0.13
ansible>=2.8,<2.9
netaddr
netaddr # For the ansible filter: ips_in_ranges
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this the CI won't install netaddr. Because a grep is used here:
https://github.com/ceph/ceph-build/blob/08b569d032e8d67643fdbbb33fb6006b3f8d635e/ceph-ansible-pr-syntax-check/build/build#L11

Adding a comment containing the "ansible" string is more like a workaround to ensure that CI job passes. (Adding the "ansible" string include that line in the grep result and thus the netaddr requirement is installed.)

@dsavineau
Copy link
Contributor

I was'nt aware of the firewall issue. I would suggest addressing that in a separate commit?

Yes we can.

Regarding the mon container, it looks like the code[1] already expect a comma separated list as it replaces whitespace? (-e CEPH_PUBLIC_NETWORK={{ public_network | regex_replace(' ', '') }}) The other pice of code[2] would be good to investigate, but looking at history most of it is 2 years old. If TripleO is using that code then it did'nt fail during OSP13/OSP14 testing.

Yes this seems only to be useful for network auto detect which isn't used in ceph-ansible.

@hjensas hjensas force-pushed the issue#4333 branch 2 times, most recently from dc639fa to e9c5689 Compare September 6, 2019 17:41
hjensas added a commit to hjensas/ceph-build that referenced this pull request Sep 6, 2019
This change removes the grep for 'ansible' when isntalling
requirements in ceph-ansible-pr-syntax-check. The grep
cause onlye the ansible requirement to be installed, ignoring
the other packages. This causes an issue in PR[1] which
introduce a filter plugin that depend on the netaddr package.

[1] ceph/ceph-ansible#4339

Closes: ceph#1390

Signed-off-by: Harald Jensås <hjensas@redhat.com>
hjensas added a commit to hjensas/ceph-build that referenced this pull request Sep 9, 2019
This change removes the grep for 'ansible' when installing
requirements in ceph-ansible-pr-syntax-check. The grep
cause onlye the ansible requirement to be installed, ignoring
the other packages. This causes an issue in PR[1] which
introduce a filter plugin that depend on the netaddr package.

[1] ceph/ceph-ansible#4339

Closes: ceph#1390

Signed-off-by: Harald Jensås <hjensas@redhat.com>
@dsavineau
Copy link
Contributor

jenkins test pipeline

@dsavineau
Copy link
Contributor

Could you modify the .travis.yml file and add the plugins directory in the pytest command ? otherwise the filter tests won't be executed.
Also trying to run the tests on my side and wasn't able until I created the __init__.py file in the same directory.

$ pytest -vvvv plugins/
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.7.3, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- /home/ds/Workspace/.venv/pytest/bin/python3
cachedir: .pytest_cache
rootdir: /home/ds/Documents/repo/ceph/ceph-ansible
collected 0 items / 1 errors                                                                                                                                                                                                                 

=================================================================================================================== ERRORS ===================================================================================================================
_________________________________________________________________________________________ ERROR collecting plugins/filter/test_ipaddrs_in_ranges.py __________________________________________________________________________________________
ImportError while importing test module '/home/ds/Documents/repo/ceph/ceph-ansible/plugins/filter/test_ipaddrs_in_ranges.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/ds/Workspace/.venv/pytest/lib/python3.7/site-packages/_pytest/python.py:501: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
/home/ds/Workspace/.venv/pytest/lib/python3.7/site-packages/py/_path/local.py:701: in pyimport
    __import__(modname)
/home/ds/Workspace/.venv/pytest/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:140: in exec_module
    exec(co, module.__dict__)
plugins/filter/test_ipaddrs_in_ranges.py:1: in <module>
    from . import ipaddrs_in_ranges
E   ImportError: attempted relative import with no known parent package
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================================================== 1 error in 0.10s ==============================================================================================================

Also looking at the CI failure I think we need to modify all tox files and add the ANSIBLE_FILTER_PLUGINS env variable otherwise the playbooks located in the infrastructure-playbook directory won't find the custom filter. You can take a look at the ANSIBLE_ACTION_PLUGINS configuration as an exemple.

@dsavineau
Copy link
Contributor

dsavineau commented Sep 9, 2019

Also trying to run the tests on my side and wasn't able until I created the __init__.py file in the same directory.

Sorry I didn't see that was already handled in the PR.

@hjensas
Copy link
Contributor Author

hjensas commented Sep 12, 2019

I don't understand why but the update scenarios are failing on the ipaddrs_in_ranges filter [1]

This is during the initial deployment using the stable-3.2 branch (which doesn't contain the filter or any references) so this filter wouldn't be present at this point (only during the upgrade part).
I'm investigating this atm but nothing relevant. I thought it could be fix by [2] but without success.

[1] https://2.jenkins.ceph.com/job/ceph-ansible-prs-dev-centos-container-update/206/consoleFull#8768837504574abcf-086e-4053-9702-6f4b34086ae8
[2] a874002

I think this happens because we set ANSIBLE_FILTER_PLUGINS = {toxinidir}/plugins/filter in tox-update.ini. This makes the plugin in master available for the update step.

I think we need to set these: ANSIBLE_ACTION_PLUGINS, ANSIBLE_CALLBACK_PLUGINS and ANSIBLE_FILTER_PLUGINS as the test is run first pointing to the stable-3.2 ceckout in {envdir}/tmp/ceph-ansible/ then reset them to {toxinidir} before running {toxinidir}/infrastructure-playbooks/rolling_update.yml.

I will test this locally and update the patch.

@dsavineau
Copy link
Contributor

dsavineau commented Sep 12, 2019

Looking quickly on this, I think we can just remove ANSIBLE_ACTION_PLUGINS, ANSIBLE_CALLBACK_PLUGINS and ANSIBLE_FILTER_PLUGINS env variables from the tox-update.ini file (and probably in the other files too).
We already set those keys in the ansible.cfg file and it seems to work on my side. But a874002 was required to point to the right ansible config file when deploying on stable-3.2 branch.
I think the tox env variables override the values defined in ansible.cfg

@hjensas
Copy link
Contributor Author

hjensas commented Sep 12, 2019

Looking quickly on this, I think we can just remove ANSIBLE_ACTION_PLUGINS, ANSIBLE_CALLBACK_PLUGINS and ANSIBLE_FILTER_PLUGINS env variables from the tox-update.ini file (and probably in the other files too).

This makes sense, trying that now ...

@dsavineau
Copy link
Contributor

jenkins test dev-centos-container-update

@dsavineau
Copy link
Contributor

Also I noticed that the filter tests aren't working with py27 even if the travis job reports a success

=============================== warnings summary ===============================
plugins/filter/test_ipaddrs_in_ranges.py:4
  /home/travis/build/ceph/ceph-ansible/plugins/filter/test_ipaddrs_in_ranges.py:4: PytestWarning: cannot collect test class 'TestIpaddrsInRanges' because it has a __init__ constructor
    class TestIpaddrsInRanges(object):

https://travis-ci.org/ceph/ceph-ansible/jobs/583575339

@hjensas
Copy link
Contributor Author

hjensas commented Sep 12, 2019

jenkins test dev-centos-container-update

@hjensas
Copy link
Contributor Author

hjensas commented Sep 15, 2019

jenkins test pipeline

@guits guits added the DNM Do NOT merge label Sep 16, 2019
@hjensas
Copy link
Contributor Author

hjensas commented Sep 18, 2019

@guits Why is the DNM label back on this? Are you working on an alternative solution?

I have worked with @dsavineau and some related fixes have already been merged. (#4433 and ceph/ceph-build#1391).

We attempted a fix in TripleO, see [1], [2] and [3]. It would be possible for the monitor_address_range. However when we tested that fix ceph-ansible failed on the next fact: ceph-facts : set grafana_server_addr fact with basically the same error. grafana_server_addr fact is derived from the public_network. The value for public_network must be the comma separated list of IP subnets in ceph.conf. I.e setting public_network to a single IP subnet at the inventory host-level will not work.

See #4339 (comment) which have more details.

[1] https://review.opendev.org/679410
[2] https://review.opendev.org/679412
[3] https://review.opendev.org/679060

@hjensas
Copy link
Contributor Author

hjensas commented Sep 18, 2019

jenkins test pipeline

@hjensas
Copy link
Contributor Author

hjensas commented Sep 18, 2019

jenkins test dev-centos-container-update

@hjensas
Copy link
Contributor Author

hjensas commented Sep 18, 2019

jenkins test dev-centos-container-update

@hjensas
Copy link
Contributor Author

hjensas commented Sep 19, 2019

jenkins test pipeline

2 similar comments
@hjensas
Copy link
Contributor Author

hjensas commented Sep 19, 2019

jenkins test pipeline

@hjensas
Copy link
Contributor Author

hjensas commented Sep 19, 2019

jenkins test pipeline

@fultonj
Copy link
Contributor

fultonj commented Sep 26, 2019

My understanding is that @guits wants to review this further with @dsavineau and then he'll probably request other changes or remove the DNM. He just doesn't want it to merge without his review.

@guits
Copy link
Collaborator

guits commented Sep 26, 2019

jenkins test pipeline

This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.

ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.

Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.

With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.

[1] commit: d67230b
[2] commit: 20e4852

Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283

Closes: ceph#4333
Please backport to stable-4.0

Signed-off-by: Harald Jensås <hjensas@redhat.com>
@guits
Copy link
Collaborator

guits commented Sep 26, 2019

My understanding is that @guits wants to review this further with @dsavineau and then he'll probably request other changes or remove the DNM. He just doesn't want it to merge without his review.

yup, that's correct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants