
Introduce a cluster option for workshops #891

Merged: 21 commits into ansible:devel, Aug 19, 2020

Conversation

termlen0
Contributor

SUMMARY

This PR introduces the option of running any of the workshops with Tower as a cluster.

  • Users will need to add the create_cluster: yes option to their vars file for this to work (see the sketch below).
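
For reference, a minimal vars-file sketch; create_cluster is the only new option, and the other keys are illustrative examples of existing settings:

# sample vars file (illustrative sketch, not a complete config)
workshop_type: rhel      # existing option, shown for context
ec2_region: us-east-1    # existing option, shown for context
create_cluster: yes      # new: provision Tower as a multi-node cluster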
ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME
  • provisioner
ADDITIONAL INFORMATION

This option will allow SAs to build even better demos around scaling/load balancing (RBAC).
Additionally, when provisioned for workshops, it will help highlight Tower's scaling/clustering features to the customer.

CC: @IPvSean


@cigamit
Contributor

cigamit commented May 27, 2020

I made a few comments inline just from things I saw. I like this option, but there are a few things I would fix. I will run a few tests tonight to see what else I notice. Sean isn't a fan of loops for building VMs, so we may have to think outside the box on how to create them all at once while still tagging each one for which node it is.

@cloin
Contributor

cloin commented Jun 2, 2020

Fails in the security workshop deploy and in rhel-verify; I think that's fine:

[2020-05-29T13:36:37.658Z] TASK [manage_ec2_instances : provision workshop instances] *********************
[2020-05-29T13:36:37.658Z] ERROR! Unexpected Exception, this is probably a bug: 'NoneType' object has no attribute 'rfind'
[2020-05-29T13:36:37.658Z] to see the full traceback, use -vvv
[2020-05-29T14:06:09.263Z] TASK [Test access by exporting assets] *****************************************
[2020-05-29T14:06:09.263Z] skipping: [student2-node1]
[2020-05-29T14:06:09.263Z] skipping: [student2-node2]
[2020-05-29T14:06:09.263Z] skipping: [student2-node3]
[2020-05-29T14:06:09.531Z] skipping: [student1-node1]
[2020-05-29T14:06:09.531Z] skipping: [student1-node3]
[2020-05-29T14:06:09.531Z] skipping: [student1-node2]
[2020-05-29T14:06:10.468Z] fatal: [student2-ansible-1]: FAILED! => changed=false 
[2020-05-29T14:06:10.468Z]   assets: null
[2020-05-29T14:06:10.468Z]   message: |-
[2020-05-29T14:06:10.468Z]     There was a network error of some kind trying to connect to Tower.
[2020-05-29T14:06:10.468Z]   
[2020-05-29T14:06:10.468Z]     The most common  reason for this is a settings issue; is your "host" value in `tower-cli config` correct?
[2020-05-29T14:06:10.468Z]     Right now it is: "student2-1.tqe-rhel-tower370-PR-891-3.rhdemo.io".
[2020-05-29T14:06:10.468Z]   msg: Receive Failed
[2020-05-29T14:06:10.468Z] fatal: [student1-ansible-1]: FAILED! => changed=false 
[2020-05-29T14:06:10.468Z]   assets: null
[2020-05-29T14:06:10.468Z]   message: |-
[2020-05-29T14:06:10.468Z]     There was a network error of some kind trying to connect to Tower.
[2020-05-29T14:06:10.468Z]   
[2020-05-29T14:06:10.468Z]     The most common  reason for this is a settings issue; is your "host" value in `tower-cli config` correct?
[2020-05-29T14:06:10.468Z]     Right now it is: "student1-1.tqe-rhel-tower370-PR-891-3.rhdemo.io".
[2020-05-29T14:06:10.468Z]   msg: Receive Failed

@Spredzy
Collaborator

Spredzy commented Jun 2, 2020

recheck

@cloin
Contributor

cloin commented Jun 5, 2020

Recheck

@termlen0
Contributor Author

termlen0 commented Jun 5, 2020

@cloin / @Spredzy Let me know what I can do to help move this PR forward.

@cloin
Contributor

cloin commented Jun 8, 2020

The check failures don't seem to be related to the changes in this PR. @liquidat, how do you feel about this PR? Can you please review?

Contributor

@liquidat left a comment


Please check if an update to Tower 3.7 is possible. It seems a missed opportunity if we do not start with 3.7 now and instead have to do the update to 3.7 later.

@liquidat
Contributor

liquidat commented Jun 8, 2020

@goetzrieger What do you think? Maybe we can start the advanced Tower lab right off of this PR? It could simplify a lot of work!

@termlen0
Contributor Author

termlen0 commented Jun 8, 2020 via email

@termlen0
Contributor Author

termlen0 commented Jun 8, 2020 via email

@goetzrieger
Contributor

First, for @termlen0: this is something we really need. @cbolz and I built something like this for Summit to run our Advanced Tower lab on (the advanced_tower branch). But we didn't find the time to get it into devel proper (not to mention master), so it would be a lot of work now (there were a lot of changes in the meantime...). But we need a clustered env in master to get the lab into RHPDS...

I've only had time for a brief look at what you've done, but I will try to test it; maybe @cbolz can have a look, too.

It would be great to get this working and into devel/master. The Advanced Tower lab itself doesn't have many requirements for the lab environment beyond a three-node cluster, but we need to check.

@liquidat
Contributor

liquidat commented Jun 9, 2020

> As for Tower 3.7, these rules will allow workshops to be spun up with older versions. As an SA sometimes we might use this provisioner to simulate customer environments.

Maybe, but it would be news to me that this is a desired feature of the workshops?!

And while it does not affect the actual deployment, it introduces an entire set of legacy code: RabbitMQ configuration in the inventory, various firewall rules, etc.

That is not a deal breaker for me, but I would much prefer to focus on up-to-date Tower releases. We also no longer cater to people who want to use RHEL 7, or other older Tower or Ansible releases.

@goetzrieger
Contributor

goetzrieger commented Jun 10, 2020

I did some tests today with the RHEL workshop:

  • The EPEL $releasever fix (#892) is missing in your fork, so it failed in Install EPEL
  • The workbench info on the landing page has (the same) entries for all 4 control nodes... :)
  • Apart from that it looks good so far, but I haven't tested anything beyond deploy yet

@termlen0
Contributor Author

termlen0 commented Jun 11, 2020 via email

@goetzrieger
Contributor

goetzrieger commented Jun 11, 2020

Some more thoughts:

  • To avoid creating instances in a loop the most pragmatic way might be to just include a cluster_instances.yml or a single_instance.yml (or so) file in manage_ec2_instances
  • As far as I can see you still have all control nodes in the control_nodes group and then you separate tasks by running them either on ansible-1 or ansible-2 to ansible-4 (instead of using groups) in provision_lab.yml. IMHO this is asking for trouble, the group control_nodes could accidentally be used somewhere else, renaming nodes is hard etc. Having a dedicated group for the cluster nodes might be cleaner. Disclaimer: we did this for the Summit env and it gave us loads of fun/pain.
  • This is more cosmetic, but I would prefer names like tower-1, tower-2 so students don't get confused (too easily ;).

I know it's hard to get clustering into the Workshops as unintrusively as possible. I'm happy to help if I can.

@termlen0
Contributor Author

Per @goetzrieger's review, I've rebased to accommodate PR #892. I've also updated the landing page J2 template to display only the main control node's details. Tested with the RHEL workshop.

@termlen0
Contributor Author

> Some more thoughts:
>
> * To avoid creating instances in a loop the most pragmatic way might be to just include a cluster_instances.yml or a single_instance.yml (or so) file in manage_ec2_instances

This is what is currently being done. See:

- name: Create the control clusters
  include_tasks: cluster_instances.yml
  loop: "{{ range(1, control_nodes|default(1) + 1 ) | list }}"
  loop_control:
    loop_var: sequence
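
For illustration, with control_nodes set to 4 the loop expression expands to [1, 2, 3, 4], so cluster_instances.yml is included once per control node with sequence as the node index. A hypothetical one-off task (not part of the PR) to see the list:

- name: Show the cluster loop sequence (illustrative only)
  debug:
    msg: "{{ range(1, control_nodes|default(1) + 1) | list }}"  # [1, 2, 3, 4] when control_nodes == 4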

> IMHO this is asking for trouble, the group control_nodes could accidentally be used somewhere else, renaming nodes is hard etc. Having a dedicated group for the cluster nodes might be cleaner. Disclaimer: we did this for the Summit env and it gave us loads of fun/pain.

While I agree that it might be a cleaner approach, I fail to see how using the control group for anything else would cause an issue. In the spirit of getting a cluster option in place, I suggest we table this for a future PR.

> * This is more cosmetic, but I would prefer names like tower-1, tower-2 so students don't get confused (too easily ;).

I'm personally not opinionated one way or the other about the naming; I'll let others comment on it. But again, not a show-stopper for this PR, IMO.

@cbolz

cbolz commented Jun 19, 2020

I like the approach - it's less intrusive than what we hacked together for Summit.

Personally, I would probably split out changes like switching from private to public IPs and upgrading to 3.7 into separate PRs, but I guess that's up for debate and just a different way of working.

I also agree with @liquidat that the purpose of the workshop is to have the latest Ansible releases and not to carry technical debt to support all sorts of old releases. IMHO that's out of scope for this project.

@termlen0 requested a review from @liquidat on July 7, 2020, 21:11
@goetzrieger
Contributor

goetzrieger commented Jul 15, 2020

@cloin @liquidat ?

We're just going through the pain of using our old cluster provisioner for Summit Open House. It'd be great to have this setup going before the next event... :)

@liquidat
Contributor

@termlen0 This is shaping up really nicely, I love it! One small thing missing: we need a sample vars file, or at least an entry, for each new option we bring in; can you add this?

Also, while I would say @goetzrieger is a bit too cautious with the thoughts around control_nodes, we should at least make sure that the other labs are working with it. So we need to fix these lines where stuff is installed on control_nodes:

* https://github.com/ansible/workshops/blob/2ff691f0471a0f3f2d9a4753dc395510b9e98471/provisioner/windows.yml#L17

* https://github.com/ansible/workshops/blob/2ff691f0471a0f3f2d9a4753dc395510b9e98471/provisioner/roles/workshop_attendance/templates/workshop.sql.j2#L19

* https://github.com/ansible/workshops/blob/2ff691f0471a0f3f2d9a4753dc395510b9e98471/provisioner/security.yml#L111

Numbers one and three can just be rewritten to ansible-1, but I am not sure about number two; maybe @cloin can help?

@goetzrieger
Contributor

Ajay, I know this is a bit selfish, but while you are at it:

Could we have four managed nodes, ideally node1/node2, then isonode and remotenode?

Then this environment would line up with the Advanced Tower lab [1] perfectly. If this is asking too much, I'll give it a shot later.

[1] https://people.redhat.com/grieger/summit2020_labs/ansible-tower-advanced/8-isolated-nodes/

@termlen0
Contributor Author

> Ajay, I know this is a bit selfish, but while you are at it:
>
> Could we have four managed nodes, ideally node1/node2, then isonode and remotenode?
>
> Then this environment would line up with the Advanced Tower lab [1] perfectly. If this is asking too much, I'll give it a shot later.
>
> [1] https://people.redhat.com/grieger/summit2020_labs/ansible-tower-advanced/8-isolated-nodes/

I'll try. For now, the path of least resistance for me to get this PR into devel is to address @liquidat's 3 items. I'll get those in first and, once the PR is merged into devel, start a new feature branch to refactor per your suggestion. Hope that works. :)
Cheers.

@termlen0
Contributor Author

> @termlen0 This is shaping up really nicely, I love it! One small thing missing: we need a sample vars file, or at least an entry, for each new option we bring in; can you add this?
>
> Also, while I would say @goetzrieger is a bit too cautious with the thoughts around control_nodes, we should at least make sure that the other labs are working with it. So we need to fix these lines where stuff is installed on control_nodes:
>
> * https://github.com/ansible/workshops/blob/2ff691f0471a0f3f2d9a4753dc395510b9e98471/provisioner/windows.yml#L17
>
> * https://github.com/ansible/workshops/blob/2ff691f0471a0f3f2d9a4753dc395510b9e98471/provisioner/roles/workshop_attendance/templates/workshop.sql.j2#L19
>
> * https://github.com/ansible/workshops/blob/2ff691f0471a0f3f2d9a4753dc395510b9e98471/provisioner/security.yml#L111
>
> Numbers one and three can just be rewritten to ansible-1, but I am not sure about number two; maybe @cloin can help?

I've made changes to address all 3 items and tested against the windows workshop. As for the sample vars, I'll update all existing sample vars files with the create_cluster boolean and set it to "no" by default.

@liquidat
Contributor

liquidat commented Aug 5, 2020

recheck

@liquidat
Contributor

recheck

@liquidat
Contributor

We have builds failing.

First, RHEL verify is failing:

[2020-08-07T18:08:43.603Z] TASK [Test access by exporting assets] *****************************************
[2020-08-07T18:08:43.882Z] skipping: [student1-node1]
[2020-08-07T18:08:43.882Z] skipping: [student1-node2]
[2020-08-07T18:08:43.882Z] skipping: [student1-node3]
[2020-08-07T18:08:43.882Z] skipping: [student2-node1]
[2020-08-07T18:08:43.882Z] skipping: [student2-node2]
[2020-08-07T18:08:44.142Z] skipping: [student2-node3]
[2020-08-07T18:08:45.510Z] FAILED - RETRYING: Test access by exporting assets (60 retries left).
[...]
[2020-08-07T18:12:40.901Z] FAILED - RETRYING: Test access by exporting assets (1 retries left).
[2020-08-07T18:12:44.177Z] fatal: [student2-ansible-1]: FAILED! => changed=false 
[2020-08-07T18:12:44.177Z]   assets: null
[2020-08-07T18:12:44.177Z]   attempts: 60
[2020-08-07T18:12:44.177Z]   message: |-
[2020-08-07T18:12:44.177Z]     There was a network error of some kind trying to connect to Tower.
[2020-08-07T18:12:44.177Z]   
[2020-08-07T18:12:44.177Z]     The most common  reason for this is a settings issue; is your "host" value in `tower-cli config` correct?
[2020-08-07T18:12:44.177Z]     Right now it is: "student2-1.tqe-rhel-tower371-PR-891-32.rhdemo.io".
[2020-08-07T18:12:44.177Z]   msg: Receive Failed
[2020-08-07T18:12:44.433Z] fatal: [student1-ansible-1]: FAILED! => changed=false 
[2020-08-07T18:12:44.433Z]   assets: null
[2020-08-07T18:12:44.433Z]   attempts: 60
[2020-08-07T18:12:44.433Z]   message: |-
[2020-08-07T18:12:44.433Z]     There was a network error of some kind trying to connect to Tower.
[2020-08-07T18:12:44.433Z]   
[2020-08-07T18:12:44.433Z]     The most common  reason for this is a settings issue; is your "host" value in `tower-cli config` correct?
[2020-08-07T18:12:44.433Z]     Right now it is: "student1-1.tqe-rhel-tower371-PR-891-32.rhdemo.io".
[2020-08-07T18:12:44.433Z]   msg: Receive Failed

I think we need to modify this line in the testing script:

tower_host: "{{ inventory_hostname|regex_replace('-ansible', '') }}.{{ workshop_name }}.rhdemo.io"

Second, security deployment fails:

[2020-08-07T17:38:26.714Z] TASK [manage_ec2_instances : provision workshop instances] *********************
[2020-08-07T17:38:26.714Z] ERROR! Unexpected Exception, this is probably a bug: expected str, bytes or os.PathLike object, not NoneType

This would be this line:

include_tasks: 'instances/instances_{{ workshop_type }}.yml'

Honestly, I have no idea what is going on. I will try to provision on my own from this branch and see if I can replicate the problem.

@goetzrieger
Contributor

goetzrieger commented Aug 11, 2020

Easiest fix for the RHEL verify fail, IMO (if we want to stay with Ajay's naming convention):

file: provisioner/tests/rhel_verify.yml

tower_host: "{{ inventory_hostname|regex_replace('-ansible-1', '') }}.{{ workshop_name }}.rhdemo.io"
[...]
when: '"ansible-1" in inventory_hostname'

Tested with cluster and non-cluster RHEL WS.

@termlen0

@liquidat
Contributor

@termlen0 Can you please include @goetzrieger's patch and also rebase? After the rebase I can track down the new bug; right now, without a rebase, it is rather hard.

@termlen0
Contributor Author

termlen0 commented Aug 17, 2020 via email

@liquidat
Contributor

Build still fails here, this time with more information:

TASK [manage_ec2_instances : provision workshop instances] ***********************************************************************************************************************************
task path: /home/rwolters/gits/github/termlen0-linklight/provisioner/roles/manage_ec2_instances/tasks/provision.yml:47
ERROR! Unexpected Exception, this is probably a bug: expected str, bytes or os.PathLike object, not NoneType
the full traceback was:

Traceback (most recent call last):
  File "/home/rwolters/development/venv_ansible_2.9/bin/ansible-playbook", line 123, in <module>
    exit_code = cli.run()
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/cli/playbook.py", line 127, in run
    results = pbex.run()
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/executor/playbook_executor.py", line 169, in run
    result = self._tqm.run(play=play)
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/executor/task_queue_manager.py", line 241, in run
    play_return = strategy.run(iterator, play_context)
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/plugins/strategy/linear.py", line 359, in run
    new_blocks = self._load_included_file(included_file, iterator=iterator)
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/plugins/strategy/__init__.py", line 890, in _load_included_file
    block_list = load_list_of_blocks(
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/playbook/helpers.py", line 70, in load_list_of_blocks
    Block.load(
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/playbook/block.py", line 94, in load
    return b.load_data(data, variable_manager=variable_manager, loader=loader)
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/playbook/base.py", line 235, in load_data
    self._attributes[target_name] = method(name, ds[name])
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/playbook/block.py", line 122, in _load_block
    return load_list_of_tasks(
  File "/home/rwolters/development/venv_ansible_2.9/lib64/python3.8/site-packages/ansible/playbook/helpers.py", line 191, in load_list_of_tasks
    parent_include_dir = os.path.dirname(templar.template(parent_include.args.get('_raw_params')))
  File "/usr/lib64/python3.8/posixpath.py", line 152, in dirname
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

@liquidat
Contributor

So, this looks like an evil Ansible bug. For me, this change made it work:

diff --git a/provisioner/provision_lab.yml b/provisioner/provision_lab.yml
index 2f86777f..50487323 100644
--- a/provisioner/provision_lab.yml
+++ b/provisioner/provision_lab.yml
@@ -18,15 +18,14 @@
   connection: local
   become: false
   gather_facts: false
-  tasks:
+  pre_tasks:
     - name: Cluster nodes
       set_fact:
         control_nodes: 4
       when: create_cluster is defined and create_cluster|bool
 
-    - name: Manage EC2
-      include_role:
-        name: manage_ec2_instances
+  roles:
+    - manage_ec2_instances
 
 - name: wait for all nodes to have SSH reachability
   hosts: "managed_nodes:control_nodes:attendance"

@termlen0 You mentioned in chat that you don't see this behavior. I tested with Ansible 2.9.9 and 2.9.12; both times I see the same problem.
Is there any reason why we could not adopt the change mentioned above, besides the fact that it is ugly?

@liquidat
Contributor

@termlen0 Security deployment is mostly fine. The RHEL verify script still fails as mentioned above; @goetzrieger had a patch, did you include it?

There is an error with the security verify as well, but I'd like a recheck to be sure that this is not a fluke.

@termlen0
Contributor Author

> @termlen0 Security deployment is mostly fine. The RHEL verify script still fails as mentioned above; @goetzrieger had a patch, did you include it?
>
> There is an error with the security verify as well, but I'd like a recheck to be sure that this is not a fluke.

Just committed.

@liquidat
Contributor

@termlen0 We missed something in the security workshop: the checkpoint stuff isn't even called, and thus the test fails. Can you please add this patch?

diff --git a/provisioner/roles/cp_setup/tasks/main.yml b/provisioner/roles/cp_setup/tasks/main.yml
index 60b2105f..1a68a42a 100644
--- a/provisioner/roles/cp_setup/tasks/main.yml
+++ b/provisioner/roles/cp_setup/tasks/main.yml
@@ -1,7 +1,7 @@
 ---
 - name: login, get SID
   uri:
-    url: "https://{{ hostvars[inventory_hostname|regex_replace('ansible', 'checkpoint_mgmt')]['private_ip'] }}/web_api/login"
+    url: "https://{{ hostvars[inventory_hostname|regex_replace('ansible-1', 'checkpoint_mgmt')]['private_ip'] }}/web_api/login"
     method: POST
     body:
       user: admin
@@ -15,7 +15,7 @@
 
 - name: Add NGFW to MGMT
   uri:
-    url: "https://{{ hostvars[inventory_hostname|regex_replace('ansible', 'checkpoint_mgmt')]['private_ip'] }}/web_api/add-simple-gateway"
+    url: "https://{{ hostvars[inventory_hostname|regex_replace('ansible-1', 'checkpoint_mgmt')]['private_ip'] }}/web_api/add-simple-gateway"
     validate_certs: false
     method: POST
     headers:
@@ -24,7 +24,7 @@
     body_format: json
     body:
       name: myngfw
-      ip-address: "{{ hostvars[inventory_hostname|regex_replace('ansible', 'checkpoint_gw')]['private_ip'] }}"
+      ip-address: "{{ hostvars[inventory_hostname|regex_replace('ansible-1', 'checkpoint_gw')]['private_ip'] }}"
       one-time-password: admin123
       firewall: true
       version: R80.30
@@ -34,7 +34,7 @@
           anti-spoofing-settings:
             action: prevent
           name: "eth0"
-          ip-address: "{{ hostvars[inventory_hostname|regex_replace('ansible', 'checkpoint_gw')]['private_ip'] }}"
+          ip-address: "{{ hostvars[inventory_hostname|regex_replace('ansible-1', 'checkpoint_gw')]['private_ip'] }}"
           network-mask: "255.255.0.0"
           ipv4-mask-length: 16
           security-zone: false
@@ -45,7 +45,7 @@
           anti-spoofing-settings:
             action: prevent
           name: "eth1"
-          ip-address: "{{ hostvars[inventory_hostname|regex_replace('ansible', 'checkpoint_gw')]['private_ip2'] }}"
+          ip-address: "{{ hostvars[inventory_hostname|regex_replace('ansible-1', 'checkpoint_gw')]['private_ip2'] }}"
           network-mask: "255.255.0.0"
           ipv4-mask-length: 16
           security-zone: false
@@ -53,7 +53,7 @@
 
 - name: Publish
   uri:
-    url: "https://{{ hostvars[inventory_hostname|regex_replace('ansible', 'checkpoint_mgmt')]['private_ip'] }}/web_api/publish"
+    url: "https://{{ hostvars[inventory_hostname|regex_replace('ansible-1', 'checkpoint_mgmt')]['private_ip'] }}/web_api/publish"
     validate_certs: false
     method: POST
     headers:
@@ -67,7 +67,7 @@
   ec2_instance_info:
     region: "{{ ec2_region }}"
     filters:
-      "tag:Name": "{{ inventory_hostname|regex_replace('ansible', 'checkpoint_gw') }}"
+      "tag:Name": "{{ inventory_hostname|regex_replace('ansible-1', 'checkpoint_gw') }}"
       "instance-state-name": running
   register: gw_inst
   delegate_to: localhost
diff --git a/provisioner/security.yml b/provisioner/security.yml
index 53dcbc38..c3e4b25b 100644
--- a/provisioner/security.yml
+++ b/provisioner/security.yml
@@ -108,6 +108,6 @@
     - role: cp_fix_mgmt
 
 - name: SETUP CHECKPOINT ENVIRONMENT
-  hosts: ansible-1
+  hosts: '*ansible-1'
   roles:
     - role: cp_setup
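
The underlying issue, as far as I can tell: with the cluster naming the control hosts are now called e.g. student1-ansible-1 rather than student1-ansible, so regex_replace('ansible', 'checkpoint_mgmt') would produce non-existent names like student1-checkpoint_mgmt-1; replacing the full ansible-1 suffix presumably yields the actual student1-checkpoint_mgmt host. Likewise, hosts: ansible-1 only matches a host literally named ansible-1, while the '*ansible-1' glob matches student1-ansible-1 and friends.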

With this patch my tests are all good.

@liquidat merged commit 66a6f07 into ansible:devel on Aug 19, 2020