Introduce a cluster option for workshops #891
Conversation
I made a few inline comments on things I noticed. I like this option, but there are a few things I would fix. I will run a few tests tonight to see what else I notice. Sean isn't a fan of loops for building VMs, so we may have to think outside the box on how to build them all at once but still tag each one separately for which node it is. |
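One loop-free pattern that might fit that constraint, purely as a sketch (the modules are from the amazon.aws collection; the node_name tag key and the count of 4 are illustrative, not from this PR): launch all instances in one call, then tag each instance with its node number afterwards.

- name: Launch all cluster VMs in a single call instead of looping per VM
  amazon.aws.ec2_instance:
    name: "{{ ec2_name_prefix }}-tower"
    image_id: "{{ ami_id }}"
    instance_type: t3.medium
    count: 4
  register: cluster_vms

- name: Tag each instance with its node number after creation
  amazon.aws.ec2_tag:
    resource: "{{ item.instance_id }}"
    tags:
      node_name: "ansible-{{ node_index + 1 }}"  # hypothetical tag key
  loop: "{{ cluster_vms.instances }}"
  loop_control:
    index_var: node_index

The second task still loops, but only over tagging; the VM creation itself is a single API call.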
Fails in security workshop deploy and rhel-verify and I think that's fine:
|
recheck |
Recheck |
The check failures don’t seem to be related to changes on this PR. @liquidat how do you feel about this PR? Can you please review? |
Please check if an update to Tower 3.7 is possible. It seems like a missed opportunity if we do not start with 3.7 and then have to update to 3.7 later anyway.
provisioner/roles/manage_ec2_instances/templates/etchosts/etchosts_rhel.j2
@goetzrieger What do you think? Maybe we can start the advanced Tower lab right off of this PR? It could simplify a lot of work! |
I’m counting on our tests. I verified the linux and network workshops manually.
…Sent from my iPhone
On Jun 8, 2020, at 17:29, Roland Wolters ***@***.***> wrote:
@liquidat requested changes on this pull request.
Please check if an update to Tower 3.7 is possible. It seems like a missed opportunity if we do not start with 3.7 and then have to update to 3.7 later anyway.
In provisioner/group_vars/all/vpc_rules.yml:
> @@ -60,6 +60,26 @@ workshops:
from_port: 514
cidr_ip: 0.0.0.0/0
rule_desc: WinRM
+ - proto: tcp
+ to_port: 5432
+ from_port: 5432
+ cidr_ip: 0.0.0.0/0
+ rule_desc: Cluster option DB port
+ - proto: tcp
+ to_port: 4369
+ from_port: 4369
+ cidr_ip: 0.0.0.0/0
+ rule_desc: Cluster option RabbitMQ
Do we still need the rules if we only use internal IPs?
In provisioner/group_vars/all/vpc_rules.yml:
> + rule_desc: Cluster option DB port
+ - proto: tcp
+ to_port: 4369
+ from_port: 4369
+ cidr_ip: 172.16.0.0/14
+ rule_desc: Cluster option RabbitMQ
+ - proto: tcp
+ to_port: 25672
+ from_port: 25672
+ cidr_ip: 172.16.0.0/14
+ rule_desc: Cluster option RabbitMQ
+ - proto: tcp
+ to_port: 5672
+ from_port: 5672
+ cidr_ip: 172.16.0.0/14
+ rule_desc: Cluster option RabbitMQ
As mentioned above, we don't have RabbitMQ anymore, so please remove the RabbitMQ rules.
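For reference, the trimmed block might look like this (a sketch only: RabbitMQ entries removed, and the database port restricted to the internal 172.16.0.0/14 range already used in the hunk above, pending the open question about internal IPs):

- proto: tcp
  to_port: 5432
  from_port: 5432
  cidr_ip: 172.16.0.0/14
  rule_desc: Cluster option DB port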
In provisioner/roles/control_node/templates/tower_cluster_install.j2:
> +
+
+
+pg_host='ansible-4'
+pg_port='5432'
+
+pg_database='awx'
+pg_username='awx'
+pg_password='{{admin_password}}'
+
+rabbitmq_port=5672
+rabbitmq_vhost=tower
+rabbitmq_username=tower
+rabbitmq_password='{{admin_password}}'
+rabbitmq_cookie=cookiemonster
+
We don't use RabbitMQ with Tower 3.7 anymore, can you please update this to Tower 3.7?
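For comparison, a minimal sketch of a 3.7-era version of this template: Tower 3.7 replaced RabbitMQ with Redis, which the installer configures on its own, so the whole rabbitmq_* section can be dropped (host names are carried over from the template above; the group layout follows the standard Tower setup inventory, not this PR):

[tower]
ansible-1
ansible-2
ansible-3

[database]
ansible-4

[all:vars]
admin_password='{{admin_password}}'

pg_host='ansible-4'
pg_port='5432'
pg_database='awx'
pg_username='awx'
pg_password='{{admin_password}}'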
In provisioner/roles/manage_ec2_instances/templates/etchosts/etchosts_rhel.j2:
> @@ -21,6 +21,6 @@
{% for vm in ansible_node_facts.instances %}
{% if 'student' + item == vm.tags.Student %}
-{{ vm.public_ip_address }} {{vm.tags.Student}}.{{ec2_name_prefix|lower}}.{{workshop_dns_zone}} {{ vm.tags.short_name }}
+{{ vm.private_ip_address }} {{vm.tags.Student}}.{{ec2_name_prefix|lower}}.{{workshop_dns_zone}} {{ vm.tags.short_name }}
So, with this the nodes don't have public IPs anymore, am I right? We need confirmation for each workshop individually if the exercises really work with private IPs.
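If some workshops turn out to require public IPs, one way to support both behaviours (a sketch; it assumes the create_cluster flag introduced by this PR is visible to the template) is to make the line conditional:

{% if create_cluster | default(false) | bool %}
{{ vm.private_ip_address }} {{ vm.tags.Student }}.{{ ec2_name_prefix | lower }}.{{ workshop_dns_zone }} {{ vm.tags.short_name }}
{% else %}
{{ vm.public_ip_address }} {{ vm.tags.Student }}.{{ ec2_name_prefix | lower }}.{{ workshop_dns_zone }} {{ vm.tags.short_name }}
{% endif %}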
|
As for Tower 3.7, these rules will allow workshops to be spun up with older versions. As SAs we sometimes use this provisioner to simulate customer environments. This does not affect the deployment of 3.7.
…Sent from my iPhone
|
First, for @termlen0: this is something we really need. @cbolz and I built something like this for Summit to run our Advanced Tower lab on (the advanced_tower branch), but we didn't find the time to get it into devel proper (not to mention master), so it would be a lot of work now (there were a lot of changes in the meantime...). But we need a clustered env in master to get the lab into RHPDS... I only had time for a brief look at what you have done but will try to test it; maybe @cbolz can have a look, too. It would be great to get this working and into devel/master. The Advanced Tower lab itself doesn't have many requirements regarding the lab env other than a three-node cluster, but we need to check. |
Maybe, but it would be news to me that this is a desired feature of the workshops. And while it does not affect the actual deployment, it introduces an entire set of legacy code: RabbitMQ configuration in the inventory, various firewall rules, etc. That is not a deal breaker for me, but I would very much prefer to focus on up-to-date Tower releases. We also no longer cater to people who want to use RHEL 7 or other older Tower or Ansible releases. |
I did some tests today with the RHEL workshop:
|
Thanks for this. I’ll refactor next week to include #892 and update the landing page template.
…Sent from my iPhone
On Jun 10, 2020, at 11:42, Götz Rieger ***@***.***> wrote:
I did some tests today with the RHEL workshop:
The #892 fix is missing in your fork, so it failed in Install EPEL
The workbench info on the landing page has (the same) entries for all 4 control nodes; I guess the template has to be changed... :)
Apart from that it looks good so far, but I haven't tested anything beyond deploy yet
|
Some more thoughts:
I know it's hard to get clustering into the Workshops as unintrusively as possible. I'm happy to help if I can. |
Per review from @goetzrieger, I've rebased to accommodate PR #892. I've also updated the landing page J2 to display only the main control node details. Tested with the RHEL workshop. |
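The template change itself isn't quoted in this thread, but the general shape would be to filter the rendered instances down to the first control node, roughly like this sketch (variable names are borrowed from the etchosts template quoted above, and the 'ansible-1' short name from the install template; the surrounding landing page markup is omitted):

{% for vm in ansible_node_facts.instances %}
{% if vm.tags.short_name == 'ansible-1' %}
{{ vm.tags.Student }}.{{ ec2_name_prefix | lower }}.{{ workshop_dns_zone }}
{% endif %}
{% endfor %}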
This is what is currently being done. See:
While I agree that it might be a cleaner approach, I fail to see how using the control group for anything else can cause an issue. In the spirit of getting a cluster option in place, I suggest we table this for a future PR.
I'm personally not opinionated one way or the other about the naming. Will let others comment on it. But again, not a show-stopper for this PR, IMO. |
I like the approach - it's less intrusive than what we hacked together for Summit. Personally, I would probably split out changes like switching from public to private IPs and upgrading to 3.7 into separate PRs, but I guess that's up for debate and just a different way of working. I also agree with @liquidat that the purpose of the workshop is to ship the latest Ansible releases and not to carry technical debt to support all sorts of old releases. IMHO that's out of scope for this project. |
@termlen0 This is shaping up really nicely, I love it! One small thing missing: we need a sample vars file, or at least an entry for each new option we bring in; can you add this? Also, while I would say @goetzrieger is a bit too cautious with the thoughts around control_nodes, we should at least make sure that the other labs work on it. So we need to fix these lines where stuff is installed on control_nodes:
Numbers one and three can just be rewritten to ansible-1, but I am not sure about number two; maybe @cloin can help? |
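For items one and three, the rewrite can be as small as re-targeting the play; a sketch, assuming those plays currently target the control_nodes group:

- name: Install workshop prerequisites on the primary control node only
  hosts: ansible-1          # was: hosts: control_nodes
  become: true
  tasks:
    - name: Placeholder for the existing install tasks, which stay unchanged
      ansible.builtin.debug:
        msg: existing tasks run here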
Ajay, I know this is a bit selfish, but while you are at it: could we have four managed nodes, ideally node1/node2, then isonode and remotenode? Then this environment would line up perfectly with the Advanced Tower lab [1]. If this is asking too much, I'll give it a shot later. [1] https://people.redhat.com/grieger/summit2020_labs/ansible-tower-advanced/8-isolated-nodes/
|
I've made changes to address all 3 items. I've tested against the windows workshop. As for the sample vars, I'll update all existing samplevars files with the create_cluster boolean var and set it to "no" by default. |
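Based on that, the sample-vars entry would look roughly like this (a sketch; only create_cluster is the new option, the surrounding variables are just the usual illustrative workshop settings):

# sample-vars.yml (excerpt)
ec2_region: us-east-1
ec2_name_prefix: TESTWORKSHOP
workshop_type: rhel

# new: deploy Tower as a multi-node cluster (default: no)
create_cluster: no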
recheck |
recheck |
We have builds failing. First, RHEL verify is failing:
I think we need to modify this line in the testing script:
Second, security deployment fails:
This would be this line:
Honestly, I have no idea what is going on. I will try to provision on my own from this branch and see if I can replicate the problem. |
Easiest fix for the RHEL verify failure, IMO (if we want to stay with Ajay's naming convention): file: provisioner/tests/rhel_verify.yml
Tested with cluster and non-cluster RHEL WS. |
@termlen0 Can you please include @goetzrieger 's patch and also rebase? After the rebase I can track down the new bug, but right now without a rebase it is rather hard. |
Will do. The last two weeks were tough. I'll try to get this in this week. Thanks!
…Sent from my iPhone
|
Build still fails here, this time with more information:
|
So, this looks like an evil Ansible bug. For me, this change made it work:
@termlen0 You mentioned in chat that you don't see this behavior. I tested with Ansible versions 2.9.9 and 2.9.12, and both times I saw the same problem. |
@termlen0 Security deployment is mostly fine. The RHEL verify script still fails as mentioned above; @goetzrieger had a patch, did you include it? There is an error with the security verify as well, but I'd like a recheck to be sure that this is not a fluke. |
Just committed. |
@termlen0 We missed something in the security workshop: the checkpoint stuff isn't even called and thus the test fails. Can you please add this patch?
With this patch my tests are all good. |
SUMMARY
This PR introduces the option of running any of the workshops with Tower as a cluster.
Users will need to add the create_cluster: yes option to their vars file for this to work.
ISSUE TYPE
COMPONENT NAME
ADDITIONAL INFORMATION
This option will allow SAs to build even better demos around scaling/load balancing (RBAC).
Additionally, when provisioned for workshops, it will help highlight Tower scaling/clustering features to the customer.
CC: @IPvSean