[catalog-next] on staging #2004

adborden · 2020-08-08T02:11:55Z

This makes a few significant changes.

- Refactor groups for catalog-next
- Sandbox now uses a static inventory
- ansible-inventory-diff for comparing inventory changes
- debug.yml playbook to check individual variable resolution

catalog-next Ansible groups

The biggest change is the refactoring of catalog-next groups for the Ansible
inventory. I was having a lot of trouble reasoning about the different roles
within catalog/catalog-next, so catalog-next groups are very different than
catalog classic. I left catalog classic as-is.

Groups now follow a tree structure within an app (catalog-next) and then have
specializations and sub-specializations. Variable inheritance works with each
child group overriding some variables from the parent. Ansible isn't really
meant to work this way, so this feels kind of like a hack, but at least it's
predictable and easier to reason about.

$ ansible-inventory -i inventories/staging --graph catalog-next
@catalog-next:
  |--@catalog-next-web:
  |  |--@catalog-next-web-a:
  |  |  |--catalogweb1d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-web-admin:
  |  |  |--catalogpubweb1d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-web-b:
  |  |  |--catalogweb2d.dev-ocsit.bsp.gsa.gov
  |--@catalog-next-worker:
  |  |--@catalog-next-worker-main:
  |  |  |--catalogharvester1d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-worker-misc:
  |  |  |--catalogharvester2d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-worker-qa:
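
To make the inheritance model concrete, here is a minimal sketch in Python (not Ansible itself) of how a variable resolves when walking the tree from the app group down to a sub-specialization. The variable names and values are hypothetical, and the model deliberately ignores the rest of Ansible's precedence rules.

```python
# Simplified model of the group tree: each group has an optional parent and
# its own vars. Variable names and values are hypothetical, for illustration.
GROUPS = {
    "catalog-next": {
        "parent": None,
        "vars": {"example_role": "base", "example_session_store": "memory"},
    },
    "catalog-next-web": {
        "parent": "catalog-next",
        "vars": {"example_role": "web"},
    },
    "catalog-next-web-admin": {
        "parent": "catalog-next-web",
        "vars": {"example_session_store": "database"},
    },
}


def resolve_vars(group):
    """Merge vars from the root group down to `group`; children override parents."""
    chain = []
    while group is not None:
        chain.append(group)
        group = GROUPS[group]["parent"]
    resolved = {}
    for name in reversed(chain):  # root first, most specific group last
        resolved.update(GROUPS[name]["vars"])
    return resolved


if __name__ == "__main__":
    print(resolve_vars("catalog-next-web-admin"))
```

Each child only needs to state what differs from its parent, which is what makes the tree easier to reason about.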

Sandbox static inventory

Sandbox was using a dynamic inventory
to fetch hosts from the AWS API at runtime. Since changes to Sandbox are made
via Terraform, in theory, Ansible can dynamically pick up those changes without
any change to the hosts file. However in practice, we often had to tweak groups
in sandbox/hosts to match Terraform and when a conflict arose, it was painful
to resolve it.

Additionally, you need access to the AWS API to use ansible-inventory, which
I found annoying, and contributes to environment drift with the BSP
environments (we should be able to use the same development tools in all
environments).

This PR removes the dynamic inventory and uses a static hosts file like the
other environments. If a new host is added via Terraform, or a hostname
changes, we'll have to update inventories/sandbox/hosts. We can now lint the
sandbox inventory, like staging/production, and this also enables us to use
ansible-inventory without AWS credentials and the tools/tricks described
below.

ansible-inventory-diff tool

One gotcha with Ansible is that it's easy to change a variable in a way that
has a much broader effect than intended. I've often used ansible-inventory --host $host to manually inspect variables. However, what I really want is a
way to diff the ansible-inventory output.

That is what ansible-inventory-diff hopes to be. This is a work in progress
and there are probably a few bugs to work out. ansible-inventory-diff will
generate what I call an "inventory manifest" from the current working directory
then generate a second inventory manifest from another git commit and then
diff those inventory manifests. The inventory manifest is just a giant JSON
output of all of our environments' ansible-inventory output, including all
variables and vault secrets.

$ bin/ansible-inventory-diff origin/develop

You'll get a diff against origin/develop in your favorite difftool.
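
For readers who want the gist without reading the script, the comparison step can be sketched roughly as below. This is not the code in bin/ansible-inventory-diff; it assumes the two manifests have already been loaded as Python dicts, and the function names and sample values are made up for illustration.

```python
import difflib
import json


def render_manifest(manifest):
    """Serialize a manifest deterministically so that diffs are stable."""
    return json.dumps(manifest, indent=2, sort_keys=True).splitlines()


def diff_manifests(old, new, old_label="origin/develop", new_label="working tree"):
    """Return a unified diff (as a list of lines) between two inventory manifests."""
    return list(
        difflib.unified_diff(
            render_manifest(old),
            render_manifest(new),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical manifests keyed by environment, then host, then variable.
    host = "catalogharvester1d.dev-ocsit.bsp.gsa.gov"
    old = {"staging": {host: {"catalog_db_user": "ckan"}}}
    new = {"staging": {host: {"catalog_db_user": "ckan_admin"}}}
    print("\n".join(diff_manifests(old, new)))
```

Sorting keys before serializing matters here: without it, two identical inventories could produce a noisy diff just because of key ordering.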

What else can we do with this? We can use this in CI to enforce that any
changes to the inventory have been explicitly reviewed by a developer, similar
to how we review the Ansible vault locally when reviewing PRs.

How? We could hash the inventory manifest and commit that hash to git. In CI,
we can compute the inventory hash and compare it with the known hash. If they
match, all good. If they don't, it means something in the inventory has changed
and a human should manually review the inventory manifest to make sure the
change was intended. Assuming the change is correct, they can update the
inventory manifest hash.
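
A rough sketch of what that CI check could look like, assuming the manifest is available as a Python dict. Nothing like this exists in the PR yet; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import json


def manifest_hash(manifest):
    """Hash a canonical JSON form of the manifest so key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_manifest(manifest, known_hash):
    """Return True when the computed hash matches the hash committed to git."""
    return manifest_hash(manifest) == known_hash


if __name__ == "__main__":
    # Hypothetical manifest content, for illustration only.
    manifest = {"staging": {"example-host": {"catalog_db_user": "ckan"}}}
    known = manifest_hash(manifest)  # in CI this would be read from the repo
    print("inventory unchanged" if check_manifest(manifest, known) else "review needed")
```

In CI, a mismatch would fail the build until someone reviews the manifest diff and commits the new hash.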

Anyway, this is pretty early, but I found this useful as-is for refactoring
inventory variables so I wanted to share.

debug.yml playbook

This playbook can be used in local development to check the resolution for a
single variable for a specific host in a specific inventory.

$ ansible-playbook debug.yml -i inventories/staging -e debug_variable=catalog_db_user --limit catalogharvester1d.dev-ocsit.bsp.gsa.gov

PLAY [Debug] ******************************************************************

TASK [debug variable] *********************************************************
ok: [catalogharvester1d.dev-ocsit.bsp.gsa.gov -> localhost] => {
    "msg": "catalog_db_user=ckan"
}

PLAY RECAP ********************************************************************
catalogharvester1d.dev-ocsit.bsp.gsa.gov : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

This is a bit of a hack. All group names (besides `all`) have the same variable precedence. This means that if you define a variable `foo` in both catalog-admin and catalog-web, the last one wins. We can work around this by using group names that match the hierarchy when sorted alphabetically e.g. catalog-next -> catalog-next-web -> catalog-next-web-admin.
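
The naming trick can be checked with a small Python model. This is a sketch of the "merge alphabetically, last one wins" behaviour described above, not Ansible's actual implementation, and it leaves out other precedence inputs such as parent/child depth and ansible_group_priority. The values for `foo` and `bar` are hypothetical.

```python
def merge_same_level(group_vars):
    """Merge vars of same-precedence groups in alphabetical order; last one wins."""
    merged = {}
    for name in sorted(group_vars):
        merged.update(group_vars[name])
    return merged


if __name__ == "__main__":
    # Unrelated names: the winner is decided by spelling, not by intent.
    flat = {"catalog-web": {"foo": "web"}, "catalog-admin": {"foo": "admin"}}
    print(merge_same_level(flat))

    # Prefix-style names sort parent-before-child, so the most specific wins.
    tree = {
        "catalog-next-web-admin": {"foo": "admin"},
        "catalog-next": {"foo": "base", "bar": "base"},
        "catalog-next-web": {"foo": "web"},
    }
    print(sorted(tree))
    print(merge_same_level(tree))
```

Because a name that is a prefix of another always sorts first, the alphabetical order and the hierarchy line up as long as each child name extends its parent's name.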

This is a work in progress but has been helpful for analyzing changes to the Ansible inventory. This wraps git and ansible-inventory to generate a JSON representation of the Ansible inventory at two different commits and then diff them. What you get is a diff of any changes to group assignments and variables, making it much easier to spot when a variable is being inherited incorrectly.

Dynamic hosts is neat, but it's environment drift from production/staging, which are handled with static hosts. By moving to a static hosts file, we can use the same tools to lint and validate the sandbox inventory that we use with production/staging.

adborden · 2020-08-10T21:57:58Z

I think this is ready for review. We're still waiting on BSP to install the certificates, so I've set the catalog-next hosts to datagov_in_service: false which makes this PR merge-able. Once the TLS certs are installed, we can re-enable them in another PR.

jbrown-xentity

Major changes, but did a deep dive and it all looks good to me!

adborden · 2020-08-10T22:03:02Z

👍 yes, it's kinda big. I'm happy to walk through this with anyone else if they'd like.

adborden · 2020-08-11T20:53:34Z

ansible/group_vars/catalog-next/vars.yml

@@ -0,0 +1,32 @@
+---


Note to self: add a comment explaining that inventory catalog-next vars won't override these.

woodt

This looks quite reasonable and definitely an improvement.

adborden added 17 commits August 10, 2020 14:34

[catalog-next] add staging database

d316a13

[catalog-next] update playbooks for staging

4ea6827

Creates new "-next" ansible groups for catalog-next on staging and adds them to the playbook. This enables catalog-next in staging and ensures configuration from catalog classic doesn't get inherited accidentally.

[catalog-next] add catalog-next ansible group

28e0cff

Allows settings to be inherited to both harvester and web instances.

[catalog-next] consolidate settings in sandbox

851866b

[catalog-next] hoist up ckan plugin list

bbadaa5

[solr-next] enable solr-next hosts

587b248

[catalog-next] redis configuration

9086fde

[catalog-next] create new catalog-next group

d28d088

Create a group for both web and workers (harvesters) for catalog-next. This allows us to share common settings between them in a single file. Also adds a -a and -b web group to better assign variables for different web partitions.

[catalog-next] enable hosts

39157c9

[catalog] fix admin newrelic name

5aa983a

- Remove catalog-next admin host catalogpubweb1p which shouldn't be enabled yet.
- Set newrelic_app_name as a host variable for catalog-admin

[pycsw] remove pycsw v2 hosts

2e1d504

Originally the plan was to move pycsw to v2 with catalog, however with the catalog-next rollout, that might not work as planned. Removing v2 hosts for now and we'll come back to pycsw.

[catalog-next] rename harvesters -> workers

8a6de9b

[catalog-next] add production hosts

4e97c49

While not in service, we still need to track, configure, and patch the catalog-next hosts.

[catalog-next] add firewall policy

8d692ea

adborden force-pushed the feature/catalog-next-staging branch from 4d63a2b to d8d350f Compare August 10, 2020 21:38

adborden added 10 commits August 10, 2020 14:52

[catalog-next] add admin service url

0607e0d

[catalog-next] remove googleanalyticsbasic plugin

ca4f7c8

The plugin isn't installed via requirements.txt, so it shouldn't be enabled. Fixes catalog-next-web starting up in sandbox.

[catalog-next] set legacy variable names

e4aec11

These vars should get cleaned up at some point. Probably the right name is ckan_db_name.

Debug variable playbook

8e3a59d

[catalog-next] disable saml2 authentication

49e7b47

[catalog-next] disable user creation by default

2eba39e

[catalog-next] disable TLS to redis

8a95726

#1730

[catalog-next] set in-memory session store

87d2dae

Web instances have a read-only connection to the database and therefore cannot write sessions to the database. On the admin instance(s), we use the database store where sessions matter and we have a read/write connection.

[catalog-next] fix db admin username

dce3442

[catalog-next] temporarily disable staging

3cfb1bf

Waiting for BSP to install TLS certificates, but by disabling catalog-next, this code can ship.

adborden force-pushed the feature/catalog-next-staging branch from e60cf63 to 3cfb1bf Compare August 10, 2020 21:55

adborden marked this pull request as ready for review August 10, 2020 21:56

adborden requested review from a team and woodt August 10, 2020 21:56

jbrown-xentity approved these changes Aug 10, 2020

adborden commented Aug 11, 2020

woodt approved these changes Aug 12, 2020

jbrown-xentity mentioned this pull request Aug 12, 2020

Inventory-next for Sandbox #2020

Merged

mogul assigned adborden Aug 12, 2020

adborden merged commit 3f5fba1 into develop Aug 12, 2020

adborden deleted the feature/catalog-next-staging branch August 12, 2020 18:04

adborden mentioned this pull request Aug 12, 2020

Simplify the Ansible groups GSA/datagov-infrastructure-live#81

Closed

woodt mentioned this pull request Aug 18, 2020

update catalog.data.gov (catalog-next) README to reflect ansible changes GSA/catalog.data.gov#104

Closed
