Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[catalog-next] on staging #2004

Merged
merged 27 commits into from
Aug 12, 2020
Merged

[catalog-next] on staging #2004

merged 27 commits into from
Aug 12, 2020

Conversation

adborden
Copy link
Contributor

@adborden adborden commented Aug 8, 2020

GSA/datagov-ckan-multi#296

This makes a few significant changes.

  • Refactor groups for catalog-next
  • Sandbox now uses a static inventory
  • ansible-inventory-diff for comparing inventory changes
  • debug.yml playbook to check individual variable resolution

catalog-next Ansible groups

The biggest change is the refactoring of catalog-next groups for the Ansible
inventory. I was having a lot of trouble reasoning about the different roles
within catalog/catalog-next, so catalog-next groups are very different than
catalog classic. I left catalog classic as-is.

Groups now follow a tree structure within an app (catalog-next) and then have
specializations and sub-specializations. Variable interhitence works with each
child group overriding some variables from the parent. Ansible isn't really
meant to work this way, so this feels kind of like a hack, but at least it's
predictable and easier to reason about.

$ ansible-inventory -i inventories/staging --graph catalog-next
@catalog-next:
  |--@catalog-next-web:
  |  |--@catalog-next-web-a:
  |  |  |--catalogweb1d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-web-admin:
  |  |  |--catalogpubweb1d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-web-b:
  |  |  |--catalogweb2d.dev-ocsit.bsp.gsa.gov
  |--@catalog-next-worker:
  |  |--@catalog-next-worker-main:
  |  |  |--catalogharvester1d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-worker-misc:
  |  |  |--catalogharvester2d.dev-ocsit.bsp.gsa.gov
  |  |--@catalog-next-worker-qa:

Sandbox static inventory

Sandbox was using a dynamic
inventory

to fetch hosts from the AWS API at runtime. Since changes to Sandbox are made
via Terraform, in theory, Ansible can dynamically pick up those changes without
any change to the hosts file. However in practice, we often had to tweak groups
in sandbox/hosts to match Terraform and when a conflict arose, it was painful
to resolve it.

Additionally, you need access to the AWS API to use ansible-inventory, which
I found annoying, and contributes to environment drift with the BSP
environments (we should be able to use the same development tools in all
environments).

This PR removes the dynamic inventory and uses a static hosts file like the
other environments. If a new host is added via Terraform, or a hostname
changes, we'll have to update inventories/sandbox/hosts. We can now lint the
sandbox inventory, like staging/produciton and this also enables us to use
ansible-inventory without AWS credentials and the tools/tricks described
below.

ansible-inventory-diff tool

One gotcha with Ansible is that it's easy to change a variable in a way that
has a much broader effect than intended. I've often used ansible-inventory --host $host to manually inspect variables. However, what I really want is a
way to diff the ansible-inventory output.

That is what ansible-inventory-diff hopes to be. This is a work in progress
and there are probably a few bugs to work out. ansible-inventory-diff will
generate what I call an "inventory manifest" from the current working directory
then generate a second inventory manifest from another git commit and then
diff those inventory manifests. The inventory manifest is just a giant JSON
output of all of our environments' ansible-inventory output, including all
variables and vault secrets.

$ bin/ansible-inventory-diff origin/develop

You'll get a diff against origin/develop in your favorite difftool.

Screen Shot 2020-08-07 at 7 04 55 PM

What else can we do with this? We can use this in CI to enforce that any
changes to the inventory have been explicitly reviewed by a developer, similar
to how we review the Ansible vault locally when reviewing PRs.

How? We could hash the inventory manifest and commit that hash to git. In CI,
we can compute the inventory hash and compare it with the known hash. If they
match, all good. If they don't, it means something in the inventory has changed
and a human should manually review the inventory manifest to make sure the
change was intented. Assuming the change is correct, they can update the
inventory manifest hash.

Anyway, this is pretty early, but I found this useful as-is for refactoring
inventory variables so I wanted to share.

debug.yml playbook

This playbook can be used in local develepment to check the resolution for a
single variable for a specific host in a specific inventory.

$ ansible-playbook debug.yml -i inventories/staging -e debug_variable=catalog_db_user --limit catalogharvester1d.dev-ocsit.bsp.gsa.gov

PLAY [Debug] ******************************************************************

TASK [debug variable] *********************************************************
ok: [catalogharvester1d.dev-ocsit.bsp.gsa.gov -> localhost] => {
    "msg": "catalog_db_user=ckan"
}

PLAY RECAP ********************************************************************
catalogharvester1d.dev-ocsit.bsp.gsa.gov : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Creates new "-next" ansible groups for catalog-next on staging and adds them to
the playbook. This enables catalog-next in staging and ensures configuration
from catalog classic don't get inherited accidentally.
Allows settings to be inherited to both harvester and web instances.
Create a group for both web and workers (harvesters) for catalog-next. This
allows us to share common settings between them in a single file.

Also adds a -a and -b web group to better assign variables for different web
partitions.
This is a bit of a hack. All group names (besides `all`) have the same variable
precedence. This means that if you define a variable `foo` in both catalog-admin and
catalog-web, the last one wins. We can work around this by using group names
that match the hierarchy when sorted alphabetically e.g. catalog-next ->
catalog-next-web -> catalog-next-web-admin.
- Remove catalog-next admin host catalogpubweb1p which shouldn't be enabled yet.
- Set newrelic_app_name as a host variable for catalog-admin
Originally the plan was to move pycsw to v2 with catalog, however with the
catalog-next rollout, that might not work as planned. Removing v2 hosts for now
and we'll come back to pycsw.
This is a work in progress but has been helpful for analyzing changes to the
Ansible inventory. This wraps git and ansible-inventory to generate a JSON
representation of the Ansible inventory at two different commits and then diff
them.

What you get is a diff of any changes to group assignments and variables, making
it much easier to spot when a variable is being inherited incorrectly.
While not in service, we still need to track, configure, and patch the
catalog-next hosts.
Dynamic hosts is neat, but it's environment drift from production/staging which
are handled with static hosts. By moving to a static hosts, we can use the same
tools to lint and validate the sandbox inventory that we use with
production/staging.
@adborden adborden force-pushed the feature/catalog-next-staging branch from 4d63a2b to d8d350f Compare August 10, 2020 21:38
The plugin isn't installed via requirements.txt, so it shouldn't be enabled.
Fixes catalog-next-web starting up in sandbox.
These vars should get cleaned up at some point. Probably the right name is
ckan_db_name.
Web instances have a read-only connection to the database and therefore cannot
write sessions to the database.

On the admin instance(s), we use the database store where sessions matter and we
have a read/write connection.
Waiting for BSP to install TLS certificates, but by disabling catalog-next, this
code can ship.
@adborden adborden force-pushed the feature/catalog-next-staging branch from e60cf63 to 3cfb1bf Compare August 10, 2020 21:55
@adborden adborden marked this pull request as ready for review August 10, 2020 21:56
@adborden adborden requested review from a team and woodt August 10, 2020 21:56
@adborden
Copy link
Contributor Author

adborden commented Aug 10, 2020

I think this is ready for review. We're still waiting on BSP to install the certificates, so I've set the catalog-next hosts to datagov_in_service: false which makes this PR merge-able. Once the TLS certs are installed, we can re-enable them in another PR.

Copy link
Contributor

@jbrown-xentity jbrown-xentity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major changes, but did a deep dive and it all looks good to me!

@adborden
Copy link
Contributor Author

👍 yes, it's kinda big. I'm happy to walk through this with anyone else if they'd like.

@@ -0,0 +1,32 @@
---
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: add a comment about inventory catalog-next vars won't override these.

Copy link
Contributor

@woodt woodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite reasonable and definitely an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants