Add docs on how to add a new column to our Fastly logs and Athena #2222

Open
wants to merge 1 commit into
base: master
from

Member

issyl0 commented Jan 10, 2020

  • We did this recently and it went wrong, so let's document how to do it correctly for the next people!
@issyl0 issyl0 requested a review from kevindew Jan 10, 2020
@karlbaker02 karlbaker02 temporarily deployed to govuk-developer-docs-f-pr-2222 Jan 10, 2020 Inactive
@issyl0 issyl0 force-pushed the adding-a-new-column-to-fastly-logs branch from 58566b9 to 0ec1a80 Jan 10, 2020
@karlbaker02 karlbaker02 temporarily deployed to govuk-developer-docs-f-pr-2222 Jan 10, 2020 Inactive
Member

kevindew left a comment

Thanks for putting this together Issy 👍

A concern I have is that we're ending up with two different pages with information about CDN logs (this one and https://docs.publishing.service.gov.uk/manual/query-cdn-logs.html) where we should consolidate to one. I'd suggest we use this logging page as an introduction and link off to the more thorough page for the in-depth information.

These steps look good and accurate (though I have been through them with a fine-tooth comb). Where I think we could use some more information is the process around making deploys and verifying that changes work.

For example:

  • The first step is probably to put the work into Terraform, which both defines the column for AWS Glue/Athena and defines exactly what will change in the Fastly logs. The dev should raise a PR for this.
  • The next step should be to confirm that the proposed syntax for Fastly works, testing it in a non-production environment (I did most of my testing by sending logs to the AWS test environment). Do this by checking that Fastly writes the changed data to a bucket and that each line of the log file is a valid JSON record (since all hell breaks loose if we have a minor JSON error).
  • When we're happy with that, we're good to plan and apply the Terraform for the staging environment.
  • We should then make the changes to Fastly Staging and wait until the data comes through (this can take up to 30 minutes from the change being made). It may require running the AWS Glue Crawler for Athena to recognise it.
  • Once we're happy with that, proceed to production, where after the deploy we'll also check that the data is queryable.
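
As a rough illustration of the JSON check in the second step, a minimal Python sketch might look like the following. It assumes the log file has already been downloaded from the bucket and decompressed into one JSON record per line. The function name and the sample field names (`client_ip`, `status`) are hypothetical and not taken from the real log format.

```python
import json


def find_invalid_json_lines(lines):
    """Return (line_number, reason) for every line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines, e.g. a trailing newline at end of file
        try:
            record = json.loads(text)
        except json.JSONDecodeError as error:
            problems.append((number, error.msg))
            continue
        if not isinstance(record, dict):
            # Valid JSON, but not the key/value record a log line should be
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical sample: the second line has a stray trailing comma.
sample = [
    '{"client_ip": "192.0.2.1", "status": 200}',
    '{"client_ip": "192.0.2.2", "status": 200,}',
]
for number, reason in find_invalid_json_lines(sample):
    print(f"line {number}: {reason}")
```

Passing an open file object instead of a list works too, since the function only iterates over lines.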

Member Author

issyl0 commented Jan 13, 2020

Agreed. There's a load of duplication between these two pages - they both show different example queries, for example. I'll write up a card for consolidating them and put it in Platform Health's backlog. 😂

As for the rest of the changes here, I'll re-arrange the doc to suggest Terraform and non-Prod environments first. Thanks!

Member Author

issyl0 commented Jan 14, 2020

I'm almost there on re-writing these docs. What's the AWS Glue Crawler? I can't see any existing references to it.

EDIT: Found it and force-pushed the updates.

@issyl0 issyl0 force-pushed the adding-a-new-column-to-fastly-logs branch from 0ec1a80 to a0f7512 Jan 14, 2020
@karlbaker02 karlbaker02 temporarily deployed to govuk-developer-docs-f-pr-2222 Jan 14, 2020 Inactive
@issyl0 issyl0 force-pushed the adding-a-new-column-to-fastly-logs branch from a0f7512 to f22649a Jan 14, 2020
Co-authored-by: Bevan Loon <bevan.loon@digital.cabinet-office.gov.uk>

Member

kevindew commented Jan 14, 2020

I'm almost there on re-writing these docs. What's the AWS Glue Crawler? I can't see any existing references to it.

It's a job that runs regularly (I think every 4 hours). It checks whether the files in S3 match the schema defined for the table and identifies any new partitions. Based on that, it decides whether it needs to create new tables or delete old ones (or perform other operations that may cause scary results).

There's a bunch of docs on it: https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html, but you can learn a good amount about it by looking at the outcome of a run (https://eu-west-1.console.aws.amazon.com/glue/home?region=eu-west-1#catalog:tab=crawlers while assuming a GOV.UK role).

Member

kevindew left a comment

This is looking great Issy, thanks for working on this. I've added a few comments. Just a heads up that I'm on leave now until Monday, but I'd be happy for someone else to approve it.

Adding a new field to the CDN logs is a half manual, half automated
process and is [tracked as tech debt](https://trello.com/c/7pAvfM8R/167).

You should do this in the Staging environment first and wait until you see the


kevindew Jan 15, 2020

Member

It's also in integration, so we should probably suggest that approach first. Though it's worth noting that bouncer logs only exist in prod 😨.

you will manually put into the Fastly web UI:

1. Edit the [`infra-fastly-logs` Terraform](https://github.com/alphagov/govuk-aws/blob/master/terraform/projects/infra-fastly-logs/main.tf).
1. Find the `aws_glue_catalog_table` resource for the Fastly logs you want to add a column to (`govuk_www`, `govuk_assets` or `govuk_bouncer`).


kevindew Jan 15, 2020

Member

s/govuk_bouncer/bouncer/


Both these sets of steps must be done! Check the S3 bucket and
query Athena to see the added column and confirm that there's data for
it. It can take up to half an hour for Athena to recognise that there have


kevindew Jan 15, 2020

Member

I think you may have got this half an hour from me, from something different I was referring to. It can take up to half an hour for Fastly to send us a log file with the changed data (we've set it to send them to us every 15 minutes, but they often seem to arrive 15 minutes after the time they stopped logging).

For Athena to recognise the change to the table could potentially take up to 4 hours, as that is how often the crawler worker runs. I don't know off the top of my head, though, what happens at the point you run the Terraform to add the new column. It might be instantaneous, with the column found in any data that has it, or it might have to wait for the crawler to run, which would be within 4 hours.

Sorry for the vagueness; I've not actually added a new column myself, so I don't know what problems it may cause. The simplest thing may be to just suggest that someone runs the crawler worker once they know Fastly is sending the right data.
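
For reference, running the crawler on demand could be sketched in Python roughly as below. This assumes a boto3 Glue client created with credentials for a GOV.UK role; the crawler name in the usage comment is a placeholder, not the real one, and the polling interval and timeout are arbitrary choices. The helper name `run_crawler_and_wait` is hypothetical.

```python
import time


def run_crawler_and_wait(glue_client, crawler_name, poll_seconds=30, timeout_seconds=1800):
    """Start a Glue crawler and poll until it returns to READY.

    Returns True if the crawler finished within the timeout, False otherwise.
    """
    glue_client.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # get_crawler reports State as READY, RUNNING or STOPPING
        state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return True
        time.sleep(poll_seconds)
    return False


# Usage sketch (needs boto3 and AWS credentials; crawler name is a placeholder):
#   import boto3
#   glue = boto3.client("glue", region_name="eu-west-1")
#   finished = run_crawler_and_wait(glue, "example-fastly-logs-crawler")
```

Taking the client as a parameter keeps the helper easy to try out with a stub before pointing it at a real account.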
