Update Terraform configuration to deploy staging vw_pin_sale view #55

Open · jeancochrane opened this issue Oct 20, 2023 · 3 comments

@jeancochrane (Collaborator)

In working on #53, @wagnerlmichael discovered that the testing version of the script won't work as written, because it pulls from the production vw_pin_sale view, which itself pulls from the production sale.flag table instead of the staging version of that table. In order to support removing the joins on sale.flag from the script, we'll need to update our Terraform config to deploy a staging version of vw_pin_sale that pulls data from the staging sale.flag table.

@jeancochrane (Collaborator, Author)

A hiccup with this effort that I should have thought of sooner: Unlike sale.flag, which is generated by code in this repo, the vw_pin_sale view is actually created by code that lives in data-architecture. This means that if we wanted to deploy a staging version of it from the context of model-sales-val, we would need to either duplicate the view query in model-sales-val or somehow reference the version of it stored in the data-architecture repo, neither of which is a particularly appealing choice to me.

Two options off the top of my head:

  1. Revert this section of #54 and maintain a duplicative join between vw_pin_sale and sale.flag in the context of the Glue script
    • Pro: Dead simple and easy to maintain
    • Con: Less efficient and potentially more confusing query due to the duplication
  2. Update the deployment workflow and Terraform config so that on staging deployments we download the vw_pin_sale query from the main branch of data-architecture and use it to populate an aws_glue_catalog_table resource (sketched after this list)
    • Pro: Eliminates duplication in the Glue job query
    • Con: Fragile for a number of reasons:
      • Must be updated if the file structure of data-architecture ever changes
      • Makes local testing of the Terraform setup more complex
      • Still requires some duplication with data-architecture, since I confirmed experimentally that this Terraform resource requires that each column in the view be specified
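For concreteness, here's a rough sketch of what the option 2 resource might look like. The database name, file path, and column list below are all placeholders; in practice the view query would be downloaded from data-architecture in CI:

```hcl
# Sketch only: deploy a staging copy of vw_pin_sale as an Athena view.
resource "aws_glue_catalog_table" "vw_pin_sale_staging" {
  name          = "vw_pin_sale"
  database_name = "default_staging" # hypothetical staging database
  table_type    = "VIRTUAL_VIEW"

  # Athena stores view definitions as a base64-encoded JSON payload wrapped
  # in a comment, so the raw SQL can't be dropped in directly. The file
  # referenced here is a placeholder for the payload built from the
  # data-architecture query.
  view_original_text = "/* Presto View: ${base64encode(file("${path.module}/vw_pin_sale.json"))} */"
  view_expanded_text = "/* Presto View */"

  parameters = {
    presto_view = "true"
  }

  storage_descriptor {
    # This is the duplication called out above: every column of the view
    # must be enumerated here, mirroring the schema in data-architecture.
    columns {
      name = "pin"
      type = "string"
    }
    columns {
      name = "sale_price"
      type = "bigint"
    }
    # ...and so on for each remaining column of vw_pin_sale
  }
}
```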

Option 1 is my preference but I'm curious if @wagnerlmichael or @dfsnow have other ideas!

@wagnerlmichael (Member)

Between these two options I also like option 1. Reverting that section of #54 should give us the exact same behavior. If we add it back, testing is basically good to go, and I think the docs are already set up for that situation (copying data from buckets to mimic prod).

@jeancochrane (Collaborator, Author)

One thought that @dfsnow posted in Teams that's worth preserving:

I think a simpler solution might be:

  • All runs write directly to the prod table, but we add an additional column like type that is either "test" or "prod"
  • vw_pin_sale takes only "prod" flags, but we can query directly to see the result of test runs
  • Terraform still instantiates the Glue job, but only targets the prod assets with the changes outlined. All CI runs target type = "test" by default. We also remove the scheduling component of the job, if there is one
  • When testing is finished, we manually trigger the terraform workflow to instantiate and run the glue job, with an input variable that changes the type to "prod"
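For reference, a minimal Terraform sketch of that idea. The variable, job name, role, and script path are all hypothetical:

```hcl
# Sketch only: thread a test/prod switch through to the Glue job so that
# CI runs write "test" flags and a manually-triggered run writes "prod" flags.
variable "run_type" {
  type        = string
  default     = "test" # CI default; the manual prod workflow overrides this
  description = "Whether flags written by this run are test or prod flags."

  validation {
    condition     = contains(["test", "prod"], var.run_type)
    error_message = "run_type must be either \"test\" or \"prod\"."
  }
}

resource "aws_glue_job" "sales_val_flagging" {
  name     = "sales-val-flagging"
  role_arn = aws_iam_role.glue_job.arn # assumed to be defined elsewhere

  command {
    script_location = "s3://example-bucket/glue/flagging.py" # hypothetical path
  }

  # The script reads --run_type and stamps it onto every flag it writes,
  # so vw_pin_sale can filter to type = 'prod'. Note there is no
  # aws_glue_trigger here: per the plan above, the job is unscheduled.
  default_arguments = {
    "--run_type" = var.run_type
  }
}
```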

We discussed this out of band and decided that in the immediate term we're just going to add back in the join that was removed in #54 to make the existing deployment setup work again. Then in the medium term I'm going to think more about Glue job deployment architecture as a whole while I work on setting up a model deployment pipeline in ccao-data/model-res-avm#23, and I'll plan to keep an eye out for opportunities to align these two approaches to deploying a model. Then we'll revisit the design and hopefully have a plan to improve it when we tackle permit automation in ccao-data/data-architecture#158, which will require a similar type of scheduled data processing job.
