GCE Deploy #1627

zaneselvans · 2022-05-07T00:13:58Z

Set up automated nightly builds on GCP.

bendnorman · 2022-06-03T22:47:17Z

Great idea to make this a squash merge. A couple of thoughts and questions before I wrap this up:

@zaneselvans Should the container create the CEMS partitioned directory of parquet files and the single parquet file for the intake catalogs? If it needs to create both, do I need to run the CEMS pipeline twice?
The GitHub workflow is now a single job because GitHub does not let you pass env vars between jobs. The two previous jobs needed the same values for CHECKOUT_BRANCH, GCE_INSTANCE, and ACTION_SHA.
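
For readers unfamiliar with how steps inside one job can share values: GitHub Actions sets a GITHUB_ENV variable that points at a file, and NAME=value lines appended to that file become environment variables for later steps of the same job (not for other jobs). The sketch below illustrates that mechanism only. The helper name export_to_github_env and the example values are hypothetical, and nothing here is taken from the workflow in this PR.

```python
import os
import tempfile


def export_to_github_env(values, env_file=None):
    """Append NAME=value lines to the file that GitHub Actions reads between steps.

    `env_file` defaults to the path in $GITHUB_ENV, which the runner sets.
    Hypothetical helper for illustration; not part of the PUDL workflow.
    """
    path = env_file or os.environ["GITHUB_ENV"]
    with open(path, "a", encoding="utf-8") as f:
        for name, value in values.items():
            if "\n" in value:
                # Multi-line values need GitHub's delimiter syntax, not shown here.
                raise ValueError(f"{name}: multi-line values are not supported")
            f.write(f"{name}={value}\n")
    return path


if __name__ == "__main__":
    # Stand-in for the runner-provided file so the sketch also runs locally.
    with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
        demo_file = tmp.name
    export_to_github_env(
        {
            "CHECKOUT_BRANCH": "dev",            # example values only
            "GCE_INSTANCE": "example-instance",
            "ACTION_SHA": "0000000",
        },
        env_file=demo_file,
    )
    with open(demo_file, encoding="utf-8") as f:
        print(f.read(), end="")
    os.remove(demo_file)
```

Computing the three values once and exporting them this way keeps every later step in the job reading the same CHECKOUT_BRANCH, GCE_INSTANCE, and ACTION_SHA.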

.github/workflows/tox-pytest.yml

.github/workflows/build-deploy-pudl.yml

zaneselvans · 2022-06-06T19:29:37Z

On the partitioned vs. monolithic CEMS issue, the pudl-catalog currently expects both to exist, but I don't think that's the long-term plan. We want a single version which is fast and space efficient, which I think will be the partitioned version, but we need to play around with how the metadata is stored to make that work. Right now I'm using the monolithic version in the examples because querying it is faster, and the caching mechanism downloads every file from the partitioned version whenever anything is queried because it has to look at the metadata inside each file to know where the right data is.

So either I should disable the partitioned version of the data in the catalog for the moment, or we should generate both outputs.

The epacems_to_parquet script should be able to generate the partitioned output with the --partition option after the normal ETL has run.

bendnorman · 2022-06-06T20:11:33Z

Ok, I'll just run epacems_to_parquet --partition after the full ETL has run, so that we have both the partitioned and monolithic versions.
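
The ordering agreed on above can be sketched as a small driver that runs the full ETL first and the partitioned CEMS step second. This is only an illustration: the pudl_etl command name, the settings file name, and both helper functions are assumptions, and any further arguments the real scripts require are left out. Only epacems_to_parquet --partition comes from this thread.

```python
import shlex
import subprocess


def build_nightly_commands(etl_settings):
    """Return the two commands in the order discussed in the thread.

    `pudl_etl` and the settings path are assumptions for illustration;
    `epacems_to_parquet --partition` is the invocation named above.
    """
    return [
        ["pudl_etl", etl_settings],             # full ETL runs first
        ["epacems_to_parquet", "--partition"],  # then the partitioned CEMS output
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is executed when trying the sketch.
    run_all(build_nightly_commands("etl_full.yml"))
```

Using check=True means a failed ETL stops the sequence before the partitioned step starts.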

bendnorman · 2022-06-14T00:23:33Z

Changes:

Create a Docker image that installs PUDL and runs the ETL.
Create a GitHub Action that builds the PUDL image, pushes it to Docker Hub, runs the ETL on a GCP VM, and copies the outputs to the intake GCS buckets. The action is triggered on a schedule and on tags.
Add a --loglevel arg to the package entrypoint commands.
Add GoogleCloudStorageCache support to ferc1_to_sqlite and censusdp1tract_to_sqlite commands and pytest.
Allow users to create monolithic and partitioned EPA CEMS outputs without having to clobber or move any existing CEMS outputs.
Add requester pays support to GoogleCloudStorageCache.
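
To make the --loglevel item in the list above concrete, here is a minimal argparse sketch of such an option. It is not PUDL's implementation: the function names, the logger name "example", the default level, and the accepted choices are all assumptions made for illustration.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse a hypothetical entrypoint's arguments, including --loglevel."""
    parser = argparse.ArgumentParser(description="Hypothetical entrypoint.")
    parser.add_argument(
        "--loglevel",
        default="INFO",
        type=str.upper,  # accept lower-case input such as "debug"
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        help="Logging verbosity (default: INFO).",
    )
    return parser.parse_args(argv)


def configure_logging(loglevel):
    """Set the level on an example logger; setLevel accepts level names."""
    logger = logging.getLogger("example")
    logger.setLevel(loglevel)
    return logger


if __name__ == "__main__":
    args = parse_args(["--loglevel", "debug"])
    logger = configure_logging(args.loglevel)
    print(args.loglevel, logger.level == logging.DEBUG)
```

Restricting the value with choices makes a mistyped level fail at argument parsing rather than partway through a long ETL run.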

bendnorman · 2022-06-14T22:52:33Z

Ok! The YAML formatting has been sorted out and the unit and CI tests have been re-enabled. I think this is good to merge in.

zaneselvans

Can you add a page to the Development section of the documentation that explains how the nightly builds and data deployment work in general, and what the moving parts are, so we can all have a shared understanding of it?
It would also be good to get your bulletized summary of the code changes into the release notes.
How are local_pudl_etl.sh and gcp_pudl_etl.sh different? When do we use the local one?
In some places we're using the abbreviation GCE (GCE_INSTANCE), elsewhere it's GCP (gcp_pudl_etl.sh), and else-elsewhere it's GCLOUD (GCLOUD_BILLING_PROJECT), and it's not always clear to me why it's one and not the other. I think it'll be easier to remember these names without looking them up if we make them as consistent as possible. Or maybe I just don't understand what's differentiating them?
github.ref_name / $GITHUB_REF will be whatever the branch or tag is, right? But the only case in which it runs on a branch is for dev, because of the on: push trigger.
We wanted to condense all these commits into a single squash-merge, right? That still seems like a good idea to me.
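
On the github.ref_name / $GITHUB_REF question above: $GITHUB_REF holds the fully qualified ref (refs/heads/<branch> or refs/tags/<tag>), while github.ref_name is the short branch or tag name. The sketch below mimics that mapping for branches and tags only. The function names and the tag used in the example are hypothetical.

```python
def ref_name(github_ref):
    """Return the short name for a branch or tag ref, as github.ref_name would."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if github_ref.startswith(prefix):
            return github_ref[len(prefix):]
    # Other ref types (for example pull request refs) are not handled here.
    return github_ref


def ref_type(github_ref):
    """Classify a full ref as 'branch', 'tag', or 'other'."""
    if github_ref.startswith("refs/heads/"):
        return "branch"
    if github_ref.startswith("refs/tags/"):
        return "tag"
    return "other"


if __name__ == "__main__":
    for ref in ("refs/heads/dev", "refs/tags/v0.0.0-example"):
        print(ref, ref_type(ref), ref_name(ref))
```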

.github/workflows/build-deploy-pudl.yml

docker/local_pudl_etl.sh

zaneselvans · 2022-06-21T20:20:40Z

Looks good to me. Weird that you apparently can't transfer ownership of a PR to someone else. Since I created the initial PR I can't "approve".

bendnorman · 2022-06-21T20:22:11Z

Hmm weird. I'll squash and merge it in.

zaneselvans added 30 commits May 1, 2022 20:47

Enable docker build with Hello World container.

f043c52

Fix typo in pip install

388071b

Add Dockerfile linting and debugging output

573fdb5

Fix Dockerfile for running automatically

581b1e7

Use docker actions instead of crazy-max

f1f4300

Simplify docker build push workflow

f665ae6

Use catalystcoop/pudl-etl repo; add diagnostic output.

7931eb7

Explicitly copy .git into the docker container

589a0f0

Are trailing slashes meaningful in docker COPY commands?

c18c9a5

Set mount type before setting source/target in RUN

00b5c8d

Temporarily disable tox-pytest and list runner build context

6fcb54f

Go back to using git checkout

be5e76b

disable docker buildx

ef7a0bc

Switch back to dumb explicit copying of .git into container

83e2187

Change COPY command

a555464

Explicitly set context to be the checked-out source code

acbf5bd

Re-enable buildx and fix tox-pytest workflow syntax.

940112e

Bind mount .git temporarily rather than copying into image.

82ccc45

Use Docker metadata to generate tags and labels

b8e04cb

Make docker metadata IDs match

0fdaba6

Run on all push and pull_request. Remove extraneous SCM fetch

8369f50

Try switching to mamba env create

a2af83c

Try creating with mamba, updating with mamba env

c56e28c

Remove --yes which doesn't work on mamba env upcate

3ab9c5d

Make docker-metadata flavor and tag behavior explicit.

98fb9ee

Turn off tox-pytest for dev branch as well to avoid spurious builds.

057139d

Initial docker-compose.yml pointing to local data

9041c84

Add bind mounts to local data directories

5142cdb

Change docker-compose to run CI using tox, use image instead of build.

1c08473

Remove now obsolete Dockerfiles and other old dockerization stuff

da854df

bendnorman added 5 commits June 3, 2022 10:39

Echo ACTION_SHA

7b6a072

Echo ACTION_SHA

1b62cf3

Print action vars

cd71b88

Separate echo commands

1f81dda

Combine into one job to share env vars

c0399bc

Set loglevel to DEBUG

c5486c1

zaneselvans commented Jun 6, 2022

.github/workflows/tox-pytest.yml (outdated)

zaneselvans commented Jun 6, 2022

.github/workflows/build-deploy-pudl.yml

bendnorman added 5 commits June 6, 2022 15:28

Add partitioned epacems run to full etl

5bdf0f6

Add clobber option for partitioned and monolith epacems

4e2801b

Add start up script to remove old images

882df08

Merge dev into gce-deploy

6a04732

Reactivate pytest CI

a1d5057

Merge dev into gce-deploy

8dbf0d6

bendnorman marked this pull request as ready for review June 14, 2022 22:51

Start instance after update image

9340576

zaneselvans commented Jun 15, 2022

.github/workflows/build-deploy-pudl.yml (outdated)

.github/workflows/build-deploy-pudl.yml (outdated)

docker/local_pudl_etl.sh

bendnorman added 4 commits June 16, 2022 15:54

Release notes, GCP variable clean up

c134230

Add nightly data build documentation

f1de5c0

Add nightly builds docs to index

5c74cc3

Update nightly build docs

ad8a1be

bendnorman merged commit b8fb80e into dev Jun 21, 2022

zaneselvans deleted the gce-deploy branch October 26, 2022 14:58
