Set up reproducible Python environments with conda-lock (#2968)
* Add a conda-lock setup for discussion.
* Move python-snappy into project.dependencies in pyproject.toml
* Remove sphinx-autoapi from pypi deps, and no longer required snappy-python
* Switch to using conda-forge version of recordlinkage v0.16
* Update conda-lock.yml now that all dependencies are available on conda-forge
* Consolidate conda env files under environments/ dir
* Add a GitHub action to relock dependencies
* Quote the pip install command
* Remove pip install of pudl from environment.yml
* Rename workflow
* Only build lockfile from pyproject.toml, don't install extras.
* Just install conda-lock, not pudl, before running conda-lock.
* Install conda-lock with pip
* Move all remaining dev-environment.yml deps to pyproject.toml
* Add other platforms; make draft PR against dev.
* Comment out dev base branch for now.
* Remove pandas extras and recordlinkage deps from pyproject.toml
* Use conda-lock --micromamba rather than --mamba
* Don't specify grpcio, or specific recordlinkage version
* Render platform-specific environment files in github action
* Fix paths relative to environments directory
* Add some comment notes to workflow
* Render environment for Read The Docs.
* Use environment not explicit rendered lockfile
* Add readthedocs specific sphinx extension
* Don't render explicit conda env for RTD since it can't read it.
* Build linux-aarch64 lockfile. Use conda-lock.yml in workflows.
* Comment out non-working linux-aarch64 platform for now.
* Switch to using rendered lockfiles.
* Remove deprecated environment files
* Switch to using a micromamba docker image
* Install git into the docker image.
* Use micromamba and unrendered multi-platform lockfile.
* Add main category to micromamba environment creation.
* Use conda-lock not base as env name
* Add a conda-lock setup for discussion.
* Move python-snappy into project.dependencies in pyproject.toml
* Remove sphinx-autoapi from pypi deps, and no longer required snappy-python
* Add linux-aarch64 platform back into conda-lock settings.
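
The locking steps listed above might look roughly like this on the command line. This is a minimal sketch, not the project's actual Makefile: the `run` dry-run wrapper and the function names are hypothetical, the paths follow the files named in this commit, and the exact set of flags the project passes to conda-lock is an assumption.

```shell
# Hypothetical dry-run wrapper: prints each command by default.
# Set DRY_RUN=0 to actually execute (requires conda-lock and micromamba).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Solve the dependencies declared in pyproject.toml into a single
# multi-platform lockfile, using micromamba as the solver.
relock() {
    run conda-lock lock --micromamba \
        --file pyproject.toml \
        --lockfile environments/conda-lock.yml
}

# Render per-platform conda environment files from the unified lockfile.
render_envs() {
    run conda-lock render --kind env environments/conda-lock.yml
}

# Create the development environment directly from the lockfile.
create_env() {
    run micromamba create --name pudl-dev --file environments/conda-lock.yml
}

relock
render_envs
create_env
```

With `DRY_RUN` unset the script only echoes the three commands, which makes it safe to inspect before running anything against a real checkout.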

- Remove deprecated API_KEY_EIA envvar
- Add .gitignore in new environments directory
- Create Makefile and migrate tox.ini logic into it
- Replace spaces in Makefile with tabs
- Install pudl via pip install --no-deps to avoid contaminating
  the locked environment
- Move pip install and coverage XML logic from GHA workflow into
  the Makefile
- Increase the minimum versions of some dependencies.
- Move update-lockfile GHA workflow logic into Makefile
- Attempt to run slow tests in parallel using "wait" to prevent
  premature exit of the shell
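
The "wait" approach mentioned in the last item can be sketched as follows. This is a generic illustration under stated assumptions, not the code from the project's Makefile: `run_parallel` is a hypothetical helper, and the commands passed to it are placeholders.

```shell
# Hypothetical helper: start every command in the background, then wait on
# each PID so the shell does not exit before the jobs finish. Returns 1 if
# any job failed, 0 otherwise. Written for POSIX sh, so it uses plain
# (global) variables instead of arrays.
run_parallel() {
    pids=""
    status=0
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    for pid in $pids; do
        # "wait PID" returns that job's exit status.
        wait "$pid" || status=1
    done
    return "$status"
}

# Placeholder commands standing in for slow test targets.
run_parallel "sleep 0.1" "true" && echo "all jobs succeeded"
```

Waiting on each PID individually, rather than calling a bare `wait`, is what lets the caller see whether any one job failed.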

* Use micromamba not conda in Dockerfile CMD, also use pip install --no-deps
* Use micromamba not conda in command passed to build container
* Use default mambauser rather than catalyst in docker container
* Remove --no-capture-output which isn't supported by micromamba. Is this a problem?
* Remove uninterpolated vars in .env and more --no-capture-output
* Separate ETL and pytest commands.
* Stop trying to run tests in parallel. Sigh.
* Add google cloud sdk to conda environment.
* Install Google Cloud SDK from conda-forge.
* Add back in the making of required directories. Oops.
* Attempt to have micromamba run pass through output
* Use prettier to standardize formatting of lockfiles.
* Add dagster (server startup) target to Makefile
* Update conda lockfile and rerender environment files
* Attempt to trigger update-lockfile when pyproject.toml is changed.
* Remove non-required upper bounds on dependency versions.
* Use correct branch name for update-lockfile trigger.
* Fix incorrect nbformat minimum version
* Update Makefile in response to PR feedback

* Remove dagster-postgres version to avoid PyPI conda-forge conflict.
* Break up and distribute the nuke target.
* Resolve issues with pandera dependency. Reorganize makefile.
* Add triggers and commit/PR for workflow_dispatch, pull_request, schedule
  and set appropriate GITHUB_REF values for each case.
* Use push instead of pull_request to trigger on path. This avoids re-locking
  the dependencies every single time you push to a PR that had a change
  to pyproject.toml *somewhere* in it.
* Also trigger based on path if .github/workflows/update-lockfile.yml changes.
* Update conda-lock.yml and rendered conda environment files.

* Move previous dev, test, and datasette optional dependencies into
  the required dependencies to simplify application installation.
* Test make nuke; parallelize --live-dbs tests
* Move prettier into conda-only dependencies
* Update conda-lock.yml and rendered conda environment files.
* Remove action test file hashlog
* Remove merge markers.
* Remove transitive astroid dependency that's now correctly included in solve.
* Use the real immature library version of dagster-postgres (0.21.6)
  rather than the accidentally packaged 1.5.6 version found in conda.
  We'll need to keep an eye out for when dagster-postgres graduates
  to the stable versioning and update it. This is a bit of a mess
  because of some broken automation in the conda packaging for dagster
  which has now been fixed.
* Update "make pudl" to remove the old PUDL DB and reinitialize with
  alembic, rather than writing to the DB that already exists.
* Fixed some groupby.agg() deprecation warnings.
* Fix dagster-postgres version (again).
* Update username in path to settings file
* Avoid bugs in ferc_to_sqlite --clobber; don't use cache_dir for pip install.
* Make FERC extraction output removal more specific.
* Bump numpy and numba minimum versions.
* Bump black version in pre-commit
* Bump ruff pre-commit hook version

* Rename tox-pytest and update-lockfile workflows.
* Make scheduled PRs against dev rather than conda-lockfile branch.
* Update to pyarrow 14 and grpcio 1.59 b/c security
* Update release notes
* Add CI targets in Makefile. Docs cleanup.
* Update conda-lock.yml and rendered conda environment files.

---------

Co-authored-by: zaneselvans <zaneselvans@users.noreply.github.com>
zaneselvans and zaneselvans committed Nov 15, 2023
1 parent b783809 commit c8a0560
Showing 42 changed files with 26,329 additions and 907 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/bot-auto-merge.yml
@@ -4,7 +4,7 @@ name: bot-auto-merge
on:
workflow_run:
types: [completed]
workflows: ["tox-pytest"]
workflows: ["pytest"]

jobs:
bot-auto-merge:
11 changes: 6 additions & 5 deletions .github/workflows/build-deploy-pudl.yml
@@ -94,11 +94,12 @@ jobs:
gcloud compute instances update-container "$GCE_INSTANCE" \
--zone "$GCE_INSTANCE_ZONE" \
--container-image "docker.io/catalystcoop/pudl-etl:${{ env.GITHUB_REF }}" \
--container-command "conda" \
--container-command "micromamba" \
--container-arg="run" \
--container-arg="--no-capture-output" \
--container-arg="-p" \
--container-arg="/home/catalyst/env" \
--container-arg="--prefix" \
--container-arg="/home/mambauser/env" \
--container-arg="--attach" \
--container-arg='' \
--container-arg="bash" \
--container-arg="./docker/gcp_pudl_etl.sh" \
--container-env-file="./docker/.env" \
@@ -116,8 +117,8 @@ jobs:
--container-env DAGSTER_PG_PASSWORD="$DAGSTER_PG_PASSWORD" \
--container-env DAGSTER_PG_HOST="104.154.182.24" \
--container-env DAGSTER_PG_DB="dagster-storage" \
--container-env PUDL_SETTINGS_YML="/home/catalyst/src/pudl/package_data/settings/etl_full.yml" \
--container-env FLY_ACCESS_TOKEN=${{ secrets.FLY_ACCESS_TOKEN }} \
--container-env PUDL_SETTINGS_YML="/home/mambauser/src/pudl/package_data/settings/etl_full.yml" \
# Start the VM
- name: Start the deploy-pudl-vm
59 changes: 23 additions & 36 deletions .github/workflows/tox-pytest.yml → .github/workflows/pytest.yml
@@ -1,5 +1,5 @@
---
name: tox-pytest
name: pytest

on:
pull_request:
@@ -10,19 +10,15 @@ on:
- ready_for_review

env:
PUDL_OUTPUT: /home/runner/pudl-work/output
PUDL_INPUT: /home/runner/pudl-work/data/
PUDL_OUTPUT: /home/runner/pudl-work/output/
PUDL_INPUT: /home/runner/pudl-work/input/
DAGSTER_HOME: /home/runner/pudl-work/dagster_home/

jobs:
ci-static:
ci-docs:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
tox-env:
- linters
- docs
defaults:
run:
shell: bash -l {0}
@@ -32,16 +28,12 @@ jobs:
with:
fetch-depth: 2

- name: Install Conda environment using mamba
- name: Install conda-lock environment with micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict

- name: Log environment details
run: |
@@ -51,13 +43,13 @@ jobs:
conda config --show
printenv | sort
- name: Build ${{ matrix.tox-env}} with Tox
- name: Lint and build PUDL documentation with Sphinx
run: |
tox -e ${{ matrix.tox-env }}
pip install --no-deps --editable .
make docs-build
- name: Upload coverage
uses: actions/upload-artifact@v3
if: ${{ matrix.tox-env == 'docs' }}
with:
name: coverage-docs
path: coverage.xml
@@ -75,16 +67,12 @@ jobs:
with:
fetch-depth: 2

- name: Install Conda environment using mamba
- name: Install conda-lock environment with micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict

- name: Log environment details
run: |
@@ -99,9 +87,10 @@ jobs:
which sqlite3
sqlite3 --version
- name: Run unit tests with Tox
- name: Run PUDL unit tests and collect test coverage
run: |
tox -e unit -- --durations 0
pip install --no-deps --editable .
make pytest-unit
- name: Upload coverage
uses: actions/upload-artifact@v3
@@ -131,13 +120,9 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict

- name: Log environment details
run: |
@@ -180,7 +165,8 @@ jobs:

- name: Run integration tests, trying to use GCS cache if possible
run: |
tox -e integration -- --gcs-cache-path=gs://zenodo-cache.catalyst.coop --durations 0
pip install --no-deps --editable .
make pytest-integration
- name: Upload coverage
uses: actions/upload-artifact@v3
@@ -194,13 +180,13 @@ jobs:
ci-coverage:
runs-on: ubuntu-latest
needs:
- ci-docs
- ci-unit
- ci-integration
- ci-static
steps:
- uses: actions/checkout@v4
- name: Download coverage
id: download-unit
id: download-coverage
uses: actions/download-artifact@v3
with:
path: coverage
@@ -216,6 +202,7 @@ jobs:
runs-on: ubuntu-latest
if: ${{ always() }}
needs:
- ci-docs
- ci-unit
- ci-integration
steps:
84 changes: 84 additions & 0 deletions .github/workflows/update-conda-lockfile.yml
@@ -0,0 +1,84 @@
---
name: update-conda-lockfile

on:
workflow_dispatch:
schedule:
- cron: "0 9 * * 1-5" # Weekdays at 9AM UTC
push:
paths:
- "pyproject.toml"
- "environments/*"
- ".github/workflows/update-conda-lockfile.yml"

# What branch does this action run on?
# - workflow_dispatch: Whatever branch it was run against.
# - schedule: Always the same branch (will be dev or main)
# - push: Base branch of the PR.

jobs:
update-conda-lockfile:
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.actor != 'pudlbot') || (github.event_name == 'schedule' && github.repository == 'catalyst-cooperative/pudl') || (github.event_name == 'workflow_dispatch') }}
defaults:
run:
shell: bash -l {0}
steps:
- name: Set GITHUB_REF for use with workflow_dispatch
if: ${{ (github.event_name == 'workflow_dispatch') }}
run: |
echo "GITHUB_REF="${{ github.ref_name }} >> $GITHUB_ENV
- name: Set GITHUB_REF for use with schedule
if: ${{ (github.event_name == 'schedule') }}
run: |
echo "GITHUB_REF=dev" >> $GITHUB_ENV
- name: Set GITHUB_REF for use with push
if: ${{ (github.event_name == 'push') }}
run: |
echo "GITHUB_REF="${{ github.ref_name }} >> $GITHUB_ENV
- name: Log final value of GITHUB_REF
run: |
echo "Final GITHUB_REF:" ${{ env.GITHUB_REF }}
- uses: actions/checkout@v4
with:
token: ${{ secrets.PUDL_BOT_PAT }}
ref: ${{ env.GITHUB_REF }}
- name: Install Micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-name: conda-lock
create-args: >-
python=3.11
conda-lock
prettier
- name: Run conda-lock to recreate lockfile from scratch
run: |
make conda-clean
make conda-lock.yml
- name: Commit updated conda lockfiles to branch
# If running on push due to dependency changes, commit directly to the base
# branch of the existing PR. Don't trigger the workflow again if we're already
# running it as pudlbot (to avoid infinite recursion).
if: ${{ (github.event_name == 'push' && github.actor != 'pudlbot') }}
uses: stefanzweifel/git-auto-commit-action@v5
with:
file_pattern: "environments/*"
commit_message: "Update conda-lock.yml and rendered conda environment files."
- name: Make a PR to merge updated conda lockfiles
# If we are relocking dependencies on a schedule or workflow_dispatch, we need
# to make our own PR to check whether the updated environment actually solves
# and the tests pass.
if: ${{ (github.event_name == 'schedule' && github.repository == 'catalyst-cooperative/pudl') || (github.event_name == 'workflow_dispatch') }}
uses: peter-evans/create-pull-request@v5
with:
commit-message: "Update conda-lock.yml and rendered conda environment files."
title: Update Lockfile
body: >
This pull request relocks the dependencies with conda-lock.
It is triggered by [update-conda-lockfile](https://github.com/catalyst-cooperative/pudl/blob/main/.github/workflows/update-conda-lockfile.yml).
labels: dependencies, conda-lock
reviewers: zaneselvans
branch: update-conda-lockfile
base: ${{ env.GITHUB_REF }}
draft: true
delete-branch: true
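
The branch-selection and job-gating rules in this workflow can be restated as two small shell functions, which makes them easy to check without running the action. This is a sketch of the logic only: `resolve_ref` and `should_run` are hypothetical names, and their arguments stand in for `github.event_name`, `github.ref_name`, `github.actor`, and `github.repository`.

```shell
# Which branch the workflow checks out, given the triggering event and the
# ref name GitHub reports. Scheduled runs always target dev.
resolve_ref() {
    event_name="$1"
    ref_name="$2"
    case "$event_name" in
        schedule) echo "dev" ;;
        workflow_dispatch | push) echo "$ref_name" ;;
        *) echo "unsupported event: $event_name" >&2; return 1 ;;
    esac
}

# Mirrors the job-level "if" condition: skip pushes made by pudlbot (to avoid
# re-triggering itself) and skip scheduled runs on forks.
should_run() {
    event_name="$1"
    actor="$2"
    repository="$3"
    case "$event_name" in
        push) [ "$actor" != "pudlbot" ] ;;
        schedule) [ "$repository" = "catalyst-cooperative/pudl" ] ;;
        workflow_dispatch) return 0 ;;
        *) return 1 ;;
    esac
}

resolve_ref schedule my-branch   # prints "dev"
```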
57 changes: 0 additions & 57 deletions .github/workflows/update-lockfile.yml

This file was deleted.

11 changes: 4 additions & 7 deletions .github/workflows/zenodo-cache-sync.yml
@@ -14,7 +14,7 @@ env:
PUBLIC_ZENODO_CACHE_BUCKET: gs://zenodo-cache.catalyst.coop
GITHUB_REF: ${{ github.ref_name }} # This is changed to dev if running on a schedule
PUDL_OUTPUT: ~/pudl-work/output
PUDL_INPUT: ~/pudl-work/data/
PUDL_INPUT: ~/pudl-work/input/

jobs:
zenodo-cache-sync:
@@ -47,13 +47,10 @@ jobs:
- name: Install Conda environment using mamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: test/test-environment.yml
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true
condarc: |
channels:
- conda-forge
- defaults
channel_priority: strict
create-args: --category main dev docs test datasette

- name: Log environment details
run: |
4 changes: 2 additions & 2 deletions .gitignore
@@ -23,8 +23,8 @@ codecov.sh
.env_pudl/
*wheel-metadata
dask-worker-space*
devtools/user-requirements.txt
devtools/user-environment.yml
environments/user-requirements.txt
environments/user-environment.yml
.vscode/*
commit.txt
devtools/profiles/
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-added-large-files # Don't accidentally commit giant files.
# - id: check-added-large-files # Don't accidentally commit giant files.
- id: check-merge-conflict # Watch for lingering merge markers.
- id: check-yaml # Validate all YAML files.
- id: check-case-conflict # Avoid case sensitivity in file names.
6 changes: 2 additions & 4 deletions .readthedocs.yaml
@@ -10,11 +10,11 @@ version: 2
build:
os: ubuntu-22.04
tools:
python: mambaforge-4.10
python: mambaforge-22.9

# Define the python environment using conda / mamba
conda:
environment: docs/docs-environment.yml
environment: environments/conda-linux-64.lock.yml

# Build documentation in the docs/ directory with Sphinx
sphinx:
@@ -27,5 +27,3 @@ python:
install:
- method: pip
path: .
extra_requirements:
- doc
