Update PUDL to use Python 3.12 #3413

Merged
merged 3 commits into from
Mar 24, 2024

zaneselvans
Member

@zaneselvans zaneselvans commented Feb 20, 2024

Overview

Switch PUDL over to using Python 3.12, which has been out since October 2nd, 2023. All of our current dependencies now support Python 3.12.

However, I note that the conda-forge package python-duckdb has a Python 3.12 build issue. I think the issue has been resolved for Linux and macOS, so I've poked the maintainer. While duckdb isn't a current PUDL dependency, it will become one as soon as @katie-lamb's Splink PR #3302 is merged, so we should probably hold off on merging this PR until the python-duckdb package is updated on conda-forge.

Closes #3327

Testing

To-do list

@zaneselvans zaneselvans added the "dependencies" label (Pull requests that update a dependency file) Feb 20, 2024
pyproject.toml Outdated
@@ -45,7 +45,7 @@ dependencies = [
 "nbconvert>=7",
 "nbformat>=5.9",
 "networkx>=3.2",
-"numba>=0.58", # pandas[performance]
+"numba>=0.59", # pandas[performance]
Member Author

0.59 is the first numba version that works with Python 3.12

pyproject.toml Outdated
Comment on lines 25 to 26
"dagster>=1.6.5",
"dagster-postgres>=0.22.5,<1", # Update when dagster-postgres graduates to 1.x
Member Author

I think Dagster 1.6.5 (dagster-postgres 0.22.5) is the first Dagster release that works with Python 3.12

pyproject.toml Outdated
@@ -327,12 +327,12 @@ channel-priority = "strict"
 name = "pudl-dev"

 [tool.conda-lock.dependencies]
-google-cloud-sdk = ">=452"
+google-cloud-sdk = ">=464"
Member Author

464 is the first version of the google-cloud-sdk that works with Python 3.12 (thanks to my prodding...)
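
The minimum versions called out in these review comments (numba 0.59, dagster 1.6.5, dagster-postgres 0.22.5, google-cloud-sdk 464) could be collected into a quick sanity check. The sketch below is not part of this PR: the helper names are hypothetical, and it only handles plain dotted numeric versions, so a real check would more likely lean on a proper version parser.

```python
# Minimum versions reported in this PR as the first to work with Python 3.12.
PY312_MINIMUMS = {
    "numba": "0.59",
    "dagster": "1.6.5",
    "dagster-postgres": "0.22.5",
    "google-cloud-sdk": "464",
}


def parse_version(version: str) -> tuple:
    """Turn a plain dotted version such as '1.6.5' into (1, 6, 5)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if ``installed`` is at least ``minimum``."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '0.59' and '0.59.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def too_old(installed: dict) -> list:
    """List the packages whose versions fall below the Python 3.12 minimums."""
    return sorted(
        name
        for name, version in installed.items()
        if name in PY312_MINIMUMS
        and not meets_minimum(version, PY312_MINIMUMS[name])
    )


if __name__ == "__main__":
    # Hypothetical environment using the pre-PR lower bounds for two packages.
    print(too_old({"numba": "0.58.1", "dagster": "1.6.5", "google-cloud-sdk": "452"}))
```

Passing the installed versions in as a dictionary keeps the check independent of how the environment is managed, which matters here because google-cloud-sdk comes from conda rather than pip.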

@zaneselvans
Member Author

@katie-lamb & @zschira I did a full ETL using Python 3.12 locally and it almost completed. The only error that came up was in out_pudl__yearly_assn_eia_ferc1_plant_parts.ferc_to_eia.get_model_predictions during some splinking. Error below. Any idea what might be up?

IndexError: list index out of range
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_utils/__init__.py", line 463, in iterate_with_context
    next_output = next(iterator)
                  ^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/execution/plan/compute_generator.py", line 131, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
             ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/execution/plan/compute_generator.py", line 125, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
                                                                    ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/src/pudl/analysis/record_linkage/eia_ferc1_record_linkage.py", line 236, in get_model_predictions
    preds_df = linker.predict(threshold_match_probability=threshold_prob)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/linker.py", line 1766, in predict
    sqls = predict_from_comparison_vectors_sqls(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/predict.py", line 21, in predict_from_comparison_vectors_sqls
    select_cols = settings_obj._columns_to_select_for_bayes_factor_parts
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/settings.py", line 262, in _columns_to_select_for_bayes_factor_parts
    cols.extend(cc._columns_to_select_for_bayes_factor_parts)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/comparison.py", line 254, in _columns_to_select_for_bayes_factor_parts
    sqls = [cl._tf_adjustment_sql for cl in self.comparison_levels]
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/comparison_level.py", line 584, in _tf_adjustment_sql
    u_prob_exact_match = self._u_probability_corresponding_to_exact_match
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/comparison_level.py", line 535, in _u_probability_corresponding_to_exact_match
    if not level._is_exact_match:
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/comparison_level.py", line 503, in _is_exact_match
    if not _is_exact_match(expr):
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/splink/comparison_level.py", line 45, in _is_exact_match
    if identifiers[0] == identifiers[1]:
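
For anyone reading the traceback: the last frame indexes both `identifiers[0]` and `identifiers[1]`, so the `IndexError` means the list held fewer than two items at that point. The sketch below only illustrates that failure mode with hypothetical stand-in functions. It is not splink's code, and the guarded variant is an assumption about what a defensive check could look like, not the upstream fix.

```python
def first_two_equal(identifiers: list) -> bool:
    """Hypothetical stand-in for the comparison in the last traceback frame."""
    return identifiers[0] == identifiers[1]


def first_two_equal_guarded(identifiers: list) -> bool:
    """Same comparison, but treat a short list as 'not an exact match'."""
    if len(identifiers) < 2:
        return False
    return identifiers[0] == identifiers[1]


if __name__ == "__main__":
    try:
        first_two_equal(["plant_name"])  # only one identifier available
    except IndexError as err:
        print(f"IndexError: {err}")
    print(first_two_equal_guarded(["plant_name"]))
```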

@katie-lamb
Member

@zaneselvans See moj-analytical-services/splink#2072. The issue seems to come from an incompatibility between splink and the latest version of its SQLGlot dependency. moj-analytical-services/splink#2076 (comment) suggests pinning sqlglot==22.5.0. It looks like the issue will be fixed imminently by moj-analytical-services/splink#2079, so I think we could just wait too.
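
Until the upstream fix lands, a small guard could confirm that the installed sqlglot falls below the incompatible major version. This is a sketch under assumptions: the function names are hypothetical, and the cap of 23 comes from the `sqlglot<23` pin this PR ended up using rather than from anything splink documents.

```python
from importlib import metadata
from typing import Optional


def is_below_major(version: str, major_cap: int = 23) -> bool:
    """Return True if the major component of ``version`` is below ``major_cap``."""
    return int(version.split(".")[0]) < major_cap


def installed_sqlglot_ok() -> Optional[bool]:
    """Check the installed sqlglot, or return None if it is not installed."""
    try:
        return is_below_major(metadata.version("sqlglot"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(is_below_major("22.5.0"))  # the version suggested in the splink thread
    print(installed_sqlglot_ok())
```

Checking only the major version mirrors an upper-bound pin instead of an exact `==` pin, so patch releases below the cap are still accepted.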

@zaneselvans
Member Author

Ah great, thanks for looking into it! I'll pin the version and see if all is well.

@zaneselvans
Member Author

Woo! This fixed it. Will do a full build on the branch and merge into main tonight if it succeeds.

@zaneselvans zaneselvans marked this pull request as ready for review March 23, 2024 18:51
@zaneselvans zaneselvans requested review from pudlbot and removed request for jdangerx March 23, 2024 18:51
@zaneselvans zaneselvans added this pull request to the merge queue Mar 24, 2024
Merged via the queue into main with commit e4281a9 Mar 24, 2024
13 checks passed
@zaneselvans zaneselvans deleted the py312 branch March 24, 2024 05:12
cmgosnell pushed a commit that referenced this pull request Mar 26, 2024
* Update pudl-dev environment to use Python 3.12

* Use 4 CPUs in pytest Makefile targets.  Update release notes.

* Pin sqlglot<23 to avoid splink incompatibility
github-merge-queue bot pushed a commit that referenced this pull request Mar 27, 2024
…n-service table name (#3450)

* add total to subtotal correction record and fix hard coded table name

* redo migrations

* enable plant_in_service table to have subtotal corrections - WIP: metric checks fail rn.

* update metric checks with new null records in diff values

* add reported value -> null calculation components corrections

* add the correction records more explicitly at end of calc comp process and add unit tests

* redo migrations... again

* link main migrations to my reset migration

* small tag change to reduce conflicting tags

* defensive checks, break out unit tests and add pruned unit test

* minor unit test fix

* Update test/unit/transform/ferc1_test.py

Co-authored-by: E. Belfer <37471869+e-belfer@users.noreply.github.com>

* Update src/pudl/transform/ferc1.py

Co-authored-by: E. Belfer <37471869+e-belfer@users.noreply.github.com>

* Add PUDL citation for Grid Strategies load growth report. (#3483)

* Clean EIA 860 and 923 FGD operation and maintenance data (#3403)

* Stash changes

* Stash changes to 923 FGD table

* Fix drop NA behavior

* Fix docstrings

* Add fields to PUDL metadata

* Fix drop nas, encode columns, fix booleans and combine raw maintenance columns

* Add alembic migration for harvested FGD table

* Add FGD operational status to encoding FKs

* Update alembic

* Add table to core and address PR comments

* Stash changes

* Change EIA 923 FGD table to _core, write asset checks, fix for fast ETL

* Add WIP 860 transform

* Stash changes

* Add fields, rename raw vars to conform to existing fields better

* Add 923 and 860 to pudl.sqlite, encode all tables, fix PK and dtype issues

* Update 860 docstrings

* Restore environments, update release notes, update schedule in 860 description

* Add sorbent type coding table to PUDL

* Fix FK for fgd operational status

* Fix docs build indentation failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* Clean up migrations, remove crufty resource def

* Fix encoding, fix typos, spell out FGD

* Fix docs build ref failure

* Add year range to ratio helper, add unit tests for helper functions, trim returns

* No cover for asset checks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add FK exclusion for eia plants table (#3491)

* Add manual GridPath RA Toolkit renewable profile data source. (#3489)

* Add manual GridPath RA Toolkit renewable profile data source.

* Update Zenodo sandbox DOI to point at working archive

* Update to production gridpathratk Zenodo DOI.

* Update release notes

* Bump version of mamba Docker image to 1.5.7

* Add some more keywords to the GridPath RA TK source.

* Add logline that tells us more about BadZipFile. (#3493)

* eia860 solar: extract (#3482)

* eia860 solar: extract step wahoo

* tweak column names

* first pass of extracting 860 wind

* fix column name and page map

* rename retired turbine model ID

* rename the renamed model column

* Update PUDL to use Python 3.12 (#3413)

* Update pudl-dev environment to use Python 3.12

* Use 4 CPUs in pytest Makefile targets.  Update release notes.

* Pin sqlglot<23 to avoid splink incompatibility

* Update conda-lock.yml and rendered conda environment files. (#3496)

Co-authored-by: zaneselvans <596279+zaneselvans@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#3500)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.3.3 → v0.3.4](astral-sh/ruff-pre-commit@v0.3.3...v0.3.4)
- [github.com/pre-commit/mirrors-prettier: v3.1.0 → v4.0.0-alpha.8](pre-commit/mirrors-prettier@v3.1.0...v4.0.0-alpha.8)

* Update .pre-commit-config.yaml

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>

* update transform docs and fix enum

* update test docs

* adjust calculation metric tolerances for fast etl

* convert leafy data merge to an inner merge so we don't get nulls

---------

Co-authored-by: E. Belfer <37471869+e-belfer@users.noreply.github.com>
Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Dazhong Xia <dazhong.xia@catalyst.coop>
Co-authored-by: PUDL Bot <74792863+pudlbot@users.noreply.github.com>
Co-authored-by: zaneselvans <596279+zaneselvans@users.noreply.github.com>