Bring the Census DP1 to SQLite ETL into dagster #2621
Merged
50 commits
da77770
WIP: stashing changes
e-belfer 2c5af18
Add non-spatial dataframe to dagster
e-belfer f4395fc
Merge branch 'dev' into 714-861-dagster
e-belfer f0060a7
Add tables into PUDL metadata
e-belfer 6b2c432
Merge branch 'dev' into 714-861-dagster
e-belfer ea3ee34
Fix metadata and fk error for compiled_geom tables
e-belfer ecb5826
Revert spatial test to match new geopandas output
e-belfer efa2cd3
Add 714 outputs to default io mgr
e-belfer e3672fb
Add georef resp and counties, summarized_demand_ferc714
e-belfer b74dfe0
Merge branch 'dev' into 714-861-dagster
zaneselvans 06fe62e
Deduplicate particulate_control_id_eia
zaneselvans 539dbb0
Add FERC714 tables to metadata, confirm all tables identical
e-belfer 1e2136b
Merge branch '714-861-dagster' of https://github.com/catalyst-coopera…
e-belfer 70f6fa0
Fix migrations, start tests, add 714 to PUDL
e-belfer 6610ace
Merge branch 'dev' into 714-861-dagster
e-belfer 706c80a
Fix FK errors
e-belfer 442305c
Add validation and integration tests, add state_demand output into PU…
e-belfer c90dff1
Functional Census dagster integration, first pass
e-belfer 48dbf82
Updated working Census ETL that pickles all output layers
e-belfer ecdfb1f
Merge branch 'dev' into census_dagster
e-belfer 2bce0ef
Fix read-in of census layers, multi-asset -> asset factory
e-belfer a51c6f1
Merge branch 'census_dagster' of https://github.com/catalyst-cooperat…
e-belfer 49b771d
Merge branch 'dev' into census_dagster
e-belfer e265f10
Remove census from CLI test, fix GH runner
e-belfer e87844e
Merge branch 'dev' into census_dagster
e-belfer 6dc3499
Materialize census outputs in FERC714 tests
e-belfer 82cde06
Merge branch 'dev' into census_dagster
e-belfer ff2e979
Merge branch 'census_dagster' into 714-861-dagster
e-belfer ff117c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 5a5bd81
update migrations and census read-in
e-belfer 6fb63ef
Merge branch '714-861-dagster' of https://github.com/catalyst-coopera…
e-belfer bcfc4ca
Update state_demand.py
e-belfer e78951b
Clean up testing environment from WIP census read-ins
e-belfer 8130143
Merge branch '714-861-dagster' of https://github.com/catalyst-coopera…
e-belfer 581a195
Merge branch 'dev' into census_dagster
e-belfer 7d185d5
Merge branch 'census_dagster' into 714-861-dagster
e-belfer 4168a6d
Remove accidental field change, add release notes
e-belfer 03355e9
Remove outdated args in function
e-belfer df1edc0
Address first round of PR comments
e-belfer 68f9c59
Prune intermediate assets
e-belfer f8cc104
Add type hints and clean up docs
e-belfer 00df2d9
Rename asset groups, expand docstring
e-belfer 423dd39
Merge pull request #2550 from catalyst-cooperative/714-861-dagster
e-belfer 55c0564
Merge branch 'dev' into census_dagster
e-belfer ddd289d
Merge branch 'dev' into census_dagster
e-belfer 5b3aedf
Update alembic migrations
e-belfer 929518b
Merge branch 'dev' into census_dagster
e-belfer acf6a4f
Merge branch 'census_dagster' of https://github.com/catalyst-cooperat…
e-belfer eb9bab6
Update release notes
e-belfer 9987ca4
Update release notes to more accurately reflect revisions
e-belfer
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
"""Dagster definitions for the Census DP1 to SQLite process.""" | ||
|
||
from dagster import Definitions, graph | ||
|
||
import pudl | ||
from pudl.convert.censusdp1tract_to_sqlite import censusdp1tract_to_sqlite | ||
from pudl.resources import datastore | ||
|
||
logger = pudl.logging_helpers.get_logger(__name__) | ||
|
||
|
||
@graph | ||
def census_to_sqlite(): | ||
"""Clone the Census DP1 database into SQLite.""" | ||
censusdp1tract_to_sqlite() | ||
|
||
|
||
default_resources_defs = { | ||
"datastore": datastore, | ||
} | ||
|
||
census_to_sqlite = census_to_sqlite.to_job( | ||
resource_defs=default_resources_defs, | ||
name="census_to_sqlite", | ||
) | ||
|
||
defs: Definitions = Definitions(jobs=[census_to_sqlite]) | ||
"""A collection of dagster assets, resources, IO managers, and jobs for the Census | ||
DP1 to SQLite ETL.""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
"""A script for cloning the Census DP1 database into SQLite. | ||
|
||
This script generates a SQLite database that is a clone/mirror of the original | ||
Census DP1 database. We use this cloned database as the starting point for the | ||
main PUDL ETL process. The underlying work in the script is being done in | ||
:mod:`pudl.convert.censusdp1tract_to_sqlite`. | ||
""" | ||
import argparse | ||
import sys | ||
from collections.abc import Callable | ||
|
||
from dagster import ( | ||
DagsterInstance, | ||
JobDefinition, | ||
build_reconstructable_job, | ||
execute_job, | ||
) | ||
|
||
import pudl | ||
from pudl import census_to_sqlite | ||
from pudl.settings import EtlSettings | ||
|
||
# Create a logger to output any messages we might have... | ||
logger = pudl.logging_helpers.get_logger(__name__) | ||
|
||
|
||
def parse_command_line(argv): | ||
"""Parse command line arguments. See the -h option. | ||
|
||
Args: | ||
argv (list[str]): Command line arguments, including caller filename. | ||
|
||
Returns: | ||
argparse.Namespace: Parsed command line arguments, accessible as attributes. | ||
""" | ||
parser = argparse.ArgumentParser(description=__doc__) | ||
parser.add_argument( | ||
"settings_file", type=str, default="", help="path to YAML settings file." | ||
) | ||
parser.add_argument( | ||
"--logfile", | ||
default=None, | ||
type=str, | ||
help="If specified, write logs to this file.", | ||
) | ||
parser.add_argument( | ||
"-c", | ||
"--clobber", | ||
action="store_true", | ||
help="""Clobber existing sqlite database if it exists. If clobber is | ||
not included but the sqlite database already exists the build will | ||
fail.""", | ||
default=False, | ||
) | ||
parser.add_argument( | ||
"--sandbox", | ||
action="store_true", | ||
default=False, | ||
help="Use the Zenodo sandbox rather than production", | ||
) | ||
parser.add_argument( | ||
"--gcs-cache-path", | ||
type=str, | ||
help="Load datastore resources from Google Cloud Storage. Should be gs://bucket[/path_prefix]", | ||
) | ||
parser.add_argument( | ||
"--loglevel", | ||
help="Set logging level (DEBUG, INFO, WARNING, ERROR, or CRITICAL).", | ||
default="INFO", | ||
) | ||
arguments = parser.parse_args(argv[1:]) | ||
return arguments | ||
|
||
|
||
def census_to_sqlite_job_factory( | ||
logfile: str | None = None, loglevel: str = "INFO" | ||
) -> Callable[[], JobDefinition]: | ||
"""Factory for parameterizing a reconstructable census_to_sqlite job. | ||
|
||
Args: | ||
loglevel: The log level for the job's execution. | ||
logfile: Path to a log file for the job's execution. | ||
|
||
Returns: | ||
The job definition to be executed. | ||
""" | ||
|
||
def get_census_to_sqlite_job(): | ||
"""Module level func for creating a job to be wrapped by reconstructable.""" | ||
return census_to_sqlite.census_to_sqlite.to_job( | ||
resource_defs=census_to_sqlite.default_resources_defs, | ||
name="census_to_sqlite_job", | ||
) | ||
|
||
return get_census_to_sqlite_job | ||
|
||
|
||
def main(): # noqa: C901 | ||
"""Clone the Census database into SQLite.""" | ||
args = parse_command_line(sys.argv) | ||
|
||
# Display logged output from the PUDL package: | ||
pudl.logging_helpers.configure_root_logger( | ||
logfile=args.logfile, loglevel=args.loglevel | ||
) | ||
|
||
etl_settings = EtlSettings.from_yaml(args.settings_file) | ||
|
||
# Set PUDL_INPUT/PUDL_OUTPUT env vars from .pudl.yml if not set already! | ||
pudl.workspace.setup.get_defaults() | ||
|
||
census_to_sqlite_reconstructable_job = build_reconstructable_job( | ||
"pudl.census_to_sqlite.cli", | ||
"census_to_sqlite_job_factory", | ||
reconstructable_kwargs={"loglevel": args.loglevel, "logfile": args.logfile}, | ||
) | ||
|
||
result = execute_job( | ||
census_to_sqlite_reconstructable_job, | ||
instance=DagsterInstance.get(), | ||
run_config={ | ||
"resources": { | ||
"census_to_sqlite_settings": { | ||
"config": etl_settings.census_to_sqlite_settings.dict() | ||
}, | ||
"datastore": { | ||
"config": { | ||
"sandbox": args.sandbox, | ||
"gcs_cache_path": args.gcs_cache_path | ||
if args.gcs_cache_path | ||
else "", | ||
}, | ||
}, | ||
}, | ||
}, | ||
raise_on_error=True, | ||
) | ||
|
||
# Workaround to reliably getting full stack trace | ||
if not result.success: | ||
for event in result.all_events: | ||
if event.event_type_value == "STEP_FAILURE": | ||
raise Exception(event.event_specific_data.error) | ||
|
||
|
||
if __name__ == "__main__": | ||
sys.exit(main()) |
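
Note on the argument handling in the diff above: `parse_args` returns an `argparse.Namespace`, and an unset `--gcs-cache-path` comes back as `None`, which is why the run config falls back to an empty string. A minimal standalone sketch of that behavior, using only a subset of the flags and a hypothetical `datastore_config` helper that is not part of the PR:

```python
import argparse


def parse_command_line(argv: list[str]) -> argparse.Namespace:
    """Parse a reduced subset of the CLI arguments shown in the diff."""
    parser = argparse.ArgumentParser(description="Clone Census DP1 into SQLite.")
    parser.add_argument("settings_file", type=str, help="path to YAML settings file.")
    parser.add_argument("--sandbox", action="store_true", default=False)
    parser.add_argument("--gcs-cache-path", type=str)
    # argv[0] is the caller filename, so it is skipped.
    return parser.parse_args(argv[1:])


def datastore_config(args: argparse.Namespace) -> dict:
    """Build the datastore resource config (hypothetical helper, for illustration)."""
    return {
        "sandbox": args.sandbox,
        # argparse leaves an unset --gcs-cache-path as None; fall back to "".
        "gcs_cache_path": args.gcs_cache_path or "",
    }


args = parse_command_line(["census_to_sqlite", "settings.yml", "--sandbox"])
config = datastore_config(args)
print(config)  # {'sandbox': True, 'gcs_cache_path': ''}
```

The `or ""` form is equivalent to the conditional expression in the diff for the values argparse can produce here (`None` or a string).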
nit: I guess we could get rid of the CLI interface here completely? Or refactor it to just wrap around however `dagster asset materialize` works? 🤷 Maybe for a different PR.

It looks like we're going to have trouble providing the `datastore` resource to the CLI while we're still using legacy resources (see here). I've gone ahead and gotten rid of the CLI.

🔥 🔥 🔥