Add some assertions to check that the parquet file is being created during ETL. This is getting to be a bit much with the Zarr and database stuff as well. If this works out, we should consider using just the Parquet files for diag data and removing Zarr and Postgres from the equation.
This is needed to write to Parquet files.
I think following pytest's recommended pattern for multiple asserts makes the tests a little easier to follow. Each test is a parameterized method, grouped by what action is being tested. The "act" step of the test is encompassed in a fixture on the class. Each method then only has to assert the state it expects, so it's a little more clear what each expected outcome is. See: https://docs.pytest.org/en/7.3.x/how-to/fixtures.html#running-multiple-assert-statements-safely
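A minimal sketch of that pattern, assuming a toy `save` helper and an in-memory dict in place of the real ETL step and its Zarr, Postgres, and Parquet outputs (both are hypothetical and exist only for illustration):

```python
import pytest


def save(store: dict, name: str, values: list) -> dict:
    """Hypothetical stand-in for the ETL save step under test."""
    store[name] = list(values)
    return store


class TestSaveNew:
    @pytest.fixture(scope="class", autouse=True)
    def store(self):
        # The "act" step runs once for the whole class.
        return save({}, "ps", [1.0, 2.0])

    # Each method asserts one expected outcome of the shared action.
    def test_variable_is_stored(self, store):
        assert "ps" in store

    def test_values_are_preserved(self, store):
        assert store["ps"] == [1.0, 2.0]
```

Because each expectation is its own test method, one failing assert does not hide the results of the others.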
We need to ensure that the parquet file is being updated correctly, and that the database has both analyses.
This needs to be present to allow us to concatenate variables into a single Parquet file.
Instead of combining all of the variables into a single Parquet file for the model, I've decided to separate them. Some of the variables, such as wind, have columns that are unique to that variable; I think this is reason enough to store them separately, otherwise all of the other variables will just hold NaN values in those columns, which seems silly. Some preliminary testing suggests this might actually save space.
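To illustrate the NaN padding described above, here is a small sketch with made-up column names (`obs_minus_forecast` and `direction` are assumptions, not the project's real schema):

```python
import pandas as pd

# Two toy per-variable tables; only "wind" has the extra column.
ps = pd.DataFrame({"obs_minus_forecast": [0.1, -0.2]})
wind = pd.DataFrame({"obs_minus_forecast": [1.5], "direction": [270.0]})

# Combining them forces every "ps" row to carry a NaN in "direction".
combined = pd.concat([ps, wind], keys=["ps", "wind"], names=["variable"])
nan_cells = int(combined["direction"].isna().sum())
print(combined)
print("NaN cells added for ps rows:", nan_cells)
```

Keeping one table per variable avoids those padded cells entirely, at the cost of managing more files.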
This seems like a silly type hint from pandas-stubs, because the tuple works just fine, but I don't think it's causing any harm.
Looks good to me. Does the Parquet file get saved out locally as well if we're using the file:// prefix?
@pytest.fixture
@pytest.fixture(scope="session")
This is a good call. If this was function-scoped, it seems like the factory function wasn't really being used before: it'd need to recreate the data each time it was invoked.
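A rough sketch of a session-scoped factory fixture, assuming the factory caches what it builds (the `make_factory` helper, the `dataset_factory` fixture name, and the cache are all hypothetical, not the code in this PR):

```python
import pytest


def make_factory():
    """Build a factory that caches toy datasets by their arguments."""
    cache = {}

    def factory(variable: str, init_time: str) -> dict:
        key = (variable, init_time)
        if key not in cache:
            # Stand-in for the expensive dataset construction.
            cache[key] = {"variable": variable, "init_time": init_time}
        return cache[key]

    return factory


@pytest.fixture(scope="session")
def dataset_factory():
    # Created once per test session, so the cache is shared by all tests.
    return make_factory()
```

With the default function scope, pytest would build a fresh factory (and a fresh, empty cache) for every test that requests it.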
class TestSaveNew:
    @pytest.fixture(scope="class", autouse=True)
    def dataset(self, model, diag_dataset, session, zarr_file):
        (mdl, system, domain, background, frequency) = model
        ps = diag_dataset(
            "ps",
            "2022-05-05T14:00",
            "anl",
            mdl,
            system,
            domain,
            frequency,
            background,
        )
This is a nice way to consolidate setup and handle multiple asserts. Thanks for the doc link!
Yeah, the Parquet file should get written locally as well.
Add storage of diag data as partitioned Parquet files. This should be more suitable for our time series visualizations.