Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore the time series to the variable summaries #408

Merged

Conversation

esheehan-gsl
Copy link
Contributor

@esheehan-gsl esheehan-gsl commented Oct 11, 2023

Restore the time series charts to the diagnostics display using the Parquet files.

Update the tests and the implementation for retrieving historical data.
In the process, I simplified the response type. I think, in the future,
if we want to select unadjusted O - F or just the historical
observations, we should build that into the URL, rather than sending all
of the data back to the client and letting the client pick what to view.
This cuts down on data transfer and allows us to simplify our components
when they read data, since they don't have to parse a nested JavaScript
object.
Removed some of the dataclasses and functions that had been part of the
history API which are no longer used, now that we have the Parquet
implementation.
The diag.history function fails when there's a file:// prefix in the
diag file path because apparently Pandas gets a little confused and
tries to pass it to urllib instead of just opening the file.
Instead of deriving the Parquet file path from the Zarr path, the route
reads in the FLASK_DIAG_PARQUET variable from the environment and passes
that explicitly to diag.history. diag.history uses that to find the
Parquet file for this model, and strips out `file://` if it's present
because Pandas is a bit silly about that.
The charts really need to be refactored (in part) to eliminate these
hard-coded property accesses, but for now we can just update the
properties that the time series component expects so that we can include
test out the time series data.
Instead of deriving the path to save parquet fixtures for tests, we use
tmp_path, basically the same way we would for production.
It seems like pathlib.Path is having an issue with the `file://`
protocol on our environment variables, although I don't think it used
to. It's easy enough to strip this, although this may require more
robust path/uri handling in our tests.
When we process diag files in our pipeline, we convert the integers in
the diag files for is_used into booleans, so should treat them the same
way in our test fixtures. I think this is a sign that our test fixtures
are poorly set up, since they can get out of sync with reality. The
result of this problem was that either the application worked, or our
tests passed. Prior to this change, we needed to compare is_used to a
boolean for Parquet files generated with our ETL code, but to an integer
for test files generated with our fixtures.
@esheehan-gsl esheehan-gsl linked an issue Oct 11, 2023 that may be closed by this pull request
@github-actions
Copy link

Code Coverage

Package Line Rate Branch Rate Health
unified_graphics 84% 72%
unified_graphics.etl 97% 96%
utils.s3 68% 69%
Summary 86% (412 / 479) 82% (98 / 120)

Minimum allowed line rate is 60%

@esheehan-gsl esheehan-gsl temporarily deployed to vlab October 11, 2023 17:27 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl self-assigned this Oct 11, 2023
@esheehan-gsl esheehan-gsl temporarily deployed to vlab October 11, 2023 17:31 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl marked this pull request as ready for review October 20, 2023 20:37
Copy link
Collaborator

@ian-noaa ian-noaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I agree that it's disquieting that the test fixtures can drift from reality. An immediate solution doesn't come to mind. Though there is part of me that is a little surprised that Python's typing didn't catch this for us at some point.

@esheehan-gsl esheehan-gsl merged commit 0280a9c into main Oct 23, 2023
14 checks passed
@esheehan-gsl esheehan-gsl deleted the 350-restore-the-time-series-to-the-variable-summaries branch October 23, 2023 16:14
@esheehan-gsl esheehan-gsl temporarily deployed to vlab October 23, 2023 16:14 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl temporarily deployed to vlab October 23, 2023 16:14 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restore the time series to the variable summaries
2 participants