Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ngen.cal support for t-route's netcdf, parquet, and 3 csv output formats variants #162

Merged

Conversation

aaraney
Copy link
Member

@aaraney aaraney commented Aug 12, 2024

Depends on #161 #161 now merged

See #153 for more background

Additions

  • Support for reading t-route's netcdf, parquet, pickle, and 3 csv output formats variants
  • Add netcdf extra require feature flag

TODO:

  • bump ngen.cal version
  • determine if t-route still supports original hdf5 writing capabilities; add support if still feature. To clarify, t-route will still write a netcdf file, but the file contains no features. I likely don't have things configured correctly though.

@aaraney aaraney added enhancement New feature or request ngen.cal Related to ngen.cal package labels Aug 12, 2024
@aaraney aaraney force-pushed the t-route-fix-reading-channel-files branch 2 times, most recently from ebf943d to 3cefc3e Compare August 12, 2024 21:58
@aaraney aaraney self-assigned this Aug 12, 2024
@aaraney aaraney force-pushed the t-route-fix-reading-channel-files branch 2 times, most recently from 6831aba to ee990e6 Compare August 12, 2024 22:09
@hellkite500
Copy link
Member

hellkite500 commented Aug 14, 2024

determine if t-route still supports original hdf5 writing capabilities; add support if still feature

I believe it technically does, but the configuration/code path to get to it is such that it probably isn't worth the effort currently.

@hellkite500
Copy link
Member

It looks like this is based on another branch, with all the PR's in review, I'm going to wait on this one till we can sort through the dependent branch(es) and rebase this before I review completely.

@aaraney
Copy link
Member Author

aaraney commented Aug 14, 2024

It looks like this is based on another branch, with all the PR's in review, I'm going to wait on this one till we can sort through the dependent branch(es) and rebase this before I review completely.

Yeah If you set the following in your t-route config, then t-route will output a netcdf file. I tried this with the LowerColorado_TX_v4 test case and got a netcdf file that does not contain any features.

output_parameters:
  chanobs_output:
    chanobs_output_directory: output/
    chanobs_filepath: output.nc

Ill follow up with the t-route team b.c. I don't think this should be the case.

@aaraney aaraney force-pushed the t-route-fix-reading-channel-files branch from ee990e6 to 6ae9574 Compare August 15, 2024 18:21
@aaraney aaraney force-pushed the t-route-fix-reading-channel-files branch from 6ae9574 to 0e52dad Compare August 15, 2024 19:03
except FileNotFoundError:
print(f"{self._output_file} not found. Current working directory is {Path.cwd()}")
def get_output(self, id: str) -> pd.Series | None:
assert self._ngen_realization is not None, "ngen realization required"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert seems redundant, which leads me to wonder if/how ngen_cal_model_configure is guaranteed to be called before the output hook. If it is, then definitely redundant, if it isn't then the error should probably indicate the need to ensure the dependent hook has been executed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could certainly see it as redundant but I see it more as enforcing / plainly saying the invariants of the method. The cost benefit of the price paid for the check vs. what you visually get when reading the code / the improved static analysis feedback you get from a tool like mypy or some other linter IMO far out weights having a redundant check (plus there is always python -O if you want to remove asserts for performance reasons).

I also added this as a testing precaution. Ive not written a testing harness for the plugin system yet, so testing plugins without enforcing invariants at the hook level can easily lead to false positives in unit tests. I thought this was the easiest way to get the best of both worlds until a harness is added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not concerned about having the redundant assert, was just thinking through how it could possible be triggered and particularly if there was a better error message/description we use to indicate how it got into that bad state

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main way in my mind that this could happen is in a unit test that doesn't call ngen_cal_model_configure / pass a valid ngen.config.NgenRealization instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broader context, i've picked up an appreciation for assertions having reviewed some of tiger beetle's codebase / style guide.

Assert all function arguments and return values, pre/postconditions and invariants. A
function must not operate blindly on data it has not checked. The purpose of a function is to
increase the probability that a program is correct. Assertions within a function are part of how
functions serve this purpose. The assertion density of the code must average a minimum of two
assertions per function.

fn = _stream_output_netcdf_v1(self._output_file)
elif filetype == ".parquet":
fn = _stream_output_parquet_v1(self._output_file)
elif filetype == ".pkl":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop pkl support. I think one human readable format and one binary is sufficient. Parquet is probably the preferable binary format, but I suppose we can use netcdf as a nicety...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, plus pkl serialization can change based on the python version (although I think that is more or less a non-issue now). Im fine with dropping pkl support and we can always add it in the future if there is a real need.

return int(r_dt)


def _csv_output_v1(p: Path, realization: NgenRealization) -> _NgenCalModelOutputFn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the biggest concerns about using these as module level functions is the lack of conceptual encapsulation. ngen_output is module for ngen output, not just t-route output, and may (probably should) extend its capabilities to use/provide other ngen model outputs, which could get a bit messy having these as module scoped funtions with seemingly generic names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can empathize with the names feeling a little generic. I mirrored the naming with how they are named in t-route's config (see here for more info). So for example, this function maps to a t-route config that looks like:

output_parameters:
  csv_output:
    csv_output_folder: output/

Other example:

output_parameters:
  stream_output: 
      stream_output_directory: output/
      stream_output_time: 10000000 # [hr]
      stream_output_type: '.csv'
      stream_output_internal_frequency: 60

@aaraney aaraney force-pushed the t-route-fix-reading-channel-files branch from 0e52dad to 439e39c Compare August 21, 2024 14:44
@aaraney aaraney force-pushed the t-route-fix-reading-channel-files branch from 439e39c to ac13288 Compare August 21, 2024 15:07
@hellkite500 hellkite500 changed the title ngen.cal support for t-route's netcdf, parquet, pickle, and 3 csv output formats variants ngen.cal support for t-route's netcdf, parquet, ~pickle~, and 3 csv output formats variants Aug 21, 2024
@hellkite500 hellkite500 changed the title ngen.cal support for t-route's netcdf, parquet, ~pickle~, and 3 csv output formats variants ngen.cal support for t-route's netcdf, parquet, and 3 csv output formats variants Aug 21, 2024
@hellkite500 hellkite500 merged commit 5e28b09 into NOAA-OWP:master Aug 21, 2024
13 checks passed
@aaraney aaraney deleted the t-route-fix-reading-channel-files branch August 21, 2024 16:53
@aaraney
Copy link
Member Author

aaraney commented Aug 21, 2024

Thanks, @hellkite500!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ngen.cal Related to ngen.cal package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants