-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ngen.cal
support for t-route
's netcdf
, parquet
, and 3 csv
output formats variants
#162
ngen.cal
support for t-route
's netcdf
, parquet
, and 3 csv
output formats variants
#162
Conversation
ebf943d
to
3cefc3e
Compare
6831aba
to
ee990e6
Compare
I believe it technically does, but the configuration/code path to get to it is such that it probably isn't worth the effort currently. |
It looks like this is based on another branch, with all the PR's in review, I'm going to wait on this one till we can sort through the dependent branch(es) and rebase this before I review completely. |
Yeah If you set the following in your output_parameters:
chanobs_output:
chanobs_output_directory: output/
chanobs_filepath: output.nc Ill follow up with the |
ee990e6
to
6ae9574
Compare
6ae9574
to
0e52dad
Compare
except FileNotFoundError: | ||
print(f"{self._output_file} not found. Current working directory is {Path.cwd()}") | ||
def get_output(self, id: str) -> pd.Series | None: | ||
assert self._ngen_realization is not None, "ngen realization required" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert seems redundant, which leads me to wonder if/how ngen_cal_model_configure is guaranteed to be called before the output hook. If it is, then definitely redundant, if it isn't then the error should probably indicate the need to ensure the dependent hook has been executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could certainly see it as redundant but I see it more as enforcing / plainly saying the invariants of the method. The cost benefit of the price paid for the check vs. what you visually get when reading the code / the improved static analysis feedback you get from a tool like mypy
or some other linter IMO far out weights having a redundant check (plus there is always python -O
if you want to remove asserts
for performance reasons).
I also added this as a testing precaution. Ive not written a testing harness for the plugin system yet, so testing plugins without enforcing invariants at the hook
level can easily lead to false positives in unit tests. I thought this was the easiest way to get the best of both worlds until a harness is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not concerned about having the redundant assert, was just thinking through how it could possible be triggered and particularly if there was a better error message/description we use to indicate how it got into that bad state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main way in my mind that this could happen is in a unit test that doesn't call ngen_cal_model_configure
/ pass a valid ngen.config.NgenRealization
instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broader context, i've picked up an appreciation for assertions having reviewed some of tiger beetle's codebase / style guide.
Assert all function arguments and return values, pre/postconditions and invariants. A
function must not operate blindly on data it has not checked. The purpose of a function is to
increase the probability that a program is correct. Assertions within a function are part of how
functions serve this purpose. The assertion density of the code must average a minimum of two
assertions per function.
fn = _stream_output_netcdf_v1(self._output_file) | ||
elif filetype == ".parquet": | ||
fn = _stream_output_parquet_v1(self._output_file) | ||
elif filetype == ".pkl": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop pkl support. I think one human readable format and one binary is sufficient. Parquet is probably the preferable binary format, but I suppose we can use netcdf as a nicety...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, plus pkl
serialization can change based on the python version (although I think that is more or less a non-issue now). Im fine with dropping pkl
support and we can always add it in the future if there is a real need.
return int(r_dt) | ||
|
||
|
||
def _csv_output_v1(p: Path, realization: NgenRealization) -> _NgenCalModelOutputFn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the biggest concerns about using these as module level functions is the lack of conceptual encapsulation. ngen_output
is module for ngen output, not just t-route output, and may (probably should) extend its capabilities to use/provide other ngen model outputs, which could get a bit messy having these as module scoped funtions with seemingly generic names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can empathize with the names feeling a little generic. I mirrored the naming with how they are named in t-route
's config (see here for more info). So for example, this function maps to a t-route
config that looks like:
output_parameters:
csv_output:
csv_output_folder: output/
Other example:
output_parameters:
stream_output:
stream_output_directory: output/
stream_output_time: 10000000 # [hr]
stream_output_type: '.csv'
stream_output_internal_frequency: 60
0e52dad
to
439e39c
Compare
439e39c
to
ac13288
Compare
ngen.cal
support for t-route
's netcdf
, parquet
, pickle
, and 3 csv
output formats variantsngen.cal
support for t-route
's netcdf
, parquet
, ~pickle
~, and 3 csv
output formats variants
ngen.cal
support for t-route
's netcdf
, parquet
, ~pickle
~, and 3 csv
output formats variantsngen.cal
support for t-route
's netcdf
, parquet
, and 3 csv
output formats variants
Thanks, @hellkite500! |
Depends on #161#161 now mergedSee #153 for more background
Additions
t-route
'snetcdf
,parquet
,, and 3pickle
csv
output formats variantsnetcdf
extra require feature flagTODO:
ngen.cal
versiont-route
still supports originalhdf5
writing capabilities; add support if still feature. To clarify,t-route
will still write anetcdf
file, but the file contains no features. I likely don't have things configured correctly though.