Compute CDR uptake curve#417
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
ScottEilerman
left a comment
There was a problem hiding this comment.
A few optional suggestions/comments, mostly about small performance gains. Looks good!
| dim=["eta_rho", "xi_rho"] | ||
| ) | ||
| * ds_cdr["window_length"] | ||
| ).cumsum(dim="time") |
There was a problem hiding this comment.
Are the flux values 2-d arrays at the top layer, or are they 3-d (and presumably zero) for all the deeper layers? I assume the former, but if the latter, it'd be an efficiency gain to select s_rho=0 (or -1) somewhere in here, right?
There was a problem hiding this comment.
They are 2-d arrays and have no s_rho dimension.
| ) | ||
|
|
||
| # Normalize by cumulative source with safe division (NaN where source=0) | ||
| with np.errstate(divide="ignore", invalid="ignore"): |
There was a problem hiding this comment.
I've never seen this errstate thing before. Interesting!
An alternative would be to formulate the below like uptake_efficiency_flux = np.where(source == 0, np.nan, flux / source). I suspect (but haven't tested) that it might be a little faster too. Error handling can be slow, so if the current code is actually triggering and then bypassing a lot of errors, it might be slower.
This assumes we're only concerned with zeros in source and not nans in source or flux. If that's the case, you can probably disregard.
There was a problem hiding this comment.
The point of np.errstate here was to suppress a bunch of annoying warnings that the user will see if we divide by zero. Maybe there is a better way to do this?
There was a problem hiding this comment.
I thought that doing uptake_efficiency_flux = np.where(source == 0, np.nan, flux / source) would get rid of the warnings by not dividing by zero in the first place, but after thinking for an additional second and then also trying it in a terminal, it looks like it does still calculate the entirety of flux/source prior to subsetting/selecting, so nevermind. 🙃
| # Pick the variable | ||
| field = self.ds[var_name] | ||
| # Pick the variable, prefer ds_cdr if it exists | ||
| field = ds_cdr[var_name] if ds_cdr and var_name in ds_cdr else self.ds[var_name] |
There was a problem hiding this comment.
In general, a preferable pattern to some of these "does it exist" checks would be to make ds_cdr a @property that gets lazily calculated the first time it's needed. That way a user or developer doesn't need to have pre-knowledge about its existence or worry about getattrs and Nones. It makes certain linters happier too.
Here, we'd have to think a little on this logic. Probably we'd want a list of possible variables in ds_cdr, so that we don't trigger the cdr analysis just to see if a var_name might be in there.
Consider this one optional. It'll work fine as-is, and the headache of maintaining var lists or worrying about when the computation might get triggered could outweigh the other considerations.
There was a problem hiding this comment.
Oh! It turns out we don’t need to check for ds_cdr in the plot method after all. The only variables present in ds_cdr but not in ds are the uptake curves, which are plotted using a separate function, plot_uptake_efficiency, since plot doesn’t actually handle time series.
This check is just a leftover from an intermediate commit. I will remove it!
| times = ds["abs_time"] | ||
|
|
||
| # Check for monotonically increasing times | ||
| if not np.all(np.diff(times) > np.timedelta64(0, "s")): |
There was a problem hiding this comment.
Probably insignificant, but there's a slight efficiency gain to be had -- see the last code block here.
|
I think this is good to merge, @ScottEilerman? |
|
Yep go for it! |
This PR adds the method
.cdr_metrics()to theROMSOutputclass. The method computes the required CDR diagnostics (if not already present), save them in the attribute.ds_cdr, and produce a plot of the uptake efficiency over time.pre-commit run --all-filesdocs/releases.mddocs/api.rst