Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert wave module to xarray #302

Merged
merged 43 commits into from
Apr 24, 2024

Conversation

akeeste
Copy link
Contributor

@akeeste akeeste commented Apr 3, 2024

This PR converts the wave module to be based in xarray.

TODO list

  • for functions where pandas is an input, add allow xarray as an allowed input
  • for functions that take pandas or xarray input, convert the internal functionality to be xarray-based
  • add tests for xarray input
  • for functions that output pandas, add a flag to output xarray
  • all tests passing
  • wave notebooks running

Modules completed:

  • wave.contours --> based in xarray
  • wave.performance --> based in xarray
  • wave.resource --> based in xarray
  • wave.graphics --> IO allows xarray
  • wave.io.cdip --> IO allows xarray
  • wave.io.ndbc --> IO allows xarray
  • wave.io.swan --> IO allows xarray
  • wave.io.wecsim --> IO allows xarray
  • wave.hindcast.hindcast --> IO allows xarray
  • wave.hindcast.wind_toolkit --> IO allows xarray

@akeeste akeeste changed the base branch from master to develop April 3, 2024 21:10
@akeeste
Copy link
Contributor Author

akeeste commented Apr 3, 2024

At long last... (kind of, a little more work to go)

@ssolson I still have a bit more work that I will push before I'm on travel. Fortunately I was able to get most of my previous work on the wave module working smoothly and that is included here. My goal before travel is to finish adding xarray IO to each function for the wave.io module and make sure all of the current tests are passing. I may need your help adding some tests using xarray. Will update further in a day or two

Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akeeste this looks good overall. I have pretty minor comments here to address. Hopefully we can get this turned around quickly next week for a release.

Comment on lines -58 to -59
self.assertEqual(ws_output["cables"].name, "Cable")
self.assertEqual(ws_output["constraints"]["constraint1"].name, "Mooring")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we lose Cable and Mooring in the update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the cable and mooring outputs are still present. But a DataFrame's name parameter is dropped when converting to a Dataset. There's not a direct equivalent in the Dataset. The closest thing is Dataset attribute that could hold the DataFrame's name, but that is not done automatically by xarray. These two are not particularly robust or insightful tests either, so I dropped them in interest of completing this PR.

Comment on lines 191 to 194
"""
Recursively searches inside nested dictionaries for pandas DataFrames to
convert to xarray Datasets.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs full docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to include parameters, returns and specifies where the function is typically used

Comment on lines 425 to 437
to_pandas: bool (optional)
Flag to output pandas instead of xarray. Default = True.

Returns
-------
results: dictionary
'vars1D': DataFrame
1D variables indexed by time
'data': dictionary of variables
'vars': pandas DataFrame or xarray Dataset
1D variables indexed by time
'vars2D': dictionary of DataFrames or Datasets, optional
If 2D-vars are passed in the 'parameters key' or if run
with all_2D_variables=True, then this key will appear
with a dictonary of DataFrames/Datasets of 2D variables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the returns should now be DataArray or Dictionary.

Which makes the variable to_pandas not strictly correct but maybe its close enough as it is a dictionary of DataFrames, and is consistent with other functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, the function is now returning a dictionary of DataFrames or a dictionary of Datasets. I prefer to keep to_pandas named the way it is for consistency, but I will update it's description to note that it specifies whether the output is a dictionary of pandas or xarray objects

mhkit/wave/io/ndbc.py Outdated Show resolved Hide resolved
mhkit/wave/io/ndbc.py Outdated Show resolved Hide resolved
mhkit/wave/resource.py Outdated Show resolved Hide resolved
mhkit/wave/resource.py Outdated Show resolved Hide resolved
Comment on lines +298 to +299
if not list(phases.data_vars) == list(S.data_vars):
raise ValueError("phases must have the same variable names as S")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this a different check than previously.

Shouldn't we still ensure the array sizes match?

Copy link
Contributor Author

@akeeste akeeste Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a check to see if each variable within phases is the same size as that within S

mhkit/wave/resource.py Outdated Show resolved Hide resolved
mhkit/wave/resource.py Show resolved Hide resolved
@akeeste
Copy link
Contributor Author

akeeste commented Apr 22, 2024

@ssolson I addressed all of you review comments. Let me know if you have any other feedback on this PR

@akeeste
Copy link
Contributor Author

akeeste commented Apr 23, 2024

I merged develop into this branch. The linting test should pass now

@akeeste
Copy link
Contributor Author

akeeste commented Apr 23, 2024

Not sure why the power linting tests are still failing. The only change to the power module is fixing an TypeError in power.quality. I can address the one linting suggestion for power.characteristics easily enough, but I'm unsure why this is failing here and now.

@ssolson
Copy link
Contributor

ssolson commented Apr 23, 2024

@akeeste I was testing the examples and the cdip example is failing for me. This is unreleated to your changes. I am running py 3.11 on windows. Can you tell me if you are getting a similar error:
image

I believe the fix for this on 267 is:

start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
start_date = start_date.replace(tzinfo=pytz.UTC)

then the same for end date on 278.

@akeeste
Copy link
Contributor Author

akeeste commented Apr 23, 2024

@ssolson Thanks for the reminder, I'm glad you caught that. I did see this warning recently when running the CDIP example outside of this PR. I agree that your fix should work; I'll add a commit

Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akeeste thanks for your work on this. I think this is in a good place to merge.

@akeeste
Copy link
Contributor Author

akeeste commented Apr 24, 2024

Hey @ssolson I caught one last thing. The wave.io.hindcast.wind_toolkit function needs a parameter to convert data to xarray instead of pandas. I'll add this quickly today, then we can merge this PR. That should completely round out xarray IO for all of our functions

@ssolson ssolson mentioned this pull request Apr 24, 2024
@akeeste akeeste merged commit 8651d01 into MHKiT-Software:develop Apr 24, 2024
15 checks passed
@ssolson ssolson mentioned this pull request May 6, 2024
ssolson added a commit that referenced this pull request May 8, 2024
# MHKiT v0.8.0
We're excited to announce the release of MHKiT v0.8.0, which brings a host of new features, enhancements, and bug fixes across various modules, ensuring compatibility with Python 3.10 and 3.11, and introducing full xarray support for more flexible data handling. Significant updates in the Wave and DOLfYN modules improve functionality and extend capabilities.

## Python 3.10 & 3.11 Support
MHKiT now supports python 3.10 and 3.11. Support for 3.12 will follow in the next minor update.
- #240


## Wave Module
### Enhancements:
**Automatic Threshold Calculation for Peaks-Over-Threshold**: We've introduced a new feature that automatically calculates the "best" threshold for identifying significant wave events. This method, originally developed by Neary, V. S., et al. in their 2020 study, has now been translated from Matlab to Python, enhancing our existing peaks-over-threshold functionality.

**Wave Heights Analysis**: A new function, `wave_heights`, has been added to extract the heights of individual waves from a time series. This function uses zero up-crossing analysis to accurately measure wave heights, improving upon our previous methods which only provided the maximum value between up-crossings.

**Enhanced Zero Crossing Analysis**: Building on the above, the zero crossing code previously embedded in `global_peaks` has been isolated into a helper function. This modular approach not only refines the codebase but also supports new functionalities such as calculating wave heights, zero crossing periods, and identifying crests.

### Bug Fixes:
**Contour Sampling Error in Wave Contours**: A bug identified in `mhkit.wave.contours.samples_contour` has been resolved. The issue occurred when period samples defined using the maximum period resulted in values outside the interpolation range of the contour data. This was corrected by ensuring that all sampling points are within the interpolation range and adjusting the contour data selection process accordingly.

- #268 
- #252 
- #278


## Xarray Support
MHKiT functions now fully support the use of xarray for passing and returning data.

- #279 
- #282
- #285
- #302
- #310


## DOLfYN

Thanks to the many user contributions and users of MHKiT the DOLFYN module include a significant number of enhancements and bug fixes. 

### Enhancements:
**Altimeter Support**: Enhanced the Nortek Signature Reader to add capability for reading ADCP dual profile configurations.

**Data Handling Improvements**: Introduced logic to skip messy header data that can accumulate during measurements collected via Nortek software on PCs and Macs.

**Instrument Noise Subtraction**: Added a function to subtract instrument noise from turbulence intensity estimation using RMS calculations, providing results that differ by approximately 1% from the existing standard deviation-based "I" property.

**Improved File Handling**: Updates for RDI files to handle changing "number of cells" and variable "cell sizes," which are now bin-averaged into the largest cell size.

### Bug Fixes:
**Power Spectra Calculation**: Fixed a bug where a given noise value was not being subtracted from the power spectra, and noise was inadvertently added as an input to dissipation rate calculations.

**Improved Header Handling**: Allowed RDI reader to skip junk headers effectively.

**Nortek Reader C Types Update**: Adjusted C types in the Nortek reader to handle below-zero water temperatures and to allow pitch and roll values to go negative.


- #280 
- #289
- #290
- #292
- #293
- #294
- #299


## River & Tidal: D3D
Added limits to `variable_interpolation` and added 3 array input capability to `create_points`
- #271

## Developer Experience
### Black formatting
Black formatting is now enforced on all MHKiT files. This ensures consistent formatting across the MHKiT package.
- #281

### Linting & Type Hints
MHKiT is in the process of enforcing pylint and adding type hints to all functions. Currently this has been achieved and is enforced in the Loads and Power modules.
- #288 
- #296 

### CI/CD
This release introduces significant reduction in testing time for development. This is achieved by reducing the number of tests for pulls against the develop branch and only running hindcast test when changes are made to it. A bug in the hindcast CI was fixed which only ran on changes to the hindcast tests instead of the hindcast module. Additionally the wave and wind hindcast needed to be separated in 2 jobs due to the excessive time taken to run a wind cache. This created a number of follow on PRs around solidifying the logic of these job. A special case for Python 3.8, pip, and Mac OS was added to use homebrew to install NetCDF and HDF5 to get tests to pass.
- #241
- #270
- #306
- #311
- #317
- #318
- #319
- #320
- #324

### Clean Up
MHKiT fixed an implementation error where functions used assert instead of built in errors for type and value checking. Additionally these PRs removed unused files, fixed typos, and created an argument which allows users to run CDIP API calls silently.
- #276
- #272
- #273
@akeeste akeeste deleted the xarray_internal_wave branch May 13, 2024 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants