Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert river and tidal modules to xarray #285

Merged
merged 45 commits into from
Mar 14, 2024

Conversation

akeeste
Copy link
Contributor

@akeeste akeeste commented Jan 17, 2024

This PR converts the river and tidal modules to be based in xarray

TODO list

  • for functions where pandas is an input, add allow xarray as an allowed input
  • for functions that take pandas or xarray input, convert the internal functionality to be xarray-based
  • add tests for xarray input
  • for functions that output pandas, add a flag to output xarray
  • all tests passing
  • river, tidal, tidal performance notebooks running

Modules completed:

  • river.graphics
  • river.performance
  • river.resource
  • river.io.usgs
  • river.io.d3d / tidal.io.d3d
  • tidal.graphics
  • tidal.performance
  • tidal.resource
  • tidal.io.noaa

@akeeste akeeste changed the title Convert river module to xarray Convert river and tidal modules to xarray Feb 29, 2024
@akeeste akeeste requested a review from ssolson March 5, 2024 17:03
@ssolson
Copy link
Contributor

ssolson commented Mar 6, 2024

@akeeste could you take a look at fixing the PR failed tests before I review?

_______________________ TestIO.test_load_usgs_data_daily _______________________

self = <mhkit.tests.river.test_io_usgs.TestIO testMethod=test_load_usgs_data_daily>

    def test_load_usgs_data_daily(self):
        file_name = join(datadir, "USGS_0831[30](https://github.com/MHKiT-Software/MHKiT-Python/actions/runs/8147648568/job/22268844047?pr=285#step:6:31)00_Jan2019_daily.json")
        data = river.io.usgs.read_usgs_file(file_name)
    
        expected_index = pd.date_range("2019-01-01", "2019-01-[31](https://github.com/MHKiT-Software/MHKiT-Python/actions/runs/8147648568/job/22268844047?pr=285#step:6:32)", freq="D")
        self.assertEqual(data.columns, ["Discharge, cubic feet per second"])
>       self.assertEqual((data.index == expected_index.tz_localize("UTC")).all(), True)
E       AssertionError: False != True

@akeeste
Copy link
Contributor Author

akeeste commented Mar 6, 2024

@akeeste could you take a look at fixing the PR failed tests before I review?

_______________________ TestIO.test_load_usgs_data_daily _______________________

self = <mhkit.tests.river.test_io_usgs.TestIO testMethod=test_load_usgs_data_daily>

    def test_load_usgs_data_daily(self):
        file_name = join(datadir, "USGS_0831[30](https://github.com/MHKiT-Software/MHKiT-Python/actions/runs/8147648568/job/22268844047?pr=285#step:6:31)00_Jan2019_daily.json")
        data = river.io.usgs.read_usgs_file(file_name)
    
        expected_index = pd.date_range("2019-01-01", "2019-01-[31](https://github.com/MHKiT-Software/MHKiT-Python/actions/runs/8147648568/job/22268844047?pr=285#step:6:32)", freq="D")
        self.assertEqual(data.columns, ["Discharge, cubic feet per second"])
>       self.assertEqual((data.index == expected_index.tz_localize("UTC")).all(), True)
E       AssertionError: False != True

@ssolson yeah definitely, i'll see what's going on and ping you when it's ready

@akeeste
Copy link
Contributor Author

akeeste commented Mar 7, 2024

@ssolson Sorry for the wait, I think I have the tests passing now. I'll double check tomorrow morning. I was having some issues converting river.io.usgs.py and especially variable_interpolation and turbulence_intensity in river.io.d3d.py to xarray. The calls to scipy.interp.griddata were proving difficult with d3d data in xr.Dataset format. For now, I reverted those functions to use pandas, but added the option to return xarray datasets. So river.io and tidal.io functions are not yet based in xarray, but can now return xarray.

@akeeste
Copy link
Contributor Author

akeeste commented Mar 8, 2024

@ssolson okay tests are finally passing! I had to go back and fix a few changes I made to how some river.resource functions output their data, but that is all working as it was before. Again noting that the river and tidal io functions can now output xarray, but are not based in xarray yet because they were tricky

@akeeste akeeste mentioned this pull request Mar 8, 2024
Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akeeste thank you for putting this together. I only have a couple of comments.

I see that the overlap in our util functions do not explicitly overlap. I think the name of my utils file is preferable e.g. I think type_handling is more descriptive of what these functions do than data_utils.

The function I propose in #288 ensures the data is numeric in addition to the data structure check. E.g. this function will currently accept ['a', 'b', 'c'] and then fail in calculation when performing operations on a string.

Overall though I am not too worried about this and I am happy to mesh these functions together as I pass through with the linting in order to get our release out.

mhkit/tests/tidal/test_performance.py Show resolved Hide resolved
mhkit/tests/utils/test_utils.py Outdated Show resolved Hide resolved
mhkit/tidal/performance.py Outdated Show resolved Hide resolved
mhkit/tidal/performance.py Outdated Show resolved Hide resolved
mhkit/tidal/performance.py Outdated Show resolved Hide resolved
mhkit/utils/data_utils.py Outdated Show resolved Hide resolved
@akeeste
Copy link
Contributor Author

akeeste commented Mar 13, 2024

Thanks for the review @ssolson . I addressed all of your review comments, ported my functions to your much better named utils.type_handling 😊, and can confirm that the river, tidal, and tidal performance examples all run without errors or warnings.

@akeeste
Copy link
Contributor Author

akeeste commented Mar 13, 2024

Within test_utils, the name change from convert_to_dataarray to test_convert_to_dataarray made that test actually run. In convert_to_dataarray there was one minor inconsistency with how datasets are cast to dataarrays. the xr to_array function always makes the variable name a new dimension, which is not necessary in our scenario and makes the returned dataarray different from how all other types are converted. I fixed this case so the function works consistently

@akeeste akeeste merged commit 5d014f4 into MHKiT-Software:develop Mar 14, 2024
15 checks passed
@ssolson ssolson mentioned this pull request Apr 24, 2024
@ssolson ssolson mentioned this pull request May 6, 2024
ssolson added a commit that referenced this pull request May 8, 2024
# MHKiT v0.8.0
We're excited to announce the release of MHKiT v0.8.0, which brings a host of new features, enhancements, and bug fixes across various modules, ensuring compatibility with Python 3.10 and 3.11, and introducing full xarray support for more flexible data handling. Significant updates in the Wave and DOLfYN modules improve functionality and extend capabilities.

## Python 3.10 & 3.11 Support
MHKiT now supports python 3.10 and 3.11. Support for 3.12 will follow in the next minor update.
- #240


## Wave Module
### Enhancements:
**Automatic Threshold Calculation for Peaks-Over-Threshold**: We've introduced a new feature that automatically calculates the "best" threshold for identifying significant wave events. This method, originally developed by Neary, V. S., et al. in their 2020 study, has now been translated from Matlab to Python, enhancing our existing peaks-over-threshold functionality.

**Wave Heights Analysis**: A new function, `wave_heights`, has been added to extract the heights of individual waves from a time series. This function uses zero up-crossing analysis to accurately measure wave heights, improving upon our previous methods which only provided the maximum value between up-crossings.

**Enhanced Zero Crossing Analysis**: Building on the above, the zero crossing code previously embedded in `global_peaks` has been isolated into a helper function. This modular approach not only refines the codebase but also supports new functionalities such as calculating wave heights, zero crossing periods, and identifying crests.

### Bug Fixes:
**Contour Sampling Error in Wave Contours**: A bug identified in `mhkit.wave.contours.samples_contour` has been resolved. The issue occurred when period samples defined using the maximum period resulted in values outside the interpolation range of the contour data. This was corrected by ensuring that all sampling points are within the interpolation range and adjusting the contour data selection process accordingly.

- #268 
- #252 
- #278


## Xarray Support
MHKiT functions now fully support the use of xarray for passing and returning data.

- #279 
- #282
- #285
- #302
- #310


## DOLfYN

Thanks to the many user contributions and users of MHKiT the DOLFYN module include a significant number of enhancements and bug fixes. 

### Enhancements:
**Altimeter Support**: Enhanced the Nortek Signature Reader to add capability for reading ADCP dual profile configurations.

**Data Handling Improvements**: Introduced logic to skip messy header data that can accumulate during measurements collected via Nortek software on PCs and Macs.

**Instrument Noise Subtraction**: Added a function to subtract instrument noise from turbulence intensity estimation using RMS calculations, providing results that differ by approximately 1% from the existing standard deviation-based "I" property.

**Improved File Handling**: Updates for RDI files to handle changing "number of cells" and variable "cell sizes," which are now bin-averaged into the largest cell size.

### Bug Fixes:
**Power Spectra Calculation**: Fixed a bug where a given noise value was not being subtracted from the power spectra, and noise was inadvertently added as an input to dissipation rate calculations.

**Improved Header Handling**: Allowed RDI reader to skip junk headers effectively.

**Nortek Reader C Types Update**: Adjusted C types in the Nortek reader to handle below-zero water temperatures and to allow pitch and roll values to go negative.


- #280 
- #289
- #290
- #292
- #293
- #294
- #299


## River & Tidal: D3D
Added limits to `variable_interpolation` and added 3 array input capability to `create_points`
- #271

## Developer Experience
### Black formatting
Black formatting is now enforced on all MHKiT files. This ensures consistent formatting across the MHKiT package.
- #281

### Linting & Type Hints
MHKiT is in the process of enforcing pylint and adding type hints to all functions. Currently this has been achieved and is enforced in the Loads and Power modules.
- #288 
- #296 

### CI/CD
This release introduces significant reduction in testing time for development. This is achieved by reducing the number of tests for pulls against the develop branch and only running hindcast test when changes are made to it. A bug in the hindcast CI was fixed which only ran on changes to the hindcast tests instead of the hindcast module. Additionally the wave and wind hindcast needed to be separated in 2 jobs due to the excessive time taken to run a wind cache. This created a number of follow on PRs around solidifying the logic of these job. A special case for Python 3.8, pip, and Mac OS was added to use homebrew to install NetCDF and HDF5 to get tests to pass.
- #241
- #270
- #306
- #311
- #317
- #318
- #319
- #320
- #324

### Clean Up
MHKiT fixed an implementation error where functions used assert instead of built in errors for type and value checking. Additionally these PRs removed unused files, fixed typos, and created an argument which allows users to run CDIP API calls silently.
- #276
- #272
- #273
@akeeste akeeste deleted the xarray_internal_river branch May 13, 2024 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants