Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic NWM client tool with HTTP client tool #120

Merged
merged 24 commits into from
Aug 4, 2021

Conversation

jarq6c
Copy link
Collaborator

@jarq6c jarq6c commented Aug 3, 2021

The nwm_client subsumes the gcp_client and includes an additional interface to access NWM data served from generic web pages. This includes NOMADS, apache server directory listings, and python simple HTTP servers.

Additions

  • nwm_client subpackage
  • hydrotools.nwm_client.http module for accessing NWM NetCDF files from generic web servers

Removals

  • gcp_client tests. The functionality of this package is still tested under nwm_client

Changes

  • deprecated gcp_client. The same functionality is now available under hydrotools.nwm_client.gcp

Testing

  1. All old tests still intact and new minimal tests against NOMADS for nwm_client.http

Notes

  • default installation only includes http. Use pip install hydrotools.nwm_client[gcp] to install dependencies for Google Cloud Platform

Todos

  • Significant overlap in gcp and http in terms of parsing NetCDF files. Will want to thin out the codebase later.

Checklist

  • PR has an informative and human-readable title
  • PR is well outlined and documented. See #12 for an example
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (see CONTRIBUTING.md)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output) using numpy docstring formatting
  • Placeholder code is flagged / future todos are captured in comments
  • Reviewers requested with the Reviewers tool ➡️

@jarq6c jarq6c added the enhancement New feature or request label Aug 3, 2021
@jarq6c jarq6c self-assigned this Aug 3, 2021
@jarq6c
Copy link
Collaborator Author

jarq6c commented Aug 3, 2021

@aaraney may be interested in this added functionality

@jarq6c jarq6c merged commit 8ec5c87 into NOAA-OWP:main Aug 4, 2021
@jarq6c jarq6c deleted the http-client branch August 4, 2021 12:12
@aaraney
Copy link
Member

aaraney commented Aug 5, 2021

@jarq6c thanks for tagging me here. In general, I think the api looks good and provides a worthy alternative to gcp. Great stuff!

I have a few design suggestions that might be worth mulling over (mainly, with the future in mind). Both the hydrotools::nwm_client::gcp::NWMDataService and hydrotools::nwm_client::http::NWMDataService share a common interface (get) and have some code duplication (configurations, crosswalk, cache_path, cache_group, and max_processes).

On approach could be to implement and abstract base class that captures the common interface between the NWMDataService's, namely get, then implement a mixin class that adds the duplicated code mentioned above and have the NWMDataService's be based off of the abstract base class, lets call it AbstactDataService, and also the mixin (DataServiceMixin). However, I think this solution asks the NWMDataService's to do too much. In other words, I think the NWMDataService's have more than a single responsibility -- they have low cohesion.

I think a more durable solution might be to rethink the role of the NWMDataService itself. Personally, I would suggest we decoupled the data service code and end user API, where a data service is for example, gcp or nomads, and the end user API is for example, get. This might look like this following:

gcp_service = GCPDataService("nwm-bucket", "analysis_and_assimilation")

service = NWMDataService(gcp_service)

data = service.get(...)

So, instead the NWMDataService's dependency on some data service is inverted.

First thoughts to implementing something like this:

  1. Create an abstract base class that captures the API of a data service. (AbstractDataService)
  2. Implement/refactor concrete data services, so rip out the gcp and http code and refactor into interface of AbstractDataService. I still like the idea of using one or more mixin class(es) to capture the behavior of configurations, crosswalk, cache_path, cache_group, and max_processes. I think this design is more maintainable and leaves room for extension or exclusion of such features from services that don't need them.
  3. Refactor NWMDataService to basically consume a subclass of AbstractDataService and retrieve data. The NWMDataService should become a loose wrapper (in this implementation). It could also become a factory instead where you specify the source and instead, the NWMDataService creates a subclass of the AbstractDataService and spits out the output.

I think this design pattern is "more maintainable" and extendable. I think the extendable part is why it (could be) worth the labor. I am thinking in the future, there might be a slew of data sources/output specifications for different data coming from NGEN. The aforementioned design would reduce the labor required to add new data sources which might be desirable in the future. There are likely other reasons to spend the labor, but that is all I can think of at the moment.

As always, this is just a suggestion. This may be over engineering a solution to a problem that we will never have, but I thought I would mention it for the records sake and to spur some conversation.

@jarq6c
Copy link
Collaborator Author

jarq6c commented Aug 5, 2021

@aaraney

I agree in so far as the need to reduce code duplication (see the Notes section in the original PR comment). In the shorter term, I may implement a minimal Abstract class, but I'm not entirely familiar with all the design patterns you suggested and I'm concerned it may be too large a maintainability burden for me. For example, while trying to diagnosis #107 I found that _restclient and nwis_client are far more sophisticated than I had anticipated. Any future changes to those will have to rely on more competent developers. Things may change after AGU if we can generate more interest and attract additional developers.

Edit: Also, you're a prophet. The unified interface was partially motivated to start positioning the package for better integration with NGEN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants