Question: Using dataretrieval as searvey backend #59

Closed
SorooshMani-NOAA opened this issue Dec 12, 2022 · 6 comments

@SorooshMani-NOAA

Hi @thodson-usgs, we started exploring dataretrieval and soon realized there is a lot of metadata provided by the REST API that is discarded and not passed to the user when using get_record. This metadata says a lot about what type of data we're looking at (e.g. variable name, units), all of which get_record hides behind variable-code columns.
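
To make the kind of metadata in question concrete, here is a minimal sketch that pulls the variable name, unit, and site name out of an NWIS instantaneous-values JSON response. The field names (`value.timeSeries[].variable.variableName`, `unit.unitCode`, `sourceInfo.siteName`) reflect the WaterML-style JSON layout as I understand it and should be checked against a live response; the function name `extract_variable_metadata` and the sample payload are hypothetical and hand-written for illustration.

```python
from typing import Any


def extract_variable_metadata(payload: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Map each parameter code to its variable name, unit, and site name.

    Assumes the WaterML-style JSON layout; missing fields become empty strings.
    If several sites share a parameter code, the last one seen wins.
    """
    metadata: dict[str, dict[str, str]] = {}
    for series in payload.get("value", {}).get("timeSeries", []):
        variable = series.get("variable", {})
        codes = variable.get("variableCode", [])
        if not codes:
            continue  # no parameter code to key on, skip this series
        code = codes[0].get("value", "")
        metadata[code] = {
            "name": variable.get("variableName", ""),
            "unit": variable.get("unit", {}).get("unitCode", ""),
            "site": series.get("sourceInfo", {}).get("siteName", ""),
        }
    return metadata


# Hand-written sample trimmed to the fields used above; not real service output.
sample = {
    "value": {
        "timeSeries": [
            {
                "sourceInfo": {"siteName": "EXAMPLE RIVER AT EXAMPLE TOWN"},
                "variable": {
                    "variableCode": [{"value": "00060"}],
                    "variableName": "Streamflow, ft3/s",
                    "unit": {"unitCode": "ft3/s"},
                },
            }
        ]
    }
}

variable_info = extract_variable_metadata(sample)
```

A mapping like `variable_info` is what would let a downstream package rename a bare `00060` column to something readable.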

This made us (the searvey developers) wonder whether using the REST API directly might still make more sense. Per @brey's suggestion, we wanted to reach out and ask your opinion before moving ahead with our own implementation from scratch. For more information, please take a look at oceanmodeling/searvey#14 (comment).

A couple of points we're thinking about:

  • Is there any reason why most of that metadata is removed from the response?
  • What is the benefit to the user of dataretrieval over using the REST API directly, if the final data-frame columns need to be reorganized anyway (whether they come from REST or from dataretrieval)?

In summary, searvey is supposed to be a light wrapper around data-provider calls that [minimally] reorganizes the data and presents it to the end user. dataretrieval seems to have similar goals, but with a different way of transforming the data. That made us wonder whether it makes sense to use it to retrieve the data and then re-transform the data to our liking!

@elbeejay
Contributor

Hi @SorooshMani-NOAA, I'm obviously not Tim but I'll weigh in with some thoughts anyway.

"there are a lot of metadata provided by the REST API that is discarded and not passed to the user when using get_record"

This is true. The get_record function is a convenience wrapper around the individual functions for different types of data, such as get_iv for instantaneous-value data from NWIS. Each of these individual functions, such as get_iv, does bundle up and return the metadata associated with the NWIS API call (get_iv API reference documentation).
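
As a rough illustration of the difference, the sketch below calls get_iv and keeps both parts of its return value. It assumes `dataretrieval.nwis.get_iv` accepts `sites`, `parameterCd`, `start`, and `end` and returns a `(DataFrame, metadata)` pair, which matches the API reference as I read it; the wrapper name `fetch_iv` and its injectable `get_iv` argument are hypothetical, added only so the sketch can be exercised without network access.

```python
from typing import Any, Callable, Optional, Tuple


def fetch_iv(
    site: str,
    parameter_cd: str,
    start: str,
    end: str,
    get_iv: Optional[Callable[..., Tuple[Any, Any]]] = None,
) -> Tuple[Any, Any]:
    """Return (data, metadata) for one site instead of the data alone."""
    if get_iv is None:
        # Imported lazily so the sketch can be loaded without dataretrieval.
        from dataretrieval import nwis

        get_iv = nwis.get_iv
    # Assumption: get_iv returns a (DataFrame, metadata) pair.
    df, md = get_iv(sites=site, parameterCd=parameter_cd, start=start, end=end)
    return df, md


# Example call (needs network access and dataretrieval installed):
# df, md = fetch_iv("03339000", "00060", "2022-12-01", "2022-12-02")
# print(md.url)  # the query URL recorded on the metadata object
```

Which attributes the metadata object exposes beyond the query URL depends on the installed dataretrieval version, so it is worth inspecting the object before relying on any of them.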

To your other question, which is generally "what is the point of using dataretrieval rather than using the REST API directly?", I think it depends a bit on your end-use case and how much transformation of the data you want or need to do. The biggest benefit of depending on dataretrieval rather than pointing at the REST API directly is that we aim to maintain and update the API queries within the dataretrieval package. So if something changes on the REST API side, a package depending on dataretrieval should not have to make any big changes; on the dataretrieval side, we will try to provide deprecation warnings and backwards compatibility as much as possible. If you interface with the REST API directly, you have to keep an eye out for changes to the upstream service and modify your package to reflect them. So if the format of the dataretrieval returns is close to what you need, then I think it would make sense to use dataretrieval as a dependency. If your data format is drastically different from what dataretrieval returns (which is reasonably close to the return from the API queries), then you might want to query the REST API yourself.

I hope that helps and that I understood the gist of your questions correctly.

Jay

@SorooshMani-NOAA
Author

@elbeejay, thank you for the detailed explanation. I had missed that the metadata returned by get_iv can include more info such as parameter code!! I need to spend some more time to make sure I understand the full potential of dataretrieval.

On a side note, another concern that I/we have is how long it might take for bugs we notice to be fixed if we depend on dataretrieval as our upstream. I don't mean to ask for a commitment, but in general, is there active development on this repo to fix bugs that break downstream use cases? (I will create a ticket for the issue I noticed.)

@thodson-usgs
Collaborator

thodson-usgs commented Dec 13, 2022

@SorooshMani-NOAA,
@elbeejay summed it up nicely.
I think get_record should be deprecated or revised to return the metadata. I'll spare you the backstory. Most of the core users had shifted to the lower-level functions like get_iv, so it isn't ideal that I use it as an example in the README. I suppose I liked it because it was the simplest starting point for new users.

As for bugs and the time it takes to fix them, we do our best under the constraints we have. There are several other good Python packages for pulling USGS hydrologic data, but ours is a good place to contribute because dataretrieval is the only one supported directly by USGS (see the recent contributions by @elbeejay). From my perspective, dataretrieval's value is as a simple, dependable tool like the classic Unix command-line utilities: string them together to achieve great things. You can also wrap our APIs yourself or use more complicated tools; the choice depends on your needs. With any of these projects, including dataretrieval, the risk is that one day it will lose its community. Keeping things simple minimizes that risk, and in the meantime, I hope you find it as useful a tool as I have.

@SorooshMani-NOAA
Author

@thodson-usgs thank you for your take on this. As you know, our initial plan was to use dataretrieval in our package; after I explored using dataretrieval on its own a little bit, I wanted to play devil's advocate to see what makes sense for searvey. In the end, I obviously missed some of the metadata-related features, which led to this discussion.

I'll close this ticket now that I know your perspective on using dataretrieval. Thank you!

@elbeejay
Contributor

If you find a gap in what dataretrieval is offering, please let us know @SorooshMani-NOAA and we can work to either add a missing feature, or help you come up with a solution for your use-case. As @thodson-usgs mentioned, the aim is to keep this project a "simple dependable tool" and we expect to have the resources to maintain this going forward.

@SorooshMani-NOAA
Author

Sure, thank you!
