BiG-CZ: Fetch time series data values from WDC services via WaterML #2238

Closed
aufdenkampe opened this issue Sep 7, 2017 · 14 comments
@aufdenkampe
Member

aufdenkampe commented Sep 7, 2017

All services that are discoverable in the CUAHSI Water Data Center (WDC) catalog are registered to provide data values in WaterML via WaterOneFlow (WOF) web services. We need to implement this capability to view time series data for both the BiG-CZ Portal and Monitor My Watershed.

There are 3 variants of WaterML (1.0, 1.1, 2.0). We will use 1.1, as it is most uniformly provided.

For documentation from WDC, see the "Time Series Data Storage and Transmission" section of this page: https://www.cuahsi.org/data-models/for-developers. These are the most useful PDFs from that help text:

Here are two prior notebooks that @emiliom created demonstrating the use of the Ulmo Python library for WOF/WaterML access of both site metadata and time series data:

cc: @rajadain, @emiliom, @lsetiawan

@rajadain rajadain added the BigCZ label Sep 7, 2017
@ajrobbins ajrobbins added the 1 label Sep 14, 2017
rajadain added a commit that referenced this issue Sep 14, 2017
 * Title now features location, which is extracted from the id
 * Source has service_org and service_title, with the description
   inside a popup
 * Instead of the calendar icon, we now use the text "Data
   collected on" or "Data collected between" based on if the end
   date is different from the beginning date
 * Sample mediums are listed
 * Source Data button is shown if details_url exists
 * Web Services button is shown pointing to the service url
 * Last collected value is shown in relative time, using a new
   toTimeAgo filter
 * A table of concept keywords is added
   - This currently only features the keyword name
   - Values and units will be pulled in the future, see #2238
 * There is comment placeholder for adding charts, coming in #2238
 * Citation in the bottom
 * Some style edits to make it all fit

The popover() and bootstrapTable() functions will only execute if
the template has any corresponding data-toggle elements. Thus,
they noop for CINERGI and HydroShare.
rajadain added a commit that referenced this issue Sep 15, 2017
@rajadain rajadain added queue and removed 1 labels Sep 21, 2017
@emiliom
Contributor

emiliom commented Sep 25, 2017

@rajadain, just a brief comment to help minimize future confusion, regarding this:

> There are 3 variants of WaterML (1.0, 1.1, 2.0).

1.1 and 1.0 are very similar. But 2.0 is a whole different standard, that borrows only conceptually from WaterML 1.x. Don't bother to even look it up for this phase of things.

> We will use 1.1, as it is most uniformly provided.

Yup.

@rajadain rajadain self-assigned this Sep 25, 2017
@rajadain
Member

Hi @emiliom, @aufdenkampe,

I've been making progress on this, and have gotten to the point where I'd like to fetch a small subset of values for a given site and variable. With the ulmo.cuahsi.wof.get_values call, I can't specify just the start date and get all values from then until the end. It seems like I need to specify both start and end dates. Unfortunately, it seems like the value of "end date" is different for every CUAHSI endpoint. For a demo of this, please see https://gist.github.com/rajadain/e117e3c0ce16552564e764de65b3de85, especially results Out[12], Out[15], and Out[17].

Essentially, this is the begin and end date as received from the GetSeriesCatalogInBox2 call:

Out[12]: ('2017-03-06T00:00:00', '2017-09-07T00:00:00')

This is the begin and end date from the ulmo.cuahsi.wof.get_site_info call:

Out[15]: ('2017-03-06T00:00:00', '2017-09-21T00:00:00')

This is the begin date (as per my request parameter, so ignore it) and end date from the ulmo.cuahsi.wof.get_values call:

Out[17]: ('2017-09-07T00:00:00', '2017-09-27T16:00:00')

Questions are:

  • Are these expected to be different?
  • Since we have a false end date from GetSeriesCatalogInBox2 (the value of 09/07/2017 from above), how can we fetch the "most recent values" for each variable for the table?
  • For the graphs, I was planning to use the end date from GetSeriesCatalogInBox2 as the end date, and then a week before that, a month before that, or more as the begin date based on chart settings. This will likely also drop some potentially more recent values. Is this alright?
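
The windowing plan in the last question can be sketched roughly as follows. This is only an illustration: chart_window, SPANS, and the span names are hypothetical, and a "month" is approximated as 30 days.

```python
from datetime import datetime, timedelta

ISO_FORMAT = '%Y-%m-%dT%H:%M:%S'

# Hypothetical chart settings mapped to look-back spans.
SPANS = {
    'week': timedelta(weeks=1),
    'month': timedelta(days=30),   # approximation, not a calendar month
    'year': timedelta(days=365),
}


def chart_window(end_date, span='week'):
    """Return (begin, end) ISO strings for a window ending at end_date."""
    end = datetime.strptime(end_date, ISO_FORMAT)
    begin = end - SPANS[span]
    return begin.strftime(ISO_FORMAT), end.strftime(ISO_FORMAT)


# End date as reported by GetSeriesCatalogInBox2 in Out[12].
window = chart_window('2017-09-07T00:00:00', 'week')
print(window)  # ('2017-08-31T00:00:00', '2017-09-07T00:00:00')
```

Any values newer than the reported end date fall outside this window, which is the drawback raised in the question.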

@emiliom
Contributor

emiliom commented Sep 27, 2017

Quick comment here, for now:

> With the ulmo.cuahsi.wof.get_values call, I can't specify just the start date and get all values from then until the end

ulmo.cuahsi.wof.get_values does allow you to not pass an end date (or pass None). That should fetch all data from the start date you passed, to the end of the time series. See https://github.com/ulmo-dev/ulmo/blob/master/ulmo/cuahsi/wof/core.py#L119

@rajadain
Member

Despite the documentation to the contrary, in practice this does not work. If I pass in None as the end date, I get this error:

---------------------------------------------------------------------------
WebFault                                  Traceback (most recent call last)
<ipython-input-8-492b7b2f08cf> in <module>()
      4                                            search_params['beginDate'],
      5                                            None,
----> 6                                            None)
      7 
      8 len(values['values'])

/Users/ttuhinanshu/scratch/model-my-watershed/ulmo/env/lib/python2.7/site-packages/ulmo/cuahsi/wof/core.pyc in get_values(wsdl_url, site_code, variable_code, start, end, suds_cache)
    177     response = suds_client.service.GetValues(
    178         site_code, variable_code, startDate=start_dt_isostr,
--> 179         endDate=end_dt_isostr)
    180 
    181     response_buffer = io.BytesIO(util.to_bytes(response))

/Users/ttuhinanshu/scratch/model-my-watershed/ulmo/env/lib/python2.7/site-packages/suds/client.pyc in __call__(self, *args, **kwargs)
    519         client = clientclass(self.client, self.method)
    520         try:
--> 521             return client.invoke(args, kwargs)
    522         except WebFault, e:
    523             if self.faults():

/Users/ttuhinanshu/scratch/model-my-watershed/ulmo/env/lib/python2.7/site-packages/suds/client.pyc in invoke(self, args, kwargs)
    579             timer)
    580         timer.start()
--> 581         result = self.send(soapenv)
    582         timer.stop()
    583         metrics.log.debug("method '%s' invoked: %s", self.method.name, timer)

/Users/ttuhinanshu/scratch/model-my-watershed/ulmo/env/lib/python2.7/site-packages/suds/client.pyc in send(self, soapenv)
    617             content = e.fp and e.fp.read() or ''
    618             return self.process_reply(reply=content, status=e.httpcode,
--> 619                 description=tostr(e), original_soapenv=original_soapenv)
    620         return self.process_reply(reply=reply.message,
    621             original_soapenv=original_soapenv)

/Users/ttuhinanshu/scratch/model-my-watershed/ulmo/env/lib/python2.7/site-packages/suds/client.pyc in process_reply(self, reply, status, description, original_soapenv)
    668                             "Reporting as an internal server error.", status)
    669                     if self.options.faults:
--> 670                         raise WebFault(fault, replyroot)
    671                     return (httplib.INTERNAL_SERVER_ERROR, fault)
    672             if status != httplib.OK:

WebFault: Server raised fault: 'Value cannot be null.
Parameter name: s'

I have updated the gist to demonstrate this: https://gist.github.com/rajadain/e117e3c0ce16552564e764de65b3de85

@rajadain
Member

Looking at the WSDL definition for a GetValues request:

<element name="GetValues">
    <complexType>
        <sequence>
            <element maxOccurs="1" minOccurs="0" name="location" type="string"/>
            <element maxOccurs="1" minOccurs="0" name="variable" type="string"/>
            <element maxOccurs="1" minOccurs="0" name="startDate" type="string"/>
            <element maxOccurs="1" minOccurs="0" name="endDate" type="string"/>
            <element maxOccurs="1" minOccurs="0" name="authToken" type="string"/>
        </sequence>
    </complexType>
</element>

While endDate has minOccurs=0, indicating that it may be omitted, it has type=string, indicating that if it is specified it should be a valid string and not null. I haven't worked much with suds, so I do not know whether it usually removes None values before sending or includes them as nil, but that seems to be the case.
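
To make the minOccurs=0 point concrete, here is a minimal sketch of a request body in which unset parameters are left out entirely rather than sent empty. It does not show what suds actually does. build_get_values is a hypothetical helper, the site and variable codes are placeholders, and the SOAP envelope and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Element order taken from the <sequence> in the WSDL excerpt above.
FIELDS = ('location', 'variable', 'startDate', 'endDate', 'authToken')


def build_get_values(**params):
    """Build a GetValues element, skipping parameters that are None."""
    root = ET.Element('GetValues')
    for name in FIELDS:
        value = params.get(name)
        if value is not None:
            # Only emit an element when there is a real string to send.
            ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding='unicode')


xml_body = build_get_values(location='NET:SITE', variable='NET:VAR',
                            startDate='2017-09-07T00:00:00', endDate=None)
print(xml_body)
```

With endDate=None the output contains no endDate element at all, which is what minOccurs=0 permits on the client side. Whether a given server accepts that is a separate question.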

@rajadain
Member

And we get the same error whether the values are specified as None or absent:

# Causes 'Value cannot be null' exception
values_to_end = ulmo.cuahsi.wof.get_values(sample.ServURL + '?WSDL',
                                           sample.location,
                                           variable,
                                           search_params['beginDate'],
                                           None,
                                           None)

# Causes 'Value cannot be null' exception
values_to_end = ulmo.cuahsi.wof.get_values(sample.ServURL + '?WSDL',
                                           sample.location,
                                           variable,
                                           search_params['beginDate'])

@emiliom
Contributor

emiliom commented Sep 28, 2017

@rajadain, sorry for my lag. I'll follow up around 11:20am ish PT

@emiliom
Contributor

emiliom commented Sep 28, 2017

Following up on the issue of failures with null endDate in ulmo.cuahsi.wof.get_values. Thanks for that extra research, @rajadain.

FYI, I have contributed to ulmo.cuahsi.wof.get_values in the past, and may have been the one who implemented the ability to handle None for beginDate and endDate, a couple of years ago. I use that capability in operational data harvesters that run at least hourly. That said ...

I think we're dealing with an implementation bug on the CUAHSI/WDC end 😞 The WSDL definition you reported, my understanding of WaterOneFlow ("WOF") GetValues requests, and my own usage over the years, all lead me to that conclusion.

If you look at the two sample notebooks of mine (that Anthony linked to at the start of this issue), you'll see that in both of them I use None end dates, explicitly or implicitly. The difference vs what you're dealing with is that those two examples use WOF endpoints other than the CUAHSI WDC endpoint; one of them uses a different WOF server software altogether.

My suggestion to mitigate this bug and move forward is to pass the current clock time (you can hard-wire Eastern time, for now) as the endDate.
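
A minimal sketch of that workaround, under a few assumptions: get_values_until_now is a hypothetical wrapper, "Eastern time" is hard-wired as a fixed UTC-5 offset (no daylight saving handling), and fetch stands in for ulmo.cuahsi.wof.get_values, called with the positional arguments shown in the traceback above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for "Eastern time" (no DST).
EASTERN = timezone(timedelta(hours=-5))


def get_values_until_now(fetch, wsdl_url, site_code, variable_code, start,
                         now=None):
    """Call fetch with the current clock time as an explicit end date."""
    if now is None:
        now = datetime.now(EASTERN)
    # Always send a real string so the server never sees a null endDate.
    end = now.strftime('%Y-%m-%dT%H:%M:%S')
    return fetch(wsdl_url, site_code, variable_code, start, end)
```

In use, fetch would be ulmo.cuahsi.wof.get_values. Passing it in as a parameter keeps the wrapper testable without network access.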

@emiliom
Contributor

emiliom commented Sep 28, 2017

> And we get the same error whether the values are specified as None or absent:

FYI, that's expected (specifying None or no value returns the same error), since ulmo.cuahsi.wof.get_values uses None as the default for endDate.

@rajadain
Member

Thanks for the background @emiliom. And yes, I did notice that you had added those bits to ulmo around four years ago 😃

The problem with using the current date, however, is for those services that no longer have current values, such as a hypothetical sensor with daily values until 2010. If we default to showing the most recent week of data from today, we won't have any for that sensor until one scrolls back to 2010. This is why I was going with 1 week of values from the reported endDate in GetSeriesCatalogInBox2.

Any recommendations for this?

@emiliom
Contributor

emiliom commented Sep 28, 2017

Regarding the different, inconsistent end timestamps: This is disappointing, but I strongly suspect this is the result of lagged, cached metadata being used in the responses to GetSeriesCatalogInBox2 and ulmo.cuahsi.wof.get_site_info (ultimately a CUAHSI/WDC WOF GetSiteInfo request). The long lag with the former is an issue we had raised with them, and they were working on it; I don't remember the last update from them, but I doubt the timestamp returned by that request will ever be within 24 hours of the actual end timestamp for high-frequency data streams that are active and transmitted in real time. But I would expect the response from the latter (GetSiteInfo) should have a minimal lag, if at all, and definitely within 24 hours!
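
For a sense of scale, the lags can be worked out from the three end timestamps reported earlier in this thread (Out[12], Out[15], Out[17]). This sketch assumes the get_values end timestamp is the true end of the series.

```python
from datetime import datetime

ISO_FORMAT = '%Y-%m-%dT%H:%M:%S'

# End timestamps reported earlier in this thread.
reported_ends = {
    'GetSeriesCatalogInBox2': '2017-09-07T00:00:00',
    'get_site_info': '2017-09-21T00:00:00',
    'get_values': '2017-09-27T16:00:00',
}

# Assumption: the last value returned by get_values marks the actual end.
actual_end = datetime.strptime(reported_ends['get_values'], ISO_FORMAT)
lags = {
    source: actual_end - datetime.strptime(end, ISO_FORMAT)
    for source, end in reported_ends.items()
}

for source, lag in lags.items():
    print(source, lag)
# GetSeriesCatalogInBox2 20 days, 16:00:00
# get_site_info 6 days, 16:00:00
# get_values 0:00:00
```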

(I should editorialize that in my opinion this is also an inherent limitation of the WaterML 1.x standard per se. It should allow for a flag for the end timestamp that is not a timestamp proper, but an indicator that the data stream is active and ongoing. But this comment is not helpful to you.)

> The problem with using the current date, however, is for those services that no longer have current values, such as a hypothetical sensor with daily values until 2010. If we default to showing the most recent week of data from today, we won't have any for that sensor until one scrolls back to 2010. This is why I was going with 1 week of values from the reported endDate in GetSeriesCatalogInBox2.

Yes, sorry, I'm getting to this. I was answering the question about the endDate errors with narrow blinders.

For services (or more specifically, sites/variables) whose latest data are "old" (say, at least a month old), I'm fairly confident GetSiteInfo and GetValues end timestamps will align. We can pick some examples later, to test this. The problem is with sites that are active monitoring sites and stream data regularly (at least daily). In that case, the GetSiteInfo end timestamp will probably be stale.

I gotta run now. My quick suggestion is this: Issue ulmo.cuahsi.wof.get_site_info and get its end timestamp. If it's more than 2 weeks old, use it as is; otherwise (the site is probably still active and the cached metadata lags), ignore it and use the current clock timestamp.
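
A rough sketch of this heuristic, read in light of the reasoning above: an end timestamp that is already old is trusted as is, while a recent one is assumed to lag behind a still-active stream and is replaced by the current clock time. choose_end_date and the threshold default are hypothetical names, not part of ulmo.

```python
from datetime import datetime, timedelta

ISO_FORMAT = '%Y-%m-%dT%H:%M:%S'


def choose_end_date(site_info_end, now=None, threshold=timedelta(weeks=2)):
    """Pick an end date for a GetValues request.

    site_info_end is the end timestamp from get_site_info (ISO string).
    """
    if now is None:
        now = datetime.now()
    reported = datetime.strptime(site_info_end, ISO_FORMAT)
    if now - reported < threshold:
        # Recent end timestamp: the site is probably still streaming and
        # the cached metadata lags, so ask for data up to the present.
        return now.strftime(ISO_FORMAT)
    # Old end timestamp: the series has likely ended, so trust it.
    return site_info_end


now = datetime(2017, 9, 28, 12, 0, 0)
print(choose_end_date('2017-09-21T00:00:00', now=now))  # 2017-09-28T12:00:00
print(choose_end_date('2010-06-01T00:00:00', now=now))  # 2010-06-01T00:00:00
```

The second call corresponds to the hypothetical sensor that stopped reporting in 2010: its window stays anchored to the last data rather than to today.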

@emiliom
Contributor

emiliom commented Sep 28, 2017

@rajadain, I'm back. I have up to 40 minutes (till 4:15pm ET), then I'll be busy again with a call and other things for the rest of the day. Let me know if you have a quick question or need a clarification that I can help with right now, that would help you today.

@rajadain
Member

Thanks @emiliom, I think I'm set for now, but will ping here when I run into a wall again. Thanks for all your support!

@emiliom
Contributor

emiliom commented Sep 28, 2017

Great. Happy to help.

rajadain added a commit that referenced this issue Oct 10, 2017
…lues-backend

BiG-CZ: Fetch CUAHSI Values using WaterML and Ulmo

Connects #2243
Connects #2238
rajadain added a commit that referenced this issue Oct 12, 2017
…lues-frontend

BiG-CZ: Fetch and Render CUAHSI Variable Values

Builds on #2353
Connects #2243
Connects #2238