R Jupyter notebooks with conda environment setup #7

Closed
emiliom opened this issue Oct 27, 2017 · 40 comments

Comments

@emiliom
Member

emiliom commented Oct 27, 2017

@benjamincrary: @aufdenkampe reminded us yesterday about the cool Jupyter notebook you put together a couple of months ago, using R to demo access to data available via CUAHSI WaterOneFlow/WaterML 1.1 services. For the mid-November BiGCZ-ODM2 workshop we're putting together, we'd like to explore using some version of this notebook (likely slimmer) as a tutorial/demo. Hopefully we can count on you for help!

But first, a question: did you run that notebook using an R environment you put together using conda? If so, can you share your steps or conda environment file with us, here, via this issue?

Thanks!
cc @lsetiawan

@ocefpaf
Member

ocefpaf commented Oct 28, 2017

@emiliom the key package needed for the R environment is https://github.com/ioos/notebooks_demos/blob/master/environment.yml#L80

If you want to toggle the kernel menu, so users can switch kernels using the Jupyter UI, you'll need nb_conda_kernels too. Then it is a matter of listing the R dependencies.

@emiliom
Member Author

emiliom commented Oct 28, 2017

Thanks @ocefpaf! I forgot that IOOS has done one or two R demo notebooks. For our reference, here are two such notebooks:
http://ioos.github.io/notebooks_demos/notebooks/2017-01-23-R-notebook/
http://ioos.github.io/notebooks_demos/notebooks/2017-08-01-xtractoR/

I think @lsetiawan has already included nb_conda_kernels. Though I don't see it in the current env (I only see ipykernel, don't know if it's related), the feature is working on our JupyterHub, and @lsetiawan has worked with it recently.

@lsetiawan
Member

I think @lsetiawan has already included nb_conda_kernels. Though I don't see it in the current env (I only see ipykernel, don't know if it's related), the feature is working on our JupyterHub, and @lsetiawan has worked with it recently.

@emiliom: I have explained this on another issue: nb_conda_kernels is installed in the root conda environment, and ipykernel is needed in the environment so that nb_conda_kernels can see it and automatically add it to the Jupyter notebook kernel choices.

@emiliom
Member Author

emiliom commented Oct 30, 2017

Thanks for the reference, @lsetiawan

@benjamincrary

No problem @emiliom, I'd be glad to help. @ocefpaf referenced the correct environment with conda. I'll pull together the required R packages and the appropriate installation source (CRAN or directly from github) and post them here shortly.

@lsetiawan
Member

I'll pull together the required R packages and the appropriate installation source (CRAN or directly from github) and post them here shortly.

Thanks @benjamincrary, that would be awesome if you could do that! I can then create a conda environment and implement it in our JupyterHub! What R version are you using?

@benjamincrary

benjamincrary commented Oct 30, 2017

R version 3.3.2 (2016-10-31)

@lsetiawan these are the packages (and sources) called in the notebook. All dependencies should be installed by default with the commands below. I recall rCharts creating some difficulty when getting this set up. I can test out the environment to see if everything is working if you need. Just let me know.

install.packages("devtools")
library(devtools) #load devtools to be able to use install_github
install.packages("tidyverse")
install_github("tidyverse/ggplot2")
install.packages("lubridate")
install_github("ramnathv/rCharts")
install.packages(CRAN)

@lsetiawan
Member

lsetiawan commented Oct 30, 2017

Great! Thanks @benjamincrary :)

UPDATE: I got an R environment with the same name as the python odm2client environment. I tested the notebooks, but I got lots of errors. Not sure how to debug in R. @emiliom should I allow @benjamincrary to try out jupyterhub for the R component? Thanks!

@benjamincrary

Hi @emiliom, I just want to give you a heads up that I am on vacation from 11/9-11/13 and won't be able to make updates to the Jupyter notebook during that time. If you can send me some feedback ahead of those dates, I'll gladly work on the changes ahead of my time off.

Thanks!

@aufdenkampe
Member

I just found out that the EnviroDIY WOFpy endpoints just went to production yesterday!
http://data.envirodiy.org/wofpy/

So now you can update and test those in your R notebook!

The Time Series Analyst (TSA) viewer is also available:
http://data.envirodiy.org/tsa/

@emiliom
Member Author

emiliom commented Nov 3, 2017

@benjamincrary thanks for the heads-up about your vacation (and congratulations again!).

Between now and then, please work with @lsetiawan to:

Thanks!

@emiliom
Member Author

emiliom commented Nov 3, 2017

UPDATE: I got an R environment with the same name as the python odm2client environment. I tested the notebooks, but I got lots of errors. Not sure how to debug in R. @emiliom should I allow @benjamincrary to try out jupyterhub for the R component?

@lsetiawan Yes, please give @benjamincrary JupyterHub access! Sorry for not responding to that comment earlier.

On a side note, I'm wondering if it makes more sense (and might minimize potential problems?) to have separate conda environments for Python and R? We don't have to make any decisions right now, but we should by the end of next week.

@lsetiawan
Member

I don't think it includes the WaterML R package yet

It does have this in there already.

Upload to the notebooks directory a slimmed down, tested version of your R notebook. Test it on our JupyterHub server with @lsetiawan's help.

@benjamincrary I have added you to the whitelist to use the jupyter notebook. Please go to http://jupyterhub.bigcz.org/

On a side note, I'm wondering if it makes more sense (and might minimize potential problems?) to have separate conda environments for Python and R? We don't have to make any decisions right now, but we should by the end of next week.

I did try doing this, but R environment didn't show up without a python environment. So I ended up with like conda environment python: envname and r: envname, and the python one basically not having any other packages other than the ipykernel that allows nb_conda_kernels to see the environment.

@emiliom
Member Author

emiliom commented Nov 3, 2017

I don't think it includes the WaterML R package yet

It does have this in there already.

I don't see anything obvious in the env file. But I don't know how R works, so I trust your tests that included the loading of WaterML R.

On a side note, I'm wondering if it makes more sense (and might minimize potential problems?) to have separate conda environments for Python and R?

I did try doing this, but R environment didn't show up without a python environment. So I ended up with like conda environment python: envname and r: envname, and the python one basically not having any other packages other than the ipykernel that allows nb_conda_kernels to see the environment.

I don't follow this. But we can talk about it next week.

@emiliom
Member Author

emiliom commented Nov 7, 2017

@benjamincrary Have you made progress in getting your R notebook to run on JupyterHub? With your leave coming up soon, it'd be great if we could have a working version on the repo (and tested on JupyterHub) soon.

@lsetiawan Please go ahead and split the single conda env yaml file in the repo, into the two env files we discussed last week (ie, move the R stuff to a new env file).

@lsetiawan
Member

@lsetiawan Please go ahead and split the single conda env yaml file in the repo, into the two env files we discussed last week (ie, move the R stuff to a new env file).

Will try to do that today. Lots of juggling, but this is definitely queued up on my list. Sorry for the lag 😞

@benjamincrary

Hi @emiliom I haven't made progress yet, but have carved out some time later this afternoon to do some upload testing and some trimming of the tutorial. I'll keep you updated.

@emiliom
Member Author

emiliom commented Nov 7, 2017

Thanks, @benjamincrary !

@aufdenkampe
Member

@emiliom, I just went over @benjamincrary's notebook in R:
https://github.com/BiG-CZ/BiG-CZ-Toolbox/blob/master/ipynotebooks/EnviroDIY_Rnotebook.ipynb
We were looking for places to trim content, at your suggestion, and I couldn't find anything obvious to cut. He divides the notebook into 3 sections, and the 3rd section is indeed "bonus" material, but it could easily be skipped.

Ben is about to upload an updated version, with the new live EnviroDIY endpoints and with some minor formatting improvements. After that goes up, let us know if there are any sections that you might suggest trimming.

@benjamincrary

@emiliom I'm still experiencing some inconsistency with the endpoints and certain WaterML requests.
For example,

  • GetSiteInfo() works with the SOAP 1.0 url but not with the SOAP 1.1 url
  • GetValues() works with the SOAP 1.1 url but not with the SOAP 1.0 url

For the time being, I can substitute the functioning url for each step in the notebook, but can you verify that the communications are okay from the endpoint perspective? I can also reach out to the WaterML author to see if there are any known bugs from the package side. I don't think this is a show-stopper in the short term, but it may be a point of confusion during the workshop.

@emiliom
Member Author

emiliom commented Nov 7, 2017

@benjamincrary Please sidestep any tests with WaterML 1.0. We're not guaranteeing that it still works, and there's no value in the context of this workshop to using it in addition to 1.1.

Last time I checked we had run solid tests against SOAP and REST 1.1. But the details and exact timing escape me. @lsetiawan, what do you remember? I think it was in the context of EnviroDIY, but it could've been with the Luquillo CZO endpoint instead. Can't remember.

@emiliom, I just went over @benjamincrary's notebook in R:
https://github.com/BiG-CZ/BiG-CZ-Toolbox/blob/master/ipynotebooks/EnviroDIY_Rnotebook.ipynb
We were looking for places to trim content, at your suggestion, and I couldn't find anything obvious to cut. He divides the notebook into 3 sections, and the 3rd section is indeed "bonus" material, but it could easily be skipped

The key question is what's the goal for that R notebook? Are we trying to flex and demo all kinds of WOF/WaterML access patterns ... or present a slimmed down version only as an entry point and nod to workshop participants who know no Python at all? My assumption (and vote) was the latter, and at the same time that we would have an analog Python notebook that may also mix and match WOF/WaterML results with direct ODM2 results, such as an odm2api query.

I haven't looked at the notebook very recently and won't have time to do it today. However, as long as we have a working notebook in place before @benjamincrary leaves, tested on JupyterHub, we can trim ourselves later as we see fit.

@benjamincrary

@emiliom I just committed an update to the notebook (https://github.com/BiG-CZ/BiG-CZ-Toolbox/blob/master/ipynotebooks/EnviroDIY_Rnotebook.ipynb), and everything is functional on JupyterHub!

There are still a couple of pings to the SOAP 1.0 url, but we can discuss what is best. The raw data comes from the SOAP 1.1 url. I still have a little bit of availability tomorrow to help trim down if we land on an approach. My opinion is that sections 1-2 are very easy to breeze through and will illustrate how to explore the CUAHSI data or point to a specific resource. Feedback is welcome.

@emiliom
Member Author

emiliom commented Nov 8, 2017

Thanks, @benjamincrary! That's great.

It sounds like you're no longer running into SOAP 1.1 problems, right? I hope so, and that's what I'm seeing in your notebook. To be extra sure, I've run on the EnviroDIY live endpoint a notebook that @lsetiawan and I have developed to test SOAP and REST 1.1 services from a WOFpy endpoint. Both SOAP and REST ran fine. If you're curious, see https://github.com/BiG-CZ/BiG-CZ-Toolbox/blob/master/ipynotebooks/WOFpy_ODM2_SOAPandREST_Tests.ipynb

I will push your notebook to the workshop repo and test it myself on JupyterHub tomorrow morning. I'm just tickled to see it in action.

In principle, I think we could nix all of section 1 (but retain some of the nice documentation text), though I personally like the TN vs TP plot. However, as far as I'm concerned, you're off the hook now! We don't need to know R to exercise selective surgery on the notebook as we decide what to drop, if anything. You have more important things to focus your attention on now!

@aufdenkampe
Member

@emiliom, Ben is still running into problems with SOAP 1.1. That's the only reason he's been using SOAP 1.0: SOAP 1.1 doesn't work with GetSites (and GetSiteInfo?). See the first block of code under the heading: 2. Querying and downloading EnviroDIY data.
GetValues does work with SOAP 1.1 in the R code.

Given that it all worked for you with Python, it is possible that the issue is with the WaterML R package. Also, I think Ben also had to use SOAP 1.0 for GetSites when pinging the CUAHSI HIS Catalog (http://hydroportal.cuahsi.org/). @benjamincrary, can you confirm?

@emiliom
Member Author

emiliom commented Nov 8, 2017

Given that it all worked for you with Python, it is possible that the issue is with the WaterML R package. Also, I think Ben also had to use SOAP 1.0 for GetSites when pinging the CUAHSI HIS Catalog (http://hydroportal.cuahsi.org/).

Yes, that's my hunch too (ie, a problem with WaterML R). We've done extensive testing on WOFpy 1.1 output; GetSiteInfo response is readily consumed by ulmo and also validates with the CUAHSI WaterML 1.1 XML Schema.

@lsetiawan
Member

everything is functional on JupyterHub!

Thanks @benjamincrary for testing this on jupyterhub. I'm glad all the packages work! 😄

@lsetiawan
Member

@aufdenkampe @emiliom Seems like GetSiteInfo works just fine on WaterML 1.1 from the CUAHSI HIS Catalog, but errors out from the EnviroDIY WOFpy endpoint.

[screenshot, 2017-11-08 09:45:21]
[screenshot, 2017-11-08 09:45:48]

@benjamincrary

Thanks, @lsetiawan. That is the same finding I had. Without this functionality, it's difficult to identify which variables are available at the site.

The WaterML R package was written with specific focus on the CUAHSI server. Is there another server we could ping using 1.1?

@emiliom
Member Author

emiliom commented Nov 8, 2017

Hmm. Can one of you paste here the full error (preferably as text rather than as a screenshot) starting with the line 1. GetSiteInfo(server, "wofpy:160065_Crosslands")? I'm guessing the following line (the one cut off in @lsetiawan's screenshot) will be very helpful. Thanks.

Looks like WaterML R must be making a 1.1 assumption that holds for CUAHSI WOF server responses but not for WOFpy responses, even though as far as we can tell WOFpy 1.1 responses are compliant. Tricky, but hopefully trackable, by comparing responses and looking at the WaterML R error message.

@lsetiawan
Member

[1] "downloading SiteInfo from: http://data.envirodiy.org/wofpy/soap/cuahsi_1_1/"
[1] "download time: 0.927999999999884 seconds, status: Success"

Error in data.frame(SiteName = rep(SiteName, N), SiteID = rep(SiteID, : arguments imply differing number of rows: 10, 1, 0
Traceback:

1. GetSiteInfo(server, "wofpy:160065_Crosslands")
2. data.frame(SiteName = rep(SiteName, N), SiteID = rep(SiteID, 
 .     N), SiteCode = rep(SiteCode, N), FullSiteCode = rep(paste(Network, 
 .     SiteCode, sep = ":"), N), Latitude = rep(as.numeric(Latitude), 
 .     N), Longitude = rep(as.numeric(Longitude), N), Elevation = rep(as.numeric(Elevation), 
 .     N), State = rep(State, N), County = rep(County, N), Comments = rep(Comments, 
 .     N), VariableCode = VariableCode, FullVariableCode = paste(Vocabulary, 
 .     VariableCode, sep = ":"), VariableName = VariableName, ValueType = ValueType, 
 .     DataType = DataType, GeneralCategory = GeneralCategory, SampleMedium = SampleMedium, 
 .     UnitName = UnitName, UnitType = UnitType, UnitAbbreviation = UnitAbbreviation, 
 .     NoDataValue = as.numeric(NoDataValue), IsRegular = IsRegular, 
 .     TimeUnitName = TimeUnitName, TimeUnitAbbreviation = TimeUnitAbbreviation, 
 .     TimeSupport = TimeSupport, Speciation = Speciation, methodID = MethodID, 
 .     methodCode = MethodCode, methodDescription = MethodDescription, 
 .     methodLink = MethodLink, sourceID = SourceID, organization = Organization, 
 .     sourceDescription = SourceDescription, citation = Citation, 
 .     qualityControlLevelID = QualityControlLevelID, qualityControlLevelCode = QualityControlLevelCode, 
 .     qualityControlLevelDefinition = QualityControlLevelDefinition, 
 .     valueCount = ValueCount, beginDateTime = as.POSIXct(strptime(BeginDateTime, 
 .         "%Y-%m-%dT%H:%M:%S")), endDateTime = as.POSIXct(strptime(EndDateTime, 
 .         "%Y-%m-%dT%H:%M:%S")), beginDateTimeUTC = as.POSIXct(strptime(BeginDateTimeUTC, 
 .         "%Y-%m-%dT%H:%M:%S")), endDateTimeUTC = as.POSIXct(strptime(EndDateTimeUTC, 
 .         "%Y-%m-%dT%H:%M:%S")), stringsAsFactors = FALSE)
3. stop(gettextf("arguments imply differing number of rows: %s", 
 .     paste(unique(nrows), collapse = ", ")), domain = NA)
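
For readers less familiar with R, the failure mode in this traceback can be sketched in Python. This is only a rough analogue of how R's data.frame() reacts to columns of unequal length, not the WaterML R code. The build_table helper, the column names chosen, and their values are hypothetical, picked just to reproduce the 10, 1, 0 pattern from the error message.

```python
def build_table(columns):
    """Assemble columns into rows, loosely mimicking R's data.frame().

    Length-1 columns are repeated to the longest length (R recycles them);
    any other mismatch, including an empty column, raises ValueError.
    """
    lengths = [len(values) for values in columns.values()]
    n_rows = max(lengths, default=0)
    bad = [n for n in lengths if n != n_rows and n != 1]
    if bad:
        # Report each distinct length once, in first-seen order.
        distinct = ", ".join(str(n) for n in dict.fromkeys(lengths))
        raise ValueError(f"columns imply differing number of rows: {distinct}")
    expanded = {
        name: list(values) * n_rows if len(values) == 1 and n_rows != 1 else list(values)
        for name, values in columns.items()
    }
    return [dict(zip(expanded, row)) for row in zip(*expanded.values())]


# Hypothetical input: ten series at one site, with one field that parsed to nothing.
columns = {
    "VariableCode": [f"var_{i}" for i in range(10)],  # hypothetical codes
    "SiteName": ["Crosslands"],                       # hypothetical site-level value
    "TimeUnitName": [],                               # nothing was found in the response
}

message = None
try:
    build_table(columns)
except ValueError as err:
    message = str(err)
print(message)
```

A single empty column is enough to make the whole table fail, which is why one element missing from a response can break an otherwise successful download.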

@lsetiawan
Member

One thing I noticed that WOFpy doesn't have within GetSiteInfo is the information about timeScale. It is currently:

<timeScale/>

While CUAHSI's is:

<timeScale isRegular="true">
    <unit>
        <unitName>hour</unitName>
        <unitType>Time</unitType>
        <unitAbbreviation>hr</unitAbbreviation>
        <unitCode>103</unitCode>
    </unit>
    <timeSupport>0</timeSupport>
</timeScale>
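
To make the difference between the two fragments above concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. It is an assumption-laden illustration, not how WaterML R or ulmo parse responses: the function names are hypothetical, and XML namespaces, which real WaterML 1.1 responses carry, are left out because the fragments quoted here show none.

```python
import xml.etree.ElementTree as ET

# The two <timeScale> fragments quoted in this comment.
WOFPY_TIMESCALE = "<timeScale/>"
CUAHSI_TIMESCALE = """
<timeScale isRegular="true">
    <unit>
        <unitName>hour</unitName>
        <unitType>Time</unitType>
        <unitAbbreviation>hr</unitAbbreviation>
        <unitCode>103</unitCode>
    </unit>
    <timeSupport>0</timeSupport>
</timeScale>
"""


def time_unit_names(xml_text):
    """Return the text of every unit/unitName found under <timeScale>."""
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.findall("./unit/unitName")]


def time_unit_names_tolerant(xml_text, default=None):
    """Same lookup, but substitute a placeholder when the element is empty."""
    names = time_unit_names(xml_text)
    return names if names else [default]


print(time_unit_names(CUAHSI_TIMESCALE))          # one unit name found
print(time_unit_names(WOFPY_TIMESCALE))           # empty list: zero values
print(time_unit_names_tolerant(WOFPY_TIMESCALE))  # placeholder instead of nothing
```

A client that expects one value per series gets zero values from the empty element. The tolerant variant shows one way a client could cope with that, by filling in a missing value rather than failing.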

@emiliom
Member Author

emiliom commented Nov 8, 2017

Thanks, @lsetiawan!

@emiliom
Member Author

emiliom commented Nov 8, 2017

I think <timeScale> was deprecated in 1.1, but we'll have to look into it (again, our WOFpy responses validate against the official WaterML 1.1 XSD Schema!). Still, good catch.

Also, GetSites 1.1 response is failing, and that one doesn't have the timeScale or timeSupport info; but the 1.0 response has a timeZoneInfo that's not found in our 1.1 ...

@lsetiawan
Member

GetSites 1.1 is not failing on my end.

[screenshot, 2017-11-08 11:42:08]

@emiliom
Member Author

emiliom commented Nov 8, 2017

GetSites 1.1 is not failing on my end.

Thanks so much for testing that!! I assumed it was, b/c @benjamincrary was using 1.0 for that request. Great to know that we can narrow down the problem to GetSiteInfo. @lsetiawan, I suspect your finding about timeScale is the source of the problem, though there might be an issue with WOFpy not providing begin and end DateTime in UTC as well (in addition to the local-timezone versions, which WOFpy provides). I've only checked the REST responses.

@aufdenkampe
Member

aufdenkampe commented Nov 8, 2017

@benjamincrary just headed out for a long weekend off for his wedding!

These are his notes from his desk, all using the SOAP 1.1 endpoints.

Request          CUAHSI   WOFpy
GetSites()       yes      yes
GetSiteInfo()    yes      no
GetVariables()   yes      yes
GetValues()      yes      yes

@emiliom
Member Author

emiliom commented Nov 8, 2017

Thanks, @aufdenkampe. @lsetiawan and I are on top of this.

FYI, as we've reported above, we think we've found the source of the "problem", and we have a strategy. Don is working on it right now. But WOFpy is not doing something illegal/non-compliant per se; it's doing something that is allowed by the CUAHSI WaterML 1.1 XSD schema (not including timescale/timeunits info in GetSiteInfo), but probably not handled by clients due to hard-wired expectations. ulmo handles that absence gracefully (again, compliant with the XSD schema), and WaterML R fails b/c it doesn't handle it gracefully.

Don is adding the timescale/timeunit elements to the GetSiteInfo response right now. This addition will make for a more informative -- but slower -- GetSiteInfo response.

@emiliom
Member Author

emiliom commented Nov 8, 2017

@lsetiawan has addressed the timescale issue, and his PR has been merged into master. But WaterML R consumption of WOFpy SOAP 1.1 GetSiteInfo for EnviroDIY still throws an error. We don't understand it. We've examined SOAP 1.0 and 1.1 responses and are seeing some odd things, yet, like I've said, we have no problem ingesting all the 1.1 responses via ulmo and the Python SOAP-handling library it uses, suds-jurko.

This is not something we can address this week. We'll have to ask the CUAHSI WDC gang at the workshop to help us debug.

@aufdenkampe
Member

Agreed! This can be addressed some other time. Thanks for all your hard work on this.

@emiliom
Member Author

emiliom commented Nov 18, 2017

Opened WOFpy issue ODM2/WOFpy#217 to track the remaining, puzzling SOAP 1.1 problem.

Closing this issue. Everything else here was addressed. R notebook works nicely.

@emiliom emiliom closed this as completed Nov 18, 2017