A next-generation GUI for visualizing big gridded data in Python #14

Open
rsignell-usgs opened this issue Feb 11, 2019 · 69 comments

@rsignell-usgs
commented Feb 11, 2019

ESIP Member Organization Name

US Geological Survey

Mentors

Rich Signell, USGS @rsignell-usgs
Martin Durant, Anaconda @martindurant

Project Idea

CF-Compliant ndarray datasets can be visualized in tools that understand the NetCDF Data Model and the geospatial metadata they contain. While stand-alone tools like Panoply are often used on the desktop, the goal here is to develop a tool for visualizing and interacting with these datasets in Jupyter Notebooks using xarray, dask and the pyviz collection of widgets, rendering and layout tools.

2019-03-06_15-30-42

Idea Title

A next-generation GUI for visualizing big gridded data in Python

Abstract

Several important advances now make it possible to render big gridded data from many different sources in the browser with a common Python-based toolset. The first is the CF conventions, which allow unambiguous identification of coordinate and data variables; the second is xarray, which represents the CF data model in Python; and the third is the pyviz collection of tools, which provides rendering of massive gridded data, widgets to control data selection, and tools to lay out widgets and data displays in the browser.

Technical Details

The GSoC student would build a dashboard using panel that allows the user to specify an intake catalog containing CF-compliant datasets that can be loaded into xarray. The user would select a dataset, which would then display a list of the variables it contains. Selecting a variable would trigger xarray to read it and hvplot to display it on a map. Depending on the variable's dimensions, widgets for time steps and/or vertical levels would be displayed, allowing the user to choose the index or coordinate values to plot.
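
As a rough illustration of the last step, the set of extra widgets can be derived from a variable's dimension names. This is a minimal sketch in plain Python, not the project's implementation: the helper `extra_widget_dims`, the dimension names and the sizes are all hypothetical, and a real dashboard would take the dimensions from an xarray variable and build Panel widgets from the result.

```python
from typing import Dict, Sequence, Tuple


def extra_widget_dims(
    dims: Sequence[str],
    sizes: Dict[str, int],
    horizontal: Tuple[str, str] = ("lat", "lon"),
) -> Dict[str, range]:
    """Return {dimension: valid index range} for every non-map dimension.

    Each entry corresponds to one selection widget (for example a slider)
    that the dashboard would need to show for this variable.
    """
    missing = [d for d in horizontal if d not in dims]
    if missing:
        raise ValueError(f"variable lacks horizontal dimension(s): {missing}")
    return {d: range(sizes[d]) for d in dims if d not in horizontal}


# Hypothetical dimension sizes for an example dataset.
sizes = {"time": 24, "depth": 10, "lat": 180, "lon": 360}

# A 4-D variable needs a time widget and a vertical-level widget.
widgets_4d = extra_widget_dims(("time", "depth", "lat", "lon"), sizes)

# A 2-D variable can be mapped directly, so no extra widgets are needed.
widgets_2d = extra_widget_dims(("lat", "lon"), sizes)
```

Here the horizontal dimension names are passed in explicitly; working them out automatically from the metadata is a separate problem.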

Bells and whistles like colormap selection and different types of visualization techniques could be added as time permits.

The work will be coordinated with the overarching "Next-Generation GUI" effort of @martindurant.

The development can be done in a Pangeo environment running on the Amazon cloud, which means the student will not have to run anything or install any software locally -- they will only need their browser and a GitHub account.

Helpful Experience

Experience with Python is required; experience with xarray, pyviz and GitHub is a big plus.

First steps

Video:

Getting Started:

@rsignell-usgs changed the title from "Develop interactive viewer for CF-compliant gridded datasets in the Pangeo environment" to "A next-generation GUI for visualizing big gridded data in Python" on Feb 11, 2019

@abburgess added the GSoC 2019 label on Feb 11, 2019

@sugam45

commented Feb 14, 2019

Hi, @rsignell-usgs,
I am very interested in working on the project and making a contribution to it.
Thank You!

@rsignell-usgs

Author

commented Feb 14, 2019

@sugam45, excellent. What is your experience with these components thus far?

@abburgess

Member

commented Feb 14, 2019

@hdsingh

commented Feb 18, 2019

I got to know about Pyviz through this issue. Pyviz is just absolutely amazing!!!
It makes visualising so much simpler, better and more interesting compared to plain matplotlib, seaborn or other plotting libraries. Thanks a lot for creating this issue and providing such superb resources!!

I would love to contribute to this project for GSOC. I have gone through all the videos mentioned above and finished pyviz tutorials.

There is also personal motivation in this project for me, since I have several months of minute-level stock market data for which I wanted to design a dedicated interactive GUI in Python. This project is very much along similar lines.

This is the task list I have prepared for myself to get started:

  1. Understand Holoviews by doing some more similar plotting tasks.
  2. Understand internals of Intake in depth.
  3. Understand xarray and dask in depth.
  4. Start using Pangeo to understand it better.
  5. Prepare a simple GUI for stock data which will help me to understand Pyviz and other packages in much depth and detail.

I really wish that ESIP gets selected!!

@hdsingh

commented Feb 27, 2019

Congratulations to ESIP on being selected for GSoC 2019!!!

@rsignell-usgs

Author

commented Feb 27, 2019

Great news indeed!

@sugam45

commented Feb 28, 2019

The pangeo.pydata.org site is showing the error "Server Not Found". I think they have closed access to it.
screenshot_2019-03-01 deployments pangeo documentation

@rsignell-usgs

Author

commented Feb 28, 2019

Yes, that pangeo instance is gone. Leaving it wide open was causing the research credits to be depleted too quickly. You can use the pangeo binder though. Try clicking the launch binder button here, for example: https://github.com/reproducible-notebooks/HRRR_Dashboard

@kunakl07

commented Mar 2, 2019

@rsignell-usgs, I would be thrilled to contribute to this project, as I have done past data visualization projects using Matplotlib, Seaborn, Plotly and Geoplot, and have also used machine learning libraries like scikit-learn, TensorFlow, Keras, Categorical Boost and PyBrain.

@veer11997

commented Mar 4, 2019

I am looking to work on this project with the organization and contribute to open source.

@parthpm

commented Mar 4, 2019

@rsignell-usgs I am deeply interested in this project. I have done past projects using well-known plotting libraries in Python. I have gone through the first steps section and am looking forward to contributing.

@rsignell-usgs

Author

commented Mar 4, 2019

I'm glad to see all the interest here! This will be a very interesting and rewarding project!

@kunakl07

commented Mar 4, 2019

Sir @rsignell-usgs, can I contribute to this project?
And where can I find the dataset?
Or should I download a dataset for some regions from NCDC and show you the result?

@rsignell-usgs

Author

commented Mar 4, 2019

For reproducibility, better to load data directly from a CF-compliant OPeNDAP endpoint or an S3 bucket (e.g. zarr format), like in:
https://github.com/reproducible-notebooks/HRRR_Dashboard
Binder

@kunakl07

commented Mar 4, 2019

Thanks sir,@rsignell-usgs!!

@rsignell-usgs

Author

commented Mar 5, 2019

Part of this project will be figuring out which dimensions of an xarray dataset are lon, lat and time, instead of assuming that they come in a certain order, as the HRRR_Dashboard referenced above does. The MetPy package can do this:
https://stackoverflow.com/questions/53469510/how-to-identify-time-lon-and-lat-coordinates-in-xarray/
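
To make the idea concrete, here is a minimal sketch of classifying coordinate variables from their CF attributes (`units`, `standard_name`, `axis`). It is not MetPy's implementation, which handles more cases; the helper `classify_coord` and the example coordinate names and attributes are hypothetical, and plain dicts stand in for the `attrs` of xarray coordinates.

```python
from typing import Dict, Optional

# Unit strings the CF conventions accept for latitude and longitude.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}


def classify_coord(attrs: Dict[str, str]) -> Optional[str]:
    """Guess whether a coordinate is 'lat', 'lon' or 'time' from CF attributes.

    Returns None when the attributes do not identify one of those roles.
    """
    units = attrs.get("units", "")
    standard_name = attrs.get("standard_name", "")
    axis = attrs.get("axis", "").upper()
    if standard_name == "latitude" or units in LAT_UNITS:
        return "lat"
    if standard_name == "longitude" or units in LON_UNITS:
        return "lon"
    # CF time units look like "<period> since <reference date>".
    if standard_name == "time" or axis == "T" or " since " in units:
        return "time"
    return None


# Hypothetical coordinate attributes, as they might appear in a model output file.
coords = {
    "lat": {"units": "degrees_north"},
    "lon": {"standard_name": "longitude", "units": "degrees_east"},
    "ocean_time": {"units": "seconds since 1858-11-17 00:00:00"},
    "sigma": {"units": "1", "positive": "up"},
}
roles = {name: classify_coord(attrs) for name, attrs in coords.items()}
```

The `axis` attribute is deliberately not used for lat and lon here, since on projected grids the X and Y axes are not longitude and latitude.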

@kunakl07

commented Mar 6, 2019

Hi @rsignell-usgs ,
With the help of the link that you provided, I am able to visualize the interactive map. Should my map also show the latitude and longitude as the mouse hovers over the region?

@rsignell-usgs

Author

commented Mar 6, 2019

Yes, it should as long as the hover control is selected. Try running this example:
Binder
Note also that we currently need conda packages from development channels to make this work:
https://github.com/reproducible-notebooks/HRRR_Dashboard/blob/master/environment.yml

@kunakl07

commented Mar 6, 2019

Cool
Thanks sir
And sir, Colab doesn't support pyviz, so we need to work in Jupyter Notebook only.

@martindurant

commented Mar 8, 2019

Signing up to this conversation. I am heading up Intake at Anaconda and involved in our development of the GUI for it.

@kunakl07

commented Mar 9, 2019

Hi @rsignell-usgs ,
As the initial task is to predict weather conditions, I have taken weather data for 5 cities (NCDC).
Here, we are trying to predict the weather for the Danish city "Odense" 24 hours into the future, given the current and past weather data from 5 cities.
The cities are 'Aalborg', 'Aarhus', 'Esbjerg', 'Odense' and 'Roskilde'.
bandicam 2019-03-09 03-08-52-295

I am using a Recurrent Neural Network (RNN) because it can work on sequences of arbitrary length.
We are going to predict temperature, pressure and wind speed on a test set that the model has never seen during training.
bandicam 2019-03-09 03-14-19-331

It worked reasonably well for predicting the temperature where the daily oscillations were predicted well, but the peaks were sometimes not predicted so accurately.
bandicam 2019-03-09 03-14-47-844

The atmospheric pressure was also predicted reasonably well, although the predicted signal was more noisy and had a short lag.

bandicam 2019-03-09 03-42-56-922

The wind speed could not be predicted very well, and I think improving the dataset would lead to better predictions.
bandicam 2019-03-09 03-14-56-672

@kunakl07

commented Mar 9, 2019

@rsignell-usgs,
Sir, sorry for responding late; filtering the data from NCDC took a lot of my time.
Would the next step be plotting these points on a map, so that hovering the mouse over these coordinates gives the predicted climate conditions along with longitude and latitude?
Sir, please correct me where I am going wrong and tell me which tasks should be accomplished next.
Thank you.

@rsignell-usgs

Author

commented Mar 10, 2019

@kunakl07, predicting weather via neural nets is cool, but we were intending the focus of this GSoC project to be more on making a general GUI for exploring n-dimensional data on a map.

I think a reasonable initial target would be to build off/improve the examples
https://github.com/reproducible-notebooks/HRRR_Dashboard
https://github.com/reproducible-notebooks/COAWST-ROMS_Dashboards
and try to make the GUI more general, for example, automatically handling a heterogeneous collection of model output such as in the Pangeo Intake catalog:
https://pangeo.io/catalog.html
https://github.com/pangeo-data/pangeo/raw/master/gce/catalog.yaml
(the zarr datasets in this catalog are publicly readable, but not the netcdf files)

An additional step would be to allow the user to extract and plot a time series at a location selected on the map.

Does that make sense?

@kunakl07

commented Mar 10, 2019

Yes @rsignell-usgs, I got your point.
I was going off track; thanks for bringing me back on the right track. My main focus will now be creating a general GUI that can show various dimensions on the map, and I will start by improving this.

@martindurant

commented Mar 10, 2019

Note for the pangeo Intake catalog linked above: you can also see a nicely rendered version of it at https://pangeo.io/catalog.html .

From the point of view of the visualisation/GUI, you probably need to know nothing about the catalog; I imagine the interface would be passed an xarray object, which could come from anywhere. The HRRR Dashboard, above, does not happen to use Intake.

@kunakl07

commented Mar 14, 2019

Hi @rsignell-usgs ,
bandicam 2019-03-14 08-42-01-823
This is an interactive GUI of New York taxi trips, created using HoloViews, GeoViews, Dask, Datashader and Bokeh.
Hovering over the image shows the plot_x and plot_y coordinates.
The data is loaded from a Parquet file; pickups and drop-offs are aggregated into xarray arrays, and the regions with more pickups and drop-offs are shaded accordingly.
@martindurant,
Longitude-Latitude
An interactive GUI that displays latitude and longitude as we hover over the image.
It is created using only Bokeh and the Google API.

If any other GSoC participant would like the code for the above GUIs, I would be happy to share it.

@rsignell-usgs

Author

commented Mar 18, 2019

Folks, @martindurant and I had a chat and we will be creating a repo with a toy dashboard panel problem that you will be able to work on to show your familiarity with widget interaction. Should be ready within a few days.

@rsignell-usgs

Author

commented Apr 1, 2019

@hdsingh , probably the only functionalities that we could use from ncview (that are not already in panoply) are:

  • extraction of time series, with ability to overlay several time series on same plot
  • quick animation of the 2D maps over a specified 3rd dimension (frequently time)
@martindurant

commented Apr 1, 2019

@hdsingh : to be sure, there are many viewers of particular types of data in particular domains, and it is interesting to consider why some have become ubiquitous and others not, which doesn't necessarily map well to range of functionality. It can be that a viewer has only one feature lacking in others, but the users consider it so important that it compensates for other features being poor or absent.

For instance, in my former astronomy career, almost all raw image viewing happened in "ds9", which understands world coordinates well, and has a whole host of keyboard-interactive astro tools written for it.

Meanwhile, my former medical imaging old-guard colleagues were all using imageJ, which has a terrible interface.

Both may be of interest as inspiration/curiosity even when talking geospatial, and we certainly don't want to replicate something that doesn't work well. However, and unfortunately, adoption is much easier with something that is familiar to the community, so you can get further, faster, by not straying too far... Don't forget that there is always a lot of inertia when people have a tool that they know and that gets the job done. It's when they find they can't do something that they look around for alternatives.

@hdsingh

commented Apr 1, 2019

@rsignell-usgs I will figure out a way to implement these functionalities and add these in the proposal.
@martindurant Thank you sir, for your valuable insights. I will design the interface keeping these things in mind.

The interface will be designed to integrate the best features of Panoply and ncview. I will also look into other software available for NetCDF and geodata visualization and see which features differentiate them from the others.

@abburgess

Member

commented Apr 2, 2019

Hi all - a friendly reminder that there is ONE WEEK LEFT to submit your proposals for this project! Best of luck and we're excited to see what is submitted!

@hdsingh

commented Apr 2, 2019

The Pyviz community is really amazing! Within a few minutes of discussion I found a way to maximise the screen area for the dashboard.

@rsignell-usgs

Author

commented Apr 2, 2019

@hdsingh, glad to hear it!

@hdsingh

commented Apr 4, 2019

@rsignell-usgs @martindurant I updated my proposal according to our discussion. I also completed the remaining sections and highlighted important improvements. Can you please have a look and let me know what else could be changed?
Thanks a lot!!

@martindurant

commented Apr 4, 2019

The proposal is certainly ambitious! I would be very glad to see it all come to fruition.

I promise to have more for you by tomorrow, but a few stylistic initial thoughts:

  • you should capitalise all of the names, such as Intake, Python, Panel and so on
  • you should provide links for each of these external things (there are many!) the first time it is mentioned outside of abstract. You might consider a glossary section with brief descriptions of all of the components/libraries you refer to.
@hdsingh

commented Apr 5, 2019

I modified the proposal according to the above recommendations and also added a Time Availability section. I am waiting for more feedback.

@rsignell-usgs @martindurant Thanks a lot for the review!

@martindurant

commented Apr 5, 2019

I think that the abstract can use a little work; it is the most important part of the proposal.

> have limited functionality and can only be reasonably extended by the software developers on those projects

"limited functionality" is very vague. Rather, you can say that users wish to apply complex analysis methods to the data they are visualising, which is something you can only do in combination with the python-data stack.
Secondly, the latter statement about developers is true for all open-source projects; anyone who contributes to a project is then a developer of that project. I'm not quite sure what you are driving at - is it that you could fairly embed your GUI within other things like Intake's interface, or use it in conjunction with other Python tools like Dask?

> controlled data selection

probably: data-set selection and control of fields for plotting?

> Currently only stand-alone applications

Is this strictly true? Pangeo is indeed exploring geospatial data in python already, just not as nicely as this GUI would be. You could instead say that the majority of geospatial data visualisation happens in these apps.

> written in Python, but experienced as a dashboard in the browser

"experienced" -> presented
"the browser" -> as a standalone browser app or in a notebook environment

> ..and holds the promise ... explore data

This should be at least two sentences

"Cloud" does not need to be capitalised :)

That's it, good luck!

@hdsingh

commented Apr 6, 2019

  1. limited functionality - resolved
  2. can only be reasonably extended by the software developers on those projects - Provided example of Dask
  3. controlled data selection - meant to say different variables, indexes at particular timestamps or locations, so change to controlled data points selection
  4. All other comments - resolved

I have re-read the abstract so many times that it is becoming difficult for me to judge its quality. Can you please look at it one last time before I submit the proposal and let me know?

Thanks a lot!

@martindurant

commented Apr 6, 2019

Sorry, I'm travelling today, I probably won't be able to comment any more.

@hdsingh

commented Apr 7, 2019

@rsignell-usgs
I am trying to extract and plot a time series at a location selected on the map using GreatLakes_Dashboard

I extracted the location from map using streams.Tap. The location returned is in lat and lon however dimensions in xarray dataset are nx, ny. The timeseries could be plotted in relation to nx ny by ds.sel(nx =10,ny = 10,sigma = 0.0).air_u.hvplot(). However while passing lat lon values to ds.where(((ds.lat==43.5882) & (ds.lon==-77.9989)) ,drop=True) nothing is returned. I have tested this for multiple values of lat lon.

Can you please give me some more details regarding how to correctly extract time series data w.r.t lat lon?

Screenshot from 2019-04-08 00-17-33

@rsignell-usgs

Author

commented Apr 7, 2019

@hdsingh, while xarray can easily extract a time series at a point when the lat and lon coordinates are one-dimensional, when the lat and lon coordinates are two-dimensional, as in this case, you need to bring in some other tools. A tree approach is probably best, like: https://stackoverflow.com/a/40044540

@hdsingh

commented Apr 8, 2019

Sincere thanks and gratitude to @rsignell-usgs @martindurant @jsignell for your guidance and support during the proposal preparation phase. I feel that because of this project I have learnt so much interesting data visualization things and I am excited to learn more. I have submitted the proposal.
Thanks a lot for listing this project in GSOC!

I am checking out the above approach and will let you know soon.

@hdsingh

commented Apr 9, 2019

I added a feature to extract a time series at a particular location in GreatLakes_Dashboard. I tried an np.unravel_index based approach first and it seems to be working fine.
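
For readers following along, an np.unravel_index based lookup on two-dimensional lat/lon arrays might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in GreatLakes_Dashboard: the helper `nearest_index` and the small synthetic grid are hypothetical, and the distance is a simple degree-based approximation rather than a geodesic one. It also suggests why an exact-equality test such as `ds.lat == 43.5882` tends to return nothing: a tapped location almost never coincides exactly with a floating-point grid value, so the nearest cell has to be searched for instead.

```python
import numpy as np


def nearest_index(lat2d, lon2d, lat0, lon0):
    """Return (row, col) of the grid cell closest to (lat0, lon0).

    Uses squared distance in degrees, with longitude differences scaled by
    cos(latitude). Adequate for picking a neighbour on a regional grid.
    """
    dlat = lat2d - lat0
    dlon = (lon2d - lon0) * np.cos(np.deg2rad(lat0))
    dist2 = dlat**2 + dlon**2
    # argmin works on the flattened array; unravel_index maps it back to 2-D.
    return np.unravel_index(np.argmin(dist2), dist2.shape)


# Hypothetical grid with 2-D coordinate arrays (4 rows x 5 columns).
lon2d, lat2d = np.meshgrid(np.linspace(-80.0, -76.0, 5),
                           np.linspace(42.0, 45.0, 4))

# Location tapped on the map (values taken from the question above).
iy, ix = nearest_index(lat2d, lon2d, 43.5882, -77.9989)
```

The resulting indices could then be used for positional selection, for example with xarray's `ds.isel(...)` on the `ny` and `nx` dimensions; which index belongs to which dimension depends on the dataset and is an assumption to check. For large grids or many lookups, the tree approach suggested above avoids recomputing all distances on every tap.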

@rsignell-usgs

Author

commented Apr 9, 2019

@hdsingh, way to go! I tried it and it works! This is really important functionality!

@hdsingh

commented Apr 15, 2019

I added a feature to overlay several time series on the same plot in GreatLakes_Dashboard. The idea is to replicate the functionality of ncView.

@rsignell-usgs

Author

commented Apr 15, 2019

@hdsingh, great! I love that this is already useful for real work, and will just keep getting more and more useful!
2019-04-15_9-06-37

@hdsingh

commented Apr 23, 2019

I added a Location Marker feature. As of now all the markers are the same color. I will soon update them to have different colors matching their time series.

@rsignell-usgs

Author

commented Apr 23, 2019

Indeed! Works great for me:
2019-04-23_8-47-58
I'm excited you are continuing to make excellent progress on this!

@hdsingh

commented Apr 25, 2019

I added a feature so that location markers have the same color as their respective time series lines. Please have a look.

@martindurant

commented Apr 25, 2019

Thank you @hdsingh !

@hdsingh

commented May 2, 2019

@rsignell-usgs Sir, Thank you for adding me as a member in the reproducible-notebooks !

@rsignell-usgs

Author

commented May 2, 2019

@hdsingh, as an experiment, the pyviz team deployed your current Great Lakes notebook as a web app here: https://roms.aip.anaconda.com/GreatLakes_Dashboard. Very cool!

@hdsingh

commented May 2, 2019

This is really amazing. I am very glad to see that the Pyviz team is as excited as I am about this project!!
Thanks a lot, team Pyviz!

@hdsingh

commented May 6, 2019

@rsignell-usgs @martindurant
Sir, thanks a lot for your guidance! I got selected!
Looking forward to working with you and learning more from you.
Let's make this project successful!

@rsignell-usgs

Author

commented May 6, 2019

Congratulations @hdsingh! I have no doubt that with your enthusiasm, intelligence and drive, this project will be successful! This work should have many benefits, among them:

  • you will gain through learning and experience
  • the community will gain a useful tool for visually exploring CF-compliant multidimensional data
  • the community will get a well-documented example of building an interactive dashboard for the browser using Python
  • the ESIP Python community will learn more about the power of PyViz and Intake to aid their scientific workflows
@martindurant

commented May 6, 2019

Good news - all of the exploratory work has paid off :)

@ZER-0-NE

commented May 6, 2019

Congratulations @hdsingh :)
