Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add Jupyter notebook analysis examples #1096

Draft
wants to merge 23 commits into
base: develop
Choose a base branch
from

Conversation

robintw
Copy link
Collaborator

@robintw robintw commented Nov 29, 2021

Very rough work at the moment, and won't be merged in this state - but pushed so you can see the full notebooks

🧰 Issue

Fixes #1078

🚀 Overview:

🔗 Link to preview (or screenshot, if relevant)

🤔 Reason:

🔨Work carried out:

  • Tests pass

🖥️ Screenshot

Confirmations

  • I have chosen reviewers for my PR.
  • I have chosen an appropriate label for the PR, adding interactive_review if reviewers will need to see UI
  • I have extended/updated the documentation in \docs folder
  • Any database content changes (Create, Edit, Delete) are recorded in the Log/Changes tables
  • Any database schema changes are implemented via alembic revision transitions
  • I have completed the mandatory sections of this document.
  • I have deleted any unused sections.

📝 Developer Notes:

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@IanMayo IanMayo temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c November 29, 2021 21:11 Inactive
@codecov
Copy link

codecov bot commented Nov 29, 2021

Codecov Report

Merging #1096 (9151464) into develop (3f625e3) will decrease coverage by 0.69%.
The diff coverage is 32.31%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1096      +/-   ##
===========================================
- Coverage    80.10%   79.40%   -0.70%     
===========================================
  Files          110      112       +2     
  Lines        11931    12092     +161     
===========================================
+ Hits          9557     9602      +45     
- Misses        2374     2490     +116     
Impacted Files Coverage Δ
importers/ais_marine_cadastre_importer.py 27.58% <27.58%> (ø)
importers/ais_dstl_importer.py 33.80% <33.80%> (ø)
pepys_admin/admin_cli.py 78.83% <66.66%> (-0.28%) ⬇️
pepys_import/core/store/data_store.py 96.35% <100.00%> (ø)
pepys_import/file/file_processor.py 97.77% <0.00%> (-1.27%) ⬇️
pepys_import/resolvers/command_line_resolver.py 94.60% <0.00%> (-0.26%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85b7587...9151464. Read the comment docs.

@IanMayo
Copy link
Member

IanMayo commented Nov 30, 2021

@robintw - would we get a live version of this if we added theinteractive_review label?

Oops, I see we have a live version, even without the interactive_review label. Hmmm. Aah, no. That's a live preview of timeline - not a hosted notebook instance.

@IanMayo IanMayo added the interactive_review Triggers an interactive review via Binder label Nov 30, 2021
@github-actions
Copy link

github-actions bot commented Nov 30, 2021

🚀 Interactive demo available via Binder logo.

Note: The Jupyter notebook browser will open in the demos folder. See comments above for which notebook to use,
otherwise use the _Default_Demo.ipynb notebook.

@IanMayo
Copy link
Member

IanMayo commented Nov 30, 2021

@robintw - I added the label to create the interactive preview. The preview loaded, but couldn't find pandas. Does the environment need to load more dependencies? Aah, it looks like the dependency needs to be added, since pandas wasn't used before this PR.

image

@IanMayo IanMayo temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c December 1, 2021 12:29 Inactive
@IanMayo
Copy link
Member

IanMayo commented Dec 1, 2021

Aah, one other change, please @robintw

Can we move the notebook to the demo sub-folder please? That's the folder that Binder opens up in.

@IanMayo IanMayo temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c December 1, 2021 12:38 Inactive
@IanMayo IanMayo temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c December 1, 2021 12:47 Inactive
@IanMayo IanMayo temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c December 1, 2021 13:09 Inactive
@IanMayo
Copy link
Member

IanMayo commented Dec 1, 2021

Ok, we can forget about using myBinder. It only allows http access to other servers - so it can't connect to trac-store:
jupyterhub/binderhub#982

The only way to do it would be to put a REST wrapper around the database, but we don't have a justification for that right now.

Never mind.

Aah, or we connect to a local SQLite database. That would prevent the use of pepys.states_for. Hmm, we may choose to produce a python API to avoid the use of SQL. I guess that pepys.states_for is already hiding quite a lot of SQL. Ok, a huge amount of SQL.

Update: I discussed the use of SQLite with @robintw - but he indicated that geopandas didn't support SQLite. Aah, hold on. It may be possible if conducted via the GeoDataFrame:
image

Source here

@IanMayo
Copy link
Member

IanMayo commented Dec 2, 2021

Aah, one other piece of feedback. You're currently using the demonstrator to develop ways of plotting Pepys data. But, once it has started to mature, could you add inline code comments? I'm looking at plot_platform and plot_datafile - but I'm not familiar with some of the pandas/geopandas API calls.

@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c December 2, 2021 21:20 Inactive
@robintw
Copy link
Collaborator Author

robintw commented Dec 2, 2021

Just a heads up @IanMayo that supporting the libraries we're using in Jupyter like geopandas is going to require some changes to our procedures for creating a deployable release. geopandas depends on fiona which depends on GDAL, and GDAL is always a pain to install on Windows. I think we'll have to package some of the .whl files to install GDAL/fiona on Windows in the Pepys repository.

Anyway, just a heads up. I will have a go at fixing it sometime, but at the moment I'll just focus on getting some interesting stuff done in the notebook, as playing with GDAL installs has already taken up far too much of my life (!).

@IanMayo
Copy link
Member

IanMayo commented Dec 3, 2021

I've done some reading up on installing GDAL on Windows @robintw. It appears to be sensitive to the version of Python that is in use. Fortunately, we provide the local Python installation - so that takes one problem away.

@IanMayo
Copy link
Member

IanMayo commented Dec 3, 2021

Aah, @robintw - I just did some research. Google Collab appears to allow connections to external Postgres databases.

But, I can only load pepys_import into the notebook via !pip install pepys_import. That worked ok, but it's the released version - which doesn't include your modification to make the connection string persistent. Maybe it's worth making that change in a separate PR - so I can play with your notebook as it matures. This should be a working link to the collab version:
https://colab.research.google.com/drive/17Xk4v2LWJHHVXguqq_QrarCafKbuFPU8?usp=sharing

Aah :-(. It appears the user needs to be logged into Google to play the Collab sheet, which analysts can't do from their normal desktop IT.

@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c December 14, 2021 13:36 Inactive
@robintw
Copy link
Collaborator Author

robintw commented Dec 14, 2021

Notebook pushed now @IanMayo. It's definitely not tidy, but it should give an idea. The majority of the work of the 'find vessels within X distance' is implemented in a SQL query - the rest is just plumbing.

@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c January 31, 2022 19:09 Inactive
@robintw
Copy link
Collaborator Author

robintw commented Jan 31, 2022

I've done some more work on this to split the Jupyter demos out into separate notebooks in the demos folder - they should be a bit clearer now.

The raw Jupyter notebooks don't really help you to get the 'full experience' as they need running to get any of their interactivity. You added the relevant extra requirements to the requirements.txt file a while back, but this has caused errors in the create_deployment script as some of the dependencies require GDAL, and that can be a pain to install on Windows. I think the easiest way around this is to install GDAL and Fiona (which is based on GDAL) using a wheel file from here, like we did in #300 for distlib. Does that sound ok to you?

In the meantime, you can probably get the relevant dependencies installed ok on OS X or Linux if you want to try out the notebooks interactively.

In terms of the work to deal with dependencies and also how to run Jupyter notebooks on Windows from the right folder, with the right modules: is that something we need to deal with now, or can it be left for the moment for these 'proof-of-concept' demos? They are all solvable problems, but I don't know whether I should spend time on them now, or focus more on showing the other stuff we can do within notebooks. Thoughts @IanMayo?

@IanMayo
Copy link
Member

IanMayo commented Feb 1, 2022

Thanks @robintw .

That's a puzzler. My gut feeling would be to push ahead with the examples - since that's where the innovation happens. But, I fear that would mean introducing technical debt that could grow as the examples get more diverse.

So, please let's resolve the MS-Windows deployment issue. If our (your) work gets terminated then at least the analysts will have fully working examples to develop from.

@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c February 2, 2022 19:57 Inactive
@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c February 2, 2022 20:06 Inactive
@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c February 2, 2022 20:15 Inactive
@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c February 2, 2022 20:32 Inactive
@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c February 2, 2022 20:34 Inactive
@robintw robintw temporarily deployed to pepys-timeli-jupyter-pl-oo5g3c February 4, 2022 20:30 Inactive
@robintw
Copy link
Collaborator Author

robintw commented Feb 6, 2022

A quick update @IanMayo: I've now got the deployable release updated to include the required dependencies for Jupyter, and have added an option Pepys Admin to allow the user to start a Jupyter notebook server based in their home directory (C:\Users\Username on Windows):

image

@robintw
Copy link
Collaborator Author

robintw commented Feb 7, 2022

Also, an update on my pandas PR. It has been closed as apparently there is a duplicate PR, and they want to postpone this work to pandas 2.0. See here for details.

@IanMayo
Copy link
Member

IanMayo commented Feb 7, 2022

Also, an update on my pandas PR. It has been closed as apparently there is a duplicate PR, and they want to postpone this work to pandas 2.0. See here for details.

Had a quick look, and the "duplicate" doesn't seem to offer as much value/improvement as yours does. I hope they come back & use your contribution.

@robintw robintw removed the interactive_review Triggers an interactive review via Binder label Feb 7, 2022
@robintw
Copy link
Collaborator Author

robintw commented Feb 28, 2022

@IanMayo I've been working on the task to get interactive mapping working when disconnected from the internet. The first task was to make the folium library work offline - it usually downloads the JS/CSS files that are needed for the map (leaflet etc) from web-based CDNs. There was a bit of information online about making this work offline, but it was all rather nasty to use, and I have managed to tidy this up into a module called offline_folium.

The way this works is that you do a simple import from offline_folium, and then this automatically sets up folium to load the JS/CSS from local files rather than remote files. To get these local files you can run the module on the command-line, which will download them into the package folders (we could run this during the 'create deployment' process). Once you've done the import, folium works as usual, and will work completely offline.

I think it would be best to put this module in a separate repo. It needs to be a separate module because of how I've hooked into the module's import system to get it to alter things in folium. I also think that this could be a useful module to open-source, as others may find it useful - it's quite simple, but took a little while to get working correctly. What do you think? If you think this is a sensible way forward then could you create a repo called offline_folium, and I'll then put the code in there (you can choose the license).

I haven't had as much time as I'd like to work on this recently, but my next plans are to tackle providing some sort of offline background mapping (GeoTIFFs of charts or GeoJSONs of coastlines) for the maps, and then to look at other ways of interactively exploring the data.

@IanMayo
Copy link
Member

IanMayo commented Mar 1, 2022

@robintw - your strategy sounds good. Have created the repo - and invited you to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proof of Concept for Pepys in Jupyter
2 participants