# Jupyter Lab for Researchers
### Colby Witherup Wood
September 11, 2019

### Outline
1. Introduction to Jupyter Lab
2. Jupyter Lab for beginner, intermediate, and advanced researchers
3. What's new in Jupyter Lab
4. Limitations and conclusions

### Jupyter Notebooks

- [very popular with academic researchers](https://www.nature.com/articles/d41586-018-07196-1)
- code is presented alongside output and markdown
- over 3 million public Jupyter Notebooks on GitHub

[astronomy notebook](https://github.com/eleanorlutz/asteroids_atlas_of_space/blob/master/4_log_scale_plotting.ipynb)

[RNAseq Zika notebook](https://github.com/MaayanLab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb)

#### Notebooks are still the main draw of Jupyter Lab, but there are many new features outside of the notebooks.

### Project Jupyter

Project Jupyter developed out of the IPython Project.

Project Jupyter deals with developing tools that are language-agnostic.

Project Jupyter is non-profit, open-source, and free.

Jupyter originally stood for Julia, Python, and R, but now works with many other languages.

#### Jupyter Notebook will soon cease to be supported as a stand-alone program, forcing all notebook users to switch to Jupyter Lab (which includes notebooks as one feature)

### Jupyter Lab

- browser-based, extensible IDE
- [over 100 kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) are available for different programming languages

Contains 4 main tools:
- consoles for interactive coding in the kernels of your choice
- a text editor with syntax highlighting, key maps, indentation preferences, etc.
- notebooks in your kernels of choice
- a terminal (bash or PowerShell)

Also contains a file tree and several bonus features:
- opens most file types inside the browser tab (jpeg, png, pdf, gif, json)
- opens large csv files (even larger than Excel can handle)
- has many extensions available, both for notebooks, which we will talk about later, and for the rest of Jupyter Lab
- additional extensions are coming out, both from Project Jupyter and from other developers and users
- has extensions for features you will see in other IDEs, like GitHub, Vim, and LaTeX, but also features that you won't find in other IDEs

### Jupyter Lab for Beginning Coders

- browser/tab based, so familiar to new coders

- free, included in Anaconda, works well on both Macs and PCs

- clean, uncomplicated interface

- has Terminal, text editor, console

- being browser-based gives easy access to Google

- has all the color-coding, bracket completion, tab completion, etc. of an IDE

- we've used it in some of the beginner and next-steps Python workshops this summer through RCS, so some beginners at NU will already be familiar with it

### Jupyter Lab for Intermediate Coders

- multiple languages can be read, written, and run from one IDE

- over 3 million notebooks are available online to learn new skills and borrow code

### Jupyter Lab for Advanced Researchers

#### Teaching
- Create interactive notebooks to [supplement coursework](https://github.com/aleksicmil/Pima-Indians-Diabetes) with hands-on data analysis
- Create interactive tutorials for programming skills for [beginning](https://github.com/aGitHasNoName/intro-python-summer2019/blob/master/learningPandas.ipynb) or [advanced](https://github.com/fastai/numerical-linear-algebra) topics

#### Collaboration
- Share your analyses with coauthors in a narrative format
- Access Google Drive files ([or a remote server](https://benjlindsay.com/blog/running-jupyter-lab-remotely/)) shared by lab members or coauthors
- Use [Jupyter Hub](https://jupyter.org/hub) to set up one Jupyter Lab environment for many users on a single server or cluster
- Join the [Jupyter for Research Facilities discussion forum](https://groups.google.com/forum/?pli=1#!forum/jupyter-research-facilities)

#### Analysis, results, and publication
- Use data visualization tools and machine learning on huge data sets [with HPC](https://blog.jupyter.org/jupyter-for-science-user-facilities-and-high-performance-computing-de178106872)
- Combine text, code, and visuals in one narrative
- Create interactive software tools for analysis and visualization that can be made available to other researchers

### New and Useful Tools

### [Google Drive extension](https://github.com/jupyterlab/jupyterlab-google-drive)
Access files straight from your Google Drive account

### [Draw.io extension](https://github.com/QuantStack/jupyterlab-drawio)
Draw simple figures and workflows to include in your notebooks

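In [15]:
from IPython.display import IFrame  # needed here: the cell uses IFrame before the later import cell
IFrame(src='https://www.youtube.com/embed/CJH34I01cKA', width='780', height='424')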

### [Google Facets extension](https://pair-code.github.io/facets/)
Summarize, explore, filter, and visualize your data, particularly for ML datasets

### [Notify](https://github.com/ShopRunner/jupyter-notify) 
Sends a notification to your screen when a long code block stops running

### New to Notebooks
- Drag and drop cells
- Organize your Lab in either tabs or tiles
- View the same notebook side by side
- Collapse/expand cells
- Export notebooks to more file types (LaTeX, executable script, PDF, HTML, slides)
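
To give a feel for what the "executable script" export involves, here is a rough sketch using only the standard library. It assumes the nbformat 4 layout, where a notebook is a JSON object with a `cells` list, and it is not how Jupyter's own exporter is implemented. The function name `notebook_to_script` and the tiny example notebook are hypothetical, made up for illustration.

```python
def notebook_to_script(nb: dict) -> str:
    """Join the code cells of an nbformat-4 notebook dict into one script string."""
    chunks = []
    for cell in nb.get("cells", []):
        # Markdown and raw cells are skipped; only code cells end up in the script
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        # Comment out IPython magics and shell escapes, which are not plain Python.
        # (Crude check: it would also catch a continuation line starting with %.)
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Hypothetical three-cell notebook, written by hand for this example
example_nb = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
         "source": ["%matplotlib inline\n", "x = 1 + 1\n"]},
        {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [],
         "source": "print(x)"},
    ],
}

script = notebook_to_script(example_nb)
print(script)
```

For a real `.ipynb` file, the dict could be loaded with `json.load` first. The built-in export handles many more cases than this sketch does.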

### [Interactive widgets](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Basics.html) (like Shiny)
- sliders
- range sliders
- buttons
- text boxes
- progress bars
- color picker
- tabs and accordion cells

#### Example of interactive widget:

In [19]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import IFrame
from ipywidgets.embed import embed_minimal_html

In [3]:
df = pd.read_csv("q3CleanedData.csv")
df = df.set_index("id")

In [4]:
def makeBarChart(num):
    """Plot mean vote attendance by party, by gender, and overall,
    followed by the `num` senators with the lowest attendance."""
    fig, ax = plt.subplots(figsize=(4,6))
    # Compute each set of group means once, then look up the groups by label
    party_means = df.groupby("party")["vote_pct"].mean()
    gender_means = df.groupby("gender")["vote_pct"].mean()
    means = [party_means["D"], party_means["R"],
             gender_means["M"], gender_means["F"],
             df["vote_pct"].mean()]
    labels = ["Democrat", "Republican", "Male", "Female", "All Senators"]
    colors = ["cornflowerblue", "tomato", "mediumpurple", "mediumpurple", "mediumpurple"]
    # The `num` senators with the lowest attendance, lowest first
    lowestdf = df.nsmallest(num, "vote_pct")
    lowest_list = list(lowestdf.index)
    for senator in lowest_list:
        means.append(lowestdf.loc[senator]["vote_pct"])
        if lowestdf.loc[senator].party == "R":
            colors.append("tomato")
        else:
            colors.append("cornflowerblue")
        labels.append(lowestdf.loc[senator].first_name + " " +lowestdf.loc[senator].last_name)
    ax.barh(labels, means, label=labels, color=colors)
    ax.invert_yaxis()
    ax.set_title("Percentage of votes attended since 2010 among all senators")
    plt.show()
    
def makeBarChartYear(num):
    """Plot mean vote attendance by party, by gender, and overall, followed by the
    `num` lowest-attendance senators among those with over 1 year of service."""
    fig, ax = plt.subplots(figsize=(4,6))
    # Compute each set of group means once, then look up the groups by label
    party_means = df.groupby("party")["vote_pct"].mean()
    gender_means = df.groupby("gender")["vote_pct"].mean()
    means = [party_means["D"], party_means["R"],
             gender_means["M"], gender_means["F"],
             df["vote_pct"].mean()]
    labels = ["Democrat", "Republican", "Male", "Female", "All Senators"]
    colors = ["cornflowerblue", "tomato", "mediumpurple", "mediumpurple", "mediumpurple"]
    
    # Keep only senators with over 1 year of service before ranking by attendance
    lowyeardf = df[df.term_length > 1].nsmallest(num, "vote_pct")
    lowyear_list = list(lowyeardf.index)
    for senator in lowyear_list:
        means.append(lowyeardf.loc[senator]["vote_pct"])
        if lowyeardf.loc[senator].party == "R":
            colors.append("tomato")
        else:
            colors.append("cornflowerblue")
        labels.append(lowyeardf.loc[senator].first_name + " " +lowyeardf.loc[senator].last_name)
    ax.barh(labels, means, label=labels, color=colors)
    ax.invert_yaxis()
    ax.set_title("Percentage of votes attended since 2010 among senators with over 1 year of service")
    plt.show()
    
interactive_plot = interactive(makeBarChart, 
                               num=widgets.IntSlider(value=3,
                                                     min=0,
                                                     max=10,
                                                     step=1,
                                                     description="Lowest:",
                                                     continuous_update=False))

interactive_plot_year = interactive(makeBarChartYear, 
                               num=widgets.IntSlider(value=3,
                                                     min=0,
                                                     max=10,
                                                     step=1,
                                                     description="Lowest:",
                                                     continuous_update=False))

In [21]:
accordion = widgets.Accordion(children=[interactive_plot, interactive_plot_year])
accordion.set_title(0, 'Senators with lowest voting percentage')
accordion.set_title(1, 'Senators with lowest voting percentage (over 1 year of service)')
display(accordion)
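# Writes a standalone HTML snapshot of the widgets. The chart is redrawn by Python
# callbacks, which need a live kernel, so the exported sliders will not update the plot.
embed_minimal_html('export.html', views=[accordion], title='Widgets export')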


#### There currently isn't a straightforward way to export your interactive widgets to retain their interactive nature outside of a notebook. It can be done, but it's not seamless yet.
- [nbinteract](https://www.nbinteract.com/tutorial/tutorial_github_setup.html) converts notebooks to interactive HTML pages with embedded JavaScript libraries. Pages can then be hosted on the web (e.g., GitHub Pages). However, some features require troubleshooting or may not work.
- [Voila](https://github.com/QuantStack/voila) is a package to convert and host notebooks containing widgets. However, the notebook has to be served by a machine running Voila, so it can't be shared as a static file.

## Limitations of Jupyter Lab

- Beginners may be tempted to use notebooks for everything. Counterpoint: Is it better for beginners to use notebooks for everything or use the console for everything?

- Some tools have been developed specifically for Python, and not other languages

- Encourages notebook development rather than software development. Counterpoint: makes these tools available to more researchers who aren't going to develop the skills of a software developer or who don't have the funds to hire one

- Researchers can lean on notebooks and never learn proper script-writing and code-testing techniques. Counterpoint: Jupyter Lab has all the tools to code without notebooks

- Unlike scripts, notebooks cannot be combined into computational pipelines. Counterpoint: Jupyter Lab has all the tools to code without notebooks

- [Notebooks don't do well with version control conflicts](https://nextjournal.com/schmudde/how-to-version-control-jupyter)

- Doesn't have the layout options of Shiny

- Doesn't have the hosting abilities of Shiny and is more difficult to convert to interactive html.

- Extensions can have some dependencies and be mildly annoying to install.
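
On the version-control point above: one reason notebooks merge poorly is that a `.ipynb` file is JSON that stores cell outputs and execution counts alongside the code, so two people who merely re-run the same notebook produce conflicting files. A common workaround is to clear that state before committing. The following is a minimal sketch of the idea, assuming the nbformat 4 layout. The function name `strip_outputs` and the example notebook are hypothetical, and dedicated tools such as nbstripout do this more thoroughly.

```python
import copy
import json


def strip_outputs(nb: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with run-time state cleared."""
    clean = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored results, plots, tracebacks
            cell["execution_count"] = None  # drop the In [N] counter
    return clean


# Hypothetical two-cell notebook, written by hand for this example
example_nb = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis\n"]},
        {"cell_type": "code", "execution_count": 7, "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
         "source": ["print(1 + 1)\n"]},
    ],
}

clean_nb = strip_outputs(example_nb)
# Stable key order and indentation keep diffs small from one commit to the next
print(json.dumps(clean_nb, indent=1, sort_keys=True))
```

The trade-off is that the committed notebook no longer shows its results, so readers have to re-run it to see the output.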

## Summary

Jupyter Lab brings the well-loved notebooks into a clean, simple IDE that can be expanded with extensions. Perks are the included terminal, the ability to code in multiple language consoles in one IDE, and the Jupyter widgets for creating interactive notebooks. On the negative side, notebooks don't encourage good coding habits, and interactive notebooks don't yet rival Shiny apps.

Jupyter Lab is a good choice for researchers who are just starting to code, especially in Python, as it includes a text editor, terminal, and console. 

Researchers who already love Notebooks, and who will soon be forced to use Jupyter Lab, will have an easy transition, as the design is minimalist.

Jupyter Lab is a good choice for researchers who need to visualize data and results from very large datasets that are stored on remote servers or Google Drive. It is also a good choice for professors looking to build textbooks that include code in Python or in other languages that lack well-developed markdown software of their own.

More advanced Python coders conducting research in any discipline can feel confident that the functionality of Jupyter Lab will keep improving as new features are added by Jupyter and new extensions are developed by users.

## Questions?