# Jupyter (IPython) Advanced Features
---

Outline
- Keyboard shortcuts
- Magic
- Accessing the underlying operating system
- Using different languages inside a single notebook
- File magic
- Using Jupyter more efficiently
- Profiling
- Output
- Automation
- Extensions
- 'Big Data' Analysis
    

Sources: [IPython Tutorial](https://github.com/ipython/ipython-in-depth/blob/pycon-2019/1%20-%20Beyond%20Plain%20Python.ipynb), [Dataquest](https://www.dataquest.io/blog/advanced-jupyter-notebooks-tutorial/), [Dataquest](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/), [Alex Rogozhnikov Blog](http://arogozhnikov.github.io/2016/09/10/jupyter-features.html), and [Towards Data Science](https://towardsdatascience.com/how-to-effortlessly-optimize-jupyter-notebooks-e864162a06ee)

---

## Keyboard Shortcuts



As in the classic Notebook, you can navigate the user interface through keyboard shortcuts. You can find and customize the current list of keyboard shortcuts by selecting the Advanced Settings Editor item in the Settings menu, then selecting Keyboard Shortcuts in the Settings tab.

### Shortcut Keys for JupyterLab

With any tool, it helps to know the shortcut keys for the most frequent tasks: they increase your productivity and make work more comfortable. Listed below are some of the shortcuts I use frequently in JupyterLab. You can also check the full list of shortcuts in the __Commands tab__ in JupyterLab, which you will find below the Files tab on the left-hand side.

1.  **ESC** switches to command mode, while **ENTER** switches to edit mode.
2.  **A** inserts a cell above the currently selected cell. Make sure you are in command mode first (press ESC).
3.  **B** inserts a cell below the currently selected cell. Make sure you are in command mode first (press ESC).
4.  **D + D** = Pressing D twice in quick succession in command mode deletes the currently selected cell.
5.  JupyterLab lets you change a cell into a code cell, a Markdown cell, or a raw cell. Use **M** to change the current cell to a Markdown cell, **Y** to change it to a code cell, and **R** to change it to a raw cell.
6.  **CTRL + B** = JupyterLab has a two-column layout: one column for the launcher or code blocks and another for the file view and similar panels. **CTRL + B** toggles the file view column, which frees up workspace while you write code.
7.  **SHIFT + M** = Merges the selected cells into one cell.
8.  **CTRL + SHIFT + -** = Splits the current cell into two cells at the cursor position.
9.  **SHIFT + J** or **SHIFT + DOWN** = Extends the selection to the next cell below. This helps when selecting multiple cells.
10.  **SHIFT + K** or **SHIFT + UP** = Extends the selection to the next cell above. This helps when selecting multiple cells.
11.  **CTRL + /** = Comments or uncomments a line. You don't need to select the whole line: the shortcut acts on the line where your cursor is. To apply it to more than one line, select all of those lines first and then use the shortcut.

A PDF cheat sheet of JupyterLab shortcuts is also available:
- https://blog.ja-ke.tech/2019/01/20/jupyterlab-shortcuts.html
- https://github.com/Jakeler/jupyter-shortcuts

## Magics

---

Magics turn simple Python into *magical Python*. They are the key to the power of IPython.

Magic functions are prefixed by % or %%, and typically take their arguments without parentheses, quotes or even commas for convenience.  Line magics take a single % and cell magics are prefixed with two %%.

#### What is magic? `%magic` prints information about IPython's 'magic' % functions.

In [None]:
%magic

#### List available python magics

In [None]:
%lsmagic

#### %env
You can manage the environment variables of your notebook without restarting the Jupyter server process. Some libraries (like Theano) use environment variables to control their behavior, and `%env` is the most convenient way to set them.

In [None]:
# %env - without arguments lists environmental variables
%env OMP_NUM_THREADS=4
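For code that also has to run outside IPython, the effect of `%env` can be approximated with the standard `os.environ` mapping. This is a minimal sketch, assuming the variable only needs to be visible to the current process and to child processes started afterwards:

```python
import os

# Set the variable for this process (values must be strings).
os.environ["OMP_NUM_THREADS"] = "4"

# Read it back; .get() returns None instead of raising if the name is unset.
num_threads = os.environ.get("OMP_NUM_THREADS")
print(num_threads)
```

Libraries that read an environment variable at import time only see it if it is set before the import.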

# Accessing the underlying operating system

---

## Executing shell commands

You can call any shell command. This is particularly useful for managing your virtual environment.

In [None]:
!pip install numpy

In [None]:
!pip list | grep Theano

## Adding packages can also be done using...

`%conda install numpy`

`%pip install numpy`

Both will attempt to install packages in the current environment.

In [None]:
!pwd

In [None]:
%pwd

In [None]:
pwd

In [None]:
files = !ls .
print("files in notebooks directory:")
print(files)

In [None]:
!echo $files

In [None]:
!echo {files[0].upper()}

Note that all this is available even in multiline blocks:

In [None]:
import os
for i,f in enumerate(files):
    if f.endswith('ipynb'):
        !echo {"%02d" % i} - "{os.path.splitext(f)[0]}"
    else:
        print('--')

## I could take the same list with a bash command

This works because magics and shell calls return Python variables:

In [None]:
names = !ls ../images/ml_demonstrations/*.png
names[:5]
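The `!command` syntax only works inside IPython. In a plain script, a rough equivalent is `subprocess.run` with captured output, split into lines. The sketch below uses the current Python interpreter as a stand-in command so that it runs on any platform; the `run_lines` helper name is hypothetical.

```python
import subprocess
import sys


def run_lines(args):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Stand-in for something like `ls`: a child Python process printing two names.
lines = run_lines([sys.executable, "-c", "print('a.png'); print('b.png')"])
print(lines[:5])
```

Passing the command as a list avoids shell quoting issues; `check=True` raises an exception if the command exits with a non-zero status.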

## Suppress output of last line

Sometimes the output isn't needed, so we can either put a `pass` statement on a new line or add a semicolon at the end of the last line.

%conda install matplotlib

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy

In [None]:
# if you don't put semicolon at the end, you'll have output of function printed

plt.hist(numpy.linspace(0, 1, 1000)**1.5);

# Using different languages inside a single notebook

---

If you miss other languages, you can run cells with other interpreters:

- %%python2
- %%python3
- %%ruby
- %%perl
- %%bash
- %%R

This is possible, but you'll need to set up the corresponding interpreter or kernel first.

In [None]:
%%ruby
puts 'Hi, this is ruby.'

In [None]:
%%bash
echo 'Hi, this is bash.'

## Running R code in Jupyter notebook

#### Installing R kernel

**Easy option: installing the R kernel using Anaconda.**
If you used Anaconda to set up your environment, getting R working is extremely easy. Just run the command below:

In [None]:
# %conda install -c r r-essentials

#### Running R and Python in the same notebook.

The best solution to this is to install rpy2 (requires a working version of R as well), which can be easily done with pip:

In [None]:
%pip install rpy2

You can then use the two languages together, and even pass variables between them:

In [None]:
%load_ext rpy2.ipython

In [None]:
%R require(ggplot2)

In [None]:
import pandas as pd
df = pd.DataFrame({
        'Letter': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
        'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
        'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
        'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]
    })

In [None]:
%%R -i df
ggplot(data = df) + geom_point(aes(x = X, y= Y, color = Letter, size = Z))

## Writing functions in cython (or fortran)

Sometimes the speed of numpy is not enough and I need to write some fast code. 
In principle, you can compile a function into a dynamic library and write Python wrappers...

But it is much better when this boring part is done for you, right?

You can write functions in Cython or Fortran and use them directly from Python code.

First you'll need to install:
```
%pip install cython 
```

In [None]:
%pip install cython

In [None]:
%load_ext Cython

In [None]:
%%cython
def multiply_by_2(float x):
    return 2.0 * x

In [None]:
multiply_by_2(23.)

I should also mention that there are several JIT compilers that can speed up your Python code.
More examples are in [this notebook](http://arogozhnikov.github.io/2015/09/08/SpeedBenchmarks.html). 


For more information see the IPython help at: [Cython](https://github.com/ipython/ipython-in-depth/blob/pycon-2019/6%20-%20Cross-Language-Integration.ipynb)

# File magic

`%%writefile` exports the contents of a cell to a file.

In [None]:
%%writefile?
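Outside a notebook, the effect of `%%writefile` can be approximated with `pathlib`. This is a minimal sketch that writes to a temporary directory so nothing in the working directory is touched; the file name `hello.py` and its contents are only examples.

```python
import tempfile
from pathlib import Path

source = "def greet(name):\n    return 'Hello, ' + name\n"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello.py"
    # write_text creates the file, or overwrites it if it already exists.
    target.write_text(source, encoding="utf-8")
    written = target.read_text(encoding="utf-8")

print(written)
```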

`%pycat` shows a file in the pop-up window. Its help text reads:
```
Show a syntax-highlighted file through a pager.

This magic is similar to the cat utility, but it will assume the file
to be Python source and will show it with syntax highlighting.

This magic command can either take a local filename, an url,
an history range (see %history) or a macro as argument ::

%pycat myscript.py
%pycat 7-27
%pycat myMacro
%pycat http://www.example.com/myscript.py
```

## %load 
Loads code directly into a cell. You can pick a local file or a file on the web.

After you uncomment the code below and execute it, the content of the cell is replaced with the contents of the file.


In [None]:
# %load https://matplotlib.org/_downloads/f7171577b84787f4b4d987b663486a94/anatomy.py

## %run to execute python code

`%run` can execute Python code from .py files; this is well-documented behavior.

But it also can execute other jupyter notebooks! Sometimes it is quite useful.

NB. %run is not the same as importing python module.

In [None]:
# this will execute all the code cells from another notebook
%run ./matplotlib-anatomy.ipynb
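The difference between running and importing can be sketched with the standard library. `runpy.run_path` is used here as a stand-in for `%run` (it is not how the magic is implemented): it executes the file on every call and returns the resulting namespace, whereas `import` executes a module once and then reuses the cached copy. The script name `demo_script.py` and its contents are made up for the illustration.

```python
import runpy
import tempfile
from pathlib import Path

script = "print('script body executed')\nvalue = 6 * 7\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_script.py"
    path.write_text(script, encoding="utf-8")

    # Each call executes the file from top to bottom and returns its globals.
    first = runpy.run_path(str(path))
    second = runpy.run_path(str(path))

print(first["value"], second["value"])
```

The message from the script body appears twice, once per call, which is the behavior to expect when re-running a file rather than importing it.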

# Using Jupyter more efficiently

---

## Store Magic - %store: lazy passing data between notebooks

`%store` lets you store a variable or macro and reuse it across all of your Jupyter notebooks.

In [None]:
data = 'this is the string I want to pass to different notebook'
%store data
del data # deleted variable

In [None]:
# in second notebook I will use:
%store -r data
print(data)
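Conceptually, `%store` serializes the variable to disk so that another session can read it back. The sketch below imitates that with `pickle` and a temporary file; the real magic keeps its data in IPython's own profile storage, so the file name and layout here are assumptions made for the illustration.

```python
import pickle
import tempfile
from pathlib import Path

data = "this is the string I want to pass to a different notebook"

with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "data.pkl"

    # "First notebook": persist the variable.
    store_file.write_bytes(pickle.dumps(data))

    # "Second notebook": restore it under a new name.
    restored = pickle.loads(store_file.read_bytes())

print(restored == data)
```

As with any pickle-based approach, this only works for objects that can be pickled.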

## %who: analyze variables of global scope

In [None]:
%whos

In [None]:
# print names of string variables
%who str
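What `%who str` reports can be approximated in plain Python by filtering a namespace by type. This is a small sketch: `names_of_type` is a hypothetical helper, and the example passes an explicit dictionary instead of `globals()` to keep the result predictable.

```python
def names_of_type(namespace, wanted_type):
    """Return the sorted names in `namespace` bound to instances of `wanted_type`."""
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_") and isinstance(value, wanted_type)
    )


namespace = {"title": "demo", "count": 3, "path": "/tmp", "_hidden": "x"}
string_names = names_of_type(namespace, str)
print(string_names)
```

In a script or notebook, calling `names_of_type(globals(), str)` applies the same filter to the current global scope.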

## Multiple cursors

Jupyter now supports multiple cursors (in a single cell), just like Sublime or IntelliJ! Use __Alt + mouse selection__ for multiline selection and __Ctrl + mouse clicks__ for multiple cursors.

<img src='./images/jupyter/multi-cursor.gif' />

Gif taken from http://swanintelligence.com/multi-cursor-in-jupyter.html

## Timing 

When you need to measure time spent or find the bottleneck in the code, ipython comes to the rescue.

%%time
import time
time.sleep(2) # sleep for two seconds

In [None]:
# measure small code snippets with timeit !
import numpy
%timeit numpy.random.normal(size=100)
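The standard `timeit` module offers similar measurements in a plain script. This is a minimal sketch using only the standard library; the statement being timed and the `number` and `repeat` values are arbitrary choices for illustration.

```python
import timeit

# Run the statement 1,000 times per measurement and take 5 measurements.
timings = timeit.repeat(
    stmt="sorted(values)",
    setup="import random; values = [random.random() for _ in range(100)]",
    number=1000,
    repeat=5,
)

# The minimum is usually the most stable estimate of the per-call cost.
best_per_call = min(timings) / 1000
print(f"best of 5: {best_per_call * 1e6:.2f} us per call")
```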

In [None]:
%%writefile pythoncode.py

import numpy
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)
        
def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)

In [None]:
# shows highlighted source of the newly-created file
%pycat pythoncode.py

In [None]:
from pythoncode import some_useless_slow_function, append_if_not_exists

## Hiding code or output

- Click on the blue vertical bar or line to the left to collapse code or output

## Commenting and uncommenting a block of code

You might want to add new lines of code and comment out the old lines while you’re working. This is great if you’re improving the performance of your code or trying to debug it.
- First, select all the lines you want to comment out.
- Next, hit Cmd + / (Ctrl + / on Windows and Linux) to comment out the highlighted code!

## Pretty Print all cell outputs

Normally only the last output in the cell will be printed. For everything else, you have to manually add print(), which is fine but not super convenient. You can change that by adding this at the top of the notebook:

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Profiling: %prun, %lprun, %mprun
---

See a much longer explanation of profiling and timing in Jake VanderPlas' Python Data Science Handbook: 
https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html

In [None]:
# shows how much time program spent in each function
%prun some_useless_slow_function()

Example of output:
```
26338 function calls in 0.713 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.684    0.000    0.685    0.000 pythoncode.py:3(append_if_not_exists)
    10000    0.014    0.000    0.014    0.000 {method 'randint' of 'mtrand.RandomState' objects}
        1    0.011    0.011    0.713    0.713 pythoncode.py:7(some_useless_slow_function)
        1    0.003    0.003    0.003    0.003 {range}
     6334    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.713    0.713 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
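A similar report can be produced in a plain script with the standard `cProfile` and `pstats` modules. The sketch below re-creates the two functions with the standard `random` module instead of numpy and with a smaller loop so that it is self-contained; the numbers it prints will differ from the sample above.

```python
import cProfile
import io
import pstats
import random


def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)


def some_useless_slow_function():
    arr = []
    for _ in range(2000):
        append_if_not_exists(arr, random.randint(0, 2000))


profiler = cProfile.Profile()
profiler.enable()
some_useless_slow_function()
profiler.disable()

# Collect the report in a string, ordered by internal time as in the sample.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats()
report = buffer.getvalue()
print(report)
```

The list membership test in `append_if_not_exists` is linear in the list length, which is why that function dominates the internal time.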

In [None]:
# %load_ext memory_profiler

In [None]:
# To profile memory, install memory_profiler and use %mprun
# (line_profiler is needed for %lprun):

# %pip install memory_profiler
# %pip install line_profiler

In [None]:
# tracking memory consumption (show in the pop-up)
# %mprun -f append_if_not_exists some_useless_slow_function()

Example of output:
```
Line #    Mem usage    Increment   Line Contents
================================================
     3     20.6 MiB      0.0 MiB   def append_if_not_exists(arr, x):
     4     20.6 MiB      0.0 MiB       if x not in arr:
     5     20.6 MiB      0.0 MiB           arr.append(x)
```
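If `memory_profiler` is not available, the standard `tracemalloc` module gives a coarser view. This is a different tool from `%mprun`: it tracks memory allocated for Python objects rather than line-by-line process memory, so treat the sketch as a rough substitute.

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable while tracing is active.
numbers = list(range(100_000))

current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```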

**%lprun** does line profiling, but it seemed to be broken in the IPython release current at the time of writing, so we'll manage without the magic this time:

```python
import line_profiler
lp = line_profiler.LineProfiler()
lp.add_function(some_useless_slow_function)
lp.runctx('some_useless_slow_function()', locals=locals(), globals=globals())
lp.print_stats()
```

## Debugging with %debug

Jupyter has its own interface for [ipdb](https://docs.python.org/2/library/pdb.html). It makes it possible to step inside a function and investigate what happens there.

This is not PyCharm and takes time to get used to, but when debugging on a server it can be the only option (other than using pdb from a terminal).

In [None]:
#%%debug filename:line_number_for_breakpoint
# Here some code that fails. This will activate interactive context for debugging

A somewhat easier option is `%pdb`, which activates the debugger when an exception is raised:

In [None]:
# %pdb

# def pick_and_take():
#     picked = numpy.random.randint(0, 1000)
#     raise NotImplementedError()
    
# pick_and_take()

# Output
---

## [RISE](https://github.com/damianavila/RISE): presentations with notebook

This extension by Damian Avila makes it possible to show notebooks as presentations. An example of such a presentation: http://bollwyvl.github.io/live_reveal/#/7

It is very useful when you teach others, e.g. how to use some library.


## Jupyter output system

Notebooks are displayed as HTML and the cell output can be HTML, so you can return virtually anything: video/audio/images. 

In this example I scan the folder with images in my repository and show the first five of them:

In [None]:
import os
from IPython.display import display, Image
names = [f for f in os.listdir('../images/') if f.endswith('.png')]
for name in names[:5]:
    display(Image('../images/' + name, width=300))
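Custom objects can take part in this output system too: when the last expression of a cell is an object with a `_repr_html_` method, Jupyter renders the returned HTML instead of the plain `repr`. This is a minimal sketch; the `ScoreTable` class is invented for the example.

```python
import html


class ScoreTable:
    """Tiny container that renders itself as an HTML table in a notebook."""

    def __init__(self, rows):
        self.rows = rows  # list of (name, score) pairs

    def _repr_html_(self):
        body = "".join(
            f"<tr><td>{html.escape(str(name))}</td><td>{score}</td></tr>"
            for name, score in self.rows
        )
        return f"<table><tr><th>Name</th><th>Score</th></tr>{body}</table>"


table = ScoreTable([("a", 4), ("b", 7)])
markup = table._repr_html_()
print(markup)
```

In a notebook, putting `table` on the last line of a cell shows the rendered table; the explicit `_repr_html_()` call above is only there to make the sketch runnable as a script.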

## Write your posts in notebooks

Like this one. Use `nbconvert` to export them to HTML.

# [Jupyter-contrib extensions](https://github.com/ipython-contrib/jupyter_contrib_nbextensions)

These extensions are installed with:
```
!pip install https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tarball/master
!pip install jupyter_nbextensions_configurator
!jupyter contrib nbextension install --user
!jupyter nbextensions_configurator enable --user
```

<img src='./images/jupyter/nbextensions.png' />

This is a family of different extensions, including e.g. **a Jupyter spell-checker and a code formatter**, 
that are missing in Jupyter by default. 

## Reconnect to kernel

In the past, if you started a long-running process and your connection to the IPython server dropped at some point, 
you completely lost the ability to track the computation (unless you wrote this information to a file). You either interrupted the kernel and potentially lost some progress, or waited until it completed without any idea of what was happening.

The `Reconnect to kernel` option now makes it possible to reconnect to a running kernel without interrupting the computation and to see new output as it arrives (although some part of the output is already lost).

# Big data analysis

A number of solutions are available for querying/processing large data samples: 
- [ipyparallel (formerly ipython cluster)](https://github.com/ipython/ipyparallel) is a good option for simple map-reduce operations in Python. We use it in [rep](https://github.com/yandex/rep) to train many machine learning models in parallel
- [pyspark](http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html)
- spark-sql magic [%%sql](https://github.com/jupyter-incubator/sparkmagic)
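The map-reduce pattern mentioned in the first item can be sketched without a cluster. The example below uses `concurrent.futures.ThreadPoolExecutor` from the standard library as a local stand-in; it is not ipyparallel's API, and the `count_words` function and sample data are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(text):
    """Map step: number of whitespace-separated words in one document."""
    return len(text.split())


documents = ["a b c", "d e", "f g h i"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map distributes the documents over the workers and keeps input order.
    counts = list(pool.map(count_words, documents))

# Reduce step: combine the partial results into a single total.
total = reduce(lambda left, right: left + right, counts, 0)
print(counts, total)
```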

Additional Resources:

*   IPython [built-in magics](https://ipython.org/ipython-doc/3/interactive/magics.html)
*   Nice [interactive presentation about jupyter](http://quasiben.github.io/dfwmeetup_2014/#/) by Ben Zaitlen
*   Advanced notebooks [part 1: magics](https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/) and [part 2: widgets](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/)
*   [Profiling in python with jupyter](http://pynash.org/2013/03/06/timing-and-profiling/)
*   [4 ways to extend notebooks](http://mindtrove.info/4-ways-to-extend-jupyter-notebook/)
*   [IPython notebook tricks](https://www.quora.com/What-are-your-favorite-tricks-for-IPython-Notebook)
*   [Jupyter vs Zeppelin for big data](https://www.linkedin.com/pulse/comprehensive-comparison-jupyter-vs-zeppelin-hoc-q-phan-mba-)
*   [Making publication ready Python notebooks](http://blog.juliusschulz.de/blog/ultimate-ipython-notebook).
*   https://yoursdata.net/installing-and-configuring-jupyter-lab-on-windows/