# Jupyter Notebook

[Jupyter](http://jupyter.org/) notebooks are a way that you can have code, text, images, and math all live together in harmony. This concept can be called "[Literate Programming](https://en.wikipedia.org/wiki/Literate_programming)", where all code that is written also has comments or an explanation of *why* it was written (which is not always obvious). In particular, the Jupyter notebook allows you to write some code, run (evaluate) it, and see the output all at once. This is called a "read-eval-print loop" or [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop).


By the end of this notebook, you will have...

* Installed Anaconda Python and R on your TSCC account
* Created a `biom262` specific environment that has only your specific packages
* Started a Jupyter notebook server on TSCC
* Viewed your remotely hosted (i.e. living on TSCC) Jupyter notebooks on your personal laptop

* * *


## Log in to TSCC

```
ssh ucsd-train##@tscc.sdsc.edu
```

### Mac El Capitan only

There's some issue with Mac El Capitan where you keep getting asked for the password. To fix this, you'll need to do `ssh-add` to add your private key again:

```
ssh-add ~/.ssh/biom262_rsa.pri
```

Remember that this means that this file `~/.ssh/biom262_rsa.pri` has to already exist. If not, you will need to follow [these](2_tscc_setup.ipynb#Set-up-your-logins) instructions again to set it up.

I suggest adding this command to your `~/.bash_profile`:

```
nano ~/.bash_profile
```

And add the line to the end of the file:

```
ssh-add ~/.ssh/biom262_rsa.pri
```

This way the command runs EVERY time you open a new terminal window or tab, and you shouldn't have to do the stupid `ssh-add` thing by hand ever again!
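
If you'd rather not open an editor, the same line can be appended from the command line. The sketch below is only an illustration: it writes to a temporary stand-in file instead of your real `~/.bash_profile`, and the helper name `add_line_once` is made up for this example. The point is that checking with `grep` first keeps the line from being added twice.

```shell
# Stand-in for ~/.bash_profile so this sketch can be tried without touching the real file.
profile="$(mktemp)"
line='ssh-add ~/.ssh/biom262_rsa.pri'

# Hypothetical helper: append line $2 to file $1 only if it is not already there.
add_line_once() {
    # grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

add_line_once "$profile" "$line"
add_line_once "$profile" "$line"   # second call changes nothing

cat "$profile"
```

To use it for real, you would pass `~/.bash_profile` as the first argument instead of the temporary file.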


## Install Anaconda Python, R, and Jupyter to your account on TSCC


Download the Anaconda Python/R package manager using `wget` (web-get). The link below is from the Anaconda downloads [page](https://www.continuum.io/downloads). This takes some time.

```
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda3-2.4.1-Linux-x86_64.sh
```

To install Anaconda, run the shell script with bash (this will take some time). It will ask you a bunch of questions; use the defaults (press Enter) for all of them except the question about adding Anaconda3 to your `PATH`, which is covered below.

```
bash Anaconda3-2.4.1-Linux-x86_64.sh
```
If you get this warning:
```
WARNING:
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Anaconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Anaconda3: /home/ucsd-train46/anaconda3
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/ucsd-train46/.bashrc ? [yes|no]
[no] >>> 
```

Type "`yes`" so that the installer adds the Anaconda3 install location to your `PATH` in `~/.bashrc` (the default answer is "no").

Anaconda may say that you need to open another terminal window to activate it - don't listen to it. Follow the instructions below to activate.

**IMPORTANT: Make sure the installer truly added lines to your `~/.bashrc` file** (for example, look at the end of the file with `tail ~/.bashrc`).

This has added the folder `~/anaconda3` to your system and added stuff to your `$PATH` variable in `~/.bashrc`, but your current `$PATH` variable has not been updated, and therefore the terminal has no idea where this newfangled thing is. If you try to do any `conda` command, you'll get an error:

```
[ucsd-train12@tscc-login2 ~]$ conda --help
-bash: conda: command not found
```

To activate `conda`, use `source` on your `.bashrc`:

```
source ~/.bashrc
```
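
To see for yourself what `source ~/.bashrc` changed, you can test whether the Anaconda folder is listed in `$PATH`. The sketch below is an illustration only: the function name `path_contains` and the two sample `PATH` values are made up, so it can be tried anywhere.

```shell
# Returns 0 if directory $2 appears as an entry in the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical PATH values, before and after sourcing ~/.bashrc
before="/usr/local/bin:/usr/bin:/bin"
after="/home/ucsd-train12/anaconda3/bin:$before"

path_contains "$before" "/home/ucsd-train12/anaconda3/bin" && echo "before: found" || echo "before: missing"
path_contains "$after"  "/home/ucsd-train12/anaconda3/bin" && echo "after: found"  || echo "after: missing"
```

On TSCC itself, the equivalent check would be `path_contains "$PATH" "$HOME/anaconda3/bin"`, assuming you accepted the default install location.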

Check your python version:  
```
$ python -V
Python 3.5.1 :: Anaconda 2.4.1 (64-bit)
```

If you do `python -v` (little "v") by accident, Python starts in verbose mode and drops you at its `>>>` prompt. Exit Python with:

```
>>> quit()
$ 
```

And make sure your Python is pointing to the Anaconda Python:

```
$ which python
/home/ucsd-train##/anaconda3/bin/python
```

## Install some more packages


For `biom262`, we'll want to install these additional packages to Anaconda:
* R
* IRKernel - Use R in Jupyter notebook
* [`seaborn`](http://stanford.edu/~mwaskom/software/seaborn/) (nicer plots in Python)

Install them into your Anaconda installation with the command below. By specifying these packages, we're also specifying all their dependencies (the other packages that each of these requires).

```
conda install --channel r r r-irkernel seaborn
```

Here's that big command broken down:

* `conda` - the base command (like how `git` was the base command you used for git stuff). Some add-on subcommands are actually separate `conda-subcommand` programs under the hood, e.g. `conda-build`, but we use them with just the spaces for convenience.
* `install` - The conda subcommand to install packages into the current environment
* `--channel r` - A "channel" is a URL to a folder that contains packages that you can install. Anaconda doesn't come with the R channel by default so we have to specify it here.
* `r r-irkernel seaborn` - The packages to install.
 
The output is quite big; it will look something like this:

```
$ conda install --channel https://conda.anaconda.org/r r r-irkernel seaborn
Warning: could not import binstar_client (invalid token (pkg_resources.py, line 44))Fetching package metadata: ......
Solving package specifications: ...................................................................
Package plan for installation in environment /home/ucsd-train01/anaconda3:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    glib-2.43.0                |                2         7.4 MB  r
    decorator-4.0.6            |           py35_0           6 KB  defaults
    numpy-1.10.2               |           py35_0         5.8 MB  defaults
    pyzmq-15.1.0               |           py35_0         782 KB  defaults
    requests-2.9.0             |           py35_0         647 KB  defaults
    setuptools-19.1.1          |           py35_0         348 KB  defaults
    cairo-1.12.18              |                6         594 KB  defaults
    conda-3.19.0               |           py35_0         180 KB  defaults
    scipy-0.16.1               |      np110py35_0        23.3 MB  defaults
    harfbuzz-0.9.35            |                6         1.1 MB  r
    pango-1.36.8               |                3         796 KB  r
    nbconvert-4.1.0            |           py35_0         275 KB  defaults
    r-base-3.2.2               |                0        20.6 MB  r
    seaborn-0.6.0              |      np110py35_0         257 KB  defaults
    r-3.2.2                    |                0           2 KB  r
    r-base64enc-0.1_3          |         r3.2.2_0          25 KB  r
    r-boot-1.3_17              |         r3.2.2_0         575 KB  r
    r-cluster-2.0.3            |         r3.2.2_0         466 KB  r
    r-codetools-0.2_14         |         r3.2.2_0          45 KB  r
    r-digest-0.6.8             |         r3.2.2_2          93 KB  r
    r-foreign-0.8_66           |         r3.2.2_0         220 KB  r
    r-jsonlite-0.9.17          |         r3.2.2_0         927 KB  r
    r-kernsmooth-2.23_15       |         r3.2.2_0          84 KB  r
    r-lattice-0.20_33          |         r3.2.2_0         698 KB  r
    r-magrittr-1.5             |         r3.2.2_1         154 KB  r
    r-mass-7.3_45              |         r3.2.2_0         1.0 MB  r
    r-nnet-7.3_11              |         r3.2.2_0          99 KB  r
    r-repr-0.3                 |         r3.2.2_0          44 KB  r
    r-rpart-4.1_10             |         r3.2.2_0         861 KB  r
    r-rzmq-0.7.7               |         r3.2.2_3          60 KB  r
    r-spatial-7.3_11           |         r3.2.2_0         122 KB  r
    r-stringi-1.0_1            |         r3.2.2_0        10.7 MB  r
    r-survival-2.38_3          |         r3.2.2_0         4.4 MB  r
    r-uuid-0.1_2               |         r3.2.2_0          18 KB  r
    r-class-7.3_14             |         r3.2.2_0          82 KB  r
    r-irdisplay-0.3            |         r3.2.2_0          23 KB  r
    r-matrix-1.2_2             |         r3.2.2_0         3.1 MB  r
    r-nlme-3.1_122             |         r3.2.2_0         2.0 MB  r
    r-stringr-1.0.0            |         r3.2.2_0          78 KB  r
    r-evaluate-0.8             |         r3.2.2_0          39 KB  r
    r-mgcv-1.8_9               |         r3.2.2_0         1.8 MB  r
    r-irkernel-0.5             |         r3.2.2_1          71 KB  r
    r-recommended-3.2.2        |         r3.2.2_0          707 B  r
    ------------------------------------------------------------
                                           Total:        89.7 MB

The following NEW packages will be INSTALLED:

    cairo:         1.12.18-6          defaults
    glib:          2.43.0-2           r       
    harfbuzz:      0.9.35-6           r       
    libgcc:        4.8.5-1            r       
    ncurses:       5.9-4              r       
    pango:         1.36.8-3           r       
    pcre:          8.31-0             defaults
    pixman:        0.32.6-0           defaults
    r:             3.2.2-0            r       
    r-base:        3.2.2-0            r       
    r-base64enc:   0.1_3-r3.2.2_0     r       
    r-boot:        1.3_17-r3.2.2_0    r       
    r-class:       7.3_14-r3.2.2_0    r       
    r-cluster:     2.0.3-r3.2.2_0     r       
    r-codetools:   0.2_14-r3.2.2_0    r       
    r-digest:      0.6.8-r3.2.2_2     r       
    r-evaluate:    0.8-r3.2.2_0       r       
    r-foreign:     0.8_66-r3.2.2_0    r       
    r-irdisplay:   0.3-r3.2.2_0       r       
    r-irkernel:    0.5-r3.2.2_1       r       
    r-jsonlite:    0.9.17-r3.2.2_0    r       
    r-kernsmooth:  2.23_15-r3.2.2_0   r       
    r-lattice:     0.20_33-r3.2.2_0   r       
    r-magrittr:    1.5-r3.2.2_1       r       
    r-mass:        7.3_45-r3.2.2_0    r       
    r-matrix:      1.2_2-r3.2.2_0     r       
    r-mgcv:        1.8_9-r3.2.2_0     r       
    r-nlme:        3.1_122-r3.2.2_0   r       
    r-nnet:        7.3_11-r3.2.2_0    r       
    r-recommended: 3.2.2-r3.2.2_0     r       
    r-repr:        0.3-r3.2.2_0       r       
    r-rpart:       4.1_10-r3.2.2_0    r       
    r-rzmq:        0.7.7-r3.2.2_3     r       
    r-spatial:     7.3_11-r3.2.2_0    r       
    r-stringi:     1.0_1-r3.2.2_0     r       
    r-stringr:     1.0.0-r3.2.2_0     r       
    r-survival:    2.38_3-r3.2.2_0    r       
    r-uuid:        0.1_2-r3.2.2_0     r       
    seaborn:       0.6.0-np110py35_0  defaults

The following packages will be UPDATED:

    conda:         3.18.8-py35_0      defaults --> 3.19.0-py35_0      defaults
    decorator:     4.0.4-py35_0       defaults --> 4.0.6-py35_0       defaults
    nbconvert:     4.0.0-py35_0       defaults --> 4.1.0-py35_0       defaults
    numpy:         1.10.1-py35_0      defaults --> 1.10.2-py35_0      defaults
    pyzmq:         14.7.0-py35_1      defaults --> 15.1.0-py35_0      defaults
    requests:      2.8.1-py35_0       defaults --> 2.9.0-py35_0       defaults
    scipy:         0.16.0-np110py35_1 defaults --> 0.16.1-np110py35_0 defaults
    setuptools:    18.5-py35_0        defaults --> 19.1.1-py35_0      defaults

Proceed ([y]/n)?
```

Press "y" to proceed
```
Fetching packages ...
r-base64enc-0. 100% |######################################################| Time: 0:00:00 347.71 kB/s
r-digest-0.6.8 100% |######################################################| Time: 0:00:01  81.73 kB/s
r-jsonlite-0.9 100% |######################################################| Time: 0:00:18  52.38 kB/s
r-magrittr-1.5 100% |######################################################| Time: 0:00:00 436.33 kB/s
r-repr-0.3-r3. 100% |######################################################| Time: 0:00:00  75.39 kB/s
r-rzmq-0.7.7-r 100% |######################################################| Time: 0:00:00 294.29 kB/s
r-stringi-1.0_ 100% |######################################################| Time: 0:00:03   3.71 MB/s
r-uuid-0.1_2-r 100% |######################################################| Time: 0:00:00 247.64 kB/s
r-irdisplay-0. 100% |######################################################| Time: 0:00:00 329.61 kB/s
r-stringr-1.0. 100% |######################################################| Time: 0:00:00 366.92 kB/s
r-evaluate-0.8 100% |######################################################| Time: 0:00:00 381.25 kB/s
r-irkernel-0.5 100% |######################################################| Time: 0:00:00 342.46 kB/s
Extracting packages ...
[      COMPLETE      ]|#########################################################################| 100%
Linking packages ...
[      COMPLETE      ]|#########################################################################| 100%
```


## My First Jupyter Notebook

Start the Jupyter notebook server, where "`####`" is some number larger than 1024 and no larger than 65535 (this is for a unique "port" number - yes like a port for boats and ships - that your notebook will run on). The `&` ("ampersand") at the end is important, because it tells the Jupyter process to run in the background, so we can run other commands on top.

```
$ jupyter notebook --no-browser --port #### &
[1] 12583
(biom262)[ucsd-train12@tscc-login2 ~]$ [I 13:23:05.786 NotebookApp] Writing notebook server cookie secret to /home/ucsd-train12/.local/share/jupyter/runtime/notebook_cookie_secret
[I 13:23:06.291 NotebookApp] Serving notebooks from local directory: /home/ucsd-train12
[I 13:23:06.291 NotebookApp] 0 active kernels
[I 13:23:06.291 NotebookApp] The IPython Notebook is running at: http://localhost:7788/
[I 13:23:06.291 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
```

The `[1] 12583` shows you the job number (`[1]`) and the process ID (`12583`) of the `jupyter` process. You can convince yourself that it's really there by using `ps`, which shows you the processes running in your terminal:

```
$ ps
  PID TTY          TIME CMD
 6593 pts/44   00:00:00 bash
12583 pts/44   00:00:02 jupyter-noteboo
12592 pts/44   00:00:00 links
14208 pts/44   00:00:00 ps
```
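
If you're unsure whether the `####` you picked is a usable port number, a small check like the one below can help. It is a sketch only: the function name `valid_port` is made up for this example, and it checks the range (above 1024, at most 65535), not whether somebody else is already using the port.

```shell
# Returns 0 if $1 is a whole number in the range 1025-65535.
valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 1024 ] && [ "$1" -le 65535 ]
}

for port in 80 1024 7788 70000 "####"; do
    if valid_port "$port"; then
        echo "$port: ok"
    else
        echo "$port: not usable"
    fi
done
```

Of the sample values, only `7788` passes; the literal `####` fails because it is a placeholder, not a number.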

#### Possible errors

##### "The port #### is already in use, try another random port"

This can happen if you ran `jupyter notebook` before, thought you had stopped it, and then tried to run it again. Even though you moved on, an earlier `jupyter notebook` process is still running in the background and holding on to the port. To find it, list your processes:

```
$ ps
  PID TTY          TIME CMD
 6593 pts/44   00:00:00 bash
12583 pts/44   00:00:02 jupyter-noteboo
12592 pts/44   00:00:00 links
14198 pts/44   00:00:01 jupyter-noteboo
14208 pts/44   00:00:00 ps
```

See which processes are associated with `jupyter` and then use `kill -9` to stop them (`kill` asks a process to stop, and `-9` is the meanest form: the process is ended on the spot with no chance to clean up):

```
$ kill -9 12583
$ kill -9 14198
```
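
A gentler variation is to try a plain `kill` first and only reach for `-9` if the process is still there. The sketch below shows the pattern on a harmless `sleep` process standing in for a leftover `jupyter`, so it is safe to try anywhere; the variable names are made up for this example.

```shell
# Stand-in for a leftover jupyter process.
sleep 300 &
pid=$!

kill "$pid"                          # polite request to stop (SIGTERM)
wait "$pid" 2>/dev/null || true      # let the shell collect the finished stand-in

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"                   # last resort (SIGKILL)
    status="needed kill -9"
else
    status="stopped by plain kill"
fi
echo "$status"
```

On TSCC you would use the PIDs from `ps` in place of `$pid`. Something like `pgrep -u "$USER" -f jupyter` should also list them, if `pgrep` is available there.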

##### Big weird screen opens up

If you're getting this screen:

![](images/jupyter_minus_no_browser_flag.png "Jupyter run without `--no-browser` flag")

Then you forgot the `--no-browser` flag. Try again:

```
jupyter notebook --no-browser --port ####
```

##### "No such file or directory"

If you get this error:

```
[ucsd-train21@tscc-login1 ~]$ jupyter notebook —no-browser —port 1224
[C 18:12:26.584 NotebookApp] No such file or directory: /home/ucsd-train21/—no-browser
```

That means the dashes in front of `no-browser` and `port` are wrong: each flag needs two plain hyphens (`--`), and here a single long dash (`—`) was typed or pasted instead. Try again with:

```
jupyter notebook --no-browser --port ####
```
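
Long dashes usually sneak in when a command is copied from a document or chat program that "helpfully" merges `--` into one character. As a rough illustration (the variable names and the port `1224` are just sample values), `sed` can swap them back:

```shell
# The command as it might arrive after a copy-paste that merged "--" into a long dash.
pasted='jupyter notebook —no-browser —port 1224'

# Replace every long dash with two plain hyphens.
fixed="$(printf '%s\n' "$pasted" | sed 's/—/--/g')"

echo "$fixed"
```

In practice, retyping the two hyphens by hand is just as good. This only shows what needs to change.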


##### `NameError: name 'pkg_resources' is not defined`

You may get this error:

```
$ jupyter notebook --no-browser --port 7788 &
[1] 47665
[ucsd-train01@tscc-login1 ~]$ Traceback (most recent call last):
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/path.py", line 122, in <module>
    import pkg_resources
  File "/opt/biotools/bx-python/lib/python2.7/site-packages/distribute-0.6.10-py2.7.egg/pkg_resources.py", line 44
    def _bypass_ensure_directory(name, mode=0777):
                                               ^
SyntaxError: invalid token

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ucsd-train01/anaconda3/bin/jupyter-notebook", line 4, in <module>
    from notebook.notebookapp import main
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/notebook/notebookapp.py", line 83, in <module>
    from IPython.paths import get_ipython_dir
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/IPython/__init__.py", line 48, in <module>
    from .terminal.embed import embed
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/IPython/terminal/embed.py", line 16, in <module>
    from IPython.core.interactiveshell import DummyMod
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 31, in <module>
    from pickleshare import PickleShareDB
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/pickleshare.py", line 41, in <module>
    from path import path as Path
  File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/path.py", line 126, in <module>
    except pkg_resources.DistributionNotFound:
NameError: name 'pkg_resources' is not defined
^C
[1]+  Exit 1                  jupyter notebook --no-browser --port 7788

```

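The issue is that there are multiple Pythons around and they don't know how to talk to each other: as the traceback shows, Anaconda's Python 3.5 is trying to load a `pkg_resources.py` that was written for Python 2.7. To get rid of this, do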

```
export PYTHONPATH=
```
This empties `PYTHONPATH`, the list of extra folders where Python looks for libraries, so Anaconda's Python only uses its own.
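
To check that the variable really is empty, you can print it before and after. The sketch below first sets `PYTHONPATH` to an illustrative value (a guess based on the path in the traceback above, not necessarily what TSCC uses) so that the change is visible.

```shell
# Simulate the problem: PYTHONPATH pointing at a Python 2.7 library folder.
# This value is an assumption for illustration only.
export PYTHONPATH="/opt/biotools/bx-python/lib/python2.7/site-packages"
echo "before: ${PYTHONPATH:-<empty>}"

# Same fix as in the text: set the variable to an empty string.
export PYTHONPATH=
echo "after: ${PYTHONPATH:-<empty>}"
```

The change only lasts for the current terminal session. If the error comes back every time you log in, one option is to add the `export PYTHONPATH=` line to the end of your `~/.bashrc`.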

#### Sending the notebooks home


##### Tunneling: Mac/Linux
<font color='DeepPink'>***Note: These commands must be done on your home laptop in another terminal window, NOT on TSCC***</font> 


Now, back on your home laptop, open another tab in your terminal window. To send this notebook back to your laptop from TSCC (aka "tunneling"), use this command (replace `####` and `username` with your own port and username). You will also need to replace `tscc-login#` with either `tscc-login1` or `tscc-login2`, whichever you got randomly assigned to when you logged in.


```
ssh -NL ####:localhost:#### username@tscc-login#.sdsc.edu
```
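
Here `-L` asks `ssh` to forward a port on your laptop to a port on the remote machine, and `-N` tells it not to run any command on the remote side, which is why the terminal just sits there afterwards. If filling in the blanks by hand is error-prone, one option is to build the command from variables and look at it before running it. The values below are made-up samples:

```shell
# Hypothetical values; substitute your own port, username, and login node.
port=7788
user="ucsd-train12"
node="tscc-login2"

# The local port and the remote port are kept identical, as in the command above.
tunnel_cmd="ssh -NL ${port}:localhost:${port} ${user}@${node}.sdsc.edu"

# Print rather than run, so the command can be checked first.
echo "$tunnel_cmd"
```

Once the printed command looks right, copy it and run it on your laptop.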

The first time you connect, you'll get output like this. Type the full word "yes" to continue connecting (a bare "y" is not accepted, as you can see below). SSH is cautious about any host it has not seen before and asks you to confirm.

```
MacBook:~ benlewis$ ssh -NL 1225:localhost:1225 ucsd-train21@tscc-login1.sdsc.edu
The authenticity of host 'tscc-login1.sdsc.edu (132.249.107.90)' can't be established.
RSA key fingerprint is ee:40:c3:c6:19:03:9d:29:23:e6:ee:82:80:02:87:9b.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'tscc-login1.sdsc.edu' (RSA) to the list of known hosts.
```

##### Possible problem: "`Write failed: Broken pipe`"

If you see a "`Write failed: Broken pipe`" output in your laptop terminal, that means you've lost connection to the server. It's nothing to worry about. **Solution:** Re-run your `ssh` command (either logging in to TSCC or doing the tunneling) and you'll be all set.

##### Possible problem: "`bind: Address already in use`"

If you try to do the tunneling and you see this kind of output:
```
MacBook:~ benlewis$ ssh -NL 1224:localhost:1224 ucsd-train21@tscc-login2.sdsc.edu
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 1224
Could not request local forwarding
```

That happens when you've run more than one of the `ssh -NL ####:localhost:#### ucsd-train##@tscc-login#.sdsc.edu` commands, and your computer is getting confused because you're telling it to do multiple things with the same port number.

**Solution:** Close the tab and open a new one - start fresh! Try doing the `ssh -NL ...` command again.

##### Tunneling: Windows

To get Jupyter notebook on your computer, you'll need to set up an "SSH tunnel" that "listens" to that particular port, and thus gets the Jupyter notebook from TSCC.

###### Step 1: Create a new Putty session

So that you only need to have one Putty session open, we'll make a new TSCC Session. Create one with `ucsd-train##@tscc-login#.sdsc.edu`, and call the session "TSCC Jupyter"

![](images/putty_tscc_jupyter_step1.png)

###### Step 2: Add your private key and allow forwarding

Go to "Connection > SSH > Auth." Click the checkbox next to "Allow agent forwarding" and add the "Putty Private Key" that you created with the `biom262_rsa` file.

![](images/putty_tscc_jupyter_step2.png)

###### Step 3: Add a tunnel

Go to "Connection > SSH > Tunnels." Then:

1. Click the checkbox next to "Local ports accept connections from other hosts"
2. Add your `####` for your source port
3. Add `localhost:####` for your Destination
4. Click "Local"
5. Click "Add"

![](images/putty_tscc_jupyter_step3.png)

You should now see this:

![](images/putty_tscc_jupyter_step3_added.png)

###### Step 4: Save your settings!

So you don't have to do this every time... Save your settings! Go all the way back to the "Session" window and click "Save"

![](images/putty_tscc_jupyter_step4.png)

###### Step 5: Log in!

You may see the message below. It's totally normal.

![](images/putty_tscc_jupyter_step5.png)

###### Step 6: Run your `jupyter notebook` command

You'll see this kind of output:

![](images/putty_tscc_jupyter_step6.png)


And now you can move on to the next step to view the notebooks! Go to `http://localhost:####` in your browser (Chrome, Firefox, IE)

#### Viewing the notebooks (everyone)

Connect to the Jupyter notebook server at `http://localhost:####/`.
    
You should see a page that looks like this:

![](images/jupyter_new.png "Jupyter notebook page")

Start a new notebook using the dropdown menu in the top right of the screen:
![New doc image reference](images/newdoc.png "New doc image reference")



