<img src="http://www.nersc.gov/assets/Uploads/n-logo.png" width="100" align="right">

Jupyter Notebooks at NERSC (https://jupyter-dev.nersc.gov/)
===========================================================

There are currently two Jupyter notebook services at NERSC.  One runs on a [Science Gateway Node](https://jupyter.nersc.gov) (SGN) and is the production service.  The other runs on a dedicated data analytics node on [Cori](https://jupyter-dev.nersc.gov) and is more experimental, though open to users.  The Cori setup has several advantages over the SGN service.

It's Really Running on Cori
===========================

On the SGN Jupyter service, `socket.gethostname()` just returns a Docker container ID, but here you can see the notebook is actually running on a Cori node dedicated to Jupyter:

In [None]:
import socket
socket.gethostname()
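By default Docker sets a container's hostname to the first 12 hexadecimal characters of its container ID, which is why the SGN service reports one. As a rough, hypothetical heuristic (not a NERSC tool), a hostname matching that pattern probably comes from a container:

```python
import re
import socket

# Hypothetical heuristic: Docker's default hostname is the first
# 12 lowercase hex characters of the container ID.
_CONTAINER_ID = re.compile(r"[0-9a-f]{12}")

def looks_like_container_id(hostname):
    """Return True if hostname matches Docker's default short-ID format."""
    return _CONTAINER_ID.fullmatch(hostname) is not None

name = socket.gethostname()
print(name, looks_like_container_id(name))
```

A real host can in principle have such a name, so treat a match as a hint, not proof.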

... and there are 32 (Haswell) CPUs:

In [None]:
import psutil
psutil.cpu_count()
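`psutil.cpu_count()` counts logical CPUs by default, so hardware threads are included; `psutil.cpu_count(logical=False)` reports physical cores instead. A standard-library sketch that also shows how many of those CPUs the current process may use, via `os.sched_getaffinity` (available on Linux):

```python
import os

def cpu_summary():
    """Return (logical CPUs on the node, CPUs this process may run on)."""
    total = os.cpu_count()  # logical CPUs, like psutil.cpu_count(); may be None
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPUs the scheduler lets this process use
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total  # fall back to the node-wide count
    return total, usable

print(cpu_summary())
```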

... and there are more than 500 GB of memory available, much more than on the SGN Jupyter service:

In [None]:
psutil.virtual_memory().total / 2**30
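Dividing by `2**30` reports memory in GiB rather than decimal GB. A standard-library sketch of the same measurement, assuming a Linux system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`:

```python
import os

def total_memory_gib():
    """Total physical memory in GiB (2**30 bytes); Linux-only sketch."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # physical pages on the node
    return page_size * num_pages / 2**30

print("{:.1f} GiB".format(total_memory_gib()))
```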

Cori Scratch Access
===================

Cori `$SCRATCH` is not accessible from the SGN Jupyter service, but it is from Jupyter on Cori.  This is very useful if you need to analyze or visualize data stored on that file system.

In [None]:
import os
os.environ.get("SCRATCH")
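Before reading data, it can help to check that `$SCRATCH` is both set and visible. A small sketch (the helper name `scratch_dir` is hypothetical):

```python
import os
from pathlib import Path

def scratch_dir():
    """Return $SCRATCH as a Path if it is set and is a directory, else None."""
    value = os.environ.get("SCRATCH")
    if not value:
        return None
    path = Path(value)
    return path if path.is_dir() else None

scratch = scratch_dir()
if scratch is None:
    print("$SCRATCH is not set or not a directory here")
else:
    # list a few entries to confirm the file system is visible
    print(sorted(p.name for p in scratch.iterdir())[:10])
```

On the SGN service you would expect the first branch; on Jupyter at Cori, the second.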

It's the Same Python Environment as at Cori Login
=================================================

That is, you get the same Python as when you log in and run `module load python` or `module load python/3.5-anaconda`, not a different Python stack on some other system.  This helps if you use Python on Cori both through Jupyter and from the command line.

In [None]:
import sys
sys.executable
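To compare with a login session, you can print a few more details from `sys` in the notebook and check them against `which python` and `python --version` after loading the module at the command line. A small sketch:

```python
import sys

def python_environment():
    """Summarize which Python interpreter this kernel is running."""
    return {
        "executable": sys.executable,       # path to the interpreter binary
        "prefix": sys.prefix,               # root of the active environment
        "version": sys.version.split()[0],  # e.g. "3.5.2"
    }

for key, value in python_environment().items():
    print(key, value)
```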

Access to the Cori Batch Queues
===============================

The IPython kernel supports `%magic` commands that add functionality to code cells beyond the Python language itself.  At NERSC we have set up [Slurm magic](https://github.com/NERSC/slurm-magic) commands to expose the Cori batch queue through Jupyter.  This is an admittedly limited but still useful mechanism for interacting with batch jobs.

Slurm magic commands simply wrap the Slurm command-line tools, so to use most Slurm commands you just prefix them with `%`.  To see what's in the queue:

In [None]:
%squeue
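Conceptually, a magic like `%squeue` runs the command-line tool and captures what it prints. The following is a hypothetical sketch of that idea with `subprocess`, not slurm-magic's actual implementation; `run_cli` is an illustrative name:

```python
import subprocess

def run_cli(command, *args):
    """Run a command-line tool and return its standard output as text.

    Hypothetical helper illustrating how a magic could wrap a CLI such as
    squeue; raises CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        [command, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output bytes to str
        check=True,
    )
    return result.stdout

# On Cori this would print the queue, much like %squeue:
# print(run_cli("squeue", "-u", "your_username"))
```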

That might look different from _your_ `squeue` output.  Note that your `SQUEUE_FORMAT` environment variable is respected, because Jupyter on Cori picks up the environment set by your shell login dotfiles:

In [None]:
import os
os.environ.get("SQUEUE_FORMAT", "not defined")
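`squeue` reads `SQUEUE_FORMAT` when no `--format` option is given. To try a different layout for a single call without editing your dotfiles, one option is to pass a modified copy of the environment to a subprocess. A sketch (the helper name and the format string are only examples):

```python
import os

def env_with_squeue_format(fmt):
    """Return a copy of the current environment with SQUEUE_FORMAT set.

    The notebook's own os.environ is left unchanged.
    """
    env = dict(os.environ)
    env["SQUEUE_FORMAT"] = fmt
    return env

# Example use on Cori (not run here):
# import subprocess
# subprocess.run(["squeue", "-u", "your_username"],
#                env=env_with_squeue_format("%.10i %.9P %.20j %.8T"))
```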

Back to `squeue` though.  By default the output of this command is placed into a [Pandas](http://pandas.pydata.org/) dataframe object.  (Observe that you can also capture the result of a magic command as if it were a Python function call.)

In [None]:
df = %squeue
type(df)
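Outside Jupyter, you can get similar tabular data by asking `squeue` for delimiter-separated fields and parsing the text yourself. A sketch, assuming output produced with a format string such as `squeue -o "%i|%P|%j|%u|%T"` (job ID, partition, job name, user, state), whose first line is a header; the sample rows are made up:

```python
def parse_delimited(text, sep="|"):
    """Parse header-plus-rows text into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [name.strip() for name in lines[0].split(sep)]
    return [
        dict(zip(header, (field.strip() for field in line.split(sep))))
        for line in lines[1:]
    ]

sample = """JOBID|PARTITION|NAME|USER|STATE
1001|debug|test|alice|RUNNING
1002|regular|sim|bob|PENDING
"""
print(parse_delimited(sample))
```

A list of dicts like this can be handed straight to `pandas.DataFrame` if pandas is available.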

It's a Pandas dataframe.  So if you get bored waiting for _your_ job to run, you can do some big data crunching on the Cori batch queue data set.

In [None]:
df.groupby("PARTITION")["JOBID"].count().to_frame()
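The `groupby` above counts jobs per partition. The same tally without pandas, over a list of per-job records (made-up examples here), is a `collections.Counter`:

```python
from collections import Counter

def jobs_per_partition(jobs):
    """Count jobs per partition from records that have a 'PARTITION' key."""
    return Counter(job["PARTITION"] for job in jobs)

jobs = [  # illustrative records, not real queue data
    {"JOBID": "1001", "PARTITION": "debug"},
    {"JOBID": "1002", "PARTITION": "regular"},
    {"JOBID": "1003", "PARTITION": "regular"},
]
for partition, count in sorted(jobs_per_partition(jobs).items()):
    print(partition, count)  # debug 1, then regular 2
```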

You Can Submit Jobs from Jupyter
================================

The `sbatch` command works in two ways in Jupyter on Cori: as a *line magic* or as a *cell magic*.  The line magic is useful for submitting existing batch scripts.  The cell magic lets you put the batch script _inside your Jupyter notebook._  Here's a trivial example to show we can run on 2 nodes:

In [None]:
%%sbatch -p debug -t 10 -N 2 -C haswell
#!/bin/bash
srun -n 16 hostname
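For the line-magic form, the script lives in a file. A sketch that writes an equivalent script to disk, moving the options from the `%%sbatch` line into `#SBATCH` directives; the helper and the file name `hostname_test.sh` are hypothetical, and it assumes the line magic accepts a script path as the `sbatch` CLI does:

```python
from pathlib import Path

def write_batch_script(path, nodes=2, ntasks=16, partition="debug",
                       minutes=10, constraint="haswell"):
    """Write a minimal Slurm batch script and return its Path."""
    script = "\n".join([
        "#!/bin/bash",
        "#SBATCH -p {}".format(partition),
        "#SBATCH -t {}".format(minutes),
        "#SBATCH -N {}".format(nodes),
        "#SBATCH -C {}".format(constraint),
        "srun -n {} hostname".format(ntasks),
        "",
    ])
    path = Path(path)
    path.write_text(script)
    return path

script_path = write_batch_script("hostname_test.sh")
# Then, in a notebook cell:  %sbatch hostname_test.sh
```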

... a little parsing of the `sbatch` response to get the job ID ...

In [None]:
jobid = _.split()[-1]
jobid
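`sbatch` reports a successful submission as `Submitted batch job <id>`. Taking the last whitespace-separated token works for that message; a slightly stricter sketch (hypothetical helper) fails loudly if the response is something else, such as an error:

```python
import re

def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: {!r}".format(sbatch_output))
    return match.group(1)

print(parse_job_id("Submitted batch job 12345\n"))  # 12345
```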

In [None]:
%squeue -j $jobid

If I was fast enough, the output above is a dataframe showing the job, and if the queue is not too busy the job should already be running.  Below we read the Slurm job output, which should show 2 different compute node hostnames, 8 times each.

In [None]:
with open("slurm-{}.out".format(jobid), "r") as stream:
    print(stream.read())
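To check the result programmatically rather than by eye, a sketch that tallies the hostnames in the job output. With 16 tasks on 2 nodes you would expect two names with 8 lines each; the sample text below is made up:

```python
from collections import Counter

def hostname_counts(output_text):
    """Count how many times each non-empty line (hostname) appears."""
    return Counter(line.strip() for line in output_text.splitlines()
                   if line.strip())

sample = "nid00010\n" * 8 + "nid00011\n" * 8  # illustrative output
counts = hostname_counts(sample)
print(counts)
```

On Cori you would pass it the real file contents, e.g. `hostname_counts(open("slurm-{}.out".format(jobid)).read())`.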

... But Wait, There's More
==========================

See demonstrations and notebooks from other speakers here, including machine learning notebooks from Evan Racah (next speaker) and Lisa Gerhardt (Spark).