# Jupyter usage

How to navigate a Jupyter notebook efficiently and use its limited debugger.

In [3]:
import matplotlib.pyplot as plt
import seaborn as sns

import pandas as pd
import numpy as np

## Jupyter Shortcut Commands
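The following shortcuts work in COMMAND mode. To enter COMMAND mode, press ESC or click anywhere outside the cell. In the list below, cmd is shorthand for COMMAND mode (for example, cmd-m means press m while in COMMAND mode), not the Mac Command key.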

<ul>

<li><mark>shift-enter #run cell</mark></li>
<li><mark>cmd-m #change cell to markdown</mark></li>
<li><mark>cmd-y #change cell to code</mark></li>
<li><mark>cmd-a #add cell before</mark></li>
<li><mark>cmd-b #add cell after</mark></li>
<li><mark>cmd d,d #delete cell</mark></li>
<li><mark>cmd x #cut cell</mark></li>
<li><mark>cmd v #paste cell</mark></li>
<li><mark>cmd-o #hide output; if needed, edit Settings->Advanced Settings->Keyboard Shortcuts and add O and shift-O to User Preferences</mark></li>
<li><mark>cmd shift-L #toggle line numbers, useful to pinpoint where an error occurred</mark></li>

<li><mark>Shift + Up Arrow select the current cell and the cell above</mark></li>
<li><mark>Shift + Down Arrow select the current cell and the cell below</mark></li>
<li><mark>Shift + m merge selected cells</mark></li>

</ul>
  

## Magic commands in Jupyter

Magic commands help with analyzing data in Jupyter Notebooks<br>

The following are some magics you may find useful.  For a complete list see 
[Built in  Magic Commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

Note: Magics can change the current state of the notebook, for instance the directory the notebook thinks it is running in.



### Line Magics

<mark>Start with a % and apply to a single line.</mark><br> A useful subset follows.

In [7]:
#print current directory
%pwd

'/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1'

#### A bit about pushd, popd and dirs

Excerpt from [this post](https://unix.stackexchange.com/questions/77077/how-do-i-use-pushd-and-popd-commands)

    pushd . adds current directory XX to dirs stack. Afterwards, you can move around using cd, and to return to XX you just do popd regardless of how "far away" you are in the directory tree (can jump over multiple levels, sideways etc). Especially useful in bash scripts.

In [8]:
# first save current dir
%pushd .

/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1


['~/AA_jupyter_tuts/DATA301_CODE/week_1']

In [4]:
#whats in the saved directories stack
%dirs

['~/AA_jupyter_tuts/DATA301_CODE/week_1']

In [9]:
#then change the notebook's current working directory. NOTE: the change persists for all cells
%cd ~
%pwd

/home/keith


'/home/keith'

In [11]:
#restore the pushed dir
%popd
%pwd

/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1
popd -> ~/AA_jupyter_tuts/DATA301_CODE/week_1


'/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1'
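The save-and-restore pattern shown with the directory stack can also be written in plain Python. The following is a minimal sketch, not how the %pushd and %popd magics are implemented; the `pushd` helper name and the temporary directory are illustrative assumptions.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def pushd(new_dir):
    """Switch to new_dir, then restore the previous directory on exit."""
    saved = os.getcwd()          # remember where we were (the "push")
    os.chdir(new_dir)
    try:
        yield saved
    finally:
        os.chdir(saved)          # always go back (the "pop"), even on error


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with pushd(tmp):
        # samefile tolerates temp paths that are reached through symlinks
        inside_ok = os.path.samefile(os.getcwd(), tmp)
after = os.getcwd()

print("moved into temp dir:", inside_ok)
print("restored original  :", after == start)
```

Because the restore happens in a `finally` clause, the original directory comes back even if the code inside the `with` block raises.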

#### More Line Magics

In [12]:
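#measure how long the operation takes, averages over many iterations
def fun1():
    pass
#NOTE: without parentheses this times only the lookup of the name fun1;
#write %timeit fun1() to time the call itself
%timeit fun1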

8.37 ns ± 0.116 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
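Outside a notebook, the standard-library `timeit` module gives a comparable measurement. This is a rough sketch rather than a description of what %timeit does internally; the loop count of 100,000 and the 5 repeats are arbitrary choices. Passing the function object to `timeit.repeat` means the function is actually called on every loop.

```python
import timeit


def fun1():
    pass


number = 100_000                      # calls per run (arbitrary choice)
# timeit.repeat returns one total elapsed time (in seconds) per run
runs = timeit.repeat(fun1, repeat=5, number=number)

best_per_call = min(runs) / number    # the fastest run is the least noisy
print(f"best of {len(runs)} runs: {best_per_call * 1e9:.1f} ns per call")
```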


In [15]:
#list of variables defined in this ipython session
%who

#same as above with a bit more info
%whos

fun1	 np	 pd	 plt	 sns	 
Variable   Type        Data/Info
--------------------------------
fun1       function    <function fun1 at 0x7f6ac44840e0>
np         module      <module 'numpy' from '/ho<...>kages/numpy/__init__.py'>
pd         module      <module 'pandas' from '/h<...>ages/pandas/__init__.py'>
plt        module      <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
sns        module      <module 'seaborn' from '/<...>ges/seaborn/__init__.py'>
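A similar listing can be approximated in plain Python by walking a namespace dictionary. The sketch below is only an approximation of %whos; the `whos` helper and the `sample` dictionary are made up for illustration. In a real session you would pass `globals()` instead.

```python
import math


def whos(namespace):
    """Return sorted (name, type name) pairs for the public names in a dict."""
    return sorted(
        (name, type(value).__name__)
        for name, value in namespace.items()
        if not name.startswith("_")       # skip private and dunder names
    )


# hypothetical namespace standing in for globals()
sample = {"math": math, "answer": 42, "_hidden": None}

for name, kind in whos(sample):
    print(f"{name:<10} {kind}")
```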


### Cell Magics
<mark>Start with a %% and apply to an entire cell.</mark><br>  A %% command must be the first line in the cell<br>

In [16]:
%%time
for i in range(10):
    pass

#measure how long this cell takes to run, just 1 iteration

CPU times: user 12 µs, sys: 0 ns, total: 12 µs
Wall time: 18.6 µs
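The two figures reported above (CPU time and wall time) can be collected by hand with the standard `time` module. This is a minimal sketch under the assumption that `time.process_time` (user plus system CPU time of the current process) and `time.perf_counter` (wall clock) are close enough analogues; it is not the magic's own implementation.

```python
import time

cpu_start = time.process_time()     # CPU time (user + sys) used by this process
wall_start = time.perf_counter()    # high-resolution wall clock

for i in range(10):
    pass

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"CPU time : {cpu_elapsed * 1e6:.1f} µs")
print(f"Wall time: {wall_elapsed * 1e6:.1f} µs")
```

Like %%time, this measures a single pass, so the numbers are noisy for very short code.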


## Shell commands in Jupyter
You can run shell commands on the local operating system from Jupyter, for instance to install missing packages, check the current directory, or do a one-time content download. <br>
! calls out to a shell (in a new process) and then returns, while % affects the process associated with the notebook (or the notebook itself; many % commands have no shell counterpart).
!cd foo, by itself, has no lasting effect, since the process with the changed directory immediately terminates.
%cd foo changes the current directory of the notebook process, which is a lasting effect.<br>
<mark>BEWARE: On my local machine, I'm calling out to a Linux shell!  These commands should also work on a Mac, since macOS is Unix-based. If you are running on Windows, however, the Linux commands I pass to '!' below may not work as written.</mark>
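Before shelling out to install a package, it can help to check whether it is already importable. The following is a small sketch using the standard `importlib` module; the `is_installed` helper is a hypothetical name, and `json` is used only as a module that is certain to exist. Note that the import name can differ from the name used with pip.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "hdbscan", "folium"):
    status = "already installed" if is_installed(name) else "missing, install it once"
    print(f"{name}: {status}")
```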

In [20]:
# install if not there
# once installed, the package stays in this environment, so comment out the following lines
# since you don't want to install over and over
# !pip install hdbscan 
# !pip install folium

Collecting hdbscan
  Downloading hdbscan-0.8.33.tar.gz (5.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 9.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cython<3,>=0.27 (from hdbscan)
  Using cached Cython-0.29.37-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl.metadata (3.1 kB)
Using cached Cython-0.29.37-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Building wheels for collected packages: hdbscan
  Building wheel for hdbscan (pyproject.toml) ... ^C
canceled
ERROR: Operation cancelled by user

In [21]:
!pwd

/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1


In [22]:
#executes the shell command but does not change the notebook's directory
#i.e. still in week_1
!cd 
!pwd

/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1
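The reason the directory does not change can be reproduced with the standard `subprocess` module: a child process changes its own working directory and exits, leaving the parent untouched. This is a sketch of the idea only; it launches a Python child via `sys.executable` instead of a shell so that it runs on any OS, and the temp directory is an arbitrary target.

```python
import os
import subprocess
import sys
import tempfile

before = os.getcwd()
target = tempfile.gettempdir()      # any existing directory works here

# The child process changes *its own* directory, reports it, and exits
child = subprocess.run(
    [sys.executable, "-c",
     "import os, sys; os.chdir(sys.argv[1]); print(os.getcwd())",
     target],
    capture_output=True, text=True, check=True,
)

print("child was in    :", child.stdout.strip())
print("parent is still :", os.getcwd())
```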


In [23]:
#you can also access variables defined in your notebook by enclosing them in {}
path='/home/keith/AA_jupyter_tuts/DATA301_CODE/week_1/'
!ls {path}

'1_JupyterUsage (copy).ipynb'   test.ipynb
 1_JupyterUsage.ipynb	        Untitled.ipynb


## Help on packages, functions and classes

Once you import a package you can get help on package contents<br>

The following work in JupyterLab<br>
<mark>tab</mark>  offers context appropriate coding suggestions<br>
<mark>shift-tab</mark>  offers api help including signatures and docstrings 

In [14]:
#create a series
#ds=pd.   <- type this line, then hit tab right after the dot to get code suggestions

ds=pd.Series([4,7,-5,3],index=['d','b','a','c']) #hover over pd or Series and hit shift-tab  

You can get lots more information using ? and ??

In [24]:
# get docstring, file location etc.
pd.Series?

Init signature:
pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool | None' = None,
    fastpath: 'bool' = False,
) -> 'None'
Docstring:     
One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for pe
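Outside IPython, the standard `inspect` module exposes much of the same information as ? and ??. The sketch below is an approximation, not how IPython builds its help display. It inspects `textwrap.dedent` purely to stay dependency-free; any pure-Python function or class (for example pd.Series) could be substituted.

```python
import inspect
import textwrap

target = textwrap.dedent      # stand-in object; swap in anything written in Python

signature = inspect.signature(target)        # like the "Signature" part of ?
doc = inspect.getdoc(target)                 # cleaned-up docstring
source_file = inspect.getsourcefile(target)  # file that defines the object
source = inspect.getsource(target)           # full source text, like ??

print("Signature:", signature)
print("Docstring:", doc.splitlines()[0])
print("File     :", source_file)
print("Source starts with:", source.splitlines()[0])
```

`inspect.getsource` raises an error for objects implemented in C, which matches the cases where ?? cannot show source either.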

In [25]:
#get sourcecode
pd.Series??

Init signature:
pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool | None' = None,
    fastpath: 'bool' = False,
) -> 'None'
Source:        
class Series(base.IndexOpsMixin, NDFrame):  # type: ignore[misc]
    """

In [28]:
# If you import something, you can see where it is by asking for its __file__ attribute
import pandas as pd
print(pd.__file__)

import pandas.core.series as s
print(s.__file__)

#this will not work since Series is defined in series.py in the core directory under pandas
# import pandas.series as s

#Q: if the above does not work, how can s = pd.Series(...) work?  Where is the .core?
# Answer: Look in pandas' __init__.py.  It imports Series from core.

/home/keith/anaconda3/envs/temp/lib/python3.11/site-packages/pandas/__init__.py
/home/keith/anaconda3/envs/temp/lib/python3.11/site-packages/pandas/core/series.py


ModuleNotFoundError: No module named 'pandas.series'
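You can confirm where a re-exported class really lives by reading its `__module__` attribute. The sketch below uses the standard-library `json` package as a stand-in, since `json/__init__.py` re-exports `JSONDecoder` from `json.decoder` in the same way; by the same logic, `pd.Series.__module__` should report the core series module, consistent with the file path printed above.

```python
import json
import sys

# The class is reachable from the package top level...
cls = json.JSONDecoder

# ...but __module__ records the module that actually defines it
defining_name = cls.__module__
defining_module = sys.modules[defining_name]

print("defined in :", defining_name)
print("file       :", defining_module.__file__)
print("same object:", cls is defining_module.JSONDecoder)
```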

In [29]:
#want to know what attributes are available?
dir(pd.Series)

['T',
 '_AXIS_LEN',
 '_AXIS_ORDERS',
 '_AXIS_TO_AXIS_NUMBER',
 '_HANDLED_TYPES',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_ufunc__',
 '__bool__',
 '__class__',
 '__column_consortium_standard__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pandas_priority__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__
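Most of the names that `dir` returns are internal (they start with an underscore). A short filter keeps only the public ones. This sketch uses the built-in `str` type so it runs without pandas; `public_names` is a hypothetical helper, and calling it on pd.Series works the same way.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


names = public_names(str)
print(len(names), "public attributes, for example:", names[:5])
```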

## Debugging
Debugging is primitive in a Jupyter notebook, but it's being actively worked on and changing rapidly.  It will get better, but it's no substitute for an IDE.<br> To enable it, select the ipykernel for your notebook (ipykernel uses the default Python, or the Python in the virtual environment that JupyterLab was launched from). The kernel you are using is shown in the upper right of this window; click it to choose a different one.<br>
To debug:<br>
turn debugging on - bug icon in the upper right of this window<br>
set a breakpoint - click in the gutter to the left of the statement<br>
and execute the cell (ctrl-enter)<br>

<mark>Demo this now</mark>

In [3]:
def func():
    i=3
    for j in range(3):
        print(j)
func()
kk=3

0
1
2


In [4]:
!python -V #same as !python --version
!which python

Python 3.11.5
/home/keith/anaconda3/envs/p311/bin/python


In [5]:
#when debugging enabled, notice that you can see all defined 
#variables and their values in the Variables window to the right
print(kk)

3


In [21]:
#also be careful when importing, you can easily pollute your symbol space
#if you execute the following line, all of numpy's exported variables will be 
#imported and appear in the debugger's VARIABLES view 
from numpy import *
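To see how much a star import adds, you can run one inside a throwaway namespace and count the new names. The sketch below uses the standard `math` module instead of numpy so that it runs anywhere; the `namespace` dictionary is a stand-in for a fresh notebook session. It also shows the second hazard: a star import can silently replace a built-in.

```python
import math

namespace = {}                           # stands in for a fresh notebook namespace
exec("from math import *", namespace)    # star import into that namespace only

added = sorted(name for name in namespace if not name.startswith("_"))
print(f"{len(added)} names added, for example {added[:5]}")

# math.pow now hides the built-in pow inside this namespace
print(namespace["pow"] is math.pow)      # True
print(namespace["pow"](2, 3), pow(2, 3)) # 8.0 versus 8
```

Importing the module under an alias (as done with np and pd at the top of this notebook) avoids both problems.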

### Using %debug for post mortem failure analysis

In [4]:
import pandas as pd

#create columns of unequal lengths
data = {'Column1': [1, 2, 3], 'Column2': [4, 5], 'Column3': [6, 7, 8, 9]}

#so when we create a dataframe we crash
df = pd.DataFrame(data)


ValueError: All arrays must be of the same length

In [None]:
%debug

> /home/keith/anaconda3/envs/p311/lib/python3.11/site-packages/pandas/core/internals/construction.py(677)_extract_index()
    675         lengths = list(set(raw_lengths))
    676         if len(lengths) > 1:
--> 677             raise ValueError("All arrays must be of the same length")
    678 
    679         if have_dicts:


ipdb>  u


> /home/keith/anaconda3/envs/p311/lib/python3.11/site-packages/pandas/core/internals/construction.py(114)arrays_to_mgr()
    112         # figure out the index, if necessary
    113         if index is None:
--> 114             index = _extract_index(arrays)
    115         else:
    116             index = ensure_index(index)


ipdb>  d


> /home/keith/anaconda3/envs/p311/lib/python3.11/site-packages/pandas/core/internals/construction.py(677)_extract_index()
    675         lengths = list(set(raw_lengths))
    676         if len(lengths) > 1:
--> 677             raise ValueError("All arrays must be of the same length")
    678 
    679         if have_dicts:


ipdb>  p raw_lengths


[3, 2, 4]


u
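The same post-mortem inspection can be done without an interactive prompt by walking the exception's traceback to the frame that raised and reading its local variables. This is a simplified sketch: `extract_index` is a hypothetical stand-in that mimics the length check seen in the session above, not the pandas function itself.

```python
def extract_index(columns):
    """Hypothetical stand-in for the length check that failed above."""
    raw_lengths = [len(values) for values in columns.values()]
    if len(set(raw_lengths)) > 1:
        raise ValueError("All arrays must be of the same length")
    return range(raw_lengths[0])


data = {"Column1": [1, 2, 3], "Column2": [4, 5], "Column3": [6, 7, 8, 9]}

try:
    extract_index(data)
except ValueError as exc:
    tb = exc.__traceback__
    while tb.tb_next is not None:      # walk down to the innermost frame
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = tb.tb_frame.f_locals

# equivalent of typing "p raw_lengths" at the ipdb prompt
print(failing_function, failing_locals["raw_lengths"])
```

For an interactive session outside a notebook, `pdb.post_mortem()` called inside the `except` block opens the standard debugger at the same frame.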