<img src="../assets/packt-banner.png" alt="">

# Chapter 1: Introduction to Jupyter Notebooks

This chapter covers the core Jupyter features you will use in your Notebooks, along with some additional functionality you may find useful and the Python libraries used throughout this book.

---

## Basic functionality and features

Examples of useful Notebook features, such as getting help, using tab completion, and Jupyter's magic functions.

---

### Jupyter Features    

---

#### Basic Keyboard Shortcuts
- `Shift + Enter` to run cell
- `Escape` to leave cell
- `a` to add a cell above
- `b` to add a cell below
- `dd` to delete a cell
- `m` to change cell to Markdown (after pressing escape)
- `y` to change cell to Code (after pressing escape)
- Arrow keys to move between cells (after pressing escape)
- `Enter` to enter cell   


---

#### Getting Help
- Add a question mark to the end of an object's name to display its docstring

In [1]:
# Get the numpy arange docstring
import numpy as np
np.arange?

In [2]:
# Get the docstring for Python's built-in sorted function
sorted?

In [3]:
# List the attributes and methods available on an object
a = np.array([1, 2, 3])
a.*?
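
The `?` syntax only works inside IPython and Jupyter. As a rough sketch of what is available in plain Python, the standard library's `inspect.getdoc` and the built-in `dir` give similar information. The list `a` below is a stand-in for the NumPy array used above, and `public_names` is a name chosen for illustration.

```python
import inspect

# Plain-Python counterpart of `sorted?`: fetch the cleaned-up docstring.
doc = inspect.getdoc(sorted)
print(doc.splitlines()[0])

# Rough counterpart of `a.*?`: list an object's public attribute names.
a = [1, 2, 3]  # a plain list stands in for the NumPy array (assumption)
public_names = [name for name in dir(a) if not name.startswith("_")]
print(public_names)
```

Calling `help(sorted)` prints the same docstring in a pager-style format, which works in any Python session.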

---

#### Tab Completion

Examples of Jupyter tab completion include:
- listing available modules on import   
`import <tab>`   
`from numpy import <tab>`
- listing a module's contents after import   
`np.<tab>`   
- function completion    
`np.ar<tab>`   
`sor<tab>([2, 3, 1])`   
- variable completion    
`myvar_1 = 5`   
`myvar_2 = 6`   
`my<tab>`   
- listing relative path directory contents   
`../<tab>`   
(then press enter on a folder and tab again to show its contents)
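
To make the idea behind variable completion concrete, here is a minimal sketch using the standard library's `rlcompleter` module. This is not how Jupyter implements completion internally; it only illustrates prefix matching against a namespace. The helper `complete_all` and the `namespace` dictionary are hypothetical names introduced for this example.

```python
import rlcompleter


def complete_all(prefix, namespace):
    """Collect every completion rlcompleter offers for a prefix."""
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    while True:
        # complete() returns one match per state, then None when exhausted.
        match = completer.complete(prefix, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches


# Same variables as in the tab completion example above.
namespace = {"myvar_1": 5, "myvar_2": 6}
matches = complete_all("my", namespace)
print(sorted(matches))
```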

---

#### Jupyter Magic Functions
List of the available magic commands:

In [4]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

---

In [5]:
# Display plots inline
%matplotlib inline

---
**Timers**

In [6]:
a = [1, 2, 3, 4, 5] * int(1e5)

In [7]:
%%time
# Get runtime for the entire cell

for i in range(len(a)):
    a[i] += 5

CPU times: user 76.1 ms, sys: 2.31 ms, total: 78.4 ms
Wall time: 91.8 ms


In [8]:
# Get runtime for one line
%time a = [_a + 5 for _a in a]

CPU times: user 21 ms, sys: 15.4 ms, total: 36.4 ms
Wall time: 34.6 ms


In [9]:
# Average results of many runs
%timeit set(a)

4.4 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
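
Outside a notebook, similar measurements can be taken with the standard library. The sketch below is an approximation of what `%time` and `%timeit` report, not their actual implementation. The values of `runs` and `loops` are illustrative, and using the population standard deviation is an assumption.

```python
import statistics
import time
import timeit

a = [1, 2, 3, 4, 5] * int(1e5)

# Rough counterpart of %time: wall-clock and CPU time for a single run.
wall_start = time.perf_counter()
cpu_start = time.process_time()
shifted = [value + 5 for value in a]
wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"CPU time: {cpu_elapsed * 1e3:.1f} ms, wall time: {wall_elapsed * 1e3:.1f} ms")

# Rough counterpart of %timeit: several runs, each averaged over many loops.
runs, loops = 7, 10  # illustrative values (assumption)
totals = timeit.repeat(lambda: set(a), repeat=runs, number=loops)
per_loop = [total / loops for total in totals]
mean_ms = statistics.mean(per_loop) * 1e3
std_ms = statistics.pstdev(per_loop) * 1e3
print(f"{mean_ms:.2f} ms ± {std_ms:.2f} ms per loop ({runs} runs, {loops} loops each)")
```

Timings vary between machines and runs, so expect different numbers from those shown in the cell outputs above.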


---

**Using bash in the notebook**

*Note*: this section may not work on Windows machines unless you have configured your shell environment in Jupyter. You need not worry, as there's rarely any need to use bash in the Notebook. Feel free to skip over this section.

Alternatively, you can try to fix the issue using one of the following methods:

 - Open up [Anaconda Prompt](https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-anaconda-prompt), then navigate to the source code and start `jupyter notebook`
 - Open up [git bash](https://github.com/jupyter/help/issues/181#issuecomment-341059517), then navigate to the source code and start `jupyter notebook`
 - Try to point Jupyter towards the git shell by [adding the following code](https://medium.com/@konpat/using-git-bash-in-jupyter-noteobok-on-windows-c88d2c3c7b07) (or similar for your system) to `~/.jupyter/jupyter_notebook_config.py`:

```
c.NotebookApp.terminado_settings = {
    'shell_command': ['C:\\Program Files\\Git\\bin\\bash.exe']
}

```

In [10]:
%%bash

echo "using bash from inside Jupyter!" > test-file.txt
ls
echo ""
cat test-file.txt
rm test-file.txt

chapter_1_workbook.ipynb
test-file.txt
test.py

using bash from inside Jupyter!


In [11]:
%ls

chapter_1_workbook.ipynb  test.py


In [12]:
%%bash
pwd

/home/jovyan/chapter-01
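
If bash is not available, for example on an unconfigured Windows machine, the same file operations can be done in Python. The following sketch mirrors the `echo`, `ls`, `cat`, and `rm` steps with `pathlib`. It works in a temporary directory (an assumption made so that nothing in your working folder is touched), which is why the listing shows only the test file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    work_dir = Path(tmp_dir)
    test_file = work_dir / "test-file.txt"

    # echo "using bash from inside Jupyter!" > test-file.txt
    test_file.write_text("using bash from inside Jupyter!\n")

    # ls
    listing = sorted(path.name for path in work_dir.iterdir())
    print(listing)

    # cat test-file.txt
    contents = test_file.read_text()
    print(contents)

    # rm test-file.txt
    test_file.unlink()
    remaining = list(work_dir.iterdir())
```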


---
**External magic functions**   
Note: install these with pip by running `pip install package_name`.

- *ipython-sql* enables SQL code cells

Source: https://github.com/catherinedevlin/ipython-sql

Install with:
```
pip install ipython-sql
```

In [13]:
%load_ext sql

In [14]:
%%sql sqlite://

SELECT *
FROM (
    SELECT 'Hello' as msg1, 'World!' as msg2
);

Done.


| msg1 | msg2 |
|---|---|
| Hello | World! |
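
For comparison, the same query can be run without the extension by using Python's built-in `sqlite3` module. This is a minimal sketch that assumes an in-memory database, which is what the `sqlite://` connection string refers to. The variable names are chosen for illustration.

```python
import sqlite3

# ":memory:" creates a temporary in-memory SQLite database.
connection = sqlite3.connect(":memory:")
try:
    cursor = connection.execute(
        """
        SELECT *
        FROM (
            SELECT 'Hello' AS msg1, 'World!' AS msg2
        );
        """
    )
    # cursor.description holds one tuple per column; the name comes first.
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
finally:
    connection.close()

print(columns)
print(rows)
```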


---
- *watermark* helps document Python library versions for reproducibility

Source: https://github.com/rasbt/watermark

Install with:
```
pip install watermark
```

In [15]:
%load_ext watermark
%watermark?

In [16]:
%watermark -d -v -m -p requests,numpy,pandas,matplotlib,seaborn,sklearn

2020-07-15 

CPython 3.7.6
IPython 7.10.1

requests 2.22.0
numpy 1.17.4
pandas 0.25.3
matplotlib 3.1.1
seaborn 0.9.0
sklearn 0.22

compiler   : GCC 7.5.0
system     : Linux
release    : 4.19.104+
machine    : x86_64
processor  : x86_64
CPU cores  : 8
interpreter: 64bit
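
A similar report can be approximated with the standard library alone. The sketch below is not how watermark works internally. `report_versions` is a hypothetical helper, and it looks packages up by distribution name, which is why scikit-learn appears as `scikit-learn` rather than `sklearn`.

```python
import platform
from importlib import metadata


def report_versions(distributions):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


print(f"{platform.python_implementation()} {platform.python_version()}")
print(f"system: {platform.system()}, machine: {platform.machine()}")

report = report_versions(["requests", "numpy", "pandas", "scikit-learn"])
for name, version in report.items():
    print(f"{name} {version}")
```

Packages that are missing from the environment are reported as "not installed" instead of raising an error.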


---

### Activity: Using Jupyter to learn about Pandas DataFrames

_Note: If desired, the following code can be removed from the student version of the notebook and replaced with empty cells._

---

In [17]:
# Load the pandas library

import pandas as pd

In [18]:
# Pull up the help docstring for a pandas DataFrame

pd.DataFrame?

In [19]:
# Use a dictionary to create a DataFrame with "fruit" and "score" columns

fruit_scores = {
    'fruit': ['apple', 'orange', 'banana', 'blueberry'],
    'score': [4, 2, 9, 8],
}
df = pd.DataFrame(data=fruit_scores)

In [20]:
# Display the DataFrame

df

| | fruit | score |
|---|---|---|
| 0 | apple | 4 |
| 1 | orange | 2 |
| 2 | banana | 9 |
| 3 | blueberry | 8 |


In [21]:
# Use tab completion to pull up a list of functions for df
# df.<tab>



In [22]:
# Pull up the docstring for the DataFrame sort_values method

df.sort_values?

In [23]:
# Sort the DataFrame by score in descending order

df.sort_values(by='score', ascending=False)

| | fruit | score |
|---|---|---|
| 2 | banana | 9 |
| 3 | blueberry | 8 |
| 0 | apple | 4 |
| 1 | orange | 2 |


In [24]:
# Use the timeit magic function to test how long sorting takes

%timeit df.sort_values(by='score', ascending=False)

416 µs ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


---