<img src="../assets/packt-banner.png" alt="">

# Chapter 1: Introduction to Jupyter Notebooks

This chapter covers the core Jupyter features you will use in your Notebooks, some additional functionality you may find useful, and the Python libraries used throughout this book.

---

## Basic functionality and features

This section gives examples of useful Notebook features, such as getting help, using tab completion, and Jupyter's magic functions.

---

### Jupyter Features    

---

#### Basic Keyboard Shortcuts
- `Shift + Enter` to run the cell
- `Escape` to leave the cell
- `a` to add a cell above
- `b` to add a cell below
- `dd` to delete a cell
- `m` to change the cell to Markdown (after pressing `Escape`)
- `y` to change the cell to Code (after pressing `Escape`)
- Arrow keys to move between cells (after pressing `Escape`)
- `Enter` to enter the cell


---

#### Getting Help
- Add a question mark to the end of an object's name to display its docstring

In [1]:
# Get the numpy arange docstring
import numpy as np
np.arange?

Docstring:
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range` function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use `numpy.linspace` for these cases.

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values,

In [2]:
# Get the docstring for Python's built-in sorted function
sorted?

Signature: sorted(iterable, /, *, key=None, reverse=False)
Docstring:
Return a new list containing all items from the iterable in ascending order.

A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
Type:      builtin_function_or_method


In [3]:
# See what functions are available for an object
a = np.array([1, 2, 3])
a.*?

a.T
a.__abs__
a.__add__
a.__and__
a.__array__
a.__array_finalize__
a.__array_function__
a.__array_interface__
a.__array_prepare__
a.__array_priority__
a.__array_struct__
a.__array_ufunc__
a.__array_wrap__
a.__bool__
a.__class__
a.__complex__
a.__contains__
a.__copy__
a.__deepcopy__
a.__delattr__
a.__delitem__
a.__dir__
a.__divmod__
a.__doc__
a.__eq__
a.__float__
a.__floordiv__
a.__format__
a.__ge__
a.__getattribute__
a.__getitem__
a.__gt__
a.__hash__
a.__iadd__
a.__iand__
a.__ifloordiv__
a.__ilshift__
a.__imatmul__
a.__imod__
a.__imul__
a.__index__
a.__init__
a.__init_subclass__
a.__int__
a.__invert__
a.__ior__
a.__ipow__
a.__irshift__
a.__isub__
a.__iter__
a.__itruediv__
a.__ixor__
a.__le__
a.__len__
a.__lshift__
a.__lt__
a.__matmul__
a.__mod__
a.__mul__
a.__ne__
a.__neg__
a.__new__
a.__or__
a.__pos__
a.__pow__
a.__radd__
a.__rand__
a.__rdivmod__
a.__reduce__
a.__reduce_ex__
a.__repr__
a.__rfloordiv__
a.__rlshift__
a.__rmatmul__
a.__rmod__
a.__rmul__
a.__ror__
a.__rpow__
a.__rrshift__
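
The `?` syntax is specific to IPython and Jupyter. As a rough sketch of the same idea in plain Python, the standard library's `inspect.getdoc` returns an object's docstring and the built-in `dir` lists its attribute names. A built-in list is used here in place of a NumPy array only to keep the example free of dependencies; that substitution is an assumption of this sketch, not part of the original notebook.

```python
import inspect

# Roughly what `sorted?` displays: the docstring of the object
doc = inspect.getdoc(sorted)
print(doc)

# Roughly what `a.*?` displays: the names available on an object.
# Names starting with an underscore are filtered out here for brevity.
a = [1, 2, 3]
public_names = [name for name in dir(a) if not name.startswith("_")]
print(public_names)
```

These calls also work in a regular Python script, where the `?` shorthand is not available.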

---

#### Tab Completion

Examples of Jupyter tab completion include:
- listing available modules on import   
`import <tab>`   
`from numpy import <tab>`
- listing a module's contents after import   
`np.<tab>`   
- function completion    
`np.ar<tab>`   
`sor<tab>([2, 3, 1])`   
- variable completion    
`myvar_1 = 5`   
`myvar_2 = 6`   
`my<tab>`   
- listing relative path directory contents   
`../<tab>`   
(then press enter on a folder and tab again to show its contents)
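
Conceptually, variable completion is prefix matching against the names currently defined. The following is a deliberately simplified sketch of that idea, not how Jupyter actually implements completion; the `complete` helper and the `namespace` dictionary are hypothetical names introduced for illustration.

```python
def complete(prefix, namespace):
    """Return the names in `namespace` that start with `prefix`, sorted."""
    return sorted(name for name in namespace if name.startswith(prefix))


# Hypothetical namespace mirroring the variable-completion example above
namespace = {"myvar_1": 5, "myvar_2": 6, "other": 0}

matches = complete("my", namespace)
print(matches)  # ['myvar_1', 'myvar_2']
```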

---

#### Jupyter Magic Functions
List of the available magic commands:

In [4]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

---

In [5]:
# Display plots inline
%matplotlib inline

---
**Timers**

In [6]:
a = [1, 2, 3, 4, 5] * int(1e5)

In [7]:
%%time
# Get runtime for the entire cell

for i in range(len(a)):
    a[i] += 5

CPU times: user 68.8 ms, sys: 2.04 ms, total: 70.8 ms
Wall time: 69.6 ms


In [8]:
# Get runtime for one line
%time a = [_a + 5 for _a in a]

CPU times: user 21.1 ms, sys: 2.6 ms, total: 23.7 ms
Wall time: 23.1 ms


In [9]:
# Average results of many runs
%timeit set(a)

4.72 ms ± 55.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
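
Outside a Notebook, a similar measurement can be made with the standard library's `timeit` module. The sketch below assumes 10 loops per run (fewer than the 100 shown above, to keep it quick) and reports the mean and standard deviation over 7 runs; the variable names are illustrative.

```python
import statistics
import timeit

a = [1, 2, 3, 4, 5] * int(1e5)

# Run `set(a)` 10 times per measurement and take 7 measurements
loops = 10
totals = timeit.repeat("set(a)", globals={"a": a}, number=loops, repeat=7)

# timeit.repeat returns the total seconds for each measurement,
# so divide by the loop count to get a per-loop time
per_loop = [total / loops for total in totals]
mean_ms = statistics.mean(per_loop) * 1e3
std_ms = statistics.stdev(per_loop) * 1e3
print(f"{mean_ms:.2f} ms ± {std_ms:.2f} ms per loop (7 runs, {loops} loops each)")
```

The exact numbers depend on the machine, so expect them to differ from the output shown above.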


---

**Using bash in the notebook**

*Note*: this section may not work on Windows machines unless you have configured your shell environment in Jupyter. You need not worry, as there's rarely any need to use bash in the Notebook. Feel free to skip over this section.

Alternatively, you can try to fix the issue using one of the following methods:

 - Open up [Anaconda Prompt](https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-anaconda-prompt), then navigate to the source code and start `jupyter notebook`
 - Open up [git bash](https://github.com/jupyter/help/issues/181#issuecomment-341059517), then navigate to the source code and start `jupyter notebook`
 - Try to point Jupyter towards the git shell by [adding the following code](https://medium.com/@konpat/using-git-bash-in-jupyter-noteobok-on-windows-c88d2c3c7b07) (or similar for your system) to `~/.jupyter/jupyter_notebook_config.py`:

```
# Point Jupyter's terminal at the Git Bash executable (adjust the path for your system)
c.NotebookApp.terminado_settings = {
    'shell_command': ['C:\\Program Files\\Git\\bin\\bash.exe']
}
```

In [10]:
%%bash

echo "using bash from inside Jupyter!" > test-file.txt
ls
echo ""
cat test-file.txt
rm test-file.txt

chapter_2_workbook.ipynb
test-file.txt

using bash from inside Jupyter!
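
If bash is not available, the same file operations can be done in plain Python. The sketch below mirrors each step of the bash cell using `pathlib`; it works in a temporary directory rather than the Notebook's working directory, which is an assumption made here to avoid leaving files behind.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for the notebook's working directory
with tempfile.TemporaryDirectory() as tmp_dir:
    test_file = Path(tmp_dir) / "test-file.txt"

    # echo "..." > test-file.txt
    test_file.write_text("using bash from inside Jupyter!\n")

    # ls
    listing = sorted(path.name for path in Path(tmp_dir).iterdir())
    print(listing)

    # cat test-file.txt
    contents = test_file.read_text()
    print(contents)

    # rm test-file.txt
    test_file.unlink()
    still_exists = test_file.exists()
```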


In [11]:
%ls

chapter_2_workbook.ipynb


In [12]:
%%bash
pwd

/Users/alex/Documents/Applied-Data-Science-with-Python-and-Jupyter/chapter-2


---
**External magic functions**   
Note: these can be installed with pip by running `pip install package_name`.

- *ipython-sql* enables SQL code cells

Source: https://github.com/catherinedevlin/ipython-sql

Install with:
```
pip install ipython-sql
```

In [13]:
%load_ext sql

In [14]:
%%sql sqlite://

SELECT *
FROM (
    SELECT 'Hello' as msg1, 'World!' as msg2
);

Done.


| msg1 | msg2 |
|---|---|
| Hello | World! |
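
For comparison, the same query can be run without the extension by using Python's built-in `sqlite3` module. This is a minimal sketch; the in-memory database plays the same role as the `sqlite://` connection string above, and the variable names are illustrative.

```python
import sqlite3

# ":memory:" creates a temporary in-memory database
connection = sqlite3.connect(":memory:")
cursor = connection.execute(
    """
    SELECT *
    FROM (
        SELECT 'Hello' AS msg1, 'World!' AS msg2
    );
    """
)

# Column names are the first item of each entry in cursor.description
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
connection.close()

print(columns)  # ['msg1', 'msg2']
print(rows)     # [('Hello', 'World!')]
```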


---
- *watermark* helps document Python library versions for reproducibility

Source: https://github.com/rasbt/watermark

Install with:
```
pip install watermark
```

In [15]:
%load_ext watermark
%watermark?

Docstring:
::

  %watermark [-a AUTHOR] [-d] [-n] [-t] [-i] [-z] [-u] [-c CUSTOM_TIME]
                 [-v] [-p PACKAGES] [-h] [-m] [-g] [-r] [-b] [-w] [-iv]

IPython magic function to print date/time stamps
and various system information.

optional arguments:
  -a AUTHOR, --author AUTHOR
                        prints author name
  -d, --date            prints current date as YYYY-mm-dd
  -n, --datename        prints date with abbrv. day and month names
  -t, --time            prints current time as HH-MM-SS
  -i, --iso8601         prints the combined date and time including the time
                        zone in the ISO 8601 standard with UTC offset
  -z, --timezone        appends the local time zone
  -u, --updated         appends a string "Last updated: "
  -c CUSTOM_TIME, --custom_time CUSTOM_TIME
                        prints a valid strftime() string
  -v, --python          prints Python and IPython version
  -p PACKAGES, --packages PACKAGES
                      

In [16]:
%watermark -d -v -m -p requests,numpy,pandas,matplotlib,seaborn,sklearn

2020-02-09 

CPython 3.7.5
IPython 7.10.1

requests 2.22.0
numpy 1.17.4
pandas 0.25.3
matplotlib 3.1.1
seaborn 0.9.0
sklearn 0.21.3

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 18.7.0
machine    : x86_64
processor  : i386
CPU cores  : 8
interpreter: 64bit
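
Much of this information is also available from the standard library, which can be handy when the extension is not installed. The sketch below is not how watermark itself works; it assumes Python 3.8 or later for `importlib.metadata`, and the `package_version` helper is a hypothetical name introduced here.

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Interpreter and system details
report = {
    "python": platform.python_version(),
    "implementation": platform.python_implementation(),
    "system": platform.system(),
    "release": platform.release(),
    "machine": platform.machine(),
}

# Versions of a few of the packages listed above
packages = {name: package_version(name) for name in ["requests", "numpy", "pandas"]}

for key, value in {**report, **packages}.items():
    print(f"{key:<15}: {value}")
```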


---

### Activity: Using Jupyter to learn about Pandas DataFrames

_Note: If desired, the following code can be removed from the student version of the notebook and replaced with empty cells._

---

In [17]:
# Load the pandas library

import pandas as pd

In [18]:
# Pull up the help docstring for a pandas DataFrame

pd.DataFrame?

Init signature: pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Docstring:
Two-dimensional size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns). Arithmetic operations
align on both row and column labels. Can be thought of as a dict-like
container for Series objects. The primary pandas data structure.

Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects

    .. versionchanged :: 0.23.0
       If data is a dict, column order follows insertion-order for
       Python 3.6 and later.

    .. versionchanged :: 0.25.0
  

In [19]:
# Use a dictionary to create a DataFrame with "fruit" and "score" columns

fruit_scores = {
    'fruit': ['apple', 'orange', 'banana', 'blueberry'],
    'score': [4, 2, 9, 8],
}
df = pd.DataFrame(data=fruit_scores)

In [20]:
# Display the DataFrame

df

| | fruit | score |
|---|---|---|
| 0 | apple | 4 |
| 1 | orange | 2 |
| 2 | banana | 9 |
| 3 | blueberry | 8 |


In [None]:
# Use tab completion to pull up a list of functions for df
# df.<tab>

df.

In [22]:
# Pull up the docstring for the sort_values DataFrame function

df.sort_values?

Signature:
df.sort_values(
    by,
    axis=0,
    ascending=True,
    inplace=False,
    kind='quicksort',
    na_position='last',
)
Docstring:
Sort by the values along either axis.

Parameters
----------
        by : str or list of str
            Name or list of names to sort by.

            - if `axis` is 0 or `'index'` then `by` may contain index
              levels and/or column labels
            - if `axis` is 1 or `'columns'` then `by` may contain column
              levels and/or index labels

            .

In [23]:
# Sort the DataFrame by score in descending order

df.sort_values(by='score', ascending=False)

| | fruit | score |
|---|---|---|
| 2 | banana | 9 |
| 3 | blueberry | 8 |
| 0 | apple | 4 |
| 1 | orange | 2 |
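
Because `inplace` defaults to `False` in the signature above, `sort_values` returns a new DataFrame and leaves `df` unchanged. The following self-contained sketch rebuilds the same data to illustrate this; the name `sorted_df` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana', 'blueberry'],
    'score': [4, 2, 9, 8],
})

# inplace=False (the default) returns a new, sorted DataFrame
sorted_df = df.sort_values(by='score', ascending=False)

print(sorted_df['fruit'].tolist())  # ['banana', 'blueberry', 'apple', 'orange']
print(sorted_df.index.tolist())     # [2, 3, 0, 1]  (original row labels are kept)
print(df['fruit'].tolist())         # ['apple', 'orange', 'banana', 'blueberry']
```

To keep the sorted order, assign the result to a variable as done here.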


In [24]:
# Use the timeit magic function to test how long sorting takes

%timeit df.sort_values(by='score', ascending=False)

349 µs ± 6.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


---