## Exploratory work with IPython Notebook and pandas

### IPython Notebooks

* Web-based interactive computing environment
* Provides a browser-based REPL (read-eval-print loop)
* The notebook connects to an IPython server (the backend is pluggable; see Project Jupyter)
* Export functionality (e.g. these slides)
* Intended for exploration, not for production engineering
* IPython Notebook docs: http://ipython.org/ipython-doc/3/notebook/notebook.html
* Project Jupyter: https://jupyter.org/
* In-depth tutorial: http://ipython.org/notebook.html#scipy-2013

#### Notebook good practice

* Notebooks should be re-runnable
* Notebooks should be kept under version control
* Notebooks should be self-documenting (they should read like a report)
* Keep code modular (define functions in external modules and import them as needed)
* Setting up a server: http://ipython.org/ipython-doc/3/notebook/public_server.html
* Security (executing code in a browser): http://ipython.org/ipython-doc/3/notebook/security.html
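The "modular" and "re-runnable" points can be illustrated with a minimal sketch. The module name `analysis_utils.py` and the function `summarise` are hypothetical, chosen only for illustration; in practice the function would live in its own file and the notebook would import it. Fixing the random seed is one simple way (an assumption here, not the only one) to make a notebook give the same numbers on every run.

```python
import numpy as np


# Hypothetical helper that would normally live in a separate module,
# e.g. analysis_utils.py, and be imported into the notebook with
# `from analysis_utils import summarise`.
def summarise(values):
    """Return basic summary statistics for a sequence of numbers."""
    arr = np.asarray(values, dtype=float)
    return {"n": int(arr.size), "mean": float(arr.mean()), "max": float(arr.max())}


# A fixed seed keeps the notebook re-runnable: the same sample on every run.
rng = np.random.RandomState(0)
sample = rng.rand(5)

stats = summarise(sample)
```

Keeping the logic in a function also means it can be unit-tested outside the notebook.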

#### A few easy wins

* Run your IPython server from a virtual environment
* User-agnostic database access (e.g. credentials in .my.cnf)
* Keep data in a shared location (/var/data/client/... not /home/alex/stuff/faff/data/big.csv)
* Consider clearing output before committing
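One possible way to avoid user-specific paths is to resolve every data file against a single configurable root. The sketch below is an assumption, not a prescribed setup: the environment variable name `PROJECT_DATA_ROOT`, the helper `data_path` and the file name are all hypothetical.

```python
import os

# Hypothetical variable name and default root; adjust both to the shared
# location your team actually uses.
DATA_ROOT = os.environ.get("PROJECT_DATA_ROOT", "/var/data/client")


def data_path(*parts):
    """Build a path under the shared data root instead of a home directory."""
    return os.path.join(DATA_ROOT, *parts)


# Every user running the notebook resolves the same location.
csv_path = data_path("big.csv")
```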

#### Magics

Magics are a mini command language inside IPython.

https://ipython.org/ipython-doc/dev/interactive/magics.html

Some common magics within IPython Notebooks are:

* ```%lsmagic``` - list currently available magics
* ```%matplotlib inline``` - use the inline plotting backend
* ```%env``` - manage environment variables



### NumPy

* 'Numerical Python'
* Key datatype is the ```ndarray```
* Fast, vectorised operations with broadcasting
* Linear algebra
* Integrates with C, C++ and Fortran code
* And lots more
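Broadcasting is easiest to see with a small example. The sketch below (the array names are illustrative) combines arrays of different shapes without writing a loop: NumPy stretches the smaller operand to match the larger one.

```python
import numpy as np

row = np.array([1, 2, 3, 4])        # shape (4,)
col = np.array([[10], [20], [30]])  # shape (3, 1)

# A scalar is broadcast against every element.
scaled = row * 2                    # array([2, 4, 6, 8])

# Shapes (4,) and (3, 1) broadcast to a (3, 4) result.
grid = row + col
```

Here `grid[i, j]` equals `col[i, 0] + row[j]`, so the first row is `[11, 12, 13, 14]`.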


In [1]:
import numpy as np

data = [1, 2, 3, 4]
array = np.array(data)
array.shape

(4,)

In [2]:
np.max(array)

4

In [3]:
np.ones((10, 10))

array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]])

In [4]:
array

array([1, 2, 3, 4])

In [5]:
array[1:]

array([2, 3, 4])

In [6]:
np.sqrt(array)

array([ 1.        ,  1.41421356,  1.73205081,  2.        ])

In [7]:
np.mean(array)

2.5

NumPy also covers:

* Linear algebra (see ```np.linalg``` package)
* Set logic
* Sorting
* Input/output
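A brief sketch of each of these areas, using small made-up arrays. The in-memory buffer stands in for a file on disk so the example has no side effects.

```python
import io

import numpy as np

# Linear algebra: inverse and determinant of a 2x2 matrix.
a = np.array([[2.0, 0.0], [0.0, 4.0]])
inv = np.linalg.inv(a)
det = np.linalg.det(a)

# Set logic and sorting on 1-D arrays.
x = np.array([3, 1, 2, 3])
y = np.array([2, 3, 5])
common = np.intersect1d(x, y)  # values present in both arrays
unique = np.unique(x)          # sorted distinct values
ordered = np.sort(x)           # sorted copy, duplicates kept

# Input/output: round-trip an array through NumPy's binary format.
buffer = io.BytesIO()
np.save(buffer, x)
buffer.seek(0)
loaded = np.load(buffer)
```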

### pandas

* Built on top of NumPy (so it plays well with NumPy-based libraries, e.g. scikit-learn)
* Two key data structures are ```Series``` and ```DataFrame```
* ```Series``` is a one-dimensional array with an ```index```
* ```DataFrame``` is a tabular, spreadsheet-like structure with an ordered collection of columns (it can be thought of as a ```dict``` of ```Series```)
* Functionality for selecting, filtering, broadcast operations, time series, plotting and lots more


In [8]:
from pandas import Series, DataFrame
import pandas as pd

s = Series(np.random.rand(10))
s

0    0.038249
1    0.906241
2    0.655702
3    0.179862
4    0.911835
5    0.647058
6    0.513898
7    0.097437
8    0.036378
9    0.731498
dtype: float64

In [9]:
s.index

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

In [10]:
s[s > 0.5]

1    0.906241
2    0.655702
4    0.911835
5    0.647058
6    0.513898
9    0.731498
dtype: float64
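The boolean-mask filtering used above also works with combined conditions and with the rows of a ```DataFrame```. A minimal sketch with fixed, made-up values (rather than random ones) so the result is predictable:

```python
import pandas as pd

s = pd.Series([0.1, 0.6, 0.9, 0.4])

# Combine conditions with & (and) or | (or); each needs its own parentheses.
mid = s[(s > 0.5) & (s < 0.8)]

df = pd.DataFrame({"a": [0.2, 0.7, 0.9], "b": [1, 2, 3]})

# A mask built from one column selects whole rows.
high_a = df[df["a"] > 0.5]

# .loc takes a row mask and a column label together.
b_only = df.loc[df["a"] > 0.5, "b"]
```

The original index labels are preserved in each result, which is why the filtered output above skips index values.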

In [11]:
df = DataFrame({'a':np.random.rand(5), 'b':np.random.rand(5)})
df

|   | a | b |
|---|---|---|
| 0 | 0.842936 | 0.896743 |
| 1 | 0.610544 | 0.459129 |
| 2 | 0.975861 | 0.551429 |
| 3 | 0.464602 | 0.611888 |
| 4 | 0.548805 | 0.603664 |


A ```DataFrame``` can be constructed from many kinds of input:

* ndarray
* dict of arrays
* dict of Series
* dict of dicts
* List of dicts
* List of lists or tuples
* Another DataFrame
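A few of these inputs, sketched with small made-up values. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# From a 2-D ndarray; column names are supplied separately.
from_ndarray = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# From a dict of Series; the Series are aligned on their index.
from_dict_of_series = pd.DataFrame(
    {"a": pd.Series([1, 2, 3]), "b": pd.Series([4.0, 5.0, 6.0])}
)

# From a dict of dicts; outer keys become columns, inner keys the index.
from_dict_of_dicts = pd.DataFrame({"a": {"x": 1, "y": 2}, "b": {"x": 3, "y": 4}})

# From a list of dicts; each dict becomes one row.
from_list_of_dicts = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# From a list of tuples; each tuple becomes one row.
from_list_of_tuples = pd.DataFrame([(1, 2), (3, 4)], columns=["a", "b"])

# From another DataFrame (copy=True gives an independent copy).
from_dataframe = pd.DataFrame(from_list_of_dicts, copy=True)
```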

In [12]:
df.values

array([[ 0.84293596,  0.89674254],
       [ 0.61054449,  0.45912941],
       [ 0.975861  ,  0.5514288 ],
       [ 0.46460221,  0.61188816],
       [ 0.54880518,  0.60366386]])

In [13]:
type(df.values)

numpy.ndarray
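Because the underlying data is an ```ndarray```, a ```DataFrame``` can be handed to NumPy in either direction. A minimal sketch with made-up values: pull the raw array out for NumPy-style reductions, or apply a NumPy function to the frame directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

# Extract the raw array and use ordinary ndarray methods on it.
raw = df.values
col_means = raw.mean(axis=0)  # one mean per column

# Element-wise NumPy functions applied to a DataFrame keep its labels.
roots = np.sqrt(df)
```

In this sketch `roots` is still a ```DataFrame``` with the same index and columns as `df`, which is what lets pandas objects pass through NumPy-based code.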