![PyData_logo](./static/pydata-logo-madrid-2016.png)

# Remove Before Flight
## Analyzing Flight Safety Data with Python

###### Siro Moreno Martín
###### Alejandro Sáez Mollejo

### 0. Introduction

#### Python in the scientific environment

##### Main Python packages for scientific computing

##### Anaconda & conda

![conda](./static/conda.png)

http://conda.pydata.org/docs/intro.html

Conda is a package manager application that quickly installs, runs, and updates packages and their dependencies. The conda command is the primary interface for managing installations of various packages. It can query and search the package index and current installation, create new environments, and install and update packages into existing conda environments. 

In [2]:
from IPython.display import HTML
HTML('<iframe src="http://conda.pydata.org/docs/_downloads/conda-cheatsheet.pdf" width="700" height="400"></iframe>')

##### Main objectives of this workshop 

* Provide you with a first look at the main Python tools and libraries used in science:
    - conda.
    - Jupyter Notebook.
    - NumPy, matplotlib, SciPy.
* Give you the skills to tackle basic tasks such as:
    - 
* Show other common libraries:
    - pandas, scikit-learn (some talks and workshops will focus on these packages).
    - SymPy.
    - Numba (?)

### 1. Jupyter Notebook

![jupyter](./static/jupyter-logo.png)

### 2. Using arrays: NumPy 

![numpy-logo](./static/numpy.png)

#### ndarray object 

| index | 0 | 1 | 2 | 3 | ... | n-2 | n-1 |
| ----- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| value | 2.1 | 3.6 | 7.8 | 1.5 | ... | 5.4 | 6.3 |

* N-dimensional data structure.
* Homogeneously typed.
* Efficient: elements are typically stored in one contiguous block of memory and processed by compiled code.

A universal function (or ufunc for short) is a function that operates on ndarrays element by element. Ufuncs are "vectorized": the loop over the elements runs in compiled code instead of in Python.
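A minimal sketch of how ufuncs behave, using `np.sqrt` and `np.add` (both NumPy ufuncs) on a small array chosen only for illustration. Note that a scalar operand is broadcast against the whole array.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# np.sqrt is applied element by element; no Python loop is needed
roots = np.sqrt(values)          # array([1., 2., 3., 4.])

# Arithmetic operators call ufuncs too: `values + 1` is np.add(values, 1),
# with the scalar 1 broadcast to every element
shifted = np.add(values, 1)      # array([ 2.,  5., 10., 17.])

# Every ufunc is an instance of np.ufunc
print(isinstance(np.sqrt, np.ufunc))  # True
print(roots, shifted)
```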

In [3]:
import numpy as np

In [6]:
my_list = list(range(0, 100000))
%timeit sum(my_list)

1000 loops, best of 3: 1.54 ms per loop


In [7]:
array = np.arange(0, 100000)
%timeit np.sum(array)

The slowest run took 6.73 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 95.7 µs per loop
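`%timeit` is an IPython magic and does not exist in a plain Python script. A rough equivalent, sketched with the standard-library `timeit` module, compares the built-in `sum` with `np.sum` on the same 100000 integers. The `int64` dtype is an assumption added so the total cannot overflow on platforms whose default integer is 32-bit; timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

my_list = list(range(100000))
my_array = np.arange(100000, dtype=np.int64)

# Both calls compute the same total
assert sum(my_list) == int(np.sum(my_array))

# Run each statement 100 times, keep the best of 3 repetitions
# and divide by 100 to get a per-call time, similar in spirit to %timeit
list_time = min(timeit.repeat(lambda: sum(my_list), number=100, repeat=3)) / 100
array_time = min(timeit.repeat(lambda: np.sum(my_array), number=100, repeat=3)) / 100

print(f"built-in sum: {list_time * 1e6:.1f} µs per call")
print(f"np.sum:       {array_time * 1e6:.1f} µs per call")
```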


#### Array creation

In [11]:
one_dim_array = np.array([1, 2, 3, 4])
one_dim_array

array([1, 2, 3, 4])

In [10]:
two_dim_array = np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])
two_dim_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [12]:
two_dim_array.size

9

In [13]:
two_dim_array.shape

(3, 3)

In [14]:
two_dim_array.dtype

dtype('int64')

In [18]:
zeros_arr = np.zeros([3, 3])
ones_arr = np.ones([10])
eye_arr = np.eye(5)
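These constructors return `float64` arrays unless a `dtype` is given. A short sketch checking what the three arrays above contain:

```python
import numpy as np

zeros_arr = np.zeros([3, 3])   # 3x3 array filled with 0.0
ones_arr = np.ones([10])       # 1-D array of ten 1.0 values
eye_arr = np.eye(5)            # 5x5 identity matrix

print(zeros_arr.shape, ones_arr.shape, eye_arr.shape)  # (3, 3) (10,) (5, 5)
print(zeros_arr.dtype)                                 # float64

# Pass dtype to get integers instead of floats
int_zeros = np.zeros([3, 3], dtype=int)
```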

In [19]:
range_arr = np.arange(15)
range_arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [20]:
range_arr.reshape([3, 5])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [21]:
np.linspace(0, 10, 21)

array([  0. ,   0.5,   1. ,   1.5,   2. ,   2.5,   3. ,   3.5,   4. ,
         4.5,   5. ,   5.5,   6. ,   6.5,   7. ,   7.5,   8. ,   8.5,
         9. ,   9.5,  10. ])
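`np.arange` takes a step and excludes the stop value, while `np.linspace` takes the number of points and includes the stop value by default. A sketch producing the same 21 points both ways:

```python
import numpy as np

# linspace: start, stop, number of points (stop included)
by_count = np.linspace(0, 10, 21)

# arange: start, stop, step (stop excluded), so the stop must go past 10
by_step = np.arange(0, 10.5, 0.5)

print(len(by_count), len(by_step))     # 21 21
print(np.allclose(by_count, by_step))  # True

# With a non-integer step, arange can be affected by rounding error,
# which is why linspace is usually preferred for fractional spacing
```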

#### Basic slicing 

In [37]:
one_dim_array[0]

1

In [38]:
two_dim_array[-1, -1]

9

Slices use the `[start:stop:step]` syntax: `stop` is excluded, and any of the three values can be omitted.

In [40]:
my_arr = np.arange(100)
my_arr[0::2]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
       68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])

In [36]:
chess_board = np.zeros([8, 8], dtype=int)

chess_board[0::2, 1::2] = 1
chess_board[1::2, 0::2] = 1

chess_board

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]])
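Basic slices return *views* that share memory with the original array, so writing through a slice (as the chess board example does) modifies the original. A minimal sketch with illustrative array names; call `.copy()` when an independent array is needed:

```python
import numpy as np

my_arr = np.arange(10)

evens = my_arr[0::2]      # view: no data is copied
evens[:] = -1             # writes go through to my_arr
print(my_arr)             # [-1  1 -1  3 -1  5 -1  7 -1  9]

backup = my_arr[1::2].copy()  # an independent copy
backup[:] = 0
print(my_arr[1::2])       # [1 3 5 7 9], unchanged

# A negative step reverses the order
print(my_arr[::-1][:3])   # [ 9 -1  7]
```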

#### Operations & linalg 

In [23]:
x = np.linspace(0, 10)
y = np.sin(x)

In [25]:
# x starts at 0 and np.log(0) is -inf: NumPy emits a RuntimeWarning
# (divide by zero) and the first value of y_2 is inf
y_2 = (1 + np.log(x)) ** 2

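Because `x` starts at 0, `np.log` evaluates `log(0)`. A sketch of two ways to deal with this, assuming the infinite value is expected: silence the warning locally with the `np.errstate` context manager, or drop the offending point beforehand with a boolean mask.

```python
import numpy as np

x = np.linspace(0, 10)  # 50 points, x[0] == 0

# Option 1: suppress the divide-by-zero warning only inside this block;
# y_2[0] is still inf because log(0) is -inf
with np.errstate(divide="ignore"):
    y_2 = (1 + np.log(x)) ** 2

# Option 2: keep only strictly positive x values
positive = x > 0
y_2_finite = (1 + np.log(x[positive])) ** 2

print(np.isinf(y_2[0]))                 # True
print(np.all(np.isfinite(y_2_finite)))  # True
```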

In [29]:
two_dim_array = np.array([[10, 25, 33],
                          [40, 25, 16],
                          [77, 68, 91]])

two_dim_array.T

array([[10, 40, 77],
       [25, 25, 68],
       [33, 16, 91]])

In [34]:
two_dim_array @ two_dim_array

array([[ 3641,  3119,  3733],
       [ 2632,  2713,  3176],
       [10497,  9813, 11910]])

In [35]:
one_dim_array = np.array([2.5, 3.6, 3.8])

two_dim_array @ one_dim_array

array([ 240.4,  250.8,  783.1])
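For 2-D arrays, `@` (available since Python 3.5) is the matrix product, while `*` multiplies element by element. A sketch using the same matrix as above:

```python
import numpy as np

two_dim_array = np.array([[10, 25, 33],
                          [40, 25, 16],
                          [77, 68, 91]])

matrix_product = two_dim_array @ two_dim_array   # rows times columns
elementwise = two_dim_array * two_dim_array      # each entry squared

print(matrix_product[0, 0])  # 3641 = 10*10 + 25*40 + 33*77
print(elementwise[0, 0])     # 100 = 10*10

# np.dot gives the same result as @ for 2-D arrays
print(np.array_equal(np.dot(two_dim_array, two_dim_array), matrix_product))  # True
```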

In [27]:
np.linalg.inv(two_dim_array)

array([[-0.05372256,  0.00140303,  0.01923512],
       [ 0.10898393,  0.07381761, -0.05250057],
       [-0.03598099, -0.05634759,  0.03394433]])

In [28]:
np.linalg.eig(two_dim_array)

(array([ 133.6946629,  -17.266221 ,    9.5715581]),
 array([[-0.29580975, -0.74274264,  0.0661375 ],
        [-0.24477775,  0.65983255, -0.79576005],
        [-0.92335283,  0.1138173 ,  0.60198985]]))
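A quick sanity check on the results above, sketched for illustration: multiplying by the inverse should give the identity, and each eigenpair should satisfy A v = λ v, with the eigenvectors stored as *columns*. To solve a linear system, `np.linalg.solve` is generally preferred to forming the inverse explicitly.

```python
import numpy as np

A = np.array([[10, 25, 33],
              [40, 25, 16],
              [77, 68, 91]])

# A @ inv(A) should be the identity, up to rounding error
print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))  # True

# eig returns (eigenvalues, eigenvectors); column i pairs with value i
eigenvalues, eigenvectors = np.linalg.eig(A)
for i in range(3):
    v = eigenvectors[:, i]
    assert np.allclose(A @ v, eigenvalues[i] * v)

# Solve A x = b directly instead of computing inv(A) @ b
b = np.array([240.4, 250.8, 783.1])
x = np.linalg.solve(A, b)
print(x)  # approximately [2.5 3.6 3.8]
```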

### 3. Graphical Representation: matplotlib

![matplotlib](./static/matplotlib.png)

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
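`%matplotlib notebook` is an IPython magic that enables interactive figures inside the notebook. A minimal sketch of a plot of `y = sin(x)` from the NumPy section, written for a plain script instead. It assumes no display is available, so it selects the non-interactive `Agg` backend and writes the figure to a hypothetical file name, `sine.png`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10)
y = np.sin(x)

# Object-oriented interface: one figure with one set of axes
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A first matplotlib plot")
ax.legend()

fig.savefig("sine.png")  # hypothetical output file name
```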

#### Other options... 



### 4. Scientific functions: SciPy

![scipy](./static/scipy_logo.png)
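A small taste of SciPy, sketched with `scipy.integrate.quad` (numerical integration) and `scipy.optimize.brentq` (root finding inside an interval where the function changes sign). The functions being integrated and solved are chosen only for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate sin(x) from 0 to pi; the exact value is 2.
# quad returns the estimate and an estimate of the absolute error
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print(area)  # close to 2.0

# Find the root of cos(x) between 1 and 2; the exact value is pi/2
root = optimize.brentq(np.cos, 1, 2)
print(root)  # close to 1.5708
```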

### 5. Other packages

#### Symbolic calculations with SymPy 

![sympy](./static/sympy.png)
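A minimal SymPy sketch: symbols are declared explicitly, and expressions are differentiated and integrated exactly rather than numerically. The expression `x*sin(x)` is illustrative only.

```python
import sympy as sp

x = sp.symbols("x")
expr = x * sp.sin(x)

derivative = sp.diff(expr, x)      # equals x*cos(x) + sin(x)
integral = sp.integrate(expr, x)   # equals -x*cos(x) + sin(x)

print(derivative)
print(integral)

# Turn a symbolic expression into a function that evaluates with NumPy
f = sp.lambdify(x, derivative, "numpy")
```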

#### Data Analysis with pandas 

![pandas](./static/pandas_logo.png)

#### Machine Learning with scikit-learn 

![scikit-learn](./static/scikit-learn-logo.png)

##### A world of possibilities... 

![scikit-learn cheat sheet](./static/cheatsheet-scikit-learn.png)

### Conclusions 

# Thanks for your attention!

![PyData_logo](./static/pydata-logo-madrid-2016.png)

## Any Questions?


---


In [1]:
# Notebook style
from IPython.display import HTML
css_file = './static/style.css'
with open(css_file, "r") as f:
    css = f.read()
HTML(css)