# Day 1. Statistical and Machine Learning Packages

Welcome back!  Before jumping into today, let's review what we covered yesterday and answer any of your questions.

## 1.1 Recap

Python values come in several types:

In [1]:
a = 1                 # int
b = 5.4               # float
c = "Hello"           # str
d = True              # bool
e = [1,2,3]           # list
f = (5,6)             # tuple
g = {'a': 1, 'b': 2}  # dict

Arithmetic operations work as you expect:

In [2]:
print(100 * (1.07**7 - 1))  # Interest on €100 over 7 years at 7%

60.5781476478


Lists can be accessed and modified in various ways:

In [3]:
fruits = ['apple', 'orange', 'pear']
print(fruits[1])    # 2nd item (0, 1, 2, ...)
print(fruits[0:2])  # Slice from 0 (inclusive) to 2 (exclusive)
print(fruits[:2])   # First 2 items
print(fruits[-1])   # Last item
print(fruits[-2:])  # Last 2 items

orange
['apple', 'orange']
['apple', 'orange']
pear
['orange', 'pear']


In [4]:
fruits.append('lemon')  # Add
del fruits[0]           # Remove
fruits[0:2] = ['banana', 'kiwi', 'grape']  # Slice replacement
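
To make the three modifications above concrete, the following sketch traces the list after each step. It reuses the `fruits` list from the recap; the intermediate `print` calls are added purely for illustration.

```python
fruits = ['apple', 'orange', 'pear']

fruits.append('lemon')   # Add to the end
print(fruits)            # ['apple', 'orange', 'pear', 'lemon']

del fruits[0]            # Remove the first item
print(fruits)            # ['orange', 'pear', 'lemon']

# A slice can be replaced by a list of a different length
fruits[0:2] = ['banana', 'kiwi', 'grape']
print(fruits)            # ['banana', 'kiwi', 'grape', 'lemon']
```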

List comprehensions are a powerful way to build one list from another:

In [5]:
print([i*i for i in range(1,10+1)])  # First 10 squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


In [6]:
# Ways to express 24 as a product of 2 positive integers
n = 24
print([(a,b) for a in range(1,n+1) for b in range(a, n+1) if a*b == n])

[(1, 24), (2, 12), (3, 8), (4, 6)]


Tuples give you lots of idiomatic ways of dealing with compound data:

In [7]:
ages = {'John': 30, 'Jane': 25, 'Jack': 10, 'Jill': 42}
for name, age in sorted(ages.items()):
    print("{0} is {1} years old".format(name, age))

Jack is 10 years old
Jane is 25 years old
Jill is 42 years old
John is 30 years old


In [8]:
for i, fruit in enumerate(fruits):
    print("Fruit number {0} is '{1}'".format(i, fruit))

Fruit number 0 is 'banana'
Fruit number 1 is 'kiwi'
Fruit number 2 is 'grape'
Fruit number 3 is 'lemon'


Functions allow you to build complex behaviours from simpler parts:

In [9]:
# Breaking up the above loop over a dictionary into smaller pieces
# (this is a bit artificial: normally, only break up pieces of code
# that are conceptually separate or more than about 10 lines long)

def format_output(name, age):
    return "{0} is {1} years old".format(name, age)

def print_all_ages(ages):
    for name, age in sorted(ages.items()):
        print(format_output(name, age))
        
def make_ages_dict():
    return {
        'John': 30,
        'Jane': 25,
        'Jack': 10,
        'Jill': 42
    }

def main():
    print_all_ages(make_ages_dict())
    
main()

Jack is 10 years old
Jane is 25 years old
Jill is 42 years old
John is 30 years old


Related definitions can be packaged into a module (also called a library).  You can then import these modules in other code.

In [10]:
from math import pi, e, sin
import re

print(sin(1.5*pi)+e)
m = re.match(r'(\d{4})-(\d{2})-(\d{2})', '2015-12-06')
print(m.groups())

1.71828182846
('2015', '12', '06')


---
### Question time!
---

### Plan for today

Today, we'll discuss several powerful statistical and machine learning libraries in Python.  This will be a very hands-on introduction and we will not dive into the mathematical theories behind them.

After today, you should be able to:

* Import and export data in csv
* Use numpy/scipy to perform mathematical computations
* Slice and dice data
* Use pandas to wrangle data
* Plot data and perform exploratory analysis
* Use `scikit-learn`
* Perform regression analysis in Python
* Perform classification analysis in Python
* Model selection and model validation  [_postponed to next week_]

## 1.2 NumPy and SciPy

[NumPy](http://www.numpy.org/) adds efficient vectors and matrices to Python, both of which support vectorized operations.

In [11]:
import numpy as np  # Idiomatic import

In [12]:
# Create a NumPy Array
A = np.array([1,1.5,2,2.5])
A

array([ 1. ,  1.5,  2. ,  2.5])

Unlike Python lists, all the values in a NumPy array have the same type.  This allows NumPy to implement operations on vectors much more efficiently than Python's operations on lists.
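
As a small illustration of the "same type" rule, the sketch below inspects the `dtype` attribute of a few arrays. The variable names are made up for this example, and the exact integer width you see depends on your platform.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> integer dtype
mixed = np.array([1, 2.5, 3])     # one float -> every element becomes a float
py_list = [1, 2.5, "three"]       # a plain list can mix types freely

print(ints.dtype.kind)    # 'i' means integer (width is platform dependent)
print(mixed.dtype)        # float64
print(mixed)              # the integers 1 and 3 are stored as 1.0 and 3.0
```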

In [13]:
# Vector operations
A = np.array([1.0, 2.0, 3.0])
B = np.array([7.0, 8.0, 9.0])
C = A + 3*B
C

array([ 22.,  26.,  30.])

**Ex 1.2.1.  How would you do this operation with two lists `[1.0, 2.0, 3.0]` and `[7.0, 8.0, 9.0]`?**  
**Hint: `zip`**

Indexing for NumPy arrays works like for lists:

In [14]:
print(A[0])
print(A[1:3])
print(A[:2])
print(A[-1])

1.0
[ 2.  3.]
[ 1.  2.]
3.0


In [15]:
# Can apply operations to particular slices of an array
print(B)
B[1:3] += 1
print(B)

[ 7.  8.  9.]
[  7.   9.  10.]
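
One difference from lists is worth knowing at this point: a slice of a NumPy array is a view onto the same data, whereas a slice of a list is a new list. The sketch below shows the consequence; the names `view`, `piece` and `independent` are illustrative only.

```python
import numpy as np

B = np.array([7.0, 8.0, 9.0])
view = B[1:3]        # a view: shares memory with B
view += 1            # modifies B as well
print(B)             # B now holds 7, 9, 10

L = [7.0, 8.0, 9.0]
piece = L[1:3]       # a new list: independent of L
piece[0] = 100.0
print(L)             # L is unchanged

independent = B[1:3].copy()   # explicit copy when sharing is not wanted
independent += 1
print(B)             # B still holds 7, 9, 10
```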


There are various ways of creating vectors with some structure.  Here are some examples:

In [16]:
np.zeros(3)

array([ 0.,  0.,  0.])

In [17]:
np.ones(5)

array([ 1.,  1.,  1.,  1.,  1.])

A useful operation is to create a vector of $n$ items regularly spaced in the interval $[start,end]$.  That's what `np.linspace` is for:
```
np.linspace(start, end, n)
```

In [18]:
x = np.linspace(1.0, 4.0, 7)
x

array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ])
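
If you want to check the spacing that `np.linspace` chose, the `retstep` and `endpoint` keyword arguments help. This is a short sketch using the same interval as above; `y` is an extra variable introduced for the comparison.

```python
import numpy as np

x, step = np.linspace(1.0, 4.0, 7, retstep=True)
print(step)             # 0.5, i.e. (4.0 - 1.0) / (7 - 1)
print(x[0], x[-1])      # both endpoints are included by default

# Excluding the right endpoint: 6 points with the same spacing of 0.5
y = np.linspace(1.0, 4.0, 6, endpoint=False)
print(y)                # 1.0, 1.5, ..., 3.5
```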

Sometimes all you want is the equivalent of `range` as a NumPy array.  Use `np.arange` for that:

In [19]:
print(range(10))
print(np.arange(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0 1 2 3 4 5 6 7 8 9]


**Ex 1.2.2 Using only NumPy operations, create a vector containing the numbers from 1 to 10.  Then construct from there a vector with the squares from $1^2$ to $10^2$.**

Sometimes, it's more useful to space the elements equally in log-space:

In [20]:
# [base**start, base**(start+1), ..., base**(stop)]
np.logspace(1, 4, num=4, base=10)

array([    10.,    100.,   1000.,  10000.])

---

The `np` module has element-wise versions of most common math operations:

In [21]:
print(np.exp(x))
print(np.sin(x))
print(np.max(x))
print(np.sum(x))

[  2.71828183   4.48168907   7.3890561   12.18249396  20.08553692
  33.11545196  54.59815003]
[ 0.84147098  0.99749499  0.90929743  0.59847214  0.14112001 -0.35078323
 -0.7568025 ]
4.0
17.5


In [22]:
# Can apply Boolean expressions element-wise..
print(x)
x % 2 == 1   # Which elements are odd?

[ 1.   1.5  2.   2.5  3.   3.5  4. ]


array([ True, False, False, False,  True, False, False], dtype=bool)

Element-wise Boolean expressions, such as the one above, are useful for selecting the elements in an array with a certain property:

In [23]:
x[x % 2 == 1]          # Which members of x are odd?

array([ 1.,  3.])

This leads to very idiomatic expressions, such as this one:

In [24]:
np.sum(x[x % 2 == 1])  # Sum odd numbers in x

4.0
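
Masks can also be combined and used for assignment. The sketch below assumes the same `x` as above (seven values from 1 to 4); the thresholds are arbitrary choices for illustration.

```python
import numpy as np

x = np.linspace(1.0, 4.0, 7)

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
middle = x[(x > 1.5) & (x < 3.5)]
print(middle)               # 2.0, 2.5, 3.0

# True counts as 1 when summed, so this counts the matches
count = np.sum(x > 2.0)
print(count)                # 4

# A mask on the left-hand side assigns to the selected elements only
y = x.copy()
y[y > 3.0] = 3.0            # cap values at 3.0
print(y)
```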

Of course, vectors have more interesting operations than simply element-wise math:

In [25]:
# Dot products
print(A)
print(B)
print(A.dot(B))
print(np.dot(A,B))  # Equivalent syntax, more symmetric

[ 1.  2.  3.]
[  7.   9.  10.]
55.0
55.0


**Ex 1.2.3. Calculate the dot product of A and B without using np.dot**

In [26]:
# Cumulative sums and products
print(x)
print(x.cumsum())
print(x.cumprod())

[ 1.   1.5  2.   2.5  3.   3.5  4. ]
[  1.    2.5   4.5   7.   10.   13.5  17.5]
[   1.      1.5     3.      7.5    22.5    78.75  315.  ]


**Ex 1.2.4  Use `dot` and `linspace` to calculate the following sum:**

$$
1\cdot 2 + 2\cdot 3 + \cdots + 10\cdot 11.
$$

**Ex 1.2.5 Write your own my_cumsum() and my_cumprod() functions that operate on lists instead of NumPy arrays**

We can test if `any` or `all` of the elements of a vector have some property:

In [27]:
one_to_ten = np.arange(1,10+1)
np.all(one_to_ten > 0)

True

In [28]:
print(one_to_ten % 5 == 0)
np.any(one_to_ten % 5 == 0)  # Any multiples of five in there?

[False False False False  True False False False False  True]


True

---

Matrices are represented as 2D arrays:

In [29]:
mat = np.array([
        [1., 2.],
        [3., 4.]
    ])
mat

array([[ 1.,  2.],
       [ 3.,  4.]])

Some simple matrices can be constructed directly, and are useful for building up more complicated operations:

In [30]:
# nxn Identity
I = np.eye(3)
I

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [31]:
# nxn zeros
Z = np.zeros((3,3))  # Pass shape as a tuple
Z

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [32]:
# A diagonal matrix
D = np.diag((1,2,3))  # Pass diagonal elements as a tuple
D

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

We can also build larger matrices by stacking smaller ones:

In [33]:
x = np.array([1., 2., 3.])
y = np.array([10., 20., 30.])
print(np.hstack((x, y)))  # [x y]
print(np.vstack((x, y)))  # / x \
                          # \ y /

[  1.   2.   3.  10.  20.  30.]
[[  1.   2.   3.]
 [ 10.  20.  30.]]


The shape of a matrix can be accessed with `shape` and changed with `reshape()`:

In [34]:
A = np.array([[1,2,3], [4,5,6]])
print(A.shape)
print(A)
print(A.reshape((3,2)))
print(A.reshape((6,1)))

(2, 3)
[[1 2 3]
 [4 5 6]]
[[1 2]
 [3 4]
 [5 6]]
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]]
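
Two details of `reshape()` are easy to miss: one dimension may be given as `-1` so that NumPy infers it, and the total number of elements must stay the same. A minimal sketch with the same `A`:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

print(A.reshape((-1, 2)).shape)   # (3, 2): NumPy works out the 3
print(A.reshape(-1))              # flattened to 1D: 1 2 3 4 5 6

# 6 elements cannot fill a 4x2 matrix, so this raises ValueError
try:
    A.reshape((4, 2))
except ValueError as err:
    print("Cannot reshape: {0}".format(err))
```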


In [35]:
B = np.array([[1,2,3,4,5,6,7,8]])
B

array([[1, 2, 3, 4, 5, 6, 7, 8]])

In [36]:
B.shape

(1, 8)

In [37]:
B.reshape((8,1))

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]])

In [38]:
print(B)
print(B.T)

[[1 2 3 4 5 6 7 8]]
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]


Indexing 2D arrays can be quite interesting.  Here are some examples:

In [39]:
print(A[0,2])      # One element
print(A[0:1,1:2])  # Slicing in multiple dimensions
print(A[1,:])      # Whole 2nd row
print(A[:,2])      # Whole 3rd column

3
[[2]]
[4 5 6]
[3 6]


Normal matrix operations are there:

In [40]:
mat = np.array([
        [1., 2.],
        [3., 4.]
    ])

In [41]:
# Transpose
mat.T

array([[ 1.,  3.],
       [ 2.,  4.]])

In [42]:
# Trace
mat.trace()

5.0

In [43]:
# Matrix-vector product
v = np.array([5., 6.])
mat.dot(v)

array([ 17.,  39.])

In [44]:
# NOTE: `mat * v` is element-wise multiplication, with the elements of `v`
# broadcast (repeated) along the additional dimensions of `mat`.
# Broadcasting is an advanced topic that we won't cover, but be
# aware of the actual meaning of `mat * v`, which is not matrix-vector
# multiplication in the mathematical sense!
mat * v

array([[  5.,  12.],
       [ 15.,  24.]])
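
To see what the broadcast in `mat * v` amounts to, the sketch below writes the repetition out by hand and then relates the element-wise product back to the matrix-vector product. `v_repeated` is a helper array introduced only for this comparison.

```python
import numpy as np

mat = np.array([[1., 2.],
                [3., 4.]])
v = np.array([5., 6.])

# Broadcasting behaves as if v were repeated for every row of mat
v_repeated = np.array([[5., 6.],
                       [5., 6.]])
print(np.array_equal(mat * v, mat * v_repeated))   # True

# Summing each row of the element-wise product recovers mat.dot(v)
print((mat * v).sum(axis=1))                       # 17, 39
print(mat.dot(v))                                  # 17, 39
```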

More advanced linear algebra operations are also implemented within the `np.linalg` module:

In [45]:
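# Eigenstuff (the eigenvectors are the columns of `eigvecs`, not the rows)
eigvals, eigvecs = np.linalg.eig(mat)
print(eigvals)
print(eigvecs[:, 0])  # Eigenvector for eigvals[0]
print(eigvecs[:, 1])  # Eigenvector for eigvals[1]

[-0.37228132  5.37228132]
[-0.82456484  0.56576746]
[-0.41597356 -0.90937671]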
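
A quick way to check an eigendecomposition is to verify $M v = \lambda v$ for each pair. `np.linalg.eig` returns the eigenvectors as the columns of its second result, so column `i` goes with `eigvals[i]`. The sketch below checks this and shows that a row does not pass the same test for this matrix.

```python
import numpy as np

mat = np.array([[1., 2.],
                [3., 4.]])
eigvals, eigvecs = np.linalg.eig(mat)

# Column i of eigvecs pairs with eigvals[i]
for i in range(len(eigvals)):
    vec = eigvecs[:, i]
    print(np.allclose(mat.dot(vec), eigvals[i] * vec))   # True

# Rows of eigvecs are generally not eigenvectors
row = eigvecs[0]
print(np.allclose(mat.dot(row), eigvals[0] * row))       # False for this matrix
```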


In [46]:
# Determinants
np.linalg.det(mat)

-2.0000000000000004

In [47]:
# Matrix inverse
np.linalg.inv(mat)

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [48]:
# Solve linear systems: M*x = v
print(np.linalg.solve(mat, v))
print(np.linalg.inv(mat).dot(v))

[-4.   4.5]
[-4.   4.5]


**Ex 1.2.6. With the help of IPython's autocomplete, calculate the Singular Value Decomposition $U S V$ of the matrix `mat` above.  Combine the resulting matrices to verify that the decomposition is correct.**

## Random Numbers

Plenty of common probability distributions are implemented in NumPy.  You can probe information about these distributions, as well as sample random variates from them.

We'll only scratch the surface here, in order to generate random data for our machine learning examples later today.

In [49]:
np.random.randint(100)  # Random integer in [0,100)

17

In [50]:
np.random.rand()  # Random number in [0,1)

0.12054533603703499

In [51]:
# Random 3x4 matrix, elements are uniformly sampled from [0,1)
np.random.rand(3,4)

array([[ 0.84485502,  0.93113617,  0.24028584,  0.5565359 ],
       [ 0.57800915,  0.64983393,  0.08862662,  0.47107447],
       [ 0.74746244,  0.63511684,  0.22167597,  0.45374678]])

In [52]:
np.random.seed(1)
np.random.rand()

0.417022004702574
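
The point of `np.random.seed` is reproducibility: after setting the same seed, the same sequence of "random" numbers comes out. A minimal sketch (the names `first` and `second` are illustrative):

```python
import numpy as np

np.random.seed(1)
first = np.random.rand(3)

np.random.seed(1)        # same seed again
second = np.random.rand(3)

print(first)
print(np.array_equal(first, second))   # True: identical draws
```

This is handy when debugging, or when you want others to be able to reproduce your simulated data exactly.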

In [53]:
# Generators for lots of distributions
# There's more in scipy.stats later today...
mu = 2.0
sigma = 1.5
print(np.random.normal(mu, sigma))

0.796740742027


In [54]:
# Generate lots of numbers and check statistics on them
N = 1000
nums = np.random.normal(mu, sigma, N)
print(nums[:10])  # Print first 10 for debugging
print('Mean: %.2f, Std Dev %.2f' % (np.mean(nums), np.std(nums)))

[ 1.32668329  0.34109739 -0.48177318 -1.54520291  3.70301802  0.4744788
  2.95604272  0.71014009  4.65891144  0.33445542]
Mean: 2.06, Std Dev 1.46


**(!) Ex 1.2.7.  We're having elections in a few weeks!  If we could ask everyone their opinion, we'd find the vote split like this:**
* **60%: Oligarch's Party**
* **40%: Plutocrat's Party**

**But we can't ask everyone: we have to do a survey.**

**Write a function to run a survey with N people, returning the fraction that supports the Oligarch's Party.  Now write another function to run K surveys and report the mean and standard deviation of the results of the surveys.**

**By picking a large value for K (say, 1000), estimate the margin of error of a survey of 100 people.  What about 1000 people?  Would it be much better to ask 10,000 people? 1,000,000 people?**

---
# 15-20 minute break
---

### SciPy

[SciPy](http://docs.scipy.org/doc/scipy/reference/) adds lots of operations common in scientific and engineering contexts.  It's a huge library with lots to explore.

Many SciPy functions are wrappers around heavy-duty Fortran libraries that have stood the test of time (e.g., LAPACK, MINPACK, ...).

Again, we'll just scratch the surface here.

### Lambdas

We need to introduce one new Python concept before moving on: anonymous functions, or **lambdas**.

Lambdas are single-expression functions that you don't need to name explicitly.  You typically use them when you want to pass a function as a parameter, instead of just data.

Here's an example of a simple function, and its lambda equivalent:

In [55]:
def add_two_named(a,b):
    return a + b

add_two_lambda = (lambda a, b: a + b)

In [56]:
print(add_two_named(2,3))
print(add_two_lambda(2,3))

5
5


The general syntax is:
```
lambda <param1>, <param2>, ...: <expression>
```

Let's build an example where this is useful.  The `map` built-in function in Python applies a function to each item in a list.  Here's an example:

In [57]:
def f(x):
    return x**2

map(f, [1,2,3])

[1, 4, 9]

In [58]:
[f(1), f(2), f(3)]

[1, 4, 9]

In idiomatic Python, you'd usually use a list comprehension for this:

In [59]:
[x**2 for x in [1,2,3]]

[1, 4, 9]

But in other contexts (such as when using `PySpark`, which we'll talk about next week), the `map` way is more natural.

In most cases, you'd pass in a simple function, so it's overkill (and less readable) to define a named function and then pass it by name.  Instead, you'd do this:

In [60]:
map(lambda x: x**2, [1,2,3])

[1, 4, 9]
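
The outputs above appear to come from Python 2, where `map` returns a list. In Python 3, `map` returns a lazy iterator instead, so you need to wrap it in `list()` to see the values. The sketch below is written to behave the same way under both versions.

```python
squares = map(lambda x: x**2, [1, 2, 3])

# list() just copies in Python 2 and forces evaluation in Python 3
squares = list(squares)
print(squares)   # [1, 4, 9]

# map also accepts several sequences, passing one item from each
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)      # [11, 22, 33]
```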

Another common use case is sorting a list.  Sometimes you want to sort not by the contents of the list, but by something closely related.  For instance, here's one way of sorting a list of strings without regard to case:

In [61]:
fruits = ['BANANA', 'cherry', 'DaTE', 'apple']
print(sorted(fruits))  # Case-sensitive
print(sorted(fruits, key=lambda x: x.lower()))  # Case insensitive

['BANANA', 'DaTE', 'apple', 'cherry']
['apple', 'BANANA', 'cherry', 'DaTE']
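
The same `key` idea works for compound data. As a sketch, here is the `ages` dictionary from the recap sorted by age rather than by name; the lambda picks out the second element of each `(name, age)` pair.

```python
ages = {'John': 30, 'Jane': 25, 'Jack': 10, 'Jill': 42}

# Sort (name, age) pairs by age rather than by name
by_age = sorted(ages.items(), key=lambda pair: pair[1])
print(by_age)   # [('Jack', 10), ('Jane', 25), ('John', 30), ('Jill', 42)]

# max and min accept the same key argument
oldest = max(ages.items(), key=lambda pair: pair[1])
print(oldest)   # ('Jill', 42)
```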


**Ex 1.2.8  Sort the following list of full names by surname:**
```
names = ["John Doe", "Mary Jane", "Jake Williamson", "Jack Ripper"]
```

In SciPy, it's common to apply operations on functions (e.g., integration, differentiation, root-finding, etc.).  For small functions, you'd use a lambda.  We'll see examples below.

### Numerical integration

SciPy is good at calculating integrals of arbitrary functions, as follows:

In [62]:
from scipy import integrate

Single integrals:
$$
\int_0^1 3 x^2\, dx = \left[x^3\right]^1_0 = 1
$$

In [63]:
answer, err = integrate.quad(lambda x: 3 * x**2, 0, 1)  # [x**3]^1_0 == 1
print('Answer: %f +/- %g' % (answer, err))

Answer: 1.000000 +/- 1.11022e-14
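
`quad` also handles infinite limits. As a sketch, the Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}$ can be checked numerically like this:

```python
import numpy as np
from scipy import integrate

# Integral of exp(-x^2) over the whole real line equals sqrt(pi)
answer, err = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(answer)
print(np.sqrt(np.pi))
print(abs(answer - np.sqrt(np.pi)) < 1e-8)   # True
```

The second value returned by `quad` is an estimate of the absolute error, not a guarantee, so comparing against a known result like this is a useful sanity check.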


Double integrals:
$$
\int_0^\infty dx\, \int_1^\infty dt\, \frac{e^{-x t}}{t^n}
$$

In [64]:
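# The integrand takes the inner variable (t) first, then the outer one (x).
# x runs over [0, inf); for each x, t runs over [1, inf).
n = 2
integrate.dblquad(lambda t, x: np.exp(-x*t)/t**n, 0, np.inf,
                  lambda x: 1, lambda x: np.inf)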

(0.4999999999985751, 1.8855521033422917e-09)
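
The argument order of `dblquad` is easy to get wrong, so here is a simpler sketch where the answer is known by hand: integrating $x y$ over the triangle $0 \le y \le x \le 1$ gives $\int_0^1 dx \int_0^x dy\, x y = 1/8$. The integrand and limits are chosen for illustration only.

```python
from scipy import integrate

# The integrand takes the inner variable (y) first, then the outer one (x).
answer, err = integrate.dblquad(lambda y, x: x * y,
                                0, 1,            # outer limits for x
                                lambda x: 0,     # inner lower limit for y
                                lambda x: x)     # inner upper limit for y
print(answer)   # approximately 0.125
```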

## Optimization

Let's say we want to know where a function $f(x)$ attains its minimum.  Let's use a super-simple function to illustrate:

$$
f(x) = 2 x^2 + 5 x - 7.
$$

In [65]:
def f(x):
    return 2 * x**2 + 5*x - 7

Without any further information, SciPy can find this function's minimum by astute trial and error (technically, using the Nelder-Mead algorithm).  We need to provide a starting guess:

In [66]:
from scipy.optimize import minimize
x0 = np.array([0.0])
res = minimize(f, x0, method='nelder-mead',
               options={'xtol': 1e-8, 'disp': True})
print(res)

Optimization terminated successfully.
         Current function value: -10.125000
         Iterations: 38
         Function evaluations: 77
  status: 0
    nfev: 77
 success: True
     fun: -10.125
       x: array([-1.25])
 message: 'Optimization terminated successfully.'
     nit: 38


We see that $f(x)$ is minimized at $x=-1.25$, where it attains the value $-10.125$.

Note that both the starting guess we provide and the answer SciPy returns are 1-element NumPy arrays.  This is because SciPy can minimize multi-variable functions.

For this function, trial and error is quick, but for more complicated functions, SciPy can do much better if we give it gradient information.  In our example:

$$
f'(x) = 4 x + 5.
$$

In [67]:
def f_prime(x):
    return 4*x + 5

In [68]:
x0 = np.array([0.0])
res = minimize(f, x0, method='BFGS', jac=f_prime,
               options={'disp': True})
print(res)

Optimization terminated successfully.
         Current function value: -10.125000
         Iterations: 1
         Function evaluations: 3
         Gradient evaluations: 3
   status: 0
  success: True
     njev: 3
     nfev: 3
 hess_inv: array([[1]])
      fun: -10.125
        x: array([-1.25])
  message: 'Optimization terminated successfully.'
      jac: array([ 0.])


With derivative information, it took SciPy 3 function evaluations instead of 77.  Quite a difference!
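
Since SciPy can minimize multi-variable functions, the same call works when the input array has more than one element. The sketch below uses a made-up two-variable function, $(x-1)^2 + 2(y+2)^2$, whose minimum is at $(1, -2)$ by inspection; `bowl` and `bowl_grad` are hypothetical names for this example.

```python
import numpy as np
from scipy.optimize import minimize

def bowl(p):
    # p is a NumPy array holding both variables
    x, y = p
    return (x - 1)**2 + 2 * (y + 2)**2

def bowl_grad(p):
    # Partial derivatives with respect to x and y, in that order
    x, y = p
    return np.array([2 * (x - 1), 4 * (y + 2)])

res = minimize(bowl, np.array([0.0, 0.0]), method='BFGS', jac=bowl_grad)
print(res.x)     # close to [1, -2]
print(res.fun)   # close to 0
```

The gradient function returns one entry per variable, in the same order as the input array.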

Let's put this all together in a machine learning example.  We'll implement least squares line fitting from first principles.  You'd never do this in production code, but it's instructive to see how to put it together with what you know thus far.

**(!!) Ex 1.2.9  Least-squares fitting from first principles**

**(a) Write a function f(x, m, c) that represents a straight line according to the following equation:**

$$
f(x, m, c) = m x + c.
$$

**(b) Using a lambda, how would you specialize this function to a 1-parameter function g(x), defined as follows:**

$$
g(x) = f(x, 5.0, 2.0).
$$

_As in the survey exercise above, we'll generate some simulated data to test out our least-squares fit._

**(c) Write a function to generate N random values of x between 0.0 and 10.0.**

**(d) Write a function to generate the corresponding values of g(x), plus some normal random noise of standard deviation 0.1 and mean 0.0.**

**(e) Least-squares fitting tries to find values of $m$ and $c$ that can be plugged into $y = f(x, m, c)$ that minimize the distance between the actual $y$s and the model $y$s.  In other words, it minimizes the following loss function:**

$$
L(m, c) = \sum_{i=1}^N [y_i - f(x_i, m, c)]^2.
$$

**Write a Python function that calculates L(m, c, xs, ys) given a set of $x_i$ and corresponding $y_i$.**

**(f) Use SciPy to minimize this function and estimate the values of $m$ and $c$.  How close are they to the values used to generate the simulated data?**

## Root finding

We'll quickly highlight a few more things SciPy can do:

Solve
$$
x + 2 \cos(x) = 0.
$$

In [69]:
from scipy.optimize import root

# x_0 = 0.3 is our starting guess
sol = root(lambda x: x + 2 * np.cos(x), 0.3)

print(sol)

  status: 1
 success: True
     qtf: array([ -1.20746968e-09])
    nfev: 10
       r: array([-2.71445911])
     fun: array([ -6.66133815e-16])
       x: array([-1.02986653])
 message: 'The solution converged.'
    fjac: array([[-1.]])


In [70]:
import math
-1.02986653 + 2*math.cos(-1.02986653)

-1.8397017242932634e-09
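
When you can bracket the root (that is, find two points where the function has opposite signs), a bracketing solver is a robust alternative to a starting guess. The sketch below uses `scipy.optimize.brentq` on the same equation; the function name `h` and the bracket $[-2, 0]$ are choices made for this example.

```python
import numpy as np
from scipy.optimize import brentq

def h(x):
    return x + 2 * np.cos(x)

# h(-2) < 0 and h(0) > 0, so a root lies somewhere in [-2, 0]
x_root = brentq(h, -2.0, 0.0)
print(x_root)              # approximately -1.02986653
print(abs(h(x_root)))      # very close to 0
```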

Solve
\begin{eqnarray}
x_0 \cdot \cos(x_1)&=4,\\
x_0 x_1 - x_1 &=5
\end{eqnarray}

In [71]:
def func2(x):
    f = [x[0] * np.cos(x[1]) - 4,  # == 0
         x[1]*x[0] - x[1]    - 5]  # == 0
    
    # Provide Jacobian to use better solver than above
    df = np.array([[np.cos(x[1]), -x[0] * np.sin(x[1])],
                   [x[1],         x[0] - 1              ]])
    
    return f, df

# method='lm' => use Levenberg-Marquardt algorithm
sol = root(func2, [1, 1], jac=True, method='lm')

print(sol)

  status: 2
   cov_x: array([[ 0.87470958, -0.02852752],
       [-0.02852752,  0.01859874]])
 success: True
     qtf: array([  9.53474074e-13,   1.20388645e-13])
    nfev: 8
    ipvt: array([2, 1], dtype=int32)
     fun: array([ 0.,  0.])
       x: array([ 6.50409711,  0.90841421])
 message: 'The relative error between two consecutive iterates is at most 0.000000'
    fjac: array([[ 7.52318843, -0.73161761],
       [ 0.24535902, -1.06922242]])
    njev: 7


## Statistics

**Lots** of predefined distributions, with generators, pdfs, cdfs, inverse cdfs, etc.

In [72]:
from scipy import stats

In [73]:
# Like TAB completion in IPython if you type "stats.<TAB>"
', '.join(_ for _ in dir(stats) if not _.startswith('_'))



In [74]:
mu = 1.0
sigma = 2.0
N = stats.norm(loc=mu, scale=sigma)

In [75]:
# Like TAB-completion in IPython if you type "N.<TAB>"
', '.join(_ for _ in dir(N) if not _.startswith('_'))

'args, cdf, dist, entropy, interval, isf, kwds, logcdf, logpdf, logpmf, logsf, mean, median, moment, pdf, pmf, ppf, rvs, sf, stats, std, var'

In [76]:
print(N.mean())
print(N.median())
print(N.std())

1.0
1.0
2.0


In [77]:
print(N.pdf(3.5))  # Probability density at 3.5

0.0913245426945


In [78]:
print(N.cdf(3.5))  # P(N < 3.5)

0.894350226333


In [79]:
print(N.ppf(0.95))  # n such that P(N < n) = 95%

4.2897072539
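
The relationships between `pdf`, `cdf` and `ppf` can be checked directly. This sketch uses the same normal distribution as above (mean 1, standard deviation 2) under the name `dist`, and compares `pdf` against the textbook density formula.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0
dist = stats.norm(loc=mu, scale=sigma)

# ppf is the inverse of cdf
p = dist.cdf(3.5)
print(dist.ppf(p))                          # approximately 3.5

# pdf evaluated by hand from the normal density formula
by_hand = np.exp(-(3.5 - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
print(np.isclose(by_hand, dist.pdf(3.5)))   # True

# The probability of an interval is a difference of cdf values
within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
print(within_one_sigma)                     # about 0.6827
```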


In [80]:
# sample a few random variates
print(N.rvs(5))

[-0.66800735  0.63070068  2.38797834 -2.37056008  0.90228352]


---
There is **A LOT** more functionality in NumPy and SciPy than we've covered here.  Do explore the documentation for both:
* [NumPy Reference documentation](http://docs.scipy.org/doc/numpy/reference/index.html)
* [SciPy Reference documentation](http://docs.scipy.org/doc/scipy/reference/index.html)