# Numpy and matplotlib
Multi-dimensional arrays and 2D visualization in Python.

Author: Chloé-Agathe Azencott chloe-agathe.azencott@mines-paristech.fr
    
With many thanks to [Alexandre Gramfort](http://alexandre.gramfort.net/) (Telecom ParisTech).

__This notebook contains 5 small problems. Make sure you have done them before moving on to the next notebook.__

## 1. Numpy

`numpy` is a Python package that underlies most scientific and numerical computing in Python. It provides high-performance data structures for manipulating vectors, matrices and tensors of arbitrary dimensions.

Its core is implemented in compiled code (mostly C, with linear algebra routines from libraries such as BLAS and LAPACK), and it therefore performs very well on vector/matrix computations.

In [None]:
# Import the numpy module
import numpy as np

### 1.1 Numpy arrays

NumPy arrays are a fundamental structure for scientific computing. They are homogeneous (i.e. all the objects an array contains have the same type) multi-dimensional arrays, which we'll use among other things to represent vectors and matrices.

Let us explore some basic Numpy commands.

### 1.2 Creating arrays

In [None]:
# Create a 1D vector from a list
v = np.array([1, 2, 3, 4])
print(v)
print(type(v))

In [None]:
# Create a random array of size 3 x 5
X = np.random.random((3, 5))
print(X)
print(type(X))

In [None]:
# Create an array of zeros of size 3 x 5
np.zeros((3, 5))

In [None]:
# Create an array of ones of size 3 x 5
np.ones((3, 5))

In [None]:
# Create the identity matrix of size 4 x 4
np.eye(4)

In [None]:
# Create a range
np.arange(0, 10, 2) # arguments: start, stop, step

In [None]:
# Create n points evenly spaced over the [start, stop] interval
# with linspace, both start and stop are included
np.linspace(0, 10, 3) # start, stop, n

In [None]:
# Same on a log scale: 3 points from e**0 to e**10
np.logspace(0, 10, 3, base=np.e)
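The relationship between `linspace` and `logspace` can be checked directly. The following is a small sketch (the variable names `exponents` and `log_points` are illustrative, not part of the notebook) suggesting that `logspace` raises the base to evenly spaced exponents:

```python
import numpy as np

# Exponents evenly spaced between 0 and 10
exponents = np.linspace(0, 10, 3)

# logspace raises the base to each of these exponents
log_points = np.logspace(0, 10, 3, base=np.e)

# Both approaches should give the same points, up to floating-point error
print(np.allclose(log_points, np.exp(exponents)))  # True
```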

### 1.3 Information about arrays

In [None]:
# The dimensions of an array are accessible via shape
print(v.shape)
print(X.shape)

In [None]:
# The total number of elements of an array is accessible via size
print(v.size)
print(X.size)

In [None]:
# The number of dimensions of an array
print(v.ndim)
print(X.ndim)

### 1.4 Array types
Proceed with caution!
All elements of an array have the same type, accessible via `dtype`.

In [None]:
print(v.dtype)
print(X.dtype)

In [None]:
a = np.array([1, 2, 3])
print(a.dtype)

# a is an integer array: 3.2 is silently truncated to 3
a[0] = 3.2
print(a.dtype)
print(a)
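The type of an array is inferred when it is created. As a minimal illustration (the arrays `mixed` and `z` are made up for this example), NumPy picks a common type able to hold every entry:

```python
import numpy as np

# Mixing integers and a float: every entry is stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# Mixing integers and a complex number: every entry becomes complex
z = np.array([1, 2, 1 + 2j])
print(z.dtype)       # complex128
```

Once the array exists its type is fixed, which is why assigning `3.2` to an entry of an integer array loses the decimal part.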

The type of the elements of an array can be explicitly defined with the `dtype` keyword, to be chosen among `int`, `float`, `complex`, `bool`, `int64`, etc.

In [None]:
# A first solution
a = np.array([1,2,3], dtype=float)
print(a.dtype)

a[0] = 3.2
print(a.dtype)
print(a)

The type of the elements of an array can be modified with the `astype` method, which returns a converted copy:

In [None]:
# A second solution
a = np.array([1,2,3])
print(a.dtype)

a = a.astype(float)

a[0] = 3.2
print(a.dtype)
print(a)

### 1.5 Accessing elements, rows, and columns of arrays
Remember, in Python indices start at 0.

In [None]:
# Get a single element of a vector
v[0]

In [None]:
# Get a single element of a matrix
print(X[0, 1])

In [None]:
# Get a row
print(X[0, :])
print(X[0])
print("shape of a row vector:", X[0].shape)

In [None]:
# Get a column
print(X[:, 3])

### 1.6 Slicing
The `[start:stop:step]` syntax applicable to strings and lists also applies to arrays.

In [None]:
# Access a sub-matrix
print(X[1:3, :])

In [None]:
# Modify a slice
X[1:3] = np.zeros((2, 5))
print(X)

In [None]:
# Keep every other row and every other column
X[::2, ::2]
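One point worth keeping in mind is that a slice of an array does not copy the data. The sketch below uses a small illustrative matrix `M` (not one of the notebook's variables) to show the difference between a slice and an explicit copy:

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

# A basic slice shares its data with the original array
first_rows = M[0:2, :]
first_rows[0, 0] = 100
print(M[0, 0])                           # 100
print(np.shares_memory(M, first_rows))   # True

# An explicit copy is independent
independent = M[0:2, :].copy()
independent[0, 0] = -1
print(M[0, 0])                           # still 100
```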

### 1.7 Fancy indexing

It is possible to use lists or arrays of indices, instead of slices, to select entries.

In [None]:
A = np.array([range(5), range(0, 10, 2), range(1, 6), range(5, 15, 2)])
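print(A)

In [None]:
row_indices = [0, 1, 3]
print(A[row_indices])

In [None]:
# Pairs of indices: the entries A[1, 3] and A[2, 4]
print(A[[1, 2], [3, 4]])

In [None]:
# np.ix_ selects the block made of rows 1, 2 and columns 3, 4
A[np.ix_([1, 2], [3, 4])] = 0
print(A)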
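The two forms of fancy indexing used above behave quite differently. As a sketch on an illustrative matrix `G` (assumed here only for the example), two index lists are matched pairwise, whereas `np.ix_` selects a rectangular block:

```python
import numpy as np

G = np.arange(20).reshape(4, 5)

# Two index lists are paired up: entries (1, 3) and (2, 4)
pairs = G[[1, 2], [3, 4]]
print(pairs)           # [ 8 14]

# np.ix_ builds the 2 x 2 block of rows 1, 2 and columns 3, 4
block = G[np.ix_([1, 2], [3, 4])]
print(block.shape)     # (2, 2)

# Fancy indexing returns a copy: modifying it leaves G untouched
block[0, 0] = -1
print(G[1, 3])         # 8
```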

### 1.8 Masks

Boolean masks select the entries of an array that satisfy a condition. NumPy also provides [masked arrays](https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html), which are a convenient way to deal with array entries that should not be used in computations, and are used in particular to deal with missing data. Those entries are not removed from the array but simply masked.

In [None]:
B = np.arange(5)
print(B)

In [None]:
row_mask = np.array([True, False, True, False, False])
print(B[row_mask])

In [None]:
# Equivalently
row_mask = np.array([1,0,1,0,0], dtype=bool)
print(B[row_mask])

In [None]:
print(B < 3)

In [None]:
# Convert a binary mask into position indices using *where*
positions = np.where(B<3)
print(positions)

print(B[positions])

In [None]:
a = np.array([1, 2, 3, 4, 5])
print(a < 3)
print(B[a < 3])

In [None]:
print(A[:, a<3])
```
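For missing data, the `numpy.ma` module may be more convenient than a hand-made boolean mask. The following is a minimal sketch, assuming that missing values are encoded as `-999` (a convention made up for this example):

```python
import numpy as np

# Hypothetical measurements where -999 encodes a missing value
measurements = np.array([2.0, -999.0, 4.0, 6.0])

# Mask the entries equal to the missing-value code
masked = np.ma.masked_equal(measurements, -999.0)

print(masked.mean())           # 4.0: the masked entry is ignored
print(masked.count())          # 3 valid entries
print(measurements.shape)      # (4,): nothing was removed
```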

### 1.9 Reshaping arrays

In [None]:
print(A)

In [None]:
n, m = A.shape

In [None]:
# Reshape A
B = A.reshape((1, n*m))
print(B)

In [None]:
print(A)

In [None]:
# B has a single row: modify its first 5 entries
B[0, 0:5] = -1
print(B)

In [None]:
print(A)

The original variable was also modified! B is only a new _view_ of A.

To create a new array that's the vector (1D) version of A, we use `flatten`, which returns a copy:

In [None]:
B = A.flatten()
print(B)

In [None]:
B.shape

In [None]:
B[0:5] = 10
print(B)

In [None]:
print(A)
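Whether two arrays share their data can be checked with `np.shares_memory`. The sketch below uses an illustrative array `P` and contrasts `flatten` with `ravel`, which returns a view when the memory layout allows it:

```python
import numpy as np

P = np.arange(6).reshape(2, 3)

flat_copy = P.flatten()   # always a copy
flat_view = P.ravel()     # a view when the memory layout allows it

print(np.shares_memory(P, flat_copy))   # False
print(np.shares_memory(P, flat_view))   # True

# Writing through the view changes P; writing to the copy does not
flat_view[0] = 42
flat_copy[1] = -1
print(P[0, 0], P[0, 1])                 # 42 1
```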

### 1.10 Concatenating arrays

#### 1.10.1 Repeat and tile

In [None]:
a = np.array([[1, 2], [3, 4]])
print(a)

In [None]:
# repeat each item 3 times
print(np.repeat(a, 3))

In [None]:
# The result was 1D! Use axis to repeat along the columns instead
print(np.repeat(a, 3, axis=1))

In [None]:
# repeat the matrix 3 times
print(np.tile(a, 3))

#### 1.10.2 Concatenate

In [None]:
b = np.array([[5, 6]])
print(b.shape)

In [None]:
print(np.concatenate((a, b), axis=0))

In [None]:
print(np.concatenate((a, b.T), axis=1))

#### 1.10.3 Stack arrays

In [None]:
np.vstack((a,b))

In [None]:
np.hstack((a,b.T))

### 1.11 Array manipulation
We use 2-dimensional arrays to represent matrices, and can do basic linear algebra operations on them.

#### 1.11.1 Transposing an array

In [None]:
print(X.T)

#### 1.11.2 Applying a transformation to all entries of an array

In [None]:
# Multiply all entries of X by 2:
print(2*X)

## Problem 2.1
Compute the following matrices:
* the one obtained by adding 1 to all entries of X
* the one whose entries are the logarithm (base 2) of the entries of the previous one
* the one whose entries are the squares of the entries of X

In [None]:
# Add 1 to all entries of X
# TODO

# Compute the array whose entries are the logarithm (base 2) of the entries of the previous array
# TODO

# Square all entries of X
# TODO


#### 1.11.3 Matrix multiplication

In [None]:
# Element-wise matrix multiplication
print(X*X)

In [None]:
# Matrix multiplication
print(np.dot(X, X.T))
print(X.dot(X.T))
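In Python 3.5 and later, the `@` operator offers a third way to write the matrix product. The following sketch uses its own seeded random matrix `Y` (an assumption made for reproducibility, not the notebook's `X`) to contrast it with the element-wise product:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator, for reproducibility
Y = rng.random((3, 5))

# Y @ Y.T is the matrix product, like np.dot(Y, Y.T)
product = Y @ Y.T
print(product.shape)                          # (3, 3)
print(np.allclose(product, np.dot(Y, Y.T)))   # True

# Y * Y is the element-wise product and keeps the shape of Y
print((Y * Y).shape)                          # (3, 5)
```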

## Problem 2.2
* Create a random array B of size 5 x 4
* Multiply X by B

In [None]:
# TODO

#### 1.11.4 Diagonal of a matrix

In [None]:
X = np.random.random((3, 5))
# Get the diagonal of X. Note that X is not square.
np.diag(X)

In [None]:
# Compute the trace of X
np.trace(X)

#### 1.11.5 Linear algebra

More complex linear algebra operations are available via numpy.linalg:
http://docs.scipy.org/doc/numpy/reference/routines.linalg.html

In [None]:
# Compute the determinant of M=XX'
M = np.dot(X, X.T)
np.linalg.det(M)

In [None]:
# Compute the eigenvalues and eigenvectors of M
v, w = np.linalg.eig(M)
print("eigenvalues:", v)
print("eigenvectors:", w)

In [None]:
# Compute the inverse of M
print(np.linalg.inv(M))

# Check the product of M by its inverse is the identity (up to floating-point error)
print(np.dot(M, np.linalg.inv(M)))
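Because of rounding, the product of a matrix by its inverse is rarely exactly the identity, so a tolerance-based comparison is usually preferable to reading the printed values. The sketch below works on its own seeded matrix `S` (assumed for the example) and also uses `np.linalg.eigh`, which is designed for symmetric matrices such as $XX^\top$:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.random((3, 5))
S = Z @ Z.T                      # symmetric 3 x 3 matrix

# Compare with the identity up to floating-point error
S_inv = np.linalg.inv(S)
print(np.allclose(S @ S_inv, np.eye(3)))

# S is symmetric, so eigh applies; each column of vecs is an eigenvector
vals, vecs = np.linalg.eigh(S)
print(np.allclose(S @ vecs, vecs * vals))
```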

### 1.12 Simple statistics

We'll start from a [Vandermonde matrix](http://mathworld.wolfram.com/VandermondeMatrix.html)

In [None]:
# Generate a Vandermonde matrix
data = np.vander([1, 2, 3, 4])
print(data)

#### 1.12.1 Mean

In [None]:
print(np.mean(data))

In [None]:
print(np.mean(data, axis=0)) # column-wise

In [None]:
print(np.mean(data, axis=1)) # row-wise

In [None]:
print(np.mean(data[:, 2])) # for a given column

#### 1.12.2 Standard deviation

In [None]:
# variance
print(np.var(data[:, 2])) # for a given column

# standard deviation
print(np.std(data[:, 2]))

# Check the relationship between variance and standard deviation
print(np.sqrt(np.var(data[:, 2])))

In [None]:
print(np.var(data)) # for the entire matrix

In [None]:
print(np.std(data, axis=1)) # row-wise
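By default, `np.var` and `np.std` divide by the number of values $n$. The `ddof` parameter changes the divisor to $n - \text{ddof}$. A small sketch, on an illustrative vector `column` holding the same values as `data[:, 2]`:

```python
import numpy as np

column = np.array([1.0, 2.0, 3.0, 4.0])

# Default (ddof=0): divide by n
print(np.var(column))           # 1.25

# ddof=1: divide by n - 1 (unbiased sample variance)
print(np.var(column, ddof=1))
```

Which divisor is appropriate depends on whether the values are treated as a full population or as a sample.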

#### 1.12.3 Min and max

In [None]:
data.min()

In [None]:
data.max(axis=0)

In [None]:
data.sum()

#### 1.12.4 Sum and product

In [None]:
print(np.sum(data[:, 2]))

In [None]:
data[:, 2].sum()

In [None]:
data[:, 2].prod()

In [None]:
# cumulative sum
np.cumsum(data[:, 2])

In [None]:
np.cumsum(data[:, 2]+1)

## Problem 2.3

Using Numpy and no loop, compute an approximation of $\pi$ with Wallis' product: $\pi = 2 \prod_{n=1}^\infty \frac{4 n^2}{4 n^2 - 1}$

In [None]:
# TODO

### 1.13 Reading/writing arrays from/to csv files

`data.csv` is a toy data set (4 rows of 5 columns). Let us read it:

In [None]:
!cat data.csv

In [None]:
# Create a numpy array directly from a csv file
M = np.genfromtxt('data.csv', delimiter=',', dtype='int')
print(M)

In [None]:
# Save a numpy array to a csv file
M = np.random.rand(3, 3)
print(M)
np.savetxt("random-matrix.txt", M)

In [None]:
! cat random-matrix.txt

In [None]:
np.savetxt("random-matrix.txt", M, fmt='%.3f', delimiter=',')

! cat random-matrix.txt
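A file written with `np.savetxt` can be read back with `np.loadtxt`. The following round trip is a sketch under a few assumptions: the matrix `original` and the file name `matrix.csv` are made up, and the file is written to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import numpy as np

original = np.array([[0.5, 1.25], [2.0, 3.75]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "matrix.csv")   # hypothetical file name
    np.savetxt(path, original, fmt="%.3f", delimiter=",")
    reloaded = np.loadtxt(path, delimiter=",")

print(np.allclose(original, reloaded))   # True: 3 decimals suffice here
```

With values that need more than three decimals, the `fmt='%.3f'` format would round them and the reloaded array would only approximate the original.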

For more on arrays, you can refer to http://docs.scipy.org/doc/numpy/reference/arrays.html.

For more about NumPy, you can refer to:
* The Tentative NumPy Tutorial at http://wiki.scipy.org/Tentative_NumPy_Tutorial;
* The NumPy documentation at http://docs.scipy.org/doc/numpy/index.html.

## 2. Matplotlib

Visualization is an important part of machine learning. Plotting your data will allow you to have a better feel for it (how are the features distributed, are there outliers, etc.). Plotting measures of performance (whether ROC curves or single-valued performance measures, with error bars) allows you to rapidly compare methods.

matplotlib is a very flexible data visualization package, partially inspired by MATLAB.

matplotlib can be imported with

    from matplotlib import *
    
but in a Jupyter notebook, using the magic command

    %pylab inline
    
will include your plots in your notebook.

Among other things, it runs the equivalent of:

    import numpy as np
    from matplotlib import pyplot as plt

In [None]:
%pylab inline

### 2.1 Lines

In [None]:
# Plotting a sinusoid

# Create an array of 100 equally-spaced points between 0 and 10 (to serve as x coordinates)
x = np.linspace(0, 10, 100)

# Create the y coordinates
y = np.sin(x)

# Create the plot
plt.plot(x, y)
plt.show() # not needed with the inline backend; required when interactive mode is off (e.g. in a script)

In [None]:
# Change plot style
plt.plot(x, y, color='orange', linestyle='--', linewidth=3)

In [None]:
# Plot the individual points
plt.plot(x, y, color='orange', marker='x', linestyle='')

In [None]:
plt.scatter(x, y, marker='x')

In [None]:
# Plot multiple lines
plt.plot(x, y, linewidth=2, label='sine')
plt.plot(x, np.cos(x), linewidth=2, label='cosine')
plt.legend(fontsize=16)

In [None]:
# Add a title and a legend, and label the axes
plt.plot(x, y, linewidth=2, label='sine')
plt.plot(x, np.cos(x), linewidth=2, label='cosine')
plt.legend(loc='lower left', fontsize=14)
plt.title("Sinusoids", fontsize=14)
plt.xlabel("$x$", fontsize=16)
plt.ylabel("$f(x)$", fontsize=16)

In [None]:
# Save the plot
plt.plot(x, y, linewidth=2, label='sine')
plt.plot(x, np.cos(x), linewidth=2, label='cosine')
plt.legend(loc='lower left', fontsize=14)
plt.title("Sinusoids", fontsize=14)
plt.xlabel("$x$", fontsize=16)
plt.ylabel("$f(x)$", fontsize=16)
plt.savefig("my_sinusoide.png")
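Outside a notebook, the same figure can be produced from a standalone script. The sketch below is one possible way to do it, using matplotlib's object-oriented interface; the `Agg` backend, the variable `t` and the output file name are assumptions chosen so that the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")            # assumption: no display is available
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 100)

# Create the figure and its axes explicitly
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(t, np.sin(t), linewidth=2, label="sine")
ax.plot(t, np.cos(t), linewidth=2, label="cosine")
ax.set_xlabel("$x$")
ax.set_ylabel("$f(x)$")
ax.legend(loc="lower left")

# Save to a temporary directory (hypothetical file name)
out_path = os.path.join(tempfile.mkdtemp(), "sinusoids.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print(os.path.exists(out_path))
```

Working with explicit `fig` and `ax` objects tends to scale better than the `plt.` functions once a figure contains several subplots.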

## Problem 2.4
Plot a sinusoid of half the amplitude and twice the frequency of the above sine curve.
Make it green.

In [None]:
plt.plot(x, y, linewidth=2, label='sine')
# TODO

### 2.2 Clouds of points

In [None]:
# Create a cloud of 1000 points, uniformly distributed over [0, 1] x [0, 1]
# (first row: x coordinates, second row: y coordinates)
X = np.random.rand(2, 1000)
print(X.shape)

In [None]:
# Create a square figure
fig = plt.figure(figsize=(4, 4))

# Plot them
plt.scatter(X[0], X[1], marker="+")

# Use the same ranges for both axes
plt.xlim([0, 1])
plt.ylim([0, 1])

### 2.3 Histograms

In [None]:
# Plot the histogram of the first dimension of X
hh = plt.hist(X[0], bins=50)
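`plt.hist` returns the bin counts and the bin edges (stored in `hh` above) along with the drawn patches. When only the numbers are needed, `np.histogram` computes the counts and edges without plotting. A minimal sketch, on illustrative seeded samples:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random(1000)       # uniform samples in [0, 1)

# counts[i] is the number of samples between edges[i] and edges[i + 1]
counts, edges = np.histogram(samples, bins=50)

print(counts.shape)   # (50,)
print(edges.shape)    # (51,)
print(counts.sum())   # 1000
```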

## Problem 2.5

Create a cloud of 500 normally distributed points. Plot them on a square figure, with orange markers.

In [None]:
# TODO

### 2.4 Visualizing vectors

In [None]:
v = np.array([1, 3, 2, 4])
x = np.arange(len(v))

plt.figure()
plt.scatter(x, v, label='v(x)')

plt.xlabel('x', fontsize=16)
plt.ylabel('v', fontsize=16)

### 2.5 Heatmaps and matrix visualization

Matplotlib will automatically assign a color to each numerical value, based on a color map. For more about color maps see:
* http://matplotlib.org/users/colormaps.html
* http://matplotlib.org/1.2.1/examples/pylab_examples/show_colormaps.html


In [None]:
C = np.random.random((50, 100))

# View matrix (row indices from top to bottom)
plt.imshow(C, interpolation='nearest')
plt.colorbar()
plt.show()

In [None]:
# View matrix (row indices from bottom to top)
heatmap = plt.pcolor(C) #, cmap=plt.cm.Blues)
plt.colorbar(heatmap)

### 2.6 2D plots of functions

In [None]:
# Create a grid of points
xx, yy = np.mgrid[0:5, 0:5]
print(xx)
print(yy)
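The grids returned by `np.mgrid` can also be built from 1D coordinate vectors with `np.meshgrid`. The sketch below (with illustrative names `gx`, `gy`, `mx`, `my`) suggests that the two agree when `meshgrid` is asked for matrix-style (`'ij'`) indexing:

```python
import numpy as np

# mgrid: the first array varies along rows, the second along columns
gx, gy = np.mgrid[0:3, 0:4]
print(gx.shape)        # (3, 4)

# The same grids from 1D coordinate vectors, with matrix ('ij') indexing
mx, my = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")
print(np.array_equal(gx, mx) and np.array_equal(gy, my))   # True
```

Note that `np.meshgrid` defaults to `indexing='xy'`, which would return the transposed grids.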

In [None]:
# Create a grid of points
xx, yy = np.mgrid[-50:50, -50:50] 

# Plot |x+iy| on this grid 
plt.imshow(np.abs(xx + 1j*yy))
plt.axis('on')
cc = plt.colorbar()

Many more types of plots and functionalities to label axes, display legends, etc. are available. The matplotlib gallery (http://matplotlib.org/gallery.html) is a good place to start to get an idea of what is possible and how to do it.

Note that there are many more plotting libraries for Python. Two of the more popular are:

* Seaborn (based on Matplotlib, but with more aesthetically pleasing defaults):
    http://stanford.edu/~mwaskom/software/seaborn/index.html
* Bokeh (creates interactive plots):
    http://bokeh.pydata.org/

