# Lecture 3: Numpy Arrays
In this lecture, we will learn a new type: `array` (which is like a matrix/tensor in MATLAB).

## Multi-dimensional arrays

Why do we need them?

For example, suppose we have a collection of data points, such as people's ratings of movies:

    Person    The Matrix      Inception      Force Awakens     Wall-E ...
         A         5              4                5              3
         B         4              ?                4              ?
         C         ?              ?                5              5
         ...
         
We would like to guess how Person C would rate Inception. This is not an easy problem: Netflix actually offered a million-dollar prize for it.

BUT in the first place, how would we store this kind of data?

If we just wanted to store the matrix, we could use a list of lists (nested lists). Let's see an example.

In [None]:
l1 = [1,2]
l2 = [3,4]
l3 = [l1, l2]
print(l3)

In [None]:
type(l3)

In [None]:
l3[0][0] # l3[0,0] does NOT work for list of lists

Back to the movie preference example. There are 3 lists (one per person) and each one has 4 entries, so it is a `3x4` matrix, with `-1` standing in for a missing rating. We can do a lot with lists of lists, but it will be slow, and we will eventually need a more efficient and flexible way of working with multi-dimensional arrays. As a list of lists, the data looks like this:

    xs = [[5, 4, 5, 3], [4, -1, 4, -1], [-1, -1, 5, 5]] # list of lists

The standard way of working with data sets in Python is to use the **Numpy** library. A Numpy array is a multi-dimensional array. 

## Run the following line before you proceed

In [None]:
# the almighty linear algebra package in Python
import numpy as np # np means numpy in any Python code

To see why a `numpy` array is superior to a list, let us consider the following example:

In [None]:
# np.array()
x = np.array([1, 2, 3, 4]) # the input can be a list
print(x/3)

In [None]:
# what if y is a list?
y = [1,2,3,4]
print(y/3) # raises TypeError: a list cannot be divided by a number
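
Dividing the list fails because a plain list does not define `/`. The following is a minimal sketch of the contrast; the names `y_thirds`, `x_thirds` and `list_division_failed` are chosen here for illustration only.

```python
import numpy as np

y = [1, 2, 3, 4]

# Dividing a list by a number is not defined, so Python raises TypeError.
try:
    y / 3
    list_division_failed = False
except TypeError:
    list_division_failed = True

# With a list, each element has to be divided explicitly.
y_thirds = [v / 3 for v in y]

# With an array, the division is applied to every element at once.
x_thirds = np.array(y) / 3

print(list_division_failed)
print(np.allclose(x_thirds, y_thirds))
```

The array version is shorter, and it avoids writing a Python-level loop.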

Now let us get back to the `xs` earlier.

In [None]:
arr = np.array([[5, 4, 5, 3], [4, -1, 4, -1], [-1, -1, 5, 5]])

In [None]:
print(arr)

In [None]:
arr

In [None]:
type(arr)

In [None]:
#arr is now an object
arr.T # transpose

## Tuple
Every array has a `shape` (its dimensions). The shape of an array is a tuple, which is written with `()`, as opposed to a list, which is written with `[]`.

In [None]:
arr.shape
# the shape has type tuple, which we will get to later

In [None]:
type(arr.shape)
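
As a small sketch of what having a tuple as the shape means in practice, the example below unpacks the shape of the movie array into two names and shows that a tuple, unlike a list, cannot be modified in place. The names `n_rows`, `n_cols` and `tuple_is_mutable` are illustrative.

```python
import numpy as np

arr = np.array([[5, 4, 5, 3], [4, -1, 4, -1], [-1, -1, 5, 5]])

# A tuple can be unpacked into separate names.
n_rows, n_cols = arr.shape
print(n_rows, n_cols)

# Tuples are indexed like lists...
print(arr.shape[0])

# ...but they are immutable: item assignment raises TypeError.
t = (3, 4)
try:
    t[0] = 5
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False
print(tuple_is_mutable)
```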

### How to access elements

In [None]:
arr = np.array([[1,2,3], [4,5,6]])
print(arr)

In [None]:
# accessing elements:
arr[0]    # 1st row

In [None]:
arr[1]     # 2nd row

In [None]:
arr[0,0]    # very different from lists of lists, for those, we would have done arr[0][0]

In [None]:
arr[0,1]
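
To make the difference from nested lists concrete, here is a short sketch comparing the two indexing styles. The names `nested` and `list_accepts_tuple` are introduced only for this example.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
arr = np.array(nested)

# Both forms work on an array and give the same element.
print(arr[0, 1], arr[0][1])

# A list only accepts one integer (or slice) per pair of brackets.
try:
    nested[0, 1]
    list_accepts_tuple = True
except TypeError:
    list_accepts_tuple = False
print(list_accepts_tuple)
```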

In [None]:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr)

## Slicing
We can use slicing with `:`, which is similar to `:` in MATLAB and is a trick for obtaining a range of indexed elements very quickly. The syntax for slicing is to put `start:stop:step` as the index of an array; if `step` is omitted, it defaults to 1. As everywhere in Python, the element at `stop` is not included.

In [None]:
arr[0,:]

In [None]:
arr[:,0]
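
The `start:stop:step` form is easiest to see on a one-dimensional array. This is a minimal sketch; the array `v` is made up for the example.

```python
import numpy as np

v = np.arange(10)      # [0 1 2 3 4 5 6 7 8 9]

print(v[2:7])          # start=2, stop=7 (excluded), step defaults to 1
print(v[2:7:2])        # every second element from index 2, stopping before 7
print(v[::3])          # start and stop omitted: whole range, step 3
print(v[::-1])         # a negative step walks backwards
```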

### Remark: 
For an $N$-dimensional array, the index `i` returns the same values as `i:i+1`. In particular, a selection tuple with the $p$-th element an integer (and all other entries `:`) returns the corresponding sub-array with dimension $N - 1$. If $N = 1$ then the returned object is an array scalar.

If the selection tuple has all entries `:` except the $p$-th entry, which is a slice object `i:j:k`, then the returned array has dimension $N$ and is formed by concatenating the sub-arrays returned by integer indexing of elements $i, i+k, ..., i + (m - 1) k < j$.
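
The remark can be checked on a small array. In this sketch, `row_int` and `row_slice` are illustrative names: they hold the same values but have different numbers of dimensions.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_int = arr[1, :]       # integer index: one dimension is dropped
row_slice = arr[1:2, :]   # slice i:i+1: same values, dimension is kept

print(row_int.shape)      # (3,)
print(row_slice.shape)    # (1, 3)

# Slice with a step along the first axis: rows 0 and 2.
print(arr[0:3:2, :])
```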

In [None]:
print(arr)

In [None]:
arr[1:3,1:3]

In [None]:
arr[1:2,1:2]

In [None]:
arr[1:-1,:]

The indexing tricks above are enough for our class. For advanced slicing and indexing tricks, please refer to [https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#advanced-indexing](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#advanced-indexing)

#### nd-array
Arrays can also have 3 or more dimensions.

In [None]:
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3)

In [None]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(arr1)
print(arr2)

In [None]:
arr3 = np.array([arr1,arr2])
print(arr3)

In [None]:
arr3.shape  # in MATLAB it would be size(arr3)

In [None]:
arr3[0,1,2]
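
One way to read a three-index lookup, sketched under the assumption that `arr3` is built from `arr1` and `arr2` as above: the first index picks the block, the second the row, the third the column.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
arr3 = np.array([arr1, arr2])

print(arr3.shape)      # (2, 2, 3): 2 blocks, each with 2 rows and 3 columns
print(arr3.ndim)       # 3

print(arr3[0, 1, 2])   # block 0 (arr1), row 1, column 2 -> 6
print(arr3[1, 0, :])   # block 1 (arr2), row 0 -> [7 8 9]
print(arr3[:, :, 0])   # first column of every block
```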

### Building Arrays

In [None]:
arr = np.zeros(3)

In [None]:
arr = np.zeros([3,3])   # you put the shape in as a list

In [None]:
arr = np.zeros([3,3,2])
arr

The identity matrix, for example, $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}$

In [None]:
arr = np.identity(3)
arr # 1. means 1.0, which is a float
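
A brief sketch of a few related details. `np.ones` and `np.eye` are not used elsewhere in this lecture and are included only for comparison; the shape can be given as a tuple as well as a list, and the entries are floats unless a `dtype` is requested.

```python
import numpy as np

z = np.zeros((2, 3))        # a tuple works for the shape as well as a list
o = np.ones((2, 3))         # same idea, filled with ones
eye = np.identity(3)

print(z.dtype)                          # float64 by default
print(np.zeros((2, 3), dtype=int))      # dtype can be requested explicitly
print(np.array_equal(eye, np.eye(3)))   # np.eye(3) gives the same matrix
```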

Equally spaced points:

In [None]:
arr = np.arange(10) # this is not arrange!!!!
print(arr)

In [None]:
arr = np.arange(1,2,0.1)
print(arr) # be careful: the stop value 2 is not included!

In [None]:
arr = np.linspace(1,2,11)   # 11 points between 1 and 2 (inclusive)
print(arr)
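
The two functions differ in what the third argument means and in whether the endpoint is included. The following sketch uses simple made-up values so the results are exact.

```python
import numpy as np

a = np.arange(0, 10, 2)      # start, stop (excluded), step
b = np.linspace(0, 1, 5)     # start, stop (included), number of points

print(a)    # 0, 2, 4, 6, 8
print(b)    # 0, 0.25, 0.5, 0.75, 1
print(len(a), len(b))
```

With a non-integer step, `np.linspace` is usually the safer choice, since the number of points is stated explicitly instead of depending on floating-point rounding.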

### Vectorization:

In [None]:
# guess what will happen?
arr = np.zeros([3,3])
arr = arr + 1
print(arr)

Similarly

In [None]:
def f(x):
    return x*x + x + 1

In [None]:
f(arr)    # again this would never work for lists
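
To see the contrast with lists, here is a small sketch that applies the same `f` to an array and to a list. The flag `works_on_list` is an illustrative name.

```python
import numpy as np

def f(x):
    return x * x + x + 1

arr = np.array([0, 1, 2])
print(f(arr))                       # [1 3 7]

# For a list, x * x is not defined, so f raises TypeError...
try:
    f([0, 1, 2])
    works_on_list = True
except TypeError:
    works_on_list = False

# ...and an explicit loop is needed instead.
print([f(v) for v in [0, 1, 2]])    # [1, 3, 7]
```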

### Remark: 
For library functions that only accept a single number (for example, those in the `math` module), you may need to use a function called `np.vectorize`.
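
A minimal sketch of `np.vectorize`, using `math.sqrt` as an example of a function that accepts only a single number. The names `vsqrt` and `scalar_function_accepts_array` are illustrative.

```python
import math
import numpy as np

arr = np.array([1.0, 4.0, 9.0])

# math.sqrt accepts only a single number, so passing the whole array fails.
try:
    math.sqrt(arr)
    scalar_function_accepts_array = True
except TypeError:
    scalar_function_accepts_array = False

# np.vectorize wraps the scalar function so it is applied element by element.
vsqrt = np.vectorize(math.sqrt)
print(vsqrt(arr))       # [1. 2. 3.]
print(np.sqrt(arr))     # NumPy's own sqrt already works elementwise
```

Note that `np.vectorize` is a convenience: internally it is essentially a loop, so when a native function such as `np.sqrt` exists, that is the better choice.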

Let us see the following example:

In [None]:
np.array([1,2,3]) + np.array([4,5,6])  

Try the following code. If these were lists, `+` would concatenate them:

In [None]:
[1,2,3]+[4,5,6]

With arrays, `+` added the entries as if they were vectors. Numpy figures out how to apply the operation to the arrays you gave it.
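
The same holds for the other arithmetic operators. This sketch, with made-up arrays `a` and `b`, shows that `*` is also elementwise for arrays, while for a list it means repetition.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)           # [5 7 9]
print(a * b)           # [ 4 10 18]  (elementwise product, not a dot product)
print(a * 2)           # [2 4 6]
print([1, 2, 3] * 2)   # [1, 2, 3, 1, 2, 3]  (a list is repeated instead)
```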

### Reshape
Alternatively, you may want to control the shape yourself with `reshape`:

In [None]:
arr = np.array(range(16))

In [None]:
arr.reshape(4,4)

In [None]:
arr.reshape(2,2,-1)    # if you put -1, it figures out what the shape should be

In [None]:
arr

In [None]:
np.reshape(arr, (2,2,-1)) # -1 means unspecified
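
A short sketch of how `-1` is resolved and of what happens when the shapes do not match. The names `b` and `mismatch_allowed` are illustrative.

```python
import numpy as np

arr = np.arange(16)

b = arr.reshape(2, 2, -1)   # 16 / (2 * 2) = 4, so -1 is replaced by 4
print(b.shape)              # (2, 2, 4)
print(arr.shape)            # (16,): the original array keeps its shape

# The total number of elements must match, otherwise ValueError is raised.
try:
    arr.reshape(3, 5)
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False
print(mismatch_allowed)
```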

### Vectorization

In [None]:
arr = np.array(range(9)).reshape(3,3) + 1

In [None]:
arr

In [None]:
np.sum(arr)

In [None]:
np.mean(arr)

In [None]:
help(np.apply_along_axis)  # arguments: np.apply_along_axis(func1d, axis, arr)

In [None]:
np.apply_along_axis(np.sum, 0, arr)

In [None]:
np.apply_along_axis(np.sum, 1, arr)

In [None]:
np.sum(arr, axis=1)
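
The `axis` argument names the dimension that is collapsed. As a sketch on the same 3x3 array: `axis=0` sums down each column and `axis=1` sums across each row.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3) + 1   # [[1 2 3] [4 5 6] [7 8 9]]

col_sums = np.sum(arr, axis=0)
row_sums = np.sum(arr, axis=1)

print(col_sums)               # [12 15 18]: sum down each column
print(row_sums)               # [ 6 15 24]: sum across each row
print(np.mean(arr, axis=0))   # [4. 5. 6.]
print(arr.sum(axis=1))        # method form, same result as np.sum(arr, axis=1)
```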

There is also `np.apply_over_axes`; be aware of the plural `axes`, and note that the array comes before the axes in its arguments.

In [None]:
np.apply_over_axes(np.sum, arr, 0)
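
One visible difference between the two functions, sketched on the same 3x3 array: `np.apply_over_axes` keeps the number of dimensions of its input, so its result matches `np.sum` with `keepdims=True`. The names `along` and `over` are illustrative.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3) + 1

along = np.apply_along_axis(np.sum, 0, arr)   # arguments: function, axis, array
over = np.apply_over_axes(np.sum, arr, 0)     # arguments: function, array, axes

print(along.shape)   # (3,)
print(over.shape)    # (1, 3)
print(np.array_equal(over, np.sum(arr, axis=0, keepdims=True)))
```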