# Introduction to numpy

The code from this notebook is largely inspired by [this tutorial from Stanford University's CS231n course](https://cs231n.github.io/python-numpy-tutorial) about Convolutional Neural Networks for Visual Recognition and [Dataquest's "Data Analyst in Python course"](https://www.dataquest.io/path/data-analyst/).


## Doing math with lists
Last week we discussed lists, which are a way to store an **ordered collection of objects**. However, they are not designed for scientific computing in Python. Try, for instance, the following code:

In [1]:
my_list = [1., 4., 2.5]
my_list * 3

[1.0, 4.0, 2.5, 1.0, 4.0, 2.5, 1.0, 4.0, 2.5]

In [4]:
my_list = [1., 4., 2.5]
my_list + 1

TypeError: can only concatenate list (not "int") to list

In [5]:
my_list1 = [1., 4.,  2.5]
my_list2 = [5., 3.1, 6. ]
my_list1 + my_list2

[1.0, 4.0, 2.5, 5.0, 3.1, 6.0]

**Exercise**:
Write code that:
- Multiplies each element of a list by a scalar
- Adds a scalar to each element of a list
- Adds the elements of two lists of the same length

In [None]:
def multiply_list_by_scalar(my_list, scalar):
    "Multiply each element of my_list by scalar"
    # TODO


multiply_list_by_scalar([1., 4., 2.5], 3)

In [None]:
def add_scalar_to_list(my_list, scalar):
    "Add scalar to each element of my_list"
    # TODO


add_scalar_to_list([1., 4., 2.5], 1.)

In [None]:
# Bonus points if you use the built-in "zip" function
def add_elements_of_two_lists(my_list1, my_list2):
    "Return a list containing the element-wise sum of two input lists"
    # TODO
    

add_elements_of_two_lists([1., 4.,  2.5], [5., 3.1, 6. ])
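For reference, here is one possible solution sketch in plain Python, using list comprehensions and `zip`; other approaches (explicit `for` loops, `map`) work just as well. The function names are the ones from the exercise cells above.

```python
def multiply_list_by_scalar(my_list, scalar):
    "Multiply each element of my_list by scalar"
    # A list comprehension builds a new list, one product per element
    return [element * scalar for element in my_list]


def add_scalar_to_list(my_list, scalar):
    "Add scalar to each element of my_list"
    return [element + scalar for element in my_list]


def add_elements_of_two_lists(my_list1, my_list2):
    "Return a list containing the element-wise sum of two input lists"
    # zip pairs the i-th element of my_list1 with the i-th element of my_list2
    # (it silently stops at the shorter list if the lengths differ)
    return [x + y for x, y in zip(my_list1, my_list2)]


print(multiply_list_by_scalar([1., 4., 2.5], 3))
print(add_scalar_to_list([1., 4., 2.5], 1.))
print(add_elements_of_two_lists([1., 4., 2.5], [5., 3.1, 6.]))
```

Each operation needs its own loop (hidden in the comprehension), which is exactly what numpy arrays let us avoid.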

## Basic operations using Numpy arrays
[Numpy](https://numpy.org/) is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The shape of an array is a tuple of integers giving the size of the array along each dimension.
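To make this vocabulary concrete, the sketch below (with an illustrative array named `grid`) inspects the number of dimensions, the shape and the total size of a 3-dimensional array:

```python
import numpy as np

# A 3-dimensional grid of zeros; its shape has one entry per dimension
grid = np.zeros((2, 3, 4))

print(grid.ndim)    # 3 dimensions
print(grid.shape)   # (2, 3, 4)
print(grid.size)    # 24 elements in total (2 * 3 * 4)

# A single element is addressed by a tuple of nonnegative integers, one per dimension
print(grid[1, 2, 3])
```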

### Initializing an array from lists 
We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [6]:
import numpy as np  # np is a common alias for numpy

a = np.array([1, 2, 3])   # Create a 1-dimensional array
a

array([1, 2, 3])

In [7]:
print(type(a))  # This will display <class 'numpy.ndarray'>

<class 'numpy.ndarray'>


In [8]:
b = np.array([[2., 3.], [0, 1]])  # Create a 2-dimensional array
b

array([[2., 3.],
       [0., 1.]])

In [9]:
# Get the shape of an array
print(a.shape)
print(b.shape)

(3,)
(2, 2)


Different datatypes (integers, float, boolean...) can be stored in a numpy array, see [numpy's documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html) for an extensive list.

In [10]:
# Get the data type of an array
print(a.dtype)
print(b.dtype)

int32
float64
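The dtype can also be chosen explicitly with the `dtype` argument, or changed afterwards with `astype`, which returns a new array. A small sketch (the names `c`, `d` and `e` are illustrative):

```python
import numpy as np

# Ask numpy for a specific dtype when creating the array...
c = np.array([1, 2, 3], dtype=np.float64)
print(c.dtype)  # float64

# ...or convert an existing array with astype, which returns a new array
d = c.astype(np.int8)
print(d.dtype)  # int8

# Comparisons produce boolean arrays
e = np.array([1, 2, 3]) > 1
print(e.dtype)  # bool
```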


### Alternatives for creating an array

In [11]:
# TODO: Create a 2x3 array filled with 0. 
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [12]:
# TODO: Create a 2x3 array filled with 1. 
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [13]:
# TODO: Create an array with integers from 0 to 9
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [14]:
# Create a 3x3x3 array filled with random values in the half-open interval [0, 1)
np.random.random((3,3,3))

array([[[0.637048  , 0.65554731, 0.23760336],
        [0.78595782, 0.28242004, 0.02269487],
        [0.79179207, 0.43628013, 0.11416447]],

       [[0.21315621, 0.36231136, 0.80943883],
        [0.3811823 , 0.81720689, 0.20779209],
        [0.51537368, 0.37356749, 0.66167835]],

       [[0.405087  , 0.49084935, 0.49337965],
        [0.79164567, 0.90472504, 0.64662571],
        [0.55805839, 0.5274889 , 0.12217666]]])

In [None]:
# TODO: Create a 2x3 array filled with False
np.zeros((2, 3), dtype=bool)

In [None]:
# TODO: Create a 2x3 array filled with True
np.ones((2, 3), dtype=bool)

There are more array-creation functions (np.eye, np.full, np.linspace...):

In [None]:
# 2x2 Identity matrix
np.eye(2)

In [None]:
# 3x3 array filled with 5
np.full((3,3), 5)

In [None]:
# Array of 5 elements linearly spaced between 0 and 1
np.linspace(0, 1, 5)
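`np.arange` and `np.linspace` can look alike, but they are parameterised differently: `np.arange` takes a step and excludes its stop value, while `np.linspace` takes a number of points and includes the stop value by default. A quick comparison (variable names are illustrative):

```python
import numpy as np

# np.arange(start, stop, step): 4 values, 0., 0.25, 0.5 and 0.75 (1 is excluded)
stepped = np.arange(0, 1, 0.25)

# np.linspace(start, stop, num): 5 values, from 0. up to and including 1.
spaced = np.linspace(0, 1, 5)

print(stepped)
print(spaced)
```

With non-integer steps, numpy's documentation recommends `np.linspace`, since floating-point rounding can make the length of an `np.arange` result hard to predict.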

### Accessing elements of a numpy array
Numpy offers several ways to index into arrays.

#### Slicing
Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [None]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2
b = a[:2, 1:3]
b

In [None]:
# You can also directly modify a value stored in an array as follows:
a[1, 1] = 42
a
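Slices accept the same negative indices and steps as Python lists, in each dimension. A sketch on the same 3x4 array (variable names are illustrative):

```python
import numpy as np

a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

last_row = a[-1]             # negative indices count from the end
every_other_col = a[:, ::2]  # a step of 2 keeps the columns with index 0 and 2
reversed_rows = a[::-1]      # a step of -1 reverses the order of the rows

print(last_row)
print(every_other_col)
print(reversed_rows)
```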

#### Mixing integer indexing with slice indexing
You can also mix integer indexing with slice indexing.

**WARNING** Doing so yields an array of lower rank (fewer dimensions) than the original array!

In [15]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Grab column with index 1 from a:
a[:, 1]

array([ 2,  6, 10])

In [16]:
# Note that the resulting array is 1-dimensional
a[:, 1].shape

(3,)

**Exercise**: Select the second value of the first and last rows of `a` to obtain a one-dimensional array:


In [18]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
a[::2, 1]  # or, equivalently: a[[0, 2], 1]

array([ 2, 10])
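The warning about lower rank can be seen by comparing an integer index with a length-1 slice on the same column: only the slice keeps the column axis. A sketch (the names `col_1d` and `col_2d` are illustrative):

```python
import numpy as np

a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

col_1d = a[:, 1]     # integer index: the column axis disappears
col_2d = a[:, 1:2]   # length-1 slice: the column axis is kept

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```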

#### Boolean array indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition.

In [None]:
a = np.arange(5)
a

In [None]:
bool_idx = np.array([False, True, True, False, False])
a[bool_idx]
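A boolean mask can also be used on the left-hand side of an assignment to modify the selected elements in place, whereas reading through a mask returns a copy. A sketch reusing the mask above (the name `picked` is illustrative):

```python
import numpy as np

a = np.arange(5)
bool_idx = np.array([False, True, True, False, False])

picked = a[bool_idx]   # reading through the mask returns a copy: [1 2]
a[bool_idx] = -1       # writing through the mask modifies a itself

print(a)       # [ 0 -1 -1  3  4]
print(picked)  # [1 2]: the earlier copy is unaffected
```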

### Dealing with arrays' shape

In [39]:
# Reminder: you can access an array's shape as follows:
a = np.zeros((2, 6))
a.shape

(2, 6)

#### Reshaping
You can reshape an array as follows:

In [40]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
a.reshape(2, 6)

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

Or alternatively:

In [41]:
np.reshape(a, (2, 6))  # ! Note that we used the tuple (2, 6) here

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])
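Note that `reshape` does not modify the array it is called on: it returns a new array object, so the result has to be stored. For a contiguous array like this one, the result is a view sharing the original data, which `np.shares_memory` can confirm in this sketch:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(2, 6)

print(a.shape)                 # (12,): a itself is unchanged
print(b.shape)                 # (2, 6)
print(np.shares_memory(a, b))  # True: b is a view on a's data
```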

In [43]:
# TODO: Combine np.arange and the reshape method to generate an array with content:
# [[1,  2,  3,  4],
#  [5,  6,  7,  8],
#  [9, 10, 11, 12]]
np.arange(1,13).reshape(3,4)


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

### Main operations using one numpy array
#### Array math
Mathematical operations (+, -, *, /, **...) between a numpy array and a scalar are applied to each element of the array:

In [62]:
a = np.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [21]:
a + 2.5  # Add 2.5 to each element

array([ 2.5,  3.5,  4.5,  5.5,  6.5,  7.5,  8.5,  9.5, 10.5, 11.5, 12.5,
       13.5])

In [22]:
a * 5  # Multiply each element by 5

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55])

In [23]:
a ** 3  # Raise each element to the power 3

array([   0,    1,    8,   27,   64,  125,  216,  343,  512,  729, 1000,
       1331])

#### Boolean operations
Numpy arrays support the usual comparison operators (==, !=, <, <=, >, >=), which are applied to each element and return an array of booleans:

In [63]:
a = np.arange(8)
a

array([0, 1, 2, 3, 4, 5, 6, 7])

In [25]:
a > 3

array([False, False, False, False,  True,  True,  True,  True])
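Because `True` counts as 1 and `False` as 0, summing a boolean array counts its `True` values; `np.count_nonzero` gives the same count. A short sketch (variable names are illustrative):

```python
import numpy as np

a = np.arange(8)
mask = a > 3

n_above_3 = mask.sum()                  # 4 elements are greater than 3
n_above_3_bis = np.count_nonzero(mask)  # same count
print(n_above_3, n_above_3_bis)
```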

**Exercise**:

Combine array math, boolean operations and Boolean array indexing to select the values of `a` whose square is bigger than 40 **without using any loop**:

In [65]:
a = np.arange(-10, 10, 2)
a[a**2 > 40]



array([-10,  -8,   8])

#### Other useful functions
Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum` (many others such as `mean`, `diff` or `count_nonzero` are also available, see [numpy's documentation](https://numpy.org/doc/stable/reference/routines.math.html) for many more):

In [66]:
a = np.array([[1.5, 2.],
             [2.1, 5.]])
a

array([[1.5, 2. ],
       [2.1, 5. ]])

In [67]:
# Sum all elements of a
np.sum(a)

10.6

In [70]:
# Sum the elements of a along an axis (axis=0 sums down each column, axis=1 sums across each row)
np.sum(a, axis=0)

array([3.6, 7. ])

In [71]:
# sum can also be called as follows
a.sum()

10.6
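On a non-square array, the effect of `axis` is easier to see from the shape of the result: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). A sketch with an illustrative 2x3 array:

```python
import numpy as np

b = np.arange(6).reshape(2, 3)   # [[0, 1, 2],
                                 #  [3, 4, 5]]

col_sums = b.sum(axis=0)  # one sum per column: shape (3,)
row_sums = b.sum(axis=1)  # one sum per row: shape (2,)

print(col_sums)  # [3 5 7]
print(row_sums)  # [ 3 12]
```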

### Main operations using two (or more) numpy arrays
#### Mathematical operators on two arrays
Basic element-wise mathematical operations (+, -, /, **...) between two arrays are also available as follows:

In [73]:
a = np.array([1., 1.5, 2.])
b = np.array([2., 1.5, 0.])
a + b

array([3., 3., 2.])

Equivalently, you can call the corresponding numpy function, for instance:

In [74]:
np.add(a, b)

array([3., 3., 2.])

**WARNING** The `*` operator between two arrays performs element-wise multiplication, not matrix multiplication (you can use `numpy.dot` for that).

In [75]:
a * b

array([2.  , 2.25, 0.  ])
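For actual matrix multiplication, `np.dot` or the `@` operator (Python 3.5+) can be used; for 2-D arrays they compute the same product. A small comparison, with illustrative arrays `m` and `v`:

```python
import numpy as np

m = np.array([[1., 2.],
              [3., 4.]])
v = np.array([1., 1.])

elementwise = m * m      # squares each entry: [[1, 4], [9, 16]]
matrix_product = m @ m   # same as np.dot(m, m): [[7, 10], [15, 22]]
matvec = m @ v           # matrix-vector product: [3, 7]

print(elementwise)
print(matrix_product)
print(matvec)
```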

Many more operations are available in [numpy's documentation](https://numpy.org/doc/stable/reference/routines.math.html).

**Exercise:**
Write a function that computes the Euclidean norm of an array along an axis:

In [88]:

def norm(arr, axis=None):
    """Return the Euclidean norm of an array along an axis.
    
    By default (axis=None), return the Euclidean norm using all values.
    """
    sum_of_squares = np.sum(arr**2, axis=axis)
    return np.sqrt(sum_of_squares)


a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
print(norm(a, axis=1))
print(norm(a))

[ 5.47722558 13.19090596 21.11871208]
25.495097567963924


In [80]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
np.sum(a)

78

In [None]:
np.sum(a**2, axis=1)
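The result of the exercise can be checked against numpy's built-in `np.linalg.norm`, which computes the Euclidean norm of vectors (and, for a 2-D array with `axis=None`, the Frobenius norm, i.e. the square root of the sum of all squared entries). A sketch, with the exercise's `norm` function redefined so the cell is self-contained:

```python
import numpy as np

a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Hand-written version, as in the exercise above
def norm(arr, axis=None):
    return np.sqrt(np.sum(arr**2, axis=axis))

# numpy's built-in equivalents
builtin_total = np.linalg.norm(a)
builtin_rows = np.linalg.norm(a, axis=1)

print(np.allclose(norm(a), builtin_total))        # True
print(np.allclose(norm(a, axis=1), builtin_rows))  # True
```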

#### Boolean operations
Boolean operations between two arrays can also be performed as follows (more operations can be found in the [documentation](https://numpy.org/doc/stable/reference/routines.logic.html)):

In [89]:
a = np.array([True, False, True, False])
b = np.array([False, True, True, False])

# a AND b
a & b  # equivalent to np.logical_and(a, b)

array([False, False,  True, False])

In [90]:
# a OR b
a | b  # equivalent to np.logical_or(a, b)

array([ True,  True,  True, False])

In [91]:
# not a
~a  # np.logical_not(a)


array([False,  True, False,  True])

You can find the indexes for which an array of boolean is True by using `np.where` (note the format of the output):

In [93]:
a = np.array([True, False, True, False])
np.where(a)

(array([0, 2], dtype=int64),)

**Exercise**: Find the indexes where the values of the following array switch from False to True **without using any loop**:

In [96]:
a = np.array([False, True, False, True, True, True, False, False, True, False, True])

# TODO: should print out array([ 1,  3,  8, 10])
value_is_false = ~a[:-1]          # every value except the last one, negated
following_value_is_true = a[1:]   # every value except the first one
both = value_is_false & following_value_is_true
np.where(both)[0] + 1             # +1: the switch happens at the following index

array([ 1,  3,  8, 10], dtype=int64)
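Besides returning indexes, `np.where` has a three-argument form, `np.where(condition, x, y)`, which picks values from `x` where the condition is `True` and from `y` elsewhere. A sketch on a small boolean array (the names `indexes` and `values` are illustrative):

```python
import numpy as np

a = np.array([True, False, True, False])

# One argument: a tuple with one index array per dimension, hence the [0]
indexes = np.where(a)[0]

# Three arguments: 10 where a is True, -10 elsewhere
values = np.where(a, 10, -10)

print(indexes)  # [0 2]
print(values)
```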

#### Broadcasting
Broadcasting is a mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array. Numpy broadcasting allows us to perform this kind of computation using only one copy of each array.

In [None]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])

v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)
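Broadcasting compares shapes starting from the last dimension: two dimensions are compatible when they are equal or when one of them is 1. The sketch below (with an illustrative column array `w`) stretches both a row vector and a column vector over `x`, and shows that incompatible shapes raise a `ValueError`:

```python
import numpy as np

x = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])                 # shape (4, 3)

v = np.array([1, 0, 1])                      # shape (3,): matched against x's last axis
w = np.array([[100], [200], [300], [400]])   # shape (4, 1): its length-1 axis is stretched

y = x + v   # v is added to each row of x
z = x + w   # w[i, 0] is added to every element of row i

try:
    x + np.array([1, 2])                     # shapes (4, 3) and (2,) are incompatible
except ValueError as err:
    print("ValueError:", err)
```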

### A few tricks that could make your life easier
#### You will rarely beat numpy's performance with a hand-written Python loop
If you can avoid a loop by using a numpy function, by all means DO IT.

In [97]:
%%timeit
n = 10000
scalar = 3.2

l = list(range(n))
for i, e in enumerate(l):
    l[i] = e * scalar

1000 loops, best of 3: 744 µs per loop


In [100]:
%%timeit
n = 10000
scalar = 3.2
l = np.arange(n)
l = scalar * l

The slowest run took 7.92 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 15.7 µs per loop


#### Tuples as arguments for specifying shape
Many numpy functions require a tuple to specify the shape of the output; a `TypeError` is usually raised if several integers are passed directly instead:

In [None]:
# TODO: correct the following code to get a 3x2 array
# np.zeros(3, 2) raises a TypeError: its second positional argument is the dtype
np.zeros((3, 2))

#### Using -1 and reshape
Passing -1 for one dimension lets numpy infer its size from the array's total number of elements and the other, specified dimensions:

In [101]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [102]:
# TODO: reshape a so that it has 2 rows

In [103]:
a.reshape(2, -1)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
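Only one dimension can be `-1`, and the dimensions that are given must divide the total number of elements. A sketch (variable names are illustrative):

```python
import numpy as np

a = np.arange(10)

by_rows = a.reshape(2, -1)   # 2 rows given, numpy infers 5 columns
by_cols = a.reshape(-1, 5)   # 5 columns given, numpy infers 2 rows

# 10 elements cannot be split into 3 rows, so numpy cannot infer the -1
try:
    a.reshape(3, -1)
except ValueError as err:
    print("ValueError:", err)
```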

#### Using -1 and reshape or np.newaxis to add a new dimension
Sometimes you will need to artificially add a dimension to an array, for instance so that the shapes of two arrays match for a particular operation. This can be done as follows:

In [None]:
a = np.arange(10)
a

In [None]:
# Reshape a as a 10x1 array
a.reshape(-1, 1)

In [None]:
# Same result using np.newaxis
a[:, np.newaxis]
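A common use of `np.newaxis` is to combine a column and a row through broadcasting, for instance to build a small multiplication table. A sketch with illustrative names:

```python
import numpy as np

a = np.arange(3)           # shape (3,)
col = a[:, np.newaxis]     # shape (3, 1)
row = a[np.newaxis, :]     # shape (1, 3)

# Broadcasting a (3, 1) column against a (1, 3) row gives a (3, 3) table
table = col * row          # table[i, j] == a[i] * a[j]
print(table)
```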

#### Transposing an array
You can transpose an array by using the `numpy.transpose` function or by calling `array.T`:

In [None]:
a = np.arange(10).reshape(2, 5)
a

In [None]:
a.T  # Equivalent to np.transpose(a)
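Transposing reverses the order of the axes, so it has no effect on a 1-D array; to get a column, an axis has to be added with `reshape` or `np.newaxis`. A quick check (variable names are illustrative):

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
print(a.T.shape)               # (5, 2): the axes are swapped

v = np.arange(3)
print(v.T.shape)               # (3,): transposing a 1-D array changes nothing
print(v.reshape(-1, 1).shape)  # (3, 1): an added axis gives a column instead
```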

#### A slice is a view into the same data
**WARNING** A slice of an array is a view into the same data, so modifying it will modify the original array:

In [None]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

b = a[:2, 1:3]
# b is now: 
# array([[2, 3],
#        [6, 7]])

a[1, 1] = 42


# b is a view of the same data as the ones in a,
# modifying a also modifies b
b 
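When an independent sub-array is needed, calling `.copy()` on the slice breaks the link with the original data. A sketch on the same example:

```python
import numpy as np

a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

b = a[:2, 1:3].copy()   # an independent copy of the slice
a[1, 1] = 42

# b still holds the original values [[2, 3], [6, 7]]
print(b)
```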

### Exercises
You are not allowed to use loops!

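Swap the rows with index 1 and 2 in the array `arr`:

In [104]:
arr = np.arange(9).reshape(3,3)
# Fancy indexing on the right-hand side makes a copy, so the swap is safe
arr[[1, 2]] = arr[[2, 1]]
arr

array([[0, 1, 2],
       [6, 7, 8],
       [3, 4, 5]])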

Write a function that rescales the values of an array:

In [None]:
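def rescale_array(arr, new_min=0., new_max=1.):
    "Rescale an array's values between new_min and new_max."
    # Map the values linearly onto [0, 1]...
    scaled = (arr - arr.min()) / (arr.max() - arr.min())
    # ...then stretch and shift them onto [new_min, new_max]
    return scaled * (new_max - new_min) + new_min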


arr = np.arange(9).reshape(3,3)
rescale_array(arr, new_min=-1., new_max=1.)
# Should print:
# array([[-1.  , -0.75, -0.5 ],
#        [-0.25,  0.  ,  0.25],
#        [ 0.5 ,  0.75,  1.  ]])

Write a function that replaces missing values by a constant:

In [None]:
def replace_nan_by_value(arr, new_value=0.):
    "Replace the nan values in an array by a new value."
    # TODO


arr = np.array([0., 1.5, .75, np.nan, 0., np.nan])
arr
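One possible solution sketch uses `np.isnan` to locate the missing values and `np.where` to substitute them, without any loop:

```python
import numpy as np

def replace_nan_by_value(arr, new_value=0.):
    "Replace the nan values in an array by a new value."
    # np.isnan flags the missing values; np.where keeps arr's value everywhere else
    return np.where(np.isnan(arr), new_value, arr)


arr = np.array([0., 1.5, .75, np.nan, 0., np.nan])
replace_nan_by_value(arr)
```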

Write a function that returns the number of unique values in a numpy array:

In [None]:
def count_unique(arr):
    "Return the number of unique values in an array"
    # TODO


arr = np.array([0, 1, 2, 2, 2, 3, 1, 3, 0]).reshape(3, -1)
arr
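One possible solution sketch relies on `np.unique`, which flattens the array and returns its sorted distinct values:

```python
import numpy as np

def count_unique(arr):
    "Return the number of unique values in an array"
    # np.unique works on the flattened array, so the 2-D shape does not matter
    return np.unique(arr).size


arr = np.array([0, 1, 2, 2, 2, 3, 1, 3, 0]).reshape(3, -1)
count_unique(arr)
```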