## Numpy code along


In [1]:
# 'np' is the conventional alias for numpy
import numpy as np
np.__version__

'1.20.3'

#### Numpy arrays

In [2]:
np.array([4,5,6,7])

array([4, 5, 6, 7])

#### Generate an array with random numbers

Documentation:

[function seed](https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html?highlight=seed)

[function random.random](https://numpy.org/doc/stable/reference/random/index.html?highlight=random%20random#module-numpy.random)

In [6]:
# define a random seed first!
#np.random.seed(12)
np.random.random(10)

array([0.78974975, 0.8020134 , 0.28733775, 0.28617917, 0.27947832,
       0.31676376, 0.90571407, 0.53885257, 0.97848519, 0.82758938])

### Other ways to create arrays

#### List to array

This works the same way whether you have a list of lists, a list of tuples, a tuple of lists, or a tuple of tuples.

All the elements get coerced to the same type.
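To make the coercion rule concrete, here is a minimal sketch (the variable names are illustrative and not part of the original notebook) showing how `np.array` settles on one common data type for mixed inputs:

```python
import numpy as np

# Integers mixed with a float: everything is upcast to float64
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)          # float64

# Numbers mixed with a string: everything becomes a Unicode string
mixed_with_text = np.array([43, 356, "Hello"])
print(mixed_with_text.dtype.kind)   # U  (Unicode string)

# A list of lists and a tuple of tuples produce the same 2-D array
from_lists = np.array([[1, 2], [3, 4]])
from_tuples = np.array(((1, 2), (3, 4)))
print(np.array_equal(from_lists, from_tuples))  # True
```

This is also why adding an integer array to an array that contains `"HELLO"` fails: the second array holds strings, not numbers.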

In [9]:
lst = [43,356,2,"Hello",5,6,1]
lst
#lst.remove("Hello")
lst

[43, 356, 2, 'Hello', 5, 6, 1]

The `+` operator behaves differently for lists and for NumPy arrays: it concatenates lists, but adds arrays element by element.

In [13]:
[1,2,3] + [4,5,"HELLO"]

[1, 2, 3, 4, 5, 'HELLO']

In [14]:
np.array([1,2,3]) + np.array([4,5,"HELLO"])

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

In [15]:
[1,2,3] + [4,5,6]

[1, 2, 3, 4, 5, 6]

In [17]:
np.array([1,2,3]) + np.array([4,5,6])

array([5, 7, 9])

In [18]:
type([1,2,3])

list

In [19]:
type(np.array([1,2,3]))

numpy.ndarray

In [20]:
np.concatenate([np.array([1,2,3]),np.array([4,5,6])])

array([1, 2, 3, 4, 5, 6])

#### Constant arrays

Documentation:

[function zeros](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html)

[function ones](https://numpy.org/doc/stable/reference/generated/numpy.ones.html)

In [23]:
print(np.zeros((2,3)))

[[0. 0. 0.]
 [0. 0. 0.]]


In [54]:
# Create an array of all zeros
print(np.zeros(12))

# Create an array of all ones
print(np.ones(12))

# Create any constant array
print(np.full(8, 10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[10 10 10 10 10 10 10 10]


In [26]:
np.full(8, 10)

array([10, 10, 10, 10, 10, 10, 10, 10])

#### Sequential arrays

Documentation:

[function arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html?highlight=arange#numpy.arange)

[function linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html?highlight=linspace#numpy.linspace)

[function normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html?highlight=normal#numpy.random.normal)

[function exponential](https://numpy.org/doc/stable/reference/random/generated/numpy.random.exponential.html?highlight=exponential#numpy.random.exponential)

In [28]:
# Use np.arange to create an array filled with a linear sequence
# Starting at 0, ending before 50 (the stop value is excluded), stepping by 2
# (this is similar to the built-in range() function)

np.arange(0,50,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48])

In [29]:
np.arange(0,20,2).tolist()

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [31]:
np.array( list( range(0,20,2) ) )

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [34]:
# Use np.linspace to create an array of 20 values evenly 
# spaced between 0 and 100 (both endpoints included)
np.linspace(0,100,20)

array([  0.        ,   5.26315789,  10.52631579,  15.78947368,
        21.05263158,  26.31578947,  31.57894737,  36.84210526,
        42.10526316,  47.36842105,  52.63157895,  57.89473684,
        63.15789474,  68.42105263,  73.68421053,  78.94736842,
        84.21052632,  89.47368421,  94.73684211, 100.        ])
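As a small sketch of how the two functions differ (variable names are illustrative): `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and, by default, includes both endpoints, so the spacing is `(stop - start) / (num - 1)`.

```python
import numpy as np

start, stop, num = 0, 100, 20

points = np.linspace(start, stop, num)
step = (stop - start) / (num - 1)   # spacing between neighbouring points

print(points[0], points[-1])                 # 0.0 100.0
print(np.allclose(np.diff(points), step))    # True

# arange stops before the stop value
evens = np.arange(0, 20, 2)
print(evens[-1])                             # 18
```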

In [33]:
# Use np.random.normal to create a 3x3 array of 
# normally distributed random values
# with mean 0 and standard deviation 1
# np.random.random((3,3))
np.random.normal(size=(3,3))

array([[ 0.75314283, -1.53472134,  0.00512708],
       [-0.12022767, -0.80698188,  2.87181939],
       [-0.59782292,  0.47245699,  1.09595612]])

In [34]:
np.random.random((3,3))

array([[0.47122978, 0.8161683 , 0.28958678],
       [0.73312598, 0.70262236, 0.32756948],
       [0.33464753, 0.97805808, 0.62458211]])

In [35]:
np.random.exponential(size=(3,3))

array([[3.00202251, 1.45876033, 1.74302218],
       [0.52195449, 0.60021876, 0.51187889],
       [5.32633838, 0.19548482, 3.28600177]])

In [36]:
# set a random seed with np.random.seed
# so that you always get the same "random" numbers
np.random.seed(12)
print(np.random.normal(size=(3,3)))
print()
print(np.random.normal(size=(3,3)))

[[ 0.47298583 -0.68142588  0.2424395 ]
 [-1.70073563  0.75314283 -1.53472134]
 [ 0.00512708 -0.12022767 -0.80698188]]

[[ 2.87181939 -0.59782292  0.47245699]
 [ 1.09595612 -1.2151688   1.34235637]
 [-0.12214979  1.01251548 -0.91386915]]
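The effect of seeding can be checked directly. The sketch below re-seeds the global generator and compares two draws; it also shows `np.random.default_rng`, NumPy's newer generator interface, which the original notebook does not use and is included here only as an alternative.

```python
import numpy as np

# Legacy global seed: re-seeding restarts the same sequence
np.random.seed(12)
first = np.random.normal(size=3)
np.random.seed(12)
second = np.random.normal(size=3)
print(np.array_equal(first, second))   # True

# Generator interface: each generator carries its own seed and state
rng_a = np.random.default_rng(12)
rng_b = np.random.default_rng(12)
draw_a = rng_a.normal(size=(3, 3))
draw_b = rng_b.normal(size=(3, 3))
print(np.array_equal(draw_a, draw_b))  # True
```

Note that the two interfaces use different algorithms, so the same seed does not give the same numbers across them.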


In [36]:
# Use np.random.randint() to create a 
# 2x3 array of random integers in the interval [0, 10)
a = np.random.randint(0,10, (2,3))
a

array([[8, 5, 5],
       [8, 1, 1]])

#### Array Attributes

shape, size, number of dimensions, data type, item size, number of bytes...

In [37]:
a.shape

(2, 3)

In [38]:
type(a.shape)

tuple

In [39]:
a.shape[1]

3

In [40]:
a.size

6

In [41]:
a.ndim

2

In [46]:
a.dtype

dtype('int64')

In [47]:
a.nbytes

48

#### Array indexing

In [45]:
a

array([[8, 5, 5],
       [8, 1, 1]])

In [46]:
a[0,0]

8

In [47]:
# last column
a[:, -1]

array([5, 1])

In [66]:
# get the second row of the array
a[1,:]

array([8, 1, 1])

#### Exercise



* Print the first row of the previous array
* Print the second row of the previous array
* Print the second column of the previous array
* Print the element of the array which contains the number '3'

In [52]:
a[0,:]

array([1, 4, 9])

In [53]:
a[1,:]

array([5, 3, 5])

In [54]:
a[:,-2]

array([4, 3])

In [68]:
a[1,1]

3

#### Reshaping arrays

NumPy arrays can be reshaped, as long as the new shape holds the same number of elements.

[function reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html?highlight=reshape)

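If one of the dimensions passed to `reshape` is `-1`, NumPy infers that dimension from the size of the array and the remaining dimensions. For example, `reshape(-1, 2)` asks for 2 columns and as many rows as needed. The NumPy documentation names `-1` for this purpose; the `reshape(-2,1)` call further down gives the same result as `reshape(-1,1)` in the NumPy version used here.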
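As a minimal sketch of how an inferred dimension works in `reshape` (using `-1`, the value named in the NumPy documentation; variable names are illustrative):

```python
import numpy as np

values = np.arange(1, 13)             # 12 elements

two_columns = values.reshape(-1, 2)   # rows inferred: 12 / 2 = 6
print(two_columns.shape)              # (6, 2)

three_rows = values.reshape(3, -1)    # columns inferred: 12 / 3 = 4
print(three_rows.shape)               # (3, 4)

# A shape that does not divide the number of elements raises ValueError
try:
    values.reshape(-1, 5)
except ValueError as err:
    print("Cannot reshape:", err)
```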

In [70]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [60]:
np.arange(1,10).shape

(9,)

In [58]:
np.arange(1,10).reshape(3,3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [66]:
np.arange(1,13).reshape(3,4).reshape(-1,2)

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])

#### Exercise

* Reshape the previous array to have only one column
* Reshape the previous array to have four rows and three columns

In [83]:
np.arange(1,13).reshape(3,4).reshape(-1,1)

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

In [70]:
np.arange(1,13).reshape(3,4).reshape(-2,1)

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

In [69]:
np.arange(1,13).reshape(4,3)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

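#### Creating copies

"Dependent" copies: `b = a` does not copy any data. Both names refer to the same array, so a change made through one name is visible through the other.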

In [91]:
b = a
print("A is equal to: ",a)
print("B is equal to: ",b)

A is equal to:  [[3 2 8]
 [2 9 5]]
B is equal to:  [[3 2 8]
 [2 9 5]]


In [92]:
a[0,0] = 0
print("A is equal to: ",a)
print("B is equal to: ",b)

A is equal to:  [[0 2 8]
 [2 9 5]]
B is equal to:  [[0 2 8]
 [2 9 5]]


"Independent" copies: `a.copy()` creates a new array with its own data, so later changes to `a` do not affect it.

In [93]:
b_copy = a.copy()

In [94]:
print("A is equal to: ",a)
print("B is equal to: ",b_copy)

A is equal to:  [[0 2 8]
 [2 9 5]]
B is equal to:  [[0 2 8]
 [2 9 5]]


In [96]:
a[0,0] = -1
print("A is equal to: ",a)
print("B is equal to: ",b_copy)

A is equal to:  [[-1  2  8]
 [ 2  9  5]]
B is equal to:  [[0 2 8]
 [2 9 5]]
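The difference between the two kinds of copy can be checked with `np.shares_memory`. The sketch below uses illustrative variable names and also includes a slice, which (like plain assignment) shares data with the original array; slices are not covered in the cells above and are added here as a related case.

```python
import numpy as np

original = np.array([[3, 2, 8], [2, 9, 5]])

alias = original               # same object, no data copied
view = original[0, :]          # slice: new array object, shared data
independent = original.copy()  # new array with its own data

print(alias is original)                        # True
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

# A change to the original shows up in the view but not in the copy
original[0, 0] = 0
print(view[0])             # 0
print(independent[0, 0])   # 3
```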


In [77]:
def mod_array(arr):
    b = arr.copy()
    b[0,0] = -25
    return b

In [78]:
mod_array(a)

array([[-25,   4,   9],
       [  5,   3,   5]])

In [79]:
a

array([[-1,  4,  9],
       [ 5,  3,  5]])

In [None]:
a = mod_array(a)

#### Arrays with 3+ dimensions

In [97]:
ab = np.random.randint(low=1, high=1000, size=(5,2,3))
ab

array([[[302, 272, 831],
        [524, 195, 614]],

       [[  3, 368, 725],
        [946, 970, 552]],

       [[214, 252, 445],
        [898,  17, 981]],

       [[691, 179, 733],
        [430,  19, 537]],

       [[726, 812, 982],
        [614, 800, 273]]])

In [98]:
ab[2]

array([[214, 252, 445],
       [898,  17, 981]])

#### Exercise

* Retrieve:


[[476, 930, 781],
 [241, 851, 272]]

### Operations:

[function sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html?highlight=sum#numpy.sum): np.sum

[function multiply](https://numpy.org/doc/stable/reference/generated/numpy.multiply.html?highlight=multiply#numpy.multiply): np.multiply

[function power](https://numpy.org/doc/stable/reference/generated/numpy.power.html?highlight=power#numpy.power): np.power...


In [87]:
np.sum(ab)

15905

In [88]:
a

array([[-1,  4,  9],
       [ 5,  3,  5]])

In [89]:
b

array([[-1,  4,  9],
       [ 5,  3,  5]])

In [90]:
np.multiply(a,b)

array([[ 1, 16, 81],
       [25,  9, 25]])

In [91]:
np.power(a,b)

ValueError: Integers to negative integer powers are not allowed.

In [92]:
np.power(2,3) # 2**3

8

In [93]:
a = np.array([[1,4,9],[5,3,5]])
b = np.array([[2,3,4],[5,6,7]])

In [94]:
np.power(a,b)

array([[    1,    64,  6561],
       [ 3125,   729, 78125]])

#### Aggregations, statistics and numerical functions

* np.max: largest element
* np.mean: $\mu=\frac{1}{n}\sum_{i}x_{i}$
* np.std: $\sigma=\sqrt{\frac{1}{n}\sum_{i}(x_{i}-\mu)^{2}}$
* np.sqrt: $\sqrt{x}$, applied element-wise

In [95]:
# compute the standard deviation of this array, 
# first using np.std() and then without using this function
np.random.seed(123)
rand = np.random.random(10)
rand

array([0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897,
       0.42310646, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752])

In [96]:
np.sqrt(rand)

array([0.83454729, 0.53491993, 0.47628925, 0.74250574, 0.84821517,
       0.65046634, 0.9903354 , 0.8275444 , 0.69349254, 0.62619288])

In [98]:
my_mean = np.mean(rand)
my_mean

0.544199352975335

In [99]:
rand-my_mean

array([ 0.15226983, -0.25806002, -0.3173479 ,  0.00711542,  0.17526962,
       -0.12109289,  0.43656485,  0.14063039, -0.06326745, -0.15208183])

In [100]:
(rand-my_mean)**2

array([2.31861019e-02, 6.65949729e-02, 1.00709689e-01, 5.06291464e-05,
       3.07194386e-02, 1.46634887e-02, 1.90588864e-01, 1.97769054e-02,
       4.00277042e-03, 2.31288845e-02])

In [101]:
np.power(rand-my_mean,2)

array([2.31861019e-02, 6.65949729e-02, 1.00709689e-01, 5.06291464e-05,
       3.07194386e-02, 1.46634887e-02, 1.90588864e-01, 1.97769054e-02,
       4.00277042e-03, 2.31288845e-02])

In [102]:
np.mean( np.power(rand-my_mean,2) )

0.04734217450052595

In [103]:
np.sqrt( np.mean( np.power(rand-my_mean,2) ))

0.21758256938579879

In [104]:
np.std(rand)

0.21758256938579879
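The steps above can be collected into one function. This is a sketch; `population_std` is a hypothetical helper name, not a NumPy function. It also shows that `np.std` divides by $n$ by default, and that passing `ddof=1` switches to the sample standard deviation, which divides by $n-1$.

```python
import numpy as np

def population_std(values):
    """Standard deviation with divisor n (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    return np.sqrt(np.mean((values - mean) ** 2))

np.random.seed(123)
rand = np.random.random(10)

# Matches np.std with its default settings
print(np.isclose(population_std(rand), np.std(rand)))   # True

# ddof=1 divides by n - 1 instead of n
n = rand.size
sample_std = np.sqrt(np.sum((rand - rand.mean()) ** 2) / (n - 1))
print(np.isclose(sample_std, np.std(rand, ddof=1)))     # True
```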

#### Performance of numpy operations vs lists

In [105]:
from time import time

n = 10000000 # n = 10M

In [106]:
start_time = time()

list_of_numbers = []

for i in range(n):
    list_of_numbers.append(i**5)

end_time = time()

print("Elapsed time: ",end_time - start_time)
print(list_of_numbers[0:5])

Elapsed time:  4.922485113143921
[0, 1, 32, 243, 1024]


In [107]:
start_time = time()

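# Note: NumPy uses fixed-width integers (int64 in this notebook), so for
# large i the value i**5 overflows silently, unlike Python's unbounded ints
array_of_numbers = np.arange(n)**5

end_time = time()

print("Elapsed time: ",end_time - start_time)
print(array_of_numbers[0:5])

Elapsed time:  0.07657647132873535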
[   0    1   32  243 1024]
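A single `time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module repeats each measurement; the function names, the smaller `n` and the use of squares instead of fifth powers are choices made here so the example runs quickly and stays within the int64 range.

```python
import timeit
import numpy as np

n = 100_000  # much smaller than the 10M above so this runs quickly

def with_list():
    # Pure-Python loop building a list
    return [i ** 2 for i in range(n)]

def with_numpy():
    # Vectorised operation on an int64 array
    return np.arange(n, dtype=np.int64) ** 2

# Best of 3 repetitions, each timing 10 calls
list_seconds = min(timeit.repeat(with_list, number=10, repeat=3))
numpy_seconds = min(timeit.repeat(with_numpy, number=10, repeat=3))

print(f"list:  {list_seconds:.4f} s for 10 runs")
print(f"numpy: {numpy_seconds:.4f} s for 10 runs")

# Both approaches produce the same values
print(with_list() == with_numpy().tolist())
```

The exact timings depend on the machine, so no expected numbers are shown.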


#### Concatenate

We can combine two arrays into a single one.

In [108]:
first = np.array([[1,2,3],[4,5,6]])
print("First array is: ", first)
print()
second = np.array([[0,0,0], [9,9,9]])
print("Second array is: ", second)

First array is:  [[1 2 3]
 [4 5 6]]

Second array is:  [[0 0 0]
 [9 9 9]]


In [109]:
np.concatenate([first, second])

array([[1, 2, 3],
       [4, 5, 6],
       [0, 0, 0],
       [9, 9, 9]])

In [110]:
np.concatenate([first, second], axis= 1)

array([[1, 2, 3, 0, 0, 0],
       [4, 5, 6, 9, 9, 9]])

#### Transpose

The transpose operation exchanges the rows and the columns of an array. In other words, rows become columns and columns become rows.

It can be a useful operation later, when you want to exchange the roles of the columns and rows of a table.

In [111]:
a

array([[1, 4, 9],
       [5, 3, 5]])

In [112]:
a.T

array([[1, 5],
       [4, 3],
       [9, 5]])

In [113]:
a.transpose()

array([[1, 5],
       [4, 3],
       [9, 5]])

#### Splitting arrays

In [115]:
#hundred = np.array(range(1,101))
hundred = np.arange(1,101)
hundred

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100])

[function split](https://numpy.org/doc/stable/reference/generated/numpy.split.html?highlight=numpy%20split#numpy.split)

In [116]:
a, b, c = np.split(hundred, [30, 60])
print(a)
print()
print(b)
print()
print(c)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30]

[31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
 55 56 57 58 59 60]

[ 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78
  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
  97  98  99 100]


### Bonus: broadcasting

In [117]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])

print("x is equal to: ",x)
print("v is equal to: ",v)

x is equal to:  [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
v is equal to:  [1 0 1]


In [119]:
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. 

Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this:



In [120]:
vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other
print(vv)                # Prints "[[1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]]"

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]


In [121]:
x

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

Now both arrays can be added element-wise

In [122]:
y = x + vv  # Add x and vv elementwise
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


In [None]:
v

array([1, 0, 1])

In [123]:
y = x + v  # Add v to each row of x using broadcasting
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


[how broadcasting works](https://numpy.org/doc/stable/user/basics.broadcasting.html)
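As a rough sketch of the rule described in that guide: shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1. The example below uses `np.broadcast_to` to make the stretching of `v` explicit; the variable names `stretched` and `bad` are illustrative.

```python
import numpy as np

x = np.arange(1, 13).reshape(4, 3)   # shape (4, 3)
v = np.array([1, 0, 1])              # shape (3,)

# (4, 3) vs (3,): v is treated as (1, 3) and stretched to (4, 3)
stretched = np.broadcast_to(v, x.shape)
print(stretched.shape)                                   # (4, 3)
print(np.array_equal(x + v, x + np.tile(v, (4, 1))))     # True

# Incompatible: trailing dimensions 3 and 4 differ and neither is 1
bad = np.array([1, 2, 3, 4])
try:
    x + bad
except ValueError as err:
    print("Broadcast failed:", err)
```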

Some applications of broadcasting:

In [124]:
# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:

print(np.reshape(v, (3, 1)) * w)

[[ 4  5]
 [ 8 10]
 [12 15]]


In [11]:
# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:

print(x + v)

[[2 4 6]
 [5 7 9]]


In [12]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:

print((x.T + w).T)

[[ 5  6  7]
 [ 9 10 11]]
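The double transpose can be avoided by giving `w` an explicit column shape. This sketch shows two equivalent spellings, `w[:, np.newaxis]` and `w.reshape(2, 1)`; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = np.array([4, 5])                    # shape (2,)

via_transpose = (x.T + w).T             # approach used above
via_column = x + w[:, np.newaxis]       # w as shape (2, 1)
via_reshape = x + w.reshape(2, 1)       # same idea with reshape

print(via_column)
# [[ 5  6  7]
#  [ 9 10 11]]
print(np.array_equal(via_transpose, via_column))   # True
print(np.array_equal(via_column, via_reshape))     # True
```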


In [13]:
x

array([[1, 2, 3],
       [4, 5, 6]])

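In [16]:
# Restore the 4x3 matrix x and y = x + [1, 0, 1] from the first
# broadcasting example; the cells below operate on these two arrays
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
y = x + np.array([1, 0, 1])

print("x", "\n", x)
print("y", "\n", y)

x 
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
y 
 [[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]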


[function add](https://numpy.org/doc/stable/reference/generated/numpy.add.html?highlight=numpy%20add#numpy.add)

In [125]:
# Add elements of x and y together
print(np.add(x, y))

[[ 3  4  7]
 [ 9 10 13]
 [15 16 19]
 [21 22 25]]


[function subtract](https://numpy.org/doc/stable/reference/generated/numpy.subtract.html?highlight=numpy%20subtract#numpy.subtract)

In [126]:
# Subtract elements of x from elements of y
print(np.subtract(y, x))

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]


[function multiply](https://numpy.org/doc/stable/reference/generated/numpy.multiply.html?highlight=numpy%20multiply#numpy.multiply)

In [127]:
# Multiply elements of x and y together
print(np.multiply(x, y))

[[  2   4  12]
 [ 20  25  42]
 [ 56  64  90]
 [110 121 156]]


[function divide](https://numpy.org/doc/stable/reference/generated/numpy.divide.html)

In [128]:
# Divide elements of y by elements of x
print(np.divide(y, x))

[[2.         1.         1.33333333]
 [1.25       1.         1.16666667]
 [1.14285714 1.         1.11111111]
 [1.1        1.         1.08333333]]
