# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## Task List 04: NumPy

### Intro

Handling NumPy arrays can seem like a nightmare at first. But it's all about practice. Take a deep breath, and good luck!

In [3]:
import numpy as np

**01.** Using `np.array()`, create
- A *vector* (i.e. one-dimensional array) of length 5
- A *matrix* (i.e. two-dimensional array) of shape 2x3

In [4]:
np.array([0, 1, 2, 3, 4])

array([0, 1, 2, 3, 4])

In [5]:
np.array([[1, 2, 3], [3, 2, 1]])

array([[1, 2, 3],
       [3, 2, 1]])
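A quick way to confirm which kind of array you built is to inspect its `.ndim` (number of dimensions) and `.shape` attributes. A minimal sketch, rebuilding the same two arrays under the illustrative names `vec` and `mat`:

```python
import numpy as np

vec = np.array([0, 1, 2, 3, 4])          # one-dimensional: a vector
mat = np.array([[1, 2, 3], [3, 2, 1]])   # two-dimensional: a matrix

print(vec.ndim, vec.shape)  # 1 (5,)
print(mat.ndim, mat.shape)  # 2 (2, 3)
```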

**02.** The `.reshape()` array method is most commonly used to reshape vectors into matrices of a desired (compatible) shape. Let's see it in action:

In [6]:
v_2 = np.ones(6)
print(v_2)
print(v_2.shape)

[1. 1. 1. 1. 1. 1.]
(6,)


`v_2` is a vector of ones of length 6. Run the cells below to see what happens when we reshape it.

In [7]:
v_2a = v_2.reshape((2, 3))
v_2a

array([[1., 1., 1.],
       [1., 1., 1.]])

In [8]:
v_2a.shape

(2, 3)

And what happens if we do this?

In [9]:
v_2b = v_2.reshape((3, 2))
v_2b

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [13]:
v_2b.shape

(3, 2)

And this? 

In [14]:
v_2.reshape((4, 2))

ValueError: cannot reshape array of size 6 into shape (4,2)

Why the error?

Now, create a vector of zeros of length 12 and reshape it into a matrix of your choice.

In [None]:
np.zeros(12).reshape(4, 3)
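The error in the `(4, 2)` example comes from a simple rule: `.reshape()` never adds or drops elements, so the product of the new dimensions must equal the array's `.size`. A small sketch of that check; `can_reshape` is a hypothetical helper written for illustration, not a NumPy function:

```python
import numpy as np

def can_reshape(arr, shape):
    """Hypothetical helper: True if arr has exactly as many elements as shape requires.

    Note: this sketch does not handle -1 placeholders.
    """
    return int(np.prod(shape)) == arr.size

v_2 = np.ones(6)
print(can_reshape(v_2, (2, 3)))  # True:  2 * 3 == 6
print(can_reshape(v_2, (3, 2)))  # True:  3 * 2 == 6
print(can_reshape(v_2, (4, 2)))  # False: 4 * 2 == 8, but v_2 has only 6 elements
```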

**03.** `-1` has a special meaning when used with `.reshape()`. Let's make a vector of ones of length 20:

In [2]:
v_3 = np.ones(20)
v_3


Now see what happens if we use `.reshape((5, -1))` on it:

In [None]:
v_3a = v_3.reshape((5, -1))
print(v_3a)
print(v_3a.shape)

What do you think will happen if we use `.reshape((-1, 10))` on `v_3`?

In [15]:
v_3.reshape(-1, 10)

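When one dimension is passed as `-1`, NumPy fills it in from the array's total size and the remaining dimensions. The arithmetic can be sketched as below; `inferred_dim` is a hypothetical helper written for illustration, not part of NumPy, and it assumes a single `-1`:

```python
import numpy as np

def inferred_dim(size, known_dims):
    """Hypothetical helper: the value that replaces a single -1 in a reshape."""
    known = int(np.prod(known_dims))
    if size % known != 0:
        raise ValueError(f"cannot split {size} elements into blocks of {known}")
    return size // known

v_3 = np.ones(20)
print(inferred_dim(v_3.size, [5]))   # 4, so (5, -1) becomes (5, 4)
print(inferred_dim(v_3.size, [10]))  # 2, so (-1, 10) becomes (2, 10)
print(v_3.reshape((5, -1)).shape)    # (5, 4)
print(v_3.reshape((-1, 10)).shape)   # (2, 10)
```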

Later on in the course, when implementing Machine Learning, you'll sometimes be required to reshape a vector into a *row matrix* or a *column matrix*. What are those? See for yourself, using the following vector as an example:

In [None]:
v = np.ones(5)
print(v)
print(v.shape)

- Row matrix

In [None]:
v_row = v.reshape(1, -1)
print(v_row)
print(v_row.shape)

- Column matrix

In [None]:
v_col = v.reshape(-1, 1)
print(v_col)
print(v_col.shape)
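Indexing with `np.newaxis` (an alias for `None`) is another common way to get the same row and column matrices. It is an alternative to `.reshape()`, shown here as a short sketch:

```python
import numpy as np

v = np.ones(5)

v_row = v[np.newaxis, :]   # inserts a new leading axis  -> shape (1, 5)
v_col = v[:, np.newaxis]   # inserts a new trailing axis -> shape (5, 1)

print(v_row.shape, v_col.shape)                  # (1, 5) (5, 1)
print(np.array_equal(v_row, v.reshape(1, -1)))   # True
print(np.array_equal(v_col, v.reshape(-1, 1)))   # True
```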

**04.** Let

In [None]:
x = np.array([-1, 0, 1])
y = np.array([1, 0, 0])

A = np.eye(3)  # creates a 3x3 identity matrix
B = np.diag((2, 1, -2))  # creates a diagonal matrix with the given elements on its main diagonal

print('x = ', x)
print('-------------')
print('y = ', y)
print('-------------')
print('A = \n', A)
print('-------------')
print('B = \n', B)

Compute 

- $x\cdot y$ (the inner product)
- $Ay$
- $AB$
- $x^TBy$
- $B^TA$
- $AB - 4BA$

In [None]:
x@y

In [None]:
A@y

In [None]:
A@B

In [None]:
x@B@y

In [None]:
B.T@A

In [None]:
A@B - 4*B@A

Notice the outputs of the operations above? Depending on the operands involved, you can obtain either a vector, a matrix or a *scalar*, i.e. a number. Knowing what kind of output to expect for given inputs is very useful when doing *Linear Algebra*.
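One way to build that intuition is to check `np.ndim()` on each result: 0 means a scalar, 1 a vector, 2 a matrix. A sketch using the same `x`, `y`, `A` and `B`:

```python
import numpy as np

x = np.array([-1, 0, 1])
y = np.array([1, 0, 0])
A = np.eye(3)
B = np.diag((2, 1, -2))

results = {
    "x @ y": x @ y,          # vector @ vector -> scalar
    "A @ y": A @ y,          # matrix @ vector -> vector
    "A @ B": A @ B,          # matrix @ matrix -> matrix
    "x @ B @ y": x @ B @ y,  # vector @ matrix @ vector -> scalar
}

for name, value in results.items():
    print(name, "-> ndim", np.ndim(value), "shape", np.shape(value))
```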

But why all this? Why Linear Algebra? Well, because

$$ {\rm No\ Linear\ Algebra} \Rightarrow {\rm No\ Machine\ Learning}. $$
But don't worry: we aren't going to do any fancy Linear Algebra in this course. We'll cover only the most basic concepts needed to understand the mathematics behind Machine Learning and to use ML in Python. And should you decide to venture into *Deep Learning*, the area of ML dealing with *Neural Networks* (used for Computer Vision, chatbots and Natural Language Understanding/Generation), be prepared to do quite a bit of Linear Algebra :)

**05.** Using indexing and slicing, extract the following from matrix `A_5`:

- the element in its third row and second column
- the element in its second row and last column
- its first row
- its third column
- its last two columns
- its first and third rows
- all elements at the intersection of its last two rows and first two columns
- all elements at the intersection of its two middle rows and two middle columns

In [4]:
A_5 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
A_5

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [None]:
A_5[2, 1]

In [None]:
A_5[1, -1]

In [None]:
A_5[0]

In [None]:
A_5[:, 2]

In [5]:
A_5[:, -2:]

array([[ 3,  4],
       [ 7,  8],
       [11, 12],
       [15, 16]])

In [6]:
A_5[[0, 2]]

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])

In [9]:
A_5[-2:, :2]

array([[ 9, 10],
       [13, 14]])

In [7]:
A_5[1:3, 1:3]

array([[ 6,  7],
       [10, 11]])
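A detail that matters later: indexing with a single integer removes that dimension, while a slice (even one of length 1) keeps it. A short sketch on `A_5`:

```python
import numpy as np

A_5 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])

print(A_5[0].shape)        # (4,)   integer index: the row comes back as a vector
print(A_5[0:1].shape)      # (1, 4) slice: the row stays a 1x4 row matrix
print(A_5[:, 2].shape)     # (4,)   integer index on columns: a vector
print(A_5[:, 2:3].shape)   # (4, 1) slice: a 4x1 column matrix
```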

**06.** Using conditional indexing, find all the elements of matrix `A_6` that are:

- strictly greater than 1
- not equal to zero
- strictly less than -1 or strictly greater than 2
- in the interval [0, 3] (endpoints included)

In [None]:
A_6 = np.array([[0, -1, 3, 4, 2], [0, 2, 0, -4, 1], [0, 0, -5, 2, 1], [5, -1, 2, 0, -2]])
A_6

In [None]:
A_6[A_6 > 1]

In [None]:
A_6[A_6 != 0]

In [None]:
A_6[(A_6 < -1) | (A_6 > 2)]

In [None]:
A_6[(0 <= A_6) & (A_6 <= 3)]
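Each condition above is itself an array of booleans (a *mask*) with the same shape as `A_6`. Python's `and`/`or` don't work element-wise on arrays, which is why the element-wise operators `&` and `|` are used; the parentheses are required because `&` and `|` bind more tightly than comparisons. A sketch, with `np.logical_and()` as the equivalent function form:

```python
import numpy as np

A_6 = np.array([[0, -1, 3, 4, 2], [0, 2, 0, -4, 1], [0, 0, -5, 2, 1], [5, -1, 2, 0, -2]])

mask = (0 <= A_6) & (A_6 <= 3)             # boolean array, same shape as A_6
print(mask.shape, mask.dtype)              # (4, 5) bool

same = np.logical_and(A_6 >= 0, A_6 <= 3)  # function form of the same test
print(np.array_equal(mask, same))          # True

# Using `and` instead of `&` raises an error, because the truth value
# of a whole array is ambiguous:
try:
    (0 <= A_6) and (A_6 <= 3)
except ValueError as err:
    print("ValueError:", err)
```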

**07.** Generate a random 3x7 matrix of floats, and then select all its entries greater than or equal to 0.5.

In [None]:
A_7 = np.random.random(size=(3, 7))
A_7

In [None]:
A_7[A_7 >= .5]
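`np.random.random()` gives different numbers on every run. For reproducible results, one option (an alternative to the call above, not something the exercise requires) is NumPy's `Generator` API created with `np.random.default_rng()` and a fixed seed; the seed `42` is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator: same numbers on every run
A_7 = rng.random((3, 7))          # floats drawn uniformly from [0, 1)

selected = A_7[A_7 >= 0.5]        # 1D array of the entries that pass the test
print(selected.size, "of", A_7.size, "entries are >= 0.5")
```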

**08.** `np.linspace()` creates a vector of equally spaced points between two given values. `np.arange()` does something similar: it also produces a vector of equally spaced points between two given values. Wait, what's the difference then? 

 - for `np.linspace()` you specify the <u>number of points</u> between the two given values
 - for `np.arange()` you specify the <u>step size</u> between consecutive numbers in the interval between the two given values

To check this for yourself, do the following:

 - Make an array of 17 equally spaced points between 0 and 2
 - Make an array of points between 0 and 2 with step size 0.25

In [None]:
np.linspace(0, 2, 17)

In [None]:
np.arange(0, 2, .25)

Notice the difference? What happens to the endpoints of those two arrays?

**09.** Create the same array using both `np.linspace()` and `np.arange()`.

In [None]:
np.linspace(0, 2, 9)

In [None]:
np.arange(0, 2.25, .25)
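The two calls agree because the step of `np.linspace(start, stop, num)` is `(stop - start) / (num - 1)`, and `np.arange()` stops *before* its `stop` argument, hence `2.25` instead of `2`. A sketch of that relationship; adding half a step to the `arange` stop is one common guard against floating-point round-off, and `np.allclose()` is used for the same reason:

```python
import numpy as np

start, stop, num = 0, 2, 9
step = (stop - start) / (num - 1)             # 0.25 for these values

a = np.linspace(start, stop, num)             # endpoint included by default
b = np.arange(start, stop + step / 2, step)   # stop is exclusive, so extend it a little

print(step)                                       # 0.25
print(a.shape == b.shape and np.allclose(a, b))   # True
```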

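**10.** Using `np.concatenate()`, join vectors `v_10a` and `v_10b` into a single array. Then reshape both into row matrices and concatenate them again.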

In [44]:
v_10a = np.array([1, 2, 3])
v_10b = np.array([4, 5, 6])

In [None]:
np.concatenate((v_10a, v_10b), axis=0)

In [None]:
v_10a = v_10a.reshape(1, -1)
v_10b = v_10b.reshape(1, -1)

np.concatenate((v_10a, v_10b), axis=0)

Now, do the same for matrices `A_10a` and `A_10b`.

In [None]:
A_10a = np.array([[1, 2], [3, 4]])
A_10a

In [None]:
A_10b = np.array([[-3, -2], [1, 0]])
A_10b

In [None]:
np.concatenate((A_10a, A_10b), axis=1)

In [None]:
np.concatenate((A_10a, A_10b), axis=0)

What are the shapes of the newly obtained matrices?

In [None]:
np.concatenate((A_10a, A_10b), axis=1).shape

In [None]:
np.concatenate((A_10a, A_10b), axis=0).shape
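The general rule: along the `axis` you concatenate on, the sizes add up, and along every other axis the sizes must match. A sketch that checks this against the two concatenations of `A_10a` and `A_10b`:

```python
import numpy as np

A_10a = np.array([[1, 2], [3, 4]])
A_10b = np.array([[-3, -2], [1, 0]])

side_by_side = np.concatenate((A_10a, A_10b), axis=1)  # columns add up: 2 + 2
stacked = np.concatenate((A_10a, A_10b), axis=0)       # rows add up: 2 + 2

print(side_by_side.shape)  # (2, 4)
print(stacked.shape)       # (4, 2)

# Mismatched sizes along the other axis are rejected:
try:
    np.concatenate((A_10a, np.ones((1, 3))), axis=0)   # 2 columns vs 3 columns
except ValueError as err:
    print("ValueError:", err)
```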

**11.** Given a 2x3 matrix `A_11`, use

 - `np.vstack()` to expand it by two rows with some matrix `A_11a` you define
 - `np.hstack()` to expand it by two columns with some matrix `A_11b` you define

*Hint: Watch for the shapes!*

In [None]:
A_11 = np.array([[1, 7, 1], [7, 1, 7]])
A_11

In [None]:
A_11a = np.array([[1, 0, 0], [0, 0, -1]])
A_11a

In [None]:
np.vstack((A_11, A_11a))

In [None]:
A_11b = np.array([[1, 0], [0, -1]])
A_11b

In [None]:
np.hstack((A_11, A_11b))
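For 2D inputs, `np.vstack()` and `np.hstack()` behave like `np.concatenate()` with `axis=0` and `axis=1`. They differ for plain 1D vectors: `np.vstack()` treats each vector as a row, while `np.hstack()` simply joins them end to end. A sketch (`u` and `w` are illustrative vectors):

```python
import numpy as np

A_11 = np.array([[1, 7, 1], [7, 1, 7]])
A_11a = np.array([[1, 0, 0], [0, 0, -1]])
A_11b = np.array([[1, 0], [0, -1]])

# 2D inputs: same results as np.concatenate
print(np.array_equal(np.vstack((A_11, A_11a)), np.concatenate((A_11, A_11a), axis=0)))  # True
print(np.array_equal(np.hstack((A_11, A_11b)), np.concatenate((A_11, A_11b), axis=1)))  # True

# 1D inputs: vstack builds rows, hstack joins end to end
u = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.vstack((u, w)).shape)  # (2, 3)
print(np.hstack((u, w)).shape)  # (6,)
```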

**12.** Find a way to turn matrix `A_12` into a vector containing its elements. 

*Hint: Try listening to Bolero.*

In [None]:
A_12 = np.array([[4, 7], [7, 4], [4, 7]])
A_12

In [None]:
A_12.ravel()
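`.ravel()` is not the only option. `.flatten()` and `.reshape(-1)` give the same elements in the same (row-by-row) order; the difference is that `.flatten()` always returns a copy, while `.ravel()` and `.reshape(-1)` return a view of the original data when they can. A sketch:

```python
import numpy as np

A_12 = np.array([[4, 7], [7, 4], [4, 7]])

r = A_12.ravel()
f = A_12.flatten()
s = A_12.reshape(-1)

print(r)  # [4 7 7 4 4 7]
print(np.array_equal(r, f) and np.array_equal(r, s))  # True

# flatten() copied the data, so changing f leaves A_12 untouched;
# ravel() returned a view here, so writing to r changes A_12.
f[0] = 100
print(A_12[0, 0])  # 4
r[0] = 100
print(A_12[0, 0])  # 100
```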

**13.** For the matrix `A_13` find 

- the mean of all its elements
- the mean of the elements in its second column
- the mean of each of its rows

In [3]:
import numpy as np
A_13 = np.array([[1, 2, 3, 4], [-1, -2, -3, -4], [.1, .2, .3, .4]])
A_13

array([[ 1. ,  2. ,  3. ,  4. ],
       [-1. , -2. , -3. , -4. ],
       [ 0.1,  0.2,  0.3,  0.4]])

In [7]:
A_13.mean()

0.08333333333333333

In [4]:
A_13[:, 1].mean()

0.06666666666666667

In [5]:
np.mean(A_13, axis=1)

array([ 2.5 , -2.5 ,  0.25])
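A way to remember `axis`: it names the dimension that gets *collapsed*. `axis=1` runs across the columns and leaves one value per row; `axis=0` runs down the rows and leaves one value per column. A sketch on `A_13`:

```python
import numpy as np

A_13 = np.array([[1, 2, 3, 4], [-1, -2, -3, -4], [.1, .2, .3, .4]])

row_means = A_13.mean(axis=1)   # shape (3,): one mean per row
col_means = A_13.mean(axis=0)   # shape (4,): one mean per column

print(row_means.shape, col_means.shape)              # (3,) (4,)
print(np.isclose(col_means[1], A_13[:, 1].mean()))   # True
```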

**14.** Matrix `A_14` contains missing values (`np.nan`). For `A_14` find

- the standard deviation of all its elements
- the standard deviation of each of its columns

In [None]:
A_14 = np.array([[1, 2, np.nan, 4], [-1, np.nan, np.nan, -4], [np.nan, .2, .3, .4]])
A_14

In [None]:
np.nanstd(A_14)

In [None]:
np.nanstd(A_14, axis=0)
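The `nan`-prefixed functions skip missing values. Plain `np.std()` would return `nan` here, because any arithmetic involving `nan` produces `nan`. A sketch comparing the two and confirming that `np.nanstd()` matches the standard deviation of the non-missing entries:

```python
import numpy as np

A_14 = np.array([[1, 2, np.nan, 4], [-1, np.nan, np.nan, -4], [np.nan, .2, .3, .4]])

print(np.std(A_14))     # nan: the missing values propagate
print(np.nanstd(A_14))  # ignores the missing values

observed = A_14[~np.isnan(A_14)]  # 1D array of the non-missing entries
print(np.isclose(np.nanstd(A_14), np.std(observed)))  # True
```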

**15.** `np.argmin()` and `np.argmax()` are two very useful functions for finding the <u>index (position)</u> of the minimum/maximum value of an array. Given matrix `A_15`, find

- the position of its largest element
- the row index of the smallest element in each of its columns
- the position of its largest negative element

In [None]:
A_15 = np.array([[1, -1, 3, 4], [2, -6, -3, 0], [7, -4, 5, -5], [8, 9, 6, -7]])
A_15

In [None]:
np.argmax(A_15)

In [None]:
np.argmin(A_15, axis=0)

In [None]:
np.argmax(A_15 == np.max(A_15[A_15 < 0]))
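Without an `axis`, `np.argmax()` and `np.argmin()` return a position in the *flattened* array. `np.unravel_index()` converts that flat index back into a (row, column) pair. A sketch on `A_15`:

```python
import numpy as np

A_15 = np.array([[1, -1, 3, 4], [2, -6, -3, 0], [7, -4, 5, -5], [8, 9, 6, -7]])

flat = np.argmax(A_15)                          # index into A_15.ravel()
row, col = np.unravel_index(flat, A_15.shape)   # same position as (row, column)

print(flat)             # 13
print(row, col)         # 3 1
print(A_15[row, col])   # 9
```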

**16.** `np.where()` builds a new array by choosing, element by element, between two alternatives based on a condition, all in a single line of code. For example, given a matrix `A_16`

In [None]:
A_16 = np.copy(A_15)
A_16

In [None]:
np.where(A_16 > 0, A_16**2, 2*A_16)

squares all its positive values and doubles all its negative values.

Now, using `np.where()`, divide all even values of matrix `A_16` by two, while leaving odd values unchanged.

In [None]:
np.where(A_16 % 2 == 0, A_16//2, A_16)
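Note that `np.where()` returns a new array and leaves `A_16` unchanged. If you do want to modify a matrix in place, one alternative (a different technique from `np.where()`) is to assign through a boolean mask. A sketch checking that both approaches give the same result:

```python
import numpy as np

A_16 = np.array([[1, -1, 3, 4], [2, -6, -3, 0], [7, -4, 5, -5], [8, 9, 6, -7]])

halved = np.where(A_16 % 2 == 0, A_16 // 2, A_16)  # new array; A_16 is untouched

B = A_16.copy()
even = B % 2 == 0   # boolean mask of the even entries
B[even] //= 2       # in place: only the masked entries are divided

print(np.array_equal(halved, B))  # True
```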