# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## Task List 04: Numpy 

[&larr; Back to course webpage](https://datakolektiv.com/)

Feedback should be sent to [goran.milovanovic@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com). 

These notebooks accompany the DATA SCIENCE SESSIONS VOL. 3 :: A Foundational Python Data Science Course.

![](../img/IntroRDataScience_NonTech-1.jpg)

### Lecturers

[Goran S. Milovanović, PhD, DataKolektiv, Chief Scientist & Owner](https://www.linkedin.com/in/gmilovanovic/)

[Aleksandar Cvetković, PhD, DataKolektiv, Consultant](https://www.linkedin.com/in/alegzndr/)

[Ilija Lazarević, MA, DataKolektiv, Consultant](https://www.linkedin.com/in/ilijalazarevic/)

![](../img/DK_Logo_100.png)

### Intro

Handling NumPy arrays can seem like a nightmare at first. But it's all about practice. Take a deep breath, and good luck!

In [1]:
import numpy as np

**01.** Using `np.array()` create
- A *vector* (i.e. one-dimensional array) of length 5
- A *matrix* (i.e. two-dimensional array) of shape 2x3

In [2]:
np.array([0, 1, 2, 3, 4])

array([0, 1, 2, 3, 4])

In [3]:
np.array([[1, 2, 3], [3, 2, 1]])

array([[1, 2, 3],
       [3, 2, 1]])

**02.** The `.reshape()` array method is most commonly used to reshape vectors into matrices of a desired (allowable) shape. Let's see it in action:

In [4]:
v_2 = np.ones(6)
print(v_2)
print(v_2.shape)

[1. 1. 1. 1. 1. 1.]
(6,)


`v_2` is a vector of ones of length 6. Run the cells below to see what happens when we reshape it.

In [5]:
v_2a = v_2.reshape((2, 3))
v_2a

array([[1., 1., 1.],
       [1., 1., 1.]])

In [6]:
v_2a.shape

(2, 3)

And what happens if we do this?

In [7]:
v_2b = v_2.reshape((3, 2))
v_2b

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [8]:
v_2b.shape

(3, 2)

And this? 

In [9]:
v_2.reshape((4, 2))

ValueError: cannot reshape array of size 6 into shape (4,2)

Why the error?
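One way to think about it: a reshape only rearranges the elements that are already there, so the product of the target dimensions has to equal the array's size. The sketch below checks this for the three shapes tried above. The helper `can_reshape` is hypothetical, introduced here only for illustration, and it does not handle the special value `-1`.

```python
import numpy as np

v_2 = np.ones(6)

def can_reshape(arr, shape):
    """Hypothetical helper: True if `shape` needs exactly as many elements as `arr` has."""
    return arr.size == int(np.prod(shape))

# (2, 3) and (3, 2) need 6 elements each; (4, 2) needs 8, which v_2 does not have
for shape in [(2, 3), (3, 2), (4, 2)]:
    print(shape, can_reshape(v_2, shape))
```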

Now, create a vector of zeros of length 12 and reshape it to some matrix.

In [10]:
np.zeros(12).reshape(4, 3)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

**03.** `-1` has a special meaning when passed to `.reshape()`. Let's make a vector of ones of length 20:

In [11]:
v_3 = np.ones(20)
v_3

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

Now see what happens if we use `.reshape((5, -1))` on it:

In [12]:
v_3a = v_3.reshape((5, -1))
print(v_3a)
print(v_3a.shape)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
(5, 4)


What do you think will happen if we use `.reshape((-1, 10))` on `v_3`?

In [13]:
v_3.reshape(-1, 10)

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
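What happened here can be sketched as simple arithmetic: when one dimension is given as `-1`, NumPy works it out from the array's size and the remaining dimensions. The variable name `inferred_rows` below is only illustrative.

```python
import numpy as np

v_3 = np.ones(20)

# With 10 columns fixed, the only row count that uses all 20 elements is 20 // 10
inferred_rows = v_3.size // 10
print(inferred_rows)

# The shape NumPy infers agrees with the manual computation
print(v_3.reshape(-1, 10).shape == (inferred_rows, 10))

# The inference fails when the size is not divisible by the known dimensions
try:
    v_3.reshape(-1, 3)  # 20 is not divisible by 3
except ValueError as err:
    print("ValueError:", err)
```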

Later on in the course, when implementing Machine Learning, you'll sometimes be required to reshape a vector into a *row matrix* or a *column matrix*. What are those? See for yourself, using the following vector as an example:

In [14]:
v = np.ones(5)
print(v)
print(v.shape)

[1. 1. 1. 1. 1.]
(5,)


- Row matrix

In [15]:
v_row = v.reshape(1, -1)
print(v_row)
print(v_row.shape)

[[1. 1. 1. 1. 1.]]
(1, 5)


- Column matrix

In [16]:
v_col = v.reshape(-1, 1)
print(v_col)
print(v_col.shape)

[[1.]
 [1.]
 [1.]
 [1.]
 [1.]]
(5, 1)


**04.** Let

In [17]:
x = np.array([-1, 0, 1])
y = np.array([1, 0, 0])

A = np.eye(3) #this creates a 3x3 identity matrix
B = np.diag((2, 1, -2)) #this creates a diagonal matrix with the given elements on the main diagonal

print('x = ', x)
print('-------------')
print('y = ', y)
print('-------------')
print('A = \n', A)
print('-------------')
print('B = \n', B)

x =  [-1  0  1]
-------------
y =  [1 0 0]
-------------
A = 
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
-------------
B = 
 [[ 2  0  0]
 [ 0  1  0]
 [ 0  0 -2]]


Compute 

- $x\cdot y$ (the inner product)
- $Ay$
- $AB$
- $xBy$
- $B^TA$
- $AB - 4BA$

In [18]:
x@y

-1

In [19]:
A@y

array([1., 0., 0.])

In [20]:
A@B

array([[ 2.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0., -2.]])

In [21]:
x@B@y

-2

In [22]:
B.T@A

array([[ 2.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0., -2.]])

In [23]:
A@B - 4*B@A

array([[-6.,  0.,  0.],
       [ 0., -3.,  0.],
       [ 0.,  0.,  6.]])

Notice the outputs of the operations above? Depending on the operands involved, you can obtain either a vector, a matrix or a *scalar*, i.e. a number. Knowing what kind of output to expect for given inputs is very useful when doing *Linear Algebra*.
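A quick way to check what kind of object an operation returns is to look at its number of dimensions and its shape. The sketch below does this for a few of the products above; the dictionary `results` is just an illustrative container.

```python
import numpy as np

x = np.array([-1, 0, 1])
y = np.array([1, 0, 0])
A = np.eye(3)
B = np.diag((2, 1, -2))

results = {
    "x@y": x @ y,        # vector @ vector
    "A@y": A @ y,        # matrix @ vector
    "A@B": A @ B,        # matrix @ matrix
    "x@B@y": x @ B @ y,  # vector @ matrix @ vector
}

# 0 dimensions means a scalar, 1 a vector, 2 a matrix
for name, value in results.items():
    print(name, "ndim =", np.ndim(value), "shape =", np.shape(value))
```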

But why all this? Why Linear Algebra? Well, because

$$ {\rm No\ Linear\ Algebra} \Rightarrow {\rm No\ Machine\ Learning}. $$

But don't worry: we aren't going to do any fancy Linear Algebra in this course. We'll cover only the most basic concepts needed to understand the mathematics behind Machine Learning and to use ML in Python. And should you decide to venture into *Deep Learning*, the area of ML dealing with *Neural Networks* (used for Computer Vision, Chatbots and Natural Language Understanding/Generation), be prepared to do quite a bit of Linear Algebra :)

**05.** Using indexing and index slicing, extract the following from matrix `A_5`:

- an element in its third row and second column
- an element in its last column and second row
- its first row
- its third column
- its last two columns
- its first and third row
- all its elements in the intersection of its last two rows and first two columns
- all its elements in the intersection of its two middle rows and two middle columns

In [24]:
A_5 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
A_5

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [25]:
A_5[2, 1]

10

In [26]:
A_5[1, -1]

8

In [27]:
A_5[0]

array([1, 2, 3, 4])

In [28]:
A_5[:, 2]

array([ 3,  7, 11, 15])

In [29]:
A_5[:, -2:]

array([[ 3,  4],
       [ 7,  8],
       [11, 12],
       [15, 16]])

In [30]:
A_5[[0, 2]]

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])

In [31]:
A_5[-2:, :2]

array([[ 9, 10],
       [13, 14]])

In [32]:
A_5[1:3, 1:3]

array([[ 6,  7],
       [10, 11]])

**06.** Using conditional indexing, find all the elements of matrix `A_6` which are:

- strictly greater than 1
- not equal to zero
- strictly less than -1 or greater than 2
- in the [0, 3] segment (endpoints included) 

In [33]:
A_6 = np.array([[0, -1, 3, 4, 2], [0, 2, 0, -4, 1], [0, 0, -5, 2, 1], [5, -1, 2, 0, -2]])
A_6

array([[ 0, -1,  3,  4,  2],
       [ 0,  2,  0, -4,  1],
       [ 0,  0, -5,  2,  1],
       [ 5, -1,  2,  0, -2]])

In [34]:
A_6[A_6 > 1]

array([3, 4, 2, 2, 2, 5, 2])

In [35]:
A_6[A_6 != 0]

array([-1,  3,  4,  2,  2, -4,  1, -5,  2,  1,  5, -1,  2, -2])

In [36]:
A_6[(A_6 < -1) | (A_6 > 2)]

array([ 3,  4, -4, -5,  5, -2])

In [37]:
A_6[(0 <= A_6) & (A_6 <= 3)]

array([0, 3, 2, 0, 2, 0, 1, 0, 0, 2, 1, 2, 0])
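To see what conditional indexing does under the hood, it can help to look at the condition on its own: it is a Boolean array (a *mask*) of the same shape as the matrix, and indexing with it keeps the entries where the mask is `True`. A minimal sketch, with `mask` and `selected` as illustrative names:

```python
import numpy as np

A_6 = np.array([[0, -1, 3, 4, 2], [0, 2, 0, -4, 1], [0, 0, -5, 2, 1], [5, -1, 2, 0, -2]])

mask = A_6 > 1
print(mask.dtype, mask.shape)  # a Boolean array shaped like A_6
print(mask.sum())              # number of True entries, i.e. of selected elements

# Indexing with the mask returns the selected elements as a flat (1-D) array
selected = A_6[mask]
print(selected)

# Masks are combined with & (and), | (or), ~ (not); each comparison needs its own parentheses
print(A_6[~(A_6 == 0) & (A_6 < 2)])
```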

**07.** Generate a random 3x7 matrix of floats, and then select all its entries greater than or equal to 0.5.

In [38]:
A_7 = np.random.random(size=(3, 7))
A_7

array([[0.32218655, 0.42254094, 0.93416154, 0.49415356, 0.70406788,
        0.61466864, 0.11818428],
       [0.0835413 , 0.60577599, 0.58774302, 0.36734695, 0.83716114,
        0.19979126, 0.55713137],
       [0.52032081, 0.92721543, 0.99438839, 0.56422673, 0.94694738,
        0.72339524, 0.9699269 ]])

In [39]:
A_7[A_7 >= .5]

array([0.93416154, 0.70406788, 0.61466864, 0.60577599, 0.58774302,
       0.83716114, 0.55713137, 0.52032081, 0.92721543, 0.99438839,
       0.56422673, 0.94694738, 0.72339524, 0.9699269 ])

**08.** `np.linspace()` creates a vector of equally spaced points between two given points. `np.arange()` does something similar: it also produces a vector of equally spaced points between two given points. Wait, what's the difference then? 

 - for `np.linspace()` you specify the <u>number of points</u> to generate between the two given points
 - for `np.arange()` you specify the <u>step size</u> between consecutive numbers in the interval between the two given points
 
To check this for yourself, do the following:

 - Make an array of 17 equally distributed points between 0 and 2
 - Make an array of points between 0 and 2 with stepsize 0.25

In [40]:
np.linspace(0, 2, 17)

array([0.   , 0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ,
       1.125, 1.25 , 1.375, 1.5  , 1.625, 1.75 , 1.875, 2.   ])

In [41]:
np.arange(0, 2, .25)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75])

Notice the difference? What happens to the endpoints of those two arrays?
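The endpoint behaviour can be checked directly. In the sketch below, the last elements of the two arrays are compared, and the `endpoint=False` argument of `np.linspace()` is used to mimic what `np.arange()` does; the variable names are only illustrative.

```python
import numpy as np

a = np.linspace(0, 2, 17)   # the right endpoint is included by default
b = np.arange(0, 2, 0.25)   # the right endpoint is excluded
print(a[-1], b[-1])

# Dropping the right endpoint in linspace gives the same points as arange
c = np.linspace(0, 2, 8, endpoint=False)
print(np.allclose(b, c))
```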

**09.** Create the same array using both `np.linspace()` and `np.arange()`.

In [42]:
np.linspace(0, 2, 9)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

In [43]:
np.arange(0, 2.25, .25)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

**10.** `np.concatenate()` lets you stack two (or more) vectors/matrices next to one another or on top of each other. Given two vectors `v_10a` and `v_10b`, use it to:
   - produce a new vector by stacking them one next to another (horizontal stacking)
   - produce a new matrix by stacking them on top of each other (vertical stacking) 

In [44]:
v_10a = np.array([1, 2, 3])
v_10b = np.array([4, 5, 6])

In [45]:
np.concatenate((v_10a, v_10b), axis=0)

array([1, 2, 3, 4, 5, 6])

In [46]:
v_10a = v_10a.reshape(1, -1)
v_10b = v_10b.reshape(1, -1)

np.concatenate((v_10a, v_10b), axis=0)

array([[1, 2, 3],
       [4, 5, 6]])

Now, do the same for matrices `A_10a` and `A_10b`.

In [47]:
A_10a = np.array([[1, 2], [3, 4]])
A_10a

array([[1, 2],
       [3, 4]])

In [48]:
A_10b = np.array([[-3, -2], [1, 0]])
A_10b

array([[-3, -2],
       [ 1,  0]])

In [49]:
np.concatenate((A_10a, A_10b), axis=1)

array([[ 1,  2, -3, -2],
       [ 3,  4,  1,  0]])

In [50]:
np.concatenate((A_10a, A_10b), axis=0)

array([[ 1,  2],
       [ 3,  4],
       [-3, -2],
       [ 1,  0]])

What are the shapes of the newly obtained matrices?

In [51]:
np.concatenate((A_10a, A_10b), axis=1).shape

(2, 4)

In [52]:
np.concatenate((A_10a, A_10b), axis=0).shape

(4, 2)

**11.** Given a 2x3 matrix `A_11` use

 - `np.vstack()` to expand it by two rows with some matrix `A_11a` you define
 - `np.hstack()` to expand it by two columns with some matrix `A_11b` you define
 
*Hint: Watch for the shapes!*

In [53]:
A_11 = np.array([[1, 7, 1], [7, 1, 7]])
A_11

array([[1, 7, 1],
       [7, 1, 7]])

In [54]:
A_11a = np.array([[1, 0, 0], [0, 0, -1]])
A_11a

array([[ 1,  0,  0],
       [ 0,  0, -1]])

In [55]:
np.vstack((A_11, A_11a))

array([[ 1,  7,  1],
       [ 7,  1,  7],
       [ 1,  0,  0],
       [ 0,  0, -1]])

In [56]:
A_11b = np.array([[1, 0], [0, -1]])
A_11b

array([[ 1,  0],
       [ 0, -1]])

In [57]:
np.hstack((A_11, A_11b))

array([[ 1,  7,  1,  1,  0],
       [ 7,  1,  7,  0, -1]])
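The hint about shapes can be made concrete. For 2-D arrays, `np.vstack()` and `np.hstack()` give the same result as `np.concatenate()` along `axis=0` and `axis=1`, and stacking fails when the shared dimension does not match. A small sketch (the names `same_v` and `same_h` are illustrative):

```python
import numpy as np

A_11 = np.array([[1, 7, 1], [7, 1, 7]])
A_11a = np.array([[1, 0, 0], [0, 0, -1]])  # 2x3: same number of columns as A_11
A_11b = np.array([[1, 0], [0, -1]])        # 2x2: same number of rows as A_11

same_v = np.array_equal(np.vstack((A_11, A_11a)), np.concatenate((A_11, A_11a), axis=0))
same_h = np.array_equal(np.hstack((A_11, A_11b)), np.concatenate((A_11, A_11b), axis=1))
print(same_v, same_h)

# Vertical stacking needs matching column counts: 3 columns vs 2 columns fails
try:
    np.vstack((A_11, A_11b))
except ValueError as err:
    print("ValueError:", err)
```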

**12.** Find a way to turn matrix `A_12` into a vector containing its elements. 

*Hint: Try listening to Bolero.*

In [58]:
A_12 = np.array([[4, 7], [7, 4], [4, 7]])
A_12

array([[4, 7],
       [7, 4],
       [4, 7]])

In [59]:
A_12.ravel()

array([4, 7, 7, 4, 4, 7])
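`.ravel()` is not the only option: `.flatten()` and `.reshape(-1)` produce the same vector. One difference worth knowing is that `.flatten()` always returns a copy, while `.ravel()` returns a view of the original data when it can. The sketch below checks this with `np.shares_memory()` for this particular (contiguous) matrix.

```python
import numpy as np

A_12 = np.array([[4, 7], [7, 4], [4, 7]])

r = A_12.ravel()
f = A_12.flatten()

# All three approaches give the same elements in the same order
print(np.array_equal(r, f), np.array_equal(r, A_12.reshape(-1)))

# For this array, ravel() shares memory with A_12, while flatten() does not
print(np.shares_memory(A_12, r))
print(np.shares_memory(A_12, f))
```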

**13.** For the matrix `A_13` find 

- mean of all of its elements
- mean of all of its elements in the second column
- mean of all of its elements for every row

In [60]:
A_13 = np.array([[1, 2, 3, 4], [-1, -2, -3, -4], [.1, .2, .3, .4]])
A_13

array([[ 1. ,  2. ,  3. ,  4. ],
       [-1. , -2. , -3. , -4. ],
       [ 0.1,  0.2,  0.3,  0.4]])

In [61]:
A_13.mean()

0.08333333333333333

In [62]:
A_13[:, 1].mean()

0.06666666666666667

In [63]:
np.mean(A_13, axis=1)

array([ 2.5 , -2.5 ,  0.25])
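The `axis` argument is easy to mix up. One way to remember it: the axis you pass is the one that gets collapsed. The sketch below compares the shapes of the results; `keepdims=True` is an optional argument that keeps the collapsed axis with length 1.

```python
import numpy as np

A_13 = np.array([[1, 2, 3, 4], [-1, -2, -3, -4], [.1, .2, .3, .4]])

print(A_13.shape)                                # (3, 4)
print(A_13.mean(axis=0).shape)                   # rows collapsed: one mean per column
print(A_13.mean(axis=1).shape)                   # columns collapsed: one mean per row
print(A_13.mean(axis=1, keepdims=True).shape)    # same means, kept as a column matrix
```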

**14.** For matrix `A_14` find

- standard deviation of all of its elements
- standard deviation of all of its elements for every column

In [64]:
A_14 = np.array([[1, 2, np.nan, 4], [-1, np.nan, np.nan, -4], [np.nan, .2, .3, .4]])
A_14

array([[ 1. ,  2. ,  nan,  4. ],
       [-1. ,  nan,  nan, -4. ],
       [ nan,  0.2,  0.3,  0.4]])

In [65]:
np.nanstd(A_14)

2.1575086905966336

In [66]:
np.nanstd(A_14, axis=0)

array([1.        , 0.9       , 0.        , 3.27142511])
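Why `np.nanstd()` and not `np.std()`? A short sketch of the difference: `np.std()` propagates missing values, so a single `nan` makes the whole result `nan`, while `np.nanstd()` ignores them. Counting the missing values per column also explains the zero in the third column, which has only one non-missing entry.

```python
import numpy as np

A_14 = np.array([[1, 2, np.nan, 4], [-1, np.nan, np.nan, -4], [np.nan, .2, .3, .4]])

print(np.std(A_14))      # nan: the missing values propagate
print(np.nanstd(A_14))   # computed over the 8 non-missing values only

# Number of missing values in each column
nan_counts = np.isnan(A_14).sum(axis=0)
print(nan_counts)
```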

**15.** `np.argmin()` and `np.argmax()` are two very useful functions for finding an <u>index (position)</u> of the minimal/maximal value of an array. Given matrix `A_15` find

- position of its biggest element
- row indices of the smallest element in each of its columns
- position of its biggest negative element

In [67]:
A_15 = np.array([[1, -1, 3, 4], [2, -6, -3, 0], [7, -4, 5, -5], [8, 9, 6, -7]])
A_15

array([[ 1, -1,  3,  4],
       [ 2, -6, -3,  0],
       [ 7, -4,  5, -5],
       [ 8,  9,  6, -7]])

In [68]:
np.argmax(A_15)

13

In [69]:
np.argmin(A_15, axis=0)

array([0, 1, 1, 3], dtype=int64)

In [70]:
np.argmax(A_15 == np.max(A_15[A_15 < 0]))

1
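Without an `axis` argument, `np.argmax()` and `np.argmin()` return a position in the *flattened* array, which is why the results above are single numbers. If a (row, column) position is needed, `np.unravel_index()` can convert the flat index, as in this sketch:

```python
import numpy as np

A_15 = np.array([[1, -1, 3, 4], [2, -6, -3, 0], [7, -4, 5, -5], [8, 9, 6, -7]])

flat = np.argmax(A_15)                          # position in the flattened array
row, col = np.unravel_index(flat, A_15.shape)   # the same position as (row, column)

print(flat, (row, col), A_15[row, col])
```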

**16.** `np.where()` is a function which builds a new array by choosing, element by element, between two alternatives based on some condition, all in a single line of code. For example, given a matrix `A_16`

In [71]:
A_16 = np.copy(A_15)
A_16

array([[ 1, -1,  3,  4],
       [ 2, -6, -3,  0],
       [ 7, -4,  5, -5],
       [ 8,  9,  6, -7]])

In [72]:
np.where(A_16 > 0, A_16**2, 2*A_16)

array([[  1,  -2,   9,  16],
       [  4, -12,  -6,   0],
       [ 49,  -8,  25, -10],
       [ 64,  81,  36, -14]])

squares all its positive values and doubles all its negative values.

Now, using `np.where()` divide all even values of matrix `A_16` by two, while leaving odd values the same. 

In [73]:
np.where(A_16 % 2 == 0, A_16//2, A_16)

array([[ 1, -1,  3,  2],
       [ 1, -3, -3,  0],
       [ 7, -2,  5, -5],
       [ 4,  9,  3, -7]])
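Note that `np.where()` returns a new array and leaves its input untouched. If the goal is to modify a matrix in place, assignment through a Boolean mask is one alternative. The sketch below works on a copy (the names `halved`, `A_inplace` and `even` are illustrative) and checks that both approaches agree.

```python
import numpy as np

A_16 = np.array([[1, -1, 3, 4], [2, -6, -3, 0], [7, -4, 5, -5], [8, 9, 6, -7]])

halved = np.where(A_16 % 2 == 0, A_16 // 2, A_16)
print(A_16[0, 3], halved[0, 3])   # the original entry is unchanged

# In-place alternative: halve only the even entries of a copy
A_inplace = A_16.copy()
even = A_inplace % 2 == 0
A_inplace[even] //= 2

print(np.array_equal(A_inplace, halved))
```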

DataKolektiv, 2022/23.

[hello@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com)

![](../img/DK_Logo_100.png)

<font size=1>License: [GPLv3](https://www.gnu.org/licenses/gpl-3.0.txt) This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.</font>