
The first thing we want to do is import numpy.

In [1]:
import numpy as np

Let us first define a Python list containing the ages of 6 people.

In [2]:
ages_list = [10, 5, 8, 32, 65, 43]
print(ages_list)

[10, 5, 8, 32, 65, 43]


There are 3 main ways to instantiate a NumPy ndarray object. One of these is to use `np.array(<collection>)`.

In [3]:
ages = np.array(ages_list)
print(type(ages))
print(ages)

<class 'numpy.ndarray'>
[10  5  8 32 65 43]


In [4]:
print(ages)
print("Size:\t" , ages.size)
print("Shape:\t", ages.shape)

[10  5  8 32 65 43]
Size:	 6
Shape:	 (6,)


In [5]:
zeroArr = np.zeros(5)
print(zeroArr)

[0. 0. 0. 0. 0.]


### Multi-dimensional arrays

Now let us define a new list containing the weights of these 6 people.

In [6]:
weight_list = [32, 18, 26, 60, 55, 65]

Now, we define an ndarray containing all of this information, and again print the size and shape of the array.

In [7]:
people = np.array([ages_list, weight_list])

print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10  5  8 32 65 43]
 [32 18 26 60 55 65]]
Size:	 12
Shape:	 (2, 6)


In [8]:
# Valid shapes all multiply to 12: 2 x 6, 1 x 12, 4 x 3, 3 x 4, 12 x 1

people = people.reshape(12,1)
print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10]
 [ 5]
 [ 8]
 [32]
 [65]
 [43]
 [32]
 [18]
 [26]
 [60]
 [55]
 [65]]
Size:	 12
Shape:	 (12, 1)


###### Note: The new shape must contain the same total number of elements (the same "size") as the old shape
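
As a small sketch of this rule (the name `grid` is only illustrative and not part of the notebook), passing `-1` for one dimension asks NumPy to infer it from the total size, while an incompatible shape raises a `ValueError`:

```python
import numpy as np

grid = np.arange(12)              # 12 elements in total

print(grid.reshape(3, 4).shape)   # any shape whose dimensions multiply to 12 works
print(grid.reshape(-1, 2).shape)  # -1 lets NumPy infer the missing dimension

try:
    grid.reshape(5, 3)            # 5 * 3 = 15, which does not match 12
except ValueError as err:
    print("reshape failed:", err)
```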

### Exercise

* Generate a 1D numpy array with the values [7, 9, 65, 33, 85, 99]

* Generate a matrix (2D numpy array) of the values:

\begin{align}
  \mathbf{A} =
  \begin{pmatrix}
    1 & 2 & 4 \\
    2 & 3 & 0 \\
    0 & 5 & 1
  \end{pmatrix}
\end{align}

* Change the dimensions of this array to another permitted shape

## Array Generation

Instead of defining an array manually, we can ask numpy to do it for us.

The `np.arange()` function creates a range of numbers with a user-defined step between each. The start value is included and the stop value is excluded.

In [9]:
five_times_table = np.arange(0, 55, 5)
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

The `np.linspace()` function produces evenly spaced values between a start and an end value (both included by default), with the number of points you specify.

In [10]:
five_spaced = np.linspace(0,50,11)
print(five_spaced)

[ 0.  5. 10. 15. 20. 25. 30. 35. 40. 45. 50.]
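
The difference between the two functions can be sketched side by side. The names `by_step` and `by_count` are illustrative only; the key assumption to remember is that `np.arange()` takes a step while `np.linspace()` takes a number of points:

```python
import numpy as np

# arange: start, stop (excluded), step
by_step = np.arange(0, 50, 5)

# linspace: start, stop (included by default), number of points
by_count = np.linspace(0, 50, 11)

print(by_step)    # ends at 45 because the stop value 50 is excluded
print(by_count)   # 11 points, so the spacing is (50 - 0) / (11 - 1) = 5

# endpoint=False makes linspace exclude the stop value as well
print(np.linspace(0, 50, 10, endpoint=False))
```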


The `np.repeat()` function repeats the object you pass a specified number of times.

In [11]:
twoArr = np.repeat(2, 10)
print(twoArr)

[2 2 2 2 2 2 2 2 2 2]


The `np.eye()` function creates an identity matrix/array for us.

In [12]:
identity_matrix = np.eye(6)
print(identity_matrix)

[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]


# Operations

There are many, many operations which we can perform on arrays. Below, we demonstrate a few.

What is happening in each line?

In [13]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [14]:
print("1:", 2 * five_times_table)
print("2:", 10 + five_times_table)
print("3:", five_times_table - 1)
print("4:", five_times_table/5)
print("5:", five_times_table **2)
print("6:", five_times_table < 20)

1: [  0  10  20  30  40  50  60  70  80  90 100]
2: [10 15 20 25 30 35 40 45 50 55 60]
3: [-1  4  9 14 19 24 29 34 39 44 49]
4: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
5: [   0   25  100  225  400  625  900 1225 1600 2025 2500]
6: [ True  True  True  True False False False False False False False]
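
Each of these operations is applied element by element. A minimal sketch of how that differs from a plain Python list (the names `values_list`, `values_arr`, and `mask` are illustrative only):

```python
import numpy as np

values_list = [0, 5, 10]
values_arr = np.array(values_list)

print(2 * values_list)   # list repetition: [0, 5, 10, 0, 5, 10]
print(2 * values_arr)    # element-wise multiplication: [ 0 10 20]

mask = values_arr < 10   # element-wise comparison gives a Boolean array
print(values_arr[mask])  # a Boolean array can select elements: [0 5]
```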


### Speed Test

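If we compare the speed of these operations on an ndarray with the same operations on a core Python list, we will notice a substantial difference. In the timings below, the array addition is roughly 45 times faster than the list comprehension.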

In [15]:
fives_list = list(range(0,5001,5))
fives_list

[0, 5, 10, 15, 20, ..., 4990, 4995, 5000]

In [16]:
five_times_table_lge = np.arange(0,5001,5)
five_times_table_lge

array([   0,    5,   10, ..., 4990, 4995, 5000])

In [17]:
%timeit five_times_table_lge + 5

1.45 µs ± 53.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [18]:
%timeit [e + 5 for e in fives_list]

65 µs ± 3.76 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
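
`%timeit` is an IPython magic, so it only works inside a notebook or IPython shell. As a rough sketch, a similar comparison can be run in a plain Python script with the standard `timeit` module. The repetition count of 1000 is an arbitrary choice, and the absolute timings will vary by machine:

```python
import timeit
import numpy as np

fives_list = list(range(0, 5001, 5))
five_times_table_lge = np.arange(0, 5001, 5)

# Time 1000 repetitions of each approach (the repetition count is arbitrary)
list_time = timeit.timeit(lambda: [e + 5 for e in fives_list], number=1000)
array_time = timeit.timeit(lambda: five_times_table_lge + 5, number=1000)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy array:        {array_time:.4f} s")

# Both approaches compute the same values
same = [e + 5 for e in fives_list] == (five_times_table_lge + 5).tolist()
print("same result:", same)
```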


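Boolean string operations can also be performed on ndarrays. Note that `np.isin()` and the `in` operator compare whole elements, so neither finds the letter "e" inside the words; the list comprehension tests each word for the substring instead.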

In [19]:
words = np.array(["ten", "nine", "eight", "seven", "six"])

print(np.isin(words, 'e'))

print("e" in words)
["e" in word for word in words]

[False False False False False]
False


[True, True, True, True, False]
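
One possible vectorised alternative to the list comprehension is `np.char.find()`, sketched below. The names `positions` and `contains_e` are illustrative only:

```python
import numpy as np

words = np.array(["ten", "nine", "eight", "seven", "six"])

# np.char.find returns the index of the first match in each string, or -1
positions = np.char.find(words, "e")
contains_e = positions >= 0

print(positions)          # [ 1  3  0  1 -1]
print(contains_e)         # [ True  True  True  True False]
print(words[contains_e])  # only the words that contain "e"
```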

# Transpose

In [20]:
people = people.reshape(2, 6)  # restore the 2 x 6 layout
print(people, "\n")
print(people.T)

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]] 

[[10 32]
 [ 5 18]
 [ 8 26]
 [32 60]
 [65 55]
 [43 65]]


# Data Types

As previously mentioned, ndarrays can only have one data type. We can inspect it with the `.dtype` attribute; the cells below also show what happens when we try to change it.

In [21]:
people.dtype

dtype('int64')

What is the data type of the below ndarray?

In [22]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
ages_with_strings

array(['10', '5', '8', '32', '65', '43'], dtype='<U21')

In [23]:
ages_with_strings.dtype.type

numpy.str_

What is the dtype of this array?

In [24]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'], dtype='int32')
ages_with_strings

array([10,  5,  8, 32, 65, 43], dtype=int32)

In [25]:
ages_with_strings.dtype.type

numpy.int32

What do you think has happened here?

In [26]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)

['10' '5' '8' '32' '65' '43']


In [27]:
ages_with_strings.dtype = 'int32'
print(ages_with_strings)

[49 48  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 53  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 56  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 51 50  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0 54 53  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0 52 51  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0]


In [28]:
ages_with_strings.size

126

In [29]:
ages_with_strings.size/21

6.0

In [30]:
np.array([10, 5, 8, '32', '65', '43']).size

6

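Assigning to `.dtype` does not convert the values; it reinterprets the array's raw bytes. Each `<U21` element stores 21 four-byte code points, so the 6 strings are read back as 6 × 21 = 126 `int32` values (49 and 48 are the code points of '1' and '0'). The correct way to change the data type of the ndarray is the `.astype()` method, demonstrated below.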

In [31]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)
print(ages_with_strings.astype('int32'))

['10' '5' '8' '32' '65' '43']
[10  5  8 32 65 43]
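
The difference between reinterpreting and converting can be sketched on a smaller array. This example uses `.view()`, which reinterprets the bytes explicitly without modifying the original array; the names `text`, `raw`, and `converted` are illustrative only:

```python
import numpy as np

text = np.array(["10", "5"])      # two 4-byte code points per element
print(text.dtype, text.itemsize)  # itemsize is 8 bytes

# view() reinterprets the existing bytes; nothing is converted
raw = text.view("int32")
print(raw)                        # code points: 49 ('1'), 48 ('0'), 53 ('5'), 0 (padding)

# astype() parses each string and builds a new integer array
converted = text.astype("int32")
print(converted)                  # [10  5]
```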


### Exercise

* Create an array of string numbers, but use dtype to make it an array of floats.
* Transpose the matrix, printing the new size and shape.
* Use the .astype() method to convert the array to boolean.

In [32]:
string_nums = np.array([['1','0','0','1','1']],dtype = 'float32')
string_nums 

array([[1., 0., 0., 1., 1.]], dtype=float32)

In [33]:
string_nums.shape

(1, 5)

In [34]:
string_nums = string_nums.transpose()  # equivalent to string_nums.T
string_nums

array([[1.],
       [0.],
       [0.],
       [1.],
       [1.]], dtype=float32)

In [35]:
string_nums.shape

(5, 1)

In [36]:
string_nums.size

5

In [37]:
print(string_nums.astype('bool'))

[[ True]
 [False]
 [False]
 [ True]
 [ True]]


## Array Slicing Operations

As before, we can use square brackets and indices to access individual values, and the colon operator to slice the array.

In [38]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [39]:
five_times_table[0]

0

In [40]:
five_times_table[-1]

50

In [41]:
five_times_table[:4]

array([ 0,  5, 10, 15])

In [42]:
five_times_table[4:]

array([20, 25, 30, 35, 40, 45, 50])

We can also slice an n-dimensional ndarray by specifying the slice operation across each axis.

In [43]:
print(people)
people[:3, :3]

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]]


array([[10,  5,  8],
       [32, 18, 26]])
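
A few more slices of the same array, as a sketch. The array is redefined here so the block runs on its own:

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],
                   [32, 18, 26, 60, 55, 65]])

print(people[0])             # first row (ages)
print(people[:, 0])          # first column: age and weight of the first person
print(people[1, 2:5])        # weights of the 3rd to 5th person: [26 60 55]
print(people[:3, :3].shape)  # only 2 rows exist, so the row slice is clipped: (2, 3)
```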

### Exercise

* Create a numpy array with 50 zeros
* Create a np array of 2 repeated 20 times
* Create a numpy array from 0 to 2 $\pi$ in steps of 0.1

For one of the arrays generated:
* Get the first five values
* Get the last 3 values
* Get the 4th value to the 7th value

In [44]:
zeros = np.zeros(50)
zeros

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [45]:
repeat = np.repeat(2,20)
repeat

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [46]:
ar = np.arange(0, 2*np.pi, 0.1)
ar

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
       2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
       3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1,
       5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2])

In [47]:
ar[0:5]

array([0. , 0.1, 0.2, 0.3, 0.4])

In [48]:
ar[-3:]

array([6. , 6.1, 6.2])

In [49]:
ar[3:7]

array([0.3, 0.4, 0.5, 0.6])

We can reverse an array by using `np.flip()` or by slicing with a step of -1 (`[::-1]`).

In [50]:
reverse_five_times_table = np.flip(five_times_table)
reverse_five_times_table

array([50, 45, 40, 35, 30, 25, 20, 15, 10,  5,  0])

In [51]:
reverse_five_times_table = five_times_table[-1::-1]
print(reverse_five_times_table)
five_times_table

[50 45 40 35 30 25 20 15 10  5  0]


array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

We can also use the `start::step` slice syntax to select every nth element of the original array.

In [52]:
five_times_table[0::3] #Every 3rd element starting from 0

array([ 0, 15, 30, 45])
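
The general slice form is `start:stop:step`, where any of the three parts may be omitted. A short sketch, with the array redefined so the block runs on its own:

```python
import numpy as np

five_times_table = np.arange(0, 55, 5)

print(five_times_table[1:8:2])   # indices 1, 3, 5, 7 -> [ 5 15 25 35]
print(five_times_table[::-1])    # omitted start and stop with step -1 reverses the array
print(five_times_table[8:2:-3])  # indices 8, 5 -> [40 25]
```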

### Exercise
Take one of the arrays you defined and
* Reverse it
* Only keep every 4th element.
* Get every 2nd element, starting from the last and moving backwards.

In [53]:
ar

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
       2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
       3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1,
       5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2])

In [54]:
reverse_ar = np.flip(ar)
reverse_ar

array([6.2, 6.1, 6. , 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, 5. ,
       4.9, 4.8, 4.7, 4.6, 4.5, 4.4, 4.3, 4.2, 4.1, 4. , 3.9, 3.8, 3.7,
       3.6, 3.5, 3.4, 3.3, 3.2, 3.1, 3. , 2.9, 2.8, 2.7, 2.6, 2.5, 2.4,
       2.3, 2.2, 2.1, 2. , 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1,
       1. , 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0. ])

In [55]:
reverse_ar1 = ar[-1::-1]
reverse_ar1

array([6.2, 6.1, 6. , 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, 5. ,
       4.9, 4.8, 4.7, 4.6, 4.5, 4.4, 4.3, 4.2, 4.1, 4. , 3.9, 3.8, 3.7,
       3.6, 3.5, 3.4, 3.3, 3.2, 3.1, 3. , 2.9, 2.8, 2.7, 2.6, 2.5, 2.4,
       2.3, 2.2, 2.1, 2. , 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1,
       1. , 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0. ])

In [56]:
ar[3::4]

array([0.3, 0.7, 1.1, 1.5, 1.9, 2.3, 2.7, 3.1, 3.5, 3.9, 4.3, 4.7, 5.1,
       5.5, 5.9])

In [57]:
ar[-1::-2]

array([6.2, 6. , 5.8, 5.6, 5.4, 5.2, 5. , 4.8, 4.6, 4.4, 4.2, 4. , 3.8,
       3.6, 3.4, 3.2, 3. , 2.8, 2.6, 2.4, 2.2, 2. , 1.8, 1.6, 1.4, 1.2,
       1. , 0.8, 0.6, 0.4, 0.2, 0. ])

# Stats

In [58]:
np.array([1.65432, 5.98765]).round(2)

array([1.65, 5.99])

In [59]:
nums = np.arange(0, 4, 0.2555)

In [60]:
nums

array([0.    , 0.2555, 0.511 , 0.7665, 1.022 , 1.2775, 1.533 , 1.7885,
       2.044 , 2.2995, 2.555 , 2.8105, 3.066 , 3.3215, 3.577 , 3.8325])

### Exercise

* Compute the min, max, sum, mean, median, variance, and standard deviation of the above array, all to 2 decimal places.

In [61]:
print("min = ", np.min(nums).round(2))
print("max = ", np.max(nums).round(2))
print("sum = ", np.sum(nums).round(2))
print("mean = ", np.mean(nums).round(2))
print("median = ", np.median(nums).round(2))
print("var = ", np.var(nums).round(2))
print("std = ", np.std(nums).round(2))

min =  0.0
max =  3.83
sum =  30.66
mean =  1.92
median =  1.92
var =  1.39
std =  1.18
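
By default `np.var()` and `np.std()` compute the population statistics (dividing by n). If the sample versions (dividing by n - 1) are wanted, the `ddof` parameter can be set to 1. A brief sketch, where `pop_var` and `sample_var` are illustrative names:

```python
import numpy as np

nums = np.arange(0, 4, 0.2555)
n = nums.size

pop_var = np.var(nums)             # divides by n (ddof=0, the default)
sample_var = np.var(nums, ddof=1)  # divides by n - 1

print(round(pop_var, 2), round(sample_var, 2))
print(np.isclose(sample_var, pop_var * n / (n - 1)))  # the two differ by a factor n / (n - 1)
```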


## Random

With `np.random`, we can draw random samples from several distributions, which is useful for generating synthetic data sets such as training data.

The code below simulates 10 tosses of a fair coin.

In [62]:
flip = np.random.choice([0,1], 10)
flip

array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
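
The output above changes on every run. If reproducible results are needed, one option is a seeded generator from `np.random.default_rng()`, sketched below. The seed value 42 and the names `rng`, `flips`, and `flips_again` are arbitrary choices for illustration:

```python
import numpy as np

# default_rng creates a Generator; the seed value 42 is arbitrary
rng = np.random.default_rng(42)
flips = rng.choice([0, 1], size=10)

# Re-creating the generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
flips_again = rng_again.choice([0, 1], size=10)

print(flips)
print(np.array_equal(flips, flips_again))
```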

In [63]:
np.random.rand(10,20,9)

array([[[0.09525566, 0.15027138, 0.30575246, ..., 0.30972408,
         0.78015893, 0.98398594],
        [0.81466581, 0.64723608, 0.05916166, ..., 0.34819299,
         0.80027342, 0.19414103],
        [0.42958795, 0.23289644, 0.25121748, ..., 0.83435256,
         0.41718104, 0.40856997],
        ...,
        [0.21996249, 0.76282604, 0.03119874, ..., 0.82067411,
         0.38369708, 0.18058687],
        [0.06526141, 0.38375887, 0.60891985, ..., 0.05601261,
         0.62322527, 0.99529656],
        [0.67813492, 0.22753471, 0.6829043 , ..., 0.00950316,
         0.58869559, 0.03521564]],

       [[0.76838383, 0.04500277, 0.18737813, ..., 0.68826583,
         0.35431177, 0.84329091],
        [0.40876012, 0.72279098, 0.89815407, ..., 0.55917064,
         0.35646642, 0.05770123],
        [0.11601916, 0.10026003, 0.66917809, ..., 0.37682868,
         0.98114696, 0.62646666],
        ...,
        [0.38780255, 0.96316196, 0.31658811, ..., 0.8721944 ,
         0.90628284, 0.46712835],
        ... (output truncated)

We can produce 1000 data points from a normal distribution by using `np.random.normal()`.

In [64]:
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)
s

array([-0.08229322,  0.09336241, -0.06514644, -0.11364737, -0.00879319,
        0.18788662, -0.08297548, -0.01702394,  0.02948496,  0.03985549,
       -0.07664054, -0.1053703 , -0.09825329,  0.1180397 ,  0.12183613,
        0.02161194, -0.00513592,  0.02677276, -0.13668426, -0.04642751,
       -0.12830654, -0.0789017 ,  0.18365566, -0.03866692,  0.00341949,
        0.06234984, -0.08685467,  0.10065792, -0.09568919,  0.08907208,
       -0.09926038,  0.1086324 ,  0.05340553,  0.0194219 , -0.06206857,
       -0.1342091 , -0.10120026,  0.13881769,  0.05186812, -0.04640725,
        0.03176628,  0.06207936, -0.02216986, -0.05804632, -0.23122891,
       -0.0134155 , -0.01696118,  0.26863898, -0.02387443,  0.20822274,
       -0.09275854,  0.1652616 , -0.02470249, -0.17817725,  0.04450773,
       -0.102397  ,  0.00585928, -0.07528534,  0.08129542, -0.06951615,
        0.13757701, -0.04958699,  0.05715187,  0.01994124,  0.0323319 ,
       -0.12989689, -0.15793943, -0.02125814, -0.14196333, ... (output truncated)
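
Rather than reading 1000 raw values, a quick sanity check is to compare the sample mean and standard deviation with `mu` and `sigma`. The sketch below uses a seeded generator (seed 0, chosen arbitrarily) and a loose tolerance of 0.02, both of which are assumptions made for illustration:

```python
import numpy as np

mu, sigma = 0, 0.1
rng = np.random.default_rng(0)            # seed chosen arbitrarily
s = rng.normal(mu, sigma, 1000)

print(s.shape)
print(abs(s.mean() - mu) < 0.02)          # the sample mean should be near mu
print(abs(s.std(ddof=1) - sigma) < 0.02)  # the sample std should be near sigma
```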

### Exercise
* Simulate a six-sided die using `numpy.random.choice()` and generate the values you would obtain from 10 throws.
* Simulate a two-sided coin toss that is NOT fair: heads is twice as likely as tails.


In [65]:
dice_throw = np.random.choice([1,2,3,4,5,6],10)
dice_throw

array([5, 1, 3, 4, 6, 2, 1, 4, 2, 1])

In [66]:
coin_throw = np.random.choice(['Head','Head','Tail'],10)
coin_throw

array(['Head', 'Head', 'Head', 'Head', 'Head', 'Head', 'Head', 'Tail',
       'Tail', 'Head'], dtype='<U4')
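
Repeating `'Head'` in the list works, but the weighting can also be stated explicitly with the `p` parameter of `choice()`. The sketch below assumes a seeded generator (seed 1, arbitrary) and 3000 tosses so that the observed proportion is stable:

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
tosses = rng.choice(["Head", "Tail"], size=3000, p=[2/3, 1/3])

heads = np.count_nonzero(tosses == "Head")
print(heads / tosses.size)       # should be close to 2/3
```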