
The first thing we want to do is import numpy.

In [1]:
import numpy as np

Let us first define a Python list containing the ages of 6 people.

In [2]:
ages_list = [10, 5, 8, 32, 65, 43]
print(ages_list)

[10, 5, 8, 32, 65, 43]


There are 3 main ways to instantiate a NumPy ndarray object. One of these is to use `np.array(<collection>)`.

In [3]:
ages = np.array(ages_list)
print(type(ages))
print(ages)

<class 'numpy.ndarray'>
[10  5  8 32 65 43]


In [4]:
print(ages)
print("Size:\t" , ages.size)
print("Shape:\t", ages.shape)

[10  5  8 32 65 43]
Size:	 6
Shape:	 (6,)


In [5]:
zeroArr = np.zeros(5)
print(zeroArr)

[0. 0. 0. 0. 0.]
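
Alongside `np.array()` and `np.zeros()`, NumPy has other constructors that pre-fill an array of a given shape. The sketch below shows a few of them; the variable names are illustrative, and the text above does not say which three ways it has in mind.

```python
import numpy as np

# Pre-filled arrays of a given shape; the shape is an int or a tuple.
zeros_2d = np.zeros((2, 3))        # 2 rows, 3 columns of 0.0
ones_int = np.ones(4, dtype=int)   # four integer ones
sevens = np.full((2, 2), 7)        # 2x2 array filled with 7

print(zeros_2d)
print(ones_int)
print(sevens)
```

Passing `dtype` is optional; without it, `np.zeros()` and `np.ones()` produce floats, which is why the output above prints `0.` rather than `0`.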


### Multi-dim

Now let us define a new list containing the weights of these 6 people.

In [6]:
weight_list = [32, 18, 26, 60, 55, 65]

Now, we define an ndarray containing all of this information, and again print the size and shape of the array.

In [7]:
people = np.array([ages_list, weight_list])

print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10  5  8 32 65 43]
 [32 18 26 60 55 65]]
Size:	 12
Shape:	 (2, 6)


In [8]:
people = people.reshape(12,1)
print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10]
 [ 5]
 [ 8]
 [32]
 [65]
 [43]
 [32]
 [18]
 [26]
 [60]
 [55]
 [65]]
Size:	 12
Shape:	 (12, 1)


###### Note: The new shape must have the same total number of elements ("size") as the old shape
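
A small sketch of that size rule, using a stand-in array of 12 elements rather than the `people` data. It also uses `-1`, which asks NumPy to infer one dimension from the total size.

```python
import numpy as np

stand_in = np.arange(12).reshape(2, 6)   # illustrative data with 12 elements

# -1 lets NumPy work out the row count: 12 elements / 3 columns = 4 rows.
as_columns = stand_in.reshape(-1, 3)
print(as_columns.shape)

# A shape whose product is not 12 is rejected with a ValueError.
reshape_failed = False
try:
    stand_in.reshape(5, 3)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```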

## Array Generation

Instead of defining an array manually, we can ask numpy to do it for us.

The `np.arange()` function creates a range of numbers with a user-defined step between each value. The stop value is excluded, which is why we pass 55 to end at 50.

In [21]:
five_times_table = np.arange(0, 55, 5)
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

The `np.linspace()` function produces evenly spaced values given a start, an end, and the number of values to generate. Unlike `np.arange()`, it includes the end value by default and returns floats.

In [18]:
five_spaced = np.linspace(0,50,11)
print(five_spaced)

[ 0.  5. 10. 15. 20. 25. 30. 35. 40. 45. 50.]
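
The difference in how the two functions treat the end value is easy to miss, so here is a short side-by-side sketch. The variable names are illustrative only.

```python
import numpy as np

by_step = np.arange(0, 50, 5)                         # stop is excluded
by_count = np.linspace(0, 50, 11)                     # stop is included
open_ended = np.linspace(0, 50, 10, endpoint=False)   # stop is excluded

print(by_step)      # last value is 45
print(by_count)     # last value is 50.0
print(open_ended)   # last value is 45.0
```

As a rule of thumb, `np.arange()` fits when the step matters and `np.linspace()` fits when the number of points matters.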


The `np.repeat()` function repeats the object you pass a specified number of times.

In [19]:
twoArr = np.repeat(2, 10)
print(twoArr)

[2 2 2 2 2 2 2 2 2 2]
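
When the input is itself an array, `np.repeat()` repeats each element in turn. The sketch below contrasts it with `np.tile()`, which is not used elsewhere in this notebook and repeats the whole sequence instead.

```python
import numpy as np

base = np.array([1, 2, 3])

each_repeated = np.repeat(base, 2)   # [1 1 2 2 3 3]
whole_tiled = np.tile(base, 2)       # [1 2 3 1 2 3]

print(each_repeated)
print(whole_tiled)
```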


The `np.eye()` function creates an identity matrix for us.

In [14]:
identity_matrix = np.eye(6)
print(identity_matrix)

[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]
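
What makes the identity matrix useful is that multiplying by it leaves a matrix unchanged. A minimal check, with an arbitrary 2x2 matrix chosen for illustration:

```python
import numpy as np

matrix = np.array([[2, 0], [1, 3]])   # arbitrary example values
identity = np.eye(2)

# @ is matrix multiplication; the product has the same values as matrix.
product = matrix @ identity
print(product)
print(np.array_equal(product, matrix))   # True
```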


# Operations

NumPy supports many operations on arrays. Arithmetic and comparison operators are applied element by element, so no explicit loop is needed.

In [22]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [23]:
print("1:", 2 * five_times_table)
print("2:", 10 + five_times_table)
print("3:", five_times_table - 1)
print("4:", five_times_table/5)
print("5:", five_times_table **2)
print("6:", five_times_table < 20)

1: [  0  10  20  30  40  50  60  70  80  90 100]
2: [10 15 20 25 30 35 40 45 50 55 60]
3: [-1  4  9 14 19 24 29 34 39 44 49]
4: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
5: [   0   25  100  225  400  625  900 1225 1600 2025 2500]
6: [ True  True  True  True False False False False False False False]
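
The boolean array produced by a comparison such as line 6 can be used as a mask to select elements. A short sketch, rebuilding `five_times_table` so the block runs on its own:

```python
import numpy as np

five_times_table = np.arange(0, 55, 5)

mask = five_times_table < 20
small = five_times_table[mask]       # keeps elements where mask is True
count = np.count_nonzero(mask)       # number of True entries

print(small)   # [ 0  5 10 15]
print(count)   # 4
```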


### Speed Test

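Compared with core Python, NumPy performs these operations substantially faster. In the timings below the NumPy array holds 1001 elements while the list holds only 100, yet the NumPy addition still takes roughly a fifth of the time.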

In [37]:
fives_list = list(range(0,500,5))
fives_list

[0,
 5,
 10,
 15,
 20,
 25,
 30,
 35,
 40,
 45,
 50,
 55,
 60,
 65,
 70,
 75,
 80,
 85,
 90,
 95,
 100,
 105,
 110,
 115,
 120,
 125,
 130,
 135,
 140,
 145,
 150,
 155,
 160,
 165,
 170,
 175,
 180,
 185,
 190,
 195,
 200,
 205,
 210,
 215,
 220,
 225,
 230,
 235,
 240,
 245,
 250,
 255,
 260,
 265,
 270,
 275,
 280,
 285,
 290,
 295,
 300,
 305,
 310,
 315,
 320,
 325,
 330,
 335,
 340,
 345,
 350,
 355,
 360,
 365,
 370,
 375,
 380,
 385,
 390,
 395,
 400,
 405,
 410,
 415,
 420,
 425,
 430,
 435,
 440,
 445,
 450,
 455,
 460,
 465,
 470,
 475,
 480,
 485,
 490,
 495]

In [38]:
five_times_table_lge = np.arange(0,5001,5)
five_times_table_lge

array([   0,    5,   10, ..., 4990, 4995, 5000])

In [39]:
%timeit five_times_table_lge + 5

The slowest run took 35.47 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.29 µs per loop


In [40]:
%timeit [e + 5 for e in fives_list]

100000 loops, best of 5: 6.16 µs per loop
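
For a like-for-like comparison outside a notebook, both containers can hold the same values and be timed with the standard `timeit` module. This is a sketch under those assumptions; the run count of 2,000 is arbitrary, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

values_array = np.arange(0, 5001, 5)    # 1001 elements
values_list = values_array.tolist()     # the same values as a plain list

runs = 2_000  # arbitrary repeat count
array_time = timeit.timeit(lambda: values_array + 5, number=runs)
list_time = timeit.timeit(lambda: [e + 5 for e in values_list], number=runs)

print(f"NumPy: {array_time:.4f} s, list: {list_time:.4f} s for {runs} runs")
```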


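Membership tests also work on ndarrays of strings, but both `np.isin()` and the `in` operator compare whole elements rather than searching for substrings. To test each word for the letter `e`, we fall back on a list comprehension.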

In [71]:
words = np.array(["ten", "nine", "eight", "seven", "six"])

print(np.isin(words, 'e'))

print("e" in words)

["e" in word for word in words]

[False False False False False]
False


[True, True, True, True, False]
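
A vectorized alternative to the list comprehension is `np.char.find()`, which returns the index of the first match in each string, or -1 when there is none. The sketch below also shows `np.isin()` doing what it is designed for, matching whole elements; the lookup list is made up for illustration.

```python
import numpy as np

words = np.array(["ten", "nine", "eight", "seven", "six"])

# Substring test: find() gives -1 for words that do not contain "e".
contains_e = np.char.find(words, "e") >= 0
print(contains_e)          # [ True  True  True  True False]
print(words[contains_e])   # ['ten' 'nine' 'eight' 'seven']

# Whole-element test: which words appear in the lookup list?
in_lookup = np.isin(words, ["ten", "six"])
print(in_lookup)           # [ True False False False  True]
```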

# Transpose

In [72]:
people.shape = (2, 6)
print(people, "\n")
print(people.T)

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]] 

[[10 32]
 [ 5 18]
 [ 8 26]
 [32 60]
 [65 55]
 [43 65]]
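
Two properties of `.T` are worth checking directly: it swaps the axes, and it returns a view on the same data rather than a copy. The block rebuilds `people` so it runs on its own.

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],
                   [32, 18, 26, 60, 55, 65]])

transposed = people.T
print(transposed.shape)                       # (6, 2)
print(np.shares_memory(people, transposed))   # True: no data was copied

# Row i, column j of the original is row j, column i of the transpose.
print(people[1, 3] == transposed[3, 1])       # True
```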


# Data Types

An ndarray can only have one data type, which we can inspect with the `.dtype` attribute. Assigning to `.dtype` is possible but does not convert the values, as we will see further down.

In [44]:
people.dtype

dtype('int64')

What is the data type of the below ndarray?

In [45]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
ages_with_strings

array(['10', '5', '8', '32', '65', '43'], dtype='<U21')

What is the dtype of this array when we pass `dtype='int32'` explicitly?

In [46]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'], dtype='int32')
ages_with_strings

array([10,  5,  8, 32, 65, 43], dtype=int32)

Now recreate the array of strings and try to change its type by assigning to `.dtype`.

In [47]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings) 

['10' '5' '8' '32' '65' '43']


In [78]:
ages_with_strings.dtype = 'int32'
print(ages_with_strings)

[49 48  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 53  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 56  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 51 50  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0 54 53  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0 52 51  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0]


In [74]:
ages_with_strings.size

6

In [73]:
ages_with_strings.size/21

0.2857142857142857

In [75]:
np.array([10, 5, 8, '32', '65', '43']).size

6

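Assigning to `.dtype` does not convert the values. It reinterprets the array's raw bytes: the `<U21` array stores 6 strings of 21 four-byte characters each, so reading that buffer as `int32` yields 6 × 21 = 126 integers, mostly zero padding plus character codes such as 49 and 48 for `'1'` and `'0'`. The correct way to change the data type is the `.astype()` method, demonstrated below.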

In [76]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)
print(ages_with_strings.astype('int32'))

['10' '5' '8' '32' '65' '43']
[10  5  8 32 65 43]
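
The same contrast can be reproduced on a one-element array without modifying anything in place. This sketch uses `.view()`, which is not used elsewhere in this notebook, to reinterpret the bytes, and `.astype()` to convert the values.

```python
import numpy as np

text = np.array(["10"])              # two characters, 4 bytes each

reinterpreted = text.view("int32")   # same bytes read as integers
converted = text.astype("int32")     # the string parsed as a number

print(reinterpreted)   # [49 48], the character codes of '1' and '0'
print(converted)       # [10]
```

`.astype()` returns a new array and leaves `text` unchanged.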


## Array Slicing Operations

As before, we can use square brackets and indices to access individual values, and the colon operator to slice the array.

In [52]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [53]:
five_times_table[0]

0

In [54]:
five_times_table[-1]

50

In [55]:
five_times_table[:4]

array([ 0,  5, 10, 15])

In [56]:
five_times_table[4:]

array([20, 25, 30, 35, 40, 45, 50])

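We can also slice an n-dimensional ndarray by specifying a slice for each axis, separated by commas. A slice that runs past the end of an axis is clipped, so `:3` on the two-row axis below returns both rows.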

In [57]:
print(people)
people[:3, :3]

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]]


array([[10,  5,  8],
       [32, 18, 26]])
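
Mixing integer indices and slices picks out whole rows, whole columns, or parts of either. A short sketch on the same `people` data, with illustrative variable names:

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],
                   [32, 18, 26, 60, 55, 65]])

ages = people[0]                  # first row
first_person = people[:, 0]       # first column: [10 32]
middle_weights = people[1, 2:4]   # row 1, columns 2 and 3: [26 60]

print(ages)
print(first_person)
print(middle_weights)
```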

We can reverse an array with `np.flip()` or with a slice that has a negative step, such as `[-1::-1]` (equivalently `[::-1]`).

In [61]:
reverse_five_times_table = np.flip(five_times_table)
reverse_five_times_table

array([50, 45, 40, 35, 30, 25, 20, 15, 10,  5,  0])

In [62]:
reverse_five_times_table = five_times_table[-1::-1]
print(reverse_five_times_table)
five_times_table

[50 45 40 35 30 25 20 15 10  5  0]


array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

A slice of the form `start::step` selects every `step`-th element of the original array.

In [63]:
five_times_table[0::3] #Every 3rd element starting from 0

array([ 0, 15, 30, 45])
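
One consequence of slicing that differs from Python lists: a basic slice of an ndarray is a view, so writing to it changes the original. The sketch below uses a fresh array named `table` (an illustrative name) and `.copy()` to get independent data.

```python
import numpy as np

table = np.arange(0, 55, 5)

view = table[::2]                # every 2nd element, shares memory with table
independent = table[::2].copy()  # separate data

view[0] = -1           # also changes table[0]
independent[1] = -99   # leaves table untouched

print(table[0])   # -1
print(table[2])   # 10
```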

# Stats

In [82]:
np.array([1.65432, 5.98765]).round(2)

array([1.65, 5.99])

In [84]:
nums = np.arange(0, 4, 0.2555)
print(nums)

[0.     0.2555 0.511  0.7665 1.022  1.2775 1.533  1.7885 2.044  2.2995
 2.555  2.8105 3.066  3.3215 3.577  3.8325]



* Compute the min, max, sum, mean, median, variance, and standard deviation of the above array, all to 2 decimal places.

In [85]:
print("min = ", np.min(nums).round(2))
print("max = ", np.max(nums).round(2))
print("sum = ", np.sum(nums).round(2))
print("mean = ", np.mean(nums).round(2))
print("median = ", np.median(nums).round(2))
print("var = ", np.var(nums).round(2))
print("std = ", np.std(nums).round(2))

min =  0.0
max =  3.83
sum =  30.66
mean =  1.92
median =  1.92
var =  1.39
std =  1.18
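
On a 2-D array these functions accept an `axis` argument, and `np.var()` and `np.std()` divide by n unless `ddof` is set. The following sketch applies both to the `people` data, rebuilt here so the block runs on its own.

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],     # ages
                   [32, 18, 26, 60, 55, 65]])  # weights

row_means = people.mean(axis=1)    # one mean per row
column_max = people.max(axis=0)    # one maximum per column

ages = people[0]
population_var = np.var(ages)        # divides by n (the default)
sample_var = np.var(ages, ddof=1)    # divides by n - 1

print(row_means.round(2))
print(column_max)
print(population_var.round(2), sample_var.round(2))
```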


## Random

With `np.random`, we can generate many kinds of random data, for example to build synthetic training sets.

The below code simulates a fair coin toss.

In [90]:
flip = np.random.choice([0,1], 100)
flip

array([0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,
       1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1,
       1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1])
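
Each run of the cell above gives different flips. When repeatable results are needed, one option is NumPy's `Generator` interface with a fixed seed. This is a sketch of that alternative, not the approach used in this notebook, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed

flips = rng.choice([0, 1], size=100)
print("heads fraction:", flips.mean())

# Re-creating the generator with the same seed repeats the sequence.
repeat = np.random.default_rng(seed=42).choice([0, 1], size=100)
print(np.array_equal(flips, repeat))   # True
```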

In [93]:
np.random.rand(10,20,9)

array([[[0.82577028, 0.03976829, 0.43832254, ..., 0.36227283,
         0.27335476, 0.79829335],
        [0.17181976, 0.32669245, 0.00136324, ..., 0.52273233,
         0.24495187, 0.74556949],
        [0.74428664, 0.47499167, 0.26004175, ..., 0.12959432,
         0.59577386, 0.90405703],
        ...,
        [0.71846293, 0.33274618, 0.98897776, ..., 0.49793593,
         0.30606363, 0.88952146],
        [0.20406367, 0.01212815, 0.01216391, ..., 0.01035917,
         0.05020097, 0.0535525 ],
        [0.25306498, 0.2481001 , 0.27577351, ..., 0.08587644,
         0.118761  , 0.51100054]],

       [[0.97546669, 0.86190723, 0.97783458, ..., 0.31614495,
         0.5149774 , 0.32322612],
        [0.72781539, 0.5314489 , 0.14952463, ..., 0.76905253,
         0.21679947, 0.78169221],
        [0.89623021, 0.71811618, 0.49120889, ..., 0.00427484,
         0.22953327, 0.48016881],
        ...,
        [0.83556257, 0.50137981, 0.64805565, ..., 0.56026422,
         0.30026811, 0.21463185],
        [0.0

We can produce 100 data points from a normal distribution by using `np.random.normal()`.

In [99]:
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 100)
print(s)

[-0.05952461  0.28961714  0.30252547  0.00332994  0.02925113  0.15929293
  0.16229036  0.02971709 -0.11534069  0.0303982   0.01457253 -0.10431609
 -0.07747337 -0.14158755  0.00487287  0.06825909  0.13026946  0.05720965
  0.08436758  0.03185213 -0.09103042 -0.06340185  0.03670765  0.04164798
  0.05504773  0.07539229  0.13577989  0.04454499  0.01529405 -0.04350655
  0.05357871  0.01014693  0.04391702  0.25071228  0.15124785 -0.0249249
  0.01831947 -0.11201721  0.08044832 -0.1164089   0.15556317 -0.20655942
 -0.08125051 -0.02349499  0.32330036  0.00157866 -0.08489568 -0.03753999
 -0.20235456 -0.07871391  0.21752369  0.16513723 -0.02021224  0.00428114
  0.1201174   0.09063871 -0.01789995  0.07645302  0.18828781 -0.17574792
  0.01681907  0.12018386 -0.11391327  0.04124623 -0.14200331 -0.02639008
  0.06597591  0.08749058 -0.03748737 -0.06145118 -0.04091446 -0.01174503
  0.01034949 -0.03886912 -0.10802505  0.15853083  0.04613511 -0.05057809
  0.0040452  -0.00311491  0.10388931  0.10596132  0.
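
As a sanity check, the mean and standard deviation of a large sample should land close to `mu` and `sigma`. The sketch below assumes a seeded `Generator` and a sample of 100,000 points, both chosen for illustration rather than taken from the cell above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # arbitrary seed, for repeatability
mu, sigma = 0, 0.1                     # mean and standard deviation

sample = rng.normal(mu, sigma, 100_000)
print("mean:", sample.mean().round(3))
print("std: ", sample.std().round(3))
```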