
The first thing we want to do is import numpy.

In [1]:
import numpy as np

Let us first define a Python list containing the ages of 6 people.

In [2]:
ages_list = [10, 5, 8, 32, 65, 43]
print(ages_list)

[10, 5, 8, 32, 65, 43]


There are 3 main ways to instantiate a NumPy ndarray object. One of these is to use `np.array(<collection>)`.

In [3]:
ages = np.array(ages_list)
print(type(ages))
print(ages)

<class 'numpy.ndarray'>
[10  5  8 32 65 43]


In [4]:
print(ages)
print("Size:\t" , ages.size)
print("Shape:\t", ages.shape)

[10  5  8 32 65 43]
Size:	 6
Shape:	 (6,)


In [6]:
zeroArr = np.zeros(5)
print(zeroArr)

[0. 0. 0. 0. 0.]


### Multi-dim

Now let us define a new list containing the weights of these 6 people.

In [7]:
weight_list = [32, 18, 26, 60, 55, 65]

Now, we define an ndarray containing all of this information, and again print the size and shape of the array.

In [8]:
people = np.array([ages_list, weight_list])

print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10  5  8 32 65 43]
 [32 18 26 60 55 65]]
Size:	 12
Shape:	 (2, 6)


In [9]:
people = people.reshape(12,1)
print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10]
 [ 5]
 [ 8]
 [32]
 [65]
 [43]
 [32]
 [18]
 [26]
 [60]
 [55]
 [65]]
Size:	 12
Shape:	 (12, 1)


###### Note: The new shape must have the same "size" (total number of elements) as the old shape
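
As a small sketch of the note above, the cell below reshapes the same 12 values into two other permitted shapes and then tries one that is not permitted. The names `as_3x4` and `inferred` are made up for this illustration.

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],
                   [32, 18, 26, 60, 55, 65]])

# 12 elements fit into (3, 4); -1 asks NumPy to infer that axis length
as_3x4 = people.reshape(3, 4)
inferred = people.reshape(-1, 2)   # NumPy works out that 6 rows are needed
print(as_3x4.shape, inferred.shape)

# A shape whose product is not 12 is rejected with a ValueError
try:
    people.reshape(5, 2)
except ValueError as err:
    print("reshape failed:", err)
```

Using `-1` for one axis is handy when only the other dimension matters.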

### Exercise

* Generate a 1D numpy array with the values [7, 9, 65, 33, 85, 99]

* Generate a matrix (2D numpy array) of the values:

\begin{align}
  \mathbf{A} =
  \begin{pmatrix}
    1 & 2 & 4 \\
    2 & 3 & 0 \\
    0 & 5 & 1
  \end{pmatrix}
\end{align}

* Change the dimensions of this array to another permitted shape

## Array Generation

Instead of defining an array manually, we can ask numpy to do it for us.

The `np.arange()` function creates a range of numbers with a user-defined step between each value. The stop value itself is not included.

In [10]:
five_times_table = np.arange(0, 55, 5)
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

The `np.linspace()` function produces evenly spaced values between a start and an end point (both included by default), with as many values as you specify.

In [11]:
five_spaced = np.linspace(0,50,11)
print(five_spaced)

[ 0.  5. 10. 15. 20. 25. 30. 35. 40. 45. 50.]
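
To make the difference between the two functions concrete, here is a short side-by-side sketch. The variable names `by_step` and `by_count` are only for this illustration.

```python
import numpy as np

by_step = np.arange(0, 50, 5)        # step of 5; the stop value 50 is excluded
by_count = np.linspace(0, 50, 11)    # 11 values; the stop value 50 is included

print(by_step[-1], len(by_step))     # last value is 45, 10 values in total
print(by_count[-1], len(by_count))   # last value is 50.0, 11 values in total
print(by_step.dtype.kind, by_count.dtype.kind)  # 'i' (integer) vs 'f' (float)
```

This is why the times table above needed a stop of 55 with `np.arange()` but 50 with `np.linspace()`.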


The `np.repeat()` function repeats the object you pass a specified number of times.

In [12]:
twoArr = np.repeat(2, 10)
print(twoArr)

[2 2 2 2 2 2 2 2 2 2]
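
When the object is a sequence rather than a single number, `np.repeat()` repeats each element in place. The sketch below contrasts this with `np.tile()`, which repeats the whole sequence; `np.tile()` is not used elsewhere in this notebook and is shown only for comparison.

```python
import numpy as np

repeated = np.repeat([1, 2], 3)   # each element repeated 3 times in a row
tiled = np.tile([1, 2], 3)        # the whole sequence repeated 3 times

print(repeated)   # [1 1 1 2 2 2]
print(tiled)      # [1 2 1 2 1 2]
```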


The `np.eye()` function creates an identity matrix/array for us.

In [13]:
identity_matrix = np.eye(6)
print(identity_matrix)

[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]


# Operations

There are many, many operations which we can perform on arrays. Below, we demonstrate a few.

What is happening in each line?

In [14]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [15]:
print("1:", 2 * five_times_table)
print("2:", 10 + five_times_table)
print("3:", five_times_table - 1)
print("4:", five_times_table/5)
print("5:", five_times_table **2)
print("6:", five_times_table < 20)

1: [  0  10  20  30  40  50  60  70  80  90 100]
2: [10 15 20 25 30 35 40 45 50 55 60]
3: [-1  4  9 14 19 24 29 34 39 44 49]
4: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
5: [   0   25  100  225  400  625  900 1225 1600 2025 2500]
6: [ True  True  True  True False False False False False False False]
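
The boolean array from line 6 can be put to work directly. As a minimal sketch (the names `mask`, `small` and `count` are chosen for this example), we can use it to filter the array or to count matches:

```python
import numpy as np

five_times_table = np.arange(0, 55, 5)

mask = five_times_table < 20      # elementwise comparison gives a boolean array
small = five_times_table[mask]    # keep only the entries where mask is True
count = mask.sum()                # each True counts as 1

print(small)   # [ 0  5 10 15]
print(count)   # 4
```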


### Speed Test

If we compare the speed of these operations on an ndarray with the same operations in core Python, we will notice a substantial difference.

In [None]:
fives_list = list(range(0,5001,5))
fives_list

In [17]:
five_times_table_lge = np.arange(0,5001,5)
five_times_table_lge

array([   0,    5,   10, ..., 4990, 4995, 5000])

In [20]:
%timeit five_times_table_lge + 5

The slowest run took 733.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.33 µs per loop


In [21]:
%timeit [e + 5 for e in fives_list]

10000 loops, best of 5: 59.2 µs per loop
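
`%timeit` only works inside IPython or a notebook. As a rough sketch of the same comparison in a plain Python script, we can use the standard library `timeit` module. The repeat count `n_runs` is an arbitrary choice, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

fives_list = list(range(0, 5001, 5))
fives_array = np.arange(0, 5001, 5)

# Both approaches should give the same values before we compare their speed
same_result = [e + 5 for e in fives_list] == (fives_array + 5).tolist()

n_runs = 1000  # arbitrary repeat count for this sketch
list_time = timeit.timeit(lambda: [e + 5 for e in fives_list], number=n_runs)
array_time = timeit.timeit(lambda: fives_array + 5, number=n_runs)

print("same result:", same_result)
print(f"list comprehension: {list_time / n_runs * 1e6:.2f} microseconds per loop")
print(f"ndarray addition:   {array_time / n_runs * 1e6:.2f} microseconds per loop")
```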


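Boolean string operations can also be performed on ndarrays. Note that `np.isin(words, 'e')` and `"e" in words` test whether an element is equal to `'e'`, not whether it contains the letter, so both come back False below. The list comprehension checks each word for the substring instead.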

In [25]:
words = np.array(["ten", "nine", "eight", "seven", "six"])

print(np.isin(words, 'e'))

print("e" in words)
["e" in word for word in words]

[False False False False False]
False


[True, True, True, True, False]
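
If we want the substring test as an ndarray rather than a Python list, one option is `np.char.find()`, sketched below. The name `has_e` is made up for this example.

```python
import numpy as np

words = np.array(["ten", "nine", "eight", "seven", "six"])

# np.char.find returns the index of the first match, or -1 when there is none
has_e = np.char.find(words, "e") >= 0
print(has_e)          # [ True  True  True  True False]
print(words[has_e])   # only the words that contain an "e"

# np.isin matches complete elements, so it is the right tool for whole words
whole_words = np.isin(words, ["ten", "six"])
print(whole_words)
```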

# Transpose

In [26]:
people.shape = (2, 6)
print(people, "\n")
print(people.T)

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]] 

[[10 32]
 [ 5 18]
 [ 8 26]
 [32 60]
 [65 55]
 [43 65]]


# Data Types

As previously mentioned, ndarrays can only have one data type. If we want to obtain or change this, we use the `.dtype` attribute.

In [27]:
people.dtype

dtype('int64')

What is the data type of the below ndarray?

In [28]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
ages_with_strings

array(['10', '5', '8', '32', '65', '43'], dtype='<U21')

What is the dtype of this array?

In [29]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'], dtype='int32')
ages_with_strings

array([10,  5,  8, 32, 65, 43], dtype=int32)

What do you think has happened here?

In [30]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)

['10' '5' '8' '32' '65' '43']


In [31]:
ages_with_strings.dtype = 'int32'
print(ages_with_strings)

[49 48  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 53  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 56  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 51 50  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0 54 53  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0 52 51  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0]


In [32]:
ages_with_strings.size

126

In [33]:
ages_with_strings.size/21

6.0

In [34]:
np.array([10, 5, 8, '32', '65', '43']).size

6

The correct way to have changed the data type of the ndarray would have been to use the `.astype()` method, demonstrated below.

In [35]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)
print(ages_with_strings.astype('int32'))

['10' '5' '8' '32' '65' '43']
[10  5  8 32 65 43]
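
To see where the 126 values came from, here is a smaller sketch of the same effect. It assumes a fixed-width `<U2` string dtype (little-endian, 2 characters, 4 bytes per character) so that the result is short enough to read. `.view()` reinterprets the existing bytes, much like assigning to `.dtype`, whereas `.astype()` converts every value.

```python
import numpy as np

# "<U2": little-endian strings of up to 2 characters, 4 bytes per character
strings = np.array(["10", "5"], dtype="<U2")

# .view() reinterprets the same bytes: every character slot becomes one int32
raw = strings.view("<i4")
print(raw)      # [49 48 53  0] : code points of '1', '0', '5', then padding

# .astype() parses each string and builds a new integer array
parsed = strings.astype("int32")
print(parsed)   # [10  5]
```

In the notebook output above each string occupies 21 character slots (`<U21`), which is why 6 strings turned into 6 x 21 = 126 integers.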


### Exercise

* #### Create an array of string numbers, but use dtype to make it an array of floats.
* #### Transpose the matrix, printing the new size and shape.
* #### Use the .astype() method to convert the array to boolean.

## Array Slicing Operations

As before, we can use square brackets and indices to access individual values, and the colon operator to slice the array.

In [36]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [40]:
five_times_table[0]

0

In [41]:
five_times_table[-1]

50

In [38]:
five_times_table[:4]

array([ 0,  5, 10, 15])

In [39]:
five_times_table[4:]

array([20, 25, 30, 35, 40, 45, 50])

We can also slice an n-dim ndarray, specifying the slice operation across each axis.

In [42]:
print(people)
people[:3, :3]

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]]


array([[10,  5,  8],
       [32, 18, 26]])
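
Notice that `people[:3, :3]` asks for three rows, but only two exist; slice bounds past the end of an axis are simply clipped. A few more 2D slices are sketched below, with variable names chosen for this example.

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],
                   [32, 18, 26, 60, 55, 65]])

ages = people[0]               # first row: all six ages
first_person = people[:, 0]    # first column: age and weight of the first person
last_two = people[:, -2:]      # both rows, last two columns

print(ages)
print(first_person)     # [10 32]
print(last_two.shape)   # (2, 2)
```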

### Exercise

* Create a numpy array with 50 zeros
* Create a np array of 2 repeated 20 times
* Create a numpy array from 0 to 2 $\pi$ in steps of 0.1

For one of the arrays generated:
* Get the first five values
* Get the last 3 values
* Get the 4th value to the 7th value

We can reverse an array by using `np.flip()` or by using the `::` operator.

In [43]:
reverse_five_times_table = np.flip(five_times_table)
reverse_five_times_table

array([50, 45, 40, 35, 30, 25, 20, 15, 10,  5,  0])

In [44]:
reverse_five_times_table = five_times_table[-1::-1]
print(reverse_five_times_table)
five_times_table

[50 45 40 35 30 25 20 15 10  5  0]


array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

We can also use the `::` operator to select steps of the original array.

In [45]:
five_times_table[0::3] #Every 3rd element starting from 0

array([ 0, 15, 30, 45])
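
One thing worth knowing about slices: a basic slice is a view onto the original data, not a copy. The sketch below (names are illustrative) shows that writing to the slice changes the original array, and that `.copy()` avoids this.

```python
import numpy as np

five_times_table = np.arange(0, 55, 5)

every_third = five_times_table[::3]          # a view onto the same data
independent = five_times_table[::3].copy()   # a separate copy

every_third[0] = -1   # writes through to five_times_table[0]

print(five_times_table[0])   # -1
print(independent[0])        # 0
print(np.shares_memory(every_third, five_times_table))   # True
```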

### Exercise
Take one of the arrays you defined and
* #### Reverse it
* #### Only keep every 4th element.
* #### Get every 2nd element, starting from the last and moving backwards.

# Stats

In [46]:
np.array([1.65432, 5.98765]).round(2)

array([1.65, 5.99])

In [53]:
nums = np.arange(0, 4, 0.2555)
print(nums)


[0.     0.2555 0.511  0.7665 1.022  1.2775 1.533  1.7885 2.044  2.2995
 2.555  2.8105 3.066  3.3215 3.577  3.8325]


### Exercise

* Compute min, max, sum, mean, median, variance, and standard deviation of the above array, all to 2 decimal places.

In [50]:
print("min = ", np.min(nums).round(2))
print("max = ", np.max(nums).round(2))
print("sum = ", np.sum(nums).round(2))
print("mean = ", np.mean(nums).round(2))
print("median = ", np.median(nums).round(2))
print("var = ", np.var(nums).round(2))
print("std = ", np.std(nums).round(2))

min =  0.0
max =  3.83
sum =  30.66
mean =  1.92
median =  1.92
var =  1.39
std =  1.18
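
By default `np.var()` and `np.std()` divide by N (the population formulas). If the sample versions, which divide by N - 1, are wanted, the `ddof` argument can be set to 1. A short sketch, with variable names chosen for this example:

```python
import numpy as np

nums = np.arange(0, 4, 0.2555)
n = len(nums)

population_var = np.var(nums)       # divides by N (ddof=0 is the default)
sample_var = np.var(nums, ddof=1)   # divides by N - 1

print("population variance:", population_var.round(2))
print("sample variance:    ", sample_var.round(2))
```

The two differ by a factor of N / (N - 1), so the gap shrinks as the array gets longer.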


## Random

With `np.random`, we can generate many kinds of random data, for example to create synthetic training data.

The below code simulates a fair coin toss.

In [63]:
flip = np.random.choice([0,1], 10)
flip

array([0, 1, 0, 0, 1, 1, 1, 0, 0, 0])
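
The results above change on every run. If repeatable results are needed, one option is a seeded generator from `np.random.default_rng()`, sketched below. The seed value 42 is an arbitrary choice for this example.

```python
import numpy as np

seed = 42  # arbitrary value chosen for this sketch
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

flips_a = rng_a.choice([0, 1], size=10)
flips_b = rng_b.choice([0, 1], size=10)

print(flips_a)
print(np.array_equal(flips_a, flips_b))   # True: same seed, same sequence
```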

In [64]:
np.random.rand(10,20,9)

array([[[0.27936558, 0.22004276, 0.76511102, ..., 0.28892422,
         0.08914636, 0.91681944],
        [0.94122705, 0.35349268, 0.99179428, ..., 0.6103127 ,
         0.80911584, 0.50305192],
        [0.77479091, 0.94565195, 0.23718674, ..., 0.65914748,
         0.44910527, 0.7044699 ],
        ...,
        [0.56805022, 0.11746424, 0.70986205, ..., 0.02733194,
         0.28265283, 0.0895069 ],
        [0.37821928, 0.31413665, 0.22861843, ..., 0.65755062,
         0.6736465 , 0.02197178],
        [0.69790934, 0.47729171, 0.99422577, ..., 0.66730939,
         0.20683504, 0.93312463]],

       [[0.67606831, 0.91139633, 0.52920012, ..., 0.49804378,
         0.40389858, 0.01472275],
        [0.47051892, 0.8742938 , 0.52410817, ..., 0.7742399 ,
         0.39595548, 0.11767103],
        [0.25928808, 0.05499513, 0.00100028, ..., 0.33780217,
         0.19903755, 0.2762968 ],
        ...,
        [0.127438  , 0.77906968, 0.31145205, ..., 0.64064028,
         0.79712707, 0.9213714 ],
        ...

We can produce 1000 data points from a normal distribution by using `np.random.normal()`.

In [65]:
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)
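
As a quick sanity check, the sample mean and standard deviation should land close to `mu` and `sigma`. The sketch below uses a seeded generator (the seed 0 is an arbitrary choice) so that the numbers are repeatable; it is an illustration, not part of the original cell.

```python
import numpy as np

mu, sigma = 0, 0.1               # mean and standard deviation
rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
s = rng.normal(mu, sigma, 1000)

sample_mean = s.mean()
sample_std = s.std(ddof=1)

print("sample mean:", sample_mean.round(3))
print("sample std: ", sample_std.round(3))
```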

### Exercise
* Simulate a six-sided die using numpy.random.choice(), and generate a list of the values you would obtain from 10 throws.
* Simulate a two-sided coin toss that is NOT fair: heads is twice as likely as tails.


In [95]:
# np.random.randint excludes the upper bound, so use 7 to include 6
throw = np.random.randint(1, 7, 10)
print(throw)

# Ten throws of a six-sided die, one np.random.choice call per throw
for i in range(10):
    print(np.random.choice([1, 2, 3, 4, 5, 6]))

# Unfair coin: heads is twice as likely as tails
for i in range(10):
    print(np.random.choice(["heads", "tails"], p=[2/3, 1/3]))



[5 3 1 2 1 5 4 5 3 3]
4
4
3
5
4
3
3
6
3
3
tails
tails
heads
heads
tails
heads
heads
heads
heads
heads
