<center>
<table style="border:none">
    <tr style="border:none">
    <th style="border:none">
        <a  href='https://colab.research.google.com/github/AmirMardan/ml_course/blob/main/2_numpy/0_intro_to_numpy.ipynb'><img src='https://colab.research.google.com/assets/colab-badge.svg'></a>
    </th>
    <th style="border:none">
        <a  href='https://github1s.com/AmirMardan/ml_course/blob/main/2_numpy/0_intro_to_numpy.ipynb'><img src='../imgs/open_vscode.svg' height=20px width=115px></a>
    </th>
    </tr>
</table>
</center>


This notebook was created by <a href='https://github.com/AmirMardan'> Amir Mardan</a>. For any feedback or suggestions, please contact me via <a href="mailto:mardan.amir.h@gmail.com">email</a> (mardan.amir.h@gmail.com).



<center>
<img src='img/numpy.png' width='300px'>
</center>

<a name='top'></a>
# Introduction to NumPy

This notebook will cover the following topics:

- [Introduction](#introduction)
- [NumPy vs list](#numpy_list) 
- [1. Creating a NumPy array](#creating) 
    - [Creating arrays from lists](#creating_with_list) 
    - [Special arrays](#special_array) 
- [2. Attributes of arrays](#attributes_array) 
- [3. Data Selection](#data_election) 
    - [Array indexing](#indexing) 
    - [Array slicing](#slicing) 
    - [Array view vs copy](#view_copy) 
    - [Conditional selection](#conditional) 
- [4. Array manipulation](#manipulation) 
    - [Shape of an array](#shape) 
    - [Joining arrays](#joining) 
    - [Splitting of arrays](#splitting) 
- [5. Computation on NumPy arrays ](#computation) 
- [6. Aggregations](#aggregations) 
    - [Summation](#summation) 
    - [Minimum and maximum](#min_max) 
    - [Variance and standard deviation](#var_std) 
    - [Mean and median](#mean_median) 
    - [Find index](#find_index) 

<a name='introduction'></a>
## Introduction 


NumPy is a library for working with large, multi-dimensional arrays and matrices.
It was created by **Travis Oliphant**, building on its predecessor *Numeric*, which was first released in 1995; NumPy 1.0 was released in 2006.

<center><img src='./img/travis.jpeg' alt='travis' width=300px></center>

The array object in NumPy is called `ndarray` (short for *N-dimensional array*).

<a name='numpy_list'></a>
## NumPy vs list 


**Advantages of using NumPy arrays over Python lists**

- NumPy arrays take less memory
- NumPy is faster
- NumPy offers more functionality

Let's check whether these statements are true. But first, we need to tell Python that we want to use NumPy, which we do by importing the package. Generally, a Python module is imported at the beginning of a script as

```Python
 import module
```
For convenience, Python lets us pick a nickname (alias) for an imported module using the keyword `as`:

```Python
 import module as nickname
```

In [1]:
# Import numpy 

import numpy as np

n = 1000

# Make an array of zeros
numpy_version = np.zeros(n)

list_version = list(numpy_version)

print(type(list_version), type(numpy_version))

<class 'list'> <class 'numpy.ndarray'>


<div class="alert alert-block alert-danger">
<b>Danger:</b> Please note that the addition of two lists causes concatenation!
</div>
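For example, here is a small sketch (the names `list_a` and `list_b` are only illustrative) contrasting `+` on lists and on arrays:

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]

# `+` on lists concatenates them
print(list_a + list_b)                      # [1, 2, 3, 10, 20, 30]

# `+` on NumPy arrays adds them element by element
print(np.array(list_a) + np.array(list_b))  # [11 22 33]
```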

In [2]:
def numpy_based():
    return numpy_version + 5


def list_based():
    return [list_version[i] + 5 for i in range(len(list_version))]

In [4]:
%timeit list_based()
%timeit numpy_based()


206 µs ± 2.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.25 µs ± 43.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Many tutorials use the example above to show that NumPy is faster, but the function `list_based` is not implemented efficiently. Let's rewrite it with `map()`, as we learned in the introduction to Python. Keep in mind that `map()` is *lazy*: it returns an iterator and computes nothing until its values are consumed.

In [46]:
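def list_based():
    add = lambda a: a + 5
    # map() is lazy: it returns an iterator without computing any values yet
    return map(add, list_version)


%timeit list_based()
%timeit numpy_based()
print('The list version looks faster, but only the creation of the lazy map object was timed.')


195 ns ± 8.84 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
1.33 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The list version looks faster, but only the creation of the lazy map object was timed.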


In [83]:
# Let's check the size and values 

a = numpy_based()
b = list_based()

c = list(b)
# np.array_equal checks that the shapes and the values match
if np.array_equal(a, c):
    print("They're equal!\nNumPy: ", a.shape, "\nlist: ", np.shape(c))
else:
    print('Wrong!')


They're equal!
NumPy:  (1000,) 
list:  (1000,)
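Because `map()` is lazy, a fairer comparison forces it to produce every value, for example by wrapping it in `list()`. The sketch below uses the standard `timeit` module instead of the `%timeit` magic so it also runs outside Jupyter; `list_based_eager` is a hypothetical name, and the timings depend on your machine, so no numbers are shown.

```python
import timeit
import numpy as np

n = 1000
numpy_version = np.zeros(n)
list_version = list(numpy_version)


def numpy_based():
    return numpy_version + 5


def list_based_eager():
    # list() consumes the map iterator, so every addition is actually performed
    return list(map(lambda a: a + 5, list_version))


# Total time for 1000 calls of each function
t_list = timeit.timeit(list_based_eager, number=1000)
t_numpy = timeit.timeit(numpy_based, number=1000)
print(f"list + map: {t_list:.4f} s, NumPy: {t_numpy:.4f} s")
```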


Still, NumPy offers much more functionality, and NumPy functions are faster on NumPy arrays than on Python lists, which they first have to convert into arrays.

In [84]:
def numpy_based():
    return np.mean(numpy_version)


def list_based():
    return np.mean(list_version)


In [91]:
%timeit list_based()
%timeit numpy_based()

42.3 µs ± 965 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
6.39 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
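The memory claim can be checked roughly as well. This is a sketch under simplifying assumptions: `nbytes` counts only the array's data buffer (not the small array header), and `sys.getsizeof` counts the list's pointer storage plus each element object, so both numbers are approximations.

```python
import sys
import numpy as np

n = 1000
numpy_version = np.zeros(n)          # one contiguous block of float64 values
list_version = list(numpy_version)   # a list of n separate scalar objects

# The array's data buffer: 8 bytes per float64 element
array_bytes = numpy_version.nbytes

# The list stores one pointer per element, and every element is its own object
list_bytes = sys.getsizeof(list_version) + sum(sys.getsizeof(x) for x in list_version)

print("NumPy array:", array_bytes, "bytes")
print("Python list:", list_bytes, "bytes")
```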


<a name='creating'></a>
## 1. Creating a NumPy array


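`numpy.array` can be used to create a tensor, i.e., an array with any number of dimensions: a 0-D scalar, a 1-D vector, a 2-D matrix, and so on.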

<center><img src='img/Tensor_01.webp' alt='tensor' width=400px></center>
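A sketch of tensors with 0 to 3 dimensions, printing the `ndim` and `shape` attributes of each (the variable names are only illustrative):

```python
import numpy as np

scalar = np.array(2)                 # 0-D: a single value
vector = np.array([1, 2, 3])         # 1-D
matrix = np.array([[1, 2], [3, 4]])  # 2-D
cube = np.zeros((2, 3, 4))           # 3-D

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, "ndim:", t.ndim, "shape:", t.shape)
```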

In [92]:
# Create 0-D array

a = np.array(2)

print("a = ", a, "; ndim: ", a.ndim)

a =  2 ; ndim:  0


<a name='creating_with_list'></a>
### 1.1 Creating arrays from lists


We can use `np.array()` to create an array from Python lists

```Python
np.array(list_name)
```

In [93]:
# Create 1-D 'float' array

np.array([1.0, 4.3, 8., 9])

array([1. , 4.3, 8. , 9. ])

We can use the parameter `dtype` to specify the type of data,

```Python
np.array(list_name, dtype=desired_type)
```

In [94]:
# Create 1-D 'int' array

np.array([1.0, 4., 8., 9], dtype=np.int32)

array([1, 4, 8, 9], dtype=int32)
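One detail worth knowing when passing an integer `dtype`: floats are truncated toward zero, not rounded. A small sketch (the values are illustrative); note that `np.round` rounds halves to the nearest even value, so `2.5` becomes `2`:

```python
import numpy as np

# Casting floats to an integer dtype truncates toward zero
truncated = np.array([1.7, -1.7, 2.5], dtype=np.int32)
print(truncated)   # [ 1 -1  2]

# Round first if rounding is what you want (halves go to the nearest even value)
rounded = np.round([1.7, -1.7, 2.5]).astype(np.int32)
print(rounded)     # [ 2 -2  2]
```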

In [95]:
# Create 2-D array

np.array([[1.0, 4., 8., 9],
          [2, 4, 1, 3]], dtype=np.float32)

array([[1., 4., 8., 9.],
       [2., 4., 1., 3.]], dtype=float32)

<a name='special_array'></a>
### 1.2 Special arrays


For larger arrays, it's better to use NumPy's dedicated creation functions than to type the values as lists. These special arrays include:

- All zero array
- All one array
- Identity matrix
- Empty array
- Full array
- Random array
- Arrays based on a given range

These arrays are usually created with the following syntax:

```Python
np.zeros(shape=shape_in_tuple, dtype=type_of_data)  # similarly np.ones, np.empty, ...
```


In [96]:
# Creating an array of zeros

np.zeros(shape=(5, 2), dtype=np.float32)

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)

In [97]:
# Creating an array of zeros using zeros_like

np.zeros_like(np.ones(shape=(5, 2), dtype=np.float32))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)

To keep things simple, for the rest of this notebook we let NumPy choose the data type (`float64` by default for these functions).

In [98]:
# Creating an array of ones

np.ones(shape=(5, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [99]:
# Creating an identity matrix
# Note: np.eye takes the numbers of rows and columns as separate arguments, not a shape tuple

np.eye(5, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
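For a strictly square identity matrix there is also `np.identity`, and `np.eye` accepts a `k` parameter that shifts the diagonal of ones. A short sketch:

```python
import numpy as np

# np.identity(n) always builds a square n x n identity matrix
ident = np.identity(3)
print(ident)

# k=1 places the ones one step above the main diagonal
shifted = np.eye(3, k=1)
print(shifted)
```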

In [100]:
# Creating an empty array (its values are uninitialized, i.e. whatever was already in memory)

np.empty(shape=(5, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [101]:
# Creating an empty array using empty_like (its values are uninitialized as well)

np.empty_like(np.zeros((5, 2)))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
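The *full array* from the list above is created with `np.full`, which fills a given shape with one value; `np.full_like` copies the shape and `dtype` of an existing array. A sketch with illustrative fill values:

```python
import numpy as np

# An array of shape (5, 2) where every element is 7.5
filled = np.full(shape=(5, 2), fill_value=7.5)
print(filled)

# Same shape and dtype (float64) as the template, filled with 3
filled_like = np.full_like(np.zeros((5, 2)), fill_value=3)
print(filled_like)
```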

In [102]:
# Creating an array of random numbers (normally distributed)

np.random.normal(loc=0, scale=1, size=(5,2))

array([[ 2.46063116, -1.66138   ],
       [-0.68877872,  0.5324897 ],
       [ 1.70459242, -0.55436171],
       [-1.25038412,  1.53421687],
       [ 0.47963109,  0.708123  ]])

In [103]:
# Creating an array of random numbers (uniformly distributed)

np.random.random((5, 2))

array([[0.13027077, 0.20035318],
       [0.57908804, 0.18882952],
       [0.00391438, 0.59909874],
       [0.3945339 , 0.02842908],
       [0.17698491, 0.09593101]])

<hr>
<div>
<span style="color:#151D3B; font-weight:bold">Question: 🤔</span><p>
Generate an array with shape (10, 10) and random values between 5 and 7.
</div>
<hr>

In [104]:
# Answer



In [105]:
# Uniform random numbers in [0, 1); np.random.rand takes the dimensions as separate arguments

np.random.rand(5, 2)

array([[0.99706409, 0.27631072],
       [0.17084136, 0.07686026],
       [0.6466758 , 0.30451209],
       [0.25820037, 0.95089223],
       [0.35967909, 0.59801397]])

In [106]:
# Creating an array of random integers (high is excluded, so the values range from 1 to 4)

np.random.randint(low=1, high=5, size=(5, 2))

array([[2, 3],
       [3, 1],
       [1, 3],
       [3, 4],
       [3, 4]])
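The cells above use NumPy's legacy `np.random` functions. For new code, NumPy's documentation recommends the `Generator` API created with `np.random.default_rng`; here is a sketch of the equivalent calls, with a fixed seed (42 is an arbitrary choice) so the results are reproducible:

```python
import numpy as np

# A Generator with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(seed=42)

normal = rng.normal(loc=0, scale=1, size=(5, 2))     # like np.random.normal
uniform = rng.random((5, 2))                         # like np.random.random
integers = rng.integers(low=1, high=5, size=(5, 2))  # like np.random.randint (high excluded)

# A new generator with the same seed repeats the same sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(normal, rng_again.normal(loc=0, scale=1, size=(5, 2))))  # True
```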

In [107]:
# Creating an array from start to stop with a given step (stop is excluded)

np.arange(start=1, stop=4, step=0.1)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9])

In [108]:
# Creating num evenly spaced values from start to stop (stop is included)

np.linspace(start=1, stop=4, num=10)

array([1.        , 1.33333333, 1.66666667, 2.        , 2.33333333,
       2.66666667, 3.        , 3.33333333, 3.66666667, 4.        ])

In [109]:
# Creating num values evenly spaced on a log scale, from 10**start to 10**stop

np.logspace(start=1, stop=4, num=10)

array([   10.        ,    21.5443469 ,    46.41588834,   100.        ,
         215.443469  ,   464.15888336,  1000.        ,  2154.43469003,
        4641.58883361, 10000.        ])

<a name='attributes_array'></a>
## 2. Attributes of arrays 


We use attributes to determine some properties of the array such as shape, size, etc.

In [110]:
x = np.random.randint(14, size=(4, 3))

print(x)

[[11 10  5]
 [ 4  6  1]
 [ 7  7  9]
 [ 9  5 12]]


In [111]:
print("Dimension: ", x.ndim) 
print("Shape: ", x.shape) 
print("Size: ", x.size)  # Number of elements
print("Type: ", x.dtype)
print("Element size: ", x.itemsize, 'bytes')  # Size of each element
print("Array size: ", x.nbytes, 'bytes')  # Size of the array

Dimension:  2
Shape:  (4, 3)
Size:  12
Type:  int64
Element size:  8 bytes
Array size:  96 bytes


<a name='data_election'></a>
## 3. Data Selection
 

***Indexing*** is used to select an individual element of an array

***Slicing*** is used to select a part of an array

<a name='indexing'></a>
### 3.1 Array indexing

Please note that Python counts from `0`.

In [112]:
# Note: shape (6, 1) makes x_1d a column vector, which is technically 2-D
x_1d = np.random.random((6, 1))
x_1d

array([[0.82810965],
       [0.56678088],
       [0.98476452],
       [0.84156425],
       [0.82068209],
       [0.4707897 ]])

In [113]:
# Access to the first element

x_1d[0]

array([0.82810965])

In [114]:
# Access to the last element

x_1d[-1]

array([0.4707897])

In [115]:
x_2d = np.random.random((4, 5))
x_2d

array([[0.7626298 , 0.77210095, 0.16127265, 0.86901824, 0.99142288],
       [0.11145316, 0.56758943, 0.585537  , 0.84567627, 0.80371882],
       [0.05046367, 0.08834924, 0.05348962, 0.61359812, 0.86721375],
       [0.12210604, 0.43084803, 0.03230398, 0.18576835, 0.15799014]])

In [116]:
x_2d[0, 0]

0.7626297954998034

In [117]:
x_2d[0][0]

0.7626297954998034

<hr>
<div>
<span style="color:#151D3B; font-weight:bold">Question: 🤔</span><p>
What would be the result of <code>x_2d[0, 0]</code>?
</div>
<hr>

In [118]:
# Answer

<a name='slicing'></a>
### 3.2 Array slicing

For 1-D slicing, we use the following syntax:

```Python
sliced = original[start:stop:step]
```

If `step` is omitted, it defaults to `1`. If `start` is omitted, the slice begins at the start of the array; if `stop` is omitted, it runs to the end.


In [119]:
x_1d

array([[0.82810965],
       [0.56678088],
       [0.98476452],
       [0.84156425],
       [0.82068209],
       [0.4707897 ]])

In [120]:
# Specify the start and end of the desired section by number

x_1d[1:4]

array([[0.56678088],
       [0.98476452],
       [0.84156425]])

<div class="alert alert-block alert-info">
<b>Tip:</b> Note that the element at index 1 is included, but the element at index 4 is not: the stop index is exclusive.</div>

In [121]:
# Omit the start or the stop to slice from the beginning or up to the end of the array

x_1d[1:]

array([[0.56678088],
       [0.98476452],
       [0.84156425],
       [0.82068209],
       [0.4707897 ]])

In [122]:
# Slicing with a step greater than 1

x_1d[1::2]

array([[0.56678088],
       [0.84156425],
       [0.4707897 ]])

In [123]:
# We can reverse the array with a negative step
# (x_1d[-1:0:-1] would stop before index 0 and drop the first element)

x_1d[::-1]

array([[0.4707897 ],
       [0.82068209],
       [0.84156425],
       [0.98476452],
       [0.56678088],
       [0.82810965]])

In [124]:
# Slicing in 2-D: rows from index 1 to the end, columns 3 and 4

x_2d[1:, 3:5]

array([[0.84567627, 0.80371882],
       [0.61359812, 0.86721375],
       [0.18576835, 0.15799014]])

<a name='view_copy'></a>
### 3.3 Array view vs copy

An extremely important point is that array slices return *views* of the data rather than *copies*.
This is another difference between NumPy arrays and Python lists. 
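A small sketch of that difference (the names are illustrative): slicing a list creates a new list, while slicing an array gives a view that shares memory with the original.

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_arr = np.array([1, 2, 3, 4])

# Slicing a list creates a new list (a copy)
list_slice = py_list[:2]
list_slice[0] = 100

# Slicing an array creates a view on the same data
arr_slice = np_arr[:2]
arr_slice[0] = 100

print(py_list)   # [1, 2, 3, 4]
print(np_arr)    # [100   2   3   4]
```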

In [125]:
print(x_1d)

x_1d_sliced = x_1d[0]  # a view of the first row, since x_1d is 2-D

x_1d_sliced *= 2

print("======== \n", x_1d)


[[0.82810965]
 [0.56678088]
 [0.98476452]
 [0.84156425]
 [0.82068209]
 [0.4707897 ]]
 [[1.65621929]
 [0.56678088]
 [0.98476452]
 [0.84156425]
 [0.82068209]
 [0.4707897 ]]


To avoid modifying the original array by accident, use the `copy()` method:

In [126]:
print(x_1d)

x_1d_sliced = x_1d[0].copy()

x_1d_sliced *= 2

print("======== \n", x_1d)

[[1.65621929]
 [0.56678088]
 [0.98476452]
 [0.84156425]
 [0.82068209]
 [0.4707897 ]]
 [[1.65621929]
 [0.56678088]
 [0.98476452]
 [0.84156425]
 [0.82068209]
 [0.4707897 ]]


<a name='conditional'></a>
### 3.4 Conditional selection

We can use a condition to select a part of an array.

In [127]:
x_1d

array([[1.65621929],
       [0.56678088],
       [0.98476452],
       [0.84156425],
       [0.82068209],
       [0.4707897 ]])

In [128]:
# Let's create a condition

cond = x_1d > 0.7
cond

array([[ True],
       [False],
       [ True],
       [ True],
       [ True],
       [False]])

In [129]:
# Select the data based on the condition

x_1d[cond]

array([1.65621929, 0.98476452, 0.84156425, 0.82068209])

In [130]:
# Let's create a 2-D array

x_2d_int = np.random.randint(12, size=(5, 6))

x_2d_int

array([[ 6, 10,  7,  6,  3,  8],
       [ 5,  1,  4,  8,  6, 11],
       [ 2,  7,  2, 10,  9,  0],
       [ 5,  0,  1,  2,  9,  2],
       [ 6, 10,  8,  1, 11,  9]])

In [131]:
# Let's pull out the even numbers

x_2d_int[x_2d_int % 2 == 0]

array([ 6, 10,  6,  8,  4,  8,  6,  2,  2, 10,  0,  0,  2,  2,  6, 10,  8])
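Several conditions can be combined with the element-wise operators `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons, and Python's `and`/`or` raise an error on arrays. A sketch on a small fixed array (`data` is illustrative):

```python
import numpy as np

data = np.array([[6, 10, 7, 6],
                 [5, 1, 4, 8],
                 [2, 7, 2, 10]])

# Values greater than 3 AND less than 8
between = data[(data > 3) & (data < 8)]

# Values less than 2 OR greater than 9
outside = data[(data < 2) | (data > 9)]

print(between)   # [6 7 6 5 4 7]
print(outside)   # [10  1 10]
```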

<hr>
<div>
<span style="color:#151D3B; font-weight:bold">Question: 🤔</span><p>
Using conditional indexing, pull out numbers in <code>x_2d_int</code> that are divisible by both 2 and 7.
</div>
<hr>

In [132]:
# Answer



<a name='manipulation'></a>
## 4. Array manipulation


Data manipulation is an important step in any study.

<a name='shape'></a>
### 4.1 Shape of an array


In [133]:
# Let's create some arrays

arr_1d = np.arange(10)
arr_2d = 5 * np.random.random((4, 5))


In [134]:
arr_1d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [135]:
arr_2d

array([[3.1096913 , 0.15088387, 3.65064933, 0.97717337, 3.97429903],
       [3.33919848, 1.11834939, 2.05667255, 0.824624  , 3.66193255],
       [1.42704621, 0.09108845, 3.27880682, 4.54264282, 1.3729306 ],
       [4.29856289, 0.47991767, 0.6478553 , 2.83540822, 0.15488189]])

In [136]:
# Shape of an array can be obtained in 2 ways

print(arr_1d.shape, np.shape(arr_1d))

(10,) (10,)


In [137]:
arr_2d.shape

(4, 5)

We can easily reshape a NumPy array, as long as the total number of elements stays the same:


```Python

array.reshape(rows, columns)

np.reshape(array, (rows, columns))

```

In [138]:
arr_2d.reshape(5, 4)

array([[3.1096913 , 0.15088387, 3.65064933, 0.97717337],
       [3.97429903, 3.33919848, 1.11834939, 2.05667255],
       [0.824624  , 3.66193255, 1.42704621, 0.09108845],
       [3.27880682, 4.54264282, 1.3729306 , 4.29856289],
       [0.47991767, 0.6478553 , 2.83540822, 0.15488189]])

In [139]:
np.reshape(arr_2d, (5, 4))  # shape passed positionally; the newshape keyword is deprecated since NumPy 2.1

array([[3.1096913 , 0.15088387, 3.65064933, 0.97717337],
       [3.97429903, 3.33919848, 1.11834939, 2.05667255],
       [0.824624  , 3.66193255, 1.42704621, 0.09108845],
       [3.27880682, 4.54264282, 1.3729306 , 4.29856289],
       [0.47991767, 0.6478553 , 2.83540822, 0.15488189]])
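One of the dimensions can be given as `-1`, and NumPy infers it from the total size. A sketch on a 12-element array; a shape whose sizes don't multiply to 12 raises a `ValueError`:

```python
import numpy as np

arr = np.arange(12)

# -1 lets NumPy compute that dimension: 12 / 3 = 4 and 12 / 6 = 2
print(arr.reshape(3, -1).shape)   # (3, 4)
print(arr.reshape(-1, 6).shape)   # (2, 6)

# 5 * 3 = 15 elements cannot hold 12 values
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```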

In [140]:
# We can flatten an array using the method flatten(), which returns a copy

arr_2d.flatten()


array([3.1096913 , 0.15088387, 3.65064933, 0.97717337, 3.97429903,
       3.33919848, 1.11834939, 2.05667255, 0.824624  , 3.66193255,
       1.42704621, 0.09108845, 3.27880682, 4.54264282, 1.3729306 ,
       4.29856289, 0.47991767, 0.6478553 , 2.83540822, 0.15488189])
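Connecting this to views and copies: `flatten()` always returns a copy, whereas `ravel()` returns a view whenever possible (as for the contiguous array below), so writing to it changes the original. A sketch:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

flat_copy = arr.flatten()   # always a new copy
flat_view = arr.ravel()     # a view here, because arr is contiguous in memory

flat_copy[0] = 100          # arr is not affected
flat_view[1] = 200          # writes through to arr[0, 1]

print(arr)
```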

<div class="alert alert-block alert-info">
<b>Tip:</b> <code>reshape()</code> returns a new array and leaves the original unchanged, while the method <code>resize()</code> changes the array in place and returns <code>None</code>.</div>


In [None]:
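print("Original shape: ", arr_2d.shape)

arr_2d.reshape(5, 4)  # returns a new array; arr_2d keeps its shape

print("Shape after using reshape: ", arr_2d.shape)

# resize() returns None and changes arr_2d in place; it raises a ValueError
# if other references to the array exist (refcheck=False skips that check)
arr_2d.resize(5, 4)

print("Shape after using resize: ", arr_2d.shape)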


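<span style='color:red; font-weight:bold;'>Note:</span> The function <code>np.resize()</code> behaves like <code>np.reshape()</code>: it returns a new array and leaves the original unchanged. Unlike <code>reshape</code>, it also accepts a different total size, repeating or truncating the data to fill the new shape.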

In [None]:
print("Original shape: ", arr_2d.shape)

np.reshape(arr_2d, (2, 10))  # returns a new array

print("Shape after using reshape: ", arr_2d.shape)

np.resize(arr_2d, (2, 10))  # also returns a new array

print("Shape after using resize: ", arr_2d.shape)
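Where the function `np.resize` really differs from `reshape` is when the total size changes: it repeats the data to fill a larger shape and truncates it for a smaller one. A sketch on a 3-element array:

```python
import numpy as np

arr = np.array([1, 2, 3])

# Larger: the values 1, 2, 3 are repeated to fill 8 slots
bigger = np.resize(arr, (2, 4))
print(bigger)

# Smaller: only the first 2 values are kept
smaller = np.resize(arr, 2)
print(smaller)

# reshape, in contrast, requires the same number of elements
try:
    arr.reshape(2, 4)
except ValueError as err:
    print("ValueError:", err)
```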

<a name='joining'></a>
### 4.2 Joining arrays 


In [None]:
arr1 = np.array([[1, 2, 0, 1]])

arr2 = 6 * np.random.random((4, 4))

In [None]:
# Check the shape
print('arr1: ', arr1.shape)
print('arr2: ', arr2.shape)

In [None]:
# Joining arrays by row

np.concatenate((arr2, arr1), axis=0)

To concatenate along `axis=0`, the arrays must have the same number of columns; along `axis=1`, they must have the same number of rows.

In [None]:
# Joining arrays by column
# .T returns the transpose, just like np.transpose()
np.concatenate((arr2, arr1.T), axis=1)

In [None]:
# np.vstack stacks arrays vertically, like concatenate with axis=0

np.vstack((arr2, arr1))

In [None]:
# np.hstack stacks arrays horizontally, like concatenate with axis=1

np.hstack((arr2, arr1.T))

In [None]:
# np.column_stack acts like axis=1 here; np.row_stack (an alias of np.vstack) is deprecated since NumPy 2.0

np.column_stack((arr2, arr1.T))
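The difference between `np.column_stack` and `np.hstack` shows up with 1-D arrays: `column_stack` turns each one into a column, while `hstack` joins them end to end. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Each 1-D array becomes a column of a (3, 2) array
stacked_cols = np.column_stack((a, b))
print(stacked_cols)

# 1-D arrays are simply concatenated into one longer 1-D array
joined = np.hstack((a, b))
print(joined)
```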

<a name='splitting'></a>
### 4.3 Splitting of arrays 


The opposite of concatenation is splitting, which NumPy provides through `np.split`, `np.array_split`, `np.hsplit`, and `np.vsplit`.

In [None]:
arr2

In [None]:
# Splitting with axis = 0

np.split(arr2, 2)

In [None]:
# Splitting with axis = 1 (np.array_split also accepts a number of sections that doesn't divide the axis evenly)

np.array_split(arr2, 2, axis=1)

In [None]:
# np.hsplit splits along axis=1 (columns); np.vsplit splits along axis=0 (rows)

np.hsplit(arr2, 2)
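A sketch of the difference between `np.split` and `np.array_split` on a 10-element array, plus splitting at explicit indices:

```python
import numpy as np

arr = np.arange(10)

# np.split requires equal sections; 10 elements can't be split into 3 equal parts
try:
    np.split(arr, 3)
except ValueError as err:
    print("ValueError:", err)

# np.array_split allows unequal pieces: sizes 4, 3, 3
pieces = np.array_split(arr, 3)
print(pieces)

# A list of indices splits at those positions: arr[:3], arr[3:7], arr[7:]
parts = np.split(arr, [3, 7])
print(parts)
```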

<a name='computation'></a>
## 5. Computation on NumPy arrays 


Computation on NumPy arrays can be very fast if we use *vectorized* operations through *universal functions* (**ufuncs**), which apply an operation to every element in compiled code instead of a Python loop.

In [None]:
# Creating two arrays

arr1d_1 = np.arange(7)
arr1d_2 = np.linspace(9, 12, len(arr1d_1))


arr2d_1 = np.random.random((4, 5))
arr2d_2 = 4 + 6 * np.random.random((4, 5))

In [None]:
# Addition of two arrays (equivalent to arr2d_1 + arr2d_2)

np.add(arr2d_1, arr2d_2)

In [None]:
# Subtraction of two arrays (equivalent to arr2d_1 - arr2d_2)

np.subtract(arr2d_1, arr2d_2)

In [None]:
# Element-wise multiplication of two arrays (equivalent to arr2d_1 * arr2d_2)

np.multiply(arr2d_1, arr2d_2)

In [None]:
# Division of two arrays (equivalent to arr2d_1 / arr2d_2)

np.divide(arr2d_1, arr2d_2)
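The arithmetic operators are shorthand for these ufuncs, and an operation between an array and a scalar is applied to every element. A sketch that checks the equivalences on two small arrays:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Each operator gives the same result as the matching ufunc
assert np.array_equal(a + b, np.add(a, b))
assert np.array_equal(a - b, np.subtract(a, b))
assert np.array_equal(a * b, np.multiply(a, b))
assert np.array_equal(a / b, np.divide(a, b))

# An operation with a scalar is applied to every element
print(a * 10)   # [10. 20. 30.]
```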

In [None]:
# Base-10 logarithm of each element (np.log is the natural logarithm)

np.log10(arr2d_1)

In [None]:
# Exponential (e**x) of each element

np.exp(arr2d_1)

In [None]:
# Sine of each element (angles in radians)

np.sin(arr2d_1)

In [None]:
# Element-wise 'greater than', equivalent to arr2d_1 > arr2d_2

np.greater(arr2d_1, arr2d_2)

In [None]:
# Element-wise 'less than', equivalent to arr2d_1 < arr2d_2

np.less(arr2d_1, arr2d_2)

In [None]:
# Absolute value of each element (np.abs is an alias of np.absolute)

np.abs([2, -1, 9, -1.2])

<a name='aggregations'></a>
## 6. Aggregations 


Before doing any analysis, it's good to look at summary statistics of the data.

<a name='summation'></a>
###  6.1 Summation 


In [None]:
# Let's create an array

arr1 = np.random.random((100, 100))


In [None]:
# Sum of values in an array

np.sum(arr1)

In [None]:
# Comparing with Python's built-in sum()

big_array = np.random.random(100000)

%timeit sum(big_array)
%timeit np.sum(big_array)
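Like the other aggregations below, `np.sum` accepts an `axis` argument. A sketch on a small 2 x 3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.sum())          # 21: all elements
print(arr.sum(axis=0))    # [5 7 9]: one sum per column
print(arr.sum(axis=1))    # [ 6 15]: one sum per row
```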

<a name='min_max'></a>
###  6.2 Minimum and maximum 


In [None]:
# Let's create an array

arr1 = np.random.random((6, 5))

In [None]:
# Maximum of each column (axis=0 reduces over the rows)

arr1.max(axis=0)

In [None]:
# Maximum of each row (axis=1 reduces over the columns)
arr1.max(axis=1)

In [None]:
# Minimum of each column
arr1.min(axis=0)

In [None]:
# We can use np.max / np.min as well.
np.max(arr1, axis=1)

<a name='var_std'></a>
### 6.3 Variance and standard deviation 


In [None]:
# Let's create an array

arr1 = np.random.random((6, 5))

In [None]:
# Calculating the standard deviation of the array

np.std(arr1)

In [None]:
arr1.std()

In [None]:
# Let's check that std is the square root of the variance
# (np.isclose tolerates floating-point rounding errors, unlike ==)

np.isclose(arr1.std(), np.sqrt(arr1.var()))

In [None]:
# Standard deviation of each row (axis=1)

arr1.std(axis=1)

In [None]:
# Variance of each column (axis=0)

arr1.var(axis=0)
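By default, `np.std` and `np.var` divide by N (the population statistic); passing `ddof=1` divides by N - 1 (the sample statistic). A sketch on a small dataset whose population standard deviation is exactly 2:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# ddof=0 (default): divide by N, the population standard deviation
print(np.std(data))           # 2.0

# ddof=1: divide by N - 1, the sample standard deviation
print(np.std(data, ddof=1))
```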

<a name='mean_median'></a>
###  6.4 Mean and median


In [None]:
# Let's create an array

arr1 = np.random.random((6, 5))

In [None]:
# Calculate the mean of the whole array

arr1.mean()

In [None]:
# Mean of each row (axis=1)

arr1.mean(axis=1)

In [None]:
# Calculate the median of the whole array

np.median(arr1)
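The median is much less sensitive to outliers than the mean. A small sketch with one extreme value (`values` is illustrative):

```python
import numpy as np

values = np.array([1, 2, 3, 4, 100])

print(np.mean(values))    # 22.0: pulled up by the outlier
print(np.median(values))  # 3.0: the middle value
```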

<span style='color:red; font-weight:bold;'>Note: </span> An <code>ndarray</code> has no <code>median()</code> method, so we use the function <code>np.median()</code> instead.

<a name='find_index'></a>
###  6.5 Find index 


In [None]:
# Let's create an array

arr1 = np.random.random((6, 5))
arr1[3, 3] = 0
arr1

In [None]:
# Find the index of the maximum value (in the flattened array)

arr1.argmax()

In [None]:
# Find the index of the minimum value (in the flattened array)

arr1.argmin()

In [None]:
# Find the indices of a specific value (one array of row indices, one of column indices)

np.where(arr1 == 0)
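Since `argmax()` and `argmin()` return an index into the flattened array by default, `np.unravel_index` converts it back to (row, column); with an `axis` argument they return one index per column or row instead. A sketch on a small fixed array:

```python
import numpy as np

arr = np.array([[3, 8, 1],
                [9, 0, 4]])

flat_idx = arr.argmax()                           # index in the flattened array
row, col = np.unravel_index(flat_idx, arr.shape)  # convert to (row, column)
print(flat_idx, row, col)                         # 3 1 0

# Index of the maximum in each column / each row
print(arr.argmax(axis=0))   # [1 0 1]
print(arr.argmax(axis=1))   # [1 0]
```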

### [TOP ☝️](#top)
