<center>
<table style="border:none">
    <tr style="border:none">
    <th style="border:none">
        <a  href='https://colab.research.google.com/github/AmirMardan/ml_course/blob/main/2_numpy/0_intro_to_numpy.ipynb'><img src='https://colab.research.google.com/assets/colab-badge.svg'></a>
    </th>
    <th style="border:none">
        <a  href='https://github1s.com/AmirMardan/ml_course/blob/main/2_numpy/0_intro_to_numpy.ipynb'><img src='../imgs/open_vscode.svg' height=20px width=115px></a>
    </th>
    </tr>
</table>
</center>


This notebook was created by <a href='https://github.com/AmirMardan'> Amir Mardan</a>. For any feedback or suggestions, please contact me via <a href="mailto:mardan.amir.h@gmail.com">email</a> (mardan.amir.h@gmail.com).



<center><img id='PYTHON' src='img/numpy.png' width='300px'></center>

This notebook will cover the following topics:

- <a href='#introduction'>Introduction </a>
- <a href='#numpy_list'> NumPy vs list </a>
- <a href='#creating'> 1. Creating a NumPy array </a>
    - <a href='#creating_with_list'> Creating arrays from lists </a>
    - <a href='#special_array'> Special arrays </a>
- <a href='#attributes_array'> 2. Attributes of arrays </a>
- <a href='#data_election'> 3. Data Selection </a>
    - <a href='#indexing'> Array indexing </a>
    - <a href='#slicing'> Array slicing </a>
    - <a href='#view_copy'> Array view vs copy </a>
    - <a href='#conditional'> Conditional selection </a>
- <a href='#manipulation'> 4. Array manipulation </a>
    - <a href='#shape'> Shape of an array </a>
    - <a href='#joining'> Joining arrays </a>
    - <a href='#splitting'> Splitting of arrays </a>
- <a href='#computation'> 5. Computation on NumPy arrays </a>
- <a href='#aggregations'> 6. Aggregations </a>
    - <a href='#summation'> Summation </a>
    - <a href='#min_max'> Minimum and maximum </a>
    - <a href='#var_std'> Variance and standard deviation </a>
    - <a href='#mean_median'> Mean and median </a>
    - <a href='#find_index'> Find index </a>

# <span id='introduction'>Introduction </span>


NumPy is a library for working with large, multi-dimensional arrays and matrices.
It was created by **Travis Oliphant** in 2005 by merging features of *Numarray* into its predecessor *Numeric* (first released in 1995), and NumPy 1.0 was released in 2006.

<center><img src='./img/travis.jpeg' alt='travis' width=300px></center>

The array object in NumPy is called `ndarray`.

# <span id='numpy_list'> NumPy vs list </span>


**Advantages of using NumPy arrays over Python lists**

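- NumPy arrays take less memory (elements of a single fixed type are stored contiguously, without a Python object per element)
- NumPy is faster (vectorized operations run in compiled code instead of Python loops)
- NumPy has richer functionality (broadcasting, linear algebra, random number generation, and more)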
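The memory claim can be checked with a rough sketch like the one below. It assumes CPython, where `sys.getsizeof` reports the list container (one pointer per element) and each Python `float` object has to be counted separately; exact byte counts vary between Python versions and platforms.

```python
import sys

import numpy as np

n = 1000

# A float64 NumPy array stores the raw 8-byte values contiguously
arr = np.zeros(n)
array_bytes = arr.nbytes

# A list of Python floats stores pointers to separate float objects
# (byte counts assume CPython and vary by version)
lst = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print("NumPy array:", array_bytes, "bytes")
print("Python list:", list_bytes, "bytes")
```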

Let's check whether these statements are true. First, we need to tell Python that we want to use NumPy, which we do by importing the package. A Python package is usually imported at the beginning of a script as

```Python
import module
```
For convenience, Python lets us pick a nickname (alias) for the imported module with the keyword `as`:

```Python
import module as nickname
```

In [1]:
# Import numpy 

import numpy as np

n = 1000

# Make an array of zeros
numpy_version = np.zeros(n)

list_version = list(numpy_version)

print(type(list_version), type(numpy_version))

<class 'list'> <class 'numpy.ndarray'>


<div class="alert alert-block alert-danger">
<b>Danger:</b> Note that adding two lists with <code>+</code> concatenates them instead of adding them element-wise!
</div>
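A minimal comparison of the `+` operator on lists and on NumPy arrays, using two small illustrative lists:

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]

# For lists, + concatenates
list_sum = list_a + list_b

# For NumPy arrays, + adds element-wise
array_sum = np.array(list_a) + np.array(list_b)

print(list_sum)   # [1, 2, 3, 10, 20, 30]
print(array_sum)  # [11 22 33]
```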

In [2]:
# Check the efficiency

def numpy_based():
    # return [numpy_version[i] + 5 for i in range(len(numpy_version))]
    return  numpy_version + 5
    
    
def list_based():
    # return [list_version[i] + 5 for i in range(len(list_version))]
    return list_version + [5]


In [3]:
# Adding a list to a list concatenates them, so b has one extra element!

a = numpy_based()
b = list_based()

print(len(a), len(b))

1000 1001


In [4]:
def numpy_based():
    return [numpy_version[i] + 5 for i in range(len(numpy_version))]
    # return  numpy_version + 5
    
    
def list_based():
    return [list_version[i] + 5 for i in range(len(list_version))]
    # return list_version + [5]

In [5]:
import timeit


t_numpy = timeit.timeit(numpy_based, number=10000)
t_list = timeit.timeit(list_based, number=10000)

print(t_numpy, t_list)

2.7164272049994906 2.309035999000116


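As shown, NumPy appears to be slower! This is mainly because we looped over the array in Python: each `numpy_version[i]` creates a NumPy scalar object, which adds overhead. NumPy is fast when the whole array is processed in a single call.

Let's use the fact that NumPy has <a href='https://numpy.org/doc/stable/user/basics.broadcasting.html'>broadcasting</a>, which adds `5` to every element without a Python loop!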

In [6]:
def numpy_based():
    return  numpy_version + 5
    
    
def list_based():
    return [list_version[i] + 5 for i in range(len(list_version))]


In [7]:
t_numpy = timeit.timeit(numpy_based, number=10000)
t_list = timeit.timeit(list_based, number=10000)

print("NumPy: ", t_numpy, "\nlist: ", t_list)

NumPy:  0.025848785000562202 
list:  2.138067833000605
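Broadcasting is what makes `numpy_version + 5` work: the scalar `5` is stretched to the shape of the array. The same rule applies between two arrays: their dimensions are compared from the last one backward, and each pair must be equal or one of them must be `1` (missing dimensions count as `1`). A small sketch with two illustrative arrays:

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                   # shape (4,), treated as (1, 4)

# (3, 1) and (1, 4) broadcast to (3, 4) without copying data in a loop
table = column * 10 + row

print(table.shape)  # (3, 4)
print(table)
```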


# <span id='creating'>1. Creating a NumPy array </span>


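`numpy.array` can be used to create a tensor, i.e., an array with any number of dimensions: a scalar (0-D), a vector (1-D), a matrix (2-D), and so on.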

<center><img src='img/Tensor_01.webp' alt='tensor' width=400px></center>

## <span id='creating_with_list'> 1.1 Creating arrays from lists </span>


We can use `np.array()` to create an array from Python lists:

```Python
np.array(list_name)
```

In [8]:
# Create 0-D array

a = np.array(2)

print("a = ", a, "; ndim: ", a.ndim, "; shape: ", a.shape)


a =  2 ; ndim:  0 ; shape:  ()


In [9]:
# Create 1-D 'float' array

np.array([1.0, 4.3, 8., 9])

array([1. , 4.3, 8. , 9. ])

We can use the parameter `dtype` to specify the data type:

```Python
np.array(list_name, dtype=desired_type)
```

In [10]:
# Create 1-D 'int' array

np.array([1.0, 4., 8., 9], dtype=np.int32)

array([1, 4, 8, 9], dtype=int32)
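Converting floats to an integer `dtype` truncates toward zero rather than rounding. A short sketch on an illustrative array, using the `astype()` method to convert an existing array:

```python
import numpy as np

values = np.array([1.7, -1.7, 2.5])

# Casting float -> int drops the fractional part (truncation toward zero)
truncated = values.astype(np.int32)

# Round first if the nearest integer is wanted (np.round rounds halves to even)
rounded = np.round(values).astype(np.int32)

print(truncated)
print(rounded)
```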

In [11]:
# Create 2-D array

np.array([[1.0, 4., 8., 9],
          [2, 4, 1, 3]], dtype=np.float32)

array([[1., 4., 8., 9.],
       [2., 4., 1., 3.]], dtype=float32)

## <span id='special_array'> 1.2 Special arrays </span>


For larger arrays, it is better to use NumPy's dedicated creation functions than to type the values in a list. These functions create:

- All zero array
- All one array
- Identity matrix
- Empty array
- Full array
- Random array
- Arrays based on a given range

Most of these functions follow the pattern below, where `function_name` stands for, e.g., `zeros`, `ones`, or `empty`:

```Python
np.function_name(shape=shape_in_tuple, dtype=type_of_data)
```


In [12]:
# Creating an array of zeros

np.zeros(shape=(5, 2), dtype=np.float32)

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)

In [13]:
# Creating an array of zeros using zeros_like

np.zeros_like(np.ones(shape=(5, 2), dtype=np.float32))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)

To keep things simple, we let NumPy choose the data type for the rest of this notebook (for `np.zeros`, `np.ones`, and `np.empty`, the default is `float64`).

In [14]:
# Creating an array of ones

np.ones(shape=(5, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [15]:
# Creating a matrix with ones on the main diagonal (an identity matrix when N == M)
# Note: np.eye takes the number of rows N and columns M, not a shape tuple!

np.eye(5, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [16]:
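# Creating an empty array: np.empty does not initialize the values,
# so they are whatever was already in memory (here they happen to be ones)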

np.empty(shape=(5, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [17]:
# Creating an uninitialized array with the shape and dtype of another array using empty_like

np.empty_like(np.zeros((5, 2)))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
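The list above also mentions *full* arrays, which the cells do not show. A minimal sketch with `np.full`, which fills an array of a given shape with one value, and `np.full_like`, which copies the shape and `dtype` of an existing array (`template` is an illustrative name):

```python
import numpy as np

# A (3, 2) array filled with 7.5
filled = np.full(shape=(3, 2), fill_value=7.5)

# Same shape and dtype as an existing int array, filled with -1
template = np.zeros((2, 4), dtype=np.int64)
filled_like = np.full_like(template, fill_value=-1)

print(filled)
print(filled_like.dtype)  # int64
```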

In [18]:
# Creating an array of random numbers (normally distributed)

np.random.normal(loc=0, scale=1, size=(5,2))

array([[ 1.10950833, -0.18875835],
       [ 1.84775684, -1.38415394],
       [-0.0469232 ,  0.77736926],
       [-0.48782197, -0.0695717 ],
       [-1.62951045, -0.94290243]])

In [19]:
# Creating an array of random numbers (uniformly distributed in [0, 1))

np.random.random((5, 2))

array([[0.7824413 , 0.63774011],
       [0.95011946, 0.18495806],
       [0.55353524, 0.32902158],
       [0.4666023 , 0.97147196],
       [0.84481943, 0.25071222]])

<hr>
<div>
<span style="color:#151D3B; font-weight:bold">Question: 🤔</span><p>
Generate an array with shape (10, 10) and values between 5 and 7.
</div>
<hr>

In [20]:
# Answer



In [21]:
# Creating uniform random numbers in [0, 1); rand takes the dimensions as separate arguments

np.random.rand(5, 2)

array([[0.22350683, 0.88635252],
       [0.4011363 , 0.80940536],
       [0.92025272, 0.68246219],
       [0.80570299, 0.00873666],
       [0.94589966, 0.6470263 ]])

In [22]:
# Creating an array of random integers from low (included) to high (excluded)

np.random.randint(low=1, high=5, size=(5, 2))

array([[2, 1],
       [1, 3],
       [2, 2],
       [1, 2],
       [4, 1]])
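The cells above use the legacy `np.random.*` functions, so their outputs change on every run. For reproducible results, NumPy recommends the `Generator` API created by `np.random.default_rng`. A sketch, where the seed `42` is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

normal = rng.normal(loc=0, scale=1, size=(5, 2))     # Gaussian samples
uniform = rng.random((5, 2))                         # uniform in [0, 1)
integers = rng.integers(low=1, high=5, size=(5, 2))  # integers in [1, 5)

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
same_normal = rng_again.normal(loc=0, scale=1, size=(5, 2))
print(np.array_equal(normal, same_normal))  # True
```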

In [23]:
# Creating evenly spaced values with a fixed step (stop is excluded)

np.arange(start=1, stop=4, step=0.1)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9])

In [24]:
# Creating a fixed number of evenly spaced values (stop is included)

np.linspace(start=1, stop=4, num=10)

array([1.        , 1.33333333, 1.66666667, 2.        , 2.33333333,
       2.66666667, 3.        , 3.33333333, 3.66666667, 4.        ])

In [25]:
# Creating values evenly spaced on a log scale, from 10**start to 10**stop

np.logspace(start=1, stop=4, num=10)

array([   10.        ,    21.5443469 ,    46.41588834,   100.        ,
         215.443469  ,   464.15888336,  1000.        ,  2154.43469003,
        4641.58883361, 10000.        ])
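`np.logspace(start, stop, num)` returns powers of 10 whose exponents are evenly spaced, i.e. `10 ** np.linspace(start, stop, num)`. The sketch below checks this and contrasts the endpoint behavior of `np.arange` and `np.linspace`:

```python
import numpy as np

exponents = np.linspace(start=1, stop=4, num=10)
log_values = np.logspace(start=1, stop=4, num=10)  # base 10 by default
print(np.allclose(log_values, 10 ** exponents))    # True

# arange excludes stop, linspace includes it by default
stepped = np.arange(0, 1, 0.25)
spaced = np.linspace(0, 1, num=5)
print(stepped)
print(spaced)
```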

# <span id='attributes_array'> 2. Attributes of arrays </span>


Arrays have attributes that describe their properties, such as the number of dimensions, shape, size, and data type.

In [26]:
x = np.random.randint(14, size=(4, 3))

print(x)

[[ 9 10  1]
 [ 2  6  3]
 [11  4  9]
 [ 0 13  9]]


In [27]:
print("Dimension: ", x.ndim) 
print("Shape: ", x.shape) 
print("Size: ", x.size) 
print("Type: ", x.dtype)
print("Element size: ", x.itemsize, 'bytes')  # Size of each element
print("Array size: ", x.nbytes, 'bytes')  # Size of the array

Dimension:  2
Shape:  (4, 3)
Size:  12
Type:  int64
Element size:  8 bytes
Array size:  96 bytes


# <span id='data_election'> 3. Data Selection</span>
 

***Indexing*** is used to select an individual element of an array

***Slicing*** is used to select a part of an array

## <span id='indexing'> 3.1 Array indexing</span>

Note that Python indexing starts at `0`.

In [28]:
# Note: with shape (6, 1) this is a column vector, which is technically a 2-D array
x_1d = np.random.random((6, 1))
x_1d

array([[0.48568789],
       [0.7592944 ],
       [0.04604526],
       [0.39567276],
       [0.50078409],
       [0.72811495]])

In [29]:
# Access the first element (a 1-element row, since x_1d has shape (6, 1))

x_1d[0]

array([0.48568789])

In [30]:
# Access the last element

x_1d[-1]

array([0.72811495])

In [31]:
x_2d = np.random.random((4, 5))
x_2d

array([[0.83273126, 0.39332936, 0.34728474, 0.66084268, 0.34138511],
       [0.00287288, 0.07943705, 0.44493572, 0.82078526, 0.5133178 ],
       [0.9737959 , 0.3275705 , 0.86096964, 0.78429359, 0.17964624],
       [0.51064123, 0.53977941, 0.81069538, 0.84556417, 0.14876148]])

In [32]:
x_2d[0, 0]

0.8327312636837114

In [33]:
x_2d[0][0]

0.8327312636837114

<hr>
<div>
<span style="color:#151D3B; font-weight:bold">Question: 🤔</span><p>
What would be the result of <code>x_2d[0, 0]</code>?
</div>
<hr>

In [34]:
# Answer

## <span id='slicing'> 3.2 Array slicing</span>

For 1-D slicing, we use the following syntax:

```Python
sliced = original[start:stop:step]
```

If `step` is omitted, it defaults to `1`. If `start` or `stop` is omitted, the slice starts at the beginning or runs to the end of the array.


In [35]:
x_1d

array([[0.48568789],
       [0.7592944 ],
       [0.04604526],
       [0.39567276],
       [0.50078409],
       [0.72811495]])

In [36]:
# Specify the start and end of the desired section by number

x_1d[1:4]

array([[0.7592944 ],
       [0.04604526],
       [0.39567276]])

<div class="alert alert-block alert-info">
<b>Tip:</b> Note that the element at index 1 is included, but the element at index 4 is not: the stop index is exclusive.</div>

In [37]:
# Omit the start (or stop) to slice from the beginning (or up to the end) of the array

x_1d[1:]

array([[0.7592944 ],
       [0.04604526],
       [0.39567276],
       [0.50078409],
       [0.72811495]])

In [38]:
# Slicing with a step greater than 1

x_1d[1::2]

array([[0.7592944 ],
       [0.39567276],
       [0.72811495]])

In [39]:
# We can reverse the array with a negative step.
# Note: x_1d[-1:0:-1] would stop before index 0 and miss the first element.

x_1d[::-1]

array([[0.72811495],
       [0.50078409],
       [0.39567276],
       [0.04604526],
       [0.7592944 ],
       [0.48568789]])

In [40]:
# Slicing in 2-D

x_2d[1:, 3:5]

array([[0.82078526, 0.5133178 ],
       [0.78429359, 0.17964624],
       [0.84556417, 0.14876148]])

## <span id='view_copy'> 3.3 Array view vs copy</span>

It is extremely important to know that array slices return *views* of the data rather than *copies*: modifying a view modifies the original array.
This is another difference between NumPy arrays and Python lists, whose slices are copies.

In [41]:
print(x_1d)

x_1d_sliced = x_1d[0]

x_1d_sliced *= 2

print("======== \n", x_1d)


[[0.48568789]
 [0.7592944 ]
 [0.04604526]
 [0.39567276]
 [0.50078409]
 [0.72811495]]
======== 
 [[0.97137578]
 [0.7592944 ]
 [0.04604526]
 [0.39567276]
 [0.50078409]
 [0.72811495]]


To avoid modifying the original array by accident, use the `copy()` method:

In [42]:
print(x_1d)

x_1d_sliced = x_1d[0].copy()

x_1d_sliced *= 2

print("======== \n", x_1d)

[[0.97137578]
 [0.7592944 ]
 [0.04604526]
 [0.39567276]
 [0.50078409]
 [0.72811495]]
======== 
 [[0.97137578]
 [0.7592944 ]
 [0.04604526]
 [0.39567276]
 [0.50078409]
 [0.72811495]]
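To check whether two arrays share data, `np.shares_memory` can be used. The sketch below contrasts a NumPy slice (a view), a `.copy()`, and a Python list slice (a copy), on small illustrative arrays:

```python
import numpy as np

arr = np.arange(6)
view = arr[1:4]           # slicing returns a view
copied = arr[1:4].copy()  # copy() allocates new memory

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False

view[0] = 100   # modifies arr as well
copied[1] = -5  # arr is unaffected

# Python list slices are copies
lst = list(range(6))
lst_slice = lst[1:4]
lst_slice[0] = 100  # lst is unaffected

print(arr)  # [  0 100   2   3   4   5]
print(lst)  # [0, 1, 2, 3, 4, 5]
```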


## <span id='conditional'> 3.4 Conditional selection</span>

We can use a condition (a Boolean mask) to select the elements of an array that satisfy it.

In [43]:
x_1d

array([[0.97137578],
       [0.7592944 ],
       [0.04604526],
       [0.39567276],
       [0.50078409],
       [0.72811495]])

In [44]:
# Let's create a condition

cond = x_1d > 0.7
cond

array([[ True],
       [ True],
       [False],
       [False],
       [False],
       [ True]])

In [45]:
# Select the data based on the condition

x_1d[cond]

array([0.97137578, 0.7592944 , 0.72811495])

In [46]:
# Let's create a 2-D array

x_2d_int = np.random.randint(12, size=(5, 6))

x_2d_int

array([[ 8,  3,  5,  1,  6,  1],
       [ 4,  4,  1,  8,  9,  7],
       [ 5,  3,  4,  4,  0,  5],
       [ 4,  5, 10, 10,  2,  8],
       [ 9,  8,  6,  7,  5,  3]])

In [47]:
# Let's pull out the even numbers

x_2d_int[x_2d_int % 2 == 0]

array([ 8,  6,  4,  4,  8,  4,  4,  0,  4, 10, 10,  2,  8,  8,  6])
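Several conditions can be combined element-wise with `&` (and), `|` (or), and `~` (not). Each comparison must be wrapped in parentheses, because these operators bind more tightly than comparisons. A sketch on an illustrative array `values`:

```python
import numpy as np

values = np.array([3, 8, 12, 5, 20, 7])

# Elements greater than 4 AND less than 15
between = values[(values > 4) & (values < 15)]

# Elements less than 4 OR greater than 15
outside = values[(values < 4) | (values > 15)]

# Elements that are NOT even
odd = values[~(values % 2 == 0)]

print(between)  # [ 8 12  5  7]
print(outside)  # [ 3 20]
print(odd)      # [3 5 7]
```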

<hr>
<div>
<span style="color:#151D3B; font-weight:bold">Question: 🤔</span><p>
Using conditional indexing, pull out numbers in <code>x_2d_int</code> that are divisible by both 2 and 7.
</div>
<hr>

In [48]:
# Answer



# <span id='manipulation'> 4. Array manipulation</span>


Manipulating arrays (changing their shape, joining, and splitting them) is a common step when preparing data.

## <span id='shape'> 4.1 Shape of an array</span>


In [49]:
# Let's create some arrays

arr_1d = np.arange(10)
arr_2d = 5 * np.random.random((4, 5))


In [50]:
arr_1d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]:
arr_2d

array([[1.38542513e+00, 1.51614518e+00, 4.79886762e+00, 1.17417577e+00,
        2.64664662e+00],
       [2.50197334e+00, 1.56799664e+00, 1.97896550e+00, 2.80830784e+00,
        3.22131771e+00],
       [3.39466041e+00, 2.67900878e+00, 3.49448888e+00, 3.08106142e+00,
        5.27794479e-02],
       [3.41745265e+00, 3.79390415e+00, 8.29475307e-04, 3.47004919e+00,
        2.15404469e+00]])

In [52]:
# The shape of an array can be obtained in two ways

print(arr_1d.shape, np.shape(arr_1d))

(10,) (10,)


In [53]:
arr_2d.shape

(4, 5)

We can reshape a NumPy array, as long as the new shape has the same total number of elements:


```Python

array.reshape(rows, columns)

np.reshape(array, (rows, columns))

```

In [54]:
arr_2d.reshape(5, 4)

array([[1.38542513e+00, 1.51614518e+00, 4.79886762e+00, 1.17417577e+00],
       [2.64664662e+00, 2.50197334e+00, 1.56799664e+00, 1.97896550e+00],
       [2.80830784e+00, 3.22131771e+00, 3.39466041e+00, 2.67900878e+00],
       [3.49448888e+00, 3.08106142e+00, 5.27794479e-02, 3.41745265e+00],
       [3.79390415e+00, 8.29475307e-04, 3.47004919e+00, 2.15404469e+00]])

In [55]:
np.reshape(arr_2d, (5, 4))

array([[1.38542513e+00, 1.51614518e+00, 4.79886762e+00, 1.17417577e+00],
       [2.64664662e+00, 2.50197334e+00, 1.56799664e+00, 1.97896550e+00],
       [2.80830784e+00, 3.22131771e+00, 3.39466041e+00, 2.67900878e+00],
       [3.49448888e+00, 3.08106142e+00, 5.27794479e-02, 3.41745265e+00],
       [3.79390415e+00, 8.29475307e-04, 3.47004919e+00, 2.15404469e+00]])
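One dimension passed to `reshape` can be `-1`, in which case NumPy infers it from the total size. A reshape that does not preserve the number of elements raises a `ValueError`:

```python
import numpy as np

arr = np.arange(12)

# -1 lets NumPy compute the missing dimension: 12 / 3 = 4
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 4)

# 12 elements cannot fill a (5, 4) array
try:
    arr.reshape(5, 4)
except ValueError as err:
    print("ValueError:", err)
```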

In [56]:
# We can flatten an array using the flatten() method

arr_2d.flatten()


array([1.38542513e+00, 1.51614518e+00, 4.79886762e+00, 1.17417577e+00,
       2.64664662e+00, 2.50197334e+00, 1.56799664e+00, 1.97896550e+00,
       2.80830784e+00, 3.22131771e+00, 3.39466041e+00, 2.67900878e+00,
       3.49448888e+00, 3.08106142e+00, 5.27794479e-02, 3.41745265e+00,
       3.79390415e+00, 8.29475307e-04, 3.47004919e+00, 2.15404469e+00])
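`flatten()` always returns a copy. The related method `ravel()` returns a view whenever possible (for example, for a contiguous array), so it avoids copying but shares memory with the original. A sketch on an illustrative array:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

flat_copy = arr.flatten()  # always a new array
flat_view = arr.ravel()    # a view here, because arr is contiguous

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True

flat_view[0] = 99
print(arr[0, 0])  # 99, the original changed through the view
```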

<div class="alert alert-block alert-info">
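<b>Tip:</b> The <code>reshape()</code> method returns a new array (usually a view of the same data) and leaves the original unchanged, whereas the <code>resize()</code> method changes the array in place and returns <code>None</code>.</div>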


In [57]:
print("Original shape: ", arr_2d.shape)

arr_2d.reshape(5, 4)

print("Shape after using reshape: ", arr_2d.shape)

a = arr_2d.resize(5, 4)

print("Shape after using resize: ", arr_2d.shape)


Original shape:  (4, 5)
Shape after using reshape:  (4, 5)
Shape after using resize:  (5, 4)


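<span style='color:red; font-weight:bold;'>Note:</span> the function <code>np.resize()</code>, unlike the method <code>arr_2d.resize()</code>, does not act in place: like <code>np.reshape()</code>, it returns a new array and leaves <code>arr_2d</code> unchanged.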

In [58]:
print("Original shape: ", arr_2d.shape)

np.reshape(arr_2d, (5, 4))

print("Shape after using reshape: ", arr_2d.shape)

np.resize(arr_2d, (5, 4))

print("Shape after using resize: ", arr_2d.shape)

Original shape:  (5, 4)
Shape after using reshape:  (5, 4)
Shape after using resize:  (5, 4)
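Unlike `reshape`, the function `np.resize` also accepts a new shape with a different number of elements: it repeats the data (or drops the tail) to fill the new shape. A sketch on an illustrative array:

```python
import numpy as np

arr = np.array([1, 2, 3])

bigger = np.resize(arr, (2, 4))  # 8 elements: the data is repeated
smaller = np.resize(arr, 2)      # 2 elements: the extra data is dropped

print(bigger)
# [[1 2 3 1]
#  [2 3 1 2]]
print(smaller)  # [1 2]
print(arr)      # [1 2 3], unchanged
```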


## <span id='joining'> 4.2 Joining arrays </span>


In [59]:
arr1 = np.array([[1, 2, 0, 1]])

arr2 = 6 * np.random.random((4, 4))

In [60]:
# Check the shape
print('arr1: ', arr1.shape)
print('arr2: ', arr2.shape)

arr1:  (1, 4)
arr2:  (4, 4)


In [61]:
# Joining arrays vertically, along axis 0 (arr1 is added as a new row)

np.concatenate((arr2, arr1), axis=0)

array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 ],
       [5.46657132, 3.69804072, 3.56136423, 1.10292751],
       [2.22284924, 3.75818784, 1.95416011, 1.98513088],
       [2.64774917, 3.40969939, 4.79804767, 2.91778584],
       [1.        , 2.        , 0.        , 1.        ]])

To concatenate 2-D arrays along `axis = 0`, they must have the same number of columns; along `axis = 1`, they must have the same number of rows.

In [62]:
# Joining arrays horizontally, along axis 1 (arr1.T is added as a new column)
# .T returns the transpose, like np.transpose(), turning the (1, 4) arr1 into a (4, 1) column
np.concatenate((arr2, arr1.T), axis=1)

array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 , 1.        ],
       [5.46657132, 3.69804072, 3.56136423, 1.10292751, 2.        ],
       [2.22284924, 3.75818784, 1.95416011, 1.98513088, 0.        ],
       [2.64774917, 3.40969939, 4.79804767, 2.91778584, 1.        ]])

In [63]:
# vstack joins arrays vertically, like concatenate with axis=0

np.vstack((arr2, arr1))

array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 ],
       [5.46657132, 3.69804072, 3.56136423, 1.10292751],
       [2.22284924, 3.75818784, 1.95416011, 1.98513088],
       [2.64774917, 3.40969939, 4.79804767, 2.91778584],
       [1.        , 2.        , 0.        , 1.        ]])

In [64]:
# hstack joins arrays horizontally, like concatenate with axis=1

np.hstack((arr2, arr1.T))

array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 , 1.        ],
       [5.46657132, 3.69804072, 3.56136423, 1.10292751, 2.        ],
       [2.22284924, 3.75818784, 1.95416011, 1.98513088, 0.        ],
       [2.64774917, 3.40969939, 4.79804767, 2.91778584, 1.        ]])

In [65]:
# column_stack stacks arrays as columns (like axis=1 here); its counterpart row_stack
# is an alias of vstack and has been deprecated since NumPy 2.0, so prefer vstack

np.column_stack((arr2, arr1.T))

array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 , 1.        ],
       [5.46657132, 3.69804072, 3.56136423, 1.10292751, 2.        ],
       [2.22284924, 3.75818784, 1.95416011, 1.98513088, 0.        ],
       [2.64774917, 3.40969939, 4.79804767, 2.91778584, 1.        ]])
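For 2-D inputs, `np.column_stack` gives the same result as `np.hstack`. They differ for 1-D inputs: `column_stack` turns each 1-D array into a column, while `hstack` joins them end to end. A sketch with two illustrative 1-D arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

as_columns = np.column_stack((a, b))  # shape (3, 2)
end_to_end = np.hstack((a, b))        # shape (6,)
as_rows = np.vstack((a, b))           # shape (2, 3)

print(as_columns)
print(end_to_end)
print(as_rows)
```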

## <span id='splitting'> 4.3 Splitting of arrays </span>


The opposite of concatenation is splitting, done with `np.split`, `np.array_split`, `np.hsplit`, and `np.vsplit`.

In [66]:
arr2

array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 ],
       [5.46657132, 3.69804072, 3.56136423, 1.10292751],
       [2.22284924, 3.75818784, 1.95416011, 1.98513088],
       [2.64774917, 3.40969939, 4.79804767, 2.91778584]])

In [67]:
# Splitting with axis = 0

np.split(arr2, 2)

[array([[1.43797918, 3.08370409, 1.37835918, 5.9672563 ],
        [5.46657132, 3.69804072, 3.56136423, 1.10292751]]),
 array([[2.22284924, 3.75818784, 1.95416011, 1.98513088],
        [2.64774917, 3.40969939, 4.79804767, 2.91778584]])]

In [68]:
# Splitting with axis = 1 (array_split also allows sections of unequal size)

np.array_split(arr2, 2, axis=1)

[array([[1.43797918, 3.08370409],
        [5.46657132, 3.69804072],
        [2.22284924, 3.75818784],
        [2.64774917, 3.40969939]]),
 array([[1.37835918, 5.9672563 ],
        [3.56136423, 1.10292751],
        [1.95416011, 1.98513088],
        [4.79804767, 2.91778584]])]

In [69]:
# Using hsplit / vsplit

np.hsplit(arr2, 2)

[array([[1.43797918, 3.08370409],
        [5.46657132, 3.69804072],
        [2.22284924, 3.75818784],
        [2.64774917, 3.40969939]]),
 array([[1.37835918, 5.9672563 ],
        [3.56136423, 1.10292751],
        [1.95416011, 1.98513088],
        [4.79804767, 2.91778584]])]
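`np.split` requires the array to divide into equal sections; otherwise it raises a `ValueError`. `np.array_split` allows unequal sections, and both also accept a list of split indices instead of a number of sections. A sketch on an illustrative 10-element array:

```python
import numpy as np

arr = np.arange(10)

# Split before indices 3 and 7
parts = np.split(arr, [3, 7])
print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# 10 elements cannot be split into 3 equal sections
try:
    np.split(arr, 3)
except ValueError as err:
    print("ValueError:", err)

# array_split makes the first sections one element longer
uneven = np.array_split(arr, 3)
print([len(p) for p in uneven])  # [4, 3, 3]
```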

# <span id='computation'> 5. Computation on NumPy arrays </span>


Computation on NumPy arrays can be very fast if we use *vectorized* operations through *universal functions* (**ufuncs**), which apply an operation element by element in compiled code.

In [70]:
# Creating two pairs of arrays: 1-D (arr1d_*) and 2-D (arr2d_*)

arr1d_1 = np.arange(7)
arr1d_2 = np.linspace(9, 12, len(arr1d_1))


arr2d_1 = np.random.random((4, 5))
arr2d_2 = 4 + 6 * np.random.random((4, 5))

In [71]:
# Addition of two arrays (equivalent to arr2d_1 + arr2d_2)

np.add(arr2d_1, arr2d_2)

array([[7.75930913, 8.49997751, 7.52773965, 4.62392662, 4.82325292],
       [9.81257135, 6.28234518, 6.91979991, 8.36409127, 8.7820045 ],
       [8.3915722 , 6.44856878, 5.96401878, 8.2165082 , 8.48119528],
       [9.99641515, 8.67713518, 7.29307198, 5.54470499, 7.97909863]])

In [72]:
# Subtraction of two arrays (equivalent to arr2d_1 - arr2d_2)

np.subtract(arr2d_1, arr2d_2)

array([[-6.38435305, -8.41131312, -6.94033846, -4.07561066, -4.1624837 ],
       [-9.28634503, -4.57730311, -5.0473921 , -7.80028588, -7.68742435],
       [-7.97276731, -4.5835921 , -4.21999282, -8.0468603 , -6.83247076],
       [-8.13669295, -7.05139444, -5.70389709, -4.50677851, -7.54836934]])

In [73]:
# Element-wise multiplication of two arrays (equivalent to arr2d_1 * arr2d_2)

np.multiply(arr2d_1, arr2d_2)

array([[4.86172855, 0.37485733, 2.12464157, 1.19252379, 1.48437455],
       [2.51258808, 4.6290393 , 5.60186594, 2.27839076, 4.50677747],
       [1.71336635, 5.14368068, 4.44029514, 0.68976156, 6.31200415],
       [8.43063594, 6.39262783, 5.16361423, 2.60817522, 1.67203382]])

In [74]:
# Element-wise division of two arrays (equivalent to arr2d_1 / arr2d_2)

np.divide(arr2d_1, arr2d_2)

array([[0.09721358, 0.00524291, 0.04059981, 0.06302818, 0.07353534],
       [0.02755268, 0.15700712, 0.15646175, 0.0348795 , 0.06646133],
       [0.02559253, 0.16904908, 0.17125137, 0.01043129, 0.10766361],
       [0.10255948, 0.10336254, 0.12227273, 0.10326102, 0.02773983]])

In [75]:
# Base-10 logarithm of each element (np.log is the natural logarithm)

np.log10(arr2d_1)

array([[-0.16274117, -1.35328075, -0.53209517, -0.5619991 , -0.48098019],
       [-0.57985744, -0.0692949 , -0.02862955, -0.54990077, -0.26178243],
       [-0.67901825, -0.03035659, -0.05947705, -1.07148152, -0.0838819 ],
       [-0.03158192, -0.08997871, -0.0998583 , -0.2848634 , -0.66682559]])

In [76]:
# Exponential (e**x) of each element

np.exp(arr2d_1)

array([[1.98869379, 1.04532955, 1.34138223, 1.3154226 , 1.39150321],
       [1.30097392, 2.34555262, 2.55028191, 1.32564973, 1.72856239],
       [1.23294109, 2.54082374, 2.39172049, 1.08852541, 2.28042598],
       [2.53415716, 2.25436958, 2.21352761, 1.68028469, 1.24031409]])

In [77]:
# Sine of each element (in radians)

np.sin(arr2d_1)

array([[0.63459011, 0.04431768, 0.28949633, 0.27073646, 0.32440686],
       [0.26008782, 0.75294186, 0.8053134 , 0.27818375, 0.52037503],
       [0.20787544, 0.80310507, 0.76562541, 0.08472226, 0.73411489],
       [0.8015369 , 0.7262633 , 0.71357464, 0.49598015, 0.21370366]])

In [78]:
# Element-wise 'greater than' comparison, equivalent to arr2d_1 > arr2d_2

np.greater(arr2d_1, arr2d_2)

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

In [79]:
# Element-wise 'less than' comparison, equivalent to arr2d_1 < arr2d_2

np.less(arr2d_1, arr2d_2)

array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

In [80]:
# Finding the absolute value of each element (np.abs is an alias of np.absolute)

np.abs([2, -1, 9, -1.2])

array([2. , 1. , 9. , 1.2])
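Binary ufuncs such as `np.add` and `np.multiply` also provide methods like `reduce` and `accumulate`, and ufuncs accept an `out` argument to write the result into an existing array. A brief sketch on an illustrative array `x`:

```python
import numpy as np

x = np.arange(1, 6)  # [1 2 3 4 5]

total = np.add.reduce(x)         # 1+2+3+4+5 = 15, same as np.sum(x)
running = np.add.accumulate(x)   # running sums, same as np.cumsum(x)
product = np.multiply.reduce(x)  # 1*2*3*4*5 = 120

# Write the result into a preallocated array instead of creating a new one
result = np.empty(5)
np.multiply(x, 2, out=result)

print(total, product)  # 15 120
print(running)         # [ 1  3  6 10 15]
print(result)
```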

# <span id='aggregations'> 6. Aggregations </span>


Before any further analysis, it's good to compute summary statistics of the data.

## <span id='summation'> 6.1 Summation </span>


In [81]:
# Let's create an array

arr1 = np.random.random((100, 100))


In [82]:
# Sum of values in an array

np.sum(arr1)

4998.274156770811

In [83]:
# Comparing with Python's built-in sum() function

big_array = np.random.random(100000)

%timeit sum(big_array)
%timeit np.sum(big_array)

6.38 ms ± 59 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
34.6 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


## <span id='min_max'> 6.2 Minimum and maximum </span>


In [84]:
# Let's create an array

arr1 = np.random.random((6, 5))

In [85]:
# Maximum of each column (axis=0 reduces over the rows)

arr1.max(axis=0)

array([0.76284775, 0.88241264, 0.41080389, 0.99720243, 0.9340556 ])

In [86]:
# Maximum of each row (axis=1 reduces over the columns)
arr1.max(axis=1)

array([0.88241264, 0.84031317, 0.87339475, 0.99720243, 0.9340556 ,
       0.48860081])

In [87]:
# Minimum of each column
arr1.min(axis=0)

array([0.22657389, 0.3251742 , 0.02838964, 0.14588928, 0.14716531])

In [88]:
# We can use np.max / np.min as well.
np.max(arr1, axis=1)

array([0.88241264, 0.84031317, 0.87339475, 0.99720243, 0.9340556 ,
       0.48860081])

## <span id='var_std'> 6.3 Variance and standard deviation </span>


In [89]:
# Let's create an array

arr1 = np.random.random((6, 5))

In [90]:
# Calculating the standard deviation of the whole array

np.std(arr1)

0.26670309703717116

In [91]:
arr1.std()

0.26670309703717116

In [92]:
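# Let's check that std equals the square root of the variance
# (np.isclose is the safe way to compare floating-point numbers)

np.isclose(arr1.std(), np.sqrt(arr1.var()))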

True

In [93]:
# std of each row (axis=1)

arr1.std(axis=1)

array([0.19723902, 0.29890297, 0.10744831, 0.23908449, 0.26260969,
       0.23893918])

In [94]:
# Variance of each column (axis=0)

arr1.var(axis=0)

array([0.03991367, 0.07439458, 0.06169437, 0.04754803, 0.03695642])
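By default, `np.std` and `np.var` compute the *population* statistics (dividing by `N`). For the *sample* estimate (dividing by `N - 1`), pass `ddof=1`. A small worked check on an illustrative array whose mean is 5 and whose squared deviations sum to 32:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # mean = 5, sum of squared deviations = 32

pop_var = np.var(x)             # 32 / 8 = 4.0
sample_var = np.var(x, ddof=1)  # 32 / 7, about 4.571

print(pop_var, np.std(x))  # 4.0 2.0
print(sample_var, np.std(x, ddof=1))
```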

## <span id='mean_median'> 6.4 Mean and median </span>


In [95]:
# Let's create an array

arr1 = np.random.random((6, 5))

In [96]:
# Calculate the mean of the whole array

arr1.mean()

0.43161506591530596

In [97]:
# Calculate the mean along an axis

arr1.mean(axis=1)

array([0.42323839, 0.53211463, 0.45771901, 0.3327895 , 0.28020037,
       0.56362848])

In [98]:
# Calculate the median of the whole array

np.median(arr1)

0.435194273766601

<span style='color:red; font-weight:bold;'>Note: </span> An <code>ndarray</code> has no <code>median()</code> method, so we use the function <code>np.median()</code> instead.
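The median is less sensitive to outliers than the mean. A short sketch with an illustrative array `data` containing one extreme value, plus `np.median` with an `axis` argument:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 100])

print(np.mean(data))    # 22.0, pulled up by the outlier
print(np.median(data))  # 3.0, the middle value

# np.median also accepts axis, like the ndarray methods above
grid = np.array([[1, 5, 3],
                 [9, 2, 4]])
print(np.median(grid, axis=1))  # median of each row
```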

## <span id='find_index'> 6.5 Find index </span>


In [99]:
# Let's create an array

arr1 = np.random.random((6, 5))
arr1[3, 3] = 0
arr1

array([[0.39186318, 0.15403847, 0.01160238, 0.68944436, 0.57876947],
       [0.73382217, 0.30286489, 0.88302937, 0.60859471, 0.81744685],
       [0.14421651, 0.44724879, 0.60699837, 0.96239542, 0.56248908],
       [0.87499291, 0.02157416, 0.62511732, 0.        , 0.12221023],
       [0.44505579, 0.91808003, 0.68690115, 0.93231476, 0.16587409],
       [0.17299195, 0.60723523, 0.95661858, 0.53529122, 0.84379222]])

In [100]:
# Find the index of the maximum value (in the flattened array)

arr1.argmax()

13

In [101]:
# Find the index of the minimum value (in the flattened array)

arr1.argmin()

18
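When no axis is given, `argmax()` and `argmin()` return an index into the *flattened* array: for the 6 × 5 array above, index 13 is row 2, column 3 (13 = 2 * 5 + 3). `np.unravel_index` does this conversion. The sketch uses a small fixed array rather than the random one above:

```python
import numpy as np

arr = np.array([[0.2, 0.9, 0.1],
                [0.5, 0.0, 0.7]])

flat_max = arr.argmax()  # 1
flat_min = arr.argmin()  # 4

# Convert flat indices back to (row, column)
row, col = np.unravel_index(flat_max, arr.shape)
print(row, col)  # 0 1
print(np.unravel_index(flat_min, arr.shape))

# With an axis, argmax returns one index per column
print(arr.argmax(axis=0))  # [1 0 1]
```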

In [102]:
# Find the (row, column) indices of a specific value

np.where(arr1 == 0)

(array([3]), array([3]))

### <a href='#PYTHON'>TOP ☝️</a>
