# Day 01: Essentials of NumPy, Pandas, Matplotlib, Seaborn and Sklearn

We are going to look at the essential functions of NumPy, Pandas, Matplotlib, Seaborn and Sklearn for data analysis.

## 1.0 NumPy

NumPy is a Python library that is commonly used for
scientific computing and data analysis. Its name stands for
Numerical Python, and it provides fast and efficient numerical
computation on large datasets. Much of NumPy's core is
implemented in C, which makes its array operations faster
than equivalent pure Python code.

We will cover some of the most important NumPy functions
used for data analysis, starting with the functions that
create arrays.

### 1.1 Creating NumPy Arrays

NumPy arrays are similar to Python lists, but they are more
efficient when it comes to numerical computation. They
can be one-dimensional or multi-dimensional, and all
elements of an array share the same data type. Here are
some functions that can be used to create NumPy arrays:

#### 1.1.1 `np.array()`

If we want to create an array from a list or a tuple, we can use the `np.array()` function. Let's create a one-dimensional array below:

In [1]:
# imports
import numpy as np
import pandas as pd

In [2]:
lst = [1, 2, 3, 4, 5, 6]

arr = np.array(lst)
arr

array([1, 2, 3, 4, 5, 6])

The array data type is inferred from the data. However, we can also set the data type using the __dtype__ parameter. See below:

In [3]:
arr = np.array(lst, dtype=float)
arr

array([1., 2., 3., 4., 5., 6.])

We can also create a multi-dimensional array from a nested list. We are going to create an array and check how many dimensions it has using the __ndim__ attribute. See below:

In [4]:
names = [["Jon", "Mary", "Paul"], ["Peter", "Ben", "Saul"]]
arr1 = np.array(names)
arr1

array([['Jon', 'Mary', 'Paul'],
       ['Peter', 'Ben', 'Saul']], dtype='<U5')

In [5]:
arr1.ndim

2
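Alongside `ndim`, the `shape` and `size` attributes are often useful when inspecting a new array. The short sketch below rebuilds the same nested list under the illustrative names `names_list` and `names_arr` (these names are assumptions, not part of the notebook above).

```python
import numpy as np

# Illustrative copy of the nested list of names used above
names_list = [["Jon", "Mary", "Paul"], ["Peter", "Ben", "Saul"]]
names_arr = np.array(names_list)

print(names_arr.ndim)   # 2 -> number of dimensions
print(names_arr.shape)  # (2, 3) -> 2 rows, 3 columns
print(names_arr.size)   # 6 -> total number of elements
```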

#### 1.1.2 `np.arange()`

This function creates an array with regularly spaced values
within a given range. It works similarly to the Python
`range()` function, but it returns a NumPy array. Here is an
example of how to use it to generate an array.

In [6]:
arr2 = np.arange(0, 50, 5)
arr2

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])

We have generated an array of values from 0 to 50 (50 excluded), spaced by 5.
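To make the comparison with Python's built-in `range()` concrete, here is a small sketch. The variable names (`from_range`, `from_arange`, `half_steps`) are illustrative only. It also shows that `np.arange()` accepts a non-integer step, which `range()` does not.

```python
import numpy as np

# Same start, stop, and step with range() and np.arange()
from_range = list(range(0, 50, 5))
from_arange = np.arange(0, 50, 5)

# Both produce the same values; np.arange() returns an array
same_values = from_arange.tolist() == from_range
print(same_values)  # True

# Unlike range(), np.arange() also accepts a float step
half_steps = np.arange(0, 2, 0.5)
print(half_steps.tolist())  # [0.0, 0.5, 1.0, 1.5]
```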

#### 1.1.3 `np.zeros()`

The `np.zeros()` function creates an array with all elements
set to zero. It allows us to set the shape and data type of the
array. Below, we generate an array of 2 rows and 3 columns
of float __dtype__.

In [7]:
arr3 = np.zeros((2, 3), dtype=float)
arr3

array([[0., 0., 0.],
       [0., 0., 0.]])

#### 1.1.4 `np.ones()`

The `np.ones()` function works similarly to the `np.zeros()` function. The only difference is that the `np.ones()` function creates an array with all elements set to one (1).

In [8]:
arr3 = np.ones((2, 3), dtype=float)
arr3

array([[1., 1., 1.],
       [1., 1., 1.]])

#### 1.1.5 `numpy.random.Generator.integers`

The `numpy.random.Generator.integers` method generates random integers. It draws integers from a specified range, from `low` (inclusive) to `high` (exclusive by default).

Let's say we want to create an array of random integers from 0 to 9, with 2 rows and 4 columns. First, we will create a random number generator, and then we will use the generator to generate the array. 

In [9]:
rng = np.random.default_rng()
rng.integers(low=0, high=10, size=(2, 4))

array([[8, 3, 7, 7],
       [6, 3, 2, 7]])

Note that the output will always vary because it is generated randomly.
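Although the individual values change from run to run, the bounds do not. The sketch below (with the illustrative names `rng_demo`, `draws`, and `inclusive_draws`) checks that the draws stay between `low` and `high - 1`, and shows the `endpoint=True` option, which makes `high` inclusive.

```python
import numpy as np

rng_demo = np.random.default_rng()

# By default, high is exclusive: values fall in 0..9
draws = rng_demo.integers(low=0, high=10, size=1000)
print(draws.min() >= 0, draws.max() <= 9)  # True True

# With endpoint=True, high is inclusive: values fall in 0..10
inclusive_draws = rng_demo.integers(low=0, high=10, size=1000, endpoint=True)
print(inclusive_draws.max() <= 10)  # True
```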

#### 1.1.6 `numpy.random.Generator.random`

This method creates an array of random floats. Let's create an array of random floats between 0 and 1 in the shape of 2 rows and 4 columns.

In [10]:
rng = np.random.default_rng(seed=1986)
rng.random((2, 4))

array([[0.17027901, 0.55975274, 0.08043695, 0.66573303],
       [0.83314157, 0.36443747, 0.74311032, 0.26498697]])

Because the output is random, we set the seed parameter to 1986 to make it reproducible. This means that the same sequence of random numbers is generated every time you run the code.
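One way to convince yourself that seeding works is to build two generators with the same seed and compare their output. This is a minimal sketch; the names `rng_a`, `rng_b`, `first`, and `second` are illustrative.

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.default_rng(seed=1986)
rng_b = np.random.default_rng(seed=1986)

first = rng_a.random((2, 4))
second = rng_b.random((2, 4))

# Identical seeds give identical sequences of random floats
print(np.array_equal(first, second))  # True
```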

## 1.2 Accessing Array Elements

Once you have created a NumPy array, you may want to
access its elements. NumPy arrays are indexed using
integers starting from zero (0). Here are some ways to
access NumPy array elements:

### 1.2.1 Slicing

This is similar to Python lists, where you can access a range
of elements using a colon (:). We are going to use the array
of names to demonstrate how to use slicing to select
elements from the array.

In [11]:
arr1

array([['Jon', 'Mary', 'Paul'],
       ['Peter', 'Ben', 'Saul']], dtype='<U5')

Let's say we want to select the names "Mary" and "Paul"
from the array. Here is how we can use slicing for such an
operation:

In [12]:
select_mary_paul = arr1[0, 1:]
select_mary_paul

array(['Mary', 'Paul'], dtype='<U5')

We first select row 0, which holds the names "Jon," "Mary," and
"Paul." The comma (,) separates the row index from the column
index. Since "Mary" sits at column index 1, `arr1[0, 1]` would
select "Mary" alone. Adding a colon (:) after the 1 tells NumPy
that we want "Mary" and everything after it in the row (here,
"Paul" is the only name after "Mary"). Putting it all together
as `arr1[0, 1:]` allows us to achieve this selection.
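The same row-and-column pattern works for other selections. The sketch below rebuilds the array of names as `people` (an illustrative name) and slices a full row, a full column, and a block of columns.

```python
import numpy as np

# Illustrative copy of the array of names
people = np.array([["Jon", "Mary", "Paul"], ["Peter", "Ben", "Saul"]])

first_row = people[0, :]         # every column of row 0
last_column = people[:, 2]       # every row of column 2
first_two_cols = people[:, :2]   # every row, columns 0 and 1

print(first_row.tolist())       # ['Jon', 'Mary', 'Paul']
print(last_column.tolist())     # ['Paul', 'Saul']
print(first_two_cols.tolist())  # [['Jon', 'Mary'], ['Peter', 'Ben']]
```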

### 1.2.2 Fancy Indexing

This allows you to select elements based on a list of indices.
If we want to select "Peter" and "Paul" from the array, here
is how we do it using fancy indexing:

In [13]:
# Row indices (select_peter) and column indices (select_paul) of the elements
select_peter = np.array([1,0])
select_paul = np.array([0,2])

In [15]:
select_peter_paul = arr1[select_peter, select_paul]
select_peter_paul

array(['Peter', 'Paul'], dtype='<U5')

Here, we created two arrays of indices, `select_peter` and
`select_paul`. Despite their names, `select_peter` holds the row
indices and `select_paul` holds the column indices of the
elements we want to select from the original array, `arr1`.
NumPy pairs them element by element, so (1, 0) selects "Peter"
and (0, 2) selects "Paul". We then passed these arrays of
indices to `arr1` to select the corresponding elements and
create a new array. Note that the arrays of indices must have
the same shape (or be broadcastable to a common shape) to use
fancy indexing with multi-dimensional arrays.

### 1.2.3 Boolean Indexing

This allows you to select elements based on a Boolean
array. It involves creating a Boolean array of the same
shape as the original array, where the elements of the
Boolean array are true or false based on a given condition.
This Boolean array is then used to select the corresponding
elements from the original array. Let's say we want to
keep only the even numbers in an array, removing all the odd ones.

In [18]:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [20]:
filter_array = arr % 2 == 0
filter_array

array([[False,  True, False],
       [ True, False,  True],
       [False,  True, False]])

In [21]:
arr[filter_array]

array([2, 4, 6, 8])
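Boolean masks can also be combined and used for assignment. The following sketch uses the illustrative names `grid`, `evens_over_four`, and `masked`; note that conditions are combined with `&` and each condition needs its own parentheses.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Combine two conditions: even AND greater than 4
evens_over_four = grid[(grid % 2 == 0) & (grid > 4)]
print(evens_over_four.tolist())  # [6, 8]

# Use a mask for assignment: replace odd numbers with 0 in a copy
masked = grid.copy()
masked[masked % 2 != 0] = 0
print(masked.tolist())  # [[0, 2, 0], [4, 0, 6], [0, 8, 0]]
```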

## 1.3 Array Manipulation

NumPy provides a wide range of functions to manipulate arrays. These include:

### 1.3.1 `np.reshape()`

We can change the shape of an array using the `np.reshape()` function. Here is how we can use reshape to flatten a multidimensional array.

In [22]:
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [23]:
np.reshape(arr, 9)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In the code above, we take a 2-dimensional array and use the `np.reshape()` function to flatten it. The number 9 is the number of elements in the original array. By passing 9 as the new shape, we are telling the reshape function to return a 1-dimensional array of 9 elements.
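If you would rather not count the elements yourself, `np.reshape()` accepts `-1` for one dimension and infers its length from the array size. A minimal sketch, using the illustrative names `grid`, `flat`, and `as_column`:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# -1 asks NumPy to infer the length from the number of elements
flat = np.reshape(grid, -1)
print(flat.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# -1 can be combined with a fixed dimension: 9 rows of 1 column
as_column = np.reshape(grid, (-1, 1))
print(as_column.shape)  # (9, 1)
```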

Here is another example of how to reshape an array. We will reshape the array of names below:

In [24]:
names 

[['Jon', 'Mary', 'Paul'], ['Peter', 'Ben', 'Saul']]

In [25]:
new_array = np.reshape(arr1, (3,2))
new_array

array([['Jon', 'Mary'],
       ['Paul', 'Peter'],
       ['Ben', 'Saul']], dtype='<U5')

Above, we have `arr1`, an array of names with shape (2, 3). We then use the `np.reshape()` function to reshape the array to shape (3, 2) and assign the result to a new variable.

### 1.3.2 `np.concatenate()`

This function joins two or more arrays together. We can pick the axis along which the arrays are joined. If we do not provide an axis, they are joined along axis 0.

In [39]:
arr1 = np.array([[10,20,30],[40,50,60],[70,80,90]])
arr2 = np.array([[100,110, 120],[130,140,150]])

In [30]:
arr3 = np.concatenate((arr1, arr2), axis=0)
arr3

array([[ 10,  20,  30],
       [ 40,  50,  60],
       [ 70,  80,  90],
       [100, 110, 120],
       [130, 140, 150]])

Note that for concatenation to work, the arrays must have the same number of dimensions and the same shape along every axis except the one being joined. If we tried to join these two arrays on axis 1 (columns), we would get an error because the two arrays do not have an equal number of rows.
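The sketch below illustrates both cases along axis 1, using the hypothetical arrays `left`, `right`, and `one_row`: a join that works because the row counts match, and one that raises a `ValueError` because they do not.

```python
import numpy as np

left = np.array([[10, 20, 30], [40, 50, 60]])  # shape (2, 3)
right = np.array([[100], [200]])               # shape (2, 1)

# Same number of rows, so joining along axis 1 works
joined = np.concatenate((left, right), axis=1)
print(joined.tolist())  # [[10, 20, 30, 100], [40, 50, 60, 200]]

# Different number of rows, so joining along axis 1 fails
one_row = np.array([[1, 2, 3]])                # shape (1, 3)
try:
    np.concatenate((left, one_row), axis=1)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
print(mismatch_failed)  # True
```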

### 1.3.3 `np.split()`

This function splits an array into smaller arrays. Below, we use the `np.split()` function to split the array into two equal parts. You can see in the output that we have created two arrays from `names`: `split_one` and `split_two`.

In [40]:
names = np.array(names)
names

array([['Jon', 'Mary', 'Paul'],
       ['Peter', 'Ben', 'Saul']], dtype='<U5')

In [43]:
split_one, split_two = np.split(names, 2)

In [44]:
split_one

array([['Jon', 'Mary', 'Paul']], dtype='<U5')

In [45]:
split_two

array([['Peter', 'Ben', 'Saul']], dtype='<U5')
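`np.split()` also takes an `axis` argument, and its second argument can be a list of indices instead of a number of parts. The following sketch (with the illustrative names `people`, `col_a`, `col_b`, `col_c`, `head`, and `tail`) splits the array of names by column.

```python
import numpy as np

people = np.array([["Jon", "Mary", "Paul"], ["Peter", "Ben", "Saul"]])

# Split into three equal parts along axis 1 (one column each)
col_a, col_b, col_c = np.split(people, 3, axis=1)
print(col_a.tolist())  # [['Jon'], ['Peter']]

# Split at column index 1: first column, then the remaining two
head, tail = np.split(people, [1], axis=1)
print(head.shape, tail.shape)  # (2, 1) (2, 2)
```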

### 1.3.4 `np.transpose()`

This function transposes an array, switching its rows and columns. Transposing is especially useful in linear algebra operations such as matrix multiplication, where the number of columns in the first matrix must equal the number of rows in the second. For example, if we try to carry out a dot operation on the two arrays below, we get a `ValueError` because their shapes, (3, 3) and (2, 3), are not aligned:

---
```python
arr1 = np.array([[10,20,30],[40,50,60],[70,80,90]])
arr2 = np.array([[100,110, 120],[130,140,150]])

np.dot(arr1, arr2)
```
---
```bash
ValueError                            Traceback (most recent call last)
Input In [21]......................
```
---

So to make the dot operation possible, we have to transpose one of the arrays. We transpose `arr2`.

In [46]:
arr1

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [47]:
arr2

array([[100, 110, 120],
       [130, 140, 150]])

In [48]:
np.dot(arr1, arr2.transpose())

array([[ 6800,  8600],
       [16700, 21200],
       [26600, 33800]])

You can see in the output that the operation was successful after transposing the array.
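Printing the shapes makes it clear why the transpose is needed. This sketch uses the illustrative names `a`, `b`, and `product`, along with the `.T` shorthand for the transpose and the `@` matrix-multiplication operator, which for 2-dimensional arrays gives the same result as `np.dot()`.

```python
import numpy as np

a = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
b = np.array([[100, 110, 120], [130, 140, 150]])

# (3, 3) cannot be multiplied by (2, 3), but it can by (3, 2)
print(a.shape, b.shape, b.T.shape)  # (3, 3) (2, 3) (3, 2)

# .T is shorthand for transpose(); @ is the matrix product
product = a @ b.T
print(np.array_equal(product, np.dot(a, b.transpose())))  # True
```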

## 1.4 Mathematical Functions

NumPy provides a wide range of mathematical functions that can be used on arrays. These include:

### 1.4.1 `np.add()` and `np.subtract()`

The `np.add()` function adds two arrays together, and the `np.subtract()` function subtracts one array from another (element-wise). Here are the examples below:

In [49]:
arr1 = np.array([[1,2,4], [6,7,8]])
arr2 = np.array([[3,3,6], [4,5,7]])

In [50]:
np.add(arr1, arr2)

array([[ 4,  5, 10],
       [10, 12, 15]])

In [51]:
np.subtract(arr1, arr2)

array([[-2, -1, -2],
       [ 2,  2,  1]])
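For arrays, the `+` and `-` operators perform the same element-wise operations as `np.add()` and `np.subtract()`. A quick check, using the illustrative names `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2, 4], [6, 7, 8]])
y = np.array([[3, 3, 6], [4, 5, 7]])

# The operators give the same results as the functions
sums_match = np.array_equal(np.add(x, y), x + y)
diffs_match = np.array_equal(np.subtract(x, y), x - y)
print(sums_match, diffs_match)  # True True
```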

### 1.4.2 `np.multiply()` and `np.divide()`

The `np.multiply()` function multiplies two arrays together.
It performs element-wise multiplication of two arrays,
where the corresponding elements from the two arrays are
multiplied together. The `np.divide()` function divides one
array by another (element-wise). Note that for these
operations to work, the arrays must be broadcastable in a
common shape.

In [52]:
np.multiply(arr1, arr2)

array([[ 3,  6, 24],
       [24, 35, 56]])

In [53]:
np.divide(arr1,  arr2)

array([[0.33333333, 0.66666667, 0.66666667],
       [1.5       , 1.4       , 1.14285714]])
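Broadcasting means the two operands do not need identical shapes, as long as NumPy can stretch one to match the other. The sketch below (with the illustrative names `a`, `row`, `scaled`, and `halved`) multiplies a (2, 3) array by a 3-element row and divides it by a single number.

```python
import numpy as np

a = np.array([[1, 2, 4], [6, 7, 8]])   # shape (2, 3)
row = np.array([10, 100, 1000])        # shape (3,)

# The row is broadcast across both rows of a
scaled = np.multiply(a, row)
print(scaled.tolist())  # [[10, 200, 4000], [60, 700, 8000]]

# A scalar is broadcast to every element
halved = np.divide(a, 2)
print(halved.tolist())  # [[0.5, 1.0, 2.0], [3.0, 3.5, 4.0]]
```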

### 1.4.3 `np.power()` and `np.sqrt()`

The `np.power()` function raises each element of an array to a given power. The `np.sqrt()` function calculates the square root of each element in an array.

In [54]:
np.power(arr2, 2)

array([[ 9,  9, 36],
       [16, 25, 49]])

In [55]:
np.sqrt(arr2)

array([[1.73205081, 1.73205081, 2.44948974],
       [2.        , 2.23606798, 2.64575131]])
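As a small sanity check, squaring an array and then taking the square root should bring back the original values (as floats). The names `values`, `squared`, and `restored` below are illustrative.

```python
import numpy as np

values = np.array([[3, 3, 6], [4, 5, 7]])

squared = np.power(values, 2)   # same result as values ** 2
restored = np.sqrt(squared)     # square root of each element

print(np.array_equal(squared, values ** 2))  # True
print(np.allclose(restored, values))         # True
print(restored.dtype)                        # float64
```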

## 1.5 Statistical Functions

NumPy also provides a wide range of statistical functions that can be used on arrays. These include:

### 1.5.1 `np.mean()`

The `np.mean()` function calculates the mean of an array. By
default, this function will flatten the array and calculate the
mean of the flattened array. You can also specify the axis
on which you want the mean to be calculated. Below, we
calculate the mean on axis 1. This means we calculate the
mean of each row.

In [56]:
arr2

array([[3, 3, 6],
       [4, 5, 7]])

In [58]:
np.mean(arr2, axis=1)

array([4.        , 5.33333333])
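To see how the `axis` argument changes the result, the sketch below computes the mean of the same values three ways. The names `scores`, `overall`, `per_column`, and `per_row` are illustrative.

```python
import numpy as np

scores = np.array([[3, 3, 6], [4, 5, 7]])

overall = np.mean(scores)             # mean of all six values
per_column = np.mean(scores, axis=0)  # one mean per column
per_row = np.mean(scores, axis=1)     # one mean per row

print(per_column.tolist())  # [3.5, 4.0, 6.5]
print(per_row.shape)        # (2,)
```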

### 1.5.2 `np.median()`

This function works similarly to the `np.mean()` function, but it returns the median (the middle value) instead of the mean. In the example below, we calculate the median of each row:

In [59]:
arr1

array([[1, 2, 4],
       [6, 7, 8]])

In [60]:
np.median(arr1, axis=1)

array([2., 7.])
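The difference between the mean and the median is easiest to see when the data contains an extreme value. In this sketch, `sample` is a made-up array chosen for illustration.

```python
import numpy as np

# Made-up values with one large outlier
sample = np.array([1, 2, 4, 100])

print(np.mean(sample))    # 26.75, pulled up by the outlier
print(np.median(sample))  # 3.0, the average of the two middle values
```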

### 1.5.3 `np.std()`

Standard deviation is a measure of how spread out the
data is. It tells you how much the data points typically
vary from the average (mean). To calculate the standard
deviation of an array, we can use the `np.std()` function.
Below, we calculate the std of each column.

In [61]:
arr2

array([[3, 3, 6],
       [4, 5, 7]])

In [62]:
np.std(arr2, axis=0)

array([0.5, 1. , 0.5])
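The definition of standard deviation can be checked by computing it step by step. The sketch below (with the illustrative names `data`, `col_means`, and `manual_std`) takes the square root of the mean squared deviation for each column and compares it with `np.std()`.

```python
import numpy as np

data = np.array([[3, 3, 6], [4, 5, 7]])

# Step by step: deviations from the column means, squared, averaged, rooted
col_means = data.mean(axis=0)
manual_std = np.sqrt(np.mean((data - col_means) ** 2, axis=0))

print(manual_std.tolist())                            # [0.5, 1.0, 0.5]
print(np.allclose(manual_std, np.std(data, axis=0)))  # True
```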

### 1.5.4 `np.var()`

Variance measures the average of the squared differences
from the mean. The `np.var()` function calculates the
variance of an array. Here is an example of how we can
calculate the variance on axis 0 of the array (one value per column):

In [63]:
arr2

array([[3, 3, 6],
       [4, 5, 7]])

In [64]:
np.var(arr2, axis=0)

array([0.25, 1.  , 0.25])
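Variance is the square of the standard deviation, which the sketch below verifies. It also shows the `ddof` parameter: by default NumPy divides by the number of values (population variance), while `ddof=1` divides by one less (sample variance). The names `data`, `col_std`, `col_var`, and `sample_var` are illustrative.

```python
import numpy as np

data = np.array([[3, 3, 6], [4, 5, 7]])

col_std = np.std(data, axis=0)
col_var = np.var(data, axis=0)

# Variance equals the standard deviation squared
print(np.allclose(col_var, col_std ** 2))  # True

# ddof=1 divides by (n - 1) instead of n
sample_var = np.var(data, axis=0, ddof=1)
print(sample_var.tolist())  # [0.5, 2.0, 0.5]
```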

### 1.5.5 `np.min()` and `np.max()`

The `np.min()` function calculates the minimum value of an
array, and the `np.max()` function calculates the maximum
value of an array. First, we calculate the minimum of each
column, and in the second example, we calculate the maximum
of the whole array. See below.

In [65]:
arr2

array([[3, 3, 6],
       [4, 5, 7]])

In [67]:
np.min(arr2, axis=0)

array([3, 3, 6])

In [69]:
np.max(arr2, axis=None)

np.int64(7)
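Both functions accept the same `axis` argument, so the minimum and maximum can be taken per row as well. A short sketch, using the illustrative names `data`, `row_max`, `col_max`, and `overall_min`:

```python
import numpy as np

data = np.array([[3, 3, 6], [4, 5, 7]])

row_max = np.max(data, axis=1)  # largest value in each row
col_max = np.max(data, axis=0)  # largest value in each column
overall_min = np.min(data)      # smallest value in the whole array

print(row_max.tolist())  # [6, 7]
print(col_max.tolist())  # [4, 5, 7]
print(int(overall_min))  # 3
```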