### Table of Contents
* [1. Import Statement](#1)
* [2. Pandas](#2)
    * [2.1 Series](#2.1)
    * [2.2 DataFrames](#2.2)
    * [2.3 Exercise](#2.3)
* [3. Numpy](#3)
    * [3.1 Topics](#3.1)
    * [3.2 Exercise](#3.2)
* [4. Matplotlib](#4)
    * [4.1 Introduction](#4.1)
    * [4.2 Matplotlib with Numpy](#4.2)
    * [4.3 Matplotlib with Pandas](#4.3)

<a id='1'></a>
# <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">1. Import Statement</p>

In Python, the import statement is used to bring external modules, libraries, or specific functions and classes into your program.

**1. Import the entire module:**

```python
import module_name
```


In [None]:
import pandas  # Full module import

df = pandas.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)

**2. Import specific functions or classes from a module:**

```python
from module_name import function_name, ClassName
```


In [None]:
from pandas import DataFrame

df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)

**3. Import a module and assign it an alias (shorthand):**

```python
import module_name as alias
```


In [None]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)

**4. Import specific functions or classes with an alias:**

```python
from module_name import function_name as alias
```

In [None]:
from pandas import DataFrame as DF

df = DF({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)


<a id='2'></a>
# <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2. Pandas</p>

- Pandas is a library that is used for data manipulation and to analyze large amounts of tabular data. 
- We will need to import the Pandas package into our workspace with `import pandas as pd`.


In [None]:
# Install a pip package in the current Jupyter kernel
# import sys
# !{sys.executable} -m pip install pandas

In [None]:
import pandas as pd

print(pd.__version__)

<a id='2.1'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.1 Series</p>

- A Pandas `Series` is a 1-dimensional labelled array that can hold data of any type (integer, string, float, Python objects, etc.).
- It is a homogeneous data structure: all elements of a given Series share one data type (`dtype`). Values of mixed types are stored under the generic `object` dtype.
- `Series` can be created from a list, numpy array, or dictionary.
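
To make the "same data type" point concrete, here is a minimal sketch (the variable names are illustrative and not part of the notebook) showing how pandas settles on a single `dtype` for each Series:

```python
import pandas as pd

ints = pd.Series([10, 20, 30])                    # only integers
mixed_numbers = pd.Series([10, 20, 30, 40.0])     # integers and a float
mixed_objects = pd.Series([10, "twenty", 30.0])   # numbers and a string

print(ints.dtype)            # an integer dtype
print(mixed_numbers.dtype)   # float64: the integers are upcast to floats
print(mixed_objects.dtype)   # object: the generic fallback for mixed types
```

This is why the first example below, which mixes `10`, `20`, `30` with `40.0`, displays every value as a float.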

In [None]:
import pandas as pd

# Creating Series from list
number_list = [10, 20, 30, 40.0]

sr = pd.Series(number_list)
sr

**Note:**

- The values are labelled with the index number by default.
- Custom labels can be created with the `index` parameter.
- This label can be used to access a specified value.

In [None]:
# Creating Series from list, with custom index
labels = ['w', 'x', 'y', 'z']
number_list = [10, 20, 30, 40]

sr = pd.Series(data=number_list, index=labels)
sr

---
### Creating Series out of Dictionary

- When creating a Series from a dictionary, the keys of the dictionary become the labels.

In [None]:
# Dictionary keys become the index
simple_dict = {'w': 10, 
               'x': 20, 
               'y': 30, 
               'z': 40}

pd.Series(simple_dict)

---
### Accessing Values

- In a Pandas Series, accessing a value by its label is similar to accessing a value in a dictionary by its key.

For example:

In [None]:
sr

In [None]:
# Access the value like dictionary
print(sr['w'])
print()
# Access multiple values by passing in a list of values
print(sr[['y', 'z']])

<a id='2.2'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2 DataFrames</p>

- A DataFrame is Pandas' 2-dimensional, heterogeneous data structure (i.e. different columns can hold different data types).
- A DataFrame is often created by loading a dataset from existing storage (e.g. a CSV or Excel file).
- It can also be created from a list, a dictionary of lists, etc.
    - When creating from a dictionary, each dictionary key is used as a column name.
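
As a quick illustration of "heterogeneous", the sketch below builds a small DataFrame from a dictionary of lists and inspects the data type of each column. The data and the names `records` and `frame` are made up for this example.

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column name
records = {
    "Sport": ["Cricket", "Football", "Golf"],
    "Audience": [100, 200, 300],
    "Rating": [4.5, 4.8, 3.9],
}
frame = pd.DataFrame(records)

print(list(frame.columns))   # column names come from the dictionary keys
print(frame.dtypes)          # one dtype per column (text, integer, float)
print(frame.shape)           # (number of rows, number of columns)
```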

In [None]:
# Install a pip package in the current Jupyter kernel
# import sys
# !{sys.executable} -m pip install openpyxl

In [None]:
# Example 1: Creating dataframe from an external dataset
import pandas as pd

df = pd.read_excel('datasets/Consumer.xlsx', sheet_name='Data1')
print(type(df))

In [None]:
# Example 2: Creating dataframe from a list
import pandas as pd
 
sports = ['Cricket', 'Football','Basketball', 'Golf']
 
df = pd.DataFrame(sports, columns=['Sport'])
df

In [None]:
# DataFrame columns are essentially made up of Series.
print(type(df['Sport']))

In [None]:
# Creating DataFrame from a dictionary of list
import pandas as pd
 
data = {'Sports': ['Cricket', 'Football','Basketball', 'Golf'],
        'Audience': [100, 200, 300, 400]}
 
df = pd.DataFrame(data)
df

---
### Retrieving Data from a DataFrame

- Both Selection and Indexing refer to methods in which we can retrieve data from a DataFrame.

In [None]:
import numpy as np
np.random.seed(1)
from numpy import random

df = pd.DataFrame(random.randint(low=40, high=100, size=(5, 5)),
                         index=['Aaron', 'Bob', 'Charlie', 'Desmond', 'Elliot'],
                         columns=['Value1', 'Value2', 'Value3', 'Value4', 'Value5'])

df

**<u>Note:</u>**

- Random numbers generated by a computer are not truly random; they are pseudo-random, produced by a deterministic mathematical formula.
- In NumPy, the `np.random.seed()` function initializes the random number generator.
- If the same seed value is used, we will always get the same random number.
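
A minimal sketch of this behaviour (the variable names are illustrative): re-seeding with the same value restarts the generator from the same state, so the same numbers come out again.

```python
import numpy as np

np.random.seed(1)
first = np.random.randint(low=40, high=100, size=5)

np.random.seed(1)   # reset the generator to the same starting state
second = np.random.randint(low=40, high=100, size=5)

print(first)
print(second)
print(np.array_equal(first, second))   # True: same seed, same numbers
```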

In [None]:
# Accessing the values for column Value1
df['Value1']

In [None]:
# To display multiple columns, pass in a list of column names
# The order of columns determines the displayed output
# Display column Value2 before Value1
df[['Value2', 'Value1']]

---
### Adding a New Column to a DataFrame

- Use the `assign()` function to add a new column to the end of a DataFrame.
- It returns a new DataFrame with the newly added columns.
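
Because `assign()` returns a new DataFrame, the original is only updated if the result is assigned back to it. A small sketch with a hypothetical DataFrame `scores`:

```python
import pandas as pd

scores = pd.DataFrame({"Value1": [10, 20, 30]})

# assign() leaves `scores` untouched and returns a new DataFrame
with_half = scores.assign(halved_value1=scores["Value1"] / 2)

print(list(scores.columns))      # ['Value1']
print(list(with_half.columns))   # ['Value1', 'halved_value1']
```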

In [None]:
# Current df
df

In [None]:
df = df.assign(halved_value1=df['Value1']/2)
print(df)

- Use the `insert()` function to add a new column at a specific index in a DataFrame.

In [None]:
import numpy as np

df.insert(loc=0, 
          column='Before_Value1', 
          value=[10, 20, 30, 40, 50])
df

- Assign a list of values to a new column name, ensuring the list has the same length as the DataFrame (one value per row).

In [None]:
df['new_value'] = [45, 14, 56, 25, 2]
df

---
### Removing Columns from a DataFrame

- Removing rows is done with the `drop` method with the argument `axis=0`.
- Removing columns is done with the `drop` method with the argument `axis=1`.

In [None]:
df

In [None]:
# Dropping column halved_value1
df.drop('halved_value1', axis=1)

**Note:** However, the column is not dropped permanently. To drop a column permanently, set the `inplace` parameter to `True`.
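
The sketch below, using a hypothetical DataFrame `table`, shows that `drop()` returns a new DataFrame by default. Reassigning the result is a common alternative to `inplace=True`.

```python
import pandas as pd

table = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

dropped = table.drop("b", axis=1)   # returns a new DataFrame
print(list(table.columns))          # ['a', 'b'] (original unchanged)
print(list(dropped.columns))        # ['a']

# Alternative to inplace=True: reassign the result to the same name
table = table.drop("b", axis=1)
print(list(table.columns))          # ['a']
```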

In [None]:
# Column is not dropped unless the inplace argument is set to True
df

In [None]:
# Dropping the column in-place
df.drop('halved_value1', axis=1, inplace=True)
df

**Note:** Similarly, the row is only dropped when the `inplace` parameter is set to `True`.

In [None]:
# Drop Aaron inplace
df.drop('Aaron', axis=0, inplace=True)
df

---
### Selecting Rows

- The `.loc[]` indexer retrieves records by label.
- The `.iloc[]` indexer retrieves records by integer position.
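
The difference between label-based and position-based selection can be sketched as follows. The DataFrame `people` and its values are made up for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {"Value1": [77, 83, 52]},
    index=["Bob", "Charlie", "Desmond"],
)

by_label = people.loc["Charlie", "Value1"]   # label-based lookup
by_position = people.iloc[1, 0]              # position-based: row 1, column 0
print(by_label, by_position)                 # both refer to the same cell

# Slices differ: .loc includes the end label, .iloc excludes the end position
label_slice = people.loc["Bob":"Charlie"]    # 2 rows
position_slice = people.iloc[0:1]            # 1 row
print(len(label_slice), len(position_slice))
```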

In [None]:
# Retrieve (locate) the record of Bob
df.loc['Bob']

In [None]:
# Retrieve multiple records
persons_of_interest = ['Bob', 'Desmond', 'Elliot']
df.loc[persons_of_interest]

# df.loc[['Bob', 'Desmond', 'Elliot']]

- Selecting by integer position with the `.iloc[]` (integer-location) indexer.

In [None]:
df

In [None]:
# Search with the index locate method; Bob is located at index 0
df.iloc[0]

In [None]:
# Retrieve multiple records
lucky_numbers = [1, 2, 3]
df.iloc[lucky_numbers]

<a id='2.2.1'></a>
### <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2.1 More DataFrame Index Details</p>

- Use `reset_index()` to reset the index back to the default value.
- The custom index would be returned as a column.
- Set `inplace=True` for a permanent change.
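
A minimal sketch of these points on a hypothetical DataFrame `named`. It also shows the `drop=True` option of `reset_index()`, which discards the old index instead of turning it into a column.

```python
import pandas as pd

named = pd.DataFrame({"Value1": [1, 2]}, index=["Bob", "Elliot"])

kept = named.reset_index()                 # old index becomes a column named 'index'
discarded = named.reset_index(drop=True)   # old index is thrown away

print(list(kept.columns))        # ['index', 'Value1']
print(list(discarded.columns))   # ['Value1']
print(list(named.index))         # ['Bob', 'Elliot'] (unchanged without inplace=True)
```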

In [None]:
# Current index
df.index

In [None]:
# Reset to the default integer index (0, 1, 2, ...) instead of the name labels
df.reset_index(inplace=True)
print(df.index)
print()
print(df)

- Use the `set_index()` method to set a custom index.

In [None]:
# Setting up a new index value
df.set_index('index', inplace=True)
df

<a id='2.2.2'></a>
### <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2.2 Sorting</p>


In [None]:
# Create a DataFrame with unsorted index
df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col1', 'col2'])

df

In [None]:
# Sort by index (ascending order by default)
df.sort_index(inplace=True)
df

In [None]:
# Sort by index in descending order
df.sort_index(ascending=False, inplace=True)
df

In [None]:
# Sort by column values
df.sort_values(by='col1')

<a id='2.2.3'></a>
### <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2.3 Missing Data</p>


- Here are some techniques to deal with missing data using Pandas.
- Missing values are displayed as `NaN` (Not a Number) in Pandas.

In [None]:
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Cricket':[1, 2, np.nan, 4, 6, 7, 2, np.nan],
                          'Baseball':[5, np.nan, np.nan, 5, 7, 2, 4, 5],
                          'Tennis':[1, 2, 3, 4, 5, 6, 7, 8]})

dataframe

- Use the `.isnull()` method to check for missing values.
- It is often chained to the `.sum()` method to gather the total missing values.

In [None]:
# dataframe.isnull()
dataframe.isnull().sum()

- The `dropna()` method removes entries with missing values.
- Set `axis=1` to drop columns with NaN values.
- Set `inplace=True` for permanent change.

In [None]:
# Row 1, 2, 7 will be dropped
dataframe.dropna()

In [None]:
# Drop columns with NaN values; Cricket and Baseball will be dropped
dataframe.dropna(axis=1)

In [None]:
# Replace missing values with the mean value of Baseball
mean_baseball = dataframe['Baseball'].mean()

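# Assign the result back to the column; calling fillna(inplace=True) on a
# selected column is deprecated in recent pandas versions
dataframe['Baseball'] = dataframe['Baseball'].fillna(value=mean_baseball)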
dataframe

In [None]:
# Replace all missing values with 0
dataframe.fillna(value=0)

<a id='2.2.4'></a>
### <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2.4 Groupby</p>

- Use the `.groupby()` method to group rows together based on a column name.

In [None]:
# Create a Dataframe
import pandas as pd

data = {'CustID':['1001', '1001', '1002', '1002', '1003', '1003'],
        'CustName':['UIPat', 'DatRob', 'Goog', 'Chrysler', 'Ford', 'GM'],
        'ProfitInMil':[2.0, 3.2, 1.2, 8.7, 5.4, 3.5]}

dataframe = pd.DataFrame(data)
dataframe

- Suppose we want to group by `CustID`. Performing the grouping creates a `DataFrameGroupBy` object; nothing is aggregated until an aggregation method is called on it.
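
To see what the grouped object holds before any aggregation, the sketch below iterates over the groups of a small hypothetical DataFrame `sales`. Each group is simply the subset of rows sharing one `CustID`.

```python
import pandas as pd

sales = pd.DataFrame({
    "CustID": ["1001", "1001", "1002"],
    "ProfitInMil": [2.0, 3.2, 1.2],
})

grouped = sales.groupby("CustID")   # no aggregation has happened yet
print(grouped.ngroups)              # number of distinct CustID values

for cust_id, rows in grouped:       # each group is a smaller DataFrame
    print(cust_id, rows["ProfitInMil"].tolist())

totals = grouped["ProfitInMil"].sum()   # aggregate one column per group
print(totals)
```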

In [None]:
# DataFrameGroupBy object
dataframe.groupby('CustID')

- This GroupBy object can be used to calculate various aggregations such as the average, min, standard deviation etc.

For example:

In [None]:
CustID_grouped = dataframe.groupby("CustID")
# numeric_only=True restricts the aggregation to numeric columns (CustName is text)
CustID_grouped.mean(numeric_only=True)

In [None]:
# Standard Deviation
# numeric_only=True restricts the aggregation to numeric columns (CustName is text)
CustID_grouped.std(numeric_only=True)

- The `describe()` method provides some basic statistical description about the data.

In [None]:
# dataframe
dataframe.describe()

In [None]:
# Provides some basic descriptive analytics that describe the data
CustID_grouped.describe()

---
## Table Joins

- DataFrames can be joined to produce a resultset using the `merge()` method.
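
The join type only makes a visible difference when some keys are missing from one of the tables. The sketch below uses two hypothetical tables, `left` and `right`, whose `CustID` values overlap only partially.

```python
import pandas as pd

left = pd.DataFrame({"CustID": ["1001", "1002", "1003"], "Q1": [101, 102, 103]})
right = pd.DataFrame({"CustID": ["1002", "1003", "1004"], "Q3": [302, 303, 304]})

inner = pd.merge(left, right, how="inner", on="CustID")     # keys found in both tables
outer = pd.merge(left, right, how="outer", on="CustID")     # keys found in either table
left_join = pd.merge(left, right, how="left", on="CustID")  # every key from `left`

print(inner["CustID"].tolist())         # ['1002', '1003']
print(len(outer))                       # 4
print(left_join["Q3"].isnull().sum())   # 1: '1001' has no match in `right`
```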

In [None]:
Table1 = pd.DataFrame({'CustID': ['1001', '1002', '1003', '1004'],
                       'Q1': ['101', '102', '103', '104'],
                       'Q2': ['201', '202', '203', '204']})
Table1

In [None]:
Table2 = pd.DataFrame({'CustID': ['1001', '1002', '1003', '1004'],
                       'Q3': ['301', '302', '303', '304'],
                       'Q4': ['401', '402', '403', '404']})
Table2

In [None]:
# Merge using inner join
pd.merge(Table1, Table2, how='inner', on='CustID')

<a id='2.2.5'></a>
### <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2.5 Operations</p>

### The `head()` Method

- The `head()` method previews the first 5 rows of a DataFrame by default; pass a number to change how many rows are shown.
- The `columns` attribute returns the column labels as an `Index` object.

In [None]:
import pandas as pd

dataframe = pd.DataFrame({'custID':[1, 2, 3, 4, 5, 6],
                          'SaleType':['big', 'small', 'medium', 'big', 'small', 'small'],
                          'SalesCode':['121', '131', '141', '151', '161', '171']})
dataframe.head(7)

In [None]:
dataframe.columns

### Information on Unique Values

In [None]:
# Unique values in the SaleType column
# dataframe
dataframe['SaleType'].unique()

In [None]:
# Number of unique values in the SaleType column
dataframe['SaleType'].nunique()

In [None]:
# Value counts for each value in the SaleType column
dataframe['SaleType'].value_counts()

<a id='2.2.6'></a>
### <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.2.6 Data Input and Output</p>

- Create DataFrames from external sources (e.g. CSV and Excel files) with the `pd.read_csv()` / `pd.read_excel()` functions.
- The resulting DataFrame can be saved to an external CSV or Excel file with the `to_csv()` / `to_excel()` methods.
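
The write-then-read round trip can be tried without any dataset on disk. The sketch below is an illustration only: it uses an in-memory `io.StringIO` buffer as a stand-in for a CSV file, and the DataFrame `original` is made up.

```python
import io
import pandas as pd

original = pd.DataFrame({"custID": [1, 2], "SaleType": ["big", "small"]})

buffer = io.StringIO()                 # in-memory stand-in for a file on disk
original.to_csv(buffer, index=False)   # index=False: do not write the row index
print(buffer.getvalue())               # the CSV text that would be saved

buffer.seek(0)                         # rewind to the start before reading
restored = pd.read_csv(buffer)
print(restored)
```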

In [None]:
# With CSV file
import pandas as pd

dataframe = pd.read_csv('datasets/train.csv')
dataframe.head()

In [None]:
# Writing to an external csv file
# With index=False, the csv file will not store index values
dataframe.to_csv('exported.csv', index=False)

In [None]:
# With Excel file
df = pd.read_excel('datasets/Consumer.xlsx', sheet_name='Data1')
df.head()

In [None]:
# Writing to an external Excel file
# With index=False, the Excel file will not store index values
df.to_excel('exported.xlsx', index=False)

<a id='2.3'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">2.3 Exercise</p>

Can you list some commonly used methods in pandas for data preprocessing or exploratory data analysis (EDA) that help you better understand the data?

> Click to reveal solution

<!--
df.head(), df.tail(), df.info(), df.shape, df.columns;
df.describe(), df.value_counts();
df.isnull(), df.isnull().sum(), df.dropna()
-->

<a id='3'></a>
# <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">3. Numpy</p>

In [None]:
# Install a pip package in the current Jupyter kernel
# import sys
# !{sys.executable} -m pip install numpy

<a id='3.1'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">3.1 Topics</p>

* Array
* Array Manipulation
* Linear Algebra Library
* Input/Output with Numpy

---
## NumPy Library

- The Numerical Python (NumPy) library is a general purpose array processing library.
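- Operations on NumPy array objects (`ndarray`) are typically much faster than equivalent loops over Python lists (a figure of up to ~50x is often quoted, but the actual gain depends on the operation and the array size).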
- It offers high-performance computation on **large, multi-dimensional** arrays.
- It has a rich collection of high-level mathematical functions to operate on these arrays.
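
The sketch below contrasts an element-by-element Python loop with a single vectorized NumPy expression. It is not a benchmark; it only shows the two styles producing the same result. The names `values`, `squared_list` and `squared_array` are illustrative.

```python
import numpy as np

values = list(range(1_000))

# Pure Python: an explicit loop over every element
squared_list = [v * v for v in values]

# NumPy: one vectorized expression applied to the whole array
squared_array = np.array(values) ** 2

print(squared_array[:5])
print(squared_list == squared_array.tolist())   # same results either way
```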

The common convention to import numpy is:

```python
import numpy as np
```

**Note:** The `as` keyword creates an alias, which allows numpy to be referred to as `np`.

In [None]:
# Check NumPy version
import numpy as np

print(np.__version__)

---
## Arrays

NumPy's primary objects are Arrays (`ndarray`). They have the following properties:
 * Homogeneous (All elements are of the same datatype) and multi-dimensional.
 * Single dimensional array is similar to Python's `list`.
 * Uses indexing to access the elements.

To create a numpy array, call the `array()` function.

In [None]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
print(type(arr))

## Dimensions in Arrays

Some NumPy terminology:
- In NumPy, dimensions are called **axes**.
- The number of axes is called **rank**.
- NumPy arrays have the `ndim` attribute that returns the number of dimensions.

### 0-D Arrays (Rank 0)

- 0-D arrays, or scalars, are the individual elements in an array.
- Each value in an array is a 0-D array.

In [None]:
import numpy as np

arr = np.array(42)

print(arr)
print(arr.ndim)

### 1-D Arrays (Rank 1)

- An array that has 0-D arrays as its elements is a 1-D array.
- These are the most common and basic arrays.

In [None]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
print(arr.ndim)

### 2-D Arrays (Rank 2)

- An array that has 1-D arrays as its elements is called a 2-D array.
- These are often used to represent a matrix / $2^{nd}$ order tensors.

In [None]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)
print(arr.ndim)

---
## Order

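- NumPy arrays have a property called **order**.
- Order determines how the elements of a multi-dimensional array are laid out in memory.
- There are 2 orders:
    - **Row-major order (default, also called C order):** elements are stored row by row, i.e. read from left to right.
    - **Column-major order (also called Fortran order):** elements are stored column by column, i.e. read from top to bottom.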

![2-ways.png](attachment:2-ways.png)

**Note:** Array ordering matters when arrays are passed between programs written in **different languages** and it also matters in **data retrieval** as modern CPUs process sequential data more efficiently than non-sequential data.
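
A minimal sketch of the two orders on a hypothetical 2-by-3 array `grid`. Flattening it with `order="C"` reads it row by row, while `order="F"` reads it column by column.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

row_major = grid.ravel(order="C")      # row by row: 1, 2, 3, 4, 5, 6
column_major = grid.ravel(order="F")   # column by column: 1, 4, 2, 5, 3, 6
print(row_major)
print(column_major)

# The memory layout itself can also be converted to column-major
fortran_grid = np.asfortranarray(grid)
print(grid.flags["C_CONTIGUOUS"], fortran_grid.flags["F_CONTIGUOUS"])
```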

---
## Creating a NumPy Array

- There are various methods for us to create different types of arrays.

**Method 1:** Manually creating an array object.

In [None]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

**Method 2:** Creating an array from Python lists (and tuples).

In [None]:
py_list = [5, 8, 9]
py_tuple = (7, 8, 9)

arr = np.array([py_list, py_tuple])
print(arr)

## The `arange` Method

- `arange()` creates an array of evenly spaced values.
- The optional `step` parameter sets the step size.
- For example, `step=5` means that consecutive elements from $0$ up to (but excluding) $30$ differ by $5$.

In [None]:
# Create a sequence of integers from 0 to 30 with a stepsize of 5 
arr = np.arange(start=0, stop=30, step=5)
print(arr)

## Array of Zeros & Ones

- The `np.zeros()` function creates an array of zeros.
- The `np.ones()` function creates an array of ones.
- To create an `n` by `m` matrix, pass along the dimensions in the form of a tuple `(n, m)`.

In [None]:
# Creating a 3-by-4 array of zeros
arr_0 = np.zeros((3, 4)) 
print(arr_0)

In [None]:
# Creating a 3-by-4 array of ones
arr_1 = np.ones((3, 4))
print(arr_1)

## Array of Random Numbers

- The `random()` function from the `np.random` sub-module creates an array of random numbers.
- It populates the array with random floats in the half-open interval [0.0, 1.0).

**Note:** At least 0 and less than 1.

In [None]:
# Creating a 3-by-2 array with random numbers in the half-open interval [0.0, 1.0) 
# i.e. at least 0 and less than 1
arr = np.random.random((3, 2)) 

print(arr)

---
## Accessing Array Properties

In [None]:
import numpy as np 

# Creating an array object 
arr = np.array([[1, 2, 3], 
                [4, 5, 6]])  

# Printing the type of the array object 
print("Array is of type: ", type(arr)) 

# Printing the array dimensions 
print("No. of dimensions: ", arr.ndim) 

# Printing the shape of the array 
print("Shape of array: ", arr.shape) 

# Printing the size (total number of elements) of the array 
print("Size of array: ", arr.size) 

# Printing the type of elements in the array 
print("Array stores elements of type: ", arr.dtype) 

---
## Array Manipulation

### Reshaping

- Reshaping is to change the shape of an array.
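
Reshaping only works when the new shape holds exactly the same number of elements. The sketch below (names are illustrative) also shows `-1`, which asks NumPy to infer one dimension from the others.

```python
import numpy as np

arr = np.arange(6)                 # 6 elements: 0 to 5

two_by_three = arr.reshape(2, 3)
three_by_two = arr.reshape(3, -1)  # -1 lets NumPy infer the missing dimension (2)
print(two_by_three.shape, three_by_two.shape)

try:
    arr.reshape(4, 2)              # 8 slots cannot hold 6 elements
except ValueError as error:
    print("ValueError:", error)
```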

In [None]:
# Reshape from 2-by-3 to 3-by-2
import numpy as np
 
arr = np.array([[1, 2, 3], 
                [4, 5, 6]]) 
  
newarr = arr.reshape(3, 2) 
  
print ("\nOriginal array:\n", arr)
print()
print ("Reshaped array:\n", newarr) 

### Array Flattening

- Image processing often requires reshaping a 2D array to a 1D array.
- This process is known as **flattening** an array.


For example:

In [None]:
# Flattening a 2D array to a 1D array
arr = np.array([[1, 2], 
                [5, 8], 
                [9, 4]]) 
print ("\nOriginal array:\n", arr)

arr = arr.flatten()
print ("\nAfter flattening:\n", arr)

---
### Transposing a Matrix

- Transposing a matrix turns its rows into columns and vice versa.
- This is done with the `np.transpose()` function or the `.T` attribute.

In [None]:
# Transposing a matrix
arr = np.array([[1, 2, 3], 
                [4, 5, 6]])

# Method 1
print(np.transpose(arr))

print()

# Method 2
print(arr.T)

## Array Concatenation

- The `concatenate()` function joins 2 or more arrays into 1.
- When combining arrays along the row axis, specify `axis=0`.
- When combining arrays along the column axis, specify `axis=1`.

In [None]:
# Concatenating 1-dimensional array
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

combined = np.concatenate((a, b), axis=0)
print(combined)

In [None]:
# Concatenating a 2-dimensional array
import numpy as np 

a = np.array([[1, 2],
              [3, 4]])   

b = np.array([[5, 6],
              [7, 8]])

# Concatenate along the row axis (i.e. axis=0)
combined_a = np.concatenate((a, b), axis=0)    
print(f"Combined along the row axis: \n{combined_a}")

print()

# Concatenate along the column axis (i.e. axis=1)
combined_b = np.concatenate((a, b), axis=1)
print(f"Combined along the column axis: \n{combined_b}")

---
## Array Slicing

- Similar to Python list slicing, array slicing retrieves a subset of elements from an array.
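
Slicing extends to 2-D arrays by giving one slice per axis, separated by a comma (rows first, then columns). A minimal sketch on a hypothetical 3-by-4 array `matrix`:

```python
import numpy as np

matrix = np.array([[ 1,  2,  3,  4],
                   [ 5,  6,  7,  8],
                   [ 9, 10, 11, 12]])

first_row = matrix[0, :]        # row 0, all columns
second_column = matrix[:, 1]    # all rows, column 1
lower_right = matrix[1:, 2:]    # rows 1 onwards, columns 2 onwards

print(first_row)
print(second_column)
print(lower_right)
```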

In [None]:
# Array slicing with positive numbers
import numpy as np

x = np.array([2, 4, 6, 8, 10, 12, 14])

# Array slicing to retrieve elements from index 1 to 5 with a stepsize of 2
x[1:6:2]

### Negative Slicing

- Use the minus operator to refer to an index from the end.

![negative.png](attachment:negative.png)

**Example:** Array slicing with negative numbers

In [None]:
# Array slicing with negative numbers
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Retrieve elements from index -3 (index 7) to 9
print(x[-3:10])

---
## Linear Algebra Library

- The Linear Algebra library contains algebraic functions on arrays such as:
 - Dot and Inner Products.
 - Determinants.
 - Eigenvectors and Eigenvalues.
 - ...and many more.

### Dot Product of an Array

$ \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3$
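
As a worked instance of the formula, the sketch below computes the dot product of two small vectors by hand and with `np.dot()`. The vectors `x` and `y` are made up for illustration. For 2-D arrays, `np.dot()` performs matrix multiplication, which is also available through the `@` operator.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

by_hand = 1 * 4 + 2 * 5 + 3 * 6   # x1*y1 + x2*y2 + x3*y3 = 32
print(by_hand, np.dot(x, y))

# With 2-D arrays, np.dot performs matrix multiplication (same as @)
a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])
print(np.array_equal(np.dot(a, b), a @ b))
```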

In [None]:
# Example: Computing the dot product (for 2-D arrays this is matrix multiplication)
import numpy as np 

a = np.array([[1,2], 
              [3,4]]) 

b = np.array([[11,12], 
              [13,14]]) 

print(np.dot(a, b))

### Determinant of an Array

$
A = \begin{bmatrix}
a & b\\ 
c & d
\end{bmatrix}
$
<br><br>
$
\left | A \right | = (a*d) - (b*c)
$
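
The formula can be checked against `np.linalg.det()` on a small matrix. This is a sketch only; because `det()` works in floating point, the comparison uses a tolerance rather than exact equality.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

a, b = A[0]
c, d = A[1]
by_formula = a * d - b * c          # (1*4) - (2*3) = -2

numeric = np.linalg.det(A)          # floating-point result, close to -2
print(by_formula, numeric)
print(np.isclose(by_formula, numeric))
```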

In [None]:
# Example: Computing the determinant of an array
import numpy as np
from numpy import linalg

a = np.array([[1,2], 
              [3,4]])

print(linalg.det(a))

---
## Input / Output with NumPy

- `ndarray` objects can be saved to external files on disk and loaded from them as well.
    * Use `loadtxt()` and `savetxt()` for plain text files.
    * Use `load()` and `save()` for NumPy binary files (`.npy` extension).
- A practical use: in Machine Learning, the trained weights of a neural network can be saved to NumPy binary files, which are generally faster to read and write than text files.
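
The two file formats can be compared in a single round trip. The sketch below is an illustration under stated assumptions: it writes into a temporary folder (so nothing is left behind), and the file names and the `fmt` / `delimiter` choices are just one reasonable option.

```python
import os
import tempfile
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

with tempfile.TemporaryDirectory() as folder:   # temporary folder, removed afterwards
    text_path = os.path.join(folder, "output.csv")
    np.savetxt(text_path, a, fmt="%d", delimiter=",")    # integers, comma-separated
    from_text = np.loadtxt(text_path, delimiter=",", dtype=int)

    binary_path = os.path.join(folder, "output_file.npy")
    np.save(binary_path, a)                              # keeps dtype and shape
    from_binary = np.load(binary_path)

print(np.array_equal(a, from_text), np.array_equal(a, from_binary))
```

Text files are human-readable but need the data type and delimiter to be stated when loading; the binary format stores that information itself.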

### Saving and Loading from a Text File

- A file named `output.txt` will be created in the same directory as this notebook.

In [None]:
import numpy as np 

a = np.array([[1, 2, 3, 4],
              [5, 9, 3, 1],
              [4, 8, 2, 3]])

# Save as text file
np.savetxt('output.txt', a)

In [None]:
# Load text file
b = np.loadtxt('output.txt') 
print(b)

### Saving and Loading from a NumPy Binary File

- A file named `output_file.npy` will be created in the same directory as this notebook.

In [None]:
import numpy as np

a = np.array([1,2,3,4,5])

# Save as binary file
np.save('output_file', a)

In [None]:
# Load from binary file
b = np.load('output_file.npy') 
print(b)

<a id='3.2'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">3.2 Exercise</p>

1. Write a NumPy program to create a 5 by 5 array with random values and find the minimum and maximum values

> Click to reveal solution

<!--
import numpy as np

x = np.random.random((5,5))
print("Original Array:")
print(x) 
xmin, xmax = x.min(), x.max()
print("Minimum and Maximum Values:")
print(xmin, xmax)
-->

2. Which slicing expression produces the output below from the given 4 by 4 array `x`?

```python
[[-1.   2.   5.   4. ]
 [ 4.  -0.5  6.  15. ]
 [ 2.6  0.   7.   8. ]
 [ 3.  -7.   4.   2. ]]
```

Output: 
```python
[[-1.,  5.]
 [ 4.,  6.]]
```

> Click to reveal solution

<!--
`x[:2, ::2]` or `x[ [[0,0], [1,1]], [[0,2], [0,2]] ]` or `np.array([ [x[0,0], x[0,2]], [x[1,0], x[1,2]] ])`
-->

<a id='4'></a>
# <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">4. Matplotlib</p>

<a id='4.1'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">4.1 Matplotlib Introduction</p>

- `Matplotlib` is a plotting package that is used for creating visualizations in Python. 
- It can create plots from Python lists, NumPy arrays or Pandas DataFrames.
- To use `Matplotlib`, import the `pyplot` submodule using `import matplotlib.pyplot as plt`.

---
- Whenever `Matplotlib` creates a plot from your data, it places the plot in a container called a `Figure`.
- Each `Figure` may contain the elements shown in Figure 1.
- The most common elements are the x- and y-axes, the major ticks, the major tick labels, the spines and the data being plotted.

In [None]:
# Example: A simple plot using lists of x and y values
import matplotlib.pyplot as plt

# Plots a line graph.
# 1st list is the x axis 
# 2nd list contains the corresponding values
plt.plot([1, 2, 3, 4], [2, 4, 1.5, 3])

# Add the labels and title
plt.title("Numbers Line Graph")
plt.xlabel("x-axis")
plt.ylabel("y-axis")

# Save the figure as simple_line_plot.png with a dpi of 200
plt.savefig('simple_line_plot.png', dpi=200) 

- The previous graph already has axis labels and a title, but no markers for the individual data points.
- Properties like these can be added by invoking `pyplot` functions or by passing arguments to the plotting function.

**Note:** The next example uses `marker='*'` (star); `'D'` would draw diamonds. Refer to https://matplotlib.org/stable/api/markers_api.html for more marker styles.

In [None]:
# Plotting a scatter plot using random values
import matplotlib.pyplot as plt
import random 
random.seed(1)

# Generate some random numbers
x = random.sample(range(50), 20)
y = random.sample(range(50), 20)

# Plotting the data
plt.scatter(x, y, marker='*')
plt.title("Random Numbers Scatter Plot")
plt.xlabel("x-axis")
plt.ylabel("y-axis")

---
## Multiple Plots in a Graph

- Suppose we want to plot 2 or more lines on to the same plot. 

In [None]:
# Importing libraries
import matplotlib.pyplot as plt
import numpy as np
import math

# Using Numpy to create an array X
X = np.arange(0, math.pi*2, 0.05)

# Assign variables to the y axis part of the curve
y = np.sin(X)
z = np.cos(X)

# Plotting both the curves simultaneously
# Line 1
plt.plot(X, y, color='r', label='sin')
# Line 2
plt.plot(X, z, color='g', label='cos')

# Naming the x-axis, y-axis and the whole graph
plt.xlabel("Angle")
plt.ylabel("Magnitude")
plt.title("Sine and Cosine functions")

# Adding a legend, which helps us recognize each curve by its color
plt.legend()

# To load the display window
plt.show()


**Note:**

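- Customization calls (title, labels, legend, etc.) must be placed in the same cell as the plotting call. With the notebook's default inline backend, each cell renders its own figure, so calls made in a later cell do not affect a plot that has already been displayed.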


---
## Subplot

- If you do not want the 2 lines to be on the same plot, you can plot them in different **subplots** with the `subplot()` method within the **same** figure. 
- `subplot()` accepts either 3 comma separated integers (`3,2,1`) or a 3-digit integer (`321`). 
- These integers tell `subplot()` the number of rows and columns in the grid and the index of the current plot.
- For example, the integer `321` means that there are 3 rows, 2 columns and the current plot index is `1`. 
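
The two calling styles can be checked against each other. The sketch below is illustrative: it creates two subplots in a 3-by-2 grid and reads back their grid position with `get_subplotspec().get_geometry()`, which reports `(rows, columns, start, stop)` using 0-based cell numbers (whereas the index passed to `subplot()` is 1-based).

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax_a = plt.subplot(3, 2, 1)   # 3 rows, 2 columns, first cell (top-left)
ax_b = plt.subplot(326)       # same grid, sixth cell (bottom-right)

geometry_a = ax_a.get_subplotspec().get_geometry()
geometry_b = ax_b.get_subplotspec().get_geometry()
print(geometry_a)   # rows, columns, then the 0-based cell range
print(geometry_b)

plt.close(fig)      # release the figure once it is no longer needed
```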

### Method 1

- Create a figure with subplots and then add on top of them.

In [None]:
x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])

x2 = np.array([0, 1, 2, 3])
y2 = np.array([10, 20, 30, 40])

In [None]:
# Setting up the figure size
plt.figure(figsize=(10,8))

# Subplot 1
ax1 = plt.subplot(2,1,1)    # subplot with 2 rows, 1 column, index 1
ax1.plot(x1, y1, linewidth=2, color='#F77538')
ax1.grid(True, axis='y')
ax1.set_facecolor('#eafff5')

# Subplot 2
ax2 = plt.subplot(2,1,2)    # subplot with 2 rows, 1 column, index 2
ax2.plot(x2, y2, linewidth=2, color='#1AB75C')
ax2.grid(True, axis='y')

### Method 2

The `subplots` method creates the figure along with the subplots that are then stored in the `ax` array.

In [None]:
# Setting up the figure size, (width, height) in inches
fig, ax = plt.subplots(2, 1)
fig.set_figwidth(10)
fig.set_figheight(8)

# Subplot 1
ax[0].plot(x1, y1, linewidth=2, color='#F77538')
ax[0].grid(True, axis='y')
ax[0].set_facecolor('#eafff5')

# Subplot 2
ax[1].plot(x2, y2, linewidth=2, color='#1AB75C')
ax[1].grid(True, axis='y')

<a id='4.2'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">4.2  Matplotlib Plotting with Numpy `ndarray`</p>

- NumPy `ndarray` objects can be used with `Matplotlib` as well.
- The arrays for the x-axis and the y-axis just need to have the same shape.


---
### Plotting with Dictionary

- `ndarray` objects can be stored in a dictionary and subsequently used to plot a graph.

In [None]:
# x and y have ndarrays of size 50
data_dictionary = {'x': np.arange(50),
                   'y': np.random.randint(0, 50, 50)}

# Plot the graph
plt.scatter('x', 'y', data=data_dictionary, marker='o')
plt.title("Random Values Using Dictionary")
plt.xlabel("x-axis")
plt.ylabel("y-axis")

<a id='4.3'></a>
## <p style="background-color:skyblue; font-family:newtimeroman; font-size:150%; text-align:center">4.3 Matplotlib with Pandas</p>

- The plot method on Series and DataFrame is a simple wrapper around `plt.plot()`.

In [None]:
# Plotting a DataFrame
np.random.seed(1)

df = pd.DataFrame(data=np.random.randn(365, 4),
                  index=pd.date_range('1/1/2023', periods=365), 
                  columns=['A','B','C','D'])
df = df.cumsum()
df.plot()

---
### Other Plots

- Various plots can be created by changing the `kind` keyword argument of `plot()`:
 - `bar` or `barh` for bar plots.
 - `hist` for histogram.
 - `box` for boxplot.
 - `scatter` for scatter plots.  
 - `pie` for pie plots.  

### Bar Plots 

- For categorical data, you may wish to produce a bar plot:

In [None]:
df = pd.DataFrame({'Labels':['A', 'B', 'C'], 
                   'Values':[10, 30, 20]})

df.plot.bar(x='Labels', 
            y='Values', 
            rot=0)    # Label rotation

- Calling `plot.bar()` on a DataFrame with several numeric columns produces a grouped bar plot, with one bar per column for each row:

- To produce a stacked bar plot, pass `stacked=True`.

In [None]:
df2 = pd.DataFrame(np.random.rand(3, 4), 
                   columns=['A', 'B', 'C', 'D'])

print(df2, '\n')

df2.plot.bar(rot=0)

- To produce a horizontal bar plot, use the `barh()` method.

In [None]:
# Horizontal bar plot
df2.plot.barh()

# Stacked Horizontal bar plot
# df2.plot.barh(stacked=True)

---
### Histograms

Histograms can be drawn with the `DataFrame.plot.hist()` and `Series.plot.hist()` methods.

In [None]:
df3 = pd.DataFrame({'a': np.random.randn(1000)})
df3.plot.hist(bins=10)

---
### Box Plots

- Box plots can be drawn by calling `Series.plot.box()`, `DataFrame.plot.box()`, or `DataFrame.boxplot()` to visualize the distribution of values within each column.

In [None]:
df = pd.DataFrame(data=np.random.rand(10, 5), 
                  columns=['A', 'B', 'C', 'D', 'E'])
df.plot.box()

- To create a horizontal boxplot, pass `vert=False`.

In [None]:
df.plot.box(vert=False)

---
### Scatter Plot

- A scatter plot can be drawn with the `DataFrame.plot.scatter()` method.
- A scatter plot requires numerical columns for the x and y axes.
- These are specified with the `x` and `y` keywords.

In [None]:
df = pd.DataFrame(np.random.rand(50, 4), columns=['A', 'B', 'C', 'D'])
df.plot.scatter(x='A', y='B')

In [None]:
ax = df.plot.scatter(x='A', y='B', color='red', label='Group 1')
df.plot.scatter(x='C', y='D', color='blue', label='Group 2', ax=ax)    # Draw on the same axes

---
### Pie Plot

- You can create a pie plot with `DataFrame.plot.pie()` or `Series.plot.pie()`. 
- If your data includes any `NaN`, they will be automatically filled with `0`. 
- A `ValueError` will be raised if there are negative values in your data.

In [None]:
series = pd.Series(3 * np.random.rand(4), index=['A', 'B', 'C', 'D'], name='Categories')
series

In [None]:
series.plot.pie(figsize=(6, 6))

In [None]:
np.random.seed(1)

df = pd.DataFrame(data=3*np.random.rand(4, 2), 
                  index=['A', 'B', 'C', 'D'], 
                  columns=['X', 'Y'])

print(df)

df.plot.pie(subplots=True, 
            figsize=(10, 8), 
            autopct='%.1f')