# 📘 Lecture Notes: Matplotlib for Machine Learning

Matplotlib is essential for data visualization in machine learning, especially during data exploration and model evaluation.

<b>What is matplotlib.pyplot?</b>

matplotlib.pyplot is a collection of functions that give matplotlib a MATLAB-like interface. Each function makes some change to a figure: for example, one creates a figure, another creates a plotting area in that figure, another plots lines in the plotting area, and another labels the axes.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
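
The "each function changes a figure" idea can be seen by asking pyplot which figure and axes it is currently working on. The snippet below is a minimal sketch; the title text and data points are made up for illustration.

```python
import matplotlib.pyplot as plt

plt.figure()                      # create a new figure and make it the "current" one
plt.plot([0, 1, 2], [0, 1, 4])    # draws into the current figure, creating an Axes if needed
plt.title("Demo")                 # modifies that same Axes

fig = plt.gcf()                   # "get current figure"
ax = plt.gca()                    # "get current axes"
print(ax.get_title())             # Demo
print(len(ax.lines))              # 1 (the single line added by plt.plot)

plt.show()
```

Every `plt.*` call above acts on the same current figure, which is why the cells in these notes can build a plot step by step without passing a figure object around.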

## 📊 1. Basic Plot Types
### 1.1 Line Plot

In [None]:
x = [1, 2, 3, 4]
y = [2, 4, 6, 9]
plt.plot(x, y)
plt.title("Line Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()

### 1.2 Scatter Plot

In [None]:
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y, color='blue')
plt.title("Scatter Plot")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

### 1.3 Bar Chart

In [None]:
categories = ['A', 'B', 'C']
values = [10, 24, 36]
plt.bar(categories, values, color='green')
plt.title("Bar Chart")
plt.xlabel("Class")
plt.ylabel("Frequency")
plt.show()

### 1.4 Histogram

In [None]:
data = np.random.randn(1000)
plt.hist(data, bins=10, color='orange')
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

### 1.5 Box Plot

A boxplot is a standardized graphical representation of the distribution of a dataset based on five summary statistics:

- Minimum – The smallest data point excluding outliers
- First Quartile (Q1) – 25th percentile
- Median (Q2) – 50th percentile
- Third Quartile (Q3) – 75th percentile
- Maximum – The largest data point excluding outliers

It may also display outliers as individual points beyond the “whiskers”.

In [None]:
data = np.random.normal(0, 1, 100)
plt.boxplot(data)
plt.title("Box Plot")
plt.show()

<b>What a Boxplot Shows</b>

The box shows the interquartile range (IQR = Q3−Q1), which contains the middle 50% of the data.

- The line inside the box is the median.
- The "whiskers" extend from the box to the most extreme data points that still lie within 1.5×IQR of the box edges (that is, between Q1 − 1.5×IQR and Q3 + 1.5×IQR).
- Outliers are shown as individual dots or points beyond the whiskers.
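
The quantities a box plot draws can also be computed by hand. The sketch below uses a small made-up sample with one obvious outlier and assumes the 1.5×IQR whisker rule described above (matplotlib's default); the variable names are illustrative.

```python
import numpy as np

# Hand-made sample with one obvious outlier (illustrative data)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences: points outside these bounds are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3)             # 3.25 5.5 7.75
print(iqr)                        # 4.5
print(whisker_low, whisker_high)  # 1 9
print(outliers)                   # [30]
```

Note that the upper whisker stops at 9, the largest point inside the fence, not at the fence value 14.5 itself.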

<b>Why Boxplots Are Useful</b>
- Identify skewness, spread, and central tendency
- Spot outliers
- Compare distributions across groups
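
Comparing groups is where box plots tend to be most helpful. As a rough sketch, passing a list of arrays to `plt.boxplot` draws one box per array; the three groups below are synthetic and chosen only to show differences in center, spread, and skew.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible

# Three synthetic groups (made-up data)
groups = [
    rng.normal(0, 1, 200),     # baseline
    rng.normal(1, 2, 200),     # shifted center, wider spread
    rng.exponential(1, 200),   # right-skewed
]
names = ["Group A", "Group B", "Group C"]

result = plt.boxplot(groups)   # one box per array, placed at x = 1, 2, 3
plt.xticks([1, 2, 3], names)   # replace the default numeric tick labels
plt.title("Box Plots by Group")
plt.ylabel("Value")
plt.show()
```

In the resulting figure, a taller box indicates a larger IQR, and a median line sitting off-center inside the box hints at skew.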

## 🧱 2. Subplots

- A subplot is a plotting area inside a larger figure that allows you to display multiple plots in one figure window. 
- It helps to compare multiple visualizations side-by-side or in a grid layout.

`fig, axs = plt.subplots(1, 2, figsize=(10, 4))`

- 1, 2 → This creates 1 row and 2 columns of plots, i.e., 2 side-by-side plots.
- fig → This is the Figure object (the entire canvas).
- axs → This is a NumPy array of Axes objects. Each Axes is where an individual plot goes.
- figsize=(10, 4) → Sets the size of the entire figure: 10 inches wide and 4 inches tall.


In [None]:
fig, axs = plt.subplots(1, 2, figsize=(10, 4))

axs[0].plot([1, 2, 3], [1, 2, 3])
axs[0].set_title("Plot 1")

axs[1].plot([1, 2, 3], [3, 2, 1])
axs[1].set_title("Plot 2")

plt.tight_layout()
plt.show()
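
The same call extends to grids. With more than one row and more than one column, `axs` becomes a 2-D array indexed as `axs[row, col]`. The following is a minimal sketch; the plotted functions are arbitrary examples.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# 2 rows x 2 columns -> axs is a 2-D NumPy array of Axes
fig, axs = plt.subplots(2, 2, figsize=(8, 6))
print(axs.shape)  # (2, 2)

axs[0, 0].plot(x, np.sin(x))
axs[0, 0].set_title("sin(x)")

axs[0, 1].plot(x, np.cos(x))
axs[0, 1].set_title("cos(x)")

axs[1, 0].hist(np.sin(x), bins=10)
axs[1, 0].set_title("Histogram of sin(x)")

axs[1, 1].scatter(np.sin(x), np.cos(x), s=5)
axs[1, 1].set_title("cos(x) vs sin(x)")

plt.tight_layout()  # avoid overlapping titles and tick labels
plt.show()
```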

## 📉 3. Loss Curve (Training vs Validation)

- A loss curve is a graph that shows how the loss (or error) of a machine learning model changes over time, typically during training.
- Loss measures how far the model’s predictions are from the actual values.
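- Lower loss generally means better model performance, up to a point: if training loss keeps falling while validation loss stalls or rises, the model is starting to overfit.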
- The curve is usually plotted against epochs, where an epoch is one full pass over the training dataset.
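
To make "how far the predictions are from the actual values" concrete, here is a small sketch that computes mean squared error (MSE), one common loss for regression. The choice of MSE and the numbers are assumptions for illustration; the notes do not tie the curve to a specific loss function.

```python
import numpy as np

# Made-up targets and predictions for four samples
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

errors = y_pred - y_true     # per-sample error: [-0.5, 0.0, 1.0, 1.0]
mse = np.mean(errors ** 2)   # average of the squared errors

print(mse)  # 0.5625
```

A loss curve is simply this kind of number recorded once per epoch and plotted over time.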

In [None]:
epochs = range(1, 11)
train_loss = [0.9, 0.8, 0.6, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.30]
val_loss = [1.0, 0.9, 0.75, 0.6, 0.58, 0.57, 0.56, 0.55, 0.54, 0.53]

plt.plot(epochs, train_loss, label='Training Loss')
plt.plot(epochs, val_loss, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model Loss')
plt.legend()
plt.show()
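
The curves can also be read numerically. As a rough sketch using the same loss values as above, the snippet finds the epoch with the lowest validation loss and the gap between validation and training loss; treating a growing gap as a sign of overfitting is a common heuristic, not a strict rule.

```python
import numpy as np

train_loss = np.array([0.9, 0.8, 0.6, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.30])
val_loss = np.array([1.0, 0.9, 0.75, 0.6, 0.58, 0.57, 0.56, 0.55, 0.54, 0.53])

# Epochs are 1-based, array indices are 0-based
best_epoch = int(np.argmin(val_loss)) + 1
print(best_epoch)  # 10

# Gap between validation and training loss at each epoch
gap = val_loss - train_loss
print(gap.round(2))
```

Here validation loss is still (slowly) improving at the last epoch, but the gap is wider at the end than at the start, so training loss is improving faster than validation loss.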

## 🔥 4. Confusion Matrix (Heatmap)

- A confusion matrix is a performance measurement tool for classification problems.
- It shows how many predictions the model got correct and incorrect, broken down by actual vs predicted classes.

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, cmap='Blues', fmt='d')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
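
To see where the numbers in the heatmap come from, the matrix can be built by hand. This sketch uses the same labels as above and follows the convention that rows are actual classes and columns are predicted classes; the helper variable names are illustrative.

```python
import numpy as np

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 1]

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)

# Each (actual, predicted) pair adds one count to cell [actual, predicted]
for actual, predicted in zip(y_true, y_pred):
    cm[actual, predicted] += 1

print(cm)
# [[1 1 0]
#  [1 0 0]
#  [0 0 2]]

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.6
```

Off-diagonal cells show which classes get confused: for example, `cm[1, 0] == 1` means one sample of class 1 was predicted as class 0.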

# Scikit-learn Library

The Scikit-learn library (also written as scikit-learn or sklearn) is a popular and powerful machine learning library in Python. It provides simple and efficient tools for data mining, data analysis, and machine learning tasks. It is built on NumPy and SciPy and is commonly used together with matplotlib and pandas:

- NumPy – for numerical operations
- SciPy – for scientific computing
- matplotlib – for visualization (optional but useful)
- pandas – for data manipulation (commonly used alongside)

<b>Key Features of Scikit-learn:</b>

- Easy-to-use API: consistent syntax across all machine learning models makes it user-friendly.

- Wide range of ML algorithms and utilities, including:

    - Supervised learning: Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, etc.

    - Unsupervised learning: K-Means, PCA, DBSCAN, etc.

    - Model selection: Cross-validation, Grid Search

    - Preprocessing: Standardization, Normalization, One-Hot Encoding, etc.

- Integration with other libraries: works seamlessly with NumPy arrays and pandas DataFrames.
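
The consistent API can be illustrated with a short sketch: two quite different models are trained and scored with the same `fit` / `score` calls. The dataset (Iris), the split settings, and the choice of models are assumptions made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Different algorithms, same interface
models = {
    "logistic regression": make_pipeline(
        StandardScaler(),                    # preprocessing step
        LogisticRegression(max_iter=1000),   # supervised model
    ),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # train
    scores[name] = model.score(X_test, y_test)   # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")

# Model selection utility: 5-fold cross-validation on the full dataset
cv_scores = cross_val_score(models["decision tree"], X, y, cv=5)
print(cv_scores.mean())
```

Swapping in another estimator usually means changing only the line that constructs it; the training and evaluation code stays the same.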