# Introduction to Matplotlib

## Overview of Matplotlib

Matplotlib is a Python library for creating static, animated, and interactive visualizations. It is widely used in data analysis and machine learning projects.

### Key Features:
- Extensive support for 2D plotting.
- Customizable plots with labels, titles, legends, and more.

### Common Use Cases:
- Exploratory data analysis.
- Visualizing trends, distributions, and relationships in datasets.
- Communicating results effectively.

---

## Basic Plots in Matplotlib

### 1. Line Plots
Line plots are used to visualize data points connected by lines, often representing trends over time.

#### Example:

In [None]:
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Create line plot
plt.plot(x, y, marker='o', linestyle='-', color='b')
plt.title("Line Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
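
The overview lists legends among the available customizations, but a single-series plot does not need one. The sketch below plots two series on the same axes and adds a legend, using Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`) rather than the `plt.*` calls used elsewhere in this notebook. The second series (`squares`) is made up for illustration and is not part of the original example.

```python
import matplotlib.pyplot as plt

# Same x values as the line plot example; the second series is hypothetical
x = [1, 2, 3, 4, 5]
primes = [2, 3, 5, 7, 11]
squares = [1, 4, 9, 16, 25]  # illustrative data, not from the original example

fig, ax = plt.subplots()
ax.plot(x, primes, marker='o', linestyle='-', color='b', label='Primes')
ax.plot(x, squares, marker='s', linestyle='--', color='r', label='Squares')
ax.set_title("Two Series with a Legend")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.grid(True)
ax.legend()  # builds the legend from the label= arguments above
plt.show()
```

The `label=` argument on each `plot` call is what `ax.legend()` picks up, so a series without a label would simply be left out of the legend.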

---

### 2. Bar Charts
Bar charts compare values across discrete categories.

#### Example:

In [None]:
# Sample data
categories = ['A', 'B', 'C', 'D']
values = [10, 15, 7, 12]

# Create bar chart
plt.bar(categories, values, color='skyblue')
plt.title("Bar Chart Example")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()

---

### 3. Histograms
Histograms show the distribution of a numeric dataset by counting how many values fall into each bin.

#### Example:

In [None]:
import numpy as np

# Sample data
data = np.random.normal(0, 1, 1000)  # 1000 samples from a standard normal distribution (mean 0, std 1)

# Create histogram
plt.hist(data, bins=20, color='orange', edgecolor='black')
plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
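
To see what the `bins` argument does, it can help to look at the numbers behind the bars. The sketch below uses `np.histogram`, which computes the same kind of counts and bin edges that `plt.hist` draws. The fixed seed and the variable names (`rng`, `samples`) are assumptions added here so the output is reproducible; they are not part of the example above.

```python
import numpy as np

# Seed chosen arbitrarily so repeated runs give the same samples
rng = np.random.default_rng(seed=0)
samples = rng.normal(0, 1, 1000)

# counts[i] is the number of samples between edges[i] and edges[i + 1]
counts, edges = np.histogram(samples, bins=20)

print("Number of bins:", len(counts))
print("Number of bin edges:", len(edges))
print("Total count:", counts.sum())
print("Width of each bin:", edges[1] - edges[0])
```

With 20 bins there are 21 edges, and the counts add up to the number of samples, because the bins span the full range from the smallest to the largest value.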

---

## Hands-On: Create Basic Plots

### Dataset for Hands-On
We will use a small dataset of monthly car sales:

In [None]:
import pandas as pd

# Create a sample dataset
months = ['January', 'February', 'March', 'April', 'May', 'June']
sales = [250, 300, 400, 350, 500, 600]
data = pd.DataFrame({'Month': months, 'Sales': sales})
print(data)

### Task 1: Line Plot for Sales Trends

In [None]:
# Line plot for sales trends
plt.plot(data['Month'], data['Sales'], marker='o', linestyle='-', color='green')
plt.title("Sales Trends")
plt.xlabel("Months")
plt.ylabel("Number of Cars Sold")
plt.grid(True)
plt.show()

### Task 2: Bar Chart for Monthly Sales

In [None]:
# Bar chart for sales
plt.bar(data['Month'], data['Sales'], color='purple')
plt.title("Monthly Car Sales")
plt.xlabel("Months")
plt.ylabel("Number of Cars Sold")
plt.show()
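
Exact values can be hard to read off a bar chart. One option, sketched below, is to print each value above its bar with `Axes.bar_label`, which assumes Matplotlib 3.4 or newer. The data repeats the monthly sales figures from the hands-on dataset so the block runs on its own.

```python
import matplotlib.pyplot as plt

# Same monthly sales figures as the hands-on dataset
months = ['January', 'February', 'March', 'April', 'May', 'June']
sales = [250, 300, 400, 350, 500, 600]

fig, ax = plt.subplots()
bars = ax.bar(months, sales, color='purple')

# Write each bar's height just above it (requires Matplotlib >= 3.4)
labels = ax.bar_label(bars, padding=3)

ax.set_title("Monthly Car Sales")
ax.set_xlabel("Months")
ax.set_ylabel("Number of Cars Sold")
plt.show()
```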

### Task 3: Histogram of Random Data (Distribution Example)

In [None]:
# Generate 100 random values from a normal distribution (mean 300, std 50)
random_data = np.random.normal(loc=300, scale=50, size=100)

# Create histogram
plt.hist(random_data, bins=15, color='blue', edgecolor='black')
plt.title("Random Data Distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

---

## Sample Datasets for Exploration

### 1. Titanic Dataset (Kaggle)
The Titanic dataset is commonly used for data visualization and machine learning.

#### Access:
[Kaggle Titanic Dataset](https://www.kaggle.com/c/titanic)

#### Example Use:
Visualize survival counts by gender:

In [None]:
import seaborn as sns
import pandas as pd

# Load a copy of the Titanic dataset hosted on GitHub
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
data = pd.read_csv(url)

# Count plot of survival outcome (0 = did not survive, 1 = survived), split by sex
sns.countplot(x='Survived', hue='Sex', data=data)
plt.title("Survival by Gender")
plt.show()
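
A count plot shows how many passengers fall into each group. To get a survival rate instead, one approach is to take the mean of the 0/1 `Survived` column within each `Sex` group. The sketch below uses a tiny hand-made DataFrame (`toy`) so that it runs without a download; the rows are illustrative only and are not real Titanic records. The same `groupby` line should work on the full dataset, assuming it has `Sex` and `Survived` columns as in the example above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny hand-made stand-in for illustration only; NOT real Titanic records
toy = pd.DataFrame({
    'Sex': ['female', 'female', 'female', 'male', 'male', 'male'],
    'Survived': [1, 1, 0, 0, 1, 0],
})

# Survived is 0/1, so the mean within each group is the fraction that survived
survival_rate = toy.groupby('Sex')['Survived'].mean()
print(survival_rate)

plt.bar(survival_rate.index, survival_rate.values, color='skyblue')
plt.title("Survival Rate by Gender (toy data)")
plt.xlabel("Sex")
plt.ylabel("Fraction survived")
plt.ylim(0, 1)
plt.show()
```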

---

### 2. Iris Dataset
The Iris dataset is a classic dataset for data visualization and machine learning.

#### Access:
Available via the `sklearn` library (`sklearn.datasets.load_iris`) or the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris).

#### Example Use:
Scatter plot of sepal length vs. sepal width:

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load Iris dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target  # integer class labels: 0, 1, 2

# Scatter plot
plt.scatter(data['sepal length (cm)'], data['sepal width (cm)'], c=data['species'], cmap='viridis')
plt.title("Sepal Length vs. Width")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Sepal Width (cm)")
plt.show()
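
Because the species are stored as integer codes, the colors in the scatter plot have no visible key. A possible way to add one, sketched below, is to take the legend handles from `scatter.legend_elements()` and pair them with the species names in `iris.target_names`. The variable names (`sepal_length`, `sepal_width`, `handles`) are choices made for this sketch.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]  # first feature: sepal length (cm)
sepal_width = iris.data[:, 1]   # second feature: sepal width (cm)

fig, ax = plt.subplots()
scatter = ax.scatter(sepal_length, sepal_width, c=iris.target, cmap='viridis')

# One legend handle per distinct class code, labelled with the species names
handles, _ = scatter.legend_elements()
ax.legend(handles, iris.target_names, title="Species")

ax.set_title("Sepal Length vs. Width")
ax.set_xlabel("Sepal Length (cm)")
ax.set_ylabel("Sepal Width (cm)")
plt.show()
```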

---

### 3. House Prices Dataset (Kaggle)
The House Prices dataset is ideal for visualizing relationships between features and house prices.

#### Access:
[Kaggle House Prices Dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques)

#### Example Use:
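Histogram of house prices. Note that the code below loads the Boston Housing CSV from GitHub rather than the Kaggle competition files; its `medv` column is the median home value in thousands of dollars: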

In [None]:
# Load the Boston Housing CSV (a different dataset from the Kaggle competition linked above)
url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
data = pd.read_csv(url)

# Histogram of median home values (medv is in thousands of dollars)
plt.hist(data['medv'], bins=30, color='green', edgecolor='black')
plt.title("Distribution of House Prices")
plt.xlabel("Median Home Value ($1000s)")
plt.ylabel("Frequency")
plt.show()