# Matplotlib

Matplotlib is a powerful plotting library in Python that allows you to create a wide variety of static, animated, and interactive visualizations. To understand Matplotlib thoroughly, let's break down its components and functionality using simple analogies and detailed explanations.

## The Philosophy of Matplotlib

Think of Matplotlib as a set of building blocks (or LEGO bricks) that you can use to construct different types of visualizations. Each block represents a different element of a plot, such as the axes, labels, lines, and markers. By combining these blocks in various ways, you can create anything from simple line graphs to complex multi-layered plots.
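
As a rough illustration of the building-block idea, the sketch below assembles one plot piece by piece and then inspects the pieces as ordinary Python objects. The data values and the title text are made up for the example.

```python
import matplotlib.pyplot as plt

# Each call below adds one "brick" to the same plot.
fig, ax = plt.subplots()                       # canvas (Figure) and plotting area (Axes)
(line,) = ax.plot([1, 2, 3, 4], [1, 4, 2, 3])  # a line
line.set_marker("o")                           # markers on that line
ax.set_xlabel("x")                             # axis labels
ax.set_ylabel("y")
ax.set_title("Built from blocks")              # a title

# The pieces are regular objects that can be inspected or changed later.
print(len(ax.lines))     # number of line objects on the axes
print(ax.get_title())    # the title text set above
plt.show()
```

Because every element stays reachable through `fig`, `ax`, or `line`, any single brick can be restyled without rebuilding the rest of the plot.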

## Installation


```
pip install matplotlib
```



## Core Components


### Figure and Axes
* **Figure:** Think of the figure as a blank canvas where you will draw your painting (the plot).
* **Axes:** The axes are the area on the canvas where you actually paint. An axes object includes the data space, the x- and y-axes, and their labels.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # Create a figure and an axes.
x = [1,2,3,4]
y = [1,4,2,3]
# [(1,1), (2,4), (3,2), (4,3)]
ax.plot(x, y)  # Plot some data on the axes - line plot
plt.show()  # Display the plot

#### Other common plots

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # Create a figure and an axes.
ax.scatter([1, 2, 3, 4], [1, 4, 2, 3]) # Scatter Plot: Dots instead of lines.
plt.show()  # Display the plot.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # Create a figure and an axes.
ax.bar([1, 2, 3, 4], [1, 4, 2, 3]) # Bar Plot: Bars instead of lines.
plt.show()  # Display the plot.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # Create a figure and an axes.
ax.hist([1, 2, 2, 3, 3, 3, 4, 4, 4, 4]) # Histogram: Frequency distribution of data.
plt.show()  # Display the plot.

## Customizing Plots

You can change colors, add labels, and modify the layout to make a plot clearer and more appealing.

### Title and Labels

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # Create a figure and an axes.
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])  # Plot some data on the axes - line plot
ax.set_title("My Plot")  # Add a title.
ax.set_xlabel("X Axis Label")  # Add x-axis label.
ax.set_ylabel("Y Axis Label")  # Add y-axis label.
plt.show()  # Display the plot.

### Colors and Styles

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1,1)  # Create a figure and an axes.
ax.plot([1, 2, 3, 4], [1, 4, 2, 3], color='red', linestyle='-.', marker='o')  # Plot some data on the axes - line plot with changed styles
ax.set_title("My Plot")  # Add a title.
ax.set_xlabel("X Axis Label")  # Add x-axis label.
ax.set_ylabel("Y Axis Label")  # Add y-axis label.
plt.show()  # Display the plot.

### Subplots

In [None]:
fig, ax = plt.subplots()  # A single axes: both lines below are drawn on the same plot.
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
ax.plot([1.5, 2.5, 3.5, 4.5], [1, 2, 3, 4])
plt.show()

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2)  # Create a figure with two subplots.
ax1.plot([1, 2, 3, 4], [1, 4, 2, 3])
ax2.plot([1, 2, 3, 4], [1, 2, 3, 4])
plt.show()

In [None]:
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1)  # Create a figure with four stacked subplots; ax3 and ax4 stay empty here.
ax1.plot([1, 2, 3, 4], [1, 4, 2, 3])
ax2.plot([1, 2, 3, 4], [1, 2, 3, 4])
plt.show()
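
When both the row and column counts are greater than one, `plt.subplots` returns the axes as a 2D NumPy array rather than a flat tuple. The following is a minimal sketch of indexing into that grid and flattening it for a loop; the panel titles are placeholders chosen for the example.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)   # 2 rows x 2 columns
print(axs.shape)                # (2, 2)

# Index a single panel by [row, column].
axs[0, 1].plot([1, 2, 3, 4], [1, 4, 2, 3])

# Flatten the grid to visit every panel in row-major order.
for i, ax in enumerate(axs.flatten()):
    ax.set_title(f"Panel {i}")

plt.tight_layout()  # reduce overlap between titles and neighbouring panels
plt.show()
```

Flattening is convenient when the same kind of plot is drawn for a list of features, one feature per panel.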

## Putting It All Together


In [None]:
import matplotlib.pyplot as plt

# Create a figure and a set of subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))

# Plot on the first axes
ax1.plot([1, 2, 3, 4], [1, 4, 2, 3], color='blue', linestyle='-', marker='o')
ax1.set_title("Line Plot")
ax1.set_xlabel("X Axis")
ax1.set_ylabel("Y Axis")

# Plot on the second axes
ax2.bar([1, 2, 3, 4], [1, 4, 2, 3], color='green')
ax2.set_title("Bar Plot")
ax2.set_xlabel("X Axis")
ax2.set_ylabel("Y Axis")

# Display the plot
plt.show()

## Advanced Usage

### Multiple Subplots with Shared Axes

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 2 * np.pi, 400)
y1 = np.sin(x)
y2 = np.cos(x)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)  # 2 rows, 1 column, shared x-axis
# fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)  # 2 cols, 1 row, shared y-axis
fig.suptitle('Sharing X Axis')

ax1.plot(x, y1, label='sin(x)')
ax1.legend()
ax1.set_ylabel('sin(x)')

ax2.plot(x, y2, label='cos(x)', color='orange')
ax2.legend()
ax2.set_xlabel('x')
ax2.set_ylabel('cos(x)')

plt.show()

### Annotating Plots

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)

# Highlight the maximum value
max_y = np.max(y)
max_x = x[np.argmax(y)]
ax.annotate(f'Max Value: {max_y:.2f}', xy=(max_x, max_y), xytext=(max_x+2, max_y+0.5),
            arrowprops=dict(facecolor='black', shrink=0.05))

plt.title('Sine Wave with Annotation')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()

### Customizing with Styles

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x)

# Use a predefined style
plt.style.use('ggplot')

fig, ax = plt.subplots()
ax.plot(x, y)

plt.title('Styled Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
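
Note that `plt.style.use` changes the settings for every plot created afterwards. If only one figure should be styled, a context manager is one option. The sketch below assumes the `ggplot` style ships with the installed Matplotlib, and it uses the `axes.facecolor` setting purely as a probe to show that the settings are restored.

```python
import matplotlib.pyplot as plt

# List of style names known to this Matplotlib installation.
print("ggplot" in plt.style.available)

before = plt.rcParams["axes.facecolor"]

# The style applies only to figures created inside the with-block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 0])
    ax.set_title("Styled only inside the with-block")

after = plt.rcParams["axes.facecolor"]
print(before == after)  # settings are restored once the block ends
plt.show()
```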

### Creating a Heatmap

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = np.random.rand(10, 10)

fig, ax = plt.subplots()
cax = ax.matshow(data, cmap='plasma')

# Add colorbar
fig.colorbar(cax)

plt.title('Heatmap')
plt.show()

# EDA using Matplotlib

Data: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

Some tips and tricks.

## Step 0: Installing libraries



```
pip install pandas matplotlib
```



## Step 1: Import Libraries and Load the Dataset

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('/content/drive/MyDrive/GFG/58. Data Visualization in Matplotlib and Seaborn - 1st June, 2024/data/train.csv')

# Display the first few rows of the dataset
print(df.head())

## Step 2: Understand the Dataset Structure

In [None]:
# Display basic information about the dataset
print(df.info())

In [None]:
# Describe the numerical columns
print(df.describe())

## Step 3: Visualize Distributions

### Distribution of Sale Prices

In [None]:
plt.figure(figsize=(10, 6))
plt.hist(df['SalePrice'], bins=30, color='blue', edgecolor='k')
plt.title('Distribution of Sale Prices')
plt.xlabel('Sale Price')
plt.ylabel('Frequency')
plt.show()

### Distribution of Numerical Features

In [None]:
numerical_features = ['GrLivArea', 'TotalBsmtSF', '1stFlrSF', 'GarageArea']

fig, axs = plt.subplots(2, 2, figsize=(15, 10))
axs = axs.flatten()

for i, feature in enumerate(numerical_features):
    axs[i].hist(df[feature].dropna(), bins=30, color='green', edgecolor='k')
    axs[i].set_title(f'Distribution of {feature}')
    axs[i].set_xlabel(feature)
    axs[i].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

## Step 4: Analyze Relationships

### Scatter Plot of Sale Price vs. GrLivArea

In [None]:
plt.figure(figsize=(10, 6))
plt.scatter(df['GrLivArea'], df['SalePrice'], color='purple', alpha=0.3)
plt.title('Sale Price vs. GrLivArea')
plt.xlabel('Above Grade Living Area (sq ft)')
plt.ylabel('Sale Price')
plt.show()

## Step 5: Detect Outliers

### Box Plot of Sale Price

In [None]:
plt.figure(figsize=(10, 6))
plt.boxplot(df['SalePrice'], vert=False, patch_artist=True)
plt.title('Box Plot of Sale Prices')
plt.xlabel('Sale Price')
plt.show()
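
With Matplotlib's default settings, a box plot draws its whiskers up to 1.5 times the interquartile range (IQR) beyond the quartiles and marks anything further out as an individual point. The same rule can be applied numerically to list the flagged values. The sketch below uses a small made-up series called `prices`, not the Kaggle data.

```python
import pandas as pd

# Hypothetical prices with one obviously extreme value.
prices = pd.Series([100, 120, 130, 135, 140, 150, 160, 500])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Fences that match the default whisker rule (1.5 * IQR).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = prices[(prices < lower) | (prices > upper)]
print(q1, q3, iqr)          # 127.5 152.5 25.0
print(lower, upper)         # 90.0 190.0
print(outliers.tolist())    # [500]
```

Applying the same steps to `df['SalePrice']` would list the rows that the box plot draws as separate points.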

### Box Plot of Numerical Features

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(15, 10))
axs = axs.flatten()

for i, feature in enumerate(numerical_features):
    axs[i].boxplot(df[feature].dropna(), vert=False, patch_artist=True)
    axs[i].set_title(f'Box Plot of {feature}')
    axs[i].set_xlabel(feature)

plt.tight_layout()
plt.show()

# Why Seaborn Was Developed
Seaborn was developed to complement Matplotlib and address several specific needs:

* **Ease of Statistical Plotting:** Matplotlib, while powerful, can be cumbersome for statistical plots. Seaborn simplifies this with functions that directly handle common statistical tasks.
* **Improved Aesthetics:** While Matplotlib allows for detailed customization, creating aesthetically pleasing plots can be time-consuming. Seaborn provides beautiful default styles and color palettes.
* **Convenient Data Handling:** Seaborn is designed to work well with Pandas DataFrames, making it straightforward to plot data directly from these structures.
* **Simplified Code:** Seaborn reduces the amount of code needed to create common plots, making it more accessible for quick visualizations and exploratory data analysis.
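
To make the first point concrete, here is a minimal sketch of one statistical task, plotting the mean per group. The `scores` DataFrame and its column names are invented for the example. With Matplotlib the aggregation is done by hand, while `sns.barplot` takes the raw rows, computes the mean per group by default, and adds error bars.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: three measurements for each of two groups.
scores = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 6, 8],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: aggregate first, then draw the bars.
means = scores.groupby("group")["value"].mean()
ax1.bar(means.index, means.values)
ax1.set_title("Matplotlib: mean computed by hand")

# Seaborn: pass the raw rows; the mean per group is computed internally.
sns.barplot(x="group", y="value", data=scores, ax=ax2)
ax2.set_title("Seaborn: mean computed by barplot")

plt.show()
```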

## Comparing Code Examples


### Matplotlib

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Sample data
df = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
})

plt.figure(figsize=(10, 6))
plt.scatter(df['x'], df['y'], color='blue', edgecolor='k')
plt.title('Scatter Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.grid(True)
plt.show()

### Seaborn

In [None]:
import matplotlib.pyplot as plt  # needed for plt.figure, plt.title, and plt.show below
import seaborn as sns
import pandas as pd

# Sample data
df = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
})

sns.set(style='dark')
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', data=df)
plt.title('Scatter Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()

# Seaborn

## Installing Seaborn


```
pip install seaborn
```



In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Load the dataset
df = sns.load_dataset('tips') # https://github.com/mwaskom/seaborn-data
print(df.head())

## Basic Plotting Functions

### Scatter Plot

In [None]:
sns.scatterplot(x='total_bill', y='tip', data=df)
plt.title('Scatter Plot of Total Bill vs Tip')
plt.show()

### Histogram

In [None]:
sns.histplot(df['total_bill'], bins=30, kde=True)
plt.title('Histogram of Total Bill')
plt.show()

### Box Plot

In [None]:
sns.boxplot(x='day', y='total_bill', data=df)
plt.title('Box Plot of Total Bill by Day')
plt.show()

## Advanced Plotting Functions

### Pair Plot

In [None]:
sns.pairplot(df)
plt.show()

### Heatmap

In [None]:
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

newdf = df.select_dtypes(include=numerics)
corr = newdf.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Heatmap of Correlation Matrix')
plt.show()

## Customizing Plots

### Changing Styles

In [None]:
sns.set_style('dark')
sns.scatterplot(x='total_bill', y='tip', data=df)
plt.title('Scatter Plot with Dark Style')
plt.show()

### Custom Colour Palettes

In [None]:
sns.set_palette('Spectral')
sns.boxplot(x='day', y='total_bill', data=df)
plt.title('Box Plot with Spectral Palette')
plt.show()

### Combining Plots with FacetGrid

In [None]:
g = sns.FacetGrid(df, col='time')
g.map(sns.scatterplot, 'total_bill', 'tip')
plt.show()

## EDA with Seaborn

In [None]:
# Load the dataset
df = sns.load_dataset('titanic')
print(df.head())
print(df.info())
print(df.describe())

# Set the aesthetic style
sns.set_style('whitegrid')

# Distribution of age
plt.figure(figsize=(10, 6))
sns.histplot(df['age'].dropna(), bins=30, kde=True)
plt.title('Distribution of Age')
plt.show()

# Box plot of fare by class
plt.figure(figsize=(10, 6))
sns.boxplot(x='class', y='fare', data=df)
plt.title('Box Plot of Fare by Class')
plt.show()

# Pair plot of select features
features = ['survived', 'age', 'fare', 'pclass']
sns.pairplot(df[features].dropna(), hue='survived', diag_kind='kde')
plt.show()

# Heatmap of correlation matrix
plt.figure(figsize=(10, 6))
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
newdf = df.select_dtypes(include=numerics)
corr = newdf.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.title('Heatmap of Correlation Matrix')
plt.show()

# FacetGrid for age distribution by survival status
g = sns.FacetGrid(df, col='survived')
g.map(sns.histplot, 'age')
plt.show()