## Matplotlib Tutorial

**Matplotlib** is the bedrock of Python data visualization: a flexible, battle-tested library that powers higher-level interfaces like **Seaborn** and **Pandas ``.plot``** under the hood. Mastering its core concepts (figures, axes, artists, styles) lets you build anything from quick one-liners to publication-quality graphics.

This notebook walks through the *essentials* (line plots, bar charts, histograms, scatter plots, pie charts and subplots) using the `pupils.csv` file and Seaborn's built-in `titanic` dataset loaded below. Each section pairs concise, reusable code with explanations of **why** and **when** to use each feature, so you can adapt snippets to your own projects.

Feel free to run the cells, tweak parameters, and see how each change affects the output. By the end you’ll have a practical mental model of Matplotlib’s power and how to leverage it efficiently.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [None]:
# Countries = pd.read_csv('data/Countries.csv')
pup=pd.read_csv('data/pupils.csv')
titanic=sns.load_dataset('titanic')

# Basic Matplotlib.pyplot graphs

We begin with the simplest *implicit* interface, `plt.plot`, which automatically creates a Figure and a single Axes. This is perfect for quick checks and exploratory work. The cells below draw basic lines from numeric lists; NumPy arrays work the same way.

In [None]:
# plot() with no data still creates an empty Figure and Axes
plt.plot()

In [None]:
# A single list is treated as y; x defaults to the index 0..N-1
plt.plot([10,20,15,7])

In [None]:
# Basic Matplotlib.pyplot graphs: demonstration
x = [1, 3, 5, 7]
y = [50, 60, 45, 55]

In [None]:
#By default, the plot() function draws a line from point to point.
#plot(x, y)        # plot x and y using default line style and color
#plot(y)           # plot y using x as index array 0..N-1
plt.plot(x,y)
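
The implicit `plt.plot` call above can also be written with the *explicit* (object-oriented) interface, where you create the Figure and Axes yourself. The following is a minimal sketch using the same `x` and `y` lists; the variable names `fig`, `ax` and `line` are only illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 3, 5, 7]
y = [50, 60, 45, 55]

# Explicit interface: create the Figure and Axes first, then plot on the Axes.
fig, ax = plt.subplots()
(line,) = ax.plot(x, y)  # ax.plot returns a list of Line2D objects

# The implicit interface reaches the same objects through "current" state.
print(plt.gcf() is fig)  # True
print(plt.gca() is ax)   # True
```

Both styles produce the same picture; the explicit form tends to be easier to follow once a figure holds more than one Axes.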

### Figure

A `Figure` is the top-level container: think of it as a blank canvas. You can set its size (in inches) and DPI when you create it, then add one or many Axes to it. Each call to `plt.figure()` below starts a new, empty figure, so the three lines end up on three separate canvases.

In [None]:
y1=np.arange(5)
y2=y1*3
y3=y1**2

In [None]:
# figure -Create a new figure, or activate an existing figure.
plt.figure()
plt.plot(y1)
plt.figure()
plt.plot(y2)
plt.figure()
plt.plot(y3)
plt.show()

### Figure size

Changing the overall canvas size helps when targeting different media: larger figures improve readability in presentations, while smaller figures suit reports and dense dashboards.

In [None]:
#figsize -Width, height in inches.
plt.figure(figsize=(5,4))
plt.plot(y1)
plt.show()

In [None]:
# Figure size: demonstration
plt.figure(figsize=(12,4))
plt.plot(y1)
plt.show()

# Format marker, line & color

The classic `plt.plot(x, y, "ro--")` shorthand packs color (`r`), marker (`o`) and line style (`--`) into one tiny string. Memorise a few combos to speed up prototyping.

In [None]:
#style|marker|line|color
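
As a small illustration of the format-string shorthand, the sketch below draws three lines with different color, marker and line-style combinations. The offset y-values are made up purely so that the lines do not overlap.

```python
import matplotlib.pyplot as plt

x = [1, 3, 5, 7]
y = [50, 60, 45, 55]

plt.figure(figsize=(5, 4))
# "ro--": red color, circle markers, dashed line
(red_dashed,) = plt.plot(x, y, "ro--")
# "g^:": green color, triangle markers, dotted line
(green_dotted,) = plt.plot(x, [v - 10 for v in y], "g^:")
# "k*": black star markers; with no line style given, no connecting line is drawn
(black_stars,) = plt.plot(x, [v - 20 for v in y], "k*")
```

The shorthand only covers the single-letter colors and the basic markers and line styles; for hex colors or finer control, pass keyword arguments such as `color=`, `marker=` and `linestyle=` instead.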

### Styles

Matplotlib ships with a collection of predefined stylesheets. Calling `plt.style.use('ggplot')` (or any of the bundled styles listed in `plt.style.available`) instantly changes the defaults for every later plot, which is handy for presentations or dark-mode themes. Call `plt.style.use('default')` to switch back.

In [None]:
# Styles: demonstration
plt.style.available

In [None]:
# Styles: demonstration
plt.style.use('seaborn-v0_8')

In [None]:
# Styles: demonstration
plt.figure(figsize=(5,4),dpi=60)
plt.plot(y1)
plt.plot(y2)
plt.show()

In [None]:
plt.style.use('default')
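
If you only want a style for a single plot, `plt.style.context` applies it temporarily inside a `with` block, so there is no need to reset to `'default'` afterwards. The following is a minimal sketch; the choice of `'ggplot'` and of the `axes.facecolor` setting to inspect is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

y1 = np.arange(5)

before = plt.rcParams["axes.facecolor"]

# The style is active only inside the with-block.
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.plot(y1)
    ax.plot(y1 * 3)

after = plt.rcParams["axes.facecolor"]
print(before == after)  # the previous settings are restored on exit
```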

# Color

Control the color of lines or markers using named colors (including the single-letter codes below) or hex codes.

In [None]:
#'k' - Black
#'w' - White
#'r' - Red
#'b' - Blue
#'g' - Green
#'c' - Cyan
#'y' - Yellow


In [None]:
# Color: demonstration
plt.figure(figsize=(5,4),dpi=60)
plt.plot(y1,color='b')
plt.plot(y2,color='#F94C66')
plt.show()

# marker

Markers highlight individual data points. Choose from symbols such as circle, triangle or star, and use them sparingly when points are dense (`markevery` draws a marker on only some of the points).

In [None]:
#'o' - Circle
#'*' - Star
#'.' - Point

In [None]:
plt.figure(figsize=(5,4),dpi=100)
plt.plot(y1,color='b',marker='*')
plt.plot(y2,color='#F94C66',marker='o')
plt.show()

In [None]:
plt.figure(figsize=(5,4),dpi=100)
plt.plot(y1,color='b',marker='*')
plt.plot(y2,color='#F94C66',marker='o', markevery=2)
plt.show()

In [None]:
plt.figure(figsize=(5,4),dpi=100)
plt.plot(y1,color='b',marker='*')
plt.plot(y2,color='#F94C66',marker='o', markevery=[2, 4])
plt.show()

# Line

Line width (`linewidth`, or `lw` for short) and line style (`linestyle`: solid, dashed, dotted and more) convey emphasis and help distinguish multiple series.

In [None]:
#'-' - Solid line
#':' - Dotted line
#'--' - Dashed line

In [None]:
# Line: demonstration

plt.figure(figsize=(5,4),dpi=100)
plt.plot(y1,color='b',linewidth=10)
plt.plot(y2,color='#F94C66',linewidth=3)
plt.show()

# Marker size - ms

In `plt.plot`, marker size (`ms`) is a single value applied to every marker on the line, so it adjusts readability rather than encoding data. To size each point by a quantitative variable, use `plt.scatter` with `s=`. Keep markers small enough that they do not obscure the data.

In [None]:
# Marker size - ms: demonstration
x = [1, 3, 5, 7,9]
y1 = np.arange(5)
plt.plot(x, y1, marker =">", ms=10)

In [None]:
#line style
plt.plot(x,y1,marker ='o',linestyle = 'dotted' , ms=10)

In [None]:
# Dashed line style, no markers
plt.plot(x, y1, linestyle = 'dashed' )
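
The short names `lw` and `ms` used above are aliases for `linewidth` and `markersize`. A few other aliases work the same way; the sketch below combines them in one call, with styling values chosen arbitrarily for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = [1, 3, 5, 7, 9]
y1 = np.arange(5)

plt.figure(figsize=(5, 4))
(line,) = plt.plot(
    x, y1,
    ls="--",      # linestyle
    lw=2,         # linewidth
    marker="o",
    ms=10,        # markersize
    mfc="white",  # markerfacecolor
    mec="b",      # markeredgecolor
)
```

The long names are easier to read in shared code; the aliases are mostly a convenience for quick exploration.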

# Labels and Title

Always label your axes and add a title so your audience (and future you) knows exactly what the plot shows.

In [None]:
# Labels and Title: demonstration
x = [1, 3, 5, 7]
y = [30, 20, 70, 40]
z = [10, 30, 20, 60]
plt.plot(x,y)
plt.plot(x,z)

plt.title("x vs y")
plt.legend(["y", "z"])
plt.xlabel("x")
plt.ylabel("y")

In [None]:
#Formatting the style

In [None]:
# Labels and Title: demonstration
x = [1, 3, 5, 7]
y = [30, 20, 70, 40]
z = [10, 30, 20, 60]
plt.plot(x, y)
plt.plot(x, z)
plt.title("x vs y and z")
plt.legend(["y","z"],  loc="upper left")
plt.xlabel("x")
plt.ylabel("y and z")

### xticks
`plt.xticks` gets or sets the current tick locations and labels of the x-axis; `plt.yticks` does the same for the y-axis.

In [None]:
# xticks and yticks: demonstration
x = [1, 3, 5, 7]
y = [30, 20, 70, 40]
z = [10, 30, 20, 60]

plt.figure(figsize=(5,4),dpi=80)
plt.plot(x, y)
plt.plot(x, z)

plt.title("x vs y and z", loc ="center", fontsize=20, color='r')
plt.legend(["y","z"],  loc="upper left")
plt.xlabel("x", labelpad=10 )
plt.ylabel("y and z")
plt.xticks([0,3,5,7])
plt.yticks([10,20,30,40,50,60,70,80,90,100])
plt.show()

In [None]:

x = [1, 3, 5, 7]
y = [30, 20, 70, 40]
z = [10, 30, 20, 60]
plt.plot(x, y)
plt.plot(x, z)


plt.title("x vs y and z" ,loc ="center" ,fontsize=20,color='r')
plt.legend(["y","z"],  loc="upper left")
plt.xlabel("x", labelpad=50 )
plt.ylabel("y and z")

plt.xticks([1, 3, 5, 7])
plt.yticks([10,20,30,40,50,60],labels=['10K','20K','30K','40K','50K','60K'])

plt.xlim(0,10)
plt.ylim(0,100)

In [None]:

x = [1, 3, 5, 7]
y = [30, 20, 70, 40]
z = [10, 30, 20, 60]
plt.plot(x, y)
plt.plot(x, z)


plt.title("x vs y and z" ,loc ="center" ,fontsize=20,color='r')
plt.legend(["y","z"],  loc="upper left")
plt.xlabel("x", labelpad=50 )
plt.ylabel("y and z")

plt.xticks([1, 3, 5, 7])
plt.yticks([10,20,30,40,50,60],labels=['10K','20K','30K','40K','50K','60K'])

plt.ylim(20,50)
plt.xlim(2,7)
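
Writing tick labels such as `'10K'` by hand works for a fixed set of ticks, but the labels will not follow if the ticks change. One alternative, sketched below under the assumption that the y-values are already expressed in thousands, is a `FuncFormatter` from `matplotlib.ticker` that builds each label from the tick value. The name `thousands` is only illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

x = [1, 3, 5, 7]
y = [30, 20, 70, 40]

fig, ax = plt.subplots()
ax.plot(x, y)

# The formatter receives the tick value and its position and returns the label text.
thousands = FuncFormatter(lambda value, pos: f"{value:.0f}K")
ax.yaxis.set_major_formatter(thousands)

ax.set_ylim(0, 100)
```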

## legend

A legend maps visual encodings to data series, and a well-placed one reduces cognitive load. Use `loc='upper left'` for quick placement or `bbox_to_anchor` for fine control, for example to move the legend outside the plot area.

In [None]:
# legend: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
#plt.style.use("fivethirtyeight")
plt.plot(year,income,color='g',label="income")
plt.legend(loc="upper left", shadow=True, frameon=True, facecolor="white")
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.show()

In [None]:
# legend: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
#plt.style.use("fivethirtyeight")
plt.plot(year,income,color='g',label="income")
plt.legend(loc="upper left", shadow=True, frameon=True, facecolor="white", bbox_to_anchor=(1.05, 1))
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.show()

In [None]:
# legend: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
#plt.style.use("fivethirtyeight")
plt.plot(year,income,color='g',label="income")
plt.legend(loc="upper left", shadow=True, frameon=True, facecolor="white", bbox_to_anchor=(1.05, 1))
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.tight_layout()
plt.show()
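
When a legend sits outside the Axes, it can also be clipped when the figure is saved to a file. A minimal sketch of one way to handle this is to pass `bbox_inches="tight"` to `savefig`, which expands the saved area to include the legend. Here the image is written to an in-memory buffer rather than a file so that the example leaves nothing on disk.

```python
import io
import matplotlib.pyplot as plt

year = [2015, 2016, 2017, 2018, 2019, 2020, 2021]
income = [10000, 12000, 18000, 9000, 7000, 13000, 8000]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(year, income, color="g", label="income")
ax.legend(loc="upper left", bbox_to_anchor=(1.05, 1))  # anchored outside the Axes
ax.set_title("Income By Year")
ax.set_xlabel("Years")
ax.set_ylabel("Income")

# bbox_inches="tight" crops or expands the saved area to fit all artists.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
png_bytes = buffer.getvalue()
```

To write a real file, replace `buffer` with a path such as `"income.png"` (a hypothetical filename).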

### Bar Plots

Bars encode categories vs magnitude; group or stack for comparisons or compositions.

Bar charts are perfect for discrete comparisons. We’ll build simple, grouped and stacked bars.

In [None]:
# Bar Plots: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
plt.bar(year,income)

In [None]:
# Bar Plots: demonstration
year =['2015','2016','2017','2018','2019','2020','2021']
income = [10000,12000,18000,9000,7000,13000,8000]
plt.bar(year,income)

In [None]:
# Bar Plots: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
plt.bar(year,income,color='c')
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.show()

In [None]:
# Bar Plots: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
plt.bar(year,income,color=['red', 'green', 'blue', 'purple'] )
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.show()
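
Grouped bars place two or more series side by side for each category. Matplotlib has no dedicated grouped-bar argument in `plt.bar`, so a common approach, sketched here, is to shift each series by half a bar width. The `income2` values are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

year = np.arange(2015, 2022)
income1 = np.array([10000, 12000, 18000, 9000, 7000, 13000, 8000])
income2 = np.array([8000, 9000, 15000, 11000, 6000, 10000, 9500])  # hypothetical values

width = 0.4  # each bar takes 0.4 of the 1-year spacing

fig, ax = plt.subplots(figsize=(8, 4))
# Shift the two series left and right of each year so they sit side by side.
bars1 = ax.bar(year - width / 2, income1, width=width, color="c", label="Income1")
bars2 = ax.bar(year + width / 2, income2, width=width, color="r", label="Income2")

ax.set_xticks(year)  # one tick per group, centred between the two bars
ax.set_title("Income By Year")
ax.set_xlabel("Years")
ax.set_ylabel("Income")
ax.legend()
```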

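## stacked bar plot

Stacked bars emphasise part-to-whole changes over a shared baseline. They are built with `plt.bar` and its `bottom=` argument, which sets where each bar starts. (Matplotlib's separate `plt.stackplot` function draws stacked *area* charts, not bars.)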

In [None]:
year = np.arange(2015,2022)
income1 = np.random.randint(10000,20000,7)
income2 = np.random.randint(5000,20000,7)

In [None]:
# Stacked bar: Income1 drawn on top of Income2
plt.figure(figsize=(6,6))
plt.bar(year, income1, color='c', bottom=income2, label="Income1")
plt.bar(year, income2, color='r', label="Income2")
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.legend()
plt.show()

In [None]:
# Stacked bar: Income2 drawn on top of Income1
plt.figure(figsize=(6,6))
plt.bar(year,income1,color='c',label="Income1")
plt.bar(year,income2,color='r',bottom=income1,label="Income2")
plt.title("Income By Year")
plt.xlabel("Years")
plt.ylabel('Income')
plt.legend()
plt.show()

## horizontal bar plot

Horizontal bars are often clearer when there are many categories or long category names, because the labels run along the y-axis and do not collide.

In [None]:
# horizontal bar plot: demonstration
year =[2015,2016,2017,2018,2019,2020,2021]
income = [10000,12000,18000,9000,7000,13000,8000]
plt.barh(year,income,color='c')
plt.title("Income By Year")
plt.ylabel("Years")
plt.xlabel('Income')
plt.show()

In [None]:
year = np.arange(2015,2022)
income1 = np.random.randint(10000,20000,7)
income2 = np.random.randint(5000,20000,7)

In [None]:
year = np.arange(2015,2035)
income = np.random.randint(5000,20000,20)
plt.barh(year,income,color='c')
plt.title("Income By Year")
plt.ylabel("Years")
plt.xlabel('Income')
plt.show()

In [None]:
# How to plot it better?

## Histograms

Histograms display the distribution of a variable. The bin width strongly affects how the shape reads, so experiment with `bins`.

In [None]:
# Histograms: demonstration
nums = np.random.randn(100)
print(nums)

In [None]:

plt.hist(nums)

In [None]:
plt.hist(nums, bins=50)

In [None]:
# Why does it look so fragmented?
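
One way to reason about the question above is to count how many samples land in each bin. The sketch below uses `np.histogram` on 100 standard-normal samples (with a fixed seed, which is an assumption made for reproducibility) and compares the default 10 bins with 50 bins.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
nums = rng.standard_normal(100)

counts_10, edges_10 = np.histogram(nums, bins=10)
counts_50, edges_50 = np.histogram(nums, bins=50)

print(counts_10.mean())  # 10.0 samples per bin on average
print(counts_50.mean())  # 2.0 samples per bin on average

# Side-by-side comparison of the two bin settings
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(nums, bins=10, edgecolor="black")
axes[0].set_title("bins=10")
axes[1].hist(nums, bins=50, edgecolor="black")
axes[1].set_title("bins=50")
```

With only about two samples per bin, random noise dominates and gaps appear. Using more data or fewer bins smooths the shape; `bins="auto"` lets NumPy pick a bin count from the data.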

### Scatter Plots

Scatter plots visualise the relationship between two quantitative variables; you can map extra dimensions via `c=` (color) or `s=` (size).

In [None]:
# Scatter Plots: demonstration
pup.head()

In [None]:
# Scatter Plots: demonstration
# Split height and weight by gender with boolean masks
is_female = pup['gen'] == 'F'
is_male = pup['gen'] == 'M'
f_height = pup.loc[is_female, 'Height']
f_weight = pup.loc[is_female, 'Weight']
m_height = pup.loc[is_male, 'Height']
m_weight = pup.loc[is_male, 'Weight']

In [None]:
plt.figure(figsize=(6,6))
plt.scatter(x=f_height, y=f_weight, label='Female')
plt.scatter(x=m_height, y=m_weight, color='r', label='Male')
plt.legend()
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()
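
The `c=` and `s=` arguments mentioned above can each carry one extra variable per point. The following sketch uses randomly generated data, not the pupils dataset; the variables `age` and `score` are hypothetical and exist only to drive color and size.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
n = 50
height = rng.normal(165, 10, n)               # hypothetical heights in cm
weight = height - 100 + rng.normal(0, 8, n)   # hypothetical weights in kg
age = rng.integers(10, 18, n)                 # drives the color
score = rng.uniform(20, 200, n)               # drives the marker area (points^2)

fig, ax = plt.subplots(figsize=(6, 6))
points = ax.scatter(height, weight, c=age, s=score, cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax, label="Age (years)")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
```

Note that `s` is the marker *area*, so doubling `s` does not double the marker's diameter.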

## Pie Charts

Pie and donut charts show part-to-whole relationships. Use them sparingly, since humans judge length better than angle, but they can be persuasive for small part-to-whole stories.

In [None]:
# Pie Charts: demonstration
# Mean income per country (one pie slice per country)
mean_income = pup.groupby("Country")["income"].mean()
print(mean_income)

In [None]:
# Pie Charts: demonstration
plt.pie(mean_income)
plt.show()

In [None]:
# Exercise: add labels, percent with separated slices
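
As a hint for the exercise, here is one possible approach on made-up numbers rather than the pupils data: `labels=` names the slices, `autopct=` prints percentages, and `explode=` pulls slices away from the centre. The category names and values are hypothetical.

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]   # hypothetical categories
values = [40, 30, 20, 10]       # hypothetical values
explode = [0.1, 0, 0, 0]        # pull the first slice out by 10% of the radius

fig, ax = plt.subplots(figsize=(5, 5))
wedges, texts, autotexts = ax.pie(
    values,
    labels=labels,
    autopct="%1.1f%%",  # percentage with one decimal place
    explode=explode,
    startangle=90,
)
ax.set_title("Share by category")
```

To adapt it, pass the index of your grouped Series as `labels` and its values as the data.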

### Subplots

Subplots let you arrange multiple Axes in a single figure, which is great for dashboards or comparisons. `plt.subplots` returns a Figure and an array of Axes; index into the array (`axes[row, col]`) or loop through it to build panels programmatically.

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(15, 5))
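
Looping over the Axes array is often shorter than indexing each panel by hand. In this minimal sketch, `axes.flat` iterates over a 2x2 grid in row-major order; the sine curves are placeholder data chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True)

# axes is a 2-D NumPy array of Axes; .flat walks it one panel at a time.
for k, ax in enumerate(axes.flat, start=1):
    ax.plot(x, np.sin(k * x))
    ax.set_title(f"sin({k}x)")

fig.tight_layout()  # avoid overlapping titles and tick labels
```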

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# Plot 1: Age Distribution
axes[0, 0].hist(titanic['age'].dropna(), bins=20, color='skyblue', edgecolor='black')
axes[0, 0].set_title('Age Distribution')
axes[0, 0].set_xlabel('Age')
axes[0, 0].set_ylabel('Count')

# Plot 2: Survival Count
# To plot counts with matplotlib, we first need to count the values
survived_counts = titanic['survived'].value_counts()
axes[0, 1].bar(survived_counts.index.map({0: 'No', 1: 'Yes'}), survived_counts.values, color=['red', 'green'])
axes[0, 1].set_title('Survival Count')
axes[0, 1].set_xlabel('Survived')
axes[0, 1].set_ylabel('Count')
axes[0, 1].set_xticks([0, 1]) # Ensure only 0 and 1 are ticks if needed

# Plot 3: Gender Distribution
gender_counts = titanic['sex'].value_counts()
axes[0, 2].bar(gender_counts.index, gender_counts.values, color=['purple', 'orange'])
axes[0, 2].set_title('Gender Distribution')
axes[0, 2].set_xlabel('Gender')
axes[0, 2].set_ylabel('Count')

# Plot 4: Age by Survival (using boxplot from matplotlib)
# Matplotlib's boxplot requires data separated by categories
age_survived = [titanic[titanic['survived'] == 0]['age'].dropna(),
                titanic[titanic['survived'] == 1]['age'].dropna()]
axes[1, 0].boxplot(age_survived, labels=['No', 'Yes'])
axes[1, 0].set_title('Age by Survival')
axes[1, 0].set_xlabel('Survived')
axes[1, 0].set_ylabel('Age')

# Plot 5: Passenger Class
pclass_counts = titanic['pclass'].value_counts().sort_index()
axes[1, 1].bar(pclass_counts.index.astype(str), pclass_counts.values, color=['gold', 'darkblue', 'gray'])
axes[1, 1].set_title('Passenger Class')
axes[1, 1].set_xlabel('Class')
axes[1, 1].set_ylabel('Count')

# Plot 6: Fare vs. Age by Gender (using scatter from matplotlib)
# We need to filter data by gender
male_data = titanic[titanic['sex'] == 'male'].dropna(subset=['fare', 'age'])
female_data = titanic[titanic['sex'] == 'female'].dropna(subset=['fare', 'age'])

axes[1, 2].scatter(male_data['fare'], male_data['age'], color='blue', alpha=0.6, label='Male')
axes[1, 2].scatter(female_data['fare'], female_data['age'], color='red', alpha=0.6, label='Female')
axes[1, 2].set_title('Fare vs. Age by Gender')
axes[1, 2].set_xlabel('Fare')
axes[1, 2].set_ylabel('Age')
axes[1, 2].legend(title='Gender')

plt.show()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

titanic = sns.load_dataset('titanic')

fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# Plot 1: Age Distribution as a density histogram (density=True scales the bars so their total area is 1; no KDE curve is drawn)
ages = titanic['age'].dropna()
axes[0, 0].hist(ages, bins=20, color='skyblue', edgecolor='black', density=True, alpha=0.7)
axes[0, 0].set_title('Age Distribution (Density)')
axes[0, 0].set_xlabel('Age')
axes[0, 0].set_ylabel('Density')
axes[0, 0].set_xlim(0, 80)

# Plot 2: Survival Count with custom bar width and alignment
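# sort_index() guarantees the order 0, 1 so the counts line up with the 'No', 'Yes' labels
survived_counts = titanic['survived'].value_counts().sort_index()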
axes[0, 1].bar(x=['No', 'Yes'], height=survived_counts.values, width=0.6, align='center', color=['firebrick', 'forestgreen'], edgecolor='black')
axes[0, 1].set_title('Survival Count')
axes[0, 1].set_xlabel('Survived')
axes[0, 1].set_ylabel('Count')
axes[0, 1].set_ylim(0, 600)
for i, count in enumerate(survived_counts.values):
    axes[0, 1].text(i, count + 10, str(count), ha='center', va='bottom', fontsize=10, color='black')

# Plot 3: Gender Distribution with different colors and rotation for x-tick labels
gender_counts = titanic['sex'].value_counts()
axes[0, 2].bar(gender_counts.index, gender_counts.values, color=['gold', 'purple'], edgecolor='black')
axes[0, 2].set_title('Gender Distribution')
axes[0, 2].set_xlabel('Gender')
axes[0, 2].set_ylabel('Count')
axes[0, 2].tick_params(axis='x', rotation=45)

# Plot 4: Age by Survival (Boxplot)
age_survived_data = [titanic[titanic['survived'] == 0]['age'].dropna(),
                     titanic[titanic['survived'] == 1]['age'].dropna()]
flier_props = dict(marker='o', markerfacecolor='red', markersize=6, linestyle='none', markeredgecolor='black')
axes[1, 0].boxplot(age_survived_data, labels=['Did Not Survive', 'Survived'], flierprops=flier_props, patch_artist=True,
                  boxprops=dict(facecolor='lightblue'))
axes[1, 0].set_title('Age by Survival')
axes[1, 0].set_xlabel('Survival Status')
axes[1, 0].set_ylabel('Age')

# Plot 5: Line Plot - Total Fare by Class
# groupby('pclass').sum() gives one total per class (not a running cumulative sum)
fare_by_class = titanic.groupby('pclass')['fare'].sum().sort_index()
axes[1, 1].plot(fare_by_class.index.astype(str), fare_by_class.values, color='darkgreen', marker='o', linestyle='-', linewidth=2)
axes[1, 1].set_title('Total Fare by Class')
axes[1, 1].set_xlabel('Passenger Class')
axes[1, 1].set_ylabel('Total Fare')
axes[1, 1].grid(True, linestyle=':', alpha=0.7)


# Plot 6: Line Plot - Passenger count per age (simplified, more for trend)
# For a line plot on age, we might want to group ages or look at unique values
age_counts = titanic['age'].value_counts().sort_index()
axes[1, 2].plot(age_counts.index, age_counts.values, color='darkorange', linestyle='--', marker='^', markersize=4, label='Passengers per Age')
axes[1, 2].set_title('Passenger Count by Age')
axes[1, 2].set_xlabel('Age')
axes[1, 2].set_ylabel('Count')
axes[1, 2].legend()
axes[1, 2].grid(True, linestyle='-', alpha=0.5)


plt.tight_layout()
plt.show()