# Matplotlib Practice

This notebook offers a set of exercises for different tasks with Matplotlib.

Note that there may be more than one way to answer a question or complete an exercise.

Each task is described in a comment or a text cell.

For further reference and resources, check out the [Matplotlib documentation](https://matplotlib.org/3.1.1/contents.html).

If you're stuck, remember you can always search for a function. For example, if you want to create a plot with `plt.subplots()`, search for [`plt.subplots()`](<https://www.google.com/search?q=plt.subplots()>).
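
You can also look a function up without leaving the notebook. The sketch below is one possible approach and is not part of the original exercises: it uses the standard library's `inspect` module to list the parameters of `plt.subplots()` and to read the first line of its docstring. The variable names are only illustrative.

```python
import inspect

import matplotlib.pyplot as plt

# Collect the names of the parameters plt.subplots() accepts
subplots_params = list(inspect.signature(plt.subplots).parameters)
print(subplots_params)

# The first line of a docstring is usually a one-sentence summary
summary = inspect.getdoc(plt.subplots).splitlines()[0]
print(summary)
```

In a notebook, `help(plt.subplots)` or `plt.subplots?` shows the full docstring.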


In [None]:
# Import the pyplot module from matplotlib as plt

import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Create a simple plot using plt.plot()

plt.plot()

In [None]:
# Plot a single Python list

plt.plot(np.random.randn(24).tolist())

In [None]:
# Create two lists, one called X, one called y, each with 5 numbers in them

x, y = np.random.randint(0, 10, 5), np.random.randint(0, 10, 5)

In [None]:
# Plot X & y (the lists you've created)

fig, (ax0, ax1) = plt.subplots(nrows=2, ncols=1)

ax0.plot(x)
ax0.set(xlim=(0, 4), ylim=(0, 10))

ax1.plot(y)
ax1.set(xlim=(0, 4), ylim=(0, 10))

In [None]:
# Create a plot using plt.subplots() and then add X & y on the axes
fig, ax = plt.subplots()

ax.plot(x, y)

In [None]:
# Import and get matplotlib ready
import matplotlib.pyplot as plt

# Prepare data (create two lists of 8 numbers, X & y)
x, y = np.random.randint(0, 10, 8), np.random.randint(0, 10, 8)

# Setup figure and axes using plt.subplots()
fig, ax = plt.subplots()

# Add data (X, y) to axes
ax.scatter(x, y)

# Customize plot by adding a title, xlabel and ylabel
ax.set(
    title="Random Plot",
    xlabel="X-Axis",
    ylabel="Y-Axis",
    xlim=(0, 10),
    ylim=(0, 10),
)

# Save the plot to file using fig.savefig()
fig.savefig("./data/random-plot.png")

In [None]:
# Create an array of 100 evenly spaced numbers between 0 and 100 using NumPy and save it to variable X
x = np.linspace(0, 99, 100)

In [None]:
# Create a plot using plt.subplots() and plot X versus X^2 (X squared)
fig, ax = plt.subplots()
ax.plot(x, np.square(x))

In [None]:
# Create a scatter plot of X versus the exponential of X (np.exp(X))
fig, ax = plt.subplots()
ax.scatter(x, np.exp(x))

In [None]:
# Create a scatter plot of X versus np.sin(X)
fig, ax = plt.subplots()
ax.scatter(x, np.sin(x))

In [None]:
# Create a Python dictionary of 4 of your favourite foods and their prices
foods = {
    "Pizza": 7.50,
    "Pasta": 5,
    "Burger": 8,
    "Hot Dog": 3.50,
}

In [None]:
# Create a bar graph where the x-axis is the keys and the y-axis is the values of the dictionary
fig, ax = plt.subplots()
ax.bar(foods.keys(), foods.values())

# Add a title, xlabel and ylabel to the plot
ax.set(
    title="Food Prices",
    xlabel="Food",
    ylabel="Price",
)

In [None]:
# Make the same plot as above, except this time make the bars go horizontal
fig, ax = plt.subplots()
ax.barh(list(foods.keys()), list(foods.values()))

# With horizontal bars the prices run along the x-axis and the foods along the y-axis
ax.set(
    title="Food Prices",
    xlabel="Price",
    ylabel="Food",
)

In [None]:
# Create a random NumPy array of 1000 normally distributed numbers using NumPy and save it to X
x = np.random.randn(1000)

# Create a histogram plot of X
fig, ax = plt.subplots()
ax.hist(x)

In [None]:
# Create a NumPy array of 1000 random numbers and save it to X
x = np.random.random(1000)

# Create a histogram plot of X
fig, ax = plt.subplots()
ax.hist(x)
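
The two histograms above have different shapes because `np.random.randn()` samples from a standard normal distribution while `np.random.random()` samples uniformly from the interval [0, 1). The sketch below is one way to check this numerically. The seed and the variable names are assumptions made for the example, and the legacy `np.random.seed()` is used only to stay consistent with the functions in this notebook.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed so the numbers are repeatable

normal_values = np.random.randn(1000)    # bell-shaped, centred on 0
uniform_values = np.random.random(1000)  # roughly flat, between 0 and 1

# np.histogram computes bin counts without drawing anything
normal_counts, normal_edges = np.histogram(normal_values, bins=10)
uniform_counts, uniform_edges = np.histogram(uniform_values, bins=10)

print("normal counts: ", normal_counts)
print("uniform counts:", uniform_counts)
```

The normal counts should peak in the middle bins, while the uniform counts should be roughly level across all ten bins.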

In [None]:
x = np.linspace(0, 99, 100)

# Create an empty subplot with 2 rows and 2 columns (4 subplots total)
fig, ((ax0, ax1), (ax2, ax3)) = plt.subplots(ncols=2, nrows=2)

# Plot X versus X/2 on the top left axes
ax0.plot(x, x / 2)

# Plot a scatter plot of 10 random numbers on each axis on the top right subplot
ax1.scatter(np.random.randint(0, 10, 10), np.random.randint(0, 10, 10))

# Plot a bar graph of the favourite food keys and values on the bottom left subplot
ax2.bar(foods.keys(), foods.values())

# Plot a histogram of 1000 random normally distributed numbers on the bottom right subplot
ax3.hist(np.random.randn(1000))
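
Unpacking four axes by name works well for a 2x2 grid. For larger grids, a possible alternative (an assumption for illustration, not the approach used above) is to keep the array that `plt.subplots()` returns and loop over `axes.flat`. The data and titles below are made up.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; skip this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

# With nrows=2 and ncols=2, `axes` is a 2x2 NumPy array of Axes objects
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# (title, y-values) pairs, one per subplot
curves = [
    ("x / 2", x / 2),
    ("x squared", np.square(x)),
    ("sin(x)", np.sin(x)),
    ("cos(x)", np.cos(x)),
]

# axes.flat walks the grid row by row: top left, top right, bottom left, bottom right
for ax, (title, y) in zip(axes.flat, curves):
    ax.plot(x, y)
    ax.set(title=title)

fig.tight_layout()
axes_shape = axes.shape
plt.close(fig)
```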

In [None]:
# Import pandas as pd
import pandas as pd

In [None]:
# Import the '../data/car-sales.csv' into a DataFrame called car_sales
car_sales = pd.read_csv("./data/car-sales.csv")

In [None]:
# Fill missing odometer readings with the column mean, then store them as whole numbers
car_sales["Odometer (KM)"] = car_sales["Odometer (KM)"].fillna(
    car_sales["Odometer (KM)"].mean()
)
car_sales["Odometer (KM)"] = car_sales["Odometer (KM)"].round(0).astype(int)

# Fill missing door counts with the rounded column mean
car_sales["Doors"] = car_sales["Doors"].fillna(round(car_sales["Doors"].mean()))
car_sales["Doors"] = car_sales["Doors"].round(0).astype(int)

# Remove the currency symbols from the 'Price' column and convert it to numbers
car_sales["Price"] = car_sales["Price"].replace(r"[\$,]", "", regex=True).astype(float)
car_sales["Price"] = car_sales["Price"].fillna(car_sales["Price"].mean())
# Round each price to the nearest hundred and store it as an integer
car_sales["Price"] = car_sales["Price"].divide(100).round(0).multiply(100).astype(int)
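
To see what the price-cleaning steps above do, here's a small sketch on a made-up `Price` column. The values are hypothetical and not taken from `car-sales.csv`. It assumes prices are stored as strings such as `"$4,000.00"`.

```python
import numpy as np
import pandas as pd

# Made-up price strings for illustration only, with one missing value
prices = pd.Series(["$4,000.00", "$5,260.00", np.nan, "$7,449.00"])

# Strip the dollar sign and the thousands separator, then convert to float
cleaned = prices.replace(r"[\$,]", "", regex=True).astype(float)

# Fill the missing value with the mean of the known prices
cleaned = cleaned.fillna(cleaned.mean())

# Round to the nearest hundred and store as integers
rounded = cleaned.divide(100).round(0).multiply(100).astype(int)
print(rounded.tolist())
```

The missing entry is filled with the mean of the other three prices before the rounding step, so it also ends up as a multiple of 100.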

In [None]:
from datetime import datetime

# Add a column called 'Total Sales' to car_sales which cumulatively adds the 'Price' column
car_sales["Total Sales"] = car_sales["Price"].cumsum()

# Add a column called 'Date of Sale' which lists a series of successive dates starting from today (your today)
car_sales["Date of Sale"] = pd.date_range(
    datetime.today().date(), periods=len(car_sales)
)

# View the car_sales DataFrame
car_sales
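
If it's unclear what `cumsum()` and `pd.date_range()` produce, the following sketch shows both on a tiny made-up DataFrame. The prices, the column names and the fixed start date are assumptions chosen so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical sales data for illustration
sales = pd.DataFrame({"Price": [4000, 5000, 7000]})

# Running total: each row holds the sum of all prices up to and including that row
sales["Total Sales"] = sales["Price"].cumsum()

# One date per row, one day apart, starting from a fixed (made-up) start date
sales["Sale Date"] = pd.date_range("2024-01-01", periods=len(sales))

print(sales)
```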

In [None]:
# Use the plot() function to plot the 'Date of Sale' column versus the 'Total Sales' column
fig, ax = plt.subplots()
ax.plot(car_sales["Date of Sale"], car_sales["Total Sales"])

In [None]:
# Create a scatter plot of the 'Odometer (KM)' and 'Price' columns using ax.scatter()
fig, ax = plt.subplots()
ax.scatter(car_sales["Odometer (KM)"], car_sales["Price"])

In [None]:
# Create a NumPy array of random numbers of size (10, 4) and save it to X
x = np.random.randint(0, 9, (10, 4))
columns = ["a", "b", "c", "d"]

# Turn the NumPy array X into a DataFrame with columns called ['a', 'b', 'c', 'd']
df = pd.DataFrame(x, columns=columns)

# Create a bar graph of the DataFrame
fig, ax = plt.subplots()
ax.bar(x=df.columns, height=df.sum())

In [None]:
# Create a bar graph of the 'Make' and 'Odometer (KM)' columns in the car_sales DataFrame
fig, ax = plt.subplots()
group_data = car_sales.groupby("Make")["Odometer (KM)"].mean()
ax.bar(group_data.keys(), group_data.values)
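
The `groupby()` call above collapses the rows into one mean value per make before plotting. A minimal sketch on made-up rows (the makes and odometer readings are hypothetical) shows the intermediate result:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; skip this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only
cars = pd.DataFrame(
    {
        "Make": ["Toyota", "Honda", "Toyota", "Honda"],
        "Odometer (KM)": [100000, 50000, 60000, 70000],
    }
)

# One mean odometer reading per make; the makes become the index
mean_km = cars.groupby("Make")["Odometer (KM)"].mean()
print(mean_km)

# Plot one bar per make
fig, ax = plt.subplots()
bars = ax.bar(mean_km.index, mean_km.values)
ax.set(xlabel="Make", ylabel="Mean odometer (KM)")
plt.close(fig)
```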

In [None]:
# Create a histogram of the 'Odometer (KM)' column
fig, ax = plt.subplots()
ax.hist(car_sales["Odometer (KM)"])

In [None]:
# Create a histogram of the 'Price' column with 20 bins
fig, ax = plt.subplots()
ax.hist(car_sales["Price"], bins=20)

In [None]:
# Import "../data/heart-disease.csv" and save it to the variable "heart_disease"
heart_disease = pd.read_csv("./data/heart-disease.csv")

In [None]:
# View the first 10 rows of the heart_disease DataFrame
heart_disease.head(10)

In [None]:
# Create a histogram of the "age" column with 50 bins
fig, ax = plt.subplots()
ax.hist(heart_disease["age"], bins=50)

In [None]:
# Call the same line of code from above except change the "figsize" parameter to be (4, 6)
fig, ax = plt.subplots(figsize=(4, 6))
ax.hist(heart_disease["age"], bins=50)

Now let's try comparing two variables versus the target variable.

More specifically, we'll see how age and cholesterol combined affect the target in **patients over 50 years old**.

In [None]:
# Replicate the above plot (../images/matplotlib-heart-disease-chol-age-plot.png)
fig, ax = plt.subplots()
heart_disease_over_50 = heart_disease[heart_disease["age"] > 50]
plot = ax.scatter(
    heart_disease_over_50["age"],
    heart_disease_over_50["chol"],
    c=heart_disease_over_50["target"],
    cmap="winter",
)
ax.axhline(
    heart_disease_over_50["chol"].mean(),
    color="red",
    linestyle="--",
    linewidth=1,
)
ax.set(
    xlabel="Age",
    ylabel="Cholesterol",
)
ax.legend(*plot.legend_elements(), loc="upper right", title="Target")

In [None]:
# Check what styles are available under plt
plt.style.available

In [None]:
# Change the style to use "seaborn-whitegrid" (named "seaborn-v0_8-whitegrid" since Matplotlib 3.6)
plt.style.use("seaborn-v0_8-whitegrid")
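
The seaborn style names changed in Matplotlib 3.6, when they gained the `seaborn-v0_8-` prefix, so the right name depends on your installed version. One possible way to handle this, sketched below as an assumption rather than a recommended pattern, is to pick the first name found in `plt.style.available` and apply it temporarily with `plt.style.context()`. The candidate list and variable names are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; skip this line in a notebook
import matplotlib.pyplot as plt

# Candidate names, newest first
candidates = ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]

# Pick the first candidate this Matplotlib version knows about, else fall back to "default"
style_name = next(
    (name for name in candidates if name in plt.style.available),
    "default",
)
print(style_name)

# The style only applies inside the `with` block, so later plots are unaffected
with plt.style.context(style_name):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    plt.close(fig)
```

Unlike `plt.style.use()`, which changes the style for the rest of the session, `plt.style.context()` restores the previous settings when the block ends.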

In [None]:
# Reproduce the same figure as above with the "seaborn-v0_8-whitegrid" style
fig, ax = plt.subplots()
heart_disease_over_50 = heart_disease[heart_disease["age"] > 50]
plot = ax.scatter(
    heart_disease_over_50["age"],
    heart_disease_over_50["chol"],
    c=heart_disease_over_50["target"],
    cmap="winter",
)
ax.axhline(
    heart_disease_over_50["chol"].mean(),
    color="red",
    linestyle="--",
    linewidth=1,
)
ax.set(
    xlabel="Age",
    xlim=(50, 80),
    ylabel="Cholesterol",
    ylim=(10, 600),
)
ax.legend(*plot.legend_elements(), loc="upper right", title="Target")

In [None]:
# Save the current figure using savefig(), the file name can be anything you want
fig.savefig("./figures/cholesterol-by-age.png")
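
`fig.savefig()` raises a `FileNotFoundError` if the target folder doesn't exist yet. A minimal sketch, assuming you want to create the folder before saving, is shown below. It writes into a temporary directory so it cleans up after itself; the folder and file names are placeholders.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; skip this line in a notebook
import matplotlib.pyplot as plt

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical output folder inside the temporary directory
    output_dir = Path(tmp_dir) / "figures"
    output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it's missing

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

    output_path = output_dir / "example-plot.png"
    fig.savefig(output_path)
    plt.close(fig)

    # Check that a non-empty file was written
    saved_ok = output_path.exists() and output_path.stat().st_size > 0

print(saved_ok)
```

For the cell above, the same idea would be to call `Path("./figures").mkdir(parents=True, exist_ok=True)` before `fig.savefig()`.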

In [None]:
# Reset the figure by calling plt.subplots()
plt.subplots()

## Extensions

For more exercises, check out the [Matplotlib tutorials page](https://matplotlib.org/3.1.1/tutorials/index.html). A good practice is to read through it and add the parts you find interesting to the end of this notebook.

The next place you could go is the [Stack Overflow page for the top questions and answers for Matplotlib](https://stackoverflow.com/questions/tagged/matplotlib?sort=MostVotes&edited=true). It often covers some of the most common and useful Matplotlib functions. Don't forget to play around with the Stack Overflow filters!

Finally, as always, remember, the best way to learn something new is to try it. And try it relentlessly. Always be asking yourself, "is there a better way this data could be visualized so it's easier to understand?"
