
# Matplotlib Practice

This notebook offers a set of exercises for practising different tasks with Matplotlib.

There may be more than one way to answer a question or complete an exercise.

Each task is described in a comment or in the text.

For further reference and resources, check out the [Matplotlib documentation](https://matplotlib.org/3.1.1/contents.html).

If you're stuck, you can always search for a function. For example, if you want to create a plot with `plt.subplots()`, search for [`plt.subplots()`](https://www.google.com/search?q=plt.subplots()).

In [None]:
# Import the pyplot module from matplotlib as plt and make sure
# plots appear in the notebook using '%matplotlib inline'
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Create a simple plot using plt.plot()
plt.plot()

In [None]:
# Plot a single Python list
plt.plot([1,2,3,4,5])
plt.show()
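
When `plt.plot()` receives a single list, Matplotlib treats the values as y coordinates and uses the positions 0, 1, 2, ... as x coordinates. The sketch below checks this by reading the data back from the line object; the variable names (`values`, `line`, `x_used`, `y_used`) are only illustrative.

```python
import matplotlib.pyplot as plt

values = [1, 2, 3, 4, 5]

# plt.plot() returns a list of Line2D objects; unpack the single line
(line,) = plt.plot(values)

# Read back the coordinates Matplotlib actually used
x_used = list(line.get_xdata())  # the positions 0..4, filled in automatically
y_used = list(line.get_ydata())  # the values that were passed in

print("x:", x_used)
print("y:", y_used)
```

If the x positions matter, pass them explicitly as the first argument, as the next cells do with `X` and `y`.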

In [None]:
# Create two lists, one called X, one called y, each with 5 numbers in them
X = [1,2,3,4,5]
y = [6,7,8,9,10]

In [None]:
# Plot X & y (the lists you've created)
plt.plot(X,y)
plt.show()

There's another way to create plots with Matplotlib, known as the object-oriented (OO) method. Let's try it.

In [None]:
# Create a plot using plt.subplots()
fig, ax = plt.subplots()

In [None]:
# Create a plot using plt.subplots() and then add X & y on the axes
fig, ax = plt.subplots()
ax.plot(X, y)
plt.show()
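
To make the difference between the two interfaces concrete, here is a small side-by-side sketch. It draws the same data once through the pyplot functions and once through explicit figure and axes objects, then compares the plotted values. The names `implicit_ax` and `same_data` are illustrative only.

```python
import matplotlib.pyplot as plt

X = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]

# pyplot interface: plt.plot() draws on the "current" axes
plt.figure()
plt.plot(X, y)
implicit_ax = plt.gca()  # fetch the axes that pyplot drew on

# Object-oriented interface: keep explicit handles to the figure and axes
fig, ax = plt.subplots()
ax.plot(X, y)

# Both approaches end up with a line holding the same data
same_data = list(implicit_ax.lines[0].get_ydata()) == list(ax.lines[0].get_ydata())
print(same_data)
```

The OO method tends to be easier to manage once a figure has more than one axes, because every call names the axes it acts on.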

Now let's try a small Matplotlib workflow.

In [None]:
# Import and get matplotlib ready
import matplotlib.pyplot as plt

# Prepare data (create two lists of 5 numbers, X & y)
X = [1,2,3,4,5]
y = [10,20,30,40,50]

# Setup figure and axes using plt.subplots()
fig, ax = plt.subplots()

# Add data (X, y) to axes
ax.plot(X, y)

# Customize plot by adding a title, xlabel and ylabel
ax.set(title="Simple Plot", xlabel="X", ylabel="y")

# Save the plot to file using fig.savefig()
fig.savefig("SmallWorkflow.png")
plt.show()

Okay, that was a simple line plot. How about something a little different?

To help us, we'll import NumPy.

In [None]:
# Import NumPy as np
import numpy as np

In [None]:
# Create an array of 100 evenly spaced numbers between 0 and 100 using NumPy and save it to variable X
X = np.linspace(0, 100, 100)
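
One detail worth checking: `np.linspace(0, 100, 100)` includes both endpoints, so the spacing between the 100 points is 100/99 rather than exactly 1. A quick sketch (the names `step` and `X_unit` are illustrative):

```python
import numpy as np

X = np.linspace(0, 100, 100)       # 100 points, endpoints 0 and 100 included
step = X[1] - X[0]                 # spacing between neighbours: 100 / 99

X_unit = np.linspace(0, 100, 101)  # 101 points give a spacing of exactly 1

print(len(X), X[0], X[-1], step)
print(len(X_unit), X_unit[1] - X_unit[0])
```

Either array works for the plots that follow; the choice only matters if you need whole-number x values.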

In [None]:
# Create a plot using plt.subplots() and plot X versus X^2 (X squared)
fig, ax = plt.subplots()
ax.plot(X, X**2)
plt.show()

We'll start with scatter plots.

In [None]:
# Create a scatter plot of X versus the exponential of X (np.exp(X))
plt.scatter(X, np.exp(X))
plt.show()
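
Because `np.exp(X)` grows from 1 to roughly 2.7e43 over this range, almost every point sits flat against the x-axis on a linear scale. One possible way to make the growth visible, shown here as an optional sketch rather than part of the exercise, is a logarithmic y-axis:

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(0, 100, 100)
y = np.exp(X)

fig, ax = plt.subplots()
ax.scatter(X, y)

# A log scale turns exponential growth into a straight line
ax.set_yscale("log")
ax.set(xlabel="X", ylabel="exp(X) (log scale)")
```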

In [None]:
# Create a scatter plot of X versus np.sin(X)
plt.scatter(X, np.sin(X))
plt.show()

How about we try another type of plot? This time let's look at a bar plot. First we'll make some data.

In [None]:
# Create a Python dictionary of 3 of your favourite foods.
# The keys of the dictionary should be the food names and the values their prices
favoriteFoods = {
    "Pizza": 10,
    "Burger": 5,
    "Ice Cream": 3
}

In [None]:
# Create a bar graph where the x-axis is the keys of the dictionary
# and the y-axis is the values of the dictionary
plt.bar(favoriteFoods.keys(), favoriteFoods.values())

# Add a title, xlabel and ylabel to the plot
plt.title("My Favourite Foods")
plt.xlabel("Food")
plt.ylabel("Price")

plt.show()

In [None]:
# Make the same plot as above, except this time make the bars go horizontal
plt.barh(list(favoriteFoods.keys()), list(favoriteFoods.values()))
plt.title("My Favourite Foods (Horizontal)")
plt.xlabel("Price")
plt.ylabel("Food")
plt.show()
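
Bar charts are often easier to read when the bars are ordered by value. As an optional variation on the plots above, the sketch below sorts the dictionary by price before plotting. The `sortedFoods` name is illustrative.

```python
import matplotlib.pyplot as plt

favoriteFoods = {"Pizza": 10, "Burger": 5, "Ice Cream": 3}

# Sort the (food, price) pairs by price, cheapest first
sortedFoods = dict(sorted(favoriteFoods.items(), key=lambda item: item[1]))

fig, ax = plt.subplots()
bars = ax.barh(list(sortedFoods.keys()), list(sortedFoods.values()))
ax.set(title="My Favourite Foods (Sorted by Price)", xlabel="Price", ylabel="Food")
```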

All this food plotting is making me hungry. But we've got a couple of plots to go.

Let's see a histogram.

In [None]:
# Create a random NumPy array of 1000 normally distributed numbers using NumPy and save it to X
X = np.random.randn(1000)

# Create a histogram plot of X
plt.hist(X)
plt.show()

In [None]:
# Create a NumPy array of 1000 random numbers and save it to X
X = np.random.rand(1000)

# Create a histogram plot of X
plt.hist(X)
plt.show()


Notice how the distributions (spread of data) are different. Why do they differ?
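
The difference comes from the sampling functions: `np.random.randn()` draws from the standard normal distribution (mean 0, standard deviation 1), while `np.random.rand()` draws uniformly from the interval [0, 1). The sketch below overlays both histograms and prints summary statistics. It uses NumPy's seeded `default_rng` generator so the numbers are repeatable, which is a substitution for the unseeded calls in the cells above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)            # seeded so results are repeatable
normal_samples = rng.standard_normal(1000)  # same distribution as np.random.randn
uniform_samples = rng.random(1000)          # same distribution as np.random.rand

fig, ax = plt.subplots()
ax.hist(normal_samples, bins=30, alpha=0.6, label="standard normal (randn)")
ax.hist(uniform_samples, bins=30, alpha=0.6, label="uniform on [0, 1) (rand)")
ax.legend()

print("normal:  mean", normal_samples.mean(), "std", normal_samples.std())
print("uniform: min", uniform_samples.min(), "max", uniform_samples.max())
```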


Now let's try making some subplots. A figure with subplots is a single figure containing multiple plots (axes).

In [None]:
# Create an empty subplot with 2 rows and 2 columns (4 subplots total)
fig, axs = plt.subplots(2, 2)
plt.show()

Notice how the figure has multiple axes. Now let's add data to each of them.

In [None]:
# Create the same plot as above with 2 rows and 2 columns and figsize of (10, 5)
fig, axs = plt.subplots(2, 2, figsize=(10, 5))

# Plot X versus X/2 on the top left axes
axs[0, 0].plot(X, X/2)
axs[0, 0].set_title("X vs X/2")

# Plot a scatter plot of 10 random numbers on each axis on the top right subplot
axs[0, 1].scatter(np.random.rand(10), np.random.rand(10))
axs[0, 1].set_title("Scatter Plot of Random Numbers")

# Plot a bar graph of the favourite food keys and values on the bottom left subplot
axs[1, 0].bar(favoriteFoods.keys(), favoriteFoods.values())
axs[1, 0].set_title("Favourite Foods Bar Graph")

# Plot a histogram of 1000 random normally distributed numbers on the bottom right subplot
axs[1, 1].hist(np.random.randn(1000))
axs[1, 1].set_title("Histogram of Random Normals")

plt.show()
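
The `axs` object returned by `plt.subplots(2, 2)` is a 2D NumPy array of axes, which is why indexing such as `axs[0, 1]` works. A short sketch of inspecting it and looping over every axes with `axs.flat`; the titles here are placeholders:

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2, figsize=(10, 5))
print(type(axs), axs.shape)

# axs.flat visits the axes row by row: top left, top right, bottom left, bottom right
for index, ax in enumerate(axs.flat):
    ax.set_title(f"Axes {index}")

# Adjust spacing so titles and tick labels do not overlap
fig.tight_layout()
```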

Now we've seen how to plot data directly with Matplotlib. Next, let's plot from a pandas DataFrame.

First we'll need to import pandas and create a DataFrame to work with.

In [1]:
# Import pandas as pd
import pandas as pd

In [9]:
# Import '../data/car-sales.csv' into a DataFrame called carSales and view it
carSales = pd.read_csv('../data/car-sales.csv')
print(carSales)

FileNotFoundError: [Errno 2] No such file or directory: '../data/car-sales.csv'
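
The `FileNotFoundError` above usually means the relative path does not resolve from the notebook's current working directory, for example when the notebook runs in Colab and the CSV has not been uploaded. A minimal sketch for checking this with the standard library, using the same path as the cell above:

```python
from pathlib import Path

csv_path = Path("../data/car-sales.csv")  # same relative path as the cell above

# Relative paths are resolved against the current working directory
print("Working directory:", Path.cwd())
print("Resolved path:", csv_path.resolve())

file_found = csv_path.exists()
print("File exists:", file_found)
```

If the file is reported as missing, either move the CSV to the resolved location or pass `pd.read_csv()` the path where the file actually lives.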

In [None]:
# Try to plot the 'Price' column using the plot() function
carSales['Price'].plot()

plt.show()

Why doesn't it work?

Hint: It's not numeric data.

While turning it into numeric data, let's also create a column which holds the running total of sales and another which shows the date each car was sold.

Hint: To add a column up cumulatively, look up the `cumsum()` function. To create a column of dates, look up the `date_range()` function.
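
Here is a small sketch of those steps on made-up data, so it runs without the CSV. The `sales` DataFrame, its three prices and the fixed start date are all hypothetical, and the price format (`$4,000.00`) is an assumption about what the real `Price` column looks like.

```python
import pandas as pd

# Hypothetical stand-in for the real car sales data
sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00", "$7,000.00"]})

# Strip "$", "," and ".", drop the two cents digits, then convert to integers
sales["Price"] = (
    sales["Price"].str.replace(r"[\$,.]", "", regex=True).str[:-2].astype(int)
)

# cumsum() keeps a running total of the column
sales["Total Sales"] = sales["Price"].cumsum()

# date_range() builds consecutive daily dates, one per row
sales["Sale Date"] = pd.date_range("2024-01-01", periods=len(sales))

print(sales)
```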

In [None]:
# Remove the symbols and the final two digits (the cents) from the 'Price' column, then convert it to integers
carSales['Price'] = carSales['Price'].str.replace(r'[\$,.]', '', regex=True).str[:-2].astype(int)

In [None]:
# Add a column called 'Total Sales' to carSales which cumulatively adds the 'Price' column
carSales['Total Sales'] = carSales['Price'].cumsum()

# Add a column called 'Sale Date' which lists a series of successive dates starting from today (your today)
carSales['Sale Date'] = pd.date_range('today', periods=len(carSales))

# View the carSales DataFrame
print(carSales)

Now that we've got a numeric column (`Total Sales`) and a dates column (`Sale Date`), let's visualize them.

In [None]:
# Use the plot() function to plot the 'Sale Date' column versus the 'Total Sales' column
carSales.plot(x='Sale Date', y='Total Sales', title='Total Sales Over Time')
plt.show()

In [None]:
# Convert the 'Price' column to integers
carSales['Price'] = carSales['Price'].astype(int)

# Create a scatter plot of the 'Odometer (KM)' and 'Price' columns using the plot() function
carSales.plot(x='Odometer (KM)', y='Price', kind='scatter', title='Odometer vs Price')
plt.show()

In [None]:
# Create a NumPy array of random numbers of size (10, 4) and save it to X
X = np.random.rand(10, 4)

# Turn the NumPy array X into a DataFrame with columns called ['a', 'b', 'c', 'd']
df = pd.DataFrame(X, columns=['a', 'b', 'c', 'd'])

# Create a bar graph of the DataFrame
df.plot(kind='bar')

plt.show()

In [None]:
# Create a bar graph of the 'Make' and 'Odometer (KM)' columns in the carSales DataFrame
carSales.plot(x='Make', y='Odometer (KM)', kind='bar')
plt.show()

In [None]:
# Create a histogram of the 'Odometer (KM)' column
carSales['Odometer (KM)'].plot(kind='hist')
plt.show()

In [None]:
# Create a histogram of the 'Price' column with 20 bins
carSales['Price'].plot(kind='hist', bins=20)
plt.show()

Now we've seen a few examples of plotting directly from DataFrames using the car sales dataset.

Let's try using a different dataset.

In [None]:
# Import "../data/heart-disease.csv" and save it to the variable "heartDisease"
heartDisease = pd.read_csv('../data/heart-disease.csv')

In [None]:
# View the first 10 rows of the heartDisease DataFrame
print(heartDisease.head(10))

In [None]:
# Create a histogram of the "age" column with 50 bins
heartDisease['age'].plot(kind='hist', bins=50)
plt.show()

In [None]:
# Call plot.hist() on the heartDisease DataFrame and toggle the
# "subplots" parameter to True
heartDisease.plot.hist(subplots=True)
plt.show()

That plot looks pretty squished. Let's change the figsize.

In [None]:
# Call the same line of code from above except change the "figsize" parameter
# to be (10, 30)
heartDisease.plot.hist(subplots=True, figsize=(10, 30))
plt.show()

Now let's try comparing two variables versus the target variable.

More specifically, we'll see how age and cholesterol combined affect the target in **patients over 50 years old**.

For this next challenge, we're going to replicate the reference plot, plot1.


In [None]:
# Replicate the above plot in whichever way you see fit

# Note: The method below is only one way of doing it, yours might be
# slightly different

# Create DataFrame with patients over 50 years old
df = heartDisease[heartDisease['age'] > 50]

# Create the plot
fig, ax = plt.subplots()

# Plot the data
ax.scatter(df['age'], df['chol'], c=df['target'])  # colour each point by the target value

# Customize the plot
ax.set(title="Age vs Cholesterol", xlabel="Age", ylabel="Cholesterol")

# Add a meanline
ax.axhline(df['chol'].mean(), color='red', linestyle='--')
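
To show how the target can appear in the figure, the sketch below colours each point by its target value and adds a legend with `legend_elements()`. It runs on randomly generated stand-in data, not the real heart disease dataset: the `patients` DataFrame and its values are hypothetical, and the column names `age`, `chol` and `target` are assumed to match the CSV.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data with the assumed column names
rng = np.random.default_rng(0)
patients = pd.DataFrame({
    "age": rng.integers(30, 80, size=200),
    "chol": rng.normal(250, 50, size=200),
    "target": rng.integers(0, 2, size=200),
})

# Keep only patients over 50 years old
over_50 = patients[patients["age"] > 50]

fig, ax = plt.subplots(figsize=(10, 6))

# c= maps each point's target value to a colour
scatter = ax.scatter(over_50["age"], over_50["chol"], c=over_50["target"])
ax.set(title="Age vs Cholesterol", xlabel="Age", ylabel="Cholesterol")

# Build legend entries from the distinct colour values
ax.legend(*scatter.legend_elements(), title="Target")

# Dashed horizontal line at the mean cholesterol level
mean_line = ax.axhline(over_50["chol"].mean(), color="red", linestyle="--")
```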

Now that you've created a plot of two different variables, let's change the style.

In [None]:
# Check what styles are available under plt
availableStyles = plt.style.available
print(availableStyles)

In [None]:
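# Change the style to use "seaborn-whitegrid" (renamed "seaborn-v0_8-whitegrid" in Matplotlib 3.6+)
plt.style.use("seaborn-v0_8-whitegrid" if "seaborn-v0_8-whitegrid" in plt.style.available else "seaborn-whitegrid")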
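
Note that `plt.style.use()` changes the style for every plot created afterwards. If you only want a style for one figure, `plt.style.context()` applies it temporarily. The sketch below also picks whichever whitegrid style name the installed Matplotlib version provides, falling back to `"default"`. The `candidates` list and `style_name` variable are illustrative.

```python
import matplotlib.pyplot as plt

# Newer Matplotlib versions use the "seaborn-v0_8-" prefix for these styles
candidates = ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]
style_name = next(
    (name for name in candidates if name in plt.style.available), "default"
)
print("Using style:", style_name)

# The style only applies to figures created inside the with block
with plt.style.context(style_name):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    ax.set(title=f"Styled with {style_name}")
```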

Now that the style has been changed, we'll replot the same figure from above and see what it looks like.

If you've changed the style correctly, it should look like plot2.


In [None]:
# Reproduce the same figure as above with the "seaborn-whitegrid" style

# Create the plot
fig, ax = plt.subplots()

# Plot the data
ax.scatter(df['age'], df['chol'], c=df['target'])  # colour each point by the target value

# Customize the plot
ax.set(title="Age vs Cholesterol", xlabel="Age", ylabel="Cholesterol")

# Add a meanline
ax.axhline(df['chol'].mean(), color='red', linestyle='--')

Wonderful, you've changed the style of the plots and the figure looks different, but the dots aren't a very good colour.

Let's change the `cmap` parameter of `scatter()` as well as the `color` parameter of `axhline()` to fix it.

Completing this step correctly should result in a figure which looks like plot3.

In [None]:
# Replot the same figure as above except change the "cmap" parameter
# of scatter() to "winter"
# Also change the "color" parameter of axhline() to "red"

# Create the plot
fig, ax = plt.subplots()

# Plot the data
ax.scatter(df['age'], df['chol'], c=df['target'], cmap='winter')  # cmap only takes effect when c= is given

# Customize the plot
ax.set(title="Age vs Cholesterol", xlabel="Age", ylabel="Cholesterol")

# Add a meanline
ax.axhline(df['chol'].mean(), color='red', linestyle='--')

Beautiful! Now that our figure has an upgraded color scheme, let's save it to file.

In [None]:
# Save the current figure using savefig(), the file name can be anything you want
fig.savefig('figure1.png')
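
`savefig()` also accepts options that control the output quality. The sketch below saves a small example figure with a higher resolution (`dpi`) and trimmed margins (`bbox_inches="tight"`), then confirms the file was written. The example figure, the file name and the choice of the system temp directory are all illustrative.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 6])
ax.set(title="Example figure")

# Write to the system temp directory so the example leaves the project untouched
output_path = os.path.join(tempfile.gettempdir(), "figure1_example.png")

# dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace
fig.savefig(output_path, dpi=150, bbox_inches="tight")

file_size = os.path.getsize(output_path)
print("Saved", output_path, "with", file_size, "bytes")
```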

In [None]:
# Reset the figure by calling plt.subplots()
fig, ax = plt.subplots()