## Session 3 - Matplotlib and figure plotting in Python

Matplotlib is the most widely used plotting library in Python. Many plot types are implemented in Matplotlib, some very useful, some more niche. 

Compared to Excel, the main advantage of Matplotlib is the ability to have ultimate control over your graphs and figures.

The main disadvantage is the difficulty of getting Matplotlib to do **exactly** what you want - Excel is definitely faster.


### Example 1 - basic matplotlib usage

The plotting functions used in this session live in Matplotlib's pyplot module, which is conventionally imported via: ```import matplotlib.pyplot as plt```

All of the functions are accessed via plt - similar to how we accessed numpy functions using np.*something* in the previous session.

The cell below takes some data (x axis and y axis) and produces a lineplot using ```plt.plot```. Without any other arguments, matplotlib uses default settings.

In [None]:
# Import the pyplot module
import matplotlib.pyplot as plt 

# Generate data as lists
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

# Plot y against x as a line
plt.plot(x, y)

# Show the graph
plt.show()
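
The lists above could equally be NumPy arrays, as used in the previous session. The following is a minimal sketch of the same plot, assuming the data are generated with `np.arange` rather than typed in by hand:

```python
import numpy as np
import matplotlib.pyplot as plt

# Build the same data as NumPy arrays instead of lists
x = np.arange(6)   # the integers 0 to 5
y = x ** 2         # element-wise square of each x value

# plt.plot accepts arrays in exactly the same way as lists
plt.plot(x, y)
plt.show()
```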

We often want to plot multiple things on the same axes. The simplest way to do this in matplotlib is to call several plt.*something* plotting functions one after another before calling plt.show().

Note that the second plot has an orange line - this is the default behaviour.

In [None]:
# Define the lists
x = [0, 1, 2, 3, 4, 5]
y1 = [0, 1, 4, 9, 16, 25]
y2 = [0, 1, 8, 27, 64, 125]

# Plot the first line
plt.plot(x, y1)

# Then the second.
plt.plot(x, y2)

# And display the graph
plt.show()
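
The blue-then-orange ordering comes from Matplotlib's default colour cycle. As a small sketch (the variable names here are only illustrative), the cycle can be inspected through `plt.rcParams`, and each line can be asked for the colour it was given. In the default style the second entry is an orange, `#ff7f0e`.

```python
import matplotlib.pyplot as plt

# The default colour cycle is stored in Matplotlib's runtime configuration
default_colours = plt.rcParams["axes.prop_cycle"].by_key()["color"]
print(default_colours[:2])

x = [0, 1, 2, 3, 4, 5]

# plt.plot returns a list of line objects; the trailing comma unpacks the single line
first_line, = plt.plot(x, [0, 1, 4, 9, 16, 25])
second_line, = plt.plot(x, [0, 1, 8, 27, 64, 125])

# Each new line takes the next colour in the cycle
print(first_line.get_color(), second_line.get_color())
plt.show()
```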

Scatterplots are another commonly used graph type in matplotlib. Below is another example where we have a lineplot and a scatterplot on the same axes.

We may want to do this when, for example, we fit a function to some data (e.g. a sigmoidal function to dose-response curves) or overlay a moving average on some noisy data (e.g. COVID-19 deaths by date).

In [None]:
# Define the lists
x = [0, 1, 2, 3, 4, 5]
observed_data = [ 0.29052297,  2.67672771,  4.84741051,  9.70210351, 18.68549226, 23.26443173]
fit_function = [0, 1, 4, 9, 16, 25]

# Plot the fit function as a lineplot
plt.plot(x, fit_function)

# Then the second as a scatterplot
plt.scatter(x, observed_data)

# And display the graph
plt.show()
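
The moving-average case mentioned above can be sketched in the same way. This example uses synthetic noisy data and an assumed window of 5 points; `np.convolve` is just one of several ways to compute a moving average.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic noisy data (seeded so the example is reproducible)
rng = np.random.default_rng(0)
x = np.arange(50)
noisy = np.sin(x / 5) + rng.normal(scale=0.3, size=x.size)

# 5-point moving average: convolve with a window of equal weights.
# mode="valid" keeps only positions where the window fully overlaps the data,
# so the result is (window - 1) points shorter than the input.
window = 5
smoothed = np.convolve(noisy, np.ones(window) / window, mode="valid")
x_smoothed = x[window - 1:]  # pair each average with the last point of its window

# Raw observations as a scatterplot, moving average as a lineplot
plt.scatter(x, noisy)
plt.plot(x_smoothed, smoothed)
plt.show()
```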

### Exercise 1 - plotting line and scatter graphs

Using the data given below, produce lineplot and scatterplot graphs

In [None]:
# Data for scatter plots

x_scatter = [1, 2, 3, 4, 5, 6]
y_scatter = [10, 15, 13, 20, 22, 24]

# Write code for producing scatter plots below



In [None]:
# Data for line plots

x_line = [0, 1, 2, 3, 4, 5]
y_line = [0, 1, 4, 9, 16, 25]

# Write code for producing line plots below


### Example 2 - customising graphs

The default matplotlib plots are very basic. We need to add chart elements and modify chart attributes to clearly describe what the plot shows.

Commonly modified chart elements:
- ylabel
- xlabel
- title
- legend

See examples below modifying our multiple line graph

In [None]:
# Define the lists
x = [0, 1, 2, 3, 4, 5]
y1 = [0, 1, 4, 9, 16, 25]
y2 = [0, 1, 8, 27, 64, 125]

# Plot the first line. Note that we add the argument "label" which tells matplotlib what data this drawing object is showing.
plt.plot(x, y1, label = "y = x^2")

# Then the second with a different label argument
plt.plot(x, y2, label = "y = x^3")

# Here we can add chart elements. Labels and titles need a string argument.
plt.xlabel("Input Values")
plt.ylabel("Output Values")
plt.title("Result of X raised to the 2nd or 3rd Power")

# plt.legend collects all labels used to generate the graph and produce a figure legend
plt.legend()

# And display the graph
plt.show()
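
Labels such as "y = x^2" are shown literally, caret included. As an optional sketch, wrapping the text in dollar signs asks Matplotlib's built-in math rendering to typeset it, so the exponent appears as a superscript:

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y1 = [0, 1, 4, 9, 16, 25]

# Text between dollar signs is rendered as a mathematical expression
line, = plt.plot(x, y1, label="$y = x^2$")

plt.xlabel("Input Values")
plt.ylabel("Output Values")
plt.legend()
plt.show()
```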

We can also modify the way our charts look. This is different for each graph type and we will cover more in turn.

Options for lineplot:
- color --> The color of the line
- linestyle --> The way the line looks, e.g. solid ("-"), dashed ("--") or dotted (":")
- linewidth --> The width of the line

In [None]:
# Define the lists
x = [0, 1, 2, 3, 4, 5]
y1 = [0, 1, 4, 9, 16, 25]
y2 = [0, 1, 8, 27, 64, 125]

# Plot the first line. Note that we set the color of this line to red.
plt.plot(x, y1, label = "y = x^2", color="red")

# Then the second with a different color. We also set this linestyle to dashed "--"
plt.plot(x, y2, label = "y = x^3", color="purple", linestyle="--")

# Here we can add chart elements. Labels and titles need a string argument.
plt.xlabel("Input Values")
plt.ylabel("Output Values")
plt.title("Result of X raised to the 2nd or 3rd Power")

# plt.legend collects all labels used to generate the graph and produce a figure legend
plt.legend()

# And display the graph
plt.show()
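
The cell above does not use linewidth. The sketch below draws the same curve four times, shifted upwards so the lines do not overlap, to compare the four built-in linestyle strings at increasing widths. The offsets and widths are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

styles = ["-", "--", ":", "-."]  # solid, dashed, dotted, dash-dot
lines = []

for offset, style in enumerate(styles):
    # Shift the curve upwards so each style is visible on its own
    shifted = [value + 5 * offset for value in y]
    width = 1 + offset  # linewidths 1, 2, 3 and 4
    line, = plt.plot(x, shifted, color="black", linestyle=style,
                     linewidth=width, label=f"linestyle {style}, linewidth {width}")
    lines.append(line)

plt.legend()
plt.show()
```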

### Exercise 2 - customising plots

Add chart elements to the plots below.

In [None]:
# Month of the year in indices - note that these numbers are completely arbitrary and do not necessarily have to fit python indexing
months = [1,2,3,4,5,6,7,8,9,10,11,12]

# Sales in GBP
sales_2021 = [543, 324, 234, 366, 789, 1006, 1089, 1500, 760, 520, 690, 800]
sales_2022 = [450, 380, 340, 356, 560, 689, 897, 780, 600, 540, 700, 810]

# Produce lineplots of sales in each year - add label information for each year


# Add the legend object

# Show the graph
plt.show()

### Exercise 3 - add additional elements to the graph

In [None]:
import numpy as np

# Use numpy to calculate the average sales

# Add the lineplots from the previous exercise
# - Additionally set the two lines with specific colors


# Add an additional lineplot for the average monthly value as a dotted line

# Add a legend, X-label and Y-label

# Show the new graph
plt.show()

### Example 3 - Multiple subplots

Subplots in **matplotlib** are really useful for comparing multiple datasets or showing different aspects of data side by side. Instead of creating separate figures, you can pack everything into one neatly organized view, which makes it easier to spot trends and tell a cohesive story with your visualizations.

The first way to do this is to use the plt.subplots() function. This initialises a figure object and an array of subplot objects.

The second way to do this is to initialise a figure using plt.figure(). This produces a figure object with a method for adding subplots to the figure one at a time.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Produce some synthetic data for plotting
x = np.linspace(0, 2 * np.pi, 100)

# Evaluate y-values for each function
y_sin = np.sin(x)
y_cos = np.cos(x)
y_tan = np.tan(x)
y_quad = x**2

The below cell shows what's happening under the hood of the subplots function

In [None]:
# Creating subplots
fig, axs = plt.subplots(nrows = 2, ncols= 2, figsize=(10, 8))

print("fig object type:",type(fig))
print("axs object type:", type(axs))
# note that the shape of the array is determined by the nrows and ncols values
print("axs object shape:", axs.shape)
print("axs element type:", type(axs[0,0]))

# Note that a figure containing four empty subplots is displayed in the notebook.

plt.show()
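
One detail worth knowing: with the default settings, `plt.subplots` drops any dimension of length 1 from the returned array. The sketch below shows the two cases that most often cause indexing errors.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# One row, two columns: the length-1 dimension is dropped, giving a 1D array
fig, axs = plt.subplots(nrows=1, ncols=2)
print(axs.shape)              # (2,)
axs[0].set_title("left")      # index with a single number, not axs[0, 0]
axs[1].set_title("right")

# A single subplot: no array at all, just one Axes object
fig_single, ax = plt.subplots()
print(isinstance(ax, Axes))   # True

plt.show()
```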

Now we can plot our multiple plot figure.

Note that each subplot object in the axs array is accessed exactly like an element of a numpy array.

Each subplot has the same plotting methods as the pyplot module (plt), with some differences in naming.

Note that instead of calling plt.xlabel directly to set the xlabel, we use ```axs[0,0].set_xlabel("x")``` to set the xlabel attribute.
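
As a rough guide to the naming differences, the sketch below uses a single subplot (`plt.subplots()` with no arguments returns one subplot object rather than an array) and notes the pyplot equivalent of each call in a comment:

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

fig, ax = plt.subplots()

ax.plot(x, y, label="y = x^2")   # plt.plot   -> ax.plot (same name)
ax.set_xlabel("x")               # plt.xlabel -> ax.set_xlabel
ax.set_ylabel("y")               # plt.ylabel -> ax.set_ylabel
ax.set_title("Quadratic")        # plt.title  -> ax.set_title
ax.set_xlim(0, 5)                # plt.xlim   -> ax.set_xlim
ax.legend()                      # plt.legend -> ax.legend (same name)

plt.show()
```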

In [None]:
# Creating subplots
fig, axs = plt.subplots(nrows = 2, ncols = 2, layout="tight") # The layout keyword prevents overlapping subplots.

# First subplot
axs[0, 0].plot(x, y_sin)
axs[0, 0].set_title('Sine Function')
axs[0, 0].set_xlabel('x')
axs[0, 0].set_ylabel('sin(x)')

# Second subplot
axs[0, 1].plot(x, y_cos)
axs[0, 1].set_title('Cosine Function')
axs[0, 1].set_xlabel('x')
axs[0, 1].set_ylabel('cos(x)')

# Third subplot
axs[1, 0].plot(x, y_tan)
axs[1, 0].set_title('Tangent Function')
axs[1, 0].set_xlabel('x')
axs[1, 0].set_ylabel('tan(x)')
axs[1, 0].set_ylim(-10, 10)  # Limit y to avoid extreme values

# Fourth subplot
axs[1, 1].plot(x, y_quad)
axs[1, 1].set_title('Quadratic Function')
axs[1, 1].set_xlabel('x')
axs[1, 1].set_ylabel('x^2')

# Display the plots
plt.show()
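
The four blocks above differ only in their data and text, so they can also be written as a loop. This is one possible sketch, assuming the panel details are collected in a list (here called `panels`) and using `axs.flat` to visit the subplots row by row.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# (title, y-label, y-values) for each panel, in reading order
panels = [
    ("Sine Function", "sin(x)", np.sin(x)),
    ("Cosine Function", "cos(x)", np.cos(x)),
    ("Tangent Function", "tan(x)", np.tan(x)),
    ("Quadratic Function", "x^2", x ** 2),
]

fig, axs = plt.subplots(nrows=2, ncols=2, layout="tight")

# axs.flat walks the 2x2 array row by row: [0,0], [0,1], [1,0], [1,1]
for ax, (title, ylabel, y) in zip(axs.flat, panels):
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel("x")
    ax.set_ylabel(ylabel)

# Settings that apply to one panel only are still made by indexing
axs[1, 0].set_ylim(-10, 10)  # Limit y on the tangent panel

plt.show()
```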

The same result can be achieved using the ```fig.add_subplot``` method

The grid of subplots is defined when you add the subplot. For example, if we want a 2x2 grid and want to add a subplot in the first quadrant of the grid, we would use:
```fig.add_subplot(2,2,1)``` meaning "assume a 2 row x 2 column grid and give me the first quadrant" - **matplotlib indexes subplots from 1: to me this feels like a weird choice, but you just have to live with it**

In [None]:
# First define the figure
fig = plt.figure(layout="tight")

# Add a subplot to the figure
ax = fig.add_subplot(2,2,1) # 2x2 grid, 1st element of the grid
ax.plot(x, y_sin) # a lineplot of sin(x)
ax.set_ylabel("sin(x)")
ax.set_xlabel("x")

# Add another subplot to the figure
ax = fig.add_subplot(2,2,2) # 2x2 grid, 2nd element of the grid
# The ax variable now refers to the second element
ax.plot(x, y_cos) # a lineplot of cos(x)
ax.set_ylabel("cos(x)")
ax.set_xlabel("x")

# Add another subplot to the figure
ax = fig.add_subplot(2,2,3) # 2x2 grid, 3rd element of the grid
ax.plot(x, y_tan) # a lineplot of tan(x)
ax.set_ylabel("tan(x)")
ax.set_xlabel("x")
ax.set_ylim(-10, 10)  # Limit y to avoid extreme values


# Add another subplot to the figure
ax = fig.add_subplot(2,2,4) # 2x2 grid, 4th element of the grid
ax.plot(x, y_quad) # a lineplot of x^2
ax.set_ylabel("x^2")
ax.set_xlabel("x")

# Display the plots
plt.show()
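
To see how the 1-based index maps onto the grid, the sketch below adds all four subplots in a loop and writes each one's zero-based row and column into its title. The index counts along the first row, then along the second.

```python
import matplotlib.pyplot as plt

fig = plt.figure(layout="tight")

axes_by_index = {}
for index in range(1, 5):  # valid indices for a 2x2 grid are 1 to 4
    ax = fig.add_subplot(2, 2, index)

    # Ask the subplot where it sits in the grid (zero-based row and column)
    spec = ax.get_subplotspec()
    row, col = spec.rowspan.start, spec.colspan.start

    ax.set_title(f"index {index} -> row {row}, col {col}")
    axes_by_index[index] = ax

plt.show()
```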

In [None]:
# We can also make weird arrangements of plots like this:

fig = plt.figure(layout="tight")

ax = fig.add_subplot(2,2,1)
ax.hist(y_sin, bins=16) # This time we're using a histogram
ax.set_title("Histogram of Sin(x)")


ax = fig.add_subplot(2,1,2)
ax.plot(x, y_sin)
ax.set_title("Plot of Sin(x)")
ax.set_ylabel("sin(x)")
ax.set_xlabel("x")

plt.show()
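
This works because every `fig.add_subplot` call describes its own grid, and the grids do not have to match. As a further sketch along the same lines, two panels from a 2x2 grid can fill the top row while one panel from a 2x1 grid spans the whole bottom row:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure(layout="tight")

# Top row: two panels placed on a 2x2 grid
top_left = fig.add_subplot(2, 2, 1)
top_left.plot(x, np.sin(x))
top_left.set_title("sin(x)")

top_right = fig.add_subplot(2, 2, 2)
top_right.plot(x, np.cos(x))
top_right.set_title("cos(x)")

# Bottom row: one wide panel placed on a 2x1 grid (the second of two rows)
bottom = fig.add_subplot(2, 1, 2)
bottom.plot(x, np.sin(x) + np.cos(x))
bottom.set_title("sin(x) + cos(x)")

plt.show()
```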

### Exercise 4 - Plotting IC50 Data for multiple drugs

In the data folder, I have included some fake dose response data for three drugs: erlotinib, gefitinib and lenvatinib

Plot graphs following the prompts to show **scatterplots** of each replicate point and a **lineplot** of the average value across the replicates.
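
Dose-response concentrations usually span several orders of magnitude, which is why the exercise cell sets a log scale on the x-axis. The sketch below illustrates the difference using made-up values: the concentration range, the IC50 of 1 uM and the simple sigmoidal formula are all assumptions for illustration and are not taken from the data folder.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical concentrations from 1 nM to 100 uM, evenly spaced in log10
concentration = np.logspace(-9, -4, 20)

# Illustrative sigmoidal response with an assumed IC50 of 1 uM
ic50 = 1e-6
response = 100 / (1 + concentration / ic50)

fig, axes = plt.subplots(1, 2, layout="constrained", figsize=(8, 3))

# Linear axis: almost all points are squashed against the left edge
axes[0].plot(concentration, response)
axes[0].set_title("Linear x-axis")

# Log axis: the sigmoidal shape becomes visible
axes[1].plot(concentration, response)
axes[1].set_xscale("log")
axes[1].set_title("Log x-axis")

for ax in axes:
    ax.set_xlabel("Concentration (M)")
    ax.set_ylabel("Response (%)")

plt.show()
```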

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Read in dataframes for each drug using pandas
erlotinib_df = 
gefitinib_df = 
lenvatinib_df = 

# Initialise figure and axes
# Note the way we set the figure size with the figsize argument
fig, axes = plt.subplots(3,1, layout="constrained", figsize=(8,6))

# See below that we need to use a log scale on our X-axis
for ax in axes:
    ax.set_xscale('log')

# Produce scatterplots for each drug in each axes. Hint: you can use a for loop to iterate through the columns of the dataframes
for col in ["Response (Replicate 1)","Response (Replicate 2)","Response (Replicate 3)"]:

# Add lineplots for the average values

# Add legends, labels and titles to each graph


### Exercise 5 - Hydrogen bond donor/acceptor data

Load in the dataframe ```"./data/lipinski_properties.csv"``` and plot a 2x2 grid of histograms for each feature.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Load the CSV data into a DataFrame
data = pd.read_csv("./data/lipinski_properties.csv")

# Create a new figure
fig = plt.figure(figsize=(10, 8))

# Add subplot for Hydrogen Bond Donors
ax1 = fig.add_subplot( # finish code here

# Add subplot for Hydrogen Bond Acceptors

# Add subplot for Molecular Weight

# Add subplot for LogP

# Adjust layout for better spacing
plt.tight_layout()

# Display the plot
plt.show()