[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QecD3OF_fvgp73ZcbDweGS-0la53h6t-?usp=sharing)

## **Week 2 - Introduction to Matplotlib - Solutions**

### **Solutions - Setting up Data**

In [None]:
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import numpy as np


# Load a chemistry dataset from a GitHub repository
data_url = "https://github.com/RodrigoAVargasHdz/CHEM-4PB3/raw/main/Course_Notes/data/qm9.csv"
data = pd.read_csv(data_url)

# Select single columns for HOMO and LUMO
homo = np.array(data['homo'])  # Extract HOMO
lumo = data.lumo.to_numpy()  # Extract LUMO


# Select multiple columns for HOMO and LUMO
homo_and_lumo = np.array(data[['homo', 'lumo']])



### **Solution: Plotting Alpha and Mu**



In [None]:
# Extracting 'alpha' and 'mu' columns from the dataset
alpha = np.array(data['alpha'])
mu = np.array(data['mu'])

figure(figsize=(8, 6), dpi=80)

# Creating the scatter plot
plt.scatter(alpha, mu)
plt.xlabel('Alpha', fontsize=15)
plt.ylabel('Mu', fontsize=15)
plt.show()


### **Solution: Statistical Insights**

In [None]:
print("Mean of HOMO:", np.mean(homo))
print("Median of HOMO:", np.median(homo))
print("Standard Deviation of HOMO:", np.std(homo))

print("Mean of LUMO:", np.mean(lumo))
print("Median of LUMO:", np.median(lumo))
print("Standard Deviation of LUMO:", np.std(lumo))

> Review the mean, median, and mode syntax in the [**Week 1**](https://colab.research.google.com/drive/14xmNNrXwtj65L2tsRaEXBHVPz1vasKsV?usp=sharing) lesson.
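
The cell above reports the mean, median, and standard deviation but not a mode. As a minimal sketch, assuming a small hypothetical array of rounded values (not taken from the dataset), the mode can be found with `np.unique` by counting how often each distinct value occurs:

```python
import numpy as np

# Hypothetical sample values, rounded so that repeats exist
values = np.array([-0.25, -0.24, -0.25, -0.23, -0.25, -0.24])

# np.unique returns the sorted distinct values and how often each occurs
unique_values, counts = np.unique(values, return_counts=True)

# The mode is the distinct value with the largest count
mode_value = unique_values[np.argmax(counts)]

print("Mean:", np.mean(values))
print("Median:", np.median(values))
print("Mode:", mode_value)
```

For continuous data such as the raw HOMO energies, exact repeats may be rare, so rounding or binning the values first is usually needed before a mode is meaningful.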

### **Solution: Enhanced Visualization**

In [None]:
# Setting up the figure
figure(figsize=(8, 6), dpi=80)

# Creating the scatter plot with labels and title
plt.scatter(homo, lumo)
plt.xlabel('HOMO', fontsize=15)
plt.ylabel('LUMO', fontsize=15)
plt.title('HOMO vs LUMO Scatter Plot', fontsize=18)

# Adding a grid
plt.grid(True)

# Adding a legend
plt.legend(['HOMO vs LUMO'])
plt.show()


> Consult the [**`plt.grid()`**](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html) documentation to learn more about its syntax and use cases.
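
As a hedged illustration of the options that documentation describes, the sketch below draws dashed grid lines on the y-axis only. The data is synthetic and the styling values are arbitrary choices, not part of the original solution.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data, used only to have something to draw
x_demo = np.linspace(0.0, 1.0, 20)
y_demo = x_demo ** 2

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x_demo, y_demo)

# Show major grid lines along the y-axis only, drawn as faint dashed lines
ax.grid(True, which='major', axis='y', linestyle='--', linewidth=0.5, alpha=0.7)

ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
```

Passing `axis='x'` or `axis='both'` selects the other grid directions; extra keyword arguments such as `linestyle` and `alpha` are applied to the grid lines.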



### **Solution: Data Highlighting**

In [None]:
condition = (homo > -0.2) & (lumo < 0.2)
figure(figsize=(8, 6), dpi=80)

# Plot all points in blue, then overplot the points that meet the condition in red
plt.scatter(homo, lumo, c='blue')
plt.scatter(homo[condition], lumo[condition], c='red')

# Adding axis labels and a legend
plt.xlabel('HOMO', fontsize=15)
plt.ylabel('LUMO', fontsize=15)
plt.legend(['Regular Points', 'Highlighted Points'])
plt.show()


### **Solution: Fitting Curves with np.polyfit()**

In [None]:
# Reshape the 'homo' array and stack it with an array of ones
homo_and_ones = np.column_stack((homo.reshape((-1, 1)), np.ones_like(homo)))

# Calculate the slope (m) and intercept (b) using least squares
# (np.polyfit(homo, lumo, 1) returns the same two coefficients)
m_and_b = np.linalg.lstsq(homo_and_ones, lumo, rcond=None)[0]
m = m_and_b[0]
b = m_and_b[1]
print(f'Slope (m) = {m}; Intercept (b) = {b}')

> The slope $(m)$ and intercept $(b)$ define the regression line $y = mx + b$, which can then be drawn over the scatter plot.
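
Since this solution is titled after `np.polyfit()`, the following sketch shows that a degree-1 `np.polyfit` fit and the `np.linalg.lstsq` approach give the same slope and intercept. It uses synthetic data scattered around $y = 2x + 1$ (an assumption for illustration only), and the `_demo`, `_fit`, and `_lsq` names are hypothetical so they do not overwrite `m` and `b`.

```python
import numpy as np

# Synthetic data (illustrative only): points scattered around y = 2x + 1
rng = np.random.default_rng(0)
x_demo = np.linspace(-1.0, 1.0, 50)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.1, size=x_demo.size)

# Degree-1 polynomial fit; coefficients come back highest power first
m_fit, b_fit = np.polyfit(x_demo, y_demo, 1)

# Same fit through the design matrix [x, 1] and least squares
design = np.column_stack((x_demo, np.ones_like(x_demo)))
m_lsq, b_lsq = np.linalg.lstsq(design, y_demo, rcond=None)[0]

print(f"polyfit: m = {m_fit:.4f}, b = {b_fit:.4f}")
print(f"lstsq:   m = {m_lsq:.4f}, b = {b_lsq:.4f}")
```

Both routes solve the same least-squares problem, so the two printed lines should agree to within floating-point precision.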

In [None]:
%matplotlib inline
figure(figsize=(8, 6), dpi=80)
plt.scatter(homo,lumo)

x = np.linspace(np.min(homo),np.max(homo),100)
f = lambda x: m *x + b

y = f(x)
plt.plot(x,y, c='k')

plt.title("Regression Model - HOMO versus LUMO")
plt.xlabel('HOMO - V(eV)')
plt.ylabel('LUMO - V(eV)')

### **Solution: Experimenting with Colors and Symbols**

In [None]:
plt.title("Scatter Plot - HOMO versus LUMO")
plt.xlabel('HOMO - V(eV)')
plt.ylabel('LUMO - V(eV)')
plt.scatter(homo, lumo, marker='o', color='red')

> Learn more about **`Matplotlib Styling`** syntax and use cases at **Python Charts'** **[lesson](https://python-charts.com/matplotlib/title/)**.

### **Solution: Customizing Line Styles**

In [None]:
%matplotlib inline
from matplotlib.pyplot import figure
import matplotlib.pyplot as plt
import numpy as np

figure(figsize=(8, 6), dpi=80)
plt.scatter(homo, lumo)

x = np.linspace(np.min(homo), np.max(homo), 100)
f = lambda x: m * x + b
y = f(x)

# Here, the linestyle is set to '--', making it a dashed line.
plt.plot(x, y, c='k', linestyle='--')

plt.title("Scatter Plot - HOMO versus LUMO", fontsize = 16)
plt.xlabel('HOMO - V(eV)', fontsize=12)
plt.ylabel('LUMO - V(eV)', fontsize=12)


### **Solution: Creating a Histogram with *HOMO* Data**

In [None]:
# Create a histogram with 20 bins
plt.hist(homo, bins=20, color='blue')
plt.title("Histogram of HOMO Data")
plt.xlabel('V (eV)')
plt.ylabel('N (count)')
plt.show()


### **Solution: Exploring Covariance**

In [None]:
# Calculate the covariance using numpy
cov_matrix = np.cov(homo, lumo)
covariance = cov_matrix[0,1]

# Create a scatter plot for visualization
plt.scatter(homo, lumo)
plt.title(f"Covariance: {covariance}", color = 'blue', fontsize = 12)
plt.suptitle("Scatter Plot - HOMO versus LUMO")
plt.xlabel('HOMO (eV)')
plt.ylabel('LUMO (eV)')
plt.show()

In [None]:
std_homo = np.std(homo, ddof=1)  # ddof=1 divides by N - 1, matching np.cov's default
std_lumo = np.std(lumo, ddof=1)

# Calculate correlation (r) using the formula
correlation = covariance / (std_homo * std_lumo)
print("Correlation: ", correlation)

> Learn more about **`ddof`** syntax, type, and use cases at **NumPy's** **[documentation](https://numpy.org/doc/stable/reference/generated/numpy.std.html)**.
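
To make the effect of `ddof` concrete, here is a minimal sketch on two short synthetic arrays (illustrative values, not from the dataset). It compares `ddof=0` with `ddof=1` and checks the manual correlation formula against `np.corrcoef`.

```python
import numpy as np

# Synthetic, illustrative data (not taken from the dataset)
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b_vals = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# ddof=0 divides by N (population); ddof=1 divides by N - 1 (sample)
print("std, ddof=0:", np.std(a, ddof=0))
print("std, ddof=1:", np.std(a, ddof=1))

# np.cov divides by N - 1 by default, so pair it with ddof=1
cov_ab = np.cov(a, b_vals)[0, 1]
r_manual = cov_ab / (np.std(a, ddof=1) * np.std(b_vals, ddof=1))

# np.corrcoef computes the same Pearson coefficient directly
r_numpy = np.corrcoef(a, b_vals)[0, 1]
print("r (manual):", r_manual)
print("r (corrcoef):", r_numpy)
```

Mixing the default `np.cov` with `ddof=0` standard deviations would inflate the result by a factor of $N/(N-1)$, which is why the normalizations need to match.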

In [None]:
import seaborn as sns

# Compute the covariance matrix and visualize it
covariance_matrix = np.cov(homo, lumo)
sns.heatmap(covariance_matrix, annot=True)

### **Solution: Extra Styles**

In [None]:
# These column names can be changed to plot other columns
col1 = 'lumo'
col2 = 'mu'

# Extract data for the columns
x = data[col1].values
y = data[col2].values

# Create the figure and axes
fig, ax = plt.subplots(figsize=(8, 8))

# Place the histograms inside the main plot using inset_axes
ax_histx = ax.inset_axes([0, 1.05, 1, 0.2])
ax_histy = ax.inset_axes([1.05, 0, 0.2, 1])

# Scatter plot
ax.scatter(x, y)

# Histogram on the attached axes
ax_histx.hist(x, bins=50)
ax_histy.hist(y, bins=50, orientation='horizontal')

# Turn off tick labels on histograms (optional)
ax_histx.tick_params(axis="x", labelbottom=False)
ax_histy.tick_params(axis="y", labelleft=False)

# Labels and titles
ax.set_xlabel(col1, fontsize = 12)
ax.set_ylabel(col2, fontsize = 12)
ax.set_title(f"Scatter plot with Histograms: {col1} vs {col2}", fontsize = 18)

plt.show()

### **Solution: Extra Styles Interactive**

In [None]:
# List available numerical columns
numerical_cols = data.select_dtypes(include=[float]).columns.tolist()

# Ask user for the columns to plot
print("Available numerical columns: ", numerical_cols)
col1 = input("Select the first column: ")
col2 = input("Select the second column: ")

if col1 in numerical_cols and col2 in numerical_cols:
    # Extract data for the chosen columns
    x = data[col1].values
    y = data[col2].values

    # Create the figure and axes
    fig, ax = plt.subplots(figsize=(8, 8))
    ax_histx = ax.inset_axes([0, 1.05, 1, 0.2])
    ax_histy = ax.inset_axes([1.05, 0, 0.2, 1])

    # Scatter plot
    ax.scatter(x, y)

    # Histogram on the attached axes
    ax_histx.hist(x, bins=50)
    ax_histy.hist(y, bins=50, orientation='horizontal')

    # Turn off tick labels on histograms
    ax_histx.tick_params(axis="x", labelbottom=False)
    ax_histy.tick_params(axis="y", labelleft=False)

    # Labels and titles
    ax.set_xlabel(col1, fontsize = 12)
    ax.set_ylabel(col2, fontsize = 12)
    ax.set_title(f"Scatter plot with Histograms: {col1} vs {col2}", fontsize = 18)

    plt.show()
else:
    print("One or both of the columns you selected are not in the list of numerical columns. Please try again.")
