## Notes:
To open in Google Colab:\
https://colab.research.google.com/github/RodrigoAVargasHdz/CHEM-4PB3/blob/main/Course_Notes/Week2/intro_matplotlib.ipynb

# Introduction to Matplotlib

Matplotlib is a 2D graphics package for Python used for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems.

(**paper**) [Computing in Science & Engineering 9, 90 (2007)](https://doi.org/10.1109/MCSE.2007.55)

[**Matplotlib website**](https://matplotlib.org/stable/gallery/index)
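As a first taste of the "publication-quality image generation" mentioned above, here is a minimal sketch that draws a line plot and saves it as a high-resolution PNG. It assumes toy sine data (not QM9) and a hypothetical output file name in the system temp folder; the `matplotlib.use("Agg")` line only selects a non-interactive backend so the snippet also runs as a plain script, and can be dropped inside a notebook that uses `%matplotlib inline`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Toy data (illustrative only, not from QM9)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x", fontsize=15)
ax.set_ylabel("y", fontsize=15)
ax.legend()

# Save a high-resolution copy; the file name is a hypothetical example
out_path = os.path.join(tempfile.gettempdir(), "example_plot.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
print("saved to", out_path)
```

`dpi` controls the resolution of raster formats; saving with a `.pdf` or `.svg` extension instead gives vector output.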

# Load data 
[**QM9 data paper**](https://iopscience.iop.org/article/10.1088/1367-2630/15/9/095003)

## mini intro to Pandas

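[Pandas](https://pandas.pydata.org/docs/index.html) is an easy-to-use library of data structures and data-analysis tools for Python; its central object, the `DataFrame`, is a table with labeled columns.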
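A minimal sketch of the `DataFrame` basics used below, built from a small hypothetical table (the numbers are made up and are NOT taken from QM9): inspecting rows and column names, pulling out a single column, and doing element-wise arithmetic between columns.

```python
import pandas as pd

# Hypothetical toy table (made-up numbers, NOT QM9 values)
df = pd.DataFrame(
    {
        "mol_id": ["a", "b", "c"],
        "homo": [-0.25, -0.30, -0.22],
        "lumo": [0.05, 0.01, 0.08],
    }
)

print(df.head())            # first rows (up to 5 by default)
print(df.columns.tolist())  # column names
print(df["homo"].mean())    # a single column is a pandas Series

gap = df["lumo"] - df["homo"]  # arithmetic between columns is element-wise
print(gap.to_numpy())          # convert a Series to a NumPy array
```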

In [None]:
import numpy as np
import pandas as pd
# D = pd.read_csv('../data/qm9.csv')

# read the QM9 data set (CSV file) into a pandas DataFrame
data_url = "https://github.com/RodrigoAVargasHdz/CHEM-4PB3/raw/main/Course_Notes/data/qm9.csv"
data = pd.read_csv(data_url)

# print the available properties (column names)
print('Properties:')
print('------------')
for i,c in enumerate(data.columns):
    print(i,': ',c)

In [None]:
print(data.head())  # first 5 rows of the table

In [None]:
# select a single column
homo = np.array(data['homo'])  # extract HOMO
lumo = data.lumo.to_numpy()  # extract LUMO
print('HOMO ->', homo.shape)
print('LUMO ->', lumo.shape)

# select multiple columns
homo_and_lumo = np.array(data[['homo','lumo']])
print('HOMO & LUMO ->', homo_and_lumo.shape)

# Plotting

In [None]:
# load relevant libraries
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

## Scatter plot

In [None]:
%matplotlib inline
figure(figsize=(8, 6), dpi=80)

plt.scatter(homo,lumo)
plt.xlabel('HOMO',fontsize=15)
plt.ylabel('LUMO',fontsize=15)
# plt.show()

In [None]:
# select the molecule with the largest LUMO
print('LUMO = ', data['lumo'].max()) # only the value
i0 = data['lumo'].idxmax() # index label of the data point with max LUMO
print('i0 = ', i0)
print(data.loc[[i0]])  # all info of the data point with max LUMO
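A detail worth knowing here: `idxmax()` returns an index *label*, while `.iloc` selects by integer *position*. With the default `RangeIndex` that `pd.read_csv` creates, the two coincide, but they differ as soon as the index is anything else. The sketch below uses a toy `Series` with a made-up non-default index to show the difference.

```python
import pandas as pd

# Toy Series with a non-default index (made-up values)
s = pd.Series([0.1, 0.9, 0.4], index=[10, 20, 30])

label = s.idxmax()                     # index LABEL of the maximum
position = int(s.to_numpy().argmax())  # integer POSITION of the maximum

print(label, position)
print(s.loc[[label]])      # select by label
print(s.iloc[[position]])  # select by position: the same row
```

Pairing `idxmax()` with `.loc` (and `argmax()` positions with `.iloc`) keeps the selection correct regardless of how the index was built.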

## Correlation plot ```y vs x```
What are the components for a correlation plot?

(**your first ML model!!**)\
Quick recap of the linear model,\
$y = m x + b$\
$y = [m,b]\,[x,1]^T$
* m -> slope
* b -> y-intercept
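For $N$ data points $(x_i, y_i)$ (here $x$ = HOMO and $y$ = LUMO), stacking one copy of $y_i = [m,b]\,[x_i,1]^T$ per point gives a matrix equation. Each row of the design matrix $X$ is $[x_i, 1]$, which is exactly the array built with `np.column_stack` in the next cell. The least-squares parameters minimize the squared residual, with the closed form valid when $X^T X$ is invertible (i.e. not all $x_i$ are equal):

$$
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix},\qquad
X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix},\qquad
\mathbf{w} = \begin{bmatrix} m \\ b \end{bmatrix},
$$

$$
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \lVert X\mathbf{w} - \mathbf{y} \rVert_2^2 = \left(X^T X\right)^{-1} X^T \mathbf{y}.
$$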


In [None]:
# -1 infers the size of the new dimension from the size of the input array.
homo_and_ones = np.column_stack((homo.reshape((-1, 1)),np.ones_like(homo))) #DISCUSS THIS!!
m_and_c = np.linalg.lstsq(homo_and_ones, lumo,rcond=None)[0]
m = m_and_c[0]
b = m_and_c[1]
print('m = ',m,'; b = ',b)
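To see what the `reshape`/`column_stack` step produces and to check that `np.linalg.lstsq` really solves the least-squares problem, here is a sketch on synthetic, noise-free data with a known slope and intercept (the values `0.5` and `0.2` are arbitrary choices, not QM9 results). It compares `lstsq` against the normal equations $(X^T X)\mathbf{w} = X^T\mathbf{y}$, solved with `np.linalg.solve` rather than an explicit inverse.

```python
import numpy as np

# Synthetic, noise-free data with known parameters (illustrative only)
m_true, b_true = 0.5, 0.2
x = np.linspace(-0.4, -0.1, 50)
y = m_true * x + b_true

# x.reshape((-1, 1)) turns shape (50,) into (50, 1); -1 lets NumPy infer the 50
X = np.column_stack((x.reshape((-1, 1)), np.ones_like(x)))
print(X.shape)  # (50, 2): one row [x_i, 1] per data point

# Least squares via lstsq ...
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
# ... and via the normal equations (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(w_lstsq, w_normal)  # both recover [m_true, b_true]
```

`np.column_stack` also turns 1-D inputs into columns by itself, so the explicit `reshape` is not strictly required; it just makes the intended shape visible. For ill-conditioned problems `lstsq` is the safer choice, since forming $X^T X$ squares the condition number.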

## Exercise
Using the function [```np.polyfit()```](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html), find the values of the linear model parameters.



In [None]:
# code here
print(np.polyfit(homo,lumo,deg=1))
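One point to watch with `np.polyfit`: it returns the coefficients with the highest power first, so for `deg=1` the output is `[m, b]`, matching the `lstsq` result above. The sketch below checks this on synthetic data with known parameters (arbitrary values, not QM9) and uses `np.polyval` to evaluate the fitted line at new points.

```python
import numpy as np

# Synthetic data with known parameters (illustrative only)
x = np.linspace(-0.4, -0.1, 50)
y = 0.5 * x + 0.2

coeffs = np.polyfit(x, y, deg=1)  # highest power first: [m, b]
m_fit, b_fit = coeffs
print(m_fit, b_fit)

# np.polyval evaluates the fitted polynomial at new points
x_new = np.array([-0.3, -0.2])
print(np.polyval(coeffs, x_new))  # same as m_fit * x_new + b_fit
```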

In [None]:
%matplotlib inline
figure(figsize=(8, 6), dpi=80)
plt.scatter(homo,lumo)

x = np.linspace(np.min(homo),np.max(homo),100)
f = lambda x: m *x + b
# def f(x,m,b):
#     return m*x + b
y = f(x)
plt.plot(x,y, c='k')

plt.xlabel('HOMO',fontsize=15)
plt.ylabel('LUMO',fontsize=15)

## Exercise  
* Change the color and the symbols of the scatter plot
* Change the line-style of the regression model
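A possible starting point for the exercise, sketched on synthetic data (random "HOMO-like" and "LUMO-like" values, not the real QM9 columns). The styling keywords shown (`marker`, `c`, `s`, `alpha`, `edgecolors` for `scatter`; `color`, `linestyle`, `lw` for `plot`) are standard Matplotlib arguments; the specific values are arbitrary choices. The `Agg` backend line is only for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-0.4, -0.1, size=200)               # synthetic "HOMO-like" values
y = 0.5 * x + 0.2 + rng.normal(0.0, 0.01, size=200)  # synthetic "LUMO-like" values
m, b = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots(figsize=(8, 6), dpi=80)
# marker: symbol shape; c: color; s: marker size; alpha: transparency
ax.scatter(x, y, marker="^", c="tab:orange", s=20, alpha=0.6, edgecolors="k")

xs = np.linspace(x.min(), x.max(), 100)
# linestyle: "-", "--", "-.", ":"; lw: line width
ax.plot(xs, m * xs + b, color="tab:blue", linestyle="--", lw=2)

ax.set_xlabel("HOMO", fontsize=15)
ax.set_ylabel("LUMO", fontsize=15)
plt.close(fig)
```

Swapping in `homo` and `lumo` from the cells above for `x` and `y` applies the same styling to the real data.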

## Histogram

In [None]:
%matplotlib inline
from matplotlib.pyplot import figure
figure(figsize=(8, 6), dpi=80)

plt.hist(homo,bins=20,density=True,)
plt.xlabel('HOMO',fontsize=15)
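What `density=True` does: each bar height becomes `count / (N * bin_width)`, so the total *area* of the bars is 1 (the heights themselves generally do not sum to 1). This is what makes the histogram comparable with a probability density such as the Gaussian below. The sketch checks the normalization with `np.histogram` on synthetic normally distributed samples (not QM9 data).

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)  # synthetic data

counts, edges = np.histogram(samples, bins=20, density=True)
widths = np.diff(edges)

# density=True: height = count / (N * bin_width), so the bar AREAS sum to 1
area = np.sum(counts * widths)
print(area)  # 1.0 up to floating-point rounding
```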

## Gaussian
${\cal N}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mean and $\sigma$ the standard deviation.

<!-- ```python
def gaussian(x,mu,std):
    return (1./(std*np.sqrt(2.*np.pi)))*np.exp(-(x-mu)**2/(2.*std**2))
``` -->

In [None]:
#code here! use NumPy's functions
def gaussian(x,mu,std):
    return (1./(std*np.sqrt(2.*np.pi)))*np.exp(-(x-mu)**2/(2.*std**2))
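A quick sanity check of the Gaussian formula: a probability density should integrate to 1, and its peak at $x = \mu$ equals $1/(\sigma\sqrt{2\pi})$. The sketch redefines the same `gaussian` function so it runs on its own, uses illustrative parameters $\mu = 0$, $\sigma = 1$, and approximates the integral with a simple Riemann sum over $\mu \pm 10\sigma$.

```python
import numpy as np

def gaussian(x, mu, std):
    # Normal density N(x; mu, std), same formula as the cell above
    return (1.0 / (std * np.sqrt(2.0 * np.pi))) * np.exp(-(x - mu) ** 2 / (2.0 * std**2))

mu, std = 0.0, 1.0  # illustrative parameters
x = np.linspace(mu - 10 * std, mu + 10 * std, 20_001)
dx = x[1] - x[0]
y = gaussian(x, mu, std)

area = np.sum(y) * dx  # Riemann-sum approximation of the integral
print(area)            # close to 1.0

peak = gaussian(mu, mu, std)
print(peak, 1.0 / (std * np.sqrt(2.0 * np.pi)))  # peak value at x = mu
```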


In [None]:
%matplotlib inline
figure(figsize=(8, 6), dpi=80)

x = np.linspace(np.min(homo),np.max(homo),homo.shape[0])
y = gaussian(x,np.mean(homo),np.std(homo))
plt.hist(homo,bins=100,density=True)
plt.plot(x,y,label=R'${\cal N}$')
plt.vlines(np.mean(homo),np.min(y),np.max(y),colors='k',label=R'$\mu$')
plt.legend(fontsize=15)
plt.xlabel('HOMO')

# Extra styles (homework!)
For the other columns of the QM9 data set, generate a "scatter plot with histograms" following this [template](https://matplotlib.org/stable/gallery/lines_bars_and_markers/scatter_hist.html#sphx-glr-gallery-lines-bars-and-markers-scatter-hist-py).
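One way to lay out such a figure, sketched with `fig.add_gridspec` in the spirit of the linked template (this is a simplified layout, not a copy of the gallery code). It assumes two synthetic, correlated arrays standing in for two QM9 columns; the bin counts, ratios, and spacing values are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)                           # stand-in for one QM9 column
y = 0.6 * x + rng.normal(scale=0.8, size=1000)      # stand-in for another column

fig = plt.figure(figsize=(6, 6))
# 2x2 grid: main scatter bottom-left, histograms on top and on the right
gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                      left=0.1, right=0.9, bottom=0.1, top=0.9,
                      wspace=0.05, hspace=0.05)
ax = fig.add_subplot(gs[1, 0])
ax_histx = fig.add_subplot(gs[0, 0], sharex=ax)  # shares the x-axis with the scatter
ax_histy = fig.add_subplot(gs[1, 1], sharey=ax)  # shares the y-axis with the scatter

ax.scatter(x, y, s=5)
ax_histx.hist(x, bins=40)
ax_histy.hist(y, bins=40, orientation="horizontal")

# hide tick labels that would duplicate the main panel's labels
ax_histx.tick_params(axis="x", labelbottom=False)
ax_histy.tick_params(axis="y", labelleft=False)
ax.set_xlabel("column A")
ax.set_ylabel("column B")
plt.close(fig)
```

For the real data, `x` and `y` can be replaced by any two columns converted with `.to_numpy()`, e.g. `data['homo'].to_numpy()`, with the axis labels updated accordingly.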


In [None]:
#code here

## Other Matplotlib styles
Matplotlib offers many other plot types; for more, consult the [examples website](https://matplotlib.org/stable/gallery/index).