# AI applied to battery manufacturing

----------------------------------------
This notebook provides the resources needed to apply _supervised_ Machine Learning regression approaches during the training session. You are free to experiment with the different functions, based on the examples found in the **training-example.ipynb** notebook. The aim is to familiarize yourself with the Python functions, the training and testing process, and the Bayesian Optimization framework for hyperparameter tuning.

----------------------------------------

The file to load, which contains the new manufacturing data, is _forcefields-dataset.csv_:<br>
- the input variables are the Force Field parameters used to parametrize a CGMD model: $(X_1, X_2, ..., X_8)$.
- the output variables are related to the electrode properties, namely the viscosity and the density: $(Y_1, Y_2, Y_3)$.

Another dataset is available: _hybrid-dataset.csv_. This second dataset contains calculated electrode mesostructure properties associated with manufacturing parameters.<br>
If you wish to use your own dataset, feel free to discuss it with the moderators to receive feedback and help.

## Python libraries

All the libraries below are required to use the necessary functions and to run your trials during the training session. If the libraries do not load properly, please refer to the GitHub repository for the Python installation instructions.

In [20]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import LinearRegression
from skopt.space import Real, Integer
from IPython.display import Image
import matplotlib.pyplot as plt
import ipywidgets as widgets
from skopt import Optimizer
import pandas as pd
import numpy as np
import warnings
import random
import time

warnings.filterwarnings("ignore")

We load the dataset into the variable $X$. This variable contains both the new input variables and the output variables.

In [47]:
X = pd.read_csv("../resources/forcefields-dataset.csv", sep=",")
X.head()

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2,Y3
0,0.10294,1.0,0.011552,6.2,0.005,3.5,10.9426,0.001962,3.14482,21.3215,4.0303
1,0.001,0.94,0.0001,6.2,0.005,3.5,11.7288,0.002848,3.28428,13.8008,1.94162
2,0.001,0.82,0.0001,6.2,0.005,3.5,12.424,0.005762,3.28808,12.0263,1.30092
3,0.001,0.7,0.0001,6.2,0.005,3.5,10.2735,0.001374,3.28866,11.152,1.39649
4,0.001,0.82,0.0001,6.2,0.005,3.5,15.0,0.006223,3.28805,10.0798,1.14284


Inside the dataset, there are more variables related to the parametrization of manufacturing models:
- 8 input variables.
- 3 output variables.

In [48]:
X.columns.tolist()[-3:]

['Y1', 'Y2', 'Y3']
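
As a small illustration of this layout, the sketch below rebuilds the five rows displayed above by hand and separates the 8 inputs from the 3 outputs by position. The names `demo`, `inputs` and `outputs` are illustrative and not part of the notebook. It also assumes that the leading `Unnamed: 0` entry in the display is the row index rather than a data column; if the real file stores it as a column, the slicing would need adjusting.

```python
import pandas as pd

# The five rows shown by X.head() above, copied by hand for illustration.
columns = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "Y1", "Y2", "Y3"]
rows = [
    [0.10294, 1.0, 0.011552, 6.2, 0.005, 3.5, 10.9426, 0.001962, 3.14482, 21.3215, 4.0303],
    [0.001, 0.94, 0.0001, 6.2, 0.005, 3.5, 11.7288, 0.002848, 3.28428, 13.8008, 1.94162],
    [0.001, 0.82, 0.0001, 6.2, 0.005, 3.5, 12.424, 0.005762, 3.28808, 12.0263, 1.30092],
    [0.001, 0.7, 0.0001, 6.2, 0.005, 3.5, 10.2735, 0.001374, 3.28866, 11.152, 1.39649],
    [0.001, 0.82, 0.0001, 6.2, 0.005, 3.5, 15.0, 0.006223, 3.28805, 10.0798, 1.14284],
]
demo = pd.DataFrame(rows, columns=columns)

# Positional split: everything except the last three columns is an input.
inputs = demo.iloc[:, :-3]
outputs = demo.iloc[:, -3:]
print(inputs.shape, outputs.shape)
```

With the real file, the same two `iloc` slices would be applied to `X` instead of `demo`.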

In [1]:
# Plot the distribution of every column except the last three (the outputs Y1, Y2, Y3)
for col in X.columns[:-3]:
    fig = plt.figure(figsize=(8, 5))
    ax = plt.gca()
    ax.set_facecolor('whitesmoke')
    plt.hist(X[col], color="mediumblue", density=False, bins=70, alpha=0.7)
    plt.ylabel("Counts", fontsize=20)
    plt.xlabel(col, fontweight="bold", fontsize=13)
    plt.yticks(fontweight="bold", fontsize=13)
    plt.xticks(fontsize=13, fontweight="bold")
    plt.show()

## Example n°1: train the regression model

You are welcome to train your own regression model with the new dataset. There are different possibilities:
- apply the GBR model to train 1 output, or 3 outputs (find the right way to use it).
- apply another regression model based on the suggestions.
<br><br>

For each fitting:
- display the sizes of the training/testing datasets for inputs/outputs.
- display the validation metrics you used.
- write code to draw a figure showing the predictive capabilities of the model.
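
The first two items of this checklist can be sketched as follows for a single output. This is only a minimal illustration, not the expected solution: the data are a synthetic stand-in generated with `make_regression` (an assumption, so that the block runs without the CSV file), and the 80/20 split and default GBR settings are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 8 inputs, 3 outputs.
features, targets = make_regression(
    n_samples=300, n_features=8, n_informative=5, n_targets=3,
    noise=5.0, random_state=0,
)
input_names = [f"X{i}" for i in range(1, 9)]
data = pd.DataFrame(features, columns=input_names)
for j in range(3):
    data[f"Y{j + 1}"] = targets[:, j]

# Fit a single output (Y1) as a function of the 8 inputs.
XTrain, XTest, YTrain, YTest = train_test_split(
    data[input_names], data["Y1"], test_size=0.2, random_state=0
)
print("train:", XTrain.shape, YTrain.shape, "| test:", XTest.shape, YTest.shape)

gbr = GradientBoostingRegressor(random_state=0)
gbr.fit(XTrain, YTrain)
YPred = gbr.predict(XTest)

# Validation metrics on the held-out test set.
r2 = r2_score(YTest, YPred)
rmse = np.sqrt(mean_squared_error(YTest, YPred))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

For the figure, a common choice is a parity plot of `YPred` against `YTest` with the diagonal drawn as a reference.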

##### use the Gradient Boosting Regressor with 1(3) output(s)

try to fit only 1 output.

In [32]:
### your code

###

In [None]:
fig=plt.figure(figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('whitesmoke')

### your code for the plot

###

plot the evolution of the validation metrics as a function of the training size.
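
One possible way to collect these values, sketched on synthetic data (an assumption, as are the chosen training sizes), is to keep the test set fixed and refit the model on growing slices of the training set:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 8 inputs, 1 output.
features, target = make_regression(
    n_samples=400, n_features=8, n_informative=5, noise=5.0, random_state=0
)
XTrain, XTest, YTrain, YTest = train_test_split(
    features, target, test_size=0.25, random_state=0
)

# Refit on the first n training rows; always score on the same test set.
train_sizes = [30, 75, 150, 300]
test_scores = []
for n in train_sizes:
    model = GradientBoostingRegressor(random_state=0)
    model.fit(XTrain[:n], YTrain[:n])
    test_scores.append(r2_score(YTest, model.predict(XTest)))

for n, score in zip(train_sizes, test_scores):
    print(f"training size = {n:3d}  ->  test R2 = {score:.3f}")
```

The two lists can then be passed to `plt.plot(train_sizes, test_scores)` in the cell below.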

In [None]:
### your code for the plot

###

here is the function to train on more than one output: _MultiOutputRegressor()_

In [None]:
parameters = {'estimator__learning_rate': [0.01,0.02,0.03,0.04],
              'estimator__subsample'    : [0.9, 0.5, 0.2, 0.1],
              'estimator__n_estimators' : [100,500,1000, 1500],
              'estimator__max_depth'    : [4,6,8,10]
             }
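
The `estimator__` prefix in these keys routes each hyperparameter to the regressor wrapped inside `MultiOutputRegressor`. The sketch below shows how such a grid might be combined with `GridSearchCV`. It is a hedged illustration only: the data are synthetic, and the grid is deliberately reduced (the full grid above has 4 x 4 x 4 x 4 = 256 combinations, each refitted once per cross-validation fold).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in for the real dataset (assumption): 8 inputs, 3 outputs.
features, targets = make_regression(
    n_samples=200, n_features=8, n_informative=5, n_targets=3,
    noise=5.0, random_state=0,
)
XTrain, XTest, YTrain, YTest = train_test_split(
    features, targets, test_size=0.2, random_state=0
)

# Reduced grid (assumption) so that the sketch runs in a few seconds.
small_grid = {
    "estimator__learning_rate": [0.02, 0.04],
    "estimator__n_estimators": [100],
    "estimator__max_depth": [4],
}

# One independent GBR is fitted per output column.
multi_gbr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
search = GridSearchCV(multi_gbr, small_grid, cv=3, scoring="r2")
search.fit(XTrain, YTrain)

print(search.best_params_)
print(search.best_estimator_.predict(XTest).shape)
```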

In [33]:
### your code

###

In [None]:
fig=plt.figure(figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('whitesmoke')

### your code for the plot

###

##### use another regression model

Select another type of regression model to fit the outputs as a function of the input variables. A few possibilities: AdaBoostRegressor / KNeighborsRegressor / ...
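
As one hedged example, `KNeighborsRegressor` accepts several outputs directly, so no wrapper is needed. The sketch below uses synthetic data, and both the scaling step and `n_neighbors=5` are assumptions chosen for illustration rather than tuned values.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): 8 inputs, 3 outputs.
features, targets = make_regression(
    n_samples=300, n_features=8, n_informative=5, n_targets=3,
    noise=5.0, random_state=0,
)
XTrain, XTest, YTrain, YTest = train_test_split(
    features, targets, test_size=0.2, random_state=0
)

# Distances are scale-sensitive, so the inputs are standardized first.
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
knn.fit(XTrain, YTrain)
YPred = knn.predict(XTest)

# One R2 value per output column.
per_output_r2 = r2_score(YTest, YPred, multioutput="raw_values")
print(per_output_r2)
```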

In [35]:
### your code

###

In [None]:
fig=plt.figure(figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('whitesmoke')

### your code for the plot

###

## Example n°2: train the regression model with hyperparameters tuning with Bayesian Optimization

You are welcome to train your own regression model with the new dataset, using the Bayesian Optimization framework in Python. Instead of the single output fitted in the previous notebook, please apply the methodology to the 3 available outputs.
<br><br>
For the final best model:
- display the sizes of the training/testing datasets for inputs/outputs.
- display the validation metrics you used.
- write code to draw a figure showing the predictive capabilities of the model.

In [None]:
### your code
# variable "best_estimator" must contain the optimal trained GBR 

###

## Example n°3: train the regression model based on the previous factorial analysis

You are free to modify the number of input variables to apply a new training/testing process, and to choose another regression model. The idea is to show that it is possible to obtain high predictive capabilities with fewer input variables.
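
The comparison can be sketched as follows. The data are synthetic and built so that only the first three inputs carry information, and the list `selected` is a hypothetical choice standing in for the outcome of your own factorial analysis.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): with shuffle=False, only X1..X3 are informative.
features, target = make_regression(
    n_samples=300, n_features=8, n_informative=3, shuffle=False,
    noise=5.0, random_state=0,
)
inputs = pd.DataFrame(features, columns=[f"X{i}" for i in range(1, 9)])
XTrain, XTest, YTrain, YTest = train_test_split(
    inputs, target, test_size=0.2, random_state=0
)

# Hypothetical selection; replace it with the inputs your analysis retained.
selected = ["X1", "X2", "X3"]

full_model = GradientBoostingRegressor(random_state=0).fit(XTrain, YTrain)
reduced_model = GradientBoostingRegressor(random_state=0).fit(XTrain[selected], YTrain)

r2_full = r2_score(YTest, full_model.predict(XTest))
r2_reduced = r2_score(YTest, reduced_model.predict(XTest[selected]))
print(f"all 8 inputs: R2 = {r2_full:.3f} | {len(selected)} inputs: R2 = {r2_reduced:.3f}")
```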

In [46]:
### your code

###

## Example n°4: train the regression model based on features importance

This example compares the selection based on your previous factorial analysis with the feature selection based on feature importance. The latter is a tool to check the importance of an input variable in the building of trees within the GBR model. <br>
The functions are provided below, and the comparisons will be made after discussion among students.
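
Note that a fitted `GradientBoostingRegressor` exposes `feature_importances_`, whereas a `MultiOutputRegressor` does not. If the selected model wraps several GBRs, one hedged option is to read the importances from each fitted sub-model in `estimators_` and average them. The sketch below uses synthetic data, and averaging is only one of several reasonable ways to aggregate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in (assumption): with shuffle=False, only the first 3 inputs matter.
features, targets = make_regression(
    n_samples=200, n_features=8, n_informative=3, n_targets=3,
    shuffle=False, noise=5.0, random_state=0,
)
multi = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
multi.fit(features, targets)

# One row per output, one column per input variable.
per_output = np.array([est.feature_importances_ for est in multi.estimators_])
mean_importance = per_output.mean(axis=0)

# Input indices sorted from most to least important.
ranking = np.argsort(mean_importance)[::-1]
print(np.round(mean_importance, 3))
print(ranking)
```

The array `mean_importance` has one value per input and can be passed to a bar plot like the one below.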

In [None]:
### your code
# variable "yourmodel" contains the fitted regression model you have selected above.

###

In [51]:
plt.figure(figsize=(12, 7))
ax = plt.gca()
ax.set_facecolor('whitesmoke')
# "yourmodel" is the fitted regression model from the cell above
plt.bar(range(8), yourmodel.feature_importances_,
        0.5, label='FI', color='red')
plt.xticks(range(8), labels=XTrain.columns, fontsize=11, fontweight="bold")
plt.yticks(fontsize=11, fontweight="bold")
plt.xlabel("Input variables", fontsize=15, fontweight="bold", labelpad=15)
_=plt.ylabel("Features importance", fontsize=15, fontweight="bold", labelpad=15)