### DuPont Mixture Design - Inverse Modeling

## Alison Shapiro, Sean Farrington, and Peter Osazuwa

***Machine Learning Techniques Used:*** 

Linear Regression

Gaussian Process Regression

## Script for Dual Annealing Optimization

#### Use predictions from the Gaussian process regression

Start by rebuilding the Gaussian process regression models from the training data.

# Gaussian Process Regression


In [1]:
# Suppress Warnings
import warnings
warnings.filterwarnings("ignore",category=DeprecationWarning)

import os
import sys
if not sys.warnoptions:
    warnings.simplefilter("ignore")
    os.environ["PYTHONWARNINGS"] = "ignore" 

### Import packages for GPR and optimization

In [2]:
import pandas as pd
import numpy as np
from scipy.optimize import dual_annealing
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, RBF, WhiteKernel
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_percentage_error as mape, r2_score
import matplotlib.pyplot as plt

### Import data

In [3]:
file = 'DATA/training_inputs.xlsx'

df = pd.read_excel(file)

design = ['Powdered Additive','Base Resin A','Base Resin B','Stabilizer','Temperature','Screw Speed (RPM)']
performance = ['Toughness (J/m2)','Modulus (GPa)']

X = df[design]
y = df[performance]

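# Standardize each design variable to zero mean and unit variance.
# The same fitted scaler is reused later to scale candidate designs.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
X = pd.DataFrame(data=X,
                columns=design)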
# print(X)
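`StandardScaler` stores each column's mean and population standard deviation when it is fit, and `transform` applies z = (x - mean) / std with those stored values. Because the cost function later calls `scaler.transform` on candidate designs, it has to reuse this fitted scaler rather than refit one. A minimal sketch on made-up numbers (the array below is illustrative, not the DuPont data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative design matrix (NOT the real training data): 4 samples, 2 features
X_demo = np.array([[0.30, 400.0],
                   [0.35, 420.0],
                   [0.25, 390.0],
                   [0.40, 450.0]])

scaler_demo = StandardScaler().fit(X_demo)

# StandardScaler uses the population standard deviation (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
print(np.allclose(scaler_demo.transform(X_demo), manual))  # True

# A new candidate is scaled with the statistics learned from X_demo,
# so it lands in the same coordinates the models were trained in
candidate = np.array([[0.32, 406.0]])
print(scaler_demo.transform(candidate))
```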

### Create models

***Fit a separate model for each output***

In [4]:
# Constant kernel plus an anisotropic RBF kernel (one length scale per design variable)
kernel = C() + RBF(length_scale=np.ones(X.shape[1]))

# Split to two models
reg_0 = GaussianProcessRegressor(kernel=kernel,random_state=1773).fit(X,y[performance[0]])
reg_1 = GaussianProcessRegressor(kernel=kernel,random_state=1773).fit(X,y[performance[1]])

y_pred_0 = reg_0.predict(X)
y_pred_1 = reg_1.predict(X)
y_pred_GPR = pd.DataFrame({performance[0]:y_pred_0,
                       performance[1]:y_pred_1
                       })
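The cell above fits one regressor per output. `GaussianProcessRegressor` can also take a two-column target, but then both outputs share a single set of kernel hyperparameters; separate models let toughness and modulus each learn their own length scales. The sketch below shows this on synthetic data with the same `C() + RBF(...)` kernel form. The data, the dictionary keys, and the small `alpha=1e-6` jitter (added for numerical stability; the notebook uses the default) are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, RBF

rng = np.random.default_rng(0)

# Synthetic, already-standardized inputs (3 features) and two unrelated outputs
X_demo = rng.normal(size=(30, 3))
y_a = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1]  # stand-in for output 0
y_b = np.cos(X_demo[:, 2])                       # stand-in for output 1

# Same kernel form as the notebook: constant + RBF with one length scale per feature
kernel_demo = C() + RBF(length_scale=np.ones(X_demo.shape[1]))

# alpha=1e-6 is an assumed jitter for this sketch, not the notebook's setting
models = {
    'output_a': GaussianProcessRegressor(kernel=kernel_demo, alpha=1e-6,
                                         random_state=1773).fit(X_demo, y_a),
    'output_b': GaussianProcessRegressor(kernel=kernel_demo, alpha=1e-6,
                                         random_state=1773).fit(X_demo, y_b),
}

# Each fitted model keeps its own optimized hyperparameters in kernel_
for name, gpr in models.items():
    print(name, gpr.kernel_)

# Predictive mean and standard deviation at a new point, per output
x_new = np.zeros((1, 3))
for name, gpr in models.items():
    mean, std = gpr.predict(x_new, return_std=True)
    print(name, mean[0], std[0])
```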

# Inverse Design Routine

In this inverse model, we try to recover the design variables of a data point from the testing set, using only its measured performance as the target.

The values for this datum are as follows:

Design variables:

- Powdered Additive: 0.32
- Base Resin A: 0.6
- Base Resin B: 0.05
- Stabilizer: 0.09
- Temperature: 406
- Screw Speed (RPM): 120

Performance:

- Toughness (J/m2): 656
- Modulus (GPa): 5.1
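The cost function in the next cell treats Stabilizer as a slack variable: the optimizer searches only over Powdered Additive, Base Resin A, Base Resin B, Temperature and Screw Speed, and Stabilizer is computed as one minus the first three fractions. For this datum that rule gives 1 - (0.32 + 0.6 + 0.05) = 0.03, whereas the listed Stabilizer value is 0.09 (the four listed fractions sum to 1.06), so under the slack formulation this exact composition cannot be reproduced. A minimal sketch of the mapping; `insert_slack` is a hypothetical helper name, not part of the notebook:

```python
import numpy as np

def insert_slack(x_free):
    """Hypothetical helper: rebuild the 6-variable design from the 5 free variables.

    x_free = [Powdered Additive, Base Resin A, Base Resin B, Temperature, Screw Speed (RPM)]
    returns [Powdered Additive, Base Resin A, Base Resin B, Stabilizer, Temperature, Screw Speed (RPM)]
    """
    x_free = np.asarray(x_free, dtype=float)
    stabilizer = 1.0 - x_free[:3].sum()  # slack: the four fractions must sum to 1
    return np.insert(x_free, 3, stabilizer)

full = insert_slack([0.32, 0.6, 0.05, 406, 120])
print(full)
print(full[:4].sum())
```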

In [5]:
# Test datum: true design (not used by the optimizer) and target performance
X_true = np.array([0.32,0.6,0.05,0.09,406,120])
y_true = np.array([656,5.1])

def CostFuncSlack(x,a,b):
    """
    Input 'x' is an array of unscaled variables. For this function they are:
    
    x = ['Powdered Additive','Base Resin A','Base Resin B','Temperature','Screw Speed']
    
    Notice that 'Stabilizer' is removed from this array, this must be accounted for so 
    scaling is done properly.
    
    Use 'y_0_target' and 'y_1_target' to assign the desired performance
    
    The cost function uses the slack variable approach to reduce
    dimensionality and ensure the composition is real.
    
    The barrier term here is a large penalty associated with the first three components 
    being greater than 1
    
    """
    y_0_target = a # Target toughness (J/m2)
    y_1_target = b # Target Modulus (GPa)
    
    sum_noslack = x[0]+x[1]+x[2]
    
    slack = 1 - (sum_noslack)
    
    x_full = np.insert(x,3,slack) # Insert slack into the fourth position
    
    x_scaled = scaler.transform(x_full.reshape(1,-1))
    
    y_0,std_0 = reg_0.predict(x_scaled,return_std=True)
    y_1,std_1 = reg_1.predict(x_scaled,return_std=True)
    
    var_0 = std_0**2/y_0_target # Normalized variance
    var_1 = std_1**2/y_1_target # Normalized variance
    
    y_0_penalty = ((y_0-y_0_target)/y_0_target)**2
    y_1_penalty = ((y_1-y_1_target)/y_1_target)**2
    
    barrier = 0
    if sum_noslack >= 1:
        barrier = 1e6
            
    loss = y_0_penalty + y_1_penalty + var_0 + var_1 + barrier
    return loss.item() # Return a Python float; dual_annealing expects a scalar cost
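Written out, the loss that `CostFuncSlack` returns for a candidate $x$ is the following, where $t_0$ and $t_1$ are the target toughness and modulus, $\hat{y}_i(x)$ and $\sigma_i(x)$ are the GPR predictive mean and standard deviation for output $i$, $x_0, x_1, x_2$ are the Powdered Additive, Base Resin A and Base Resin B fractions, and $B(x)$ is the barrier:

$$
\mathcal{L}(x) = \left(\frac{\hat{y}_0(x) - t_0}{t_0}\right)^2 + \left(\frac{\hat{y}_1(x) - t_1}{t_1}\right)^2 + \frac{\sigma_0^2(x)}{t_0} + \frac{\sigma_1^2(x)}{t_1} + B(x),
\qquad
B(x) = \begin{cases} 10^6 & \text{if } x_0 + x_1 + x_2 \ge 1 \\ 0 & \text{otherwise} \end{cases}
$$

As written, the first two terms are dimensionless, while each variance term is divided by its target only once and so keeps that target's units (J/m2 for toughness, GPa for modulus); the uncertainty penalty is therefore weighted differently for the two outputs.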

# Ten runs of dual annealing to check that the solution is reproducible

In [6]:
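# Set up bounds for the five free variables, in the order CostFuncSlack expects:
# Powdered Additive, Base Resin A, Base Resin B, Temperature, Screw Speed (RPM)
max_bound = [1,1,1,475,121]
min_bound = [0,0,0,380,79]
bounds = list(zip(min_bound,max_bound))

file = 'ANALYSIS/Iterations_DA.xlsx'
os.makedirs(os.path.dirname(file),exist_ok=True) # Make sure the output folder exists
results = []

for i in range(10):
    print(f'Iteration {i}')
    
    # Perform optimization (no seed is set, so each run follows a different random path)
    ret = dual_annealing(CostFuncSlack,
                         bounds=bounds,
                         args=(y_true[0],y_true[1]),
                         maxiter=5_000)
    
    # Rebuild the full six-variable design by reinserting Stabilizer (the slack)
    pos = ret.x
    stabilizer = 1 - (pos[0]+pos[1]+pos[2])
    pos = np.insert(pos,3,stabilizer)

    # Predict the performance of the optimized design
    pos_scaled = scaler.transform(pos.reshape(1,-1))
    y_pred_0 = reg_0.predict(pos_scaled)[0]
    y_pred_1 = reg_1.predict(pos_scaled)[0]
    
    # Record this run: cost, design variables, predicted performance
    row = {'Cost':np.asarray(ret.fun).item()}
    row.update(zip(design,pos))
    row[performance[0]] = y_pred_0
    row[performance[1]] = y_pred_1
    results.append(row)
    
    # Rewrite the file after every run so finished runs are kept if the loop stops early
    df = pd.DataFrame(results)
    df.to_excel(file,index=False)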
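Because `dual_annealing` is stochastic and the loop above passes no seed, each run can land on a different design; comparing the spread of the saved results indicates whether the inverse problem has one well-defined answer. The sketch below repeats that pattern on a cheap toy objective and passes a fixed `seed` to each run so the whole experiment can be repeated exactly. The toy objective (a made-up quadratic with the same kind of slack barrier, standing in for the GPR-based cost) and its optimum are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.optimize import dual_annealing

# Toy stand-in for CostFuncSlack (made-up, NOT DuPont data): three fractions whose
# slack 1 - sum(x) must stay positive. The optimum is x = [0.3, 0.5, 0.1] (slack 0.1).
TARGET = np.array([0.3, 0.5, 0.1])

def toy_cost(x):
    barrier = 1e6 if x.sum() >= 1 else 0.0
    return float(np.sum((x - TARGET) ** 2)) + barrier

toy_bounds = [(0, 1)] * 3
rows = []
for seed in range(5):
    # A fixed seed per run makes each run, and so the whole comparison, repeatable
    ret = dual_annealing(toy_cost, bounds=toy_bounds, maxiter=200, seed=seed)
    rows.append({'seed': seed, 'cost': ret.fun,
                 'x0': ret.x[0], 'x1': ret.x[1], 'x2': ret.x[2],
                 'slack': 1 - ret.x.sum()})

runs = pd.DataFrame(rows)
print(runs)
# Small standard deviations across runs suggest the runs agree on one solution
print(runs[['x0', 'x1', 'x2', 'slack']].std())
```

With the real cost, the same summary (for example `df.std()` on the saved design columns) would show how much the ten optimized designs disagree.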