The examples and data used in this notebook are taken from:
https://github.com/materialsvirtuallab/nano281


# Machine Learning Roadmap

The process of training a machine learning model involves several key steps:

1. **Data collection and preparation**: This involves gathering and organizing the data that will be used to train the model. This includes tasks such as cleaning the data, removing irrelevant features, handling missing values, and converting the data into a suitable format for the machine learning algorithm.

2. **Feature engineering**: This involves selecting and transforming the input variables (features) that will be used to predict the output variable. This can include tasks such as scaling or normalizing the data, creating new features, and selecting the most relevant features based on domain knowledge or statistical analysis.

3. **Model selection**: This involves selecting the appropriate machine learning algorithm for the specific problem and data at hand. This can involve experimenting with different algorithms and hyperparameters, or using pre-trained models for certain types of problems.

4. **Model training**: This involves using the selected algorithm and data to train the model by adjusting the model's parameters to minimize the error between the predicted output and the actual output. The training process involves iterating over the data multiple times, adjusting the model's parameters with each iteration to improve its accuracy.

5. **Model evaluation**: This involves assessing the performance of the trained model on a separate test dataset, to determine its accuracy and generalization ability. This can involve metrics such as accuracy, precision, recall, F1-score, or mean squared error, depending on the specific problem and type of model.

6. **Model deployment**: Once the model has been trained and evaluated, it can be deployed into a production environment for use in real-world applications. This can involve integrating the model into a larger system, building a user interface or API for interacting with the model, and monitoring its performance over time.

Overall, the process of training a machine learning model is iterative and often requires careful experimentation and testing to achieve optimal results. It involves a combination of domain knowledge, data analysis, and statistical techniques to build accurate and effective models for a wide range of applications.

It is widely held in AI/ML practice that step 1 is the most important and time-consuming step (often quoted as roughly 80% of the total time spent on an AI/ML project). Without a clean and reliable dataset, no amount of machine learning will produce accurate predictions. 
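
The roadmap above can be sketched end to end with scikit-learn. This is only a minimal illustration: the data are synthetic (generated with `make_regression`, not the notebook's dataset), and the choice of `Ridge` and an 80/20 split are placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: data collection (synthetic stand-in for a real, cleaned dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out a test set before any fitting so the evaluation stays honest
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 2-3: feature scaling and model selection, bundled in one pipeline
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Step 4: training
model.fit(X_train, y_train)

# Step 5: evaluation on data the model has never seen
y_pred = model.predict(X_test)
test_mse = mean_squared_error(y_test, y_pred)
test_r2 = r2_score(y_test, y_pred)
print(f"Test MSE = {test_mse:.2f}, R^2 = {test_r2:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.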

# Data Preparation & Feature Engineering



**Data Preparation:** Import the data from the nano281 GitHub page and filter it to keep only the entries we want.




In [None]:
#Getting the data

import pandas as pd

url = "https://raw.githubusercontent.com/materialsvirtuallab/nano281/master/labs/lab2/data2022.csv"
data = pd.read_csv(url, index_col=0, na_filter=False)
df = pd.DataFrame(data)
display(df.head())

print("There are " + '{:.1f}'.format(len(df)) + " materials in the data2022.csv.")

Unnamed: 0,task_id,formula,formation_energy_per_atom,e_above_hull,band_gap,has_bandstructure
297,mp-570140,AuBr,-0.125181,0.011579,1.9716,True
573,mp-1206190,ZnI6,0.387945,0.65247,0.1076,False
579,mp-1208424,TeBr4,-0.39006,0.0,2.5202,False
738,mp-570480,TcBr4,-0.404404,0.0,0.6025,True
880,mp-1064459,PBr,1.067789,1.327882,0.0,False


There are 681.0 materials in the data2022.csv.


In [None]:
#Filtering the data

# Sort so that the most negative formation energy comes first for every formula
temp = df.sort_values(by=['formation_energy_per_atom'], ascending=True)
# For each formula, keep only the entry with the most negative formation energy
df_lowest = temp.drop_duplicates(subset='formula', keep='first')

# Keep only the formulae with non-positive formation energies (<= 0)
df_neg = df_lowest[(df_lowest['formation_energy_per_atom'] <= 0)]
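
To see what the sort-then-drop-duplicates pattern does, here is a small sketch on a made-up table. The formulas and energies below are invented purely for illustration and are not taken from `data2022.csv`.

```python
import pandas as pd

toy = pd.DataFrame({
    "formula": ["AB", "AB", "CD", "EF"],
    "formation_energy_per_atom": [-0.10, -0.30, 0.20, -0.05],
})

# Sort ascending so the lowest energy for each formula comes first
toy_sorted = toy.sort_values(by="formation_energy_per_atom", ascending=True)
# keep="first" then retains only that lowest-energy row per formula
toy_lowest = toy_sorted.drop_duplicates(subset="formula", keep="first")
# Finally keep only the non-positive formation energies
toy_neg = toy_lowest[toy_lowest["formation_energy_per_atom"] <= 0]

print(toy_neg)
```

Here the duplicate "AB" entry with the higher energy is dropped, and "CD" is removed because its formation energy is positive.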

**Feature Engineering** is the process of creating new features or transforming existing features in order to improve the performance of a machine learning model. One common technique for feature engineering is to use a design matrix, which is a matrix of predictor variables used in statistical modeling, particularly in regression analysis.

By manipulating the design matrix, we can create new features or transform existing ones to better capture the underlying patterns and relationships in the data. 

However, it is important to be careful when manipulating the design matrix, as it can also introduce issues such as multicollinearity, which can cause instability in the model and make it more difficult to interpret the results. Therefore, it is important to carefully consider the implications of any feature engineering decisions and to evaluate their impact on the model performance.

For this example, we will create the `average`, `min`, and `max` features for each row of data we have, one row per material. Open the data import link in your browser and check the data for yourself to better understand what we are trying to do. 
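
As a small sketch of the idea, the snippet below builds the average, max, and min features for a single composition. The two-element table is a hypothetical stand-in for `element_properties.csv` (approximate textbook values), and `featurize` is an illustrative helper, not a function from the notebook.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for element_properties.csv (approximate textbook values)
elements = pd.DataFrame(
    {"atomic_mass": [1.008, 15.999], "electronegativity": [2.20, 3.44]},
    index=["H", "O"],
)

# Composition of H2O as {element: amount}
composition = {"H": 2, "O": 1}

def featurize(comp, table):
    """Return the weighted average, max, and min of each elemental property."""
    symbols = list(comp)
    amounts = np.array([comp[s] for s in symbols], dtype=float)
    values = table.loc[symbols]  # rows for the elements present
    features = {}
    for prop in table.columns:
        col = values[prop].to_numpy()
        features[f"Average {prop}"] = float(np.sum(col * amounts) / amounts.sum())
        features[f"Max {prop}"] = float(col.max())
        features[f"Min {prop}"] = float(col.min())
    return features

row = featurize(composition, elements)
print(row)
```

Each material contributes one such row, and stacking the rows gives the design matrix.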

In [None]:
# Making the Design Matrix
# A design matrix is a matrix containing data about multiple characteristics of several individuals or objects. 
# Each row corresponds to an individual (here, a material) and each column to a characteristic (a feature).

url = 'https://raw.githubusercontent.com/materialsvirtuallab/nano281/master/labs/lab2/element_properties.csv'
element_data = pd.read_csv(url, index_col=0) #rows are looked up by element symbol below

EP_mean = element_data.mean(skipna=True, numeric_only=True) #calculate the mean for each numeric column ignoring all NaN
element_datanew = element_data.fillna(EP_mean) #fill in the NaN entries with the average from their specific column

#!pip install pymatgen git+https://github.com/materialsproject/api.git@py37
#!pip install mp-api 
from pymatgen.core import Composition
import numpy as np

# Assumption: the compositions come from the "formula" column of the filtered materials data (df_neg)
materials = df_neg.copy()
materials["composition"] = [Composition(f).to_data_dict["unit_cell_composition"] for f in materials["formula"]]

def composition_to_dict(c):
  """Return {element symbol: amount} for a dict, a formula string, or a Composition."""
  if isinstance(c, dict):
    return c
  if isinstance(c, str):
    c = Composition(c)
  return c.to_data_dict["unit_cell_composition"]

def compute_average_from_composition(c, prop):
  """Composition-weighted average of an elemental property."""
  unit_cell_composition = composition_to_dict(c)
  res = 0
  total = 0
  for el, amount in unit_cell_composition.items():
    res += element_datanew.loc[el, prop] * amount #use the NaN-filled table
    total += amount
  return res / total

def get_maxmin_properties(c, prop, mode="max"):
  """Maximum or minimum of an elemental property over the elements present."""
  if mode == "max":
    func = np.max
  elif mode == "min":
    func = np.min
  else:
    raise ValueError("mode must be 'max' or 'min'")
  unit_cell_composition = composition_to_dict(c)
  return func([element_datanew.loc[el, prop] for el in unit_cell_composition])

properties = EP_mean.index #the numeric property columns
average_properties = []
max_properties = []
min_properties = []
for prop in properties:
  average_properties.append([compute_average_from_composition(c, prop) for c in materials["composition"]])
  max_properties.append([get_maxmin_properties(c, prop, mode="max") for c in materials["composition"]])
  min_properties.append([get_maxmin_properties(c, prop, mode="min") for c in materials["composition"]])

#transpose so that each row is a material and each column is a property
average_properties = np.array(average_properties).T
max_properties = np.array(max_properties).T
min_properties = np.array(min_properties).T

design_matrix = np.concatenate([average_properties, max_properties, min_properties], axis=1) #combining the average, max, and min properties 

column_names = ([f"Average {n}" for n in properties] + [f"Max {n}" for n in properties] + [f"Min {n}" for n in properties])

design_matrix = pd.DataFrame(design_matrix, columns=column_names, index=materials.index)


**Feature scaling** is an important step in feature engineering that involves transforming the numerical features in a dataset to a common scale. This can be necessary because many machine learning algorithms work better when the features are on a similar scale, and some algorithms can even fail to converge or produce biased results if the features are not scaled properly.

Scaling the features is typically done after creating the design matrix, as it involves transforming the numerical columns of the design matrix. The scaled features can then be included in the design matrix as predictor variables for machine learning algorithms.
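
As a minimal sketch of standardization, the snippet below scales two synthetic features that live on very different scales. The data and the names `X_train` and `X_test` are assumptions for illustration; the point is that the mean and standard deviation are learned from the training data and reused for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
X_train = np.column_stack([rng.normal(1000, 200, 100), rng.normal(0.5, 0.1, 100)])
X_test = np.column_stack([rng.normal(1000, 200, 20), rng.normal(0.5, 0.1, 20)])

scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)  # learn mean/std from the training data only
Z_test = scaler.transform(X_test)        # reuse the training mean/std

print("train mean:", Z_train.mean(axis=0).round(6))
print("train std: ", Z_train.std(axis=0).round(6))
print("test mean: ", Z_test.mean(axis=0).round(3))  # close to, but not exactly, zero
```

After scaling, each training column has zero mean and unit standard deviation, so no single feature dominates just because of its units.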

In [None]:
#Scaling

#x (or X) is your design matrix (the features); y is the property you want to predict
#train_X and test_X are the train/test splits of the design matrix

from sklearn.preprocessing import StandardScaler

#Method 1 part a: fit the scaler first, then transform
scaler = StandardScaler()
scaler.fit(x)
means_ = scaler.mean_ #per-column means learned from x
stds_ = scaler.scale_ #per-column standard deviations learned from x
z = scaler.transform(x) #standardized features (zero mean, unit variance)

#Method 1 part b (optional): PCA on the scaled features
from sklearn.decomposition import PCA

pca = PCA()
pca.fit(z)
z_pca = pca.transform(z)

#Method 2: fit and transform in one call (same result as Method 1 part a)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(X) 

#Method 3: manual standardization with pandas
#Use the TRAINING mean and std for both sets so that no information leaks from the test set
#Note: pandas .std() uses ddof=1, so the values differ slightly from StandardScaler (ddof=0)
norm_train_X = (train_X - train_X.mean())/train_X.std()
norm_test_X = (test_X - train_X.mean())/train_X.std()


# Linear Regression

**Regression** is a type of statistical analysis used to model the relationship between a dependent variable and one or more independent variables.

**Linear Regression:** This is the simplest and most widely used type of regression. It assumes a linear relationship between the dependent variable and the independent variables. Linear regression can be either simple (one independent variable) or multiple (more than one independent variable).

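**How does k-fold cross-validation work?** The data are split into k parts (folds); the model is trained on k-1 folds and tested on the remaining one, and this is repeated so that each fold serves as the test set once. A high k means more computation time, but that is largely irrelevant, since you want to fit many models anyway.
Usually, a large k means less (pessimistic) bias. If possible, use a k that is a divisor of the sample size, or of the size of the groups in the sample that should be stratified.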
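
A minimal sketch of the splitting itself, on ten dummy samples (an assumption chosen so that k = 5 divides the sample size):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 samples, so k=5 divides the sample size
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

test_counts = np.zeros(len(X), dtype=int)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    test_counts[test_idx] += 1
    print(f"fold {fold}: train={len(train_idx)} samples, test indices={test_idx}")

# Every sample is used for testing exactly once across the k folds
print(test_counts)
```

Each fold trains on 8 samples and tests on the remaining 2, and no sample is ever in the training and test sets of the same fold.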

In [None]:
#Linear Regression
from sklearn import linear_model 
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.metrics import r2_score, mean_squared_error
import matplotlib.pyplot as plt

kfold = KFold(n_splits=5, shuffle=True, random_state=42) #random_state is only a seed for reproducible shuffling; any fixed integer works

mlr = linear_model.LinearRegression()
yhat_mlr = cross_val_predict(mlr, z, y, cv=kfold) #z: scaled design matrix, y: target values (both must already be defined in earlier cells)
r2_mlr = r2_score(y, yhat_mlr)
mse_mlr = mean_squared_error(y, yhat_mlr)
label_mlr = "MLR: $R^2$ = %.3f, MSE = %.1f" % (r2_mlr, mse_mlr)

f, ax = plt.subplots(figsize=(8, 8))
plt.plot(y, yhat_mlr, "o", label=label_mlr)
plt.ylabel(r"$K_{predicted}$ (GPa)")
plt.xlabel(r"$K$ (GPa)")
plt.legend()
plt.xlim([0, 410])
plt.ylim([0, 410])
plt.plot([0, 410], [0, 410], "k--");


**Ridge Regression:** This is a type of regression that is used to overcome the problem of multicollinearity (high correlation) among the independent variables. Ridge regression adds a penalty term to the least squares objective function to control the size of the coefficients.
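
The effect of the penalty can be seen in a small sketch with two nearly identical features. The data are synthetic and `alpha=1.0` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # almost a copy of x1 (multicollinearity)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

With such strongly correlated features, ordinary least squares can assign large coefficients of opposite sign that nearly cancel, while the ridge penalty keeps the coefficient vector small and spreads the weight across both features.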

**GridSearchCV (Grid Search Cross-Validation)** is a technique in machine learning used for hyperparameter tuning, which refers to the process of selecting the best combination of hyperparameters for a machine learning model. It automates the process of trying out different combinations of hyperparameters for a model by creating a "grid" of all possible hyperparameter values to try. It then trains the model using each combination of hyperparameters and evaluates its performance using cross-validation.

In [None]:
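#Ridge
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

alphas = np.linspace(0.0001, 0.005, 100) #candidate regularization strengths
model = Ridge(max_iter=100000)
gcv = GridSearchCV(model, param_grid={"alpha": alphas}, scoring="neg_mean_squared_error", cv=kfold)

gcv.fit(z, y) #z: scaled training features, y: target values
alphas = gcv.cv_results_["param_alpha"]

alpha_optimal = gcv.best_params_["alpha"] #alpha with the best cross-validated score
ridge_best = gcv.best_estimator_ #Ridge refit on all of (z, y) with alpha_optimal

pred_test_ridge = ridge_best.predict(test_scaled) #test_scaled: test features scaled with the training scaler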

The next section repeats the above code, but defines a function to perform the GridSearchCV hyperparameter tuning. 

In [None]:
#More accurate best alpha
ridge = linear_model.Ridge(max_iter=100000)

from sklearn.model_selection import GridSearchCV 
import seaborn as sns
#Function imported from lecture to plot Gridsearch CV to visualize parameters
def plot_grid_search_results(gs, ylim=None):
    """
    Plots the results of GridSearchCV.

    Args:
        gs: A GridSearchCV object.
        ylim: Optional setting for y limits.
    """
    results = pd.DataFrame(gs.cv_results_)
    for c in results.columns:
        # Note that here we are working with just variations in one parameter.
        # So we can automatically find the name of that parameter.
        if c.startswith("param_"):
            x = c
            break
    fig, ax = plt.subplots(figsize=(16, 8))
    ax = sns.lineplot(x=x, y="mean_train_score", data=results)
    ax = sns.scatterplot(x=x, y="mean_train_score", data=results, marker="x")
    ax = sns.lineplot(x=x, y="mean_test_score", data=results)
    ax = sns.scatterplot(x=x, y="mean_test_score", data=results, marker="o")
    plt.xlabel(x)
    if ylim:
        plt.ylim(ylim)
    ax.legend(["Train", "Test"], loc=2)
#This uses grid search to find the optimal parameters based on the best MAE
def find_best_params(x, y, model, params): 
    gs = GridSearchCV(
        model,
        param_grid=params,
        return_train_score=True,
        scoring="neg_mean_absolute_error", #scores are negative MAE (closer to 0 is better)
        cv=kfold,
        n_jobs=-1, #use all available CPU cores
    )
    gs.fit(x, y)
    results = pd.DataFrame(gs.cv_results_)
    scores = results['mean_test_score'].to_numpy()
    print("The range of the negative-MAE values is ", np.ptp(scores), " with a standard deviation of ", np.std(scores), " and an average of", np.average(scores))
    best_result = results[results['rank_test_score']==results['rank_test_score'].min()]
    plot_grid_search_results(gs)
    return best_result

result = find_best_params(z, y, ridge, {"alpha": np.logspace(-10, 1, 30)}) #z: scaled features
print("For the best params ", result['params'].iloc[0], " the negative MAE score is ", result['mean_test_score'].iloc[0])

After obtaining the best parameters for Ridge, you can create a new Ridge regression model with your best `alpha` parameter.

In [None]:
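from sklearn.metrics import mean_absolute_error

best_alpha = result['params'].iloc[0]['alpha'] #best alpha found by find_best_params above
final_ridge = linear_model.Ridge(max_iter=100000, alpha=best_alpha)
yhat = cross_val_predict(final_ridge, z, y, cv=kfold)
mae = mean_absolute_error(y, yhat)
print("The mean absolute error is", mae)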

**Lasso Regression:** This is similar to ridge regression but uses a different penalty term to shrink the coefficients of the independent variables towards zero. Lasso regression is useful for variable selection and can be used to identify the most important predictors.
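
The variable-selection behaviour can be sketched on synthetic data in which only two of ten features carry signal. The data and `alpha=0.1` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features matter; the other eight are pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

Z = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(Z, y)

selected = np.flatnonzero(lasso.coef_)  # indices of features with nonzero coefficients
print("coefficients:", lasso.coef_.round(3))
print("selected features:", selected)
```

Unlike ridge, which only shrinks coefficients, the lasso penalty can set the coefficients of uninformative features exactly to zero, which is what makes it usable for feature selection.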

In [None]:
#Lasso 
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
alphas = np.linspace(0.0001, 0.005, 100)
model = Lasso(max_iter=100000)
gcv = GridSearchCV(model, param_grid={"alpha": alphas}, scoring="neg_mean_squared_error", cv=kfold)

gcv.fit(z, y)
alphas = gcv.cv_results_["param_alpha"]

alpha_optimal = gcv.best_params_["alpha"]
lasso_best = gcv.best_estimator_

pred_test_lasso = lasso_best.predict(test_scaled)

To learn more about Lasso and Ridge, their differences, and when each one outperforms the other, check out this link: https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression

# Tree-based Regression

**Tree-based regression** is a type of supervised learning algorithm that is used to predict a continuous output variable, given one or more input variables. It is a non-parametric method, which means that it makes no assumptions about the distribution of the data. Instead, it learns the relationships between the input variables and the output variable by constructing a tree-like model.

Because each split is just a threshold on a single input variable, feature scaling or normalization is often unnecessary for tree-based regression models, unlike the linear regression models above, where data scaling is more important. 

The tree-based regression algorithm works by recursively splitting the data into subsets, based on the values of the input variables. The splits are chosen to minimize the variance of the output variable within each subset. This means that the algorithm tries to find the values of the input variables that are most strongly associated with the output variable, and then splits the data based on those values.
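
The splitting rule can be sketched for a single feature: try every candidate threshold and keep the one that minimizes the summed squared error (variance times subset size) of the two subsets. The data are synthetic and `best_split` is an illustrative helper, not part of scikit-learn; the result is compared with a depth-1 `DecisionTreeRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.where(x < 4, 1.0, 5.0) + rng.normal(scale=0.1, size=x.size)  # step near x = 4

def best_split(x, y):
    """Try every midpoint between neighbouring x values; return the best threshold."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        # Sum of squared errors = n * variance, added over the two subsets
        sse = left.var() * len(left) + right.var() * len(right)
        if sse < best_sse:
            best_threshold, best_sse = (x[i - 1] + x[i]) / 2, sse
    return best_threshold

manual_threshold = best_split(x, y)

# A depth-1 tree (a "stump") performs one such split
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
tree_threshold = stump.tree_.threshold[0]

print(manual_threshold, tree_threshold)
```

A full tree simply repeats this search recursively inside each resulting subset, over all features, until a stopping rule such as `max_depth` or `min_samples_leaf` is reached.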

In [None]:
#Decision Tree
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import export_text

decision_treeR = DecisionTreeRegressor(ccp_alpha=0.0001, criterion='squared_error', random_state=2, max_depth=15, max_features=30, min_samples_leaf=1, min_samples_split=3, splitter='best') #'squared_error' was called 'mse' in older scikit-learn versions
decision_treeR = decision_treeR.fit(norm_train_X, train_y['log10K'])

pred_DTR = decision_treeR.predict(norm_test_X) #Predictions on Testing data
print(decision_treeR.score(norm_test_X, test_y['log10K'])) #R^2 on the Testing data

In [None]:
#DT Gradient Boosting

from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn import metrics

def modelfit(alg, X, y, predictors, performCV=True, printFeatureImportance=True, cv_folds=5):
    """Fit a regressor, report training and cross-validated error, and plot feature importances."""
    #Fit the algorithm on the data
    alg.fit(X, y)
        
    #Predict training set:
    train_predictions = alg.predict(X)
    
    #Perform cross-validation (scores are negative MSE, closer to 0 is better):
    if performCV:
        cv_score = cross_val_score(alg, X, y, cv=cv_folds, scoring='neg_mean_squared_error')
    
    #Print model report:
    print ("\nModel Report")
    print ("R^2 (Train) : %.4g" % metrics.r2_score(y, train_predictions))
    print ("MSE (Train): %f" % metrics.mean_squared_error(y, train_predictions))
    
    if performCV:
        print ("CV Score (neg. MSE) : Mean - %.7g | Std - %.7g | Min - %.7g | Max - %.7g" % (np.mean(cv_score),np.std(cv_score),np.min(cv_score),np.max(cv_score)))
        
    #Print Feature Importance:
    if printFeatureImportance:
        feat_imp = pd.Series(alg.feature_importances_, predictors).sort_values(ascending=False)
        feat_imp.plot(kind='bar', title='Feature Importances')
        plt.ylabel('Feature Importance Score')

#Tune the number of trees first; a regression metric is used because the model is a regressor
param_test1 = {'n_estimators':range(20,81,10)}
gsearch1 = GridSearchCV(estimator = GradientBoostingRegressor(learning_rate=0.1, min_samples_split=500,min_samples_leaf=50,max_depth=8,max_features='sqrt',subsample=0.8,random_state=10),
                        param_grid = param_test1, scoring='neg_mean_squared_error',n_jobs=-1, cv=11)
gsearch1.fit(scaled_data,y)

gsearch1.best_params_

In [None]:
param_test2 = {'max_depth':range(5,16,2), 'min_samples_split':range(200,1001,200)}
gsearch2 = GridSearchCV(estimator = GradientBoostingRegressor(learning_rate=0.1, n_estimators=60, max_features='sqrt', subsample=0.8, random_state=10), 
                        param_grid = param_test2, scoring='neg_mean_squared_error',n_jobs=-1, cv=5) #regression metric, since the estimator is a regressor
gsearch2.fit(scaled_data,y)
gsearch2.best_params_

In [None]:
param_test3 = {'min_samples_split':range(100,2000,200), 'min_samples_leaf':range(10,71,10)}
gsearch3 = GridSearchCV(estimator = GradientBoostingRegressor(learning_rate=0.1, n_estimators=60,max_depth=9,max_features='sqrt', subsample=0.8, random_state=10), 
                                    param_grid = param_test3, scoring='neg_mean_squared_error',n_jobs=-1, cv=5) #regression metric, since the estimator is a regressor
gsearch3.fit(scaled_data,y)
gsearch3.best_params_

In [None]:
param_test4 = {'max_features':range(2,20,2)}
gsearch4 = GridSearchCV(estimator = GradientBoostingRegressor(learning_rate=0.1, n_estimators=60,max_depth=9, min_samples_split=1200, min_samples_leaf=60, subsample=0.8, random_state=10),
                        param_grid = param_test4, scoring='neg_mean_squared_error',n_jobs=-1, cv=5) #regression metric, since the estimator is a regressor
gsearch4.fit(scaled_data,y)
gsearch4.best_params_

In [None]:
param_test5 = {'subsample':[0.6,0.7,0.75,0.8,0.85,0.9]}
gsearch5 = GridSearchCV(estimator = GradientBoostingRegressor(learning_rate=0.1, n_estimators=20,max_depth=9,min_samples_split=1200, min_samples_leaf=60, subsample=0.8, random_state=10,max_features=7),
param_grid = param_test5, scoring='neg_mean_squared_error',n_jobs=-1, cv=5) #regression metric, since the estimator is a regressor
gsearch5.fit(scaled_data,y)
gsearch5.best_params_

I ran the search in parts (one or two hyperparameters at a time) because Google Colab cannot handle that much computation at once. The last time I ran it in one go, it took over 4 hours and I simply stopped it.

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
DTGBR = GradientBoostingRegressor(learning_rate=0.01, n_estimators=100, max_depth=9, min_samples_split=100, min_samples_leaf=10, subsample=0.8, random_state=42)
DTGBR = DTGBR.fit(scaled_data, y)

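pred_DTGBR = DTGBR.predict(test_scaled) #test_scaled: test features scaled with the training scaler
df_DTGBR = pd.DataFrame({'log10K_predicted': pred_DTGBR}) #wrap the predictions so the column lookup below works

from sklearn.metrics import mean_squared_error as MSE
#sample_file is assumed to be a DataFrame holding the reference values in a 'log10K_predicted' column
rmse_test = MSE(sample_file['log10K_predicted'], df_DTGBR['log10K_predicted'])**(1/2)
print(rmse_test)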

**Other tree-based ML algorithms:**

**DT Adaboost** -> https://towardsdatascience.com/understanding-adaboost-for-decision-tree-ff8f07d2851

**Extra Tree** is an ensemble machine learning algorithm that combines the predictions from many decision trees. -> https://machinelearningmastery.com/extra-trees-ensemble-with-python/

**Random Forest** Single decision trees can also be computationally expensive to train, carry a big risk of overfitting, and tend to find local optima because they can’t go back after they have made a split. To address these weaknesses, we turn to random forest, which illustrates the power of combining many decision trees into one model. -> https://builtin.com/data-science/random-forest-python

**XGBoost** is an implementation of gradient boosted decision trees designed for speed and performance that dominates competitive machine learning. -> https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/ 

As a rough rule of thumb for predictive performance: Decision Tree < Gradient Boosting = AdaBoost < Extra Trees < Random Forest < XGBoost. The actual ranking depends on the dataset and on how well each model is tuned, so always compare the models with cross-validation.
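
Any such ranking is best checked on your own data. Below is a minimal sketch that compares several of these models with 5-fold cross-validation; the dataset is synthetic (`make_friedman1`), the settings are untuned defaults chosen for illustration, and XGBoost is left out because it lives in a separate package.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=200, noise=1.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # cross_val_score returns negative MSE, so flip the sign before the square root
    neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    scores[name] = float(np.sqrt(-neg_mse.mean()))

# Print the models from lowest to highest cross-validated RMSE
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} RMSE = {rmse:.3f}")
```

The order printed here applies only to this synthetic dataset and these untuned settings; it can change with different data or hyperparameters.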

# Neural Network

**A neural network** is a type of machine learning algorithm that is inspired by the structure and function of the human brain. It consists of multiple interconnected nodes, or neurons, that work together to process and learn from data. It is often referred to as the "newest and greatest" for AI/ML applications.

Neural networks have been used to solve a wide range of machine learning problems, including image classification, speech recognition, natural language processing, and game playing. They are particularly well-suited for problems with large amounts of data and complex patterns, where traditional machine learning algorithms may struggle to find meaningful relationships.

However, neural networks can be difficult to train and optimize, and may require large amounts of computational resources to achieve good performance. They also tend to be "black box" models, meaning that it can be difficult to interpret how they are making their predictions. As such, they require careful tuning and evaluation to ensure that they are accurately capturing the underlying relationships in the data.
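
To make the "interconnected neurons" picture concrete, here is a minimal NumPy sketch of a forward pass through a small network with two hidden layers of 5 and 3 neurons. The weights are random and untrained, and the input size of 4 features is an arbitrary assumption; training would adjust the weights to reduce the prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    """ReLU activation: keep positive values, set negative values to zero."""
    return np.maximum(a, 0.0)

# Layer sizes: 4 input features -> 5 neurons -> 3 neurons -> 1 output
sizes = [4, 5, 3, 1]
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(X):
    """Propagate a batch of inputs through the network."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)              # hidden layers: linear step + nonlinearity
    return a @ weights[-1] + biases[-1]  # linear output layer for regression

X = rng.normal(size=(6, 4))  # 6 samples, 4 features
y_hat = forward(X)
print(y_hat.shape)
```

Each neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function; stacking such layers is what lets the network represent nonlinear relationships.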

In [None]:
#Most Basic NN
from sklearn.neural_network import MLPRegressor

nn = MLPRegressor(hidden_layer_sizes=(5,3), alpha=1e-6, max_iter=5000, learning_rate_init=0.001, activation='relu', batch_size=335) #These hyperparameters strongly affect the result; search for the optimal ones
nn.fit(scaled_data, y)

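pred_SKNN = nn.predict(test_scaled) #test_scaled: test features scaled with the training scaler
df_SKNN = pd.DataFrame({'log10K_predicted': pred_SKNN}) #wrap the predictions so the column lookup below works

from sklearn.metrics import mean_squared_error as MSE
#sample_file is assumed to be a DataFrame holding the reference values in a 'log10K_predicted' column
rmse_test = MSE(sample_file['log10K_predicted'], df_SKNN['log10K_predicted'])**(1/2)
print(rmse_test)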

In [None]:
#Bayesian optimization of NN hyperparameters
#Note: this example is a binary classifier (sigmoid output, accuracy score), so trainY must hold class labels

#First, do the install below because Google Colab doesn't have the package
#!pip install bayesian-optimization 

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout
from keras.optimizers import Adam, SGD, RMSprop, Adadelta, Adagrad, Adamax, Nadam, Ftrl
from keras.callbacks import EarlyStopping, ModelCheckpoint
#keras.wrappers.scikit_learn only exists in older Keras/TensorFlow releases
from keras.wrappers.scikit_learn import KerasClassifier
from math import floor
from sklearn.metrics import make_scorer, accuracy_score
from bayes_opt import BayesianOptimization
from sklearn.model_selection import StratifiedKFold, cross_val_score
from keras.layers import LeakyReLU
LeakyReLU = LeakyReLU(alpha=0.1)
import warnings
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", None)

score_acc = make_scorer(accuracy_score) #scorer used by cross_val_score below

def nn_cl_bo(neurons, activation, optimizer, learning_rate,  batch_size, epochs ):
    optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']
    optimizerD= {'Adam':Adam(lr=learning_rate), 'SGD':SGD(lr=learning_rate),
                 'RMSprop':RMSprop(lr=learning_rate), 'Adadelta':Adadelta(lr=learning_rate),
                 'Adagrad':Adagrad(lr=learning_rate), 'Adamax':Adamax(lr=learning_rate),
                 'Nadam':Nadam(lr=learning_rate), 'Ftrl':Ftrl(lr=learning_rate)}
    activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
                   'elu', 'exponential', LeakyReLU,'relu']
    neurons = round(neurons)
    activation = activationL[round(activation)]
    batch_size = round(batch_size)
    epochs = round(epochs)
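    def nn_cl_fun():
        #Use the optimizer picked by the search instead of always using Adam
        opt = optimizerD[optimizerL[round(optimizer)]]
        nn = Sequential()
        #The input size must match the number of feature columns
        nn.add(Dense(neurons, input_dim=scaled_data.shape[1], activation=activation))
        nn.add(Dense(neurons, activation=activation))
        nn.add(Dense(1, activation='sigmoid'))
        nn.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
        return nn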
    es = EarlyStopping(monitor='accuracy', mode='max', verbose=0, patience=20)
    nn = KerasClassifier(build_fn=nn_cl_fun, epochs=epochs, batch_size=batch_size,
                         verbose=0)
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
    score = cross_val_score(nn, scaled_data, trainY, scoring=score_acc, cv=kfold, fit_params={'callbacks':[es]}).mean()
    return score

# Set parameter ranges (min, max) for the search
params_nn ={
    'neurons': (10, 100),
    'activation':(0, 9),
    'optimizer':(0,7),
    'learning_rate':(0.0001, 1),
    'batch_size':(200, 1000),
    'epochs':(20, 100),
}
# Run Bayesian Optimization
nn_bo = BayesianOptimization(nn_cl_bo, params_nn, random_state=111)
nn_bo.maximize(init_points=25, n_iter=4) 

#Gets the best parameters
params_nn_ = nn_bo.max['params']
activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
               'elu', 'exponential', LeakyReLU,'relu']
params_nn_['activation'] = activationL[round(params_nn_['activation'])]
params_nn_

#Then train the final model as usual, using the best parameters in params_nn_

**Other NN ML algorithms:**

**NN Gaussian Process** -> https://towardsdatascience.com/gaussian-process-models-7ebce1feb83d

**TensorFlow** allows you to fine-tune the hyperparameters much more easily than sklearn and takes less time as well. It is meant for large datasets and tasks such as object detection that need extensive functionality and high performance. -> https://www.tensorflow.org/datasets/keras_example 

**PyTorch** is similar to TensorFlow, but is friendlier if you are coming from basic Python. It is used more in the realm of AI.  -> https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html 

More complex NNs require knowing what **Keras** is. It is used for small datasets, rapid prototyping, and multiple back-end support. More here -> https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/ 

The differences between TensorFlow, PyTorch, and Keras -> https://www.simplilearn.com/keras-vs-tensorflow-vs-pytorch-article 

In terms of user friendliness: sklearn > PyTorch > Keras > TensorFlow.

In terms of speed: TensorFlow > PyTorch > Keras > sklearn

Research: PyTorch

Industry: TensorFlow and Keras