## Introduction
This notebook aims to replicate the studies that were successful in predicting the lattice constants of the Perovskite compound ($ABO_{3}$). It's important to identify the features neccessary to perform predictions. The lattice constants of the $ABO_{3}$ -type may be correlated compounds as a general function of nine variables as below

$Lattice constant = f(z_A + z_B + z_O + r_A + r_B + r_O + x_A + x_B + x_O)$

where z, r and x denote the valence, the radius and the electronegativity of the ions (A, B, O), respectively. It is noted that the three variables associated with anion $O^{2-}$ , namely $z_{O}, r_{O},$ and $x_{O}$ can be ignored as they remain unchanged for all of the samples.

The lattice constant can be reduced as a ﬁve-parameter function shown below.

$Lattice constant = f(z_A + r_A + r_B + x_A + x_B)$

There's a Goldschmidt’s tolerance factor (t), which can condense the atomic radii. It has been widely accepted as a criterion for the formation of the perovskite structure, up to now. I will use $\textbf{four features}$ including the tolerance factor and others are denoted by, $r_{A}, r_{B}$ and $\frac{r_{A}}{t}$. The list of feature does not include electronegativity and valence number. Basically, other sets of features that includes these two atomic configurations and they can be use to predict the lattice constant with great accuracy. However, I will only take the atomic radii in to my account. The tolerance factor (t) is expressed as below.

$t = \frac{r_A + r_O}{\sqrt{2}(r_B + r_O)}$

## Procedure
First a simple regression model will be built which imitates the ANN (Built from TensorFlow APIs). Then the ANN model will be built to perform a GridSearch. Finally the predictions will be performed on the model with fine tuned hyperparameters.

## Machine Learning Model
I will build an artificial neural network (Hereafter called as ANN). The loss function will be mean square error. "MSE" greatly penalised the outliers in predictions compared to mean absolute error and huber loss function. I will choose Adaptive optimizer over Stochastic gradient descent (SGD). The ANN has lots of hyperparameters to be tuned before I set out to predict the target variables. Besides Adam optimizer the other hyperparameters are number of neural nodes, number of hidden layers, learning rate (Adam optimizer), activation functions and so on.  The '.csv' provides $r_{A}$ and $r_{b}$. So the tolerance factor and $\frac{r_{A}}{t}$ needs to be calculated.

## Linear Regression model
Here I have built a simple LR model.

In [6]:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

import numpy as np
import tensorflow as tf
from keras
import pandas as pd

lat_df = pd.read_csv('lat_const.csv', header=0, index_col = 0, usecols= ['compound','ra','rb','xa','xb','za','a','b','c'])
print (lat_df.head())

AttributeError: module 'keras.utils.generic_utils' has no attribute 'populate_dict_with_module_objects'

In [None]:
lat_df = lat_df.assign(tolerance = lambda x:((x['ra'] + 1.4)/(np.sqrt(2)*(x['rb'] + 1.4))))
lat_df = lat_df.assign(rat = lambda x:(x['ra']/x['tolerance']))

print (lat_df.head())

features = np.array(lat_df[['ra', 'rb', 'tolerance', 'rat']], np.float32)
targets = np.array(lat_df[['a', 'b', 'c']], np.float32)

print ("\n")
print (features[:5,:])
print ("\n")
print (targets[:5,:])

In [None]:
# # W = tf.Variable(np.random.randn(), name="weight")
# # b = tf.Variable(np.random.randn(), name="bias")

# W = tf.Variable(np.array([[0.1, 0.1, 0.1]] *4), name = "W", shape=tf.TensorShape([4,3]))
# b = tf.Variable(np.array([[0.1, 0.1, 0.1]] *100), name = "b", shape=tf.TensorShape([100,3]))

# def linear_regression(intercept, slope):
#     return tf.add(tf.multiply(features,slope), intercept)

# def loss_function(intercept, slope):
#     predictions = linear_regression(intercept, slope)
#     return tf.keras.losses.mse(targets, predictions)

# opt = tf.keras.optimizers.Adam()

# for j in range(500):
#     opt.minimize(lambda: loss_function(b, W), var_list= [b, W])
#     print (loss_function(intercept, slope))
    
# print ("Intercept and Slope from the optimization")
# print (intercept.numpy(), slope.numpy())

## GridSearch
Grid search as the name suggests requires a grid made up by the hyperparamters. The alogorithm then runs the model over the grid and finds the best set of parameters for which the model perform best, i.e. low error (MSE, MAE, huber loss, etc in case of refression). The algorithm is heavy because it is performing a brute force approach. There are some techniques like Bayesian inference methods which are very quick in findig the hyperparameters. Alongside the GridSearch, I will set cross-validation option which results in accurate loss metrics.

The parameter grid for my models are 



In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, targets,
                                                    stratify=targets, 
                                                    test_size=0.2)

def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[100]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(3))
    optimizer = keras.optimizers.Adam(lr=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)

from sklearn.model_selection import GridSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(1, 100),
    "learning_rate": [1/3e-4, 1/3e-2],
}

rnd_search_cv = GridSearchCV(keras_reg, param_distribs, cv=4)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])
