# Lattice Physics 
The model encompasses 2 lattice physics parameters
1. **k -inf** : The infinite multiplication factor  
2. **PPPF** : Pin Power Peaking Factor


Both of these are modeled as functions of variations in fuel pin enrichments for the reactor. The model focuses on predicting the value of k-inf and PPPF values associated with enrichments. 

For further information on the reactor please refer - https://github.com/your-username/your-repo-name/blob/main/README.md

## Importing necessary libraries 
In the model the following library are used - 
1. Numpy
2. Scikit-learn
3. regression (custom linear regression library)

In [1]:
import numpy as np 
import pandas as pd 
from regression import regression_func
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import Ridge


The data set contains 24000 rows and 1 Column, each column has multiple features seperated by whitespace. Pandas library is used to make data more readable into 24K rows and 41 columns.
Since, it is not mentioned which columns correspond to which feature or which target variable the correct appraoch would be to actually find out mean, mode and median for all the columns because the normal values of k-inf lies in the range (0.8,1.5) while on the other hand the PPPF factor lies mainly in the range (1.5,2.2). The two columns which will satisfy these conditions will correspond to k-inf and PPPF target variable columns and rest will be the input features. 

In [2]:
df = pd.read_csv('data/raw.csv', sep=r'\s+')
df_test = pd.read_csv('data/test.csv', sep=r'\s+')

pd.set_option('display.max_columns', None) 
pd.set_option('display.width', None) 
pd.set_option('display.max_rows', None)
stats = pd.DataFrame({
    'Mean': df.mean(),
    'Median': df.median(),
    'Standard Deviation': df.std(),
    'Mode': df.mode().iloc[0],
    'Max' : df.max()
})
X = (0.9, 1.5)
Y = (1.5, 2.2)
filtered_columns_1 = stats[
    (stats['Mean'] >= X[0]) & 
    (stats['Median'] >= X[0]) & 
    (stats['Mode'] >= X[0]) &
    (stats['Mean'] <= X[1]) & 
    (stats['Median'] <= X[1]) & 
    (stats['Mode'] <= X[1])
]
filtered_columns_2 = stats[
    (stats['Mean'] >= Y[0]) & 
    (stats['Median'] >= Y[0]) & 
    (stats['Mode'] >= Y[0]) &
    (stats['Mean'] <= Y[1]) & 
    (stats['Median'] <= Y[1]) & 
    (stats['Mode'] <= Y[1])
]
print(filtered_columns_1)
print(filtered_columns_2)

                 Mean   Median  Standard Deviation    Mode      Max
1.326300000  1.325139  1.32608            0.017798  1.3222  1.38558
                Mean    Median  Standard Deviation      Mode       Max
1.862085698  1.88356  1.876338             0.12002  1.527497  2.473015


## Normalisation and Flattening
The dataset is now split into 3 seperate parts belonging to - 
1. **k-inf**
2. **PPPF**
3. **Input Features**

The unnormalised data varies between a greater range which might hamper effective learning of the model since convergence is reached in a much more stable fashion when data is normalised. 
The Normalisation technique used for this model is known as Z-Score Normalisation.

The normalisation technique works using the following: $$ X_{\text norm} = \frac{X - \mu}{\sigma} $$

where $$ \mu = \frac{\sum_{i=1}^n x_i}{n}, \quad \sigma = \sqrt{\frac{\sum_{i=1}^n (x_i - \mu)^2}{n}} $$

Since the data in numpy flows quite differently therefore all the (24000,1) column matrices are converted to arrays by reducing their dimensions from 2 to 1.

In [3]:
def normalisation(x):
    mu = np.mean(x,axis=0)
    sigma = np.std(x,axis=0)
    x_norm = (x-mu)/sigma
    return (x_norm)

y_train_k_inf = (df.iloc[:, :1]).to_numpy()
y_train_pppf = (df.iloc[:, 1:2]).to_numpy()
x_train = (df.iloc[:, 2:]).to_numpy()

x_train = (normalisation(x_train))
y_train_k_inf = (normalisation(y_train_k_inf))
y_train_pppf = (normalisation(y_train_pppf))


y_train_k_inf = y_train_k_inf.ravel() 
y_train_pppf = y_train_pppf.ravel()

y_test_k_inf = (df_test.iloc[:, :1]).to_numpy()
y_test_pppf = (df_test.iloc[:, 1:2]).to_numpy()
x_test = (df_test.iloc[:, 2:]).to_numpy()


x_test = (normalisation(x_test))
y_test_k_inf = (normalisation(y_test_k_inf))
y_test_pppf = (normalisation(y_test_pppf))


y_test_k_inf = y_test_k_inf.ravel() 
y_test_pppf = y_test_pppf.ravel()


### Linear Regression
The linear regression is used for training the model using the regression function which is a seperate script. The current linear regression works on the principle of reducing the cost function (squared error cost function given below), this is done automatically by an algorithm known as gradient descent.



In [4]:
w_k_inf,b_k_inf,cost_k_inf = regression_func(x_train, y_train_k_inf, 10000,0.1)
w_pppf,b_k_pppf,cost_pppf = regression_func(x_train, y_train_k_inf, 10000,0.1)

In [5]:
def predict(x, w, b):
    return np.dot(x, w) + b
y_pred_k_inf = predict(x_test, w_k_inf, b_k_inf)
y_pred_pppf = predict(x_test, w_pppf, b_k_pppf)

mse_k_inf = mean_squared_error(y_test_k_inf, y_pred_k_inf)
r2_k_inf = r2_score(y_test_k_inf, y_pred_k_inf)

mse_pppf = mean_squared_error(y_test_pppf, y_pred_pppf)
r2_pppf = r2_score(y_test_pppf, y_pred_pppf)


print(f"k_inf - Mean Squared Error (Cost): {mse_k_inf}, R_2 Score: {r2_k_inf}")
print(f"pppf - Mean Squared Error (Cost): {mse_pppf}, R_2 Score: {r2_pppf}")

k_inf - Mean Squared Error (Cost): 0.01837588317990346, R_2 Score: 0.9816241168200965
pppf - Mean Squared Error (Cost): 3.752469363008063, R_2 Score: -2.752469363008063


### Polynomial Regression
The linear regression used for training the model for the target variable PPPF has failed misearbly, this can only mean that data is incompatible with linear regression thus we have to switch the data for polynomial regression. The mean squared error with a polynomial regression of degree 3 reduces quite well, also The R_2 score remains at 0.60 meaning the model is able to predict the PPPF in a much better way than before. Natuarally the model can be improved drastically by using a neural network model to have even better R_2 scores.

In [14]:

poly = PolynomialFeatures(degree=3, include_bias=False)
x_polynomial = poly.fit_transform(x_train)


model = LinearRegression()
model.fit(x_train, y_train_pppf)

ridge = Ridge(alpha=1.0) 
ridge.fit(x_polynomial, y_train_pppf)

y_pred = model.predict(x_test)

mse = mean_squared_error(y_test_pppf, y_pred)
r2 = r2_score(y_test_pppf, y_pred)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R^2 Score: {r2:.4f}")


Mean Squared Error: 0.3929
R^2 Score: 0.6071
