# Regression model for SOH prediction 

For a resource-constrained device like the ESP32, random forest regression is generally a better choice than a recurrent neural network (RNN), because inference is computationally lighter and the model is easier to implement.

We will use a random forest regression model to predict the state of health (SOH) of a battery.


In [30]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score


About the data:

In [31]:
df = pd.read_csv('data.csv')
display(df.head())
display(df.describe())
display(df.info())


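| | Voltage | Current | Battery Temperature (°C) | Environmental Temperature (°C) | Cycle Count | Battery Age (Days) | Unnamed: 6 | SOC (%) | SOH (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.2 | 0.5 | 25 | 22 | 10 | 30 | NaN | 20 | 98 |
| 1 | 3.3 | 0.6 | 26 | 23 | 20 | 60 | NaN | 30 | 97 |
| 2 | 3.4 | 0.58 | 26 | 23 | 25 | 75 | NaN | 35 | 96 |
| 3 | 3.5 | 0.55 | 27 | 23 | 30 | 90 | NaN | 40 | 96 |
| 4 | 3.6 | 0.6 | 27 | 24 | 35 | 105 | NaN | 45 | 95 |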


| | Voltage | Current | Battery Temperature (°C) | Environmental Temperature (°C) | Cycle Count | Battery Age (Days) | Unnamed: 6 | SOC (%) | SOH (%) |
|---|---|---|---|---|---|---|---|---|---|
| count | 45.0 | 45.0 | 45.0 | 45.0 | 45.0 | 45.0 | 0.0 | 45.0 | 45.0 |
| mean | 3.960667 | 0.763778 | 30.444444 | 25.755556 | 92.488889 | 278.0 | NaN | 67.822222 | 88.244444 |
| std | 0.297798 | 0.186344 | 3.539103 | 2.970733 | 42.976538 | 129.484994 | NaN | 22.708918 | 5.69751 |
| min | 3.2 | 0.4 | 25.0 | 20.0 | 10.0 | 30.0 | NaN | 20.0 | 77.0 |
| 25% | 3.75 | 0.62 | 28.0 | 24.0 | 55.0 | 165.0 | NaN | 50.0 | 84.0 |
| 50% | 4.0 | 0.76 | 30.0 | 26.0 | 95.0 | 285.0 | NaN | 70.0 | 89.0 |
| 75% | 4.2 | 0.9 | 33.0 | 28.0 | 130.0 | 390.0 | NaN | 90.0 | 93.0 |
| max | 4.4 | 1.1 | 38.0 | 32.0 | 158.0 | 480.0 | NaN | 99.0 | 98.0 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 9 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Voltage                         45 non-null     float64
 1   Current                         45 non-null     float64
 2   Battery Temperature (°C)        45 non-null     int64  
 3   Environmental Temperature (°C)  45 non-null     int64  
 4   Cycle Count                     45 non-null     int64  
 5   Battery Age (Days)              45 non-null     int64  
 6   Unnamed: 6                      0 non-null      float64
 7   SOC (%)                         45 non-null     int64  
 8   SOH (%)                         45 non-null     int64  
dtypes: float64(3), int64(6)
memory usage: 3.3 KB


None

Let's remove the unnecessary column: `Unnamed: 6` has no non-null values, so we drop it, then drop any rows that still contain missing values.

In [32]:
df.drop(['Unnamed: 6'], axis=1, inplace=True)
df.dropna(inplace=True)
display(df.info())
display(df.columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 8 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Voltage                         45 non-null     float64
 1   Current                         45 non-null     float64
 2   Battery Temperature (°C)        45 non-null     int64  
 3   Environmental Temperature (°C)  45 non-null     int64  
 4   Cycle Count                     45 non-null     int64  
 5   Battery Age (Days)              45 non-null     int64  
 6   SOC (%)                         45 non-null     int64  
 7   SOH (%)                         45 non-null     int64  
dtypes: float64(2), int64(6)
memory usage: 2.9 KB


None

Index(['Voltage', 'Current', 'Battery Temperature (°C)',
       'Environmental Temperature (°C)', 'Cycle Count', 'Battery Age (Days)',
       'SOC (%)', 'SOH (%)'],
      dtype='object')
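The notebook imports seaborn but never computes the correlation between the columns. A minimal sketch of such a check follows, using only the five rows printed by `df.head()` above so that it runs without `data.csv`. The variable names `sample` and `corr_with_soh` are illustrative, and five rows are far too few to draw conclusions from.

```python
import pandas as pd

# The five rows shown by df.head() above (the empty 'Unnamed: 6' column is omitted).
sample = pd.DataFrame({
    "Voltage": [3.2, 3.3, 3.4, 3.5, 3.6],
    "Current": [0.5, 0.6, 0.58, 0.55, 0.6],
    "Battery Temperature (°C)": [25, 26, 26, 27, 27],
    "Environmental Temperature (°C)": [22, 23, 23, 23, 24],
    "Cycle Count": [10, 20, 25, 30, 35],
    "Battery Age (Days)": [30, 60, 75, 90, 105],
    "SOC (%)": [20, 30, 35, 40, 45],
    "SOH (%)": [98, 97, 96, 96, 95],
})

# Pearson correlation of every feature with the target, sorted ascending.
corr_with_soh = sample.corr()["SOH (%)"].drop("SOH (%)").sort_values()
print(corr_with_soh)
```

On the full dataset the same call would be `df.corr()`, and `sns.heatmap(df.corr(), annot=True)` is one way to visualise the resulting matrix.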

Data splitting (80% training, 20% test):

In [33]:
X = df.drop(['SOH (%)'], axis=1)
y = df['SOH (%)']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


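Standardisation (the scaler is fitted on the training set only and then applied to the test set; tree-based models such as random forests do not strictly need feature scaling, but it does no harm here):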

In [34]:
SC = StandardScaler()
x_train_scaled = SC.fit_transform(x_train)
x_test_scaled = SC.transform(x_test)
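To illustrate what the scaling step does, here is a small sketch on made-up data (the array `X_demo` is a random stand-in, not the battery dataset). It shows that after `fit_transform` each training column has mean 0 and standard deviation 1, while the test columns are only approximately centred because they reuse the training statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up stand-in data: 45 rows, 3 features on different scales (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[3.9, 0.75, 30.0], scale=[0.3, 0.2, 3.5], size=(45, 3))

# Same split settings as the notebook: 36 training rows, 9 test rows.
x_tr, x_te = train_test_split(X_demo, test_size=0.2, random_state=42)

scaler = StandardScaler()
x_tr_scaled = scaler.fit_transform(x_tr)  # learn mean/std from the training rows only
x_te_scaled = scaler.transform(x_te)      # reuse those statistics on the test rows

print("train mean:", x_tr_scaled.mean(axis=0).round(6))
print("train std: ", x_tr_scaled.std(axis=0).round(6))
print("test mean: ", x_te_scaled.mean(axis=0).round(3))
```

Fitting the scaler on the training rows alone keeps information from the test set out of the preprocessing step.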


Fitting the dataset and training the model:

In [35]:
regression = RandomForestRegressor(n_estimators=100, random_state=42)
model = regression.fit(x_train_scaled, y_train)
y_pred = model.predict(x_test_scaled)   
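Since the motivation for choosing a random forest is the limited memory and compute of the ESP32, it can help to look at how large the fitted forest actually is. The sketch below counts tree nodes on made-up data (`X_demo` and `y_demo` are random stand-ins, not the battery dataset); the same two list comprehensions could be run on the `model` object above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up stand-in data with the same shape as the notebook's features (assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(45, 7))
y_demo = 100 - 20 * X_demo[:, 0] + rng.normal(scale=0.5, size=45)

forest = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# Each fitted tree exposes its node count and depth through the tree_ attribute.
node_counts = [est.tree_.node_count for est in forest.estimators_]
max_depth = max(est.tree_.max_depth for est in forest.estimators_)
total_nodes = sum(node_counts)

print("trees:", len(node_counts))
print("total nodes:", total_nodes)
print("deepest tree:", max_depth)
```

If the total is too large for the target device, lowering `n_estimators` or setting `max_depth` on `RandomForestRegressor` reduces the size of the forest, usually at some cost in accuracy.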

Model evaluation:
Regression models are evaluated with different metrics from classification models. We will use the following metrics:
- Mean squared error (MSE)
- R² score
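For reference, the standard definitions of the two metrics are given below, where $y_i$ is the true SOH of test sample $i$, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $n$ is the number of test samples.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

A lower MSE is better, and an R² of 1 means the predictions match the true values exactly.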

In [36]:
# Compute each metric once, then print it
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Mean Squared Error:', mse)
print('R2 Score:', r2)


Mean Squared Error: 0.26238888888888806
R2 Score: 0.9928051794177387
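The reported MSE of about 0.262 corresponds to a root mean squared error of about 0.51 SOH percentage points. With 45 rows and `test_size=0.2`, however, these scores rest on only 9 test samples, so a single split can be optimistic or pessimistic by chance. One common way to get a steadier estimate is k-fold cross-validation. The sketch below uses made-up data (`X_demo` and `y_demo` are random stand-ins), so its numbers say nothing about the battery dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Made-up stand-in data with the same shape as the notebook's features (assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(45, 7))
y_demo = 100 - 20 * X_demo[:, 0] + rng.normal(scale=0.5, size=45)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
forest = RandomForestRegressor(n_estimators=100, random_state=42)

# scikit-learn reports negated MSE so that higher is always better; flip the sign back.
neg_mse = cross_val_score(forest, X_demo, y_demo, cv=cv,
                          scoring="neg_mean_squared_error")
rmse_per_fold = np.sqrt(-neg_mse)

print("RMSE per fold:", rmse_per_fold.round(3))
print("mean RMSE:", rmse_per_fold.mean().round(3))
```

Every row is used for testing exactly once, and the spread of the per-fold scores gives a sense of how much the estimate depends on the split.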


Let's now export this model using joblib:

In [37]:
import joblib
joblib.dump(model, 'AURAmodel.pkl')

['AURAmodel.pkl']
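The model was trained on standardised features, so anything that loads `AURAmodel.pkl` later also needs the fitted scaler to prepare new inputs. One possible approach, sketched below on made-up data, is to save the scaler and the model together and check that the reloaded pair gives the same prediction. The file name `demo_bundle.pkl`, the dictionary keys, and the data are illustrative assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Made-up stand-in data with the same shape as the notebook's features (assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(45, 7))
y_demo = 100 - 20 * X_demo[:, 0] + rng.normal(scale=0.5, size=45)

scaler = StandardScaler().fit(X_demo)
forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(scaler.transform(X_demo), y_demo)

# Save both objects in one file, then load them back from disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_bundle.pkl")  # hypothetical file name
    joblib.dump({"scaler": scaler, "model": forest}, path)
    bundle = joblib.load(path)

new_sample = X_demo[:1]  # one row standing in for a fresh measurement
restored_pred = bundle["model"].predict(bundle["scaler"].transform(new_sample))
original_pred = forest.predict(scaler.transform(new_sample))
print(restored_pred, original_pred)
```

Note that a `.pkl` file is a Python pickle, so it cannot be loaded on the ESP32 directly; running the forest on the device would require a separate step that converts the trees to code the microcontroller can execute.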