This electric vehicle (EV) battery life prediction app estimates the remaining life of an EV battery from the factors listed in the dataset description below. It analyzes historical test data for each battery, taking into account variables that track battery degradation over time. The main objective is to help users understand how long their EV battery might last before it needs replacement. The app uses an Artificial Neural Network (ANN) to make predictions from this data, helping users make informed decisions about battery maintenance or replacement.

Artificial Neural Networks (ANNs) are computational models inspired by the structure of the human brain. They consist of layers of interconnected nodes (also called neurons), which process input data and pass it on to the next layer. The network is trained on a set of data by adjusting the weights between neurons to minimize the difference between the predicted and actual outputs. ANNs can capture complex, non-linear relationships within data, which makes them effective for tasks like pattern recognition, classification, and regression. In this case, the ANN learns from the battery test records to identify patterns in battery degradation and predict the remaining battery life.
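
To make the idea of "adjusting the weights to minimize the difference between predicted and actual outputs" concrete, here is a minimal sketch with a single linear neuron trained by plain gradient descent. The data, learning rate, and step count are made up for illustration. The model built later in this notebook uses Keras with the Adam optimizer, not this hand-written loop.

```python
# Toy illustration: one linear neuron (y_hat = w * x + b) trained by gradient descent.
# The data below is invented for this sketch; it follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def mse(w, b):
    """Mean squared error between the neuron's predictions and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


w, b = 0.0, 0.0          # start from arbitrary weights
learning_rate = 0.05     # assumed step size for this toy problem
initial_loss = mse(w, b)

for _ in range(2000):
    # Gradients of the MSE with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move each weight a small step against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(f"w={w:.4f}, b={b:.4f}, loss {initial_loss:.4f} -> {final_loss:.6f}")
```

The loss shrinks as the weights approach the values that generated the data. A multi-layer network repeats the same idea for many weights at once.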

In this app, the ANN makes sense of the data gathered from the vehicle’s battery. Trained on data from multiple sources, such as charge cycles, temperature changes, and driving behavior, the network can learn which factors contribute most to battery degradation. It then uses this knowledge to estimate how much life is left in the battery. Because an ANN can learn from diverse data sets, it is well suited to predicting future battery performance and to giving users actionable recommendations for maintaining their EV’s battery.

The columns in the dataset are:

1) type : Indicates the type of test performed on the battery. Values include:

(a) discharge: A discharge test was performed on the battery.

- A discharge test measures how much charge (capacity) a battery can deliver when it is drained under controlled conditions.

- During this test, the battery is discharged at a specific rate. The amount of electrical charge (in Amp-hours) that the battery can deliver is measured. This measurement directly represents the battery's capacity.

- Only the 'Capacity' column is recorded in this test, because capacity can only be measured while the battery is actually delivering power, i.e. being discharged.

(b) charge: A charge test was performed on the battery.

- A charge cycle is when the battery is being recharged by receiving electrical energy from a power source. During this process, the battery is receiving power rather than delivering it and the primary focus is on safely restoring the battery's energy. The charge cycle data is valuable for tracking battery health.

- None of the three parameters (Capacity, Re, Rct) is recorded during charging. Capacity can only be measured during discharge, when the battery is delivering power. Re and Rct require specific impedance testing conditions that would interfere with the charging process, and the charging process itself introduces electrical noise and varying current flows that would make these measurements inaccurate.

(c) impedance: An impedance test measures the battery's internal resistance characteristics by applying small AC (alternating current) signals.

- During this test, Re (Ohmic Resistance) measures the battery's pure electrical resistance to current flow and Rct (Charge Transfer Resistance) measures the resistance associated with chemical reactions at the electrode interfaces.

- The impedance test is crucial for assessing the battery's internal health, detecting degradation in the battery's components, identifying potential failure mechanisms, and evaluating the battery's power capability.

- Only Re and Rct are recorded during impedance tests, because these measurements require specific AC signals and frequency response analysis, and the test must be performed in a steady state (neither charging nor discharging).

2) start_time : The timestamp when the test started, represented as [year, month, day, hour, minute, seconds]

3) lifetime_in_hours : Lifetime of the Electric Vehicle (EV) battery in hours.

4) battery_id : Unique identifier for each battery (e.g., B0047, B0045, B0046, B0043).

5) test_id : Sequential test number for each battery, starting from 0.

6) uid : Unique identifier for each test across all batteries.

7) filename : Name of the CSV file containing the detailed measurements for that specific test.

8) Capacity : Battery capacity measurement in Amp-hours (Ah), only recorded during discharge cycles.

9) Re : Ohmic resistance (in ohms), only recorded during impedance tests.

10) Rct : Charge transfer resistance (in ohms), only recorded during impedance tests.
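
The discharge-test description above says that capacity is the charge delivered at a controlled rate, expressed in Amp-hours. As a rough sketch of that relationship, the function below integrates a current trace over time with the trapezoidal rule. The function name `capacity_ah` and the sample readings are hypothetical. The per-test CSV files may use different column names, units, or sign conventions for current.

```python
def capacity_ah(times_s, currents_a):
    """Integrate discharge current (A) over time (s) and return charge in Ah.

    Uses the trapezoidal rule and the magnitude of the current, on the
    assumption that discharge current may be logged with a negative sign.
    """
    if len(times_s) != len(currents_a):
        raise ValueError("times_s and currents_a must have the same length")
    amp_seconds = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        avg_current = 0.5 * (abs(currents_a[i]) + abs(currents_a[i - 1]))
        amp_seconds += avg_current * dt
    return amp_seconds / 3600.0  # 1 Ah = 3600 A*s


# Hypothetical trace: a constant 2 A discharge lasting 3000 seconds
times = [0, 1000, 2000, 3000]
currents = [-2.0, -2.0, -2.0, -2.0]
example_capacity = capacity_ah(times, currents)
print(f"{example_capacity:.3f} Ah")  # 6000 A*s / 3600, about 1.667 Ah
```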

In [1]:
# to start working on this project, we first need to import the necessary libraries

import pandas as pd # for dataframe creation, manipulation, and analysis

from tensorflow.keras.models import Sequential # import the Sequential model from Keras

from tensorflow.keras.layers import Dense, Dropout # import the Dense and Dropout layers from Keras

from sklearn.model_selection import train_test_split # import the train_test_split function from scikit-learn
# to split the data into training and testing sets

from sklearn.preprocessing import MinMaxScaler, LabelEncoder # import the MinMaxScaler and LabelEncoder classes from scikit-learn
# MinMaxScaler scales numerical features to the range [0,1]; LabelEncoder encodes categorical values as integers

In [2]:
df = pd.read_csv("metadata.csv") # create a dataframe out of the csv file

In [3]:
df.head(10) # checking if the dataframe was successfully created by looking at the first 10 rows

Unnamed: 0,type,start_time,lifetime_in_hours,battery_id,test_id,uid,filename,Capacity,Re,Rct
0,discharge,[2010. 7. 21. 15. 0. ...,9455,B0047,0,1,00001.csv,1.6743047446975208,,
1,impedance,[2010. 7. 21. 16. 53. ...,9384,B0047,1,2,00002.csv,,0.0560578334388809,0.2009701658445833
2,charge,[2010. 7. 21. 17. 25. ...,9672,B0047,2,3,00003.csv,,,
3,impedance,[2010 7 21 20 31 5],9616,B0047,3,4,00004.csv,,0.053191858509211,0.1647339991486473
4,discharge,[2.0100e+03 7.0000e+00 2.1000e+01 2.1000e+01 2...,9515,B0047,4,5,00005.csv,1.5243662105099025,,
5,charge,[2010. 7. 21. 22. 38. ...,9410,B0047,5,6,00006.csv,,,
6,discharge,[2.010e+03 7.000e+00 2.200e+01 1.000e+00 4.000...,9322,B0047,6,7,00007.csv,1.5080762969973425,,
7,charge,[2010. 7. 22. 3. 14. ...,8935,B0047,7,8,00008.csv,,,
8,discharge,[2010. 7. 22. 6. 16. ...,8167,B0047,8,9,00009.csv,1.4835577960067696,,
9,charge,[2010. 7. 22. 7. 50. ...,9578,B0047,9,10,00010.csv,,,


In [4]:
df = df.drop(columns = ['start_time','battery_id','test_id','uid','filename']) # remove unnecessary columns from the dataframe

In [5]:
df.head(10) # checking the dataframe after removing unnecessary columns

Unnamed: 0,type,lifetime_in_hours,Capacity,Re,Rct
0,discharge,9455,1.6743047446975208,,
1,impedance,9384,,0.0560578334388809,0.2009701658445833
2,charge,9672,,,
3,impedance,9616,,0.053191858509211,0.1647339991486473
4,discharge,9515,1.5243662105099025,,
5,charge,9410,,,
6,discharge,9322,1.5080762969973425,,
7,charge,8935,,,
8,discharge,8167,1.4835577960067696,,
9,charge,9578,,,
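
In the rows shown above, each test type fills a different subset of the measurement columns. One way to check that pattern over a whole table is to count the non-missing values per test type. The sketch below uses a tiny hand-made frame (values rounded from the first rows above) in place of the real `metadata.csv`, so the counts are illustrative only.

```python
import pandas as pd

# Tiny stand-in for the metadata table, with values rounded from the rows shown above
sample = pd.DataFrame({
    "type": ["discharge", "impedance", "charge", "impedance", "discharge"],
    "Capacity": [1.6743, None, None, None, 1.5244],
    "Re": [None, 0.0561, None, 0.0532, None],
    "Rct": [None, 0.2010, None, 0.1647, None],
})

# count() ignores missing values, so this shows which columns each test type fills
coverage = sample.groupby("type")[["Capacity", "Re", "Rct"]].count()
print(coverage)
```

In this sample, discharge rows only fill Capacity, impedance rows only fill Re and Rct, and charge rows fill none of the three.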


In [6]:
# values of columns 'Capacity', 'Re' and 'Rct' in the dataframe may be stored as strings
# convert them to numeric form; errors = 'coerce' turns any value that cannot be parsed into NaN (missing)

df['Re'] = pd.to_numeric(df['Re'], errors = 'coerce')
df['Rct'] = pd.to_numeric(df['Rct'], errors = 'coerce')
df['Capacity'] = pd.to_numeric(df['Capacity'], errors = 'coerce')

# replace missing values of columns 'Capacity', 'Re' and 'Rct' with the mean of the respective column,
# as the model cannot handle missing values; note that the mean is sensitive to extreme values in a column

df['Re'] = df['Re'].fillna(df['Re'].mean())
df['Rct'] = df['Rct'].fillna(df['Rct'].mean())
df['Capacity'] = df['Capacity'].fillna(df['Capacity'].mean())

In [7]:
df.head(10) # checking if the dataframe was manipulated properly by checking its first 10 rows

Unnamed: 0,type,lifetime_in_hours,Capacity,Re,Rct
0,discharge,9455,1.674305,-497650000000.0,1055903000000.0
1,impedance,9384,1.326543,0.05605783,0.2009702
2,charge,9672,1.326543,-497650000000.0,1055903000000.0
3,impedance,9616,1.326543,0.05319186,0.164734
4,discharge,9515,1.524366,-497650000000.0,1055903000000.0
5,charge,9410,1.326543,-497650000000.0,1055903000000.0
6,discharge,9322,1.508076,-497650000000.0,1055903000000.0
7,charge,8935,1.326543,-497650000000.0,1055903000000.0
8,discharge,8167,1.483558,-497650000000.0,1055903000000.0
9,charge,9578,1.326543,-497650000000.0,1055903000000.0
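
The imputed Re and Rct values in the output above (about -4.98e11 and 1.06e12) are far outside the range of the measured values (around 0.05 and 0.2 ohms). That suggests a few extreme entries dominate the column means. The sketch below, on made-up numbers, shows how a single extreme value drags the mean while the median stays near the typical readings. Whether the median, outlier removal, or another strategy is right for this dataset is a judgment call this notebook does not settle.

```python
import pandas as pd

# Made-up Re values: three plausible readings, one extreme reading, two missing
re_values = pd.Series([0.056, 0.053, 0.058, -2.0e12, None, None])

mean_fill = re_values.mean()      # dragged far away by the single extreme value
median_fill = re_values.median()  # stays close to the typical readings

# Fill the gaps with the median instead of the mean
filled_with_median = re_values.fillna(median_fill)

print(f"mean   = {mean_fill:.4g}")
print(f"median = {median_fill:.4g}")
```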


In [8]:
label_encoder = LabelEncoder() # create an object of the class LabelEncoder

df['type'] = label_encoder.fit_transform(df['type']) # use this object's fit_transform method to encode the values
# of the column 'type' into numerical values as machine learning models can work only on numerical values

In [9]:
df.head(10) # checking if the encoding was successful by checking the first 10 rows of the dataframe

Unnamed: 0,type,lifetime_in_hours,Capacity,Re,Rct
0,1,9455,1.674305,-497650000000.0,1055903000000.0
1,2,9384,1.326543,0.05605783,0.2009702
2,0,9672,1.326543,-497650000000.0,1055903000000.0
3,2,9616,1.326543,0.05319186,0.164734
4,1,9515,1.524366,-497650000000.0,1055903000000.0
5,0,9410,1.326543,-497650000000.0,1055903000000.0
6,1,9322,1.508076,-497650000000.0,1055903000000.0
7,0,8935,1.326543,-497650000000.0,1055903000000.0
8,1,8167,1.483558,-497650000000.0,1055903000000.0
9,0,9578,1.326543,-497650000000.0,1055903000000.0


As we can see, 'charge', 'discharge' and 'impedance', the distinct values of column 'type', are encoded as 0, 1 and 2 respectively.
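
The codes come out in this order because LabelEncoder sorts the distinct labels before numbering them. The pure-Python sketch below mirrors that behavior on a short hand-written list. In the notebook itself, the fitted order can be read from `label_encoder.classes_`.

```python
# Mirror of the encoding rule: sort the distinct labels, then number them from 0
labels = ["discharge", "impedance", "charge", "impedance", "discharge"]

classes = sorted(set(labels))
mapping = {name: code for code, name in enumerate(classes)}
encoded = [mapping[name] for name in labels]

print(mapping)
print(encoded)
```

Integer codes imply an ordering (charge < discharge < impedance) that has no physical meaning, so one-hot encoding is a common alternative for inputs like this.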

Now that we have a dataframe containing only numeric values, we can use it for training our model.

In [None]:
X = df.drop(columns=['lifetime_in_hours']) # all columns except 'lifetime_in_hours' would be the input features
y = df['lifetime_in_hours'] # 'lifetime_in_hours' would be the target variable as this is what we want to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42) # split the data into 80% training and 20% testing sets

In [11]:
# we can check the dimensions of testing and training data
print(f'X_train shape: {X_train.shape}')
print(f'X_test shape: {X_test.shape}')

X_train shape: (6052, 4)
X_test shape: (1513, 4)


In [12]:
# now we have to normalize the input features

scaler = MinMaxScaler() # create an object of the MinMaxScaler class

# normalize training and testing parts of input features
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [13]:
# checking if the normalization was successful by checking the first 5 rows of the scaled arrays
print(f'Scaled X_test: \n{X_test_scaled[:5]}')
print(f'Scaled X_train: \n{X_train_scaled[:5]}')

Scaled X_test: 
[[0.5        0.         0.         1.        ]
 [0.5        0.63823903 0.         1.        ]
 [0.         0.50245012 0.         1.        ]
 [0.5        0.38080339 0.         1.        ]
 [0.5        0.38936986 0.         1.        ]]
Scaled X_train: 
[[1.00000000e+00 5.02450122e-01 9.99999999e-01 1.98111482e-10]
 [5.00000000e-01 4.55998175e-01 0.00000000e+00 1.00000000e+00]
 [5.00000000e-01 3.60527766e-02 0.00000000e+00 1.00000000e+00]
 [0.00000000e+00 5.02450122e-01 0.00000000e+00 1.00000000e+00]
 [5.00000000e-01 5.35874070e-01 0.00000000e+00 1.00000000e+00]]
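
Min-max scaling maps each column with `(x - min) / (max - min)`, where min and max come from the training data only. This is why the encoded test types 0, 1 and 2 appear as 0, 0.5 and 1 in the first column above. The sketch below reproduces the formula in plain Python. The helper names and the capacity values are made up for illustration.

```python
def fit_min_max(train_column):
    """Learn the scaling range from the training values only."""
    return min(train_column), max(train_column)


def apply_min_max(values, lo, hi):
    """Apply (x - min) / (max - min); assumes hi > lo."""
    return [(v - lo) / (hi - lo) for v in values]


# Encoded test types seen in training: 0 = charge, 1 = discharge, 2 = impedance
lo, hi = fit_min_max([0, 1, 2, 1, 0])
scaled_types = apply_min_max([0, 1, 2], lo, hi)

# Made-up capacities: a value outside the training range scales outside [0, 1]
lo_c, hi_c = fit_min_max([1.2, 1.5, 1.6])
scaled_unseen = apply_min_max([1.7], lo_c, hi_c)

print(scaled_types)
print(scaled_unseen)
```

Because the range is learned from the training set, new inputs can scale below 0 or above 1. By default, `MinMaxScaler` does not clip such values.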


In [15]:
# now we have to create an ANN model that will be trained on the training data and evaluated on the testing data

model = Sequential() # create an empty sequential model from keras

# add a dense layer with 64 units, relu activation function and an input shape of (X_train_scaled.shape[1],), i.e. 4 input features
model.add(Dense(units=64, activation='relu', input_shape=(X_train_scaled.shape[1],)))

model.add(Dropout(0.2)) # add a 20% dropout layer to prevent overfitting

model.add(Dense(units=32, activation='relu')) # add another dense layer with 32 units and relu activation function

model.add(Dropout(0.2)) # add another 20% dropout layer to prevent overfitting

model.add(Dense(units=1, activation='linear')) # add the output layer with 1 unit and linear activation function as this is a regression problem

model.compile(optimizer='adam', loss='mean_squared_error') # compile the model with adam optimizer and mean squared error loss function
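
The Dropout layers above randomly zero 20% of the activations during training and scale the surviving ones by 1 / (1 - rate), so the expected activation stays the same. At prediction time they pass values through unchanged. The plain-Python sketch below illustrates that rule. The function name `inverted_dropout` and the inputs are hypothetical, and this is not the Keras implementation.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale survivors by 1/(1-rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
activations = [1.0] * 10000   # made-up layer output
dropped = inverted_dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)

print(f"fraction zeroed: {zero_fraction:.3f}")
print(f"mean activation: {mean_activation:.3f}")
```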

In [16]:
model.summary() # check the summary of the model
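
The summary output is not shown here, but the trainable parameter counts it reports can be worked out by hand. A Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below applies that rule to the 4-64-32-1 architecture defined above. Dropout layers add no parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 4 input features -> Dense(64) -> Dense(32) -> Dense(1)
layer_sizes = [4, 64, 32, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [320, 2080, 33]
print(total_params)  # 2433
```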

In [17]:
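# train the model for 150 epochs (full passes over the training data), updating the weights after every batch of 32 samples
# the test set is passed as validation data, so val_loss is computed on the test set after each epoch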
history = model.fit(X_train_scaled, y_train, epochs=150, batch_size=32, validation_data = (X_test_scaled, y_test))

Epoch 1/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 81084800.0000 - val_loss: 79911864.0000
Epoch 2/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 77568304.0000 - val_loss: 76912072.0000
Epoch 3/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 61702336.0000 - val_loss: 98774216.0000
Epoch 4/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 33384252.0000 - val_loss: 210316272.0000
Epoch 5/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 11059217.0000 - val_loss: 551672832.0000
Epoch 6/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 3721417.7500 - val_loss: 1304221056.0000
Epoch 7/150
190/190 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 2841214.5000 - val_loss: 2376788224.0000
Epoch 8/150
190/190 ━━━━━━━━━━━━━━━━━━

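
The loss values in the log above are mean squared errors, so their unit is hours squared. Taking the square root gives a root-mean-squared error in hours, which is easier to compare with the target column. The sketch below does this for three values copied from the log. In epoch 7 the training error is roughly 1,700 hours while the validation error is roughly 49,000 hours, so the two have diverged sharply over these epochs. The log alone does not show the cause of that gap.

```python
import math

# Loss values copied from the training log above (mean squared error, hours^2)
epoch1_train_mse = 81084800.0
epoch7_train_mse = 2841214.5
epoch7_val_mse = 2376788224.0

# Square roots convert hours^2 back to hours
epoch1_train_rmse = math.sqrt(epoch1_train_mse)
epoch7_train_rmse = math.sqrt(epoch7_train_mse)
epoch7_val_rmse = math.sqrt(epoch7_val_mse)

print(f"epoch 1 train RMSE: {epoch1_train_rmse:.0f} hours")
print(f"epoch 7 train RMSE: {epoch7_train_rmse:.0f} hours")
print(f"epoch 7 val RMSE:   {epoch7_val_rmse:.0f} hours")
```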
In [23]:
# now create a function that will predict battery lifetime of an EV using the model trained above

import numpy as np

def predict_battery_life(type_discharge, Capacity, Re, Rct, label_encoder, scaler, model):

    # encode the test type ('charge', 'discharge' or 'impedance') with the fitted label encoder
    type_discharge_encoded = label_encoder.transform([type_discharge])[0]

    # create a 2-D array of input features, in the same column order used for training
    X_input = np.array([[type_discharge_encoded, Capacity, Re, Rct]])

    # scale the input features with the scaler fitted on the training data
    X_input_scaled = scaler.transform(X_input)

    # predict the battery life using the trained model
    predicted_battery_life = model.predict(X_input_scaled)

    # model.predict returns a 2-D array of shape (1, 1), so return its first row (a 1-element array)
    return predicted_battery_life[0]

In [24]:
# checking the function for custom input

type_discharge = 'discharge'
Capacity = 1.674305
Re = -4.976500e+11
Rct = 1.055903e+12

predicted_battery_life = predict_battery_life(type_discharge, Capacity, Re, Rct, label_encoder, scaler, model)

print(f"Predicted Battery Life: {predicted_battery_life}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 82ms/step
Predicted Battery Life: [8832.261]




In [25]:
# save the model, scaler, and label encoder using the pickle module so they can be reloaded later

import pickle

with open('battery_life_model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)

with open('scaler.pkl', 'wb') as scaler_file:
    pickle.dump(scaler, scaler_file)

with open('label_encoder.pkl', 'wb') as le_file:
    pickle.dump(label_encoder, le_file)



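# also save the model in Keras's HDF5 format, which can be reloaded with tensorflow.keras.models.load_model
model.save("battery_life_model.h5")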

