## Charging Time Prediction for Battery Electric Vehicles (BEVs) using Time Series Methods

## Project Overview
Battery Electric Vehicles (BEVs) play a significant role in reducing energy consumption and air pollution, offering a cleaner alternative to conventional internal combustion engine vehicles. Despite these advantages, BEVs are held back by limited driving range and long charging durations. Both contribute to range anxiety, which significantly affects user adoption and satisfaction. Accurately predicting charging time from real-world data can help drivers plan trips and reduce that anxiety. This project aims to develop a predictive model that improves the estimation of BEV charging times by leveraging actual operational data from BEVs.

In [71]:
## Important libraries to import for data analysis and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline


### Loading the data set 
### Basic data analysis
### Creating the copy of the dataset (df_copy)

In [72]:
df = pd.read_csv(r"../ACN_Data_Session/acndata_sessions.csv")
df_copy = df.copy() ## generated a copy of the original dataframe (df_copy)

In [73]:
df_copy.head(5)

Unnamed: 0,_id,clusterID,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,sessionID,siteID,spaceID,stationID,timezone,userID,userInput_WhPerMile,userInput_kWhRequested,userInput_milesRequested,userInput_minutesAvailable,userInput_modifiedAt,userInput_paymentRequired,userInput_requestedDeparture,userInput_userID
0,5bc90cb9f9af8b0d7fe77cd2,39,"Wed, 25 Apr 2018 11:08:04 GMT","Wed, 25 Apr 2018 13:20:10 GMT","Wed, 25 Apr 2018 13:21:10 GMT",7.932,2_39_78_362_2018-04-25 11:08:04.400812,2,CA-496,2-39-78-362,America/Los_Angeles,,,,,,,,,
1,5bc90cb9f9af8b0d7fe77cd3,39,"Wed, 25 Apr 2018 13:45:10 GMT","Thu, 26 Apr 2018 00:56:16 GMT","Wed, 25 Apr 2018 16:44:15 GMT",10.013,2_39_95_27_2018-04-25 13:45:09.617470,2,CA-319,2-39-95-27,America/Los_Angeles,,,,,,,,,
2,5bc90cb9f9af8b0d7fe77cd4,39,"Wed, 25 Apr 2018 13:45:50 GMT","Wed, 25 Apr 2018 23:04:45 GMT","Wed, 25 Apr 2018 14:51:44 GMT",5.257,2_39_79_380_2018-04-25 13:45:49.962001,2,CA-489,2-39-79-380,America/Los_Angeles,,,,,,,,,
3,5bc90cb9f9af8b0d7fe77cd5,39,"Wed, 25 Apr 2018 14:37:06 GMT","Wed, 25 Apr 2018 23:55:34 GMT","Wed, 25 Apr 2018 16:05:22 GMT",5.177,2_39_79_379_2018-04-25 14:37:06.460772,2,CA-327,2-39-79-379,America/Los_Angeles,,,,,,,,,
4,5bc90cb9f9af8b0d7fe77cd6,39,"Wed, 25 Apr 2018 14:40:34 GMT","Wed, 25 Apr 2018 23:03:12 GMT","Wed, 25 Apr 2018 17:40:30 GMT",10.119,2_39_79_381_2018-04-25 14:40:33.638896,2,CA-490,2-39-79-381,America/Los_Angeles,,,,,,,,,


In [74]:
df_copy.tail(10)

Unnamed: 0,_id,clusterID,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,sessionID,siteID,spaceID,stationID,timezone,userID,userInput_WhPerMile,userInput_kWhRequested,userInput_milesRequested,userInput_minutesAvailable,userInput_modifiedAt,userInput_paymentRequired,userInput_requestedDeparture,userInput_userID
14274,5c2e89e7f9af8b13dab07967,39,"Wed, 28 Nov 2018 15:49:52 GMT","Thu, 29 Nov 2018 00:13:34 GMT","Wed, 28 Nov 2018 17:30:29 GMT",6.791,2_39_79_383_2018-11-28 15:49:52.423434,2,CA-492,2-39-79-383,America/Los_Angeles,1001.0,240.0,19.2,80.0,307.0,"Wed, 28 Nov 2018 15:50:04 GMT",True,"Wed, 28 Nov 2018 20:56:52 GMT",1001.0
14275,5c2e89e7f9af8b13dab07968,39,"Wed, 28 Nov 2018 16:12:43 GMT","Thu, 29 Nov 2018 01:07:03 GMT","Wed, 28 Nov 2018 21:26:06 GMT",13.04,2_39_78_367_2018-11-28 16:12:43.045712,2,CA-494,2-39-78-367,America/Los_Angeles,671.0,364.0,18.2,50.0,276.0,"Wed, 28 Nov 2018 16:12:54 GMT",True,"Wed, 28 Nov 2018 20:48:43 GMT",671.0
14276,5c2e89e7f9af8b13dab07969,39,"Wed, 28 Nov 2018 16:14:47 GMT","Thu, 29 Nov 2018 01:45:46 GMT","Wed, 28 Nov 2018 21:36:52 GMT",14.211,2_39_78_366_2018-11-28 16:14:47.025374,2,CA-323,2-39-78-366,America/Los_Angeles,22.0,350.0,17.5,50.0,557.0,"Wed, 28 Nov 2018 16:15:24 GMT",True,"Thu, 29 Nov 2018 01:31:47 GMT",22.0
14277,5c2e89e7f9af8b13dab0796a,39,"Wed, 28 Nov 2018 16:17:55 GMT","Wed, 28 Nov 2018 21:38:22 GMT","Wed, 28 Nov 2018 18:42:20 GMT",15.816,2_39_91_441_2018-11-28 16:17:55.157777,2,CA-499,2-39-91-441,America/Los_Angeles,234.0,250.0,20.0,80.0,522.0,"Wed, 28 Nov 2018 16:18:23 GMT",True,"Thu, 29 Nov 2018 00:59:55 GMT",234.0
14278,5c2e89e7f9af8b13dab0796b,39,"Wed, 28 Nov 2018 16:24:51 GMT","Wed, 28 Nov 2018 23:14:45 GMT","Wed, 28 Nov 2018 18:35:09 GMT",12.808,2_39_131_30_2018-11-28 16:24:51.171005,2,CA-305,2-39-131-30,America/Los_Angeles,1153.0,500.0,20.0,40.0,403.0,"Wed, 28 Nov 2018 16:24:57 GMT",True,"Wed, 28 Nov 2018 23:07:51 GMT",1153.0
14279,5c2e89e7f9af8b13dab0796c,39,"Wed, 28 Nov 2018 16:25:34 GMT","Wed, 28 Nov 2018 23:14:56 GMT","Wed, 28 Nov 2018 18:32:27 GMT",12.08,2_39_123_23_2018-11-28 16:25:33.851774,2,CA-313,2-39-123-23,America/Los_Angeles,68.0,200.0,14.0,70.0,389.0,"Wed, 28 Nov 2018 16:26:09 GMT",True,"Wed, 28 Nov 2018 22:54:34 GMT",68.0
14280,5c2e89e7f9af8b13dab0796d,39,"Wed, 28 Nov 2018 16:33:36 GMT","Wed, 28 Nov 2018 18:26:34 GMT","Wed, 28 Nov 2018 18:07:48 GMT",0.874,2_39_78_361_2018-11-28 16:33:35.686051,2,CA-493,2-39-78-361,America/Los_Angeles,,,,,,,,,
14281,5c2e89e7f9af8b13dab0796e,39,"Wed, 28 Nov 2018 16:35:56 GMT","Thu, 29 Nov 2018 01:32:58 GMT","Wed, 28 Nov 2018 19:43:06 GMT",13.07,2_39_139_28_2018-11-28 16:35:55.622452,2,CA-303,2-39-139-28,America/Los_Angeles,559.0,313.0,15.65,50.0,536.0,"Wed, 28 Nov 2018 16:36:27 GMT",True,"Thu, 29 Nov 2018 01:31:56 GMT",559.0
14282,5c2e89e7f9af8b13dab0796f,39,"Wed, 28 Nov 2018 16:38:38 GMT","Thu, 29 Nov 2018 01:39:18 GMT","Wed, 28 Nov 2018 18:48:01 GMT",13.844,2_39_89_25_2018-11-28 16:38:38.200344,2,CA-315,2-39-89-25,America/Los_Angeles,945.0,600.0,30.0,50.0,394.0,"Wed, 28 Nov 2018 16:38:50 GMT",True,"Wed, 28 Nov 2018 23:12:38 GMT",945.0
14283,5c2e89e7f9af8b13dab07970,39,"Wed, 28 Nov 2018 16:44:28 GMT","Thu, 29 Nov 2018 02:27:17 GMT","Wed, 28 Nov 2018 18:53:30 GMT",6.331,2_39_129_17_2018-11-28 16:44:27.728475,2,CA-307,2-39-129-17,America/Los_Angeles,712.0,400.0,8.0,20.0,509.0,"Wed, 28 Nov 2018 16:44:44 GMT",True,"Thu, 29 Nov 2018 01:13:28 GMT",712.0


In [75]:
df_copy.shape,df_copy.size,df_copy.columns

((14284, 20),
 285680,
 Index(['_id', 'clusterID', 'connectionTime', 'disconnectTime',
        'doneChargingTime', 'kWhDelivered', 'sessionID', 'siteID', 'spaceID',
        'stationID', 'timezone', 'userID', 'userInput_WhPerMile',
        'userInput_kWhRequested', 'userInput_milesRequested',
        'userInput_minutesAvailable', 'userInput_modifiedAt',
        'userInput_paymentRequired', 'userInput_requestedDeparture',
        'userInput_userID'],
       dtype='object'))

In [76]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14284 entries, 0 to 14283
Data columns (total 20 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   _id                           14284 non-null  object 
 1   clusterID                     14284 non-null  int64  
 2   connectionTime                14284 non-null  object 
 3   disconnectTime                14284 non-null  object 
 4   doneChargingTime              14278 non-null  object 
 5   kWhDelivered                  14284 non-null  float64
 6   sessionID                     14284 non-null  object 
 7   siteID                        14284 non-null  int64  
 8   spaceID                       14284 non-null  object 
 9   stationID                     14284 non-null  object 
 10  timezone                      14284 non-null  object 
 11  userID                        1626 non-null   float64
 12  userInput_WhPerMile           1626 non-null   float64
 13  u

In [77]:
df_copy.describe()

Unnamed: 0,clusterID,kWhDelivered,siteID,userID,userInput_WhPerMile,userInput_kWhRequested,userInput_milesRequested,userInput_minutesAvailable,userInput_userID
count,14284.0,14284.0,14284.0,1626.0,1626.0,1626.0,1626.0,1626.0,1626.0
mean,39.0,8.981434,2.0,554.303198,340.129766,22.829576,73.060271,337.647601,554.303198
std,0.0,6.922619,0.0,354.928857,98.9051,18.423266,61.504842,197.059946,354.928857
min,39.0,0.501,2.0,1.0,200.0,2.25,10.0,1.0,1.0
25%,39.0,4.03675,2.0,248.0,250.0,9.99,30.0,160.25,248.0
50%,39.0,7.4735,2.0,560.0,325.0,18.2,50.0,322.0,560.0
75%,39.0,13.204,2.0,754.75,400.0,28.0,100.0,515.75,754.75
max,39.0,69.373,2.0,1359.0,600.0,132.0,300.0,1078.0,1359.0


In [78]:
df_copy.shape

(14284, 20)

In [79]:
## Group the columns by feature type
num_features = [feature for feature in df.columns if df[feature].dtype != 'O']
print('Num of Numerical Features :', len(num_features))
cat_features = [feature for feature in df.columns if df[feature].dtype == 'O']
print('Num of Categorical Features :', len(cat_features))
## Numerical columns with at most 25 distinct values are treated as discrete
discrete_features = [feature for feature in num_features if len(df[feature].unique()) <= 25]
print('Num of Discrete Features :', len(discrete_features))
continuous_features = [feature for feature in num_features if feature not in discrete_features]
print('Num of Continuous Features :', len(continuous_features))
## Discrete features + continuous features = numerical features

Num of Numerical Features : 9
Num of Categorical Features : 11
Num of Discrete Features : 2
Num of Continuous Features : 7


In [80]:
## get all the numeric features
num_features = [feature for feature in df.columns if df[feature].dtype != 'O']
print('Num of Numerical Features :', len(num_features))

Num of Numerical Features : 9


In [81]:
## Discrete features (a subset of the numerical features with at most 25 distinct values)
discrete_features=[feature for feature in num_features if len(df[feature].unique())<=25]
print('Num of Discrete Features :',len(discrete_features))

Num of Discrete Features : 2


In [82]:
discrete_features

['clusterID', 'siteID']

In [83]:
## Continuous features: numerical features with more than 25 distinct values
continuous_features=[feature for feature in num_features if len(df[feature].unique())>25]
print('Num of  continuous Features :',len(continuous_features))

Num of  continuous Features : 7


In [84]:
## get all the categorical (object dtype) features
categorical_features = [feature for feature in df.columns if df[feature].dtype == 'O']
print('Num of categorical Features :', len(categorical_features))

Num of categorical Features : 11


In [85]:
categorical_features = df_copy.select_dtypes(include=['object']).columns.tolist()
numerical_features = df_copy.select_dtypes(include=['float64','int64']).columns.tolist()
yes_no_features = df_copy.select_dtypes(include=['bool']).columns.tolist()
print(df_copy.columns)
print(len(df_copy.columns))
print(categorical_features)
print(len(categorical_features))
print(numerical_features)
print(len(numerical_features))
print(yes_no_features)
print(len(yes_no_features))


Index(['_id', 'clusterID', 'connectionTime', 'disconnectTime',
       'doneChargingTime', 'kWhDelivered', 'sessionID', 'siteID', 'spaceID',
       'stationID', 'timezone', 'userID', 'userInput_WhPerMile',
       'userInput_kWhRequested', 'userInput_milesRequested',
       'userInput_minutesAvailable', 'userInput_modifiedAt',
       'userInput_paymentRequired', 'userInput_requestedDeparture',
       'userInput_userID'],
      dtype='object')
20
['_id', 'connectionTime', 'disconnectTime', 'doneChargingTime', 'sessionID', 'spaceID', 'stationID', 'timezone', 'userInput_modifiedAt', 'userInput_paymentRequired', 'userInput_requestedDeparture']
11
['clusterID', 'kWhDelivered', 'siteID', 'userID', 'userInput_WhPerMile', 'userInput_kWhRequested', 'userInput_milesRequested', 'userInput_minutesAvailable', 'userInput_userID']
9
[]
0


In [86]:
df_copy.isnull().mean()

_id                             0.000000
clusterID                       0.000000
connectionTime                  0.000000
disconnectTime                  0.000000
doneChargingTime                0.000420
kWhDelivered                    0.000000
sessionID                       0.000000
siteID                          0.000000
spaceID                         0.000000
stationID                       0.000000
timezone                        0.000000
userID                          0.886166
userInput_WhPerMile             0.886166
userInput_kWhRequested          0.886166
userInput_milesRequested        0.886166
userInput_minutesAvailable      0.886166
userInput_modifiedAt            0.886166
userInput_paymentRequired       0.886166
userInput_requestedDeparture    0.886166
userInput_userID                0.886166
dtype: float64

In [87]:
## Total number of null values across all columns
df_copy.isnull().sum().sum()

113928

In [88]:
df_copy.isnull().sum()

_id                                 0
clusterID                           0
connectionTime                      0
disconnectTime                      0
doneChargingTime                    6
kWhDelivered                        0
sessionID                           0
siteID                              0
spaceID                             0
stationID                           0
timezone                            0
userID                          12658
userInput_WhPerMile             12658
userInput_kWhRequested          12658
userInput_milesRequested        12658
userInput_minutesAvailable      12658
userInput_modifiedAt            12658
userInput_paymentRequired       12658
userInput_requestedDeparture    12658
userInput_userID                12658
dtype: int64

In [89]:
## Check whether any fully duplicated rows are present
df[df.duplicated()]

Unnamed: 0,_id,clusterID,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,sessionID,siteID,spaceID,stationID,timezone,userID,userInput_WhPerMile,userInput_kWhRequested,userInput_milesRequested,userInput_minutesAvailable,userInput_modifiedAt,userInput_paymentRequired,userInput_requestedDeparture,userInput_userID


In [90]:
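## Note: indexing with the boolean frame df.isnull() keeps the full shape and masks every non-null entry, so the result is all NaN
df[df.isnull()]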

Unnamed: 0,_id,clusterID,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,sessionID,siteID,spaceID,stationID,timezone,userID,userInput_WhPerMile,userInput_kWhRequested,userInput_milesRequested,userInput_minutesAvailable,userInput_modifiedAt,userInput_paymentRequired,userInput_requestedDeparture,userInput_userID
0,,,,,,,,,,,,,,,,,,,,
1,,,,,,,,,,,,,,,,,,,,
2,,,,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14279,,,,,,,,,,,,,,,,,,,,
14280,,,,,,,,,,,,,,,,,,,,
14281,,,,,,,,,,,,,,,,,,,,
14282,,,,,,,,,,,,,,,,,,,,
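
Indexing a DataFrame with a boolean DataFrame keeps only the entries where the mask is True, and those entries are the missing ones, which is why the table above contains nothing but NaN. To list the rows that contain missing values, a row-level mask is usually more useful. The following is a minimal sketch on a made-up frame (the `toy` frame and its values are illustrative assumptions, not rows from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names borrowed from the dataset, values made up.
toy = pd.DataFrame({
    "kWhDelivered": [7.9, 10.0, 5.3],
    "doneChargingTime": ["2018-04-25 13:21:10", None, "2018-04-25 14:51:44"],
    "userID": [np.nan, 22.0, 68.0],
})

# Row mask: True where at least one column in the row is missing.
rows_with_nulls = toy[toy.isnull().any(axis=1)]

# Rows where one specific column is missing.
missing_done = toy[toy["doneChargingTime"].isnull()]

print(rows_with_nulls.index.tolist(), missing_done.index.tolist())
```

The same pattern with `df_copy['doneChargingTime'].isnull()` is what the next cell uses to count the sessions without a charging end time.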


In [91]:
null_rows = df_copy[df_copy['doneChargingTime'].isnull()]
print(len(null_rows))


6


In [92]:
df_copy = df_copy.dropna(subset=['doneChargingTime'])


In [93]:
df_copy.isnull().sum()

_id                                 0
clusterID                           0
connectionTime                      0
disconnectTime                      0
doneChargingTime                    0
kWhDelivered                        0
sessionID                           0
siteID                              0
spaceID                             0
stationID                           0
timezone                            0
userID                          12652
userInput_WhPerMile             12652
userInput_kWhRequested          12652
userInput_milesRequested        12652
userInput_minutesAvailable      12652
userInput_modifiedAt            12652
userInput_paymentRequired       12652
userInput_requestedDeparture    12652
userInput_userID                12652
dtype: int64

In [94]:
df_copy.shape

(14278, 20)

In [95]:
df_copy.columns

Index(['_id', 'clusterID', 'connectionTime', 'disconnectTime',
       'doneChargingTime', 'kWhDelivered', 'sessionID', 'siteID', 'spaceID',
       'stationID', 'timezone', 'userID', 'userInput_WhPerMile',
       'userInput_kWhRequested', 'userInput_milesRequested',
       'userInput_minutesAvailable', 'userInput_modifiedAt',
       'userInput_paymentRequired', 'userInput_requestedDeparture',
       'userInput_userID'],
      dtype='object')

In [96]:
df_copy.isnull().sum()

_id                                 0
clusterID                           0
connectionTime                      0
disconnectTime                      0
doneChargingTime                    0
kWhDelivered                        0
sessionID                           0
siteID                              0
spaceID                             0
stationID                           0
timezone                            0
userID                          12652
userInput_WhPerMile             12652
userInput_kWhRequested          12652
userInput_milesRequested        12652
userInput_minutesAvailable      12652
userInput_modifiedAt            12652
userInput_paymentRequired       12652
userInput_requestedDeparture    12652
userInput_userID                12652
dtype: int64

In [97]:
df_copy.drop(columns=['_id','clusterID','sessionID','siteID','spaceID','stationID','timezone','userID','userInput_WhPerMile','userInput_kWhRequested','userInput_milesRequested','userInput_minutesAvailable','userInput_modifiedAt','userInput_paymentRequired','userInput_requestedDeparture','userInput_userID'], inplace=True)

In [98]:
df_copy.head()

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered
0,"Wed, 25 Apr 2018 11:08:04 GMT","Wed, 25 Apr 2018 13:20:10 GMT","Wed, 25 Apr 2018 13:21:10 GMT",7.932
1,"Wed, 25 Apr 2018 13:45:10 GMT","Thu, 26 Apr 2018 00:56:16 GMT","Wed, 25 Apr 2018 16:44:15 GMT",10.013
2,"Wed, 25 Apr 2018 13:45:50 GMT","Wed, 25 Apr 2018 23:04:45 GMT","Wed, 25 Apr 2018 14:51:44 GMT",5.257
3,"Wed, 25 Apr 2018 14:37:06 GMT","Wed, 25 Apr 2018 23:55:34 GMT","Wed, 25 Apr 2018 16:05:22 GMT",5.177
4,"Wed, 25 Apr 2018 14:40:34 GMT","Wed, 25 Apr 2018 23:03:12 GMT","Wed, 25 Apr 2018 17:40:30 GMT",10.119


In [99]:
df_copy.isnull().sum().sum()

0

In [100]:
## Reuse the duplicate mask of the original frame, aligned to the rows kept in df_copy
df_copy[df.duplicated().reindex(df_copy.index)]

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered


In [101]:
df_copy.shape

(14278, 4)

In [102]:
df_copy

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered
0,"Wed, 25 Apr 2018 11:08:04 GMT","Wed, 25 Apr 2018 13:20:10 GMT","Wed, 25 Apr 2018 13:21:10 GMT",7.932
1,"Wed, 25 Apr 2018 13:45:10 GMT","Thu, 26 Apr 2018 00:56:16 GMT","Wed, 25 Apr 2018 16:44:15 GMT",10.013
2,"Wed, 25 Apr 2018 13:45:50 GMT","Wed, 25 Apr 2018 23:04:45 GMT","Wed, 25 Apr 2018 14:51:44 GMT",5.257
3,"Wed, 25 Apr 2018 14:37:06 GMT","Wed, 25 Apr 2018 23:55:34 GMT","Wed, 25 Apr 2018 16:05:22 GMT",5.177
4,"Wed, 25 Apr 2018 14:40:34 GMT","Wed, 25 Apr 2018 23:03:12 GMT","Wed, 25 Apr 2018 17:40:30 GMT",10.119
...,...,...,...,...
14279,"Wed, 28 Nov 2018 16:25:34 GMT","Wed, 28 Nov 2018 23:14:56 GMT","Wed, 28 Nov 2018 18:32:27 GMT",12.080
14280,"Wed, 28 Nov 2018 16:33:36 GMT","Wed, 28 Nov 2018 18:26:34 GMT","Wed, 28 Nov 2018 18:07:48 GMT",0.874
14281,"Wed, 28 Nov 2018 16:35:56 GMT","Thu, 29 Nov 2018 01:32:58 GMT","Wed, 28 Nov 2018 19:43:06 GMT",13.070
14282,"Wed, 28 Nov 2018 16:38:38 GMT","Thu, 29 Nov 2018 01:39:18 GMT","Wed, 28 Nov 2018 18:48:01 GMT",13.844


In [103]:
# Parse the timestamp columns as datetimes
df_copy['connectionTime'] = pd.to_datetime(df_copy['connectionTime'])
df_copy['disconnectTime'] = pd.to_datetime(df_copy['disconnectTime'])
df_copy['doneChargingTime'] = pd.to_datetime(df_copy['doneChargingTime'])

In [104]:
df_copy

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932
1,2018-04-25 13:45:10,2018-04-26 00:56:16,2018-04-25 16:44:15,10.013
2,2018-04-25 13:45:50,2018-04-25 23:04:45,2018-04-25 14:51:44,5.257
3,2018-04-25 14:37:06,2018-04-25 23:55:34,2018-04-25 16:05:22,5.177
4,2018-04-25 14:40:34,2018-04-25 23:03:12,2018-04-25 17:40:30,10.119
...,...,...,...,...
14279,2018-11-28 16:25:34,2018-11-28 23:14:56,2018-11-28 18:32:27,12.080
14280,2018-11-28 16:33:36,2018-11-28 18:26:34,2018-11-28 18:07:48,0.874
14281,2018-11-28 16:35:56,2018-11-29 01:32:58,2018-11-28 19:43:06,13.070
14282,2018-11-28 16:38:38,2018-11-29 01:39:18,2018-11-28 18:48:01,13.844


In [105]:
# 1. ⚡ Charging Duration (in minutes)
df_copy['effective_charging_duration_min'] = (df_copy['doneChargingTime'] - df_copy['connectionTime']).dt.total_seconds() / 60
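
As a quick sanity check of the formula, here is a worked example for the first session in the preview (connected at 11:08:04, done charging at 13:21:10). It is only a sketch using two standalone timestamps; the variable names are illustrative.

```python
import pandas as pd

# Timestamps of the first session shown in the data preview.
connection = pd.Timestamp("2018-04-25 11:08:04")
done_charging = pd.Timestamp("2018-04-25 13:21:10")

# Timedelta -> seconds -> minutes, mirroring the column computation.
duration_min = (done_charging - connection).total_seconds() / 60
print(round(duration_min, 1))  # 133.1
```

Two hours, 13 minutes and 6 seconds is 7986 seconds, or 133.1 minutes, matching the value in the first row of the new column.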

In [106]:
df_copy.head(1)

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,effective_charging_duration_min
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932,133.1


In [107]:
# 2. ⌛ Idle Duration (post-charge, in minutes)
#df_copy['post_charge_duration_min'] = (df_copy['disconnectTime'] - df_copy['doneChargingTime']).dt.total_seconds() / 60

In [108]:
df_copy.head(2)

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,effective_charging_duration_min
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932,133.1
1,2018-04-25 13:45:10,2018-04-26 00:56:16,2018-04-25 16:44:15,10.013,179.083333


In [109]:
# 3. 🕓 Total Session Duration (from plug-in to unplug)
#df_copy['total_session_duration_min'] = (df_copy['disconnectTime'] - df_copy['connectionTime']).dt.total_seconds() / 60

In [110]:
df_copy.head(2)

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,effective_charging_duration_min
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932,133.1
1,2018-04-25 13:45:10,2018-04-26 00:56:16,2018-04-25 16:44:15,10.013,179.083333


In [111]:
# 4. 🧠 Efficiency: kWh delivered per minute of session
#df_copy['kWh_per_min_total_session'] = df_copy['kWhDelivered'] / df_copy['total_session_duration_min']

In [112]:
df_copy.head(1)

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,effective_charging_duration_min
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932,133.1


In [113]:
# 5. 🧠 Efficiency: kWh delivered per minute of charging duration
#df_copy['kWh_per_min_effective_charging'] = df_copy['kWhDelivered'] / df_copy['effective_charging_duration_min']

In [114]:
df_copy.head(2)

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,effective_charging_duration_min
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932,133.1
1,2018-04-25 13:45:10,2018-04-26 00:56:16,2018-04-25 16:44:15,10.013,179.083333


In [115]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14278 entries, 0 to 14283
Data columns (total 5 columns):
 #   Column                           Non-Null Count  Dtype         
---  ------                           --------------  -----         
 0   connectionTime                   14278 non-null  datetime64[ns]
 1   disconnectTime                   14278 non-null  datetime64[ns]
 2   doneChargingTime                 14278 non-null  datetime64[ns]
 3   kWhDelivered                     14278 non-null  float64       
 4   effective_charging_duration_min  14278 non-null  float64       
dtypes: datetime64[ns](3), float64(2)
memory usage: 669.3 KB


In [116]:
df_copy.head(1)

Unnamed: 0,connectionTime,disconnectTime,doneChargingTime,kWhDelivered,effective_charging_duration_min
0,2018-04-25 11:08:04,2018-04-25 13:20:10,2018-04-25 13:21:10,7.932,133.1


In [117]:
# connectionTime
df_copy['connectionTime_year'] = df_copy['connectionTime'].dt.year
df_copy['connectionTime_month'] = df_copy['connectionTime'].dt.month
df_copy['connectionTime_day'] = df_copy['connectionTime'].dt.day
df_copy['connectionTime_hour'] = df_copy['connectionTime'].dt.hour
df_copy['connectionTime_min'] = df_copy['connectionTime'].dt.minute
df_copy['connectionTime_sec'] = df_copy['connectionTime'].dt.second

# disconnectTime
df_copy['disconnectTime_year'] = df_copy['disconnectTime'].dt.year
df_copy['disconnectTime_month'] = df_copy['disconnectTime'].dt.month
df_copy['disconnectTime_day'] = df_copy['disconnectTime'].dt.day
df_copy['disconnectTime_hour'] = df_copy['disconnectTime'].dt.hour
df_copy['disconnectTime_min'] = df_copy['disconnectTime'].dt.minute
df_copy['disconnectTime_sec'] = df_copy['disconnectTime'].dt.second

# doneChargingTime
df_copy['doneChargingTime_year'] = df_copy['doneChargingTime'].dt.year
df_copy['doneChargingTime_month'] = df_copy['doneChargingTime'].dt.month
df_copy['doneChargingTime_day'] = df_copy['doneChargingTime'].dt.day
df_copy['doneChargingTime_hour'] = df_copy['doneChargingTime'].dt.hour
df_copy['doneChargingTime_min'] = df_copy['doneChargingTime'].dt.minute
df_copy['doneChargingTime_sec'] = df_copy['doneChargingTime'].dt.second
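
The eighteen assignments above follow one pattern, so they could also be generated in a loop. The helper below is a hypothetical sketch (the name `add_datetime_parts` is not part of the notebook) that produces the same column names for any list of datetime columns:

```python
import pandas as pd

def add_datetime_parts(frame, columns):
    """Return a copy of `frame` with year/month/day/hour/min/sec columns per datetime column."""
    out = frame.copy()
    # Column suffix -> attribute name on the pandas .dt accessor.
    parts = {"year": "year", "month": "month", "day": "day",
             "hour": "hour", "min": "minute", "sec": "second"}
    for col in columns:
        for suffix, attr in parts.items():
            out[f"{col}_{suffix}"] = getattr(out[col].dt, attr)
    return out

# Small demonstration on a single timestamp from the preview.
demo = pd.DataFrame({"connectionTime": pd.to_datetime(["2018-04-25 11:08:04"])})
demo = add_datetime_parts(demo, ["connectionTime"])
print(demo.drop(columns=["connectionTime"]).iloc[0].to_dict())
```

Calling it with `['connectionTime', 'disconnectTime', 'doneChargingTime']` would be expected to reproduce the columns created in the cell above.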


In [118]:
df_copy.drop(columns=['connectionTime', 'disconnectTime','doneChargingTime'], inplace=True)

In [119]:
df_copy

Unnamed: 0,kWhDelivered,effective_charging_duration_min,connectionTime_year,connectionTime_month,connectionTime_day,connectionTime_hour,connectionTime_min,connectionTime_sec,disconnectTime_year,disconnectTime_month,disconnectTime_day,disconnectTime_hour,disconnectTime_min,disconnectTime_sec,doneChargingTime_year,doneChargingTime_month,doneChargingTime_day,doneChargingTime_hour,doneChargingTime_min,doneChargingTime_sec
0,7.932,133.100000,2018,4,25,11,8,4,2018,4,25,13,20,10,2018,4,25,13,21,10
1,10.013,179.083333,2018,4,25,13,45,10,2018,4,26,0,56,16,2018,4,25,16,44,15
2,5.257,65.900000,2018,4,25,13,45,50,2018,4,25,23,4,45,2018,4,25,14,51,44
3,5.177,88.266667,2018,4,25,14,37,6,2018,4,25,23,55,34,2018,4,25,16,5,22
4,10.119,179.933333,2018,4,25,14,40,34,2018,4,25,23,3,12,2018,4,25,17,40,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14279,12.080,126.883333,2018,11,28,16,25,34,2018,11,28,23,14,56,2018,11,28,18,32,27
14280,0.874,94.200000,2018,11,28,16,33,36,2018,11,28,18,26,34,2018,11,28,18,7,48
14281,13.070,187.166667,2018,11,28,16,35,56,2018,11,29,1,32,58,2018,11,28,19,43,6
14282,13.844,129.383333,2018,11,28,16,38,38,2018,11,29,1,39,18,2018,11,28,18,48,1


In [120]:
## Independent variables (X) and target (y)
## Note: the connectionTime_* and doneChargingTime_* columns kept in X jointly determine the target
X = df_copy.drop(['effective_charging_duration_min'], axis=1)
y = df_copy[['effective_charging_duration_min']]

In [121]:
X.shape,y.shape

((14278, 19), (14278, 1))

In [122]:
##train test split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=42)

In [123]:
X_train.shape,y_train.shape,X_test.shape,y_test.shape

((11422, 19), (11422, 1), (2856, 19), (2856, 1))

In [124]:
X_train.columns

Index(['kWhDelivered', 'connectionTime_year', 'connectionTime_month',
       'connectionTime_day', 'connectionTime_hour', 'connectionTime_min',
       'connectionTime_sec', 'disconnectTime_year', 'disconnectTime_month',
       'disconnectTime_day', 'disconnectTime_hour', 'disconnectTime_min',
       'disconnectTime_sec', 'doneChargingTime_year', 'doneChargingTime_month',
       'doneChargingTime_day', 'doneChargingTime_hour', 'doneChargingTime_min',
       'doneChargingTime_sec'],
      dtype='object')

In [125]:
# Create a ColumnTransformer that standardizes the numeric feature and passes the other columns through
num_features= ['kWhDelivered']

from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

numeric_transformer = StandardScaler()


preprocessor = ColumnTransformer(
    [
        ("StandardScaler", numeric_transformer, num_features)
    ],remainder='passthrough'
    
)
#  'connectionTime_year',
#        'connectionTime_month', 'connectionTime_day', 'connectionTime_hour',
#        'connectionTime_min', 'connectionTime_sec', 'disconnectTime_year',
#        'disconnectTime_month', 'disconnectTime_day', 'disconnectTime_hour',
#        'disconnectTime_min', 'disconnectTime_sec', 'doneChargingTime_year',
#        'doneChargingTime_month', 'doneChargingTime_day',
#        'doneChargingTime_hour', 'doneChargingTime_min',
#        'doneChargingTime_sec', 'post_charge_duration_min', 'total_session_duration_min', 'kWh_per_min_effective_charging'

In [126]:
import numpy as np

# Check for inf/-inf
print("Inf values in train:\n", np.isinf(X_train[num_features]).sum())
print("Inf values in test:\n", np.isinf(X_test[num_features]).sum())

# Check for NaNs
print("NaNs in train:\n", X_train[num_features].isna().sum())
print("NaNs in test:\n", X_test[num_features].isna().sum())


Inf values in train:
 kWhDelivered    0
dtype: int64
Inf values in test:
 kWhDelivered    0
dtype: int64
NaNs in train:
 kWhDelivered    0
dtype: int64
NaNs in test:
 kWhDelivered    0
dtype: int64


In [127]:
# Replace infs with large finite numbers (temporarily)
X_train[num_features] = X_train[num_features].replace(np.inf, 1e6)
X_train[num_features] = X_train[num_features].replace(-np.inf, -1e6)

X_test[num_features] = X_test[num_features].replace(np.inf, 1e6)
X_test[num_features] = X_test[num_features].replace(-np.inf, -1e6)


In [128]:
X_train[num_features] = numeric_transformer.fit_transform(X_train[num_features])  ## fit on the training set, then transform it
X_test[num_features] = numeric_transformer.transform(X_test[num_features])  ## reuse the training statistics on the test set
## Standardization: z = (x - mean) / standard deviation, with both statistics estimated from the training set
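
To make the scaling step concrete, the sketch below standardizes a few made-up kWh values by hand with NumPy. `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default, so the arithmetic should match what the scaler computes; the arrays here are illustrative assumptions, not data from the notebook.

```python
import numpy as np

# Made-up kWh values standing in for the training and test columns.
train = np.array([4.0, 8.0, 12.0])
test = np.array([8.0, 16.0])

# Statistics come from the training data only.
mean = train.mean()
std = train.std()  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse training statistics; do not refit on test
print(train_scaled, test_scaled)
```

After scaling, the training values have mean 0 and standard deviation 1, while the test values are expressed in the same units without influencing the statistics.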

In [129]:
X_train.tail(10)

Unnamed: 0,kWhDelivered,connectionTime_year,connectionTime_month,connectionTime_day,connectionTime_hour,connectionTime_min,connectionTime_sec,disconnectTime_year,disconnectTime_month,disconnectTime_day,disconnectTime_hour,disconnectTime_min,disconnectTime_sec,doneChargingTime_year,doneChargingTime_month,doneChargingTime_day,doneChargingTime_hour,doneChargingTime_min,doneChargingTime_sec
467,1.031279,2018,5,4,15,35,6,2018,5,5,0,25,10,2018,5,4,22,47,19
6266,0.620036,2018,8,5,18,48,37,2018,8,5,23,22,30,2018,8,5,21,31,50
5735,4.733757,2018,7,29,3,19,14,2018,7,29,15,19,16,2018,7,29,14,37,34
11288,-0.462649,2018,10,8,15,47,32,2018,10,8,23,37,9,2018,10,8,18,22,56
11970,-0.292206,2018,10,17,15,30,0,2018,10,17,22,2,50,2018,10,17,17,42,8
5192,4.436293,2018,7,21,5,34,3,2018,7,21,13,55,40,2018,7,21,12,14,57
13424,-0.339674,2018,11,7,16,25,27,2018,11,8,1,34,47,2018,11,7,18,22,20
5391,-0.972864,2018,7,24,16,22,57,2018,7,25,3,10,21,2018,7,24,17,34,40
861,-0.966535,2018,5,11,16,37,42,2018,5,12,1,18,8,2018,5,11,18,27,42
7271,0.299701,2018,8,18,19,21,54,2018,8,18,21,8,46,2018,8,18,21,8,22


In [130]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge,Lasso
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [131]:
## Helper that computes MAE, RMSE and R2 for a set of predictions
def evaluate_model(true, predicted):
    mae = mean_absolute_error(true, predicted)
    mse = mean_squared_error(true, predicted)
    rmse = np.sqrt(mse)
    r2_square = r2_score(true, predicted)
    return mae, rmse, r2_square
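
For reference, the three metrics returned by `evaluate_model` can be worked out by hand. The sketch below uses three made-up durations (in minutes) and plain NumPy, following the standard definitions of MAE, RMSE and R2:

```python
import numpy as np

# Made-up true and predicted charging durations in minutes.
true = np.array([100.0, 150.0, 200.0])
pred = np.array([110.0, 140.0, 200.0])

errors = true - pred
mae = np.mean(np.abs(errors))           # (10 + 10 + 0) / 3
rmse = np.sqrt(np.mean(errors ** 2))    # sqrt((100 + 100 + 0) / 3)

ss_res = np.sum(errors ** 2)                   # residual sum of squares: 200
ss_tot = np.sum((true - true.mean()) ** 2)     # total sum of squares: 5000
r2 = 1 - ss_res / ss_tot                       # 1 - 200 / 5000

print(round(mae, 4), round(rmse, 4), round(r2, 4))
```

Because RMSE squares the errors before averaging, a few very large misses raise it much more than they raise MAE, which is worth keeping in mind when the two metrics diverge.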

In [132]:
## Train each model and report its metrics on the training and test sets
models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(),
    "Ridge": Ridge(),
    "K-Neighbors Regressor": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest Regressor": RandomForestRegressor(),
    "Adaboost Regressor": AdaBoostRegressor(),
    "Gradient Boosting Regressor": GradientBoostingRegressor(),
    "xgboost Regressor": XGBRegressor(),
}

# Flatten the single-column target to 1-D to avoid scikit-learn's column-vector warning
y_train_1d = y_train.values.ravel()

for name, model in models.items():
    model.fit(X_train, y_train_1d)  # Train model

    # Make predictions
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    # Evaluate Train and Test dataset
    model_train_mae, model_train_rmse, model_train_r2 = evaluate_model(y_train, y_train_pred)
    model_test_mae, model_test_rmse, model_test_r2 = evaluate_model(y_test, y_test_pred)

    print(name)

    print('Model performance for Training set')
    print("- Root Mean Squared Error: {:.4f}".format(model_train_rmse))
    print("- Mean Absolute Error: {:.4f}".format(model_train_mae))
    print("- R2 Score: {:.4f}".format(model_train_r2))

    print('----------------------------------')

    print('Model performance for Test set')
    print("- Root Mean Squared Error: {:.4f}".format(model_test_rmse))
    print("- Mean Absolute Error: {:.4f}".format(model_test_mae))
    print("- R2 Score: {:.4f}".format(model_test_r2))

    print('='*35)
    print('\n')

Linear Regression
Model performance for Training set
- Root Mean Squared Error: 44.3968
- Mean Absolute Error: 8.7413
- R2 Score: 0.9485
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 32.2684
- Mean Absolute Error: 7.3846
- R2 Score: 0.9688


Lasso
Model performance for Training set
- Root Mean Squared Error: 169.1809
- Mean Absolute Error: 94.6020
- R2 Score: 0.2524
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 154.0108
- Mean Absolute Error: 94.6979
- R2 Score: 0.2901


Ridge
Model performance for Training set
- Root Mean Squared Error: 152.2786
- Mean Absolute Error: 86.5802
- R2 Score: 0.3943
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 139.9974
- Mean Absolute Error: 86.8991
- R2 Score: 0.4134


K-Neighbors Regressor
Model performance for Training set
- Root Mean Squared Error: 158.9452
- Mean Absolute Error: 98.3425
- R2 Score: 0.3401
------


Random Forest Regressor
Model performance for Training set
- Root Mean Squared Error: 43.7498
- Mean Absolute Error: 10.0509
- R2 Score: 0.9500
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 85.5306
- Mean Absolute Error: 24.8986
- R2 Score: 0.7811





Adaboost Regressor
Model performance for Training set
- Root Mean Squared Error: 303.3486
- Mean Absolute Error: 275.6391
- R2 Score: -1.4037
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 303.7462
- Mean Absolute Error: 275.1258
- R2 Score: -1.7612





Gradient Boosting Regressor
Model performance for Training set
- Root Mean Squared Error: 100.0249
- Mean Absolute Error: 43.6930
- R2 Score: 0.7387
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 101.2033
- Mean Absolute Error: 46.5548
- R2 Score: 0.6935


xgboost Regressor
Model performance for Training set
- Root Mean Squared Error: 16.9627
- Mean Absolute Error: 11.3771
- R2 Score: 0.9925
----------------------------------
Model performance for Test set
- Root Mean Squared Error: 94.5757
- Mean Absolute Error: 27.8528
- R2 Score: 0.7323
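
One way to read the results above is to compare each model's training and test R2. A large positive gap suggests the model fits the training data better than it generalizes. The sketch below simply tabulates the scores printed above; K-Neighbors and Decision Tree are left out because their test scores are not shown in the output.

```python
# R2 scores copied from the printed results above: (train, test).
reported_r2 = {
    "Linear Regression": (0.9485, 0.9688),
    "Lasso": (0.2524, 0.2901),
    "Ridge": (0.3943, 0.4134),
    "Random Forest Regressor": (0.9500, 0.7811),
    "Adaboost Regressor": (-1.4037, -1.7612),
    "Gradient Boosting Regressor": (0.7387, 0.6935),
    "xgboost Regressor": (0.9925, 0.7323),
}

# Positive gap: the model scores better on the data it was trained on.
gaps = {name: round(train - test, 4) for name, (train, test) in reported_r2.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:30s} train-test R2 gap: {gap:+.4f}")
```

Among the models with a high training score, xgboost and Random Forest show the widest gaps (about 0.26 and 0.17), while the linear models score slightly higher on the test set than on the training set. AdaBoost has the largest gap overall, but its R2 is negative on both sets, so it is a poor fit rather than an overfit one.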




In [133]:
df_copy.head(3)

Unnamed: 0,kWhDelivered,effective_charging_duration_min,connectionTime_year,connectionTime_month,connectionTime_day,connectionTime_hour,connectionTime_min,connectionTime_sec,disconnectTime_year,disconnectTime_month,disconnectTime_day,disconnectTime_hour,disconnectTime_min,disconnectTime_sec,doneChargingTime_year,doneChargingTime_month,doneChargingTime_day,doneChargingTime_hour,doneChargingTime_min,doneChargingTime_sec
0,7.932,133.1,2018,4,25,11,8,4,2018,4,25,13,20,10,2018,4,25,13,21,10
1,10.013,179.083333,2018,4,25,13,45,10,2018,4,26,0,56,16,2018,4,25,16,44,15
2,5.257,65.9,2018,4,25,13,45,50,2018,4,25,23,4,45,2018,4,25,14,51,44


In [135]:
df_copy.to_csv(r"../ACN_Data_Session/acndata_sessions_modified.csv", index=False)