## Charging Time Prediction for Battery Electric Vehicles (BEVs) using Time Series Methods

## Project Overview
### Battery Electric Vehicles (BEVs) help reduce energy consumption and air pollution, offering a cleaner alternative to conventional internal combustion engine vehicles. Despite these advantages, BEVs face two challenges: limited driving range and long charging durations. Both contribute to range anxiety, which affects user adoption and satisfaction. Accurate charging time predictions based on real-world data can help drivers plan trips and reduce that anxiety. This project aims to develop a predictive model that improves the estimation of BEV charging times using operational data from BEVs.

In [505]:
## Important libraries to import for data analysis and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# import warnings
# warnings.filterwarnings("ignore")
%matplotlib inline  

### Loading the data set 
### Basic data analysis
### Creating the copy of the dataset (df_copy)

In [506]:
df = pd.read_csv(r"../data/cadcs_live_project_clean_data.csv")
df_copy = df.copy() ## generated a copy of the original dataframe (df_copy)

In [507]:
df_copy.head(5)

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end
0,IND_BEV_136287,MAH8558,USR_87015,01-01-2018 07:30,01-01-2018 08:09,38.366667,7.784,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,51.0,4.4,16.950283,1,1,False,False
1,IND_BEV_105860,TAT3386,USR_48693,01-01-2018 08:33,01-01-2018 10:58,145.0,12.672,TATA,Tigor EV,26.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,95.0,4.4,16.950283,1,1,False,False
2,IND_BEV_145876,TVS7311,USR_15892,01-01-2018 09:48,01-01-2018 09:59,11.75,1.406,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Scheduled,6.0,3.8,12.408844,1,1,False,False
3,IND_BEV_150791,BAJ2976,USR_99201,01-01-2018 10:16,01-01-2018 10:21,5.0,0.642,BAJAJ,Urbanite,2.9,...,IS_17017_Compliant,BIS_Certified,Scheduled,81.0,4.7,12.408844,1,1,False,False
4,IND_BEV_121632,TAT8277,USR_31054,01-01-2018 11:44,01-01-2018 11:50,6.2,12.512,TATA,Tiago EV,24.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,13.0,4.1,16.950283,1,1,False,False


In [508]:
df.tail(10)

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end
74933,IND_BEV_162189,TAT6169,USR_59191,31-12-2024 14:06,31-12-2024 14:19,12.866667,7.414,TATA,Tiago EV,24.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,13.0,3.7,16.950283,1,366,True,True
74934,IND_BEV_125373,HER4604,USR_74900,31-12-2024 14:35,31-12-2024 14:44,8.833333,0.547,HERO,Optima,1.8,...,IS_17017_Compliant,BIS_Certified,Scheduled,18.0,3.7,12.408844,1,366,True,True
74935,IND_BEV_171642,TAT6791,USR_53098,31-12-2024 14:52,31-12-2024 15:06,13.35,5.965,TATA,Tigor EV,26.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,20.0,4.6,16.950283,1,366,True,True
74936,IND_BEV_161162,MAH5737,USR_74580,31-12-2024 16:43,31-12-2024 18:13,89.25,7.399,MAHINDRA,eXUV300,34.5,...,IS_17017_Compliant,BIS_Certified,Opportunistic,6.0,4.3,16.950283,1,366,True,True
74937,IND_BEV_151196,TAT2571,USR_12166,31-12-2024 17:01,31-12-2024 19:41,159.783333,15.099,TATA,Nexon EV,40.5,...,IS_17017_Compliant,BIS_Certified,Scheduled,68.0,4.2,16.950283,1,366,True,True
74938,IND_BEV_126265,MAH5891,USR_99873,31-12-2024 18:19,31-12-2024 20:22,123.133333,13.6,MAHINDRA,eVerito,21.2,...,IS_17017_Compliant,BIS_Certified,Opportunistic,106.0,4.3,16.950283,1,366,True,True
74939,IND_BEV_116010,TVS9578,USR_75798,31-12-2024 18:53,31-12-2024 19:06,12.616667,3.237,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Opportunistic,118.0,4.6,12.408844,1,366,True,True
74940,IND_BEV_166151,BAJ2386,USR_11068,31-12-2024 20:32,31-12-2024 20:37,5.366667,0.629,BAJAJ,Urbanite,2.9,...,IS_17017_Compliant,BIS_Certified,Scheduled,30.0,4.2,12.408844,1,366,True,True
74941,IND_BEV_174026,MAH6729,USR_98840,31-12-2024 21:48,31-12-2024 22:40,51.6,4.968,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,69.0,3.8,16.950283,1,366,True,True
74942,IND_BEV_125407,HER7711,USR_16544,31-12-2024 21:55,31-12-2024 22:13,17.666667,1.04,HERO,Optima,1.8,...,IS_17017_Compliant,BIS_Certified,Opportunistic,53.0,4.6,12.408844,1,366,True,True


In [509]:
df_copy.shape,df_copy.size,df_copy.columns

((74943, 74),
 5545782,
 Index(['session_id', 'vehicle_id', 'user_id', 'start_time', 'end_time',
        'charging_duration_min', 'energy_delivered_kwh', 'vehicle_make',
        'vehicle_model', 'battery_capacity_kwh', 'vehicle_age_years',
        'battery_health_index', 'user_type', 'income_bracket', 'city', 'state',
        'latitude', 'longitude', 'charging_station_id', 'station_operator',
        'charger_type', 'charger_power_kw', 'plug_type', 'initial_soc_percent',
        'final_soc_percent', 'soc_gained_percent', 'day_of_week', 'hour_of_day',
        'month', 'season', 'is_weekend', 'is_peak_hour', 'festival',
        'ambient_temperature_c', 'humidity_percent', 'weather_condition',
        'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
        'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
        'grid_load_mw', 'tariff_per_kwh_inr', 'total_cost_inr',
        'subsidy_amount_inr', 'payment_method', 'payment_success_rate',
        '

In [510]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74943 entries, 0 to 74942
Data columns (total 74 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   session_id                       74943 non-null  object 
 1   vehicle_id                       74943 non-null  object 
 2   user_id                          74943 non-null  object 
 3   start_time                       74943 non-null  object 
 4   end_time                         74943 non-null  object 
 5   charging_duration_min            74943 non-null  float64
 6   energy_delivered_kwh             74943 non-null  float64
 7   vehicle_make                     74943 non-null  object 
 8   vehicle_model                    74943 non-null  object 
 9   battery_capacity_kwh             74943 non-null  float64
 10  vehicle_age_years                74943 non-null  int64  
 11  battery_health_index             74943 non-null  float64
 12  user_type         

In [511]:
df_copy.describe()

Unnamed: 0,charging_duration_min,energy_delivered_kwh,battery_capacity_kwh,vehicle_age_years,battery_health_index,latitude,longitude,charger_power_kw,initial_soc_percent,final_soc_percent,...,session_success_rate,user_satisfaction_score,distance_to_station_km,range_remaining_km,next_destination_distance_km,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year
count,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,...,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0,74943.0
mean,62.271105,7.300049,18.934349,2.038643,0.944291,20.595023,77.624217,20.713981,31.216831,69.188503,...,0.965033,4.000306,2.468798,109.230059,14.872837,59.638399,4.248594,14.459985,26.481393,182.635363
std,87.426272,7.865827,18.975171,2.498549,0.053543,6.190951,4.113718,32.22908,12.419413,13.705514,...,0.01446,0.464787,2.392755,44.971961,14.896025,34.137732,0.434893,2.260093,15.046415,105.33096
min,5.0,0.405,1.8,0.0,0.65,9.911208,72.55143,3.3,10.0,30.0,...,0.94,3.2,0.0,15.1,0.0,0.0,3.5,12.408844,1.0,1.0
25%,7.483333,1.07,2.9,0.0,0.921,17.365092,75.767815,5.0,22.0,59.0,...,0.953,3.6,0.7,75.5,4.4,31.0,3.9,12.408844,13.0,92.0
50%,21.2,5.14,15.0,1.0,0.956,19.095024,77.190506,6.7,29.0,71.0,...,0.965,4.0,1.7,102.9,10.3,59.6384,4.2,12.408844,26.0,183.0
75%,86.983333,10.97,26.0,3.0,0.98,26.86542,80.250885,17.6,39.0,79.0,...,0.978,4.4,3.4,137.9,20.4,89.0,4.6,16.950283,39.0,274.0
max,719.733333,38.52448,95.0,26.0,1.02,28.633896,88.3839,150.0,59.0,100.0,...,0.99,4.8,11.6,226.4,193.6,119.0,5.0,16.950283,53.0,366.0


In [512]:
## Split the features into numerical, categorical, discrete, and continuous groups
num_features = [feature for feature in df.columns if df[feature].dtype != 'O']
print('Num of Numerical Features :', len(num_features))
cat_features = [feature for feature in df.columns if df[feature].dtype == 'O']
print('Num of Categorical Features :', len(cat_features))
discrete_features=[feature for feature in num_features if len(df[feature].unique())<=25]
print('Num of Discrete Features :',len(discrete_features))
continuous_features=[feature for feature in num_features if feature not in discrete_features]
print('Num of Continuous Features :',len(continuous_features))
## Discrete features + continuous features = numerical features

Num of Numerical Features : 44
Num of Categorical Features : 30
Num of Discrete Features : 12
Num of Continuous Features : 32
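
The grouping logic in this cell can be wrapped in a small helper. The following is a minimal sketch, not part of the original notebook: `split_feature_types`, its `max_discrete_levels` parameter, and the toy frame are hypothetical names and values. It keeps the notebook's rule that anything not of object dtype counts as numerical, so boolean-valued columns end up in the discrete group.

```python
import pandas as pd


def split_feature_types(frame: pd.DataFrame, max_discrete_levels: int = 25) -> dict:
    """Group column names into categorical, numerical, discrete, and continuous lists."""
    categorical = frame.select_dtypes(include="object").columns.tolist()
    # Everything that is not object dtype is treated as numerical, which
    # includes bool columns, matching the `dtype != 'O'` test in the cell above.
    numerical = [col for col in frame.columns if col not in categorical]
    discrete = [col for col in numerical if frame[col].nunique() <= max_discrete_levels]
    continuous = [col for col in numerical if col not in discrete]
    return {
        "categorical": categorical,
        "numerical": numerical,
        "discrete": discrete,
        "continuous": continuous,
    }


# Toy frame with hypothetical values, for illustration only.
toy = pd.DataFrame(
    {
        "vehicle_make": pd.Series(["TATA", "TVS", "TATA", "HERO"], dtype="object"),
        "is_weekend": [False, True, False, True],
        "charging_duration_min": [38.4, 145.0, 11.75, 5.0],
    }
)
groups = split_feature_types(toy, max_discrete_levels=2)
print(groups)
```

On the real dataset the call would presumably be `split_feature_types(df)`, which uses the same 25-level threshold as the notebook.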


In [513]:
## get all the numeric features
num_features = [feature for feature in df.columns if df[feature].dtype != 'O']
print('Num of Numerical Features :', len(num_features))

Num of Numerical Features : 44


In [514]:
## Discrete features: numerical features with at most 25 unique values
discrete_features=[feature for feature in num_features if len(df[feature].unique())<=25]
print('Num of Discrete Features :',len(discrete_features))

Num of Discrete Features : 12


In [515]:
discrete_features

['battery_capacity_kwh',
 'hour_of_day',
 'month',
 'is_weekend',
 'is_peak_hour',
 'grid_reliability_index',
 'subsidy_amount_inr',
 'user_satisfaction_score',
 'session_rating',
 'energy_efficiency_kwh_per_100km',
 'is_month_end',
 'is_quarter_end']

In [516]:
## Continuous features: numerical features with more than 25 unique values
continuous_features=[feature for feature in num_features if len(df[feature].unique())>25]
print('Num of  continuous Features :',len(continuous_features))

Num of  continuous Features : 32


In [517]:
## get all the categorical features
catagorical_features = [feature for feature in df.columns if df[feature].dtype == 'O']
print('Num of categorical Features :', len(catagorical_features))

Num of categorical Features : 30


In [518]:
categorical_features = df_copy.select_dtypes(include=['object']).columns.tolist()
numerical_features = df_copy.select_dtypes(include=['float64','int64']).columns.tolist()
yes_no_features = df_copy.select_dtypes(include=['bool']).columns.tolist()
print(df_copy.columns)
print(len(df_copy.columns))
print(categorical_features)
print(len(categorical_features))
print(numerical_features)
print(len(numerical_features))
print(yes_no_features)
print(len(yes_no_features))


Index(['session_id', 'vehicle_id', 'user_id', 'start_time', 'end_time',
       'charging_duration_min', 'energy_delivered_kwh', 'vehicle_make',
       'vehicle_model', 'battery_capacity_kwh', 'vehicle_age_years',
       'battery_health_index', 'user_type', 'income_bracket', 'city', 'state',
       'latitude', 'longitude', 'charging_station_id', 'station_operator',
       'charger_type', 'charger_power_kw', 'plug_type', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'day_of_week', 'hour_of_day',
       'month', 'season', 'is_weekend', 'is_peak_hour', 'festival',
       'ambient_temperature_c', 'humidity_percent', 'weather_condition',
       'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
       'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
       'grid_load_mw', 'tariff_per_kwh_inr', 'total_cost_inr',
       'subsidy_amount_inr', 'payment_method', 'payment_success_rate',
       'station_congestion_level', 'queue_wait

In [519]:
## Check whether any null values are present
df.isnull().sum().sum()

0

In [520]:
df.isnull().sum()

session_id                         0
vehicle_id                         0
user_id                            0
start_time                         0
end_time                           0
                                  ..
energy_efficiency_kwh_per_100km    0
week_of_year                       0
day_of_year                        0
is_month_end                       0
is_quarter_end                     0
Length: 74, dtype: int64

In [521]:
df_copy['vehicle_model'].value_counts(normalize=True)*100 ## label encoding
df_copy['income_bracket'].value_counts(normalize=True)*100 ## ordinal 
df_copy['station_operator'].value_counts(normalize=True)*100 ## label 
df_copy['is_peak_hour'].value_counts(normalize=True)*100 ## label 
df_copy['load_shedding_event'].value_counts(normalize=True)*100 ## label 
df_copy['cooling_system_active'].value_counts(normalize=True)*100 ## label 
df_copy['thermal_management'].value_counts(normalize=True)*100 ## ordinal 
df_copy['user_type'].value_counts(normalize=True)*100 ## ohe
df_copy['charger_type'].value_counts(normalize=True)*100 ## ohe
df_copy['plug_type'].value_counts(normalize=True)*100 ## ohe
df_copy['weather_condition'].value_counts(normalize=True)*100 ## ohe
df_copy['session_type'].value_counts(normalize=True)*100 ## ohe
df_copy['station_congestion_level'].value_counts(normalize=True)*100 ## ordinal 
df_copy['season'].value_counts(normalize=True)*100 ## ohe


season
Summer          33.497458
Monsoon         25.280547
Winter          24.602698
Post_Monsoon    16.619297
Name: proportion, dtype: float64
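
A notebook cell displays only its last expression, which is why only the `season` proportions appear above. The following is a minimal sketch of how every column's shares could be shown in one go. `category_shares` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd


def category_shares(frame: pd.DataFrame, columns: list) -> dict:
    """Return the percentage share of each level for the given columns."""
    return {
        col: (frame[col].value_counts(normalize=True) * 100).round(2)
        for col in columns
    }


# Toy frame with hypothetical values, for illustration only.
toy = pd.DataFrame(
    {
        "season": ["Summer", "Summer", "Winter", "Monsoon"],
        "charger_type": ["AC", "DC", "AC", "AC"],
    }
)
shares = category_shares(toy, ["season", "charger_type"])
for col, share in shares.items():
    print(f"--- {col} ---")
    print(share.to_string())
```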

In [522]:
## Check whether any duplicate rows are present
df[df.duplicated()]

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end


In [523]:
from datetime import datetime
## Convert the timestamp columns to datetime format

def safe_parse_date(x):
    """Try each known timestamp format in turn; return NaT if none matches."""
    for fmt in ("%d-%m-%Y %H:%M", "%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M:%S"):
        try:
            return datetime.strptime(x, fmt)
        except (ValueError, TypeError):  # wrong format or non-string input
            continue
    return pd.NaT  # fallback if no format matches

df_copy['start_time'] = df_copy['start_time'].apply(safe_parse_date)
df_copy['end_time'] = df_copy['end_time'].apply(safe_parse_date)


In [524]:
## Apply the same datetime conversion to the original dataframe (df),
## reusing safe_parse_date defined in the previous cell
df['start_time'] = df['start_time'].apply(safe_parse_date)
df['end_time'] = df['end_time'].apply(safe_parse_date)
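
When every timestamp shares one format, as the `dd-mm-YYYY HH:MM` values in the `head()` and `tail()` outputs suggest, a vectorized parse is a possible alternative to the row-wise `apply`. This is a sketch under that assumption. `parse_timestamps` is a hypothetical helper, and unlike `safe_parse_date` it does not fall back to other formats.

```python
import pandas as pd


def parse_timestamps(series: pd.Series, fmt: str = "%d-%m-%Y %H:%M") -> pd.Series:
    """Parse strings with one explicit format; unparseable values become NaT."""
    return pd.to_datetime(series, format=fmt, errors="coerce")


# Two values in the dataset's format plus one deliberately invalid entry.
raw = pd.Series(["01-01-2018 07:30", "31-12-2024 21:55", "not a date"])
parsed = parse_timestamps(raw)
print(parsed)
print("unparsed rows:", int(parsed.isna().sum()))
```

Counting the `NaT` values after parsing gives a quick check that no row silently failed to convert.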

In [525]:
df

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end
0,IND_BEV_136287,MAH8558,USR_87015,2018-01-01 07:30:00,2018-01-01 08:09:00,38.366667,7.784,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,51.0,4.4,16.950283,1,1,False,False
1,IND_BEV_105860,TAT3386,USR_48693,2018-01-01 08:33:00,2018-01-01 10:58:00,145.000000,12.672,TATA,Tigor EV,26.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,95.0,4.4,16.950283,1,1,False,False
2,IND_BEV_145876,TVS7311,USR_15892,2018-01-01 09:48:00,2018-01-01 09:59:00,11.750000,1.406,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Scheduled,6.0,3.8,12.408844,1,1,False,False
3,IND_BEV_150791,BAJ2976,USR_99201,2018-01-01 10:16:00,2018-01-01 10:21:00,5.000000,0.642,BAJAJ,Urbanite,2.9,...,IS_17017_Compliant,BIS_Certified,Scheduled,81.0,4.7,12.408844,1,1,False,False
4,IND_BEV_121632,TAT8277,USR_31054,2018-01-01 11:44:00,2018-01-01 11:50:00,6.200000,12.512,TATA,Tiago EV,24.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,13.0,4.1,16.950283,1,1,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74938,IND_BEV_126265,MAH5891,USR_99873,2024-12-31 18:19:00,2024-12-31 20:22:00,123.133333,13.600,MAHINDRA,eVerito,21.2,...,IS_17017_Compliant,BIS_Certified,Opportunistic,106.0,4.3,16.950283,1,366,True,True
74939,IND_BEV_116010,TVS9578,USR_75798,2024-12-31 18:53:00,2024-12-31 19:06:00,12.616667,3.237,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Opportunistic,118.0,4.6,12.408844,1,366,True,True
74940,IND_BEV_166151,BAJ2386,USR_11068,2024-12-31 20:32:00,2024-12-31 20:37:00,5.366667,0.629,BAJAJ,Urbanite,2.9,...,IS_17017_Compliant,BIS_Certified,Scheduled,30.0,4.2,12.408844,1,366,True,True
74941,IND_BEV_174026,MAH6729,USR_98840,2024-12-31 21:48:00,2024-12-31 22:40:00,51.600000,4.968,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,69.0,3.8,16.950283,1,366,True,True


In [526]:
df_copy.head()

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end
0,IND_BEV_136287,MAH8558,USR_87015,2018-01-01 07:30:00,2018-01-01 08:09:00,38.366667,7.784,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,51.0,4.4,16.950283,1,1,False,False
1,IND_BEV_105860,TAT3386,USR_48693,2018-01-01 08:33:00,2018-01-01 10:58:00,145.0,12.672,TATA,Tigor EV,26.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,95.0,4.4,16.950283,1,1,False,False
2,IND_BEV_145876,TVS7311,USR_15892,2018-01-01 09:48:00,2018-01-01 09:59:00,11.75,1.406,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Scheduled,6.0,3.8,12.408844,1,1,False,False
3,IND_BEV_150791,BAJ2976,USR_99201,2018-01-01 10:16:00,2018-01-01 10:21:00,5.0,0.642,BAJAJ,Urbanite,2.9,...,IS_17017_Compliant,BIS_Certified,Scheduled,81.0,4.7,12.408844,1,1,False,False
4,IND_BEV_121632,TAT8277,USR_31054,2018-01-01 11:44:00,2018-01-01 11:50:00,6.2,12.512,TATA,Tiago EV,24.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,13.0,4.1,16.950283,1,1,False,False


In [527]:
import pandas as pd

# Ensure the timestamp columns have datetime64 dtype
# (they were already parsed by safe_parse_date above)
df_copy['start_time'] = pd.to_datetime(df_copy['start_time'])
df_copy['end_time'] = pd.to_datetime(df_copy['end_time'])

In [528]:
df_copy.head(3)

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end
0,IND_BEV_136287,MAH8558,USR_87015,2018-01-01 07:30:00,2018-01-01 08:09:00,38.366667,7.784,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,51.0,4.4,16.950283,1,1,False,False
1,IND_BEV_105860,TAT3386,USR_48693,2018-01-01 08:33:00,2018-01-01 10:58:00,145.0,12.672,TATA,Tigor EV,26.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,95.0,4.4,16.950283,1,1,False,False
2,IND_BEV_145876,TVS7311,USR_15892,2018-01-01 09:48:00,2018-01-01 09:59:00,11.75,1.406,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Scheduled,6.0,3.8,12.408844,1,1,False,False


In [529]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74943 entries, 0 to 74942
Data columns (total 74 columns):
 #   Column                           Non-Null Count  Dtype         
---  ------                           --------------  -----         
 0   session_id                       74943 non-null  object        
 1   vehicle_id                       74943 non-null  object        
 2   user_id                          74943 non-null  object        
 3   start_time                       74943 non-null  datetime64[ns]
 4   end_time                         74943 non-null  datetime64[ns]
 5   charging_duration_min            74943 non-null  float64       
 6   energy_delivered_kwh             74943 non-null  float64       
 7   vehicle_make                     74943 non-null  object        
 8   vehicle_model                    74943 non-null  object        
 9   battery_capacity_kwh             74943 non-null  float64       
 10  vehicle_age_years                74943 non-null  int64    

In [530]:
# 1. ⚡ Charging Duration (in minutes)
#df_copy['effective_charging_duration_min'] = (df_copy['end_time'] - df_copy['start_time']).dt.total_seconds() / 60

In [531]:
# 4. 🧠 Efficiency: kWh delivered per minute of charging duration
#df_copy['kWh_per_min_effective_charging'] = df_copy['energy_delivered_kwh'] / df_copy['effective_charging_duration_min']
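
The two commented-out features above can be illustrated on a toy frame. This is a sketch only: the rows reuse the first two sessions shown in `df_copy.head()`, and the column names follow the commented-out code. Because the displayed timestamps have minute resolution, the derived duration can differ slightly from the recorded `charging_duration_min` (39 minutes versus 38.37 for the first session).

```python
import pandas as pd

# First two sessions from the head() output, used here as toy data.
toy = pd.DataFrame(
    {
        "start_time": pd.to_datetime(["2018-01-01 07:30", "2018-01-01 08:33"]),
        "end_time": pd.to_datetime(["2018-01-01 08:09", "2018-01-01 10:58"]),
        "energy_delivered_kwh": [7.784, 12.672],
    }
)

# Elapsed time between plug-in and plug-out, converted to minutes.
elapsed = toy["end_time"] - toy["start_time"]
toy["effective_charging_duration_min"] = elapsed.dt.total_seconds() / 60

# Average energy delivered per minute of the session.
toy["kWh_per_min_effective_charging"] = (
    toy["energy_delivered_kwh"] / toy["effective_charging_duration_min"]
)
print(toy)
```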

In [532]:
df_copy.head(3)

Unnamed: 0,session_id,vehicle_id,user_id,start_time,end_time,charging_duration_min,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,...,charging_standard_compliance,safety_certification,session_type,booking_lead_time_min,session_rating,energy_efficiency_kwh_per_100km,week_of_year,day_of_year,is_month_end,is_quarter_end
0,IND_BEV_136287,MAH8558,USR_87015,2018-01-01 07:30:00,2018-01-01 08:09:00,38.366667,7.784,MAHINDRA,e2o Plus,15.0,...,IS_17017_Compliant,BIS_Certified,Scheduled,51.0,4.4,16.950283,1,1,False,False
1,IND_BEV_105860,TAT3386,USR_48693,2018-01-01 08:33:00,2018-01-01 10:58:00,145.0,12.672,TATA,Tigor EV,26.0,...,IS_17017_Compliant,BIS_Certified,Opportunistic,95.0,4.4,16.950283,1,1,False,False
2,IND_BEV_145876,TVS7311,USR_15892,2018-01-01 09:48:00,2018-01-01 09:59:00,11.75,1.406,TVS,X,4.4,...,IS_17017_Compliant,BIS_Certified,Scheduled,6.0,3.8,12.408844,1,1,False,False


In [533]:
# start_time components
df_copy['start_time_year'] = df_copy['start_time'].dt.year
df_copy['start_time_month'] = df_copy['start_time'].dt.month
df_copy['start_time_day'] = df_copy['start_time'].dt.day
df_copy['start_time_hour'] = df_copy['start_time'].dt.hour
df_copy['start_time_min'] = df_copy['start_time'].dt.minute
df_copy['start_time_sec'] = df_copy['start_time'].dt.second

# end_time components
df_copy['end_time_year'] = df_copy['end_time'].dt.year
df_copy['end_time_month'] = df_copy['end_time'].dt.month
df_copy['end_time_day'] = df_copy['end_time'].dt.day
df_copy['end_time_hour'] = df_copy['end_time'].dt.hour
df_copy['end_time_min'] = df_copy['end_time'].dt.minute
df_copy['end_time_sec'] = df_copy['end_time'].dt.second


In [534]:
# start_time components
df['start_time_year'] = df['start_time'].dt.year
df['start_time_month'] = df['start_time'].dt.month
df['start_time_day'] = df['start_time'].dt.day
df['start_time_hour'] = df['start_time'].dt.hour
df['start_time_min'] = df['start_time'].dt.minute
df['start_time_sec'] = df['start_time'].dt.second

# end_time components
df['end_time_year'] = df['end_time'].dt.year
df['end_time_month'] = df['end_time'].dt.month
df['end_time_day'] = df['end_time'].dt.day
df['end_time_hour'] = df['end_time'].dt.hour
df['end_time_min'] = df['end_time'].dt.minute
df['end_time_sec'] = df['end_time'].dt.second
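
The same twelve assignments are written out for both `df_copy` and `df`. A loop over the component names is one way to avoid the repetition. The sketch below is not the notebook's own code. `add_time_components` and `COMPONENTS` are hypothetical names, and the suffixes match the column names used above.

```python
import pandas as pd

# Column suffix -> attribute of the pandas `.dt` accessor.
COMPONENTS = {
    "year": "year", "month": "month", "day": "day",
    "hour": "hour", "min": "minute", "sec": "second",
}


def add_time_components(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with `<column>_<suffix>` columns for each datetime component."""
    out = frame.copy()
    for suffix, attribute in COMPONENTS.items():
        out[f"{column}_{suffix}"] = getattr(out[column].dt, attribute)
    return out


# Toy frame using the first session's timestamps.
toy = pd.DataFrame(
    {
        "start_time": pd.to_datetime(["2018-01-01 07:30:00"]),
        "end_time": pd.to_datetime(["2018-01-01 08:09:00"]),
    }
)
for col in ("start_time", "end_time"):
    toy = add_time_components(toy, col)
print(toy.filter(like="_hour"))
```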

In [535]:
df_copy.drop(columns=['start_time','end_time'], inplace=True)

In [536]:
df_copy.columns

Index(['session_id', 'vehicle_id', 'user_id', 'charging_duration_min',
       'energy_delivered_kwh', 'vehicle_make', 'vehicle_model',
       'battery_capacity_kwh', 'vehicle_age_years', 'battery_health_index',
       'user_type', 'income_bracket', 'city', 'state', 'latitude', 'longitude',
       'charging_station_id', 'station_operator', 'charger_type',
       'charger_power_kw', 'plug_type', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'day_of_week', 'hour_of_day',
       'month', 'season', 'is_weekend', 'is_peak_hour', 'festival',
       'ambient_temperature_c', 'humidity_percent', 'weather_condition',
       'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
       'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
       'grid_load_mw', 'tariff_per_kwh_inr', 'total_cost_inr',
       'subsidy_amount_inr', 'payment_method', 'payment_success_rate',
       'station_congestion_level', 'queue_wait_time_min',
       'charge

### Our target variable for this dataset is 'charging_duration_min'.
### We now divide the dataset into dependent and independent variables.
### This is a supervised regression problem.

In [537]:
X = df_copy.drop(['charging_duration_min'],axis = 1)
y = df_copy[['charging_duration_min']]
## creating a copy of X (independent variables) as X_copy
X_copy = X.copy()

In [538]:
X.head(1)

Unnamed: 0,session_id,vehicle_id,user_id,energy_delivered_kwh,vehicle_make,vehicle_model,battery_capacity_kwh,vehicle_age_years,battery_health_index,user_type,...,start_time_day,start_time_hour,start_time_min,start_time_sec,end_time_year,end_time_month,end_time_day,end_time_hour,end_time_min,end_time_sec
0,IND_BEV_136287,MAH8558,USR_87015,7.784,MAHINDRA,e2o Plus,15.0,1,0.943,Ride_Share,...,1,7,30,0,2018,1,1,8,9,0


In [539]:
X.shape

(74943, 83)

In [540]:
y.head(1)

Unnamed: 0,charging_duration_min
0,38.366667


In [541]:
y.shape,X.shape

((74943, 1), (74943, 83))

In [542]:
print(len(X.columns))

83


In [543]:
print(X.columns)

Index(['session_id', 'vehicle_id', 'user_id', 'energy_delivered_kwh',
       'vehicle_make', 'vehicle_model', 'battery_capacity_kwh',
       'vehicle_age_years', 'battery_health_index', 'user_type',
       'income_bracket', 'city', 'state', 'latitude', 'longitude',
       'charging_station_id', 'station_operator', 'charger_type',
       'charger_power_kw', 'plug_type', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'day_of_week', 'hour_of_day',
       'month', 'season', 'is_weekend', 'is_peak_hour', 'festival',
       'ambient_temperature_c', 'humidity_percent', 'weather_condition',
       'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
       'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
       'grid_load_mw', 'tariff_per_kwh_inr', 'total_cost_inr',
       'subsidy_amount_inr', 'payment_method', 'payment_success_rate',
       'station_congestion_level', 'queue_wait_time_min',
       'charger_utilization_rate', 'sta

In [544]:
# y = y.iloc[:,0].to_numpy() ## target variable is converted from dataframe to 1d array 
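
The commented-out line above converts the target from a one-column DataFrame to a 1-D array. The toy sketch below, with hypothetical values, shows how the three forms differ in shape. Which form is needed depends on the estimator used later.

```python
import pandas as pd

# Toy frame with hypothetical values, for illustration only.
toy = pd.DataFrame(
    {
        "charger_power_kw": [3.3, 6.7, 50.0],
        "charging_duration_min": [38.4, 145.0, 11.75],
    }
)
y_frame = toy[["charging_duration_min"]]   # DataFrame, shape (3, 1)
y_series = toy["charging_duration_min"]    # Series, shape (3,)
y_array = y_frame.iloc[:, 0].to_numpy()    # 1-D NumPy array, shape (3,)
print(y_frame.shape, y_series.shape, y_array.shape)
```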

In [545]:
y

Unnamed: 0,charging_duration_min
0,38.366667
1,145.000000
2,11.750000
3,5.000000
4,6.200000
...,...
74938,123.133333
74939,12.616667
74940,5.366667
74941,51.600000


In [546]:
X.shape,y.shape,X_copy.shape

((74943, 83), (74943, 1), (74943, 83))

## Data preprocessing and feature engineering on the dataset

### Dropping the columns which are not required for analysis
- session_id
- vehicle_id
- user_id
- latitude
- longitude
- charging_station_id
- festival
- payment_method
- payment_success_rate
- app_reliability_rating
- state_ev_policy
- charging_standard_compliance
- safety_certification
- is_month_end
- is_quarter_end
- vehicle_make 
- day_of_week  
- city   
- month  
- state  
- hour_of_day  
- tariff_per_kwh_inr  
- subsidy_amount_inr 
- session_success_rate 
- user_satisfaction_score 
- session_rating  
- energy_efficiency_kwh_per_100km  
- trip_purpose  
- is_weekend  
- session_type
- user_type
- week_of_year
- day_of_year 
- booking_lead_time_min
- distance_to_station_km 
- range_remaining_km
- next_destination_distance_km
- is_peak_hour
- income_bracket


In [547]:
X_copy.columns

Index(['session_id', 'vehicle_id', 'user_id', 'energy_delivered_kwh',
       'vehicle_make', 'vehicle_model', 'battery_capacity_kwh',
       'vehicle_age_years', 'battery_health_index', 'user_type',
       'income_bracket', 'city', 'state', 'latitude', 'longitude',
       'charging_station_id', 'station_operator', 'charger_type',
       'charger_power_kw', 'plug_type', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'day_of_week', 'hour_of_day',
       'month', 'season', 'is_weekend', 'is_peak_hour', 'festival',
       'ambient_temperature_c', 'humidity_percent', 'weather_condition',
       'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
       'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
       'grid_load_mw', 'tariff_per_kwh_inr', 'total_cost_inr',
       'subsidy_amount_inr', 'payment_method', 'payment_success_rate',
       'station_congestion_level', 'queue_wait_time_min',
       'charger_utilization_rate', 'sta

In [548]:
X_copy.shape

(74943, 83)

In [549]:
cols_to_drop = [
    'session_id', 'vehicle_id', 'user_id', 'latitude', 'longitude',
    'charging_station_id', 'festival', 'payment_method', 'payment_success_rate',
    'app_reliability_rating', 'state_ev_policy', 'charging_standard_compliance',
    'safety_certification', 'is_month_end', 'is_quarter_end', 'vehicle_make',
    'day_of_week', 'city', 'month', 'state', 'hour_of_day', 'tariff_per_kwh_inr',
    'subsidy_amount_inr', 'session_success_rate', 'user_satisfaction_score',
    'session_rating', 'energy_efficiency_kwh_per_100km', 'trip_purpose',
    'is_weekend', 'session_type', 'user_type', 'week_of_year', 'day_of_year',
    'booking_lead_time_min', 'distance_to_station_km', 'range_remaining_km',
    'next_destination_distance_km', 'is_peak_hour', 'income_bracket',
]
X_copy.drop(columns=cols_to_drop, inplace=True)


In [550]:
X_copy.shape,X.shape

((74943, 44), (74943, 83))
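
An in-place drop raises a `KeyError` if the cell is run a second time, because the columns are already gone. One possible alternative is a helper that returns a new frame and reports every missing name at once. `drop_known_columns` and the toy frame below are hypothetical and are not part of the notebook.

```python
import pandas as pd


def drop_known_columns(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy without `columns`; raise a readable error if any is missing."""
    missing = sorted(set(columns) - set(frame.columns))
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return frame.drop(columns=columns)


# Toy frame with hypothetical values, for illustration only.
toy = pd.DataFrame({"session_id": ["a"], "city": ["Pune"], "charger_power_kw": [7.2]})
trimmed = drop_known_columns(toy, ["session_id", "city"])
print(trimmed.shape, toy.shape)  # the input frame is left unchanged
```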

In [552]:
X_copy.columns

Index(['energy_delivered_kwh', 'vehicle_model', 'battery_capacity_kwh',
       'vehicle_age_years', 'battery_health_index', 'station_operator',
       'charger_type', 'charger_power_kw', 'plug_type', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'season',
       'ambient_temperature_c', 'humidity_percent', 'weather_condition',
       'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
       'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
       'grid_load_mw', 'total_cost_inr', 'station_congestion_level',
       'queue_wait_time_min', 'charger_utilization_rate',
       'station_uptime_percent', 'charging_efficiency_percent',
       'charging_curve_efficiency', 'cooling_system_active',
       'thermal_management', 'start_time_year', 'start_time_month',
       'start_time_day', 'start_time_hour', 'start_time_min', 'start_time_sec',
       'end_time_year', 'end_time_month', 'end_time_day', 'end_time_hour',
       'end_time

### Next, label-encode the following columns
- load_shedding_event
- cooling_system_active

In [553]:
from sklearn.preprocessing import LabelEncoder

In [554]:
X_copy['vehicle_model'].unique()

array(['e2o Plus', 'Tigor EV', 'X', 'Urbanite', 'Tiago EV', 'ZS EV',
       'eMax', 'iX', 'Verito Electric', 'e6', 'Photon', 'iQube',
       'eXUV300', 'S1 Air', 'Comet', 'Chetak', 'Nexon EV', 'Ampere Zeal',
       'eVerito', 'Vida V1', 'Ampere Magnus', 'S1 Pro', 'Ioniq 5',
       'Optima', 'Kona', 'EV6', 'Atto 3', 'e-tron'], dtype=object)

In [555]:
X_copy['station_operator'].unique()

array(['Ather', 'Ola Electric', 'Shell', 'BPCL', 'HPCL', 'Tata Power',
       'Fortum', 'Adani Total', 'EESL', 'ChargeZone', 'IOCL'],
      dtype=object)

In [556]:
#le1=LabelEncoder()
#le2=LabelEncoder()
le3=LabelEncoder()
le4=LabelEncoder()
#le5=LabelEncoder()

In [557]:
import warnings
warnings.filterwarnings('ignore')
#X_copy['vehicle_model']=le1.fit_transform(X_copy['vehicle_model'])
# X_copy['is_peak_hour']=le2.fit_transform(X_copy['is_peak_hour'])
X_copy['load_shedding_event']=le3.fit_transform(X_copy['load_shedding_event'])
X_copy['cooling_system_active']=le4.fit_transform(X_copy['cooling_system_active'])
#X_copy['station_operator']=le5.fit_transform(X_copy['station_operator'])

In [558]:
X_copy['vehicle_model'].unique(),X_copy['load_shedding_event'].unique(),X_copy['cooling_system_active'].unique(),X_copy['station_operator'].unique()

(array(['e2o Plus', 'Tigor EV', 'X', 'Urbanite', 'Tiago EV', 'ZS EV',
        'eMax', 'iX', 'Verito Electric', 'e6', 'Photon', 'iQube',
        'eXUV300', 'S1 Air', 'Comet', 'Chetak', 'Nexon EV', 'Ampere Zeal',
        'eVerito', 'Vida V1', 'Ampere Magnus', 'S1 Pro', 'Ioniq 5',
        'Optima', 'Kona', 'EV6', 'Atto 3', 'e-tron'], dtype=object),
 array([0, 1]),
 array([0, 1]),
 array(['Ather', 'Ola Electric', 'Shell', 'BPCL', 'HPCL', 'Tata Power',
        'Fortum', 'Adani Total', 'EESL', 'ChargeZone', 'IOCL'],
       dtype=object))
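
As a minimal sketch of what the label-encoding step does, the toy example below encodes a binary column and recovers the label-to-code mapping from `classes_`. The boolean raw values and the `toy` frame are assumptions for illustration; the notebook only shows the encoded result `array([0, 1])`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame; boolean raw values are an assumption about the original column.
toy = pd.DataFrame({"load_shedding_event": [False, True, False, True]})

le = LabelEncoder()
toy["load_shedding_event"] = le.fit_transform(toy["load_shedding_event"])

# classes_ stores the original labels in sorted order; the position is the code.
label_to_code = {bool(label): code for code, label in enumerate(le.classes_)}

print(label_to_code)
print(toy["load_shedding_event"].tolist())
```

Keeping the fitted encoder (or this mapping) around makes it possible to apply the same codes to new data later.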

In [559]:
X_copy.shape,X.shape

((74943, 44), (74943, 83))

In [560]:
X_copy.columns

Index(['energy_delivered_kwh', 'vehicle_model', 'battery_capacity_kwh',
       'vehicle_age_years', 'battery_health_index', 'station_operator',
       'charger_type', 'charger_power_kw', 'plug_type', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'season',
       'ambient_temperature_c', 'humidity_percent', 'weather_condition',
       'air_quality_index', 'battery_temperature_c', 'grid_frequency_hz',
       'grid_reliability_index', 'power_quality_score', 'load_shedding_event',
       'grid_load_mw', 'total_cost_inr', 'station_congestion_level',
       'queue_wait_time_min', 'charger_utilization_rate',
       'station_uptime_percent', 'charging_efficiency_percent',
       'charging_curve_efficiency', 'cooling_system_active',
       'thermal_management', 'start_time_year', 'start_time_month',
       'start_time_day', 'start_time_hour', 'start_time_min', 'start_time_sec',
       'end_time_year', 'end_time_month', 'end_time_day', 'end_time_hour',
       'end_time

In [561]:
df_copy.shape

(74943, 84)

In [562]:
# Relative frequency of each category in the columns that will be one-hot encoded
(
    df_copy['charger_type'].value_counts(normalize=True),
    df_copy['plug_type'].value_counts(normalize=True),
    df_copy['season'].value_counts(normalize=True),
    df_copy['weather_condition'].value_counts(normalize=True),
    df['vehicle_model'].value_counts(normalize=True),
    df['station_operator'].value_counts(normalize=True),
)


(charger_type
 AC_SLOW    0.602565
 AC_FAST    0.247628
 DC_FAST    0.149807
 Name: proportion, dtype: float64,
 plug_type
 Type2           0.299014
 CCS2            0.250951
 Bharat AC001    0.200592
 CHAdeMO         0.149554
 Bharat DC001    0.099889
 Name: proportion, dtype: float64,
 season
 Summer          0.334975
 Monsoon         0.252805
 Winter          0.246027
 Post_Monsoon    0.166193
 Name: proportion, dtype: float64,
 weather_condition
 Clear            0.397302
 PARTLY_CLOUDY    0.200165
 Overcast         0.150488
 LIGHT_RAIN       0.100703
 HEAVY_RAIN       0.051092
 THUNDERSTORM     0.029969
 Foggy            0.029502
 DUST_STORM       0.020656
 HEATWAVE         0.020122
 Name: proportion, dtype: float64,
 vehicle_model
 Tigor EV           0.084784
 Nexon EV           0.084624
 Tiago EV           0.083784
 Urbanite           0.075577
 Chetak             0.074697
 e2o Plus           0.066290
 eVerito            0.066197
 eXUV300            0.065970
 iQube              0

### Next, we one-hot encode the nominal columns
- charger_type
- plug_type
- season
- weather_condition
- vehicle_model
- station_operator

In [563]:
## One-hot encoding: ColumnTransformer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

In [564]:
onehot_columns = ['charger_type','plug_type','season','weather_condition','vehicle_model','station_operator']
# Apply One-Hot Encoding only to those columns
X_copy_encoded = pd.get_dummies(X_copy, columns=onehot_columns, drop_first=True,dtype=int)

# Overwrite the original DataFrame with the updated one
X_copy = X_copy_encoded

In [565]:
X_copy.shape,X.shape

((74943, 92), (74943, 83))
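
The jump from 44 to 92 columns can be checked by hand: with `drop_first=True`, each encoded column contributes one dummy per category minus one. The sketch below redoes that arithmetic using the category counts visible in the `unique()` and `value_counts()` outputs above, then shows on a toy frame which level is dropped. The `toy` frame is illustrative only.

```python
import pandas as pd

# Category counts read from the unique()/value_counts() outputs in this notebook.
n_categories = {
    "charger_type": 3,
    "plug_type": 5,
    "season": 4,
    "weather_condition": 9,
    "vehicle_model": 28,
    "station_operator": 11,
}

cols_before = 44
# drop_first=True removes one level per encoded column.
dummy_cols = sum(n - 1 for n in n_categories.values())
# The six original columns are replaced by their dummy columns.
cols_after = cols_before - len(n_categories) + dummy_cols
print(cols_after)

# Toy demonstration: the alphabetically first level (AC_FAST) is the one dropped.
toy = pd.DataFrame({"charger_type": ["AC_SLOW", "AC_FAST", "DC_FAST"]})
encoded = pd.get_dummies(toy, columns=["charger_type"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

A row with zeros in all dummies of a group therefore belongs to the dropped reference level.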

In [566]:
X_copy.columns

Index(['energy_delivered_kwh', 'battery_capacity_kwh', 'vehicle_age_years',
       'battery_health_index', 'charger_power_kw', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'ambient_temperature_c',
       'humidity_percent', 'air_quality_index', 'battery_temperature_c',
       'grid_frequency_hz', 'grid_reliability_index', 'power_quality_score',
       'load_shedding_event', 'grid_load_mw', 'total_cost_inr',
       'station_congestion_level', 'queue_wait_time_min',
       'charger_utilization_rate', 'station_uptime_percent',
       'charging_efficiency_percent', 'charging_curve_efficiency',
       'cooling_system_active', 'thermal_management', 'start_time_year',
       'start_time_month', 'start_time_day', 'start_time_hour',
       'start_time_min', 'start_time_sec', 'end_time_year', 'end_time_month',
       'end_time_day', 'end_time_hour', 'end_time_min', 'end_time_sec',
       'charger_type_AC_SLOW', 'charger_type_DC_FAST',
       'plug_type_Bharat DC001', 


### Next, we ordinal-encode the ordered columns
- station_congestion_level
- thermal_management

In [567]:
#income_bracket
#station_congestion_level
#thermal_management
# Encode the ordered categories manually with explicit rank dictionaries
#income_bracket_order= {'High':5, 'Upper_Middle':4, 'Middle':3, 'Lower_Middle':2,'Low':1}
station_congestion_level_order= {'High':3, 'Medium':2, 'Low':1}
thermal_management_order =  {'Active':2, 'Passive':1}
# Apply encoding
# X_copy['income_bracket'] = X_copy['income_bracket'].map(income_bracket_order)
X_copy['station_congestion_level'] = X_copy['station_congestion_level'].map(station_congestion_level_order)
X_copy['thermal_management'] = X_copy['thermal_management'].map(thermal_management_order)

# The mapped values overwrite the original columns in X_copy

In [568]:
X_copy['station_congestion_level'].value_counts(normalize=True)*100,X_copy['thermal_management'].value_counts(normalize=True)*100

(station_congestion_level
 1    50.108749
 2    34.871836
 3    15.019415
 Name: proportion, dtype: float64,
 thermal_management
 1    73.082209
 2    26.917791
 Name: proportion, dtype: float64)
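
One thing worth checking after a dictionary-based mapping is that no label was missed, because `Series.map` silently returns `NaN` for keys absent from the dictionary. A minimal sketch on toy data follows; the misspelled label `'low'` is a deliberately invented example, not a value from the dataset.

```python
import pandas as pd

station_congestion_level_order = {"Low": 1, "Medium": 2, "High": 3}

# Toy column; the lowercase 'low' is a made-up typo to show the failure mode.
toy = pd.Series(["Low", "High", "Medium", "low"], name="station_congestion_level")
mapped = toy.map(station_congestion_level_order)

# Labels missing from the dictionary become NaN; list them before modeling.
unmapped = toy[mapped.isna()].unique().tolist()
print(unmapped)
```

In this notebook the `value_counts` output above sums to 100 percent over the mapped codes, which suggests no labels were lost.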

### Splitting the start_time and end_time columns into the parts below, then dropping start_time and end_time
### These columns already exist in X_copy at this point, so the cells below are left commented out
-  start_time_day
-  start_time_month
-  start_time_year
-  start_time_hour
-  start_time_min
-  start_time_sec
-  end_time_day
-  end_time_month
-  end_time_year
-  end_time_hour
-  end_time_min
-  end_time_sec

In [569]:
# X_copy['start_time_day']= X_copy['start_time'].str.split(' ').str[0].str.split('-').str[0]
# X_copy['start_time_month']= X_copy['start_time'].str.split(' ').str[0].str.split('-').str[1]
# X_copy['start_time_year']= X_copy['start_time'].str.split(' ').str[0].str.split('-').str[2]
# X_copy['start_time_hr']= X_copy['start_time'].str.split(' ').str[1].str.split(':').str[0]
# X_copy['start_time_min']= X_copy['start_time'].str.split(' ').str[1].str.split(':').str[1]


In [570]:
# X_copy['end_time_day']= X_copy['end_time'].str.split(' ').str[0].str.split('-').str[0]
# X_copy['end_time_month']= X_copy['end_time'].str.split(' ').str[0].str.split('-').str[1]
# X_copy['end_time_year']= X_copy['end_time'].str.split(' ').str[0].str.split('-').str[2]
# X_copy['end_time_hr']= X_copy['end_time'].str.split(' ').str[1].str.split(':').str[0]
# X_copy['end_time_min']= X_copy['end_time'].str.split(' ').str[1].str.split(':').str[1]


In [571]:
# X_copy['start_time_day']=X_copy['start_time_day'].astype(int)
# X_copy['start_time_month']=X_copy['start_time_month'].astype(int)
# X_copy['start_time_year']=X_copy['start_time_year'].astype(int)
# X_copy['start_time_hr']=X_copy['start_time_hr'].astype(int)
# X_copy['start_time_min']=X_copy['start_time_min'].astype(int)
# X_copy['end_time_day']=X_copy['end_time_day'].astype(int)
# X_copy['end_time_month']=X_copy['end_time_month'].astype(int)
# X_copy['end_time_year']=X_copy['end_time_year'].astype(int)
# X_copy['end_time_hr']=X_copy['end_time_hr'].astype(int)
# X_copy['end_time_min']=X_copy['end_time_min'].astype(int)


In [572]:
# X_copy.drop(columns=['start_time', 'end_time'], inplace=True)
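
The string-splitting approach in the commented cells can also be written with `pd.to_datetime` and the `.dt` accessor, which parses once and yields integer columns directly. The sketch below is an alternative under the assumption that timestamps follow the day-first `DD-MM-YYYY HH:MM` layout seen in `df_copy.head()`; the `toy` frame is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({"start_time": ["01-01-2018 07:30", "01-01-2018 08:33"]})

# Day-first format, matching the order used by the commented-out string splits.
ts = pd.to_datetime(toy["start_time"], format="%d-%m-%Y %H:%M")

# Column suffix -> datetime attribute, using the suffixes present in X_copy.
parts = {"year": "year", "month": "month", "day": "day",
         "hour": "hour", "min": "minute", "sec": "second"}
for suffix, attr in parts.items():
    toy[f"start_time_{suffix}"] = getattr(ts.dt, attr)

# Drop the raw string column once its parts are extracted.
toy = toy.drop(columns=["start_time"])
print(toy)
```

Passing an explicit `format` avoids the ambiguity of dates such as `01-02-2018`, which could otherwise be read month-first.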

In [573]:
X_copy.shape , X.shape

((74943, 92), (74943, 83))

In [574]:
X_copy.columns

Index(['energy_delivered_kwh', 'battery_capacity_kwh', 'vehicle_age_years',
       'battery_health_index', 'charger_power_kw', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'ambient_temperature_c',
       'humidity_percent', 'air_quality_index', 'battery_temperature_c',
       'grid_frequency_hz', 'grid_reliability_index', 'power_quality_score',
       'load_shedding_event', 'grid_load_mw', 'total_cost_inr',
       'station_congestion_level', 'queue_wait_time_min',
       'charger_utilization_rate', 'station_uptime_percent',
       'charging_efficiency_percent', 'charging_curve_efficiency',
       'cooling_system_active', 'thermal_management', 'start_time_year',
       'start_time_month', 'start_time_day', 'start_time_hour',
       'start_time_min', 'start_time_sec', 'end_time_year', 'end_time_month',
       'end_time_day', 'end_time_hour', 'end_time_min', 'end_time_sec',
       'charger_type_AC_SLOW', 'charger_type_DC_FAST',
       'plug_type_Bharat DC001', 

In [575]:
X_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74943 entries, 0 to 74942
Data columns (total 92 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   energy_delivered_kwh             74943 non-null  float64
 1   battery_capacity_kwh             74943 non-null  float64
 2   vehicle_age_years                74943 non-null  int64  
 3   battery_health_index             74943 non-null  float64
 4   charger_power_kw                 74943 non-null  float64
 5   initial_soc_percent              74943 non-null  int64  
 6   final_soc_percent                74943 non-null  int64  
 7   soc_gained_percent               74943 non-null  int64  
 8   ambient_temperature_c            74943 non-null  float64
 9   humidity_percent                 74943 non-null  float64
 10  air_quality_index                74943 non-null  int64  
 11  battery_temperature_c            74943 non-null  float64
 12  grid_frequency_hz 

### Train/test split: we divide the dataset into a training set and a test set
### The test set is 20 percent of the complete dataset
### The training set is 80 percent of the complete dataset

In [576]:
##train test split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X_copy,y,test_size=0.20,random_state=42)

In [577]:
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((59954, 92), (14989, 92), (59954, 1), (14989, 1))
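
The row counts above follow from how `train_test_split` sizes the test set when `test_size` is a fraction: it rounds the test size up and gives the remainder to training. A short sketch that reproduces the two numbers, using a placeholder array `X_toy` in place of the real features:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 74943
n_test = math.ceil(0.20 * n_samples)   # fractional test size is rounded up
n_train = n_samples - n_test           # the rest goes to training
print(n_train, n_test)

# Placeholder data with the same number of rows as X_copy.
X_toy = np.arange(n_samples).reshape(-1, 1)
X_tr, X_te = train_test_split(X_toy, test_size=0.20, random_state=42)
print(X_tr.shape[0], X_te.shape[0])
```

Because the split shuffles rows by default, the index values in `X_train` and `X_test` are no longer in chronological order.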

In [578]:
X_train.head(1)

Unnamed: 0,energy_delivered_kwh,battery_capacity_kwh,vehicle_age_years,battery_health_index,charger_power_kw,initial_soc_percent,final_soc_percent,soc_gained_percent,ambient_temperature_c,humidity_percent,...,station_operator_Ather,station_operator_BPCL,station_operator_ChargeZone,station_operator_EESL,station_operator_Fortum,station_operator_HPCL,station_operator_IOCL,station_operator_Ola Electric,station_operator_Shell,station_operator_Tata Power
25640,1.013,2.9,0,1.002,4.9,43,77,34,44.3,80.4,...,0,0,1,0,0,0,0,0,0,0


In [579]:
X_test.head(1)

Unnamed: 0,energy_delivered_kwh,battery_capacity_kwh,vehicle_age_years,battery_health_index,charger_power_kw,initial_soc_percent,final_soc_percent,soc_gained_percent,ambient_temperature_c,humidity_percent,...,station_operator_Ather,station_operator_BPCL,station_operator_ChargeZone,station_operator_EESL,station_operator_Fortum,station_operator_HPCL,station_operator_IOCL,station_operator_Ola Electric,station_operator_Shell,station_operator_Tata Power
6404,1.011,1.8,0,0.96,4.1,29,81,52,27.2,75.0,...,0,0,1,0,0,0,0,0,0,0


In [580]:
y_train,y_test

(       charging_duration_min
 25640               7.666667
 19809              10.650000
 811               178.933333
 44128               7.533333
 38719             190.950000
 ...                      ...
 37194              15.016667
 6265               15.300000
 54886               5.000000
 860                26.466667
 15795              17.533333
 
 [59954 rows x 1 columns],
        charging_duration_min
 6404               12.733333
 38404              13.816667
 3132                5.000000
 34995             122.116667
 28097              23.383333
 ...                      ...
 39872              44.316667
 18129              16.750000
 39796               5.000000
 1520                7.566667
 20488              63.733333
 
 [14989 rows x 1 columns])

In [581]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 59954 entries, 25640 to 15795
Data columns (total 92 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   energy_delivered_kwh             59954 non-null  float64
 1   battery_capacity_kwh             59954 non-null  float64
 2   vehicle_age_years                59954 non-null  int64  
 3   battery_health_index             59954 non-null  float64
 4   charger_power_kw                 59954 non-null  float64
 5   initial_soc_percent              59954 non-null  int64  
 6   final_soc_percent                59954 non-null  int64  
 7   soc_gained_percent               59954 non-null  int64  
 8   ambient_temperature_c            59954 non-null  float64
 9   humidity_percent                 59954 non-null  float64
 10  air_quality_index                59954 non-null  int64  
 11  battery_temperature_c            59954 non-null  float64
 12  grid_frequency_hz  

In [582]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14989 entries, 6404 to 20488
Data columns (total 92 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   energy_delivered_kwh             14989 non-null  float64
 1   battery_capacity_kwh             14989 non-null  float64
 2   vehicle_age_years                14989 non-null  int64  
 3   battery_health_index             14989 non-null  float64
 4   charger_power_kw                 14989 non-null  float64
 5   initial_soc_percent              14989 non-null  int64  
 6   final_soc_percent                14989 non-null  int64  
 7   soc_gained_percent               14989 non-null  int64  
 8   ambient_temperature_c            14989 non-null  float64
 9   humidity_percent                 14989 non-null  float64
 10  air_quality_index                14989 non-null  int64  
 11  battery_temperature_c            14989 non-null  float64
 12  grid_frequency_hz   

In [583]:
X_train.columns

Index(['energy_delivered_kwh', 'battery_capacity_kwh', 'vehicle_age_years',
       'battery_health_index', 'charger_power_kw', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'ambient_temperature_c',
       'humidity_percent', 'air_quality_index', 'battery_temperature_c',
       'grid_frequency_hz', 'grid_reliability_index', 'power_quality_score',
       'load_shedding_event', 'grid_load_mw', 'total_cost_inr',
       'station_congestion_level', 'queue_wait_time_min',
       'charger_utilization_rate', 'station_uptime_percent',
       'charging_efficiency_percent', 'charging_curve_efficiency',
       'cooling_system_active', 'thermal_management', 'start_time_year',
       'start_time_month', 'start_time_day', 'start_time_hour',
       'start_time_min', 'start_time_sec', 'end_time_year', 'end_time_month',
       'end_time_day', 'end_time_hour', 'end_time_min', 'end_time_sec',
       'charger_type_AC_SLOW', 'charger_type_DC_FAST',
       'plug_type_Bharat DC001', 

In [584]:
# Create a ColumnTransformer that standardizes the numeric features and passes the remaining columns through
num_features= ['energy_delivered_kwh','battery_capacity_kwh',
       'vehicle_age_years', 'battery_health_index' ,'charger_power_kw', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent',
       'ambient_temperature_c', 'humidity_percent', 'air_quality_index',
       'battery_temperature_c', 'grid_frequency_hz', 'grid_reliability_index',
       'power_quality_score','grid_load_mw',
       'total_cost_inr','queue_wait_time_min',
       'charger_utilization_rate', 'station_uptime_percent',
       'charging_efficiency_percent', 'charging_curve_efficiency',
       ]

from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

numeric_transformer = StandardScaler()


preprocessor = ColumnTransformer(
    [
        ("StandardScaler", numeric_transformer, num_features)
    ],remainder='passthrough'
    
)
#  'start_time_year', 'start_time_month', 'start_time_day',
#        'start_time_hour', 'start_time_min', 'start_time_sec', 'end_time_year',
#        'end_time_month', 'end_time_day', 'end_time_hour', 'end_time_min',   'end_time_sec'
#        'end_time_sec'
#'effective_charging_duration_min', 'kWh_per_min_effective_charging'

In [585]:
X_train_scaled = X_train.copy()
X_train_scaled[num_features]=numeric_transformer.fit_transform(X_train[num_features]) ## using fit_transform here
X_test_scaled = X_test.copy()
X_test_scaled[num_features]=numeric_transformer.transform(X_test[num_features]) ## using only transform here
## StandardScaler computes z = (x - mean) / standard deviation per column, using the mean and standard deviation learned from X_train only

In [586]:
X_train_scaled.shape

(59954, 92)
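
To make the scaling rule concrete, the sketch below fits a `ColumnTransformer` like the `preprocessor` defined above on a tiny training frame and checks that a test value is scaled with the training mean and standard deviation. The toy frames and their values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy data: one numeric feature to scale, one binary feature to pass through.
train = pd.DataFrame({"charger_power_kw": [3.0, 5.0, 7.0],
                      "load_shedding_event": [0, 1, 0]})
test = pd.DataFrame({"charger_power_kw": [9.0],
                     "load_shedding_event": [1]})

num_features = ["charger_power_kw"]
pre = ColumnTransformer(
    [("StandardScaler", StandardScaler(), num_features)],
    remainder="passthrough",
)

train_scaled = pre.fit_transform(train)  # learns mean/std from train only
test_scaled = pre.transform(test)        # reuses the train statistics

# Manual z-score with the training statistics (population std, ddof=0).
mean = train["charger_power_kw"].mean()
std = train["charger_power_kw"].std(ddof=0)
manual = (test["charger_power_kw"] - mean) / std

print(test_scaled[:, 0], manual.to_numpy())
```

Fitting on the training set alone keeps information about the test distribution out of the preprocessing step.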

In [587]:
X_train_scaled.head()

Unnamed: 0,energy_delivered_kwh,battery_capacity_kwh,vehicle_age_years,battery_health_index,charger_power_kw,initial_soc_percent,final_soc_percent,soc_gained_percent,ambient_temperature_c,humidity_percent,...,station_operator_Ather,station_operator_BPCL,station_operator_ChargeZone,station_operator_EESL,station_operator_Fortum,station_operator_HPCL,station_operator_IOCL,station_operator_Ola Electric,station_operator_Shell,station_operator_Tata Power
25640,-0.797725,-0.843202,-0.818079,1.081072,-0.489688,0.950128,0.572638,-0.303768,1.897417,0.664051,...,0,0,1,0,0,0,0,0,0,0
19809,-0.828114,-0.890541,-0.417128,-0.153682,-0.508361,1.594605,1.373461,-0.07179,-0.206033,-0.837637,...,0,1,0,0,0,0,0,0,0,0
811,0.374106,1.066117,-0.417128,0.51982,-0.536371,0.386211,-0.737798,-1.154352,-1.174098,-1.687347,...,0,0,0,0,0,1,0,0,0,0
44128,-0.875033,-0.90106,-0.818079,0.632071,-0.533259,-0.097146,-1.320214,-1.309004,-0.672138,0.733524,...,0,0,0,0,0,1,0,0,0,0
38719,1.585735,2.185409,-0.417128,0.220486,-0.452341,0.063973,-0.519392,-0.613071,0.295927,1.032793,...,1,0,0,0,0,0,0,0,0,0


In [588]:
X_train.head()

Unnamed: 0,energy_delivered_kwh,battery_capacity_kwh,vehicle_age_years,battery_health_index,charger_power_kw,initial_soc_percent,final_soc_percent,soc_gained_percent,ambient_temperature_c,humidity_percent,...,station_operator_Ather,station_operator_BPCL,station_operator_ChargeZone,station_operator_EESL,station_operator_Fortum,station_operator_HPCL,station_operator_IOCL,station_operator_Ola Electric,station_operator_Shell,station_operator_Tata Power
25640,1.013,2.9,0,1.002,4.9,43,77,34,44.3,80.4,...,0,0,1,0,0,0,0,0,0,0
19809,0.774,2.0,1,0.936,4.3,51,88,37,26.7,52.3,...,0,1,0,0,0,0,0,0,0,0
811,10.229,39.2,1,0.972,3.4,36,59,23,18.6,36.4,...,0,0,0,0,0,1,0,0,0,0
44128,0.405,1.8,0,0.978,3.5,30,51,21,22.8,81.7,...,0,0,0,0,0,1,0,0,0,0
38719,19.758,60.48,1,0.956,6.1,32,62,30,30.9,87.3,...,1,0,0,0,0,0,0,0,0,0


### Saving X_train_scaled, X_test_scaled, y_train and y_test as CSV files

In [589]:
X_train_scaled.to_csv(r"../data/X_train.csv", index=False)
X_test_scaled.to_csv(r"../data/X_test.csv", index=False)
y_train.to_csv(r"../data/y_train.csv", index=False)
y_test.to_csv(r"../data/y_test.csv", index=False)
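
Since the files are written with `index=False`, the shuffled row labels (for example 25640, 19809) are not stored, and features and targets line up by row position only once reloaded. A small round-trip sketch with a temporary directory illustrates this; the two-row frames and file names are stand-ins for the real data.

```python
import os
import tempfile

import pandas as pd

# Two rows carrying the shuffled index labels seen in X_train / y_train.
X_part = pd.DataFrame({"charger_power_kw": [4.9, 4.3]}, index=[25640, 19809])
y_part = pd.DataFrame({"charging_duration_min": [7.67, 10.65]}, index=[25640, 19809])

with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "X_train.csv")
    y_path = os.path.join(tmp, "y_train.csv")
    X_part.to_csv(x_path, index=False)
    y_part.to_csv(y_path, index=False)
    X_back = pd.read_csv(x_path)
    y_back = pd.read_csv(y_path)

# The original labels are gone; rows are matched by position from here on.
print(X_back.index.tolist())
print(len(X_back) == len(y_back))
```

Writing all four files from the same split, as done above, is what keeps the positions consistent.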

In [590]:
# import numpy as np

# # Save y_train
# np.savetxt(r"./data/y_train.csv", y_train, delimiter=",", fmt='%s')



In [591]:
# X_test_scaled.to_csv(r"./data/X_test.csv", index=False)

In [592]:
# import numpy as np

# # Save y_test
# np.savetxt(r"./data/y_test.csv", y_test, delimiter=",", fmt='%s')




In [593]:
X_train.columns

Index(['energy_delivered_kwh', 'battery_capacity_kwh', 'vehicle_age_years',
       'battery_health_index', 'charger_power_kw', 'initial_soc_percent',
       'final_soc_percent', 'soc_gained_percent', 'ambient_temperature_c',
       'humidity_percent', 'air_quality_index', 'battery_temperature_c',
       'grid_frequency_hz', 'grid_reliability_index', 'power_quality_score',
       'load_shedding_event', 'grid_load_mw', 'total_cost_inr',
       'station_congestion_level', 'queue_wait_time_min',
       'charger_utilization_rate', 'station_uptime_percent',
       'charging_efficiency_percent', 'charging_curve_efficiency',
       'cooling_system_active', 'thermal_management', 'start_time_year',
       'start_time_month', 'start_time_day', 'start_time_hour',
       'start_time_min', 'start_time_sec', 'end_time_year', 'end_time_month',
       'end_time_day', 'end_time_hour', 'end_time_min', 'end_time_sec',
       'charger_type_AC_SLOW', 'charger_type_DC_FAST',
       'plug_type_Bharat DC001', 

In [594]:
X_train.corr()

Unnamed: 0,energy_delivered_kwh,battery_capacity_kwh,vehicle_age_years,battery_health_index,charger_power_kw,initial_soc_percent,final_soc_percent,soc_gained_percent,ambient_temperature_c,humidity_percent,...,station_operator_Ather,station_operator_BPCL,station_operator_ChargeZone,station_operator_EESL,station_operator_Fortum,station_operator_HPCL,station_operator_IOCL,station_operator_Ola Electric,station_operator_Shell,station_operator_Tata Power
energy_delivered_kwh,1.000000,0.904791,-0.046864,0.051145,-0.000689,-0.119583,0.170471,0.295847,-0.002404,0.009982,...,-0.001448,-0.000790,-0.000798,0.004623,-0.000866,-0.001781,-0.005965,0.003391,0.007365,-0.004641
battery_capacity_kwh,0.904791,1.000000,-0.001870,0.001992,-0.000042,0.005372,-0.001588,-0.006843,0.005381,0.003511,...,-0.000919,-0.002970,-0.000477,0.004183,0.000195,-0.001733,-0.004562,0.004095,0.003918,-0.002942
vehicle_age_years,-0.046864,-0.001870,1.000000,-0.925426,0.001073,-0.004894,-0.001743,0.002846,0.003479,0.003335,...,-0.000471,-0.006547,-0.001871,-0.006076,-0.000807,0.004246,0.003137,0.010260,-0.003533,0.003065
battery_health_index,0.051145,0.001992,-0.925426,1.000000,-0.000123,0.004998,0.002110,-0.002556,-0.004836,-0.000368,...,-0.000117,0.007143,0.002579,0.003692,-0.002125,-0.003800,-0.003926,-0.006807,0.002367,-0.002202
charger_power_kw,-0.000689,-0.000042,0.001073,-0.000123,1.000000,-0.000452,-0.004843,-0.004710,-0.004980,-0.002996,...,-0.000051,-0.001899,-0.001502,-0.003259,0.002403,0.005031,-0.012214,0.011148,0.000505,-0.002219
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
station_operator_HPCL,-0.001781,-0.001733,0.004246,-0.003800,0.005031,-0.000594,-0.002937,-0.002549,0.001807,0.009309,...,-0.102190,-0.101199,-0.102263,-0.102787,-0.102036,1.000000,-0.101551,-0.101571,-0.100597,-0.101365
station_operator_IOCL,-0.005965,-0.004562,0.003137,-0.003926,-0.012214,-0.003724,0.000811,0.004436,0.001215,-0.008887,...,-0.099547,-0.098581,-0.099617,-0.100128,-0.099397,-0.101551,1.000000,-0.098944,-0.097995,-0.098743
station_operator_Ola Electric,0.003391,0.004095,0.010260,-0.006807,0.011148,-0.001073,-0.001638,-0.000710,-0.001939,0.004502,...,-0.099567,-0.098601,-0.099638,-0.100148,-0.099417,-0.101571,-0.098944,1.000000,-0.098015,-0.098763
station_operator_Shell,0.007365,0.003918,-0.003533,0.002367,0.000505,-0.002902,0.005030,0.008128,0.003405,0.006901,...,-0.098612,-0.097655,-0.098682,-0.099187,-0.098463,-0.100597,-0.097995,-0.098015,1.000000,-0.097815
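
A 92 x 92 correlation matrix is hard to read directly. One way to surface strongly related features, such as `energy_delivered_kwh` with `battery_capacity_kwh` (0.90) or `vehicle_age_years` with `battery_health_index` (-0.93) in the output above, is to list only the pairs whose absolute correlation exceeds a threshold. The helper name `high_corr_pairs`, the 0.9 threshold, and the toy data are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(frame, threshold=0.9):
    """Return feature pairs with |correlation| above threshold, strongest first."""
    corr = frame.corr().abs()
    # Keep the strict upper triangle so each pair appears once and the
    # diagonal (self-correlation of 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Toy data: 'a' and 'b' are almost collinear, 'c' is weakly related to both.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                    "b": [2, 4, 6, 8, 11],
                    "c": [5, 1, 4, 2, 3]})
result = high_corr_pairs(toy)
print(result)
```

On the real data this would be called as `high_corr_pairs(X_train)`.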


In [595]:
import pickle
# Save the fitted scaler; the with-block closes the file handle even if dump fails
with open(r"../Saving The Model Results/scaler_data_preprocessing_feature_eng.pkl", 'wb') as f:
    pickle.dump(numeric_transformer, f)
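
For completeness, here is a minimal sketch of how a pickled scaler can be loaded again and checked against the original. It uses a temporary file and a toy scaler rather than the notebook's path and `numeric_transformer`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy scaler standing in for the fitted numeric_transformer.
scaler = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.pkl")
    with open(path, "wb") as fh:
        pickle.dump(scaler, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

# The restored object should transform new data exactly like the original.
same = np.allclose(restored.transform([[4.0]]), scaler.transform([[4.0]]))
print(same)
```

Pickled estimators are generally expected to be loaded with the same scikit-learn version that saved them, and only from trusted sources.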