ELECTRIC VEHICLES DATASET

About this Dataset:     
"This dataset contains information from 3,395 high resolution electric vehicle charging sessions. The data contains sessions from 85 EV drivers with repeat usage at 105 stations across 25 sites at a workplace charging program. The workplace locations include facilities such as research and innovation centers, manufacturing, testing facilities and office headquarters for a firm participating in the U.S. Department of Energy (DOE) workplace charging challenge. The data is in a human and machine readable *.CSV format."

Using this dataset one will try to make an inference on the average time it takes a electric vehicle to charge, How much it cost, How much kilowatts on average is a vehicle charge per session. The dataset could be obtain on Kaggle. 
The link is : https://www.kaggle.com/datasets/michaelbryantds/electric-vehicle-charging-dataset

In [131]:
import pandas as pd

Before running our second line of code. It is neccesary for us to know the path of the csv file. It is more convenient if the csv file is on the same folder as our jupyter notebook. if it is not on the same folder as the jupyter notebook then make sure you trace the file path correctly for it to work properly.

In [None]:
df = pd.read_csv("station_data_dataverse.csv")

This print is just to make sure that everything is working correctly.

In [None]:
print(df)

To view a preview of the dataset instead of printing the whole dataset one could simply use the function head().

In [None]:
print(df.head())

As we could see everything is working properly but there seem to be some formatting error with the dates. Lets fix that.

In [None]:
for i, value in enumerate(df["created"]):
    if value.startswith("00"):
        df.loc[i:, "created"] = value.lstrip("00")

In [None]:
for i, value in enumerate(df["ended"]):
    if value.startswith("00"):
        df.loc[i:, "ended"] = value.lstrip("00")

In [None]:
print(df.head())

The dates have been fixed. Its time to start making some assumptions with about the data and then running some code to see if our assumptions are correct. 
Let's start by making an inference on which platform appears more often. 

In [None]:
platform_users = df["platform"].value_counts()
print(platform_users)

The limitation of this data is that we cannot get the actual ios to android users because this data only comes from 85 electric vehicles. Meaning that the number of ios users vs android users would in reality be less than 100. We could try to use these numbers to guess the percentage of users out of those 85 total users.

In [None]:
ios = 2234
android = 1155
web = 6
total = ios + android + web

In [None]:
percentage_ios = 2234 / total * 100
users_ios = percentage_ios / 100 * 85

In [None]:
print(f"The percentage of ios users in this dataset was approximately %{round(percentage_ios, 2)} This equals to about 55 ios users.")

In [None]:
percentage_android = 1155 / total * 100
users_android =  percentage_android / 100 * 85

In [None]:
print(f"The percentage of android users in this dataset was approximately %{round(percentage_android, 2)} This equals to about {round(users_android)} ios users.")

In [None]:
percentage_web = web / total * 100
web_users =  percentage_web / 100 * 85

In [None]:
print(f"The percentage of web users in this dataset was approximately %{round(percentage_web, 2)} This equals to about 1 ios users.")

Now lets try to see on average how much time did each user spent charging their car.

Now that we know that information lets see which day of the week did users prefer to charge their car.

In [None]:
average_time = df["chargeTimeHrs"].mean()
print(average_time)


We could also compare the average time of just ios or andriod to see if the charge time changes base on platform

In [None]:
time_android = df[["chargeTimeHrs"]] [df["platform"] == "android"]
time_android_avg = time_android.mean()


Lets do them same for ios

In [None]:
time_ios = df[["chargeTimeHrs"]] [df["platform"] == "ios"]
time_ios_avg = time_ios.mean()

In [None]:
print("Averahe charge time of android users is: ", time_android_avg )
print("Average Charge time of ios user is:", time_ios_avg)
print(average_time)

There is no real significant difference between these two numbers but we could still plot them to see if we see anything interesting.

In [None]:
import matplotlib.pyplot as plt

In [None]:
platform = df["platform"]

charge_time = df.loc[:,"chargeTimeHrs"].values.reshape(-1, 1)

In [None]:
plt.scatter(platform, charge_time)
plt.title("Total Charge Time Per Platform")
plt.xlabel("Platform")
plt.ylabel("charge_time")
plt.show()

as you could see. We have an outlier in the ios system lets try to find the session id link to that outlier.

In [None]:
outlier = df[["sessionId"]] [df["chargeTimeHrs"] > 15]

print(outlier)

In [132]:
outlier_info = df[df["sessionId"] == 2162299]

print(outlier_info)

     sessionId  kwhTotal  dollars            created              ended  \
173    2162299       4.1     0.83  14-11-18 15:40:26  14-11-18 17:11:04   

     startTime  endTime  chargeTimeHrs weekday platform  ...  managerVehicle  \
173         18        1      55.238056     Mon      ios  ...               1   

     facilityType  Mon  Tues  Wed  Thurs  Fri  Sat  Sun  reportedZip  
173             4    1     0    0      0    0    0    0            0  

[1 rows x 24 columns]
