# Global Electric Vehicle Usage from 2010 to 2023

## Background
_Research Question: Which countries and continents have the highest usage of electric vehicles, and has uptake increased in recent years (2010–2023)?_

The increasing focus on climate change over the past two decades has driven demand for environmentally friendly alternatives. Automobiles have been one of the biggest areas of focus because of the heavy dependence on them for private and public travel and the high level of pollution they cause. As a result, there has been a growing appetite for electric vehicles (EVs) as a solution. However, has there been a definitive increase in their use in recent years, or has uptake been slow? Are some regions more enthusiastic about adopting this solution than others, and is the infrastructure in place to support it?

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from scipy.stats import pearsonr

In [3]:
dtf = pd.read_csv("../data/IEA Global EV Data 2024.csv")

## Dataset and Exploratory Data Analysis
After importing the dataset, `head()` gives a small snapshot of the data, and `dtypes` shows the column names and their types. `isnull()` is used to check for missing values, and `info()` summarises the data. There are 12654 rows and 8 columns.

In [4]:
dtf.head()

|   | region | category | parameter | mode | powertrain | year | unit | value |
|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Historical | EV stock share | Cars | EV | 2011 | percent | 0.00039 |
| 1 | Australia | Historical | EV sales share | Cars | EV | 2011 | percent | 0.0065 |
| 2 | Australia | Historical | EV sales | Cars | BEV | 2011 | Vehicles | 49.0 |
| 3 | Australia | Historical | EV stock | Cars | BEV | 2011 | Vehicles | 49.0 |
| 4 | Australia | Historical | EV stock | Cars | BEV | 2012 | Vehicles | 220.0 |


In [None]:
print(dtf.dtypes)
print(dtf.shape)

In [None]:
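# Count missing values per column rather than printing the full boolean frame
print(dtf.isnull().sum())
# info() prints its own summary and returns None, so it is not wrapped in print()
dtf.info()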

In [None]:
unique_region = dtf['region'].unique()
print(unique_region)
print(len(unique_region))

In [None]:
print(dtf['year'].unique())
print(dtf['unit'].unique())
print(dtf['category'].unique())

The analysis I want to run is based on actual data. The `category` column shows that the dataset also contains projection data, so this needs to be dropped. The `unit` column also contains units other than `Vehicles`, such as charging points, so the data is filtered again to keep only vehicles. The resulting `vehicle` frame holds actual data on electric vehicles, comprising 5078 rows. This is then plotted to show the different types of electric vehicle and their percentage shares within the dataset.

In [None]:
historical = dtf[dtf['category'] == 'Historical']
print(historical.shape)

In [None]:
vehicle = historical[historical['unit'] == 'Vehicles']
print(vehicle.shape)
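
The two filters above can also be applied in a single step. The sketch below uses a small made-up frame with the same column names as the dataset; the rows, the `Projection` label and the helper name `filter_actual_vehicles` are illustrative assumptions, not part of the original notebook. It also counts rows per `parameter`, which is worth checking because the `head()` output shows both `EV sales` and `EV stock` reported in `Vehicles`.

```python
import pandas as pd

# Toy rows mimicking the dataset's columns; all values are made up.
sample = pd.DataFrame({
    "region": ["Australia", "Australia", "Australia", "World"],
    "category": ["Historical", "Historical", "Projection", "Historical"],
    "parameter": ["EV sales", "EV stock", "EV sales", "EV charging points"],
    "unit": ["Vehicles", "Vehicles", "Vehicles", "charging points"],
    "value": [49.0, 49.0, 1000.0, 500.0],
})

def filter_actual_vehicles(df):
    """Keep historical rows measured in vehicles (hypothetical helper)."""
    mask = (df["category"] == "Historical") & (df["unit"] == "Vehicles")
    return df[mask]

vehicle_sample = filter_actual_vehicles(sample)
print(vehicle_sample.shape)
# See which parameters survive the filter before summing anything
print(vehicle_sample["parameter"].value_counts())
```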

In [None]:
modes = vehicle['mode'].value_counts()
plt.figure(figsize=(7, 7))
plt.pie(modes,
        startangle = 90,
        labels = modes.index,
        autopct = '%1.2f%%',
        wedgeprops = {'width': 0.5},
        colors = sns.color_palette("pastel"))
plt.title('Electric Vehicle Transport Types')
plt.axis('equal')
plt.show()

In [None]:
vehicle_groups = vehicle.groupby('region')['value'].sum()
country_ev = pd.DataFrame(vehicle_groups).reset_index()
country_ev.columns = ['country', 'total value']
country_ev = country_ev[country_ev['country'] != 'World']
print(country_ev)
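
The grouping above adds up every row reported in `Vehicles`. Judging by the `head()` output, that includes both `EV sales` and `EV stock`, and stock is a cumulative count, so summing it across years counts the same vehicles repeatedly. If the aim is total sales, one option is to keep only `parameter == 'EV sales'` before grouping. This is a minimal sketch on made-up numbers, not the notebook's own method; the frame and column name `total sales` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows with the same columns as the filtered vehicle data
vehicle_sample = pd.DataFrame({
    "region": ["Australia", "Australia", "Australia", "Norway", "Norway"],
    "parameter": ["EV sales", "EV stock", "EV sales", "EV sales", "EV stock"],
    "year": [2011, 2011, 2012, 2011, 2011],
    "value": [49.0, 49.0, 171.0, 2000.0, 2500.0],
})

# Keep sales rows only, so stock figures are not mixed into the total
sales_only = vehicle_sample[vehicle_sample["parameter"] == "EV sales"]
sales_by_country = (
    sales_only.groupby("region")["value"].sum()
    .reset_index()
    .rename(columns={"region": "country", "value": "total sales"})
)
print(sales_by_country)
```

If the real data also carries a combined powertrain row alongside BEV and PHEV rows, a further filter on `powertrain` may be needed to avoid counting the same sales twice.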

In [None]:
sns.set(style = "whitegrid")
plt.figure(figsize = (15, 8))
sns.barplot(country_ev, x = 'country', y = 'total value', hue = 'country', palette = 'magma')
plt.title('Total Vehicle Sales by Country 2010-2023')
plt.xlabel('Country')
plt.xticks(rotation = 90)
plt.ylabel('Total Sales 2010 - 2023 (log)')
plt.yscale('log')
plt.show()

In [None]:
total_sum = country_ev['total value'].sum()
country_ev['Percentage'] = (country_ev['total value'] / total_sum) * 100

sns.set(style = "whitegrid")
plt.figure(figsize = (15, 8))
sns.barplot(country_ev, y = 'Percentage', x = 'country', hue = 'country', palette = 'magma')
plt.title('Total Vehicle % Sales by Country 2010-2023')
plt.xlabel('Country')
plt.xticks(rotation = 90)
plt.ylabel('Total Sales 2010 - 2023 %')
plt.show()

In [None]:
top_countries = country_ev.sort_values('total value', ascending = False).head(10)
# Re-number the rows from 1 so the index doubles as the rank
top_countries = top_countries.reset_index(drop=True)
top_countries.index += 1
print(top_countries)

In [None]:
sns.set(style = 'whitegrid')
plt.figure(figsize = (15, 8))
sns.barplot(top_countries, x = 'country', y = 'total value', hue = 'country', palette = 'magma')
plt.title('Top 10 Countries - Total Sales 2010-2023')
plt.xlabel('Country')
plt.xticks(rotation = 90)
plt.ylabel('Total Sales')
plt.yscale('log')
plt.show()

In [None]:
world = vehicle[vehicle['region'] == 'World']
print(world.shape)

In [None]:
sns.set(style = 'darkgrid')
plt.figure(figsize = (14, 8))
sns.lineplot(data = world,
             x = 'year',
             y = 'value',
             color = 'purple',
             marker = 'x')
plt.title('Global EV Sales 2010-2023')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()

In [None]:
charging = historical[historical['unit'] == 'charging points']
print(charging.shape)
print(charging['region'].unique())

In [None]:
charge_groups = charging.groupby('region')['value'].sum()
country_char = pd.DataFrame(charge_groups).reset_index()
country_char.columns = ['country', 'total value']
country_char = country_char[country_char['country'] != 'World']
print(country_char)

In [None]:
top_charge = country_char.sort_values('total value', ascending = False).head(10)
# Re-number the rows from 1 so the index doubles as the rank
top_charge = top_charge.reset_index(drop=True)
top_charge.index += 1
print(top_charge)

In [None]:
sns.set(style = 'darkgrid')
plt.figure(figsize = (14, 8))
sns.lineplot(data = top_charge,
             x = 'country',
             y = 'total value',
             label = 'Charging Points',
             color = 'green',
             marker = 'o')
sns.lineplot(data = top_countries,
             x = 'country',
             y = 'total value',
             label = 'Vehicles',
             color = 'purple',
             marker = 'o')
plt.title('Charging Points & Vehicle Total Sales 2010 - 2023')
plt.xlabel('Country')
plt.ylabel('Sales / No. of charging points')
plt.legend(loc='upper right')
plt.annotate('Korea not in Top 10 Vehicles',
             xy = ('Korea', 1.5e7),
             xytext = ('USA', 3e7),
             arrowprops=dict(color = 'purple', arrowstyle = '->'))
plt.show()

In [None]:
world_char = charging[charging['region'] == 'World']
print(world_char.shape)

In [None]:
sns.set(style = 'darkgrid')
plt.figure(figsize = (14, 8))
sns.lineplot(data = world_char,
             x = 'year',
             y = 'value',
             color = 'green',
             marker = 'o')
plt.title('Global EV Charging Points 2010-2023')
plt.xlabel('Year')
plt.ylabel('Charging Points')
plt.show()

In [None]:
sns.set(style = 'darkgrid')
plt.figure(figsize = (14, 8))
sns.lineplot(data = world_char,
             x = 'year',
             y = 'value',
             label = 'Charging Point',
             color = 'green',
             marker = 'o')
sns.lineplot(data = world,
             x = 'year',
             y = 'value',
             label = 'Vehicles',
             color = 'purple',
             marker = 'o')
plt.title('Global EV Charging Points and Vehicles 2010-2023')
plt.xlabel('Year')
plt.ylabel('Unit of Sales')
plt.legend(loc = 'upper left')
plt.show()

In [None]:
world_merge = world.groupby('year')['value'].sum()
world_set = pd.DataFrame(world_merge).reset_index()
world_set.columns = ['year', 'value']
print(world_set)

In [None]:
char_merge = world_char.groupby('year')['value'].sum()
char_set = pd.DataFrame(char_merge).reset_index()
char_set.columns = ['year', 'value']
print(char_set)

In [None]:
world_rename = world_set.rename(columns={'value': 'vehicle'})
char_rename = char_set.rename(columns={'value': 'charge'})
merged_ev = pd.merge(world_rename, char_rename, on= 'year', how= 'inner')
print(merged_ev)
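
An inner merge keeps only the years present in both frames, so any year missing from one side disappears silently. The sketch below, on made-up yearly totals, shows one way to audit this with an outer merge and pandas' `indicator=True` option; the frame names are hypothetical.

```python
import pandas as pd

# Made-up yearly totals that overlap only partially
vehicles_by_year = pd.DataFrame({"year": [2010, 2011, 2012],
                                 "vehicle": [10.0, 20.0, 40.0]})
charge_by_year = pd.DataFrame({"year": [2011, 2012, 2013],
                               "charge": [5.0, 8.0, 12.0]})

# Inner merge: only years found in both frames survive
inner = pd.merge(vehicles_by_year, charge_by_year, on="year", how="inner")

# Outer merge with an indicator column shows which years would be dropped
audit = pd.merge(vehicles_by_year, charge_by_year, on="year",
                 how="outer", indicator=True)
dropped_years = audit.loc[audit["_merge"] != "both", "year"].tolist()

print(inner)
print("Years missing from one side:", dropped_years)
```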

In [None]:
corr_coef, p_value = pearsonr(merged_ev['vehicle'], merged_ev['charge'])

print("Correlation coefficient ", corr_coef)
print("P-value ", p_value)
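
Two series that both rise over time will usually show a high Pearson correlation whether or not they are directly linked. A common complementary check is to correlate year-over-year growth rates instead of levels. The sketch below does this on made-up numbers with pandas' `Series.corr` (Pearson by default); the frame `merged_sample` and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up yearly totals, both trending upwards
merged_sample = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2022, 2023],
    "vehicle": [100.0, 150.0, 240.0, 400.0, 620.0, 900.0],
    "charge": [50.0, 70.0, 95.0, 140.0, 200.0, 270.0],
})

# Correlation of the raw levels (dominated by the shared upward trend)
level_corr = merged_sample["vehicle"].corr(merged_sample["charge"])

# Correlation of year-over-year percentage changes
growth = merged_sample[["vehicle", "charge"]].pct_change().dropna()
growth_corr = growth["vehicle"].corr(growth["charge"])

print("Correlation of levels:", level_corr)
print("Correlation of growth rates:", growth_corr)
```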

In [None]:
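# Use the yearly totals from world_set so diff() compares consecutive years,
# not arbitrary adjacent rows of the unaggregated world frame
world_log = np.log(world_set.set_index('year')['value'])
world_log_stat = world_log.diff().dropna()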

In [None]:
answer = adfuller(world_log_stat)
print('ADF:', answer[0],
      'p-value:', answer[1])
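
The log-difference applied before the ADF test turns levels into approximate growth rates, so a series that grows by a constant factor becomes flat. A minimal worked example on made-up numbers (the series `doubling` is hypothetical):

```python
import numpy as np
import pandas as pd

# A made-up series that doubles every year
doubling = pd.Series([100.0, 200.0, 400.0, 800.0],
                     index=[2020, 2021, 2022, 2023])

# Log-difference: log(x_t) - log(x_{t-1}) = log(x_t / x_{t-1})
log_growth = np.log(doubling).diff().dropna()
print(log_growth)
```

Each value equals log(2), so the exponential trend has been removed. The null hypothesis of the ADF test is that the series has a unit root (is non-stationary), so a small p-value is evidence against non-stationarity. With only around a dozen yearly observations, the result is best treated as indicative.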