# **GLOBAL ELECTRIC VEHICLE SALES FROM 2010 - 2024**

## Objectives

* The objective of this project is to analyse global electric vehicle (EV) sales trends between 2010 and 2024.
* Identify which regions and countries of the world are leading in EV sales.
* Interpret sales trends between 2010 and 2024 and the projections to 2035.
* Gain insights from the data about the distribution of EV adoption across the regions of the world.
* Analyse correlations between the variables in the data set.

The objectives above will be achieved through the following activities (ETL, EDA and visualisation):
- Fetch data from Kaggle and save it in the Inputs folder.
- Preprocess the data.
- Perform Exploratory Data Analysis (EDA) for data distribution.
- Clean the data using different methods.
- Create various charts and graphs to gain insights into the data.
- Create and understand correlations between the variables of the data set.
- Generate visualisations of different variables using different libraries in Python.
- Use correlation heat maps to confirm trends between variables.
- Use Power BI to gain further data insights and correlations.



## Inputs

* Data set: Global-Electric-Vehicle-sales-trends-from-2010-2024\Input\IEA Global EV Data 2024.csv
* Libraries to be used: NumPy, Pandas, Matplotlib, Seaborn, Plotly.
* Input variables/features and target variable from dataset.

## Outputs

* Cleaned dataset: ready for use or for export to a Power BI dashboard.
* Exploratory Data Analysis (EDA):
* Graphs and Charts
* Correlations
* Data insights
* Evaluations





---

# Change working directory

* The notebooks are stored in a subfolder, so when running a notebook in the editor you need to change the working directory.

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\nmnko\\Documents\\vscode-projects\\Global-Electric-Vehicle-sales-trends-from-2010-2024\\jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [5]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\nmnko\\Documents\\vscode-projects\\Global-Electric-Vehicle-sales-trends-from-2010-2024'
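
The same step can also be written with `pathlib`. The following is a minimal sketch rather than the notebook's own code; the helper name `move_to_parent` is hypothetical.

```python
import os
from pathlib import Path


def move_to_parent(start=None):
    """Make the parent of `start` (default: current directory) the working directory."""
    start_dir = Path(start) if start is not None else Path.cwd()
    parent_dir = start_dir.resolve().parent
    os.chdir(parent_dir)
    return parent_dir
```

Each call moves up one level, so a helper like this should be run only once per session; re-running it would leave the project root.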

# Section 1: Data Extraction, Transformation & Loading (ETL)

Loading the libraries that will be used:
* Pandas - loading the CSV file; it has a specific function that reads a CSV file from a given file path into a data frame
* NumPy - processing data in arrays
* Matplotlib - plotting charts and graphs
* Seaborn - plotting charts and graphs
* Plotly - plotting charts and graphs



In [3]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
import plotly.express as px

Data Extraction

* Load the CSV data set into a data frame using the following function: pd.read_csv("csv_file_example.csv")

In [4]:
df = pd.read_csv("input\\iea-global-ev-data-2024.csv")
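
The path above uses Windows-style separators. As a hedged alternative, the sketch below builds the same relative path with `pathlib` so it also works on other operating systems, and fails early if the file is missing. The names `DATA_PATH` and `load_ev_data` are hypothetical, and the folder and file names are assumed to match the cell above.

```python
from pathlib import Path

import pandas as pd

# Assumed location, taken from the cell above; adjust if your folder differs.
DATA_PATH = Path("input") / "iea-global-ev-data-2024.csv"


def load_ev_data(csv_path=DATA_PATH):
    """Read the CSV at `csv_path`, failing early with a clear message if it is missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected the data set at {csv_path.resolve()}")
    return pd.read_csv(csv_path)
```

With the working directory set to the project root, `df = load_ev_data()` would be the equivalent call.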

Check general information about the data set using the .info() method:
* column names
* data types of the columns
* number of entries and memory usage

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12654 entries, 0 to 12653
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   region      12654 non-null  object 
 1   category    12654 non-null  object 
 2   parameter   12654 non-null  object 
 3   mode        12654 non-null  object 
 4   powertrain  12654 non-null  object 
 5   year        12654 non-null  int64  
 6   unit        12654 non-null  object 
 7   value       12654 non-null  float64
dtypes: float64(1), int64(1), object(6)
memory usage: 791.0+ KB


Get an overview of the data frame by checking its number of rows and columns with .shape and its first and last 5 rows with the .head() and .tail() methods.

Number of rows and columns

In [6]:
df.shape

(12654, 8)

In [7]:
df.head()
df.tail()  # only the last expression in a cell is displayed, so the output below is the tail


| | region | category | parameter | mode | powertrain | year | unit | value |
|---|---|---|---|---|---|---|---|---|
| 12649 | World | Projection-STEPS | EV sales share | Cars | EV | 2035 | percent | 55.0 |
| 12650 | World | Projection-STEPS | EV stock share | Cars | EV | 2035 | percent | 31.0 |
| 12651 | World | Projection-APS | EV charging points | EV | Publicly available fast | 2035 | charging points | 9400000.0 |
| 12652 | World | Projection-APS | EV charging points | EV | Publicly available slow | 2035 | charging points | 15000000.0 |
| 12653 | World | Projection-STEPS | EV stock share | Trucks | EV | 2035 | percent | 9.0 |
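
A notebook cell displays only the value of its last expression, which is why just the tail appears above. The sketch below shows two ways to see both ends of a data frame from one cell. It uses a small hypothetical `sample` frame as a stand-in for `df` so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for `df` so the snippet runs on its own.
sample = pd.DataFrame({"year": range(2010, 2022), "value": range(12)})

# Option 1: print each preview explicitly.
print(sample.head())
print(sample.tail())

# Option 2: combine both previews into a single data frame.
preview = pd.concat([sample.head(3), sample.tail(3)])
print(preview)
```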


---

Checking for data types of the columns

In [8]:
df.dtypes

region         object
category       object
parameter      object
mode           object
powertrain     object
year            int64
unit           object
value         float64
dtype: object

Checking if there is any missing data in the data set using the .isnull().sum() function

In [9]:
df.isnull().sum()

region        0
category      0
parameter     0
mode          0
powertrain    0
year          0
unit          0
value         0
dtype: int64

Check if there are any duplicated rows in the data set

In [13]:
duplicate_check = df.duplicated().any()
print('Duplicate entries found:', duplicate_check,'.')

Duplicate entries found: False .
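
The missing-value and duplicate checks above can be combined into one reusable step. This is a minimal sketch under the assumption that a single summary is convenient; the helper name `quality_report` and the `sample` frame are hypothetical.

```python
import pandas as pd


def quality_report(frame):
    """Return missing-value counts per column plus the number of duplicated rows."""
    return {
        "missing_per_column": frame.isnull().sum().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


# Hypothetical sample with one missing value and one duplicated row.
sample = pd.DataFrame(
    {"region": ["World", "World", "China", None], "value": [1.0, 1.0, 2.0, 3.0]}
)
report = quality_report(sample)
print(report)
```

Counting duplicated rows with `.sum()` instead of `.any()` also shows how many rows would be dropped by a later clean-up.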


Fill any missing values in the columns with 0. No missing values were found above, so this step leaves the data unchanged.

In [14]:
df.fillna(0)

| | region | category | parameter | mode | powertrain | year | unit | value |
|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Historical | EV stock share | Cars | EV | 2011 | percent | 3.900000e-04 |
| 1 | Australia | Historical | EV sales share | Cars | EV | 2011 | percent | 6.500000e-03 |
| 2 | Australia | Historical | EV sales | Cars | BEV | 2011 | Vehicles | 4.900000e+01 |
| 3 | Australia | Historical | EV stock | Cars | BEV | 2011 | Vehicles | 4.900000e+01 |
| 4 | Australia | Historical | EV stock | Cars | BEV | 2012 | Vehicles | 2.200000e+02 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12649 | World | Projection-STEPS | EV sales share | Cars | EV | 2035 | percent | 5.500000e+01 |
| 12650 | World | Projection-STEPS | EV stock share | Cars | EV | 2035 | percent | 3.100000e+01 |
| 12651 | World | Projection-APS | EV charging points | EV | Publicly available fast | 2035 | charging points | 9.400000e+06 |
| 12652 | World | Projection-APS | EV charging points | EV | Publicly available slow | 2035 | charging points | 1.500000e+07 |
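
`fillna` returns a new data frame and does not modify the original unless the result is assigned back. The sketch below illustrates the difference on a small hypothetical `sample` frame; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value.
sample = pd.DataFrame({"value": [1.5, np.nan, 3.0]})

sample.fillna(0)  # returns a filled copy; `sample` itself is unchanged
still_missing = int(sample["value"].isnull().sum())

sample = sample.fillna(0)  # assigning the result keeps the change
missing_after = int(sample["value"].isnull().sum())

print(still_missing, missing_after)
```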


Counting the unique values for the different types of powertrains to understand the breakdown of the EV powertrain numbers

In [10]:
df['powertrain'].value_counts()

powertrain
EV                         4894
BEV                        3204
PHEV                       2126
FCEV                       1512
Publicly available slow     463
Publicly available fast     455
Name: count, dtype: int64
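
The counts above include "Publicly available slow" and "Publicly available fast", which are charging-point types rather than vehicle powertrains. A possible way to separate them is sketched below. It assumes that all such rows carry the parameter "EV charging points", as the rows shown in the outputs above do, and it uses a hypothetical `sample` frame instead of `df`.

```python
import pandas as pd

# Hypothetical sample using values that appear in the outputs above.
sample = pd.DataFrame(
    {
        "parameter": ["EV sales", "EV stock", "EV charging points", "EV charging points"],
        "powertrain": ["BEV", "BEV", "Publicly available fast", "Publicly available slow"],
    }
)

# Keep only rows that describe vehicles rather than charging infrastructure.
vehicles_only = sample[sample["parameter"] != "EV charging points"]
powertrain_counts = vehicles_only["powertrain"].value_counts()
print(powertrain_counts)
```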

Generate descriptive summary statistics for the numeric columns of the dataset using the .describe() method

In [15]:
df.describe()

| | year | value |
|---|---|---|
| count | 12654.0 | 12654.0 |
| mean | 2019.822112 | 427374.2 |
| std | 5.476494 | 6860498.0 |
| min | 2010.0 | 1.2e-06 |
| 25% | 2016.0 | 2.0 |
| 50% | 2020.0 | 130.0 |
| 75% | 2022.0 | 5500.0 |
| max | 2035.0 | 440000000.0 |
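
The `value` column mixes several units (percent, Vehicles, charging points), so a single summary across all rows blends very different scales. One option, sketched here on a hypothetical `sample` frame rather than on `df`, is to summarise each unit separately.

```python
import pandas as pd

# Hypothetical sample mixing the units seen in the outputs above.
sample = pd.DataFrame(
    {
        "unit": ["percent", "percent", "Vehicles", "Vehicles", "charging points"],
        "value": [55.0, 31.0, 49.0, 220.0, 9400000.0],
    }
)

# Summarise `value` separately for each unit so the scales are not mixed.
per_unit = sample.groupby("unit")["value"].describe()
print(per_unit)
```

The same pattern would work with `parameter` as the grouping column if a per-metric summary is preferred.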


Save the cleaned dataset to its own folder as a CSV file using the df.to_csv method

* The data frame is written to CSV after cleaning and saved in a separate folder in the project directory

In [10]:
df.to_csv('cleaned-file\\cleaned.csv', index = False)
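
`to_csv` raises an error if the target folder does not exist yet. The sketch below creates the folder first and then writes the file. The helper name `save_clean` is hypothetical, and the output path is passed in rather than assumed.

```python
from pathlib import Path

import pandas as pd


def save_clean(frame, out_path):
    """Create the target folder if needed, then write `frame` as CSV without the index."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(out_path, index=False)
    return out_path
```

For this notebook the call would be `save_clean(df, Path("cleaned-file") / "cleaned.csv")`.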


# Push files to Repo

* In cases where you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create your folder here
  # os.makedirs(name='')
  pass  # placeholder so the otherwise empty try block is valid Python
except Exception as e:
  print(e)
