# Explanation of Script 
This notebook inspects the dataset "Electric_Vehicle_Population_Size_History_By_County.csv" and then cleans it.

The cleaned file is written to the *data/out* directory.

**Cleaning/Transformation operations**

**(1.** Filtering data:
- State = Washington (WA)

**(2.** Dropping columns

**(3.** Changing dtype of column:
- 'Date'

**(4.** Sorting data
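
The four operations above can be sketched as one chained expression. This is only an illustrative outline, not the notebook's own code: the function name `clean_ev_history` and the third sample row (county "Example", state "XX") are hypothetical, while the two Yakima rows are copied from the output shown later in this notebook.

```python
import pandas as pd

def clean_ev_history(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps and return a new DataFrame."""
    drop_cols = ['Vehicle Primary Use', 'Non-Electric Vehicle Total',
                 'Total Vehicles', 'Percent Electric Vehicles']
    return (
        raw.query("State == 'WA'")                            # (1. keep Washington rows
           .drop(columns=drop_cols)                           # (2. drop unused columns
           .assign(Date=lambda d: pd.to_datetime(d['Date']))  # (3. parse 'Date'
           .sort_values('Date')                               # (4. sort ascending by date
    )

sample = pd.DataFrame({
    'Date': ['February 28 2017', 'January 31 2017', 'January 31 2017'],
    'County': ['Yakima', 'Yakima', 'Example'],   # 'Example' is a made-up row
    'State': ['WA', 'WA', 'XX'],                 # 'XX' is a made-up state code
    'Vehicle Primary Use': ['Passenger', 'Passenger', 'Passenger'],
    'Battery Electric Vehicles (BEVs)': [36, 36, 1],
    'Plug-In Hybrid Electric Vehicles (PHEVs)': [50, 49, 0],
    'Electric Vehicle (EV) Total': [86, 85, 1],
    'Non-Electric Vehicle Total': [159147, 159951, 10],
    'Total Vehicles': [159233, 160036, 11],
    'Percent Electric Vehicles': [0.05, 0.05, 9.09],
})

cleaned = clean_ev_history(sample)
print(cleaned)
```

Chaining avoids modifying an intermediate slice in place, because each step returns a new DataFrame.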

# Load packages

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("data/in/Electric_Vehicle_Population_Size_History_By_County.csv")

# (1 Investigate data

In [3]:
df.columns

Index(['Date', 'County', 'State', 'Vehicle Primary Use',
       'Battery Electric Vehicles (BEVs)',
       'Plug-In Hybrid Electric Vehicles (PHEVs)',
       'Electric Vehicle (EV) Total', 'Non-Electric Vehicle Total',
       'Total Vehicles', 'Percent Electric Vehicles'],
      dtype='object')

In [4]:
pd.DatetimeIndex(df['Date']).year.value_counts()

2021    2908
2020    2787
2019    2666
2022    2600
2018    2426
2017    2216
Name: Date, dtype: int64

Period ranging from 2017 to 2022
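
The covered period can also be read off programmatically instead of from the counts. The sketch below uses three date strings that appear in this notebook's output as a stand-in for `df['Date']`; the variable names are illustrative.

```python
import pandas as pd

# Stand-in for df['Date']; the strings are taken from rows shown in this notebook.
dates = pd.Series(['January 31 2017', 'March 31 2017', 'November 30 2022'], name='Date')

years = pd.DatetimeIndex(dates).year
print(years.min(), years.max())  # 2017 2022
```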

#### Vehicle Primary Use

In [5]:
df['Vehicle Primary Use'].value_counts()

Passenger    12824
Truck         2779
Name: Vehicle Primary Use, dtype: int64

The 'Vehicle Primary Use' column contains only two values: Passenger and Truck

#### Example of filtering by 'County' column

Note: no data is removed by this operation, because the result of `query` is not assigned back to `df`.

In [6]:
df.query('County == "Yakima"')

Unnamed: 0,Date,County,State,Vehicle Primary Use,Battery Electric Vehicles (BEVs),Plug-In Hybrid Electric Vehicles (PHEVs),Electric Vehicle (EV) Total,Non-Electric Vehicle Total,Total Vehicles,Percent Electric Vehicles
187,January 31 2017,Yakima,WA,Truck,0,0,0,57338,57338,0.00
188,January 31 2017,Yakima,WA,Passenger,36,49,85,159951,160036,0.05
373,February 28 2017,Yakima,WA,Passenger,36,50,86,159147,159233,0.05
374,February 28 2017,Yakima,WA,Truck,0,0,0,57224,57224,0.00
557,March 31 2017,Yakima,WA,Passenger,36,50,86,159246,159332,0.05
...,...,...,...,...,...,...,...,...,...,...
15160,September 30 2022,Yakima,WA,Truck,3,0,3,60782,60785,0.00
15379,October 31 2022,Yakima,WA,Passenger,385,210,595,163697,164292,0.36
15380,October 31 2022,Yakima,WA,Truck,5,0,5,60109,60114,0.01
15599,November 30 2022,Yakima,WA,Passenger,399,208,607,161264,161871,0.37
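
To illustrate why the filter above leaves `df` untouched: `DataFrame.query` returns a new DataFrame, and the original only changes if the result is assigned back. A minimal sketch with a small stand-in frame (`df_demo` and `yakima` are illustrative names):

```python
import pandas as pd

# Small stand-in frame; county names are ones that appear in the dataset.
df_demo = pd.DataFrame({
    'County': ['Yakima', 'Adams', 'Yakima'],
    'State': ['WA', 'WA', 'WA'],
})

yakima = df_demo.query('County == "Yakima"')  # result stored under a new name

print(len(df_demo), len(yakima))  # 3 2
```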



# (2 Cleaning Data

#### (2.1 Filtering data

In [7]:
# Limiting data to Washington state.
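# .copy() makes the result an independent frame, so the later in-place drop and
# column assignment do not raise a SettingWithCopyWarning.
df = df.query("State == 'WA'").copy()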

#### (2.2 Dropping columns

In [8]:
drop = ['Vehicle Primary Use', 'Non-Electric Vehicle Total', 'Total Vehicles', 'Percent Electric Vehicles']
df = df.drop(columns=drop)

#### (2.3 Changing dtype 

In [9]:
# Changing the 'Date' column into dtype: datetime64[ns].
df['Date'] = pd.to_datetime(df['Date'])
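
The call above lets pandas infer the date format. As an optional variation, the format can be stated explicitly. The format string `'%B %d %Y'` is an assumption based on values such as "January 31 2017" shown in the output earlier; with an explicit format, a value that does not match raises an error instead of being parsed differently.

```python
import pandas as pd

# Sample strings in the form seen in the 'Date' column output.
raw_dates = pd.Series(['January 31 2017', 'February 28 2017', 'November 30 2022'])

# %B = full month name, %d = day of month, %Y = four-digit year (assumed format).
parsed = pd.to_datetime(raw_dates, format='%B %d %Y')
print(parsed)
```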

#### (2.4 Sorting data

In [10]:
# Sorting data by 'Date' column - Ascending.
df = df.sort_values('Date')
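
Sorting on 'Date' alone leaves the order of rows that share a date unspecified. If a reproducible order within each date is wanted, one option is to add a second sort key. This is a variation, not what the notebook does, and the choice of 'County' as the tie-breaker is an assumption.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Date': pd.to_datetime(['2017-02-28', '2017-01-31', '2017-01-31']),
    'County': ['Adams', 'Skagit', 'Adams'],
})

# Sort by date first, then alphabetically by county within each date.
sorted_demo = df_demo.sort_values(['Date', 'County'])
print(sorted_demo)
```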

In [11]:
df['Date']

111     2017-01-31
167     2017-01-31
166     2017-01-31
165     2017-01-31
164     2017-01-31
           ...    
15545   2022-11-30
15544   2022-11-30
15543   2022-11-30
15550   2022-11-30
15600   2022-11-30
Name: Date, Length: 5538, dtype: datetime64[ns]

In [12]:
df.head(5)

Unnamed: 0,Date,County,State,Battery Electric Vehicles (BEVs),Plug-In Hybrid Electric Vehicles (PHEVs),Electric Vehicle (EV) Total
111,2017-01-31,Adams,WA,0,0,0
167,2017-01-31,Skagit,WA,1,0,1
166,2017-01-31,San Juan,WA,84,31,115
165,2017-01-31,San Juan,WA,0,0,0
164,2017-01-31,Pierce,WA,822,750,1572


# Exporting cleaned file as a .csv file

In [13]:
df.to_csv('data/out/populationHistory.csv', index=False)
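
A CSV file does not store dtypes, so the 'Date' column comes back as plain strings unless it is parsed again on load. The sketch below shows the round trip using an in-memory buffer in place of the real output file; the two rows are taken from the `df.head(5)` output above.

```python
import io
import pandas as pd

cleaned = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-31', '2017-01-31']),
    'County': ['Adams', 'Skagit'],
    'Electric Vehicle (EV) Total': [0, 1],
})

# Write to an in-memory buffer instead of 'data/out/populationHistory.csv'.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading the file back.
reloaded = pd.read_csv(buffer, parse_dates=['Date'])
print(reloaded.dtypes)
```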