## Author: Aditya Sundar
## Email: aditya.1094@gmail.com

## <font color='blue'>Electric vehicles (EVs) are predicted to be an integral part of modern, sustainable economies. Replacing combustion engines with batteries reduces our carbon footprint and improves air quality. However, the current costs of researching and manufacturing batteries make EVs more expensive. It is therefore crucial for buyers to understand the pros and cons of purchasing an EV. This project is intended to help vehicle buyers make informed EV selections by providing </font> <font color='red'> 1) interactive dashboards to explore the performance metrics of commercial EVs </font><font color='blue'>and</font> <font color='red'>2) models to estimate vehicle costs.</font>

### Note: There is a heading before each cell or group of cells. Headings in black font denote code for data retrieval and cleaning. Headings in <font color='red'>red</font> font denote visualization code that can be run directly on the processed data.

In [2]:
import pandas as pd 
import warnings
warnings.filterwarnings("ignore")
import time
import numpy as np
import requests
from bs4 import BeautifulSoup
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
import unicodedata
import re
import seaborn as sns
import inspect
import matplotlib.pyplot as plt

# Web scraping to get data for electric vehicle (EV) sales in the USA
The data cover several types of electric vehicles from multiple automobile manufacturers and show purchasing trends from 2000 to 2022.

In [279]:
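# Fetch the Atlas EV Hub page that lists the state registration datasets
url = 'https://www.atlasevhub.com/materials/state-ev-registration-data/#data'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')

# Collect every href, skipping <a> tags that have none (link.get returns None for those)
links = [link.get('href') for link in soup.find_all('a') if link.get('href')]
# Keep only CSV files served from the /public/ path
data = [i for i in links if len(i.split('/')) > 1 and i.split('/')[1] == 'public']
data = [i for i in data if i.split('.')[-1] == 'csv']
# The state code is the file-name prefix before the first underscore
state_name = [i.split('/')[3].split('_')[0].upper() for i in data]

print(state_name)

# Load the dataset for one state; sn indexes into state_name (3 is 'FL' in the printed list)
sn = 3
dt = []
for i in [data[sn]]:
    dt.append(pd.read_csv('https://www.atlasevhub.com/' + i))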

['CA', 'CO', 'CT', 'FL', 'MT', 'MI', 'MN', 'NJ', 'NY', 'OR', 'TN', 'TX', 'VT', 'VA', 'WA', 'WI']
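
The link filtering in the cell above can be checked offline. The following is a minimal sketch, assuming hrefs of the form `/public/<folder>/<state>_<rest>.csv`; the sample hrefs and the helper names `select_csv_links` and `state_code` are made up for illustration and are not taken from the site. It also uses `urljoin` to build the download URL, which avoids a doubled slash when the href already starts with `/`.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.atlasevhub.com/"

def select_csv_links(hrefs):
    """Keep only hrefs that point at CSV files under the /public/ path."""
    selected = []
    for href in hrefs:
        if not href:  # <a> tags without an href yield None
            continue
        parts = href.split("/")
        if len(parts) > 1 and parts[1] == "public" and href.split(".")[-1] == "csv":
            selected.append(href)
    return selected

def state_code(href):
    """Return the upper-cased file-name prefix before the first underscore."""
    return href.split("/")[3].split("_")[0].upper()

# Hypothetical hrefs (illustration only, not scraped from the page)
sample_hrefs = [
    None,
    "#data",
    "https://example.com/about",
    "/public/dmv/fl_ev_registrations_public.csv",
    "/public/dmv/ca_ev_registrations_public.csv",
    "/public/dmv/readme.pdf",
]

csv_links = select_csv_links(sample_hrefs)
states = [state_code(h) for h in csv_links]
full_urls = [urljoin(BASE_URL, h) for h in csv_links]

print(states)
print(full_urls)
```

Only the two CSV paths survive the filter; the anchor link, the external link, the PDF, and the missing href are all dropped.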


## Data cleaning

In [280]:
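# Drop every column that contains at least one missing value
dt[0] = dt[0].dropna(axis=1)
dt[0].rename({'Registration Valid Date': 'Date'}, axis=1, inplace=True)
# Use the first word of the vehicle name as the manufacturer
dt[0]['Make'] = dt[0]['Vehicle Name'].str.split().str[0]
dt[0].head()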

Unnamed: 0,DMV ID,DMV Snapshot (Date),County,Vehicle Name,Date,Make
0,1,Registration Data from FPL (6/30/2018),Dade,Tesla Model X,6/30/2018,Tesla
1,1,Registration Data from FPL (6/30/2018),Dade,Tesla Model X,6/30/2018,Tesla
2,1,Registration Data from FPL (6/30/2018),Dade,Tesla Model X,6/30/2018,Tesla
3,1,Registration Data from FPL (6/30/2018),Dade,Tesla Model X,6/30/2018,Tesla
4,1,Registration Data from FPL (6/30/2018),Dade,Tesla Model X,6/30/2018,Tesla


## Save data as json file

In [281]:
dt[0].to_json(state_name[sn]+'_EVdata.json',orient='records')

In [284]:
df=pd.read_json(state_name[sn]+'_EVdata.json', orient='records')
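
With `orient='records'`, the saved file is a JSON array holding one object per row, keyed by column name. The sketch below reproduces that layout with the standard `json` module so the structure is visible without pandas; the two rows are hypothetical and only mirror the column names shown earlier.

```python
import json

# Hypothetical rows (illustration only) using the cleaned column names
rows = [
    {"County": "Dade", "Vehicle Name": "Tesla Model X", "Date": "6/30/2018", "Make": "Tesla"},
    {"County": "Dade", "Vehicle Name": "Nissan Leaf", "Date": "6/30/2018", "Make": "Nissan"},
]

# Serialize to the records layout: a list of row objects
text = json.dumps(rows)

# Reading the text back restores the same list of dictionaries
restored = json.loads(text)
makes = [row["Make"] for row in restored]

print(text)
print(makes)
```

Because each row repeats every column name, this layout is easy to inspect by hand but larger on disk than a column-oriented one.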

## Class to patch year and obtain cumulative sales from 2000-2022
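
The patching idea used by the class that follows is to carry the running total forward through years that have no records. Here is a minimal sketch of that logic in plain Python, assuming dates formatted as month/day/year; the function name `cumulative_by_year` and the sample dates are hypothetical. As a simplification, records outside the `start` to `end` window are ignored.

```python
from collections import Counter
from datetime import datetime

def cumulative_by_year(dates, start, end, fmt="%m/%d/%Y"):
    """Return {year: cumulative count} for every year from start to end inclusive."""
    per_year = Counter(datetime.strptime(d, fmt).year for d in dates)
    running = 0
    cumulative = {}
    for year in range(start, end + 1):
        running += per_year.get(year, 0)  # years with no records add nothing
        cumulative[year] = running
    return cumulative

# Hypothetical registration dates (illustration only); 2017 has no records
sample_dates = ["6/30/2016", "12/31/2016", "6/30/2018", "6/30/2018", "12/31/2018"]
result = cumulative_by_year(sample_dates, start=2015, end=2019)

print(result)
# {2015: 0, 2016: 2, 2017: 2, 2018: 5, 2019: 5}
```

The year 2017 inherits the 2016 total, so a plot of the result stays flat across the gap rather than dropping to zero or skipping the year.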

In [285]:
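class patch_year:
    """Fill in missing years so cumulative sales cover every year from start to end."""

    start=2000
    end=2021

    def __init__(self,state,data):
        self.state=state
        self.data=data

    def start_year(self):
        # Earliest year present in the data
        return min(self.data['Date'].groupby(pd.to_datetime(self.data['Date']).dt.year).count().index)

    def end_year(self):
        # Latest year present in the data
        return max(self.data['Date'].groupby(pd.to_datetime(self.data['Date']).dt.year).count().index)

    def cumulative_sales(self):
        # Running total of registrations for each year present in the data
        cum_sales=self.data.groupby(pd.to_datetime(self.data['Date']).dt.year).count()['Date'].cumsum()
        year_data=cum_sales.index
        # Anchor the series at zero in the start year
        cum_sales[self.start]=0
        # Carry the previous total forward for years with no records
        for y in range(self.start+1,self.end+1,1):
            if y not in year_data:
                cum_sales[y]=cum_sales[y-1]
        return cum_sales.sort_index()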

p = patch_year(state_name[sn],df)