## (E) Extract - Raw Data from sources

This notebook scrapes data from a website.

Web Scraping:
* Electric Vehicle Statistics and Trends as of 2019

Source: https://www.findthebestcarprice.com/electric-vehicle-statistics-trends/


In [10]:
# Import requests, BeautifulSoup, pymongo, and time
import requests
from bs4 import BeautifulSoup as soup
import pymongo
import time


In [11]:
# Request the page at the source URL
url = 'https://www.findthebestcarprice.com/electric-vehicle-statistics-trends/'
reqs = requests.get(url)
time.sleep(1)
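
The request above does not check whether the server returned a usable page. A minimal sketch of a more defensive fetch is shown here. The helper name `fetch_html` and its `get`, `retries`, and `delay` parameters are assumptions made for illustration and are not part of the original notebook. `get` stands in for any callable shaped like `requests.get`, which also lets the helper be exercised without network access.

```python
import time


def fetch_html(url, get, retries=3, delay=1.0):
    """Return the page body as text, retrying a few times on failure.

    `get` is a hypothetical injection point: any callable that takes a URL
    and returns an object exposing raise_for_status() and .text, such as
    requests.get.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            response = get(url)
            response.raise_for_status()  # raises on an HTTP error status
            return response.text
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay)  # pause before the next attempt
    raise last_error
```

In the notebook this could be called as `fetch_html(url, requests.get)`, and the returned text passed to BeautifulSoup in place of `reqs.text`.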


In [12]:
# Parse the response HTML into a BeautifulSoup object
ev_soup = soup(reqs.text, 'lxml')
print(ev_soup)

aboxplugin-icon-color .sab-github{border-color:#264874}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-google{border-color:#0b51c5}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-googleplus{border-color:#96271a}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-html5{border-color:#902e13}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-instagram{border-color:#1630aa}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-linkedin{border-color:#00344f}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-pinterest{border-color:#5b040e}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-reddit{border-color:#992900}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-rss{border-color:#a43b0a}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-sharethis{border-color:#5d8420}.saboxplugin-socials.sabox-colored .saboxplugin-icon-color .sab-skype{border-color:#00658a}.saboxplugin-so

In [13]:
# Scrape the list of stats
ev_stat_soup = ev_soup.find(class_='siteorigin-widget-tinymce textwidget').find_all('li')
ev_stat_soup

[<li><a href="#The_Current_State_of_EVs">The Current State of EVs</a></li>,
 <li><a href="#EV_Car_Buying_Consumer_Habits_Trends">EV Car Buying Consumer Habits &amp; Trends</a></li>,
 <li><a href="#EV_Demographics">EV Demographics</a></li>,
 <li><a href="#The_Future_of_EVs">The Future of EVs</a></li>,
 <li><a href="#Overall_EV_Trends_and_Analysis">Overall EV Trends and Analysis</a></li>,
 <li><a href="#Best_Car_Deals_by_Category">Best Car Deals by Category</a></li>,
 <li><a href="#Frequently_Asked_Questions">Frequently Asked Questions</a></li>,
 <li>EV costs are falling quickly: battery pack prices in 2019 were an average of $156/kWh, which is down from $1,100/kWh in 2010. 1 kWh = ~4 miles of range. (IEA)<img alt="ev costs falling" class="aligncenter size-full wp-image-19211" height="209" loading="lazy" sizes="(max-width: 491px) 100vw, 491px" src="https://www.findthebestcarprice.com/wp-content/uploads/ev-costs-falling.jpg" srcset="https://www.findthebestcarprice.com/wp-content/uploads/e

## (T) Transform - Clean the data

In [14]:
# Skip the first 7 <li> items (table-of-contents links) and keep the text of each stat
ev_stat_soup = ev_stat_soup[7:]
ev_random_stats = [stat.text for stat in ev_stat_soup]
ev_random_stats

['EV costs are falling quickly: battery pack prices in 2019 were an average of $156/kWh, which is down from $1,100/kWh in 2010. 1 kWh = ~4 miles of range. (IEA)',
 'The average pack size for a BEV/PHEV increased, up to 44 kWh in 2019 from 37 kWh in 2018. Battery electric vehicles in most countries are now in the 50-70 kWh range – or 200-280 miles of range per full charge. (IEA)',
 'The national average of EV charging is $0.15 per kWh (which is $9 to fill up a Chevy Bolt EV – 250 miles). The low is $0.08 per kWh and the high is $0.27 per kWh. (NREL)',
 'The lifetime fueling cost (15 years) for EVs is between $3,000 to $10,500 lower than the costs of a traditional gasoline-powered vehicle. (NREL)',
 'The significant growth of EVs leading up to 2030 will present major opportunities and challenges for traditional original equipment manufacturers (OEMs), new-entrant OEMs, captive finance companies, and dealerships. (Deloitte)',
 'After an encouraging start to 2019, falling fuel prices in th
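
Each stat visible in the output above ends with its source in parentheses, such as `(IEA)`, `(NREL)`, or `(Deloitte)`. As a possible further cleaning step, the sketch below separates that trailing attribution from the statement. The helper name `split_source` and the regular expression are assumptions for illustration, not part of the original notebook. A stat that happens to end with a parenthetical that is not a source would be misread, so the result is worth checking by eye.

```python
import re

# Matches a final "(...)" group at the very end of the string.
SOURCE_PATTERN = re.compile(r"\s*\(([^()]+)\)\s*$")


def split_source(stat):
    """Split 'statement. (SOURCE)' into ('statement.', 'SOURCE').

    Returns (stat, None) when the text does not end with a parenthetical.
    """
    match = SOURCE_PATTERN.search(stat)
    if match is None:
        return stat.strip(), None
    return stat[:match.start()].strip(), match.group(1)


example = ("The lifetime fueling cost (15 years) for EVs is between $3,000 to "
           "$10,500 lower than the costs of a traditional gasoline-powered "
           "vehicle. (NREL)")
text, source = split_source(example)
print(source)  # NREL
```

Only the last parenthetical is treated as the source, so earlier ones such as `(15 years)` stay in the statement text.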

## (L) Load - Load the data into MongoDB

In [15]:
# Connection string for the local MongoDB instance
conn = 'mongodb://localhost:27017'
# Create a MongoClient connected to that instance
client = pymongo.MongoClient(conn)

# Define the 'electric_vehicles' database in Mongo
db = client.electric_vehicles
# Define the collection to load the cleaned data into
ev_random_stats_coll = db.ev_random_stats

In [16]:
# Convert the list to a dict for the MongoDB upload
ev_random_stats_dict = {}
for index, stat in enumerate(ev_random_stats):  # enumerate exposes the index
    ev_random_stats_dict[str(index)] = stat  # MongoDB document keys must be strings
ev_random_stats_dict

{'0': 'EV costs are falling quickly: battery pack prices in 2019 were an average of $156/kWh, which is down from $1,100/kWh in 2010. 1 kWh = ~4 miles of range. (IEA)',
 '1': 'The average pack size for a BEV/PHEV increased, up to 44 kWh in 2019 from 37 kWh in 2018. Battery electric vehicles in most countries are now in the 50-70 kWh range – or 200-280 miles of range per full charge. (IEA)',
 '2': 'The national average of EV charging is $0.15 per kWh (which is $9 to fill up a Chevy Bolt EV – 250 miles). The low is $0.08 per kWh and the high is $0.27 per kWh. (NREL)',
 '3': 'The lifetime fueling cost (15 years) for EVs is between $3,000 to $10,500 lower than the costs of a traditional gasoline-powered vehicle. (NREL)',
 '4': 'The significant growth of EVs leading up to 2030 will present major opportunities and challenges for traditional original equipment manufacturers (OEMs), new-entrant OEMs, captive finance companies, and dealerships. (Deloitte)',
 '5': 'After an encouraging start to 2

In [17]:
# Clear the collection of existing documents
ev_random_stats_coll.delete_many({})
# Insert all stats as a single document; insert_many expects a list of dicts
ev_random_stats_coll.insert_many([ev_random_stats_dict])

<pymongo.results.InsertManyResult at 0x1d4ec69cc00>
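
The cells above store every stat in one MongoDB document keyed by string index. An alternative layout, sketched here as an assumption and not as the notebook's method, is one document per stat, which would allow individual stats to be queried or updated. The helper name `stats_to_documents` and the field names `position` and `text` are hypothetical.

```python
def stats_to_documents(stats):
    """Build one dict per stat, keeping its position in the scraped list."""
    return [{"position": index, "text": stat}
            for index, stat in enumerate(stats)]


# Illustrative input only; in the notebook this would be ev_random_stats.
sample_stats = ["first stat (IEA)", "second stat (NREL)"]
documents = stats_to_documents(sample_stats)
print(len(documents))  # 2
```

Because `insert_many` takes a list of dicts, the result could be passed to `ev_random_stats_coll.insert_many(...)` directly, without the extra wrapping list.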

