In [2]:
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import pickle

### Import functions unique to this project

In [3]:
from master_functions import get_car_urls
from master_functions import make_model_df

## BUICK - Create master dataframe for all Buick models

This worksheet builds a dataframe for all **Buick** cars. The source data on https://www.fueleconomy.gov/ has some structural quirks, so the steps below check each model's page manually, combine the models that parse normally first, and then add in the ones that need special attention.

**Step #1:** Create a unique url for every Buick model for the years 1984 - 2021<br>
- Uses `get_car_urls` from `master_functions`. Inputs: (car_make, [list of all models])

In [4]:
buick_urls = get_car_urls('Buick',
                          ['Cascada','Century','Century Estate Wagon',
                           'Century Wagon','Coachbuilder Wagon',
                           'Electra/Park Avenue','Enclave','Encore',
                           'Envision','Estate Wagon','Funeral Coach/Hearse',
                           'LaCrosse','Lacrosse/Allure','LaSabre',
                           'LaSabre/Electra Wagon','Lucerne','Park Avenue',
                           'Rainier','Reatta','Regal','Regal/Century',
                           'Rendezvous','Rivera','Roadmaster','Skyhawk',
                           'Skylark','Somerset Regal','Somerset/Skylark',
                           'Terraza','Verano'
                          ])

-

**Step #2:** Get the length of the list created in Step 1. This is the number of urls to check with the function in Step 3.<br>

In [5]:
# Verify the number of urls; this is how many
# urls need to be checked in Step 3

len(buick_urls)

30

-

**Step #3:** Check each of the urls you just created.<br>
- If a url works, add its index to the `buick_okay_modelranks` list in Step 4
- If a url does not work, add it to the 'problem' URLs string in Step 4

In [40]:
# Test area for each url with buick_urls[index]
# by seeing if data appears correctly

make_model_df('Buick',buick_urls[0])

Unnamed: 0,year,make,model,capacity_liters,cylinders,transmission,trans_speed,fuel_type,gg_emissions,mpg
0,2019,Buick,Cascada,1.6,4,Automatic,S6,Premium Gasoline,373,24
1,2018,Buick,Cascada,1.6,4,Automatic,S6,Premium Gasoline,373,24
2,2017,Buick,Cascada,1.6,4,Automatic,S6,Premium Gasoline,395,23
3,2016,Buick,Cascada,1.6,4,Automatic,S6,Regular Gasoline,394,23
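
The one-at-a-time check in Step 3 could be partly automated. The sketch below is an illustration, not part of `master_functions`: `classify_urls` and `fake_build_df` are hypothetical names, and it assumes that a 'problem' url either raises an exception or returns an empty result. Urls that return data which merely looks wrong still need the visual check above.

```python
def classify_urls(urls, build_df):
    """Split url indices into those that build a dataframe and those that fail."""
    okay, problem = [], []
    for index, url in enumerate(urls):
        try:
            df = build_df(url)
        except Exception:
            # Any scraping or parsing failure marks the url as a problem
            problem.append(index)
            continue
        # Assumption: an empty result is also treated as a problem
        if df is None or len(df) == 0:
            problem.append(index)
        else:
            okay.append(index)
    return okay, problem


# Stand-in for make_model_df so the sketch runs offline (hypothetical)
def fake_build_df(url):
    if "bad" in url:
        raise ValueError("could not parse page")
    return [{"model": url}]


okay_idx, problem_idx = classify_urls(["good-1", "bad-2", "good-3"], fake_build_df)
print(okay_idx, problem_idx)
```

With the real helpers, the call would look like `classify_urls(buick_urls, lambda u: make_model_df('Buick', u))`.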


-

**Step #4:** After Step 3, every URL index should be recorded below as either "problem" or "normal".

In [6]:
#'Problem' URLs
'''
buick_urls[15]
buick_urls[28]
buick_urls[29]
'''

#'Normal' URLs

buick_okay_modelranks = [0,1,2,3,4,5,6,
                         7,8,9,10,11,12,
                         13,14,16,17,
                         18,19,20,21,22,
                         23,24,25,26,27,
                        ]
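
As an alternative to typing out the 'normal' indices by hand, the list can be derived from the 'problem' indices. This is a minimal sketch: `n_urls` and `problem_modelranks` are illustrative names, with the values taken from Steps 2 and 4 above.

```python
# Values taken from this worksheet: 30 urls, three of them 'problem' urls
n_urls = 30
problem_modelranks = {15, 28, 29}

# Every index that is not a 'problem' index is a 'normal' one
okay_modelranks = [i for i in range(n_urls) if i not in problem_modelranks]

print(okay_modelranks)
print(len(okay_modelranks))
```

In the worksheet itself, `n_urls` would be `len(buick_urls)`, so the list stays correct if the model list changes.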

-

**Step #5:** Create dfs for all 'okay' urls and place each into a master list
- Automate where possible, but some may need to be added one by one to avoid 'problem' urls

In [42]:
#'normal' urls to make a df and add to master df list, automate it!

buick_dfs = []

# Loop over the 'normal' indices from Step 4, which skips the 'problem' urls
for x in buick_okay_modelranks:
    buick_dfs.append(make_model_df('Buick',buick_urls[x]))


-

**Step #6:** Concatenate all of the 'normal' Buick model dfs into one master dataframe

In [44]:
buick_dfs = pd.concat(buick_dfs, ignore_index=True)

buick_dfs

Unnamed: 0,year,make,model,capacity_liters,cylinders,transmission,trans_speed,fuel_type,gg_emissions,mpg
0,2019.0,Buick,Cascada,1.6,4.0,Automatic,S6,Premium Gasoline,373.0,24.0
1,2018.0,Buick,Cascada,1.6,4.0,Automatic,S6,Premium Gasoline,373.0,24.0
2,2017.0,Buick,Cascada,1.6,4.0,Automatic,S6,Premium Gasoline,395.0,23.0
3,2016.0,Buick,Cascada,1.6,4.0,Automatic,S6,Regular Gasoline,394.0,23.0
4,1985.0,Buick,Century,4.3,6.0,Automatic,3,Diesel,377.0,27.0
...,...,...,...,...,...,...,...,...,...,...
182,1986.0,Buick,Somerset/Skylark,2.5,4.0,Automatic,3,Regular Gasoline,386.0,23.0
183,1987.0,Buick,Somerset/Skylark,2.5,4.0,Automatic,3,Regular Gasoline,386.0,23.0
184,1987.0,Buick,Somerset/Skylark,2.5,4.0,Manual,5,Regular Gasoline,386.0,23.0
185,1987.0,Buick,Somerset/Skylark,3.0,6.0,Automatic,3,Regular Gasoline,444.0,20.0
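
In the output above, `year`, `cylinders`, `gg_emissions` and `mpg` display as floats (for example `2019.0`), whereas the single-model output in Step 3 shows integers. One plausible cause, not confirmed here, is that at least one of the concatenated frames holds missing values in those columns. The sketch below uses toy data and assumes the values are whole numbers; it shows how pandas' nullable `Int64` dtype can restore integer columns while keeping missing entries.

```python
import numpy as np
import pandas as pd

# Toy frames standing in for two per-model dataframes (illustrative data only)
a = pd.DataFrame({"year": [2019, 2018], "cylinders": [4, 4], "mpg": [24, 24]})
b = pd.DataFrame({"year": [1985, 1986], "cylinders": [6, np.nan], "mpg": [27, 23]})

combined = pd.concat([a, b], ignore_index=True)
print(combined.dtypes)  # 'cylinders' is float64 because of the missing value

# Convert to the nullable integer dtype; missing values become <NA>
int_columns = ["year", "cylinders", "mpg"]
combined = combined.astype({col: "Int64" for col in int_columns})
print(combined.dtypes)
```

The conversion fails if a column contains non-integral floats, so it is only appropriate for columns that are whole numbers by definition.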


-

**Step #7:** Pickle the dataframe made in Step 6 of all Buick models with 'normal' dataframes
- Saving it here lets further work on the dataframe start from this point

In [45]:
with open('pickles/buick_dfs.pickle', 'wb') as to_write:
    pickle.dump(buick_dfs, to_write)

-

**Step #8:** Un-pickle the dataframe saved in Step 7 of all Buick models with 'normal' dataframes

In [46]:
with open('pickles/buick_dfs.pickle','rb') as read_file:
    buick_dfs = pickle.load(read_file)
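
Steps 7 and 8 can also be written with pandas' own `to_pickle` and `read_pickle`. The sketch below uses a small stand-in dataframe and a temporary directory so it runs anywhere; it also creates the `pickles` folder first, since writing fails if the folder does not exist.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the Buick master dataframe (illustrative data only)
df = pd.DataFrame({"year": [2019, 2018], "model": ["Cascada", "Cascada"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the target folder before writing to it
    pickle_dir = os.path.join(tmp_dir, "pickles")
    os.makedirs(pickle_dir, exist_ok=True)

    path = os.path.join(pickle_dir, "buick_dfs.pickle")
    df.to_pickle(path)               # save
    restored = pd.read_pickle(path)  # load

# True when the round trip preserved values and dtypes
print(restored.equals(df))
```

As with the `pickle` module, only load pickle files from a source you trust.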