In [1]:
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import pickle

### Import functions unique to this project

In [2]:
from master_functions import get_car_urls
from master_functions import make_model_df

## CADILLAC - Create master dataframe for all Cadillac models

This worksheet builds a dataframe for all **Cadillac** cars. The source data on https://www.fueleconomy.gov/ has some structural 'quirks', so the steps below check each model's data manually, first combine the models that load as 'normal', and then add in those that require special attention.

**Step #1:** Create unique urls for every Cadillac model for the years 1984 - 2021<br>
- Uses `get_car_urls` from `master_functions`. Inputs: (car_make, [list of all models])

In [3]:
cadillac_urls = get_car_urls('Cadillac',
                             ['Allante','Armored Deville','Armored DTS',
                              'ATS','ATS-V','Brougham','Brougham/DeVille',
                              'Catera','Cimarron','Commercial Chassis',
                              'CT4','CT5','CT6','CTS','CTS-V','DeVille',
                              'DeVille/60 Special','DeVille/Concourse',
                              'DTS','Eldorado','ELR','Escalade','Fleetwood',
                              'Fleetwood Brougham','Fleetwood/DeVille',
                              'Funeral Coach/Hearse','Limousine','Seville',
                              'SRX','STS','STS-V','XLR','XLR-V','XT4','XT5',
                              'XT6','XTS','XTS Hearse','XTS Limo'
                             ])

-

**Step #2:** Get the length of the list created in Step 1. This is the number of urls to check, one index at a time, with the function in Step 3.<br>

In [4]:
# Verify the number of urls; this is how many
# urls need to be checked below

len(cadillac_urls)

39

-

**Step #3:** Check all of the urls you just created.<br>
- If a url works, add its index to the `cadillac_okay_modelranks` list in Step 4
- If it does not work, add it to the 'Problem' URLs string in Step 4

In [45]:
# Test area for each url with cadillac_urls[index]
# by seeing if data appears correctly

make_model_df('Cadillac',cadillac_urls[38])

Unnamed: 0,year,make,model,capacity_liters,cylinders,transmission,trans_speed,fuel_type,gg_emissions,mpg
0,2019,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,521,17
1,2018,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,525,17
2,2017,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,525,17
3,2016,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,524,17
4,2015,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,527,17
5,2014,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,519,17
6,2013,Cadillac,XTS Limo,3.6,6,Automatic,S6,Regular Gasoline,524,17
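
The one-at-a-time check in Step 3 could be partly automated. The sketch below is an assumption, not part of `master_functions`: `classify_urls` and `fake_build` are hypothetical names, and the approach only catches urls that raise an error. A url that loads but shows wrong-looking data still needs a visual check.

```python
from typing import Callable, Iterable, List, Tuple


def classify_urls(
    urls: Iterable[str], build: Callable[[str], object]
) -> Tuple[List[int], List[int]]:
    """Split url indices into those that build cleanly and those that raise."""
    okay, problem = [], []
    for index, url in enumerate(urls):
        try:
            build(url)
        except Exception:  # any failure marks the url for manual review
            problem.append(index)
        else:
            okay.append(index)
    return okay, problem


# Stand-in builder for illustration only: rejects urls containing "bad".
def fake_build(url: str) -> dict:
    if "bad" in url:
        raise ValueError(f"could not parse {url}")
    return {"url": url}


okay_ranks, problem_ranks = classify_urls(["u0", "u1", "bad2", "u3"], fake_build)
print(okay_ranks, problem_ranks)
```

In this notebook the builder would presumably be something like `lambda url: make_model_df('Cadillac', url)`, assuming `make_model_df` raises on a url it cannot parse.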


-

**Step #4:** After Step 3, every URL should be recorded in this section as either "problem" or "normal"

In [6]:
#'Problem' URLs
'''
cadillac_urls[12]
cadillac_urls[20]
cadillac_urls[28]
'''

#'Normal' URLs

# every index except the 'problem' ones (12, 20, 28)
cadillac_okay_modelranks = [0,1,2,3,4,5,6,7,8,9,10,
                            11,13,14,15,16,17,18,
                            19,21,22,23,24,25,26,
                            27,29,30,31,32,33,34,
                            35,36,37,38
                           ]
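
Rather than typing the 'normal' indices by hand, they could be derived from the 'problem' ones. This is a minimal sketch that assumes the three indices listed under 'Problem' URLs above are the complete set; `total_urls`, `problem_ranks` and `okay_ranks` are illustrative names.

```python
total_urls = 39                # len(cadillac_urls) from Step 2
problem_ranks = {12, 20, 28}   # indices listed under 'Problem' URLs

# Keep every index that was not flagged as a problem.
okay_ranks = [i for i in range(total_urls) if i not in problem_ranks]

print(len(okay_ranks))
```

Deriving the list this way keeps the two categories from drifting out of sync when a url is reclassified.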

-

**Step #5:** Create dfs for all 'okay' urls and place each into a master list
- Automate where possible, but some may need to be added one by one to avoid 'problem' urls

In [55]:
# 'Normal' urls: build a df for each and add it to the master df list.
# Each range stops just before a 'problem' index (12, 20, 28).

cadillac_dfs = []

for x in range(0,12):
    cadillac_dfs.append(make_model_df('Cadillac',cadillac_urls[x]))

for x in range(13,20):
    cadillac_dfs.append(make_model_df('Cadillac',cadillac_urls[x]))

for x in range(21,28):
    cadillac_dfs.append(make_model_df('Cadillac',cadillac_urls[x]))

# NOTE: range(29,38) ends at index 37 (XTS Hearse), so index 38
# (XTS Limo) is not added; use range(29,39) to include it.
for x in range(29,38):
    cadillac_dfs.append(make_model_df('Cadillac',cadillac_urls[x]))
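
Because `range` excludes its stop value, it can be worth checking which indices the four loops above actually cover. The sketch below only does index arithmetic; the variable names are illustrative.

```python
# The four ranges used in the loops above.
loop_ranges = [range(0, 12), range(13, 20), range(21, 28), range(29, 38)]

covered = [x for r in loop_ranges for x in r]

# Indices of the 39 urls that none of the loops visit.
skipped = sorted(set(range(39)) - set(covered))

print(len(covered), skipped)
```

The skipped indices are 12, 20, 28 and 38: the three 'problem' urls plus the last url, which the final `range(29,38)` stops short of.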

-

**Step #6:** Concatenate all of the 'normal' Cadillac model dfs into one master dataframe

In [56]:
cadillac_dfs = pd.concat(cadillac_dfs, ignore_index=True)

cadillac_dfs

Unnamed: 0,year,make,model,capacity_liters,cylinders,transmission,trans_speed,fuel_type,gg_emissions,mpg
0,1988.0,Cadillac,Allante,4.1,8.0,Automatic,4,Premium Gasoline,523.0,17.0
1,1987.0,Cadillac,Allante,4.1,8.0,Automatic,4,Premium Gasoline,523.0,17.0
2,1989.0,Cadillac,Allante,4.5,8.0,Automatic,4,Premium Gasoline,555.0,16.0
3,1992.0,Cadillac,Allante,4.5,8.0,Automatic,4,Premium Gasoline,555.0,16.0
4,1991.0,Cadillac,Allante,4.5,8.0,Automatic,4,Premium Gasoline,555.0,16.0
...,...,...,...,...,...,...,...,...,...,...
224,2017.0,Cadillac,XTS Hearse,3.6,6.0,Automatic,S6,Regular Gasoline,525.0,17.0
225,2016.0,Cadillac,XTS Hearse,3.6,6.0,Automatic,S6,Regular Gasoline,524.0,17.0
226,2014.0,Cadillac,XTS Hearse,3.6,6.0,Automatic,S6,Regular Gasoline,526.0,17.0
227,2013.0,Cadillac,XTS Hearse,3.6,6.0,Automatic,S6,Regular Gasoline,524.0,17.0
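
To illustrate what `ignore_index=True` does in Step 6, here is a small sketch using two made-up frames (the rows are placeholders, not data from the source). Without the flag, each frame's own 0-based index would be repeated in the result.

```python
import pandas as pd

# Two tiny stand-in frames; each has its own index starting at 0.
first = pd.DataFrame({"model": ["A", "A"], "year": [2019, 2018]})
second = pd.DataFrame({"model": ["B"], "year": [2017]})

kept = pd.concat([first, second])                       # index: 0, 1, 0
renumbered = pd.concat([first, second], ignore_index=True)  # index: 0, 1, 2

print(list(kept.index), list(renumbered.index))
```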


-

**Step #7:** Pickle the dataframe made in Step 6 of all Cadillac models with 'normal' dataframes
- The dataframe is now saved, so further work on it can start from this point

In [58]:
with open('pickles/cadillac_dfs.pickle', 'wb') as to_write:
    pickle.dump(cadillac_dfs, to_write)

-

**Step #8:** Un-pickle the dataframe saved in Step 7 of all Cadillac models with 'normal' dataframes

In [59]:
with open('pickles/cadillac_dfs.pickle','rb') as read_file:
    cadillac_dfs = pickle.load(read_file)
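
The pickle round trip in Steps 7 and 8 can be sketched on its own. Two assumptions here: a plain list of records stands in for the dataframe so the example needs only the standard library, and a temporary folder replaces the project's `pickles/` directory. Note that `open(..., 'wb')` fails if the target folder does not exist, so the sketch creates it first.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the dataframe; any picklable object behaves the same way.
records = [{"model": "XTS Limo", "year": 2019, "mpg": 17}]

with tempfile.TemporaryDirectory() as tmp:
    pickle_dir = Path(tmp) / "pickles"
    pickle_dir.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    target = pickle_dir / "cadillac_dfs.pickle"

    with open(target, "wb") as to_write:
        pickle.dump(records, to_write)

    with open(target, "rb") as read_file:
        restored = pickle.load(read_file)

print(restored == records)
```

pandas also provides `DataFrame.to_pickle` and `pd.read_pickle`, which wrap the same save and load steps for dataframes.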