In [2]:
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import pickle

### Import functions unique to this project

In [3]:
from master_functions import get_car_urls
from master_functions import make_model_df

## Bentley - Create master dataframe for all Bentley models

This worksheet builds a dataframe for all **Bentley** cars. The source data on https://www.fueleconomy.gov/ has some structural 'quirks', so the steps below are more manual than usual: check each model's data for issues, combine the 'normal' models first, and then add in those that required special attention.

**Step #1:** Create unique urls for every Bentley model for the years 1984 - 2021<br>
- Uses `get_car_urls` from `master_functions`. Inputs: (car_make, [list of all models])

In [4]:
bentley_urls = get_car_urls('Bentley',
                            ['Arnage','Azure','Bentayga','Brooklands',
                             'Continental','Continental Flying Spur',
                             'Continental GT Convertible',
                             'Continental GT Supersports',
                             'Continental Superports Convertible',
                             'Flying Spur','Mulsanne','Turbo RT'
                            ])

-

**Step #2:** Get the length of the list created in Step 1. This is the number of urls that need to be checked with the function in Step 3.<br>

In [5]:
# Verify the number of urls; this is how many
# need to be checked in Step 3 below

len(bentley_urls)

12

-

**Step #3:** Check each of the urls you just created.<br>
- If a url works, add its index to the `bentley_okay_modelranks` list in Step 4
- If it does not work, add it to the 'problem' URLs string in Step 4

In [22]:
# Test area: check each url with bentley_urls[index]
# and see whether the data appears correctly

make_model_df('Bentley',bentley_urls[11])

| | year | make | model | capacity_liters | cylinders | transmission | trans_speed | fuel_type | gg_emissions | mpg |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1999 | Bentley | Turbo RT | 6.8 | 8 | Automatic | 4 | Premium Gasoline | 741 | 12 |
| 1 | 1998 | Bentley | Turbo RT | 6.7 | 8 | Automatic | 4 | Premium Gasoline | 741 | 12 |
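
The one-by-one check in Step 3 can be partly scripted. The following is a minimal sketch, assuming that a 'problem' url either raises an exception or returns an empty dataframe; urls that load but contain subtly wrong data would still need a visual check. `classify_urls` and `fake_builder` are hypothetical names used only for illustration; in this notebook the builder would be `make_model_df`.

```python
import pandas as pd


def classify_urls(make, urls, builder):
    """Split url indexes into 'okay' and 'problem' lists.

    `builder` is any callable with the signature builder(make, url)
    that returns a DataFrame, e.g. make_model_df.
    """
    okay, problem = [], []
    for index, url in enumerate(urls):
        try:
            df = builder(make, url)
        except Exception:
            # Assumption: a parsing failure surfaces as an exception.
            problem.append(index)
            continue
        # Assumption: an empty result also counts as a problem.
        if df.empty:
            problem.append(index)
        else:
            okay.append(index)
    return okay, problem


# Hypothetical stand-in for make_model_df so the sketch runs offline.
def fake_builder(make, url):
    if "bad" in url:
        raise ValueError("could not parse page")
    if "empty" in url:
        return pd.DataFrame()
    return pd.DataFrame({"make": [make], "model": [url]})


okay, problem = classify_urls("Bentley", ["a", "bad", "empty", "d"], fake_builder)
print(okay, problem)
```

The returned `okay` list plays the same role as the list of 'normal' indexes recorded in Step 4.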


-

**Step #4:** Using the results from Step 3, record each url below as either a "problem" url or a "normal" url

In [23]:
#'Problem' URLs
'''
bentley_urls[2]
bentley_urls[5]
bentley_urls[6]
bentley_urls[9]

'''

#'Normal' URLs

bentley_okay_modelranks = [0,1,3,4,7,8,
                           10,11
                          ]

-

**Step #5:** Create dfs for all 'okay' urls and place each into a master list
- Automate where possible, but some may need to be added one by one to avoid 'problem' urls

In [25]:
bentley_0 = make_model_df('Bentley',bentley_urls[0])
bentley_1 = make_model_df('Bentley',bentley_urls[1])
bentley_3 = make_model_df('Bentley',bentley_urls[3])
bentley_4 = make_model_df('Bentley',bentley_urls[4])
bentley_7 = make_model_df('Bentley',bentley_urls[7])
bentley_8 = make_model_df('Bentley',bentley_urls[8])
bentley_10 = make_model_df('Bentley',bentley_urls[10])
bentley_11 = make_model_df('Bentley',bentley_urls[11])



In [26]:
bentley_dfs = [bentley_0,
               bentley_1,
               bentley_3,
               bentley_4,
               bentley_7,
               bentley_8,
               bentley_10,
               bentley_11
              ]

len(bentley_dfs)

8
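
Because the 'normal' indexes are already recorded in a list in Step 4, the per-model dataframes can also be built in a single pass instead of one variable per url. This is a sketch only: `build_frames` and `fake_builder` are hypothetical names, and the stand-in builder replaces `make_model_df` so the example runs without network access.

```python
import pandas as pd


def build_frames(make, urls, okay_ranks, builder):
    """Build one DataFrame per 'okay' url index, skipping all other urls."""
    return [builder(make, urls[i]) for i in okay_ranks]


# Hypothetical stand-in for make_model_df.
def fake_builder(make, url):
    return pd.DataFrame({"make": [make], "url": [url]})


urls = [f"url_{i}" for i in range(12)]      # placeholder for bentley_urls
okay_ranks = [0, 1, 3, 4, 7, 8, 10, 11]     # indexes recorded in Step 4

frames = build_frames("Bentley", urls, okay_ranks, fake_builder)
print(len(frames))
```

Driving the loop from the index list keeps the 'problem' urls out automatically, and the list length should match the count checked above.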

-

**Step #6:** Concatenate all of the 'normal' Bentley model dfs into one master dataframe

In [27]:
bentley_dfs = pd.concat(bentley_dfs, ignore_index=True)

bentley_dfs

| | year | make | model | capacity_liters | cylinders | transmission | trans_speed | fuel_type | gg_emissions | mpg |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1999.0 | Bentley | Arnage | 4.4 | 8.0 | Automatic | 5 | Premium Gasoline | 684.0 | 13.0 |
| 1 | 2000.0 | Bentley | Arnage | 4.4 | 8.0 | Automatic | 5 | Premium Gasoline | 741.0 | 12.0 |
| 2 | 2001.0 | Bentley | Arnage | 6.8 | 8.0 | Automatic | 4 | Premium Gasoline | 741.0 | 12.0 |
| 3 | 2000.0 | Bentley | Arnage | 6.7 | 8.0 | Automatic | 4 | Premium Gasoline | 741.0 | 12.0 |
| 4 | 2008.0 | Bentley | Arnage | 6.7 | 8.0 | Automatic | S6 | Premium Gasoline | 808.0 | 11.0 |
| 5 | 2007.0 | Bentley | Arnage | 6.7 | 8.0 | Automatic | S6 | Premium Gasoline | 808.0 | 11.0 |
| 6 | 2009.0 | Bentley | Arnage | 6.7 | 8.0 | Automatic | S6 | Premium Gasoline | 808.0 | 11.0 |
| 7 | 2007.0 | Bentley | Arnage LWB | 6.7 | 8.0 | Automatic | S6 | Premium Gasoline | 808.0 | 11.0 |
| 8 | 2009.0 | Bentley | Arnage RL | 6.7 | 8.0 | Automatic | S6 | Premium Gasoline | 808.0 | 11.0 |
| 9 | 2008.0 | Bentley | Arnage RL | 6.7 | 8.0 | Automatic | S6 | Premium Gasoline | 808.0 | 11.0 |
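
In the combined output, `year`, `cylinders`, and `mpg` appear as floats (e.g. `1999.0`) even though a single-model dataframe shows them as integers. One likely cause, which this notebook does not confirm, is that at least one of the concatenated dataframes contains missing values, which forces pandas to use a float column. The toy data below is made up purely to illustrate that behavior and one possible way to restore integer display with the nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from fueleconomy.gov.
complete = pd.DataFrame({"year": [1999, 2000], "mpg": [13, 12]})
with_gap = pd.DataFrame({"year": [2008, np.nan], "mpg": [11, np.nan]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, ...
combined = pd.concat([complete, with_gap], ignore_index=True)
dtype_before = combined["year"].dtype    # float64, because of the NaN

# The nullable integer dtype keeps missing values while showing whole numbers.
combined["year"] = combined["year"].astype("Int64")
print(dtype_before, combined["year"].dtype)
```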


-

**Step #7:** Pickle the dataframe made in Step 6, which holds all Bentley models with 'normal' data
- Saving it here lets further work on the dataframe start from this point

In [28]:
with open('pickles/bentley_dfs.pickle', 'wb') as to_write:
    pickle.dump(bentley_dfs, to_write)

-

**Step #8:** Un-pickle the dataframe saved in Step 7, which holds all Bentley models with 'normal' data

In [29]:
with open('pickles/bentley_dfs.pickle','rb') as read_file:
    bentley_dfs = pickle.load(read_file)
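
As an alternative to the `pickle` module, pandas offers `DataFrame.to_pickle` and `pd.read_pickle`, which do the same round trip in one call each. The sketch below uses a temporary directory and a small made-up dataframe so it can run anywhere; it also creates the `pickles` folder first, on the assumption that the folder may not exist yet.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small illustrative dataframe standing in for bentley_dfs.
df = pd.DataFrame({"year": [1999, 1998],
                   "model": ["Turbo RT", "Turbo RT"],
                   "mpg": [12, 12]})

with tempfile.TemporaryDirectory() as tmp:
    pickle_dir = Path(tmp) / "pickles"
    # Create the target folder if it is missing; open(..., 'wb') would fail otherwise.
    pickle_dir.mkdir(parents=True, exist_ok=True)
    path = pickle_dir / "bentley_dfs.pickle"

    df.to_pickle(path)                 # save
    restored = pd.read_pickle(path)    # load

print(restored.equals(df))
```

Only load pickle files that you created yourself, since unpickling can execute arbitrary code.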