In [1]:
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import pickle

### Import functions unique to this project

In [2]:
from master_functions import get_car_urls
from master_functions import make_model_df

## JAGUAR - Create master dataframe for all models

This worksheet builds a dataframe for all **Jaguar** cars. The source data on https://www.fueleconomy.gov/ has some 'quirks' in how it is structured, so the steps below include manual checks: test each url for issues, combine the 'normal' models first, and then add in the ones that need special attention.

**Step #1:** Create unique urls for every car model for the years 1984 - 2021<br>
- Uses `get_car_urls` from `master_functions`. Inputs: (car_make, [list of all models])

In [3]:
jaguar_urls = get_car_urls('Jaguar',
                           ['E-Pace','F-Pace','F-Type',
                            'I-Pace','S-Type','Super V8',
                            'Vanden Plas','Vdp','X-Type',
                            'XE','XF','XJ','XJL','XJR',
                            'XJRS','XJS','XK','XK8','XKR'
                           ])

-

**Step #2:** Get the length of the list created in Step 1. This is the number of urls to check in Step 3, one call of the function per url.<br>

In [4]:
# Verify the number of urls; this is how many
# urls need to be checked below

len(jaguar_urls)

19

-

**Step #3:** Check every url you just created, one index at a time.<br>
- If a url does not work, add it to the 'problem' URLs string below this cell

In [25]:
# Test area for each url with [carmake]_urls[index]
# by seeing if data appears correctly

make_model_df('Jaguar',jaguar_urls[18])

Unnamed: 0,year,make,model,capacity_liters,cylinders,transmission,trans_speed,fuel_type,gg_emissions,mpg
0,2009,Jaguar,XKR,4.2,8,Automatic,6,Premium Gasoline,494,18
1,2008,Jaguar,XKR,4.2,8,Automatic,6,Premium Gasoline,494,18
2,2007,Jaguar,XKR,4.2,8,Automatic,6,Premium Gasoline,494,18
3,2009,Jaguar,XKR Convertible,4.2,8,Automatic,6,Premium Gasoline,494,18
4,2008,Jaguar,XKR Convertible,4.2,8,Automatic,6,Premium Gasoline,494,18
5,2007,Jaguar,XKR Convertible,4.2,8,Automatic,6,Premium Gasoline,494,18
6,2006,Jaguar,XKR,4.2,8,Automatic,6,Premium Gasoline,523,17
7,2006,Jaguar,XKR Convertible,4.2,8,Automatic,6,Premium Gasoline,523,17
8,2002,Jaguar,XKR,4.0,8,Automatic,5,Premium Gasoline,523,17
9,2001,Jaguar,XKR,4.0,8,Automatic,5,Premium Gasoline,523,17
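
The one-by-one check in Step 3 can be partly automated. The block below is a minimal sketch, not part of `master_functions`: `find_problem_indices` is a hypothetical helper, and `fake_build` is a stand-in for `make_model_df` so the example runs without network access. It only catches urls that raise an error or return nothing, so a url whose data loads but looks wrong still needs the visual check above.

```python
def find_problem_indices(urls, build_df):
    """Return (index, reason) pairs for urls that build_df cannot handle.

    Hypothetical helper: build_df is any callable that takes one url
    and returns a table-like object (for example a DataFrame).
    """
    problems = []
    for i, url in enumerate(urls):
        try:
            df = build_df(url)
        except Exception as exc:  # broad on purpose: any failure means "review by hand"
            problems.append((i, repr(exc)))
            continue
        # An empty result is also suspicious, so flag it too.
        if df is None or len(df) == 0:
            problems.append((i, "empty result"))
    return problems


# Stand-in for make_model_df (assumption: the real function raises on bad pages).
def fake_build(url):
    if "bad" in url:
        raise ValueError("could not parse table")
    return [{"url": url}]


demo_urls = ["ok-0", "bad-1", "ok-2"]
demo_problems = find_problem_indices(demo_urls, fake_build)
print(demo_problems)
```

In the notebook this might be called as `find_problem_indices(jaguar_urls, lambda u: make_model_df('Jaguar', u))`, assuming `make_model_df` fails loudly on the urls that do not work.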


-

Populate this section if there are any 'problem' URLs from your test above

In [26]:
#'Problem' URLs
'''
jaguar_urls[3]
jaguar_urls[11]
jaguar_urls[12]
'''

#Print list length again to 
#set length of range in next cell
len(jaguar_urls)

19

-

**Step #4:** Create dfs for all 'okay' urls and place each into a master list
- Automate where possible, but some may need to be added one by one to avoid 'problem' urls

In [27]:
# For 'normal' urls, automate it: make a df for each one and add it to the master df list.
# The ranges below skip the 'problem' indexes 3, 11 and 12.

jaguar_dfs = []

for x in range(0,3):
    jaguar_dfs.append(make_model_df('Jaguar',jaguar_urls[x]))

for x in range(4,11):
    jaguar_dfs.append(make_model_df('Jaguar',jaguar_urls[x]))
    
for x in range(13,19):
    jaguar_dfs.append(make_model_df('Jaguar',jaguar_urls[x]))
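
The three `range` loops above hard-code the gaps around the 'problem' indexes. As a sketch of an alternative, the list of 'okay' indexes can be derived from the problem indexes directly, so only one set needs editing when the problem list changes. The names `problem_indices`, `total_urls` and `okay_indices` are illustrative and do not appear in the notebook.

```python
# Indexes recorded in the 'Problem' URLs cell, and the list length from Step 2.
problem_indices = {3, 11, 12}
total_urls = 19

# Every index that is not a known problem, in order.
okay_indices = [i for i in range(total_urls) if i not in problem_indices]

# The same indexes as produced by the three hand-written ranges.
manual_indices = list(range(0, 3)) + list(range(4, 11)) + list(range(13, 19))

print(okay_indices == manual_indices)
print(len(okay_indices))
```

With this in place the loops could collapse to a single comprehension such as `[make_model_df('Jaguar', jaguar_urls[i]) for i in okay_indices]`.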


-

**Step #5:** Concatenate all of the 'normal' car model dfs into one master dataframe

In [28]:
jaguar_dfs = pd.concat(jaguar_dfs, ignore_index=True)

jaguar_dfs

Unnamed: 0,year,make,model,capacity_liters,cylinders,transmission,trans_speed,fuel_type,gg_emissions,mpg
0,2020,Jaguar,E-Pace,2.0,4,Automatic,S9,Premium Gasoline,371,24
1,2018,Jaguar,E-Pace,2.0,4,Automatic,S9,Premium Gasoline,395,24
2,2019,Jaguar,E-Pace P250,2.0,4,Automatic,S9,Premium Gasoline,395,24
3,2018,Jaguar,E-Pace (296 Hp),2.0,4,Automatic,S9,Premium Gasoline,379,23
4,2020,Jaguar,E-Pace P300,2.0,4,Automatic,S9,Premium Gasoline,379,23
...,...,...,...,...,...,...,...,...,...,...
136,2007,Jaguar,XKR Convertible,4.2,8,Automatic,6,Premium Gasoline,494,18
137,2006,Jaguar,XKR,4.2,8,Automatic,6,Premium Gasoline,523,17
138,2006,Jaguar,XKR Convertible,4.2,8,Automatic,6,Premium Gasoline,523,17
139,2002,Jaguar,XKR,4.0,8,Automatic,5,Premium Gasoline,523,17
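
The continuous index in the output above comes from `ignore_index=True`. The toy example below, which uses two tiny frames with a few values copied from the outputs in this notebook, sketches the difference: without the flag each model's frame keeps its own 0-based index, so index labels repeat in the combined frame.

```python
import pandas as pd

# Two small per-model frames, each with its own default index 0, 1.
e_pace = pd.DataFrame({"year": [2020, 2018], "model": ["E-Pace", "E-Pace"], "mpg": [24, 24]})
xkr = pd.DataFrame({"year": [2009, 2008], "model": ["XKR", "XKR"], "mpg": [18, 18]})

# Default behaviour: original index labels are kept, so 0 and 1 appear twice.
kept = pd.concat([e_pace, xkr])

# ignore_index=True: the result gets a fresh index 0..n-1.
renumbered = pd.concat([e_pace, xkr], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

A unique index matters later if rows are selected by label, since `kept.loc[0]` would return two rows here.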


-

**Step #6:** Pickle the dataframe made in Step 5, which holds all of the car models with 'normal' dataframes
- Saving it now means further work on the dataframe can start from this point

In [29]:
with open('pickles/jaguar_dfs.pickle', 'wb') as to_write:
    pickle.dump(jaguar_dfs, to_write)

-

**Step #7:** Un-pickle the dataframe saved in Step 6, which holds all of the car models with 'normal' dataframes

In [30]:
with open('pickles/jaguar_dfs.pickle','rb') as read_file:
    jaguar_dfs = pickle.load(read_file)
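
One caveat with Steps 6 and 7: `open('pickles/jaguar_dfs.pickle', 'wb')` raises `FileNotFoundError` if the `pickles/` folder does not exist yet. The sketch below wraps the same `pickle.dump` / `pickle.load` calls in two hypothetical helpers, `save_pickle` and `load_pickle`, that create the folder first. It uses a temporary directory and a plain list as a stand-in for the dataframe so it runs anywhere.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Pickle obj to path, creating parent folders if they are missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as to_write:
        pickle.dump(obj, to_write)


def load_pickle(path):
    """Load and return the object pickled at path."""
    with open(path, "rb") as read_file:
        return pickle.load(read_file)


# Round trip in a throwaway folder; the list stands in for jaguar_dfs.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pickles" / "demo.pickle"
    original = [{"model": "XKR", "year": 2009, "mpg": 18}]
    save_pickle(original, target)
    restored = load_pickle(target)

round_trip_ok = restored == original
print(round_trip_ok)
```

For a real dataframe the final comparison would use `restored.equals(original)` rather than `==`, which compares element by element.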