In [1]:
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import pickle

### Import functions unique to this project

In [2]:
from master_functions import get_car_urls
from master_functions import make_model_df

## CHRYSLER - Create master dataframe for all Chrysler models

This worksheet builds a dataframe for all **Chrysler** cars. The source data on https://www.fueleconomy.gov/ has some structural 'quirks', so the steps below check each model's data manually, combine the 'normal' models first, and then add the models that need special attention.

**Step #1:** Create unique urls for every Chrysler model for the years 1984 - 2021<br>
- Uses `get_car_urls` from `master_functions`. Inputs: (car_make, [list of all models])

In [3]:
chrysler_urls = get_car_urls('Chrysler',
                             ['200','300','300 SRT8','Aspen',
                              'Cirrus','Concorde','Concorde',
                              'Concorde/LHS','Conquest',
                              'Crossfire','E Class/New Yorker',
                              'Executive Sedan/Limousine',
                              'Fifth Avenue/Imperial',
                              'Imperial/New Yorker Fifth Avenue',
                              'JX/JXI/Limited Convertible','Laser',
                              'Laser/Daytona','LeBaron','LHS',
                              'Limousine','New Yorker',
                              'New Yorker Fifth Avenue/Imperial',
                              'New Yorker/5th Avenue',
                              'New Yorker/LHS','Newport/Fifth Avenue',
                              'Pacifica','Prowler','PT Cruiser',
                              'QC Car','Sebring','TC',
                              'Town and Country','Voyager',
                              'Voyager/Town and Country'
                             ])
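The model list above contains 'Concorde' twice. A minimal sketch of how repeated names could be spotted before building URLs follows; `find_duplicates` and `sample_models` are hypothetical names used only for illustration, and the sample list is shortened.

```python
from collections import Counter

def find_duplicates(names):
    """Return the names that appear more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]

# Shortened, illustrative model list (not the full Chrysler list)
sample_models = ['200', '300', 'Cirrus', 'Concorde', 'Concorde', 'Concorde/LHS']
duplicates = find_duplicates(sample_models)
print(duplicates)
```

Removing a duplicate from the real list would shift every later index, so the index lists in Step 4 and the ranges in Step 5 would need to be updated to match.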

-

**Step #2:** Get the length of the list created in Step 1. This is the number of URLs that need to be checked one by one in Step 3.<br>

In [5]:
# Verify the number of urls; this is how many
# urls need to be checked in Step 3

len(chrysler_urls)

34

-

**Step #3:** Check all of the urls you just created.<br>
- If it works, add its index to the `chrysler_okay_modelranks` list in Step 4
- If it does not work, add it to the 'problem' URLs string in Step 4
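The manual check could be partly automated with a first pass like the sketch below. It is an illustration only: `classify_urls` and `fake_loader` are hypothetical, and it assumes a bad URL either raises an exception or yields an empty result, which the source does not confirm for `make_model_df`.

```python
def classify_urls(urls, loader):
    """Call loader(url) for each url and return (okay_indices, problem_indices)."""
    okay, problems = [], []
    for i, url in enumerate(urls):
        try:
            result = loader(url)
        except Exception:
            # Any error while loading counts as a problem URL
            problems.append(i)
            continue
        # Assumption: an empty or missing result is also a problem
        if result is None or len(result) == 0:
            problems.append(i)
        else:
            okay.append(i)
    return okay, problems

# Stand-in loader so the sketch runs without network access
def fake_loader(url):
    if 'bad' in url:
        raise ValueError('could not parse page')
    if 'empty' in url:
        return []
    return [{'model': url}]

okay, problems = classify_urls(['a', 'bad-b', 'c', 'empty-d'], fake_loader)
print(okay, problems)
```

With the notebook's own names the call might look like `classify_urls(chrysler_urls, lambda u: make_model_df('Chrysler', u))`. A URL that loads but returns wrongly structured data would still pass this check, so a visual inspection remains necessary.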

In [44]:
# Test area for each url with chrysler_urls[index]
# by seeing if data appears correctly

make_model_df('Chrysler',chrysler_urls[30])

| | year | make | model | capacity_liters | cylinders | transmission | trans_speed | fuel_type | gg_emissions | mpg |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1989 | Chrysler | TC By | 2.2 | 4 | Manual | 5 | Premium Gasoline | 468 | 19 |
| 1 | 1990 | Chrysler | TC By Convertible | 2.2 | 4 | Manual | 5 | Premium Gasoline | 494 | 18 |
| 2 | 1991 | Chrysler | TC by Maserati | 3.0 | 6 | Automatic | 4 | Premium Gasoline | 494 | 18 |
| 3 | 1989 | Chrysler | TC By | 2.2 | 4 | Automatic | 3 | Premium Gasoline | 494 | 18 |
| 4 | 1990 | Chrysler | TC By Convertible | 3.0 | 6 | Automatic | 4 | Premium Gasoline | 494 | 18 |
| 5 | 1989 | Chrysler | TC By | 2.5 | 4 | Automatic | 3 | Premium Gasoline | 523 | 17 |
| 6 | 1989 | Chrysler | TC By | 2.5 | 4 | Automatic | 3 | Premium Gasoline | 523 | 17 |


-

**Step #4:** Using the results from Step 3, record each URL index below as either 'problem' or 'normal'.

In [45]:
#'Problem' URLs
'''
chrysler_urls[0]
chrysler_urls[1]
chrysler_urls[3]
chrysler_urls[25]
chrysler_urls[31]
chrysler_urls[32]
chrysler_urls[33]
'''

#'Normal' URLs

chrysler_okay_modelranks = [2,4,5,6,7,8,
                            9,10,11,12,13,
                            14,15,16,17,18,
                            19,20,21,22,23,
                            24,26,27,28,29,
                            30
                           ]
                             

-

**Step #5:** Create dfs for all 'okay' urls and place each into a master list
- Automate where possible, but some may need to be added one by one to avoid 'problem' urls
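One way to keep Step 5 in sync with the 'normal' list from Step 4 is to build the frames from a single list of indices instead of separate hard-coded ranges. The sketch below is an illustration under assumptions: `frames_for_indices` and `fake_loader` are hypothetical, and the index list is rebuilt here from ranges to show that it matches the 27 'normal' indices (2, 4 through 24, 26 through 30).

```python
def frames_for_indices(urls, indices, loader):
    """Return one loaded result per index, in the order given."""
    return [loader(urls[i]) for i in indices]

# range() excludes its stop value, so 4..24 is range(4, 25)
# and 26..30 is range(26, 31)
okay_indices = [2, *range(4, 25), *range(26, 31)]
print(len(okay_indices))

# Stand-in data so the sketch runs on its own
fake_urls = [f'url-{i}' for i in range(34)]

def fake_loader(url):
    return {'source': url}

frames = frames_for_indices(fake_urls, okay_indices, fake_loader)
print(frames[0], frames[-1])
```

With the notebook's names, `loader` could be `lambda u: make_model_df('Chrysler', u)` and `urls` would be `chrysler_urls`.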

In [46]:
# for 'normal' urls to make a df and add to master df list, automate it!

chrysler_dfs = []

chrysler_dfs.append(make_model_df('Chrysler',chrysler_urls[2]))

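# indices 4 through 24
for x in range(4,25):
    chrysler_dfs.append(make_model_df('Chrysler',chrysler_urls[x]))

# indices 26 through 29; range(26,30) stops before 30, so index 30 ('TC'),
# which is on the 'normal' list in Step 4, is not added here
for x in range(26,30):
    chrysler_dfs.append(make_model_df('Chrysler',chrysler_urls[x]))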


-

**Step #6:** Concatenate all of the 'normal' Chrysler model dfs into one master dataframe
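As a small illustration of what `pd.concat` with `ignore_index=True` does, the sketch below stacks two toy frames (made-up rows, not fueleconomy.gov data) and shows that the combined frame gets a fresh 0-based index instead of repeating each frame's own index.

```python
import pandas as pd

# Two toy frames with the same columns; each has its own 0-based index
a = pd.DataFrame({'year': [2010, 2009], 'model': ['300', '300']})
b = pd.DataFrame({'year': [2005], 'model': ['Sebring']})

# ignore_index=True discards the original indices and renumbers the rows
combined = pd.concat([a, b], ignore_index=True)
print(combined.index.tolist())

# Without ignore_index the original labels are kept, so 0 appears twice
kept = pd.concat([a, b])
print(kept.index.tolist())
```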

In [47]:
chrysler_dfs = pd.concat(chrysler_dfs, ignore_index=True)

chrysler_dfs

| | year | make | model | capacity_liters | cylinders | transmission | trans_speed | fuel_type | gg_emissions | mpg |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010.0 | Chrysler | 300/SRT-8 | 2.7 | 6.0 | Automatic | 4 | Regular Gasoline | 423.0 | 21.0 |
| 1 | 2009.0 | Chrysler | 300/SRT-8 | 2.7 | 6.0 | Automatic | 4 | Regular Gasoline | 423.0 | 21.0 |
| 2 | 2008.0 | Chrysler | 300/SRT-8 | 2.7 | 6.0 | Automatic | 4 | Regular Gasoline | 423.0 | 21.0 |
| 3 | 2007.0 | Chrysler | 300/SRT-8 | 2.7 | 6.0 | Automatic | 4 | Regular Gasoline | 423.0 | 21.0 |
| 4 | 2005.0 | Chrysler | 300C/SRT-8 | 2.7 | 6.0 | Automatic | 4 | Regular Gasoline | 423.0 | 21.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 136 | 2005.0 | Chrysler | Sebring | 2.4 | 4.0 | Manual | 5 | Regular Gasoline | 370.0 | 24.0 |
| 137 | 2004.0 | Chrysler | Sebring | 2.4 | 4.0 | Manual | 5 | Regular Gasoline | 370.0 | 24.0 |
| 138 | 2003.0 | Chrysler | Sebring | 2.4 | 4.0 | Manual | 5 | Regular Gasoline | 370.0 | 24.0 |
| 139 | 2010.0 | Chrysler | Sebring Convertible | 2.4 | 4.0 | Automatic | 4 | Regular Gasoline | 386.0 | 23.0 |


-

**Step #7:** Pickle the dataframe made in Step 6 of all Chrysler models with 'normal' dataframes
- Saving the dataframe here lets later work start from this point without rebuilding it

In [50]:
with open('pickles/chrysler_dfs.pickle', 'wb') as to_write:
    pickle.dump(chrysler_dfs, to_write)

-

**Step #8:** Un-pickle the dataframe saved in Step 7 (all Chrysler models with 'normal' dataframes)

In [51]:
with open('pickles/chrysler_dfs.pickle','rb') as read_file:
    chrysler_dfs = pickle.load(read_file)