# eBay Car Sales

The data was crawled from the German version of eBay and comprises 50,000 car ads. The aim is to clean the data and analyze the listings.


|Column name (index)| Description|
|---|---|
|dateCrawled (0)| When ad was first crawled|
|name (1)| Name of the car|
|seller (2)| Private or dealer|
|offerType (3)| Type of listing|
|price (4)| Car price on ad|
|abtest (5)| Whether listing included A/B test|
|vehicleType (6)| Vehicle Type|
|yearOfRegistration (7)| Year car was first registered|
|gearbox (8)| Transmission type|
|powerPS (9)| Power of car in PS|
|model (10)| Car model name|
|odometer (11)| Mileage (km)|
|monthOfRegistration (12)| Month in which car first registered|
|fuelType (13)| Type of fuel car uses|
|brand (14)| Brand of car|
|notRepairedDamage (15)| If car has non-repaired damage|
|dateCreated (16)| Date eBay listing was created|
|nrOfPictures (17)| Number of pictures in ad|
|postalCode (18)| Postal code for location of vehicle|
|lastSeenOnline (19)| When crawler saw ad last online|

In [111]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sn

## Exploratory analysis

In [50]:
autos = pd.read_csv('C:/Users/User/Documents/data_sets/autos.csv', encoding='Latin-1') # utf-8 not adequate
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null obj

In [51]:
autos.head()

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


In [52]:
autos.describe(include='all')

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-03-30 19:48:02,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


Five columns have missing data: `vehicleType`, `gearbox`, `model`, `fuelType`, and `notRepairedDamage`. Car names are messy and inconsistent; the first two elements separated by an underscore seem most relevant. Of the numeric columns, `yearOfRegistration` and `powerPS` have impossible minimum values (1000 and 0), and their maximum values are implausible as well. The `postalCode` column has a four-digit minimum, although German postal codes are five digits long; a likely cause is that leading zeros were dropped when the column was parsed as integers. Lastly, column names use camel case rather than snake case.
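One possible explanation for the four-digit postal codes is that leading zeros were lost when the column was read as integers. The following is a minimal sketch under that assumption; the list `postal_codes` is a hypothetical sample built from values that occur in this data set, not a step of the original analysis.

```python
# Hypothetical sample of postal codes as they look after integer parsing.
postal_codes = [79588, 1067, 9322, 6108]

# Restore the five-digit form by left-padding with zeros.
padded = [str(code).zfill(5) for code in postal_codes]
print(padded)
```

If this assumption holds, storing the column as strings rather than integers would avoid the problem in the first place.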

The description of the non-numeric data reveals that `seller` and `offerType` each hold a single value with one exception (49,999 of 50,000 rows). The most common `price` is $0, which is implausible for a real sale; furthermore, `price` should be numeric, and so should `odometer`.
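The two checks described above (missing values and columns dominated by one value) can also be computed directly. This is a rough sketch on a hypothetical miniature table; the frame `sample` and its values, including the second seller label, are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for the listings table.
sample = pd.DataFrame({
    'seller': ['privat', 'privat', 'privat', 'gewerblich'],
    'gearbox': ['manuell', None, 'automatik', 'manuell'],
    'price': ['$5,000', '$0', '$0', '$1,350'],
})

# Number of missing entries per column.
missing = sample.isna().sum()

# Share of non-missing rows taken by the most frequent value in each column.
top_share = sample.apply(lambda col: col.value_counts(normalize=True).iloc[0])

print(missing)
print(top_share)
```

A `top_share` close to 1 flags a column that carries almost no information and is a candidate for dropping.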

## Data Cleaning

### Column name changes 

In [104]:
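# Drop columns that hold almost exclusively a single value
# (judging by its execution count, this cell ran after the rename below, so the column listings that follow still show both columns)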
autos.drop(['seller','offerType'], axis=1, inplace=True)

In [53]:
autos.columns

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

In [54]:
# Change column names according to change dictionary
change = {'dateCrawled':'date_crawled', 'abtest':'ab_test', 'vehicleType': 'vehicle_type', 'yearOfRegistration': 'registration_year',
         'gearbox':'gear_box', 'powerPS':'power_ps','monthOfRegistration': 'registration_month', 'fuelType':'fuel_type', 'notRepairedDamage': 'unrepaired_damage',
         'dateCreated':'ad_created','nrOfPictures':'nr_pictures','postalCode':'postal_code','lastSeen':'last_seen'}
autos.rename(columns=change, inplace=True)
autos.columns

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',
       'vehicle_type', 'registration_year', 'gear_box', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_pictures', 'postal_code',
       'last_seen'],
      dtype='object')

Column names were altered to facilitate interpretation and indexing.
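The renaming above uses a hand-written dictionary, which also allows rewording (for example `yearOfRegistration` to `registration_year`). For a purely mechanical camel case to snake case conversion, a small helper could be used instead. The function `camel_to_snake` below is a hypothetical illustration, not part of the original notebook, and it produces different names than the manual mapping.

```python
import re

def camel_to_snake(name):
    """Insert an underscore before each capital letter that follows a lowercase letter or digit."""
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()

columns = ['dateCrawled', 'yearOfRegistration', 'powerPS', 'abtest']
converted = [camel_to_snake(c) for c in columns]
print(converted)
```

Names without capital letters, such as `abtest`, pass through unchanged, so cases like `ab_test` would still need a manual entry.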

### Change data types for price and odometer

In [55]:
autos['price'].value_counts()

$0          1421
$500         781
$1,500       734
$2,500       643
$1,000       639
            ... 
$40,400        1
$9,590         1
$250,000       1
$25,590        1
$2,495         1
Name: price, Length: 2357, dtype: int64

In [56]:
autos['odometer'].value_counts()

150,000km    32424
125,000km     5170
100,000km     2169
90,000km      1757
80,000km      1436
70,000km      1230
60,000km      1164
50,000km      1027
5,000km        967
40,000km       819
30,000km       789
20,000km       784
10,000km       264
Name: odometer, dtype: int64

For both columns, the unit markers (`$` and `km`) as well as the thousands separators (`,`) need to be removed. A 32-bit integer should be an appropriate data type given the expected magnitude of prices and odometer readings.
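Whether a 32-bit integer is large enough can be checked with a quick calculation. The sketch below compares the signed 32-bit maximum with the largest price and odometer values that occur in this data set (99,999,999 and 150,000); the variable names are illustrative.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Largest figures occurring in the two columns of this data set.
largest_price = 99_999_999
largest_odometer = 150_000

fits = largest_price <= INT32_MAX and largest_odometer <= INT32_MAX
print(INT32_MAX, fits)
```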

In [57]:
print(type(autos['price'][0]), type(autos['odometer'][0]))

<class 'str'> <class 'str'>


Both columns already hold Python strings, so `str.replace()` can be applied directly; the `astype('str')` call below merely guarantees this.

In [58]:
# Convert to string
price_str = autos['price'].astype('str')
odometer_str = autos['odometer'].astype('str')

# Strip unit symbols and thousands separators (regex=False treats '$' as a literal character)
price_clean = price_str.str.replace('$', '', regex=False).str.replace(',', '', regex=False)
odometer_clean = odometer_str.str.replace('km', '', regex=False).str.replace(',', '', regex=False)

# Transform to numeric
price = price_clean.astype('int32')
odometer = odometer_clean.astype('int32')

# Assign numeric columns to the data set
autos['price'] = price
autos['odometer'] = odometer

# Rename columns to indicate the respective units
autos.rename({'odometer':'odometer_km', 'price':'price_$'}, axis=1, inplace=True) 

# Verify results
autos[['price_$','odometer_km']].describe()

| |price_$|odometer_km|
|---|---|---|
|count|50000.0|50000.0|
|mean|9840.044|125732.7|
|std|481104.4|40042.211706|
|min|0.0|5000.0|
|25%|1100.0|125000.0|
|50%|2950.0|150000.0|
|75%|7200.0|150000.0|
|max|100000000.0|150000.0|


In [61]:
print('Shape of unique price data: ', autos['price_$'].unique().shape)
print('Shape of unique odometer data: ', autos['odometer_km'].unique().shape)

Shape of unique price data:  (2357,)
Shape of unique odometer data:  (13,)


In [92]:
#Frequency counts of odometer data
autos['odometer_km'].value_counts()

150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
70000      1230
60000      1164
50000      1027
5000        967
40000       819
30000       789
20000       784
10000       264
Name: odometer_km, dtype: int64

The values in the `odometer_km` column all appear to be reasonable.

In [93]:
# Most and least frequent price data
print('Lowest price values and associated frequencies:')
print(autos['price_$'].value_counts().sort_index(ascending=True).head(15), '\n')
print('Highest price values and associated frequencies')
print(autos['price_$'].value_counts().sort_index(ascending=False).head(15))

Lowest price values and associated frequencies:
0     1421
1      156
2        3
3        1
5        2
8        1
9        1
10       7
11       2
12       3
13       2
14       1
15       2
17       3
18       1
Name: price_$, dtype: int64 

Highest price values and associated frequencies
99999999    1
27322222    1
12345678    3
11111111    2
10000000    1
3890000     1
1300000     1
1234566     1
999999      2
999990      1
350000      1
345000      1
299000      1
295000      1
265000      1
Name: price_$, dtype: int64


For the high `price` values, there is a sharp drop from 999,990 to 350,000. Since $350,000 is a plausible price for a luxury used car, this might be a good cut-off. The low prices are more difficult to judge: many values appear much too cheap to be genuine car prices. A more detailed inspection of the outliers should bring some clarity.

In [105]:
# Select columns of car ads with exceptionally low prices
below_50 = autos[autos['price_$']<50].sort_values('price_$', ascending=False) # ads with prices below 50$ sorted by descending prices
below_50.loc[:,['name','price_$','registration_year','offer_type','unrepaired_damage']] # select columns to judge accuracy of pricing

Unnamed: 0,name,price_$,registration_year,offer_type,unrepaired_damage
6936,Zu_Verkauf__4_reifen_von_vw_passat,49,1995,,
4164,Verkaufe_DESIGN_Streifen_/_Aufkleber_VW__Opel_...,49,5000,,
20541,Stossstange__neu__und_4_Alus_fuer_Polo_6N,49,1999,,
46890,Vw.Sharan__Stuhls,49,2002,,
46184,Bmw_e36_316i_Limousine_calypso_rot_metallic,47,1999,,ja
...,...,...,...,...,...
17541,Suche_VW_T5_Multivan,0,2005,,
17489,Suche_ein_Fahrzeug_alles_anbieten,0,2000,,
17445,RENAULT_Megane_Scenic,0,1999,,
17435,Seat_Leon_1M,0,2005,,


In [125]:
high_prices = autos[autos['price_$']>300000].sort_values('price_$', ascending=False)
high_prices.reindex(columns=['name','price_$','registration_year','offer_type','unrepaired_damage'])

Unnamed: 0,name,price_$,registration_year,offer_type,unrepaired_damage
39705,Tausch_gegen_gleichwertiges,99999999,1999,,
42221,Leasinguebernahme,27322222,2014,,
27371,Fiat_Punto,12345678,2017,,
39377,Tausche_volvo_v40_gegen_van,12345678,2018,,nein
47598,Opel_Vectra_B_1_6i_16V_Facelift_Tuning_Showcar...,12345678,2001,,nein
2897,Escort_MK_1_Hundeknochen_zum_umbauen_auf_RS_2000,11111111,1973,,nein
24384,Schlachte_Golf_3_gt_tdi,11111111,1995,,
11137,suche_maserati_3200_gt_Zustand_unwichtig_laufe...,10000000,1960,,nein
47634,Ferrari_FXX,3890000,2006,,nein
7814,Ferrari_F40,1300000,1992,,nein


A price of $350,000 seems reasonable as an upper cut-off, since the listing at that price, a Porsche 991, is indeed a very expensive vehicle. For the lower end of the price range, it remains difficult to define a reasonable cut-off. We will somewhat arbitrarily remove ads priced below $1,000. This is defensible because the summary statistics show that fewer than 25% of the listings fall below that value (the first quartile is $1,100), and $1,000 is a plausible lower-end price for a used car.

In [128]:
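# Keep prices strictly between 1,000 and 400,000
# (no listing is priced between 350,000 and 400,000, so the upper bound matches the 350,000 cut-off)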
price_bool = (autos['price_$'] > 1000) & (autos["price_$"] < 400000)
autos[price_bool].shape

(37987, 18)
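The same kind of range filter can be written with `Series.between`, which includes both bounds by default. The sketch below uses a hypothetical handful of prices and treats the two cut-offs as adjustable assumptions; unlike the strict comparison above, a price of exactly 1,000 is kept here.

```python
import pandas as pd

# Hypothetical sample of prices spanning the outliers discussed above.
prices = pd.Series([0, 1, 49, 1000, 2950, 350000, 999990, 99999999], name='price_$')

# Cut-offs under discussion; both bounds are inclusive with between().
lower, upper = 1000, 350000
in_range = prices.between(lower, upper)

kept = prices[in_range]
removed_share = 1 - in_range.mean()
print(kept.tolist(), removed_share)
```

Reporting the removed share alongside the filter makes it easier to judge how aggressive a given cut-off is.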

### Date formatting

In [145]:
# Current formatting and data type of the date-related columns
for date in autos[['date_crawled','ad_created','last_seen','registration_year','registration_month']].iloc[0,:]:
    print(type(date))
autos[['date_crawled','ad_created','last_seen']][0:5]

<class 'str'>
<class 'str'>
<class 'str'>
<class 'numpy.int64'>
<class 'numpy.int64'>


| |date_crawled|ad_created|last_seen|
|---|---|---|---|
|0|2016-03-26 17:47:46|2016-03-26 00:00:00|2016-04-06 06:45:54|
|1|2016-04-04 13:38:56|2016-04-04 00:00:00|2016-04-06 14:45:08|
|2|2016-03-26 18:57:24|2016-03-26 00:00:00|2016-04-06 20:15:37|
|3|2016-03-12 16:58:10|2016-03-12 00:00:00|2016-03-15 03:16:28|
|4|2016-04-01 14:38:50|2016-04-01 00:00:00|2016-04-01 14:38:50|


The format appears to be `%Y-%m-%d %H:%M:%S` and dates are currently encoded as strings. Registration year and month are numeric.
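Given that format, the strings could be parsed into proper timestamps rather than sliced as text. The following is a small sketch on two values taken from the table above; the names `raw`, `parsed` and `days` are illustrative, and the notebook itself continues with string slicing.

```python
import pandas as pd

# Two timestamps in the format observed above.
raw = pd.Series(['2016-03-26 17:47:46', '2016-04-04 13:38:56'])

# Parse with an explicit format so that malformed entries raise an error.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S')

# Day-level strings, comparable to taking the first 10 characters.
days = parsed.dt.strftime('%Y-%m-%d')
print(days.tolist())
```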

In [141]:
# Frequency percentages of dates
# Use .str[:10] to extract first 10 characters, corresponding to dates without time information 
crawled_freq = autos['date_crawled'].str[:10].value_counts(normalize=True, dropna=False)
created_freq = autos['ad_created'].str[:10].value_counts(normalize=True, dropna=False)
last_freq = autos['last_seen'].str[:10].value_counts(normalize=True, dropna=False)

# Sort frequencies
crawled_freq_sorted = crawled_freq.sort_index(ascending=True)
created_freq_sorted = created_freq.sort_index(ascending=True)
last_freq_sorted =last_freq.sort_index(ascending=True)

# Print frequencies
print("Crawled dates: \n", crawled_freq_sorted[:10])
print("\nCreated dates: \n", created_freq_sorted[:10])
print("\nLast seen dates: \n", last_freq_sorted[:10])

Crawled dates: 
 2016-03-05    0.02538
2016-03-06    0.01394
2016-03-07    0.03596
2016-03-08    0.03330
2016-03-09    0.03322
2016-03-10    0.03212
2016-03-11    0.03248
2016-03-12    0.03678
2016-03-13    0.01556
2016-03-14    0.03662
Name: date_crawled, dtype: float64

Created dates: 
 2015-06-11    0.00002
2015-08-10    0.00002
2015-09-09    0.00002
2015-11-10    0.00002
2015-12-05    0.00002
2015-12-30    0.00002
2016-01-03    0.00002
2016-01-07    0.00002
2016-01-10    0.00004
2016-01-13    0.00002
Name: ad_created, dtype: float64

Last seen dates: 
 2016-03-05    0.00108
2016-03-06    0.00442
2016-03-07    0.00536
2016-03-08    0.00760
2016-03-09    0.00986
2016-03-10    0.01076
2016-03-11    0.01252
2016-03-12    0.02382
2016-03-13    0.00898
2016-03-14    0.01280
Name: last_seen, dtype: float64


The dates on which ads were first created start earlier than the crawl dates, which is expected. There is a gap of roughly nine months between the oldest ad on eBay (2015-06-11) and the first crawl (2016-03-05), which is considerable. As a result, the early ad creation dates each occur only once or twice, which should be less the case for newer ads.
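The size of that gap can be verified with a short date calculation. This sketch uses the oldest creation date and the first crawl date from the output above; the conversion to months assumes an average month length of 30.44 days.

```python
from datetime import date

oldest_ad = date(2015, 6, 11)    # earliest ad creation date in the output
first_crawl = date(2016, 3, 5)   # earliest crawl date in the output

gap = first_crawl - oldest_ad
gap_months = gap.days / 30.44    # assumed average month length
print(gap.days, round(gap_months, 1))
```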

In [142]:
autos['registration_year'].describe()

count    50000.000000
mean      2005.073280
std        105.712813
min       1000.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

Given that the data was crawled in 2016, any registration year after 2016 must be wrong. To find a reasonable cut-off for the lower end, we will generate more detailed frequency counts.

In [146]:
# Frequency counts of earliest 10 registration years
autos['registration_year'].value_counts().sort_index(ascending=True)[:10]

1000    1
1001    1
1111    1
1500    1
1800    2
1910    9
1927    1
1929    1
1931    1
1934    2
Name: registration_year, dtype: int64

In [148]:
autos[(autos['registration_year'] >= 1910) & (autos['registration_year'] <= 1950)]

Unnamed: 0,date_crawled,name,price_$,ab_test,vehicle_type,registration_year,gear_box,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_pictures,postal_code,last_seen
1171,2016-03-29 17:53:03,Seat_Leon_Spielzeug_Auto,2,control,limousine,1950,automatik,5,leon,5000,0,diesel,seat,,2016-03-29 00:00:00,0,26919,2016-04-06 03:45:23
2221,2016-03-15 14:57:07,Sehr_seltener_Oldtimer_Opel_1210_zum_Restaurieren,3350,control,andere,1934,manuell,0,andere,5000,0,benzin,opel,ja,2016-03-15 00:00:00,0,49828,2016-04-06 06:17:51
2573,2016-03-19 22:51:25,Hanomag_rekord_15k_Suche_ersatz_teile,3000,test,andere,1934,,0,,90000,1,benzin,sonstige_autos,nein,2016-03-19 00:00:00,0,90489,2016-03-19 22:51:25
3679,2016-04-04 00:36:17,Suche_Auto,1,test,,1910,,0,,5000,0,,sonstige_autos,,2016-04-04 00:00:00,0,40239,2016-04-04 07:49:15
11047,2016-03-08 20:50:10,Andere_Simca_5_Fourgonette_Kombilimousine,17500,control,kombi,1948,manuell,0,,60000,6,benzin,sonstige_autos,nein,2016-03-08 00:00:00,0,47546,2016-04-05 21:15:42
11246,2016-03-26 19:49:59,Ford_Model_A_Roadster_Deluxe_1931,27500,control,cabrio,1931,manuell,39,andere,10000,7,benzin,ford,nein,2016-03-26 00:00:00,0,9322,2016-04-06 09:46:59
11585,2016-03-11 21:48:36,Volkswagen__VW_Typ_82,41900,test,cabrio,1943,,0,andere,100000,7,,volkswagen,ja,2016-03-11 00:00:00,0,84174,2016-03-21 13:18:05
13963,2016-03-20 17:51:49,Mercedes_Benz_L1500S_Wehrmacht_/_Luftwaffe___F...,26900,test,andere,1941,manuell,60,andere,60000,7,benzin,mercedes_benz,nein,2016-03-20 00:00:00,0,38723,2016-04-07 01:17:51
14020,2016-03-19 11:52:47,Oldtimeraufloesung,10000,test,coupe,1950,manuell,130,andere,5000,1,benzin,alfa_romeo,nein,2016-03-19 00:00:00,0,34128,2016-04-06 14:46:35
15898,2016-03-08 10:50:05,Tausch_alles_aus_meinen_Anzeigen_gegen_Auto,0,test,,1910,,0,,5000,0,,sonstige_autos,,2016-03-08 00:00:00,0,6108,2016-03-08 17:47:19


From the `name` column we can see that the ads from 1910 are clearly erroneous, whereas registration years from 1927 onward appear to be mostly valid classic car sales.

In [149]:
# Keep only cars registered between 1927 and 2016 (inclusive)
autos = autos[(autos['registration_year'] >= 1927) & (autos['registration_year'] <= 2016)]
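Before dropping rows it can be useful to know how many listings a filter removes. The sketch below builds the same kind of boolean mask on a hypothetical set of registration years and takes an explicit copy of the filtered frame; the frame `sample` is made up for illustration.

```python
import pandas as pd

# Hypothetical registration years, including values outside the plausible range.
sample = pd.DataFrame({'registration_year': [1000, 1910, 1927, 1999, 2003, 2016, 2017, 9999]})

# Same bounds as above: 1927 to 2016, both inclusive.
valid = (sample['registration_year'] >= 1927) & (sample['registration_year'] <= 2016)
share_removed = (~valid).mean()

# An explicit copy avoids chained-assignment warnings in later edits.
cleaned = sample.loc[valid].copy()
print(len(cleaned), share_removed)
```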

## Data Aggregation on Car Brands

In [151]:
# Unique brands in the brand column 
unique_brands = autos['brand'].unique()
unique_brands

array(['peugeot', 'bmw', 'volkswagen', 'smart', 'ford', 'chrysler',
       'seat', 'renault', 'mercedes_benz', 'audi', 'sonstige_autos',
       'opel', 'mazda', 'porsche', 'mini', 'toyota', 'dacia', 'nissan',
       'jeep', 'saab', 'volvo', 'mitsubishi', 'jaguar', 'fiat', 'skoda',
       'subaru', 'kia', 'citroen', 'chevrolet', 'hyundai', 'honda',
       'daewoo', 'suzuki', 'trabant', 'land_rover', 'alfa_romeo', 'lada',
       'rover', 'daihatsu', 'lancia'], dtype=object)

In [155]:
# Brand percent frequency counts
brand_freq = autos['brand'].value_counts(normalize=True)
brand_freq

volkswagen        0.212166
bmw               0.110040
opel              0.108145
mercedes_benz     0.095379
audi              0.086403
ford              0.069806
renault           0.047335
peugeot           0.029530
fiat              0.025865
seat              0.018180
skoda             0.016035
mazda             0.015140
nissan            0.015098
citroen           0.013932
smart             0.013911
toyota            0.012474
sonstige_autos    0.010850
hyundai           0.009850
volvo             0.009246
mini              0.008642
mitsubishi        0.008143
honda             0.007851
kia               0.007101
alfa_romeo        0.006622
porsche           0.006102
suzuki            0.005914
chevrolet         0.005706
chrysler          0.003665
dacia             0.002561
daihatsu          0.002561
jeep              0.002249
subaru            0.002187
land_rover        0.002041
saab              0.001604
jaguar            0.001583
trabant           0.001541
daewoo            0.001499
r

We will aggregate brands with frequencies of at least 1% of total brand counts.
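Instead of typing the brand list by hand, the 1% threshold could be applied to the frequency table directly. This is a sketch on hypothetical counts; the series `brands` is synthetic, and `common_brands` plays the role of the hand-written list in the next cell.

```python
import pandas as pd

# Synthetic brand column: 200 listings with one rare brand.
brands = pd.Series(['volkswagen'] * 120 + ['bmw'] * 60 + ['opel'] * 19 + ['lada'] * 1)

# Relative frequencies, sorted from most to least common.
brand_freq = brands.value_counts(normalize=True)

# Keep brands that make up at least 1% of all listings.
common_brands = brand_freq[brand_freq >= 0.01].index.tolist()
print(common_brands)
```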

In [179]:
brand_list = ['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford', 'renault', 'peugeot', 'fiat', 'seat', 'skoda', 'mazda', 
              'nissan', 'citroen', 'smart', 'toyota', 'sonstige_autos']

brand_median = {} # Brands and aggregate median prices
brand_mean = {} # Brands and aggregate mean prices
brand_mean_mileage = {} # Brands and aggregate mean mileage

# For the selected brands, calculate median price, mean price and mean mileage and store them in the dictionaries above
for brand in brand_list:
    tmp_brand = autos[autos['brand'] == brand] # Current brand slice of autos in loop
    
    # Median
    brand_median[brand] = tmp_brand.loc[:,'price_$'].median()
    brand_median[brand] = round(brand_median[brand])
    
    # Mean
    brand_mean[brand] = tmp_brand.loc[:,'price_$'].mean()
    brand_mean[brand] = round(brand_mean[brand])
    
    # Mean mileage
    brand_mean_mileage[brand] = tmp_brand.loc[:,'odometer_km'].mean()
    brand_mean_mileage[brand] = round(brand_mean_mileage[brand])
    
# Iterate through tuples of items and sort according to median price (index 1) in descending order
sorted(brand_median.items(), key=lambda x: x[1], reverse=True) 

[('audi', 5999),
 ('bmw', 5500),
 ('sonstige_autos', 5200),
 ('mercedes_benz', 4999),
 ('skoda', 4990),
 ('toyota', 4000),
 ('smart', 2924),
 ('volkswagen', 2890),
 ('seat', 2600),
 ('citroen', 2600),
 ('mazda', 2500),
 ('nissan', 2300),
 ('peugeot', 2000),
 ('ford', 1700),
 ('opel', 1500),
 ('fiat', 1500),
 ('renault', 1300)]

There is a substantial gap in median prices between the top six brands (at least $4,000) and the remaining ones (below $3,000). Audi and BMW are the highest valued brands, whereas Opel, Fiat and Renault are the lowest.
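The per-brand loop above could also be expressed with `groupby` and named aggregation, which returns all three statistics as one data frame. This is an alternative sketch on a hypothetical five-row table, not the method used in this notebook; only the column names are taken from the cleaned data set.

```python
import pandas as pd

# Hypothetical listings for two brands.
sample = pd.DataFrame({
    'brand': ['audi', 'audi', 'audi', 'opel', 'opel'],
    'price_$': [4000, 6000, 11000, 1000, 2000],
    'odometer_km': [150000, 125000, 100000, 150000, 150000],
})

# One row per brand with median price, mean price and mean mileage.
aggregates = (
    sample.groupby('brand')
    .agg(median_price=('price_$', 'median'),
         mean_price=('price_$', 'mean'),
         mean_mileage=('odometer_km', 'mean'))
    .round()
    .astype(int)
    .sort_values('median_price', ascending=False)
)
print(aggregates)
```

To restrict the result to the frequent brands, the frame could be filtered with `isin` on the brand list before grouping.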

In [190]:
# New data frame object with mean, median and mean mileage aggregate data

# Transform dictionaries into series
median = pd.Series(brand_median)
mean = pd.Series(brand_mean)
mileage = pd.Series(brand_mean_mileage)

# Merge series into data frame
aggregates = pd.DataFrame(data=median, columns=['median_price'])
aggregates['mean_price'] = mean
aggregates['mean_mileage'] = mileage

aggregates

| |median_price|mean_price|mean_mileage|
|---|---|---|---|
|volkswagen|2890|6516|128730|
|bmw|5500|8335|132435|
|opel|1500|5255|129229|
|mercedes_benz|4999|30317|130860|
|audi|5999|9094|129288|
|ford|1700|7263|124047|
|renault|1300|2396|128238|
|peugeot|2000|3039|127137|
|fiat|1500|2712|116554|
|seat|2600|4296|121564|


volkswagen        128730
bmw               132435
opel              129229
mercedes_benz     130860
audi              129288
ford              124047
renault           128238
peugeot           127137
fiat              116554
seat              121564
skoda             110955
mazda             124746
nissan            118572
citroen           119462
smart              99596
toyota            115710
sonstige_autos     88052
dtype: int64
