# Exploring eBay Car Sales Data

This is a [Dataquest.io](https://dataquest.io) guided project. The goal is to clean and analyze a dataset of used car listings.

The dataset was originally scraped and uploaded to Kaggle by user orgesleka. The original upload has since been deleted, and the data is now available on [data.world](https://data.world/data-society/used-cars-data).

Dataquest modified the dataset in two ways:
- Sampled 50,000 data points from the full dataset, so that the code runs quickly in their hosted environment
- Dirtied the dataset a bit to more closely resemble what you would expect from a scraped dataset (the version uploaded to Kaggle was cleaned to be easier to work with)


**Data Dictionary**

|**Field**|**Comments**|
|:------|:-----|
|`dateCrawled`|When this ad was first crawled. All field-values are taken from this date.|
|`name`|Name of the car.|
|`seller`|Whether the seller is private or a dealer.|
|`offerType`|The type of listing.|
|`price`|The price on the ad to sell the car.|
|`abtest`|Whether the listing is included in an A/B test.|
|`vehicleType`|The vehicle type.|
|`yearOfRegistration`|The year in which the car was first registered.|
|`gearbox`|The transmission type.|
|`powerPS`|The power of the car in PS.|
|`model`|The car model name.|
|`kilometer`|How many kilometers the car has driven.|
|`monthOfRegistration`|The month in which the car was first registered.|
|`fuelType`|What type of fuel the car uses.|
|`brand`|The brand of the car.|
|`notRepairedDamage`|If the car has a damage which is not yet repaired.|
|`dateCreated`|The date on which the eBay listing was created.|
|`nrOfPictures`|The number of pictures in the ad.|
|`postalCode`|The postal code for the location of the vehicle.|
|`lastSeenOnline`|When the crawler saw this ad last online.|

In [1]:
# importing pandas and numpy
import pandas as pd
import numpy as np

In [2]:
# reading the `autos.csv` file into pandas
autos = pd.read_csv("autos.csv", encoding="Latin-1")
autos.info()
autos.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   dateCrawled          50000 non-null  object
 1   name                 50000 non-null  object
 2   seller               50000 non-null  object
 3   offerType            50000 non-null  object
 4   price                50000 non-null  object
 5   abtest               50000 non-null  object
 6   vehicleType          44905 non-null  object
 7   yearOfRegistration   50000 non-null  int64 
 8   gearbox              47320 non-null  object
 9   powerPS              50000 non-null  int64 
 10  model                47242 non-null  object
 11  odometer             50000 non-null  object
 12  monthOfRegistration  50000 non-null  int64 
 13  fuelType             45518 non-null  object
 14  brand                50000 non-null  object
 15  notRepairedDamage    40171 non-null  object
 16  date

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


The dataset has 20 columns and 50,000 rows. Most columns are stored as `object` (string) type; the rest are integers. The column names use camel case and are not named consistently. Some columns, such as `vehicleType`, `gearbox`, `model`, `fuelType` and `notRepairedDamage`, have missing values.

## 1. Cleaning the dataset

Since the dataset has many inconsistencies, we'll begin by cleaning it.

### 1.2 Header row
For the header row I make the following changes:
- switch from camel case to snake case
- shorten or reword the column names where appropriate

In [3]:
autos.columns

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

In [4]:
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
'vehicle_type', 'registration_year', 'gearbox', 'powerPS', 'model',
'odometer', 'registration_month', 'fuel_type', 'brand',
'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
'last_seen']

In [5]:
autos.columns

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'powerPS', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
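
The manual list above also rewords some columns (for example `yearOfRegistration` becomes `registration_year`), and that part has to be done by hand. The pure camel-case to snake-case step could be automated. The following is a minimal sketch; `camel_to_snake` is a hypothetical helper that is not part of the original notebook, and it lowercases everything, so `powerPS` would become `power_ps` rather than staying as it is above.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert "_" between a lowercase letter or digit and a following capital, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# A few of the original column names from the dataset
original = ["dateCrawled", "offerType", "vehicleType", "powerPS", "nrOfPictures", "lastSeen"]
converted = [camel_to_snake(col) for col in original]
print(converted)
```

With a DataFrame, the same helper could be applied as `autos.columns = [camel_to_snake(c) for c in autos.columns]`, followed by a `rename` for the columns that need different wording.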

### 1.3 Cleaning the values

In [6]:
# determining what values can be altered
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,powerPS,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-03-23 19:38:20,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


The `seller` and `offer_type` columns are almost constant: the top value (`privat` and `Angebot` respectively) appears in 49,999 of the 50,000 rows. Since there is almost no variety, we can delete those columns.

The summary statistics for `nr_of_pictures` (mean, standard deviation, minimum and quartiles) are all 0, so I will investigate it further.

The `price` and `odometer` columns contain numeric data stored as strings. I will convert those to a numeric type.
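
Instead of reading the `describe()` table by eye, near-constant columns can also be found programmatically. The sketch below is only an illustration on a small toy frame: the values, the 75% threshold and the names `toy`, `dominant_share` and `low_variety` are assumptions, not part of the original analysis.

```python
import pandas as pd

# Toy data standing in for the real listings (values are illustrative only)
toy = pd.DataFrame({
    "seller": ["privat", "privat", "privat", "gewerblich"],
    "nr_of_pictures": [0, 0, 0, 0],
    "brand": ["bmw", "audi", "opel", "ford"],
})

# Share of rows taken by the most frequent value in each column
dominant_share = toy.apply(
    lambda col: col.value_counts(normalize=True, dropna=False).iloc[0]
)
print(dominant_share)

# Columns where a single value covers at least 75% of the rows
low_variety = dominant_share[dominant_share >= 0.75].index.tolist()
print(low_variety)
```

On the real data a much stricter threshold (for example 0.99) would probably be more appropriate, since columns such as `gearbox` are legitimately dominated by one value.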

#### 1.3.1 Investigating `nr_of_pictures` column

In [7]:
autos["nr_of_pictures"].value_counts()

0    50000
Name: nr_of_pictures, dtype: int64

#### 1.3.2 Dropping columns
Since `nr_of_pictures` contains only zeros, I will drop it together with the `seller` and `offer_type` columns.

In [8]:
# drop() returns a new DataFrame, so the result is assigned back to `autos`
autos = autos.drop(columns=["nr_of_pictures", "seller", "offer_type"])
autos

Unnamed: 0,date_crawled,name,price,abtest,vehicle_type,registration_year,gearbox,powerPS,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,postal_code,last_seen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,39218,2016-04-01 14:38:50
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49995,2016-03-27 14:38:19,Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon,"$24,900",control,limousine,2011,automatik,239,q5,"100,000km",1,diesel,audi,nein,2016-03-27 00:00:00,82131,2016-04-01 13:47:40
49996,2016-03-28 10:50:25,Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+...,"$1,980",control,cabrio,1996,manuell,75,astra,"150,000km",5,benzin,opel,nein,2016-03-28 00:00:00,44807,2016-04-02 14:18:02
49997,2016-04-02 14:44:48,Fiat_500_C_1.2_Dualogic_Lounge,"$13,200",test,cabrio,2014,automatik,69,500,"5,000km",11,benzin,fiat,nein,2016-04-02 00:00:00,73430,2016-04-04 11:47:27
49998,2016-03-08 19:25:42,Audi_A3_2.0_TDI_Sportback_Ambition,"$22,900",control,kombi,2013,manuell,150,a3,"40,000km",11,diesel,audi,nein,2016-03-08 00:00:00,35683,2016-04-05 16:45:07


#### 1.3.3 Converting strings to numeric values
Converting the `price` column to float and `odometer` to integer.

In [9]:
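# removing the $ sign from the price column
# (regex=False treats "$" as a literal character, not as the end-of-string anchor)
autos["price"] = autos["price"].str.replace("$", "", regex=False)

# removing the comma
autos["price"] = autos["price"].str.replace(",", "", regex=False)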

# converting price to float
autos["price"] = autos["price"].astype(float)

# printing descriptive stats on the price column
autos["price"].describe()

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
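
`astype(float)` raises an error as soon as one value cannot be parsed, which is easy to hit with scraped data. As a hedged alternative, `pd.to_numeric` with `errors="coerce"` turns unparsable strings into `NaN` so they can be counted and inspected. The sample values below, including the `"n/a"` entry, are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Illustrative values in the scraped format; "n/a" is a made-up bad entry
raw_prices = pd.Series(["$5,000", "$8,500", "n/a"])

stripped = (raw_prices
            .str.replace("$", "", regex=False)   # literal "$"
            .str.replace(",", "", regex=False))  # thousands separator

# Unparsable strings become NaN instead of raising an error
prices = pd.to_numeric(stripped, errors="coerce")
print(prices)
print("Unparsable values:", prices.isna().sum())
```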

In [10]:
# removing the km suffix and the comma from the odometer column
autos["odometer"] = (autos["odometer"]
                            .str.replace("km","")
                            .str.replace(",","")
                            .astype(int)
                    )
            
# renaming the odometer to odometer_km
autos.rename(columns={"odometer":"odometer_km"},inplace=True)

# printing descriptive stats on the odometer_km column
autos["odometer_km"].describe()

count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64

## 1.4 Exploring the data
### 1.4.1 Price and Odometer

In [11]:
print("Unique values", autos["odometer_km"].unique().shape)
print("Count unique values", autos["odometer_km"].value_counts().sort_index(ascending=False))

Unique values (13,)
Count unique values 150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
70000      1230
60000      1164
50000      1027
40000       819
30000       789
20000       784
10000       264
5000        967
Name: odometer_km, dtype: int64


The dataset contains only 13 unique odometer values. It is highly unlikely that all cars sold on the German eBay would have travelled exactly these distances. Sellers are probably rounding the values, or the site only lets them pick from predefined options. **Most cars have high mileage**: the mean is about 125,733 km, and 32,424 of the 50,000 listings (about 65%) show 150,000 km.

In [12]:
print("Unique values:", autos["price"].unique().shape)
print("\n")
print("20 cheapest car offers with counts")
print(autos["price"].value_counts().sort_index(ascending=True).head(20))
print("\n")
print("20 most expensive car offers with counts")
print(autos["price"].value_counts().sort_index(ascending=False).head(20))

Unique values: (2357,)


20 cheapest car offers with counts
0.0     1421
1.0      156
2.0        3
3.0        1
5.0        2
8.0        1
9.0        1
10.0       7
11.0       2
12.0       3
13.0       2
14.0       1
15.0       2
17.0       3
18.0       1
20.0       4
25.0       5
29.0       1
30.0       7
35.0       1
Name: price, dtype: int64


20 most expensive car offers with counts
99999999.0    1
27322222.0    1
12345678.0    3
11111111.0    2
10000000.0    1
3890000.0     1
1300000.0     1
1234566.0     1
999999.0      2
999990.0      1
350000.0      1
345000.0      1
299000.0      1
295000.0      1
265000.0      1
259000.0      1
250000.0      1
220000.0      1
198000.0      1
197000.0      1
Name: price, dtype: int64


The dataset contains 2,357 unique prices. The mean price is about 9,840.
The lowest price is 0, which accounts for 1,421 listings (about 2.8% of the whole dataset). The most expensive listing is priced at 99,999,999. Such a high price could mean that the car is special (a veteran or collectible), but values like 12,345,678 and 11,111,111 suggest that someone was just messing around.
To remove outliers I have decided to keep only cars priced at $1 or more, since eBay is an auction site and a $1 starting price is plausible, and to remove everything above 350,000, since beyond that the prices jump to almost a million.

#### Removing price outliers

In [13]:
autos = autos[autos["price"].between(1,350000)]
autos["price"].describe()

count     48565.000000
mean       5888.935591
std        9059.854754
min           1.000000
25%        1200.000000
50%        3000.000000
75%        7490.000000
max      350000.000000
Name: price, dtype: float64
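
Note that `Series.between` is inclusive on both ends by default, so listings priced at exactly 1 and exactly 350,000 are kept; in the notebook the filter reduces the 50,000 rows to 48,565. A small sketch on made-up prices shows the boundary behaviour and how the removed share can be computed from the same mask (the variable names are illustrative):

```python
import pandas as pd

# Illustrative prices, including both boundary values
prices = pd.Series([0.0, 1.0, 500.0, 350000.0, 999999.0])

mask = prices.between(1, 350000)  # inclusive on both ends by default
kept = prices[mask]

# Share of rows that the filter would remove, in percent
removed_share = (~mask).mean() * 100

print(kept.tolist())
print(removed_share)
```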

### 1.4.2 Date columns
There are several columns that store date values as strings or integers. They can be grouped by origin:
- `date_crawled`: added by the crawler, string
- `last_seen`: added by the crawler, string
- `ad_created`: from the website, string
- `registration_month`: from the website, integer
- `registration_year`: from the website, integer

In [14]:
# Displaying first 5 rows of the string date values
autos[['date_crawled','ad_created','last_seen']][0:5]

Unnamed: 0,date_crawled,ad_created,last_seen
0,2016-03-26 17:47:46,2016-03-26 00:00:00,2016-04-06 06:45:54
1,2016-04-04 13:38:56,2016-04-04 00:00:00,2016-04-06 14:45:08
2,2016-03-26 18:57:24,2016-03-26 00:00:00,2016-04-06 20:15:37
3,2016-03-12 16:58:10,2016-03-12 00:00:00,2016-03-15 03:16:28
4,2016-04-01 14:38:50,2016-04-01 00:00:00,2016-04-01 14:38:50


In [15]:
(autos["date_crawled"].str[:10]
                      .value_counts(normalize=True, dropna=False)
                      .sort_index(ascending=True) * 100
)

2016-03-05    2.532688
2016-03-06    1.404304
2016-03-07    3.601359
2016-03-08    3.329558
2016-03-09    3.308967
2016-03-10    3.218367
2016-03-11    3.257490
2016-03-12    3.691959
2016-03-13    1.566972
2016-03-14    3.654896
2016-03-15    3.428395
2016-03-16    2.960980
2016-03-17    3.162772
2016-03-18    1.291053
2016-03-19    3.477813
2016-03-20    3.788737
2016-03-21    3.737259
2016-03-22    3.298672
2016-03-23    3.222485
2016-03-24    2.934212
2016-03-25    3.160712
2016-03-26    3.220426
2016-03-27    3.109235
2016-03-28    3.486050
2016-03-29    3.409863
2016-03-30    3.368681
2016-03-31    3.183363
2016-04-01    3.368681
2016-04-02    3.547823
2016-04-03    3.860805
2016-04-04    3.648718
2016-04-05    1.309585
2016-04-06    0.317101
2016-04-07    0.140019
Name: date_crawled, dtype: float64

**The site was crawled on a daily basis** from March 5th to April 7th, 2016.
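
Slicing the first ten characters works because the timestamps have a fixed format. An alternative, sketched here under the assumption that every value follows `YYYY-MM-DD HH:MM:SS`, is to parse the strings with `pd.to_datetime`, which also makes date arithmetic possible. The three sample values are the first three `date_crawled` entries shown earlier; the variable names are illustrative.

```python
import pandas as pd

# First three `date_crawled` values from the dataset preview
crawled = pd.Series([
    "2016-03-26 17:47:46",
    "2016-04-04 13:38:56",
    "2016-03-26 18:57:24",
])

# Parse the strings into real timestamps
parsed = pd.to_datetime(crawled, format="%Y-%m-%d %H:%M:%S")

# Percentage of rows per calendar day, in chronological order
daily_share = (parsed.dt.date
                     .value_counts(normalize=True)
                     .sort_index() * 100)
print(daily_share)

# Whole days between the first and the last crawl in the sample
span_days = (parsed.max() - parsed.min()).days
print(span_days)
```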

In [16]:
(autos["last_seen"].str[:10]
                      .value_counts(normalize=True, dropna=False)
                      .sort_index(ascending=True) * 100
)

2016-03-05     0.107073
2016-03-06     0.432410
2016-03-07     0.539483
2016-03-08     0.741275
2016-03-09     0.959539
2016-03-10     1.066612
2016-03-11     1.237517
2016-03-12     2.378256
2016-03-13     0.889529
2016-03-14     1.260167
2016-03-15     1.587563
2016-03-16     1.645218
2016-03-17     2.808607
2016-03-18     0.735097
2016-03-19     1.583445
2016-03-20     2.065273
2016-03-21     2.063214
2016-03-22     2.137342
2016-03-23     1.853186
2016-03-24     1.976732
2016-03-25     1.921137
2016-03-26     1.680222
2016-03-27     1.564913
2016-03-28     2.085864
2016-03-29     2.234119
2016-03-30     2.477093
2016-03-31     2.378256
2016-04-01     2.279419
2016-04-02     2.491506
2016-04-03     2.520334
2016-04-04     2.448265
2016-04-05    12.476063
2016-04-06    22.180583
2016-04-07    13.194688
Name: last_seen, dtype: float64

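`last_seen` indicates when the crawler saw the ad online for the last time. The last three days of crawling have a disproportionately high share (about 48% combined), which more likely reflects the end of the crawling period than a sudden spike in sales.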

In [17]:
(autos["ad_created"].str[:10]
                      .value_counts(normalize=True, dropna=False)
                      .sort_index(ascending=True) * 100
)

2015-06-11    0.002059
2015-08-10    0.002059
2015-09-09    0.002059
2015-11-10    0.002059
2015-12-05    0.002059
                ...   
2016-04-03    3.885514
2016-04-04    3.685782
2016-04-05    1.181921
2016-04-06    0.325337
2016-04-07    0.125605
Name: ad_created, Length: 76, dtype: float64

In [18]:
(autos["ad_created"].str[:10]
                      .value_counts(normalize=True, dropna=False)
                      .sort_values(ascending=False) * 100
)

2016-04-03    3.885514
2016-03-20    3.794914
2016-03-21    3.757850
2016-04-04    3.685782
2016-03-12    3.675486
                ...   
2016-02-11    0.002059
2016-02-01    0.002059
2016-02-17    0.002059
2016-02-22    0.002059
2016-01-14    0.002059
Name: ad_created, Length: 76, dtype: float64

In [19]:
(autos["ad_created"].str[:10]
                      .value_counts(normalize=True, dropna=False)
                      .sort_values(ascending=False)
                      .head(20) * 100
)

2016-04-03    3.885514
2016-03-20    3.794914
2016-03-21    3.757850
2016-04-04    3.685782
2016-03-12    3.675486
2016-03-14    3.518995
2016-04-02    3.514877
2016-03-28    3.498404
2016-03-07    3.473695
2016-03-29    3.403686
2016-03-15    3.401627
2016-03-19    3.368681
2016-04-01    3.368681
2016-03-30    3.350149
2016-03-08    3.331617
2016-03-09    3.315145
2016-03-11    3.290435
2016-03-22    3.280140
2016-03-26    3.226604
2016-03-23    3.206013
Name: ad_created, dtype: float64

Some of the ads were created many months before the crawl. The oldest ad is from June 11th, 2015. Most of the ads were created during the crawling period, in March and early April 2016.

In [20]:
autos["registration_year"].describe()

count    48565.000000
mean      2004.755421
std         88.643887
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

The minimum value in the registration year is 1000 and the maximum is 9999. Since the site was crawled in 2016, a registration year later than 2016 is not possible. It is also safe to assume that registration year 1000 is wrong. The first car was invented in 1886, and the first mass-produced car, the Ford Model T, was introduced to the market in 1908.
Before removing values earlier than 1908 or later than 2016, I will check how many of them are in the dataset.

In [21]:
before_1907 = autos["registration_year"].between(1000,1907)
after_2016 = autos["registration_year"].between(2017,9999)

((before_1907.sum() + after_2016.sum()) / autos.shape[0]) * 100

3.8793369710697

Only about 3.9% of the listings have an implausible registration year, so it is safe to remove those rows.
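
The two-range check above relies on every value lying between 1000 and 9999. A slightly more general variant, sketched below on made-up years, negates a single mask for the valid range, so anything outside 1908 to 2016 is counted regardless of how extreme it is. The variable names are illustrative.

```python
import pandas as pd

# Illustrative registration years, including implausible ones
years = pd.Series([1000, 1910, 1999, 2004, 2016, 2017, 9999, 2008])

valid = years.between(1908, 2016)    # inclusive on both ends
invalid_share = (~valid).mean() * 100

print(invalid_share)
print(years[valid].tolist())
```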

In [22]:
autos = autos[autos["registration_year"].between(1908,2016)]
autos["registration_year"].value_counts(normalize=True).head(10).sort_values(ascending=False) * 100

2000    6.760781
2005    6.289497
1999    6.205951
2004    5.790364
2003    5.781796
2006    5.719672
2001    5.646837
2002    5.325507
1998    5.062017
2007    4.877788
Name: registration_year, dtype: float64

In [23]:
autos["registration_year"].describe()

count    46681.000000
mean      2002.910756
std          7.185103
min       1910.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       2016.000000
Name: registration_year, dtype: float64

After removing the outliers, the mean registration year is about 2003 (roughly two years lower than before), and about three quarters of the cars were registered in 1999 or later.

## 2. Exploring Price by Brand

In [24]:
# exploring unique car brands
autos["brand"].value_counts()

volkswagen        9862
bmw               5137
opel              5022
mercedes_benz     4503
audi              4041
ford              3263
renault           2201
peugeot           1393
fiat              1197
seat               853
skoda              766
nissan             713
mazda              709
smart              661
citroen            654
toyota             593
hyundai            468
sonstige_autos     458
volvo              427
mini               409
mitsubishi         384
honda              366
kia                330
alfa_romeo         310
porsche            286
suzuki             277
chevrolet          266
chrysler           164
dacia              123
daihatsu           117
jeep               106
subaru             100
land_rover          98
saab                77
jaguar              73
daewoo              70
trabant             65
rover               62
lancia              50
lada                27
Name: brand, dtype: int64

After looking at the brands I have decided to select the top 10 brands by number of listings.

In [39]:
brand_counts = autos["brand"].value_counts().head(10)
common_brands = list(brand_counts.index)
print(common_brands)

['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford', 'renault', 'peugeot', 'fiat', 'seat']


In [42]:
boolean_brands = autos.brand.isin(common_brands)
filtered_brands = autos[boolean_brands]

In [47]:
brand_mean_price = (filtered_brands.groupby("brand")["price"]
                .mean().sort_values(ascending=False)
                .round()
)
print(brand_mean_price)

brand
audi             9337.0
mercedes_benz    8628.0
bmw              8333.0
volkswagen       5402.0
seat             4397.0
ford             3749.0
peugeot          3094.0
opel             2975.0
fiat             2814.0
renault          2475.0
Name: price, dtype: float64


The three most expensive brands on average are:
- Audi
- Mercedes Benz
- BMW

Of these, only BMW is also among the three most listed brands. Volkswagen, the most listed brand in this dataset, sells for 5,402 on average, which is roughly 42% less than Audi's 9,337.
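
Comparing popularity and price is easier when both figures sit in one table. As a sketch, `groupby(...).agg` can compute the listing count and the mean price in a single pass; the toy listings and the names `listings` and `summary` are assumptions for illustration only.

```python
import pandas as pd

# Illustrative listings (not taken from the dataset)
listings = pd.DataFrame({
    "brand": ["audi", "audi", "bmw", "volkswagen", "volkswagen", "volkswagen"],
    "price": [9000.0, 10000.0, 8000.0, 5000.0, 6000.0, 5500.0],
})

# Number of listings and mean price per brand, most expensive first
summary = (listings.groupby("brand")["price"]
                   .agg(["count", "mean"])
                   .sort_values("mean", ascending=False))
print(summary)
```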

## 3. Exploring mileage

In [87]:
df_mean_price = brand_mean_price.to_frame(name="mean_price")

In [74]:
brand_mean_mileage = (filtered_brands.groupby("brand")["odometer_km"]
                .mean().sort_values(ascending=False)
                .round()
)
print(brand_mean_mileage)

brand
bmw              132573.0
mercedes_benz    130788.0
opel             129310.0
audi             129157.0
volkswagen       128707.0
renault          128071.0
peugeot          127154.0
ford             124266.0
seat             121131.0
fiat             117122.0
Name: odometer_km, dtype: float64


In [88]:
df_mean_mileage = brand_mean_mileage.to_frame(name="mean_mileage")

In [93]:
brand_info = df_mean_mileage
brand_info["mean_price"] = df_mean_price
brand_info

Unnamed: 0_level_0,mean_mileage,mean_price
brand,Unnamed: 1_level_1,Unnamed: 2_level_1
bmw,132573.0,8333.0
mercedes_benz,130788.0,8628.0
opel,129310.0,2975.0
audi,129157.0,9337.0
volkswagen,128707.0,5402.0
renault,128071.0,2475.0
peugeot,127154.0,3094.0
ford,124266.0,3749.0
seat,121131.0,4397.0
fiat,117122.0,2814.0


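The average mileage varies far less across brands than the average price does: mileage ranges from 117,122 km to 132,573 km, while the mean price ranges from 2,475 to 9,337. The more expensive brands (BMW, Mercedes Benz, Audi) tend to have higher average mileage, with exceptions such as Opel, which combines high mileage with a low price.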
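
One way to put numbers on this comparison is to rebuild the table from the values above and compute a correlation and a relative spread. This is only a rough sketch: with ten brand-level averages the result is suggestive at best, and the spread measure `(max - min) / mean` is an ad hoc choice made for this illustration.

```python
import pandas as pd

# Brand-level averages copied from the table above
brand_info = pd.DataFrame(
    {
        "mean_mileage": [132573.0, 130788.0, 129310.0, 129157.0, 128707.0,
                         128071.0, 127154.0, 124266.0, 121131.0, 117122.0],
        "mean_price": [8333.0, 8628.0, 2975.0, 9337.0, 5402.0,
                       2475.0, 3094.0, 3749.0, 4397.0, 2814.0],
    },
    index=["bmw", "mercedes_benz", "opel", "audi", "volkswagen",
           "renault", "peugeot", "ford", "seat", "fiat"],
)

# Pearson correlation between average mileage and average price
corr = brand_info["mean_mileage"].corr(brand_info["mean_price"])
print(round(corr, 2))

# Relative spread of each column: (max - min) / mean
spread = (brand_info.max() - brand_info.min()) / brand_info.mean()
print(spread.round(2))
```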