# Exploring eBay Car Sales Data

This project looks to explore used car listings on eBay.

Here is a list of all of the columns that the dataset contains:

 - dateCrawled - When this ad was first crawled. All field values are taken from this date.
 - name - Name of the car.
 - seller - Whether the seller is private or a dealer.
 - offerType - The type of listing
 - price - The price on the ad to sell the car.
 - abtest - Whether the listing is included in an A/B test.
 - vehicleType - The vehicle type.
 - yearOfRegistration - The year in which the car was first registered.
 - gearbox - The transmission type.
 - powerPS - The power of the car in PS.
 - model - The car model name.
 - odometer - How many kilometers the car has driven.
 - monthOfRegistration - The month in which the car was first registered.
 - fuelType - What type of fuel the car uses.
 - brand - The brand of the car.
 - notRepairedDamage - Whether the car has damage that has not yet been repaired.
 - dateCreated - The date on which the eBay listing was created.
 - nrOfPictures - The number of pictures in the ad.
 - postalCode - The postal code for the location of the vehicle.
 - lastSeen - When the crawler last saw this ad online.

In [1]:
import numpy as np
import pandas as pd

# Read in the dataset, which is Latin-1 encoded
autos = pd.read_csv('autos.csv', encoding='Latin-1')

In [2]:
autos.info()
autos.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


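Most of the columns have the data type object, which here means that pandas stores their values as text. That includes the price and odometer columns, even though they hold numeric information. 5 of the 20 columns (vehicleType, gearbox, model, fuelType, and notRepairedDamage) contain null values.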
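One way to confirm which columns contain nulls is to count the missing values per column. This is a minimal sketch on a small made-up frame (`sample` is a hypothetical stand-in, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of the autos data.
sample = pd.DataFrame({
    "price": ["$5,000", "$8,500", "$0"],
    "vehicleType": ["bus", np.nan, "kombi"],
    "gearbox": ["manuell", "automatik", np.nan],
    "powerPS": [158, 286, 0],
})

# Number of missing values in each column.
null_counts = sample.isnull().sum()

# Names of the columns that contain at least one missing value.
cols_with_nulls = null_counts[null_counts > 0].index.tolist()

print(null_counts)
print(cols_with_nulls)
```

Run against `autos`, the same two lines would list the columns with nulls directly instead of reading them off the `info()` output.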

## Renaming the columns

In this section, I converted the column names from camel case to snake case, and changed some names to be more descriptive.

In [3]:
autos_column_copy = autos.columns.copy()

In [4]:
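# Convert a string from camel case to snake case
from functools import reduce

def change_case(name):
    # Prefix every uppercase character with an underscore, then lowercase the result.
    # Consecutive capitals are split too, so "powerPS" becomes "power_p_s".
    return reduce(lambda x, y: x + ('_' if y.isupper() else '') + y, name).lower()


#test_str = "dateCrawled"
#print(change_case(test_str))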

In [5]:
# Convert every column name to snake case
autos.columns = [change_case(col_name) for col_name in autos_column_copy]

# Give some columns more descriptive names
autos.rename(columns={"year_of_registration": "registration_year", "month_of_registration": "registration_month", "not_repaired_damage": "unrepaired_damage", "date_created": "ad_created", "power_p_s": "power_ps"}, inplace=True)

In [6]:
autos.columns

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
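The helper above turns `powerPS` into `power_p_s`, which then has to be renamed by hand. As an alternative, here is a minimal sketch using a regular expression that only inserts an underscore where a lowercase letter or digit is followed by an uppercase letter. The name `to_snake_case` is hypothetical and is not part of the notebook:

```python
import re

def to_snake_case(name):
    """Insert '_' where a lowercase letter or digit is followed by an uppercase letter."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

for original in ["dateCrawled", "powerPS", "nrOfPictures", "abtest"]:
    print(original, "->", to_snake_case(original))
```

Runs of capitals such as `PS` stay together with this pattern, so only the descriptive renames would still be needed.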

## Exploring the data 

In [7]:
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-03-16 21:50:53,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


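The seller, offer type, A/B test, gearbox, and unrepaired damage columns each have only 2 unique values. In the seller and offer type columns, a single value accounts for 49,999 of the 50,000 rows, and every value in the number of pictures column is 0, so these three columns carry almost no information. There are 8 vehicle types and 7 fuel types.

The price and odometer columns contain numeric values that are stored as text.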

In [8]:
autos["name"].value_counts()

autos["vehicle_type"].value_counts()

limousine     12859
kleinwagen    10822
kombi          9127
bus            4093
cabrio         3061
coupe          2537
suv            1986
andere          420
Name: vehicle_type, dtype: int64

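The most common listing name is Ford_Fiesta, which appears 78 times. The most common vehicle type is limousine, which wasn't what I was expecting; in German, however, "Limousine" refers to an ordinary saloon (sedan) rather than a stretched luxury car.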

## Data cleaning

### Cleaning the odometer column

In [9]:
## Converting the odometer column into a numeric datatype ##

#print(autos["odometer"])

autos["odometer"] = autos["odometer"].str.replace('km', '').str.replace(',', '').astype(float)

autos.rename(columns={"odometer": "odometer_km"}, inplace=True)

#autos["odometer_km"].describe()
#print(autos["odometer_km"])

In [10]:
autos["odometer_km"].unique().shape

autos["odometer_km"].describe()

count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64

In [11]:
autos["odometer_km"].value_counts(dropna=False)

150000.0    32424
125000.0     5170
100000.0     2169
90000.0      1757
80000.0      1436
70000.0      1230
60000.0      1164
50000.0      1027
5000.0        967
40000.0       819
30000.0       789
20000.0       784
10000.0       264
Name: odometer_km, dtype: int64

The most common odometer value is 150,000 km, which accounts for 32,424 of the 50,000 listings (about 65%), and the least common value is 10,000 km. The lowest value, 5,000 km, is the 9th most common. The column contains only 13 distinct, rounded values, and none of them are outliers.
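The same strip-and-convert pattern is needed for both the odometer and the price columns, so it could be wrapped in a small helper. This is a sketch only; `text_to_number` is a hypothetical name, and the sample values are made up to mirror the dataset's format:

```python
import pandas as pd

def text_to_number(series, unwanted):
    """Strip each literal substring in `unwanted`, then convert to float."""
    cleaned = series
    for token in unwanted:
        # regex=False treats the token as plain text, not as a pattern.
        cleaned = cleaned.str.replace(token, "", regex=False)
    return cleaned.astype(float)

odometer_text = pd.Series(["150,000km", "70,000km", "5,000km"])
odometer_km = text_to_number(odometer_text, ["km", ","])
print(odometer_km.tolist())

price_text = pd.Series(["$5,000", "$0"])
print(text_to_number(price_text, ["$", ","]).tolist())
```

Passing `regex=False` matters most for characters such as `$`, which have a special meaning in regular expressions.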

### Cleaning the price column

In [12]:
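# regex=False makes '$' a literal character rather than the regex end-of-string anchor
autos["price"] = autos["price"].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)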



print(autos["price"])

0         5000.0
1         8500.0
2         8990.0
3         4350.0
4         1350.0
5         7900.0
6          300.0
7         1990.0
8          250.0
9          590.0
10         999.0
11         350.0
12        5299.0
13        1350.0
14        3999.0
15       18900.0
16         350.0
17        5500.0
18         300.0
19        4150.0
20        3500.0
21       41500.0
22       25450.0
23        7999.0
24       48500.0
25          90.0
26         777.0
27           0.0
28        5250.0
29        4999.0
          ...   
49970    15800.0
49971      950.0
49972     3300.0
49973     6000.0
49974        0.0
49975     9700.0
49976     5900.0
49977     5500.0
49978      900.0
49979    11000.0
49980      400.0
49981     2000.0
49982     1950.0
49983      600.0
49984        0.0
49985     1000.0
49986    15900.0
49987    21990.0
49988     9550.0
49989      150.0
49990    17500.0
49991      500.0
49992     4800.0
49993     1650.0
49994     5000.0
49995    24900.0
49996     1980.0
49997    13200

In [13]:
autos["price"].unique().shape

autos["price"].describe()

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64

In [14]:
autos["price"].value_counts().head(20)

0.0       1421
500.0      781
1500.0     734
2500.0     643
1200.0     639
1000.0     639
600.0      531
800.0      498
3500.0     498
2000.0     460
999.0      434
750.0      433
900.0      420
650.0      419
850.0      410
700.0      395
4500.0     394
300.0      384
2200.0     382
950.0      379
Name: price, dtype: int64

The price data contains extreme outliers: 1,421 listings are priced at 0 dollars, and the maximum price is 100 million dollars. Because eBay is an auction site, where bidding can legitimately start at 1 dollar, the lower bound will be set to 1 dollar. The upper bound will be set to 350,000 dollars.

In [15]:
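# Keep only listings priced between 1 and 350,000 dollars (between() is inclusive).
# .copy() avoids a SettingWithCopyWarning on the assignments below.
autos = autos[autos["price"].between(1, 350000)].copy()

# between(0, 1) is inclusive as well, so this blanks the remaining 1-dollar prices,
# which is why the minimum reported in the next cell is 2.
autos.loc[autos['price'].between(0,1),'price'] = np.nan
# No prices above 350,000 remain after the filter, so this line has no effect.
autos.loc[autos['price'].between(351000,100000000),'price'] = np.nan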

In [16]:
autos["price"].unique().shape

autos["price"].describe()

count     48409.000000
mean       5907.909707
std        9068.263463
min           2.000000
25%        1250.000000
50%        3000.000000
75%        7490.000000
max      350000.000000
Name: price, dtype: float64

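The standard deviation has gone down significantly, from about 481,000 to about 9,100 dollars, and the mean price of a car is now about 5,900 dollars. The minimum is 2 dollars rather than 1 because `between(0, 1)` includes its upper bound, so the 1-dollar prices were replaced with NaN.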
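There are two common ways to apply a price range: drop the rows outside it, or keep the rows and blank out the implausible values. The sketch below shows both on a made-up series; `LOW` and `HIGH` are illustrative names for the bounds chosen above:

```python
import pandas as pd

# Made-up prices covering both ends of the problem.
prices = pd.Series([0.0, 1.0, 2950.0, 350000.0, 100000000.0])

LOW, HIGH = 1, 350000  # bounds taken from the text above

# between() is inclusive on both ends by default.
in_range = prices.between(LOW, HIGH)

# Option 1: drop the rows that fall outside the range.
kept = prices[in_range]

# Option 2: keep every row but replace out-of-range prices with NaN.
masked = prices.where(in_range)

print(kept.tolist())
print(masked.isna().sum())
```

Because both ends are inclusive, a price of exactly 1 dollar or exactly 350,000 dollars counts as in range.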

### Cleaning the date columns

In [17]:
autos["date_crawled"].unique().shape

(46882,)

In [18]:
## Calculating the distribution of values in the date_crawled column ##

autos["date_crawled"].str[:10].value_counts(normalize=True, dropna=False).sort_index() * 100

2016-03-05    2.532688
2016-03-06    1.404304
2016-03-07    3.601359
2016-03-08    3.329558
2016-03-09    3.308967
2016-03-10    3.218367
2016-03-11    3.257490
2016-03-12    3.691959
2016-03-13    1.566972
2016-03-14    3.654896
2016-03-15    3.428395
2016-03-16    2.960980
2016-03-17    3.162772
2016-03-18    1.291053
2016-03-19    3.477813
2016-03-20    3.788737
2016-03-21    3.737259
2016-03-22    3.298672
2016-03-23    3.222485
2016-03-24    2.934212
2016-03-25    3.160712
2016-03-26    3.220426
2016-03-27    3.109235
2016-03-28    3.486050
2016-03-29    3.409863
2016-03-30    3.368681
2016-03-31    3.183363
2016-04-01    3.368681
2016-04-02    3.547823
2016-04-03    3.860805
2016-04-04    3.648718
2016-04-05    1.309585
2016-04-06    0.317101
2016-04-07    0.140019
Name: date_crawled, dtype: float64

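All of the adverts were crawled between 5 March and 7 April 2016, at a fairly even rate of roughly 3% of listings per day, with lower shares on a few days, including the final three. There are 46,882 unique values in the date crawled column.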
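Slicing the first 10 characters works because the timestamps are uniformly formatted text. As a hedged alternative, the column could be parsed into real timestamps first, which also makes later date arithmetic possible. The sample values below are taken from the first rows shown earlier:

```python
import pandas as pd

crawled = pd.Series([
    "2016-03-26 17:47:46",
    "2016-03-26 18:57:24",
    "2016-04-04 13:38:56",
    "2016-03-12 16:58:10",
])

# Parse the text into timestamps, then set the time of day to midnight.
crawl_day = pd.to_datetime(crawled, format="%Y-%m-%d %H:%M:%S").dt.normalize()

# Share of rows per calendar day, in percent, in date order.
daily_share = crawl_day.value_counts(normalize=True).sort_index() * 100
print(daily_share)
```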

In [19]:
## Calculating the distribution of values in the ad_created column ##

autos["ad_created"].str[:10].value_counts(normalize=True, dropna=False).sort_index() * 100

2015-06-11    0.002059
2015-08-10    0.002059
2015-09-09    0.002059
2015-11-10    0.002059
2015-12-05    0.002059
2015-12-30    0.002059
2016-01-03    0.002059
2016-01-07    0.002059
2016-01-10    0.004118
2016-01-13    0.002059
2016-01-14    0.002059
2016-01-16    0.002059
2016-01-22    0.002059
2016-01-27    0.006177
2016-01-29    0.002059
2016-02-01    0.002059
2016-02-02    0.004118
2016-02-05    0.004118
2016-02-07    0.002059
2016-02-08    0.002059
2016-02-09    0.002059
2016-02-11    0.002059
2016-02-12    0.004118
2016-02-14    0.004118
2016-02-16    0.002059
2016-02-17    0.002059
2016-02-18    0.004118
2016-02-19    0.006177
2016-02-20    0.004118
2016-02-21    0.006177
                ...   
2016-03-09    3.315145
2016-03-10    3.189540
2016-03-11    3.290435
2016-03-12    3.675486
2016-03-13    1.700813
2016-03-14    3.518995
2016-03-15    3.401627
2016-03-16    3.012458
2016-03-17    3.127767
2016-03-18    1.359003
2016-03-19    3.368681
2016-03-20    3.794914
2016-03-21 

In [20]:
autos["ad_created"].unique().shape

(76,)

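Some ads were created well before they were crawled: the earliest was created in June 2015, about nine months before the crawl began. There are 76 distinct creation dates.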
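To measure how long an ad had been online when it was crawled, the two date columns can be subtracted once they are parsed. This is a minimal sketch on three illustrative rows; the `days_listed` column name is an assumption and does not exist in the notebook:

```python
import pandas as pd

# Illustrative rows in the same format as the dataset.
ads = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56", "2016-03-12 16:58:10"],
    "ad_created": ["2016-03-26 00:00:00", "2016-01-27 00:00:00", "2015-06-11 00:00:00"],
})

crawled = pd.to_datetime(ads["date_crawled"])
created = pd.to_datetime(ads["ad_created"])

# Whole days between the creation date and the crawl date.
ads["days_listed"] = (crawled.dt.normalize() - created).dt.days
print(ads["days_listed"].tolist())
```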

In [21]:
## Calculating the distribution of values in the last_seen column ##

autos["last_seen"].str[:10].value_counts(normalize=True, dropna=False).sort_index() * 100

2016-03-05     0.107073
2016-03-06     0.432410
2016-03-07     0.539483
2016-03-08     0.741275
2016-03-09     0.959539
2016-03-10     1.066612
2016-03-11     1.237517
2016-03-12     2.378256
2016-03-13     0.889529
2016-03-14     1.260167
2016-03-15     1.587563
2016-03-16     1.645218
2016-03-17     2.808607
2016-03-18     0.735097
2016-03-19     1.583445
2016-03-20     2.065273
2016-03-21     2.063214
2016-03-22     2.137342
2016-03-23     1.853186
2016-03-24     1.976732
2016-03-25     1.921137
2016-03-26     1.680222
2016-03-27     1.564913
2016-03-28     2.085864
2016-03-29     2.234119
2016-03-30     2.477093
2016-03-31     2.378256
2016-04-01     2.279419
2016-04-02     2.491506
2016-04-03     2.520334
2016-04-04     2.448265
2016-04-05    12.476063
2016-04-06    22.180583
2016-04-07    13.194688
Name: last_seen, dtype: float64

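The last seen dates cover the same period as the crawl dates. The final three days (5 to 7 April 2016) account for about 48% of the values, which more likely reflects the end of the crawling period than a sudden wave of listings being removed.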

In [22]:
autos["last_seen"].unique().shape

(38474,)

In [23]:
autos["registration_year"].describe()

count    48565.000000
mean      2004.755421
std         88.643887
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

The registration year column clearly contains invalid values: the maximum year given is 9999, which is a year we have not reached yet, and the minimum year is 1000, which is nearly 900 years before cars were invented.

In [24]:
autos["registration_year"].unique()

array([2004, 1997, 2009, 2007, 2003, 2006, 1995, 1998, 2000, 2017, 2010,
       1999, 1982, 1990, 2015, 2014, 1996, 1992, 2002, 2012, 2011, 2005,
       2008, 1985, 2016, 1994, 1986, 2001, 2018, 2013, 1972, 1993, 1988,
       1989, 1973, 1967, 1976, 4500, 1987, 1991, 1983, 1960, 1969, 1950,
       1978, 1980, 1984, 1963, 1977, 1961, 1968, 1934, 1965, 1971, 1966,
       1979, 1981, 1970, 1974, 1910, 1975, 5000, 4100, 2019, 1956, 9999,
       6200, 1964, 1959, 1958, 1800, 1948, 1931, 1943, 1941, 1962, 1927,
       1937, 1929, 1000, 1957, 1952, 1111, 1955, 1939, 8888, 1954, 1938,
       2800, 5911, 1953, 1951, 4800, 1001, 9000], dtype=int64)

The highest acceptable value for registration year is 2016, as a car needs to be registered before an advert can be made for it, and the adverts were last seen in 2016. I have chosen 1908 as the lowest acceptable registration year, as the Ford Model T was introduced in that year, and it is one of the earliest cars that is still likely to have survived to this day.

In [25]:
autos.loc[autos["registration_year"] < 1908, "registration_year"] = np.nan
autos.loc[autos["registration_year"] > 2016, "registration_year"] = np.nan
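The two assignments above can also be expressed as a single mask, which makes it easy to check how many rows are affected before changing anything. A sketch on made-up years, using the 1908 and 2016 bounds from the text:

```python
import pandas as pd

# Made-up registration years, including a few implausible ones.
years = pd.Series([2004, 1997, 9999, 1000, 2016, 2017, 1910], dtype="float64")

# True where the year lies within the accepted range (inclusive).
valid = years.between(1908, 2016)

# Percentage of rows outside the accepted range.
pct_invalid = (~valid).mean() * 100

# Replace implausible years with NaN in a single step.
cleaned = years.where(valid)

print(round(pct_invalid, 1))
print(cleaned.min(), cleaned.max())
```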

In [26]:
autos["registration_year"].value_counts(normalize=True).sort_index().tail(50) * 100

1967.0    0.055697
1968.0    0.055697
1969.0    0.040702
1970.0    0.081404
1971.0    0.055697
1972.0    0.070693
1973.0    0.049271
1974.0    0.051413
1975.0    0.038560
1976.0    0.044986
1977.0    0.047128
1978.0    0.094257
1979.0    0.072835
1980.0    0.182087
1981.0    0.059982
1982.0    0.087830
1983.0    0.109252
1984.0    0.109252
1985.0    0.203509
1986.0    0.154238
1987.0    0.154238
1988.0    0.289197
1989.0    0.372743
1990.0    0.743343
1991.0    0.726206
1992.0    0.792614
1993.0    0.910435
1994.0    1.347443
1995.0    2.628478
1996.0    2.941239
1997.0    4.179431
1998.0    5.062017
1999.0    6.205951
2000.0    6.760781
2001.0    5.646837
2002.0    5.325507
2003.0    5.781796
2004.0    5.790364
2005.0    6.289497
2006.0    5.719672
2007.0    4.877788
2008.0    4.744971
2009.0    4.466485
2010.0    3.403954
2011.0    3.476789
2012.0    2.806281
2013.0    1.720186
2014.0    1.420278
2015.0    0.839742
2016.0    2.613483
Name: registration_year, dtype: float64

In [27]:
autos["registration_year"].describe()

count    46681.000000
mean      2002.910756
std          7.185103
min       1910.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       2016.000000
Name: registration_year, dtype: float64

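Very few of the listed cars were registered before 1980: each of those years accounts for about 0.1% of listings or less. The share then rises steadily, peaks at about 6.8% for cars registered in 2000, stays between roughly 5% and 6.3% through 2006, and declines for more recent years, apart from a rise to 2.6% for 2016. Note that these percentages describe the cars listed for sale, not the number of cars registered in each year.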

## Exploring the data

### Exploring the brand column

In [28]:
autos["brand"].unique().shape

(40,)

In [29]:
top_brands = autos["brand"].value_counts(normalize=True).head(20) * 100 

The most common car brand is Volkswagen, representing about 21% of listings, and the least common brand is Lada, with 0.05% of listings.

I have decided to select the top 20 brands.

In [30]:
top_20_brands = top_brands.index

print(top_20_brands)

Index(['volkswagen', 'opel', 'bmw', 'mercedes_benz', 'audi', 'ford', 'renault',
       'peugeot', 'fiat', 'seat', 'skoda', 'nissan', 'mazda', 'smart',
       'citroen', 'toyota', 'hyundai', 'sonstige_autos', 'volvo', 'mini'],
      dtype='object')


In [31]:
avg_brand_price = {} #this is a dictionary that will store the aggregate brand data

for brand in top_20_brands:
    avg_price = autos.loc[autos['brand'] == brand, 'price'].mean()
    avg_brand_price[brand] = avg_price

In [32]:
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
count,48565,48565,48565,48565,48409.0,48565,43979,46681.0,46222,48565.0,46107,48565.0,48565.0,44535,48565,39464,48565,48565.0,48565.0,48565
unique,46882,37470,2,1,,2,8,,2,,245,,,7,40,2,76,,,38474
top,2016-03-12 16:06:22,Ford_Fiesta,privat,Angebot,,test,limousine,,manuell,,golf,,,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,76,48564,48565,,25019,12598,,36102,,3900,,,29368,10336,34775,1887,,,8
mean,,,,,5907.909707,,,2002.910756,,117.197158,,125770.101925,5.782251,,,,,0.0,50975.745207,
std,,,,,9068.263463,,,7.185103,,200.649618,,39788.636804,3.685595,,,,,0.0,25746.968398,
min,,,,,2.0,,,1910.0,,0.0,,5000.0,0.0,,,,,0.0,1067.0,
25%,,,,,1250.0,,,1999.0,,71.0,,125000.0,3.0,,,,,0.0,30657.0,
50%,,,,,3000.0,,,2003.0,,107.0,,150000.0,6.0,,,,,0.0,49716.0,
75%,,,,,7490.0,,,2008.0,,150.0,,150000.0,9.0,,,,,0.0,71665.0,


In [33]:
avg_brand_price

{'audi': 9241.752587244284,
 'bmw': 8294.405101846563,
 'citroen': 3772.8918128654973,
 'fiat': 2798.3031746031747,
 'ford': 3736.143111111111,
 'hyundai': 5382.935684647303,
 'mazda': 4059.059539918809,
 'mercedes_benz': 8565.483606557376,
 'mini': 10541.566985645934,
 'nissan': 4675.6945945945945,
 'opel': 2959.746095238095,
 'peugeot': 3074.2082748948105,
 'renault': 2439.5865343116097,
 'seat': 4329.860414394766,
 'skoda': 6369.875321336761,
 'smart': 3518.102305475504,
 'sonstige_autos': 12385.82251082251,
 'toyota': 5148.0032733224225,
 'volkswagen': 5346.960516103997,
 'volvo': 4889.263157894737}

Among the top 20 brands, the highest average price (about 12,390 dollars) belongs to sonstige_autos, which is German for "other cars" and is a catch-all category rather than a single brand. Of the named brands, the 4 most expensive are Mini, Audi, Mercedes-Benz, and BMW. With an average price of about 10,540 dollars, Mini is surprisingly more expensive than the other three. One possible explanation is that Mini is the least commonly listed of the top 20 brands, so the Minis on sale may include a higher share of rarer models than the Audis, Mercedes, and BMWs on sale.

The cheapest car brand is Renault, followed by Fiat and Opel.
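The loop over brands can also be written with `groupby`, which computes every brand's mean in one pass and returns a sorted Series that is easier to read than a dictionary. A minimal sketch on made-up listings; `top_brands` here is a short list standing in for `top_20_brands`:

```python
import pandas as pd

# Made-up listings for illustration only.
listings = pd.DataFrame({
    "brand": ["audi", "audi", "opel", "opel", "ford"],
    "price": [9000.0, 11000.0, 2500.0, 3500.0, 4000.0],
})

top_brands = ["audi", "opel"]  # stands in for top_20_brands

# Mean price per selected brand, most expensive first.
avg_price = (
    listings[listings["brand"].isin(top_brands)]
    .groupby("brand")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_price)
```

Like the loop, `mean()` skips NaN prices by default, so the blanked-out prices would not distort the averages.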

#### Exploring the top 6 brands

In [34]:
top_6_brands = top_20_brands[0:6]

In [35]:
## Mean mileage and price of the top 6 brands ##

avg_price_top6_brands = {}
avg_mileage_top6_brands = {}

for brand in top_6_brands:
    avg_price = autos.loc[autos['brand'] == brand, 'price'].mean()
    avg_price_top6_brands[brand] = avg_price
    
    avg_mileage = autos.loc[autos['brand'] == brand, 'odometer_km'].mean()
    avg_mileage_top6_brands[brand] = avg_mileage    

In [36]:
avg_price_top6_series = pd.Series(avg_price_top6_brands)
avg_mileage_top6_series = pd.Series(avg_mileage_top6_brands)

print(avg_price_top6_series)
print(avg_mileage_top6_series)

audi             9241.752587
bmw              8294.405102
ford             3736.143111
mercedes_benz    8565.483607
opel             2959.746095
volkswagen       5346.960516
dtype: float64
audi             129492.562380
bmw              132682.973075
ford             124349.497339
mercedes_benz    130796.431642
opel             129383.172257
volkswagen       128896.575077
dtype: float64


In [37]:
top_autos_df = pd.DataFrame(avg_price_top6_series, columns=['average_price_dollars'])

top_autos_df['average_mileage_km'] = avg_mileage_top6_series

print(top_autos_df)

               average_price_dollars  average_mileage_km
audi                     9241.752587       129492.562380
bmw                      8294.405102       132682.973075
ford                     3736.143111       124349.497339
mercedes_benz            8565.483607       130796.431642
opel                     2959.746095       129383.172257
volkswagen               5346.960516       128896.575077


In [38]:
top_autos_df.describe()

Unnamed: 0,average_price_dollars,average_mileage_km
count,6.0,6.0
mean,6357.41517,129266.868629
std,2697.522549,2770.972917
min,2959.746095,124349.497339
25%,4138.847462,129018.224372
50%,6820.682809,129437.867319
75%,8497.71398,130470.464327
max,9241.752587,132682.973075


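BMWs have the highest average mileage and are the third most expensive of the six brands, behind Audi and Mercedes-Benz. Average mileage varies little between brands (from about 124,000 to 133,000 km), while average price varies roughly threefold (from about 3,000 to 9,200 dollars). Higher mileage does not go along with lower prices here; if anything, the three most expensive brands also have the three highest average mileages. With only six brands, though, this is too little data to claim a relationship between average price and average mileage.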
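Rather than judging the relationship by eye, the correlation between the two columns can be computed directly. The sketch below rebuilds the six-row table from the values printed above and uses the Pearson coefficient; with so few data points the result should be read with caution:

```python
import pandas as pd

# Brand averages copied from the table above.
top6 = pd.DataFrame(
    {
        "average_price_dollars": [9241.752587, 8294.405102, 3736.143111,
                                  8565.483607, 2959.746095, 5346.960516],
        "average_mileage_km": [129492.562380, 132682.973075, 124349.497339,
                               130796.431642, 129383.172257, 128896.575077],
    },
    index=["audi", "bmw", "ford", "mercedes_benz", "opel", "volkswagen"],
)

# Pearson correlation between the two columns (the default method of Series.corr).
r = top6["average_price_dollars"].corr(top6["average_mileage_km"])
print(round(r, 2))
```

In the notebook itself, the same call could be made on `top_autos_df` without retyping the values.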