# Project: Exploring eBay Car Sales Data

#### We'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.

#### The dataset was originally scraped and uploaded to Kaggle by user orgesleka. The original dataset isn't available on Kaggle anymore, but you can download it from https://data.world/data-society/used-cars-data.

#### The aim of this project is to clean the data and analyze the included used car listings.


Importing Numpy and Pandas data science libraries

In [308]:
import numpy as np
import pandas as pd

Importing a CSV file into a Pandas dataframe.

In [309]:
autos = pd.read_csv('autos.csv', encoding='Latin-1')

Exploring the first five rows of the dataset

In [310]:
print(autos.head())

           dateCrawled                                               name  \
0  2016-03-26 17:47:46                   Peugeot_807_160_NAVTECH_ON_BOARD   
1  2016-04-04 13:38:56         BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik   
2  2016-03-26 18:57:24                         Volkswagen_Golf_1.6_United   
3  2016-03-12 16:58:10  Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...   
4  2016-04-01 14:38:50  Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...   

   seller offerType   price   abtest vehicleType  yearOfRegistration  \
0  privat   Angebot  $5,000  control         bus                2004   
1  privat   Angebot  $8,500  control   limousine                1997   
2  privat   Angebot  $8,990     test   limousine                2009   
3  privat   Angebot  $4,350  control  kleinwagen                2007   
4  privat   Angebot  $1,350     test       kombi                2003   

     gearbox  powerPS   model   odometer  monthOfRegistration fuelType  \
0    manuell      158  andere 

Visualizing general information about the DataFrame.

In [311]:
print(autos.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   dateCrawled          50000 non-null  object
 1   name                 50000 non-null  object
 2   seller               50000 non-null  object
 3   offerType            50000 non-null  object
 4   price                50000 non-null  object
 5   abtest               50000 non-null  object
 6   vehicleType          44905 non-null  object
 7   yearOfRegistration   50000 non-null  int64 
 8   gearbox              47320 non-null  object
 9   powerPS              50000 non-null  int64 
 10  model                47242 non-null  object
 11  odometer             50000 non-null  object
 12  monthOfRegistration  50000 non-null  int64 
 13  fuelType             45518 non-null  object
 14  brand                50000 non-null  object
 15  notRepairedDamage    40171 non-null  object
 16  date

Printing the column names of the DataFrame.

In [312]:
print(autos.columns)

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')


Renaming the following columns:

* yearOfRegistration to registration_year
* monthOfRegistration to registration_month
* notRepairedDamage to unrepaired_damage
* dateCreated to ad_created

In [313]:
new_columns = {'yearOfRegistration':'registration_year', 'monthOfRegistration':'registration_month', 'notRepairedDamage':'unrepaired_damage', 'dateCreated':'ad_created'}
autos.rename(new_columns, axis=1, inplace=True)

Viewing the new column names

In [314]:
print(autos.columns)

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'registration_year', 'gearbox', 'powerPS', 'model',
       'odometer', 'registration_month', 'fuelType', 'brand',
       'unrepaired_damage', 'ad_created', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')


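We'll convert the remaining column names from camelCase to snake_case. The regular expression inserts an underscore before every uppercase letter that is not at the start of the name, which is why powerPS becomes power_p_s.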

In [315]:
import re
autos.columns = [re.sub(r'(?<!^)(?=[A-Z])', '_', string).lower() for string in autos.columns]

Viewing the new column names

In [316]:
print(autos.columns)

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_p_s', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
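
To see how the substitution treats individual names, it can be tried on a few of the original column names. The sketch below wraps the same pattern in a helper (camel_to_snake is a name chosen here for illustration). As an assumed alternative that is not part of the notebook, it also shows a pattern that only splits where a lowercase letter or digit precedes the capital, which keeps runs of capitals such as PS together.

```python
import re

def camel_to_snake(name):
    # Same pattern as the notebook cell: insert an underscore before every
    # capital letter that is not the first character, then lowercase.
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()

def camel_to_snake_keep_acronyms(name):
    # Hypothetical variant: split only between a lowercase letter (or digit)
    # and a capital, so consecutive capitals stay in one word.
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()

for original in ['dateCrawled', 'nrOfPictures', 'powerPS']:
    print(original, camel_to_snake(original), camel_to_snake_keep_acronyms(original))
```

With the variant, powerPS would become power_ps, while names such as dateCrawled are converted identically by both helpers.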


We are looking at the current state of the autos data frame.

In [317]:
print(autos.head())

          date_crawled                                               name  \
0  2016-03-26 17:47:46                   Peugeot_807_160_NAVTECH_ON_BOARD   
1  2016-04-04 13:38:56         BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik   
2  2016-03-26 18:57:24                         Volkswagen_Golf_1.6_United   
3  2016-03-12 16:58:10  Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...   
4  2016-04-01 14:38:50  Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...   

   seller offer_type   price   abtest vehicle_type  registration_year  \
0  privat    Angebot  $5,000  control          bus               2004   
1  privat    Angebot  $8,500  control    limousine               1997   
2  privat    Angebot  $8,990     test    limousine               2009   
3  privat    Angebot  $4,350  control   kleinwagen               2007   
4  privat    Angebot  $1,350     test        kombi               2003   

     gearbox  power_p_s   model   odometer  registration_month fuel_type  \
0    manuell        15

We are checking the descriptive statistics for all columns.

In [318]:
print(autos.describe())

       registration_year     power_p_s  registration_month  nr_of_pictures  \
count       50000.000000  50000.000000        50000.000000         50000.0   
mean         2005.073280    116.355920            5.723360             0.0   
std           105.712813    209.216627            3.711984             0.0   
min          1000.000000      0.000000            0.000000             0.0   
25%          1999.000000     70.000000            3.000000             0.0   
50%          2003.000000    105.000000            6.000000             0.0   
75%          2008.000000    150.000000            9.000000             0.0   
max          9999.000000  17700.000000           12.000000             0.0   

        postal_code  
count  50000.000000  
mean   50813.627300  
std    25779.747957  
min     1067.000000  
25%    30451.000000  
50%    49577.000000  
75%    71540.000000  
max    99998.000000  


Based on the output above, the column "nr_of_pictures" appears to contain a single value (its minimum, maximum, and mean are all 0). It is therefore a candidate to be dropped from the dataset because it adds nothing to our analysis.

The output below confirms that only one value is present:

In [319]:
print(autos['nr_of_pictures'].unique())

[0]
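
The same check can be run over every column at once. The sketch below is a minimal illustration on a made-up data frame (the rows and the value 'other' are assumptions, not taken from the dataset): nunique() counts distinct values per column, and the share of the most frequent value flags columns that hold mostly one value.

```python
import pandas as pd

# Toy data for illustration only; not taken from autos.csv.
toy = pd.DataFrame({
    'nr_of_pictures': [0, 0, 0, 0],
    'seller': ['privat', 'privat', 'privat', 'other'],
    'price': [5000, 8500, 8990, 4350],
})

# Columns with exactly one distinct value carry no information.
distinct_counts = toy.nunique()
constant_columns = distinct_counts[distinct_counts == 1].index.tolist()

# Share of the most frequent value in each column (1.0 means constant).
top_value_share = {
    column: toy[column].value_counts(normalize=True).iloc[0]
    for column in toy.columns
}

print(constant_columns)
print(top_value_share)
```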


While exploring the data, we look for:

* Columns that hold mostly one value and are candidates to be dropped.
* Columns that need more investigation.
* Numeric data stored as text that needs to be cleaned.

Show the current number of rows and columns.

In [320]:
print(autos.shape)

(50000, 20)


We are deleting column "nr_of_pictures" from the autos data frame.

In [321]:
autos.drop(columns=['nr_of_pictures'], inplace=True)

Show the current number of rows and columns after deleting the column "nr_of_pictures".

In [322]:
print(autos.shape)

(50000, 19)


The price and odometer columns hold numeric values stored as text (dtype object), as shown below:

In [323]:
print(autos.dtypes)

date_crawled          object
name                  object
seller                object
offer_type            object
price                 object
abtest                object
vehicle_type          object
registration_year      int64
gearbox               object
power_p_s              int64
model                 object
odometer              object
registration_month     int64
fuel_type             object
brand                 object
unrepaired_damage     object
ad_created            object
postal_code            int64
last_seen             object
dtype: object


To convert the price and odometer columns, we'll perform the following actions:

* Remove any non-numeric characters
* Convert the column to a numeric dtype
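
The two steps can be sketched as a single helper. This is a minimal illustration, not the notebook's own method: to_int_column is a hypothetical name, and it assumes every value contains at least one digit and no decimal point or minus sign (both would be stripped along with the other non-digit characters).

```python
import pandas as pd

def to_int_column(series):
    # Step 1: drop every character that is not a digit ($ , km and so on).
    digits_only = series.str.replace(r'\D', '', regex=True)
    # Step 2: convert the remaining digit strings to integers.
    return digits_only.astype(int)

sample_price = pd.Series(['$5,000', '$8,500', '$1,350'])
sample_odometer = pd.Series(['150,000km', '70,000km'])

print(to_int_column(sample_price).tolist())
print(to_int_column(sample_odometer).tolist())
```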

In [324]:
print(autos['price'].head())

0    $5,000
1    $8,500
2    $8,990
3    $4,350
4    $1,350
Name: price, dtype: object


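We are removing the dollar sign ($) and the comma (,) from the values stored in the "price" column, then converting the values from object (string) to integer (int). Because $ is a special character in regular expressions, we pass regex=False so that it is treated as a literal character.

In [325]:
autos['price'] = autos['price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(int)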

Verifying that the changes have been done correctly:

In [326]:
print(autos.dtypes)

date_crawled          object
name                  object
seller                object
offer_type            object
price                  int64
abtest                object
vehicle_type          object
registration_year      int64
gearbox               object
power_p_s              int64
model                 object
odometer              object
registration_month     int64
fuel_type             object
brand                 object
unrepaired_damage     object
ad_created            object
postal_code            int64
last_seen             object
dtype: object


In the "odometer" column, we'll perform the following actions:

* Remove any non-numeric characters
* Convert the column to a numeric dtype

In [327]:
print(autos['odometer'].head())

0    150,000km
1    150,000km
2     70,000km
3     70,000km
4    150,000km
Name: odometer, dtype: object


We are removing "km" characters and comma (,) from the values stored in the column "odometer". Then, converting the values from the object (string) to integer (int).

In [328]:
autos['odometer'] = autos['odometer'].str.replace('km', '').str.replace(',', '').astype(int)

We rename the odometer column to odometer_km so that the unit is part of the column name.

In [329]:
autos.rename({'odometer':'odometer_km'}, axis=1, inplace=True)

Verifying that the changes have been done correctly:

In [330]:
print(autos.dtypes)

date_crawled          object
name                  object
seller                object
offer_type            object
price                  int64
abtest                object
vehicle_type          object
registration_year      int64
gearbox               object
power_p_s              int64
model                 object
odometer_km            int64
registration_month     int64
fuel_type             object
brand                 object
unrepaired_damage     object
ad_created            object
postal_code            int64
last_seen             object
dtype: object


Let's continue exploring the data, looking for values that don't look right. We'll start with the odometer_km and price columns.

We'll use the minimum and maximum values to spot unrealistically high or low values (outliers) that we might want to remove.

First, we check how many values each column holds.

In [331]:
print(autos['price'].shape)
print(autos['odometer_km'].shape)

(50000,)
(50000,)


Viewing the min, max, median, and mean of price without scientific notation:

In [332]:
print(autos['price'].describe().apply(lambda x: format(x, 'f')))

count       50000.000000
mean         9840.043760
std        481104.380500
min             0.000000
25%          1100.000000
50%          2950.000000
75%          7200.000000
max      99999999.000000
Name: price, dtype: object


Viewing the min, max, median, and mean of odometer_km without scientific notation:

In [333]:
print(autos['odometer_km'].describe().apply(lambda x: format(x, 'f')))

count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: object


Getting the five most frequent values of price:

In [334]:
print(autos['price'].value_counts().head())

0       1421
500      781
1500     734
2500     643
1000     639
Name: price, dtype: int64


Getting the five most frequent values of odometer_km:

In [335]:
print(autos['odometer_km'].value_counts().head())

150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
Name: odometer_km, dtype: int64


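The function below takes a data frame and a column name (string) as input. It keeps only the rows whose value in that column lies strictly between the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, where IQR = Q3 - Q1 is the interquartile range, and returns the filtered data frame. See http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm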

I took the function's code from this URL:

https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-a-pandas-dataframe

In [336]:
def remove_outlier(df_in, col_name):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3-q1 #Interquartile range
    fence_low  = q1-1.5*iqr
    fence_high = q3+1.5*iqr
    df_out = df_in.loc[(df_in[col_name] > fence_low) & (df_in[col_name] < fence_high)]
    return df_out
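
As a worked example of the fence arithmetic, the sketch below plugs in the quartiles reported by describe() above (25% and 75% of price: 1100 and 7200; of odometer_km: 125000 and 150000). The helper name iqr_fences is an assumption made for this illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    # Lower and upper fences of the IQR rule: Q1 - k*IQR and Q3 + k*IQR.
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

price_fences = iqr_fences(1100, 7200)
odometer_fences = iqr_fences(125000, 150000)

print(price_fences)     # (-8050.0, 16350.0)
print(odometer_fences)  # (87500.0, 187500.0)
```

For odometer_km the lower fence is 87,500 km, so every listing at 80,000 km or less is treated as an outlier. The negative lower fence for price means that listings priced at 0 are not removed by this rule.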

We are removing outliers from the "price" column.

In [337]:
autos_non_outliers = remove_outlier(autos, 'price')

In [338]:
print("The new dataframe has {rows} rows.".format(rows=autos_non_outliers.shape[0]))

The new dataframe has 46207 rows.


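We are removing outliers from the "odometer_km" column. Note that this call starts again from autos, so its result replaces the price-filtered data frame instead of combining both filters.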

In [339]:
autos_non_outliers = remove_outlier(autos, 'odometer_km')

In [340]:
print("The new dataframe has {rows} rows.".format(rows=autos_non_outliers.shape[0]))

The new dataframe has 41520 rows.
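
Each call to remove_outlier returns a new data frame, so whether the price and odometer filters combine depends on which data frame is passed to the second call. The sketch below illustrates the difference on made-up numbers (the toy data frame and the variable names are assumptions); the function body is the one defined above.

```python
import pandas as pd

def remove_outlier(df_in, col_name):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3 - q1  # interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    return df_in.loc[(df_in[col_name] > fence_low) & (df_in[col_name] < fence_high)]

# Toy data: one extreme price (last row) and one extreme odometer value.
toy = pd.DataFrame({
    'price': [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1000000],
    'odometer_km': [10, 11, 12, 13, 14, 15, 500, 16],
})

# Second call on the original frame: the price filter is lost.
odometer_only = remove_outlier(toy, 'odometer_km')

# Second call on the result of the first: both filters apply.
chained = remove_outlier(remove_outlier(toy, 'price'), 'odometer_km')

print(len(odometer_only), odometer_only['price'].max())
print(len(chained), chained['price'].max())
```

In the unchained version the extreme price survives, which is worth keeping in mind when reading mean prices computed from a data frame filtered this way.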


There are 5 columns that should represent date values. Some of these columns were created by the crawler, some came from the website itself.

Right now, the date_crawled, last_seen, and ad_created columns are all identified as string values by pandas. Because these three columns are represented as strings, we need to convert the data into a numerical representation so we can understand it quantitatively. The other two columns are represented as numeric values, so we can use methods like Series.describe() to understand the distribution without any extra data processing.

Let's first understand how the values in the three string columns are formatted. These columns all represent full timestamp values, like so:

In [341]:
print(autos_non_outliers[['date_crawled','ad_created','last_seen']][0:5])

          date_crawled           ad_created            last_seen
0  2016-03-26 17:47:46  2016-03-26 00:00:00  2016-04-06 06:45:54
1  2016-04-04 13:38:56  2016-04-04 00:00:00  2016-04-06 14:45:08
4  2016-04-01 14:38:50  2016-04-01 00:00:00  2016-04-01 14:38:50
5  2016-03-21 13:47:45  2016-03-21 00:00:00  2016-04-06 09:45:21
6  2016-03-20 17:55:21  2016-03-20 00:00:00  2016-03-23 02:48:59


We'll calculate the distribution of values in the date_crawled, ad_created, and last_seen columns (all string columns) as percentages.
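
The cells below take the first ten characters of each timestamp string to obtain the date. A minimal sketch of that approach next to an assumed alternative, parsing with pd.to_datetime, is shown here on a few timestamps copied from the output above; both give the same daily shares.

```python
import pandas as pd

sample = pd.Series([
    '2016-03-26 17:47:46',
    '2016-03-26 18:57:24',
    '2016-04-04 13:38:56',
    '2016-04-01 14:38:50',
])

# Approach used in the notebook: slice the 'YYYY-MM-DD' prefix of the string.
by_slice = sample.str[:10].value_counts(normalize=True).sort_index()

# Alternative (assumption): parse to datetime first, then format the date part.
parsed = pd.to_datetime(sample, format='%Y-%m-%d %H:%M:%S')
by_parse = parsed.dt.strftime('%Y-%m-%d').value_counts(normalize=True).sort_index()

print(by_slice)
print(by_parse)
```

Parsing costs an extra step but would raise an error on malformed timestamps, whereas slicing silently keeps whatever the first ten characters happen to be.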

Checking the distribution of date_crawled column:

In [342]:
print(autos_non_outliers['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index())

2016-03-05    0.024639
2016-03-06    0.014090
2016-03-07    0.036199
2016-03-08    0.034128
2016-03-09    0.033911
2016-03-10    0.032177
2016-03-11    0.032201
2016-03-12    0.037042
2016-03-13    0.014981
2016-03-14    0.036850
2016-03-15    0.033960
2016-03-16    0.029528
2016-03-17    0.031961
2016-03-18    0.013247
2016-03-19    0.033863
2016-03-20    0.037620
2016-03-21    0.037572
2016-03-22    0.032828
2016-03-23    0.032539
2016-03-24    0.028926
2016-03-25    0.032394
2016-03-26    0.032322
2016-03-27    0.031214
2016-03-28    0.034706
2016-03-29    0.033839
2016-03-30    0.033815
2016-03-31    0.032009
2016-04-01    0.033454
2016-04-02    0.034995
2016-04-03    0.038102
2016-04-04    0.036729
2016-04-05    0.013512
2016-04-06    0.003444
2016-04-07    0.001204
Name: date_crawled, dtype: float64


Checking the distribution of ad_created column:

In [343]:
print(autos_non_outliers['ad_created'].str[:10].value_counts(normalize=True, dropna=False).sort_index())

2015-12-05    0.000024
2015-12-30    0.000024
2016-01-03    0.000024
2016-01-07    0.000024
2016-01-10    0.000048
                ...   
2016-04-03    0.038367
2016-04-04    0.037283
2016-04-05    0.012115
2016-04-06    0.003468
2016-04-07    0.001108
Name: ad_created, Length: 70, dtype: float64


Checking the distribution of last_seen column:

In [344]:
print(autos_non_outliers['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index())

2016-03-05    0.001084
2016-03-06    0.004697
2016-03-07    0.005829
2016-03-08    0.008333
2016-03-09    0.010356
2016-03-10    0.011079
2016-03-11    0.013463
2016-03-12    0.025072
2016-03-13    0.009369
2016-03-14    0.013223
2016-03-15    0.016305
2016-03-16    0.017100
2016-03-17    0.029359
2016-03-18    0.007346
2016-03-19    0.016522
2016-03-20    0.021821
2016-03-21    0.021580
2016-03-22    0.022110
2016-03-23    0.019147
2016-03-24    0.020520
2016-03-25    0.020087
2016-03-26    0.017389
2016-03-27    0.016618
2016-03-28    0.021821
2016-03-29    0.022856
2016-03-30    0.024880
2016-03-31    0.024422
2016-04-01    0.023988
2016-04-02    0.025530
2016-04-03    0.025506
2016-04-04    0.024904
2016-04-05    0.119701
2016-04-06    0.212620
2016-04-07    0.125361
Name: last_seen, dtype: float64


Understanding the distribution of the registration_year column:

In [345]:
print(autos_non_outliers['registration_year'].describe())

count    41520.000000
mean      2002.820111
std         35.023528
min       1910.000000
25%       1999.000000
50%       2003.000000
75%       2007.000000
max       9000.000000
Name: registration_year, dtype: float64


In [346]:
print("The dataset contains {quantity} cars registered before 1900".format(quantity=autos_non_outliers[autos_non_outliers['registration_year'] < 1900].shape[0]))

The dataset contains 0 cars registered before 1900


In [347]:
print("The dataset contains {quantity} cars registered after 2016".format(quantity=autos_non_outliers[autos_non_outliers['registration_year'] > 2016].shape[0]))

The dataset contains 1725 cars registered after 2016


We are removing rows where registration_year is earlier than 1900. No rows meet that condition, so the row count does not change. The 1725 rows with a registration_year after 2016 are still in the data frame; dropping them would require the condition registration_year > 2016 instead.

In [348]:
autos_non_outliers_2 = autos_non_outliers.drop(autos_non_outliers[autos_non_outliers['registration_year'] < 1900].index)

In [349]:
print("The new dataset contains {quantity} rows after removing rows where registration_year is earlier than 1900".format(quantity=autos_non_outliers_2.shape[0]))

The new dataset contains 41520 rows after removing rows where registration_year is earlier than 1900
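
A range check on registration_year can be sketched with Series.between, which keeps only the values inside inclusive bounds. The bounds 1900 and 2016 are the ones used in the checks above; the toy data frame and the variable names are assumptions made for this illustration.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({'registration_year': [1000, 1910, 1999, 2016, 2017, 9000]})

# True for years in [1900, 2016]; both ends are included by default.
valid_years = toy['registration_year'].between(1900, 2016)
cleaned = toy[valid_years]

print(cleaned['registration_year'].tolist())
print(len(toy) - len(cleaned))  # number of rows removed
```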


Reviewing the distribution of the column "registration_year" (normalized)

In [350]:
print(autos_non_outliers_2['registration_year'].value_counts(normalize=True))

2000    0.075193
1999    0.068882
2005    0.066113
2003    0.062042
2001    0.061826
          ...   
9000    0.000024
1937    0.000024
1962    0.000024
1957    0.000024
1943    0.000024
Name: registration_year, Length: 73, dtype: float64


Determining how many brands of cars are in our dataset.

In [351]:
print(autos_non_outliers_2['brand'].unique())

['peugeot' 'bmw' 'ford' 'chrysler' 'volkswagen' 'seat' 'renault'
 'mercedes_benz' 'audi' 'opel' 'mazda' 'porsche' 'mini' 'toyota' 'nissan'
 'jeep' 'dacia' 'saab' 'volvo' 'mitsubishi' 'fiat' 'skoda' 'subaru' 'kia'
 'sonstige_autos' 'citroen' 'smart' 'hyundai' 'chevrolet' 'honda' 'daewoo'
 'suzuki' 'jaguar' 'land_rover' 'alfa_romeo' 'lada' 'rover' 'trabant'
 'daihatsu' 'lancia']


In [352]:
print("Our dataset contains a total of {quantity} unique brands".format(quantity=len(autos_non_outliers_2['brand'].unique())))

Our dataset contains a total of 40 unique brands


Exploring the top 20 car brands in the dataset

In [353]:
print(autos_non_outliers_2['brand'].value_counts().sort_values(ascending=False).head(20))

volkswagen       9097
bmw              4804
opel             4736
mercedes_benz    4096
audi             3653
ford             2856
renault          2078
peugeot          1233
fiat             1014
seat              755
mazda             630
nissan            591
skoda             559
citroen           553
toyota            464
smart             447
volvo             423
mitsubishi        342
hyundai           324
honda             324
Name: brand, dtype: int64


Storing the top 20 brands (a pandas Series) in a variable and converting its index to a Python list.


In [354]:
top_20_brands = autos_non_outliers_2['brand'].value_counts().sort_values(ascending=False).head(20)
top_20_brands_list = list(top_20_brands.index)

Viewing the content of top_20_brands_list.

In [355]:
print(top_20_brands_list)

['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford', 'renault', 'peugeot', 'fiat', 'seat', 'mazda', 'nissan', 'skoda', 'citroen', 'toyota', 'smart', 'volvo', 'mitsubishi', 'hyundai', 'honda']


We are using a dictionary "mean_price_by_brand" to aggregate the data of each brand and its mean price.

In [356]:
mean_price_by_brand = dict()
for brand in top_20_brands_list:
    df = autos_non_outliers_2[autos_non_outliers_2['brand'] == brand]
    mean_price = df['price'].mean()
    mean_price_by_brand[brand] = round(mean_price)

Below, you can see each brand and its mean price.

In [357]:
print(mean_price_by_brand)

{'volkswagen': 5384.0, 'bmw': 6881.0, 'opel': 4802.0, 'mercedes_benz': 31015.0, 'audi': 6670.0, 'ford': 2937.0, 'renault': 1762.0, 'peugeot': 2341.0, 'fiat': 14097.0, 'seat': 2918.0, 'mazda': 2786.0, 'nissan': 3106.0, 'skoda': 4805.0, 'citroen': 2910.0, 'toyota': 3950.0, 'smart': 2443.0, 'volvo': 33347.0, 'mitsubishi': 2489.0, 'hyundai': 3689.0, 'honda': 2963.0}
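
The loop over brands can also be expressed with groupby, which computes the mean of several columns per brand in one step. The sketch below is an assumed alternative on made-up listings (brand names are reused from the dataset, the numbers are not), not the approach taken in this notebook.

```python
import pandas as pd

# Toy listings for illustration only.
listings = pd.DataFrame({
    'brand': ['volvo', 'volvo', 'smart', 'smart', 'lada'],
    'price': [3000, 5000, 2000, 2500, 900],
    'odometer_km': [150000, 125000, 90000, 100000, 150000],
})

# Keep the two most frequent brands (the notebook uses the top 20).
top_brands = listings['brand'].value_counts().head(2).index

summary = (
    listings[listings['brand'].isin(top_brands)]
    .groupby('brand')[['price', 'odometer_km']]
    .mean()
    .round()
)
print(summary)
```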


Below, you can see the six most expensive brands by mean price.

In [358]:
sorted(mean_price_by_brand, key=mean_price_by_brand.get, reverse=True)[:6]

['volvo', 'mercedes_benz', 'fiat', 'bmw', 'audi', 'volkswagen']

Below, you can see the six cheapest brands by mean price.

In [359]:
sorted(mean_price_by_brand, key=mean_price_by_brand.get, reverse=False)[:6]

['renault', 'peugeot', 'smart', 'mitsubishi', 'mazda', 'citroen']
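
The same rankings can be read from a pandas Series with nlargest and nsmallest. The sketch below uses five of the mean prices printed above; the variable name mean_price_sample is an assumption.

```python
import pandas as pd

# A subset of the mean prices reported above.
mean_price_sample = pd.Series({
    'volvo': 33347,
    'mercedes_benz': 31015,
    'fiat': 14097,
    'peugeot': 2341,
    'renault': 1762,
})

most_expensive = mean_price_sample.nlargest(2).index.tolist()
cheapest = mean_price_sample.nsmallest(2).index.tolist()

print(most_expensive)
print(cheapest)
```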

In [360]:
print("The most expensive brand is {brand}".format(brand=sorted(mean_price_by_brand, key=mean_price_by_brand.get, reverse=True)[0]))

The most expensive brand is volvo


In [361]:
print("The cheapest brand is {brand}".format(brand=sorted(mean_price_by_brand, key=mean_price_by_brand.get, reverse=False)[0]))

The cheapest brand is renault


We are using a dictionary "mean_mileage_by_brand" to aggregate the data of each brand and its mean mileage (odometer_km column).

In [362]:
mean_mileage_by_brand = dict()
for brand in top_20_brands_list:
    df = autos_non_outliers_2[autos_non_outliers_2['brand'] == brand]
    mean_odometer_km = df['odometer_km'].mean()
    mean_mileage_by_brand[brand] = round(mean_odometer_km)

Below, you can see each brand and its mean mileage.

In [363]:
print(mean_mileage_by_brand)

{'volkswagen': 143399.0, 'bmw': 143649.0, 'opel': 142122.0, 'mercedes_benz': 143705.0, 'audi': 144243.0, 'ford': 140499.0, 'renault': 141006.0, 'peugeot': 140458.0, 'fiat': 137027.0, 'seat': 139848.0, 'mazda': 140944.0, 'nissan': 137733.0, 'skoda': 136530.0, 'citroen': 137667.0, 'toyota': 136875.0, 'smart': 127975.0, 'volvo': 145757.0, 'mitsubishi': 141462.0, 'hyundai': 134985.0, 'honda': 140648.0}


Below, you can see the six brands with the highest mean mileage.

In [364]:
sorted(mean_mileage_by_brand, key=mean_mileage_by_brand.get, reverse=True)[:6]

['volvo', 'audi', 'mercedes_benz', 'bmw', 'volkswagen', 'opel']

Below, you can see the six brands with the lowest mean mileage.

In [365]:
sorted(mean_mileage_by_brand, key=mean_mileage_by_brand.get, reverse=False)[:6]

['smart', 'hyundai', 'skoda', 'toyota', 'fiat', 'citroen']

In [366]:
print("{brand} with the most mileage".format(brand=sorted(mean_mileage_by_brand, key=mean_mileage_by_brand.get, reverse=True)[0]))

volvo with the most mileage


In [367]:
print("{brand} with the least mileage".format(brand=sorted(mean_mileage_by_brand, key=mean_mileage_by_brand.get, reverse=False)[0]))

smart with the least mileage


We are converting the mean_price_by_brand dictionary to a pandas Series.

In [368]:
series_mean_price = pd.Series(mean_price_by_brand)

We are converting the mean_mileage_by_brand dictionary to a pandas Series.

In [369]:
series_mean_mileage = pd.Series(mean_mileage_by_brand)

We concatenate two Pandas Series objects to create a single pandas DataFrame object. 

In [370]:
df = pd.concat([series_mean_price, series_mean_mileage], axis=1)

We are renaming the data frame's columns.

In [371]:
df.rename({0:'mean_price', 1:'mean_mileage'}, axis=1, inplace=True)
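
As a possible shortcut, the concatenate and rename steps can be folded into one by passing named Series to the DataFrame constructor. The sketch below uses three brands from the dictionaries above; the variable names are assumptions.

```python
import pandas as pd

# Subsets of the two dictionaries printed above.
mean_price_sample = {'volkswagen': 5384, 'bmw': 6881, 'opel': 4802}
mean_mileage_sample = {'volkswagen': 143399, 'bmw': 143649, 'opel': 142122}

# The dictionary keys become the column names; the Series are aligned by brand.
summary = pd.DataFrame({
    'mean_price': pd.Series(mean_price_sample),
    'mean_mileage': pd.Series(mean_mileage_sample),
})
print(summary)
```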

We are visualizing the content of our new data frame.

In [372]:
df

Unnamed: 0,mean_price,mean_mileage
volkswagen,5384.0,143399.0
bmw,6881.0,143649.0
opel,4802.0,142122.0
mercedes_benz,31015.0,143705.0
audi,6670.0,144243.0
ford,2937.0,140499.0
renault,1762.0,141006.0
peugeot,2341.0,140458.0
fiat,14097.0,137027.0
seat,2918.0,139848.0
