# Exploring Ebay Used Car Sales Data

## Introduction:

Our dataset sample contains 50,000 used cars scraped from Ebay-Kleinanzeigen - a classifieds section of the German eBay website. 
The following fields are included in the dataset:
- **dateCrawled** - When this ad was first crawled. All field-values are taken from this date.
- **name** - Name of the car.
- **seller** - Whether the seller is private or a dealer.
- **offerType** - The type of listing
- **price** - The price on the ad to sell the car.
- **abtest** - Whether the listing is included in an A/B test.
- **vehicleType** - The vehicle Type.
- **yearOfRegistration** - The year in which the car was first registered.
- **gearbox** - The transmission type.
- **powerPS** - The power of the car in PS.
- **model** - The car model name.
- **kilometer** - How many kilometers the car has driven.
- **monthOfRegistration** - The month in which the car was first registered.
- **fuelType** - What type of fuel the car uses.
- **brand** - The brand of the car.
- **notRepairedDamage** - If the car has a damage which is not yet repaired.
- **dateCreated** - The date on which the eBay listing was created.
- **nrOfPictures** - The number of pictures in the ad.
- **postalCode** - The postal code for the location of the vehicle.
- **lastSeenOnline** - When the crawler saw this ad last online.

The aim of this project is to clean the data and analyze the included used car listings.

In [1]:
import numpy as np
import pandas as pd
autos=pd.read_csv('autos.csv', encoding="Latin-1")

In [2]:
autos

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50
5,2016-03-21 13:47:45,Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto...,privat,Angebot,"$7,900",test,bus,2006,automatik,150,voyager,"150,000km",4,diesel,chrysler,,2016-03-21 00:00:00,0,22962,2016-04-06 09:45:21
6,2016-03-20 17:55:21,VW_Golf_III_GT_Special_Electronic_Green_Metall...,privat,Angebot,$300,test,limousine,1995,manuell,90,golf,"150,000km",8,benzin,volkswagen,,2016-03-20 00:00:00,0,31535,2016-03-23 02:48:59
7,2016-03-16 18:55:19,Golf_IV_1.9_TDI_90PS,privat,Angebot,"$1,990",control,limousine,1998,manuell,90,golf,"150,000km",12,diesel,volkswagen,nein,2016-03-16 00:00:00,0,53474,2016-04-07 03:17:32
8,2016-03-22 16:51:34,Seat_Arosa,privat,Angebot,$250,test,,2000,manuell,0,arosa,"150,000km",10,,seat,nein,2016-03-22 00:00:00,0,7426,2016-03-26 18:18:10
9,2016-03-16 13:47:02,Renault_Megane_Scenic_1.6e_RT_Klimaanlage,privat,Angebot,$590,control,bus,1997,manuell,90,megane,"150,000km",7,benzin,renault,nein,2016-03-16 00:00:00,0,15749,2016-04-06 10:46:35


In [3]:
print(autos.info())
print("\n")
print(autos.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null obj

There are 
- 2 data types in the data set: string(15 columns) and integer(5 columns), 
- 5 of string columns have null values, but none have more than ~20% null values. 
- the column names use camelcase instead of Python's preferred snakecase, which means we can't just replace spaces with underscores.

Let's convert the column names from camelcase to snakecase and reword some of the column names based on the data dictionary to be more descriptive.

## Clean Columns

In [4]:
print(autos.columns)

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')


We'll make a few changes here:

- Change the columns from camelcase to snakecase.
- Change a few wordings to more accurately describe the columns.

In [5]:
new_columns=['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen']
autos.columns=new_columns
print(autos.head())

          date_crawled                                               name  \
0  2016-03-26 17:47:46                   Peugeot_807_160_NAVTECH_ON_BOARD   
1  2016-04-04 13:38:56         BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik   
2  2016-03-26 18:57:24                         Volkswagen_Golf_1.6_United   
3  2016-03-12 16:58:10  Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...   
4  2016-04-01 14:38:50  Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...   

   seller offer_type   price   abtest vehicle_type  registration_year  \
0  privat    Angebot  $5,000  control          bus               2004   
1  privat    Angebot  $8,500  control    limousine               1997   
2  privat    Angebot  $8,990     test    limousine               2009   
3  privat    Angebot  $4,350  control   kleinwagen               2007   
4  privat    Angebot  $1,350     test        kombi               2003   

     gearbox  power_ps   model   odometer  registration_month fuel_type  \
0    manuell       158 

## Initial Data Exploration and Cleaning

We'll start by exploring the data to find obvious areas where we can clean the data.

In [6]:
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-04-04 16:40:33,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


In [7]:
print(autos["seller"].value_counts())
print("\n")
print(autos["offer_type"].value_counts())
print("\n")
print(autos["nr_of_pictures"].value_counts())

privat        49999
gewerblich        1
Name: seller, dtype: int64


Angebot    49999
Gesuch         1
Name: offer_type, dtype: int64


0    50000
Name: nr_of_pictures, dtype: int64


There are above 3 columns: *seller*, *offer_type* and *nr_of_pictures* that have mostly one value that are candidates to be dropped.

In [8]:
autos=autos.drop(["nr_of_pictures", "seller", "offer_type"], axis=1)

Also we can notice 2 columns: *price* and *odometer* are numeric data stored as text that needs to be cleaned and converted to a numeric dtype.

In [9]:
print(autos["price"].unique())

['$5,000' '$8,500' '$8,990' ... '$385' '$22,200' '$16,995']


In [10]:
autos["price"] = autos["price"].str.replace('$','').str.replace(',','').astype(int)
autos.rename({"price": "price_$"}, axis=1, inplace=True)
print(autos["price_$"].head())

0    5000
1    8500
2    8990
3    4350
4    1350
Name: price_$, dtype: int64


In [11]:
print(autos["odometer"].unique())

['150,000km' '70,000km' '50,000km' '80,000km' '10,000km' '30,000km'
 '125,000km' '90,000km' '20,000km' '60,000km' '5,000km' '100,000km'
 '40,000km']


In [12]:
autos["odometer"] = autos["odometer"].str.replace('km','').str.replace(',','').astype(int)
autos.rename({"odometer": "odometer_km"}, axis=1, inplace=True)
print(autos["odometer_km"].unique())

[150000  70000  50000  80000  10000  30000 125000  90000  20000  60000
   5000 100000  40000]


There are some columns that need more investigation:
- registration_year
- power_ps
- registration_month

In [13]:
print(autos["registration_year"].unique())

[2004 1997 2009 2007 2003 2006 1995 1998 2000 2017 2010 1999 1982 1990
 2015 2014 1996 1992 2005 2002 2012 2011 2008 1985 2016 1994 1986 2001
 2018 2013 1972 1993 1988 1989 1967 1973 1956 1976 4500 1987 1991 1983
 1960 1969 1950 1978 1980 1984 1963 1977 1961 1968 1934 1965 1971 1966
 1979 1981 1970 1974 1910 1975 5000 4100 2019 1959 9996 9999 6200 1964
 1958 1800 1948 1931 1943 9000 1941 1962 1927 1937 1929 1000 1957 1952
 1111 1955 1939 8888 1954 1938 2800 5911 1500 1953 1951 4800 1001]


Above we can see strange data for year as 1500, 4800 etc

In [14]:
print(autos["power_ps"].unique())

[  158   286   102    71     0   150    90   101    75   203   205   124
   131   320   184   120   306   116    54   110   204    60   218   140
   193   122    86    43    95   435    84   224    58    65    80   121
   174   179    57   136   108    99   170   143   113   177   163   385
    63   200   151   106    85   109   145   127   250   105    50   126
   160   183    81   125   141   239   560    55    88   344   232    98
   135    45    73   129   103    64   436    53    68   211   245   128
   241   115   231   325   235   197   133   111   260    82    83   185
   173    92    61   190    74   100    44    76    70   147   225   107
    69   517   230   315     1   370   360   354   155   258   326    33
    94   192   330   156   313    52   130   180   117   275   114   256
   299   207   104   118    87   394   220   272   175   280   420   333
    77   165    66    89   148     5   400   188   171   247    59    26
    67   138   387   265    40   367   401   345   

We can see strange too small and too high values for the autos power data.

In [15]:
print(autos["registration_month"].value_counts())

0     5075
3     5071
6     4368
5     4107
4     4102
7     3949
10    3651
12    3447
9     3389
11    3360
1     3282
8     3191
2     3008
Name: registration_month, dtype: int64


There are 5075 rows with 0 month above.

## Exploring Odometer and Price

In [16]:
print(autos["odometer_km"].unique().shape)

(13,)


In [17]:
print(autos["odometer_km"].describe())

count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64


In [18]:
print(autos["odometer_km"].value_counts().sort_index())

5000        967
10000       264
20000       784
30000       789
40000       819
50000      1027
60000      1164
70000      1230
80000      1436
90000      1757
100000     2169
125000     5170
150000    32424
Name: odometer_km, dtype: int64


There is no need to do anything with **odometer_km** data, numbers seem to be realistic. Let's look at **price** column.

In [19]:
print(autos["price_$"].unique().shape)
# shows number of unique values

(2357,)


In [20]:
print(autos["price_$"].describe())

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price_$, dtype: float64


In [21]:
print(autos["price_$"].value_counts().sort_index().head(50))

0      1421
1       156
2         3
3         1
5         2
8         1
9         1
10        7
11        2
12        3
13        2
14        1
15        2
17        3
18        1
20        4
25        5
29        1
30        7
35        1
40        6
45        4
47        1
49        4
50       49
55        2
59        1
60        9
65        5
66        1
70       10
75        5
79        1
80       15
89        1
90        5
99       19
100     134
110       3
111       2
115       2
117       1
120      39
122       1
125       8
129       1
130      15
135       1
139       1
140       9
Name: price_$, dtype: int64


In [22]:
print(autos["price_$"].value_counts().sort_index(ascending=False).head(50))

99999999    1
27322222    1
12345678    3
11111111    2
10000000    1
3890000     1
1300000     1
1234566     1
999999      2
999990      1
350000      1
345000      1
299000      1
295000      1
265000      1
259000      1
250000      1
220000      1
198000      1
197000      1
194000      1
190000      1
180000      1
175000      1
169999      1
169000      1
163991      1
163500      1
155000      1
151990      1
145000      1
139997      1
137999      1
135000      1
130000      1
129000      1
128000      1
120000      2
119900      1
119500      1
116000      1
115991      1
115000      1
114400      1
109999      1
105000      2
104900      1
99900       2
99000       2
98500       1
Name: price_$, dtype: int64


For the **price_$** column we can see that there are as too small unrealistical alues as well as too high. Given that eBay is an auction site, there could legitimately be items where the opening bid is \$1. We will keep the \$1 items, but remove anything above \$350,000.

In [23]:
autos=autos[autos["price_$"].between(1,351000)]
print(autos["price_$"].describe())

count     48565.000000
mean       5888.935591
std        9059.854754
min           1.000000
25%        1200.000000
50%        3000.000000
75%        7490.000000
max      350000.000000
Name: price_$, dtype: float64


## Exploring the date columns

In [24]:
autos.describe(include='all')

Unnamed: 0,date_crawled,name,price_$,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,postal_code,last_seen
count,48565,48565,48565.0,48565,43979,48565.0,46222,48565.0,46107,48565.0,48565.0,44535,48565,39464,48565,48565.0,48565
unique,46882,37470,,2,8,,2,,245,,,7,40,2,76,,38474
top,2016-03-11 22:38:16,Ford_Fiesta,,test,limousine,,manuell,,golf,,,benzin,volkswagen,nein,2016-04-03 00:00:00,,2016-04-07 06:17:27
freq,3,76,,25019,12598,,36102,,3900,,,29368,10336,34775,1887,,8
mean,,,5888.935591,,,2004.755421,,117.197158,,125770.101925,5.782251,,,,,50975.745207,
std,,,9059.854754,,,88.643887,,200.649618,,39788.636804,3.685595,,,,,25746.968398,
min,,,1.0,,,1000.0,,0.0,,5000.0,0.0,,,,,1067.0,
25%,,,1200.0,,,1999.0,,71.0,,125000.0,3.0,,,,,30657.0,
50%,,,3000.0,,,2004.0,,107.0,,150000.0,6.0,,,,,49716.0,
75%,,,7490.0,,,2008.0,,150.0,,150000.0,9.0,,,,,71665.0,


There are 5 columns that represent date values:
- date_crawled
- registration_year
- registration_month
- ad_created
- last_seen

where **registration_year** and **registration_month** are numeric values, when others are strings.

In [25]:
autos.loc[:5, ["date_crawled", "ad_created", "last_seen", "registration_month", "registration_year"]]

Unnamed: 0,date_crawled,ad_created,last_seen,registration_month,registration_year
0,2016-03-26 17:47:46,2016-03-26 00:00:00,2016-04-06 06:45:54,3,2004
1,2016-04-04 13:38:56,2016-04-04 00:00:00,2016-04-06 14:45:08,6,1997
2,2016-03-26 18:57:24,2016-03-26 00:00:00,2016-04-06 20:15:37,7,2009
3,2016-03-12 16:58:10,2016-03-12 00:00:00,2016-03-15 03:16:28,6,2007
4,2016-04-01 14:38:50,2016-04-01 00:00:00,2016-04-01 14:38:50,7,2003
5,2016-03-21 13:47:45,2016-03-21 00:00:00,2016-04-06 09:45:21,4,2006


First 3 columns with string values represnt full timestamp values. To understand the date range, we can extract just the date values. You can notice that the first 10 characters represent the day e.g. 2016-03-12. To select the first 10 characters in each column, we can use Series.str[:10].

In [26]:
print(autos["date_crawled"].str[:10].value_counts(normalize=True, dropna=False).sort_index())

2016-03-05    0.025327
2016-03-06    0.014043
2016-03-07    0.036014
2016-03-08    0.033296
2016-03-09    0.033090
2016-03-10    0.032184
2016-03-11    0.032575
2016-03-12    0.036920
2016-03-13    0.015670
2016-03-14    0.036549
2016-03-15    0.034284
2016-03-16    0.029610
2016-03-17    0.031628
2016-03-18    0.012911
2016-03-19    0.034778
2016-03-20    0.037887
2016-03-21    0.037373
2016-03-22    0.032987
2016-03-23    0.032225
2016-03-24    0.029342
2016-03-25    0.031607
2016-03-26    0.032204
2016-03-27    0.031092
2016-03-28    0.034860
2016-03-29    0.034099
2016-03-30    0.033687
2016-03-31    0.031834
2016-04-01    0.033687
2016-04-02    0.035478
2016-04-03    0.038608
2016-04-04    0.036487
2016-04-05    0.013096
2016-04-06    0.003171
2016-04-07    0.001400
Name: date_crawled, dtype: float64


So, website was crawled everyday from 05 March till 07 April 2016 with almost the same volume of listings everyday.

In [27]:
(autos["ad_created"]
.str[:10]
.value_counts(normalize=False, dropna=False)
.sort_index())

2015-06-11       1
2015-08-10       1
2015-09-09       1
2015-11-10       1
2015-12-05       1
2015-12-30       1
2016-01-03       1
2016-01-07       1
2016-01-10       2
2016-01-13       1
2016-01-14       1
2016-01-16       1
2016-01-22       1
2016-01-27       3
2016-01-29       1
2016-02-01       1
2016-02-02       2
2016-02-05       2
2016-02-07       1
2016-02-08       1
2016-02-09       1
2016-02-11       1
2016-02-12       2
2016-02-14       2
2016-02-16       1
2016-02-17       1
2016-02-18       2
2016-02-19       3
2016-02-20       2
2016-02-21       3
              ... 
2016-03-09    1610
2016-03-10    1549
2016-03-11    1598
2016-03-12    1785
2016-03-13     826
2016-03-14    1709
2016-03-15    1652
2016-03-16    1463
2016-03-17    1519
2016-03-18     660
2016-03-19    1636
2016-03-20    1843
2016-03-21    1825
2016-03-22    1593
2016-03-23    1557
2016-03-24    1422
2016-03-25    1542
2016-03-26    1567
2016-03-27    1505
2016-03-28    1699
2016-03-29    1653
2016-03-30  

Most listings were created on crawled dates March-April 2016, but a few are quite old, with the oldest at around 9 months.

In [28]:
print(autos["last_seen"].str[:10].unique().shape)
(autos["last_seen"]
.str[:10]
.value_counts(normalize=False, dropna=False)
.sort_index())

(34,)


2016-03-05       52
2016-03-06      210
2016-03-07      262
2016-03-08      360
2016-03-09      466
2016-03-10      518
2016-03-11      601
2016-03-12     1155
2016-03-13      432
2016-03-14      612
2016-03-15      771
2016-03-16      799
2016-03-17     1364
2016-03-18      357
2016-03-19      769
2016-03-20     1003
2016-03-21     1002
2016-03-22     1038
2016-03-23      900
2016-03-24      960
2016-03-25      933
2016-03-26      816
2016-03-27      760
2016-03-28     1013
2016-03-29     1085
2016-03-30     1203
2016-03-31     1155
2016-04-01     1107
2016-04-02     1210
2016-04-03     1224
2016-04-04     1189
2016-04-05     6059
2016-04-06    10772
2016-04-07     6408
Name: last_seen, dtype: int64

The crawler last saw these ads just on the same dates they were crawled with the splash of qty of last seen on last 3 days of crawled period that was identified on 05 March till 07 April 2016.

In [29]:
(autos["registration_year"].describe())

count    48565.000000
mean      2004.755421
std         88.643887
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

The year that the car was first registered will likely indicate the age of the car. Looking at this column, we note some odd values. The minimum value is 1000, long before cars were created and the maximum is 9999, many years into the future.

## Dealing with Incorrect Registration Year Data

Because a car can't be first registered after the listing was seen, any vehicle with a registration year above 2016 is definitely inaccurate. Determining the earliest valid year is more difficult. Realistically, it could be somewhere in the first few decades of the 1900s.

Let's count the number of listings with cars that fall outside the 1900 - 2016 interval and see if it's safe to remove those rows entirely, or if we need more custom logic.

In [30]:
(~autos["registration_year"].between(1900,2016)
 .value_counts())*100/autos.shape[0]

True    -96.122722
False    -3.881396
Name: registration_year, dtype: float64

Listings that are out of the selected years range consist less than 4% of the data set, so we can remove them.

In [31]:
autos=autos[autos["registration_year"].between(1900,2016)]
print(autos["registration_year"]
       .value_counts(print(autos["registration_year"]
       .value_counts(normalize=True))))

2000    0.067608
2005    0.062895
1999    0.062060
2004    0.057904
2003    0.057818
2006    0.057197
2001    0.056468
2002    0.053255
1998    0.050620
2007    0.048778
2008    0.047450
2009    0.044665
1997    0.041794
2011    0.034768
2010    0.034040
1996    0.029412
2012    0.028063
1995    0.026285
2016    0.026135
2013    0.017202
2014    0.014203
1994    0.013474
1993    0.009104
2015    0.008397
1992    0.007926
1990    0.007433
1991    0.007262
1989    0.003727
1988    0.002892
1985    0.002035
          ...   
1966    0.000471
1976    0.000450
1969    0.000407
1975    0.000386
1965    0.000364
1964    0.000257
1963    0.000171
1959    0.000129
1961    0.000129
1910    0.000107
1956    0.000086
1958    0.000086
1937    0.000086
1962    0.000086
1950    0.000064
1954    0.000043
1941    0.000043
1951    0.000043
1934    0.000043
1957    0.000043
1955    0.000043
1953    0.000021
1943    0.000021
1929    0.000021
1939    0.000021
1938    0.000021
1948    0.000021
1927    0.0000

In [32]:
print((autos["registration_year"]
       .value_counts(normalize=True)>0.005).sort_index())

1910    False
1927    False
1929    False
1931    False
1934    False
1937    False
1938    False
1939    False
1941    False
1943    False
1948    False
1950    False
1951    False
1952    False
1953    False
1954    False
1955    False
1956    False
1957    False
1958    False
1959    False
1960    False
1961    False
1962    False
1963    False
1964    False
1965    False
1966    False
1967    False
1968    False
        ...  
1987    False
1988    False
1989    False
1990     True
1991     True
1992     True
1993     True
1994     True
1995     True
1996     True
1997     True
1998     True
1999     True
2000     True
2001     True
2002     True
2003     True
2004     True
2005     True
2006     True
2007     True
2008     True
2009     True
2010     True
2011     True
2012     True
2013     True
2014     True
2015     True
2016     True
Name: registration_year, Length: 78, dtype: bool


The most qty of listings are for autos registered since 1990 till 2016 years.

## Exploring Price by Brand

In [33]:
print(autos["brand"].value_counts().shape)

(autos["brand"].value_counts(normalize=True)*100)

(40,)


volkswagen        21.126368
bmw               11.004477
opel              10.758124
mercedes_benz      9.646323
audi               8.656627
ford               6.989996
renault            4.714980
peugeot            2.984083
fiat               2.564212
seat               1.827296
skoda              1.640925
nissan             1.527388
mazda              1.518819
smart              1.415994
citroen            1.400998
toyota             1.270324
hyundai            1.002549
sonstige_autos     0.981127
volvo              0.914719
mini               0.876159
mitsubishi         0.822604
honda              0.784045
kia                0.706926
alfa_romeo         0.664082
porsche            0.612669
suzuki             0.593389
chevrolet          0.569825
chrysler           0.351321
dacia              0.263490
daihatsu           0.250637
jeep               0.227073
subaru             0.214220
land_rover         0.209936
saab               0.164949
jaguar             0.156381
daewoo             0

There are 40 different brands in the dataset. The first 4 brands consists more than 50% of overal listings with **volkswagen** on the top with the 21% that is the same as for second and third brands together.

Let's explore mean price for the most popular mass brands, so we will limit our analysis to brands representing more than 5% of total listings.

In [34]:
unique_brands=autos["brand"].value_counts(normalize=True)
top_brands=unique_brands[unique_brands>0.05].index
print(top_brands)

Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')


In [35]:
mean_price_by_brand={}

for b in top_brands:
    selected_rows=autos[autos["brand"]==b]
    mean_price=selected_rows["price_$"].mean()
    mean_price_by_brand[b]=int(mean_price)


def display_table(table):
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(mean_price_by_brand)

audi : 9336
mercedes_benz : 8628
bmw : 8332
volkswagen : 5402
ford : 3749
opel : 2975


We can see that the most popupar brand are not the cheapest, but vice versa, top 3 expensive brands are enough mass saled:
- audi
- mercedes_benz
- bmw

The most popular **volkswagen** that moderate price that seems makes it so well-knows among used cars.

## Exploring Mileage

In [36]:
mean_mileage_by_brand={}

for b in top_brands:
    selected_rows=autos[autos["brand"]==b]
    mean_mileage=selected_rows["odometer_km"].mean()
    mean_mileage_by_brand[b]=int(mean_mileage)
    
display_table(mean_mileage_by_brand)

bmw : 132572
mercedes_benz : 130788
opel : 129310
audi : 129157
volkswagen : 128707
ford : 124266


In [37]:
mpb_ser=pd.Series(mean_price_by_brand).sort_values(ascending=False)
mmb_ser=pd.Series(mean_mileage_by_brand).sort_values(ascending=False)
print(mpb_ser)
print(mmb_ser)

audi             9336
mercedes_benz    8628
bmw              8332
volkswagen       5402
ford             3749
opel             2975
dtype: int64
bmw              132572
mercedes_benz    130788
opel             129310
audi             129157
volkswagen       128707
ford             124266
dtype: int64


In [38]:
mean_for_topbrands = pd.DataFrame(mmb_ser, columns=['mean_mileage'])
mean_for_topbrands["mean_price"]=mpb_ser
mean_for_topbrands

Unnamed: 0,mean_mileage,mean_price
bmw,132572,8332
mercedes_benz,130788,8628
opel,129310,2975
audi,129157,9336
volkswagen,128707,5402
ford,124266,3749


Seems mean mileage values do not have strong corelation with mean price, for all brand mean mileage is about 128-132K km. But there is a slight trend to the more expensive vehicles having higher mileage, with the less expensive vehicles having lower mileage.

## Cleaning German words

In [39]:
autos

Unnamed: 0,date_crawled,name,price_$,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,postal_code,last_seen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,5000,control,bus,2004,manuell,158,andere,150000,3,lpg,peugeot,nein,2016-03-26 00:00:00,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,8500,control,limousine,1997,automatik,286,7er,150000,6,benzin,bmw,nein,2016-04-04 00:00:00,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,8990,test,limousine,2009,manuell,102,golf,70000,7,benzin,volkswagen,nein,2016-03-26 00:00:00,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,4350,control,kleinwagen,2007,automatik,71,fortwo,70000,6,benzin,smart,nein,2016-03-12 00:00:00,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,1350,test,kombi,2003,manuell,0,focus,150000,7,benzin,ford,nein,2016-04-01 00:00:00,39218,2016-04-01 14:38:50
5,2016-03-21 13:47:45,Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto...,7900,test,bus,2006,automatik,150,voyager,150000,4,diesel,chrysler,,2016-03-21 00:00:00,22962,2016-04-06 09:45:21
6,2016-03-20 17:55:21,VW_Golf_III_GT_Special_Electronic_Green_Metall...,300,test,limousine,1995,manuell,90,golf,150000,8,benzin,volkswagen,,2016-03-20 00:00:00,31535,2016-03-23 02:48:59
7,2016-03-16 18:55:19,Golf_IV_1.9_TDI_90PS,1990,control,limousine,1998,manuell,90,golf,150000,12,diesel,volkswagen,nein,2016-03-16 00:00:00,53474,2016-04-07 03:17:32
8,2016-03-22 16:51:34,Seat_Arosa,250,test,,2000,manuell,0,arosa,150000,10,,seat,nein,2016-03-22 00:00:00,7426,2016-03-26 18:18:10
9,2016-03-16 13:47:02,Renault_Megane_Scenic_1.6e_RT_Klimaanlage,590,control,bus,1997,manuell,90,megane,150000,7,benzin,renault,nein,2016-03-16 00:00:00,15749,2016-04-06 10:46:35


In [40]:
(autos["unrepaired_damage"].value_counts())

nein    33834
ja       4540
Name: unrepaired_damage, dtype: int64

In [41]:
corrections = {
    "nein": "no",
    "ja": "yes"}
autos["unrepaired_damage"] = autos["unrepaired_damage"].map(corrections)
(autos["unrepaired_damage"].value_counts())

no     33834
yes     4540
Name: unrepaired_damage, dtype: int64

In [42]:
(autos["fuel_type"].value_counts())

benzin     28540
diesel     14032
lpg          649
cng           71
hybrid        37
elektro       19
andere        15
Name: fuel_type, dtype: int64

In [43]:
corrections = {
    "benzin": "benzine",
    "diesel": "diesel",
    "lpg": "lpg gas",
    "cng": "gas",
    "hybrid": "hybrid",
     "elektro": "elektro",
      "andere": "other"}
autos["fuel_type"] = autos["fuel_type"].map(corrections)
(autos["fuel_type"].value_counts())

benzine    28540
diesel     14032
lpg gas      649
gas           71
hybrid        37
elektro       19
other         15
Name: fuel_type, dtype: int64

In [44]:
(autos["gearbox"].value_counts())

manuell      34715
automatik     9856
Name: gearbox, dtype: int64

In [45]:
corrections = {
    "manuell": "manual",
    "automatik": "automatic"}
autos["gearbox"] = autos["gearbox"].map(corrections)
(autos["gearbox"].value_counts())

manual       34715
automatic     9856
Name: gearbox, dtype: int64

In [46]:
(autos["vehicle_type"].value_counts())

limousine     12598
kleinwagen    10585
kombi          8930
bus            4031
cabrio         3016
coupe          2462
suv            1965
andere          390
Name: vehicle_type, dtype: int64

In [47]:
corrections = {
    "limousine": "limo",
    "kleinwagen": "hatchback",
    "kombi": "kombi",
    "bus": "bus",
    "cabrio": "cabrio",
     "coupe": "coupe",
     "suv": "suv",
      "andere": "other"}
autos["vehicle_type"] = autos["vehicle_type"].map(corrections)
(autos["vehicle_type"].value_counts())

limo         12598
hatchback    10585
kombi         8930
bus           4031
cabrio        3016
coupe         2462
suv           1965
other          390
Name: vehicle_type, dtype: int64

## Converting the dates

In [48]:
autos["date_crawled"]=autos["date_crawled"].str.strip().str[:10].str.replace("-", "").astype(int)
print(autos["date_crawled"].unique().shape)
(autos["date_crawled"].value_counts(dropna=False).sort_index())

(34,)


20160305    1176
20160306     661
20160307    1692
20160308    1566
20160309    1552
20160310    1505
20160311    1515
20160312    1719
20160313     741
20160314    1696
20160315    1604
20160316    1377
20160317    1484
20160318     598
20160319    1618
20160320    1775
20160321    1742
20160322    1533
20160323    1503
20160324    1376
20160325    1471
20160326    1497
20160327    1437
20160328    1615
20160329    1592
20160330    1578
20160331    1484
20160401    1578
20160402    1660
20160403    1810
20160404    1709
20160405     607
20160406     144
20160407      66
Name: date_crawled, dtype: int64

In [49]:
autos["ad_created"]=autos["ad_created"].str.strip().str[:10].str.replace("-", "").astype(int)
print(autos["ad_created"].unique().shape)
(autos["ad_created"].value_counts(dropna=False).sort_index())

(74,)


20150611       1
20150810       1
20150909       1
20151110       1
20151205       1
20151230       1
20160103       1
20160107       1
20160110       2
20160113       1
20160114       1
20160116       1
20160127       2
20160201       1
20160202       2
20160205       2
20160207       1
20160208       1
20160209       1
20160211       1
20160212       2
20160214       2
20160216       1
20160217       1
20160218       2
20160219       3
20160220       2
20160221       3
20160222       1
20160223       4
            ... 
20160309    1554
20160310    1492
20160311    1529
20160312    1711
20160313     802
20160314    1634
20160315    1595
20160316    1397
20160317    1469
20160318     630
20160319    1569
20160320    1777
20160321    1752
20160322    1521
20160323    1498
20160324    1374
20160325    1479
20160326    1498
20160327    1429
20160328    1621
20160329    1591
20160330    1569
20160331    1488
20160401    1579
20160402    1643
20160403    1821
20160404    1725
20160405     5

In [50]:
autos["last_seen"]=autos["last_seen"].str.strip().str[:10].str.replace("-", "").astype(int)
print(autos["last_seen"].unique().shape)
(autos["last_seen"].value_counts(dropna=False).sort_index())

(34,)


20160305       50
20160306      192
20160307      251
20160308      349
20160309      456
20160310      499
20160311      578
20160312     1109
20160313      404
20160314      591
20160315      747
20160316      760
20160317     1311
20160318      337
20160319      729
20160320      963
20160321      961
20160322      973
20160323      857
20160324      919
20160325      884
20160326      784
20160327      730
20160328      966
20160329     1031
20160330     1149
20160331     1103
20160401     1071
20160402     1151
20160403     1174
20160404     1126
20160405     5854
20160406    10425
20160407     6197
Name: last_seen, dtype: int64

In [51]:
autos

Unnamed: 0,date_crawled,name,price_$,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,postal_code,last_seen
0,20160326,Peugeot_807_160_NAVTECH_ON_BOARD,5000,control,bus,2004,manual,158,andere,150000,3,lpg gas,peugeot,no,20160326,79588,20160406
1,20160404,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,8500,control,limo,1997,automatic,286,7er,150000,6,benzine,bmw,no,20160404,71034,20160406
2,20160326,Volkswagen_Golf_1.6_United,8990,test,limo,2009,manual,102,golf,70000,7,benzine,volkswagen,no,20160326,35394,20160406
3,20160312,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,4350,control,hatchback,2007,automatic,71,fortwo,70000,6,benzine,smart,no,20160312,33729,20160315
4,20160401,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,1350,test,kombi,2003,manual,0,focus,150000,7,benzine,ford,no,20160401,39218,20160401
5,20160321,Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto...,7900,test,bus,2006,automatic,150,voyager,150000,4,diesel,chrysler,,20160321,22962,20160406
6,20160320,VW_Golf_III_GT_Special_Electronic_Green_Metall...,300,test,limo,1995,manual,90,golf,150000,8,benzine,volkswagen,,20160320,31535,20160323
7,20160316,Golf_IV_1.9_TDI_90PS,1990,control,limo,1998,manual,90,golf,150000,12,diesel,volkswagen,no,20160316,53474,20160407
8,20160322,Seat_Arosa,250,test,,2000,manual,0,arosa,150000,10,,seat,no,20160322,7426,20160326
9,20160316,Renault_Megane_Scenic_1.6e_RT_Klimaanlage,590,control,bus,1997,manual,90,megane,150000,7,benzine,renault,no,20160316,15749,20160406


## The most common brand/model combinations

In [52]:
unique_brands=autos["brand"].unique()
print(unique_brands.shape)
unique_brands

(40,)


array(['peugeot', 'bmw', 'volkswagen', 'smart', 'ford', 'chrysler',
       'seat', 'renault', 'mercedes_benz', 'audi', 'sonstige_autos',
       'opel', 'mazda', 'porsche', 'mini', 'toyota', 'dacia', 'nissan',
       'jeep', 'saab', 'volvo', 'mitsubishi', 'jaguar', 'fiat', 'skoda',
       'subaru', 'kia', 'citroen', 'chevrolet', 'hyundai', 'honda',
       'daewoo', 'suzuki', 'trabant', 'land_rover', 'alfa_romeo', 'lada',
       'rover', 'daihatsu', 'lancia'], dtype=object)

In [53]:
unique_models=autos["model"].unique()
print(unique_models.shape)
unique_models

(245,)


array(['andere', '7er', 'golf', 'fortwo', 'focus', 'voyager', 'arosa',
       'megane', nan, 'a3', 'clio', 'vectra', 'scirocco', '3er', 'a4',
       '911', 'cooper', '5er', 'polo', 'e_klasse', '2_reihe', 'c_klasse',
       'corsa', 'mondeo', 'altea', 'a1', 'twingo', 'a_klasse', 'cl',
       '3_reihe', 's_klasse', 'sandero', 'passat', 'primera', 'wrangler',
       'a6', 'transporter', 'astra', 'v40', 'ibiza', 'micra', '1er',
       'yaris', 'colt', '6_reihe', '5_reihe', 'corolla', 'ka', 'tigra',
       'punto', 'vito', 'cordoba', 'galaxy', '100', 'octavia', 'm_klasse',
       'lupo', 'fiesta', 'superb', 'meriva', 'c_max', 'laguna', 'touran',
       '1_reihe', 'm_reihe', 'touareg', 'seicento', 'avensis', 'vivaro',
       'x_reihe', 'ducato', 'carnival', 'boxster', 'signum', 'sharan',
       'zafira', 'rav', 'a5', 'beetle', 'c_reihe', 'phaeton', 'i_reihe',
       'sl', 'insignia', 'up', 'civic', '80', 'mx_reihe', 'omega',
       'sorento', 'z_reihe', 'berlingo', 'clk', 's_max', 'kalos',
 

In [54]:
top_models_by_brand=[]
for brand in unique_brands:
    for model in unique_models:
        combined=(autos["brand"]==brand) & (autos["model"]==model)
        counted=autos.loc[combined, ["brand","model"]].shape
        qty,c=counted
        if qty!=0:
            top_models_by_brand.append([brand,model,qty])  
    top_models_by_brand=sorted(top_models_by_brand,key=lambda x: (x[2]), reverse=True) 

    
(top_models_by_brand[:10])

[['volkswagen', 'golf', 3707],
 ['bmw', '3er', 2615],
 ['volkswagen', 'polo', 1609],
 ['opel', 'corsa', 1592],
 ['volkswagen', 'passat', 1349],
 ['opel', 'astra', 1348],
 ['audi', 'a4', 1231],
 ['mercedes_benz', 'c_klasse', 1136],
 ['bmw', '5er', 1132],
 ['mercedes_benz', 'e_klasse', 958]]

In [55]:
unique_brands=autos["brand"].value_counts(normalize=True)
top_brands=unique_brands[unique_brands>0.05].index
print(top_brands)

Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')


In [56]:
unique_models=autos["model"].value_counts(normalize=True)
top_models=unique_models[unique_models>0.02].index
print(top_models)

Index(['golf', 'andere', '3er', 'polo', 'corsa', 'passat', 'astra', 'a4',
       'c_klasse', '5er', 'e_klasse'],
      dtype='object')


In [57]:
rank_by_top_brands=[]
for brand in top_brands:
    all_models_by_brand=[]
    for model in top_models:
        combined=(autos["brand"]==brand) & (autos["model"]==model)
        counted=autos.loc[combined, ["brand","model"]].shape
        qty,c=counted
        if qty!=0:
            all_models_by_brand.append([brand,model,qty])  
        all_models_by_brand=sorted(all_models_by_brand,key=lambda x: (x[2]), reverse=True) 
    rank_by_top_brands.append(all_models_by_brand[0])
    
(rank_by_top_brands)

[['volkswagen', 'golf', 3707],
 ['bmw', '3er', 2615],
 ['opel', 'corsa', 1592],
 ['mercedes_benz', 'c_klasse', 1136],
 ['audi', 'a4', 1231],
 ['ford', 'andere', 190]]

## Mean price by milage

In [62]:
unique_km=autos["odometer_km"].unique()
unique_km

array([150000,  70000,  50000,  80000,  10000,  30000, 125000,  90000,
        20000,  60000,   5000,  40000, 100000])

In [68]:
range1=autos["odometer_km"]<=50000
autos.loc[range1, "odometer_km_split"]="5K-50K"
range2=(autos["odometer_km"]>50000) & (autos["odometer_km"]<=100000)
autos.loc[range2, "odometer_km_split"]="51K-100K"
range3=autos["odometer_km"]>100000
autos.loc[range3, "odometer_km_split"]="101K-150K"

unique_milage=autos["odometer_km_split"].unique()
unique_milage

array(['101K-150K', '51K-100K', '5K-50K'], dtype=object)

In [72]:
mean_price_by_milage={}
for mil in unique_milage:
    selected_rows=autos["odometer_km_split"]==mil
    mp=autos.loc[selected_rows, "price_$"].mean()
    mean_price_by_milage[mil]=int(mp)
(mean_price_by_milage)

{'101K-150K': 4107, '51K-100K': 9595, '5K-50K': 14890}

## Price by damage

In [73]:
(autos["unrepaired_damage"].value_counts())

no     33834
yes     4540
Name: unrepaired_damage, dtype: int64

In [102]:
print(autos.loc[autos["brand"].isnull(), "brand"].value_counts(dropna=False))
print(autos.loc[autos["model"].isnull(), "model"].value_counts(dropna=False))

Series([], Name: brand, dtype: int64)
NaN    2193
Name: model, dtype: int64


In [103]:
null=autos["model"].isnull()
autos.loc[null, "model"]="Unknown"
print(autos.loc[autos["model"].isnull(), "model"].value_counts(dropna=False))

Series([], Name: model, dtype: int64)


In [104]:
autos["brand/model"]=autos["brand"]+str(" ")+autos["model"]
unique_bm=autos["brand/model"].unique()
print(unique_bm.shape)
unique_bm[:10]

(330,)


array(['peugeot andere', 'bmw 7er', 'volkswagen golf', 'smart fortwo',
       'ford focus', 'chrysler voyager', 'seat arosa', 'renault megane',
       'mercedes_benz Unknown', 'audi a3'], dtype=object)

In [111]:
mean_price_by_damaged={}
mean_price_by_nondamaged={}
for bm in unique_bm:
    damaged_rows=(autos["brand/model"]==bm) & (autos["unrepaired_damage"]=="yes")
    nondamaged_rows=(autos["brand/model"]==bm) & (autos["unrepaired_damage"]=="no")
    mpd=autos.loc[damaged_rows, "price_$"].mean()
    mpnd=autos.loc[nondamaged_rows, "price_$"].mean()
    mean_price_by_damaged[bm]=mpd
    mean_price_by_nondamaged[bm]=mpnd
    
mpd_series=pd.Series(mean_price_by_damaged)
price_per_damage = pd.DataFrame(mpd_series, columns=['mean_price_damaged'])
mpnd_series=pd.Series(mean_price_by_nondamaged)
price_per_damage["mean_price_non_damaged"]=mpnd_series
price_per_damage=price_per_damage.dropna()
price_per_damage

Unnamed: 0,mean_price_damaged,mean_price_non_damaged
alfa_romeo 147,1286.846154,2722.854545
alfa_romeo 156,1190.388889,1689.516667
alfa_romeo 159,7800.000000,6659.653846
alfa_romeo Unknown,333.333333,2359.800000
alfa_romeo andere,3548.625000,8074.409091
audi 100,1166.500000,2031.725000
audi 80,641.421053,1750.887755
audi Unknown,1562.500000,4724.520833
audi a2,2333.333333,3808.848485
audi a3,2769.447761,9263.882353


In [116]:
price_per_damage["delta"]=price_per_damage["mean_price_non_damaged"]-price_per_damage["mean_price_damaged"]
price_per_damage["delta"]=price_per_damage["delta"].astype(int)
price_per_damage

Unnamed: 0,mean_price_damaged,mean_price_non_damaged,delta
alfa_romeo 147,1286.846154,2722.854545,1436
alfa_romeo 156,1190.388889,1689.516667,499
alfa_romeo 159,7800.000000,6659.653846,-1140
alfa_romeo Unknown,333.333333,2359.800000,2026
alfa_romeo andere,3548.625000,8074.409091,4525
audi 100,1166.500000,2031.725000,865
audi 80,641.421053,1750.887755,1109
audi Unknown,1562.500000,4724.520833,3162
audi a2,2333.333333,3808.848485,1475
audi a3,2769.447761,9263.882353,6494


In [128]:
print(price_per_damage["delta"].mean())
(price_per_damage.iloc[:,2]).sort_values(ascending=False)

3701.9368029739776


porsche 911               53407
land_rover range_rover    23611
land_rover defender       20764
bmw m_reihe               20071
mercedes_benz g_klasse    19581
porsche andere            18661
mercedes_benz sl          18530
audi andere               18037
jeep wrangler             14109
ford mustang              13680
land_rover discovery      13429
mercedes_benz v_klasse    11850
kia sportage              11391
ford kuga                 11232
nissan navara             10246
audi q7                   10090
mercedes_benz s_klasse     9971
sonstige_autos Unknown     9330
opel andere                9097
mercedes_benz slk          8991
bmw andere                 8722
jaguar x_type              8707
jeep grand                 8636
mercedes_benz m_klasse     8567
mitsubishi outlander       8443
porsche cayenne            8252
audi a8                    8153
jeep Unknown               7800
mini cooper                7791
opel insignia              7693
                          ...  
nissan p