![Imgur](https://image.freepik.com/free-vector/automobiles-models-icon-collection_74855-5435.jpg)

<a href='https://www.freepik.com/'>models icon @freepik.com </a>


# Exploring eBay Car Sales Data (Consolidated Version)

## Introduction

In this guided project, we'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.

The dataset was originally scraped and uploaded to Kaggle by user orgesleka. The original is no longer available on Kaggle, but you can find it [here](https://data.world/data-society/used-cars-data).

This is the short version, with 50,000 rows. Errors have been deliberately added to the dataset we'll work with so that we can practice cleaning it, which makes it a little more difficult than the original.

The aim of this project is **to clean the data** and analyze the included used car listings.


### Data dictionary:

- `dateCrawled` - When this ad was first crawled. All field-values are taken from this date.

- `name` - Name of the car.

- `seller` - Whether the seller is private or a dealer.

- `offerType` - The type of listing.

- `price` - The asking price listed in the ad.

- `abtest` - Whether the listing is included in an A/B test.

- `vehicleType` - The vehicle type.

- `yearOfRegistration` - The year in which the car was first registered.

- `gearbox` - The transmission type.

- `powerPS` - The power of the car in PS.

- `model` - The car model name.

- `kilometer` - How many kilometers the car has driven.

- `monthOfRegistration` - The month in which the car was first registered.

- `fuelType` - What type of fuel the car uses.

- `brand` - The brand of the car.

- `notRepairedDamage` - Whether the car has damage that has not yet been repaired.

- `dateCreated` - The date on which the eBay listing was created.

- `nrOfPictures` - The number of pictures in the ad.

- `postalCode` - The postal code for the location of the vehicle.

- `lastSeenOnline` - When the crawler saw this ad last online.

In [None]:
import numpy as np
import pandas as pd
import chardet
import matplotlib.pyplot as plt
%matplotlib inline

### 1. Initial Data Loading and Inspection

In [None]:
autos = pd.read_csv("autos.csv",encoding='Windows-1252')
autos.info()
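
The cell above hard-codes `Windows-1252` as the file encoding. As a minimal sketch of one way to arrive at such a choice, the snippet below tries a short list of candidate encodings on raw bytes and returns the first one that decodes cleanly. The function name `first_working_encoding`, the candidate list, and the sample bytes are illustrative assumptions, not part of the original project.

```python
def first_working_encoding(raw, candidates=("utf-8", "Windows-1252", "Latin-1")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next one
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample: German text with a euro sign, encoded as Windows-1252.
sample = "Käfer;1.500€\n".encode("Windows-1252")
detected = first_working_encoding(sample)
print(detected)
```

A clean decode does not prove the encoding is correct (`Latin-1` accepts any byte sequence), so it is still worth checking a few rows with non-ASCII characters after loading.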

### 2. Cleaning Column Names

In [None]:
autos.rename({'yearOfRegistration':'registration_year','monthOfRegistration':'registration_month',
             'notRepairedDamage':'unrepaired_damage','dateCreated':'ad_created',},axis = 1, inplace = True)
autos.rename({'dateCrawled':'date_crawled','price':'price_in_dollars',
              'offerType':'offer_type','vehicleType':'vehicle_type',
              'powerPS':'CV','fuelType':'fuel_type',
              'nrOfPictures':'nr_pictures','postalCode':'postal_code',
              'lastSeen':'last_seen',}, axis = 1, inplace = True)
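
The renames above are written out by hand. For comparison, here is a small sketch of a generic camelCase to snake_case conversion using a regular expression. The helper name `camel_to_snake` is hypothetical, and its output differs from some of the hand-picked names in this notebook (for example `registration_year` and `CV`), so treat it as a starting point rather than a replacement.

```python
import re


def camel_to_snake(name):
    """Insert '_' before each capital that follows a lowercase letter or digit, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


original_columns = ["dateCrawled", "yearOfRegistration", "powerPS", "nrOfPictures"]
converted = [camel_to_snake(col) for col in original_columns]
print(converted)
```

With a DataFrame, the same function could be passed directly to `rename`, for instance `autos.rename(columns=camel_to_snake)`.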

### 3. Initial Exploration and Dropping Irrelevant Columns

In [None]:
autos.describe(include = 'all')

In [None]:
columns_todrop = ['seller','offer_type','abtest','nr_pictures']
autos.drop(columns_todrop, axis=1, inplace=True)
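
One possible way to spot candidates for dropping is to measure how dominant the most frequent value is in each column. The sketch below does this on a made-up sample; the 0.99 threshold and the sample values are assumptions for illustration, not figures taken from the dataset.

```python
import pandas as pd

# Hypothetical sample in which two columns carry a single value.
sample = pd.DataFrame({
    "seller": ["privat", "privat", "privat", "privat"],
    "nr_pictures": [0, 0, 0, 0],
    "brand": ["bmw", "audi", "opel", "bmw"],
})

# Share of rows taken by the most common value in each column.
top_share = sample.apply(
    lambda col: col.value_counts(normalize=True, dropna=False).iloc[0]
)

# Columns where (almost) every row holds the same value add little information.
nearly_constant = top_share[top_share >= 0.99].index.tolist()
print(nearly_constant)
```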

### 4. Cleaning Numeric Columns

In [None]:
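# regex=False treats '$' as a literal character rather than the regex end-of-string anchor
autos['price_in_dollars'] = autos['price_in_dollars'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(int)
autos.rename({'odometer':'odometer_km'}, axis = 1, inplace = True)
autos['odometer_km'] = autos['odometer_km'].str.replace(',', '', regex=False).str.replace('km', '', regex=False).astype(int)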
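
To make the string-to-integer conversion easier to follow, here is a self-contained sketch on a few made-up values. The helper `strip_tokens_to_int` and the sample data are hypothetical. The sketch passes `regex=False` so that `$` is removed as a literal character regardless of the pandas version.

```python
import pandas as pd

# Hypothetical values in the same text format as the raw columns.
sample = pd.DataFrame({
    "price": ["$5,000", "$8,500", "$0"],
    "odometer": ["150,000km", "70,000km", "5,000km"],
})


def strip_tokens_to_int(series, tokens):
    """Remove each literal token from the strings, then convert to int."""
    cleaned = series
    for token in tokens:
        cleaned = cleaned.str.replace(token, "", regex=False)
    return cleaned.astype(int)


sample["price_in_dollars"] = strip_tokens_to_int(sample["price"], ["$", ","])
sample["odometer_km"] = strip_tokens_to_int(sample["odometer"], ["km", ","])
print(sample[["price_in_dollars", "odometer_km"]])
```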

### 5. Translating Categorical Columns

In [None]:
category_translator = {'bus':'monovolumen','limousine':'sedan', 'kleinwagen':'compacto','kombi':'familiar', 'coupe':'coupe','suv':'suv','cabrio':'cabrio','andere':'otros'}
autos['vehicle_type'] = autos['vehicle_type'].map(category_translator)

categorical_type_fuel_translator = {'benzin':'gasolina', 'diesel':'diesel', 'lpg':'lpg', 'cng':'cng', 'hybrid':'híbrido', 'elektro':'electrico', 'andere':'otros'}
autos['fuel_type'] = autos['fuel_type'].map(categorical_type_fuel_translator)
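
Note that `Series.map` with a dictionary returns `NaN` for any value that is not a key in the dictionary. The sketch below, using made-up values and a shortened translator, shows one way to list source values that would be lost this way so the dictionary can be extended before mapping.

```python
import pandas as pd

# Hypothetical column containing a value missing from the translator and a missing entry.
vehicle_type = pd.Series(["bus", "limousine", "pickup", None])
translator = {"bus": "monovolumen", "limousine": "sedan"}

translated = vehicle_type.map(translator)

# Values that were present in the source but have no translation.
unmapped = vehicle_type[translated.isna() & vehicle_type.notna()].unique().tolist()
print(unmapped)
```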

### 6. Outlier Analysis and Removal

In [None]:
autos = autos[autos['price_in_dollars'].between(850,350001)]
autos = autos[autos['registration_year'].between(1927,2016)]
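
`Series.between` includes both endpoints by default, so rows priced at exactly 850 or 350001 and registered in exactly 1927 or 2016 are kept. A minimal sketch on made-up rows, combining both conditions into one mask:

```python
import pandas as pd

# Hypothetical listings, including boundary values and obvious outliers.
listings = pd.DataFrame({
    "price_in_dollars": [0, 850, 12000, 350001, 99999999],
    "registration_year": [2005, 1927, 2016, 2010, 9999],
})

price_ok = listings["price_in_dollars"].between(850, 350001)
year_ok = listings["registration_year"].between(1927, 2016)

# Keep only rows that pass both range checks.
filtered = listings[price_ok & year_ok]
print(filtered)
```

Building the masks separately also makes it easy to count how many rows each condition removes, for example with `(~price_ok).sum()`.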

### 7. Analysis of Price and Mileage by Brand

In [None]:
top_brands = autos['brand'].value_counts().index[:10]
brands_price = {}
brands_km = {}

for brand in top_brands:
    sel_brand = autos[autos['brand'] == brand]
    brands_price[brand] = sel_brand['price_in_dollars'].mean().round()
    brands_km[brand] = sel_brand['odometer_km'].mean().round()

brands_price_series = pd.Series(brands_price)
brands_km_series = pd.Series(brands_km)

brand_analysis = pd.DataFrame({'mean_price': brands_price_series, 'mean_kilometers': brands_km_series})
print(brand_analysis)
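
The loop above can also be expressed with `groupby`. The following sketch is an alternative formulation on made-up data, not the notebook's own method. It uses the top 2 brands instead of the top 10 only because the sample is small.

```python
import pandas as pd

# Hypothetical listings for three brands.
cars = pd.DataFrame({
    "brand": ["bmw", "bmw", "audi", "audi", "audi", "opel"],
    "price_in_dollars": [8000, 10000, 9000, 11000, 13000, 3000],
    "odometer_km": [150000, 100000, 125000, 90000, 70000, 150000],
})

# Most frequent brands, as in the loop-based version.
top_brands = cars["brand"].value_counts().index[:2]

summary = (
    cars[cars["brand"].isin(top_brands)]
    .groupby("brand")[["price_in_dollars", "odometer_km"]]
    .mean()
    .round()
    .rename(columns={"price_in_dollars": "mean_price", "odometer_km": "mean_kilometers"})
)
print(summary)
```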