# Predicting Car Prices Using the k-Nearest Neighbors Algorithm

In this project, we will predict a car's market price from its attributes using the k-nearest neighbors algorithm. The data comes from https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data and contains technical information on various cars, such as their weight or their fuel consumption in miles per gallon. The attributes are described in more detail below:

- symboling: insurance risk rating, an integer from -3 to 3 in steps of 1, where 3 means that the car is risky and -3 that it is safe.
- normalized-losses: continuous from 65 to 256, normalized value of the relative average loss payment per insured vehicle year.
- make: alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, volvo 
- fuel-type: diesel, gas. 
- aspiration: std, turbo. 
- num-of-doors: four, two. 
- body-style: hardtop, wagon, sedan, hatchback, convertible. 
- drive-wheels: 4wd, fwd, rwd. 
- engine-location: front, rear. 
- wheel-base: continuous from 86.6 to 120.9. 
- length: continuous from 141.1 to 208.1. 
- width: continuous from 60.3 to 72.3. 
- height: continuous from 47.8 to 59.8. 
- curb-weight: continuous from 1488 to 4066. 
- engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor. 
- num-of-cylinders: eight, five, four, six, three, twelve, two. 
- engine-size: continuous from 61 to 326. 
- fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi. 
- bore: continuous from 2.54 to 3.94. 
- stroke: continuous from 2.07 to 4.17. 
- compression-ratio: continuous from 7 to 23. 
- horsepower: continuous from 48 to 288. 
- peak-rpm: continuous from 4150 to 6600. 
- city-mpg: continuous from 13 to 49. 
- highway-mpg: continuous from 16 to 54. 
- price: continuous from 5118 to 45400.
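To make the idea concrete, here is a minimal from-scratch sketch of k-nearest neighbors regression in NumPy: the predicted price of a car is the average price of the k training cars closest to it, measured by Euclidean distance on numeric features. The helper name `knn_predict` and the toy values are hypothetical illustrations, not taken from the data set or from the rest of this project; in practice a library implementation such as scikit-learn's `KNeighborsRegressor` is the usual choice.

```python
import numpy as np


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: predict a target for `query` as the mean target
    of its k nearest training rows (Euclidean distance)."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training row
    distances = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Regression: average the targets of the nearest neighbors
    return train_y[nearest].mean()


# Made-up toy data: [horsepower / 100, curb-weight / 1000] -> price.
# Dividing brings both features to a similar scale, so neither one
# dominates the distance.
X = [[0.48, 1.5], [0.70, 2.0], [1.10, 2.8], [1.60, 3.5], [2.00, 4.0]]
y = [6000, 8000, 15000, 25000, 35000]

print(knn_predict(X, y, [0.75, 2.1], k=2))  # prints 7000.0 (mean of 8000 and 6000)
```

The choice of k trades off noise against smoothing: a small k follows individual cars closely, while a larger k averages over more neighbors.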

Let's import the libraries and explore the data!

## Exploring the Data Set

In [1]:
import pandas as pd
import numpy as np

In [5]:
cols = ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 
        'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 
        'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-rate', 'horsepower', 'peak-rpm', 
        'city-mpg', 'highway-mpg', 'price']
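# The raw file has no header row, so the column names are supplied via `names`.
# A forward slash keeps the relative path portable across operating systems.
cars = pd.read_csv('Data/imports-85.data.csv', names=cols)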

In [6]:
# Make sure that all columns are displayed.
pd.options.display.max_columns = 99
print(cars)

     symboling normalized-losses         make fuel-type aspiration  \
0            3                 ?  alfa-romero       gas        std   
1            3                 ?  alfa-romero       gas        std   
2            1                 ?  alfa-romero       gas        std   
3            2               164         audi       gas        std   
4            2               164         audi       gas        std   
5            2                 ?         audi       gas        std   
6            1               158         audi       gas        std   
7            1                 ?         audi       gas        std   
8            1               158         audi       gas      turbo   
9            0                 ?         audi       gas      turbo   
10           2               192          bmw       gas        std   
11           0               192          bmw       gas        std   
12           0               188          bmw       gas        std   
13           0      
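The printed rows show that missing values are encoded as the string `?` (for example in `normalized-losses`), which is also why pandas reads such columns as text rather than numbers. A minimal sketch of counting these placeholders, converting them to `NaN`, and separating the continuous features from the `price` target, assuming a tiny made-up frame in place of `cars`; the helper `split_features_target` and the constant `CONTINUOUS_COLS` are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Continuous attributes from the data description, spelled as in `cols`
CONTINUOUS_COLS = ['normalized-losses', 'wheel-base', 'length', 'width', 'height',
                   'curb-weight', 'engine-size', 'bore', 'stroke', 'compression-rate',
                   'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']


def split_features_target(df, target='price'):
    """Hypothetical helper: keep the continuous columns, turn '?' into NaN,
    convert to floats and split into features and target."""
    numeric = df[CONTINUOUS_COLS].replace('?', np.nan).astype(float)
    # A row without a price can be used neither for training nor for evaluation
    numeric = numeric.dropna(subset=[target])
    return numeric.drop(columns=[target]), numeric[target]


# Tiny made-up frame that mimics the '?' placeholders seen in the output
demo = pd.DataFrame({col: ['1', '2', '3'] for col in CONTINUOUS_COLS})
demo['make'] = ['audi', 'bmw', 'audi']  # text column, excluded from the features
demo.loc[0, 'normalized-losses'] = '?'
demo.loc[1, 'price'] = '?'

print((demo == '?').sum().sum())  # 2 placeholders in total
X, y = split_features_target(demo)
print(X.shape, y.tolist())        # (2, 14) [1.0, 3.0]
```

Note that missing feature values (here `normalized-losses` in the first row) remain `NaN` and still need handling, for example by dropping or imputing them, because k-nearest neighbors cannot compute distances involving `NaN`.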

In [7]:
# We will choose the numeric columns so that they can be used as features in our model.
# The target column will be price, since we want to predict a car's market price.
