# Ford Focus Car Price Prediction: RMSE and R²

1. [Import libraries](#libraries)
2. [Load data](#load_data)
3. [Preprocessing data](#preprocessing)
4. [Build model](#build_model)
5. [Evaluation](#evaluation)

<a id='libraries'></a>
## Import libraries

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

<a id='load_data'></a>
## Load data

In [None]:
df = pd.read_csv("../input/used-car-dataset-ford-and-mercedes/focus.csv")
df.head()

* **model**: car model name
* **year**: production year
* **price**: car price in £
* **transmission**: transmission type
* **mileage**: distance driven, in miles
* **fuelType**: fuel type
* **engineSize**: engine size in litres

<a id='preprocessing'></a>
## Preprocessing data

In [None]:
sns.heatmap(df.isnull())

**The data has no missing (NA) values.**
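A numeric check complements the heatmap: `isnull().sum()` counts the missing values in each column, so an all-zero result confirms there are no NAs. A minimal sketch on a small hypothetical frame (`df_demo` and its values are made up for illustration, with one value deliberately left missing):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for focus.csv; one price is missing on purpose.
df_demo = pd.DataFrame({
    "year": [2016, 2017, 2019],
    "price": [8999, np.nan, 14500],
    "fuelType": ["Petrol", "Diesel", "Petrol"],
})

# Count missing values per column; all zeros would mean the frame has no NA values.
na_counts = df_demo.isnull().sum()
print(na_counts)

has_missing = bool(na_counts.any())
print("Any missing values:", has_missing)
```

On the real data, `df.isnull().sum()` gives the same per-column summary without needing to read colours off a plot.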

In [None]:
df['model'].value_counts()

In [None]:
df.drop('model', axis=1, inplace=True)  # every row is the same car model (Focus), so this column carries no information
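One way to confirm that a column is constant before dropping it is to count its distinct values with `nunique()`. The sketch below uses a hypothetical `df_demo`; on the real data, `df['model'].nunique() == 1` expresses the same check.

```python
import pandas as pd

# Hypothetical toy frame: every row is the same car model, as assumed for focus.csv.
df_demo = pd.DataFrame({
    "model": ["Focus", "Focus", "Focus"],
    "price": [9000, 11000, 12500],
})

# A column with a single distinct value cannot help the regression, so it can be dropped.
constant_cols = [col for col in df_demo.columns if df_demo[col].nunique() == 1]
print(constant_cols)  # ['model']

df_demo = df_demo.drop(columns=constant_cols)
print(list(df_demo.columns))  # ['price']
```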

In [None]:
# Box plots of the numeric columns, used to spot outliers before filtering
f, axes = plt.subplots(2, 2, figsize=(18, 15))
sns.boxplot(y='year', data=df, ax=axes[0][0])
sns.boxplot(y='price', data=df, ax=axes[0][1])
sns.boxplot(y='mileage', data=df, ax=axes[1][0])
sns.boxplot(y='engineSize', data=df, ax=axes[1][1])

In [None]:
df.drop(df[df.engineSize < 0.5].index, inplace=True)  # remove rows with an implausible engine size (e.g. recorded as 0)

In [None]:
# Drop outliers, using thresholds read off the box plots
df.drop(df[df.year < 2014].index, inplace=True)
df.drop(df[df.price > 27000].index, inplace=True)
df.drop(df[df.mileage > 60000].index, inplace=True)
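The thresholds above are chosen by eye. A common systematic alternative (not what this notebook does) is Tukey's rule: keep values within 1.5 × IQR of the first and third quartiles, which matches the default whisker length of seaborn's box plot (`whis=1.5`). A minimal sketch, where `iqr_bounds` is a hypothetical helper and the prices are made-up values with one deliberate outlier:

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the k * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical prices; 60000 is a deliberate outlier.
prices = pd.Series([9000, 10000, 11000, 12000, 13000, 60000])
low, high = iqr_bounds(prices)

# Keep only the values that fall inside the fences.
kept = prices[(prices >= low) & (prices <= high)]
print(low, high)
print(kept.tolist())
```

Applied to the notebook, something like `low, high = iqr_bounds(df['price'])` would replace the hand-picked `27000` cut-off; whether that is better depends on how skewed each column is.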

In [None]:
df['transmission'].value_counts()

In [None]:
# Ordinal encoding: Manual -> 1, Automatic -> 0, any other value (e.g. Semi-Auto) -> 0.5
transm = [1 if i == 'Manual' else 0 if i == 'Automatic' else 0.5 for i in df['transmission']]
df['transmission'] = transm
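The list comprehension can also be written with `Series.map` plus a fallback. The sketch below checks that both give the same result on hypothetical values; it assumes "Semi-Auto" is the third transmission category, which the `value_counts()` output above would confirm. Note that this encoding tells the linear model that Semi-Auto lies exactly halfway between Automatic and Manual.

```python
import pandas as pd

# Hypothetical transmission values; "Semi-Auto" falls into the 0.5 fallback.
transmission = pd.Series(["Manual", "Automatic", "Semi-Auto", "Manual"])

# Notebook logic: list comprehension with a 0.5 fallback for any other value.
encoded_list = [1 if i == 'Manual' else 0 if i == 'Automatic' else 0.5 for i in transmission]

# Same mapping with Series.map; unmapped categories become NaN, which are filled with 0.5.
encoded_map = transmission.map({"Manual": 1.0, "Automatic": 0.0}).fillna(0.5)

print(encoded_map.tolist())  # [1.0, 0.0, 0.5, 1.0]
```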

In [None]:
df['fuelType'].value_counts()

In [None]:
# Binary encoding: Petrol -> 1, every other fuel type -> 0
fuelt = [1 if i == 'Petrol' else 0 for i in df['fuelType']]
df['fuelType'] = fuelt
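The manual encodings above impose an order on transmission and merge every non-petrol fuel into one group. One-hot encoding with `pd.get_dummies` is a common alternative that avoids both assumptions. A minimal sketch on a hypothetical frame holding the raw categorical values (before the notebook's manual encoding):

```python
import pandas as pd

# Hypothetical raw categorical columns, before any manual encoding.
df_demo = pd.DataFrame({
    "transmission": ["Manual", "Automatic", "Semi-Auto"],
    "fuelType": ["Petrol", "Diesel", "Petrol"],
    "mileage": [12000, 30000, 8000],
})

# One 0/1 column per category; drop_first removes one redundant dummy per original column.
encoded = pd.get_dummies(
    df_demo, columns=["transmission", "fuelType"], drop_first=True, dtype=int
)
print(encoded)
print(list(encoded.columns))
```

With `drop_first=True` the alphabetically first category (here `Automatic` and `Diesel`) becomes the baseline absorbed by the regression intercept.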

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop('price', axis=1)
y = df['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
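With `test_size=0.30`, scikit-learn reserves 30% of the rows (rounded up) for testing, and `random_state=42` makes the shuffle reproducible. A small sketch on a hypothetical 10-row dataset shows the resulting sizes and that features and targets stay aligned:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row dataset with two features.
X = pd.DataFrame({"year": np.arange(2010, 2020), "mileage": np.arange(10) * 1000})
y = pd.Series(np.arange(10) * 500 + 5000, name="price")

# 30% of 10 rows -> 3 test rows, 7 training rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
print(len(X_train), len(X_test))  # 7 3
```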

<a id='build_model'></a>
## Build model

In [None]:
from sklearn.linear_model import LinearRegression

slr = LinearRegression()
slr.fit(X_train, y_train)
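`LinearRegression` fits an intercept plus one coefficient per feature by ordinary least squares. The sketch below, on hypothetical noise-free data generated from `price = 1000 + 500*x1 - 2*x2`, shows that it recovers those parameters and agrees with solving the same least-squares problem directly with `np.linalg.lstsq`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: price = 1000 + 500*x1 - 2*x2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 1000 + 500 * X[:, 0] - 2 * X[:, 1]

# Ordinary least squares via scikit-learn.
slr = LinearRegression().fit(X, y)
print(slr.intercept_, slr.coef_)

# Same problem solved directly: prepend a column of ones for the intercept.
design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)  # [intercept, coef_1, coef_2]
```

In the notebook, `pd.Series(slr.coef_, index=X_train.columns)` would label each fitted coefficient with its feature name.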

<a id='evaluation'></a>
## Evaluation

In [None]:
y_pred = slr.predict(X_test)

In [None]:
print(f'Root Mean Squared Error = {np.sqrt(mean_squared_error(y_test, y_pred))}')
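Taking the square root of the mean squared error gives the RMSE, $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, which is in the same unit as the price (£). A small check on hypothetical values where every prediction is off by exactly £1000:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted prices in £; each prediction misses by 1000.
y_true = np.array([10000.0, 12000.0, 15000.0, 9000.0])
y_hat = np.array([11000.0, 11000.0, 14000.0, 10000.0])

# RMSE computed by hand and via scikit-learn's MSE.
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
print(rmse_manual, rmse_sklearn)  # 1000.0 1000.0
```

Recent scikit-learn releases (1.4 and later) also provide `root_mean_squared_error`, but `np.sqrt(mean_squared_error(...))` works across versions.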

In [None]:
print(f'R-Squared = {r2_score(y_test,y_pred)}')
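R² compares the model's squared error with that of always predicting the mean: $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. A value of 1 is a perfect fit and 0 is no better than the mean. A hand computation on hypothetical values, checked against `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true and predicted prices in £.
y_true = np.array([10000.0, 12000.0, 15000.0, 9000.0])
y_hat = np.array([11000.0, 11000.0, 14000.0, 10000.0])

# Residual sum of squares vs. total sum of squares around the mean of y_true.
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_hat))
```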