# Regression

## Second-Hand Car Prices

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import math

In [2]:
df = pd.read_csv(r'\Users\97254\Desktop\Second Hand Cars.csv')
df

Unnamed: 0,v.id,on road old,on road now,years,km,rating,condition,economy,top speed,hp,torque,current price
0,1,535651,798186,3,78945,1,2,14,177,73,123,351318.0
1,2,591911,861056,6,117220,5,9,9,148,74,95,285001.5
2,3,686990,770762,2,132538,2,8,15,181,53,97,215386.0
3,4,573999,722381,4,101065,4,3,11,197,54,116,244295.5
4,5,691388,811335,6,61559,3,9,12,160,53,105,531114.5
...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,633238,743850,5,125092,1,6,11,171,95,97,190744.0
996,997,599626,848195,4,83370,2,9,14,161,101,120,419748.0
997,998,646344,842733,7,86722,1,8,9,196,113,89,405871.0
998,999,535559,732439,2,140478,4,5,9,184,112,128,74398.0


### Data Dictionary Of The Various Features
**v.id** - the vehicle's ID (a running index)

 <font size="1">'on the road price' covers everything you’ll have to pay to get your new car on the road. It includes the car’s list price, registration and delivery fees, a year’s road tax, etc.</font>

**on road old** - 'on the road price' when the car was new

**on road now** - 'on the road price' right now

**years** - how many years the car has been in use (2-7)

**km** - the number of kilometers the vehicle has traveled

**rating** - a score given to the vehicle for its evaluation (1-5)

**condition** - a score given to the vehicle that represents its condition (1-10)

**economy** - a score given to the vehicle that represents its fuel economy (8-15)

**top speed** - the highest speed the vehicle reaches

**hp** - the vehicle's horsepower

**torque** - the vehicle's torque

**current price** - the price of the vehicle
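
The ranges listed in the data dictionary can be checked against the data. The following is a minimal sketch, assuming the first five rows shown in the output above as a stand-in for the full `df`; the names `sample`, `expected_ranges` and `out_of_range_counts` are hypothetical and do not appear in the original notebook.

```python
import pandas as pd

# First five rows of the bounded features, copied from the output above
# (a small stand-in for the full DataFrame, used only for illustration).
sample = pd.DataFrame({
    "years": [3, 6, 2, 4, 6],
    "rating": [1, 5, 2, 4, 3],
    "condition": [2, 9, 8, 3, 9],
    "economy": [14, 9, 15, 11, 12],
})

# Inclusive (min, max) range for each bounded feature, as stated in the data dictionary.
expected_ranges = {
    "years": (2, 7),
    "rating": (1, 5),
    "condition": (1, 10),
    "economy": (8, 15),
}

def out_of_range_counts(frame, ranges):
    """Return, per column, how many values fall outside the inclusive range."""
    return {
        col: int((~frame[col].between(low, high)).sum())
        for col, (low, high) in ranges.items()
    }

violations = out_of_range_counts(sample, expected_ranges)
print(violations)
```

On the real data, the same function could be called as `out_of_range_counts(df, expected_ranges)`; a non-zero count would point to values that contradict the documented range.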

As we can see, we do not need the 'v.id' column, since the DataFrame already has an automatic index.<br> We'll remove this column.
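
Before a column like this is dropped, its redundancy can be confirmed rather than assumed. The sketch below assumes, based on the rows displayed above, that 'v.id' is always the default index plus one; `sample` and `is_redundant` are hypothetical names, and only the first five rows are reproduced.

```python
import pandas as pd

# First five rows of 'v.id' and 'km', copied from the output above.
sample = pd.DataFrame({
    "v.id": [1, 2, 3, 4, 5],
    "km": [78945, 117220, 132538, 101065, 61559],
})

# 'v.id' carries no extra information if it always equals the default index + 1.
is_redundant = (sample["v.id"].to_numpy() == sample.index.to_numpy() + 1).all()
print(is_redundant)
```

If the check fails on the full data, the column may encode something other than row order and would deserve a closer look before removal.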

In [3]:
df = df.drop(columns = ['v.id'])
df.head()

Unnamed: 0,on road old,on road now,years,km,rating,condition,economy,top speed,hp,torque,current price
0,535651,798186,3,78945,1,2,14,177,73,123,351318.0
1,591911,861056,6,117220,5,9,9,148,74,95,285001.5
2,686990,770762,2,132538,2,8,15,181,53,97,215386.0
3,573999,722381,4,101065,4,3,11,197,54,116,244295.5
4,691388,811335,6,61559,3,9,12,160,53,105,531114.5


In [4]:
df.shape

(1000, 11)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   on road old    1000 non-null   int64  
 1   on road now    1000 non-null   int64  
 2   years          1000 non-null   int64  
 3   km             1000 non-null   int64  
 4   rating         1000 non-null   int64  
 5   condition      1000 non-null   int64  
 6   economy        1000 non-null   int64  
 7   top speed      1000 non-null   int64  
 8   hp             1000 non-null   int64  
 9   torque         1000 non-null   int64  
 10  current price  1000 non-null   float64
dtypes: float64(1), int64(10)
memory usage: 86.1 KB


`df.info()` reports 1000 non-null values in each of the 11 columns, so there are no NaN values in this dataset. All columns are numeric, so the dataset is clean and ready to work with.
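
The absence of missing values can also be verified explicitly. This is a minimal sketch using the first five rows of two columns from the output above; `sample`, `missing_per_column`, `total_missing` and `duplicate_rows` are hypothetical names, and the duplicate-row check is an additional cleanliness test that the original notebook does not perform.

```python
import pandas as pd

# First five rows of two columns, copied from the output above.
sample = pd.DataFrame({
    "km": [78945, 117220, 132538, 101065, 61559],
    "current price": [351318.0, 285001.5, 215386.0, 244295.5, 531114.5],
})

# Count missing values per column, then in total.
missing_per_column = sample.isna().sum()
total_missing = int(missing_per_column.sum())

# Count rows that exactly repeat an earlier row (an extra check, not in the original).
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print("total missing:", total_missing, "| duplicate rows:", duplicate_rows)
```

Replacing `sample` with `df` would run the same checks on the full dataset.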

In [6]:
df.describe()

Unnamed: 0,on road old,on road now,years,km,rating,condition,economy,top speed,hp,torque,current price
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,601648.286,799131.397,4.561,100274.43,2.988,5.592,11.625,166.893,84.546,103.423,308520.2425
std,58407.246204,57028.9502,1.719079,29150.463233,1.402791,2.824449,2.230549,19.28838,20.51694,21.058716,126073.25915
min,500265.0,700018.0,2.0,50324.0,1.0,1.0,8.0,135.0,50.0,68.0,28226.5
25%,548860.5,750997.75,3.0,74367.5,2.0,3.0,10.0,150.0,67.0,85.0,206871.75
50%,601568.0,798168.0,5.0,100139.5,3.0,6.0,12.0,166.0,84.0,104.0,306717.75
75%,652267.25,847563.25,6.0,125048.0,4.0,8.0,13.0,184.0,102.0,121.0,414260.875
max,699859.0,899797.0,7.0,149902.0,5.0,10.0,15.0,200.0,120.0,140.0,584267.5
