**Price Prediction Model**

This price prediction model estimates the price of a Ford car from its model, year, transmission, mileage, fuel type, tax, mpg, and engine size. Here's a concise summary of its performance:
- **Training Score**: The model achieved an R-squared of about 0.92 on the training data. Note that `regressor.score` returns R-squared for regressors, not classification accuracy, so this value describes how much of the price variance the model explains.

- **Test Score**: The score on the held-out test data is 0.918, very close to the training score. The small gap suggests the model generalizes to unseen rows without obvious overfitting.

- **R-squared Score**: Computed separately with `r2_score`, the test R-squared is also 0.918. It matches the test score exactly because both measure the same quantity: the model explains roughly 92% of the variance in price on the test set.

- **Error Metrics**:
  - Mean Absolute Error (MAE): The MAE of approximately 959.67 suggests that, on average, the model's predictions are off by this amount from the actual prices.
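  - Mean Squared Error (MSE): The MSE of about 1,866,304.23 is the average squared difference between predicted and actual prices. Squaring penalizes large misses more heavily, and the result is in squared price units.
  - Root Mean Squared Error (RMSE): The RMSE of approximately 1366.13 is the square root of the MSE, which brings the error back to the same units as the target variable. It is larger than the MAE, which indicates that some predictions miss by considerably more than the typical error.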
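
To put these figures in context, the reported values can be cross-checked against each other and against the mean price printed by `ford.describe()` further down. This is only a rough sketch: the mean price is computed over the whole dataset rather than the test split, so the relative error is approximate.

```python
import math

# Figures copied from the notebook output
mae = 959.6657616412352
mse = 1866304.228839954
rmse = 1366.127457025864
mean_price = 12279.534844  # mean of the full dataset, not only the test split

# RMSE is the square root of MSE, so the two reported values should agree
rmse_from_mse = math.sqrt(mse)

# MAE relative to the mean price gives a rough sense of scale
relative_mae = mae / mean_price
print(f"RMSE recomputed from MSE: {rmse_from_mse:.2f}")
print(f"MAE is about {relative_mae:.1%} of the mean price")
```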

In conclusion, the price prediction model performs well: it explains about 92% of the variance in price on both the training and test sets, with a typical absolute error of roughly 960 in price units. Further work could focus on reducing prediction errors, for example by tuning the model's hyperparameters.

Importing required libraries

In [27]:
import numpy as np
import pandas as pd

Reading the CSV file into a DataFrame

In [28]:
ford=pd.read_csv("ford.csv")

In [29]:
ford

,model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
0,Fiesta,2017,12000,Automatic,15944,Petrol,150,57.7,1.0
1,Focus,2018,14000,Manual,9083,Petrol,150,57.7,1.0
2,Focus,2017,13000,Manual,12456,Petrol,150,57.7,1.0
3,Fiesta,2019,17500,Manual,10460,Petrol,145,40.3,1.5
4,Fiesta,2019,16500,Automatic,1482,Petrol,145,48.7,1.0
...,...,...,...,...,...,...,...,...,...
17961,B-MAX,2017,8999,Manual,16700,Petrol,150,47.1,1.4
17962,B-MAX,2014,7499,Manual,40700,Petrol,30,57.7,1.0
17963,Focus,2015,9999,Manual,7010,Diesel,20,67.3,1.6
17964,KA,2018,8299,Manual,5007,Petrol,145,57.7,1.2


Displaying the first few rows of the dataset

In [30]:
ford.head()

,model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
0,Fiesta,2017,12000,Automatic,15944,Petrol,150,57.7,1.0
1,Focus,2018,14000,Manual,9083,Petrol,150,57.7,1.0
2,Focus,2017,13000,Manual,12456,Petrol,150,57.7,1.0
3,Fiesta,2019,17500,Manual,10460,Petrol,145,40.3,1.5
4,Fiesta,2019,16500,Automatic,1482,Petrol,145,48.7,1.0


Displaying the last few rows of the dataset

In [31]:
ford.tail()

,model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
17961,B-MAX,2017,8999,Manual,16700,Petrol,150,47.1,1.4
17962,B-MAX,2014,7499,Manual,40700,Petrol,30,57.7,1.0
17963,Focus,2015,9999,Manual,7010,Diesel,20,67.3,1.6
17964,KA,2018,8299,Manual,5007,Petrol,145,57.7,1.2
17965,Focus,2015,8299,Manual,5007,Petrol,22,57.7,1.0


Displaying summary statistics for the numeric columns

In [32]:
ford.describe()

,year,price,mileage,tax,mpg,engineSize
count,17966.0,17966.0,17966.0,17966.0,17966.0,17966.0
mean,2016.86647,12279.534844,23362.608761,113.329456,57.90698,1.350807
std,2.050336,4741.343657,19472.054349,62.012456,10.125696,0.432367
min,1996.0,495.0,1.0,0.0,20.8,0.0
25%,2016.0,8999.0,9987.0,30.0,52.3,1.0
50%,2017.0,11291.0,18242.5,145.0,58.9,1.2
75%,2018.0,15299.0,31060.0,145.0,65.7,1.5
max,2060.0,54995.0,177644.0,580.0,201.8,5.0
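
The summary shows a maximum `year` of 2060 and a minimum `engineSize` of 0.0, which may be data-entry errors worth inspecting before training. The sketch below flags such rows on a small toy frame. The cutoff `MAX_PLAUSIBLE_YEAR` is a hypothetical value chosen for illustration, and an engine size of 0.0 could be legitimate for electric vehicles, so the rule is an assumption rather than part of the original notebook.

```python
import pandas as pd

# Toy rows standing in for the real data; the values are illustrative only
toy = pd.DataFrame({
    "year": [2017, 2060, 2015],
    "engineSize": [1.0, 1.5, 0.0],
    "price": [12000, 6495, 9999],
})

MAX_PLAUSIBLE_YEAR = 2020  # hypothetical cutoff, adjust to when the data was collected

# Flag suspicious rows first so they can be reviewed before anything is dropped
suspect = (toy["year"] > MAX_PLAUSIBLE_YEAR) | (toy["engineSize"] <= 0)
clean = toy[~suspect]
print(int(suspect.sum()), "suspect rows;", len(clean), "rows kept")
```

On the real data, the same mask could be applied to `ford` to count the affected rows before deciding whether to correct or remove them.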


Displaying the shape of the dataset

In [33]:
ford.shape

(17966, 9)

Displaying the size of the dataset

In [34]:
ford.size

161694

Displaying the data type of each column

In [35]:
ford.dtypes

model            object
year              int64
price             int64
transmission     object
mileage           int64
fuelType         object
tax               int64
mpg             float64
engineSize      float64
dtype: object

Checking the null values of the dataset

In [36]:
ford.isnull().sum()

model           0
year            0
price           0
transmission    0
mileage         0
fuelType        0
tax             0
mpg             0
engineSize      0
dtype: int64

Creating a copy of the dataset

In [37]:
ford_copy=ford.copy()

In [38]:
ford_copy

,model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
0,Fiesta,2017,12000,Automatic,15944,Petrol,150,57.7,1.0
1,Focus,2018,14000,Manual,9083,Petrol,150,57.7,1.0
2,Focus,2017,13000,Manual,12456,Petrol,150,57.7,1.0
3,Fiesta,2019,17500,Manual,10460,Petrol,145,40.3,1.5
4,Fiesta,2019,16500,Automatic,1482,Petrol,145,48.7,1.0
...,...,...,...,...,...,...,...,...,...
17961,B-MAX,2017,8999,Manual,16700,Petrol,150,47.1,1.4
17962,B-MAX,2014,7499,Manual,40700,Petrol,30,57.7,1.0
17963,Focus,2015,9999,Manual,7010,Diesel,20,67.3,1.6
17964,KA,2018,8299,Manual,5007,Petrol,145,57.7,1.2


Converting categorical columns to numerical using label encoding

In [39]:
# Replace each categorical column with its integer category code
categorical_cols=["model","transmission","fuelType"]
for col in categorical_cols:
    ford_copy[col]=ford_copy[col].astype('category').cat.codes
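
To see which integer each category receives, it helps to look at the category list that pandas builds. The sketch below uses a toy series rather than the real columns: for string values pandas sorts the categories alphabetically and each code is the position in that sorted list. This is consistent with the output further down, where Diesel is encoded as 0 and Petrol as 4.

```python
import pandas as pd

fuel = pd.Series(["Petrol", "Diesel", "Petrol", "Hybrid"])  # toy values
as_category = fuel.astype("category")

# Categories are sorted alphabetically; each code is an index into that list
mapping = dict(enumerate(as_category.cat.categories))
codes = as_category.cat.codes.tolist()
print(mapping)
print(codes)
```

The codes imply an ordering that has no real meaning for these columns. Tree-based models such as gradient boosting usually tolerate this, but a linear model would likely need one-hot encoding instead.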

Separating features (x) and target (y) variables

In [40]:
x=ford_copy.drop(['price'],axis=1)

In [41]:
y=ford_copy["price"]

In [42]:
x

,model,year,transmission,mileage,fuelType,tax,mpg,engineSize
0,5,2017,0,15944,4,150,57.7,1.0
1,6,2018,1,9083,4,150,57.7,1.0
2,6,2017,1,12456,4,150,57.7,1.0
3,5,2019,1,10460,4,145,40.3,1.5
4,5,2019,0,1482,4,145,48.7,1.0
...,...,...,...,...,...,...,...,...
17961,0,2017,1,16700,4,150,47.1,1.4
17962,0,2014,1,40700,4,30,57.7,1.0
17963,6,2015,1,7010,0,20,67.3,1.6
17964,11,2018,1,5007,4,145,57.7,1.2


In [43]:
y

0        12000
1        14000
2        13000
3        17500
4        16500
         ...  
17961     8999
17962     7499
17963     9999
17964     8299
17965     8299
Name: price, Length: 17966, dtype: int64

Splitting the dataset into training and testing sets

In [44]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,random_state=40)
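
With `test_size=0.25` and 17,966 rows, the split sizes can be worked out by hand. The sketch below assumes that scikit-learn rounds a fractional test size up and assigns the remaining rows to training; checking `x_train.shape` and `x_test.shape` in the notebook would confirm this.

```python
import math

n_samples = 17966  # rows reported by ford.shape
test_size = 0.25

# Assumption: a fractional test size is rounded up, the rest goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)
```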

Initializing a Gradient Boosting Regressor

In [45]:
from sklearn.ensemble import GradientBoostingRegressor
regressor=GradientBoostingRegressor(random_state=44)

Training the regressor on the training data and calculating the training score

In [46]:
regressor.fit(x_train,y_train)
regressor.score(x_train,y_train)

0.92068924065327

Calculating the test score

In [47]:
regressor.score(x_test,y_test)

0.918454336393321

Predicting the target values on the test data

In [48]:
y_pred=regressor.predict(x_test)

Calculating and printing the R-squared score

In [49]:
from sklearn.metrics import r2_score
print(r2_score(y_test,y_pred))

0.918454336393321
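
This value is identical to the one returned by `regressor.score(x_test,y_test)` because both compute R-squared, defined as one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that formula with NumPy, using toy arrays rather than the notebook's data:

```python
import numpy as np

# Toy values for illustration; not taken from the Ford dataset
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 4))
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of `y_true`.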


Calculating and printing various error metrics

In [51]:
from sklearn import metrics
print('Mean Absolute Error=',metrics.mean_absolute_error(y_test,y_pred))
print('Mean Squared Error=',metrics.mean_squared_error(y_test,y_pred))
print('Root Mean Squared Error=',np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Mean Absolute Error= 959.6657616412352
Mean Squared Error= 1866304.228839954
Root Mean Squared Error= 1366.127457025864
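
For reference, the three metrics are simple functions of the prediction errors. The sketch below reproduces their definitions with NumPy on toy arrays, not the notebook's data, to show how they relate to one another.

```python
import numpy as np

# Toy values for illustration; not taken from the Ford dataset
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))  # average absolute error, in target units
mse = np.mean(errors ** 2)     # average squared error, in squared units
rmse = np.sqrt(mse)            # back in target units, never smaller than MAE
print(mae, mse, rmse)
```

Because squaring weights large errors more heavily, a wide gap between RMSE and MAE (as in the 1366 versus 960 reported above) points to a subset of predictions with large misses.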
