Please Click Here to run the app
This is a machine learning app that predicts the price of a used car.
The model behind the app is a Random Forest Regressor, which generally performs well on tabular data. It is an ensemble learning technique that trains "n" base learners, mainly decision trees, and combines their outputs. For regression the trees' predictions are averaged; voting is used only for classification.
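To make the averaging concrete, here is a minimal sketch using scikit-learn. The data is synthetic and the settings (25 trees, 3 features) are assumptions chosen only for illustration, not the configuration used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (an assumption, not the Car Dekho data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X, y)

# Predictions of every individual tree, shape (n_trees, n_samples).
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])

# Averaging the trees by hand reproduces the forest's own prediction.
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X[:5])

print(np.allclose(manual_average, forest_prediction))
```

Each tree sees a different bootstrap sample, so the individual predictions differ; averaging them is what reduces the variance of the final estimate.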
I used the Car Dekho dataset from Kaggle.
Here is a snippet of the dataset:
The dataset consists of various columns such as Car_name, year, Selling_Price, Present_price, etc.
Based on this dataset we need to decide what the price of a second-hand car should be. This is a regression task, i.e. we need to predict a continuous value from the data, so we should use algorithms that are well suited to regression.
This project is divided into 2 parts:
- Model Building (using Random Forest and other machine learning algorithms)
- Model Deployment using Flask on the Heroku platform
I worked on the Anaconda platform. You can download Anaconda's Individual Edition from its website.
Please create a separate environment before jumping into the code.
Please install all of the following packages using "pip install":
- scikit-learn
- pandas
- numpy
- flask
- matplotlib
- seaborn
- jsonify
- requests
Before training our model we need to clean and analyze our dataset. Let's see how we did it.
I first decided to check for null values in the dataset.
Great, the dataset does not contain any null values.
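A null check like this can be sketched with pandas as follows. The rows are made up for illustration and the column spellings follow this README, so the real CSV may differ.

```python
import pandas as pd

# Toy rows standing in for the real dataset (illustrative values only).
df = pd.DataFrame({
    "Car_name": ["car_a", "car_b", "car_c"],
    "year": [2014, 2013, 2017],
    "Selling_Price": [3.0, 4.5, 7.0],
    "Present_price": [5.5, 9.5, 10.0],
})

# Number of missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Single yes/no answer for the whole table.
has_nulls = bool(df.isnull().values.any())
print("Any nulls?", has_nulls)
```

If any count were non-zero, the affected rows would need to be dropped or imputed before training.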
Next, let's check the frequency of the categorical features in the dataset.
According to the above figure, we have to convert Seller_Type, Transmission and Fuel Type into one-hot vectors so that we can perform numerical calculations on them.
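The frequency check and the one-hot conversion can be sketched with pandas. The category values below (Petrol, Diesel, CNG and so on) and the column name Fuel_Type are assumptions for illustration; `drop_first=True` is one common choice that drops a redundant column per feature.

```python
import pandas as pd

# Toy rows with the three categorical columns (illustrative values only).
df = pd.DataFrame({
    "Present_price": [5.5, 9.5, 4.2, 7.0],
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual"],
})

categorical = ["Fuel_Type", "Seller_Type", "Transmission"]

# Frequency of each category, one column at a time.
for column in categorical:
    print(df[column].value_counts())

# One-hot encode; the first category of each column becomes the baseline.
encoded = pd.get_dummies(df, columns=categorical, drop_first=True)
print(encoded.columns.tolist())
```

Dropping the first category keeps the encoding free of perfectly correlated columns, which matters more for linear models than for trees.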
Next we need to split the data into independent and dependent features.
Since we are using a tree-based learning algorithm, we do not need feature scaling. Anyone who wants to try an algorithm that is sensitive to feature scale must first scale the features and then train the model.
Let's check which features in the dataset are important.
This clearly shows that the "Owner" feature is not at all important for training our model, so we decided to remove it from the training data, as it only increases the size without adding any value.
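One way to rank features is the `feature_importances_` attribute of a fitted tree ensemble. This is a sketch under assumptions: the README does not say which estimator produced the ranking, and the data here is synthetic, with "Owner" deliberately generated as noise.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 300

# Synthetic features; "Owner" has no influence on the target by construction.
X = pd.DataFrame({
    "Present_price": rng.uniform(1, 20, n),
    "year": rng.integers(2005, 2019, n),
    "Owner": rng.integers(0, 2, n),
})
y = 0.6 * X["Present_price"] + 0.3 * (X["year"] - 2005) + rng.normal(0, 0.3, n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; larger means the feature drove more of the splits.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# Drop the feature that contributes least.
X_reduced = X.drop(columns=["Owner"])
```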
Now that the data analysis is done, we can train our model. Let's split our data into train and test sets.
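Separating the target from the features and then splitting into train and test sets might look like the sketch below. The rows are made up, and the 80/20 ratio and `random_state` are assumptions rather than the values used in this project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: ten made-up rows.
df = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2012, 2016],
    "Present_price": [4.0, 5.5, 6.0, 7.5, 8.0, 9.5, 10.0, 11.5, 5.0, 9.0],
    "Selling_Price": [1.5, 2.5, 3.0, 4.5, 5.0, 6.5, 7.5, 9.0, 2.0, 6.0],
})

# Independent features (X) and dependent feature (y).
X = df.drop(columns=["Selling_Price"])
y = df["Selling_Price"]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```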
Note that we are using RandomizedSearchCV for hyperparameter tuning, which helps the model reach good accuracy.
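A minimal sketch of tuning a random forest with RandomizedSearchCV follows. The search space, the number of iterations, the scoring metric and the synthetic data are all assumptions for illustration; the actual grid used in this project is in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic training data (an assumption, not the Car Dekho data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(120, 4))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, size=120)

# Hypothetical search space: each candidate samples one value per key.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                          # number of random combinations tried
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=3,                              # 3-fold cross-validation
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_model = search.best_estimator_  # refitted on all of X, y
print(best_params)
```

Unlike an exhaustive grid search, only `n_iter` combinations are evaluated, which keeps the tuning time bounded as the search space grows.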
Let's see the accuracy of our model.
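For a regression model, "accuracy" is usually reported through error metrics or the R² score. The README does not state which metric was used, so the sketch below shows three common ones on made-up numbers.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true prices and predictions, for illustration only.
y_test = np.array([3.0, 5.0, 7.5, 10.0])
predictions = np.array([2.5, 5.5, 7.0, 10.5])

mae = mean_absolute_error(y_test, predictions)            # average absolute error
rmse = np.sqrt(mean_squared_error(y_test, predictions))   # penalizes large errors more
r2 = r2_score(y_test, predictions)                        # 1.0 would be a perfect fit

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2:.3f}")
```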
Let's look at the scatter plot of predicted versus real values.
It looks fine: the plotted points follow the regression line closely, which suggests the predictions capture the real values accurately.
Hence our model is trained
First we need to create the HTML file in a templates folder next to our Flask app.py file, because Flask looks for HTML files in that folder.
You can check my templates folder for that file.
Then we need to create a Flask app that captures all the inputs from the user, preprocesses them, and passes the result to the model to make a prediction.
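The capture, preprocess and predict flow can be sketched as below. This is not the project's app.py: the route name, the form field names, the feature order and the `StubModel` class are all hypothetical, and the response is JSON rather than a rendered template so that the example runs without any extra files.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class StubModel:
    """Hypothetical stand-in for the trained model, so no model file is needed."""

    def predict(self, rows):
        # Pretend the price is 60% of Present_price (first feature).
        return [round(0.6 * row[0], 2) for row in rows]


model = StubModel()


def preprocess(form):
    """Turn raw form fields into the numeric feature list the model expects."""
    fuel = form["Fuel_Type"]
    return [
        float(form["Present_price"]),
        int(form["year"]),
        1 if fuel == "Diesel" else 0,                       # Fuel_Type_Diesel
        1 if fuel == "Petrol" else 0,                       # Fuel_Type_Petrol
        1 if form["Seller_Type"] == "Individual" else 0,    # Seller_Type_Individual
        1 if form["Transmission"] == "Manual" else 0,       # Transmission_Manual
    ]


@app.route("/predict", methods=["POST"])
def predict():
    features = preprocess(request.form)
    price = model.predict([features])[0]
    return jsonify({"predicted_price": price})
```

In a real deployment the stub would be replaced by the trained model loaded from disk, and the one-hot columns must appear in the same order as during training.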
You can check my app.py file for the code.
We are using Heroku to deploy our Flask app.
Note that we need the Procfile, requirements.txt and runtime.txt files before proceeding with the deployment.
The Procfile contains the gunicorn command. Gunicorn is a Python Web Server Gateway Interface (WSGI) HTTP server, which is used to serve our Python code over HTTP.
requirements.txt is a text file containing the names of all the packages used in this project.
We also require a runtime.txt file, which tells the Heroku server which Python version to install.
After all these formalities we are ready to deploy our model: just visit the Heroku website and create the app.
The model seems to give good accuracy on new data and works without much trouble.
However, the following points could be used to develop the product further:
- Firstly, it is not available offline, which is something we might work on in future (by using some advanced cloud services).
- We may improve the accuracy when we get new data.
If anyone wants to suggest anything, please ping me on LinkedIn. It would be very helpful for me.
As I am a self-learner, my special thanks go to Krish Naik sir. He is a brilliant teacher, data scientist and author.
I also thank Andrew Ng sir, who helped me a lot in learning the fundamental mathematics behind machine learning algorithms.