Big-Mart-Sales-Hackathon

This hackathon is sponsored by Analytics Vidhya. The task is to build a predictive model that estimates the sales of each product at a particular store.

The data consists of a train set (8523 rows) and a test set (5681 rows). The train set contains both the input variables and the output variable; the test set contains only the inputs, and the goal is to predict its sales.

Evaluation Metric:

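The evaluation metric is the Root Mean Square Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{Predicted}_i - \mathrm{Actual}_i\right)^2}$$

Where:
N: total number of observations
Predicted: the response entered by the user
Actual: the actual values of sales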
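As a minimal sketch of how this metric can be computed, the snippet below implements RMSE in plain Python. The function name `rmse` and the sample figures are made up for illustration; they are not taken from the notebook.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Square each prediction error, average, then take the square root.
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sales figures, for illustration only.
actual_sales = [100.0, 200.0, 300.0]
predicted_sales = [110.0, 190.0, 300.0]
score = rmse(actual_sales, predicted_sales)
print(f"RMSE: {score:.3f}")
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.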

EDA Steps taken

Checking the distribution of Train and Test Data.
Checking the summary statistics for the whole dataset.
Filling the missing values in the Item_Weight column with the mean and in the Outlet_Size column with the mode.
Generating a Profile report on the whole dataset.
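The imputation step above could look roughly like the following with pandas. This is a sketch on a tiny made-up frame, not the notebook's actual code; only the column names `Item_Weight` and `Outlet_Size` come from the text.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real data.
df = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5, 8.2],
    "Outlet_Size": ["Medium", None, "Medium", "Small"],
})

# Numeric column: replace missing values with the column mean.
df["Item_Weight"] = df["Item_Weight"].fillna(df["Item_Weight"].mean())

# Categorical column: replace missing values with the most frequent value.
# mode() returns a Series because ties are possible, so take its first entry.
most_common_size = df["Outlet_Size"].mode().iloc[0]
df["Outlet_Size"] = df["Outlet_Size"].fillna(most_common_size)

print(df)
```

When train and test sets are handled separately, computing the mean and mode on the train set and reusing them for the test set keeps the two consistent.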

Feature Engineering

Standardizing the inconsistent category labels in the Item_Fat_Content variable.
Extracting the first two characters of Item_Identifier to group the items into broader categories.
Checking the distribution of variables using count plot.
Checking the unique items in the train and test set.
Mapping the Item_Identifier prefixes to their respective category names.
Label Encoding the train and test dataset.
One hot encoding the data to get dummy variables for the train and test data.
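A rough sketch of these feature engineering steps with pandas follows. The sample rows, the label variants in `fat_map`, the prefix-to-name mapping in `prefix_names`, and the new column name `Item_Category` are all assumptions made for illustration; the notebook may use different values. Label encoding is done here with pandas category codes rather than any particular encoder class.

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "Item_Identifier": ["FDA15", "DRC01", "NCD19", "FDX07"],
    "Item_Fat_Content": ["Low Fat", "reg", "LF", "Regular"],
    "Outlet_Type": ["Supermarket Type1", "Grocery Store",
                    "Supermarket Type1", "Grocery Store"],
})

# Collapse spelling variants into one label per category (assumed variants).
fat_map = {"LF": "Low Fat", "low fat": "Low Fat", "reg": "Regular"}
df["Item_Fat_Content"] = df["Item_Fat_Content"].replace(fat_map)

# Take the first two characters of the ID and map them to a readable
# category name (assumed mapping).
prefix_names = {"FD": "Food", "DR": "Drinks", "NC": "Non-Consumable"}
df["Item_Category"] = df["Item_Identifier"].str[:2].map(prefix_names)

# Label encode the two-level column as integer codes (alphabetical order).
df["Item_Fat_Content"] = df["Item_Fat_Content"].astype("category").cat.codes

# One-hot encode the remaining categorical columns into dummy variables.
encoded = pd.get_dummies(
    df.drop(columns="Item_Identifier"),
    columns=["Item_Category", "Outlet_Type"],
)
print(encoded.columns.tolist())
```

If train and test are encoded separately, a category present in only one of them produces mismatched dummy columns, so the two frames should be aligned to the same column set before modelling.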

Modelling

Linear Regression yielding an RMSE score of 9.18
Random Forest Regressor yielding an RMSE score of 39.62
Support Vector Machine yielding an RMSE score of 739.68
Gradient Boosting Algorithm yielding an RMSE score of 30.42
Decision Tree Regressor yielding an RMSE score of 30.02
AdaBoost Algorithm yielding an RMSE score of 121.48

After training 6 algorithms on the train data and evaluating them on unseen test data, Linear Regression yielded the lowest RMSE score, 9.18. RMSE measures how close the predicted values are to the observed values, so lower values indicate a better fit. Hence, we select the Linear Regression model for prediction.
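The comparison described above can be sketched with scikit-learn as follows. The data here is synthetic, the hold-out split and default hyperparameters are assumptions, and the resulting scores are unrelated to the ones reported in this README; the point is only the pattern of fitting several regressors and ranking them by RMSE.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the encoded Big Mart features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)

# Hold out part of the labelled data so every model is scored on unseen rows.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Support Vector Machine": SVR(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_val)
    # RMSE is the square root of the mean squared error.
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_val, predictions)))

# Lower RMSE is better, so the best model is the one with the smallest score.
best_model = min(rmse_by_model, key=rmse_by_model.get)
for name, score in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
print("Best:", best_model)
```

Scoring every model on the same held-out rows is what makes the RMSE values directly comparable.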
