This project examines supermarkets of different types operating in a variety of locations and seeks to identify the chief contributors to the profits these outlets earn. We intended the output to be easy to understand for shop owners and businesspeople in the retail sector. We began with an extensive exploration module that answers the basic questions a data analyst needs resolved before proceeding further. Once the inconsistencies in the dataset had been highlighted, we cleaned the data and imputed the missing values. We then visualized the data to make it easier to understand. Next, we used machine learning algorithms to predict the sales of a particular item in a supermarket outlet. Five models were used, and the code for each was further tweaked to obtain the best possible predictions. The project ends with a comparison of the algorithms employed and the identification of the best model.
The predictions of each model on the test data were submitted to the hackathon, which returned an RMSE score for each submission. These scores have been tabulated and arranged in descending order. The XGBoost and Random Forest regressors perform best, while the KNN algorithms perform worst.
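
For readers who want to reproduce this kind of comparison locally, the following is a minimal sketch of how RMSE can be computed and how models can be ranked by it. It assumes a hold-out set whose true sales values are known, which differs from the hackathon setup, where the scores were returned by the organisers. The function names `rmse` and `rank_by_rmse`, the model labels, and all numbers are hypothetical placeholders for illustration, not results from this project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_by_rmse(actual, predictions_by_model, descending=True):
    """Return (model name, RMSE) pairs sorted by RMSE.

    With descending=True the worst model (highest RMSE) comes first,
    so the best model appears at the bottom of the table.
    """
    scores = {
        name: rmse(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical toy data: three made-up models scored on three made-up sales values.
actual = [100.0, 200.0, 300.0]
predictions_by_model = {
    "model_a": [110.0, 190.0, 310.0],
    "model_b": [100.0, 200.0, 330.0],
    "model_c": [100.0, 200.0, 300.0],
}

ranking = rank_by_rmse(actual, predictions_by_model)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because RMSE measures error, a lower value indicates a better model, so in a table sorted in descending order the best-performing model is the last row.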