bvantuan/Predict-Future-Sales

Predict Future Sales

This repository contains my final project for the online course "How to Win a Data Science Competition: Learn from Top Kagglers" (Coursera), offered by the National Research University Higher School of Economics (HSE). The project works with a challenging time-series dataset of daily sales data, kindly provided by 1C Company, one of the largest Russian software firms. The goal is to predict total sales for every product and store in the next month. The competition is hosted on the Kaggle platform.

Overview

The task is to forecast the total number of units of each product sold in each shop in the test set.

Architecture

  • STEP 1 : Perform target encoding and create lag-based features
  • STEP 2 : Tune the hyperparameters of gradient boosted trees (LightGBM)
  • STEP 3 : Ensemble a linear model and gradient boosted trees, both by simple averaging and by stacking
  • STEP 4 : Produce the final submission file
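STEP 1 can be illustrated with a minimal pandas sketch. It is not the code from DataProcess.ipynb: the toy table, the helper names `add_lag` and `add_lagged_item_mean`, and the column names (`date_block_num`, `shop_id`, `item_id`, `item_cnt_month`) are assumptions made for illustration. The idea shown is that both the lag feature and the target (mean) encoding are taken from earlier months only, so the current month's target does not leak into its own features.

```python
import pandas as pd

# Toy monthly table; the column names are assumptions, not taken from this README.
monthly = pd.DataFrame({
    "date_block_num": [0, 0, 1, 1, 2, 2],
    "shop_id":        [5, 5, 5, 5, 5, 5],
    "item_id":        [10, 11, 10, 11, 10, 11],
    "item_cnt_month": [3.0, 1.0, 4.0, 0.0, 2.0, 5.0],
})

KEYS = ["date_block_num", "shop_id", "item_id"]


def add_lag(df, lag, col="item_cnt_month"):
    """Attach the value of `col` from `lag` months earlier for the same shop and item."""
    shifted = df[KEYS + [col]].copy()
    shifted["date_block_num"] += lag  # move month t-lag onto month t
    shifted = shifted.rename(columns={col: f"{col}_lag_{lag}"})
    return df.merge(shifted, on=KEYS, how="left")


def add_lagged_item_mean(df, lag=1, col="item_cnt_month"):
    """Mean-encode item_id per month, then shift by `lag` months to avoid target leakage."""
    enc = (
        df.groupby(["date_block_num", "item_id"])[col]
        .mean()
        .reset_index()
        .rename(columns={col: f"item_mean_lag_{lag}"})
    )
    enc["date_block_num"] += lag
    return df.merge(enc, on=["date_block_num", "item_id"], how="left")


features = add_lagged_item_mean(add_lag(monthly, lag=1), lag=1)
print(features)
```

Rows in the first month have no history, so their lag columns are missing (NaN); how those gaps are filled is a separate modelling choice.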

Dataset

  • sales_train.csv - the training set. Daily historical data from January 2013 to October 2015.
  • test.csv - the test set. You need to forecast the sales for these shops and products for November 2015.
  • sample_submission.csv - a sample submission file in the correct format.
  • items.csv - supplemental information about the items/products.
  • item_categories.csv - supplemental information about the item categories.
  • shops.csv - supplemental information about the shops.
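Because the training data is daily while the forecast is monthly, a natural first step is to aggregate daily counts into monthly totals per shop and item. The sketch below uses a small in-memory table rather than the real sales_train.csv; the column names (`date_block_num` as a month index, `shop_id`, `item_id`, `item_cnt_day`) are assumed to match the competition files and are not stated in this README.

```python
import pandas as pd

# Toy stand-in for sales_train.csv; column names are assumptions.
daily = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id":        [5, 5, 5, 5],
    "item_id":        [10, 10, 11, 10],
    "item_cnt_day":   [1.0, 2.0, 1.0, 4.0],
})

# Sum the daily counts within each (month, shop, item) group.
monthly = (
    daily.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})
)
print(monthly)
```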

Results

The final solution is optimized for root-mean-square error (RMSE).

Entry          Public Score
My result      0.94099
Top 1 Kaggle   0.75368
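For reference, RMSE is the square root of the mean squared difference between predictions and true values, so lower scores are better. A minimal NumPy sketch, with a hypothetical helper name `rmse` and made-up numbers:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only, not competition data.
score = rmse([0, 1, 2, 3], [0, 1, 2, 5])
print(score)
```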

Repository Files

  • EDA.ipynb : Exploratory data analysis of the dataset
  • DataProcess.ipynb : Feature engineering
  • Model.ipynb : Stacking of a linear model and gradient boosted trees
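The two ensembling ideas named above, simple averaging and stacking, can be sketched with NumPy alone. This is not the code from Model.ipynb: the prediction arrays are hypothetical, and the meta-model here is an ordinary least-squares fit chosen for brevity. In practice the meta-model would be fit on held-out (for example out-of-fold) predictions and then applied to the base models' test predictions.

```python
import numpy as np

# Hypothetical validation targets and base-model predictions.
y_val       = np.array([1.0, 0.0, 2.0, 3.0])
pred_linear = np.array([0.8, 0.3, 1.5, 2.4])  # stands in for the linear model
pred_gbt    = np.array([1.1, 0.1, 2.2, 3.3])  # stands in for the gradient boosted trees

# Simple averaging: equal weight for each base model.
avg_pred = (pred_linear + pred_gbt) / 2.0

# Stacking: fit a linear meta-model (with intercept) on the base predictions.
X_meta = np.column_stack([pred_linear, pred_gbt, np.ones_like(y_val)])
weights, *_ = np.linalg.lstsq(X_meta, y_val, rcond=None)
stacked_pred = X_meta @ weights

mse_avg = float(np.mean((y_val - avg_pred) ** 2))
mse_stacked = float(np.mean((y_val - stacked_pred) ** 2))
print(weights, mse_avg, mse_stacked)
```

On the data it was fit to, the stacked fit can never be worse than the plain average, since equal weights are one of the candidate solutions; whether it generalizes better has to be checked on data the meta-model has not seen.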
