
deepakgupta1/Future-Datathon-Product-Recommendation


Future-Datathon-Product-Recommendation

Recommend the products that Big Bazaar customers are most likely to buy in the next month.

Future Group has built an attractive portfolio of some of the fastest growing consumer brands in India. Around 400 million customers walk into their stores each year and choose products and services supplied by over 30,000 small, medium and large entrepreneurs and manufacturers from across India.

About the data

  • Missing values: relatively few. The DOB, Gender, State, PinCode, promotion_description and PaymentUsed fields contain missing values.
  • Data quality: some PinCodes are invalid, and the same state name appears under multiple spellings.
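The README does not say how these two issues were handled. As an illustration only, here is a minimal sketch of both checks. It assumes raw values arrive as strings or floats (as they often do after a CSV load), and the alias table and function names are hypothetical. It relies on one known fact: Indian PIN codes are six digits and never start with 0.

```python
import re

# Hypothetical alias table: lowercased, whitespace-collapsed spellings mapped
# to one canonical state name. Real aliases would come from inspecting the data.
STATE_ALIASES = {
    "maharashtra": "Maharashtra",
    "maharastra": "Maharashtra",
    "tamil nadu": "Tamil Nadu",
    "tamilnadu": "Tamil Nadu",
}

# Indian PIN codes: six digits, first digit 1-9.
PIN_RE = re.compile(r"[1-9][0-9]{5}")


def normalize_state(raw):
    """Return a canonical state name, or None if the value is missing."""
    if raw is None:
        return None
    key = " ".join(str(raw).strip().lower().split())
    if not key:
        return None
    # Unknown spellings fall back to title case so they are at least consistent.
    return STATE_ALIASES.get(key, key.title())


def clean_pincode(raw):
    """Return the PIN code as a 6-character string if it looks valid, else None."""
    if raw is None:
        return None
    # CSV loaders often turn numeric columns into floats (400001 -> 400001.0).
    if isinstance(raw, float) and raw.is_integer():
        raw = int(raw)
    text = str(raw).strip()
    return text if PIN_RE.fullmatch(text) else None
```

Invalid PIN codes become `None` rather than being guessed, so they join the fields that already have missing values.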

Approach:

After some data analysis, I found that most customers buy the same products repeatedly, which is intuitive: we tend to buy the same toothpaste or our favourite biscuits every time. Customers also buy products that are popular at the time.
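One way to quantify the repeat-purchase observation is the share of purchase lines whose product the same customer had already bought on an earlier date. This is a minimal sketch, not the author's analysis code; it assumes transactions are `(customer_id, date, product_id)` tuples with ISO `YYYY-MM-DD` date strings, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def repeat_purchase_share(transactions):
    """Fraction of purchase lines whose product the customer bought on an earlier date."""
    by_customer = defaultdict(list)  # customer_id -> [(date, product_id), ...]
    for customer, date, product in transactions:
        by_customer[customer].append((date, product))

    repeats = total = 0
    for lines in by_customer.values():
        lines.sort(key=lambda line: line[0])  # chronological order
        seen = set()  # products this customer bought on earlier dates
        # Group same-day lines so a product bought twice on one day is not a "repeat".
        for _, day in groupby(lines, key=lambda line: line[0]):
            products = [product for _, product in day]
            repeats += sum(product in seen for product in products)
            total += len(products)
            seen.update(products)
    return repeats / total if total else 0.0


tx = [
    ("c1", "2017-01-05", "toothpaste"),
    ("c1", "2017-02-03", "toothpaste"),
    ("c1", "2017-02-03", "biscuits"),
    ("c2", "2017-01-10", "rice"),
    ("c2", "2017-03-01", "rice"),
]
print(repeat_purchase_share(tx))  # 2 repeats out of 5 lines -> 0.4
```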

Finally, I settled on the following heuristics for recommendation:

  • A customer who has not shopped in 2016 or 2017 will most probably not shop in the coming month, so no recommendations are made for such customers.
  • A customer who has not shopped in 2017 (but has shopped before) is recommended the 3 most popular products.
  • A customer who has shopped in at least 5 months of 2017, averaging at least 35 products per month, is recommended his/her 15 most-bought products, in decreasing order of the number of times each was bought.
  • A customer who has shopped in at least 5 months of 2017, averaging at least 25 products per month, is recommended his/her 10 most-bought products, in decreasing order of the number of times each was bought.
  • A customer who has shopped in at least 4 months of 2017, averaging at least 10 products per month, is recommended his/her 8 most-bought products, in decreasing order of the number of times each was bought.

Any of the 20 recommendation slots not filled by these rules are set to ‘None’.
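The rules above can be sketched as follows. This is a hypothetical reconstruction, not the repository's code, and several details are assumptions the README leaves open: transactions are `(customer_id, 'YYYY-MM-DD', product_id)` tuples; "products per month" counts purchase lines averaged over the 2017 months in which the customer shopped; "most-bought" counts over the customer's full history; "popular" means most bought across all customers in the latest month in the data; and customers who shopped in 2017 but meet no tier get an all-‘None’ list.

```python
from collections import Counter, defaultdict

SLOTS = 20  # every customer gets a 20-item list, padded with None

# (min active months in 2017, min avg purchase lines per active month, list size);
# checked top-down, so the first tier that matches wins.
TIERS = [(5, 35, 15), (5, 25, 10), (4, 10, 8)]


def pad(items):
    items = list(items)[:SLOTS]
    return items + [None] * (SLOTS - len(items))


def recommend(transactions):
    """Map each customer_id to a 20-slot recommendation list."""
    history = defaultdict(list)  # customer_id -> [(YYYY-MM, product_id), ...]
    for customer, date, product in transactions:
        history[customer].append((date[:7], product))
    if not history:
        return {}

    # Assumption: "popular around that time" = most bought in the latest month.
    latest = max(month for lines in history.values() for month, _ in lines)
    popular = Counter(p for lines in history.values() for m, p in lines if m == latest)
    top3 = [p for p, _ in popular.most_common(3)]

    recs = {}
    for customer, lines in history.items():
        years = {month[:4] for month, _ in lines}
        lines_2017 = [(m, p) for m, p in lines if m.startswith("2017")]
        if not years & {"2016", "2017"}:
            recs[customer] = pad([])  # inactive: no recommendations
            continue
        if not lines_2017:
            recs[customer] = pad(top3)  # lapsed since 2016: popular products
            continue
        months = len({m for m, _ in lines_2017})
        avg = len(lines_2017) / months
        counts = Counter(p for _, p in lines)
        chosen = []
        for min_months, min_avg, size in TIERS:
            if months >= min_months and avg >= min_avg:
                chosen = [p for p, _ in counts.most_common(size)]
                break
        recs[customer] = pad(chosen)
    return recs
```

Checking the tiers from strictest to loosest mirrors the order of the list, so a heavy shopper who qualifies for 15 products is never cut down to 10 or 8.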

I also experimented with a machine learning approach, collaborative filtering, but it took too long to run and scored lower, so I eventually went ahead with the heuristics.