Course project for the Data Mining course at the University of Trento. The main goal of the project is to classify different kinds of customers based on a dataset of grocery transactions and a dataset of recipe ingredients.
In a typical supermarket, customers buy ingredients (food) for the recipes they plan to prepare at home. We would like to identify types of customers based on the food they eat. This is challenging for several reasons. First, the same ingredient may be used in different recipes, and the same recipe may be made with different ingredients. Furthermore, a customer may buy only some of the ingredients they need (because they may already have the rest at home), so it is not clear which recipe a person wants to make when buying a specific item. For each “kind” of customer, a basic textual description should be provided that gives a human-readable understanding of what that kind of customer is about.
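The ambiguity described above can be illustrated with a minimal sketch. It scores each recipe by the fraction of its ingredients found in a basket. The function name `recipe_coverage`, the toy recipes, and the coverage score itself are assumptions made for illustration only; they are not the method used in this project.

```python
def recipe_coverage(basket, recipes):
    """Rank recipes by the fraction of their ingredients present in the basket.

    `recipes` maps a recipe name to its list of ingredients (hypothetical toy format).
    Returns (recipe_name, coverage) pairs, highest coverage first.
    """
    bought = set(basket)
    scores = []
    for name, ingredients in recipes.items():
        needed = set(ingredients)
        if not needed:
            continue  # skip recipes with no listed ingredients
        coverage = len(bought & needed) / len(needed)
        scores.append((name, coverage))
    # Sort by descending coverage, breaking ties alphabetically
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


# Toy data invented for this example
recipes = {
    "carbonara": ["pasta", "eggs", "bacon", "cheese"],
    "omelette": ["eggs", "cheese", "butter"],
    "pancakes": ["flour", "eggs", "milk", "butter"],
}
basket = ["eggs", "cheese"]

ranked = recipe_coverage(basket, recipes)
for name, coverage in ranked:
    print(f"{name}: {coverage:.2f}")
```

A basket with only eggs and cheese partially matches all three toy recipes, which shows why a single basket rarely pins down one recipe and why partial purchases have to be taken into account.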
To identify the kinds of customers, we assume a market basket dataset and a dataset that lists the ingredients of each recipe. An example of the former can be found here:
https://www.kaggle.com/irfanasrullah/groceries
and an example of the latter here:
https://www.kaggle.com/kaggle/recipe-ingredients-dataset
These are real datasets, but it is equally important to run tests on synthetic data.
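One possible way to produce synthetic test data is sketched below. It assumes that each synthetic customer has a few favourite recipes and that every basket contains a random subset of one recipe's ingredients, which mimics customers who already have some items at home. The function `synthetic_baskets`, its parameters, and the toy recipes are hypothetical and are not taken from the project or from the datasets above.

```python
import random


def synthetic_baskets(recipes, n_customers, baskets_per_customer,
                      keep_prob=0.7, seed=0):
    """Generate (customer_id, recipe, basket) rows from a recipe dictionary.

    Assumption: each customer cooks from at most two favourite recipes, and
    each ingredient is bought with probability `keep_prob`.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    names = sorted(recipes)
    rows = []
    for customer_id in range(n_customers):
        # Hidden ground truth: the recipes this customer tends to cook
        favourites = rng.sample(names, k=min(2, len(names)))
        for _ in range(baskets_per_customer):
            recipe = rng.choice(favourites)
            # Drop some ingredients to simulate items already at home
            basket = [item for item in recipes[recipe]
                      if rng.random() < keep_prob]
            rows.append((customer_id, recipe, basket))
    return rows


# Toy data invented for this example
toy_recipes = {
    "carbonara": ["pasta", "eggs", "bacon", "cheese"],
    "omelette": ["eggs", "cheese", "butter"],
    "pancakes": ["flour", "eggs", "milk", "butter"],
}

rows = synthetic_baskets(toy_recipes, n_customers=3, baskets_per_customer=4)
for customer_id, recipe, basket in rows:
    print(customer_id, recipe, basket)
```

Because the generating recipe is stored with each basket, the output of a customer classification can be compared against a known ground truth, which is not possible with the real datasets.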