Anticipatory customer order prediction after the purchase of one or more items.
Input folder: input/input
A text file in which each line holds a customer name followed by the IDs of all products bought by that customer, in purchase order, in the following format:
Mary 34 56 29 12 34 56 92 29 34 12
Kelly 92 29 12 34 79 29 56 12 34 18
For each product, the program computes the probability that each other product is bought after it, before the same product is bought again. The outputs are described in more detail in the Programming Approaches section.
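The rule behind these probabilities can be pinned down with a small sequential sketch that does not use Hadoop at all. It assumes that the neighbors of one purchase of product u are the products bought after it, up to but not including that customer's next purchase of u. This reading reproduces the sample outputs shown below: for product 12, product 34 appears 4 times among 11 neighbor purchases, giving 4/11 ≈ 0.3636. The function names are hypothetical, not taken from the project.

```python
from collections import Counter, defaultdict


def neighbors(products, i):
    """Products bought after products[i] and before the same product is bought again."""
    u = products[i]
    window = []
    for v in products[i + 1:]:
        if v == u:  # stop at the next purchase of the same product
            break
        window.append(v)
    return window


def crystal_ball(lines):
    """Return {u: {v: P(v | u)}} computed sequentially, without MapReduce."""
    counts = defaultdict(Counter)
    for line in lines:
        products = line.split()[1:]  # drop the customer name
        for i, u in enumerate(products):
            counts[u].update(neighbors(products, i))
    result = {}
    for u, stripe in counts.items():
        total = sum(stripe.values())
        if total:  # skip products never followed by any other purchase
            result[u] = {v: c / total for v, c in stripe.items()}
    return result


if __name__ == "__main__":
    sample = [
        "Mary 34 56 29 12 34 56 92 29 34 12",
        "Kelly 92 29 12 34 79 29 56 12 34 18",
    ]
    for v, p in sorted(crystal_ball(sample)["12"].items()):
        print(v, p)
```

The three MapReduce approaches below differ only in how these counts and totals are distributed across mappers and reducers.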
The Pairs approach uses the in-mapper combining pattern. Both the mapper and the reducer emit pairs.
Output Folder: output/CrystalBallPair
Output: a part file containing “pairs” of product IDs, each followed by the relative frequency with which the second product was bought after the first and before the first product was bought again. Each line has the following format:
[12, 18] 0.09090909090909091
[12, 29] 0.18181818181818182
[12, 34] 0.36363636363636365
... and so on.
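A single-process Python simulation of the Pairs approach may help. In-mapper combining is shown by buffering counts in a dictionary for the whole map task and emitting them once at the end, as a Hadoop mapper would in `cleanup()`. To turn counts into relative frequencies, the sketch assumes the order-inversion pattern: the mapper also emits a special marginal key `(u, "*")` that sorts before every `(u, v)` key, so the reducer knows the total before it sees the individual pairs. The `"*"` key, the function names and the in-memory shuffle are illustrative assumptions, not the project's code; in Hadoop a custom partitioner on `u` would also be needed so that all keys for one product reach the same reducer.

```python
from collections import Counter, defaultdict

MARGINAL = "*"  # hypothetical special key; "*" sorts before digit strings


def neighbors(products, i):
    """Products bought after products[i] and before the same product is bought again."""
    window = []
    for v in products[i + 1:]:
        if v == products[i]:
            break
        window.append(v)
    return window


def pairs_mapper(split):
    """One map task with in-mapper combining: buffer counts, emit once at the end."""
    buffer = Counter()
    for line in split:
        products = line.split()[1:]  # drop the customer name
        for i, u in enumerate(products):
            for v in neighbors(products, i):
                buffer[(u, v)] += 1
                buffer[(u, MARGINAL)] += 1  # running marginal count for u
    yield from buffer.items()  # what Hadoop's cleanup() would emit


def pairs_reducer(sorted_groups):
    """Receives ((u, v), [counts]) in key order; (u, "*") arrives before any (u, v)."""
    marginal = 0
    for (u, v), counts in sorted_groups:
        total = sum(counts)
        if v == MARGINAL:
            marginal = total
        else:
            yield (u, v), total / marginal


def run_pairs(splits):
    """Simulate map, shuffle/sort and a single reducer in memory."""
    shuffled = defaultdict(list)
    for split in splits:
        for key, count in pairs_mapper(split):
            shuffled[key].append(count)
    return list(pairs_reducer(sorted(shuffled.items())))


if __name__ == "__main__":
    splits = [["Mary 34 56 29 12 34 56 92 29 34 12"],
              ["Kelly 92 29 12 34 79 29 56 12 34 18"]]
    for (u, v), p in run_pairs(splits):
        print(f"[{u}, {v}] {p}")
```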
In the Stripes approach, both the mapper and the reducer emit stripes.
Output Folder: output/CrystalBallStripe
Output: a part file containing, for each product ID, a “stripe” of the products bought after it and before it was bought again, each with its relative frequency. Each line has the following format:
12 {(56, 0.18181818181818182), (92, 0.09090909090909091), (34, 0.36363636363636365), (18, 0.09090909090909091), (79, 0.09090909090909091), (29, 0.18181818181818182), }
29 {(56, 0.15384615384615385), (92, 0.07692307692307693), (34, 0.3076923076923077), (18, 0.07692307692307693), (79, 0.07692307692307693), (12, 0.3076923076923077), }
... and so on.
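The Stripes approach can be simulated the same way, again in one Python process with hypothetical function names. In this sketch the mapper emits, for every product purchase, a stripe (a map from neighbor ID to count); it does not combine stripes inside the mapper, since the text above does not say the Stripes job does so. The reducer sums all stripes for a product element-wise and divides each count by the stripe total, so no special marginal key is needed.

```python
from collections import Counter, defaultdict


def neighbors(products, i):
    """Products bought after products[i] and before the same product is bought again."""
    window = []
    for v in products[i + 1:]:
        if v == products[i]:
            break
        window.append(v)
    return window


def stripes_mapper(split):
    """Emit (u, stripe) for every purchase of u that has at least one neighbor."""
    for line in split:
        products = line.split()[1:]  # drop the customer name
        for i, u in enumerate(products):
            stripe = Counter(neighbors(products, i))
            if stripe:
                yield u, stripe


def stripes_reducer(u, stripes):
    """Sum the stripes element-wise, then divide each count by the stripe total."""
    merged = Counter()
    for stripe in stripes:
        merged.update(stripe)  # Counter.update adds counts together
    total = sum(merged.values())
    return u, {v: c / total for v, c in merged.items()}


def run_stripes(splits):
    """Simulate map, shuffle (group by u) and reduce in memory."""
    shuffled = defaultdict(list)
    for split in splits:
        for u, stripe in stripes_mapper(split):
            shuffled[u].append(stripe)
    return [stripes_reducer(u, stripes) for u, stripes in sorted(shuffled.items())]


if __name__ == "__main__":
    splits = [["Mary 34 56 29 12 34 56 92 29 34 12"],
              ["Kelly 92 29 12 34 79 29 56 12 34 18"]]
    for u, stripe in run_stripes(splits):
        print(u, stripe)
```

The trade-off versus Pairs is fewer, larger intermediate records: each stripe must fit in memory, but the totals come for free in the reducer.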
In the Hybrid approach, the mapper emits pairs and the reducer emits stripes. Of the three approaches, this is the most efficient.
Output Folder: output/CrystalBallHybrid
Output: a part file containing, for each product ID, a “stripe” of the products bought after it and before it was bought again, each with its relative frequency. Each line has the following format:
12 {(56, 0.18181818181818182), (92, 0.09090909090909091), (34, 0.36363636363636365), (18, 0.09090909090909091), (79, 0.09090909090909091), (29, 0.18181818181818182), }
29 {(56, 0.15384615384615385), (92, 0.07692307692307693), (34, 0.3076923076923077), (18, 0.07692307692307693), (79, 0.07692307692307693), (12, 0.3076923076923077), }
... and so on.
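The Hybrid approach can be sketched in the same single-process style. The mapper emits `((u, v), count)` pairs aggregated with in-mapper combining; that aggregation is an assumption, since the text above only says the mapper emits pairs. Because keys reach the reducer sorted by `u`, the reducer can build one stripe per product across consecutive keys and emit it, normalized, whenever `u` changes, plus once more at the end as a Hadoop reducer's `cleanup()` would. As in the Pairs sketch, a real job would need a partitioner on `u`; the function names here are hypothetical.

```python
from collections import Counter, defaultdict


def neighbors(products, i):
    """Products bought after products[i] and before the same product is bought again."""
    window = []
    for v in products[i + 1:]:
        if v == products[i]:
            break
        window.append(v)
    return window


def hybrid_mapper(split):
    """Pairs out of the mapper, aggregated with in-mapper combining."""
    buffer = Counter()
    for line in split:
        products = line.split()[1:]  # drop the customer name
        for i, u in enumerate(products):
            for v in neighbors(products, i):
                buffer[(u, v)] += 1
    yield from buffer.items()


def normalize(stripe):
    """Divide each neighbor count by the total count for the stripe."""
    total = sum(stripe.values())
    return {v: c / total for v, c in stripe.items()}


def hybrid_reducer(sorted_groups):
    """Build one stripe per u from consecutive (u, v) keys; emit it when u changes."""
    current, stripe = None, Counter()
    for (u, v), counts in sorted_groups:
        if current is not None and u != current:
            yield current, normalize(stripe)
            stripe = Counter()
        current = u
        stripe[v] += sum(counts)
    if current is not None:  # flush the last stripe, as cleanup() would
        yield current, normalize(stripe)


def run_hybrid(splits):
    """Simulate map, shuffle/sort and a single reducer in memory."""
    shuffled = defaultdict(list)
    for split in splits:
        for key, count in hybrid_mapper(split):
            shuffled[key].append(count)
    return list(hybrid_reducer(sorted(shuffled.items())))


if __name__ == "__main__":
    splits = [["Mary 34 56 29 12 34 56 92 29 34 12"],
              ["Kelly 92 29 12 34 79 29 56 12 34 18"]]
    for u, stripe in run_hybrid(splits):
        print(u, stripe)
```

This keeps the small, combinable intermediate records of Pairs while avoiding the special marginal key, since each stripe's total is known when it is flushed.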
Change the current directory to the project source directory and run these bash commands:
./build.sh
./run.sh
- Install a platform for Cloudera, such as VMware, Docker, or VirtualBox.
- Install Cloudera CDH 5.8 on that platform (download link: https://www.cloudera.com/downloads/quickstart_vms/5-8.html)
- Bishal Paudel - BishalPaudel
This project is licensed under the MIT License - see the LICENSE.md file for details