Arunprakash1990/DataMining-MS-GSP-Algorithm
Mining Sequential Patterns Based on GSP (Generalized Sequential Patterns)

MS-GSP algorithm: sequential pattern mining using multiple minimum supports with a support difference constraint.

What does MS-GSP do?
The algorithm finds sequential patterns of frequent item sets that customers purchase from a supermarket.

Where is it used?
It is used in market basket data analysis to find out which items in a store are most profitable.

How does it work?
Given the set of transactions made by each customer as input, the algorithm outputs the frequent sequences of item sets. At each pass, it calculates the support of every candidate combination of item sets, compares it with a minimum support threshold, and so determines whether that combination is frequent.

Data structures used:
I used a hash-based implementation for efficient support counting and an array list to store the frequent item sets at each iteration. (A hash map offers constant-time lookups on average, which is what makes it well suited to searching.)

Advantage of this algorithm over other algorithms:
GSP is based on Apriori but, unlike the standard Apriori algorithm, it takes the "time constraint" into account: it considers the order of the items and finds sequences rather than unordered item sets.

Input format:
data.txt: Each line represents a transaction sequence, and each set in a sequence represents a set of items.
<{20, 30, 80}{70}{50, 70}>
<{20, 30}{30, 70, 80}>
<{20, 40}{70}{20, 30, 70}>
<{10, 30}{40}>
<{10, 40, 90}{40, 90}>
<{10, 40, 70}{40, 90}>

para.txt: Gives the minimum item support (MIS) for each item as well as the support difference constraint.
MIS(10) = 0.43
MIS(20) = 0.30
MIS(30) = 0.30
MIS(40) = 0.40
MIS(50) = 0.40
MIS(60) = 0.30
MIS(70) = 0.20
MIS(80) = 0.20
MIS(90) = 0.20
Support_Diff_Constraint = 0.1

Output format:
Pattern :<{30,20}{70,80}{20,30,70}> count: 10
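To make the support-counting step concrete, here is a minimal sketch in Java. It is not the repository's code: the class and method names (`MsGspSupportSketch`, `parseSequence`, `contains`, `support`, `isFrequent`) are hypothetical. It assumes the usual multiple-minimum-support rule, under which a candidate is frequent when its support is at least the lowest MIS among its items. Candidate generation and the support difference constraint are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not taken from the repository.
class MsGspSupportSketch {

    private static final Pattern ITEM_SET = Pattern.compile("\\{([^}]*)\\}");

    // Parses one data.txt line, e.g. "<{20, 30, 80}{70}{50, 70}>", into ordered item sets.
    static List<Set<Integer>> parseSequence(String line) {
        List<Set<Integer>> sequence = new ArrayList<>();
        Matcher m = ITEM_SET.matcher(line);
        while (m.find()) {
            Set<Integer> itemSet = new HashSet<>();
            for (String token : m.group(1).split(",")) {
                String t = token.trim();
                if (!t.isEmpty()) {
                    itemSet.add(Integer.parseInt(t));
                }
            }
            sequence.add(itemSet);
        }
        return sequence;
    }

    // True if every element of the candidate is a subset of some element of the
    // data sequence, with the matched elements appearing in the same order.
    static boolean contains(List<Set<Integer>> data, List<Set<Integer>> candidate) {
        int next = 0; // index of the next candidate element still to be matched
        for (Set<Integer> element : data) {
            if (next < candidate.size() && element.containsAll(candidate.get(next))) {
                next++;
            }
        }
        return next == candidate.size();
    }

    // Fraction of data sequences that contain the candidate.
    static double support(List<List<Set<Integer>>> database, List<Set<Integer>> candidate) {
        int count = 0;
        for (List<Set<Integer>> sequence : database) {
            if (contains(sequence, candidate)) {
                count++;
            }
        }
        return (double) count / database.size();
    }

    // Assumed rule: frequent if support >= the smallest MIS of the candidate's items.
    // Every item in the candidate must have an entry in the MIS map.
    static boolean isFrequent(List<List<Set<Integer>>> database,
                              List<Set<Integer>> candidate,
                              Map<Integer, Double> mis) {
        double threshold = Double.MAX_VALUE;
        for (Set<Integer> element : candidate) {
            for (int item : element) {
                threshold = Math.min(threshold, mis.get(item));
            }
        }
        return support(database, candidate) >= threshold;
    }

    public static void main(String[] args) {
        List<List<Set<Integer>>> database = new ArrayList<>();
        for (String line : Arrays.asList(
                "<{20, 30, 80}{70}{50, 70}>", "<{20, 30}{30, 70, 80}>",
                "<{20, 40}{70}{20, 30, 70}>", "<{10, 30}{40}>",
                "<{10, 40, 90}{40, 90}>", "<{10, 40, 70}{40, 90}>")) {
            database.add(parseSequence(line));
        }
        Map<Integer, Double> mis = new HashMap<>();
        mis.put(10, 0.43);
        mis.put(20, 0.30);
        mis.put(40, 0.40);
        mis.put(70, 0.20);

        List<Set<Integer>> ordered = parseSequence("<{20}{70}>");
        System.out.println(support(database, ordered));          // 0.5
        System.out.println(isFrequent(database, ordered, mis));  // true

        List<Set<Integer>> together = parseSequence("<{10, 40}>");
        System.out.println(isFrequent(database, together, mis)); // false
    }
}
```

On the sample data, `<{20}{70}>` is contained in three of the six sequences, so its support of 0.5 clears the lowest relevant threshold, MIS(70) = 0.20. `<{10, 40}>` needs both items in the same item set, which holds in only two sequences, and 2/6 is below MIS(40) = 0.40.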
About
Predicted the customer’s purchasing behavior by generating patterns and identified the most frequently purchased item in a particular store using Java