Skip to content

Arunprakash1990/DataMining-MS-GSP-Algorithm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Mining Sequential Patterns Based on GSP (Generalized Sequential Patterns) MS-GSP algorithm - Sequential pattern mining using multiple minimum supports with a support difference constraint.

What MS-GSP does?
The algorithm is used for finding sequential patterns of frequent item sets that customers purchase from a supermarket.

Where it is used?
It is used in the application of market basket data analysis to find out which items in a store are most profitable.

How it works?
So, if we provide an input of a set of transactions made by each customer, the algorithm will output the frequent item sets.
At each pass, it calculates the support of every combination of item sets and compares it with a minimum support threshold and determines if that combination of items is frequent.

Data structures used: 
I used a hash based implementation for efficient support counting and an array list to store the frequent item sets at each iteration. (Look up advantages of hash map for searching)

Advantage of this algorithm over other algorithms
It is better than the standard Apriori algorithm as it takes into account the “time constraint”.GSP is based on Apriori but takes into account the order of the items and finds sequences.






Input format:
data.txt:Each line represents a Transaction Sequence and each set in a sequence represents a set of items.
<{20, 30, 80}{70}{50, 70}>
<{20, 30}{30, 70, 80}>
<{20, 40}{70}{20, 30, 70}>
<{10, 30}{40}>
<{10, 40, 90}{40, 90}>
<{10, 40, 70}{40, 90}>

para.txt:Gives the minimum item support for each item as well as the support difference constraint
MIS(10) = 0.43
MIS(20) = 0.30
MIS(30) = 0.30
MIS(40) = 0.40
MIS(50) = 0.40
MIS(60) = 0.30
MIS(70) = 0.20
MIS(80) = 0.20
MIS(90) = 0.20
Support_Diff_Constraint = 0.1

Output format:
Pattern :<{30,20}{70,80}{20,30,70}> count: 10

About

Predicted the customer’s purchasing behavior by generating patterns and identified the most frequently purchased item in a particular store using Java

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages