An implementation of data mining algorithm PrefixSpan based on Python.
Base on paper:
PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
This simple python script does not rely on any other third-party libraries. Just confirm that your environment is Python 3.
-
Type code below to import script.
import PrefixSpan
-
Pack data. Confirm that this data is a 2-D python list whose elements are single char(or int)
data = [ [1, 4, 2, 3], [0, 1, 2, 3], [1, 2, 1, 3, 4], [2, 1, 2, 4, 4], [1, 1, 1, 2, 3], ]
-
Call function PrefixSpan.
L = PrefixSpan.PrefixSpan(data,minSup=0.8,minConf=0)
This function return a list that contains bunch of result subsequences(format: python tuple).
Parameters:
- minSup: min_support declared in paper, default 0.5
- minConf: min_confidence, default 0
-
Print result.
print(L)
[out]:
[[(1,), (2,), (3,)], [(1, 2), (1, 3), (2, 3)], [(1, 2, 3)], []]
The length of pattern sequence in result list increases by 1