a way to edit the minlen #18

Open
StefanBloemheuvel opened this issue Apr 11, 2019 · 4 comments

@StefanBloemheuvel

Hi,

Is there a way to edit the minlen? I am only interested in patterns of length 2 to 5. (I am working in Python, by the way.)

Thanks in advance, the package works great!

@VBota1

VBota1 commented Mar 15, 2020

I was also unable to find this feature, so I updated the PrefixSpan class like this:

class PrefixSpan(object):
    def __init__(self, db, minLen=1, maxLen=1000):
        # type: (List[List[int]], int, int) -> None
        self._db = db

        # Length bounds for the patterns that get reported.
        self.minlen = minLen
        self.maxlen = maxLen

        self._results = []  # type: Any

I ran a couple of tests for frequent patterns and frequent closed patterns, and the change did not introduce any new issues. After the update you should be able to do something like:

        prefix = PrefixSpan(sourceData, minLen=3, maxLen=10)

        for pattern in prefix.frequent(minSupport):
            print(pattern)

Keep in mind that whether a pattern counts as closed is evaluated independently of the length limits. For example, take the sequence database:

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
    [3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
    [0, 7, 0, 7, 0, 7, 0, 7]

If we mine the closed patterns with:

        prefix = PrefixSpan(sourceData, minLen=2, maxLen=3)

        for pattern in prefix.frequent(3, closed=True):
            print(pattern)

you get only the pattern: [3, [7, 7]]

Patterns like [3, [3, 9, 3]] and [3, [9, 3, 9]] are not returned because a super-pattern with the same support contains both of them: [3, [3, 9, 3, 9]]. That super-pattern has length 4, and since we set maxLen=3, it is not returned either.
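
The support counts behind this argument can be checked by hand with a short, library-independent sketch. The helpers `is_subsequence` and `support` are hypothetical names introduced here for illustration and are not part of prefixspan; they count in how many sequences a pattern occurs as a (possibly non-contiguous) subsequence.

```python
from typing import List, Sequence

# The example database from the comment above.
source_data = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
    [3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
    [0, 7, 0, 7, 0, 7, 0, 7],
]  # type: List[List[int]]


def is_subsequence(pattern: Sequence[int], sequence: Sequence[int]) -> bool:
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so the order of items is enforced.
    return all(item in it for item in pattern)


def support(db: Sequence[Sequence[int]], pattern: Sequence[int]) -> int:
    """Count the sequences in db that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in db)


for patt in ([3, 9, 3], [9, 3, 9], [3, 9, 3, 9], [7, 7]):
    print(support(source_data, patt), patt)
```

All four patterns have support 3. Because [3, 9, 3, 9] contains the two length-3 patterns at the same support, neither of them is closed, regardless of the length limits.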

@chuanconggao If I make the update, would you consider merging the changes that add the min and max length parameters to the PrefixSpan init?

@chuanconggao
Owner

@VBota1 A pull request is highly welcome. Thanks.

@abhi-rawat1

@VBota1 - I tried to follow your steps and created a new class named 'PrefixSpan_My', but I am getting the error shown below. Could you suggest how to fix it?

ps = PrefixSpan_My(data, minLen = 3)
print(ps.frequent(2))

Error: 'NoneType' object is not callable


@lionralfs

For anyone else coming across this issue, one way to get around this (without requiring changes to the library) is to do the following:

ps = PrefixSpan(transactions)
ps.minlen = 2
ps.maxlen = 5

result = ps.frequent(2)
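
If setting attributes on the instance feels fragile, another option is to filter the mined patterns by length afterwards. This is only a minimal sketch: `filter_by_length` is a hypothetical helper, not part of prefixspan, and it assumes results come as (support, pattern) pairs like the ones quoted earlier in this thread. The `mined` list is a hand-written stand-in for the output of ps.frequent(...), not real output.

```python
from typing import Iterable, List, Sequence, Tuple

Result = Tuple[int, Sequence[int]]  # assumed shape: (support, pattern)


def filter_by_length(
    results: Iterable[Result], min_len: int = 1, max_len: int = 1000
) -> List[Result]:
    """Keep only results whose pattern length lies in [min_len, max_len]."""
    return [
        (sup, patt) for sup, patt in results
        if min_len <= len(patt) <= max_len
    ]


# Hand-written stand-in for mined results (assumption, not library output).
mined = [(3, [7]), (3, [7, 7]), (3, [3, 9, 3]), (3, [3, 9, 3, 9])]

kept = filter_by_length(mined, min_len=2, max_len=3)
print(kept)
```

Filtering afterwards only trims the output; the miner still explores every pattern first, so it is likely slower than bounding the search itself.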
