WinCOPPER

Created by

Author: Agustin Guevara-Cogorno
Advisors: Hugo Alatrista-Salas and Miguel Nuñez-del-Prado
Maintainer: Yoshitomi Maehara Aliaga
Contact Details: halatrista@utec.edu.pe
Institution: Pontificia Universidad Católica del Perú (PUCP) / Universidad de Ingeniería y Tecnología (UTEC)

Note: use Python 2.7.*

Description

WinCOPPER [3] is an extension of the COPPER algorithm [1], which was originally developed to extract sequential patterns [2] under item-inclusion constraints. WinCOPPER also incorporates time constraints, which improves the extraction of useful patterns.

If you use WinCOPPER in your experiments, we would be grateful if you cite us:

Alatrista-Salas, H., Guevara-Cogorno, A., Maehara, Y. and Nunez-del-Prado, M. (2020). Efficiently Mining Gapped and Window Constraint Frequent Sequential Patterns. Proceedings 17th International Conference on Modeling Decisions for Artificial Intelligence [link]

Bibtex:

@inproceedings{alatrista2020efficiently,
  title={Efficiently Mining Gapped and Window Constraint Frequent Sequential Patterns},
  author={Alatrista-Salas, Hugo and Guevara-Cogorno, Agustin and Maehara, Yoshitomi and Nunez-del-Prado, Miguel},
  booktitle={International Conference on Modeling Decisions for Artificial Intelligence},
  pages={240--251},
  year={2020},
  organization={Springer}
}

Install

Install WinCOPPER through Source Code

You can clone the GitHub repository or download the source code. In both cases, WinCOPPER can be installed with the following command:

python setup.py install

Alternatively, to install WinCOPPER in development mode in your environment, use:

python setup.py develop

Install WinCOPPER through PIP

A simple way to install WinCOPPER is through PIP:

pip install git+https://github.com/bitmapup/prefixspanr.git

You can also select a specific version of WinCOPPER:

pip install git+https://github.com/bitmapup/prefixspanr.git@v1.0

WinCOPPER depends on the following packages. Please make sure they are correctly installed before installing WinCOPPER.

  • psutil
  • pandas
  • numpy
  • resources (Only in Linux-like OS)

WinCOPPER usage example

  1. Read the CSV dataset (an example of the input file is shown in the example folder).
import pandas as pd
from ast import literal_eval
path_res = "pei_dataset_modif.csv"
data = pd.read_csv(path_res, sep=",", header=0, converters={"sequence": literal_eval})
sids = list(data["sid"])
sequences = list(data["sequence"])
  Example of the input file content:
  sid,sequence
  10,"['a' , ['a','b','c'] , ['a','c'] , 'd' , ['c','f']]"
  20,"[['a','d'] , 'c' , ['b','c'] , ['a','e']]"
  30,"[['e','f'] , ['a','b'] , ['d','f'] , 'c', 'b']"
  40,"['e' , 'g' , ['a','f'] , 'c' , 'b', 'c']"
* Only the sequences are processed; the sequence ids are not used.
  2. Set the WinCOPPER option values (see the Configuration Options section).
# Absolute threshold
threshold = 3
# False: patterns may contain k-itemsets; True: patterns contain only 1-itemsets
items_separated = False
# This is the configuration of the original PrefixSpan
options = {'threshold': threshold, 'itemsSeparated': items_separated}
  3. Mine the dataset with WinCOPPER using the options set in the previous step.
import wincopper as wc
# Run the algorithm
result_mining = wc.prefixspan(sequences, options)

# Each result is [pattern, absolute support, relative support]
for pattern in result_mining:
    print(pattern)
['<a>', 4, 1.0]
['<a><b>', 4, 1.0]
['<a><c>', 4, 1.0]
['<a><c><b>', 3, 0.75]
['<a><c><c>', 3, 0.75]
['<b>', 4, 1.0]
['<b><c>', 3, 0.75]
['<c>', 4, 1.0]
['<c><b>', 3, 0.75]
['<c><c>', 3, 0.75]
['<d>', 3, 0.75]
['<d><c>', 3, 0.75]
['<e>', 3, 0.75]
['<f>', 3, 0.75]
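
To make the three columns of each result concrete, the following is a minimal sketch that recounts the support of a pattern by brute force on the four example sequences. It is not how WinCOPPER mines patterns; the helper names `normalize`, `contains` and `support` are hypothetical and exist only for this illustration, and no time or size constraints are applied.

```python
# Sequences from the example dataset above (sids are not needed).
sequences = [
    ['a', ['a', 'b', 'c'], ['a', 'c'], 'd', ['c', 'f']],
    [['a', 'd'], 'c', ['b', 'c'], ['a', 'e']],
    [['e', 'f'], ['a', 'b'], ['d', 'f'], 'c', 'b'],
    ['e', 'g', ['a', 'f'], 'c', 'b', 'c'],
]


def normalize(sequence):
    """Turn every element into a set, so 'a' and ['a'] are treated alike."""
    return [set(e) if isinstance(e, list) else set([e]) for e in sequence]


def contains(sequence, pattern):
    """Return True if the itemsets of `pattern` occur in order in `sequence`."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the first element that includes the itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later element
    return True


def support(pattern, sequences):
    """Return [absolute support, relative support] of `pattern`."""
    pattern = [set(itemset) for itemset in pattern]
    count = sum(1 for s in sequences if contains(normalize(s), pattern))
    return [count, count / float(len(sequences))]


print(support([['a'], ['c'], ['b']], sequences))  # pattern <a><c><b>
print(support([['d'], ['c']], sequences))         # pattern <d><c>
```

Both calls give `[3, 0.75]`, which agrees with the `<a><c><b>` and `<d><c>` rows listed above: three of the four sequences contain each pattern.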

WinCOPPER in Google Colaboratory

The example can be run in Google Colaboratory using:

Open In Colab

It is also possible to create a new Python 2 notebook and then install the package with pip using link 1 or link 2.

Configuration Options

Note that WinCOPPER includes the capabilities of the PrefixSpan [2] and COPPER [1] algorithms.

Algorithm Configurations

1) Original PrefixSpan [2]

options = {'threshold' : int or float}

Descriptions

  • threshold: minimum support of patterns
    • If the threshold value is an integer, WinCOPPER treats it as an absolute support.
    • If the threshold value is a float, WinCOPPER treats it as a relative support.
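
The int/float rule above can be sketched as follows. This is only an illustration of the convention, not WinCOPPER's own code: the helper name `absolute_support` is hypothetical, and rounding a fractional count up is an assumption of this sketch.

```python
import math


def absolute_support(threshold, n_sequences):
    """Translate a threshold into a minimum number of supporting sequences.

    Hypothetical helper: an int is read as an absolute support, a float as a
    fraction of the dataset. Rounding up is an assumption of this sketch.
    """
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float")
    if isinstance(threshold, int):
        return threshold
    if isinstance(threshold, float):
        return int(math.ceil(threshold * n_sequences))
    raise TypeError("threshold must be an int or a float")


print(absolute_support(3, 4))     # absolute support: 3 sequences
print(absolute_support(0.75, 4))  # relative support: 0.75 * 4 sequences
```

Because the type decides the interpretation, `1` and `1.0` mean different things: the first asks for at least one supporting sequence, the second for all of them.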

2) COPPER

options = {'threshold' : int or float,
           'minSseq': int, 'maxSseq': int,
           'minSize': int, 'maxSize': int,
           'logic': string}

Descriptions

  • threshold: minimum support of patterns
    • If the threshold value is an integer, WinCOPPER treats it as an absolute support.
    • If the threshold value is a float, WinCOPPER treats it as a relative support.
  • minSseq: minimum itemset size constraint (itemset size)
  • maxSseq: maximum itemset size constraint (itemset size)
  • minSize: minimum pattern size constraint (subsequence size)
  • maxSize: maximum pattern size constraint (subsequence size)
  • logic: soft inclusion constraint
    • OR relation '(s1 | s2)'
    • AND relation '(s1 & s2)'
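
The size constraints listed above can be pictured with a small filter. This is a sketch under stated assumptions, not the library's implementation: it assumes that minSseq/maxSseq bound the number of items in each itemset and that minSize/maxSize bound the number of itemsets in the pattern. The function name and its keyword arguments are hypothetical.

```python
def satisfies_size_constraints(pattern, min_sseq=1, max_sseq=None,
                               min_size=1, max_size=None):
    """Check a pattern, given as a list of itemsets, against size bounds.

    Assumption: min_sseq/max_sseq limit the items per itemset and
    min_size/max_size limit the number of itemsets in the pattern.
    A bound of None means "no upper limit".
    """
    n_itemsets = len(pattern)
    if n_itemsets < min_size:
        return False
    if max_size is not None and n_itemsets > max_size:
        return False
    for itemset in pattern:
        if len(itemset) < min_sseq:
            return False
        if max_sseq is not None and len(itemset) > max_sseq:
            return False
    return True


pattern = [['a'], ['b', 'c']]  # the pattern <a><b c>
print(satisfies_size_constraints(pattern, max_sseq=2, max_size=2))
print(satisfies_size_constraints(pattern, max_sseq=1))
```

Under these assumptions the first call accepts the pattern and the second rejects it, because the itemset `['b', 'c']` has two items.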

3) PrefixSpan with time constraints (WinGap)

options = {'threshold' : int or float, 'window' : int, 'gap' : int}

Descriptions

  • threshold: minimum support of patterns
    • If the threshold value is an integer, WinCOPPER treats it as an absolute support.
    • If the threshold value is a float, WinCOPPER treats it as a relative support.
  • window: maximum window size between itemsets
  • gap: maximum gap between itemsets
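
One way to picture the two time constraints is to look at the positions at which the itemsets of a pattern are matched inside a sequence. The sketch below assumes that gap bounds the distance between consecutive matched positions and that window bounds the distance between the first and the last matched position. The exact definitions used by WinCOPPER are given in [3] and may differ, for example in how distances are counted. The function name is hypothetical.

```python
def satisfies_time_constraints(positions, window=None, gap=None):
    """Check matched positions (increasing indices) against window and gap.

    Assumptions of this sketch:
      - gap:    maximum difference between consecutive matched positions
      - window: maximum difference between first and last matched position
    A value of None disables the corresponding constraint.
    """
    if not positions:
        return True
    if window is not None and positions[-1] - positions[0] > window:
        return False
    if gap is not None:
        for previous, current in zip(positions, positions[1:]):
            if current - previous > gap:
                return False
    return True


# A pattern matched at elements 0, 1 and 4 of a sequence.
match = [0, 1, 4]
print(satisfies_time_constraints(match, gap=2))            # 4 - 1 exceeds 2
print(satisfies_time_constraints(match, window=4, gap=3))  # both bounds hold
```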

4) WinCOPPER (COPPER with time constraints)

options = {'threshold' : int or float,
           'minSseq': int, 'maxSseq': int,
           'minSize': int, 'maxSize': int,
           'window' : int, 'gap' : int,
           'logic': string}

Descriptions

  • threshold: minimum support of patterns
    • If the threshold value is an integer, WinCOPPER treats it as an absolute support.
    • If the threshold value is a float, WinCOPPER treats it as a relative support.
  • minSseq: minimum itemset size constraint (itemset size)
  • maxSseq: maximum itemset size constraint (itemset size)
  • minSize: minimum pattern size constraint (subsequence size)
  • maxSize: maximum pattern size constraint (subsequence size)
  • logic: soft inclusion constraint
    • OR relation '(s1 | s2)'
    • AND relation '(s1 & s2)'
  • window: maximum window size constraint between itemsets
  • gap: maximum gap constraint between itemsets
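
The four configurations differ only in which option keys are present. The sketch below names the configuration that an options dictionary corresponds to, based on the key lists documented above. It is a reading aid, not the dispatch logic of the library: the function `describe_configuration` is hypothetical, and treating `threshold` as mandatory is an assumption drawn from the fact that every configuration lists it.

```python
# Option keys as documented in the four configurations above.
CONSTRAINT_KEYS = ('minSseq', 'maxSseq', 'minSize', 'maxSize', 'logic')
TIME_KEYS = ('window', 'gap')


def describe_configuration(options):
    """Name the documented configuration that `options` matches."""
    if 'threshold' not in options:
        raise ValueError("'threshold' appears in every configuration")
    has_constraints = any(key in options for key in CONSTRAINT_KEYS)
    has_time = any(key in options for key in TIME_KEYS)
    if has_constraints and has_time:
        return 'WinCOPPER'
    if has_constraints:
        return 'COPPER'
    if has_time:
        return 'WinGap'
    return 'PrefixSpan'


print(describe_configuration({'threshold': 3}))
print(describe_configuration({'threshold': 0.5, 'window': 4, 'gap': 2}))
print(describe_configuration({'threshold': 0.5, 'maxSize': 3, 'gap': 2}))
```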

Other Options

options = {'itemsSeparated': bool, 'dataDesc': string,
           'resultFile': bool, 'test': bool}

Descriptions

  • itemsSeparated: Flag for separation of itemsets
    • If the itemsSeparated value is True, patterns will contain only 1-itemsets
    • If the itemsSeparated value is False, patterns will contain k-itemsets [Default]
  • dataDesc: Sets a name for the results file.
  • resultFile: Flag for generating a results file
    • If the resultFile value is True, a file with the extracted patterns will be generated. [Default]
    • If the resultFile value is False, a file will not be generated.
  • test: Flag for generating a test summary file (for experimentation purposes)
    • If the test value is True, a test summary will be generated.
    • If the test value is False, a test summary will not be generated. [Default]
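
The defaults marked above can be made explicit when building an options dictionary. The sketch below merges user options over the documented defaults. The helper `with_defaults` is hypothetical and is not part of WinCOPPER; `dataDesc` is left out because no default is documented for it.

```python
# Defaults as marked with [Default] in the list above.
DOCUMENTED_DEFAULTS = {
    'itemsSeparated': False,  # patterns may contain k-itemsets
    'resultFile': True,       # a results file is generated
    'test': False,            # no test summary is generated
}


def with_defaults(options):
    """Return a copy of `options` with the documented defaults filled in."""
    merged = dict(DOCUMENTED_DEFAULTS)
    merged.update(options)
    return merged


# Keep the mining threshold and switch off the results file.
options = with_defaults({'threshold': 3, 'resultFile': False})
for key in sorted(options):
    print("%s = %r" % (key, options[key]))
```

Spelling out `resultFile` is useful in notebooks or tests where writing a file on every run is not wanted, since the documented default is to generate one.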

References

  • [1] Guevara-Cogorno, A., Flamand, C. and Alatrista-Salas, H. (2015). COPPER - Constraint OPtimized Prefixspan for Epidemiological Research. Procedia Computer Science, 63, 433-438. [link]
  • [2] Pei, J., Han, J., Mortazavi-Asl, B. and Pinto, H. (2002). PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. Proceedings 17th International Conference on Data Engineering. 215-224. [link]
  • [3] Alatrista-Salas, H., Guevara-Cogorno, A., Maehara, Y. and Nunez-del-Prado, M. (2020). Efficiently Mining Gapped and Window Constraint Frequent Sequential Patterns. In: Torra V., Narukawa Y., Nin J., Agell N. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2020. Lecture Notes in Computer Science, vol 12256. Springer, Cham. [link]