GiorgioMorales/itemMining

Itemset Mining: Python Implementation of the Apriori and AssociationRules Algorithms

Description

This is an implementation of the algorithms described in:

Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Machine Learning: Fundamental Concepts and Algorithms, 
Chapter 8, 2nd Edition, Cambridge University Press, March 2020. ISBN: 978-1108473989.

Our system uses the Apriori algorithm to generate frequent itemsets from a given sparse binary table. Frequent itemsets can be used, for example, to determine the items that are most likely to be purchased, to find items that tend to be selected together, or to uncover dependency chains within the dataset. Our system also generates association rules that satisfy a minimum confidence threshold from the frequent itemsets found by Apriori. These strong rules, together with the frequent itemsets, support more confident conclusions about the relationships between the categories of a given dataset.
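To make the level-wise idea behind Apriori concrete, here is a minimal, dependency-free sketch. It is not the repository's code: the function name `apriori` and the toy transactions are assumptions made for illustration, and it stores itemsets in flat sets rather than in the tree of Itemset nodes used by this project. Minimum support is treated as an absolute transaction count.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears in the data
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count in how many transactions each candidate is contained
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has at least one infrequent k-subset
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Toy transaction database (illustrative, not the repository's dataset)
transactions = [
    {"A", "B", "D", "E"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E"},
    {"B", "C", "D"},
]
frequent = apriori(transactions, min_sup=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, so a candidate with an infrequent subset never needs to be counted.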

Usage

Below, we list the instructions that should be followed in order to use our system as well as some comments about the structure of the project:

  • Clone or download our repository using:

    git clone https://github.com/GiorgioMorales/itemMining.git

  • Configure the project using a Python 3.x environment. The only libraries needed are Pandas and Numpy.

  • The main driver is located within the ItemMining.py script. It creates an instance of the ItemMining class called table. The constructor takes as arguments the address of the dataset and the parameters minSup (minimum support) and minConf (minimum confidence) used by the Apriori and AssociationRules algorithms, respectively. By default, we consider a minimum support of 4 and a minimum confidence of 0.4.

  • Run the system by typing

    python ItemMining.py

    in a terminal. To set the parameters manually, type

    python ItemMining.py -s minSup -c minConf

    instead.

  • The file Dataset.py contains a class of the same name used to read the Transaction Itemset dataset (in CSV format). In addition, the file ItemsetRule.py contains two helper classes: the Itemset class is used to store itemsets as nodes of a tree, while the Rule class stores all the elements that describe an association rule.

  • The dataset consists of a sparse binary table read as a pandas DataFrame object with the columns labeled and the rows containing unique elements. We noticed that, in this dataset, each department is related to only one possible ID, so we chose to use the Dept column from the dataset instead of the ID column so that the obtained rules are easier to interpret. The table is passed to the Apriori algorithm, which displays the frequent itemsets detected in the given table.

  • Using the sparse binary table, the system then derives a set of rules from the frequent itemsets and a minimum confidence parameter using the associationRules algorithm. For each frequent itemset with at least two items, it generates rules stating that if all the items of one subset are present in a transaction, then the items of another subset are also present with a certain degree of confidence. Each strong rule is displayed along with its support, confidence, leverage, and lift metrics.

  • Finally, we call the static method rankRules to print the k most relevant rules from the set of strong rules.
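The rule-generation and ranking steps described above can be sketched as follows. This is a hedged illustration and not the repository's implementation: `association_rules`, `rank_rules`, the dictionary layout, and the toy support counts are all assumptions. In particular, the source does not state which metric rankRules sorts by, so ranking by lift here is only one plausible choice. The metrics use the standard definitions for a rule X -> Y: confidence = sup(XY) / sup(X), lift = confidence / rsup(Y), and leverage = rsup(XY) - rsup(X) * rsup(Y), where rsup is the support divided by the number of transactions.

```python
from itertools import combinations


def association_rules(frequent, n_transactions, min_conf):
    """Derive strong rules X -> Y from a {frozenset: support count} mapping."""
    rules = []
    for itemset, sup_xy in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                sup_x, sup_y = frequent[antecedent], frequent[consequent]
                conf = sup_xy / sup_x
                if conf < min_conf:
                    continue
                rsup_xy = sup_xy / n_transactions
                rsup_x = sup_x / n_transactions
                rsup_y = sup_y / n_transactions
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "support": sup_xy,
                    "confidence": conf,
                    "lift": conf / rsup_y,
                    "leverage": rsup_xy - rsup_x * rsup_y,
                })
    return rules


def rank_rules(rules, k, key="lift"):
    """Return the k rules with the highest value of the chosen metric (assumed: lift)."""
    return sorted(rules, key=lambda rule: rule[key], reverse=True)[:k]


# Support counts for a toy dataset of 6 transactions (illustrative values)
frequent = {
    frozenset("A"): 4, frozenset("B"): 6, frozenset("E"): 5,
    frozenset("AB"): 4, frozenset("AE"): 4, frozenset("BE"): 5,
    frozenset("ABE"): 4,
}
rules = association_rules(frequent, n_transactions=6, min_conf=0.9)
top = rank_rules(rules, k=3)
for rule in top:
    print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
          round(rule["lift"], 3))
```

A lift above 1 and a positive leverage both indicate that the antecedent and consequent occur together more often than they would if they were independent, which is why either metric is a reasonable ranking key.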