<a href="https://colab.research.google.com/github/UdayLab/Hands-on-Pattern-Mining/blob/main/chapter4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 4: Pattern Discovery in Transactional Databases

## Install PAMI repository

In [1]:
!pip install pami

zsh:1: command not found: pip


## Download the sample transactional database

In [2]:
!wget -nc https://web-ext.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv

File ‘Transactional_T10I4D100K.csv’ already there; not retrieving.



## Discovering frequent patterns using FP-growth

### Generic

```python
from PAMI.frequentPattern.basic import FPGrowth  as alg

obj = alg.FPGrowth(iFile='inputFileName',minSup=minimumSupportvalue,sep='\t')
obj.mine()
obj.save('outputFileName')

frequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(frequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 1

In [3]:
from PAMI.frequentPattern.basic import FPGrowth  as alg

obj = alg.FPGrowth(iFile='Transactional_T10I4D100K.csv',minSup=300,sep='\t')
obj.mine()
obj.save('frequentPatternsAtMinSupCount300.txt')

frequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(frequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Frequent patterns were generated successfully using frequentPatternGrowth algorithm
#Patterns: 4540
Runtime: 2.5302541255950928
Memory (RSS): 429178880
Memory (USS): 406208512


## Discovering Closed Frequent Patterns

### Generic



```python
from PAMI.frequentPattern.closed import CHARM  as alg

obj = alg.CHARM(iFile='inputFileName', minSup=minimumSupportValue)
obj.mine()
obj.save('outputFileName')

closedFPsDF= obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))  
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 2

In [4]:
from PAMI.frequentPattern.closed import CHARM  as alg

obj = alg.CHARM(iFile='Transactional_T10I4D100K.csv', minSup=300)
obj.mine()
obj.save('closedFrequentPatterns.txt')

closedFPsDF= obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Closed Frequent patterns were generated successfully using CHARM algorithm
#Patterns: 2856
Runtime: 4.201343059539795
Memory (RSS): 85114880
Memory (USS): 61603840


## Discovering Maximal Frequent Patterns

### Generic

```python
from PAMI.frequentPattern.maximal import MaxFPGrowth  as alg

obj = alg.MaxFPGrowth(iFile='inputFileName', minSup=minimumSupportValue)
obj.mine()
obj.save('outputFileName')

maximalFPsDF= obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 3

In [5]:
from PAMI.frequentPattern.maximal import MaxFPGrowth  as alg

obj = alg.MaxFPGrowth(iFile='Transactional_T10I4D100K.csv', minSup=300)
obj.mine()
obj.save('maximalFrequentPatternsAtMinSupCount100.txt')

maximalFPsDF= obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Maximal Frequent patterns were generated successfully using MaxFp-Growth algorithm 
#Patterns: 1292
Runtime: 2.4696578979492188
Memory (RSS): 474710016
Memory (USS): 438288384


## Discovering Top-k Frequently Occurring Patterns

### Generic



```python
from PAMI.frequentPattern.topk import FAE  as alg

obj = alg.FAE(iFile='inputFileName', k=number of frequently patterns needed)
obj.mine()
obj.save('outputFileName')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 4

In [6]:
from PAMI.frequentPattern.topk import FAE  as alg

obj = alg.FAE(iFile='Transactional_T10I4D100K.csv', k=1000)
obj.mine()
obj.save('topkFrequentPatterns.txt')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

 TopK frequent patterns were successfully generated using FAE algorithm.
#Patterns: 1000
Runtime: 6.010847806930542
Memory (RSS): 128352256
Memory (USS): 104841216


## Rare Item Problem - Calculating the items' MIS values

### Generic



```python
from PAMI.extras.calculateMISValues import usingBeta as ub
cd = ub.usingBeta(iFile='inputFileName',
        beta=percentageOfItemsFrequency, LS=lowestMinimumSupportForAnItem) #using default tab separator
cd.calculateMIS()
cd.save('outputFileName')
```



### Example 5

In [7]:
from PAMI.extras.calculateMISValues import usingBeta as ub
cd = ub.usingBeta(iFile='Transactional_T10I4D100K.csv',
        beta=0.5, LS=100) #using default tab separator
cd.calculateMIS()
cd.save('MIS.txt')

## Rare Item Problem - Mining Frequent Patterns using Multiple Minimum Support Values

### Generic


```python
from PAMI.multipleMinimumSupportBasedFrequentPattern.basic \
    import CFPGrowthPlus as alg

obj = alg.CFPGrowthPlus(iFile='inputFileName',
        MIS='MIS.txt')  #using default tab separator
obj.mine()         
obj.save('outputFileName')
print('Total No of patterns: ' +
    str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 6

In [8]:
from PAMI.multipleMinimumSupportBasedFrequentPattern.basic \
    import CFPGrowthPlus as alg

obj = alg.CFPGrowthPlus(iFile='Transactional_T10I4D100K.csv',
        MIS='MIS.txt')  #using default tab separator
obj.mine()
obj.save('frequentPatternsMultipleMinimumSupports.txt')
print('Total No of patterns: ' +
    str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Frequent patterns were generated successfully using Conditional Frequent Pattern Growth algorithm
Total No of patterns: 4947
Runtime: 4.567728042602539
Memory (RSS): 135168000
Memory (USS): 111689728


## Discovering Correlated Patterns

### Generic



```python
from PAMI.correlatedPattern.basic import CoMine as alg

obj = alg.CoMine(iFile='inputFileName',
        minSup=minimumSupportValue, minAllConf=minimumAllConfidenceValue)
obj.mine()
obj.save('correlatedPatterns.txt')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 7

In [9]:
from PAMI.correlatedPattern.basic import CoMine as alg

obj = alg.CoMine(iFile='Transactional_T10I4D100K.csv', minSup=300, minAllConf=0.5)
obj.mine()
obj.save('correlatedPatterns.txt')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Correlated patterns were generated successfully using CoMine algorithm
#Patterns: 723
Runtime: 3.1105270385742188
Memory (RSS): 299974656
Memory (USS): 276463616


## Discovering Relative Frequent Patterns

### Generic



```python
from PAMI.relativeFrequentPattern.basic import RSFPGrowth as alg

obj = alg.RSFPGrowth(iFile='inputFileName', minSup=minimumSupportCount, minRS=minimumRelativeSupportValue)

obj.mine()
obj.save('outputFileName')

relativeFrequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(relativeFrequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 8

In [10]:
from PAMI.relativeFrequentPattern.basic import RSFPGrowth as alg

obj = alg.RSFPGrowth(iFile='Transactional_T10I4D100K.csv', minSup=300, minRS=0.75)

obj.mine()
obj.save('relativeFrequentPatterns.txt')

relativeFrequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(relativeFrequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Relative support frequent patterns were generated successfully using RSFPGrowth algorithm
#Patterns: 4540
Runtime: 5.259850025177002
Memory (RSS): 181321728
Memory (USS): 169787392


## Discovering Fault-Tolerant Frequent Patterns

### Generic



```python
from PAMI.faultTolerantFrequentPattern.basic import FTFPGrowth as alg

obj = alg.FTFPGrowth(iFile='inputFileName', minSup=minimumSupportValue, itemSup=minimumSupportAnItemHasToMaintain, minLength=minimumLengthOfAnItemset, faultTolerance=faultTolerantValue, sep="\t")
obj.mine()

print('#Patterns: ' + str(len(relativeFrequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 9

In [11]:
from PAMI.faultTolerantFrequentPattern.basic import FTFPGrowth as alg

obj = alg.FTFPGrowth(iFile='Transactional_T10I4D100K.csv', minSup=100, itemSup=100, minLength=3, faultTolerance=1, sep="\t")

obj.mine()

print('#Patterns: ' + str(len(relativeFrequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Frequent patterns were generated successfully using frequentPatternGrowth algorithm
#Patterns: 4540
Runtime: 4.621241092681885
Memory (RSS): 642809856
Memory (USS): 631275520


## Discovering Association Rules From the Frequent Patterns

### Generic



```python
from PAMI.AssociationRules.basic import confidence as alg

obj = alg.confidence('inputFileName', minConf=minimumConfidenceValue)
obj.mine()
obj.printResults()
obj.save("outputFileName")
```



### Example 10

In [12]:
from PAMI.AssociationRules.basic import confidence as alg

obj = alg.confidence('frequentPatternsAtMinSupCount300.txt', minConf=0.75)
obj.mine()
obj.printResults()
obj.save("associationRulesconfidence.csv")

Association rules successfully  generated from frequent patterns 
Total number of Association Rules: 22984
Total Memory in USS: 625049600
Total Memory in RSS 636600320
Total ExecutionTime in ms: 0.017277956008911133
