<a href="https://colab.research.google.com/github/UdayLab/Hands-on-Pattern-Mining/blob/main/chapter4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 4: Pattern Discovery in Transactional Databases

## Install the PAMI library

In [None]:
!pip install pami

Collecting pami
  Downloading pami-2024.11.15.1-py3-none-any.whl.metadata (80 kB)
Collecting resource (from pami)
  Downloading Resource-0.2.1-py2.py3-none-any.whl.metadata (478 bytes)
Collecting validators (from pami)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting sphinx-rtd-theme (from pami)
  Downloading sphinx_rtd_theme-3.0.2-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting discord.py (from pami)
  Downloading discord.py-2.4.0-py3-none-any.whl.metadata (6.9 kB)
Collecting fastparquet (from pami)
  Downloading fastparquet-2024.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting cramjam>=2.3 (from fastparquet->pami)
  Downloading cramjam-2.9.0-cp310-cp310-manylinux_2_17_x86_64.manyl

## Download the sample transactional database

In [None]:
!wget -nc https://web-ext.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv

--2024-12-03 05:57:12--  https://web-ext.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv
Resolving web-ext.u-aizu.ac.jp (web-ext.u-aizu.ac.jp)... 163.143.103.34
Connecting to web-ext.u-aizu.ac.jp (web-ext.u-aizu.ac.jp)|163.143.103.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4019277 (3.8M) [text/csv]
Saving to: ‘Transactional_T10I4D100K.csv’


2024-12-03 05:57:14 (3.88 MB/s) - ‘Transactional_T10I4D100K.csv’ saved [4019277/4019277]



## Discovering Frequent Patterns Using FP-growth

### Generic

```python
from PAMI.frequentPattern.basic import FPGrowth  as alg

# Replace the placeholders with your input file and minimum support (e.g., a count such as 300)
obj = alg.FPGrowth(iFile='inputFileName', minSup=minimumSupportValue, sep='\t')
obj.mine()
obj.save('outputFileName')

frequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(frequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 1

In [None]:
from PAMI.frequentPattern.basic import FPGrowth  as alg

obj = alg.FPGrowth(iFile='Transactional_T10I4D100K.csv',minSup=300,sep='\t')
obj.mine()
obj.save('frequentPatternsAtMinSupCount300.txt')

frequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(frequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Frequent patterns were generated successfully using frequentPatternGrowth algorithm
#Patterns: 4540
Runtime: 11.018220901489258
Memory (RSS): 562528256
Memory (USS): 540168192
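
To make the meaning of `minSup=300` concrete, the following is a minimal brute-force sketch on a hypothetical five-transaction database. It is not how FP-growth works internally (FP-growth avoids enumerating candidates); it only illustrates that a pattern is frequent when its support count reaches the threshold. The names `transactions` and `frequent_patterns` are illustrative and not part of PAMI.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def frequent_patterns(db, min_sup):
    """Return {itemset: support count} for itemsets whose count >= min_sup."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support count = number of transactions containing the candidate.
            count = sum(1 for t in db if set(candidate) <= t)
            if count >= min_sup:
                result[candidate] = count
    return result

patterns = frequent_patterns(transactions, min_sup=3)
for itemset, count in sorted(patterns.items()):
    print(itemset, count)
```

With a threshold of 3, the three single items and the three pairs qualify, while `('a', 'b', 'c')` occurs in only two transactions and is dropped.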


## Discovering Closed Frequent Patterns

### Generic



```python
from PAMI.frequentPattern.closed import CHARM  as alg

obj = alg.CHARM(iFile='inputFileName', minSup=minimumSupportValue)
obj.mine()
obj.save('outputFileName')

closedFPsDF = obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(closedFPsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 2

In [None]:
from PAMI.frequentPattern.closed import CHARM  as alg

obj = alg.CHARM(iFile='Transactional_T10I4D100K.csv', minSup=300)
obj.mine()
obj.save('closedFrequentPatterns.txt')

closedFPsDF = obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(closedFPsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Closed Frequent patterns were generated successfully using CHARM algorithm
#Patterns: 2856
Runtime: 11.586620092391968
Memory (RSS): 581001216
Memory (USS): 558632960


## Discovering Maximal Frequent Patterns

### Generic

```python
from PAMI.frequentPattern.maximal import MaxFPGrowth  as alg

obj = alg.MaxFPGrowth(iFile='inputFileName', minSup=minimumSupportValue)
obj.mine()
obj.save('outputFileName')

maximalFPsDF = obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(maximalFPsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 3

In [None]:
from PAMI.frequentPattern.maximal import MaxFPGrowth  as alg

obj = alg.MaxFPGrowth(iFile='Transactional_T10I4D100K.csv', minSup=300)
obj.mine()
obj.save('maximalFrequentPatternsAtMinSupCount300.txt')

maximalFPsDF = obj.getPatternsAsDataFrame()

print('#Patterns: ' + str(len(maximalFPsDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Maximal Frequent patterns were generated successfully using MaxFp-Growth algorithm 
#Patterns: 1292
Runtime: 11.646287202835083
Memory (RSS): 539312128
Memory (USS): 516968448
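
The pattern counts reported in Examples 1 to 3 shrink from all frequent patterns to closed patterns to maximal patterns. The sketch below, on a hypothetical four-transaction database, illustrates why: closed patterns are the frequent patterns with no proper superset of equal support, and maximal patterns are those with no frequent proper superset. This is a brute-force illustration of the definitions, not the CHARM or MaxFPGrowth algorithms.

```python
from itertools import combinations

# Hypothetical toy database; each transaction is a set of items.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
MIN_SUP = 2  # minimum support count

def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t)

# Enumerate all frequent itemsets by brute force.
items = sorted(set().union(*db))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        count = support(frozenset(combo), db)
        if count >= MIN_SUP:
            frequent[frozenset(combo)] = count

# Closed: no proper superset has the same support.
closed = {
    p for p, s in frequent.items()
    if not any(p < q and s == sq for q, sq in frequent.items())
}

# Maximal: no proper superset is frequent at all.
maximal = {p for p in frequent if not any(p < q for q in frequent)}

print(len(frequent), len(closed), len(maximal))
```

Every maximal pattern is closed and every closed pattern is frequent, so the three counts can only decrease in that order.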


## Discovering Top-k Frequently Occurring Patterns

### Generic



```python
from PAMI.frequentPattern.topk import FAE  as alg

# k is the number of most frequent patterns to return
obj = alg.FAE(iFile='inputFileName', k=numberOfPatternsNeeded)
obj.mine()
obj.save('outputFileName')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 4

In [None]:
from PAMI.frequentPattern.topk import FAE  as alg

obj = alg.FAE(iFile='Transactional_T10I4D100K.csv', k=1000)
obj.mine()
obj.save('topkFrequentPatterns.txt')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

 TopK frequent patterns were successfully generated using FAE algorithm.
#Patterns: 1000
Runtime: 19.14928388595581
Memory (RSS): 366190592
Memory (USS): 343724032


## Rare Item Problem - Calculating the items' MIS values

### Generic



```python
from PAMI.extras.calculateMISValues import usingBeta as ub
cd = ub.usingBeta(iFile='inputFileName',
        beta=percentageOfItemsFrequency, LS=lowestMinimumSupportForAnItem) #using default tab separator
cd.calculateMIS()
cd.save('outputFileName')
```



### Example 5

In [None]:
from PAMI.extras.calculateMISValues import usingBeta as ub
cd = ub.usingBeta(iFile='Transactional_T10I4D100K.csv',
        beta=0.5, LS=100) #using default tab separator
cd.calculateMIS()
cd.save('MIS.txt')
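
As a rough illustration of what the `beta` and `LS` parameters control, the sketch below assumes the commonly used rule for minimum item support, MIS(i) = max(beta × f(i), LS), where f(i) is the support count of item i. Whether `usingBeta` implements exactly this rule is an assumption here; the function `mis_values` and the toy database are hypothetical and not part of PAMI.

```python
from collections import Counter

def mis_values(db, beta, ls):
    """Assumed rule: MIS(item) = max(beta * frequency(item), ls)."""
    # Count each item at most once per transaction.
    freq = Counter(item for t in db for item in set(t))
    return {item: max(beta * f, ls) for item, f in freq.items()}

# Hypothetical toy database.
db = [
    ["bread", "milk"],
    ["bread", "jam"],
    ["bread", "milk"],
    ["bread"],
]

mis = mis_values(db, beta=0.5, ls=1)
for item, value in sorted(mis.items()):
    print(item, value)
```

Under this rule, frequent items such as `bread` receive a higher threshold, while rare items such as `jam` fall back to the floor `LS`, so rare items are not forced to meet a threshold tuned for frequent ones.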

## Rare Item Problem - Mining Frequent Patterns using Multiple Minimum Support Values

### Generic


```python
from PAMI.multipleMinimumSupportBasedFrequentPattern.basic \
    import CFPGrowthPlus as alg

obj = alg.CFPGrowthPlus(iFile='inputFileName',
        MIS='MIS.txt')  #using default tab separator
obj.mine()         
obj.save('outputFileName')
print('Total No of patterns: ' +
    str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 6

In [None]:
from PAMI.multipleMinimumSupportBasedFrequentPattern.basic \
    import CFPGrowthPlus as alg

obj = alg.CFPGrowthPlus(iFile='Transactional_T10I4D100K.csv',
        MIS='MIS.txt')  #using default tab separator
obj.mine()
obj.save('frequentPatternsMultipleMinimumSupports.txt')
print('Total No of patterns: ' +
    str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Frequent patterns were generated successfully using Conditional Frequent Pattern Growth algorithm
Total No of patterns: 4947
Runtime: 17.537547826766968
Memory (RSS): 700235776
Memory (USS): 677855232


## Discovering Correlated Patterns

### Generic



```python
from PAMI.correlatedPattern.basic import CoMine as alg

obj = alg.CoMine(iFile='inputFileName',
        minSup=minimumSupportValue, minAllConf=minimumAllConfidenceValue)
obj.mine()
obj.save('outputFileName')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 7

In [None]:
from PAMI.correlatedPattern.basic import CoMine as alg

obj = alg.CoMine(iFile='Transactional_T10I4D100K.csv', minSup=300, minAllConf=0.5)
obj.mine()
obj.save('correlatedPatterns.txt')

print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Correlated patterns were generated successfully using CoMine algorithm
#Patterns: 723
Runtime: 15.577804803848267
Memory (RSS): 657141760
Memory (USS): 634855424
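
The `minAllConf` parameter thresholds the all-confidence measure, which is usually defined as the support of a pattern divided by the largest support among its individual items. The following sketch computes it on a hypothetical toy database; `support` and `all_confidence` are illustrative helpers, not PAMI functions.

```python
def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t)

def all_confidence(itemset, db):
    """sup(X) divided by the maximum support of any single item in X.

    Assumes every item of the itemset occurs at least once in db.
    """
    max_item_support = max(support({item}, db) for item in itemset)
    return support(set(itemset), db) / max_item_support

# Hypothetical toy database.
db = [{"a", "b"}, {"a", "b"}, {"a"}, {"a", "c"}]

print(all_confidence({"a", "b"}, db))
print(all_confidence({"a", "c"}, db))
```

With `minAllConf=0.5`, the pattern `{a, b}` (all-confidence 0.5) would be kept and `{a, c}` (0.25) would be discarded, provided both also meet the minimum support.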


## Discovering Relative Frequent Patterns

### Generic



```python
from PAMI.relativeFrequentPattern.basic import RSFPGrowth as alg

obj = alg.RSFPGrowth(iFile='inputFileName', minSup=minimumSupportCount, minRS=minimumRelativeSupportValue)

obj.mine()
obj.save('outputFileName')

relativeFrequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(relativeFrequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 8

In [None]:
from PAMI.relativeFrequentPattern.basic import RSFPGrowth as alg

obj = alg.RSFPGrowth(iFile='Transactional_T10I4D100K.csv', minSup=300, minRS=0.75)

obj.mine()
obj.save('relativeFrequentPatterns.txt')

relativeFrequentPatternsDF= obj.getPatternsAsDataFrame()
print('#Patterns: ' + str(len(relativeFrequentPatternsDF)))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Relative support frequent patterns were generated successfully using RSFPGrowth algorithm
#Patterns: 4540
Runtime: 29.90667486190796
Memory (RSS): 574488576
Memory (USS): 552030208


## Discovering Fault-Tolerant Frequent Patterns

### Generic



```python
from PAMI.faultTolerantFrequentPattern.basic import FTFPGrowth as alg

obj = alg.FTFPGrowth(iFile='inputFileName', minSup=minimumSupportValue, itemSup=minimumSupportAnItemHasToMaintain, minLength=minimumLengthOfAnItemset, faultTolerance=faultTolerantValue, sep="\t")
obj.mine()

# Count the patterns mined by this object
print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 9

In [None]:
from PAMI.faultTolerantFrequentPattern.basic import FTFPGrowth as alg

obj = alg.FTFPGrowth(iFile='Transactional_T10I4D100K.csv', minSup=100, itemSup=100, minLength=3, faultTolerance=1, sep="\t")

obj.mine()

# Count the patterns mined by this object (not the DataFrame left over from Example 8)
print('#Patterns: ' + str(len(obj.getPatternsAsDataFrame())))
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Frequent patterns were generated successfully using frequentPatternGrowth algorithm
#Patterns: 4540
Runtime: 18.587899923324585
Memory (RSS): 930197504
Memory (USS): 907669504


## Discovering Association Rules From the Frequent Patterns

### Generic



```python
from PAMI.AssociationRules.basic import confidence as alg

obj = alg.confidence('inputFileName', minConf=minimumConfidenceValue)
obj.mine()
obj.printResults()
obj.save("outputFileName")
```



### Example 10

In [None]:
from PAMI.AssociationRules.basic import confidence as alg

obj = alg.confidence('frequentPatternsAtMinSupCount300.txt', minConf=0.75)
obj.mine()
obj.printResults()
obj.save("associationRulesconfidence.csv")

Association rules successfully  generated from frequent patterns 
Total number of Association Rules: 22984
Total Memory in USS: 903217152
Total Memory in RSS 925667328
Total ExecutionTime in ms: 0.036444902420043945
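
To show how rules follow from a file of frequent patterns and a `minConf` threshold, here is a minimal sketch that uses the standard definition confidence(X → Y) = sup(X ∪ Y) / sup(X). The `patterns` dictionary and `rules_from_patterns` are hypothetical and are not PAMI's implementation.

```python
from itertools import combinations

def rules_from_patterns(patterns, min_conf):
    """Generate (antecedent, consequent, confidence) from {itemset: support}.

    Assumes the pattern set is complete, so every subset of a frequent
    pattern is also present (downward closure).
    """
    rules = []
    for pattern, sup in patterns.items():
        if len(pattern) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(pattern)):
            for combo in combinations(sorted(pattern), size):
                antecedent = frozenset(combo)
                conf = sup / patterns[antecedent]
                if conf >= min_conf:
                    consequent = pattern - antecedent
                    rules.append(
                        (tuple(sorted(antecedent)), tuple(sorted(consequent)), conf)
                    )
    return rules

# Hypothetical frequent patterns with their support counts.
patterns = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}

rules = rules_from_patterns(patterns, min_conf=0.8)
for antecedent, consequent, conf in rules:
    print(antecedent, "->", consequent, conf)
```

Here the rule b → a has confidence 3/3 and is kept, while a → b has confidence 3/4 and falls below the 0.8 threshold.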
