<a href="https://colab.research.google.com/github/vanithakattumuri/Hands-on-Pattern-Mining/blob/main/chapter10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 10: Pattern Discovery in Utility Databases

## Install the PAMI library

In [1]:
!pip install --upgrade pami

Collecting pami
  Downloading pami-2024.12.10.1-py3-none-any.whl.metadata (80 kB)
Collecting resource (from pami)
  Downloading Resource-0.2.1-py2.py3-none-any.whl.metadata (478 bytes)
Collecting validators (from pami)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting sphinx-rtd-theme (from pami)
  Downloading sphinx_rtd_theme-3.0.2-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting discord.py (from pami)
  Downloading discord.py-2.4.0-py3-none-any.whl.metadata (6.9 kB)
Collecting fastparquet (from pami)
  Downloading fastparquet-2024.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.

## Download the dataset

In [2]:
!wget -nc https://web-ext.u-aizu.ac.jp/~udayrage/datasets/utilityDatabases/Utility_T10I4D100K.csv

--2024-12-11 09:10:00--  https://web-ext.u-aizu.ac.jp/~udayrage/datasets/utilityDatabases/Utility_T10I4D100K.csv
Resolving web-ext.u-aizu.ac.jp (web-ext.u-aizu.ac.jp)... 163.143.103.34
Connecting to web-ext.u-aizu.ac.jp (web-ext.u-aizu.ac.jp)|163.143.103.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7356594 (7.0M) [text/csv]
Saving to: ‘Utility_T10I4D100K.csv’


2024-12-11 09:10:02 (5.08 MB/s) - ‘Utility_T10I4D100K.csv’ saved [7356594/7356594]
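
Before mining, it can help to see how a single transaction is laid out. The sketch below assumes the utility database layout described in the PAMI documentation: each line holds the items, the total transaction utility, and the per-item utilities, separated by colons, with items and utilities delimited by `sep` (a tab here). The helper `parse_utility_line` and the sample line are hypothetical and made up for illustration; they are not part of PAMI and are not taken from the downloaded file.

```python
def parse_utility_line(line, sep='\t'):
    """Split one utility-database line into (items, total_utility, utilities).

    Assumed layout: items ':' total utility ':' per-item utilities.
    """
    items_part, total_part, utils_part = line.strip().split(':')
    items = items_part.split(sep)
    # Utilities are parsed as integers here; this is an assumption for the sample.
    utilities = [int(u) for u in utils_part.split(sep)]
    return items, int(total_part), utilities


# Hypothetical sample line: three items with utilities 3, 5, and 6.
sample = "1\t2\t3:14:3\t5\t6"
items, total, utilities = parse_utility_line(sample)
print(items, total, utilities)
```

To check the real file, the same helper could be applied to the first few lines read from `Utility_T10I4D100K.csv`.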



## Finding high utility patterns

### Generic



```python
from PAMI.highUtilityPattern.basic import EFIM as alg  # import the algorithm

minUtilCount = 50000  # placeholder threshold; set this for your own data

obj = alg.EFIM(iFile='utilityDatabase.csv',
               minUtil=minUtilCount,
               sep='\t')  # initialize
obj.mine()  # start the mining process

obj.save('utilityPatterns.txt')  # save the patterns

utilityPatternsDF = obj.getPatternsAsDataFrame()
print('# patterns: ' + str(len(utilityPatternsDF)))
print('Runtime: ' + str(obj.getRuntime()))

print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 1

In [3]:
from PAMI.highUtilityPattern.basic import EFIM  as alg #import the algorithm

obj = alg.EFIM(iFile='Utility_T10I4D100K.csv',
                minUtil=50000,
                sep='\t') #initialize
obj.mine()            #start the mining process

obj.save('utilityPatterns.txt') #save the patterns

utilityPatternsDF= obj.getPatternsAsDataFrame()
print('# patterns: ' + str(len(utilityPatternsDF)))
print('Runtime: ' + str(obj.getRuntime()))

print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

High Utility patterns were generated successfully using EFIM algorithm
# patterns: 5968
Runtime: 153.3342638015747
Memory (RSS): 259395584
Memory (USS): 236847104
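
The raw memory figures above are hard to read at a glance. The sketch below converts them to mebibytes, assuming the values returned by `getMemoryRSS()` and `getMemoryUSS()` are byte counts. The helper `bytes_to_mib` is hypothetical and is not part of PAMI; the two numbers are copied from the output above.

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


# Values copied from the Example 1 output; assumed to be bytes.
rss_bytes = 259395584
uss_bytes = 236847104

print(f'Memory (RSS): {bytes_to_mib(rss_bytes):.2f} MiB')
print(f'Memory (USS): {bytes_to_mib(uss_bytes):.2f} MiB')
```

In a live session, the same helper could be called directly on `obj.getMemoryRSS()` and `obj.getMemoryUSS()`.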


## Discovering high utility frequent patterns

### Generic




```python
from PAMI.highUtilityFrequentPattern.basic import HUFIM as alg  # import the algorithm

# Placeholders: set these for your own data
inputFile = 'utilityDatabase.csv'
minUtilCount = 50000
minimumSupportCount = 1000

obj = alg.HUFIM(iFile=inputFile,
                minUtil=minUtilCount,
                minSup=minimumSupportCount,
                sep='\t')  # initialize
obj.mine()  # start the mining process

obj.save('utilityFrequentPatternsAtMinSup.txt')  # save the patterns

utilityFPDF = obj.getPatternsAsDataFrame()
print('Total No of patterns: ' + str(len(utilityFPDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 2

In [4]:
from PAMI.highUtilityFrequentPattern.basic import HUFIM  as alg

obj = alg.HUFIM(iFile='Utility_T10I4D100K.csv',
    minUtil=50000,
    minSup=1000,
    sep='\t')
obj.mine()

obj.save('utilityFrequentPatternsAtMinSup.txt')

utilityFPDF= obj.getPatternsAsDataFrame()
print('Total No of patterns: ' + str(len(utilityFPDF)))
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

High Utility Frequent patterns were generated successfully using HUFIM algorithm
Total No of patterns: 382
Runtime: 75.3496265411377
Memory (RSS): 261861376
Memory (USS): 239280128
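
Adding `minSup` on top of `minUtil` can only shrink the result, because a high utility frequent pattern must satisfy both thresholds. The toy sketch below illustrates this with a brute-force enumeration over a tiny, made-up database. It is not EFIM or HUFIM and would not scale to real data; the transactions, the helper names `utility_and_support` and `mine_brute_force`, and the thresholds are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility.
transactions = [
    {'a': 5, 'b': 2},
    {'a': 5, 'c': 1},
    {'a': 5, 'b': 2, 'c': 1},
    {'d': 20},
]


def utility_and_support(pattern, transactions):
    """Return (total utility, support) of a pattern over all transactions."""
    utility, support = 0, 0
    for t in transactions:
        if all(item in t for item in pattern):
            utility += sum(t[item] for item in pattern)
            support += 1
    return utility, support


def mine_brute_force(transactions, min_util, min_sup=1):
    """Enumerate every itemset and keep those meeting both thresholds."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            u, s = utility_and_support(pattern, transactions)
            if u >= min_util and s >= min_sup:
                result[pattern] = (u, s)
    return result


high_utility = mine_brute_force(transactions, min_util=12)
high_utility_frequent = mine_brute_force(transactions, min_util=12, min_sup=2)

print('High utility patterns:', high_utility)
print('High utility frequent patterns:', high_utility_frequent)
```

In this toy database the pattern `('d',)` has high utility but appears in only one transaction, so it is dropped once the support threshold is applied.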
