# Discovering Frequent Patterns Based on Multiple Minimum Supports in Big Data Using the CFPGrowth Algorithm

In this tutorial, we discuss the first of two approaches to finding frequent patterns in big data using the CFPGrowth algorithm.

[__Basic approach:__](#basicApproach) Here, we present the steps to discover frequent patterns using a multiple minimum support (MIS) file.

***

## <a id='basicApproach'>Basic approach: Executing CFPGrowth on a single dataset with a multiple minimum support (MIS) file</a>

#### Step 0: Install the latest version of the PAMI library

In [1]:
!pip install -U PAMI



##### Step 0.1: Download the sample transactional database

In [6]:
!wget https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv

--2023-11-15 08:02:18--  https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv
Resolving u-aizu.ac.jp (u-aizu.ac.jp)... 150.95.161.176, 150.31.244.160
Connecting to u-aizu.ac.jp (u-aizu.ac.jp)|150.95.161.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4019277 (3.8M) [text/csv]
Saving to: ‘Transactional_T10I4D100K.csv’


2023-11-15 08:02:18 (21.7 MB/s) - ‘Transactional_T10I4D100K.csv’ saved [4019277/4019277]



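##### Step 0.2: Compute the minimum item support (MIS) of every item

In [7]:
from PAMI.extras.calculateMISValues import usingBeta as ub
inputFile = "Transactional_T10I4D100K.csv"  # database downloaded in Step 0.1
beta = 0.8               # scales each item's support when deriving its MIS
LS = 100                 # least support: lower bound on every item's MIS
sep = "\t"               # items in a transaction are tab-separated
output = "MIS_T10.txt"   # file that will hold the MIS value of every item
cd = ub.usingBeta(inputFile, beta, LS, sep)
cd.calculateMIS()        # derive the MIS values
cd.save(output)          # write them to the MIS file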
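The `usingBeta` helper gives each item its own minimum item support (MIS) from two parameters, `beta` and `LS`. A rule commonly used in the multiple-minimum-support literature (MSApriori) is MIS(i) = max(β · f(i), LS), where f(i) is the item's support and LS the least support allowed. The sketch below assumes `calculateMIS()` follows this rule with support counts; the helper `compute_mis` and the toy transactions are illustrative only, and PAMI may round or scale the values differently.

```python
from collections import Counter


def compute_mis(transactions, beta, ls):
    """Return {item: MIS} with MIS(i) = max(beta * count(i), ls).

    Assumption: this follows the beta/LS rule from the multiple-minimum-support
    literature; PAMI's usingBeta may round or scale the values differently.
    """
    # Support count: number of transactions containing the item (once per transaction).
    counts = Counter(item for t in transactions for item in set(t))
    return {item: max(beta * count, ls) for item, count in counts.items()}


# Toy database for illustration only: each inner list is one transaction.
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c", "e"],
    ["a", "b", "c", "e"],
]

mis = compute_mis(transactions, beta=0.8, ls=2)
for item in sorted(mis):
    print(item, round(mis[item], 2))
```

Here the rare items d (seen once) and e (seen twice) are raised to the floor LS = 2, while the most frequent items a and c receive the highest thresholds (3.2). Frequent items are therefore held to stricter thresholds, and rare items can still take part in patterns.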

#### Step 1: Import the CFPGrowth algorithm

In [8]:
from PAMI.multipleMinimumSupportBasedFrequentPattern.basic import CFPGrowth as alg

#### Step 2: Specify the following input parameters

In [9]:
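inputFile = 'Transactional_T10I4D100K.csv'  # transactional database

MIS = 'MIS_T10.txt'  # file holding the minimum item support of every item

seperator = '\t'  # items in each transaction are tab-separated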
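The `seperator` parameter tells the miner how to split each line of `inputFile`. We assume the usual transactional layout: one transaction per line, with items separated by `sep`. The hypothetical `read_transactions` helper below sketches that parsing; it is not PAMI's reader.

```python
import io


def read_transactions(lines, sep="\t"):
    """Hypothetical helper: split each non-empty line into a list of items."""
    transactions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop empty fields produced by repeated separators.
        transactions.append([item for item in line.split(sep) if item])
    return transactions


# Two tab-separated transactions in the assumed layout (illustrative values).
sample = io.StringIO("25\t52\t164\n1\t25\t52\n")
print(read_transactions(sample, sep="\t"))
```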

#### Step 3: Execute the CFPGrowth algorithm

In [10]:
obj = alg.CFPGrowth(iFile=inputFile, MIS=MIS, sep=seperator)    #initialize
obj.startMine()            #Start the mining process

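CFPGrowth builds a compact tree so that it does not have to test every itemset, but the set of patterns it should return can be stated simply. Assuming the standard multiple-minimum-support model, an itemset X is frequent when its support count is at least the smallest MIS among its items, min over i in X of MIS(i). The brute-force function `mis_frequent_patterns` below is not CFPGrowth; it is a naive reference for that definition, suitable only for tiny databases (for example, to sanity-check results). The transactions and MIS values are illustrative.

```python
from itertools import combinations


def mis_frequent_patterns(transactions, mis):
    """Naive reference (not CFPGrowth): return {itemset: support count} for every
    itemset whose support count is at least the smallest MIS among its items.

    Every itemset is enumerated because, with per-item thresholds, a subset of a
    frequent itemset need not be frequent, so exponential time: toy data only.
    """
    sets = [frozenset(t) for t in transactions]
    items = sorted(mis)
    patterns = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in sets if t.issuperset(combo))
            if support >= min(mis[item] for item in combo):
                patterns[combo] = support
    return patterns


transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c", "e"],
    ["a", "b", "c", "e"],
]
# Illustrative MIS values expressed as support counts.
mis = {"a": 3, "b": 3, "c": 3, "d": 1, "e": 2}

patterns = mis_frequent_patterns(transactions, mis)
for pattern, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(pattern, support)
```

On this toy database 11 itemsets qualify. ('a', 'd') is kept even though d occurs only once, because d's MIS of 1 sets the threshold for any itemset containing it. ('a', 'b') is dropped because its support count of 2 is below the threshold of 3.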

#### Step 4: Storing the generated patterns

##### Step 4.1: Storing the generated patterns in a file

In [None]:
obj.save(outFile='frequentPatternsMinSupCount100.txt')

##### Step 4.2: Storing the generated patterns in a data frame

In [None]:
frequentPatternsDF = obj.getPatternsAsDataFrame()

#### Step 5: Getting the statistics

##### Step 5.1: Total number of discovered patterns 

In [None]:
print('Total No of patterns: ' + str(len(frequentPatternsDF)))

##### Step 5.2: Runtime consumed by the mining algorithm

In [None]:
print('Runtime: ' + str(obj.getRuntime()))

##### Step 5.3: Total memory consumed by the mining algorithm

In [None]:
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
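RSS (resident set size) counts all physical memory a process currently holds, including pages shared with other processes such as shared libraries. USS (unique set size) counts only the pages private to the process, so USS is usually the smaller figure. For comparison only, the sketch below measures the runtime and the *peak* RSS of an arbitrary function with the standard library. It uses `resource.getrusage`, which is Unix-only, and the helper `measure` is hypothetical, so its numbers are not expected to match PAMI's.

```python
import resource
import sys
import time


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak RSS in bytes).

    ru_maxrss is the process-wide peak resident set size so far. It is reported
    in bytes on macOS and in kilobytes on Linux; Unix only.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if not sys.platform.startswith("darwin"):
        peak *= 1024  # assume kilobytes (as on Linux) and convert to bytes
    return result, elapsed, peak


# Example: time a toy workload standing in for a mining call.
total, seconds, peak_rss = measure(sum, range(1_000_000))
print(f"result={total}, runtime={seconds:.4f}s, peak RSS={peak_rss / 2**20:.1f} MiB")
```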

# Advanced Tutorial on Implementing the CFPGrowth Algorithm

In this tutorial, we discuss the second approach to finding frequent patterns in big data using the CFPGrowth algorithm.

[__Advanced approach:__](#advApproach) Here, we generalize the basic approach by presenting the steps to discover frequent patterns under several minimum support settings and to compare the results.

***

## <a id='advApproach'>Advanced approach: Executing CFPGrowth repeatedly while varying the minimum support values</a>

#### Step 1: Import the CFPGrowth algorithm, the MIS calculator, and pandas

In [1]:
from PAMI.multipleMinimumSupportBasedFrequentPattern.basic import CFPGrowth as alg
from PAMI.extras.calculateMISValues import usingBeta as ub
import pandas as pd

#### Step 2: Specify the following input parameters

In [2]:
inputFile = 'Transactional_T10I4D100K.csv'
seperator = '\t'
beta = 0.8
# CFPGrowth reads per-item MIS values from a file, so the minimum support is varied
# through the least support (LS) used to generate one MIS file per run
leastSupportList = [100, 150, 200, 250, 300]

result = pd.DataFrame(columns=['algorithm', 'minSup', 'patterns', 'runtime', 'memory'])
# initialize a data frame to store the results of the CFPGrowth algorithm

#### Step 3: Execute the CFPGrowth algorithm using a for loop

In [4]:
algorithm = 'CFPGrowth'  # specify the algorithm name
for LS in leastSupportList:
    misFile = 'MIS_T10_LS' + str(LS) + '.txt'
    cd = ub.usingBeta(inputFile, beta, LS, seperator)
    cd.calculateMIS()
    cd.save(misFile)  # one MIS file per least-support value
    obj = alg.CFPGrowth(iFile=inputFile, MIS=misFile, sep=seperator)
    obj.startMine()
    # store the results in the data frame (the LS value goes in the minSup column)
    result.loc[result.shape[0]] = [algorithm, LS, len(obj.getPatternsAsDataFrame()), obj.getRuntime(), obj.getMemoryRSS()]


#### Step 4: Printing the results

In [None]:
print(result)

#### Step 5: Visualizing the results

##### Step 5.1: Importing the plotting library

In [None]:
from PAMI.extras.graph import plotLineGraphsFromDataFrame as plt

##### Step 5.2: Plotting the number of patterns

In [None]:
ab = plt.plotGraphsFromDataFrame(result)
ab.plotGraphsFromDataFrame()  # draw the graphs

#### Step 6: Saving the results as LaTeX files

In [None]:
from PAMI.extras.graph import generateLatexFileFromDataFrame as gdf
gdf.generateLatexCode(result)
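`generateLatexCode` generates LaTeX code from the results data frame, in a form defined by PAMI. If a plain results table is wanted instead, a `tabular` environment can be built by hand from the same rows. The sketch below uses the hypothetical helpers `latex_escape` and `results_to_tabular` and plain tuples in place of the data frame; the row values are placeholders, not measured results.

```python
# LaTeX special characters and their escaped forms.
LATEX_SPECIAL = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}", "\\": r"\textbackslash{}",
}


def latex_escape(value):
    """Escape one cell value character by character."""
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in str(value))


def results_to_tabular(columns, rows):
    """Hypothetical helper: render a header and rows as a LaTeX tabular."""
    lines = [
        r"\begin{tabular}{" + "l" * len(columns) + "}",
        r"\hline",
        " & ".join(latex_escape(c) for c in columns) + r" \\",
        r"\hline",
    ]
    for row in rows:
        lines.append(" & ".join(latex_escape(v) for v in row) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


columns = ["algorithm", "minSup", "patterns", "runtime", "memory"]
# Placeholder rows, not measured results.
rows = [
    ("CFPGrowth", 100, "P1", "T1", "M1"),
    ("CFPGrowth", 150, "P2", "T2", "M2"),
]
print(results_to_tabular(columns, rows))
```

With the data frame from Step 3, the call would presumably be `results_to_tabular(list(result.columns), result.itertuples(index=False))`. pandas' own `DataFrame.to_latex()` is another option.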