# Discovering Frequent Patterns in Big Data Using FPGrowth Algorithm

In this tutorial, we discuss two approaches to finding frequent patterns in big data using the FPGrowth algorithm.

[__Basic approach:__](#basicApproach) Here, we present the steps to discover frequent patterns using a single minimum support value.

***

## <a id='basicApproach'>Basic approach: Executing FPGrowth on a single dataset at a particular minimum support value</a>

#### Step 1: Import the FPGrowth algorithm

In [1]:
from PAMI.frequentPattern.basic import FPGrowth  as alg

#### Step 2: Specify the following input parameters

In [2]:
inputFile = 'https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv'

minimumSupportCount = 100  # Users can also specify this constraint as a value between 0 and 1.

seperator='\t'      
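
The comment on `minimumSupportCount` notes that the threshold can also be given as a value between 0 and 1. The following is a minimal sketch of how such a value relates to an absolute count, assuming it is interpreted as a proportion of the number of transactions (100,000 for this dataset, as the `D100K` in its name suggests). The helper `to_support_count` is hypothetical and is not part of PAMI.

```python
def to_support_count(min_sup, n_transactions):
    """Hypothetical helper: map a minimum support value to an absolute count.

    Assumption: a float strictly between 0 and 1 is a fraction of the
    database size; any other number is already an absolute count.
    """
    if isinstance(min_sup, float) and 0 < min_sup < 1:
        return round(min_sup * n_transactions)
    return int(min_sup)


n_transactions = 100_000  # assumed size of Transactional_T10I4D100K
print(to_support_count(0.001, n_transactions))  # fractional threshold
print(to_support_count(100, n_transactions))    # absolute threshold
```

Under this assumption, a fractional threshold of 0.001 and an absolute count of 100 describe the same constraint on this dataset.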

#### Step 3: Execute the FPGrowth algorithm

In [3]:
obj = alg.FPGrowth(iFile=inputFile, minSup=minimumSupportCount, sep=seperator)    #initialize
obj.mine()            #Start the mining process

Frequent patterns were generated successfully using frequentPatternGrowth algorithm
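
To make the output of the mining step concrete, the sketch below counts the support of every itemset in a toy database and keeps those that meet the minimum support count. This is a brute-force illustration of what a frequent pattern is, not the FPGrowth algorithm and not how PAMI computes its result. The function name and the toy transactions are made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns_bruteforce(transactions, min_sup_count):
    """Return {pattern: support} for all itemsets with support >= min_sup_count.

    Enumerates every subset of every transaction, so it is only usable
    on tiny inputs. FPGrowth avoids this enumeration by using an FP-tree.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            # Each subset of a transaction is supported once by it.
            counts.update(combinations(items, size))
    return {p: c for p, c in counts.items() if c >= min_sup_count}


toy_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for pattern, support in sorted(frequent_patterns_bruteforce(toy_db, 3).items()):
    print(pattern, support)
```

With a minimum support count of 3, the three single items and the three pairs are frequent, while `('a', 'b', 'c')` appears in only two transactions and is dropped.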


#### Step 4: Storing the generated patterns

##### Step 4.1: Storing the generated patterns in a file

In [4]:
obj.save('frequentPatternsMinSupCount100.txt')

##### Step 4.2: Storing the generated patterns in a data frame

In [5]:
frequentPatternsDF = obj.getPatternsAsDataFrame()

#### Step 5: Getting the statistics

##### Step 5.1: Total number of discovered patterns 

In [6]:
print('Total No of patterns: ' + str(len(frequentPatternsDF)))

Total No of patterns: 27517


##### Step 5.2: Runtime consumed by the mining algorithm

In [7]:
print('Runtime: ' + str(obj.getRuntime()))

Runtime: 15.191829919815063


##### Step 5.3: Total Memory consumed by the mining algorithm

In [8]:
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Memory (RSS): 496418816
Memory (USS): 477413376
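
The memory figures above are large raw numbers. Assuming they are reported in bytes, a small conversion makes them easier to read. RSS (resident set size) is the physical memory held by the process, and USS (unique set size) is the part of it not shared with other processes. The helper `bytes_to_mib` is hypothetical and is not part of PAMI.

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# Values copied from the output above; assumed to be in bytes.
rss_bytes = 496418816
uss_bytes = 477413376

print(f"Memory (RSS): {bytes_to_mib(rss_bytes):.2f} MiB")
print(f"Memory (USS): {bytes_to_mib(uss_bytes):.2f} MiB")
```

Under that assumption, the run above used roughly 473 MiB (RSS) and 455 MiB (USS).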


# Advanced Tutorial on Implementing FPGrowth Algorithm

In this tutorial, we discuss the second approach to finding frequent patterns in big data using the FPGrowth algorithm.

[__Advanced approach:__](#advApproach) Here, we generalize the basic approach by presenting the steps to discover frequent patterns using multiple minimum support values.

***

### <a id='advApproach'>In this tutorial, we explain how the FPGrowth algorithm can be executed by varying the minimum support values</a>

#### Step 1: Import the FPGrowth algorithm and pandas data frame

In [9]:
from PAMI.frequentPattern.basic import FPGrowth  as alg
import pandas as pd

#### Step 2: Specify the following input parameters

In [10]:
inputFile = 'https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv'
seperator='\t'
minimumSupportCountList = [100, 150, 200, 250, 300]
# minSup can also be specified as a value between 0 and 1. E.g., minSupList = [0.005, 0.006, 0.007, 0.008, 0.009]
# initialize a data frame to store the results of the FPGrowth algorithm
result = pd.DataFrame(columns=['algorithm', 'minSup', 'patterns', 'runtime', 'memory'])

#### Step 3: Execute the FPGrowth algorithm using a for loop

In [11]:
algorithm = 'FPGrowth'  #specify the algorithm name
for minSupCount in minimumSupportCountList:
    obj = alg.FPGrowth(iFile=inputFile, minSup=minSupCount, sep=seperator)
    obj.mine()
    #store the results in the data frame
    result.loc[result.shape[0]] = [algorithm, minSupCount, len(obj.getPatterns()), obj.getRuntime(), obj.getMemoryRSS()]



#### Step 4: Printing the results

In [None]:
print(result)
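
The loop above follows a common sweep pattern: run the miner once per threshold and record one row per run. The sketch below reproduces that pattern on a toy database with a brute-force stand-in for the miner, so it runs without PAMI or the dataset. The function `count_frequent_patterns` and the toy data are hypothetical, and the stand-in is not FPGrowth.

```python
from collections import Counter
from itertools import combinations


def count_frequent_patterns(transactions, min_sup_count):
    """Brute-force stand-in for a miner: count itemsets meeting the threshold."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    return sum(1 for c in counts.values() if c >= min_sup_count)


toy_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

rows = []  # one record per threshold, analogous to one row of `result`
for min_sup in [1, 2, 3, 4, 5]:
    rows.append({"minSup": min_sup,
                 "patterns": count_frequent_patterns(toy_db, min_sup)})

for row in rows:
    print(row)
```

Raising the threshold can only remove patterns, never add them, so the `patterns` column should not increase as `minSup` grows. The same check is a quick sanity test for the real results.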

#### Step 5: Visualizing the results

##### Step 5.1: Importing the plot library

In [None]:
from PAMI.extras.graph import plotLineGraphsFromDataFrame as plt

##### Step 5.2: Plotting the number of patterns

In [None]:
ab = plt.plotGraphsFromDataFrame(result)
ab.plotGraphsFromDataFrame() #drawPlots()

#### Step 6: Saving the results as LaTeX files

In [None]:
from PAMI.extras.graph import generateLatexFileFromDataFrame as gdf
gdf.generateLatexCode(result)