# Discovering High Utility patterns in Big Data Using UPGrowth Algorithm

In this tutorial, we will discuss the first approach to finding High Utility patterns in big data using the UPGrowth algorithm.

[__Basic approach:__](#basicApproach) Here, we present the steps to discover High Utility patterns using a single minimum utility value.
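To make the role of the minimum utility value concrete, here is a minimal sketch in plain Python. It does not use PAMI or the UPGrowth data structures; the toy transactions, the helper `itemset_utility`, and the threshold `min_util` are all assumptions made up for illustration. The utility of an itemset is taken as the sum of its items' utilities over every transaction that contains all of them, and an itemset counts as a High Utility pattern when that sum reaches the threshold.

```python
# Toy utility database: each transaction maps an item to its utility
# (for example, quantity * unit profit). All values are made up.
transactions = [
    {"bread": 4, "milk": 6, "jam": 10},
    {"bread": 2, "milk": 3},
    {"milk": 9, "jam": 5},
    {"bread": 6, "jam": 15},
]


def itemset_utility(itemset, database):
    """Sum the utilities of `itemset` over transactions containing all its items."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] for item in itemset)
    return total


min_util = 30  # hypothetical threshold for this toy database
for candidate in [("bread", "jam"), ("bread", "milk"), ("milk", "jam")]:
    utility = itemset_utility(candidate, transactions)
    label = "high utility" if utility >= min_util else "below threshold"
    print(candidate, utility, label)
```

UPGrowth reaches the same kind of answer without enumerating candidates one by one; the brute-force check above only illustrates what the threshold means.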

***

## <a id='basicApproach'>Basic approach: Executing UPGrowth on a single dataset at a particular minimum utility value</a>

#### Step 0: Install the latest version of the PAMI library

In [1]:
!pip install -U PAMI

Collecting PAMI
  Downloading pami-2023.12.20.1-py3-none-any.whl (885 kB)
Collecting resource (from PAMI)
  Downloading Resource-0.2.1-py2.py3-none-any.whl (25 kB)
Collecting validators (from PAMI)
  Downloading validators-0.22.0-py3-none-any.whl (26 kB)
Collecting sphinx-rtd-theme (from PAMI)
  Downloading sphinx_rtd_theme-2.0.0-py2.py3-none-any.whl (2.8 MB)
Collecting JsonForm>=0.0.2 (from resource->PAMI)
  Downloading JsonForm-0.0.2.tar.gz (2.4 kB)
  Preparing metadata (setup.py) ... done
Collecting JsonSir>=0.0.2 (from resource->PAMI)
  Downloading JsonSir-0.0.2.tar.gz (2.2 kB)
  Preparing metadata (setup.py) ... done
Collecting python-easyconfig>=0.1.0 (from resource->PAMI)
  Downloading Python_EasyConfig-0.1.7-py2.py3-n

#### Step 1: Import the UPGrowth algorithm

In [3]:
from PAMI.highUtilityPattern.basic import UPGrowth  as alg

In [4]:
!wget -nc https://u-aizu.ac.jp/~udayrage/datasets/utilityDatabases/Utility_T10I4D100K.csv

--2023-12-20 03:14:43--  https://u-aizu.ac.jp/~udayrage/datasets/utilityDatabases/Utility_T10I4D100K.csv
Resolving u-aizu.ac.jp (u-aizu.ac.jp)... 150.95.161.176, 150.31.244.160
Connecting to u-aizu.ac.jp (u-aizu.ac.jp)|150.95.161.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7356594 (7.0M) [text/csv]
Saving to: ‘Utility_T10I4D100K.csv’


2023-12-20 03:14:49 (1.37 MB/s) - ‘Utility_T10I4D100K.csv’ saved [7356594/7356594]



#### Step 2: Specify the following input parameters

In [7]:
inputFile = 'Utility_T10I4D100K.csv'
#Users can also specify this constraint as a value between 0 and 1.
minUtilCount=100000
seperator='\t'
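The separator matters because of how each transaction line is laid out. As a hedged sketch, assume each line has three colon-separated fields: the items, the total transaction utility, and the per-item utilities, with the entries inside the first and third fields split by the separator. That layout is an assumption about the dataset and should be checked against the downloaded file; the helper `parse_utility_line` and the sample line are hypothetical.

```python
def parse_utility_line(line, sep="\t"):
    """Split one line of the assumed form 'items:transactionUtility:itemUtilities'."""
    items_part, total_part, utilities_part = line.rstrip("\n").split(":")
    items = items_part.split(sep)
    # Integer utilities are assumed here; switch to float() if the file uses decimals.
    utilities = [int(value) for value in utilities_part.split(sep)]
    if len(items) != len(utilities):
        raise ValueError("each item needs exactly one utility value")
    return items, int(total_part), utilities


# Made-up sample line: three items, total utility 12, utilities 4, 2 and 6.
sample_line = "1\t3\t5:12:4\t2\t6"
items, total, utilities = parse_utility_line(sample_line)
print(items, total, utilities)
```

Running this helper over the first few lines of the input file is a quick way to confirm that the chosen separator matches the data before starting a long mining run.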

#### Step 3: Execute the UPGrowth algorithm

In [None]:
obj = alg.UPGrowth(iFile=inputFile, minUtil=minUtilCount,  sep=seperator)    #initialize
obj.mine()            #Start the mining process

#### Step 4: Storing the generated patterns

##### Step 4.1: Storing the generated patterns in a file

In [None]:
obj.save(outFile='highUtilityPatternsMinUtil100000.txt')
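The saved file can be read back with a few lines of Python. The sketch below assumes that each output line holds the pattern's items joined by the separator, then a colon, then the utility; this layout is an assumption and should be verified against the generated file. The helper `parse_pattern_line` and the sample lines are made up for illustration.

```python
def parse_pattern_line(line, sep="\t"):
    """Split a line of the assumed form 'item1<sep>item2:utility'."""
    pattern_part, utility_part = line.strip().rsplit(":", 1)
    return tuple(pattern_part.split(sep)), float(utility_part)


# Made-up lines standing in for the contents of the saved file.
sample_lines = ["a\tb:120", "c:95", "a\tc\td:210"]
patterns = dict(parse_pattern_line(line) for line in sample_lines)

# List the patterns from highest to lowest utility.
ranked = sorted(patterns.items(), key=lambda pair: pair[1], reverse=True)
for pattern, utility in ranked:
    print(pattern, utility)
```

To apply it to a real file, replace `sample_lines` with the lines of the opened output file.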

##### Step 4.2: Storing the generated patterns in a data frame

In [None]:
frequentPatternsDF= obj.getPatternsAsDataFrame()

#### Step 5: Getting the statistics

##### Step 5.1: Total number of discovered patterns

In [None]:
print('Total No of patterns: ' + str(len(frequentPatternsDF)))

##### Step 5.2: Runtime consumed by the mining algorithm

In [None]:
print('Runtime: ' + str(obj.getRuntime()))

##### Step 5.3: Total Memory consumed by the mining algorithm

In [None]:
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

# Advanced Tutorial on Implementing UPGrowth Algorithm

In this tutorial, we will discuss the second approach to finding High Utility patterns in big data using the UPGrowth algorithm.

[__Advanced approach:__](#advApproach) Here, we generalize the basic approach by presenting the steps to discover High Utility patterns using multiple minimum utility values.

***

## <a id='advApproach'>Advanced approach: Executing UPGrowth on a single dataset at multiple minimum utility values</a>

#### Step 1: Import the UPGrowth algorithm and pandas data frame

In [None]:
from PAMI.highUtilityPattern.basic import UPGrowth as alg
import pandas as pd

#### Step 2: Specify the following input parameters

In [None]:
inputFile = 'Utility_T10I4D100K.csv'
minUtilList=[100000, 120000, 140000, 160000, 180000]
seperator='\t'
#initialize a data frame to store the results of the UPGrowth algorithm
#(the 'minSup' column holds the minimum utility values used in each run)
result = pd.DataFrame(columns=['algorithm', 'minSup', 'patterns', 'runtime', 'memory'])

#### Step 3: Execute the UPGrowth algorithm using a for loop

In [None]:
algorithm = 'UPGrowth'  #specify the algorithm name
for minimumUtility in minUtilList:
    obj = alg.UPGrowth(iFile=inputFile, minUtil=minimumUtility, sep=seperator)
    obj.mine()
    #store the results in the data frame
    result.loc[result.shape[0]] = [algorithm, minimumUtility, len(obj.getPatterns()), obj.getRuntime(), obj.getMemoryRSS()]

High Utility patterns were generated successfully using UPGrowth algorithm
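The effect of sweeping the threshold can be seen on a toy example without running PAMI. The following sketch uses a brute-force miner, not UPGrowth; the toy database, the function `mine_high_utility`, and the threshold list are assumptions made up for illustration. It records one result row per threshold, in the same spirit as the loop above.

```python
from itertools import combinations

# Toy utility database: item -> utility per transaction (made-up values).
transactions = [
    {"bread": 4, "milk": 6, "jam": 10},
    {"bread": 2, "milk": 3},
    {"milk": 9, "jam": 5},
    {"bread": 6, "jam": 15},
]


def mine_high_utility(database, min_util):
    """Brute force: score every itemset and keep those reaching min_util."""
    items = sorted({item for transaction in database for item in transaction})
    found = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = sum(
                sum(transaction[item] for item in itemset)
                for transaction in database
                if all(item in transaction for item in itemset)
            )
            if utility >= min_util:
                found[itemset] = utility
    return found


results = []
for min_util in [10, 20, 30, 40]:  # hypothetical thresholds
    patterns = mine_high_utility(transactions, min_util)
    results.append({"minUtil": min_util, "patterns": len(patterns)})

for row in results:
    print(row)
```

Because raising the threshold can only remove patterns, the pattern count never increases from one row to the next. The same trend is a useful sanity check on the real results.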


#### Step 4: Printing the results

In [None]:
print(result)

#### Step 5: Visualizing the results

##### Step 5.1: Importing the plot library

In [None]:
from PAMI.extras.graph import plotLineGraphsFromDataFrame as plt

##### Step 5.2: Plotting the number of patterns

In [None]:
ab = plt.plotGraphsFromDataFrame(result)
ab.plotGraphsFromDataFrame() #drawPlots()

#### Step 6: Saving the results as LaTeX files

In [None]:
from PAMI.extras.graph import generateLatexFileFromDataFrame as gdf
gdf.generateLatexCode(result)
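If a plain LaTeX table is wanted in addition to whatever `generateLatexCode` writes, the rows can also be formatted by hand. This is a minimal sketch that is independent of PAMI and makes no claim about PAMI's own output; the helper `rows_to_latex_table` and the placeholder rows are hypothetical and are not measurements.

```python
def rows_to_latex_table(rows, columns):
    """Format a list of dict rows as a simple LaTeX tabular environment."""
    row_end = r" \\"
    lines = [r"\begin{tabular}{" + "r" * len(columns) + "}"]
    lines.append(" & ".join(columns) + row_end)
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(str(row[column]) for column in columns) + row_end)
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Placeholder rows for illustration only; they are not real results.
demo_rows = [
    {"minUtil": 10, "patterns": 7},
    {"minUtil": 20, "patterns": 4},
]
table = rows_to_latex_table(demo_rows, ["minUtil", "patterns"])
print(table)
```

With the `result` data frame from the steps above, `result.to_dict('records')` would supply rows in the expected shape.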