<a href="https://colab.research.google.com/github/UdayLab/Hands-on-Pattern-Mining/blob/main/chapter2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 2: Handling Big Data - Classification, Storage, and Processing Techniques

## Install the PAMI package

In [1]:
%pip install PAMI


## Downloading a sample file

In [2]:
!wget -nc https://web-ext.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv

File ‘Transactional_T10I4D100K.csv’ already there; not retrieving.



## Converting a CSV file into a Parquet file

### Generic



```python
from PAMI.extras.convert import CSV2Parquet as alg

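# inputFile: path of the CSV file; outputFile: path of the Parquet file to create;
# sep: field separator used in the CSV file (for example '\t')
obj = alg.CSV2Parquet(inputFile, outputFile, sep)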
obj.convert()

print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 1: CSV2Parquet

In [3]:
import PAMI.extras.convert.CSV2Parquet as cp

obj = cp.CSV2Parquet(inputFile='Transactional_T10I4D100K.csv',
                     outputFile='Transactional_T10I4D100K.parquet',
                     sep='\t')
obj.convert()

print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Runtime: 0.40280604362487793
Memory (RSS): 255557632
Memory (USS): 199639040
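
The memory figures above are printed as large raw integers. Assuming they are byte counts (the usual unit for RSS and USS readings; the notebook itself does not state the unit), a small helper makes them easier to read. `to_mib` is a hypothetical name introduced for this sketch and is not part of PAMI.

```python
def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


# Values copied from the output of Example 1; assumed to be bytes.
rss_bytes = 255557632
uss_bytes = 199639040

print(f"Memory (RSS): {to_mib(rss_bytes):.2f} MiB")
print(f"Memory (USS): {to_mib(uss_bytes):.2f} MiB")
```

RSS counts all memory resident for the process, including pages shared with other processes, while USS counts only the pages unique to it, which is why the USS figure is the smaller of the two.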


## Converting a Parquet file into a CSV file

### Generic


```python
from PAMI.extras.convert import Parquet2CSV as alg

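# inputFile: path of the Parquet file; outputFile: path of the CSV file to create;
# sep: field separator to use in the CSV file (for example '\t')
obj = alg.Parquet2CSV(inputFile, outputFile, sep)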
obj.convert()

print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 2: Parquet2CSV

In [4]:
import PAMI.extras.convert.Parquet2CSV as cp

obj = cp.Parquet2CSV(inputFile='Transactional_T10I4D100K.parquet',
                     outputFile='new_Tran_T10I4D100K.csv',
                     sep='\t')
obj.convert()

print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Runtime: 0.8319060802459717
Memory (RSS): 232554496
Memory (USS): 151224320
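
After converting CSV to Parquet and back, it can be useful to check whether the regenerated file still holds the same transactions as the original. The following is a minimal sketch that uses only the standard library; `read_transactions` and `same_transactions` are hypothetical helpers, not PAMI functions, and the comparison ignores empty fields and blank lines, which is an assumption about what should count as "the same".

```python
import os


def read_transactions(path, sep='\t'):
    """Read a file into a list of transactions, dropping empty fields and blank lines."""
    transactions = []
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            items = [item.strip() for item in line.rstrip('\n').split(sep)]
            items = [item for item in items if item]
            if items:
                transactions.append(items)
    return transactions


def same_transactions(path_a, path_b, sep='\t'):
    """Return True when both files contain the same transactions in the same order."""
    return read_transactions(path_a, sep) == read_transactions(path_b, sep)


# File names taken from Examples 1 and 2; the check runs only if both files exist.
original = 'Transactional_T10I4D100K.csv'
restored = 'new_Tran_T10I4D100K.csv'
if os.path.exists(original) and os.path.exists(restored):
    print('Round trip preserved the transactions:',
          same_transactions(original, restored))
```

A `False` result does not necessarily indicate a bug: it may only reflect how the converters represent transactions of different lengths, so inspect a few differing lines before drawing conclusions.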


## Converting a DataFrame into a Particular Database Type

### Generic

```python
from PAMI.extras.convert import DF2DB as alg
import pandas as pd
import numpy as np

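obj = alg.DF2DB(dataFrame)  # dataFrame: a pandas DataFrame
# convert2ParticularDatabase is a placeholder: call the method for the target
# database type, e.g. convert2TransactionalDatabase(oFile, condition, thresholdValue)
obj.convert2ParticularDatabase(outputFileName, otherParameters)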

print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 3: DataFrame to transactional database

In [5]:
from PAMI.extras.convert import DF2DB as alg
import pandas as pd
import numpy as np

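# 1000 rows of random integers in [1, 100), one column per item
data = np.random.randint(1, 100, size=(1000, 4))
dataFrame = pd.DataFrame(data, columns=['Item1', 'Item2', 'Item3', 'Item4'])

obj = alg.DF2DB(dataFrame)
# Keep, for each row, the items whose value satisfies: value >= 36
obj.convert2TransactionalDatabase(oFile='transactionalDB.csv',
                                  condition='>=', thresholdValue=36)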
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Runtime: 0.007760047912597656
Memory (RSS): 228573184
Memory (USS): 152125440


In [6]:
!head transactionalDB.csv #printing the created transactional database

Item1	Item2	Item3
Item1	Item2	Item3
Item1	Item2	Item3
Item1	Item2	Item4
Item1	Item2	Item4
Item2	Item3
Item1	Item2	Item3	Item4
Item2	Item3	Item4
Item1	Item3	Item4
Item1	Item4
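
Judging from the output above, each row of the DataFrame becomes one transaction that lists the column names whose values satisfy the condition (here `>= 36`). The sketch below reproduces that rule in plain Python, with dictionaries standing in for DataFrame rows. It is an illustration of the idea, not PAMI's implementation: `rows_to_transactions`, the `CONDITIONS` table, and the choice to skip rows where no item qualifies are all assumptions of this sketch.

```python
import operator

# Comparison operators supported by this sketch (an assumption; PAMI may differ).
CONDITIONS = {
    '>=': operator.ge,
    '>': operator.gt,
    '<=': operator.le,
    '<': operator.lt,
    '==': operator.eq,
    '!=': operator.ne,
}


def rows_to_transactions(rows, condition, threshold):
    """Turn rows (dicts of item name -> value) into lists of qualifying item names."""
    compare = CONDITIONS[condition]
    transactions = []
    for row in rows:
        items = [name for name, value in row.items() if compare(value, threshold)]
        if items:  # sketch-level choice: drop rows in which no item qualifies
            transactions.append(items)
    return transactions


# Hypothetical rows, standing in for three rows of the DataFrame in Example 3.
rows = [
    {'Item1': 50, 'Item2': 80, 'Item3': 36, 'Item4': 10},
    {'Item1': 5, 'Item2': 40, 'Item3': 99, 'Item4': 12},
    {'Item1': 1, 'Item2': 2, 'Item3': 3, 'Item4': 4},
]

for transaction in rows_to_transactions(rows, '>=', 36):
    print('\t'.join(transaction))
```

Raising the threshold shortens the transactions, and lowering it lengthens them, so the threshold directly controls how dense the resulting transactional database is.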
