<a href="https://colab.research.google.com/github/UdayLab/Hands-on-Pattern-Mining/blob/main/chapter5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 5: Temporal Databases - Representation, Creation, and Statistics


## Install the PAMI library

In [None]:
!pip install PAMI

Collecting PAMI
  Downloading pami-2024.11.15.1-py3-none-any.whl.metadata (80 kB)
Collecting resource (from PAMI)
  Downloading Resource-0.2.1-py2.py3-none-any.whl.metadata (478 bytes)
Collecting validators (from PAMI)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting sphinx-rtd-theme (from PAMI)
  Downloading sphinx_rtd_theme-3.0.2-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting discord.py (from PAMI)
  Downloading discord.py-2.4.0-py3-none-any.whl.metadata (6.9 kB)
Collecting fastparquet (from PAMI)
  Downloading fastparquet-2024.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.

## Downloading a sample database

In [None]:
!wget -nc https://web-ext.u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/Temporal_T10I4D100K.csv

--2024-12-03 06:44:20--  https://web-ext.u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/Temporal_T10I4D100K.csv
Resolving web-ext.u-aizu.ac.jp (web-ext.u-aizu.ac.jp)... 163.143.103.34
Connecting to web-ext.u-aizu.ac.jp (web-ext.u-aizu.ac.jp)|163.143.103.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4606762 (4.4M) [text/csv]
Saving to: ‘Temporal_T10I4D100K.csv’


2024-12-03 06:44:22 (3.36 MB/s) - ‘Temporal_T10I4D100K.csv’ saved [4606762/4606762]



## Creating a synthetic temporal database

### Generic



```python
from PAMI.extras.syntheticDataGenerator \
    import TemporalDatabase as db
  
obj = db.TemporalDatabase(
        databaseSize=totalTransactions,
        avgItemsPerTransaction=averageNumberOfItemsToAppearInATransaction,
        numItems=numberOfItemsInADatabase,
        occurrenceProbabilityOfSameTimestamp=probabilityWithWhichTheNextTransactionMustOccurAtTheSameTimestamp,
        occurrenceProbabilityToSkipSubsequentTimestamp=probabilityWithWhichTheNextTimestampHasToBeSkipped,
        sep='\t'
        )
obj.create()
obj.save('temporalDatabase.csv')
#read the generated transactions into a dataframe
temporalDataFrame=obj.getTransactions()
#stats
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 1

In [None]:
from PAMI.extras.syntheticDataGenerator \
    import TemporalDatabase as db

obj = db.TemporalDatabase(
        databaseSize=100000,
        avgItemsPerTransaction=10,
        numItems=1000,
        occurrenceProbabilityOfSameTimestamp=0,
        occurrenceProbabilityToSkipSubsequentTimestamp=0,
        sep='\t'
        )
obj.create()
obj.save('temporalDatabase.csv')
#read the generated transactions into a dataframe
temporalDataFrame=obj.getTransactions()
#stats
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Runtime: 21.25088906288147
Memory (RSS): 242675712
Memory (USS): 219865088
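
The saved file can be inspected without PAMI. The sketch below assumes that each line holds a timestamp followed by the items of that transaction, separated by `sep`. The helper `read_temporal_database` and the three-line sample file are hypothetical and serve only to illustrate that layout. To look at the generated data instead, pass `'temporalDatabase.csv'` as the path.

```python
import os
import tempfile

def read_temporal_database(path, sep='\t'):
    """Parse lines of the assumed form 'timestamp<sep>item1<sep>item2...'."""
    transactions = []
    with open(path) as handle:
        for line in handle:
            fields = [field for field in line.rstrip('\n').split(sep) if field]
            if not fields:
                continue  # ignore blank lines
            transactions.append((int(fields[0]), fields[1:]))
    return transactions

# Tiny hand-written sample so the sketch runs without the generated file
sample_path = os.path.join(tempfile.mkdtemp(), 'sampleTemporalDatabase.csv')
with open(sample_path, 'w') as handle:
    handle.write('1\ta\tb\n2\tb\tc\td\n4\ta\n')

sample_transactions = read_temporal_database(sample_path)
for timestamp, items in sample_transactions:
    print(timestamp, items)
```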


## Converting a dataframe into a temporal database

### Generic



```python
from PAMI.extras.convert import DF2DB as alg

obj = alg.DF2DB(dataFrame)
obj.convert2TemporalDatabase(oFile='outputFileName', condition='>=|>|==|!=|<|<=', thresholdValue=thresholdValue)

print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```



### Example 2

In [None]:
from PAMI.extras.convert import DF2DB as alg
import pandas as pd
import numpy as np

#creating a 5 x 5 dataframe with random values
data = np.random.randint(1, 100, size=(5, 5))
dataFrame = pd.DataFrame(data,
             columns=['Item1', 'Item2', 'Item3', 'Item4', 'Item5']
            )
# Adding a timestamp column with specific values
timestamps = [1, 3, 3, 5, 8]
dataFrame.insert(0, 'timestamp', timestamps)

#converting the dataframe into a temporal database by
#keeping the items whose values are greater than or equal to 36
obj = alg.DF2DB(dataFrame)
obj.convert2TemporalDatabase(oFile='temporalDB.csv',
       condition='>=', thresholdValue=36
     )
print('Runtime: ' + str(obj.getRuntime()))
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))

Runtime: 0.007612943649291992
Memory (RSS): 655667200
Memory (USS): 632975360
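
To make the role of `condition` and `thresholdValue` concrete, here is a minimal sketch of the thresholding idea in plain Python. It is an illustration, not PAMI's implementation: the function `rows_to_temporal`, the `timestamp_key` parameter, and the choice to skip rows with no qualifying item are all assumptions. The rows are written by hand; `dataFrame.to_dict('records')` would produce rows of the same shape from the dataframe in Example 2.

```python
import operator

# Comparison functions keyed by the condition strings listed in the generic form
CONDITIONS = {
    '>=': operator.ge, '>': operator.gt, '==': operator.eq,
    '!=': operator.ne, '<': operator.lt, '<=': operator.le,
}

def rows_to_temporal(rows, condition, threshold_value, timestamp_key='timestamp'):
    """Keep, per row, the column names whose value satisfies the condition."""
    compare = CONDITIONS[condition]
    temporal_rows = []
    for row in rows:
        items = [name for name, value in row.items()
                 if name != timestamp_key and compare(value, threshold_value)]
        if items:  # assumption: rows with no qualifying item are skipped
            temporal_rows.append((row[timestamp_key], items))
    return temporal_rows

# Hand-written rows standing in for a small dataframe
toy_rows = [
    {'timestamp': 1, 'Item1': 40, 'Item2': 20},
    {'timestamp': 3, 'Item1': 10, 'Item2': 90},
    {'timestamp': 5, 'Item1': 36, 'Item2': 35},
]
toy_temporal = rows_to_temporal(toy_rows, '>=', 36)
for timestamp, items in toy_temporal:
    print(timestamp, *items, sep='\t')
```

With `'>='` and a threshold of 36, a value of exactly 36 is kept, while 35 is dropped.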


## Printing the Statistical Details

### Generic



```python
from PAMI.extras.dbStats import TemporalDatabase as stat

obj = stat.TemporalDatabase("inputFileName")
obj.run()
obj.printStats()
obj.plotGraphs()
```



### Example 3

In [None]:
from PAMI.extras.dbStats import TemporalDatabase as stat

obj = stat.TemporalDatabase("temporalDatabase.csv")
obj.run()
obj.printStats()
obj.plotGraphs()

Database size : 100000
Number of items : 1000
Minimum Transaction Size : 0
Average Transaction Size : 10.0
Maximum Transaction Size : 21
Minimum Inter Arrival Period : 1
Average Inter Arrival Period : 1.0
Maximum Inter Arrival Period : 1
Minimum periodicity : 505
Average periodicity : 746.673
Maximum periodicicty : 1553
Standard Deviation Transaction Size : 5.790027633785525
Variance : 33.52475524755248
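
The statistics above can be reproduced by hand on a toy database. The sketch below is not PAMI's code and rests on two assumptions: an inter-arrival period is the difference between consecutive transaction timestamps, and the periodicity of an item is its largest gap between consecutive occurrences, counting the gap from timestamp 0 to its first occurrence and from its last occurrence to the final timestamp. PAMI may define or compute these values differently.

```python
from statistics import mean

# Toy temporal database: (timestamp, items), sorted by timestamp
transactions = [(1, ['a', 'b']), (2, ['a', 'c']), (4, ['a']), (6, ['b', 'c'])]

# Transaction sizes and inter-arrival periods between consecutive timestamps
sizes = [len(items) for _, items in transactions]
timestamps = [timestamp for timestamp, _ in transactions]
inter_arrival = [later - earlier
                 for earlier, later in zip(timestamps, timestamps[1:])]

def periodicity_per_item(transactions):
    """Largest gap between consecutive occurrences of each item (assumed definition)."""
    last_timestamp = transactions[-1][0]
    occurrences = {}
    for timestamp, items in transactions:
        for item in items:
            occurrences.setdefault(item, []).append(timestamp)
    result = {}
    for item, stamps in occurrences.items():
        # Pad with the start (0) and the final timestamp of the database
        boundaries = [0] + stamps + [last_timestamp]
        result[item] = max(b - a for a, b in zip(boundaries, boundaries[1:]))
    return result

item_periodicity = periodicity_per_item(transactions)

print('Database size :', len(transactions))
print('Average Transaction Size :', mean(sizes))
print('Inter arrival periods :', inter_arrival)
print('Periodicity per item :', item_periodicity)
```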
