<a href="https://colab.research.google.com/github/UdayLab/PAMI/blob/main/notebooks/extras/stats/TemporalDatabase.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Knowing the Statistics of a Temporal Database

In this notebook, we learn how to print the statistics of a temporal database. These statistics help in choosing suitable values for constraints such as _minimum support_ and _maximum periodicity_.
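
To make the two constraints concrete, here is a minimal sketch on a made-up toy database (not the dataset used in this notebook). It assumes the definitions commonly used in periodic-frequent pattern mining: the support of an item is the number of transactions that contain it, and its periodicity is the largest gap between consecutive occurrences, counting the gap from time 0 to the first occurrence and from the last occurrence to the final timestamp. The function `item_support_and_periodicity` is hypothetical and is not part of PAMI.

```python
from collections import defaultdict

def item_support_and_periodicity(database):
    """Return {item: (support, periodicity)} for a list of (timestamp, items) pairs."""
    last_ts = max(ts for ts, _ in database)  # final timestamp in the database
    occurrences = defaultdict(list)
    # Visit transactions in timestamp order and record when each item occurs
    for ts, items in sorted(database, key=lambda row: row[0]):
        for item in items:
            occurrences[item].append(ts)

    result = {}
    for item, stamps in occurrences.items():
        # Pad with time 0 and the final timestamp so the boundary gaps count too
        points = [0] + stamps + [last_ts]
        gaps = [b - a for a, b in zip(points, points[1:])]
        result[item] = (len(stamps), max(gaps))
    return result

# Toy data for illustration only
toy_db = [
    (1, ["a", "b"]),
    (2, ["b"]),
    (3, ["a", "c"]),
    (6, ["a", "b"]),
]
stats = item_support_and_periodicity(toy_db)
for item in sorted(stats):
    support, periodicity = stats[item]
    print(f"{item}: support={support}, periodicity={periodicity}")
```

Under these assumptions, an item is interesting only if its support is at least the minimum support and its periodicity is at most the maximum periodicity, which is why knowing the range of both values in a database helps when picking thresholds.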

### Step 1: Install the latest version of PAMI

In [1]:
!pip install -U pami

Collecting pami
  Downloading pami-2024.5.24.1-py3-none-any.whl (999 kB)
Collecting resource (from pami)
  Downloading Resource-0.2.1-py2.py3-none-any.whl (25 kB)
Collecting validators (from pami)
  Downloading validators-0.28.1-py3-none-any.whl (39 kB)
Collecting sphinx-rtd-theme (from pami)
  Downloading sphinx_rtd_theme-2.0.0-py2.py3-none-any.whl (2.8 MB)
Collecting discord.py (from pami)
  Downloading discord.py-2.3.2-py3-none-any.whl (1.1 MB)
Collecting deprecated (from pami)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting JsonForm>=0.0.2 (from resource->pami)
  Downloading JsonForm-0.0.2.tar.gz (

### Step 2: Download a sample temporal database

In [3]:
!wget -nc https://u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/Temporal_T10I4D100K.csv

--2024-05-23 22:55:39--  https://u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/Temporal_T10I4D100K.csv
Resolving u-aizu.ac.jp (u-aizu.ac.jp)... 150.95.161.176, 150.31.244.160
Connecting to u-aizu.ac.jp (u-aizu.ac.jp)|150.95.161.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4606762 (4.4M) [text/csv]
Saving to: ‘Temporal_T10I4D100K.csv’


2024-05-23 22:55:47 (703 KB/s) - ‘Temporal_T10I4D100K.csv’ saved [4606762/4606762]



### Step 3: Check the format of the file by printing the first few lines

In [4]:
!head -6 Temporal_T10I4D100K.csv

1	25	52	164	240	274	328	368	448	538	561	630	687	730	775	825	834
2	39	120	124	205	401	581	704	814	825	834
3	35	249	674	712	733	759	854	950
4	39	422	449	704	825	857	895	937	954	964
5	15	229	262	283	294	352	381	708	738	766	853	883	966	978
6	26	104	143	320	569	620	798
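
The output above suggests that each line holds a timestamp followed by the items of that transaction, separated by tabs. A minimal sketch of reading that layout follows. The helper `parse_temporal_line` is hypothetical (it is not a PAMI function), and the two sample lines are copied from the output above.

```python
def parse_temporal_line(line, sep="\t"):
    """Split one line into (timestamp, items); the first field is taken as the timestamp."""
    fields = [f for f in line.strip().split(sep) if f]
    return int(fields[0]), fields[1:]

# Second and third lines of the `head` output shown above
sample_lines = [
    "2\t39\t120\t124\t205\t401\t581\t704\t814\t825\t834",
    "3\t35\t249\t674\t712\t733\t759\t854\t950",
]
parsed = [parse_temporal_line(line) for line in sample_lines]
for timestamp, items in parsed:
    print(timestamp, len(items), items[:3])
```

Items are kept as strings here because they are identifiers rather than quantities; only the timestamp is converted to an integer.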


### Step 4: Print the statistics of this temporal database

In [8]:
#import the class file
import PAMI.extras.dbStats.TemporalDatabase as stats

#specify the file name
inputFile = 'Temporal_T10I4D100K.csv'

#initialize the class
obj=stats.TemporalDatabase(inputFile,sep='\t')

#execute the class
obj.run()

#Printing each of the database statistics
print(f'Database size : {obj.getDatabaseSize()}')
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
print(f'Database sparsity : {obj.getSparsity()}')
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')

#saving the distributions of item frequencies and transaction lengths
itemFrequencies = obj.getSortedListOfItemFrequencies()
transactionLength = obj.getTransanctionalLengthDistribution()
obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')

#Alternative approach to print all of the database statistics and plot them
print("---- Other Format -----")
obj.printStats()
obj.plotGraphs()

Database size : 99913
Total number of items : 870
Database sparsity : 0.9883887027691103
Minimum Transaction Size : 1
Average Transaction Size : 10.10182859087406
Maximum Transaction Size : 29
Standard Deviation Transaction Size : 3.667115963877195
Variance in Transaction Sizes : 13.447874088362232
---- Other Format -----
Database size : 99913
Number of items : 870
Minimum Transaction Size : 1
Average Transaction Size : 10.10182859087406
Maximum Transaction Size : 29
Minimum Inter Arrival Period : 1
Average Inter Arrival Period : 1.0
Maximum Inter Arrival Period : 1
Minimum periodicity : 112
Average periodicity : 2124.6436781609195
Maximum periodicicty : 90296
Standard Deviation Transaction Size : 3.667115963877195
Variance : 13.447874088362232
Sparsity : 0.9883887027691103


NameError: name 'plt' is not defined
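
As a sanity check on the numbers printed above, the sketch below recomputes two of them from the others. It assumes that sparsity is 1 minus the ratio of the average transaction length to the number of distinct items; this is an assumption about the library's definition, although it agrees with the printed value. It also shows that the printed variance is not exactly the square of the printed standard deviation. The two values are consistent with the standard deviation being a population statistic and the variance being a sample statistic (scaled by n / (n - 1)), which is an inference from the numbers rather than a statement about the PAMI source code. All constants are copied from the output above.

```python
import math

# Values copied from the statistics printed above
database_size = 99913
num_items = 870
avg_len = 10.10182859087406
std_len = 3.667115963877195
reported_sparsity = 0.9883887027691103
reported_variance = 13.447874088362232

# Assumed definition: fraction of the item-by-transaction matrix that is empty
sparsity = 1 - avg_len / num_items

# Square of the standard deviation, then rescaled from population to sample variance
population_variance = std_len ** 2
sample_variance = population_variance * database_size / (database_size - 1)

print("sparsity matches:", math.isclose(sparsity, reported_sparsity, rel_tol=1e-9))
print("std**2 matches variance:", math.isclose(population_variance, reported_variance, rel_tol=1e-9))
print("rescaled std**2 matches variance:", math.isclose(sample_variance, reported_variance, rel_tol=1e-9))
```

For a database of this size the difference between the two variance conventions is small (about 0.0001), so it rarely matters when choosing thresholds.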