### PAPyA is a library for finding the best-performing configuration from a set of dimensions (e.g. schemas, partition, storage), which can be specified inside the <b>settings.yaml</b> file in the resource

In [None]:
%pip install PAPyA==0.1.0

### Load the configuration file and log file locations for the experiment

<p>Configurations for SP2Bench Data</p>

In [None]:

config_sp2bench = "settings.yaml" # config file location
logs_sp2bench = "log" # logs file location

<p>Configurations for Watdiv Data</p>

In [None]:
config_watdiv = "settings_watdiv.yaml" # config file location
logs_watdiv = "log_watdiv" # logs file location

#### <u>Configuration file</u> <br>
The configuration file is written in YAML, a data-serialization language, and has two main parts: the dimensions and the number of queries in the experiment. You can add more dimensions or change the existing ones to suit your experiment.

<i>Example :</i> 
```yaml
dimensions:
    schemas: ["st", "vt", "pt", "extvt", "wpt"]
    partition: ["horizontal", "predicate", "subject"]
    storage: ["csv", "avro", "parquet", "orc"]

query: 11
```

#### <u>Log file structure</u> <br>
The log file names must follow the order of the dimensions in the configuration file (e.g. {schemas}.{partition}.{storage}.txt), and the subfolders must be the ranking sets of the experiments (e.g. dataset sizes).

<i>Example :</i>
```
UI Module
└───log
    │
    ├───100M
    │   │   st.horizontal.csv.txt
    │   │   st.horizontal.avro.txt
    │   │   ...
    │
    └───250M
        │   st.horizontal.csv.txt
        │   st.horizontal.avro.txt
        │   ...
```
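The naming rule above can be checked with a short sketch that lists every log file a ranking set should contain. It assumes the `dimensions` mapping has already been loaded from the YAML file and that every combination of options was run; `expected_log_files` is a hypothetical helper for illustration, not part of PAPyA.

```python
from itertools import product
from pathlib import PurePosixPath


def expected_log_files(dimensions: dict[str, list[str]], log_dir: str, ranking_set: str) -> list[str]:
    """List the log file paths implied by the configuration's dimensions.

    Hypothetical helper: each file name joins one option per dimension, in the
    order the dimensions appear in the configuration, e.g. "st.horizontal.csv.txt".
    """
    names = [".".join(combo) + ".txt" for combo in product(*dimensions.values())]
    return [str(PurePosixPath(log_dir, ranking_set, name)) for name in names]


# the dimensions from the example settings.yaml above
dimensions = {
    "schemas": ["st", "vt", "pt", "extvt", "wpt"],
    "partition": ["horizontal", "predicate", "subject"],
    "storage": ["csv", "avro", "parquet", "orc"],
}
files = expected_log_files(dimensions, "log", "100M")
print(len(files))  # 60
print(files[:2])   # ['log/100M/st.horizontal.csv.txt', 'log/100M/st.horizontal.avro.txt']
```

Comparing this list with the actual contents of each ranking-set folder is a quick way to spot missing or misnamed logs before ranking.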

### Single Dimensional Ranking

<b>SDRank</b> is a class in the PAPyA library that calculates a ranking score _R_ for each dimension independently, operating over the log-based structure that the user specifies in the configuration file.<br> 
The value of _R_ represents the performance of a particular configuration (a higher value means a better-performing configuration). We use the ranking function _R_ below to calculate the rank scores:

$$R = \sum_{r=1}^{d} \frac{O_{dim}(r) \cdot (d-r)}{|Q| \cdot (d-1)}, \qquad 0 \le R \le 1$$

$d$         : total number of parameters (options) in a particular dimension<br>
$O_{dim}(r)$ : number of times the option is placed at rank $r$ (Rank 1, Rank 2, Rank 3, ...) across all queries<br>
$|Q|$       : total number of queries
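The ranking function can be sketched in plain Python. This sketch assumes that, for each query, the options of a dimension are ranked by runtime (fastest = rank 1) and that every option was run on the same queries; `rank_counts`, `rank_score`, and the toy runtimes are hypothetical and only illustrate the formula, not PAPyA's implementation.

```python
def rank_counts(runtimes: dict[str, list[float]]) -> dict[str, list[int]]:
    """Count how often each option is placed at each rank (rank 1 = fastest).

    runtimes maps an option name to its per-query runtimes, with queries in the
    same order for every option. Ties keep the input order of the options.
    """
    options = list(runtimes)
    num_queries = len(next(iter(runtimes.values())))
    counts = {opt: [0] * len(options) for opt in options}
    for q in range(num_queries):
        # order the options from fastest to slowest for this query
        ordered = sorted(options, key=lambda opt: runtimes[opt][q])
        for position, opt in enumerate(ordered):
            counts[opt][position] += 1  # position 0 corresponds to rank r = 1
    return counts


def rank_score(occurrences: list[int], num_queries: int) -> float:
    """R = sum_{r=1}^{d} O(r) * (d - r) / (|Q| * (d - 1))."""
    d = len(occurrences)
    weighted = sum(o * (d - r) for r, o in enumerate(occurrences, start=1))
    return weighted / (num_queries * (d - 1))


# toy runtimes (seconds) for three storage options over three queries
runtimes = {
    "csv": [3.0, 2.0, 5.0],
    "parquet": [1.0, 1.5, 2.0],
    "orc": [2.0, 3.0, 1.0],
}
counts = rank_counts(runtimes)
scores = {opt: rank_score(c, num_queries=3) for opt, c in counts.items()}
print(counts)  # {'csv': [0, 1, 2], 'parquet': [2, 1, 0], 'orc': [1, 1, 1]}
print(scores)
```

A configuration that is fastest on every query scores R = 1, and one that is slowest on every query scores R = 0.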

### PAPyA.Rank.SDRank

#### <i>class</i> Rank.<b>SDRank</b>(<i>config_path, log_path, ranking_sets, dimension</i>)
<i>Parameters:</i> <br>
&emsp;<b>config_path : str</b><br>
&emsp;&emsp;<small>Path to your configuration file. <i>e.g. ./UIModule/settings_watdiv.yaml</i></small><br>
&emsp;<b>log_path : str</b><br>
&emsp;&emsp;<small>Path to your log directory. <i>e.g. ./UI Module/log_watdiv</i></small><br>
&emsp;<b>ranking_sets : str</b><br>
&emsp;&emsp;<small>The ranking set of the user's choice, matching a log subfolder. <i>e.g. a dataset size such as 100M</i></small><br>
&emsp;<b>dimension : str</b><br>
&emsp;&emsp;<small>The single dimension to be ranked. <i>e.g. schemas</i></small><br>


In [None]:
# SDRank takes a single dimension and a ranking set (dataset size) that matches a subfolder of the log directory
from Rank import SDRank

schemaSDRank = SDRank(config_watdiv, logs_watdiv, '100M', 'schemas')
partitionSDRank = SDRank(config_watdiv, logs_watdiv, '250M', 'partition')
storageSDRank = SDRank(config_watdiv, logs_watdiv, '250M', 'storage')

### Rank.SDRank.calculateRank

#### SDRank.<b>calculateRank</b>(<i>*args</i>)
<small>Automates the calculation of the rank scores of a single dimension using the ranking function above.</small><br><br>
<small>Returns a table of configurations sorted from best to worst by ranking score, along with the number of times each configuration is placed at rank _r_ (1st, 2nd, 3rd, ...).</small><br><br>
<i>Parameters:</i> <br>
&emsp; <b>*args : str or list</b><br>
&emsp;&emsp;<small>This method takes an arbitrary number of string and list arguments.<br>
&emsp;&emsp;&ensp;str -> slices the table by that option. <i>e.g. "predicate" slices the table to the <b>predicate</b> partitioning</i><br>
&emsp;&emsp;&ensp;list -> removes the listed queries from the ranking calculation. <i>e.g. [7,8,9] removes queries <b>7, 8, and 9</b> from the calculation</i></small><br>
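The `*args` convention described above can be sketched as follows. This is a hypothetical illustration of how string and list arguments might be separated and applied, assuming configurations are named like the log files (`schema.partition.storage`) and queries are numbered from 1; `split_args` and `select_configs` are not PAPyA functions.

```python
def split_args(*args):
    """Split calculateRank-style arguments into slicing filters and excluded queries.

    Hypothetical helper mirroring the documented convention:
    str  -> keep only configurations containing that option
    list -> drop those query numbers from the ranking
    """
    filters, excluded = [], set()
    for arg in args:
        if isinstance(arg, str):
            filters.append(arg)
        elif isinstance(arg, list):
            excluded.update(arg)
        else:
            raise TypeError(f"expected str or list, got {type(arg).__name__}")
    return filters, excluded


def select_configs(configs, filters):
    """Keep configurations (named like 'st.predicate.csv') that contain every filter option."""
    return [c for c in configs if all(f in c.split(".") for f in filters)]


filters, excluded = split_args("predicate", "csv", [3, 4, 5])
configs = ["st.predicate.csv", "vt.predicate.csv", "st.horizontal.csv", "st.predicate.orc"]
print(select_configs(configs, filters))                # ['st.predicate.csv', 'vt.predicate.csv']
print([q for q in range(1, 12) if q not in excluded])  # [1, 2, 6, 7, 8, 9, 10, 11]
```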


In [None]:
# single dimension ranking by storage without excluding queries
storageSDRank.calculateRank()

In [None]:
# single dimension ranking by storage, excluding queries 7, 8, and 9
excludeQuery = [7,8,9]
storageSDRank.calculateRank(excludeQuery)

In [None]:
# slicing the partition single dimension ranking by horizontal partitioning
partitionSDRank.calculateRank('horizontal')

In [None]:
# slicing the schema single dimension ranking by predicate partitioning and csv storage format while excluding queries 3, 4, and 5
schemaSDRank.calculateRank('predicate', 'csv', [3,4,5])

### Rank.SDRank.plotRadar

#### SDRank.<b>plotRadar</b>()

<small>Ranking over one dimension is insufficient when configurations span multiple dimensions. Trade-offs between dimensions reduce the accuracy of single dimension ranking functions, as the radar plot below shows.</small><br><br>
<small>This method returns a radar chart that reveals these trade-offs: the top configurations under a single dimension ranking criterion can perform poorly in the other dimensions.</small>

In [None]:
# This example shows that the top configuration when ranking by schema is optimized for that dimension only, ignoring the other two dimensions.
from Rank import SDRank
SDRank(config_watdiv, logs_watdiv, '100M', 'schemas').plotRadar()

In addition to the radar plot, PAPyA provides a visualization of the rank scores of a single dimension's parameters (options).<br>
The <b>plot</b> method takes a single argument: the view projection option that the user specifies.

Schema SD Ranks pivoting Storage formats for Predicate Partitioning

In [None]:
from Rank import SDRank
# example of schema dimension plots
config = "settings.yaml"
logs = "log"

SDRank(config, logs, '100M', 'schemas').plot('predicate')
SDRank(config, logs, '100M', 'storage').plot('st')
SDRank(config, logs, '100M', 'partition').plot('csv')

In [None]:
config = "settings_watdiv.yaml" # config file location
logs = "log_watdiv" # logs file location

from Rank import SDRank

queries = ['Q11', 'Q14']
# plotBox takes a list of query names; the result is not assigned so schemaSDRank is not overwritten
SDRank(config, logs, '100M', 'schemas').plotBox(queries)

In [None]:
# example of the MDRank class with the 250M dataset size as the ranking set of the experiment
from Rank import MDRank

config = "settings_watdiv.yaml"
logs = "log_watdiv"

multiDimensionRank = MDRank(config, logs, '250M')

In [None]:
# top 5 configurations according to the paretoQ method, sorted from best to worst
multiDimensionRank.paretoQ().head()

In [None]:
# top 5 configurations according to the paretoAgg method, sorted from best to worst
multiDimensionRank.paretoAgg().head()

The <b>plot</b> method of MDRank shows the _paretoAgg_ solutions as shaded green areas projected into a three-dimensional space.

In [None]:
multiDimensionRank.plot()
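How paretoQ and paretoAgg select their solutions is not detailed here. As a generic illustration of Pareto dominance only (not PAPyA's implementation), the sketch below keeps the configurations whose scores are not dominated by any other configuration, assuming each configuration is described by three single dimension rank scores (schema, partition, storage) where higher is better. The score values are hypothetical.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if score vector a is at least as good as b everywhere and strictly better somewhere.

    Higher rank scores are better, as with the SD ranking score R.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, tuple[float, ...]]) -> list[str]:
    """Return the configurations that no other configuration dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other_name, other in scores.items() if other_name != name)
    ]


# hypothetical (schema, partition, storage) SD rank scores per configuration
scores = {
    "st.horizontal.csv": (0.9, 0.2, 0.3),
    "vt.predicate.parquet": (0.5, 0.8, 0.9),
    "pt.subject.orc": (0.4, 0.7, 0.8),
    "extvt.predicate.avro": (0.6, 0.6, 0.5),
}
print(pareto_front(scores))  # ['st.horizontal.csv', 'vt.predicate.parquet', 'extvt.predicate.avro']
```

Here `pt.subject.orc` drops out because `vt.predicate.parquet` is better in all three scores, while the remaining three each win in at least one dimension, which is the kind of trade-off the radar plot highlights.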

In [None]:
# both the Conformance and Coherence classes take a list of ranking criteria that the user can specify
from Ranker import Conformance, Coherence

config = 'settings_watdiv.yaml'
logs = 'log_watdiv'

conformance_set = ['schemas', 'partition', 'storage', 'paretoQ', 'paretoAgg']
coherence_set = ['schemas', 'partition', 'storage', 'paretoQ', 'paretoAgg']

conf = Conformance(config, logs, '100M', conformance_set, 5, 28)
coh = Coherence(config, logs, coherence_set)

In [None]:
conf.run()

In [None]:
conf.configurationQueryRanks(dimension='paretoAgg', mode=0)

In [None]:
coh.run('250M', '500M')

In [None]:
# heatMap accepts only one ranking criterion at a time through the dimension argument
coh.heatMap('100M', "500M", dimension='paretoQ')

In [None]:
coh.heatMapSubtract('100M', '250M', '500M', dimension='paretoQ')

In [None]:
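import ahpy

# pairwise query comparisons on Saaty's 1-9 scale:
# ('Qa', 'Qb'): v means Qa is judged v times as important as Qb (1/v means the reverse)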
query_comparison = {('Q1', 'Q2'): 1/7, ('Q1', 'Q3'): 1, ('Q1', 'Q4'): 1/9, ('Q1', 'Q5'): 1/3,
                    ('Q1', 'Q6'): 1/5, ('Q1', 'Q7'): 1/7, ('Q1', 'Q8'): 1/9, ('Q1', 'Q9'): 1/3, ('Q1', 'Q10'): 1, ('Q1', 'Q11'): 1,
                    ('Q2', 'Q3'): 7, ('Q2', 'Q4'): 1/9, ('Q2', 'Q5'): 3,
                    ('Q2', 'Q6'): 5, ('Q2', 'Q7'): 1, ('Q2', 'Q8'): 1/9, ('Q2', 'Q9'): 3, ('Q2', 'Q10'): 7, ('Q2', 'Q11'): 7,
                    ('Q3', 'Q4'): 1/9, ('Q3', 'Q5'): 1/3,
                    ('Q3', 'Q6'): 1/5, ('Q3', 'Q7'): 1/7, ('Q3', 'Q8'): 1/9, ('Q3', 'Q9'): 1/3, ('Q3', 'Q10'): 1, ('Q3', 'Q11'): 1,
                    ('Q4', 'Q5'): 3,
                    ('Q4', 'Q6'): 5, ('Q4', 'Q7'): 7, ('Q4', 'Q8'): 1, ('Q4', 'Q9'): 3, ('Q4', 'Q10'): 9, ('Q4', 'Q11'): 9,
                    ('Q5', 'Q6'): 1/5, ('Q5', 'Q7'): 1/7, ('Q5', 'Q8'): 1/9, ('Q5', 'Q9'): 1, ('Q5', 'Q10'): 3, ('Q5', 'Q11'): 3,
                    ('Q6', 'Q7'): 1/7, ('Q6', 'Q8'): 1/9, ('Q6', 'Q9'): 3, ('Q6', 'Q10'): 5, ('Q6', 'Q11'): 5,
                    ('Q7', 'Q8'): 1/9, ('Q7', 'Q9'): 3, ('Q7', 'Q10'): 7, ('Q7', 'Q11'): 7,
                    ('Q8', 'Q9'): 3, ('Q8', 'Q10'): 9, ('Q8', 'Q11'): 9,
                    ('Q9', 'Q10'): 3, ('Q9', 'Q11'): 3,
                    ('Q10', 'Q11'): 1,}

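# build the AHP comparison; precision sets the number of decimal places,
# random_index='saaty' uses Saaty's random index values for the consistency ratio
queries = ahpy.Compare(name='Queries', comparisons=query_comparison, precision=3, random_index='saaty')

# normalized priority weight of each query
print(queries.target_weights)

# consistency ratio of the judgments (values below 0.1 are conventionally considered acceptable)
print(queries.consistency_ratio)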
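The weights that ahpy reports are derived from the reciprocal pairwise-comparison matrix built from `query_comparison`. As a rough, swapped-in illustration of what `target_weights` represents, the sketch below uses the row geometric-mean method, a common AHP approximation that coincides with the eigenvector method for perfectly consistent matrices; `ahp_weights_geometric_mean` is a hypothetical helper, not part of ahpy or PAPyA.

```python
import math


def ahp_weights_geometric_mean(items: list[str], comparisons: dict[tuple[str, str], float]) -> dict[str, float]:
    """Approximate AHP priority weights with the row geometric-mean method.

    comparisons[(a, b)] = v means a is v times as important as b; the reciprocal
    entry (b, a) = 1 / v and the diagonal (a, a) = 1 are filled in automatically.
    """
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    matrix = [[1.0] * n for _ in range(n)]
    for (a, b), value in comparisons.items():
        matrix[index[a]][index[b]] = value
        matrix[index[b]][index[a]] = 1 / value
    # geometric mean of each row, then normalize so the weights sum to 1
    row_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(row_means)
    return {item: m / total for item, m in zip(items, row_means)}


# a small, perfectly consistent example: Q1 = 2 x Q2, Q2 = 2 x Q3, so Q1 = 4 x Q3
weights = ahp_weights_geometric_mean(
    ["Q1", "Q2", "Q3"],
    {("Q1", "Q2"): 2, ("Q1", "Q3"): 4, ("Q2", "Q3"): 2},
)
print(weights)  # approximately {'Q1': 0.571, 'Q2': 0.286, 'Q3': 0.143}
```

Passing `query_comparison` with the names Q1 to Q11 gives an approximation of the weights for the notebook's judgments; because those judgments are not perfectly consistent, the result can differ somewhat from ahpy's `target_weights`.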