# Performance of PACE (paper: technical evaluation)

PACE was designed and implemented for use in a notebook environment, so its performance is also evaluated in a notebook.

We evaluate three aspects:
1. Missingness computation (time in s)
2. Visualisation (time in s)
3. RAM (`psutil.virtual_memory()`)
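Time and memory can also be captured with the standard library alone. The sketch below is one possible alternative, not what this notebook uses: it times a callable with `time.perf_counter()` (a monotonic, high-resolution clock) and records the peak memory allocated by Python during the call via `tracemalloc`. Unlike `psutil.virtual_memory()`, which reports system-wide memory, this only covers allocations traced in the current process. The helper name `measure_stage` is hypothetical.

```python
import time
import tracemalloc


def measure_stage(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds, peak_bytes).

    Hypothetical helper: elapsed time comes from the monotonic
    time.perf_counter() clock; peak_bytes is the peak memory traced by
    tracemalloc during the call (allocations reported to Python only).
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Stop the clock and the tracer even if fn raises
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Usage: measure a stage that builds a list of one million integers
result, seconds, peak = measure_stage(lambda: list(range(1_000_000)))
print(f"{seconds:.4f} s, peak {peak / 1e6:.1f} MB")
```

Tracing allocations slows the traced code down, so if this approach were adopted, timing and memory would ideally be measured in separate runs.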

The results of the evaluation are written to a CSV file, with one row per run and stage.

To do:

- [x] write results to file
- [x] load parameters from `config.yaml`: the same file is used for UpSet (Python script) and PACE (notebook)
- [ ] set seeds
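For the open "set seeds" item, a minimal sketch is to seed every random number generator that data generation might draw from before each run. This assumes that `generate_pattern` uses Python's `random` module or NumPy's global generator; if it creates its own `numpy.random.Generator`, the seed would have to be passed to it instead. The helper `set_seeds` and the seed value are hypothetical.

```python
import random

import numpy as np


def set_seeds(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)


# Usage: the same seed reproduces the same draws
set_seeds(42)
first = (random.random(), np.random.rand())
set_seeds(42)
second = (random.random(), np.random.rand())
print(first == second)  # True
```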

```python
import csv
from datetime import datetime

import pandas as pd
import psutil
import yaml

from pace.membership import Membership
from pace.plots import PlotSession
from technical_evaluation import generate_pattern
```

## Load config file

```python
# Read the shared configuration; the with-block closes the file afterwards
with open("config.yaml") as config_yaml:
    config = yaml.safe_load(config_yaml)
```
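The notebook reads `dataset.num_rows`, `dataset.num_cols`, `patterns` and `output.filename` from `config.yaml`, and later iterates over `num_rows` and `num_cols`, so both must be lists. A small check right after loading can catch a malformed file before any run starts. The sketch below is illustrative: the helper `check_config` and the example values (including the pattern names) are hypothetical, not taken from the actual `config.yaml`.

```python
def check_config(config: dict) -> None:
    """Raise ValueError if config lacks the keys this notebook reads (hypothetical helper)."""
    dataset = config.get("dataset") or {}
    for key in ("num_rows", "num_cols"):
        values = dataset.get(key)
        if not isinstance(values, list) or not all(isinstance(v, int) for v in values):
            raise ValueError(f"dataset.{key} must be a list of integers")
    if not isinstance(config.get("patterns"), list):
        raise ValueError("patterns must be a list of pattern names")
    if not isinstance((config.get("output") or {}).get("filename"), str):
        raise ValueError("output.filename must be a string")


# Hypothetical structure that yaml.safe_load would return for:
#
#   dataset:
#     num_rows: [1000, 10000]
#     num_cols: [10, 20]
#   patterns: [pattern_a, pattern_b]
#   output:
#     filename: results
example = {
    "dataset": {"num_rows": [1000, 10000], "num_cols": [10, 20]},
    "patterns": ["pattern_a", "pattern_b"],
    "output": {"filename": "results"},
}
check_config(example)  # passes silently
```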

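```python
# Dataset sizes to sweep over (lists) and the missingness patterns to generate
num_rows = config["dataset"]["num_rows"]
num_cols = config["dataset"]["num_cols"]
patterns = config["patterns"]
filename = config["output"]["filename"]
package = "pace"
# Results file name, suffixed with the package name
output_file = f"{filename}_{package}.csv"
```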
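Because `patterns`, `num_rows` and `num_cols` are all lists, the evaluation runs once per combination, so the number of runs is the product of the three list lengths. The nested loops used for the evaluation can be expressed equivalently with `itertools.product`, which yields the combinations in the same order (patterns outermost, columns innermost). `parameter_grid` is a hypothetical helper name and the values are illustrative.

```python
from itertools import product


def parameter_grid(patterns, num_rows, num_cols):
    """Return every (pattern, rows, cols) combination in nested-loop order."""
    return list(product(patterns, num_rows, num_cols))


grid = parameter_grid(["pattern_a", "pattern_b"], [1000, 10000], [10, 20])
print(len(grid))  # 8
print(grid[0])    # ('pattern_a', 1000, 10)
```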

## Functions

```python
def eval_pace(df, package, pattern, num_rows, num_cols):
    """
    Evaluate the performance of PACE by timing the missingness
    computation and the visualisation of the provided data.

    Parameters
    ----------
    df : pd.DataFrame
        data frame to analyse
    package : str
        name of the evaluated visualisation package
    pattern : str
        name of the pattern used to generate the data
    num_rows : int
        number of rows (records) in the dataset
    num_cols : int
        number of columns in the dataset

    Returns
    -------
    list of list
        one row per stage ("START", "COMPUTE", "VISUALIZE"), each holding
        package, pattern, num_rows, num_cols, the stage name, the elapsed
        time in seconds (None for "START") and a psutil.virtual_memory()
        snapshot taken at the end of the stage
    """

    def make_row(stage, seconds):
        # Run parameters, stage name, duration and a system-wide RAM snapshot
        return [
            package,
            pattern,
            num_rows,
            num_cols,
            stage,
            seconds,
            psutil.virtual_memory(),
        ]

    results = [make_row("START", None)]

    # Compute missingness
    start = datetime.now()
    data_missing = Membership.from_data_frame(df)
    results.append(make_row("COMPUTE", (datetime.now() - start).total_seconds()))

    # Visualisation: the measured time covers creating the PlotSession and
    # adding a plot (still to decide which steps make a fair comparison)
    start = datetime.now()
    session = PlotSession(df)
    session.add_plot("a")
    results.append(make_row("VISUALIZE", (datetime.now() - start).total_seconds()))

    return results
```
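`psutil.virtual_memory()` returns a named tuple, so each row written to the CSV stores its whole text representation (e.g. `svmem(total=..., ...)`) in the single `RAM` cell, which is awkward to analyse later. One option, sketched below, is to expand the named tuple into one value per field before writing, and to extend the header with the field names. `ram_fields` is a hypothetical helper that relies only on `_asdict()`, which every named tuple provides; the stand-in `FakeMem` type and its numbers exist only so the example runs without psutil.

```python
from collections import namedtuple


def ram_fields(mem) -> dict:
    """Convert a memory named tuple (e.g. psutil's svmem) into {field: value}."""
    return dict(mem._asdict())


# Stand-in for psutil.virtual_memory(), which has more fields in practice
FakeMem = namedtuple("FakeMem", ["total", "available", "percent", "used"])
snapshot = FakeMem(total=16_000, available=9_000, percent=43.75, used=7_000)

row = ["pace", "pattern_a", 1000, 10, "COMPUTE", 0.12]
row.extend(ram_fields(snapshot).values())  # one CSV cell per memory field
header_tail = list(ram_fields(snapshot))   # field names for the CSV header
print(header_tail)  # ['total', 'available', 'percent', 'used']
```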
    

## Generate data and evaluate

```python
# newline="" lets the csv module control line endings, as its docs recommend
with open(output_file, "w", newline="") as csvfile:
    w = csv.writer(csvfile, delimiter=",")
    w.writerow(
        [
            "Package",
            "Pattern",
            "Num_rows",
            "Num_cols",
            "Stage",
            "Time (s)",
            "RAM",
        ]
    )

    for pattern in patterns:
        for row in num_rows:
            for col in num_cols:
                # step 1: generate data
                df = generate_pattern(pattern, row, col)
                # step 2: evaluate PACE on the data
                results = eval_pace(df, package, pattern, row, col)
                # step 3: write one row per stage to the output file
                w.writerows(results)
```
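Once the CSV is written, the timings can be aggregated per configuration and stage. The sketch below uses only the standard library and reads the seven columns by position (package, pattern, rows, cols, stage, time, RAM) in the order the header row defines them; rows whose time cell is empty, such as the `START` rows (the `csv` writer turns `None` into an empty string), are skipped. `mean_times` is a hypothetical helper, and averaging only becomes meaningful if the sweep is repeated (for example with different seeds), which the current loop does not yet do.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_times(path):
    """Return {(package, pattern, rows, cols, stage): mean seconds} from a results CSV."""
    times = defaultdict(list)
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for package, pattern, rows, cols, stage, seconds, _ram in reader:
            if seconds:  # START rows have no elapsed time
                key = (package, pattern, int(rows), int(cols), stage)
                times[key].append(float(seconds))
    return {key: mean(values) for key, values in times.items()}
```

In this notebook it would be called as `mean_times(output_file)` after the cell above has finished writing.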
