# Format and export the experimental results

#### Authors

* Juan Carlos Alfaro Jiménez
* Juan Ángel Aledo Sánchez
* José Antonio Gámez Martín

In this notebook, we format and export (`.csv` and `.tex` files) the experimental results. Below, we detail the steps.

## 1. Arguments

First, we add the command line arguments:

In [None]:
import argparse

In [None]:
description = "Format and export the experimental results."

In [None]:
parser = argparse.ArgumentParser(description=description)

* The path to the experimental results:

In [None]:
arg = "--source"

In [None]:
type = str

In [None]:
help = "Path to the experimental results"

In [None]:
parser.add_argument(arg, type=type, help=help);

* The path to the tables:

In [None]:
arg = "--destination"

In [None]:
type = str

In [None]:
help = "Path to the tables"

In [None]:
parser.add_argument(arg, type=type, help=help);

* The name of the target output variable:

In [None]:
arg = "--output"

In [None]:
default = "test_score"

In [None]:
type = str

In [None]:
help = "Name of the target output variable"

In [None]:
parser.add_argument(arg, default=default, type=type, help=help);

* The number of decimal digits for the numeric output:

In [None]:
arg = "--digits"

In [None]:
default = 3

In [None]:
type = int

In [None]:
help = "Number of decimal digits for the numeric output"

In [None]:
parser.add_argument(arg, default=default, type=type, help=help);

* The regular expression that selects the methods to keep (all of them by default):

In [None]:
arg = "--methods"

In [None]:
default = ".*"

In [None]:
type = str

In [None]:
help = "Methods to filter"

In [None]:
parser.add_argument(arg, default=default, type=type, help=help);

* The regular expression that selects the datasets to keep (all of them by default):

In [None]:
arg = "--datasets"

In [None]:
default = ".*"

In [None]:
type = str

In [None]:
help = "Datasets to filter"

In [None]:
parser.add_argument(arg, default=default, type=type, help=help);

Now, we parse the command line arguments:

In [None]:
from pathlib import Path

In [None]:
arguments = Path("arguments.txt").read_text("utf-8").split()

In [None]:
args = parser.parse_args(arguments)
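As a compact illustration of the steps above, the following sketch builds the same parser in one place and parses a sample argument string. The contents assigned to `text` are hypothetical (the real `arguments.txt` is not shown in this notebook), and `sample_parser` and `sample_args` are names introduced only for this example.

```python
import argparse

# Same options as the parser built step by step above.
sample_parser = argparse.ArgumentParser(description="Format and export the experimental results.")
sample_parser.add_argument("--source", type=str, help="Path to the experimental results")
sample_parser.add_argument("--destination", type=str, help="Path to the tables")
sample_parser.add_argument("--output", default="test_score", type=str, help="Name of the target output variable")
sample_parser.add_argument("--digits", default=3, type=int, help="Number of decimal digits for the numeric output")
sample_parser.add_argument("--methods", default=".*", type=str, help="Methods to filter")
sample_parser.add_argument("--datasets", default=".*", type=str, help="Datasets to filter")

# Hypothetical file contents; note the trailing newline a text editor may add.
text = "--source results --destination tables --digits 2\n"

# str.split() without arguments splits on any whitespace and drops empty tokens.
sample_args = sample_parser.parse_args(text.split())
print(sample_args)
```

Options that are not given in the file (`--output`, `--methods` and `--datasets` here) fall back to their defaults.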

And store the parsed arguments in variables:

In [None]:
source = args.source

In [None]:
source

In [None]:
destination = args.destination

In [None]:
destination

In [None]:
output = args.output

In [None]:
output

In [None]:
digits = args.digits

In [None]:
digits

In [None]:
methods = args.methods

In [None]:
methods

In [None]:
datasets = args.datasets

In [None]:
datasets

## 2. Load

Second, we get the files with the experimental results (`source`):

In [None]:
import os

In [None]:
pattern = os.path.join("work", source, "**", "*.csv")

In [None]:
import glob

In [None]:
files = glob.glob(pattern, recursive=True)
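To see what the recursive pattern picks up, here is a minimal sketch on a temporary directory. The `<dataset>/<method>.csv` layout and the names `iris`, `wine`, `knn` and `tree` are assumptions made for the example, not taken from the actual experiments.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/<dataset>/<method>.csv
    for dataset, method in [("iris", "knn"), ("iris", "tree"), ("wine", "knn")]:
        os.makedirs(os.path.join(root, dataset), exist_ok=True)
        open(os.path.join(root, dataset, method + ".csv"), "w").close()

    # A file that must not be selected.
    open(os.path.join(root, "notes.txt"), "w").close()

    # With recursive=True, "**" matches zero or more nested directories.
    sample_pattern = os.path.join(root, "**", "*.csv")
    found = sorted(os.path.relpath(f, root) for f in glob.glob(sample_pattern, recursive=True))

print(found)
```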

Now, we keep only the files whose method and dataset match the regular expressions `methods` and `datasets`:

In [None]:
import re

In [None]:
pattern = os.path.join("work", ".*", datasets, methods + r"\.csv")

In [None]:
r = re.compile(pattern)

In [None]:
files = filter(r.match, files)
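The filtering step can be sketched on a small list of paths. The paths and the values of `sample_datasets` and `sample_methods` below are hypothetical, and forward slashes are used so the example does not depend on the platform separator.

```python
import re

# Hypothetical file list.
sample_files = [
    "work/results/iris/knn.csv",
    "work/results/iris/tree.csv",
    "work/results/wine/knn.csv",
]

sample_datasets = "iris"  # hypothetical value of --datasets
sample_methods = ".*"     # default value of --methods

# Same shape as the pattern above: work/<anything>/<datasets>/<methods>.csv
sample_pattern = "/".join(["work", ".*", sample_datasets, sample_methods + r"\.csv"])
regex = re.compile(sample_pattern)

# re.match anchors only at the start of the string, so the pattern has to
# describe the path from its first character.
selected = [f for f in sample_files if regex.match(f)]
print(selected)
```

Since the two expressions are concatenated into a larger pattern, an alternation such as `iris|wine` would probably need its own parentheses to behave as intended.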

Finally, we read the experimental results for the target output variable (`output`) and include a column with the file:

In [None]:
usecols = [output]

In [None]:
import pandas as pd

In [None]:
read_csv = lambda file: pd.read_csv(file, usecols=usecols).assign(file=file)

In [None]:
objs = map(read_csv, files)

In [None]:
results = pd.concat(objs)

In [None]:
results
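As an illustration of the reading step, the sketch below uses in-memory text instead of files on disk. The column `fit_time`, the scores and the paths are made up for the example; only `test_score` (the default `output`) comes from the notebook.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two result files.
sources = {
    "work/results/iris/knn.csv": "fit_time,test_score\n0.1,0.90\n0.2,0.94\n",
    "work/results/iris/tree.csv": "fit_time,test_score\n0.3,0.80\n0.4,0.84\n",
}

sample_output = "test_score"

# usecols keeps only the target column; assign tags each row with its origin.
frames = [
    pd.read_csv(io.StringIO(text), usecols=[sample_output]).assign(file=name)
    for name, text in sources.items()
]

sample_results = pd.concat(frames, ignore_index=True)
print(sample_results)
```

The example passes `ignore_index=True` to obtain a unique row index; the notebook itself keeps the per-file indices.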

## 3. Format

Third, we extract the dataset (the parent directory) and the method (the file name without its extension) from the file path:

In [None]:
df = results.file.str.split(os.sep, expand=True)

In [None]:
df = df.iloc[:, -2:]

In [None]:
df.columns = ["dataset", "method"]

In [None]:
df["method"] = df.method.str.replace(".csv", "", regex=False)

In [None]:
df
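A minimal sketch of this extraction on two hypothetical paths follows. It splits on `/` rather than `os.sep` so that it behaves the same on every platform, and it assumes that all paths have the same depth, since `expand=True` pads shorter paths with missing values on the right.

```python
import pandas as pd

# Hypothetical paths with the layout .../<dataset>/<method>.csv
paths = pd.Series([
    "work/results/iris/knn.csv",
    "work/results/wine/tree.csv",
])

# One column per path component; the last two hold the dataset and the file name.
parts = paths.str.split("/", expand=True)
info = parts.iloc[:, -2:].copy()
info.columns = ["dataset", "method"]

# Remove the literal ".csv" suffix (regex=False treats the dot as a plain character).
info["method"] = info["method"].str.replace(".csv", "", regex=False)
print(info)
```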

And include this information in the experimental results:

In [None]:
results = results.drop("file", axis=1)

In [None]:
objs = [results, df]

In [None]:
results = pd.concat(objs, axis=1)

In [None]:
results

Now, we group by `method` and `dataset` and compute the mean and standard deviation:

In [None]:
by = ["method", "dataset"]

In [None]:
func = ["mean", "std"]

In [None]:
results = results.groupby(by).agg(func)

In [None]:
results
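The aggregation can be illustrated on a toy data frame. The scores below are invented; the point is the shape of the result, which has one row per (`method`, `dataset`) pair and a two-level column index of the form (`output`, statistic).

```python
import pandas as pd

# Invented scores: two repetitions per method on a single dataset.
toy = pd.DataFrame({
    "test_score": [0.90, 0.94, 0.80, 0.84],
    "dataset": ["iris", "iris", "iris", "iris"],
    "method": ["knn", "knn", "tree", "tree"],
})

summary = toy.groupby(["method", "dataset"]).agg(["mean", "std"])
print(summary)
```

Note that `std` is the sample standard deviation (pandas uses `ddof=1` by default), so a group with a single row yields a missing value.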

And extract the statistics:

In [None]:
level = -1

In [None]:
mean = results[output]["mean"].unstack(level=level)

In [None]:
mean

In [None]:
std = results[output]["std"].unstack(level=level)

In [None]:
std
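To make the effect of `unstack(level=-1)` concrete, this sketch pivots the innermost index level (`dataset`) into columns, leaving one row per method. The values are invented for the example.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["knn", "tree"], ["iris", "wine"]], names=["method", "dataset"]
)

# Invented mean scores in long format.
long_mean = pd.Series([0.92, 0.75, 0.82, 0.70], index=index)

# level=-1 is the last index level, here "dataset".
wide_mean = long_mean.unstack(level=-1)
print(wide_mean)
```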

Then, we round to the number of decimal digits for the numeric output (`digits`) and format the values as fixed-point strings, which keeps trailing zeros:

In [None]:
func = "{:.{digits}f}".format

In [None]:
mean = mean.round(digits).applymap(func, digits=digits)

In [None]:
mean

In [None]:
std = std.round(digits).applymap(func, digits=digits)

In [None]:
std
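The formatting relies on a nested replacement field: the precision inside the format specification is itself filled in from the `digits` keyword. The sketch below shows this on invented values. It also picks between `DataFrame.map` and `DataFrame.applymap` at run time, on the assumption that newer pandas releases (2.1 and later) provide the former as the replacement for the latter.

```python
import pandas as pd

fmt = "{:.{digits}f}".format

# The precision is taken from the keyword argument.
example = fmt(0.5, digits=3)

# Invented mean scores.
toy_mean = pd.DataFrame({"iris": [0.92, 0.8]}, index=["knn", "tree"])

sample_digits = 2
rounded = toy_mean.round(sample_digits)

# Use DataFrame.map when available, otherwise fall back to applymap.
elementwise = rounded.map if hasattr(rounded, "map") else rounded.applymap
formatted = elementwise(fmt, digits=sample_digits)
print(formatted)
```

After this step the cells are strings rather than numbers, which is what allows the string concatenation used to build the tabular.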

Finally, we build a table in which each cell combines the mean and the standard deviation:

In [None]:
tabular = mean + " $ " + r"\pm" + " $ " + std

In [None]:
tabular
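As a small illustration of the concatenation, the sketch below combines two frames of already formatted strings. The values are invented, and a raw string is used for `\pm` so that the backslash reaches LaTeX untouched.

```python
import pandas as pd

# Invented, already formatted statistics.
toy_mean = pd.DataFrame({"iris": ["0.920", "0.820"]}, index=["knn", "tree"])
toy_std = pd.DataFrame({"iris": ["0.028", "0.028"]}, index=["knn", "tree"])

# Element-wise string concatenation; both frames share the same labels.
toy_tabular = toy_mean + " $ " + r"\pm" + " $ " + toy_std
print(toy_tabular)
```

Because the addition aligns on row and column labels, the two frames need the same methods and datasets for every cell to be filled.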

## 4. Export

Fourth, we export the table of means (`mean.csv`) and the LaTeX tabular (`tabular.tex`) to the directory with the tables (`destination`):

In [None]:
destination = os.path.join("work", destination, output)

In [None]:
Path(destination).mkdir(parents=True, exist_ok=True)

In [None]:
csv = os.path.join(destination, "mean.csv")

In [None]:
mean.to_csv(csv)

In [None]:
tex = os.path.join(destination, "tabular.tex")

In [None]:
tabular.to_latex(tex, escape=False)
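The export of the means can be sketched as a round trip through a temporary directory. The directory names and values are hypothetical. Only the `.csv` part is exercised here, because `to_latex` in recent pandas versions relies, to my understanding, on the optional Jinja2 dependency.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Invented, already formatted means.
toy_mean = pd.DataFrame(
    {"iris": ["0.920", "0.820"]},
    index=pd.Index(["knn", "tree"], name="method"),
)

with tempfile.TemporaryDirectory() as root:
    # Hypothetical destination: <root>/tables/<output>
    out_dir = os.path.join(root, "tables", "test_score")
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    csv_path = os.path.join(out_dir, "mean.csv")
    toy_mean.to_csv(csv_path)

    # dtype=str keeps the trailing zeros when reading the file back.
    restored = pd.read_csv(csv_path, index_col="method", dtype=str)

print(restored)
```

Reading the file back without `dtype=str` would parse the cells as floats and drop the trailing zeros.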

Finally, we write the destination directory in a file for the `HTML` export:

In [None]:
Path("destination.txt").write_text(destination, encoding="utf-8");