## Grid Searches {#sec-app-grid}


In [None]:
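# Truncate the working directory path at the first component named "CounterfactualTraining.jl"
# so that all relative paths below resolve from the project root:
projectdir = splitpath(pwd()) |>
    ss -> joinpath(ss[1:findfirst(==("CounterfactualTraining.jl"), ss)]...)
cd(projectdir)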

using CTExperiments
using CTExperiments.CSV
using CTExperiments.DataFrames
using CTExperiments.StatsBase

using DotEnv
DotEnv.load!()

In [None]:
res_dir = ENV["FINAL_GRID_RESULTS"]

To assess the hyperparameter sensitivity of our proposed training regime, we ran multiple large grid searches for all of our synthetic datasets. We have grouped these grid searches into three categories:

1. **Generator Parameters** (@sec-app-grid-gen): Investigates the effect of changing hyperparameters that affect the counterfactual outcomes during the training phase.
2. **Penalty Strengths** (@sec-app-grid-pen): Investigates the effect of changing the penalty strengths in our proposed objective (@eq-obj).
3. **Other Parameters** (@sec-app-grid-train): Investigates the effect of changing other training parameters, including the total number of generated counterfactuals in each epoch.

We begin by summarizing the high-level findings in @sec-app-grid-hl. For each of the categories, @sec-app-grid-gen to @sec-app-grid-train then present all details, including the exact parameter grids, average predictive performance outcomes and key evaluation metrics for the generated counterfactuals.
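To illustrate what a grid search over several hyperparameters involves, the following minimal sketch expands a small grid into all of its combinations. The parameter names `burnin`, `nce` and `nepochs` are borrowed from the result paths used in this appendix, but the values are made up for illustration and the snippet is not how `CTExperiments` builds its grids.

```julia
# Illustrative grid only: the values below are assumptions, not the grids we ran.
grid = (
    burnin = [0.0, 0.5],
    nce = [100, 1000],
    nepochs = [50, 100],
)

# Expand the grid into one named tuple per combination (Cartesian product).
configs = vec([
    NamedTuple{keys(grid)}(vals) for vals in Iterators.product(values(grid)...)
])

println("Number of configurations: ", length(configs))
println("First configuration: ", configs[1])
```

The number of configurations is the product of the number of values per hyperparameter, which is why the grids grow quickly as more parameters are varied.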

### Evaluation Details

To measure predictive performance, we compute the accuracy and F1-score for all models on test data (@tbl-acc-gen, @tbl-acc-pen, @tbl-acc-train). 
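As a reminder of how these two measures are defined, the sketch below computes accuracy and the F1-score for a toy binary problem. The helper names `accuracy` and `f1_score` are hypothetical and not part of `CTExperiments`, the labels are made up, and we assume a single positive class for simplicity.

```julia
# Hypothetical helpers for illustration; not the functions used in our experiments.

# Share of predictions that match the true labels.
accuracy(y, yhat) = sum(y .== yhat) / length(y)

# F1-score for one positive class: 2TP / (2TP + FP + FN).
function f1_score(y, yhat; positive=1)
    tp = sum((yhat .== positive) .& (y .== positive))
    fp = sum((yhat .== positive) .& (y .!= positive))
    fn = sum((yhat .!= positive) .& (y .== positive))
    denom = 2tp + fp + fn
    return denom == 0 ? 0.0 : 2tp / denom
end

# Made-up labels and predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
println("accuracy = ", round(acc; digits=3), ", F1 = ", round(f1; digits=3))
```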

We evaluated 

### High-Level Findings {#sec-app-grid-hl}

Overall, we find that Counterfactual Training (CT) achieves its key objectives consistently across all hyperparameter settings and also broadly across datasets: plausibility is improved. We do observe strong sensitivity to certain hyperparameters, but clear and manageable patterns emerge in those cases. With respect to the underlying training data distribution, we find that outcomes for CT seem to depend to some degree on class separability.

#### Predictive Performance

We find that CT is associated with little to no decrease in average predictive performance for our synthetic datasets: test accuracy and F1-scores decrease by at most ~1 percentage point and generally by much less (@tbl-acc-gen, @tbl-acc-pen, @tbl-acc-train). Variation across hyperparameters is negligible, as indicated by small standard deviations for these metrics across the board.
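To make the two quantities in this comparison concrete, the following sketch computes a difference in percentage points and a standard deviation across grid configurations. All numbers are made up for illustration and are not results from our experiments; the variable names are hypothetical.

```julia
using Statistics

# Made-up test accuracies, one per hyperparameter configuration (illustration only).
acc_baseline = [0.990, 0.992, 0.991, 0.989]
acc_ct = [0.985, 0.988, 0.986, 0.984]

# Difference in average accuracy, expressed in percentage points.
diff_pp = 100 * (mean(acc_ct) - mean(acc_baseline))

# Spread of CT accuracy across configurations.
sd_ct = std(acc_ct)

println("Change in accuracy (pp): ", round(diff_pp; digits=3))
println("Std. dev. across configurations: ", round(sd_ct; digits=4))
```

A small standard deviation relative to the mean indicates that the choice of hyperparameters has little effect on predictive performance.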

#### Key Counterfactual Outcomes



### Generator Parameters {#sec-app-grid-gen}


In [None]:
grid_dir = joinpath(res_dir, "gen_params/mlp")
data_dirs = readdir(grid_dir) |> x -> joinpath.(grid_dir, x) |> x -> x[isdir.(x)]
eval_grids = (p -> EvaluationGrid(joinpath(p,"grid_config.toml"))).(data_dirs)
data_names = basename.(data_dirs)
suffix = "evaluation/results/ce/decision_threshold_exper---lambda_energy_exper---maxiter_exper---maxiter---decision_threshold_exper/"

The hyperparameter grid with varying generator parameters during training is shown in @nte-gen-params-final-run-train. The corresponding evaluation grid used for these experiments is shown in @nte-gen-params-final-run-eval.

::: {#nte-gen-params-final-run-train .callout-note}

## Training Phase


In [None]:
#| output: asis
dict = CTExperiments.from_toml(joinpath(grid_dir, "lin_sep/grid_config.toml")) 
println(CTExperiments.dict_to_quarto_markdown(dict))

:::

::: {#nte-gen-params-final-run-eval .callout-note}

## Evaluation Phase


In [None]:
#| output: asis
dict = CTExperiments.from_toml(joinpath(grid_dir, "lin_sep/evaluation/evaluation_grid_config.toml"))
println(CTExperiments.dict_to_quarto_markdown(dict))

:::

#### Accuracy

::: {#tbl-acc-gen}

::: {.content-hidden unless-format="pdf"}


In [None]:
#| output: asis
df = CTExperiments.aggregate_performance(eval_grids; byvars=["objective"]) 
get_table_inputs(df, "mean";backend=Val(:latex)) |>
    inputs -> tabulate_results(inputs; wrap_table=false)

:::

Predictive performance measures by dataset and objective averaged across training-phase parameters (@nte-gen-params-final-run-train) and evaluation-phase parameters (@nte-gen-params-final-run-eval).

:::

#### Plausibility


In [None]:
#| output: asis

fig_label_prefix = "grid-gen_params-plaus"
fig_labels = (nm -> "fig-$(fig_label_prefix)-$nm").(data_names)
_str = "The results with respect to the plausibility measure are shown in @$(fig_labels[1]) to @$(fig_labels[end])."
println(_str)

In [None]:
#| output: asis
 
imgfname = "plausibility_distance_from_target.png"
fig_caption = "Average outcomes for the plausibility measure across hyperparameters."
full_paths = joinpath.(data_dirs, joinpath(suffix,imgfname))
include_img_commands = CTExperiments.get_img_command(data_names, full_paths, fig_labels; fig_caption) 
_str = join(include_img_commands, "\n\n")
println(_str)

#### Cost


In [None]:
#| output: asis

fig_label_prefix = "grid-gen_params-cost"
fig_labels = (nm -> "fig-$(fig_label_prefix)-$nm").(data_names)
_str = "The results with respect to the cost measure are shown in @$(fig_labels[1]) to @$(fig_labels[end])."
println(_str)

In [None]:
#| output: asis
 
imgfname = "distance.png"
fig_caption = "Average outcomes for the cost measure across hyperparameters."
full_paths = joinpath.(data_dirs, joinpath(suffix,imgfname))
include_img_commands = CTExperiments.get_img_command(data_names, full_paths, fig_labels; fig_caption)
_str = join(include_img_commands, "\n\n")
println(_str)

### Penalty Strengths {#sec-app-grid-pen}


In [None]:
grid_dir = joinpath(res_dir, "penalties/mlp")
data_dirs = readdir(grid_dir) |> x -> joinpath.(grid_dir, x) |> x -> x[isdir.(x)]
eval_grids = (p -> EvaluationGrid(joinpath(p,"grid_config.toml"))).(data_dirs)
data_names = basename.(data_dirs)
suffix = "evaluation/results/ce/lambda_adversarial---lambda_energy_reg---lambda_energy_diff---lambda_adversarial---lambda_adversarial/"

The hyperparameter grid with varying penalty strengths during training is shown in @nte-pen-final-run-train. The corresponding evaluation grid used for these experiments is shown in @nte-pen-final-run-eval.

::: {#nte-pen-final-run-train .callout-note}

## Training Phase


In [None]:
#| output: asis
dict = CTExperiments.from_toml(joinpath(grid_dir, "lin_sep/grid_config.toml")) 
println(CTExperiments.dict_to_quarto_markdown(dict))

:::

::: {#nte-pen-final-run-eval .callout-note}

## Evaluation Phase


In [None]:
#| output: asis
dict = CTExperiments.from_toml(joinpath(grid_dir, "lin_sep/evaluation/evaluation_grid_config.toml"))
println(CTExperiments.dict_to_quarto_markdown(dict))

:::

#### Accuracy

::: {#tbl-acc-pen}

::: {.content-hidden unless-format="pdf"}


In [None]:
#| output: asis
df = CTExperiments.aggregate_performance(eval_grids; byvars=["objective"]) 
get_table_inputs(df, "mean";backend=Val(:latex)) |>
    inputs -> tabulate_results(inputs; wrap_table=false)

:::

Predictive performance measures by dataset and objective averaged across training-phase parameters (@nte-pen-final-run-train) and evaluation-phase parameters (@nte-pen-final-run-eval).

:::

#### Plausibility


In [None]:
#| output: asis

fig_label_prefix = "grid-pen-plaus"
fig_labels = (nm -> "fig-$(fig_label_prefix)-$nm").(data_names)
_str = "The results with respect to the plausibility measure are shown in @$(fig_labels[1]) to @$(fig_labels[end])."
println(_str)

In [None]:
#| output: asis
 
imgfname = "plausibility_distance_from_target.png"
fig_caption = "Average outcomes for the plausibility measure across hyperparameters."
full_paths = joinpath.(data_dirs, joinpath(suffix,imgfname))
include_img_commands = CTExperiments.get_img_command(data_names, full_paths, fig_labels; fig_caption) 
_str = join(include_img_commands, "\n\n")
println(_str)

#### Cost


In [None]:
#| output: asis

fig_label_prefix = "grid-pen-cost"
fig_labels = (nm -> "fig-$(fig_label_prefix)-$nm").(data_names)
_str = "The results with respect to the cost measure are shown in @$(fig_labels[1]) to @$(fig_labels[end])."
println(_str)

In [None]:
#| output: asis
 
imgfname = "distance.png"
fig_caption = "Average outcomes for the cost measure across hyperparameters."
full_paths = joinpath.(data_dirs, joinpath(suffix,imgfname))
include_img_commands = CTExperiments.get_img_command(data_names, full_paths, fig_labels; fig_caption)
_str = join(include_img_commands, "\n\n")
println(_str)

### Other Parameters {#sec-app-grid-train}


In [None]:
grid_dir = joinpath(res_dir, "training_params/mlp")
data_dirs = readdir(grid_dir) |> x -> joinpath.(grid_dir, x) |> x -> x[isdir.(x)]
eval_grids = (p -> EvaluationGrid(joinpath(p,"grid_config.toml"))).(data_dirs)
data_names = basename.(data_dirs)
suffix = "evaluation/results/ce/burnin---nce---nepochs---burnin---burnin/"

The hyperparameter grid with other varying training parameters is shown in @nte-train-final-run-train. The corresponding evaluation grid used for these experiments is shown in @nte-train-final-run-eval.

::: {#nte-train-final-run-train .callout-note}

## Training Phase


In [None]:
#| output: asis
dict = CTExperiments.from_toml(joinpath(grid_dir, "lin_sep/grid_config.toml")) 
println(CTExperiments.dict_to_quarto_markdown(dict))

:::

::: {#nte-train-final-run-eval .callout-note}

## Evaluation Phase


In [None]:
#| output: asis
dict = CTExperiments.from_toml(joinpath(grid_dir, "lin_sep/evaluation/evaluation_grid_config.toml"))
println(CTExperiments.dict_to_quarto_markdown(dict))

:::

#### Accuracy

::: {#tbl-acc-train}

::: {.content-hidden unless-format="pdf"}


In [None]:
#| output: asis
df = CTExperiments.aggregate_performance(eval_grids; byvars=["objective"]) 
get_table_inputs(df, "mean";backend=Val(:latex)) |>
    inputs -> tabulate_results(inputs; wrap_table=false)

:::

Predictive performance measures by dataset and objective averaged across training-phase parameters (@nte-train-final-run-train) and evaluation-phase parameters (@nte-train-final-run-eval).

:::

#### Plausibility


In [None]:
#| output: asis

fig_label_prefix = "grid-train-plaus"
fig_labels = (nm -> "fig-$(fig_label_prefix)-$nm").(data_names)
_str = "The results with respect to the plausibility measure are shown in @$(fig_labels[1]) to @$(fig_labels[end])."
println(_str)

In [None]:
#| output: asis
 
imgfname = "plausibility_distance_from_target.png"
fig_caption = "Average outcomes for the plausibility measure across hyperparameters."
full_paths = joinpath.(data_dirs, joinpath(suffix,imgfname))
include_img_commands = CTExperiments.get_img_command(data_names, full_paths, fig_labels; fig_caption) 
_str = join(include_img_commands, "\n\n")
println(_str)

#### Cost


In [None]:
#| output: asis

fig_label_prefix = "grid-train-cost"
fig_labels = (nm -> "fig-$(fig_label_prefix)-$nm").(data_names)
_str = "The results with respect to the cost measure are shown in @$(fig_labels[1]) to @$(fig_labels[end])."
println(_str)

In [None]:
#| output: asis
 
imgfname = "distance.png"
fig_caption = "Average outcomes for the cost measure across hyperparameters."
full_paths = joinpath.(data_dirs, joinpath(suffix,imgfname))
include_img_commands = CTExperiments.get_img_command(data_names, full_paths, fig_labels; fig_caption)
_str = join(include_img_commands, "\n\n")
println(_str)