<h4 style="margin:3px;padding:3px;">Walkthrough</h4>

This walkthrough generates the plots and tables in the TabMini paper. For convenience, we have already exported our benchmark results to Microsoft Excel and added a tab in the long format. We then saved the [Excel file](results/test_scores.xlsx) as well as the wide-format and long-format tabs as [test_scores_wide.csv](results/test_scores_wide.csv) and [test_scores_long.csv](results/test_scores_long.csv), respectively. To run the cells, you need [CriticalDifferenceDiagrams.jl](https://mirkobunse.github.io/CriticalDifferenceDiagrams.jl/stable/), [CSV.jl](https://csv.juliadata.org/stable/), [DataFrames.jl](https://dataframes.juliadata.org/stable/), [PGFPlots.jl](https://github.com/JuliaTeX/PGFPlots.jl), [Plots.jl](https://docs.juliaplots.org/stable/), [PyCall.jl](https://github.com/JuliaPy/PyCall.jl), [StatsBase.jl](https://juliastats.org/StatsBase.jl/stable/), and [StatsPlots.jl](https://github.com/JuliaPlots/StatsPlots.jl). Additionally, you need the Python libraries [numpy](https://numpy.org/), [pandas](https://pandas.pydata.org/), and [PyMFE](https://pymfe.readthedocs.io/en/latest/), installed in the Python environment that PyCall.jl uses.
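A minimal sketch for installing the Julia packages listed above into the active environment with Julia's built-in package manager, Pkg. The helper name `install_missing` is our own (hypothetical); the Python libraries still have to be installed separately into the Python interpreter that PyCall.jl is configured to use.

```julia
using Pkg

# Julia packages needed by the cells of this notebook
const REQUIRED_PACKAGES = ["CriticalDifferenceDiagrams", "CSV", "DataFrames", "PGFPlots",
                           "Plots", "PyCall", "StatsBase", "StatsPlots"]

# Hypothetical helper: add only the packages that are not yet direct
# dependencies of the active project, and return the names it added
function install_missing(pkgs::Vector{String})
    direct = Set(info.name for info in values(Pkg.dependencies()) if info.is_direct_dep)
    missing_pkgs = [p for p in pkgs if !(p in direct)]
    isempty(missing_pkgs) || Pkg.add(missing_pkgs)
    return missing_pkgs
end

# install_missing(REQUIRED_PACKAGES)
```

Running `install_missing(REQUIRED_PACKAGES)` once in a fresh environment should be enough; packages that are already direct dependencies are skipped.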

<h5>Imports</h5>

include("helpers/generate_correlations.jl")
include("helpers/generate_metafeatures.jl")

using CriticalDifferenceDiagrams
using CSV
using DataFrames
using PGFPlots
using Plots
using StatsBase
using StatsPlots


results_wide = CSV.read("results/test_scores_wide.csv", DataFrame)
results_long = CSV.read("results/test_scores_long.csv", DataFrame);
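The relationship between the two tables can be sketched in plain Julia on made-up numbers (the dataset names and AUC values below are hypothetical): every cell of the wide table, which has one row per dataset and one column per approach, becomes one row of the long table with the `approach`, `dataset`, and `auc` columns that the critical difference cells below refer to. For a `DataFrame`, DataFrames.jl's `stack` performs this kind of reshaping.

```julia
# Hypothetical wide table: rows are datasets, columns are approaches
datasets = ["ds_a", "ds_b"]
approaches = ["TabPFN", "Logistic regression"]
auc_wide = [0.81 0.79;
            0.66 0.70]

# Long format: one (approach, dataset, auc) row per cell of the wide table
long_rows = [(approach = approaches[j], dataset = datasets[i], auc = auc_wide[i, j])
             for j in eachindex(approaches) for i in eachindex(datasets)]
```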

<h5>Meta-Feature Generation</h5>

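# compute the meta-features of all datasets (with PyMFE) and write them to metafeatures.csv
py"generate_metafeatures"("results/test_scores_wide.csv")
# read the meta-features back into a DataFrame
metafeatures = CSV.read("metafeatures.csv", DataFrame);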
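Most meta-features come from PyMFE, but `EPV` is not a standard PyMFE meta-feature. We assume here that it stands for events per variable, i.e., the number of minority-class samples divided by the number of features. The sketch below computes it alongside `nr_inst`, `nr_attr`, and the minority-class proportion (PyMFE's `freq_class.min`) for a hypothetical label vector; the function name `basic_metafeatures` is our own.

```julia
# Hypothetical helper: a few simple meta-features of a classification dataset.
# Assumption: EPV = (number of minority-class samples) / (number of features).
function basic_metafeatures(y::AbstractVector, nr_attr::Integer)
    nr_inst = length(y)
    # count the samples of each class
    class_counts = Dict{eltype(y),Int}()
    for label in y
        class_counts[label] = get(class_counts, label, 0) + 1
    end
    minority = minimum(values(class_counts))
    return (nr_inst = nr_inst,
            nr_attr = nr_attr,
            freq_class_min = minority / nr_inst,  # relative frequency of the minority class
            EPV = minority / nr_attr)
end

y = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # 10 samples, 3 of them in the minority class
mf = basic_metafeatures(y, 6)
```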

<h5>Figure 1</h5>

Figure 1 is a composite of several of the figures generated below.

<h5>Table 1</h5>

Table 1 is constructed with values from the relevant studies.

<h5>Figure 2a</h5>

# approaches to compare (a row vector, so that each approach becomes one labeled series)
methods = ["AutoPrognosis" "AutoGluon" "TabPFN" "HyperFast" "Logistic regression"]
# rows of results_wide that fall into each sample-size bin (assumes rows are sorted by sample size)
sample_size_ranges = [1:12, 13:22, 23:31, 32:39, 40:44]
xticks_labels = ["32 to 100", "101 to 200", "201 to 300", "301 to 400", "401 to 500"]

# quartiles of the per-dataset test AUCs for every (bin, approach) pair
Q3s = zeros(length(sample_size_ranges), length(methods))
Q2s = zeros(length(sample_size_ranges), length(methods))
Q1s = zeros(length(sample_size_ranges), length(methods))
for (idx_a, approach) in enumerate(methods)
    for (idx_r, sample_size_range) in enumerate(sample_size_ranges)
        Q3s[idx_r, idx_a] = quantile(results_wide[sample_size_range, approach], 0.75)
        Q2s[idx_r, idx_a] = quantile(results_wide[sample_size_range, approach], 0.5)
        Q1s[idx_r, idx_a] = quantile(results_wide[sample_size_range, approach], 0.25)
    end
end

# lines show the medians, ribbons the interquartile ranges
Plots.plot(Q2s,
    ribbon=(Q2s .- Q1s, Q3s .- Q2s),
    fillalpha=0.15,
    ylabel="Mean test AUC",
    xlabel="Sample size range",
    xticks=(1:5, xticks_labels),
    label=methods,
    linewidth=5,
    legend=:bottomright,
    margin=10Plots.mm,
    marker=:dot,
    markersize=6,
    palette=:tab10
)

# Plots.scalefontsizes(1.2)
# savefig("plots/auc.svg")
# savefig("plots/auc.pdf");
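The cell above appears to rely on `results_wide` being sorted by sample size, so that fixed row ranges correspond to the sample-size bins. The following sketch performs the same aggregation by binning datasets on an explicit sample-size vector instead; both the sample sizes and the AUCs are hypothetical, and `binned_quartiles` is an illustrative name. Per bin, it returns the median that is drawn as the line and the lower and upper distances to the quartiles that `ribbon` expects.

```julia
using Statistics

# Hypothetical inputs: sample size and test AUC of one method on eight datasets
sample_sizes = [40, 95, 120, 180, 250, 310, 390, 470]
auc = [0.70, 0.74, 0.78, 0.80, 0.83, 0.85, 0.86, 0.88]

# Sample-size bins matching the x-axis labels of Figure 2a
bins = [(32, 100), (101, 200), (201, 300), (301, 400), (401, 500)]

# Per bin: median AUC plus the distances down to the 25% and up to the 75% quantile.
# Every bin must contain at least one dataset.
function binned_quartiles(sizes, scores, bins)
    medians, lower, upper = Float64[], Float64[], Float64[]
    for (lo, hi) in bins
        in_bin = scores[(sizes .>= lo) .& (sizes .<= hi)]
        q1, q2, q3 = quantile(in_bin, [0.25, 0.5, 0.75])
        push!(medians, q2)
        push!(lower, q2 - q1)
        push!(upper, q3 - q2)
    end
    return medians, lower, upper
end

medians, lower, upper = binned_quartiles(sample_sizes, auc, bins)
```

Passing `medians` as the data and `(lower, upper)` as `ribbon` draws one such curve for a single approach.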

<h5>Figure 2b</h5>

cdd_plot = CriticalDifferenceDiagrams.plot(
    results_long,
    :approach,
    :dataset,
    :auc,
    maximize_outcome=true
)

# PGFPlots.save("plots/cdd.svg", cdd_plot)
# PGFPlots.save("plots/cdd.pdf", cdd_plot);
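A critical difference diagram places every approach at its average rank across datasets. A minimal sketch of that ranking step on hypothetical AUCs follows, with rank 1 for the highest AUC (as with `maximize_outcome=true`) and tied approaches sharing the mean of the positions they occupy. The statistical test that CriticalDifferenceDiagrams.jl uses to decide which approaches to connect is not reproduced here, and `descending_tied_ranks` is an illustrative name.

```julia
using Statistics

# Ranks of one dataset's scores: rank 1 for the highest score,
# tied scores share the average of the positions they occupy.
function descending_tied_ranks(x::AbstractVector{<:Real})
    n = length(x)
    order = sortperm(x; rev=true)
    ranks = zeros(Float64, n)
    i = 1
    while i <= n
        j = i
        # extend j over all entries tied with the entry at position i
        while j < n && x[order[j + 1]] == x[order[i]]
            j += 1
        end
        for k in i:j
            ranks[order[k]] = (i + j) / 2
        end
        i = j + 1
    end
    return ranks
end

# Hypothetical AUCs: rows are datasets, columns are approaches
scores = [0.90 0.85 0.80;
          0.70 0.75 0.70;
          0.60 0.65 0.55]

ranks = similar(scores)
for d in axes(scores, 1)
    ranks[d, :] = descending_tied_ranks(scores[d, :])
end
average_ranks = vec(mean(ranks; dims=1))
```

Lower average ranks are better; in the diagram, approaches joined by a horizontal bar are not significantly different.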

<h5>Dataset Reduction</h5>

As mentioned in the experimental results, we also performed the pairwise comparisons of mean test AUC on only the 24 datasets that TabPFN was not meta-trained on. On this reduced set, however, we found no statistically significant performance differences between logistic regression and the other methods (p > 0.05).
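One way to see why fewer datasets make significance harder to reach is the critical difference of the classic Nemenyi post-hoc test, $\mathrm{CD} = q_\alpha \sqrt{k(k+1)/(6N)}$, for $k$ approaches compared on $N$ datasets. CriticalDifferenceDiagrams.jl does not necessarily use this test, so the sketch is only illustrative. It assumes $k = 5$ approaches (those in Figure 2a), $\alpha = 0.05$ with $q_{0.05} = 2.728$, $N = 44$ datasets for the full benchmark (the row ranges in Figure 2a cover 44 rows), and $N = 24$ for the reduced list below.

```julia
# Critical difference of the Nemenyi test for k approaches compared on N datasets
nemenyi_cd(q_alpha, k, N) = q_alpha * sqrt(k * (k + 1) / (6 * N))

# Nemenyi critical value for k = 5 approaches at alpha = 0.05
# (Studentized range statistic divided by sqrt(2))
q_005 = 2.728

cd_full = nemenyi_cd(q_005, 5, 44)     # all 44 datasets
cd_reduced = nemenyi_cd(q_005, 5, 24)  # the 24 datasets TabPFN was not meta-trained on
```

Under these assumptions, two approaches must differ by about 1.25 average ranks on the reduced set, instead of about 0.92 on the full set, before the test calls the difference significant.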

# datasets that TabPFN was not meta-trained on
datasets_reduced = [
    # M = 32 - 100 (12 datasets)
    ["analcatdata_aids", "analcatdata_asbestos", "analcatdata_bankruptcy", "analcatdata_creditscore",
    "analcatdata_cyyoung8092", "analcatdata_cyyoung9302", "analcatdata_fraud", "analcatdata_japansolvent",
    "labor", "lupus", "parity5", "postoperative_patient_data"],
    # M = 101 - 200 (6 datasets)
    ["analcatdata_boxing1", "analcatdata_boxing2", "appendicitis", "glass2", "molecular_biology_promoters",
    "mux6"],
    # M = 201 - 300 (1 dataset)
    ["hungarian"],
    # M = 301 - 400 (3 datasets)
    ["bupa", "colic", "horse_colic"],
    # M = 401 - 500 (2 datasets)
    ["clean1", "house_votes_84"]
]

# keep only the rows of results_long that belong to one of the reduced datasets
all_datasets_reduced = reduce(vcat, datasets_reduced)
results_long_reduced = filter(:dataset => in(all_datasets_reduced), results_long)

cdd_plot_reduced = CriticalDifferenceDiagrams.plot(
    results_long_reduced,
    :approach,
    :dataset,
    :auc,
    maximize_outcome=true
)

# PGFPlots.save("plots/cdd_reduced.svg", cdd_plot_reduced)
# PGFPlots.save("plots/cdd_reduced.pdf", cdd_plot_reduced);
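To see which sample-size bins the reduced comparison draws on, the per-bin counts noted in the comments of `datasets_reduced` can be set against the bin sizes implied by `sample_size_ranges` in Figure 2a. A small sketch of that arithmetic:

```julia
# Datasets per sample-size bin (32 to 100, ..., 401 to 500)
full_counts = length.([1:12, 13:22, 23:31, 32:39, 40:44])  # row ranges from Figure 2a
reduced_counts = [12, 6, 1, 3, 2]                           # counts noted in datasets_reduced

# Share of each bin that remains in the reduced comparison
kept_fraction = reduced_counts ./ full_counts
```

All twelve of the smallest datasets remain, whereas only one of the nine datasets with 201 to 300 samples does, so the reduced comparison is weighted toward small datasets.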

<h5>Figure 3</h5>

methods = ["AutoPrognosis", "AutoGluon", "TabPFN", "HyperFast"]
py"generate_correlations"(methods, "results/test_scores_wide.csv")

# counts per PyMFE meta-feature group, in the order of `methods` (transcribed from correlations.txt)
clustering = [0, 0, 0, 1]
complexity = [3, 2, 0, 0]
concept = [0, 4, 0, 0]
general = [0, 0, 0, 0]
infotheory = [2, 0, 0, 0]
itemset = [2, 0, 0, 0]
landmarking = [0, 1, 0, 0]
modelbased = [0, 1, 0, 0]
statistical = [3, 2, 10, 9]

StatsPlots.groupedbar(
    [clustering complexity concept general infotheory itemset landmarking modelbased statistical],
    bar_position=:stack,
    xticks=(1:4, ["AutoPrognosis" "AutoGluon" "TabPFN" "HyperFast"]),
    # the #= =# block comment continues the one-row label matrix on the next line
    label=["Clustering" "Complexity" "Concept" "General" "Info theory" "Itemset" "Landmarking" #=
    =# "Model-based" "Statistical"],
    linecolor=:white,
    palette=:tab20
)

# Plots.scalefontsizes(1.2)
# savefig("plots/bar.svg")
# savefig("plots/bar.pdf");
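The hard-coded counts add up to 10 for every method, which suggests (an assumption on our part) that `correlations.txt` lists the ten meta-features most strongly correlated with each method's AUC, grouped by meta-feature group. The sketch below illustrates that kind of tally on hypothetical toy data. It uses the absolute Pearson correlation as a stand-in for whatever coefficient the helper actually computes; the function name `top_k_group_counts` and the feature-to-group mapping are illustrative.

```julia
using Statistics

# Hypothetical toy data: test AUC of one method and three meta-features on six datasets
auc = [0.71, 0.80, 0.65, 0.90, 0.77, 0.84]
toy_metafeatures = Dict(
    "nr_inst" => [40.0, 150.0, 60.0, 480.0, 210.0, 330.0],
    "nr_attr" => [12.0, 8.0, 30.0, 5.0, 9.0, 7.0],
    "attr_conc.mean" => [0.2, 0.1, 0.4, 0.3, 0.2, 0.1],
)
# Hypothetical mapping from meta-feature name to meta-feature group
feature_group = Dict("nr_inst" => "general", "nr_attr" => "general",
                     "attr_conc.mean" => "infotheory")

# Count the groups of the k meta-features with the largest absolute correlation to the scores
function top_k_group_counts(scores, features, groups, k)
    by_strength = sort(collect(keys(features));
                       by=name -> abs(cor(features[name], scores)), rev=true)
    counts = Dict{String,Int}()
    for name in first(by_strength, k)
        counts[groups[name]] = get(counts, groups[name], 0) + 1
    end
    return counts
end

counts = top_k_group_counts(auc, toy_metafeatures, feature_group, 2)
```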

<h5>Table 2</h5>

# summary statistics of the meta-features reported in Table 2
for characteristic in ["nr_inst", "nr_attr", "freq_class.min", "EPV", "nr_bin"]
    column = metafeatures[!, characteristic]
    println("$(characteristic):")
    println("  Mean: $(mean(column))")
    println("  Std: $(std(column, corrected=false))")  # uncorrected (population) standard deviation
    println("  Min: $(minimum(column))")
    println("  25%: $(quantile(column, 0.25))")
    println("  50%: $(quantile(column, 0.5))")
    println("  75%: $(quantile(column, 0.75))")
    println("  Max: $(maximum(column))")
end;
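The loop above reports the same rows as pandas' `describe()` apart from the count. Two details matter when comparing its output with Python: `std(...; corrected=false)` is the population standard deviation (divisor $n$), whereas pandas divides by $n - 1$ by default, and `quantile` from Julia's Statistics library defaults to linear interpolation between order statistics, like NumPy's default. A small check on a hypothetical vector:

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of one characteristic

std_population = std(x; corrected=false)    # divisor n, as in the Table 2 cell
std_sample = std(x)                         # divisor n - 1 (Julia's and pandas' default)
quartiles = quantile(x, [0.25, 0.5, 0.75])  # linear interpolation
```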

<h5>Figure 4a</h5>

Plots.boxplot(["TabMini"],
    metafeatures[!, "nr_inst"],
    ylabel="Sample size",
    xticks = false,
    legend=false,
    palette=:tab10,
    ylim=(0, 500)
)

# Plots.scalefontsizes(1.2)
# savefig("plots/boxplot.svg")
# savefig("plots/boxplot.pdf");
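Box plots such as this one conventionally draw the box from the first to the third quartile and extend the whiskers to the most extreme observations within 1.5 × IQR of the box (Tukey's rule); StatsPlots.jl exposes this factor as the `whisker_range` keyword of `boxplot`. A sketch of the computation on hypothetical sample sizes:

```julia
using Statistics

sizes = [35, 48, 60, 77, 90, 120, 150, 210, 260, 480]  # hypothetical sample sizes

q1, q3 = quantile(sizes, [0.25, 0.75])
box_iqr = q3 - q1
# whiskers end at the most extreme observations within 1.5 * IQR of the box
lower_whisker = minimum(s for s in sizes if s >= q1 - 1.5 * box_iqr)
upper_whisker = maximum(s for s in sizes if s <= q3 + 1.5 * box_iqr)
# observations beyond the whiskers are drawn individually as outliers
outliers = [s for s in sizes if s < q1 - 1.5 * box_iqr || s > q3 + 1.5 * box_iqr]
```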

<h5>Figure 4b</h5>

Plots.scatter(metafeatures[!, "nr_inst"], metafeatures[!, "nr_attr"],
    xlabel="Sample size",
    ylabel="Feature set size",
    ylim=(0, 70),
    markersize=10,
    markerstrokewidth=0,
    palette=:tab10,
    legend=false
)

# Plots.scalefontsizes(1.5)
# savefig("plots/scatter.svg")
# savefig("plots/scatter.pdf");

<h5>Table 3</h5>

Table 3 is constructed with raw values from our benchmark results.

<h5>Table 4</h5>

Table 4 is constructed with raw values from our benchmark results.

<h5>Figure 5</h5>

Figure 5 is created from the meta-feature correlations computed for Figure 3.