# Create tables/figures for dual update experiments in ICLR submission
In this notebook, we create the tables and figures for the dual-update experiments in the ICLR paper. The runs were logged to W&B and launched with:
```sh
python dual_updates.py -m ++wandb.use_wandb=True +experiment=airfoil_dual_updates,boston_dual_updates,protein_dual_updates ++random_seed=42,10,48,412,46392 hydra/launcher=lumi_30mins
```
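Hydra's `-m` (multirun) flag sweeps over the Cartesian product of comma-separated override values, so the command above launches 3 experiments × 5 seeds = 15 jobs, matching the 15 W&B run IDs listed further down. A minimal sketch of that expansion in plain Python; it only enumerates the override combinations and does not call Hydra or the `lumi_30mins` launcher:

```python
from itertools import product

# Override values copied from the multirun command above
experiments = ["airfoil_dual_updates", "boston_dual_updates", "protein_dual_updates"]
seeds = [42, 10, 48, 412, 46392]

# A multirun launches one job per combination of the swept values
jobs = [
    f"+experiment={exp} ++random_seed={seed}"
    for exp, seed in product(experiments, seeds)
]
print(len(jobs))  # 15
print(jobs[0])    # +experiment=airfoil_dual_updates ++random_seed=42
```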

This submits five seeds for each of the three UCI data sets (Airfoil, Boston and Protein), i.e. 15 jobs. Each job logs the NLPD and the time for three settings: training on $\mathcal{D}_1$, performing dual updates with $\mathcal{D}_2$ after training on $\mathcal{D}_1$, and retraining from scratch on $\mathcal{D}_1 \cup \mathcal{D}_2$.
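The table code further down expects the runs in long format, with `dataset`, `model`, `method`, `nlpd` and `time` columns and the method labels `"Train D1"`, `"Train D1 -> Update D2"` and `"Train D1+D2"`. The sketch below shows how one job's logged quantities could map onto that layout. `job_records`, its arguments and the `seed` column are hypothetical (they are not the logging code used for these runs), and the numbers are placeholders, not results:

```python
import pandas as pd


def job_records(dataset, seed, nlpd, time, model="SFR (GP)"):
    """Hypothetical helper: turn one job's logged metrics into long-format rows.

    `nlpd` and `time` are dicts keyed by the method labels used in this notebook.
    """
    methods = ["Train D1", "Train D1 -> Update D2", "Train D1+D2"]
    return pd.DataFrame(
        {
            "dataset": dataset,
            "seed": seed,  # assumed column; the table code does not use it
            "model": model,
            "method": methods,
            "nlpd": [nlpd[m] for m in methods],
            "time": [time[m] for m in methods],
        }
    )


# Placeholder numbers, not results
rows = job_records(
    "boston",
    42,
    nlpd={"Train D1": 1.0, "Train D1 -> Update D2": 0.9, "Train D1+D2": 0.8},
    time={"Train D1": 10.0, "Train D1 -> Update D2": 0.1, "Train D1+D2": 10.0},
)
print(rows)
```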

In [1]:
import pandas as pd
import wandb



## Load data

Either download the runs from W&B (saving them as a CSV) or load previously downloaded runs from a CSV; `download_runs` selects which.

In [2]:
download_runs = False

In [3]:
WANDB_ENTITY = "aalto-ml"
WANDB_PROJECT = "sl-fast-updates"

In [4]:
# W&B run IDs, one per (data set, seed), from runs made after reverting to the code before the break (i.e. from the new-main branch)
WANDB_RUNS = [
    # Boston
    "tcda79gx", # seed=42
    "ysw1s7pv", # seed=10
    "6yy0k2h7", # seed=48
    "mrw720qv", # seed=412
    "i23ept1c", # seed=46392
    # Airfoil
    "uwibbgre", # seed=42
    "8cq742an", # seed=10
    "823psd1g", # seed=48
    "hb8bfn4b", # seed=412
    "8guyubn8", # seed=46392
    # Protein
    "kya84v8f", # seed=42
    "qiyhp2yx", # seed=10
    "8sxhb0cw", # seed=48
    "qhzqrykg", # seed=412
    "tn880e9t", # seed=46392
]

In [5]:
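save_path: str = "./csv/dual-updates.csv"

if download_runs:
    # `save_wandb_runs_as_csv` is not defined in this notebook; it must be
    # defined or imported before running with download_runs = True
    df = save_wandb_runs_as_csv(save_path=save_path)
else:
    df = pd.read_csv(save_path)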
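The table code computes `time_count` but never checks it, so a missing or failed run would silently shrink a group. A minimal sketch of a check to run right after loading, assuming five seeds and the column names used in the table code; `check_seed_counts` is a hypothetical helper, and the toy frame holds placeholder values:

```python
import pandas as pd


def check_seed_counts(df: pd.DataFrame, n_seeds: int = 5) -> pd.Series:
    """Hypothetical helper: return the (dataset, model, method) groups whose
    row count differs from n_seeds (empty if every group is complete)."""
    counts = df.groupby(["dataset", "model", "method"]).size()
    return counts[counts != n_seeds]


# Toy frame with placeholder values: the airfoil group is missing one seed
toy = pd.DataFrame(
    {
        "dataset": ["boston"] * 5 + ["airfoil"] * 4,
        "model": ["SFR (GP)"] * 9,
        "method": ["Train D1"] * 9,
        "nlpd": [0.0] * 9,
        "time": [1.0] * 9,
    }
)
print(check_seed_counts(toy))
```

On the notebook's data, `check_seed_counts(df)` would be expected to return an empty Series if all 15 runs were downloaded.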

## Create Table 3 - SFR's dual updates are fast and effective
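Each cell of the table is a mean ± std over the five seeds, computed with `groupby`/`agg` and then formatted with two decimals as `\val{mean}{std}` (NLPD) or `\valtime{mean}{std}` (time). pandas' `"std"` aggregation is the sample standard deviation (`ddof=1`), which is noticeably larger than the population version with only five seeds. A small sketch on placeholder numbers (not results) reproducing that step; the `\val` macro itself is assumed to be defined in the paper's LaTeX preamble:

```python
import pandas as pd

# Placeholder NLPD values for five seeds of one (dataset, method) group
toy = pd.DataFrame(
    {
        "dataset": ["boston"] * 5,
        "method": ["Train D1"] * 5,
        "nlpd": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

stats = (
    toy.groupby(["dataset", "method"])
    .agg(nlpd_mean=("nlpd", "mean"), nlpd_std=("nlpd", "std"))  # std uses ddof=1
    .reset_index()
)

row = stats.iloc[0]
cell = "\\val{" + f"{row['nlpd_mean']:.2f}" + "}{" + f"{row['nlpd_std']:.2f}" + "}"
print(cell)  # \val{3.00}{1.58}

# The population std (ddof=0) of the same values is smaller
print(f"{toy['nlpd'].std(ddof=0):.2f}")  # 1.41
```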

In [6]:
import os


def format_time_mean_pm_std(row):
    # Format mean +/- std of the time (s) as the LaTeX macro \valtime{mean}{std}
    mean = f"{row['time_mean']:.2f}"
    std = f"{row['time_std']:.2f}"
    return "\\valtime{" + mean + "}{" + std + "}"


def format_nlpd_mean_pm_std(row):
    # Format mean +/- std of the NLPD as the LaTeX macro \val{mean}{std}
    mean = f"{row['nlpd_mean']:.2f}"
    std = f"{row['nlpd_std']:.2f}"
    return "\\val{" + mean + "}{" + std + "}"


def create_dual_updates_table(df):
    # Only keep the models we want in the table (i.e. remove NN MAP)
    df = df[df["model"].isin(["SFR (GP)"])]

    # Only keep the training methods we want
    df = df[df["method"].isin(["Train D1", "Train D1 -> Update D2", "Train D1+D2"])]

    # Calculate mean/std of NLPD/time over the 5 seeds
    df_with_stats = (
        df.groupby(["dataset", "method"])
        .agg(
            nlpd_mean=("nlpd", "mean"),
            nlpd_std=("nlpd", "std"),
            time_mean=("time", "mean"),
            time_std=("time", "std"),
            time_count=("time", "count"),
        )
        .reset_index()
    )

    # Add columns with LaTeX-formatted mean +/- std
    df_with_stats["nlpd_mean_pm_std"] = df_with_stats.apply(format_nlpd_mean_pm_std, axis=1)
    df_with_stats["time_mean_pm_std"] = df_with_stats.apply(format_time_mean_pm_std, axis=1)

    # One row per data set; columns are (metric, method)
    updates_table = df_with_stats.pivot(
        index="dataset",
        columns="method",
        values=["nlpd_mean_pm_std", "time_mean_pm_std"],
    )
    updates_table.index.names = [None]
    updates_table.columns.names = [None, None]

    # Rename the columns (raw strings keep the LaTeX backslashes intact)
    updates_table.rename(
        columns={
            "nlpd_mean_pm_std": r"NLPD $\downarrow$",
            "time_mean_pm_std": r"Time (s) $\downarrow$",
            "Train D1": r"Train w. $\mathcal{D}_1$",
            "Train D1 -> Update D2": r"Updates w. $\mathcal{D}_2$ (Ours)",
            "Train D1+D2": r"Retrain w. $\mathcal{D}_1 \cup \mathcal{D}_2$",
        },
        inplace=True,
    )
    # Rename the data sets
    updates_table.rename(
        index={
            "boston": r"\sc Boston",
            "airfoil": r"\sc Airfoil",
            "protein": r"\sc Protein",
        },
        inplace=True,
    )

    # Render the LaTeX once, print it and save it for the paper
    latex = updates_table.to_latex(
        column_format="l|l|ll|l|ll", escape=False, multicolumn_format="c|"
    )
    print(latex)
    os.makedirs("./tabs", exist_ok=True)
    with open("./tabs/dual_updates_table.tex", "w") as file:
        file.write(latex)
    return updates_table


create_dual_updates_table(df)

\begin{tabular}{l|l|ll|l|ll}
\toprule
 & \multicolumn{3}{c|}{NLPD $\downarrow$} & \multicolumn{3}{c|}{Time (s) $\downarrow$} \\
 & Train w. $\mathcal{D}_1$ & Updates w. $\mathcal{D}_2$ (Ours) & Retrain w. $\mathcal{D}_1 \cup \mathcal{D}_2$ & Train w. $\mathcal{D}_1$ & Updates w. $\mathcal{D}_2$ (Ours) & Retrain w. $\mathcal{D}_1 \cup \mathcal{D}_2$ \\
\midrule
\sc Airfoil & \val{0.60}{0.02} & \val{0.50}{0.02} & \val{0.47}{0.03} & \valtime{19.65}{0.99} & \valtime{0.04}{0.00} & \valtime{18.22}{2.27} \\
\sc Boston & \val{0.23}{0.01} & \val{0.16}{0.02} & \val{0.13}{0.02} & \valtime{11.45}{1.93} & \valtime{0.02}{0.00} & \valtime{7.48}{0.67} \\
\sc Protein & \val{0.42}{0.01} & \val{0.15}{0.01} & \val{0.14}{0.00} & \valtime{30.17}{8.62} & \valtime{0.82}{0.06} & \valtime{30.61}{6.27} \\
\bottomrule
\end{tabular}
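To put the Time columns in perspective, the ratio of retraining time to dual-update time can be read off the table. A minimal sketch using the mean times printed above; because those means are rounded to two decimals (and the Airfoil and Boston update times are only 0.04 s and 0.02 s), the ratios are approximate:

```python
# Mean times (s) copied from the table above (rounded to two decimals)
retrain_time = {"airfoil": 18.22, "boston": 7.48, "protein": 30.61}
update_time = {"airfoil": 0.04, "boston": 0.02, "protein": 0.82}

# Approximate speed-up of dual updates over retraining on D1 and D2 combined
speedups = {name: retrain_time[name] / update_time[name] for name in retrain_time}
for name, s in speedups.items():
    print(f"{name}: ~{s:.0f}x faster than retraining")
```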



