This notebook (almost) replicates Figure 4a of the paper "*Amplified Early Stopping Bias: Overestimated Performance with Deep Learning*" as an example. Specifically, we train a multilayer perceptron (MLP) on random Gaussian vectors for multiple network depths, training-set sizes, and input feature sizes.
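
In this setup the labels are assumed to be unrelated to the inputs, so any classifier's expected ROC-AUC on fresh data is 0.5. That is consistent with the test scores printed later, which hover around 0.5. Below is a minimal NumPy sketch of such data. It assumes binary labels drawn independently of standard Gaussian inputs; the actual generator in `train_random.py` may differ. The helpers `make_random_dataset` and `roc_auc` are hypothetical, and ROC-AUC is computed from its rank (Mann-Whitney) definition:

```python
import numpy as np

def make_random_dataset(n, d, rng):
    """Gaussian inputs with binary labels drawn independently of them (assumed setup)."""
    x = rng.standard_normal((n, d))
    y = rng.integers(0, 2, size=n)  # labels carry no information about x
    return x, y

def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic (ties count as 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Fraction of (positive, negative) pairs in which the positive scores higher
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(42)
x_test, y_test = make_random_dataset(5000, 16, rng)
w = rng.standard_normal(16)  # an arbitrary linear "classifier"
print(f"test ROC-AUC = {roc_auc(x_test @ w, y_test):.3f}")  # close to 0.5
```

With 5000 test samples the standard deviation of this AUC is under 0.01. A noticeably higher score on held-out random data therefore points to a problem in the evaluation, not to a real signal.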

Run the cells below if you are using Google Colab (and likely other cloud notebook services).

In [1]:
!git clone https://github.com/NonaRjb/DeepOverestimation.git

Cloning into 'DeepOverestimation'...
remote: Enumerating objects: 202, done.[K
remote: Counting objects: 100% (202/202), done.[K
remote: Compressing objects: 100% (131/131), done.[K
remote: Total 202 (delta 116), reused 147 (delta 65), pack-reused 0 (from 0)[K
Receiving objects: 100% (202/202), 88.05 KiB | 371.00 KiB/s, done.
Resolving deltas: 100% (116/116), done.


In [2]:
%cd DeepOverestimation
%pwd

/content/DeepOverestimation


'/content/DeepOverestimation'

In [3]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install mat73
!{sys.executable} -m pip install mne
!{sys.executable} -m pip install torchmetrics

Collecting mat73
  Downloading mat73-0.65-py3-none-any.whl.metadata (3.6 kB)
Downloading mat73-0.65-py3-none-any.whl (19 kB)
Installing collected packages: mat73
Successfully installed mat73-0.65
Collecting mne
  Downloading mne-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading mne-1.8.0-py3-none-any.whl (7.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.4/7.4 MB[0m [31m91.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: mne
Successfully installed mne-1.8.0
Collecting torchmetrics
  Downloading torchmetrics-1.5.1-py3-none-any.whl.metadata (20 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics)
  Downloading lightning_utilities-0.11.8-py3-none-any.whl.metadata (5.2 kB)
Downloading torchmetrics-1.5.1-py3-none-any.whl (890 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m890.6/890.6 kB[0m [31m43.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading lightning_utilities-0.11.8-py3-none-any.whl (26 kB)
Installing collecte
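
Calling pip through `sys.executable` installs the packages into the interpreter that runs this kernel, rather than into whichever `pip` comes first on the PATH. The short sketch below confirms that the same interpreter can now find them, using only the standard library. `check_packages` is a hypothetical helper, not part of the repository, and it assumes each import name matches its distribution name, which holds for these three packages:

```python
import importlib.metadata
import importlib.util
import sys

def check_packages(names):
    """Return {import_name: version, or None if not importable} for this interpreter."""
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # not importable from sys.executable's environment
            continue
        try:
            status[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            status[name] = "installed (version unknown)"
    return status

print(sys.executable)
print(check_packages(["mat73", "mne", "torchmetrics"]))
```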

## The original values
In the original experiment behind the numbers in Fig. 4a, we passed the following values to the training script:

1. **D (input feature size):** d_array = (4 8 16 32 64 128 256 512)
2. **N (number of training samples):** n_array = (50 100 200 400 800 1600 3200)
3. **L (network depth):** l_array = (1 4 32 64)
4. **H (network width):** 16
5. **O (optimizer):** AdamW

We also used a batch size of 16, a learning rate of 0.0001, and at most 500 training epochs.
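
Every (L, N, D) combination is a separate training run, so the full grid grows quickly. The sketch below counts the configurations and gives an upper bound on total epochs. It assumes exactly one training run per configuration; repetitions and early stopping would change the real total:

```python
import itertools

# Grid of original values behind Fig. 4a
d_values = [4, 8, 16, 32, 64, 128, 256, 512]
n_values = [50, 100, 200, 400, 800, 1600, 3200]
l_values = [1, 4, 32, 64]
max_epochs = 500

configs = list(itertools.product(l_values, n_values, d_values))
print(len(configs))               # 224 configurations
print(len(configs) * max_epochs)  # at most 112000 training epochs in total
```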

## What we run here
Because running every combination of the values above takes a long time, we test the code on a smaller grid. Specifically, we run it with:
1. **D:** d_array = (16)
2. **N:** n_array = (50 100 200 400 800)
3. **L:** l_array = (1 32)
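
The cell below launches each configuration with IPython's `!{command}` shell escape, which only works inside a notebook. The sketch below does the same reduced-grid loop from a plain Python script run at the repository root. It uses `subprocess.run` with an argument list, which avoids shell quoting, and `check=True`, so a failing run stops the loop. It also swaps `python3` for `sys.executable`. `build_args` and `run_grid` are hypothetical helpers, and the flags are copied unchanged from the notebook's command:

```python
import itertools
import subprocess
import sys

def build_args(l, n, d):
    """Argument list for one run of train_random.py (same flags as the notebook loop)."""
    return [
        sys.executable, "train_random.py",
        "-b", "16", "--lr", "0.0001", "--epochs", "500", "--hidden_size", "16",
        "-l", str(l), "-n", str(n), "--n_test", "5000", "-d", str(d),
        "-r", "2", "--optim", "adamw", "--seed", "42",
        "--save_path", "./out/", "--experiment", "H16_dnl",
    ]

def run_grid(list_l, list_n, list_d, dry_run=False):
    """Run every (l, n, d) combination; with dry_run=True only print and return the commands."""
    all_args = [build_args(l, n, d) for l, n, d in itertools.product(list_l, list_n, list_d)]
    for args in all_args:
        print("Running:", " ".join(args))
        if not dry_run:
            subprocess.run(args, check=True)  # raises CalledProcessError if a run fails
    return all_args

# Preview the reduced grid without launching anything
commands = run_grid([1, 32], [50, 100, 200, 400, 800], [16], dry_run=True)
```

Drop `dry_run=True` to actually launch the runs.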

In [None]:
import itertools

# Reduced grid: network depths (l), training sample sizes (n), input feature sizes (d)
list_l = [1, 32]
list_n = [50, 100, 200, 400, 800]
list_d = [16]

# Run train_random.py once for every (l, n, d) combination
for l, n, d in itertools.product(list_l, list_n, list_d):
    print(f"Running command with l={l}, n={n}, d={d}")
    command = f"python3 train_random.py -b 16 --lr 0.0001 --epochs 500 --hidden_size 16 -l {l} -n {n} --n_test 5000 -d {d} -r 2 --optim adamw --seed 42 --save_path ./out/ --experiment H16_dnl"
    print(f"Running: {command}")
    !{command}

Running command with l=1, n=50, d=16
Running: python3 train_random.py -b 16 --lr 0.0001 --epochs 500 --hidden_size 16 -l 1 -n 50 --n_test 5000 -d 16 -r 2 --optim adamw --seed 42 --save_path ./out/ --experiment H16_dnl
device:  cuda
best epoch = 0
Train Loss = 0.6293611228466034, Train ROC-AUC = 0.698039174079895
Val Loss = 0.7232206463813782, Val ROC-AUC = 0.5
Test Loss = 0.7039472538823137, Test ROC-AUC = 0.500819206237793

best epoch = 498
Train Loss = 0.6710297167301178, Train ROC-AUC = 0.6587302088737488
Val Loss = 0.664358913898468, Val ROC-AUC = 0.5
Test Loss = 0.7027428776692277, Test ROC-AUC = 0.4986627995967865

best epoch = 0
Train Loss = 0.6523978114128113, Train ROC-AUC = 0.7215685844421387
Val Loss = 0.7087868452072144, Val ROC-AUC = 0.5
Test Loss = 0.7027921067259182, Test ROC-AUC = 0.4939705729484558

best epoch = 499
Train Loss = 0.6616694629192352, Train ROC-AUC = 0.6904761791229248
Val Loss = 0.6881256699562073, Val ROC-AUC = 0.6666666865348816
Test Loss = 0.701516592
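
The printed metrics are easier to compare across runs once they are parsed into numbers. The minimal sketch below pulls the best epoch and the train/val/test losses and ROC-AUCs out of log text in the format shown above. It assumes every block starts with `best epoch = ...`. It works on text you pass in, for example the `stdout` captured by `subprocess.run(..., capture_output=True, text=True)`, and does not read the files in `./out/`, whose format is not shown here. `parse_log` is a hypothetical helper:

```python
import re

BLOCK_RE = re.compile(r"best epoch = (\d+)")
METRIC_RE = re.compile(r"(Train|Val|Test) Loss = ([-+\d.eE]+), \1 ROC-AUC = ([-+\d.eE]+)")

def parse_log(text):
    """Split log text into one dict per 'best epoch' block with losses and ROC-AUCs."""
    runs = []
    # re.split with a capture group keeps the epochs: ["", epoch, body, epoch, body, ...]
    parts = BLOCK_RE.split(text)
    for epoch, body in zip(parts[1::2], parts[2::2]):
        run = {"best_epoch": int(epoch)}
        for split, loss, auc in METRIC_RE.findall(body):
            run[f"{split.lower()}_loss"] = float(loss)
            run[f"{split.lower()}_auc"] = float(auc)
        runs.append(run)
    return runs

sample = """best epoch = 0
Train Loss = 0.6293611228466034, Train ROC-AUC = 0.698039174079895
Val Loss = 0.7232206463813782, Val ROC-AUC = 0.5
Test Loss = 0.7039472538823137, Test ROC-AUC = 0.500819206237793
"""
print(parse_log(sample))
```

A block cut off mid-line, like the last one above, simply yields a dict without the missing keys.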

## Visualization

To visualize the results saved in `./out/`, we use the script `data_analysis/visualize_results.py`.

In [None]:
%run data_analysis/visualize_results.py -x "n" -y "l" --y_vals 1 32 --experiment "H16_dnl" --task "line_plot_average" --root_path "./out/"
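
For a quick check without the repository script, the hypothetical sketch below averages test ROC-AUC over runs for each (l, n) pair. It then draws one line per depth with n on the x-axis, which appears to be the layout the `-x "n" -y "l"` arguments above request. It is not the implementation in `data_analysis/visualize_results.py`. The input format, a list of `(l, n, auc)` tuples, is an assumption, and `average_by_depth` and `plot_curves` are illustrative names:

```python
from collections import defaultdict
from statistics import mean

def average_by_depth(records):
    """Map each depth l to a list of (n, mean AUC) pairs, sorted by n, from (l, n, auc) tuples."""
    grouped = defaultdict(list)
    for l, n, auc in records:
        grouped[(l, n)].append(auc)
    curves = defaultdict(list)
    # Sorting (l, n) keys keeps each depth's points in increasing n
    for (l, n), aucs in sorted(grouped.items()):
        curves[l].append((n, mean(aucs)))
    return dict(curves)

def plot_curves(curves, path="auc_vs_n.png"):
    """Draw mean test ROC-AUC against n (log2 scale), one line per depth, and save to path."""
    import matplotlib.pyplot as plt  # imported here so the averaging works without matplotlib

    fig, ax = plt.subplots()
    for l, points in sorted(curves.items()):
        ns, aucs = zip(*points)
        ax.plot(ns, aucs, marker="o", label=f"L = {l}")
    ax.axhline(0.5, color="gray", linestyle="--", label="chance")
    ax.set_xscale("log", base=2)
    ax.set_xlabel("n (training samples)")
    ax.set_ylabel("mean test ROC-AUC")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```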