# Model Training and Evaluation Workflow

## Notebook Overview

This notebook provides a complete and reproducible workflow for training and evaluating **VSLNet** and **VSLBase** models on the Ego4D Natural Language Queries (NLQ) task.

The workflow is divided into four main sections:

1.  **Experiment Setup:** Configures the entire environment. This includes mounting Google Drive, cloning the repository, defining a dynamic configuration for the experiment (model type, features, etc.), installing dependencies, and unpacking the dataset to the local filesystem for performance.
2.  **Data Preparation & Symbolic Links:** Here, we prepare the data for training. This involves linking the GloVe embeddings (when they are used), running the `prepare_ego4d_dataset.py` script, which preprocesses the annotation files into the format required by the model's data loaders, and creating symbolic links that point the training scripts to the prepared data and feature directories.
3.  **Training:** This section launches the model training. It shows the command that executes the `main.py` script with the parameters defined in our configuration. By changing the configuration in Section 1, we can reproduce any of the experiments documented in our report (except for the Extension, see notebook 03).
4.  **Save Results to Drive:** An optional final step that copies the resulting model checkpoints and logs from the Colab environment back to Google Drive for persistent storage.

## 1. Experiment Setup

### 1.1. Mount Google Drive
We begin by mounting Google Drive to access our datasets and save our experiment results.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### 1.2. Clone Model Repository
Next, we clone our GitHub repository and keep only its `VSLNet_Code` folder, which contains the core Python scripts for the model, training, and evaluation.

In [2]:
%%bash

# Clone the repository (if it doesn't already exist)
if [ ! -d "VSLNet_Code" ]; then
  git clone https://github.com/pietrogiancristofaro2001/ego4d-nlq-project.git
  # The command above clones the whole project; keep only the required code folder
  mv ego4d-nlq-project/VSLNet_Code .
  rm -rf ego4d-nlq-project
  echo "Repository cloned."
else
  echo "Repository already exists."
fi


Repository cloned.


Cloning into 'ego4d-nlq-project'...


Now, we change the current directory of the notebook to `VSLNet_Code` using the `%cd` magic command. This ensures that all subsequent cells will be executed from this path, allowing scripts and utilities to be called directly.

In [4]:
%cd VSLNet_Code

/content/VSLNet_Code


### 1.3. Define Experiment Configuration
This is the main control cell for all our experiments. Modify the variables in this cell to select the model, visual features, text encoder and run number. The `vars.sh` script, which controls the entire workflow, will be generated automatically based on these settings.

In [5]:
# --- CHOOSE OUR EXPERIMENT CONFIGURATION ---

MODEL_USED = "vslbase"  # Options: "vslnet", "vslbase"
FEATURE_TYPE = "omnivore" # Options: "egovlp", "omnivore"
TEXT_ENCODER = "glove"   # Options: "bert", "glove"
RUN_NUMBER = 4          # An integer for the run number, e.g., 1, 2, 3

# --- AUTO-GENERATED SETTINGS ---

# Set feature directory and dimension based on FEATURE_TYPE
if FEATURE_TYPE == "egovlp":
    FEATURE_DIR_NAME = "egovlp_fp16"
    VISUAL_FEATURE_DIM = 256
elif FEATURE_TYPE == "omnivore":
    FEATURE_DIR_NAME = "omnivore_video_swinl_fp16"
    VISUAL_FEATURE_DIM = 1536
else:
    raise ValueError(f"Invalid FEATURE_TYPE {FEATURE_TYPE!r}; expected 'egovlp' or 'omnivore'.")

# Construct the unique experiment name
EXPERIMENT_NAME = f"{MODEL_USED}_{FEATURE_TYPE}_{TEXT_ENCODER}_run{RUN_NUMBER}"

# --- GENERATE THE vars.sh SCRIPT CONTENT ---

vars_sh_content = f"""
#!/bin/bash

# --- Dynamic Experiment Configuration ---
export NAME={EXPERIMENT_NAME}
export MODEL_NAME={MODEL_USED} # vslnet or vslbase
export VISUAL_FEATURE_TYPE={FEATURE_TYPE} # egovlp or omnivore
export TEXT_ENCODER_TYPE={TEXT_ENCODER} # bert or glove
export VISUAL_FEATURE_DIM={VISUAL_FEATURE_DIM}

# --- Static Path Configuration ---
export FEATURE_SOURCE_ZIP_PATH=/content/drive/MyDrive/EgoVisionProject/Data # change this to the Drive directory that contains ego4d_data.zip
export DRIVE_ZIP_FILENAME=ego4d_data.zip
export MODEL_BASE_DIR=/content/drive/MyDrive/EgoVisionProject/Experiments
export LOCAL_DATA_ROOT=/content/data


# --- Derived Path Configuration ---
export TASK_NAME=nlq_official_v1_$NAME
export BASE_DIR=$LOCAL_DATA_ROOT/dataset/$TASK_NAME
export FEATURE_BASE_DIR=$LOCAL_DATA_ROOT/features/$TASK_NAME/official
export FEATURE_DIR=$LOCAL_DATA_ROOT/ego4d_data/v1/{FEATURE_DIR_NAME}
export LOCAL_ANNOTATIONS_DIR=$LOCAL_DATA_ROOT/ego4d_data/v1/annotations
export LOCAL_TRAIN_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_train.json
export LOCAL_VAL_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_val.json
export LOCAL_TEST_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_test_unannotated.json
export LOCAL_MODEL_DIR=$LOCAL_DATA_ROOT/experiments
"""

# Write the content to the vars.sh file
with open("vars.sh", "w") as f:
    f.write(vars_sh_content)

print("vars.sh file generated successfully for experiment:")
print(f"--> {EXPERIMENT_NAME}")

vars.sh file generated successfully for experiment:
--> vslbase_omnivore_glove_run4
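
The configuration cell only validates `FEATURE_TYPE`. As a minimal sketch, assuming the option lists in the cell's comments are exhaustive, the same derivation can be wrapped in a function that checks all four settings before anything is written to `vars.sh`. The helper name `build_experiment` and the lookup tables are hypothetical and not part of the repository.

```python
# Hypothetical lookup tables; the values mirror the options in the configuration cell.
FEATURES = {
    "egovlp": ("egovlp_fp16", 256),
    "omnivore": ("omnivore_video_swinl_fp16", 1536),
}
MODELS = ("vslnet", "vslbase")
TEXT_ENCODERS = ("bert", "glove")


def build_experiment(model, feature_type, text_encoder, run_number):
    """Validate the four settings and return the values derived from them."""
    if model not in MODELS:
        raise ValueError(f"Invalid model {model!r}; expected one of {MODELS}")
    if feature_type not in FEATURES:
        raise ValueError(f"Invalid feature type {feature_type!r}; expected one of {tuple(FEATURES)}")
    if text_encoder not in TEXT_ENCODERS:
        raise ValueError(f"Invalid text encoder {text_encoder!r}; expected one of {TEXT_ENCODERS}")
    if not isinstance(run_number, int) or run_number < 1:
        raise ValueError(f"Run number must be a positive integer, got {run_number!r}")

    feature_dir_name, feature_dim = FEATURES[feature_type]
    return {
        "name": f"{model}_{feature_type}_{text_encoder}_run{run_number}",
        "feature_dir_name": feature_dir_name,
        "visual_feature_dim": feature_dim,
    }


config = build_experiment("vslbase", "omnivore", "glove", 4)
print(config["name"])
```

A misspelled option such as `"vsl_base"` then fails immediately with a `ValueError` instead of surfacing later as a missing directory or an unknown model type.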


### 1.4. Install Dependencies
We install all the necessary Python libraries for the project. These are listed in the `requirements.txt` file.

In [6]:
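%%bash
# Note: `%%capture` is an IPython cell magic and is not valid inside a `%%bash` cell
# (bash interprets `%%` as a job-control reference), so it is omitted here.

pip install -r requirements.txt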

Collecting submitit (from -r requirements.txt (line 7))
  Downloading submitit-1.5.3-py3-none-any.whl.metadata (7.9 kB)
Collecting terminaltables (from -r requirements.txt (line 9))
  Downloading terminaltables-3.1.10-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting bitsandbytes (from -r requirements.txt (line 16))
  Downloading bitsandbytes-0.46.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->-r requirements.txt (line 3))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->-r requirements.txt (line 3))
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->-r requirements.txt (line 3))
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.7

bash: line 1: fg: no job control
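
After installation it can be useful to confirm that the packages are visible to the notebook's interpreter. The following is a small sketch using the standard-library `importlib.metadata`. The helper `missing_packages` is hypothetical, and the package names in the example are simply the ones that appear in the install log above.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Names taken from the pip log above; the result depends on the environment.
print(missing_packages(["submitit", "terminaltables", "bitsandbytes"]))
```

An empty list means every listed distribution was found.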


### 1.5. Unpack Dataset
We copy the `ego4d_data.zip` file from Drive to the local Colab storage and unzip it.

In [7]:
%%bash
source vars.sh

# Create the local destination directory
mkdir -p "$LOCAL_DATA_ROOT"

# Full path of the zip file on Drive
DRIVE_ZIP_FILE_PATH="$FEATURE_SOURCE_ZIP_PATH/$DRIVE_ZIP_FILENAME"
# Temporary local path to copy the zip file to
LOCAL_TEMP_ZIP_FILE="/content/$DRIVE_ZIP_FILENAME"

if [ -f "$DRIVE_ZIP_FILE_PATH" ]; then
    echo "Copying $DRIVE_ZIP_FILENAME..."
    cp "$DRIVE_ZIP_FILE_PATH" "$LOCAL_TEMP_ZIP_FILE"

    echo "Extracting file to $LOCAL_DATA_ROOT..."
    # -o overwrites existing files, -q for quiet mode
    unzip -o -q "$LOCAL_TEMP_ZIP_FILE" -d "$LOCAL_DATA_ROOT"

    echo "Removing temporary zip file..."
    rm "$LOCAL_TEMP_ZIP_FILE"

    echo "Data setup complete."
else
    echo "ERROR: File not found at $DRIVE_ZIP_FILE_PATH"
    exit 1
fi

Copying ego4d_data.zip...
Extracting file to /content/data...
Removing temporary zip file...
Data setup complete.
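
For readers who prefer to stay in Python, the copy, extract, and clean-up steps of the cell above can be sketched with `shutil` and `zipfile`. This is an illustrative equivalent, not code from the repository. The function name `copy_and_extract` is hypothetical, and the demo at the end uses a throwaway archive rather than the real `ego4d_data.zip`.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def copy_and_extract(zip_on_drive, local_root, tmp_dir):
    """Copy a zip to fast local storage, extract it, and delete the local copy."""
    zip_on_drive = Path(zip_on_drive)
    if not zip_on_drive.is_file():
        raise FileNotFoundError(f"File not found at {zip_on_drive}")

    local_root = Path(local_root)
    local_root.mkdir(parents=True, exist_ok=True)

    tmp_zip = Path(tmp_dir) / zip_on_drive.name
    shutil.copy(zip_on_drive, tmp_zip)
    try:
        with zipfile.ZipFile(tmp_zip) as archive:
            archive.extractall(local_root)  # existing files are overwritten
    finally:
        tmp_zip.unlink()  # remove the temporary copy even if extraction fails
    return local_root


# Demo with a tiny archive standing in for ego4d_data.zip.
with tempfile.TemporaryDirectory() as scratch:
    scratch = Path(scratch)
    drive_dir = scratch / "drive"
    drive_dir.mkdir()
    with zipfile.ZipFile(drive_dir / "ego4d_data.zip", "w") as archive:
        archive.writestr("ego4d_data/v1/annotations/nlq_val.json", "{}")

    data_root = copy_and_extract(drive_dir / "ego4d_data.zip", scratch / "data", scratch)
    extracted_ok = (data_root / "ego4d_data/v1/annotations/nlq_val.json").is_file()
    temp_zip_removed = not (scratch / "ego4d_data.zip").exists()

print(extracted_ok, temp_zip_removed)
```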


## 2. Data Preparation & Symbolic Links

### 2.1. Create GloVe Symbolic Link
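We create the symbolic link for the GloVe embeddings file. This step is needed only when the experiment uses GloVe as the text encoder; the cell checks `TEXT_ENCODER_TYPE` and skips the link otherwise.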

In [8]:
%%bash
source vars.sh
CWD=$(pwd)

# --- Symbolic Link for GloVe ---
# Only create this link if the experiment uses GloVe
if [ "$TEXT_ENCODER_TYPE" == "glove" ]; then
  GLOVE_FILE_PATH="$LOCAL_DATA_ROOT/ego4d_data/v1/glove_encoder/glove.840B.300d.txt"
  EXPECTED_GLOVE_PARENT_DIR="$CWD/data/features"
  mkdir -p "$EXPECTED_GLOVE_PARENT_DIR"
  ln -sfn "$GLOVE_FILE_PATH" "$EXPECTED_GLOVE_PARENT_DIR/glove.840B.300d.txt"
  echo -e "\nCreated symlink for GloVe embeddings:"
  ls -l "$EXPECTED_GLOVE_PARENT_DIR"
fi


Created symlink for GloVe embeddings:
total 4
lrwxrwxrwx 1 root root 61 Jun 25 09:07 glove.840B.300d.txt -> /content/data/ego4d_data/v1/glove_encoder/glove.840B.300d.txt


### 2.2. Run Data Preprocessing Script
Now we run the `prepare_ego4d_dataset.py` script. This script reads the raw JSON annotation files (`nlq_train.json`, etc.), processes them, and saves them in the format expected by the training code. It also reads the video features from `$FEATURE_DIR` and writes the clip-level features to `$FEATURE_BASE_DIR`.

In [9]:
%%bash
source vars.sh

echo "Creating output directories..."
mkdir -p "$BASE_DIR"
mkdir -p "$FEATURE_BASE_DIR"

echo "Running data preparation script..."
python utils/prepare_ego4d_dataset.py \
    --input_train_split "$LOCAL_TRAIN_SPLIT" \
    --input_val_split "$LOCAL_VAL_SPLIT" \
    --input_test_split "$LOCAL_TEST_SPLIT" \
    --video_feature_read_path "$FEATURE_DIR" \
    --clip_feature_save_path "$FEATURE_BASE_DIR" \
    --output_save_path "$BASE_DIR"

echo "Data preparation finished."

Creating output directories...
Running data preparation script...
Reading [train]: /content/data/ego4d_data/v1/annotations/nlq_train.json
# train: 11291
Writing [train]: /content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4/train.json
Reading [val]: /content/data/ego4d_data/v1/annotations/nlq_val.json
# val: 3874
Writing [val]: /content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4/val.json
Reading [test]: /content/data/ego4d_data/v1/annotations/nlq_test_unannotated.json
# test: 4004
Writing [test]: /content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4/test.json
Data preparation finished.


Extracting features:   7%|▋         | 115/1659 [00:01<00:20, 74.82it/s]
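
A quick sanity check after preprocessing is to confirm that the three split files named in the log exist and that the feature directory is not empty. The sketch below assumes only what the log shows (`train.json`, `val.json`, and `test.json` under `$BASE_DIR`). The helper `summarize_prepared_data` is hypothetical, and it counts feature files without assuming their extension.

```python
import tempfile
from pathlib import Path


def summarize_prepared_data(base_dir, feature_dir, splits=("train", "val", "test")):
    """Report which split files exist and how many feature files were written."""
    base_dir, feature_dir = Path(base_dir), Path(feature_dir)
    report = {split: (base_dir / f"{split}.json").is_file() for split in splits}
    if feature_dir.is_dir():
        report["num_feature_files"] = sum(1 for p in feature_dir.iterdir() if p.is_file())
    else:
        report["num_feature_files"] = 0
    return report


# Demo on a temporary layout; in the notebook, pass $BASE_DIR and $FEATURE_BASE_DIR.
with tempfile.TemporaryDirectory() as scratch:
    base = Path(scratch) / "dataset"
    feats = Path(scratch) / "features"
    base.mkdir()
    feats.mkdir()
    for split in ("train", "val"):
        (base / f"{split}.json").write_text("{}")
    (feats / "clip_0").write_bytes(b"")
    report = summarize_prepared_data(base, feats)

print(report)
```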

### 2.3. Create Symbolic Links
The training scripts expect the data and feature directories to be in specific locations inside the repository. We create symbolic links (`ln -sfn`) that point from those expected locations to our actual data folders in the local Colab storage, which avoids modifying the core scripts. Links are created for the annotations and the video features processed in the previous cell.

In [10]:
%%bash

source vars.sh

CWD=$(pwd)
#Base directory for symbolic link generation
mkdir -p "$CWD/data/dataset"
# Create also the subdirectory $TASK_NAME below features
mkdir -p "$CWD/data/features/$TASK_NAME"

# 1. Annotations link

# Remove the previous link if it exists and create the new one
rm -f "$CWD/data/dataset/$TASK_NAME"
ln -sfn "$BASE_DIR" "$CWD/data/dataset/$TASK_NAME"
echo "Annotations link: $CWD/data/dataset/$TASK_NAME -> $BASE_DIR"

# 2. Processed features link

# Remove the previous link if it exists and create the new one
rm -f "$CWD/data/features/$TASK_NAME/official"
ln -sfn "$FEATURE_BASE_DIR" "$CWD/data/features/$TASK_NAME/official"
echo "Features link: $CWD/data/features/$TASK_NAME/official -> $FEATURE_BASE_DIR"

echo "--- Setup completed. Checks below: ---"
echo "Annotations target ($BASE_DIR) exists?"
ls -ld "$BASE_DIR"
echo "Annotations link ($CWD/data/dataset/$TASK_NAME) points to:"
ls -ld "$CWD/data/dataset/$TASK_NAME"

echo "Features target ($FEATURE_BASE_DIR) exists?"
ls -ld "$FEATURE_BASE_DIR"
echo "Features link ($CWD/data/features/$TASK_NAME/official) points to:"
ls -ld "$CWD/data/features/$TASK_NAME/official"

Annotations link: /content/VSLNet_Code/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4 -> /content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4
Features link: /content/VSLNet_Code/data/features/nlq_official_v1_vslbase_omnivore_glove_run4/official -> /content/data/features/nlq_official_v1_vslbase_omnivore_glove_run4/official
--- Setup completed. Checks below: ---
Annotations target (/content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4) exists?
drwxr-xr-x 2 root root 4096 Jun 25 09:08 /content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4
Annotations link (/content/VSLNet_Code/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4) points to:
lrwxrwxrwx 1 root root 65 Jun 25 09:09 /content/VSLNet_Code/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4 -> /content/data/dataset/nlq_official_v1_vslbase_omnivore_glove_run4
Features target (FEATURE_BASE_DIR) exists?
drwxr-xr-x 2 root root 126976 Jun 25 09:08 /content/data/features/nlq_offic
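
The remove-then-link pattern used above can also be expressed in Python, which makes it easy to verify where a link resolves. This is a sketch of similar behaviour to `ln -sfn`, not an exact reimplementation: the hypothetical helper `force_symlink` replaces an existing link or file, but leaves a real directory at the link path untouched and lets `os.symlink` raise instead.

```python
import os
import tempfile
from pathlib import Path


def force_symlink(target, link_path):
    """Create or replace a symlink at `link_path` pointing to `target`."""
    link_path = Path(link_path)
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # Remove a previous link (or plain file) so the call can be repeated safely.
    if link_path.is_symlink() or link_path.is_file():
        link_path.unlink()
    os.symlink(target, link_path, target_is_directory=Path(target).is_dir())
    return link_path.resolve()


# Demo: link once, then re-point the same link to a second directory.
with tempfile.TemporaryDirectory() as scratch:
    scratch = Path(scratch)
    first = scratch / "prepared_a"
    second = scratch / "prepared_b"
    first.mkdir()
    second.mkdir()
    link = scratch / "repo" / "data" / "dataset" / "task"

    first_ok = force_symlink(first, link) == first.resolve()
    relinked_ok = force_symlink(second, link) == second.resolve()

print(first_ok, relinked_ok)
```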

## 3. Training

### 3.1. Launching the Training Script
This is the main step. We execute `main.py` with all the configured parameters from `vars.sh`. This command starts the training process for the defined model, features, and hyperparameters.

In [11]:
%%bash
source vars.sh

# --- Hyper-parameter Configuration ---
export DATALOADER_WORKERS=1
export NUM_WORKERS=2
export BATCH_SIZE=32
export DIM=128
export NUM_EPOCH=10
export MAX_POS_LEN=128
export INIT_LR=0.0025

# --- Construct TensorBoard Log Name ---
export TB_LOG_NAME="${NAME}_bs${BATCH_SIZE}_dim${DIM}_epoch${NUM_EPOCH}_ilr${INIT_LR}"

# Create local directories for saving models and logs, if they don't exist
mkdir -p "$LOCAL_MODEL_DIR/$NAME"

echo "--- Starting Training ---"
echo "Experiment Name: $NAME"
echo "Model: $MODEL_NAME"
echo "Video Features: $VISUAL_FEATURE_TYPE (Dim: $VISUAL_FEATURE_DIM)"
echo "Text Encoder: $TEXT_ENCODER_TYPE"
echo "--------------------------"

python main.py \
    --task $TASK_NAME \
    --mode train \
    --predictor $TEXT_ENCODER_TYPE \
    --dim $DIM \
    --model_type $MODEL_NAME \
    --video_feature_dim $VISUAL_FEATURE_DIM \
    --max_pos_len $MAX_POS_LEN \
    --init_lr $INIT_LR \
    --epochs $NUM_EPOCH \
    --batch_size $BATCH_SIZE \
    --fv official \
    --num_workers $NUM_WORKERS \
    --data_loader_workers $DATALOADER_WORKERS \
    --model_dir "$LOCAL_MODEL_DIR/$NAME" \
    --eval_gt_json "$LOCAL_VAL_SPLIT" \
    --log_to_tensorboard $TB_LOG_NAME \
    --tb_log_freq 5 \
    --remove_empty_queries_from train

--- Starting Training ---
Experiment Name: vslbase_omnivore_glove_run4
Model: vslbase
Video Features: omnivore (Dim: 1536)
Text Encoder: glove
--------------------------
Running with Namespace(save_dir='datasets', model_type='vslbase', resume_from_checkpoint=None, pretrain='no', task='nlq_official_v1_vslbase_omnivore_glove_run4', eval_gt_json='/content/data/ego4d_data/v1/annotations/nlq_val.json', fv='official', max_pos_len=128, num_workers=2, data_loader_workers=1, word_size=None, char_size=None, word_dim=300, video_feature_dim=1536, char_dim=50, dim=128, highlight_lambda=5.0, num_heads=8, drop_rate=0.2, predictor='glove', gpu_idx='0', seed=12345, mode='train', epochs=12, batch_size=32, num_train_steps=None, init_lr=0.002, clip_norm=1.0, warmup_proportion=0.0, extend=0.1, period=100, text_agnostic=False, video_agnostic=False, model_dir='/content/data/experiments/vslbase_omnivore_glove_run4', model_name='vslnet', suffix=None, log_to_tensorboard='vslbase_omnivore_glove_run4_bs32_dim128_

2025-06-25 09:09:36.753591: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-25 09:09:36.770766: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750842576.792187    7208 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750842576.798901    7208 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-25 09:09:36.820518: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr
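
To make the mapping from configuration to command line explicit, the sketch below assembles the TensorBoard log name and the argument list in Python. The flags are copied from the training cell above. The helper `build_train_command` and the two dictionaries are hypothetical conveniences, and nothing is executed here.

```python
def build_train_command(env, hp):
    """Return the TensorBoard log name and the argv list for main.py."""
    tb_log_name = (
        f"{env['NAME']}_bs{hp['batch_size']}_dim{hp['dim']}"
        f"_epoch{hp['epochs']}_ilr{hp['init_lr']}"
    )
    args = [
        "python", "main.py",
        "--task", env["TASK_NAME"],
        "--mode", "train",
        "--predictor", env["TEXT_ENCODER_TYPE"],
        "--dim", hp["dim"],
        "--model_type", env["MODEL_NAME"],
        "--video_feature_dim", env["VISUAL_FEATURE_DIM"],
        "--max_pos_len", hp["max_pos_len"],
        "--init_lr", hp["init_lr"],
        "--epochs", hp["epochs"],
        "--batch_size", hp["batch_size"],
        "--fv", "official",
        "--num_workers", hp["num_workers"],
        "--data_loader_workers", hp["data_loader_workers"],
        "--model_dir", f"{env['LOCAL_MODEL_DIR']}/{env['NAME']}",
        "--eval_gt_json", env["LOCAL_VAL_SPLIT"],
        "--log_to_tensorboard", tb_log_name,
        "--tb_log_freq", "5",
        "--remove_empty_queries_from", "train",
    ]
    return tb_log_name, [str(a) for a in args]


# Values mirror vars.sh and the hyper-parameter block of the training cell.
env = {
    "NAME": "vslbase_omnivore_glove_run4",
    "TASK_NAME": "nlq_official_v1_vslbase_omnivore_glove_run4",
    "MODEL_NAME": "vslbase",
    "TEXT_ENCODER_TYPE": "glove",
    "VISUAL_FEATURE_DIM": 1536,
    "LOCAL_MODEL_DIR": "/content/data/experiments",
    "LOCAL_VAL_SPLIT": "/content/data/ego4d_data/v1/annotations/nlq_val.json",
}
hp = {
    "batch_size": 32, "dim": 128, "epochs": 10, "max_pos_len": 128,
    "init_lr": "0.0025", "num_workers": 2, "data_loader_workers": 1,
}
tb_log_name, command = build_train_command(env, hp)
print(tb_log_name)
```

Keeping the arguments in a list (rather than one shell string) means the command could be passed to `subprocess.run` without additional quoting, should a Python launcher be preferred over the `%%bash` cell.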

## 4. Save Results to Google Drive (Optional)
After the training is complete, the model checkpoints and prediction files are stored in the local Colab environment. This final, optional step copies the entire experiment folder from the local storage to our specified directory on Google Drive for permanent storage.

In [None]:
%%bash
source vars.sh

# Source directory (local)
SOURCE_DIR="$LOCAL_MODEL_DIR/$NAME"

# Destination directory (on Google Drive)
DEST_DIR="$MODEL_BASE_DIR"

# Check if the local experiment directory exists
if [ -d "$SOURCE_DIR" ]; then
  echo "Copying results from $SOURCE_DIR to $DEST_DIR..."
  # Create the base destination directory on Drive if it doesn't exist
  mkdir -p "$DEST_DIR"
  # Copy the entire experiment folder recursively
  cp -r "$SOURCE_DIR" "$DEST_DIR"
  echo "Copy complete!"
  echo "You can find your results in: $DEST_DIR/$NAME"
else
  echo "ERROR: Source directory $SOURCE_DIR not found. Did the training complete?"
  exit 1
fi
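
The same backup step can be sketched in Python with `shutil.copytree`. This is an illustrative alternative, assuming Python 3.8 or newer for the `dirs_exist_ok` parameter. The helper `save_results` is hypothetical, and the demo runs on temporary folders rather than on Drive.

```python
import shutil
import tempfile
from pathlib import Path


def save_results(source_dir, dest_root):
    """Copy the experiment folder into dest_root, keeping its folder name."""
    source_dir = Path(source_dir)
    if not source_dir.is_dir():
        raise FileNotFoundError(f"Source directory {source_dir} not found")
    dest = Path(dest_root) / source_dir.name
    # dirs_exist_ok=True lets a repeated run update an earlier copy in place.
    shutil.copytree(source_dir, dest, dirs_exist_ok=True)
    return dest


# Demo: a fake experiment folder copied twice into a fake Drive directory.
with tempfile.TemporaryDirectory() as scratch:
    scratch = Path(scratch)
    experiment = scratch / "experiments" / "vslbase_omnivore_glove_run4"
    experiment.mkdir(parents=True)
    (experiment / "checkpoint.bin").write_bytes(b"weights")

    dest = save_results(experiment, scratch / "drive" / "Experiments")
    dest = save_results(experiment, scratch / "drive" / "Experiments")  # safe to repeat
    copied_ok = (dest / "checkpoint.bin").read_bytes() == b"weights"
    dest_name = dest.name

print(copied_ok, dest_name)
```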