# 02 - Augmentation, Pre-training, and Fine-tuning Pipeline

## Notebook Overview

This notebook implements and evaluates a complete pipeline for enhancing a Natural Language Queries (NLQ) model through data augmentation. The entire workflow, central to our project's extension, is divided into three major phases:

1.  **Phase I: LLM-Powered Data Augmentation:** We begin by leveraging a Large Language Model (LLM) to generate a new, synthetic training dataset. Starting from the timestamped narrations in Ego4D, we create NLQ-style questions and automatically associate them with precise temporal ground-truth windows. This phase includes a robust data filtering and validation process to ensure the quality of the synthetic data.

2.  **Phase II: Pre-training on Augmented Data:** The newly generated dataset is used to pre-train a baseline NLQ model (VSLNet). The goal of this phase is to teach the model the fundamental patterns of egocentric question-answering on a large and diverse set of synthetic examples, providing it with a powerful head start before it sees any human-annotated data.

3.  **Phase III: Fine-tuning on Official Data:** Finally, the model pre-trained on our synthetic data is fine-tuned on the official `nlq_train.json` dataset. This step adapts the generalized knowledge acquired during pre-training to the specific distribution and nuances of the official benchmark data. The ultimate goal is to demonstrate that this pre-training/fine-tuning strategy improves performance compared to training on the official data alone.

## 1. Environment and Data Setup
This initial section handles all the necessary setup to prepare our Colab environment. We will mount Google Drive, clone the model repository, install dependencies, and unpack the dataset into the local runtime for fast access.

### 1.1. Mount Google Drive
We begin by mounting Google Drive to access our datasets. This step is needed only to read `ego4d_data.zip` and to save results persistently.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### 1.2. Clone Model Repository and Set Directory
Next, we clone our `VSLNet_Code` repository, a modified version of the baseline provided for the project, and then set it as the main working directory for this notebook. This allows us to call scripts directly.

In [2]:
%%bash
# Clone the repository (if it doesn't already exist)
if [ ! -d "VSLNet_Code" ]; then
  git clone https://github.com/pietrogiancristofaro2001/ego4d-nlq-project.git
  # We only need the VSLNet_Code folder
  mv ego4d-nlq-project/VSLNet_Code .
  rm -rf ego4d-nlq-project
  echo "Repository cloned successfully."
else
  echo "Repository already exists."
fi

Repository cloned successfully.


Cloning into 'ego4d-nlq-project'...


In [3]:
# Change the notebook's working directory
%cd VSLNet_Code


/content/VSLNet_Code


### 1.3. Configure Environment for Augmentation and Pre-training
This is the main control cell for the first two phases of our project. It generates a `vars.sh` file **inside the current directory (`VSLNet_Code/`)**. This script defines all paths and parameters needed for data augmentation and for the subsequent pre-training run.

In [27]:
# --- Main Configuration ---
# We use our best model configuration for data augmentation; change these parameters to try a different one.
PRETRAIN_MODEL_USED = "vslnet"  # Options: "vslnet", "vslbase"
PRETRAIN_FEATURE_TYPE = "egovlp" # Options: "egovlp", "omnivore"
PRETRAIN_TEXT_ENCODER = "bert"   # Options: "bert", "glove"
RUN_NUMBER = 2  # Useful to distinguish different experiments

# --- Auto-generated settings based on configuration ---
if PRETRAIN_FEATURE_TYPE == "egovlp":
    feature_dir_name = "egovlp_fp16"
    visual_feature_dim = 256
elif PRETRAIN_FEATURE_TYPE == "omnivore":
    feature_dir_name = "omnivore_video_swinl_fp16"
    visual_feature_dim = 1536
else:
    raise ValueError(f"Invalid PRETRAIN_FEATURE_TYPE selected: {PRETRAIN_FEATURE_TYPE!r}")

pretrain_experiment_name = f"pretrain_{PRETRAIN_MODEL_USED}_{PRETRAIN_FEATURE_TYPE}_{PRETRAIN_TEXT_ENCODER}_run{RUN_NUMBER}"

# --- vars.sh content ---
vars_sh_content = f"""#!/bin/bash

# --- I. SHARED PATH CONFIGURATION ---
export FEATURE_SOURCE_ZIP_PATH=/content/drive/MyDrive/EgoVisionProject/Data
export DRIVE_ZIP_FILENAME=ego4d_data.zip
export LOCAL_DATA_ROOT=/content/data
export EXPERIMENTS_BASE_DIR=$LOCAL_DATA_ROOT/experiments

# --- II. DATA AUGMENTATION & PRE-TRAINING SHARED PATHS ---
export LOCAL_ANNOTATIONS_DIR=$LOCAL_DATA_ROOT/ego4d_data/v1/annotations
export AUGMENTED_JSON_PATH=$LOCAL_ANNOTATIONS_DIR/nlq_train_augmented_3.json
export NARRATION_JSON_PATH=$LOCAL_ANNOTATIONS_DIR/narration.json
export LOCAL_VAL_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_val.json
export LOCAL_TEST_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_test_unannotated.json

# --- III. PRE-TRAINING SPECIFIC CONFIGURATION ---
export PRETRAIN_EXPERIMENT_NAME={pretrain_experiment_name}
export PRETRAIN_MODEL_NAME={PRETRAIN_MODEL_USED}
export PRETRAIN_VISUAL_FEATURE_TYPE={PRETRAIN_FEATURE_TYPE}
export PRETRAIN_TEXT_ENCODER_TYPE={PRETRAIN_TEXT_ENCODER}
export PRETRAIN_VISUAL_FEATURE_DIM={visual_feature_dim}
export PRETRAIN_FEATURE_DIR=$LOCAL_DATA_ROOT/ego4d_data/v1/{feature_dir_name}
export PRETRAIN_TRAIN_SPLIT=$AUGMENTED_JSON_PATH
export PRETRAIN_DATASET_DIR=$LOCAL_DATA_ROOT/dataset/$PRETRAIN_EXPERIMENT_NAME
export PRETRAIN_FEATURE_DIR_PROC=$LOCAL_DATA_ROOT/features/$PRETRAIN_EXPERIMENT_NAME/official
export PRETRAINED_CHECKPOINT_PATH=$EXPERIMENTS_BASE_DIR/{pretrain_experiment_name}
"""

# Write the content to vars.sh in the current directory (VSLNet_Code/)
with open("vars.sh", "w") as f:
    f.write(vars_sh_content)
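
Note that `vars.sh` is only sourced inside `%%bash` cells, so its exports are not visible to Python code that later calls `os.environ.get('PRETRAIN_FEATURE_DIR', ...)`. The following is a minimal sketch of one way to read the file back into Python. The helper name `load_vars_sh` is hypothetical, and the parser assumes the simple `export KEY=VALUE` layout written above (no quoting, no `${VAR}` syntax).

```python
import os
import re


def load_vars_sh(path="vars.sh", env=None):
    """Parse `export KEY=VALUE` lines and expand $NAME references in file order.

    Hypothetical helper: it only understands the plain layout used by vars.sh
    in this notebook, not general shell syntax.
    """
    # Start from a copy of the current environment unless a mapping is supplied.
    env = dict(os.environ) if env is None else env
    export_re = re.compile(r"^export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")
    var_re = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
    with open(path) as f:
        for line in f:
            match = export_re.match(line.strip())
            if not match:
                continue  # skip the shebang, comments, and blank lines
            key, value = match.groups()
            # Replace each $NAME with a value defined earlier (empty if unknown).
            env[key] = var_re.sub(lambda m: env.get(m.group(1), ""), value)
    return env
```

A possible usage is `os.environ.update(load_vars_sh())` right after writing the file, so that later cells see the same paths as the bash cells.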


### 1.4. Install Dependencies
We install all required Python libraries from the repository's `requirements.txt`.

In [5]:
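%%bash
# Note: %%capture is an IPython cell magic and cannot be used inside a %%bash cell
# (bash treats it as a job specifier and reports "fg: no job control").
# pip's quiet flag is used instead to reduce the output.

pip install -q -r requirements.txt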

Collecting submitit (from -r requirements.txt (line 7))
  Downloading submitit-1.5.3-py3-none-any.whl.metadata (7.9 kB)
Collecting terminaltables (from -r requirements.txt (line 9))
  Downloading terminaltables-3.1.10-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting bitsandbytes (from -r requirements.txt (line 16))
  Downloading bitsandbytes-0.46.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->-r requirements.txt (line 3))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->-r requirements.txt (line 3))
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->-r requirements.txt (line 3))
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.7

bash: line 1: fg: no job control


### 1.5. Extract Dataset from Google Drive
We use the variables defined in our `vars.sh` file to copy and extract the main dataset from Drive to the local Colab storage.

In [6]:
%%bash

source vars.sh

# Create local directory and extract data
mkdir -p "$LOCAL_DATA_ROOT"
DRIVE_ZIP_FILE_PATH="$FEATURE_SOURCE_ZIP_PATH/$DRIVE_ZIP_FILENAME"
LOCAL_TEMP_ZIP_FILE="/content/$DRIVE_ZIP_FILENAME"

if [ -f "$DRIVE_ZIP_FILE_PATH" ]; then
    echo "Copying $DRIVE_ZIP_FILENAME..."
    cp "$DRIVE_ZIP_FILE_PATH" "$LOCAL_TEMP_ZIP_FILE"
    echo "Extracting data..."
    unzip -o -q "$LOCAL_TEMP_ZIP_FILE" -d "$LOCAL_DATA_ROOT"
    rm "$LOCAL_TEMP_ZIP_FILE"
    echo "Data setup complete."
else
    echo "ERROR: File not found at $DRIVE_ZIP_FILE_PATH"
fi

Copying ego4d_data.zip...
Extracting data...
Data setup complete.


### 1.6. Load Metadata and Create Valid Narration Groups
This cell performs the core pre-computation. It loads all necessary annotation files, filters out videos that appear in the validation/test sets or lack pre-extracted features, and creates helper maps for clips. It then calculates the `beta_map` (the average time between narrations in each video), which is used to turn the single timestamp of each narration into a ground-truth time window. Finally, it constructs a list of all valid groups of `k` consecutive narrations.

In [None]:
import json
import os
import random
import uuid
from tqdm.auto import tqdm
import glob
import numpy as np

print("--- Starting Pre-computation and Filtering ---")

# Define all necessary paths relative to the Colab root (/content)
repo_root = "/content"
local_data_root = os.path.join(repo_root, "data")
ego4d_json_path = os.path.join(local_data_root, 'ego4d_data', 'ego4d.json')
narration_path = os.path.join(local_data_root, 'ego4d_data', 'v1', 'annotations', 'narration.json')
val_json_path = os.path.join(local_data_root, 'ego4d_data', 'v1', 'annotations', 'nlq_val.json')
test_json_path = os.path.join(local_data_root, 'ego4d_data', 'v1', 'annotations', 'nlq_test_unannotated.json')
feature_dir_path = os.environ.get('PRETRAIN_FEATURE_DIR', os.path.join(local_data_root, 'ego4d_data/v1/egovlp_fp16'))

# Load core JSON files
print("Loading core JSON files...")
with open(ego4d_json_path, 'r') as f: ego4d_data = json.load(f)
with open(narration_path, 'r') as f: all_narrations_data = json.load(f)
print("Files loaded successfully.")


# 1. Exclude videos from val/test sets
print("\nFiltering out validation/test set videos...")
excluded_video_uids = set()
try:
    with open(val_json_path, 'r') as f: val_data = json.load(f)
    for video in val_data.get('videos', []): excluded_video_uids.add(video['video_uid'])
    with open(test_json_path, 'r') as f: test_data = json.load(f)
    for video in test_data.get('videos', []): excluded_video_uids.add(video['video_uid'])
    print(f"Found {len(excluded_video_uids)} unique videos to exclude.")
except FileNotFoundError:
    print(f"Warning: Could not find val/test JSON files.")


# 2. Check for existing features
print("\nFiltering out videos without pre-extracted features...")
existing_video_ids = {os.path.basename(f).split('.')[0] for f in glob.glob(os.path.join(feature_dir_path, '*.pt'))}
print(f"Found {len(existing_video_ids)} videos with features.")


# 3. Create a lookup map for clips for efficient access
print("\nCreating clip lookup maps...")
all_clips_map = {clip['clip_uid']: clip for clip in ego4d_data.get('clips', [])}
video_to_clips_map = {}
for clip in ego4d_data.get('clips', []):
    vid_uid = clip.get('video_uid')
    if vid_uid not in video_to_clips_map: video_to_clips_map[vid_uid] = []
    video_to_clips_map[vid_uid].append(clip)
print("Lookup maps created.")


# 4. Pre-compute Beta map (avg. time between narrations per video)
print("\nPre-computing beta map...")
video_to_beta_map = {}
for video_uid, video_content in all_narrations_data.items():
    if video_uid in excluded_video_uids or video_uid not in existing_video_ids: continue
    narrations_list = video_content.get("narration_pass_1", {}).get("narrations", [])
    if len(narrations_list) < 2: continue
    narrations_list.sort(key=lambda x: x['timestamp_sec'])
    diffs = [narrations_list[i+1]['timestamp_sec'] - narrations_list[i]['timestamp_sec'] for i in range(len(narrations_list)-1)]
    positive_diffs = [d for d in diffs if d > 0]
    if positive_diffs: video_to_beta_map[video_uid] = np.mean(positive_diffs)
print("Beta map computed.")


# 5. Create all possible valid narration groups
print("\nConstructing valid narration groups...")
k_narrations = 5
all_valid_groups = []
for video_uid, video_content in tqdm(all_narrations_data.items(), desc="Processing Videos"):
    # Additional filter: process only videos for which we have a beta value
    if video_uid not in video_to_beta_map: continue

    clips_for_this_video = video_to_clips_map.get(video_uid, [])
    narrations_list = video_content.get("narration_pass_1", {}).get("narrations", [])
    if len(narrations_list) < k_narrations: continue

    narrations_list.sort(key=lambda x: x['timestamp_sec'])

    for i in range(len(narrations_list) - k_narrations + 1):
        current_group = narrations_list[i : i + k_narrations]
        group_start_time = current_group[0]['timestamp_sec']
        group_end_time = current_group[-1]['timestamp_sec']

        # This is the key check: ensure the group is fully contained in a single parent clip
        parent_clip = next((c for c in clips_for_this_video if c['video_start_sec'] <= group_start_time and c['video_end_sec'] >= group_end_time), None)

        if parent_clip:
            all_valid_groups.append({
                "video_uid": video_uid,
                "narrations": current_group,
                "parent_clip_uid": parent_clip['clip_uid']
            })

print(f"\nPreprocessing complete. Found {len(all_valid_groups)} total valid groups.")

--- Starting Pre-computation and Filtering ---
Loading core JSON files...
Files loaded successfully.

Filtering out validation/test set videos...
Found 505 unique videos to exclude.

Filtering out videos without pre-extracted features...
Found 9611 videos with features.

Creating clip lookup maps...
Lookup maps created.

Pre-computing beta map...
Beta map computed.

Constructing valid narration groups...




Preprocessing complete. Found 591181 total valid groups.
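
The two central steps of this cell (the per-video beta value and the sliding groups of `k` consecutive narrations contained in a single clip) can be illustrated on toy data. This is a simplified sketch, not the notebook's code: the function names, timestamps, and clip boundaries below are made up for illustration, and plain floats stand in for the narration dictionaries.

```python
def mean_positive_gap(timestamps):
    """Average gap between consecutive timestamps, ignoring zero-length gaps."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:]) if b - a > 0]
    return sum(gaps) / len(gaps) if gaps else None


def sliding_groups(timestamps, k, clips):
    """Return (group, clip) pairs for each run of k consecutive timestamps
    that lies entirely inside one clip given as (start_sec, end_sec)."""
    ts = sorted(timestamps)
    result = []
    for i in range(len(ts) - k + 1):
        group = ts[i:i + k]
        for start, end in clips:
            if start <= group[0] and group[-1] <= end:
                result.append((group, (start, end)))
                break  # keep only the first clip that contains the group
    return result


# Toy values, chosen only to exercise both functions.
timestamps = [1.0, 3.0, 3.0, 6.0, 10.0, 12.0]
clips = [(0.0, 8.0), (8.0, 20.0)]

beta = mean_positive_gap(timestamps)
groups = sliding_groups(timestamps, k=3, clips=clips)
print(beta, len(groups))
```

Groups that straddle a clip boundary, such as `[3.0, 6.0, 10.0]` here, are dropped, which mirrors the "fully contained in a single parent clip" check in the cell above.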


## 2. Timestamp Window Analysis & Debugging
This is a critical step. Before running the expensive LLM, we must ensure our logic for creating timestamp windows is robust. In this section, we analyze the `window_duration` calculation and verify that it produces valid, non-collapsing time intervals. We experiment with the formula to find a stable configuration.
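
The window calculation examined in this section can be summarized as a single function. This is a minimal sketch: the function name `narration_window` is hypothetical, while `alpha = 4.9` and the 1.0 s minimum duration are the values used in the cells of this notebook.

```python
def narration_window(t_i, beta_i, clip_start, clip_end, alpha=4.9, min_duration=1.0):
    """Return (start, end) in seconds for a window centred on t_i, or None.

    The duration is beta_i / alpha, raised to min_duration when it would be
    shorter, and the window is clipped to the parent clip boundaries.
    """
    duration = max(min_duration, beta_i / alpha)
    start = max(clip_start, t_i - duration / 2)
    end = min(clip_end, t_i + duration / 2)
    # A window is valid only if it has positive length after clipping.
    return (start, end) if start < end else None


window = narration_window(1020.17, 2.30, 990.0, 1470.0)
print(window)
```

With a narration at 1020.17 s, a beta of 2.30 s, and a parent clip spanning 990 s to 1470 s, the proposed duration of about 0.47 s is raised to 1.0 s, giving a window of roughly 1019.67 s to 1020.67 s.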

In [None]:
# --- DEBUGGING SCRIPT ---
# This cell only performs the analysis; it assumes all data structures were created above.

print("--- Starting Timestamp Debugging Analysis ---")

# Parameters from the EgoVLP paper
alpha = 4.9
# Minimum duration that prevents windows from collapsing.
MIN_WINDOW_DURATION_SEC = 1.0

# --- Analysis Loop ---
num_groups_to_inspect = 10 # Let's inspect a few random groups
successful_windows = 0
total_narrations_inspected = 0

random.shuffle(all_valid_groups) # Process in random order

for group_data in all_valid_groups[:num_groups_to_inspect]:
    video_uid = group_data['video_uid']
    parent_clip_uid = group_data['parent_clip_uid']
    parent_clip_info = all_clips_map.get(parent_clip_uid)
    beta_i = video_to_beta_map.get(video_uid)

    if not parent_clip_info or not beta_i: continue

    print(f"\n--- Inspecting Group from Video: {video_uid} | Parent Clip: {parent_clip_uid} ---")
    print(f"Parent Clip Boundaries: [{parent_clip_info['video_start_sec']:.2f}, {parent_clip_info['video_end_sec']:.2f}] | Avg. narration gap (beta): {beta_i:.2f}s")

    for narration_obj in group_data["narrations"]:
        total_narrations_inspected += 1
        t_i = narration_obj['timestamp_sec']

        # Original calculation from EgoVLP
        calculated_duration = beta_i / alpha

        # Our new robust calculation
        window_duration = max(MIN_WINDOW_DURATION_SEC, calculated_duration)

        # Calculate and clip the window to the parent clip's boundaries
        start_time_abs = max(parent_clip_info['video_start_sec'], t_i - (window_duration / 2))
        end_time_abs = min(parent_clip_info['video_end_sec'], t_i + (window_duration / 2))

        is_valid = "✅ VALID" if start_time_abs < end_time_abs else "❌ INVALID"
        if start_time_abs < end_time_abs: successful_windows += 1

        print(f"  Narration at {t_i:.2f}s -> "
              f"Proposed duration: {calculated_duration:.2f}s -> "
              f"Final duration: {window_duration:.2f}s -> "
              f"Final Window: [{start_time_abs:.2f}, {end_time_abs:.2f}] -> {is_valid}")

print(f"\n--- Analysis Complete ---")
print(f"Successfully created {successful_windows} valid windows out of {total_narrations_inspected} narrations inspected.")

--- Starting Timestamp Debugging Analysis ---

--- Inspecting Group from Video: 3568bcef-48e3-43a1-827a-829c1392583f | Parent Clip: 265d9ba5-ea0b-4302-a0c3-c45313556339 ---
Parent Clip Boundaries: [990.00, 1470.00] | Avg. narration gap (beta): 2.30s
  Narration at 1020.17s -> Proposed duration: 0.47s -> Final duration: 1.00s -> Final Window: [1019.67, 1020.67] -> ✅ VALID
  Narration at 1023.85s -> Proposed duration: 0.47s -> Final duration: 1.00s -> Final Window: [1023.35, 1024.35] -> ✅ VALID
  Narration at 1025.02s -> Proposed duration: 0.47s -> Final duration: 1.00s -> Final Window: [1024.52, 1025.52] -> ✅ VALID
  Narration at 1027.01s -> Proposed duration: 0.47s -> Final duration: 1.00s -> Final Window: [1026.51, 1027.51] -> ✅ VALID
  Narration at 1030.14s -> Proposed duration: 0.47s -> Final duration: 1.00s -> Final Window: [1029.64, 1030.64] -> ✅ VALID

--- Inspecting Group from Video: ecb53665-3b24-416f-9a63-24030aed6f97 | Parent Clip: be4c4301-4b98-499f-8c98-dfe41f72df02 ---
Par

## 3. LLM-Powered Query Generation
Now that we have a robust method for creating valid timestamp windows, we can proceed with generating the synthetic queries. This section covers loading the Large Language Model (LLM), defining the prompt, and running the main generation loop.

### 3.1. Configure and Load the LLM
We load the `Meta-Llama-3-8B-Instruct` model from Meta. We use 4-bit quantization (`bitsandbytes`) to load the model efficiently on a Colab GPU.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Define the model ID and quantization configuration
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

print(f"Loading model: {model_id}...")
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The device_map="auto" argument will intelligently place model parts on GPU and CPU.
llm_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
print("LLM loaded successfully.")

Loading model: meta-llama/Meta-Llama-3-8B-Instruct...



LLM loaded successfully.


### 3.2. Define the Generation Prompt
We create a function that encapsulates the prompt engineering strategy. This function takes a list of five consecutive narration texts and formats them into a detailed prompt. To guide the LLM's output effectively, the prompt includes role-playing, constraints, and few-shot examples based on the most common query templates from the NLQ benchmark.

In [None]:
def create_batch_generation_prompt(narration_texts):
    """
    Creates a final, enhanced prompt for batch generation, meticulously designed
    to align with the Ego4D NLQ benchmark specifications. This prompt leverages
    role-playing, official templates, relational query requirements, and complex
    examples to guide the LLM in generating high-quality, diverse, and relevant questions.

    Args:
        narration_texts (list of str): A list containing 5 consecutive narration strings.

    Returns:
        str: A fully formatted prompt ready for the Llama 3 Instruct model.
    """
    # 1. Clean the input narrations and format them into a numbered list for the prompt.
    cleaned_narrations = []
    for i, text in enumerate(narration_texts):
        # Apply the replacements sequentially rather than with if/elif, so that
        # a narration containing both #C and #O markers is handled correctly.
        processed_text = text
        # Replace camera-wearer markers first; '#C C' is handled before the bare '#C'.
        processed_text = processed_text.replace('#C C', 'I').replace('#C', 'I')
        # Remove other-person markers.
        processed_text = processed_text.replace('#O', '')
        # Strip any residual leading/trailing whitespace.
        cleaned_text = processed_text.strip()

        cleaned_narrations.append(f"{i+1}. {cleaned_text}")

    # Join the cleaned narrations into a single block.
    narration_block = "\n".join(cleaned_narrations)

    # 2. Construct the final, enhanced prompt using the official Llama 3 chat template.
    prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Your task is to act as an expert data annotator creating questions for the Ego4D Natural Language Query (NLQ) benchmark. Your goal is to generate questions that a person would ask to recall details from their past experiences.

You will be given 5 consecutive action narrations. You MUST generate exactly 5 corresponding questions.

---
### **Core Instructions**
1.  **Generate 5 Questions:** Create one question for each numbered narration.
2.  **Use Official Templates:** At least 4 of your questions MUST be based on the official Ego4D NLQ templates below. Be creative and vary the templates you use.
3.  **Create 1 Relational Query:** At least 1 of your questions MUST create a temporal relationship between two of the narrations using "before" or "after".
4.  **Hide the Answer:** The question must not contain the key information from the narration (the object, the place, the method).
5.  **First-Person Perspective:** If the narration uses "I", the question MUST be in the first person.

---
### **Official Ego4D NLQ Templates to Use**
**OBJECTS:**
- "Where is object X before / after event Y?"
- "Where is object X?"
- "What did I put in X?"
- "How many X's?"
- "What X did I Y?"
- "In what location did I see object X?"
- "State of an object" (e.g., "Is the door open?")

**PLACE:**
- "Where did I put X?"

**PEOPLE:**
- "Who did I interact with when I did activity X?"
- "Who did I talk to in location X?"

---
### **Example for Simple Actions**
**Input Narrations:**
1. I pick up a red bottle
2. I walk to the kitchen
3. I open the jar with a cloth
4. I place the jar on the counter
5. I take a spoon from the drawer

**Your Output Questions:**
1. What did I pick up?
2. Where did I walk to?
3. How did I open the jar?
4. What did I do after opening the jar?
5. What did I take from the drawer?

---
### **Example for a Complex Action**
**Input Narrations:**
1. I pick up the bottle of water and pour it into the glass.
2. ... (other narrations)

**Your Output Questions:**
1. Where did I pour the water?
2. ... (other questions)

---
### **Final Task**
Now, generate 5 high-quality questions for the following narrations.

**Input Narrations:**
{narration_block}

**Your Output Questions:**<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
    return prompt
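
The marker clean-up applied to each narration inside the prompt builder can be looked at in isolation. This is a small sketch with a hypothetical helper name, `clean_narration`. It follows the same replacement order as the function above and additionally collapses repeated whitespace, which is an assumption added here and not part of the notebook's logic.

```python
def clean_narration(text):
    """Rewrite Ego4D narration markers into plain first-person text."""
    # '#C C' (camera wearer) is replaced before the bare '#C' marker.
    text = text.replace("#C C", "I").replace("#C", "I")
    # '#O' (other person) markers are dropped.
    text = text.replace("#O", "")
    # Extra step (assumption): collapse runs of whitespace left by the removals.
    return " ".join(text.split())


examples = [
    "#C C opens the door",
    "#O A man X talks to #C C",
]
for narration in examples:
    print(clean_narration(narration))
```

The replacement does not fix verb agreement ("I opens the door"), so the LLM still receives slightly ungrammatical input.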

In [7]:
# This command is the most effective way to free up disk space in this case
!rm -rf /root/.cache/huggingface/hub

# Afterwards, check the available space again to see the improvement
!df -h

Filesystem      Size  Used Avail Use% Mounted on
overlay         113G   73G   41G  65% /
tmpfs            64M     0   64M   0% /dev
shm              41G     0   41G   0% /dev/shm
/dev/root       2.0G  1.2G  775M  61% /usr/sbin/docker-init
/dev/sda1       119G   98G   22G  82% /kaggle/input
tmpfs            42G  1.2M   42G   1% /var/colab
tmpfs            42G     0   42G   0% /proc/acpi
tmpfs            42G     0   42G   0% /proc/scsi
tmpfs            42G     0   42G   0% /sys/firmware
drive           100G   62G   39G  62% /content/drive


### 3.3. Setup to resume generation after faults
This cell initializes the environment for the query generation process. Because generation runs for a long time, all progress is saved under the Google Drive project path, which makes the process fault-tolerant. The cell checks for previously saved progress and loads it, so the script can be stopped and resumed without reprocessing data.

In [None]:
import pickle
import os
import glob

# --- Configuration ---
DRIVE_PROJECT_PATH = "/content/drive/MyDrive/EgoVisionProject/Data"

# Define the path where progress and checkpoints will be saved.
SAVE_DIR = os.path.join(DRIVE_PROJECT_PATH, "generation_progress")
os.makedirs(SAVE_DIR, exist_ok=True)
print(f"Progress will be saved in: {SAVE_DIR}")

# Define the full paths for the checkpoint files.
PROCESSED_TRACKER_FILE = os.path.join(SAVE_DIR, "processed_groups_tracker.pkl")
ALL_DATA_FILE = os.path.join(SAVE_DIR, "all_generated_data.pkl")

# --- Resume Logic ---
# This block checks if a previous run was interrupted and loads its progress.

# Try to load the set of already processed group IDs. A 'set' is used for
# efficient O(1) checking of whether a group has been processed.
if os.path.exists(PROCESSED_TRACKER_FILE):
    print("Resuming from a previous session. Loading processed groups tracker...")
    with open(PROCESSED_TRACKER_FILE, 'rb') as f:
        processed_groups_tracker = pickle.load(f)
    print(f"Found {len(processed_groups_tracker)} groups that were already processed.")
else:
    # If no tracker file exists, start with an empty set. This happens on the very first run.
    print("Starting a new generation session. Initializing tracker.")
    processed_groups_tracker = set()

# Try to load the list of data that was already generated in previous runs.
if os.path.exists(ALL_DATA_FILE):
    print("Loading previously generated data...")
    with open(ALL_DATA_FILE, 'rb') as f:
        all_generated_data = pickle.load(f)
    print(f"Loaded {len(all_generated_data)} previously generated annotation groups.")
else:
    # If no data file exists, start with an empty list.
    print("No previously generated data found.")
    all_generated_data = []

# This list will be appended to during the generation process.
generated_data_groups = all_generated_data

Progress will be saved in: /content/drive/MyDrive/EgoVisionProject/Data/generation_progress
Resuming from a previous session. Loading processed groups tracker...
Found 2397 groups that were already processed.
Loading previously generated data...
Loaded 2397 previously generated annotation groups.
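
Because a runtime disconnection can interrupt a write, it may be worth saving each checkpoint to a temporary file and renaming it only once the write has finished. The sketch below shows this pattern with hypothetical helper names (`save_checkpoint_atomic`, `load_checkpoint`). `os.replace` is atomic on a local POSIX filesystem, but whether that guarantee carries over to a mounted Google Drive folder is an assumption that has not been verified here.

```python
import os
import pickle
import tempfile


def save_checkpoint_atomic(obj, path):
    """Pickle obj to a temporary file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        # The previous checkpoint is replaced only after a complete write.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file and re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the unpickled object at path, or default if the file is missing."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `processed_groups_tracker = load_checkpoint(PROCESSED_TRACKER_FILE, set())` would cover both the first run and a resumed run.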


### 3.4. Run the Generation Loop
This cell executes the core data augmentation task. It is engineered to be resilient and fault-tolerant, making it suitable for long execution times in Google Colab.

The loop performs the following key functions:

- Resumes from Checkpoints: It uses a tracker file to identify and skip any narration groups that have already been processed, ensuring no work is duplicated across sessions.
- Batch Generation: It processes narrations in batches of five, generating one query per narration with the LLM loaded above.
- Robust Output Parsing: A dedicated cleaning function parses the model's raw output, removing formatting artifacts to ensure the final queries are clean.
- Periodic Checkpointing: The script automatically saves its complete state (both the generated data and the progress tracker) to Google Drive at regular intervals. This prevents significant data loss from potential runtime disconnections or timeouts.
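
The output-parsing step in the list above can be sketched on its own. The helper name `parse_queries` and the sample response are made up for illustration. Unlike the loop below, which accepts any non-empty list of parsed queries, this sketch returns `None` unless exactly the expected number of questions is found; that stricter check is an assumption, not the notebook's behavior.

```python
import re

# A numbered line such as "3. What did I pick up?"
NUMBERED_LINE = re.compile(r"^\s*\d+\.\s*(.*)", re.MULTILINE)
# A bold label such as "**OBJECTS:**" or "**OBJECTS**:" at the start of a query
BOLD_PREFIX = re.compile(r"^\s*\*\*.*?\*\*:?\s*")


def parse_queries(response, expected=5):
    """Extract cleaned questions from a numbered LLM response."""
    queries = []
    for raw in NUMBERED_LINE.findall(response):
        # Drop a leading bold label, then surrounding quotes and whitespace.
        text = BOLD_PREFIX.sub("", raw.strip()).strip("'\"").strip()
        if text:
            queries.append(text)
    return queries if len(queries) == expected else None


sample = (
    "Here are the questions:\n"
    "1. What did I pick up?\n"
    "2. **PLACE:** Where did I put the jar?\n"
    '3. "How many spoons?"\n'
    "4. Who did I talk to?\n"
    "5. What did I do after opening the jar?\n"
)
parsed = parse_queries(sample)
print(parsed)
```

Rejecting responses with the wrong number of questions keeps each query aligned with its narration, at the cost of discarding some batches.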

In [None]:
# ==============================================================================
# RESILIENT GENERATION LOOP (with Automatic Goal Calculation)
#
# Description:
# This cell automatically calculates how many batches are left to reach the
# overall target. Only the OVERALL_TARGET_BATCHES variable needs to be set;
# the goal for the current session is derived from the saved progress.
# ==============================================================================
import re
import logging
from transformers import logging as transformers_logging

# --- Robust Cleaning Function ---
def clean_query_final(query_text):
    """
    A robust function to clean generated query text from the LLM.
    It removes potential prefixes like "**OBJECTS:**" and surrounding quotes.
    """
    text = query_text.strip()
    text = re.sub(r'^\s*\*\*.*?\*\*:?\s*', '', text) # Remove prefixes like **WORD:** or **WORD**:
    text = text.strip('\'"') # Remove surrounding quotes
    return text

# --- Configuration for this Run ---
OVERALL_TARGET_BATCHES = 2500
BATCH_SIZE = 5
SAVE_CHECKPOINT_FREQUENCY = 50 # How often to save progress to disk.

# --- Initialize alpha ---
# This value is based on the EgoVLP paper and is necessary for timestamp calculation.
alpha = 4.9
# This parameter prevents timestamp windows from collapsing to zero duration.
MIN_WINDOW_DURATION_SEC = 1.0

# Suppress non-critical transformers warnings for a cleaner output.
transformers_logging.set_verbosity_error()

# --- Data Preparation & Automatic Goal Calculation ---
print("\n--- Preparing Batches for This Session ---")

# 1. Filter out already processed groups to avoid redundant work.
print(f"Total groups available in source data: {len(all_valid_groups)}")
print(f"Number of previously processed groups in tracker: {len(processed_groups_tracker)}")
unprocessed_groups = [
    group for group in all_valid_groups
    if f"{group['video_uid']}_{group['parent_clip_uid']}" not in processed_groups_tracker
]
print(f"Number of available groups to process: {len(unprocessed_groups)}")

# 2. Build new batches only from the UNPROCESSED groups.
random.shuffle(unprocessed_groups)
narration_batches = []
for group_data in unprocessed_groups:
    valid_narrations = [n for n in group_data["narrations"] if '#unsure' not in n['narration_text'].lower()]
    for i in range(0, len(valid_narrations), BATCH_SIZE):
        batch = valid_narrations[i:i+BATCH_SIZE]
        if len(batch) == BATCH_SIZE:
            narration_batches.append({
                "video_uid": group_data['video_uid'],
                "parent_clip_uid": group_data['parent_clip_uid'],
                "narrations_batch": batch,
                "group_id": f"{group_data['video_uid']}_{group_data['parent_clip_uid']}"
            })

# 3. AUTOMATICALLY CALCULATE THE GOAL FOR THIS SESSION
print(f"\nOverall project goal: {OVERALL_TARGET_BATCHES} batches.")
# Calculate how many batches are still needed to reach the overall goal.
remaining_to_reach_goal = OVERALL_TARGET_BATCHES - len(processed_groups_tracker)
remaining_to_reach_goal = max(0, remaining_to_reach_goal) # Ensure it's not negative.

# The number of batches to run is the smaller of what's needed and what's available.
total_batches_to_run = min(remaining_to_reach_goal, len(narration_batches))

if total_batches_to_run > 0:
    print(f"Calculated {remaining_to_reach_goal} batches remaining to hit target.")
    print(f"Will generate {total_batches_to_run} new batches in this session.")
else:
    print("Goal reached! No more batches need to be generated.")
print("----------------------------------------\n")

# --- Main Resilient Generation Loop ---
batches_processed_this_session = 0
if total_batches_to_run > 0:
    # Initialize the progress bar with the goal for this session.
    with tqdm(total=total_batches_to_run, desc="Generating New Batches") as pbar:
        # Iterate over a slice of narration_batches so the loop runs only
        # for the required number of iterations.
        for batch_data in narration_batches[:total_batches_to_run]:
            group_id = batch_data["group_id"]
            if group_id in processed_groups_tracker: continue

            try:
                # Build the prompt, decode greedily, and parse the numbered queries from the response.
                narration_texts = [n['narration_text'] for n in batch_data['narrations_batch']]
                prompt = create_batch_generation_prompt(narration_texts)
                if tokenizer.pad_token is None: tokenizer.pad_token = tokenizer.eos_token
                inputs = tokenizer(prompt, return_tensors="pt").to(llm_model.device)
                prompt_token_length = inputs.input_ids.shape[1]
                outputs = llm_model.generate(**inputs, max_new_tokens=300, do_sample=False)
                response_tokens = outputs[0][prompt_token_length:]
                assistant_response = tokenizer.decode(response_tokens, skip_special_tokens=True).strip()
                generated_queries = re.findall(r'^\s*\d+\.\s*(.*)', assistant_response, re.MULTILINE)

                if len(generated_queries) > 0:
                    queries_for_this_group = []
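                    # Print each narration next to its generated query and build the annotation.
                    print()
                    # zip() pairs queries with narrations and stops at the shorter list, so extra
                    # numbered lines in the LLM output cannot index past the narration batch.
                    for i, (query_text, narration_obj) in enumerate(zip(generated_queries, batch_data['narrations_batch'])):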
                        clean_query = clean_query_final(query_text)
                        print(f"[Input Narration {i+1}]: '{narration_obj['narration_text']}'")
                        print(f"  [Generated Query {i+1}]: '{clean_query}'")
                        t_i = narration_obj['timestamp_sec']
                        parent_clip_info = all_clips_map.get(batch_data['parent_clip_uid'])
                        beta_i = video_to_beta_map.get(batch_data['video_uid'])
                        calculated_duration = beta_i / alpha
                        window_duration = max(MIN_WINDOW_DURATION_SEC, calculated_duration)
                        start_time_abs = max(parent_clip_info['video_start_sec'], t_i - (window_duration / 2))
                        end_time_abs = min(parent_clip_info['video_end_sec'], t_i + (window_duration / 2))
                        if start_time_abs >= end_time_abs: continue
                        queries_for_this_group.append({
                            "query": clean_query, "template": "LLM-Generated-Llama3",
                            "video_start_sec": start_time_abs, "video_end_sec": end_time_abs,
                            "clip_start_sec": start_time_abs - parent_clip_info['video_start_sec'],
                            "clip_end_sec": end_time_abs - parent_clip_info['video_start_sec']
                        })

                    if queries_for_this_group:
                        generated_data_groups.append({
                            "video_uid": batch_data['video_uid'], "parent_clip_uid": batch_data['parent_clip_uid'],
                            "language_queries": queries_for_this_group
                        })

                    processed_groups_tracker.add(group_id)
                    batches_processed_this_session += 1
                    pbar.update(1)

                    # --- Save Progress Checkpoint ---
                    if batches_processed_this_session % SAVE_CHECKPOINT_FREQUENCY == 0:
                        print(f"\n--- Saving checkpoint after {batches_processed_this_session} batches this session ---")
                        with open(ALL_DATA_FILE, 'wb') as f: pickle.dump(generated_data_groups, f)
                        with open(PROCESSED_TRACKER_FILE, 'wb') as f: pickle.dump(processed_groups_tracker, f)
                        print("--- Checkpoint saved successfully! ---")

            except Exception as e:
                print(f"\nAn error occurred: {e}")
                continue

# --- Final Save ---
# Always save one last time at the end of the run to capture any remaining batches.
print("\n--- Process finished. Performing final save. ---")
with open(ALL_DATA_FILE, 'wb') as f: pickle.dump(generated_data_groups, f)
with open(PROCESSED_TRACKER_FILE, 'wb') as f: pickle.dump(processed_groups_tracker, f)
print("--- Final data saved successfully! ---")

# Restore the default logger verbosity for other cells in the notebook.
transformers_logging.set_verbosity_warning()
print(f"\nTotal annotations now generated across all sessions: {len(generated_data_groups)}")


--- Preparing Batches for This Session ---
Total groups available in source data: 591181
Number of previously processed groups in tracker: 2397
Number of available groups to process: 342871

Overall project goal: 2500 batches.
Calculated 103 batches remaining to hit target.
Will generate 103 new batches in this session.
----------------------------------------



Generating New Batches:   0%|          | 0/103 [00:00<?, ?it/s]


[Input Narration 1]: '#C C holds the solar system picture'
  [Generated Query 1]: 'What did I hold?'
[Input Narration 2]: '#C C puts the solar system picture on the wall'
  [Generated Query 2]: 'Where did I put the solar system picture?'
[Input Narration 3]: '#C C rotates the solar system picture on the wall'
  [Generated Query 3]: 'How did I manipulate the solar system picture on the wall?'
[Input Narration 4]: '#C C sticks the solar system picture on the wall'
  [Generated Query 4]: 'What did I do to the solar system picture after putting it on the wall?'
[Input Narration 5]: '#C C looks around.'
  [Generated Query 5]: 'What did I do after rotating the solar system picture on the wall?'

[Input Narration 1]: '#C C flips the dough with her right hand.'
  [Generated Query 1]: 'What did I do with the dough using my right hand?'
[Input Narration 2]: '#C C rolls out the dough with the rolling pin with both hands.'
  [Generated Query 2]: 'Where did I use the rolling pin?'
[Input Narration
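
The temporal window that the generation loop attaches to each query can be isolated as a small function. The following is a minimal sketch, assuming the same rule as in the loop: the duration is `beta_i / alpha` with a lower bound, the window is centred on the narration timestamp, and it is clipped to the parent clip. The function name `narration_window` and the numeric values are illustrative and not taken from the dataset.

```python
from typing import Optional, Tuple


def narration_window(t_i: float, beta_i: float, alpha: float,
                     min_duration: float,
                     clip_start: float, clip_end: float) -> Optional[Tuple[float, float]]:
    """Return (start, end) in video time for a window centred on t_i, or None if it is empty."""
    # Window length: beta_i / alpha, but never shorter than min_duration.
    duration = max(min_duration, beta_i / alpha)
    # Centre the window on the narration timestamp and clip it to the parent clip.
    start = max(clip_start, t_i - duration / 2)
    end = min(clip_end, t_i + duration / 2)
    if start >= end:
        return None  # the narration falls outside the clip
    return start, end


# Illustrative values only.
window = narration_window(t_i=10.0, beta_i=4.0, alpha=2.0,
                          min_duration=1.0, clip_start=9.5, clip_end=20.0)
print(window)  # (9.5, 11.0)
```

Returning `None` for an empty window mirrors the `continue` in the loop: a narration whose timestamp lies outside its parent clip produces no query.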

### 3.4. Format and Save the Augmented Dataset
Finally, we load the collected data from the final checkpoint file, convert it into the required JSON format, and save it to a file. This file can then be used as input for the pre-training phase.
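
The target layout nests each group of queries as videos -> clips -> annotations -> language_queries. As a minimal sketch of that grouping, assuming flat groups with `video_uid`, `parent_clip_uid`, and `language_queries` keys (the helper name `group_into_videos` and the toy identifiers are hypothetical):

```python
import uuid


def group_into_videos(groups, clips_info):
    """Nest flat query groups as videos -> clips -> annotations -> language_queries."""
    videos = {}
    for group in groups:
        info = clips_info.get(group["parent_clip_uid"])
        if info is None:
            continue  # skip groups whose clip metadata is unknown
        video = videos.setdefault(group["video_uid"],
                                  {"video_uid": group["video_uid"], "clips": {}})
        clip = video["clips"].setdefault(group["parent_clip_uid"], {
            "clip_uid": group["parent_clip_uid"],
            "video_start_sec": info["video_start_sec"],
            "video_end_sec": info["video_end_sec"],
            "annotations": [],
        })
        # Each group of generated queries becomes one annotation block.
        clip["annotations"].append({
            "annotation_uid": str(uuid.uuid4()),
            "language_queries": group["language_queries"],
        })
    # Convert the per-video clip dictionaries back to lists for serialisation.
    return [{"video_uid": v["video_uid"], "clips": list(v["clips"].values())}
            for v in videos.values()]


# Toy input: two groups from the same clip, one from an unknown clip.
toy_clips = {"clip_a": {"video_start_sec": 0.0, "video_end_sec": 480.0}}
toy_groups = [
    {"video_uid": "vid_1", "parent_clip_uid": "clip_a", "language_queries": [{"query": "What did I hold?"}]},
    {"video_uid": "vid_1", "parent_clip_uid": "clip_a", "language_queries": [{"query": "Where did I put it?"}]},
    {"video_uid": "vid_2", "parent_clip_uid": "clip_missing", "language_queries": []},
]
toy_videos = group_into_videos(toy_groups, toy_clips)
print(len(toy_videos), len(toy_videos[0]["clips"][0]["annotations"]))  # 1 2
```

Keying clips by `clip_uid` in a dictionary avoids a linear search through the clip list for every group. With a few thousand groups either approach is fast enough.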


In [None]:
import json
import os
import pickle
import uuid

from tqdm.auto import tqdm

# --- Configuration ---
JSON_PATH = "/content/data/ego4d_data/v1/annotations/nlq_train_augmented_3.json"  # augmented JSON path

# Ensure the directory for the final JSON file exists.
os.makedirs(os.path.dirname(JSON_PATH), exist_ok=True)

# --- Main Logic ---
# Load the complete generated dataset from the final checkpoint file.
if os.path.exists(ALL_DATA_FILE):
    print(f"Loading final dataset from {ALL_DATA_FILE}")
    with open(ALL_DATA_FILE, 'rb') as f:
        final_generated_data = pickle.load(f)
    print(f"Loaded {len(final_generated_data)} total annotation groups.")

    # --- JSON Conversion Logic ---
    # Build the nested videos -> clips -> annotations structure used by the NLQ annotation files.
    print("\nConverting annotation blocks to the final JSON format...")
    final_output = {"version": "1.0", "description": "Augmented NLQ dataset - Generated with Llama 3", "videos": []}
    output_videos_map = {}

    for datum in tqdm(final_generated_data, desc="Final Conversion"):
        video_uid = datum['video_uid']
        parent_clip_uid = datum['parent_clip_uid']
        parent_clip_info = all_clips_map.get(parent_clip_uid)
        if not parent_clip_info: continue

        if video_uid not in output_videos_map:
            output_videos_map[video_uid] = {"video_uid": video_uid, "clips": []}
        video_entry = output_videos_map[video_uid]

        output_clip_entry = next((c for c in video_entry["clips"] if c["clip_uid"] == parent_clip_uid), None)
        if not output_clip_entry:
            output_clip_entry = {
                "clip_uid": parent_clip_uid,
                "video_start_sec": parent_clip_info['video_start_sec'],
                "video_end_sec": parent_clip_info['video_end_sec'],
                "annotations": []
            }
            video_entry["clips"].append(output_clip_entry)

        # Each group of generated queries becomes a new annotation block.
        new_annotation_block = {
            "annotation_uid": str(uuid.uuid4()),
            "language_queries": datum['language_queries']
        }
        output_clip_entry["annotations"].append(new_annotation_block)

    final_output['videos'] = list(output_videos_map.values())

    # Save the final JSON file to the path defined in the configuration above.
    with open(JSON_PATH, 'w') as f:
        json.dump(final_output, f, indent=2)

    print(f"\nProcess complete! Final augmented dataset saved to: {JSON_PATH}")
else:
    print("Error: Could not find the final data file. Please run the generation cell (Cell 2) first.")

Loading final dataset from /content/drive/MyDrive/EgoVisionProject/Data/generation_progress/all_generated_data.pkl
Loaded 2499 total annotation groups.

Converting annotation blocks to the final JSON format...


Final Conversion:   0%|          | 0/2499 [00:00<?, ?it/s]


Process complete! Final augmented dataset saved to: /content/data/ego4d_data/v1/annotations/nlq_train_augmented_3.json


### 3.5. Save Augmented Dataset to Google Drive (Optional)
As a final step for the data augmentation phase, this optional cell copies the generated `nlq_train_augmented_3.json` file from the local Colab storage to our specified folder on Google Drive.

In [None]:
%%bash
source vars.sh

# The source file is the local path where we saved our augmented data
SOURCE_FILE="$AUGMENTED_JSON_PATH"

# The destination directory on Google Drive. We can reuse the path where the original data zip is located.
DEST_DIR="$FEATURE_SOURCE_ZIP_PATH"

# Check if the source file actually exists before trying to copy
if [ -f "$SOURCE_FILE" ]; then
  echo "Copying augmented dataset from:"
  echo "$SOURCE_FILE"
  echo "to Google Drive directory:"
  echo "$DEST_DIR"

  # Ensure the destination directory exists
  mkdir -p "$DEST_DIR"

  # Copy the file
  cp "$SOURCE_FILE" "$DEST_DIR"

  echo -e "\n Copy complete!"
  echo "You can now find your file in your Google Drive."
else
  echo " ERROR: Source file $SOURCE_FILE not found."
  echo "Please ensure the previous cell (3.4) ran successfully and created the file."
fi

Copying augmented dataset from:
/content/data/ego4d_data/v1/annotations/nlq_train_augmented_3.json
to Google Drive directory:
/content/drive/MyDrive/EgoVisionProject/Data

 Copy complete!
You can now find your file in your Google Drive.


### 3.6. Data Cleaning
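This cell cleans the generated `nlq_train_augmented_3.json` file: it strips leftover prefixes such as `**STATE OF AN OBJECT:**` and surrounding quotes from each query, then writes the result to a new file, `nlq_train_augmented_cleaned.json`.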

In [10]:
# ==============================================================================
# FINAL DATASET CLEANING SCRIPT
#
# Description:
# This script loads the final generated JSON file and applies a robust
# cleaning function specifically designed to remove prefixes like
# "**STATE OF AN OBJECT:**" and any surrounding quotes. It then saves
# the cleaned data to a new file, ensuring the dataset is perfect for training.
# ==============================================================================
import json
from tqdm.auto import tqdm
import re
import os

# --- Configuration ---
# TODO: Make sure this path points to the JSON file you generated.
INPUT_JSON_PATH = "/content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_3.json"

# Define the path for the new, cleaned output file.
OUTPUT_JSON_PATH = "/content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_cleaned.json"

print(f"Loading dataset from: {INPUT_JSON_PATH}")
print(f"Filtered dataset will be saved to: {OUTPUT_JSON_PATH}")

# --- Prefix Cleaning Function ---
def final_cleaner(query_text):
    """
    Remove a category prefix such as '**STATE OF AN OBJECT:**' from a query.
    Everything up to and including the first colon ':' is dropped, then leftover
    asterisks, whitespace, and surrounding quotes are stripped.
    Note: a query that legitimately contains a colon is truncated as well.
    """
    # Find the position of the first colon.
    colon_index = query_text.find(':')

    # If a colon is found, the real query starts after it.
    if colon_index != -1:
        # Take the substring starting one character after the colon.
        text_after_colon = query_text[colon_index + 1:]
        # Strip the closing '**' of the prefix, then whitespace and surrounding quotes.
        final_query = text_after_colon.strip().lstrip('*').strip().strip('"\'').strip()
    else:
        # If no colon is found, there is no prefix to clean. Use the original text.
        final_query = query_text

    return final_query

# --- Main Script ---
try:
    with open(INPUT_JSON_PATH, 'r') as f:
        original_data = json.load(f)
except FileNotFoundError:
    print(f"ERROR: Input file not found at {INPUT_JSON_PATH}")
    raise

# Create a new data structure for the filtered results.
filtered_data = {
    "version": original_data.get("version", "1.0"),
    "description": f"Cleaned version of {original_data.get('description', 'Augmented NLQ dataset')}",
    "videos": []
}

# Counters for tracking the changes.
total_queries = 0
queries_cleaned = 0

print("\nStarting final cleaning process...")
for video in tqdm(original_data.get('videos', []), desc="Processing Videos"):
    new_clips = []
    for clip in video.get('clips', []):
        new_annotations = []
        for annotation in clip.get('annotations', []):

            # Collect cleaned copies in a new list so the loaded data is left unmodified.
            cleaned_language_queries = []

            for query_obj in annotation.get('language_queries', []):
                total_queries += 1
                original_query_text = query_obj['query']

                # Apply the definitive cleaning function
                cleaned_query_text = final_cleaner(original_query_text)

                # Check if any change was made
                if original_query_text != cleaned_query_text:
                    queries_cleaned += 1

                # Update the query object with the cleaned text
                new_query_obj = query_obj.copy()
                new_query_obj['query'] = cleaned_query_text
                cleaned_language_queries.append(new_query_obj)

            # Only keep the annotation block if it still contains queries.
            if cleaned_language_queries:
                new_annotation = annotation.copy()
                new_annotation['language_queries'] = cleaned_language_queries
                new_annotations.append(new_annotation)

        if new_annotations:
            new_clip = clip.copy()
            new_clip['annotations'] = new_annotations
            new_clips.append(new_clip)

    if new_clips:
        new_video = video.copy()
        new_video['clips'] = new_clips
        filtered_data['videos'].append(new_video)

print("\n--- Cleaning Report ---")
print(f"Total queries processed: {total_queries}")
print(f"Number of queries cleaned: {queries_cleaned}")

# --- Save the Cleaned Data ---
print(f"\nSaving cleaned dataset to {OUTPUT_JSON_PATH}...")
with open(OUTPUT_JSON_PATH, 'w') as f:
    json.dump(filtered_data, f, indent=2)

print("Cleaning and saving complete!")

Loading dataset from: /content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_3.json
Filtered dataset will be saved to: /content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_cleaned.json

Starting final cleaning process...


Processing Videos:   0%|          | 0/1666 [00:00<?, ?it/s]


--- Cleaning Report ---
Total queries processed: 12495
Number of queries cleaned: 4085

Saving cleaned dataset to /content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_cleaned.json...
Cleaning and saving complete!
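
After a cleaning pass like this one, a quick sanity check can confirm that no query still carries leftover markup. The sketch below assumes the nested videos/clips/annotations/language_queries layout used in this notebook. The helper names `iter_queries` and `find_suspicious` and the toy data are hypothetical.

```python
def iter_queries(dataset):
    """Yield every query string from the nested videos/clips/annotations structure."""
    for video in dataset.get("videos", []):
        for clip in video.get("clips", []):
            for annotation in clip.get("annotations", []):
                for query_obj in annotation.get("language_queries", []):
                    yield query_obj["query"]


def find_suspicious(dataset):
    """Return queries that still look uncleaned: padding, leading '*', or stray quotes."""
    return [q for q in iter_queries(dataset)
            if q != q.strip()
            or q.startswith(("*", '"', "'"))
            or q.endswith(('"', "'"))]


# Toy dataset: the second query keeps a leading space and a trailing quote.
toy = {"videos": [{"clips": [{"annotations": [{"language_queries": [
    {"query": "Where did I put the picture?"},
    {"query": ' What did I hold?"'},
]}]}]}]}
print(find_suspicious(toy))
```

To run it on the real output, load `OUTPUT_JSON_PATH` with `json.load` and pass the result to `find_suspicious`. An empty list means no query matched these heuristics.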


## 4. Setup for Pre-training on Augmented Data
Now that we have our augmented dataset, we need to prepare it for the VSLNet model. This involves running the `prepare_ego4d_dataset.py` script, which preprocesses the JSON file and the corresponding features into a format optimized for the data loader.

We will use the configuration variables (prefixed with `PRETRAIN_...`) that we defined in our main `vars.sh` file at the beginning of the notebook.
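
Before launching the preparation script, it can help to check that its input files exist, since a missing path otherwise only shows up partway through the run. This is a minimal sketch with a hypothetical helper `missing_paths`. The paths are the ones printed elsewhere in this notebook and should be adapted if `vars.sh` points somewhere else.

```python
import os


def missing_paths(named_paths):
    """Return the names of entries whose path is empty or does not exist on disk."""
    return [name for name, path in named_paths.items()
            if not path or not os.path.exists(path)]


required = {
    "augmented train split": "/content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_cleaned.json",
    "validation split": "/content/data/ego4d_data/v1/annotations/nlq_val.json",
    "test split": "/content/data/ego4d_data/v1/annotations/nlq_test_unannotated.json",
}
missing = missing_paths(required)
if missing:
    print("Missing inputs:", ", ".join(missing))
else:
    print("All inputs found.")
```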

In [28]:
%%bash
source vars.sh

# Useful when the augmented train JSON was already produced in a previous run; otherwise comment out the next line.
AUGMENTED_JSON_PATH="/content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_cleaned.json"

echo "Creating output directories for processed pre-training data..."
mkdir -p "$PRETRAIN_DATASET_DIR"
mkdir -p "$PRETRAIN_FEATURE_DIR_PROC"

echo "Running data preparation script for the pre-training phase..."
# Run the script to prepare the dataset
python utils/prepare_ego4d_dataset.py \
    --input_train_split "$AUGMENTED_JSON_PATH" \
    --input_val_split "$LOCAL_VAL_SPLIT" \
    --input_test_split "$LOCAL_TEST_SPLIT" \
    --video_feature_read_path "$PRETRAIN_FEATURE_DIR" \
    --clip_feature_save_path "$PRETRAIN_FEATURE_DIR_PROC" \
    --output_save_path "$PRETRAIN_DATASET_DIR"

echo "Pre-training data preparation finished."

Creating output directories for processed pre-training data...
Running data preparation script for the pre-training phase...
Reading [train]: /content/drive/MyDrive/EgoVisionProject/Data/nlq_train_augmented_cleaned.json
# train: 12495
Writing [train]: /content/data/dataset/pretrain_vslnet_egovlp_bert_run2/train.json
Reading [val]: /content/data/ego4d_data/v1/annotations/nlq_val.json
# val: 3874
Writing [val]: /content/data/dataset/pretrain_vslnet_egovlp_bert_run2/val.json
Reading [test]: /content/data/ego4d_data/v1/annotations/nlq_test_unannotated.json
# test: 4004
Writing [test]: /content/data/dataset/pretrain_vslnet_egovlp_bert_run2/test.json
Pre-training data preparation finished.


Extracting features:   0%|          | 0/3160 [00:00<?, ?it/s]

### 4.2. Create Symbolic Links
The training scripts expect the data and feature directories to be in specific locations within the working directory. We create symbolic links (`ln -sfn`) to point from these expected locations to our actual data folders in the local Colab storage. This is a crucial step for the data preparation and training scripts to work properly.
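
For reference, the same "create or replace the link" step can be written in Python. This is a minimal sketch, not the notebook's method: the helper `force_symlink` is hypothetical and is stricter than `ln -sfn`, because it refuses to touch an existing path that is not a symlink. It assumes a Linux runtime such as Colab.

```python
import os
import tempfile


def force_symlink(target, link_path):
    """Create or replace a symbolic link at link_path that points to target."""
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    if os.path.islink(link_path):
        os.remove(link_path)  # drop a stale link without touching its target
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)


# Demonstration in a temporary directory with made-up folder names.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "real_data")
    os.makedirs(target)
    link = os.path.join(root, "code", "data", "dataset", "demo_experiment")
    force_symlink(target, link)
    force_symlink(target, link)  # calling it again simply replaces the link
    print(os.path.islink(link), os.path.realpath(link) == os.path.realpath(target))  # True True
```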

In [29]:
%%bash

source vars.sh

CWD=$(pwd)
#Base directory for symbolic link generation
mkdir -p "$CWD/data/dataset"
# Also create the $PRETRAIN_EXPERIMENT_NAME subdirectory under features
mkdir -p "$CWD/data/features/$PRETRAIN_EXPERIMENT_NAME"

# 1. Annotations link

# Remove the previous link if it exists and create the new one
rm -f "$CWD/data/dataset/$PRETRAIN_EXPERIMENT_NAME"
ln -sfn "$PRETRAIN_DATASET_DIR" "$CWD/data/dataset/$PRETRAIN_EXPERIMENT_NAME"
echo "Annotations link: $CWD/data/dataset/$PRETRAIN_EXPERIMENT_NAME -> $PRETRAIN_DATASET_DIR"

# 2. Processed features link

# Remove the previous link if it exists and create the new one
rm -f "$CWD/data/features/$PRETRAIN_EXPERIMENT_NAME/official"
ln -sfn "$PRETRAIN_FEATURE_DIR_PROC" "$CWD/data/features/$PRETRAIN_EXPERIMENT_NAME/official"
echo "Features link: $CWD/data/features/$PRETRAIN_EXPERIMENT_NAME/official -> $PRETRAIN_FEATURE_DIR_PROC"

echo "--- Setup completed. Checks below: ---"
echo "Annotations target PRETRAIN_DATASET_DIR exists?"
ls -ld "$PRETRAIN_DATASET_DIR"
echo "Annotations link $CWD/data/dataset/$PRETRAIN_EXPERIMENT_NAME points to:"
ls -ld "$CWD/data/dataset/$PRETRAIN_EXPERIMENT_NAME"

echo "Features target PRETRAIN_FEATURE_DIR_PROC exists?"
ls -ld "$PRETRAIN_FEATURE_DIR_PROC"
echo "Features link $CWD/data/features/$PRETRAIN_EXPERIMENT_NAME/official points to:"
ls -ld "$CWD/data/features/$PRETRAIN_EXPERIMENT_NAME/official"

Annotations link: /content/VSLNet_Code/data/dataset/pretrain_vslnet_egovlp_bert_run2 -> /content/data/dataset/pretrain_vslnet_egovlp_bert_run2
Features link: /content/VSLNet_Code/data/features/pretrain_vslnet_egovlp_bert_run2/official -> /content/data/features/pretrain_vslnet_egovlp_bert_run2/official
--- Setup completed. Checks below: ---
Annotations target PRETRAIN_DATASET_DIR exists?
drwxr-xr-x 2 root root 4096 Jun 24 15:53 /content/data/dataset/pretrain_vslnet_egovlp_bert_run2
Annotations link /content/VSLNet_Code/data/dataset/pretrain_vslnet_egovlp_bert_run2 points to:
lrwxrwxrwx 1 root root 54 Jun 24 16:01 /content/VSLNet_Code/data/dataset/pretrain_vslnet_egovlp_bert_run2 -> /content/data/dataset/pretrain_vslnet_egovlp_bert_run2
Features target PRETRAIN_FEATURE_DIR_PROC exists?
drwxr-xr-x 2 root root 212992 Jun 24 15:53 /content/data/features/pretrain_vslnet_egovlp_bert_run2/official
Features link /content/VSLNet_Code/data/features/pretrain_vslnet_egovlp_bert_run2/official points

## 5. Launch Pre-training
With the data prepared, we can now launch the pre-training script `main.py`. This script will train the VSLNet model from scratch using only our synthetic dataset. The resulting model checkpoint (the last checkpoint) will be saved locally and can then be optionally copied to Google Drive. This checkpoint will serve as the starting point for the final fine-tuning phase.
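
As a rough guide to the length of this run, the number of optimizer steps can be estimated from the dataset size and the hyper-parameters set in the launch cell. The estimate assumes that all 12495 prepared queries are used, that the last partial batch is kept, and that each batch is one step. None of these is confirmed by the training script, and the recorded log shows different values for some hyper-parameters.

```python
import math

num_queries = 12495  # "# train: 12495" reported by the preparation script
batch_size = 32      # BATCH_SIZE in the launch cell
num_epochs = 13      # NUM_EPOCH in the launch cell

steps_per_epoch = math.ceil(num_queries / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 391 5083
```

The total agrees with the numeric suffix of `vslnet_5083.t7`, the checkpoint name used in Section 6.1. That checkpoint is listed under a different run number, so the match is suggestive rather than confirmed.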

In [31]:
%%bash

source vars.sh

# --- Hyper-parameter Configuration for Pre-training ---
export DATALOADER_WORKERS=1
export NUM_WORKERS=2
export BATCH_SIZE=32
export DIM=128
export NUM_EPOCH=13 # Adjust epochs as needed for pre-training
export MAX_POS_LEN=128
export INIT_LR=0.0015 # run 5 used 0.001 with 10 epochs

# --- Construct TensorBoard Log Name ---
export TB_LOG_NAME="${PRETRAIN_EXPERIMENT_NAME}_bs${BATCH_SIZE}_dim${DIM}_epoch${NUM_EPOCH}_ilr${INIT_LR}"


mkdir -p "$PRETRAINED_CHECKPOINT_PATH"

echo "--- Starting PRE-TRAINING ---"
echo "Experiment Name: $PRETRAIN_EXPERIMENT_NAME"
echo "Model: $PRETRAIN_MODEL_NAME"
echo "Video Features: $PRETRAIN_VISUAL_FEATURE_TYPE (Dim: $PRETRAIN_VISUAL_FEATURE_DIM)"
echo "Text Encoder: $PRETRAIN_TEXT_ENCODER_TYPE"
echo "Training Data: AUGMENTED"
echo "--------------------------"

# --pretrain yes makes the script save the last checkpoint; --eval_gt_json is not needed for pre-training but is kept so the script does not raise an error.
python main.py \
    --task "$PRETRAIN_EXPERIMENT_NAME" \
    --mode train \
    --pretrain yes \
    --predictor "$PRETRAIN_TEXT_ENCODER_TYPE" \
    --dim $DIM \
    --video_feature_dim $PRETRAIN_VISUAL_FEATURE_DIM \
    --max_pos_len $MAX_POS_LEN \
    --init_lr $INIT_LR \
    --epochs $NUM_EPOCH \
    --batch_size $BATCH_SIZE \
    --fv official \
    --eval_gt_json "$LOCAL_VAL_SPLIT" \
    --num_workers $NUM_WORKERS \
    --data_loader_workers $DATALOADER_WORKERS \
    --model_dir "$PRETRAINED_CHECKPOINT_PATH" \
    --log_to_tensorboard "$TB_LOG_NAME" \
    --tb_log_freq 5 \
    --remove_empty_queries_from train


--- Starting PRE-TRAINING ---
Experiment Name: pretrain_vslnet_egovlp_bert_run2
Model: vslnet
Video Features: egovlp (Dim: 256)
Text Encoder: bert
Training Data: AUGMENTED
--------------------------
Running with Namespace(save_dir='datasets', model_type='vslnet', resume_from_checkpoint=None, pretrain='yes', task='pretrain_vslnet_egovlp_bert_run2', eval_gt_json='/content/data/ego4d_data/v1/annotations/nlq_val.json', fv='official', max_pos_len=128, num_workers=2, data_loader_workers=1, word_size=None, char_size=None, word_dim=300, video_feature_dim=256, char_dim=50, dim=128, highlight_lambda=5.0, num_heads=8, drop_rate=0.2, predictor='bert', gpu_idx='0', seed=12345, mode='train', epochs=10, batch_size=32, num_train_steps=None, init_lr=0.002, clip_norm=1.0, warmup_proportion=0.0, extend=0.1, period=100, text_agnostic=False, video_agnostic=False, model_dir='/content/data/experiments/pretrain_vslnet_egovlp_bert_run2', model_name='vslnet', suffix=None, log_to_tensorboard='pretrain_vslnet_ego

2025-06-24 16:17:28.292134: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-24 16:17:28.311504: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750781848.333835   34023 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750781848.340766   34023 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-24 16:17:28.364086: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr

### 5.1. Save Pre-training Results to Google Drive (Optional)
This optional step copies the entire pre-training experiment folder (containing model checkpoints and logs) from the local Colab storage to our specified directory on Google Drive for permanent storage.
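
The same backup can be sketched in Python with `shutil.copytree`, which is convenient when the copy has to be repeated after further training. The helper name `backup_experiment` and the demo folder names are hypothetical. `dirs_exist_ok=True` requires Python 3.8 or later.

```python
import os
import shutil
import tempfile


def backup_experiment(source_dir, dest_root):
    """Copy source_dir to dest_root/<basename>, merging into an existing copy."""
    if not os.path.isdir(source_dir):
        raise FileNotFoundError(f"Source directory not found: {source_dir}")
    dest = os.path.join(dest_root, os.path.basename(os.path.normpath(source_dir)))
    # Creates missing parent folders and overwrites files that already exist.
    shutil.copytree(source_dir, dest, dirs_exist_ok=True)
    return dest


# Demonstration in a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "experiments", "demo_run")
    os.makedirs(os.path.join(src, "model"))
    with open(os.path.join(src, "model", "checkpoint.t7"), "w") as f:
        f.write("placeholder")
    copied_to = backup_experiment(src, os.path.join(root, "drive", "Experiments"))
    print(os.path.exists(os.path.join(copied_to, "model", "checkpoint.t7")))  # True
```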

In [18]:
%%bash
# Source the main configuration file
source vars.sh

# Source directory (local) for the pre-training run
SOURCE_DIR="$LOCAL_DATA_ROOT/experiments/$PRETRAIN_EXPERIMENT_NAME"

# Destination directory (on Google Drive)
DEST_DIR="/content/drive/MyDrive/EgoVisionProject/Experiments"

if [ -d "$SOURCE_DIR" ]; then
  echo "Copying pre-training results from $SOURCE_DIR to $DEST_DIR..."
  mkdir -p "$DEST_DIR"
  cp -r "$SOURCE_DIR" "$DEST_DIR"
  echo "Copy complete!"
  echo "You can find your results in: $DEST_DIR/$PRETRAIN_EXPERIMENT_NAME"
else
  echo "ERROR: Source directory $SOURCE_DIR not found. Was the pre-training completed?"
fi

Copying pre-training results from /content/data/experiments/pretrain_vslnet_egovlp_bert_run1 to /content/drive/MyDrive/EgoVisionProject/Experiments...
Copy complete!
You can find your results in: /content/drive/MyDrive/EgoVisionProject/Experiments/pretrain_vslnet_egovlp_bert_run1


## 6. Setup for Fine-tuning
This section prepares the environment for the final fine-tuning phase. The key step here is to re-configure our `vars.sh` file. We will update it to:
1.  Define a new, unique name for the fine-tuning experiment.
2.  Point the training script to the **official `nlq_train.json`** dataset.
3.  Reference the path to the **pre-trained model checkpoint** that we just saved.

### 6.1. Configure the environment for fine-tuning
This is the main control cell for the last step of our project. It generates a `vars.sh` file **inside the current directory (`VSLNet_Code/`)**. This script defines all paths and parameters needed for fine-tuning.
Set `BEST_CHECKPOINT` to the file name of the last checkpoint produced during the pre-training step.
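
Instead of reading the checkpoint name off the file listing by hand, the last checkpoint can be found programmatically. This is a minimal sketch that assumes checkpoint files are named `<prefix>_<step>.t7`, a pattern inferred from `vslnet_5083.t7` and not verified against the training script. The helper `latest_checkpoint` and the other file names are hypothetical.

```python
import os
import re
import tempfile

# Assumed naming pattern: "<prefix>_<step>.t7", e.g. "vslnet_5083.t7".
CHECKPOINT_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<step>\d+)\.t7$")


def latest_checkpoint(model_dir):
    """Return the checkpoint file name with the highest step number, or None."""
    best_name, best_step = None, -1
    for name in os.listdir(model_dir):
        match = CHECKPOINT_PATTERN.match(name)
        if match and int(match.group("step")) > best_step:
            best_name, best_step = name, int(match.group("step"))
    return best_name


# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ["vslnet_391.t7", "vslnet_5083.t7", "vslnet_782.t7", "notes.txt"]:
        open(os.path.join(model_dir, name), "w").close()
    print(latest_checkpoint(model_dir))  # vslnet_5083.t7
```

Comparing the step as an integer matters here: a plain string sort would place `vslnet_782.t7` after `vslnet_5083.t7`.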

In [40]:
# --- Main Configuration ---
# We reuse the best model configuration from pre-training; change the parameters below to try a different setup.
FINETUNING_MODEL_USED = "vslnet"  # Options: "vslnet", "vslbase"
FINETUNING_FEATURE_TYPE = "egovlp" # Options: "egovlp", "omnivore"
FINETUNING_TEXT_ENCODER = "bert"   # Options: "bert", "glove"
RUN_NUMBER = 12

# --- Auto-generated settings based on configuration ---
if FINETUNING_FEATURE_TYPE == "egovlp":
    feature_dir_name = "egovlp_fp16"
    visual_feature_dim = 256
elif FINETUNING_FEATURE_TYPE == "omnivore":
    feature_dir_name = "omnivore_video_swinl_fp16"
    visual_feature_dim = 1536
else:
    raise ValueError("Invalid FEATURE_TYPE selected.")

finetuning_experiment_name = f"finetuning_{FINETUNING_MODEL_USED}_{FINETUNING_FEATURE_TYPE}_{FINETUNING_TEXT_ENCODER}_run{RUN_NUMBER}"

#pretrained model configuration
pretrain_model_used = "vslnet"
pretrain_feature_type = "egovlp"
pretrain_text_encoder = "bert"
pretrain_run_number = 1
pretrain_experiment_name = f"pretrain_{pretrain_model_used}_{pretrain_feature_type}_{pretrain_text_encoder}_run{pretrain_run_number}"

# --- vars.sh content: change EXPERIMENTS_BASE_DIR if the checkpoint is already on Drive ---
vars_sh_content = f"""
#!/bin/bash

# --- I. SHARED PATH CONFIGURATION ---
export FEATURE_SOURCE_ZIP_PATH=/content/drive/MyDrive/EgoVisionProject/Data
export DRIVE_ZIP_FILENAME=ego4d_data.zip
export LOCAL_DATA_ROOT=/content/data
export EXPERIMENTS_BASE_DIR=$LOCAL_DATA_ROOT/experiments

# --- II. FINE-TUNING SHARED PATHS ---
export LOCAL_ANNOTATIONS_DIR=$LOCAL_DATA_ROOT/ego4d_data/v1/annotations
export LOCAL_TRAIN_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_train.json
export LOCAL_VAL_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_val.json
export LOCAL_TEST_SPLIT=$LOCAL_ANNOTATIONS_DIR/nlq_test_unannotated.json

# --- III. FINE-TUNING SPECIFIC CONFIGURATION ---
export FINETUNING_EXPERIMENT_NAME={finetuning_experiment_name}
export FINETUNING_MODEL_NAME={FINETUNING_MODEL_USED}
export FINETUNING_VISUAL_FEATURE_TYPE={FINETUNING_FEATURE_TYPE}
export FINETUNING_TEXT_ENCODER_TYPE={FINETUNING_TEXT_ENCODER}
export FINETUNING_VISUAL_FEATURE_DIM={visual_feature_dim}
export FINETUNING_FEATURE_DIR=$LOCAL_DATA_ROOT/ego4d_data/v1/{feature_dir_name}
export FINETUNING_DATASET_DIR=$LOCAL_DATA_ROOT/dataset/$FINETUNING_EXPERIMENT_NAME
export FINETUNING_FEATURE_DIR_PROC=$LOCAL_DATA_ROOT/features/$FINETUNING_EXPERIMENT_NAME/official
export BEST_CHECKPOINT=vslnet_5083.t7
export PRETRAINED_CHECKPOINT_PATH=$EXPERIMENTS_BASE_DIR/{pretrain_experiment_name}/vslnet_{pretrain_experiment_name}_official_128_bert/model/$BEST_CHECKPOINT
"""

# Write the content to vars.sh in the current directory (VSLNet_Code/)
with open("vars.sh", "w") as f:
    f.write(vars_sh_content)


In [41]:
%%bash

source vars.sh

echo "Creating output directories for processed pre-training data..."
mkdir -p "$FINETUNING_DATASET_DIR"
mkdir -p "$FINETUNING_FEATURE_DIR_PROC"

echo "Running data preparation script for the OFFICIAL training data..."
python utils/prepare_ego4d_dataset.py \
    --input_train_split "$LOCAL_TRAIN_SPLIT" \
    --input_val_split "$LOCAL_VAL_SPLIT" \
    --input_test_split "$LOCAL_TEST_SPLIT" \
    --video_feature_read_path "$FINETUNING_FEATURE_DIR" \
    --clip_feature_save_path "$FINETUNING_FEATURE_DIR_PROC" \
    --output_save_path "$FINETUNING_DATASET_DIR"
echo "Official data preparation finished."

Creating output directories for processed pre-training data...
Running data preparation script for the OFFICIAL training data...
Reading [train]: /content/data/ego4d_data/v1/annotations/nlq_train.json
# train: 11291
Writing [train]: /content/data/dataset/finetuning_vslnet_egovlp_bert_run12/train.json
Reading [val]: /content/data/ego4d_data/v1/annotations/nlq_val.json
# val: 3874
Writing [val]: /content/data/dataset/finetuning_vslnet_egovlp_bert_run12/val.json
Reading [test]: /content/data/ego4d_data/v1/annotations/nlq_test_unannotated.json
# test: 4004
Writing [test]: /content/data/dataset/finetuning_vslnet_egovlp_bert_run12/test.json
Official data preparation finished.


Extracting features:   0%|          | 0/1659 [00:00<?, ?it/s]

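Create symbolic links so that the VSLNet code finds the prepared annotations and processed features under its own `data/` directory.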

In [42]:
%%bash

source vars.sh

CWD=$(pwd)
# Base directory for the annotation links
mkdir -p "$CWD/data/dataset"
# Also create the $FINETUNING_EXPERIMENT_NAME subdirectory under features
mkdir -p "$CWD/data/features/$FINETUNING_EXPERIMENT_NAME"

# 1. Annotations link

# Remove the previous link if it exists and create the new one
rm -f "$CWD/data/dataset/$FINETUNING_EXPERIMENT_NAME"
ln -sfn "$FINETUNING_DATASET_DIR" "$CWD/data/dataset/$FINETUNING_EXPERIMENT_NAME"
echo "Annotations link: $CWD/data/dataset/$FINETUNING_EXPERIMENT_NAME -> $FINETUNING_DATASET_DIR"

# 2. Processed features link

# Remove the previous link if it exists and create the new one
rm -f "$CWD/data/features/$FINETUNING_EXPERIMENT_NAME/official"
ln -sfn "$FINETUNING_FEATURE_DIR_PROC" "$CWD/data/features/$FINETUNING_EXPERIMENT_NAME/official"
echo "Features link: $CWD/data/features/$FINETUNING_EXPERIMENT_NAME/official -> $FINETUNING_FEATURE_DIR_PROC"

echo "--- Setup completed. Checks below: ---"
echo "Annotations target FINETUNING_DATASET_DIR exists?"
ls -ld "$FINETUNING_DATASET_DIR"
echo "Annotations link $CWD/data/dataset/$FINETUNING_EXPERIMENT_NAME points to:"
ls -ld "$CWD/data/dataset/$FINETUNING_EXPERIMENT_NAME"

echo "Features target FINETUNING_FEATURE_DIR_PROC exists?"
ls -ld "$FINETUNING_FEATURE_DIR_PROC"
echo "Features link $CWD/data/features/$FINETUNING_EXPERIMENT_NAME/official points to:"
ls -ld "$CWD/data/features/$FINETUNING_EXPERIMENT_NAME/official"

Annotations link: /content/VSLNet_Code/data/dataset/finetuning_vslnet_egovlp_bert_run12 -> /content/data/dataset/finetuning_vslnet_egovlp_bert_run12
Features link: /content/VSLNet_Code/data/features/finetuning_vslnet_egovlp_bert_run12/official -> /content/data/features/finetuning_vslnet_egovlp_bert_run12/official
--- Setup completed. Checks below: ---
Annotations target FINETUNING_DATASET_DIR exists?
drwxr-xr-x 2 root root 4096 Jun 24 17:00 /content/data/dataset/finetuning_vslnet_egovlp_bert_run12
Annotations link /content/VSLNet_Code/data/dataset/finetuning_vslnet_egovlp_bert_run12 points to:
lrwxrwxrwx 1 root root 57 Jun 24 17:00 /content/VSLNet_Code/data/dataset/finetuning_vslnet_egovlp_bert_run12 -> /content/data/dataset/finetuning_vslnet_egovlp_bert_run12
Features target FINETUNING_FEATURE_DIR_PROC exists?
drwxr-xr-x 2 root root 126976 Jun 24 17:00 /content/data/features/finetuning_vslnet_egovlp_bert_run12/official
Features link /content/VSLNet_Code/data/features/finetuning_vslnet
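
The linking step above can also be expressed in Python, which makes it easier to fail early when a target directory is missing. This is a minimal sketch, not part of the original pipeline: the helper name `relink` and the demo directory names are hypothetical, and the demo runs on a temporary directory rather than on the notebook's real paths.

```python
import os
import tempfile
from pathlib import Path


def relink(target: Path, link: Path) -> Path:
    """Point `link` at `target`, replacing any existing symlink at that path."""
    if not target.is_dir():
        raise FileNotFoundError(f"Link target does not exist: {target}")
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # drop a stale link, like `rm -f` in the bash cell
    elif link.exists():
        raise FileExistsError(f"Refusing to replace a real file or directory: {link}")
    link.symlink_to(target, target_is_directory=True)
    # Resolve the link and confirm it lands on the intended directory.
    if link.resolve() != target.resolve():
        raise RuntimeError(f"{link} does not resolve to {target}")
    return link


# Demo on throwaway directories (hypothetical names, not the notebook's paths).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    demo_target = root / "prepared" / "run_demo"
    demo_target.mkdir(parents=True)
    demo_link = relink(demo_target, root / "code" / "data" / "dataset" / "run_demo")
    relink(demo_target, demo_link)  # running it a second time is safe
    demo_ok = demo_link.is_symlink() and os.path.samefile(demo_link, demo_target)
    print("link resolves to target:", demo_ok)
```

Unlike `ln -sfn`, this sketch refuses to touch a path that is a real directory, which guards against accidentally shadowing data that was copied rather than linked.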

## 7. Launch Fine-tuning
With the environment re-configured and the official dataset prepared, we launch the `main.py` script. The critical difference from the pre-training run is the `--resume_from_checkpoint` argument, which loads the weights from our pre-trained model. We also typically use a lower learning rate for fine-tuning, so that the updates refine the pre-trained weights rather than overwrite them.
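
Because `--resume_from_checkpoint` depends on a file written by the pre-training run, it can help to resolve that file programmatically instead of hard-coding the step number. The sketch below is an illustration under stated assumptions: it assumes checkpoints are named `<model>_<step>.t7` (inferred from the path `vslnet_5083.t7` in the log), and the helper `latest_checkpoint` is hypothetical rather than part of the VSLNet code.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme, inferred from `vslnet_5083.t7`: "<model>_<step>.t7".
CKPT_PATTERN = re.compile(r"^(?P<model>[A-Za-z0-9]+)_(?P<step>\d+)\.t7$")


def latest_checkpoint(model_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if absent."""
    best_step, best_path = -1, None
    for path in model_dir.glob("*.t7"):
        match = CKPT_PATTERN.match(path.name)
        if match is None:
            continue  # ignore files that do not follow the assumed scheme
        step = int(match.group("step"))  # compare numerically, not as text
        if step > best_step:
            best_step, best_path = step, path
    return best_path


# Demo on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    for name in ("vslnet_999.t7", "vslnet_5083.t7", "vslnet_1200.t7", "notes.txt"):
        (model_dir / name).touch()
    found = latest_checkpoint(model_dir)
    demo_choice = found.name if found else None
    print("resume from:", demo_choice)
```

The resolved path could then be written to `PRETRAINED_CHECKPOINT_PATH` in `vars.sh`. Note that the highest step is not necessarily the checkpoint with the best validation score, so this selection rule is only a convenient default.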

In [43]:
%%bash
# Source the final, correct configuration file for fine-tuning
source vars.sh

# --- Hyper-parameter Configuration for Fine-tuning ---
# It's common practice to use a smaller learning rate for fine-tuning
export INIT_LR=0.00045
# Fine-tuning often requires fewer epochs to converge
export NUM_EPOCH=9
# Other parameters can remain the same
export DATA_LOADER_WORKERS=1
export NUM_WORKERS=2
export BATCH_SIZE=32
export DIM=128
export MAX_POS_LEN=128


# The output directory for this run, using the unique fine-tuning name
FINETUNING_MODEL_SAVE_DIR="$EXPERIMENTS_BASE_DIR/$FINETUNING_EXPERIMENT_NAME"
mkdir -p "$FINETUNING_MODEL_SAVE_DIR"

echo "--- Starting FINE-TUNING ---"
echo "Experiment Name: $FINETUNING_EXPERIMENT_NAME"
echo "Model: $FINETUNING_MODEL_NAME"
echo "Features: $FINETUNING_VISUAL_FEATURE_TYPE"
echo "Loading pre-trained checkpoint from: $PRETRAINED_CHECKPOINT_PATH"
echo "-------------------------------------"

# --- Safety Check for Checkpoint ---
# Before launching a long training run, let's verify the checkpoint file exists
if [ ! -f "$PRETRAINED_CHECKPOINT_PATH" ]; then
    echo "❌ ERROR: Pre-trained checkpoint not found at the specified path."
    echo "Path: $PRETRAINED_CHECKPOINT_PATH"
    echo "Please check your 'vars.sh' configuration and ensure the file exists on your Google Drive."
    exit 1
fi

# --- Launch main.py ---
# This command uses the fine-tuning configuration and, most importantly,
# the --resume_from_checkpoint flag to load the pre-trained model.
python main.py \
    --task "$FINETUNING_EXPERIMENT_NAME" \
    --mode train \
    --predictor "$FINETUNING_TEXT_ENCODER_TYPE" \
    --dim $DIM \
    --video_feature_dim "$FINETUNING_VISUAL_FEATURE_DIM" \
    --max_pos_len $MAX_POS_LEN \
    --init_lr $INIT_LR \
    --epochs $NUM_EPOCH \
    --batch_size $BATCH_SIZE \
    --fv official \
    --num_workers $NUM_WORKERS \
    --data_loader_workers $DATA_LOADER_WORKERS \
    --model_dir "$FINETUNING_MODEL_SAVE_DIR" \
    --eval_gt_json "$LOCAL_VAL_SPLIT" \
    --remove_empty_queries_from train \
    --resume_from_checkpoint "$PRETRAINED_CHECKPOINT_PATH"



--- Starting FINE-TUNING ---
Experiment Name: finetuning_vslnet_egovlp_bert_run12
Model: vslnet
Features: egovlp
Loading pre-trained checkpoint from: /content/data/experiments/pretrain_vslnet_egovlp_bert_run1/vslnet_pretrain_vslnet_egovlp_bert_run1_official_128_bert/model/vslnet_5083.t7
-------------------------------------
Running with Namespace(save_dir='datasets', model_type='vslnet', resume_from_checkpoint='/content/data/experiments/pretrain_vslnet_egovlp_bert_run1/vslnet_pretrain_vslnet_egovlp_bert_run1_official_128_bert/model/vslnet_5083.t7', pretrain='no', task='finetuning_vslnet_egovlp_bert_run12', eval_gt_json='/content/data/ego4d_data/v1/annotations/nlq_val.json', fv='official', max_pos_len=128, num_workers=2, data_loader_workers=1, word_size=None, char_size=None, word_dim=300, video_feature_dim=256, char_dim=50, dim=128, highlight_lambda=5.0, num_heads=8, drop_rate=0.2, predictor='bert', gpu_idx='0', seed=12345, mode='train', epochs=10, batch_size=32, num_train_steps=None, i

2025-06-24 17:00:46.152840: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-24 17:00:46.172098: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750784446.194497   45968 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750784446.201189   45968 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-24 17:00:46.223706: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr

In [23]:
%%bash
# Source the main configuration file
source vars.sh

# Source directory (local) for the fine-tuning run
SOURCE_DIR="$LOCAL_DATA_ROOT/experiments/$FINETUNING_EXPERIMENT_NAME"

# Destination directory (on Google Drive)
DEST_DIR="/content/drive/MyDrive/EgoVisionProject/Experiments"

if [ -d "$SOURCE_DIR" ]; then
  echo "Copying fine-tuning results from $SOURCE_DIR to $DEST_DIR..."
  mkdir -p "$DEST_DIR"
  cp -r "$SOURCE_DIR" "$DEST_DIR"
  echo "Copy complete!"
  echo "You can find your results in: $DEST_DIR/$FINETUNING_EXPERIMENT_NAME"
else
  echo "ERROR: Source directory $SOURCE_DIR not found"
  exit 1
fi

Copying pre-training results from /content/data/experiments/finetuning_vslnet_egovlp_bert_run1 to /content/drive/MyDrive/EgoVisionProject/Experiments...
Copy complete!
You can find your results in: /content/drive/MyDrive/EgoVisionProject/Experiments/finetuning_vslnet_egovlp_bert_run1
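
Since the Colab runtime is ephemeral, it may be worth confirming that the copy to Google Drive is complete before the session ends. The following is a minimal sketch of a copy-and-verify step, assuming a plain directory copy is sufficient. The helpers `relative_files` and `copy_and_verify` and the demo file names are hypothetical, and the demo uses temporary directories in place of the real experiment and Drive paths.

```python
import shutil
import tempfile
from pathlib import Path


def relative_files(root: Path) -> dict:
    """Map each file's path relative to `root` to its size in bytes."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def copy_and_verify(source_dir: Path, dest_root: Path) -> Path:
    """Copy `source_dir` into `dest_root`, then check every file arrived."""
    if not source_dir.is_dir():
        raise FileNotFoundError(f"Source directory not found: {source_dir}")
    dest_dir = dest_root / source_dir.name  # same layout as `cp -r SRC DEST`
    # dirs_exist_ok=True (Python 3.8+) lets a re-run refresh an earlier copy.
    shutil.copytree(source_dir, dest_dir, dirs_exist_ok=True)
    copied = relative_files(dest_dir)
    missing = [
        name for name, size in relative_files(source_dir).items()
        if copied.get(name) != size
    ]
    if missing:
        raise RuntimeError(f"Files missing or truncated in {dest_dir}: {missing}")
    return dest_dir


# Demo with throwaway directories standing in for the local run and Drive.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "local" / "run_demo"
    (source / "model").mkdir(parents=True)
    (source / "model" / "vslnet_1.t7").write_bytes(b"\x00" * 16)
    (source / "config.json").write_text("{}")
    dest = copy_and_verify(source, Path(tmp) / "drive" / "Experiments")
    demo_files = sorted(relative_files(dest))
    print("copied files:", demo_files)
```

Comparing sizes is a cheap check. Comparing hashes would be stricter, at the cost of reading every file twice.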
