# Ego4D Moments Benchmark (NLQ) Quickstart

Please set your resources to GPU (Runtime -> Change runtime type -> GPU).

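This notebook walks through:
1. Selecting 50 candidate queries from the NLQ validation predictions
2. Downloading the corresponding videos and extracting clips with ffmpeg
3. Answering the queries with Video-LLaVA
4. Evaluating the answers with ROUGE, BLEU, and METEOR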

To begin: add your **access keys** below, change your Runtime Type to **GPU**, and run cells **one by one** as you read through. This helps avoid timeouts since Colab gives more GPU cycles to interactive notebooks.

## Resources
- [Baseline Repo](https://github.com/EGO4D/episodic-memory/tree/main/NLQ/VSLNet)
- [Docs](https://ego4d-data.org/docs/benchmarks/episodic-memory/)
- [EvalAI Challenge](https://eval.ai/web/challenges/challenge-page/1629/overview)

## Mount Google Drive

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Download Data and Setup Environment

### **Fill In Your Access Info Here**
If you don't have access and secret keys, first sign the Ego4D License at [ego4ddataset.com](https://ego4ddataset.com)

In [1]:
import os

# Replace the placeholders with your own keys; never commit or share real credentials
os.environ['AWS_ACCESS_KEY_ID'] = "<YOUR_ACCESS_KEY_ID>"
os.environ['AWS_SECRET_ACCESS_KEY'] = "<YOUR_SECRET_ACCESS_KEY>"

### **Set up CLIs and Download Annotations + Repo**

In [2]:
%%bash
# Install the AWS CLI and configure it with the keys set above
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

# Set up the AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -o awscliv2.zip >/dev/null
sudo ./aws/install >/dev/null 2>&1
aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID" && aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
rm "awscliv2.zip"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 59.3M  100 59.3M    0     0  81.4M      0 --:--:-- --:--:-- --:--:-- 81.4M


### Install the ego4d CLI and Download Data

In [3]:
# Set up the Ego4D CLI
!pip install ego4d

Collecting ego4d
  Downloading ego4d-1.7.3.tar.gz (94 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting boto3 (from ego4d)
  Downloading boto3-1.40.16-py3-none-any.whl.metadata (6.7 kB)
Collecting dataclasses_json (from ego4d)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting iopath (from ego4d)
  Downloading iopath-0.1.10.tar.gz (42 kB)
  Preparing metadata (setup.py) ... done
Collecting botocore<1.41.0,>=1.40.16 (from boto3->ego4d)
  Downloading bo

In [4]:
# Download annotations
!ego4d --output_directory="/content/ego4d_data/" \
       --datasets annotations \
       --benchmarks nlq \
       --version v1 \
       -y

Datasets to download: {'annotations'}
Download Path: /content/ego4d_data/v1
Downloading Ego4D metadata json..
Ego4D Metadata: /content/ego4d_data/ego4d.json
Checking requested datasets and versions...
Created download directory for version 'v1' of dataset: 'annotations' at: /content/ego4d_data/v1/annotations
Benchmarks specified but ignored without a benchmarks field in manifest.
Retrieving object metadata from S3...
100% 31/31 [00:00<00:00, 704.76object/s]
Checking if latest file versions are already downloaded...
100% 31/31 [00:00<00:00, 70.39file/s]
No existing videos to filter.
Downloading 31 files..
100% 2.50G/2.51G [00:22<00:00, 290MiB/s]
Checking file integrity...
100% 2.51G/2.51G [00:22<00:00, 120MiB/s]


In [8]:
# Copy the NLQ code (including the EXTENSION2 scripts) from Google Drive into the Colab workspace
!cp -r "/content/drive/MyDrive/episodic-memory/NLQ" "/content/NLQ"

## **Substep 1**: Select 50 queries

In [9]:
%%bash
# Select the top 50 candidate queries
python /content/NLQ/EXTENSION2/select_query.py \
    --pred_file "/content/NLQ/EXTENSION2/best_prediction.json" \
    --val_file "/content/ego4d_data/v1/annotations/nlq_val.json" \
    --output "/content/NLQ/EXTENSION2/top50_queries.json" \
    --k 50

Saved top 50 candidate queries to /content/NLQ/EXTENSION2/top50_queries.json
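
The selection logic lives in `select_query.py`, which is not reproduced here. As a rough illustration only, one plausible criterion is to rank validation queries by the temporal IoU between the predicted and ground-truth windows and keep the best `k`. The field names (`pred_start`, `pred_end`, `gt_start`, `gt_end`) and the toy records below are hypothetical and do not reflect the actual file formats.

```python
from typing import Dict, List


def temporal_iou(pred_start: float, pred_end: float,
                 gt_start: float, gt_end: float) -> float:
    """Intersection over union of two time intervals, in seconds.

    The union is taken as the span covering both intervals, which equals
    the true union whenever the intervals overlap. Disjoint intervals
    score 0 because their intersection is 0.
    """
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = max(pred_end, gt_end) - min(pred_start, gt_start)
    return intersection / union if union > 0 else 0.0


def select_top_k(records: List[Dict], k: int = 50) -> List[Dict]:
    """Return the k records whose predicted window best overlaps the ground truth.

    The field names used here are hypothetical; adapt them to the real files.
    """
    scored = [
        {**r, "iou": temporal_iou(r["pred_start"], r["pred_end"],
                                  r["gt_start"], r["gt_end"])}
        for r in records
    ]
    return sorted(scored, key=lambda r: r["iou"], reverse=True)[:k]


# Toy records with made-up values, for illustration only
example_records = [
    {"query": "q1", "pred_start": 0.0, "pred_end": 10.0, "gt_start": 5.0, "gt_end": 15.0},
    {"query": "q2", "pred_start": 2.0, "pred_end": 8.0, "gt_start": 2.0, "gt_end": 8.0},
    {"query": "q3", "pred_start": 0.0, "pred_end": 1.0, "gt_start": 5.0, "gt_end": 6.0},
]
top2 = select_top_k(example_records, k=2)
print([r["query"] for r in top2])
```

Other criteria (for example, model confidence) would fit the same ranking pattern; only the scoring function changes.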


Once you have the top 50 candidate queries, manually annotate their answers and save the result in a new JSON file (*top50_annotated.json*). This file serves as the ground truth in Substep 4.

## **Substep 2**: Download videos and extract clips using ffmpeg

Collect the unique `video_uid` values of the top 50 queries you selected.

In [10]:
import json

# Load the top 50 selected queries from the JSON file
with open("/content/NLQ/EXTENSION2/top50_queries.json", "r") as f:
    top50 = json.load(f)

# Extract all unique video_uids
unique_video_uids = sorted(set(entry["video_uid"] for entry in top50))
print(f"Number of unique videos: {len(unique_video_uids)}")

# Save them into a temporary file to download the videos later
with open("video_uid_list.txt", "w") as f:
    for uid in unique_video_uids:
        f.write(uid + "\n")


Number of unique videos: 37


Download the full-scale videos listed in `video_uid_list.txt`.

In [11]:
!ego4d \
  --output_directory /content/ego4d_data \
  --datasets full_scale \
  --version v1 \
  --video_uid_file video_uid_list.txt \
  -y


Datasets to download: {'full_scale'}
Download Path: /content/ego4d_data/v1
Ego4D Metadata: /content/ego4d_data/ego4d.json
Checking requested datasets and versions...
Created download directory for version 'v1' of dataset: 'full_scale' at: /content/ego4d_data/v1/full_scale
Only downloading a subset of the video files because the 'video_uids' flag has been set on the command line or in the config file. A total of 37 video files will be downloaded.

Retrieving object metadata from S3...
100% 37/37 [00:00<00:00, 1801.18object/s]
Checking if latest file versions are already downloaded...
100% 37/37 [00:04<00:00,  8.01file/s]
No existing videos to filter.
Downloading 37 files..
100% 22.7G/22.7G [03:13<00:00, 275MiB/s]
Checking file integrity...
100% 22.7G/22.7G [03:14<00:00, 126MiB/s]


Extract the parts of interest using ffmpeg without re-encoding to save space and time.

In [12]:
%%bash
python /content/NLQ/EXTENSION2/extract_clips.py \
    --queries_file "/content/NLQ/EXTENSION2/top50_queries.json" \
    --video_dir "/content/ego4d_data/v1/full_scale" \
    --clips_dir "/content/ego4d_data/v1/clips_top50"


Extracted: /content/ego4d_data/v1/clips_top50/805989f6-0696-4de2-ad9b-0f194e0ac48d_clip_00.mp4
Extracted: /content/ego4d_data/v1/clips_top50/7f4225ed-a076-4530-91cf-f3903c5d7637_clip_01.mp4
Extracted: /content/ego4d_data/v1/clips_top50/86343e9e-b932-41d3-ad6f-83f2c2fe5486_clip_02.mp4
Extracted: /content/ego4d_data/v1/clips_top50/8a6a3316-d682-4a76-81db-b244081765c9_clip_03.mp4
Extracted: /content/ego4d_data/v1/clips_top50/056db3f1-f957-46c8-b16b-c8fce22e78f9_clip_04.mp4
Extracted: /content/ego4d_data/v1/clips_top50/9f28e782-417c-4c8b-a7ae-42fc96a0e94f_clip_05.mp4
Extracted: /content/ego4d_data/v1/clips_top50/b737cd68-4e0d-440a-9813-a6c90080fac5_clip_06.mp4
Extracted: /content/ego4d_data/v1/clips_top50/2ac65951-49b0-4629-981e-edc34b8cdb0f_clip_07.mp4
Extracted: /content/ego4d_data/v1/clips_top50/8b9b9816-d6eb-4544-818e-9d59e400b80d_clip_08.mp4
Extracted: /content/ego4d_data/v1/clips_top50/3534864b-2289-4aaf-b3ed-10eeeee7acd2_clip_09.mp4
Extracted: /content/ego4d_data/v1/clips_top50/1294
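
`extract_clips.py` is not shown in this notebook. The sketch below illustrates the general idea of cutting a segment with ffmpeg stream copy (`-c copy`), which avoids re-encoding. The helper name, file names, and timestamps are assumptions for illustration, not the script's actual interface.

```python
import shlex
from typing import List


def build_ffmpeg_copy_command(src: str, dst: str,
                              start_sec: float, end_sec: float) -> List[str]:
    """Build an ffmpeg command that cuts [start_sec, end_sec] without re-encoding."""
    if end_sec <= start_sec:
        raise ValueError("end_sec must be greater than start_sec")
    return [
        "ffmpeg",
        "-y",                                # overwrite the output file if it exists
        "-ss", f"{start_sec:.3f}",           # seek to the start time (input option)
        "-i", src,
        "-t", f"{end_sec - start_sec:.3f}",  # duration of the clip in seconds
        "-c", "copy",                        # copy streams instead of re-encoding
        dst,
    ]


# Hypothetical file names and timestamps
cmd = build_ffmpeg_copy_command("input.mp4", "clip_00.mp4", 12.5, 42.0)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Because stream copy can only cut at keyframes, the resulting clip boundaries may differ slightly from the requested times. Re-encoding gives frame-accurate cuts at the cost of more time and, usually, larger or lower-quality output.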

## **Substep 3**: Adopt Video-LLaVA

Install the requirements.


In [13]:
%%bash
pip install -U transformers
pip install bitsandbytes
python -m pip install av

Collecting transformers
  Downloading transformers-4.55.4-py3-none-any.whl.metadata (41 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.0/42.0 kB 2.3 MB/s eta 0:00:00
Downloading transformers-4.55.4-py3-none-any.whl (11.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 82.9 MB/s eta 0:00:00
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.55.2
    Uninstalling transformers-4.55.2:
      Successfully uninstalled transformers-4.55.2
Successfully installed transformers-4.55.4
Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.3/61.3 MB 10.9 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.47.0
Collecting av
  Downloading av-15.0.0-cp312-cp312-manylinux_2_28_x86_64

Run Video-LLaVA on the extracted clips to generate an answer for each query.

In [14]:
%%bash
python /content/NLQ/EXTENSION2/llava.py \
    --clips_dir "/content/ego4d_data/v1/clips_top50" \
    --queries_json "/content/NLQ/EXTENSION2/top50_queries.json" \
    --output "/content/NLQ/EXTENSION2/answers_video_llava.json"


Starting Video-LLaVA model loading...
Model files will be downloaded
Model loaded successfully!
Starting processing of 50 video clips...
Saving results to /content/NLQ/EXTENSION2/answers_video_llava.json...
Processing completed!


2025-08-25 00:34:51.570833: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1756082091.851619    3856 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1756082091.923473    3856 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1756082092.498401    3856 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1756082092.498443    3856 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1756082092.498448    3856 computation_placer.cc:177] computation placer alr
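
`llava.py` is not reproduced here. Two preprocessing steps that such a script typically needs are sketched below: picking a fixed number of evenly spaced frames from each clip, and wrapping the query in a chat-style prompt. The choice of 8 frames and the `USER: <video> ... ASSISTANT:` layout follow the Hugging Face Video-LLaVA examples as far as we know, but both should be treated as assumptions, as should the helper names.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> List[int]:
    """Pick num_frames indices spread evenly across a clip of total_frames frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Short clips cannot supply more frames than they contain
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    # Take the frame in the middle of each of the num_frames equal segments
    return [int(step * i + step / 2) for i in range(num_frames)]


def build_prompt(question: str) -> str:
    """Wrap a question in a chat-style prompt (assumed layout, see the note above)."""
    return f"USER: <video>\n{question} ASSISTANT:"


print(sample_frame_indices(80))
print(build_prompt("Where did I put the cup?"))
```

The sampled indices would then be used to decode frames (for example with `av`, installed above) before passing them to the model's processor together with the prompt.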

## **Substep 4**: Evaluation
We use the following metrics:

1.   ROUGE and BLEU scores
2.   METEOR score
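
To give an intuition for what these metrics measure, here is a simplified, self-contained ROUGE-L F1 based on the longest common subsequence of whitespace tokens. It is an illustrative sketch, not the `rouge_score` implementation used by `compute_scores.py` (which may tokenize and stem differently), and the example sentences are made up.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L F1 over lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)  # share of predicted tokens that match
    recall = lcs / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


# Made-up prediction and reference
score = rouge_l_f1("the cup is on the table", "the cup was on the kitchen table")
print(f"ROUGE-L F1: {score:.3f}")
```

BLEU and METEOR follow a similar pattern of comparing a generated answer against a reference, but rely on n-gram precision and on alignment with stemming and synonym matching, respectively.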



In [15]:
!pip install rouge_score evaluate


Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting evaluate
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.5-py3-none-any.whl (84 kB)
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=6cf580fd7cf6dab75f9f37f865e9f98c75235bc35e59b19fba6af858dd3c88f5
  Stored in directory: /root/.cache/pip/wheels/85/9d/af/01feefbe7d55ef5468796f0c68225b6788e85d9d0a281e7a70
Successfully built rouge_score
Installing collected packages: rouge_score, evaluate
Successfully installed evaluate-0.4.5 rouge_score-0.1.2


In [22]:
%%bash
python /content/NLQ/EXTENSION2/compute_scores.py \
    --llava "/content/NLQ/EXTENSION2/answers_video_llava.json" \
    --gt "/content/NLQ/EXTENSION2/top50_annotated.json"


Score results saved to /content/score_results.json


2025-08-25 01:19:58.882701: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1756084798.902838   16050 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1756084798.908964   16050 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1756084798.924443   16050 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1756084798.924467   16050 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1756084798.924469   16050 computation_placer.cc:177] computation placer alr