# SUTD TrafficQA — CNN + LSTM baseline (MCQ-4)

This notebook runs the **CNN+LSTM** multiple-choice baseline (4 options) that mirrors the classic LSTM-style setup described in the SUTD-TrafficQA paper:
- **BiLSTM** encodes the question and each candidate answer (QA Bank)
- **CNN** encodes sampled frames
- **LSTM** encodes the frame sequence
- An **MLP** scores each of the 4 options
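
To make the last step concrete, here is a minimal sketch of how four per-option scores could become a prediction. It assumes a softmax over the scores followed by an argmax, which is a common choice for multiple-choice heads; the training script may do this differently. `softmax` and `predict_option` are illustrative names, not functions from this repo.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_option(scores: Sequence[float]) -> int:
    """Return the index (0-3) of the highest-scoring option."""
    if len(scores) != 4:
        raise ValueError("MCQ-4 expects exactly four option scores")
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical MLP outputs for one question's four candidate answers.
example_scores = [0.2, 1.7, -0.4, 0.9]
probs = softmax(example_scores)
print(predict_option(example_scores), [round(p, 3) for p in probs])
```

During training, the same four scores would typically feed a cross-entropy loss against the index of the correct option.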

## Expected dataset layout
```
CS412-CV-FinalProject-main/
  SUTD/
    videos/
      *.mp4
    questions/
      R3_train.jsonl
      R3_val.jsonl
      R3_test.jsonl
```


## 0) (Optional) Install dependencies
If you're running in a fresh environment, uncomment the `%pip` line in the cell below.


In [None]:
# %pip install -r requirements.txt
# If you hit a torchvision import error in your environment, this baseline still runs using a small CNN fallback.
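
Before training, it can help to see which packages are visible to the current kernel. The sketch below uses only the standard library; `has_module` is an illustrative helper. Note that it only checks whether a package can be located, so a version mismatch between `torch` and `torchvision` could still raise an error at import time.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


for pkg in ("torch", "torchvision"):
    print(f"{pkg}: {'found' if has_module(pkg) else 'missing'}")
```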


## 1) Point to your project + dataset
Set `PROJECT_ROOT` to the folder that contains `train_sutd_cnn_lstm.py`.


In [None]:
import os
import sys
from pathlib import Path

# If this notebook lives in the repo root, keep '.'
PROJECT_ROOT = Path('.').resolve()
print('PROJECT_ROOT:', PROJECT_ROOT)

# Put the project root on sys.path so its packages (e.g. 'src/') are importable
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# Update this if your SUTD folder is elsewhere
SUTD_ROOT = PROJECT_ROOT / 'SUTD'
print('SUTD_ROOT:', SUTD_ROOT)
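
If the notebook is launched from a subfolder, `Path('.')` will not point at the repo root. One possible way to locate it automatically is to walk up the directory tree until the training script is found. This is a sketch under that assumption; `find_project_root` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "train_sutd_cnn_lstm.py") -> Optional[Path]:
    """Walk upward from `start` and return the first folder containing `marker`."""
    start = start.resolve()
    for folder in (start, *start.parents):
        if (folder / marker).is_file():
            return folder
    return None


found = find_project_root(Path("."))
print("Detected project root:", found if found else "not found; set PROJECT_ROOT manually")
```

If a root is detected, it can be assigned to `PROJECT_ROOT` in place of `Path('.').resolve()`.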


## 2) Sanity check the dataset structure


In [None]:
videos_dir = SUTD_ROOT / 'videos'
questions_dir = SUTD_ROOT / 'questions'

print('videos_dir exists:', videos_dir.exists())
print('questions_dir exists:', questions_dir.exists())
if questions_dir.exists():
    print('question files:', sorted([p.name for p in questions_dir.glob('*.jsonl')])[:10])
if videos_dir.exists():
    # Count only .mp4 files, matching the expected layout above
    vids = sorted(videos_dir.glob('*.mp4'))
    print('num videos:', len(vids))
    print('sample videos:', [v.name for v in vids[:5]])

train_file = questions_dir / 'R3_train.jsonl'
val_file   = questions_dir / 'R3_val.jsonl'
test_file  = questions_dir / 'R3_test.jsonl'
print('train file:', train_file, 'exists:', train_file.exists())
print('val file:', val_file, 'exists:', val_file.exists())
print('test file:', test_file, 'exists:', test_file.exists())
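
Checking that the files exist does not confirm that they parse. The sketch below reads the first few lines of a JSON Lines file and prints each record's type and a truncated preview. It makes no assumption about the record schema; `peek_jsonl` is an illustrative helper and the path is an example.

```python
import json
from pathlib import Path
from typing import Any, List


def peek_jsonl(path: Path, n: int = 3) -> List[Any]:
    """Parse and return the first `n` non-empty lines of a JSON Lines file."""
    records: List[Any] = []
    if n <= 0:
        return records
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if len(records) >= n:
                break
    return records


# Example usage; adjust the path to your own layout.
sample_path = Path("SUTD") / "questions" / "R3_train.jsonl"
if sample_path.exists():
    for i, rec in enumerate(peek_jsonl(sample_path)):
        print(i, type(rec).__name__, str(rec)[:120])
```

A `json.JSONDecodeError` here usually means the file is truncated or is not in JSON Lines format.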


## 3) Quick smoke test (small subset)
This runs one short training epoch on at most 200 training and 200 validation samples to verify that the pipeline works end to end.


In [None]:
NUM_FRAMES = 16
BATCH_SIZE = 8
EPOCHS = 1

!python train_sutd_cnn_lstm.py \
  --sutd_root "{SUTD_ROOT}" \
  --num_frames {NUM_FRAMES} \
  --batch_size {BATCH_SIZE} \
  --epochs {EPOCHS} \
  --max_train_samples 200 \
  --max_val_samples 200 \
  --use_train_aug
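
`--num_frames` sets how many frames are drawn from each clip. The sampling strategy is not described in this notebook; a common choice is to take evenly spaced frames, sketched below under that assumption. `uniform_frame_indices` is an illustrative helper, not part of the training script.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick `num_frames` indices spread evenly over a clip of `total_frames`.

    Each index is the centre of one of `num_frames` equal-width segments.
    Short clips repeat indices so the output length is always `num_frames`.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    step = total_frames / num_frames
    return [min(total_frames - 1, int(step * (i + 0.5))) for i in range(num_frames)]


print(uniform_frame_indices(total_frames=160, num_frames=16))
```

With this scheme, raising `--num_frames` increases temporal coverage at the cost of more CNN forward passes per clip.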


## 4) Full training
Drop the `--max_train_samples` and `--max_val_samples` flags to train and validate on the full splits.


In [None]:
# Uncomment to train fully
# NUM_FRAMES = 16
# BATCH_SIZE = 8
# EPOCHS = 5
#
# !python train_sutd_cnn_lstm.py \
#   --sutd_root "{SUTD_ROOT}" \
#   --num_frames {NUM_FRAMES} \
#   --batch_size {BATCH_SIZE} \
#   --epochs {EPOCHS} \
#   --use_train_aug


## 5) Evaluate a checkpoint
By default, training writes:
- `outputs/sutd_cnn_lstm/best.pt`
- `outputs/sutd_cnn_lstm/last.pt`


In [None]:
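# NUM_FRAMES must already be defined (run the smoke-test or full-training cell first);
# ideally use the same value the checkpoint was trained with.
ckpt = PROJECT_ROOT / 'outputs' / 'sutd_cnn_lstm' / 'best.pt'
print('Using ckpt:', ckpt, 'exists:', ckpt.exists())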

!python eval_sutd_cnn_lstm.py \
  --sutd_root "{SUTD_ROOT}" \
  --ckpt "{ckpt}" \
  --num_frames {NUM_FRAMES}
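
If `best.pt` is missing (for example, because training was interrupted), falling back to `last.pt` may be preferable to failing. The sketch below picks the first checkpoint that exists; `pick_checkpoint` is a hypothetical helper, and the file names are the defaults listed above.

```python
from pathlib import Path
from typing import Optional, Sequence


def pick_checkpoint(out_dir: Path, names: Sequence[str] = ("best.pt", "last.pt")) -> Optional[Path]:
    """Return the first existing checkpoint in `out_dir`, trying `names` in order."""
    for name in names:
        candidate = out_dir / name
        if candidate.is_file():
            return candidate
    return None


ckpt_path = pick_checkpoint(Path("outputs") / "sutd_cnn_lstm")
print("Checkpoint to evaluate:", ckpt_path if ckpt_path else "none found; run training first")
```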


## 6) Suggested ablations (good for your report)
- **Frame count**: `--num_frames 8/16/32/64`
- **Backbone**: `--cnn_backbone resnet18` vs `resnet50`
- **Augmentation**: toggle `--use_train_aug`
- **Freeze CNN**: the CNN is frozen by default; to fine-tune the visual encoder, make sure `--freeze_cnn` is not in effect (check how the training script defines this flag)
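
To run several of these ablations systematically, the commands can be generated rather than typed by hand. The sketch below builds one command per combination of frame count, backbone, and augmentation, using only flags that appear in this notebook. `build_ablation_commands` is an illustrative helper, and the default `epochs` and `batch_size` values are taken from the full-training cell.

```python
import itertools
import shlex
from typing import List


def build_ablation_commands(sutd_root: str, epochs: int = 5, batch_size: int = 8) -> List[str]:
    """Build one training command per (num_frames, backbone, augmentation) combination."""
    frame_counts = [8, 16, 32, 64]
    backbones = ["resnet18", "resnet50"]
    aug_options = [True, False]

    commands = []
    for n, backbone, aug in itertools.product(frame_counts, backbones, aug_options):
        args = [
            "python", "train_sutd_cnn_lstm.py",
            "--sutd_root", sutd_root,
            "--num_frames", str(n),
            "--batch_size", str(batch_size),
            "--epochs", str(epochs),
            "--cnn_backbone", backbone,
        ]
        if aug:
            args.append("--use_train_aug")
        # shlex.join quotes arguments that contain spaces
        commands.append(shlex.join(args))
    return commands


for cmd in build_ablation_commands("SUTD")[:3]:
    print(cmd)
```

Since training writes to `outputs/sutd_cnn_lstm/` by default, consecutive runs would likely overwrite each other's checkpoints; copy `best.pt` aside between runs unless the script offers a way to change the output folder.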
