# Fine-tuning AOT+ Model on Extracted Frames Dataset

This notebook provides a template for fine-tuning AOT+ (or related) models on the `EXTRACTED_FRAMES` dataset. The `EXTRACTED_FRAMES` dataset is designed to load images and their corresponding polygon annotations from JSON files, making it suitable for custom datasets where annotations are in this format.

**Key Steps:**
1. Configure parameters (experiment name, model, paths, hyperparameters).
2. Load and build the configuration object (`cfg`).
3. Set up the environment and instantiate the `Trainer`.
4. Execute the training directly in the notebook.

**Before Running:**
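- Ensure your `EXTRACTED_FRAMES` dataset is correctly placed. The loader's default location is `./extracted_frames/`, which is resolved relative to the working directory. The setup cell switches that directory to `aot_plus/`.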
- **Crucially, update the `pretrained_model_path` variable to point to your pretrained model weights.**
- Adjust other parameters like `model_name_str`, `batch_size`, `total_steps`, and GPU settings as needed.
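Before you run the cells below, a quick pre-flight check can catch the two most common setup mistakes. These are an empty or wrong `pretrained_model_path` and a missing dataset directory. The sketch below is illustrative. The helper name `preflight_check` is hypothetical, and it only confirms that the paths exist. It does not check that the checkpoint matches `model_name_str` or that the annotations are valid.

```python
import os


def preflight_check(pretrained_model_path: str, data_dir: str = "./extracted_frames/") -> list:
    """Return a list of human-readable problems; an empty list means the basic checks passed.

    Hypothetical helper: it checks that the paths exist, not that their contents are usable.
    """
    problems = []
    if not pretrained_model_path:
        problems.append("pretrained_model_path is empty")
    elif not os.path.isfile(pretrained_model_path):
        problems.append(f"pretrained model not found: {pretrained_model_path}")
    elif not pretrained_model_path.endswith(".pth"):
        problems.append(f"expected a .pth file, got: {pretrained_model_path}")
    if not os.path.isdir(data_dir):
        problems.append(f"dataset directory not found: {data_dir}")
    elif not os.listdir(data_dir):
        problems.append(f"dataset directory is empty: {data_dir}")
    return problems
```

Run it after the setup cell so that relative paths resolve inside `aot_plus/`. For example: `for p in preflight_check(pretrained_model_path): print("WARNING:", p)`.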

In [12]:
# Setup Paths and Imports
import sys
import os
import random
import importlib # For get_config replication
import torch
import numpy as np
import json # For pretty printing config

# Make the aot_plus modules (networks, utils, configs) importable.
# The notebook may be started either inside aot_plus/ or in its parent directory
# (e.g. AOT-Tracker-plus/). In the latter case, switch into aot_plus/ so that
# relative paths such as ./extracted_frames/ and ./pretrain_models/ resolve there.
if os.path.basename(os.getcwd()) != 'aot_plus' and os.path.isdir('aot_plus'):
    print("Changing current working directory to 'aot_plus'")
    os.chdir('aot_plus')

project_root = os.getcwd()
if project_root not in sys.path:
    sys.path.insert(0, project_root)

print(f"Current working directory: {os.getcwd()}")
print(f"Sys.path includes: {sys.path[:3]}") # Show top few for brevity

# Import necessary components from the aot_plus project
try:
    from networks.managers.trainer import Trainer
    from utils.utils import Tee, copy_codes, make_log_dir 
except ModuleNotFoundError as e:
    print(f"Error importing project modules: {e}")
    print("Please ensure the notebook is in the 'aot_plus' directory, or the 'aot_plus' directory is in PYTHONPATH.")
    print("Alternatively, run the notebook from the parent directory of 'aot_plus'.")
    raise

In [13]:
# Configuration Parameters

# Experiment Settings
exp_name = "finetune_extracted_notebook"
model_name_str = "r50_aotl"  # Actual model identifier used by config system (e.g., "aott", "r50_aotl")
stage_str = "default"        # Stage for get_config (e.g., "default", "pre", "ytb")
dataset_name = "EXTRACTED_FRAMES" # Will be set in cfg.DATASETS

# Path to your pretrained model (VERY IMPORTANT - UPDATE THIS PATH)
# This will be set to cfg.PRETRAIN_MODEL
pretrained_model_path = ""  # e.g., "./pretrain_models/r50_aotl.pth" or "/path/to/your/model.pth"

# GPU Configuration (for single GPU execution in notebook)
gpu_id = 0 # Python variable for the GPU ID
enable_amp = True

# Training Hyperparameters (will be set in cfg)
batch_size = 2    # Adjust based on your GPU memory
total_steps = 10000 # Example, adjust as needed for fine-tuning
fix_random_seed = True

# Log directory (base directory for logs)
log_base_dir = os.path.join(os.getcwd(), f"logs_notebook_{exp_name}")

# --- Parameters previously handled by argparse in train.py ---
# These will be used to populate the cfg object

dist_start_gpu = gpu_id  # cfg.DIST_START_GPU (Used by Trainer for self.gpu if not distributed)
train_gpus = 1           # cfg.TRAIN_GPUS (Set to 1 for notebook's single GPU context)
train_batch_size = batch_size # cfg.TRAIN_BATCH_SIZE

# cfg.PRETRAIN_MODEL (Path to pretrained weights) - already defined as pretrained_model_path

# cfg.TRAIN_LR: any value <= 0 keeps the learning rate from the config file
train_lr = -1.0

train_total_steps = total_steps # cfg.TRAIN_TOTAL_STEPS

# cfg.TRAIN_START_STEP (Usually 0 for new fine-tuning, or from a resumed checkpoint)
train_start_step = 0 

dist_url = '' # cfg.DIST_URL (Not used for single GPU)

# cfg.AMP (Handled by enable_amp for Trainer) - already defined as enable_amp
# cfg.LOG (Base log directory) - already defined as log_base_dir
# cfg.FIX_RANDOM - already defined as fix_random_seed

print(f"Experiment: {exp_name}, Model: {model_name_str}, Stage: {stage_str}")
if not pretrained_model_path:
    print("WARNING: pretrained_model_path is not set. Training will start from scratch unless the stage/model config loads one by default.")
else:
    print(f"Pretrained model: {pretrained_model_path}")
print(f"Log directory will be under: {log_base_dir}")

In [14]:
# Load Configuration
from datetime import datetime
# Replicate get_config function from tools/get_config.py
def get_config_notebook(stage_name, exp_name_cfg, model_cfg_name):
    try:
        # Construct module name, assuming 'configs' is a package in the project root
        module_name = 'configs.' + stage_name
        engine_config_module = importlib.import_module(module_name)
        
        if hasattr(engine_config_module, 'EngineConfig'):
            return engine_config_module.EngineConfig(exp_name_cfg, model_cfg_name)
        elif hasattr(engine_config_module, 'DefaultEngineConfig'): # Common for 'default' stage
            return engine_config_module.DefaultEngineConfig(exp_name_cfg, model_cfg_name)
        else:
            raise AttributeError(f"Neither 'EngineConfig' nor 'DefaultEngineConfig' class found in module '{module_name}'.")

    except ModuleNotFoundError as e:
        # Raised if configs/<stage>.py is missing, but also if that module (or the model
        # config it loads) imports something that cannot be found, so show the real cause.
        print(f"ERROR: Could not import stage configuration '{module_name}': {e}")
        print("Please ensure 'stage_str' refers to a valid config file (e.g., 'default', 'pre', 'ytb') and 'model_name_str' to a valid model config.")
        raise
    except AttributeError as e:
        print(f"ERROR: Configuration class not found or attribute error: {e}")
        raise

cfg = get_config_notebook(stage_str, exp_name, model_name_str)

# Apply settings from notebook variables (previously argparse)
cfg.DATASETS = [dataset_name] # Must be a list
cfg.DIST_START_GPU = dist_start_gpu
cfg.TRAIN_GPUS = train_gpus # Set to 1 for single GPU

if train_batch_size > 0: cfg.TRAIN_BATCH_SIZE = train_batch_size
if pretrained_model_path: cfg.PRETRAIN_MODEL = pretrained_model_path
if train_lr > 0: cfg.TRAIN_LR = train_lr # Only override if train_lr is positive
if train_total_steps > 0: cfg.TRAIN_TOTAL_STEPS = train_total_steps
if train_start_step >= 0: cfg.TRAIN_START_STEP = train_start_step # Allow 0

cfg.LOG = log_base_dir # Base log directory for make_log_dir
cfg.FIX_RANDOM = fix_random_seed
# cfg.AMP = enable_amp # Trainer takes enable_amp directly, cfg.AMP might not be used by Trainer

# Initialize directories which might be done in EngineConfig's init_dir()
if hasattr(cfg, 'init_dir') and callable(cfg.init_dir):
    cfg.init_dir() # This will set up DIR_LOG, DIR_CKPT etc.
else:
    print("Warning: cfg.init_dir() method not found. Log directories might not be auto-created by config.")
    # Manually set some essential ones if not set by init_dir
    if not hasattr(cfg, 'DIR_LOG'): cfg.DIR_LOG = os.path.join(cfg.LOG, cfg.EXP_NAME)
    if not hasattr(cfg, 'DIR_CKPT'): cfg.DIR_CKPT = os.path.join(cfg.DIR_LOG, 'ckpt')

# Create a timestamped log directory for this run: <cfg.LOG>/<timestamp>_<cfg.EXP_NAME>
# (a local replacement for make_log_dir)
def safe_make_log_dir(log_dir, name):
    ts = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    new_log_dir = os.path.join(log_dir, f"{ts}_{name}")
    if os.path.isdir(new_log_dir):
        raise FileExistsError(f"{new_log_dir} already exists; aborting.")
    os.makedirs(new_log_dir)
    return new_log_dir

current_log_dir = safe_make_log_dir(cfg.LOG, cfg.EXP_NAME)
# Point cfg.DIR_LOG at the run-specific directory returned above
cfg.DIR_LOG = current_log_dir
# Keep DIR_TB_LOG / DIR_CKPT if the config already defines them; only fill in DIR_TB_LOG when missing
if not hasattr(cfg, 'DIR_TB_LOG'):
    cfg.DIR_TB_LOG = os.path.join(cfg.DIR_LOG, 'tensorboard')
os.makedirs(cfg.DIR_TB_LOG, exist_ok=True)
os.makedirs(cfg.DIR_CKPT, exist_ok=True)

# Optional: copy_codes(current_log_dir)

# Save config to log directory (simple JSON dump for notebook)
try:
    config_save_path = os.path.join(current_log_dir, "config_notebook.json")
    # Keep plain attributes only; default=str stringifies any value JSON cannot encode
    cfg_dict = {k: v for k, v in cfg.__dict__.items() if not k.startswith('__') and not callable(v)}
    with open(config_save_path, 'w') as f:
        json.dump(cfg_dict, f, indent=4, default=str)
    print(f"Configuration saved to {config_save_path}")
except Exception as e:
    print(f"Error saving config: {e}")

print(f"Configuration loaded. Log directory for this run: {cfg.DIR_LOG}")
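You can confirm that the notebook variables reached `cfg` by reading back the saved `config_notebook.json` and comparing a few keys against the values you set. The following is a minimal sketch. The helper `find_config_mismatches` is hypothetical. Values that JSON cannot store exactly will not compare equal to the original Python objects. For example, tuples come back as lists.

```python
import json


def find_config_mismatches(config_path: str, expected: dict) -> dict:
    """Compare selected keys of a saved config JSON with the values you expect.

    Returns {key: (saved_value, expected_value)} for every key that differs;
    a key missing from the file is reported with saved_value None.
    """
    with open(config_path) as f:
        saved = json.load(f)
    mismatches = {}
    for key, value in expected.items():
        if key not in saved or saved[key] != value:
            mismatches[key] = (saved.get(key), value)
    return mismatches
```

In this notebook, you could call `find_config_mismatches(config_save_path, {"DATASETS": [dataset_name], "TRAIN_BATCH_SIZE": batch_size, "TRAIN_GPUS": 1, "TRAIN_TOTAL_STEPS": total_steps})`. An empty dict means the overrides were applied.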

In [17]:
# Training Execution

# This cell adapts the main_worker function from tools/train.py

current_gpu_id = gpu_id # Defined in config cell
torch.cuda.set_device(current_gpu_id)
print(f"Using GPU: {current_gpu_id}")

if cfg.FIX_RANDOM:
    # Use a fixed seed for notebook, or derive from gpu_id if preferred
    random_seed = 42 
    print(f"Fixing random seed to {random_seed}")
    os.environ['PYTHONHASHSEED'] = str(random_seed)
    random.seed(random_seed + 1)
    np.random.seed(random_seed + 2)
    torch.manual_seed(random_seed + 3)
    torch.cuda.manual_seed(random_seed + 4)
    torch.cuda.manual_seed_all(random_seed + 5)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Tee for logging output to a file in addition to notebook cell output
# Note: This will capture print statements. Trainer's internal logging might also write to files.
log_file_path = os.path.join(cfg.DIR_LOG, "print_notebook.log")
original_stdout = sys.stdout
tee = Tee(log_file_path)
sys.stdout = tee
print(f"Notebook output is being logged to: {log_file_path}")

print(f"Initiating Trainer on GPU {current_gpu_id} (rank 0 for notebook context)")
# rank is 0 as we are not using mp.spawn for distributed training in the notebook
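cfg.DIST_ENABLE = False  # Single process: no DistributedSampler, so the trainer's train_sampler may be None
# NOTE: this replaces the model config's encoder-pretrain path with a hard-coded file.
# The file name suggests a full AOT+ (EMA) checkpoint rather than backbone-only weights,
# so check that this is intended. The notebook's documented entry point for pretrained
# weights is pretrained_model_path (cfg.PRETRAIN_MODEL) in the parameters cell.
cfg.MODEL_ENCODER_PRETRAIN = './pretrain_models/aotplus_R50_DeAOTL_Temp_pe_Slot_4_ema_20000.pth'
trainer = Trainer(rank=0, cfg=cfg, enable_amp=enable_amp)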

print("Starting training...")
try:
    trainer.sequential_training()
    print("Training finished successfully.")
except Exception as e:
    print(f"An error occurred during training: {e}")
    import traceback
    traceback.print_exc()
finally:
    # Restore original stdout and close the Tee logger
    sys.stdout = original_stdout
    tee.close()
    print(f"Restored stdout. Log file saved at: {log_file_path}")

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Temp\ipykernel_17380\3187809420.py", line 38, in <module>
    trainer.sequential_training()
  File "d:\baidu\medventions file\RMEM\RMem_ocu\aot_plus\networks\managers\trainer.py", line 453, in sequential_training
    if train_sampler is not None:
AttributeError: 'NoneType' object has no attribute 'set_epoch'
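The traceback above is internally inconsistent. The line it shows for `trainer.py`, line 453, is `if train_sampler is not None:`, yet the error is a `set_epoch` call on `None`. Python's traceback reads source lines from disk at the moment it prints. This pattern therefore usually means `trainer.py` was edited after the kernel imported it, and the old, unguarded code is still running. With `cfg.DIST_ENABLE = False`, no distributed sampler appears to be created, so `train_sampler` is `None`.

Restarting the kernel is the simplest fix. As an alternative, a small reload helper can re-execute the edited module in place. The sketch below uses a hypothetical helper, `reload_module`, built on the standard `importlib` module.

```python
import importlib
import sys
from types import ModuleType


def reload_module(name: str) -> ModuleType:
    """Re-execute a module from its current source file and return the refreshed module.

    Imported modules are cached in sys.modules, so edits to a .py file are ignored by a
    running kernel until the module is reloaded (or the kernel is restarted).
    """
    importlib.invalidate_caches()  # make sure finders see files changed since the last import
    module = sys.modules.get(name)
    if module is None:
        return importlib.import_module(name)  # not imported yet: a plain import is enough
    return importlib.reload(module)
```

In this notebook, that would be `Trainer = reload_module("networks.managers.trainer").Trainer`, followed by re-running the training cell. There are two caveats:

- Names bound earlier with `from ... import Trainer` keep pointing at the old class until you re-bind them.
- `importlib.reload` does not reload the modules that `trainer.py` itself imports.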


In [None]:
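# List the dataset files using the os module
import os

# Adjust dir_path to where the dataset actually lives. Relative paths resolve against
# the current working directory (aot_plus/ after the setup cell), and the dataset
# loader's default location is ./extracted_frames/.
dir_path = '../extracted_frames/'
try:
    print(f"Files in directory {dir_path}:")
    for name in os.listdir(dir_path):
        fullpath = os.path.join(dir_path, name)
        if os.path.isfile(fullpath):
            print(name)
except FileNotFoundError:
    print(f"Directory {dir_path} does not exist")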
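Listing the directory shows what is there, but not whether every frame has an annotation. The sketch below pairs images with JSON files by file stem. It assumes each annotation sits next to its image with the same base name (e.g., `frame_0001.jpg` and `frame_0001.json`). That layout and the extension list are assumptions, so adjust them to however the `ExtractedFramesTrain` loader actually locates annotations. The helper name is hypothetical.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image extensions


def pair_frames_with_annotations(dir_path: str):
    """Return (paired stems, images without JSON, JSON files without an image).

    Assumes a flat directory where an image and its annotation share a base name.
    """
    images, annotations = set(), set()
    for name in os.listdir(dir_path):
        stem, ext = os.path.splitext(name)
        ext = ext.lower()  # treat .JPG and .jpg alike
        if ext in IMAGE_EXTS:
            images.add(stem)
        elif ext == ".json":
            annotations.add(stem)
    return sorted(images & annotations), sorted(images - annotations), sorted(annotations - images)
```

Typical use: `paired, no_json, no_image = pair_frames_with_annotations(dir_path)`, then print the three lengths. Non-empty `no_json` or `no_image` lists point to frames the loader may skip or fail on.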

## Evaluating the Model

The `dataloaders` have been configured so that the `ExtractedFramesTrain` dataset can also be used for evaluation. If you wish to evaluate your fine-tuned model on the `extracted_frames` data (or a subset of it), you would typically:

1.  **Adapt the Configuration (`cfg`):**
    *   Ensure `cfg.EVAL_DATASETS` (or a similar configuration attribute used by evaluation scripts/logic) is set to `["EXTRACTED_FRAMES"]`.
    *   Set `cfg.TEST_BATCH_SIZE` as needed.
    *   Specify the checkpoint of your fine-tuned model for evaluation (e.g., `cfg.TEST_CKPT_PATH` or similar).

2.  **Evaluation Logic:**
    *   The `Trainer` class (`networks.managers.trainer.Trainer`) might have evaluation methods, or you might need to adapt logic from `tools/eval.py`.
    *   This would involve:
        *   Loading the fine-tuned model weights.
        *   Instantiating an evaluation dataloader: `eval_loader = build_eval_dataloader(cfg, "EXTRACTED_FRAMES", split="val")` (assuming "val" split is relevant or you use all data).
        *   Iterating through the `eval_loader` and running inference with the model.
        *   Calculating evaluation metrics (e.g., IoU).
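For the metric step, the IoU between a predicted mask and a ground-truth mask is the intersection area divided by the union area. The following is a minimal NumPy sketch. It assumes that predictions and ground truth are integer label maps of the same shape, with 0 for background and one id per object, as is typical of VOS outputs. The function names are hypothetical and not part of the `aot_plus` codebase. Polygon annotations would first need to be rasterized into such label maps.

```python
import numpy as np


def binary_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two masks. Two empty masks score 1.0 (a convention, not a standard)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_object_iou(pred_labels: np.ndarray, gt_labels: np.ndarray) -> float:
    """Average IoU over the object ids present in the ground truth (background id 0 excluded)."""
    object_ids = [k for k in np.unique(gt_labels) if k != 0]
    if not object_ids:
        return float("nan")  # no objects to score in this frame
    return float(np.mean([binary_iou(pred_labels == k, gt_labels == k) for k in object_ids]))
```

Averaging per object and then per frame is one common choice. Benchmarks differ in how they aggregate scores (per object, per frame, or per sequence), so pick the aggregation that matches the results you want to compare against.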

*This notebook currently focuses on training. Full evaluation script integration is a separate step.*

## Important Notes

*   **Update `pretrained_model_path`**: This is the most critical parameter for fine-tuning. Ensure it points to a valid `.pth` model file.
*   **Adjust Parameters**: Modify `batch_size` according to your GPU memory. Change `total_steps` based on how long you want to fine-tune. Select the correct `model_name_str` and `stage_str` matching your project's configurations in `aot_plus/configs/`.
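*   **Logs**: Each run gets its own timestamped directory, `cfg.DIR_LOG` (e.g., `./logs_notebook_finetune_extracted_notebook/<timestamp>_<cfg.EXP_NAME>/`). This directory holds `config_notebook.json` and the print output mirrored to `print_notebook.log`. TensorBoard files and checkpoints go to `cfg.DIR_TB_LOG` and `cfg.DIR_CKPT`. The stage config may already point these elsewhere, because the notebook only fills them in when they are missing. Print both values to confirm where they are written.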
*   **Dataset Location**: The `ExtractedFramesTrain` dataset loader defaults to looking for data in `./extracted_frames/`. Ensure your images and JSON annotations are there.
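*   **Kernel Interrupts**: Interrupting the kernel raises `KeyboardInterrupt`, which stops training. `KeyboardInterrupt` is not a subclass of `Exception`, so the `except` clause in the training cell does not catch it. The `finally` block still runs, restoring standard output and closing the `Tee` log file.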