# HuMAR Model Testing - Backbone Issues FIXED ✅

## What Was Wrong?
All timm-based backbones (MobileViT, EfficientFormer, PoolFormer) and SegFormer were crashing with:
```
TypeError: conv2d() received an invalid combination of arguments - got (NestedTensor, Parameter, ...)
```

## What Got Fixed?
✅ **MobileViT** - Now extracts tensor from NestedTensor before processing  
✅ **SegFormer** - Now handles NestedTensor properly  
✅ **EfficientFormer** - Now extracts tensor from NestedTensor  
✅ **PoolFormer** - Now extracts tensor from NestedTensor  

All backbones now return `Dict[str, NestedTensor]`, which is the format the rest of the model expects.
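
The fix described above amounts to unwrapping the `NestedTensor` before the backbone's first convolution and re-wrapping each feature map afterwards. A minimal sketch of that pattern follows. `ToyBackbone` and `NestedTensorAdapter` are hypothetical names, the `NestedTensor` class is a simplified stand-in for the one in `util.misc`, and the mask-resizing step is an assumption borrowed from DETR-style backbones, not the repository's actual code.

```python
from dataclasses import dataclass
from typing import Dict

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class NestedTensor:
    """Simplified stand-in: padded images plus a padding mask."""
    tensors: torch.Tensor  # (B, C, H, W)
    mask: torch.Tensor     # (B, H, W), True where a pixel is padding


class ToyBackbone(nn.Module):
    """Hypothetical two-stage CNN standing in for a timm feature extractor."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return [f1, f2]


class NestedTensorAdapter(nn.Module):
    """Unwrap a NestedTensor, run the body, and re-wrap every feature map."""

    def __init__(self, body):
        super().__init__()
        self.body = body

    def forward(self, samples: NestedTensor) -> Dict[str, NestedTensor]:
        # conv2d only accepts plain tensors, so pass samples.tensors, not samples.
        features = self.body(samples.tensors)
        out = {}
        for idx, feat in enumerate(features):
            # Resize the padding mask to this feature map's spatial size.
            mask = F.interpolate(samples.mask[None].float(), size=feat.shape[-2:])
            out[str(idx)] = NestedTensor(feat, mask.to(torch.bool)[0])
        return out


samples = NestedTensor(
    torch.randn(2, 3, 64, 48),
    torch.zeros(2, 64, 48, dtype=torch.bool),
)
features = NestedTensorAdapter(ToyBackbone())(samples)
for name, nt in features.items():
    print(name, tuple(nt.tensors.shape), tuple(nt.mask.shape))
```

Passing `samples` itself to `self.body` would reproduce the `conv2d()` `TypeError` shown earlier, because the first convolution would receive the wrapper object instead of a tensor.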

## IMPORTANT: You MUST Restart the Kernel!
1. Click **Kernel → Restart Kernel** (or press the restart button)
2. Re-run cells from the top
3. Test different backbones!

## Available Backbones (All Working Now!)

**Lightweight (Best for RTX 4050 6GB):**
- `mobilevit_xxs` - 1.3M params, ultra-fast ⚡⚡⚡⚡⚡
- `segformer_mit_b0` - 3.7M params, best efficiency ⭐

**Balanced:**
- `mobilevit_xs` - 2.3M params
- `mobilevit_s` - 5.6M params
- `segformer_mit_b1` - 13.7M params

**Higher Capacity:**
- `poolformer_s12` - 12M params
- `swin_T_224_1k` - 28M params (original)
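
For scripted experiments, the parameter counts listed above can be kept in a small lookup table. The sketch below is illustrative only: `BACKBONE_PARAMS_M` and `backbones_within` are hypothetical names, and the numbers are copied from the list above, not measured here.

```python
# Parameter counts in millions, copied from the list in this notebook.
BACKBONE_PARAMS_M = {
    "mobilevit_xxs": 1.3,
    "mobilevit_xs": 2.3,
    "segformer_mit_b0": 3.7,
    "mobilevit_s": 5.6,
    "poolformer_s12": 12.0,
    "segformer_mit_b1": 13.7,
    "swin_T_224_1k": 28.0,
}


def backbones_within(budget_m):
    """Return backbone names with at most budget_m million parameters, smallest first."""
    fits = [(params, name) for name, params in BACKBONE_PARAMS_M.items() if params <= budget_m]
    return [name for _, name in sorted(fits)]


print(backbones_within(4.0))
```

Parameter count is only a rough proxy for GPU memory use; activation memory also depends on image size and batch size.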

See [BACKBONE_FIX_SUMMARY.md](BACKBONE_FIX_SUMMARY.md) for detailed troubleshooting.

In [None]:
class UniPHDArgs:
    def __init__(self):
        # -----------------------------
        # Config / Override Parameters
        # -----------------------------
        self.config_file = ""           # REQUIRED: path to .py config
        self.options = None             # List of overrides via DictAction (e.g., ["lr=1e-4"])

        # -----------------------------
        # Prompt / Text Encoder
        # -----------------------------
        self.freeze_text_encoder = False
        self.train_trigger = "text scribble point"
        self.eval_trigger = "text"

        self.kps_visi_trigger = True
        self.pose_guide_trigger = False
        self.late_within_attn_trigger = True
        self.within_type = "attn_graph"
        self.no_mask = False

        # -----------------------------
        # Model Backbone
        # -----------------------------
        self.backbone = "mobilevit_xxs"
        self.swin_pretrain_path = r"C:\Users\nikhi\Desktop\HuMAR\datasets\RefHuman"

        # -----------------------------
        # Dataset Parameters
        # -----------------------------
        self.dataset_file = "refhuman"
        self.coco_path = "../datasets/RefHuman"
        self.remove_difficult = False

        # -----------------------------
        # Training Parameters
        # -----------------------------
        self.output_dir = "./results/UniPHD_Results"
        self.note = ""
        self.device = "cuda"
        self.seed = 42

        self.resume = ""                # checkpoint path
        self.pretrain_model_path = None # external checkpoint
        self.finetune_ignore = None     # list[str]

        self.start_epoch = 0
        self.eval = False
        self.num_workers = 0
        self.find_unused_params = False
        self.save_log = False

        # -----------------------------
        # Distributed Training
        # -----------------------------
        self.world_size = 1
        self.dist_url = "env://"
        self.rank = 0
        self.local_rank = 0
        self.amp = False                # Mixed precision

        # -----------------------------
        # Additional Keys Updated Later
        # -----------------------------
        self.use_ema = False
        self.debug = False

        # -----------------------------
        # MODEL NAME (REQUIRED FOR build_model_main)
        # -----------------------------
        # Must be a key registered in MODULE_BUILD_FUNCS; this notebook uses "uniphd".
        self.modelname = "uniphd"
        self.num_classes = 2

        self.lr = 0.0001
        self.lr_adjacent_matrix = 1e-04
        self.param_dict_type = 'default'
        self.lr_backbone = 1e-05
        self.lr_backbone_names = ['backbone.0']
        self.lr_linear_proj_names = ['reference_points', 'sampling_offsets']
        self.lr_linear_proj_mult = 0.1
        self.lr_text_encoder = 0.0001
        self.lr_text_encoder_names = ['text_encoder']
        self.batch_size = 4
        self.weight_decay = 0.0001
        self.epochs = 20
        self.lr_drop = 18
        self.save_checkpoint_interval = 5
        self.clip_max_norm = 0.1

        self.frozen_weights = None
        self.use_checkpoint = False
        self.dilation = False
        self.position_embedding = 'sine'
        self.pe_temperatureH = 20
        self.pe_temperatureW = 20
        self.return_interm_indices = [0, 1, 2, 3]
        self.backbone_freeze_keywords = None

        # for transformer
        self.hidden_dim = 256
        self.dropout = 0.0
        self.dim_feedforward = 2048
        self.enc_layers = 6
        self.dec_layers = 6
        self.pre_norm = False
        self.return_intermediate_dec = True
        self.enc_n_points = 4
        self.dec_n_points = 4
        self.learnable_tgt_init = False
        self.transformer_activation = 'relu'

        # for main model
        self.nheads = 8
        self.num_queries = 20
        self.num_feature_levels = 4
        self.dec_pred_class_embed_share = False
        self.dec_pred_pose_embed_share = False
        self.two_stage_type = 'standard'
        self.two_stage_bbox_embed_share = False
        self.two_stage_class_embed_share = False
        self.cls_no_bias = False
        self.num_body_points = 17

        # for loss
        self.focal_alpha = 0.25
        self.cls_loss_coef = 2.0
        self.bbox_loss_coef = 5.0
        self.keypoints_loss_coef = 10.0
        self.keypoints_visi_loss_coef = 4.0
        self.oks_loss_coef = 4.0
        self.giou_loss_coef = 2.0
        self.enc_loss_coef = 1.0
        self.interm_loss_coef = 1.0
        self.mask_loss_coef = 2.0
        self.dice_loss_coef = 5.0
        self.no_interm_loss = False
        self.aux_loss = True

        # for matcher
        self.matcher_type = 'HungarianMatcher'
        self.set_cost_class = 2.0
        self.set_cost_bbox = 5.0
        self.set_cost_giou = 2.0
        self.set_cost_keypoints = 10.0
        self.set_cost_keypoints_visi = 4.0
        self.set_cost_oks = 4.0
        self.set_cost_kpvis = 0.0
        self.set_cost_mask = 2.0
        self.set_cost_dice = 5.0

        # for postprocess
        self.num_select = 20

        # for ema
        self.ema_decay = 0.9997
        self.ema_epoch = 0


    def __str__(self):
        """For clean printing."""
        return "\n".join([f"{k}: {v}" for k, v in self.__dict__.items()])


args = UniPHDArgs()
args.config_file = "configs/uniphd.py"
args.modelname = "uniphd"
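
The `options` field above is described as a list of overrides such as `["lr=1e-4"]`. As a rough illustration of how such strings could be applied to an args object, here is a simplified stand-in. It is not the `DictAction` implementation, and `apply_overrides` is a hypothetical helper name.

```python
import ast
from types import SimpleNamespace


def apply_overrides(args, options):
    """Set attributes on args from "key=value" strings and return args."""
    for item in options or []:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if not hasattr(args, key):
            # Reject typos instead of silently creating a new attribute.
            raise AttributeError(f"unknown argument: {key}")
        try:
            value = ast.literal_eval(raw)  # numbers, lists, True/False/None
        except (ValueError, SyntaxError):
            value = raw                    # anything else stays a string
        setattr(args, key, value)
    return args


# A small namespace stands in for UniPHDArgs so the sketch runs on its own.
demo = SimpleNamespace(lr=1e-4, backbone="mobilevit_xxs", amp=False)
apply_overrides(demo, ["lr=5e-5", "backbone=segformer_mit_b0", "amp=True"])
print(demo)
```

With the real class, the equivalent call would be `apply_overrides(args, args.options)` after constructing `args`.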

In [None]:
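def build_model_main():
    """Build the model, criterion, and postprocessors for args.modelname."""
    from models.registry import MODULE_BUILD_FUNCS
    # Fail early with a readable message if the model name was never registered.
    assert args.modelname in MODULE_BUILD_FUNCS._module_dict, \
        f"Unknown model name: {args.modelname!r}"
    build_func = MODULE_BUILD_FUNCS.get(args.modelname)
    model, criterion, postprocessors = build_func(args)
    return model, criterion, postprocessors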

In [None]:
model, criterion, postprocessors = build_model_main()

In [None]:
import torch
from util.misc import nested_tensor_from_tensor_list

# Generate random input tensor (batch_size=2, 3 channels, 256x192 image)
dummy_images = torch.randn(2, 3, 256, 192)

# Generate dummy targets with captions (required by the model)
dummy_targets = [
    {'caption': 'A person standing'},
    {'caption': 'A person sitting'}
]

# Move to device
device = next(model.parameters()).device
dummy_images = dummy_images.to(device)

# Create NestedTensor from images
samples = nested_tensor_from_tensor_list(dummy_images)

# Forward pass
with torch.no_grad():
    outputs = model(samples, dummy_targets)

print("✅ Forward pass successful!")
print(f"Output keys: {outputs.keys()}")
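
Printing only the output keys hides shape mismatches between backbones. The helper below is a small sketch that walks a nested output structure and reports the shape of every array-like leaf. `describe_outputs` and `FakeTensor` are hypothetical names, and the keys in the demo dictionary are placeholders, not confirmed outputs of this model.

```python
def describe_outputs(outputs, prefix=""):
    """Return 'name: shape' lines for every array-like leaf in a nested output."""
    lines = []
    if isinstance(outputs, dict):
        for key, value in outputs.items():
            lines.extend(describe_outputs(value, f"{prefix}{key}."))
    elif isinstance(outputs, (list, tuple)):
        for idx, value in enumerate(outputs):
            lines.extend(describe_outputs(value, f"{prefix}{idx}."))
    elif hasattr(outputs, "shape"):
        lines.append(f"{prefix.rstrip('.')}: {tuple(outputs.shape)}")
    return lines


class FakeTensor:
    """Minimal object with a .shape attribute so the sketch needs no torch."""

    def __init__(self, *shape):
        self.shape = shape


demo_outputs = {
    "pred_logits": FakeTensor(2, 20, 2),
    "aux_outputs": [{"pred_boxes": FakeTensor(2, 20, 4)}],
}
for line in describe_outputs(demo_outputs):
    print(line)
```

In the notebook, `describe_outputs(outputs)` can be called directly on the result of the forward pass, since tensors expose `.shape`.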

In [None]:
# Text Encoder: MiniLM
from torchinfo import summary
summary(model)

In [None]:
# Text Encoder: RoBERTa
from torchinfo import summary
summary(model)
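
When comparing the two text encoders, a per-submodule parameter count can be easier to diff than the full `torchinfo` table. The sketch below assumes the text encoder is registered as a top-level child of the model, which this notebook does not confirm. `count_parameters` is a hypothetical helper, and a tiny `nn.Sequential` stands in for the real model.

```python
import torch.nn as nn


def count_parameters(model):
    """Return {child_name: parameter_count} for each top-level submodule."""
    return {
        name: sum(p.numel() for p in child.parameters())
        for name, child in model.named_children()
    }


# Stand-in model: Linear(4, 3) has 4*3 + 3 = 15 parameters, Linear(3, 2) has 3*2 + 2 = 8.
demo_model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
print(count_parameters(demo_model))
```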

## Dataset Analysis

In [None]:
import os
import pandas as pd
import json
import warnings
warnings.filterwarnings("ignore")
img_dir = r'C:\Users\nikhi\Desktop\HuMAR\datasets\RefHuman_Small\images'
train_path = r'C:\Users\nikhi\Desktop\HuMAR\datasets\RefHuman_Small\RefHuman_train2.json'
with open(train_path, 'r', encoding='utf-8') as f:
    train = json.load(f)
val_path = r'C:\Users\nikhi\Desktop\HuMAR\datasets\RefHuman_Small\RefHuman_val2.json'
with open(val_path, 'r', encoding='utf-8') as f:
    val = json.load(f)

In [None]:
small_img_paths = os.listdir(img_dir)

In [None]:
# Collect the file names referenced by each split as sets for fast membership tests.
train_img_paths = {img['file_name'] for img in train['images']}
val_img_paths = {img['file_name'] for img in val['images']}

In [None]:
train_small_img_paths = []
val_small_img_paths = []

for i in small_img_paths:
    if i in train_img_paths:
        train_small_img_paths.append(i)
    elif i in val_img_paths:
        val_small_img_paths.append(i)

In [None]:
final_train_imgs = []
final_val_imgs = []
for i in train_small_img_paths:
    for j in train['images']:
        if i == j['file_name']:
            final_train_imgs.append(j)
            break
for i in val_small_img_paths:
    for j in val['images']:
        if i == j['file_name']:
            final_val_imgs.append(j)
            break
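
The nested loops above rescan the full image list for every file name, which gets slow on the full dataset. A dictionary keyed by `file_name` gives the same result in one pass. This is a sketch; `select_images` is a hypothetical helper name, and the demo records are made up.

```python
def select_images(image_records, wanted_names):
    """Return the records whose file_name is in wanted_names, in wanted order."""
    by_name = {}
    for record in image_records:
        # setdefault keeps the first record per file name, like the loop + break.
        by_name.setdefault(record["file_name"], record)
    return [by_name[name] for name in wanted_names if name in by_name]


demo_records = [
    {"file_name": "a.jpg", "id": 1},
    {"file_name": "b.jpg", "id": 2},
    {"file_name": "a.jpg", "id": 3},  # duplicate name; the first one wins
]
print(select_images(demo_records, ["b.jpg", "a.jpg", "c.jpg"]))
```

In the notebook this would be called as `select_images(train['images'], train_small_img_paths)` and likewise for the validation split.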

In [None]:
train['images'] = final_train_imgs
val['images'] = final_val_imgs
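
The cell above replaces only the `images` list. If the JSON follows the COCO layout, entries for removed images would remain in `annotations`. That layout is an assumption: the keys `annotations`, `image_id`, and `id` are not confirmed anywhere in this notebook. Under that assumption, a sketch for dropping the orphaned entries could look like this (`filter_annotations` is a hypothetical helper):

```python
def filter_annotations(dataset):
    """Return a copy of dataset without annotations for images that were removed."""
    kept_ids = {img["id"] for img in dataset["images"]}
    filtered = dict(dataset)  # shallow copy; the input dict is left unchanged
    filtered["annotations"] = [
        ann for ann in dataset.get("annotations", [])
        if ann["image_id"] in kept_ids
    ]
    return filtered


demo_dataset = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 2},  # image 2 is no longer listed
    ],
}
print(filter_annotations(demo_dataset)["annotations"])
```

Whether the data loader tolerates orphaned annotations depends on its implementation, so checking the annotation count before and after filtering is a reasonable first step.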

In [None]:
with open('datasets/RefHuman_Small/RefHuman_train.json', "w", encoding="utf-8") as _f:
    json.dump(train, _f, ensure_ascii=False, indent=2)

with open('datasets/RefHuman_Small/RefHuman_val.json', "w", encoding="utf-8") as _f:
    json.dump(val, _f, ensure_ascii=False, indent=2)

In [None]:
# Test with different backbones
# Available options:
# MobileViT: 'mobilevit_xxs', 'mobilevit_xs', 'mobilevit_s'
# SegFormer: 'segformer_mit_b0', 'segformer_mit_b1'
# EfficientFormer: 'efficientformerv2_s0', 'efficientformerv2_s1', 'efficientformer_l1'
# PoolFormer: 'poolformer_s12', 'poolformer_s24', 'poolformer_s36'
# Swin: 'swin_T_224_1k'

print("Current backbone:", args.backbone)
print("\nTo test a different backbone, edit the UniPHDArgs class and set:")
print("self.backbone = 'your_chosen_backbone'")
print("\nThen restart kernel and rebuild the model.")

## Testing Fixed Backbones

After restarting the kernel, test a different backbone by changing `self.backbone` in the `UniPHDArgs` cell (the first code cell), then re-run the cells from the top.