
Request for guidance on semantic segmentation #7

Open
jameslahm opened this issue Jun 23, 2024 · 1 comment

Comments


jameslahm commented Jun 23, 2024

Thanks for your great work! I tried to apply STViT-R-Swin-S to semantic segmentation following Sec. 6.3 of the paper. I used the pretrained STViT-R-Swin-S checkpoint from #5 (82.43% Top-1 accuracy), took the https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation code, and replaced the configs/swin/upernet_swin_small_patch4_window7_512x512_160k_ade20k.py file with the following:

_base_ = [
    '../_base_/models/upernet_swin.py', '../_base_/datasets/ade20k.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]
model = dict(
    backbone=dict(
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        ape=False,
        drop_path_rate=0.3,
        patch_norm=True,
        use_checkpoint=False,
        window_sample_size=3, 
        k_window_size_1=14,
        k_window_size_2=21, 
        restore_k_window_size=27,
        multi_scale='multi_scale_semantic_token1', 
        relative_pos=False, 
        # use_conv_pos=False, 
        # use_layer_scale=False, 
        pad_mask=True
    ),
    decode_head=dict(
        in_channels=[96, 192, 384, 768],
        num_classes=150
    ),
    auxiliary_head=dict(
        in_channels=384,
        num_classes=150
    ))

# AdamW optimizer, no weight decay for position embedding & layer norm in backbone
optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01,
                 paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
                                                 'relative_position_bias_table': dict(decay_mult=0.),
                                                 'norm': dict(decay_mult=0.)}))

lr_config = dict(_delete_=True, policy='poly',
                 warmup='linear',
                 warmup_iters=1500,
                 warmup_ratio=1e-6,
                 power=1.0, min_lr=0.0, by_epoch=False)

# By default, models are trained on 8 GPUs with 2 images per GPU
data = dict(samples_per_gpu=2)

# Training for 240k steps
runner = dict(type='IterBasedRunner', max_iters=240000)
checkpoint_config = dict(by_epoch=False, interval=24000)
evaluation = dict(interval=24000, metric='mIoU')
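For context on the evaluation setting below: `--aug-test` in mmsegmentation's tools/test.py enables multi-scale and flip test-time augmentation by overriding the test pipeline. A sketch of the equivalent config fragment (the exact ratios are an assumption based on the stock script, not something confirmed by this repo):

```python
# Config fragment (assumption): with --aug-test, tools/test.py overrides the
# MultiScaleFlipAug step of the test pipeline roughly like this:
#   cfg.data.test.pipeline[1].img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
#   cfg.data.test.pipeline[1].flip = True
test_pipeline_override = dict(
    img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],  # multi-scale ratios
    flip=True,                                      # horizontal-flip TTA
)
```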

I copied the backbone file from https://github.com/changsn/STViT-R-Object-Detection/blob/main/mmdet/models/backbones/swin_transformer.py and only changed line 18 to `from mmseg.utils import get_root_logger`. However, I only obtain 46.36 mIoU with `--aug-test`, which leaves a gap to the 48.3 mIoU reported in Table 12 of the paper. Could you please give me some guidance on how to reproduce the result correctly? Thanks a lot; I'd appreciate it very much.
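One sanity check worth running when chasing a gap like this is whether the classification checkpoint actually loads into the segmentation backbone without key mismatches (a silent partial load is a common cause). A minimal sketch, assuming the checkpoint nests its weights under `'model'` or `'state_dict'`; the helper name is hypothetical:

```python
import torch

def check_pretrained(backbone: torch.nn.Module, ckpt_path: str):
    """Load a checkpoint into a backbone non-strictly and report key
    mismatches. (Hypothetical helper; the checkpoint layout — weights
    nested under 'model' or 'state_dict' — is an assumption.)"""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # Classification checkpoints usually nest weights under one of these keys.
    state = ckpt.get('model', ckpt.get('state_dict', ckpt))
    result = backbone.load_state_dict(state, strict=False)
    # Non-empty lists here mean part of the pretrained weights was dropped.
    return result.missing_keys, result.unexpected_keys
```

If `missing_keys` contains most backbone parameters, the model is effectively training from scratch, which would explain a large mIoU gap.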

changsn (Owner) commented Jun 24, 2024

I completed this work during my internship at Alibaba, and when I resigned I only took a portion of the code with me, covering classification and object detection. The semantic segmentation results were not strong, which is why we only showed them in the supplementary material. We tuned many parameters, but I'm afraid I no longer recall them. I am sorry about that.
