PiT

  • Paper: Rethinking Spatial Dimensions of Vision Transformers

  • Official project: naver-ai/pit

  • Model code: pit.py

  • Validation set preprocessing:

    # Image backend: pil
    # Input image size: 224x224
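    import paddle.vision.transforms as T  # assumption: T is Paddle's transforms module (string-valued interpolation)
    transforms = T.Compose([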
        T.Resize(248, interpolation='bicubic'),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
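
To make the geometry of the pipeline above concrete, here is a minimal standard-library sketch of what the resize, center-crop, and normalize steps compute. It assumes that `Resize(248)` with an integer argument scales the shorter side to 248 while preserving the aspect ratio and truncates the longer side to an integer; the exact rounding may differ between library versions. The helper names `resize_short_side`, `center_crop_box`, and `normalize_pixel` are hypothetical and exist only for illustration.

```python
def resize_short_side(width, height, target=248):
    """Scale so the shorter side equals `target`, keeping the aspect ratio.

    Assumption: the longer side is truncated to an integer.
    """
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale 8-bit values to [0, 1], then apply per-channel normalization."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


# A 500x375 image becomes 330x248 after the resize step,
# and the 224x224 crop window is then taken from its center.
resized = resize_short_side(500, 375)
box = center_crop_box(*resized)
print(resized, box)
```

The crop keeps 224/248, roughly 90%, of the resized shorter side, so a thin border of the image is discarded before the model sees it.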
  • Model details:

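    | Model            | Model Name       | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
    |:-----------------|:-----------------|-----------:|----------:|----------:|----------:|:-----------------|
    | PiT-Ti           | pit_ti           | 4.9        | 0.7       | 72.91     | 91.40     | Download         |
    | PiT-XS           | pit_xs           | 10.6       | 1.4       | 78.18     | 94.16     | Download         |
    | PiT-S            | pit_s            | 23.5       | 2.9       | 81.08     | 95.33     | Download         |
    | PiT-B            | pit_b            | 73.8       | 12.5      | 82.44     | 95.71     | Download         |
    | PiT-Ti distilled | pit_ti_distilled | 4.9        | 0.7       | 74.54     | 92.10     | Download         |
    | PiT-XS distilled | pit_xs_distilled | 10.6       | 1.4       | 79.31     | 94.36     | Download         |
    | PiT-S distilled  | pit_s_distilled  | 23.5       | 2.9       | 81.99     | 95.79     | Download         |
    | PiT-B distilled  | pit_b_distilled  | 73.8       | 12.5      | 84.14     | 96.86     | Download         |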
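
As a quick way to compare the variants, the sketch below copies the numbers from the table above into a dictionary and derives two things: the Top-1 gain of each distilled model over its base model, and the most accurate model under a parameter budget. The names `PIT_MODELS`, `distillation_gain`, and `best_under_budget` are hypothetical helpers for illustration and are not part of the repository.

```python
# (params in M, FLOPs in G, top-1 %, top-5 %), copied from the table above
PIT_MODELS = {
    "pit_ti": (4.9, 0.7, 72.91, 91.40),
    "pit_xs": (10.6, 1.4, 78.18, 94.16),
    "pit_s": (23.5, 2.9, 81.08, 95.33),
    "pit_b": (73.8, 12.5, 82.44, 95.71),
    "pit_ti_distilled": (4.9, 0.7, 74.54, 92.10),
    "pit_xs_distilled": (10.6, 1.4, 79.31, 94.36),
    "pit_s_distilled": (23.5, 2.9, 81.99, 95.79),
    "pit_b_distilled": (73.8, 12.5, 84.14, 96.86),
}


def distillation_gain(name):
    """Top-1 improvement (percentage points) of the distilled variant."""
    base = PIT_MODELS[name][2]
    distilled = PIT_MODELS[name + "_distilled"][2]
    return round(distilled - base, 2)


def best_under_budget(max_params_m):
    """Name of the model with the highest top-1 within a parameter budget."""
    candidates = [
        (stats[2], name)
        for name, stats in PIT_MODELS.items()
        if stats[0] <= max_params_m
    ]
    return max(candidates)[1] if candidates else None


for base_name in ("pit_ti", "pit_xs", "pit_s", "pit_b"):
    print(base_name, distillation_gain(base_name))
print(best_under_budget(25))
```

Because the distilled variants list the same parameter and FLOP counts as their base models, the gain shown here comes at no extra inference cost according to the table.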
  • Citation:

    @article{heo2021pit,
        title={Rethinking Spatial Dimensions of Vision Transformers},
        author={Byeongho Heo and Sangdoo Yun and Dongyoon Han and Sanghyuk Chun and Junsuk Choe and Seong Joon Oh},
        journal={arXiv: 2103.16302},
        year={2021},
    }