- Official project: naver-ai/pit
- Model code: pit.py
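- Validation set preprocessing:

      # Assumption: T refers to paddle.vision.transforms, whose Resize
      # accepts the interpolation mode as a string.
      import paddle.vision.transforms as T

      # Image backend: pil
      # Input image size: 224x224
      transforms = T.Compose([
          T.Resize(248, interpolation='bicubic'),
          T.CenterCrop(224),
          T.ToTensor(),
          T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      ])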
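The crop and normalization steps of the pipeline above can be sketched in plain NumPy to make the arithmetic explicit. This is only an illustration, not the library's implementation: the helper names `center_crop` and `to_normalized_chw` are hypothetical, and the sketch assumes the image has already been resized (bicubic) so that its shorter side is 248 pixels.

```python
import numpy as np

# Per-channel statistics taken from the Normalize step above
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop(img, size=224):
    """Crop a size x size window from the center of an HWC array."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def to_normalized_chw(img_uint8):
    """Scale to [0, 1], move channels first, then standardize per channel."""
    x = img_uint8.astype(np.float32) / 255.0
    x = x.transpose(2, 0, 1)  # HWC -> CHW
    return (x - MEAN[:, None, None]) / STD[:, None, None]


# Dummy all-white image standing in for one already resized to 248 x 248
dummy = np.full((248, 248, 3), 255, dtype=np.uint8)
out = to_normalized_chw(center_crop(dummy))
print(out.shape)
```

The 248 to 224 crop keeps roughly the central 90% of the shorter side (224 / 248 ≈ 0.903).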
- Model details:

  | Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
  |:---|:---|:---:|:---:|:---:|:---:|:---:|
  | PiT-Ti | pit_ti | 4.9 | 0.7 | 72.91 | 91.40 | Download |
  | PiT-XS | pit_xs | 10.6 | 1.4 | 78.18 | 94.16 | Download |
  | PiT-S | pit_s | 23.5 | 2.9 | 81.08 | 95.33 | Download |
  | PiT-B | pit_b | 73.8 | 12.5 | 82.44 | 95.71 | Download |
  | PiT-Ti distilled | pit_ti_distilled | 4.9 | 0.7 | 74.54 | 92.10 | Download |
  | PiT-XS distilled | pit_xs_distilled | 10.6 | 1.4 | 79.31 | 94.36 | Download |
  | PiT-S distilled | pit_s_distilled | 23.5 | 2.9 | 81.99 | 95.79 | Download |
  | PiT-B distilled | pit_b_distilled | 73.8 | 12.5 | 84.14 | 96.86 | Download |
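As a quick way to read the table above, the following sketch computes the Top-1 gain of each distilled variant over its baseline. The accuracy values are copied from the table; the dictionary layout and variable names are illustrative only.

```python
# Top-1 accuracy (%) from the table above: (baseline, distilled)
top1 = {
    "pit_ti": (72.91, 74.54),
    "pit_xs": (78.18, 79.31),
    "pit_s": (81.08, 81.99),
    "pit_b": (82.44, 84.14),
}

# Gain in percentage points from distillation, rounded to two decimals
gains = {name: round(distilled - base, 2) for name, (base, distilled) in top1.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

Parameter counts and FLOPs are identical for each baseline and distilled pair in the table, so these gains come at no extra inference cost.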
- Citation:

      @article{heo2021pit,
        title={Rethinking Spatial Dimensions of Vision Transformers},
        author={Byeongho Heo and Sangdoo Yun and Dongyoon Han and Sanghyuk Chun and Junsuk Choe and Seong Joon Oh},
        journal={arXiv: 2103.16302},
        year={2021},
      }