
Unable to reproduce Market1501 results with SBS(R50-ibn) #439

Closed
sijun-zhou opened this issue Mar 22, 2021 · 14 comments

sijun-zhou commented Mar 22, 2021

My environment:
Python 3.6
PyTorch 1.2.0
CUDA 10.0.130
apex 0.1
GPUs: 2 x RTX 2080 Ti
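Reproduction reports are easier to compare when these versions are collected programmatically rather than typed by hand. A minimal sketch, assuming only standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()`); `describe_env` is a hypothetical helper name, and it does not report apex. PyTorch's own `python -m torch.utils.collect_env` produces a fuller report.

```python
import platform
import sys


def describe_env():
    """Collect the library versions usually asked for in reproduction reports."""
    info = {"python": platform.python_version(), "platform": sys.platform}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info
    info["torch"] = torch.__version__
    info["cuda_runtime"] = torch.version.cuda  # None for CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpus"] = [torch.cuda.get_device_name(i)
                        for i in range(torch.cuda.device_count())]
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```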

I trained on the Market1501 dataset with two RTX 2080 Ti GPUs, using all default settings of sbs_R50-ibn.yml, but I cannot reproduce the published results.
My best results, for the highest top-1 (92.64%) and the highest mAP (78.78%) respectively, are shown below. Both are far below the 95.7% top-1 and 89.3% mAP listed in the model zoo:

##################################### top1 ################################################
[03/22 10:37:55 fastreid.utils.events]: eta: 0:00:27 epoch/iter: 59/11999 total_loss: 12.79 loss_cls: 12.79 loss_triplet: 9.157e-05 time: 0.2132 data_time: 0.0007 lr: 7.00e-07 max_mem: 9573M
[03/22 10:38:22 fastreid.utils.events]: eta: 0:00:00 epoch/iter: 59/12119 total_loss: 12.79 loss_cls: 12.79 loss_triplet: 7.518e-05 time: 0.2134 data_time: 0.0009 lr: 7.00e-07 max_mem: 9573M
[03/22 10:38:23 fastreid.engine.defaults]: Prepare testing set
[03/22 10:38:23 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:
| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |
[03/22 10:38:23 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 10:38:30 fastreid.evaluation.evaluator]: Inference done 11/151. 0.1033 s / batch. ETA=0:00:14
[03/22 10:38:45 fastreid.evaluation.evaluator]: Total inference time: 0:00:15.542858 (0.106458 s / batch per device, on 2 devices)
[03/22 10:38:45 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:15 (0.103480 s / batch per device, on 2 devices)
[03/22 10:40:17 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 10:40:17 fastreid.evaluation.testing]: Evaluation results in csv format:
| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:-----------|:---------|:---------|:----------|:------|:-------|:---------|
| Market1501 | 92.64 | 97.06 | 98.28 | 78.01 | 42.65 | 85.32 |
###########################################################################################

##################################### map ################################################
[03/22 10:27:15 fastreid.utils.events]: eta: 0:08:44 epoch/iter: 48/9799 total_loss: 14.32 loss_cls: 14.32 loss_triplet: 0.001028 time: 0.2104 data_time: 0.0009 lr: 1.04e-04 max_mem: 9573M
[03/22 10:27:38 fastreid.utils.events]: eta: 0:08:22 epoch/iter: 48/9897 total_loss: 14.46 loss_cls: 14.45 loss_triplet: 0.0009101 time: 0.2105 data_time: 0.0006 lr: 1.04e-04 max_mem: 9573M
[03/22 10:28:01 fastreid.utils.events]: eta: 0:07:59 epoch/iter: 49/9999 total_loss: 14.38 loss_cls: 14.38 loss_triplet: 0.001046 time: 0.2107 data_time: 0.0009 lr: 8.80e-05 max_mem: 9573M
[03/22 10:28:24 fastreid.engine.defaults]: Prepare testing set
[03/22 10:28:24 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:
| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |
[03/22 10:28:24 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 10:28:32 fastreid.evaluation.evaluator]: Inference done 11/151. 0.1015 s / batch. ETA=0:00:14
[03/22 10:28:47 fastreid.evaluation.evaluator]: Total inference time: 0:00:15.644438 (0.107154 s / batch per device, on 2 devices)
[03/22 10:28:47 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:15 (0.104161 s / batch per device, on 2 devices)
[03/22 10:30:45 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 10:30:45 fastreid.evaluation.testing]: Evaluation results in csv format:
| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:-----------|:---------|:---------|:----------|:------|:-------|:---------|
| Market1501 | 92.49 | 97.39 | 98.25 | 78.78 | 44.38 | 85.64 |
###########################################################################################
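For reference, the Rank-k, mAP and mINP columns follow the usual Market-1501 protocol, in which gallery images of the query's identity taken by the same camera are ignored. The `metric` column appears to be the mean of Rank-1 and mAP: (92.64 + 78.01) / 2 = 85.325, shown as 85.32, and (92.49 + 78.78) / 2 = 85.635, shown as 85.64. A pure-Python sketch of these metrics under those assumptions; `evaluate_market1501` and `combined_metric` are hypothetical names rather than fastreid's API, and junk-image filtering is omitted.

```python
def evaluate_market1501(dist, q_pids, g_pids, q_camids, g_camids, max_rank=10):
    """Return (cmc, mAP, mINP) as fractions in [0, 1].

    dist[i][j] is the distance between query i and gallery image j
    (smaller means more similar).
    """
    cmc_sum = [0.0] * max_rank
    all_ap, all_inp = [], []
    num_valid = 0
    for qi, row in enumerate(dist):
        # Rank the gallery by ascending distance to this query.
        order = sorted(range(len(g_pids)), key=lambda gi: row[gi])
        # Ignore gallery images of the same identity from the same camera.
        keep = [gi for gi in order
                if not (g_pids[gi] == q_pids[qi] and g_camids[gi] == q_camids[qi])]
        matches = [1 if g_pids[gi] == q_pids[qi] else 0 for gi in keep]
        num_rel = sum(matches)
        if num_rel == 0:
            continue  # no valid ground truth for this query
        num_valid += 1

        # CMC: a first hit at rank k counts for every rank >= k.
        first_hit = matches.index(1)
        for k in range(first_hit, max_rank):
            cmc_sum[k] += 1

        # AP: average of the precision values at each correct match.
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        all_ap.append(sum(precisions) / num_rel)

        # INP: number of correct matches / rank of the hardest (last) one.
        last_hit = len(matches) - 1 - matches[::-1].index(1)
        all_inp.append(num_rel / (last_hit + 1))

    if num_valid == 0:
        raise ValueError("no query has a valid match in the gallery")
    cmc = [c / num_valid for c in cmc_sum]
    return cmc, sum(all_ap) / num_valid, sum(all_inp) / num_valid


def combined_metric(rank1, mAP):
    # Inferred from the logged tables: metric == (Rank-1 + mAP) / 2.
    return (rank1 + mAP) / 2
```

Assuming a cosine distance such as 1 - cosine similarity (TEST.METRIC: cosine), the returned values correspond to the Rank-k, mAP and mINP columns scaled by 100.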


sijun-zhou commented Mar 22, 2021

Below is my training config:
(fastreid) root@sj_docker1_117:/home/wesine/data_8tb_3/sj/work/reid/fast-reid $ cd /home/wesine/data_8tb_3/sj/work/reid/fast-reid ; env PYTHONIOENCODING=UTF-8 PYTHONUNBUFFERED=1 /root/anaconda3/envs/fastreid/bin/python /root/.vscode-server/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py --default --nodebug --client --host localhost --port 43535 /home/wesine/data_8tb_3/sj/work/reid/fast-reid/tools/train_net.py --config-file ./configs/Market1501/sbs_R50-ibn.yml --num-gpus 2
Command Line Args: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/22 09:47:02 fastreid]: Rank of current process: 0. World size: 2
[03/22 09:47:03 fastreid]: Environment info:


sys.platform linux
Python 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy 1.19.5
fastreid 1.0.0 @/home/wesine/data_8tb_3/sj/work/reid/fast-reid/fastreid
FASTREID_ENV_MODULE
PyTorch 1.2.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 GeForce RTX 2080 Ti
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.4.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.5.1


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.18.1 (Git Hash 7de7e5d02bf687f971e7668963649728356e0c20)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.6.2
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[03/22 09:47:03 fastreid]: Command line arguments: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/22 09:47:03 fastreid]: Contents of args.config_file=./configs/Market1501/sbs_R50-ibn.yml:
_BASE_: ../Base-SBS.yml

MODEL:
  BACKBONE:
    WITH_IBN: True

DATASETS:
  NAMES: ("Market1501",)
  TESTS: ("Market1501",)

OUTPUT_DIR: logs/market1501/sbs_R50-ibn

[03/22 09:47:03 fastreid]: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
  NAIVE_WAY: True
  NUM_INSTANCE: 16
  NUM_WORKERS: 8
  PK_SAMPLER: True
DATASETS:
  COMBINEALL: False
  NAMES: ('Market1501',)
  TESTS: ('Market1501',)
INPUT:
  AUGMIX_PROB: 0.0
  AUTOAUG_PROB: 0.1
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  DO_AFFINE: False
  DO_AUGMIX: False
  DO_AUTOAUG: True
  DO_FLIP: True
  DO_PAD: True
  FLIP_PROB: 0.5
  PADDING: 10
  PADDING_MODE: constant
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
KD:
  MODEL_CONFIG: ['']
  MODEL_WEIGHTS: ['']
MODEL:
  BACKBONE:
    DEPTH: 50x
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    WITH_IBN: True
    WITH_NL: True
    WITH_SE: False
  DEVICE: cuda
  FREEZE_LAYERS: ['backbone']
  HEADS:
    CLS_LAYER: circleSoftmax
    EMBEDDING_DIM: 0
    MARGIN: 0.35
    NAME: EmbeddingHead
    NECK_FEAT: after
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: gempoolP
    SCALE: 64
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: True
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS:
OUTPUT_DIR: logs/market1501/sbs_R50-ibn
SOLVER:
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 20
  DELAY_EPOCHS: 30
  ETA_MIN_LR: 7e-07
  FP16_ENABLED: False
  FREEZE_FC_ITERS: 0
  FREEZE_ITERS: 1000
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 64
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: True
  OPT: Adam
  SCHED: CosineAnnealingLR
  STEPS: [40, 90]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 10
  FLIP_ENABLED: False
  IMS_PER_BATCH: 128
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC_ENABLED: False
[03/22 09:47:03 fastreid]: Full config saved to /home/wesine/data_8tb_3/sj/work/reid/fast-reid/logs/market1501/sbs_R50-ibn/config.yaml
[03/22 09:47:03 fastreid.utils.env]: Using a generated random seed 3342157
[03/22 09:47:03 fastreid.engine.defaults]: Prepare training set
[03/22 09:47:03 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:
| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| train | 751 | 12936 | 6 |
[03/22 09:47:03 fastreid.engine.defaults]: Auto-scaling the num_classes=751
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Some model parameters or buffers are not found in the checkpoint:
NL_2.0.g.{weight, bias}
NL_2.0.W.0.{weight, bias}
NL_2.0.W.1.{weight, bias, running_mean, running_var}
NL_2.0.theta.{weight, bias}
NL_2.0.phi.{weight, bias}
NL_2.1.g.{weight, bias}
NL_2.1.W.0.{weight, bias}
NL_2.1.W.1.{weight, bias, running_mean, running_var}
NL_2.1.theta.{weight, bias}
NL_2.1.phi.{weight, bias}
NL_3.0.g.{weight, bias}
NL_3.0.W.0.{weight, bias}
NL_3.0.W.1.{weight, bias, running_mean, running_var}
NL_3.0.theta.{weight, bias}
NL_3.0.phi.{weight, bias}
NL_3.1.g.{weight, bias}
NL_3.1.W.0.{weight, bias}
NL_3.1.W.1.{weight, bias, running_mean, running_var}
NL_3.1.theta.{weight, bias}
NL_3.1.phi.{weight, bias}
NL_3.2.g.{weight, bias}
NL_3.2.W.0.{weight, bias}
NL_3.2.W.1.{weight, bias, running_mean, running_var}
NL_3.2.theta.{weight, bias}
NL_3.2.phi.{weight, bias}
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
fc.{weight, bias}
[03/22 09:47:04 fastreid.engine.defaults]: Model:
Baseline(
  (backbone): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
        (downsample): Sequential(
          (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (2): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
    )
    (layer2): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (2): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (3): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
    )
    (layer3): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
        (downsample): Sequential(
          (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (2): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (3): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (4): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (5): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): IBN(
          (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
    )
    (layer4): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
        (downsample): Sequential(
          (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
      (2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (se): Identity()
      )
    )
    (NL_1): ModuleList()
    (NL_2): ModuleList(
      (0): Non_local(
        (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
        (W): Sequential(
          (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1))
          (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
        (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
      )
      (1): Non_local(
        (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
        (W): Sequential(
          (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1))
          (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
        (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (NL_3): ModuleList(
      (0): Non_local(
        (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
        (W): Sequential(
          (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1))
          (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
        (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
      )
      (1): Non_local(
        (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
        (W): Sequential(
          (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1))
          (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
        (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
      )
      (2): Non_local(
        (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
        (W): Sequential(
          (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1))
          (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
        (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (NL_4): ModuleList()
  )
  (heads): EmbeddingHead(
    (pool_layer): GeneralizedMeanPoolingP(Parameter containing:
    tensor([3.], device='cuda:0', requires_grad=True), output_size=1)
    (bottleneck): Sequential(
      (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (classifier): CircleSoftmax(in_features=2048, num_classes=751, scale=64, margin=0.35)
  )
)
[03/22 09:47:14 fastreid.utils.checkpoint]: No checkpoint found. Training model from scratch
[03/22 09:47:14 fastreid.engine.train_loop]: Starting training from epoch 0
[03/22 09:47:14 fastreid.engine.hooks]: Freeze layer group "backbone" training for 1000 iterations
[03/22 09:47:30 fastreid.utils.events]: eta: 0:12:59 epoch/iter: 0/199 total_loss: 64.82 loss_cls: 50.34 loss_triplet: 14.5 time: 0.0668 data_time: 0.0010 lr: 6.63e-05 max_mem: 9426M
[03/22 09:47:30 fastreid.utils.events]: eta: 0:12:57 epoch/iter: 0/201 total_loss: 64.8 loss_cls: 50.34 loss_triplet: 14.47 time: 0.0667 data_time: 0.0011 lr: 6.67e-05 max_mem: 9426M
[03/22 09:47:44 fastreid.utils.events]: eta: 0:13:04 epoch/iter: 1/399 total_loss: 64.29 loss_cls: 49.86 loss_triplet: 14.4 time: 0.0681 data_time: 0.0007 lr: 9.78e-05 max_mem: 9426M
[03/22 09:47:44 fastreid.utils.events]: eta: 0:13:02 epoch/iter: 1/403 total_loss: 64.29 loss_cls: 49.84 loss_triplet: 14.4 time: 0.0681 data_time: 0.0008 lr: 9.85e-05 max_mem: 9426M
[03/22 09:47:58 fastreid.utils.events]: eta: 0:12:50 epoch/iter: 2/599 total_loss: 63.29 loss_cls: 48.93 loss_triplet: 14.38 time: 0.0681 data_time: 0.0008 lr: 1.29e-04 max_mem: 9426M
[03/22 09:47:58 fastreid.utils.events]: eta: 0:12:48 epoch/iter: 2/605 total_loss: 63.28 loss_cls: 48.91 loss_triplet: 14.42 time: 0.0681 data_time: 0.0007 lr: 1.30e-04 max_mem: 9426M
[03/22 09:48:12 fastreid.utils.events]: eta: 0:12:38 epoch/iter: 3/799 total_loss: 61.76 loss_cls: 47.67 loss_triplet: 14.23 time: 0.0683 data_time: 0.0007 lr: 1.61e-04 max_mem: 9426M
[03/22 09:48:12 fastreid.utils.events]: eta: 0:12:38 epoch/iter: 3/807 total_loss: 61.73 loss_cls: 47.65 loss_triplet: 14.23 time: 0.0684 data_time: 0.0008 lr: 1.62e-04 max_mem: 9426M
[03/22 09:48:25 fastreid.utils.events]: eta: 0:12:21 epoch/iter: 4/999 total_loss: 60.35 loss_cls: 46.21 loss_triplet: 14 time: 0.0682 data_time: 0.0009 lr: 1.92e-04 max_mem: 9426M
[03/22 09:48:25 fastreid.engine.hooks]: Open layer group "backbone" training
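The `lr` column in these logs can be checked against the SOLVER settings. A minimal sketch, assuming a linear per-iteration warmup (WARMUP_FACTOR, WARMUP_ITERS) multiplied onto a per-epoch schedule that stays at BASE_LR for DELAY_EPOCHS and then follows cosine annealing down to ETA_MIN_LR by MAX_EPOCH. This is not fastreid's scheduler code: `lr_at` is a hypothetical helper, and the one-epoch offset in the cosine step count was chosen because it matches the logged values.

```python
import math


def lr_at(epoch, it, base_lr=3.5e-4, warmup_factor=0.1, warmup_iters=2000,
          delay_epochs=30, max_epoch=60, eta_min=7e-7):
    """Learning rate during a given epoch and global iteration.

    Defaults mirror the SOLVER section of the config dump above.
    """
    # Epoch-level schedule: flat for DELAY_EPOCHS, then cosine decay to ETA_MIN_LR.
    if epoch < delay_epochs:
        lr = base_lr
    else:
        t = epoch - delay_epochs + 1  # cosine steps taken so far (offset inferred from the logs)
        total = max_epoch - delay_epochs
        lr = eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / total)) / 2
    # Iteration-level linear warmup scales the epoch-level value.
    if it < warmup_iters:
        alpha = it / warmup_iters
        lr *= warmup_factor * (1 - alpha) + alpha
    return lr


if __name__ == "__main__":
    # (epoch, iteration) pairs taken from the training logs in this issue.
    for epoch, it in [(0, 199), (1, 399), (4, 999), (48, 9799), (49, 9999), (59, 12119)]:
        print(f"epoch {epoch:2d} iter {it:5d}: lr {lr_at(epoch, it):.2e}")
```

For these pairs it prints 6.63e-05, 9.78e-05, 1.92e-04, 1.04e-04, 8.80e-05 and 7.00e-07, matching the lr values logged at the same epoch/iteration.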


sijun-zhou commented Mar 22, 2021

Inference accuracy is also far lower than the accuracy posted in the model zoo.
Can anyone help me sort this out? Thanks in advance!

(fastreid) root@sj_docker1_117:/home/wesine/data_8tb_3/sj/work/reid/fast-reid $ cd /home/wesine/data_8tb_3/sj/work/reid/fast-reid ; env PYTHONIOENCODING=UTF-8 PYTHONUNBUFFERED=1 /root/anaconda3/envs/fastreid/bin/python /root/.vscode-server/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py --default --nodebug --client --host localhost --port 41755 /home/wesine/data_8tb_3/sj/work/reid/fast-reid/tools/train_net.py --config-file ./configs/Market1501/sbs_R50-ibn.yml --eval-only MODEL.WEIGHTS logs/market1501/sbs_R50-ibn/model_best.pth MODEL.DEVICE cuda:0
Command Line Args: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'logs/market1501/sbs_R50-ibn/model_best.pth', 'MODEL.DEVICE', 'cuda:0'], resume=False)
[03/22 11:49:03 fastreid]: Rank of current process: 0. World size: 1
[03/22 11:49:04 fastreid]: Environment info:


sys.platform linux
Python 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy 1.19.5
fastreid 1.0.0 @/home/wesine/data_8tb_3/sj/work/reid/fast-reid/fastreid
FASTREID_ENV_MODULE
PyTorch 1.2.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 GeForce RTX 2080 Ti
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.4.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.5.1


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.18.1 (Git Hash 7de7e5d02bf687f971e7668963649728356e0c20)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.6.2
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[03/22 11:49:04 fastreid]: Command line arguments: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'logs/market1501/sbs_R50-ibn/model_best.pth', 'MODEL.DEVICE', 'cuda:0'], resume=False)
[03/22 11:49:04 fastreid]: Contents of args.config_file=./configs/Market1501/sbs_R50-ibn.yml:
_BASE_: ../Base-SBS.yml

MODEL:
  BACKBONE:
    WITH_IBN: True

DATASETS:
  NAMES: ("Market1501",)
  TESTS: ("Market1501",)

OUTPUT_DIR: logs/market1501/sbs_R50-ibn

[03/22 11:49:04 fastreid]: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
  NAIVE_WAY: True
  NUM_INSTANCE: 16
  NUM_WORKERS: 8
  PK_SAMPLER: True
DATASETS:
  COMBINEALL: False
  NAMES: ('Market1501',)
  TESTS: ('Market1501',)
INPUT:
  AUGMIX_PROB: 0.0
  AUTOAUG_PROB: 0.1
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  DO_AFFINE: False
  DO_AUGMIX: False
  DO_AUTOAUG: True
  DO_FLIP: True
  DO_PAD: True
  FLIP_PROB: 0.5
  PADDING: 10
  PADDING_MODE: constant
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
KD:
  MODEL_CONFIG: ['']
  MODEL_WEIGHTS: ['']
MODEL:
  BACKBONE:
    DEPTH: 50x
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    WITH_IBN: True
    WITH_NL: True
    WITH_SE: False
  DEVICE: cuda:0
  FREEZE_LAYERS: ['backbone']
  HEADS:
    CLS_LAYER: circleSoftmax
    EMBEDDING_DIM: 0
    MARGIN: 0.35
    NAME: EmbeddingHead
    NECK_FEAT: after
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: gempoolP
    SCALE: 64
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: True
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS: logs/market1501/sbs_R50-ibn/model_best.pth
OUTPUT_DIR: logs/market1501/sbs_R50-ibn
SOLVER:
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 20
  DELAY_EPOCHS: 30
  ETA_MIN_LR: 7e-07
  FP16_ENABLED: False
  FREEZE_FC_ITERS: 0
  FREEZE_ITERS: 1000
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 64
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: True
  OPT: Adam
  SCHED: CosineAnnealingLR
  STEPS: [40, 90]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 10
  FLIP_ENABLED: False
  IMS_PER_BATCH: 128
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC_ENABLED: False
[03/22 11:49:04 fastreid]: Full config saved to /home/wesine/data_8tb_3/sj/work/reid/fast-reid/logs/market1501/sbs_R50-ibn/config.yaml
[03/22 11:49:04 fastreid.utils.env]: Using a generated random seed 4471883
[03/22 11:49:08 fastreid.engine.defaults]: Model:
Baseline(
(backbone): ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): IBN(
(IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(se): Identity()
)
)
(NL_1): ModuleList()
(NL_2): ModuleList(
(0): Non_local(
(g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
(W): Sequential(
(0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
(phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
)
(1): Non_local(
(g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
(W): Sequential(
(0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
(phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1))
)
)
(NL_3): ModuleList(
(0): Non_local(
(g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
(W): Sequential(
(0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
(phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
)
(1): Non_local(
(g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
(W): Sequential(
(0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
(phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
)
(2): Non_local(
(g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
(W): Sequential(
(0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
(phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1))
)
)
(NL_4): ModuleList()
)
(heads): EmbeddingHead(
(pool_layer): GeneralizedMeanPoolingP(Parameter containing:
tensor([3.], device='cuda:0', requires_grad=True), output_size=1)
(bottleneck): Sequential(
(0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(classifier): CircleSoftmax(in_features=2048, num_classes=0, scale=64, margin=0.35)
)
)
[03/22 11:49:08 fastreid.utils.checkpoint]: Loading checkpoint from logs/market1501/sbs_R50-ibn/model_best.pth
WARNING [03/22 11:49:09 fastreid.utils.checkpoint]: Skip loading parameter 'heads.classifier.weight' to the model due to incompatible shapes: (751, 2048) in the checkpoint but (0, 2048) in the model! You might want to double check if this is expected.
[03/22 11:49:09 fastreid.utils.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
heads.classifier.weight
[03/22 11:49:09 fastreid.engine.defaults]: Prepare testing set
[03/22 11:49:09 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:
| subset   | # ids   | # images   | # cameras   |
|:---------|:--------|:-----------|:------------|
| query    | 750     | 3368       | 6           |
| gallery  | 751     | 15913      | 6           |
[03/22 11:49:09 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 11:49:12 fastreid.evaluation.evaluator]: Inference done 11/151. 0.2056 s / batch. ETA=0:00:28
[03/22 11:49:41 fastreid.evaluation.evaluator]: Total inference time: 0:00:30.367810 (0.207999 s / batch per device, on 1 devices)
[03/22 11:49:41 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:30 (0.205810 s / batch per device, on 1 devices)
[03/22 11:49:47 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 11:49:47 fastreid.evaluation.testing]: Evaluation results in csv format:
| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:-----------|:---------|:---------|:----------|:------|:-------|:---------|
| Market1501 | 93.94 | 97.57 | 98.28 | 81.89 | 48.57 | 87.91 |
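The last column of this table is labeled only `metric`. Judging purely from the numbers logged here (an inference, not something confirmed in this thread), it is consistent with the mean of Rank-1 and mAP: (93.94 + 81.89) / 2 = 87.915, which matches the logged 87.91 up to rounding of the inputs. A minimal sketch of that assumed formula:

```python
def combined_metric(rank1: float, mean_ap: float) -> float:
    """Mean of Rank-1 accuracy and mAP, both given in percent.

    Assumption: this is how the `metric` column is derived; it is inferred
    from the logged numbers, not taken from fastreid's source code.
    """
    return (rank1 + mean_ap) / 2.0


# Rank-1 and mAP from the single-GPU evaluation table above.
print(f"{combined_metric(93.94, 81.89):.3f}")
```

Because the logged Rank-1 and mAP are themselves rounded to two decimals, the recomputed value can differ from the logged `metric` in the last digit.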

@sijun-zhou
Copy link
Author

@L1aoXingyu
Hi L1aoXingyu, if you have time, could you please take a look at this problem?
Thanks in advance!

@gmt710

gmt710 commented Mar 22, 2021

Hello, have you loaded the pretrained model successfully?
You can join the WeChat group.
#354

@L1aoXingyu
Member

L1aoXingyu commented Mar 23, 2021

@sijun-zhou You can firstly try to use 1 GPU to reproduce the results in the model zoo.
If you use 2 GPUs, you need to tune batch size twice.

@sijun-zhou
Author

sijun-zhou commented Mar 23, 2021

> Hello, have you loaded the pretrained model successfully?
> You can join the WeChat group.
> #354

Hi gmt710,
you can take a look at the training log I pasted above. It shows that training did use the pretrained model:
"[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth".

I've also pasted the relevant snippet of that log here so you can check it, including the missing keys and the unused keys:

######################################################
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Some model parameters or buffers are not found in the checkpoint:
NL_2.0.g.{weight, bias}
NL_2.0.W.0.{weight, bias}
NL_2.0.W.1.{weight, bias, running_mean, running_var}
NL_2.0.theta.{weight, bias}
NL_2.0.phi.{weight, bias}
NL_2.1.g.{weight, bias}
NL_2.1.W.0.{weight, bias}
NL_2.1.W.1.{weight, bias, running_mean, running_var}
NL_2.1.theta.{weight, bias}
NL_2.1.phi.{weight, bias}
NL_3.0.g.{weight, bias}
NL_3.0.W.0.{weight, bias}
NL_3.0.W.1.{weight, bias, running_mean, running_var}
NL_3.0.theta.{weight, bias}
NL_3.0.phi.{weight, bias}
NL_3.1.g.{weight, bias}
NL_3.1.W.0.{weight, bias}
NL_3.1.W.1.{weight, bias, running_mean, running_var}
NL_3.1.theta.{weight, bias}
NL_3.1.phi.{weight, bias}
NL_3.2.g.{weight, bias}
NL_3.2.W.0.{weight, bias}
NL_3.2.W.1.{weight, bias, running_mean, running_var}
NL_3.2.theta.{weight, bias}
NL_3.2.phi.{weight, bias}
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
fc.{weight, bias}
######################################################

@sijun-zhou
Author

> @sijun-zhou You can firstly try to use 1 GPU to reproduce the results in the model zoo.
> If you use 2 GPUs, you need to tune batch size twice.

@L1aoXingyu Hi L1aoXingyu,
I tested with 1 GPU and got nearly the same result as the one you posted in the model zoo. Thank you very much!

BTW, I don't quite understand what "you need to tune batch size twice" means if I want to use 2 GPUs. Could you please give me more specific guidance? Thanks a lot!

@L1aoXingyu
Member

L1aoXingyu commented Mar 23, 2021

It means that if you want to train a model with 2 GPUs, you need to change the batch size (`SOLVER.IMS_PER_BATCH`) from 64 to 128.
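One way to see why doubling matters is to count what each GPU actually receives. This is a sketch under stated assumptions: that `SOLVER.IMS_PER_BATCH` is the total batch split evenly across GPU processes, and that the PK sampler draws `DATALOADER.NUM_INSTANCE` (16 in the config above) images per identity. Under those assumptions, 64 images on 2 GPUs leaves each GPU with 32 images from only 2 identities, which gives the triplet loss's hard mining far fewer negatives and, with plain `BN` normalization (as in the config), computes BatchNorm statistics over a smaller per-GPU batch. The helper `per_gpu_batch` is hypothetical, not part of fastreid.

```python
from typing import Tuple


def per_gpu_batch(ims_per_batch: int, num_gpus: int, num_instance: int) -> Tuple[int, int]:
    """Return (images per GPU, identities per GPU).

    Assumptions (hypothetical helper): the total batch is split evenly
    across GPUs, and each identity contributes `num_instance` images.
    """
    if ims_per_batch % num_gpus != 0:
        raise ValueError("batch size must be divisible by the number of GPUs")
    images = ims_per_batch // num_gpus
    if images % num_instance != 0:
        raise ValueError("per-GPU batch must be a multiple of NUM_INSTANCE")
    return images, images // num_instance


for total, gpus in [(64, 1), (64, 2), (128, 2)]:
    images, ids = per_gpu_batch(total, gpus, num_instance=16)
    print(f"IMS_PER_BATCH={total}, GPUs={gpus}: {images} images / {ids} identities per GPU")
```

With 128 on 2 GPUs, each GPU again sees 64 images from 4 identities, the same per-GPU batch as the single-GPU setting that reproduced the model zoo numbers.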

@sky186

sky186 commented Mar 28, 2021

@L1aoXingyu
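Multi-GPU training and testing issues with the latest code:
1. Training on 2 GPUs with the batch size set to 256 works fine, but at test time the returned results are empty. Testing on a single GPU works normally.
2. A hyperparameter question: are Freeze and warmup specified in iterations? Should I compute the iteration count from my own dataset size and batch size, and is it usually set to the number of iterations in about 10 epochs? The other hyperparameters seem to be in epochs and only these two seem to be in iterations, which is ambiguous. Could you explain?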

@L1aoXingyu
Member

@sky186

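  1. I'll test this tomorrow.
  2. It is more reasonable to set freeze and warmup in iterations. In some training settings on fairly large datasets, such as face recognition, training runs for only 16 epochs in total, so warmup can't be set in epochs: the warmup may need to be shorter than one epoch. The more reasonable approach is therefore to set a number of training iterations directly, and the config file makes this clear through the names `WARMUP_ITERS` and `MAX_EPOCH`.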
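To turn an epoch-based intuition ("warm up for about 10 epochs") into the iteration-based `WARMUP_ITERS` / `FREEZE_ITERS`, one can estimate iterations per epoch as the number of training images divided by the batch size. A minimal sketch, assuming incomplete final batches are dropped; the helper names are hypothetical. For Market-1501's 12,936 training images and a batch of 64 this gives 202 iterations per epoch, which matches the training log at the top of this issue (iteration 12119, i.e. 12,120 iterations, at the end of epoch 59 of 60).

```python
import math


def iters_per_epoch(num_images: int, batch_size: int) -> int:
    # Assumption: the last incomplete batch is dropped, so round down.
    return num_images // batch_size


def epochs_to_iters(epochs: float, num_images: int, batch_size: int) -> int:
    # Round up so the schedule covers at least the requested number of epochs.
    return math.ceil(epochs * iters_per_epoch(num_images, batch_size))


MARKET1501_TRAIN_IMAGES = 12936

per_epoch = iters_per_epoch(MARKET1501_TRAIN_IMAGES, 64)
print(per_epoch)
print(2000 / per_epoch)  # WARMUP_ITERS=2000 from the config, expressed in epochs
print(epochs_to_iters(10, MARKET1501_TRAIN_IMAGES, 64))
```

With these numbers, `WARMUP_ITERS: 2000` corresponds to roughly 9.9 epochs and `FREEZE_ITERS: 1000` to roughly 5 epochs. For a different dataset size or batch size (for example 256 on 2 GPUs), the same conversion has to be redone with those values.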

@sky186

sky186 commented Mar 29, 2021

@L1aoXingyu
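Hello, is anything different in the latest code, from data processing through feature extraction, compared with before?
With the previous version, I extracted a feature-extraction interface from the code, and it worked correctly: the test results were correct.
Now, after switching to a model and config trained with the latest version and extracting features, the test results are completely wrong, even though all other settings are the same. There's quite a lot of code, and I don't know how to find what might have changed.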

@L1aoXingyu
Member

L1aoXingyu commented Mar 29, 2021

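@sky186 Could it be that the model weights were not actually loaded?
Also, I tested it: multi-GPU testing does run. When testing on multiple GPUs, results are only returned in the main process.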

@sky186

sky186 commented Mar 30, 2021

@L1aoXingyu
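1. Yes, you're right. After checking, the model parameters had not actually been loaded successfully. I made a fix and it works now. Thanks so much!
2. Thanks for your reply. With multiple GPUs, the test results come back empty: in `defaults.py`, `def test(cls, cfg, model, evaluators=None)` produces the test `results`, and with multiple GPUs it returns an empty result. Roughly where does the main process return the results, as you mentioned?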
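Since the root cause in point 1 was weights that were never actually loaded, it can help to compare parameter names and shapes between the checkpoint and the model before trusting any evaluation numbers. The sketch below works on plain `{name: shape}` dicts so it runs without PyTorch; with PyTorch one would build those dicts from `model.state_dict()` and the loaded checkpoint's state dict, e.g. `{k: tuple(v.shape) for k, v in sd.items()}`. The helper `diff_state_dict_shapes` is hypothetical, not part of fastreid.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_state_dict_shapes(model: Dict[str, Shape], ckpt: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Report keys that would not be loaded from `ckpt` into `model` (hypothetical helper)."""
    missing = sorted(k for k in model if k not in ckpt)
    unexpected = sorted(k for k in ckpt if k not in model)
    mismatched = sorted(
        f"{k}: ckpt {ckpt[k]} vs model {model[k]}"
        for k in model
        if k in ckpt and ckpt[k] != model[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Toy shapes mirroring the eval-only warning in the log above.
model_shapes = {"backbone.conv1.weight": (64, 3, 7, 7), "heads.classifier.weight": (0, 2048)}
ckpt_shapes = {"backbone.conv1.weight": (64, 3, 7, 7), "heads.classifier.weight": (751, 2048)}

report = diff_state_dict_shapes(model_shapes, ckpt_shapes)
print(report["mismatched"])  # ['heads.classifier.weight: ckpt (751, 2048) vs model (0, 2048)']
```

In the eval-only log above, the only skipped key is `heads.classifier.weight`, which is harmless for retrieval evaluation since matching uses the embedding features rather than the classifier. If backbone keys showed up as missing or mismatched, the reported accuracy would come from unloaded weights.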

@L1aoXingyu
Member

@sky186 Where are you getting the test results from?


The code here means that non-main processes return an empty `{}`.
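The behavior described here, where every process runs inference but only the main process (rank 0) computes and returns metrics, can be sketched without PyTorch as below. The `rank` argument stands in for the process rank (what `torch.distributed.get_rank()` would report), and `evaluate_on_rank` is a hypothetical name, not fastreid's function. The practical consequence is that, when testing on multiple GPUs, results should be read or logged only in the rank-0 process; every other rank receives `{}`.

```python
from typing import Dict, List


def evaluate_on_rank(rank: int, gathered_scores: List[List[float]]) -> Dict[str, float]:
    """Hypothetical evaluator in which only rank 0 computes metrics.

    `gathered_scores` stands in for per-GPU outputs that have already been
    collected from every process onto the main process.
    """
    if rank != 0:
        # Non-main processes return an empty dict, as described in the thread.
        return {}
    all_scores = [s for per_gpu in gathered_scores for s in per_gpu]
    return {"mean_score": sum(all_scores) / len(all_scores)}


gathered = [[0.9, 0.8], [0.7, 0.6]]  # toy values from 2 GPUs
for rank in range(2):
    results = evaluate_on_rank(rank, gathered)
    if results:  # only the main process has anything to report
        print(f"rank {rank}: {results}")
```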
