During the evaluation phase, a warning states that only a batch size of 1 is supported #107

Open
EricZavier opened this issue Dec 12, 2023 · 8 comments

Comments

@EricZavier

[Screenshot: QQ图片20231212111948]
During the evaluation phase, a warning appeared stating that only a batch size of 1 is supported. Here is the command I used:
CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python entry.py train \
--conf_files ./configs/seem/samvitb_unicl_lang_v1.yaml \
--overrides \
FP16 True \
COCO.INPUT.IMAGE_SIZE 1024 \
MODEL.DECODER.HIDDEN_DIM 512 \
MODEL.ENCODER.CONVS_DIM 512 \
MODEL.ENCODER.MASK_DIM 512 \
TEST.BATCH_SIZE_TOTAL 4 \
TRAIN.BATCH_SIZE_TOTAL 16 \
TRAIN.BATCH_SIZE_PER_GPU 4 \
SOLVER.MAX_NUM_EPOCHS 1 \
SOLVER.BASE_LR 0.0001 \
SOLVER.FIX_PARAM.backbone True \
SOLVER.FIX_PARAM.lang_encoder True \
SOLVER.FIX_PARAM.pixel_decoder True \
MODEL.DECODER.COST_SPATIAL.CLASS_WEIGHT 5.0 \
MODEL.DECODER.COST_SPATIAL.MASK_WEIGHT 2.0 \
MODEL.DECODER.COST_SPATIAL.DICE_WEIGHT 2.0 \
MODEL.DECODER.TOP_SPATIAL_LAYERS 10 \
MODEL.DECODER.SPATIAL.ENABLED True \
MODEL.DECODER.GROUNDING.ENABLED True \
FIND_UNUSED_PARAMETERS True \
ATTENTION_ARCH.SPATIAL_MEMORIES 32 \
MODEL.DECODER.SPATIAL.MAX_ITER 5 \
ATTENTION_ARCH.QUERY_NUMBER 3 \
STROKE_SAMPLER.MAX_CANDIDATE 10 \
MODEL.BACKBONE.PRETRAINED ./xdecoder_data/pretrained/sam_vit_b_01ec64.pth \
WEIGHT True \
RESUME_FROM ./xdecoder_data/pretrained/focalb_lang_unicl.pt

@CrazyLenmon

I ran into the same problem.

@EricZavier
Author

Friend, have you resolved the issue? I suspect the validation set I downloaded might not be the correct version.

@CrazyLenmon

Yeah, I just set all the batch sizes to 1 and it works, probably because it uses only 1 GPU during the eval phase. I followed dataset.md and had no problems preparing the data.

@EricZavier
Author

Thank you very much for your patient reply. Could you share your training command as a reference? I am using 4 GPUs and would also like to switch to training on a single GPU like you. Also, which version of the PascalVOC dataset listed in dataset.md did you download? I am using the VOCtrainval 2007 download, and my error also appears in the PascalVOC folder: len(batched_inputs) for a single PNG image under JPEGImages was 2.

@MaureenZOU
Collaborator

Evaluation runs with a batch size of 1 per GPU because concatenating images of different shapes into a single batch requires padding. For example, given one image of [512, 1024] and another of [1024, 512], the concatenated batch would be [2, 1024, 1024], and padding with that many zeros would significantly affect the performance.
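
To make the padding cost concrete, here is a minimal sketch in plain Python that works on shapes only. The helper names `padded_batch_shape` and `padding_fraction` are hypothetical and are not part of this repository; the sketch assumes images are zero-padded to the largest height and width in the batch, as in the example above.

```python
def padded_batch_shape(shapes):
    """Shape of the batch after padding every (H, W) image to the max H and max W."""
    max_h = max(h for h, w in shapes)
    max_w = max(w for h, w in shapes)
    return (len(shapes), max_h, max_w)


def padding_fraction(shapes):
    """Fraction of the padded batch that is padding rather than real pixels."""
    n, max_h, max_w = padded_batch_shape(shapes)
    real_pixels = sum(h * w for h, w in shapes)
    return 1.0 - real_pixels / (n * max_h * max_w)


shapes = [(512, 1024), (1024, 512)]
print(padded_batch_shape(shapes))    # (2, 1024, 1024)
print(padding_fraction(shapes))      # 0.5
print(padding_fraction(shapes[:1]))  # 0.0
```

With the two shapes from the example, half of the batch is zeros, whereas a batch holding a single image needs no padding at all.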

@MaureenZOU
Collaborator

  1. If you want to use PascalVOC, please download the 2012 version from the website: http://host.robots.ox.ac.uk/pascal/VOC/
  2. For the VOC len(batched_inputs) problem, please change the config, e.g. you can add VOC.TEST.BATCH_SIZE_TOTAL 1 to the command.

@EricZavier
Author

Thank you for your patient answer. If I have 4 GPUs, should I set VOC.TEST.BATCH_SIZE_TOTAL to 4?

@MaureenZOU
Collaborator

Yes, exactly. That is the total test batch size, and each GPU automatically loads len(batch) = BATCH_SIZE_TOTAL / NUM_GPUS samples.
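
The arithmetic in that answer can be sketched as follows. This is an illustration only, not the repository's own code: `per_gpu_batch_size` is a hypothetical helper, and the requirement that the total divide evenly across GPUs is an assumption made here for clarity.

```python
def per_gpu_batch_size(batch_size_total: int, num_gpus: int) -> int:
    """Return len(batch) on each GPU, i.e. BATCH_SIZE_TOTAL / NUM_GPUS."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    # Assumption for this sketch: the total must split evenly across GPUs.
    if batch_size_total % num_gpus != 0:
        raise ValueError("batch_size_total must be divisible by num_gpus")
    return batch_size_total // num_gpus


# Evaluation on 4 GPUs with a total test batch size of 4: one image per GPU.
print(per_gpu_batch_size(4, 4))   # 1
# Training values from the command in this issue: 16 total over 4 GPUs.
print(per_gpu_batch_size(16, 4))  # 4
# A single GPU with a total test batch size of 1.
print(per_gpu_batch_size(1, 1))   # 1
```

Under this reading, a total test batch size equal to the number of GPUs gives the one-image-per-GPU batch that evaluation expects.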

3 participants