Training setups (tested with different GPUs) #47

Open
emilyemliyM opened this issue Mar 18, 2022 · 6 comments
Labels
good first issue Good for newcomers

Comments

@emilyemliyM

Dear author,

Thanks for sharing the code.

I'm trying to reproduce the metrics from the paper, but haven't been successful yet.
Could you share the training parameters and the hardware used for the experiments?
Also, for metrics such as IoU in the paper, do you mean mIoU or just the IoU of the moving class?

Thanks!

@Chen-Xieyuanli
Member

Chen-Xieyuanli commented Mar 18, 2022

Hey @mengshiyu0109, the training parameters used for the paper are the defaults. We tested on Quadro 4000, 5000, and 6000, RTX 2080 Ti, and TITAN GPUs and got similar results.

The IoU reported in our paper is the one for moving objects only.

Note that the 62 IoU performance was obtained by adding KNN and semantics. Without semantics, the performance is around 58 IoU on the test set. You may first check whether KNN is enabled in the config file.

@MaxChanger could you please also share your setups of training LMNet here?

@MaxChanger
Contributor

Yeah,
Hi @mengshiyu0109, I have trained and tested LMNet on 3×2080 Ti and on a 3090, and can generally achieve accuracy similar to that reported in the paper.
You could try setting batch_size in salsanext_mos.yml to 24, and then use 3×2080 Ti or more GPU cards with slightly smaller memory (making sure that bs=24).

In addition, the IoU in the paper should refer specifically to MovingIoU, but checkpoints saved during training are selected based on mean_IoU (the average of static and moving).
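
To make the distinction between the two metrics concrete, here is a minimal sketch of how moving IoU and mean IoU can be computed from a two-class confusion matrix. The function name `iou_per_class` and the point counts are hypothetical and chosen only for illustration; this is not the repository's evaluation code.

```python
def iou_per_class(confusion):
    """Return the IoU of every class.

    confusion[i][j] is the number of points whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                                # correctly labelled as c
        fp = sum(confusion[r][c] for r in range(n)) - tp    # predicted c, truth differs
        fn = sum(confusion[c]) - tp                         # truth c, predicted otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious


# Hypothetical counts (assumption): class 0 = static, class 1 = moving.
confusion = [
    [9000, 100],  # ground truth static
    [300, 600],   # ground truth moving
]

static_iou, moving_iou = iou_per_class(confusion)
mean_iou = (static_iou + moving_iou) / 2

print(f"static IoU: {static_iou:.3f}")
print(f"moving IoU: {moving_iou:.3f}")
print(f"mean IoU:   {mean_iou:.3f}")
```

Because static points usually dominate a scan, the static IoU tends to be high and pulls the mean up, so a mean IoU logged during training can be noticeably larger than the moving IoU computed on the same predictions.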

By the way, there may be some non-determinism in this code; you can set the following flags:

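import random

import numpy as np
import torch

def set_seed(seed=1024):
    # Seed the Python, NumPy, and PyTorch (CPU and GPU) random number generators.
    random.seed(seed)
    # os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) # if you are using multi-GPU.
    # Disable cuDNN autotuning and request deterministic cuDNN kernels.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True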

@emilyemliyM
Author

Thanks a lot.
I am still training. By the way, I am only looking at the moving-class IoU, but I only get about 20%, so I haven't tried the test part yet.

Following your reply, I will try again now.
I am more confident about this topic now; I had tried several methods but could not get good metrics for the moving class.

Thanks.

@Chen-Xieyuanli
Member

@MaxChanger Thanks for the report!

@mengshiyu0109 you may first check whether you can generate similar results with our pre-trained model to see whether the setup is correct or not.

@Chen-Xieyuanli Chen-Xieyuanli added the good first issue Good for newcomers label Mar 18, 2022
@Chen-Xieyuanli Chen-Xieyuanli changed the title traininr parameters,thanks Training setups (tested with different GPUs) Mar 18, 2022
@emilyemliyM
Author

Thanks!
Thanks a lot for your reply. I would like to ask: what mIoU value did you obtain during training before moving on to testing?

@MaxChanger
Contributor

Hi @mengshiyu0109. During my training, best_val_iou in TensorBoard is around 0.84 at epoch ~120 (I guess anything greater than 0.82 should be fine). The non-determinism may also cause some fluctuations.
After this, you can use python infer.py xxxx to generate the predicted labels and python utils/evaluate_mos.py xxx to evaluate them. The moving IoU on the validation set should be around 0.60 (0.59~0.618).
