Do you have a trained model (checkpoint) for this? #62

Closed
LiuTingWed opened this issue Oct 14, 2019 · 18 comments

@LiuTingWed

Hi! This is amazing work!
I would like to use this project for some research, but my PC is too slow.
Could you provide a trained model for it?
Thanks.

@zhizhangxian
Collaborator

Do you mean a model from the search stage or from the retrain stage?

@LiuTingWed
Author

Yes, I would really appreciate it if you could provide one.

@zhizhangxian
Collaborator

zhizhangxian commented Oct 19, 2019

But right now the search result is not very good. If you want, maybe I can share it through a Baidu Yun drive.

@LiuTingWed
Author

Never mind, thanks :-)

@LiuTingWed
Author

I am very interested in this paper and in your PyTorch implementation of it.
But judging from the figure you provide of the implementation results, its curve almost matches the paper's result.
Why do you say the search result is not very good?

@zhizhangxian
Collaborator

The mIoU is lower than what the paper reports.
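
For readers unfamiliar with the metric: mIoU (mean intersection over union) averages, over classes, the ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal pure-Python sketch of that definition, not the evaluation code used in this repository. The function name `mean_iou`, the `ignore_index=255` convention, and the toy data are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class ids.
    The result is a fraction in [0, 1]; multiply by 100 for percent.
    """
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed "void" label, skipped entirely
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and label
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened label maps (hypothetical data, 3 classes)
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(mean_iou(pred, target, num_classes=3))
```

Real evaluation scripts usually accumulate the confusion matrix over the whole validation set before taking the ratio, rather than averaging per-image scores.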

@LiuTingWed
Author

Oh, that is strange.
Have you figured out what the cause might be?
Did the search finally produce the same cell architecture as in the paper?

@zhizhangxian
Collaborator

No, we have not solved this issue yet.
The architecture found by the search is obviously different from the one the paper reports. When I run DARTS, I don't get the same result as its paper either...

@LiuTingWed
Author

What a pity!
If you don't mind, please tell me the reason once you figure it out.
By the way, which GPU do you use for training, a P100?

@zhizhangxian
Collaborator

For search: 1 × V100
For retrain: 8 × 2080 Ti

@zhizhangxian
Collaborator

Hey, we have now got a better result for search: it reaches 0.34 mIoU.

Baidu Yun drive: https://pan.baidu.com/s/1ASRyzK_0m9CvhfN3yHZZ5Q
Password: px6y

@LiuTingWed
Author

thanks :-)

@Sunshine-Ye

Sunshine-Ye commented Jan 11, 2020

> For search: 1 × V100
> For retrain: 8 × 2080 Ti

Hi! Thanks for doing such amazing work!
If it is convenient, could you tell me how long retraining takes on 8 × 2080 Ti?
And what performance can be achieved with the derived model from the paper?
@zhizhangxian

@zhizhangxian
Collaborator

As far as I remember, I trained autodeeplab-M for about 1M iterations without SDP and got 79.8 mIoU without MS.

@zhizhangxian
Collaborator

The total training time was about 20 days.
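
To put these figures in perspective, about 1M iterations in about 20 days works out to a bit under two seconds per iteration across the 8 GPUs. The short sketch below does that arithmetic and scales it to a shorter schedule. The helper names and the 100,000-iteration example are hypothetical, and the estimate assumes the time per iteration stays constant.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_iteration(total_days, total_iters):
    """Average wall-clock seconds spent on one training iteration."""
    return total_days * SECONDS_PER_DAY / total_iters


def days_for(iters, sec_per_iter):
    """Estimated days for `iters` iterations at a constant pace."""
    return iters * sec_per_iter / SECONDS_PER_DAY


# Figures from the thread: about 20 days for about 1M iterations
sec_per_iter = seconds_per_iteration(total_days=20, total_iters=1_000_000)
print(f"{sec_per_iter:.3f} s/iter")

# Hypothetical shorter schedule, for comparison only
print(f"{days_for(100_000, sec_per_iter):.2f} days for 100,000 iterations")
```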

@Sunshine-Ye

Thanks for your quick reply!
1. Does SDP mean the Scheduled Drop Path method? Does MS mean using multi_scale in training? Was the 79.8 mIoU evaluated under multi-scale or not?
2. Which kind of training do you use: python train.py or CUDA_VISIBLE_DEVICES=0,1,2,···,n python -m torch.distributed.launch --nproc_per_node=n train_distributed.py?
3. 20 days for retraining is a bit too long. Is it caused by the training code or by the model itself? Can you suggest a direction for optimization?
4. I have tried to retrain, but the default retrain args and train_distributed.py are not set up properly. Although I have adjusted the code, I don't know whether the parameters are set correctly. Can you give me some advice?
Looking forward to your reply. Sincerely.

@zhizhangxian
Collaborator

  1. Yes. Not under MS, as far as I remember. I adapted the retrain code from chenxi's DeepLab v3 reproduction, and he did not use MS.
  2. If you mean retrain, you should use CUDA_VISIBLE_DEVICES=0,1,2,···,n python -m torch.distributed.launch --nproc_per_node=n train_distributed.py for distributed training.
  3. It is simply because of the number of iterations: more than 1M in the paper, compared with only 30K in DeepLab v3+. You cannot get a good result by training for only several thousand iterations without ImageNet pretraining. The retrain configuration is the same as for DeepLab v3/v3+.
  4. I retrained it with other code (adapted from chenxi's code). I should have fixed the bugs in our project, but I have been busy with other things these days. Maybe I can pay some attention to it after March...
    Thanks, good luck!
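
The launch command in point 2 can also be assembled programmatically, which helps when the GPU count changes between machines. This is only a sketch: `build_launch_command` is a hypothetical helper, not part of this repository, and it assumes the first `num_gpus` devices (numbered from 0) are the ones to use, with one process per GPU.

```python
import os
import shlex


def build_launch_command(num_gpus, script="train_distributed.py"):
    """Return (env, argv) for a single-node torch.distributed.launch run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    env = dict(os.environ)
    # Assumption: expose the first num_gpus devices, numbered from 0
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(num_gpus))
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",  # one worker process per GPU
        script,
    ]
    return env, argv


env, argv = build_launch_command(8)
# Print the equivalent shell command line for inspection
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
```

To actually start training, the pair could be passed to `subprocess.run(argv, env=env, check=True)` from the repository directory.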

@Sunshine-Ye

Sunshine-Ye commented Jan 13, 2020 via email
