
Reproduce DDAIG #6

Closed · KaiyangZhou opened this issue Jul 9, 2020 · 10 comments

@KaiyangZhou (Owner) commented Jul 9, 2020

I received an email saying the current code cannot reproduce the results of DDAIG on PACS. I haven't run DDAIG using Dassl so I'm not sure if there is an issue.

I've attached the original log files, which contain the library versions, the environment settings, and the exact parameters used in the paper. I hope this helps. Please check this Google Drive link. The DDAIG experiments were run in early 2019, when I was using torch=0.4.1 and numpy=1.14.5; I'm not sure whether this causes an issue. If there really is a reproduction issue, it's also possible that something went wrong when I transferred DDAIG's code to this public Dassl repo (I'll double-check this).

Please note that DDAIG was named ddap in the log files. Some parameter names differ from Dassl's because the original code was an early, stripped-down version of Dassl, but they should be easy to map.

I'll find time and resources to run DDAIG using this code (please bear with me).

@FrancescoCappio commented

Hi!
From what I understand, the Dassl-based DDAIG code does not use the validation splits of the training sets. Is this correct? The paper in fact does not mention any use of the validation split for model selection (unless I missed something). I am asking because, looking at your log files, I noticed that the DDAP code also evaluated performance on the validation set after each epoch. Was this evaluation used to choose the best model?

@KaiyangZhou (Owner, Author) commented

You're correct. The validation set was not used for training. Only the training split is used, which follows this paper.

For DDAP (DDAIG), we evaluated lambda on the test domain (as shown in Table 5); we just wanted to show its real impact on test data. In another paper we used the validation set for hyperparameter selection.

We reported performance using the last-epoch model. Using validation performance for model selection is questionable in DG, as the validation data come from the source domains, so a higher validation result might just mean overfitting to them.

The validation performance is only printed by the old code. In Dassl, you need to set TEST.SPLIT = val in order to evaluate a model on the validation split.
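
For anyone following along, a minimal sketch of that override, assuming Dassl's yacs-based config (the toy cfg below is illustrative, not Dassl's actual defaults):

```python
from yacs.config import CfgNode as CN

# Toy config mirroring the relevant option (illustrative only;
# in practice the node comes from Dassl's own defaults).
cfg = CN()
cfg.TEST = CN()
cfg.TEST.SPLIT = "test"  # default: evaluate on the test split

# Switch evaluation to the validation split of the source domains.
cfg.merge_from_list(["TEST.SPLIT", "val"])
assert cfg.TEST.SPLIT == "val"
```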

@FrancescoCappio commented

I noticed two differences between the parameters in your logs and the default parameters here in Dassl: DDAIG.LMDA and DDAIG.WARMUP. However, I don't think the performance gap is caused by these small differences, as I can't reproduce your results even when using your values. I also noticed that ddap.lmda_p takes different values in your logs: 0.1 for art_painting and photo, and 0.5 for sketch and cartoon. Should I also use these per-domain values in my experiments?

@KaiyangZhou (Owner, Author) commented

Yes. Try setting INPUT.PIXEL_MEAN = [0.5, 0.5, 0.5] (and the same for INPUT.PIXEL_STD), and use exactly the same hyperparameters as in the log files.

@FrancescoCappio commented

I tried setting both (PIXEL_MEAN and PIXEL_STD), but I still cannot reproduce your performance.

@KaiyangZhou (Owner, Author) commented Jul 13, 2020

So what results did you get, exactly?

Can you show the std as well?

@FrancescoCappio commented

I am attaching logs for run 0 here. Results are:

| run | art | cartoon | sketch | photo |
| --- | --- | --- | --- | --- |
| 0 | 79.39 | 75.85 | 75.64 | 95.27 |
| 1 | 79.05 | 75.04 | 69.12 | 94.07 |

art_painting_log.txt
cartoon_log.txt
photo_log.txt
sketch_log.txt

@KaiyangZhou (Owner, Author) commented

@FrancescoCappio cool, thx, I'll have a look!

@KaiyangZhou (Owner, Author) commented Jul 22, 2020

To follow the old code, you need to use an input mean of 0 and an input std of 1 so that pixel values lie in the range [0, 1] (I gave you the wrong information earlier, sorry about that).

The pixel value range is important because the FCN's output is squashed to [-1, 1]. See this.
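
To make this concrete, here is a rough sketch of the kind of additive perturbation described above (illustrative only; the toy `fcn` generator and the final clamp are assumptions, not the actual DDAIG code in Dassl). Because the tanh-squashed output lies in [-1, 1], the perturbation magnitude is on the order of lmda, so the same lmda is relatively much stronger for pixels in [0, 1] than for images normalized with ImageNet statistics.

```python
import torch
import torch.nn as nn

# Stand-in generator whose output is squashed to [-1, 1] by a tanh,
# matching the description above (not the real DDAIG FCN).
fcn = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.Tanh())

x = torch.rand(4, 3, 224, 224)  # pixel values already scaled to [0, 1]
lmda = 0.3

# Additive perturbation bounded by lmda; its relative strength
# depends on the dynamic range of x.
x_p = (x + lmda * fcn(x)).clamp(0, 1)  # clamp is illustrative only
```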

I ran this Dassl code with a higher lmda=1.5 (to reproduce the effect when using ImageNet statistics, lmda needs to be increased accordingly).

I experimented only on the art and sketch domains, as they are the most challenging ones. I got:

| run | art | sketch |
| --- | --- | --- |
| 1 | 84.62 | 76.43 |
| 2 | 82.13 | 75.48 |

For the cartoon and photo domains, the best lmda values might be different.

Hope this helps.

@KaiyangZhou (Owner, Author) commented

I just tried using a mean of 0 and a std of 1 so that pixel values fall in [0, 1].

On art, with the default lmda=0.3, I got:

run1: 83.20
run2: 85.11
run3: 82.86

avg: 83.72
std: 0.99
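
(Side note: the avg/std above match the population standard deviation, ddof=0; a quick check with numpy:)

```python
import numpy as np

accs = np.array([83.20, 85.11, 82.86])
print(round(accs.mean(), 2))       # 83.72
print(round(accs.std(ddof=0), 2))  # 0.99 (the sample std, ddof=1, would be ~1.21)
```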

More runs should be done to reduce the variance and get a fairer number.

Different domains might need different lmda values, which is the tricky part; for sketch, for example, a higher weight is favored, e.g. 0.5 or 0.7.

I've updated the config files to add the new pixel mean and std values.
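
For reference, those overrides would look roughly like the following (a sketch only: the key written here as DDAIG.LMDA follows the naming used in this thread and in the logs, and the exact path should be checked against the config files in the repo):

```python
from yacs.config import CfgNode as CN

# Toy stand-ins for the relevant Dassl config nodes (illustrative only).
cfg = CN()
cfg.INPUT = CN()
cfg.DDAIG = CN()

cfg.INPUT.PIXEL_MEAN = [0.0, 0.0, 0.0]  # keep raw pixel values in [0, 1]
cfg.INPUT.PIXEL_STD = [1.0, 1.0, 1.0]

# lmda likely needs tuning per target domain, e.g.:
cfg.DDAIG.LMDA = 0.3    # worked for art above
# cfg.DDAIG.LMDA = 0.5  # or 0.7 may suit sketch better
```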

I'm closing this issue for now
