Questions about results with my own dataset #75

Open
xhh232018 opened this issue Jul 5, 2018 · 8 comments
Labels: good first issue (Good for newcomers)

Comments

xhh232018 commented Jul 5, 2018

Hi, Jiahui! After one week of training on a GTX 1080 Ti, I found some interesting results on my own dataset.
There are two kinds of images in my dataset. One kind has clear texture, like this:
[example image: 7147]
The inpainting results for this kind of image are semantically plausible:
[result images: 7182_ip, demo16, demo17, 7147_ip]
There are also images like this one, which contain more information and structure:
[example image: image01]
However, the result for this image from my pre-trained model is quite blurry and bad:
[result images: 2_ip, 16]
Here are my hypotheses:

  1. The second kind of image is a minority in my training set. Should I increase its ratio to 1:1? The total number of images in my dataset is 8000.
  2. According to "how to set hyper-parameter" #21 and "How many epoch" #53, I should fine-tune my model. Could you please give me suggestions on how to change the hyper-parameters?
    Here are the screenshots from TensorBoard:
    [TensorBoard screenshots: selection_070, selection_069, selection_065, selection_064]
JiahuiYu (Owner) commented Jul 5, 2018

Hi, first of all, thanks for your interest in our work and for sharing some of your results. Here are some answers that may help:

  1. Data balance is important, so it may help to increase the number of samples of the second case. You can either collect more examples or apply data augmentation such as random flipping, rotation, or color adjustment (see the sketch after this list).
  2. To fine-tune a pre-trained model, you do not have to change the hyper-parameters.
  3. More data samples will help, since in your case you only have 8k images. I usually work with at least 30k images (and up to 10 million).
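
For reference, here is a minimal sketch of such an augmentation pipeline in TensorFlow 1.x (the framework this repo uses). The function name `augment` and the specific jitter ranges are illustrative assumptions, not part of the repo:

```python
import tensorflow as tf

def augment(image):
    # Simple augmentation to rebalance an under-represented image type:
    # random flips, random 90-degree rotations, and mild color jitter.
    # `image` is assumed to be a float RGB tensor of shape [H, W, 3].
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    k = tf.random_uniform([], 0, 4, dtype=tf.int32)  # 0..3 quarter turns
    image = tf.image.rot90(image, k)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_saturation(image, 0.9, 1.1)
    return image
```

Applying this only to the under-represented kind of image (or sampling that kind more often) rebalances the training distribution without collecting new data.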

xhh232018 (Author) commented

OK, I will try to create more training samples. Also, to clarify: the pre-trained model I mentioned is one I trained on my own dataset with your default hyper-parameter settings, not the one you provided. Should I change the hyper-parameters if I want to refine it?

JiahuiYu (Owner) commented Jul 6, 2018

You don't need to change the hyper-parameters, in my understanding, unless you find failure cases like the ones addressed in issues #53 and #21.

xhh232018 (Author) commented

OK. Thanks for your help. I'll train a new model on more training samples as soon as possible and share my latest results in a few days.

xhh232018 (Author) commented

@JiahuiYu Sorry to bother you again. I want to re-implement DeepFill v2 on top of your DeepFill v1, since DeepFill v1 cannot handle irregularly masked images. I want to confirm whether the gated convolution layers are used only in the coarse network. Should I also replace the vanilla convolution layers with gated convolution layers in the refinement network? Thanks for your help.

JiahuiYu (Owner) commented

Gated convolutions are used in both networks. I think it is important to use gated convolution in the refinement network as well.
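
For readers following along: the DeepFill v2 paper defines a gated convolution as two parallel convolutions over the same input, combined as output = φ(features) ⊙ σ(gating). A minimal sketch in TensorFlow 1.x follows; `gated_conv2d` and its arguments are illustrative names, not the repo's actual API:

```python
import tensorflow as tf

def gated_conv2d(x, filters, ksize, stride=1, rate=1,
                 activation=tf.nn.elu, name='gated_conv'):
    # Two parallel convolutions over the same input: one produces
    # features, the other a soft per-pixel, per-channel gate.
    with tf.variable_scope(name):
        features = tf.layers.conv2d(x, filters, ksize, stride,
                                    dilation_rate=rate, padding='SAME',
                                    name='features')
        gating = tf.layers.conv2d(x, filters, ksize, stride,
                                  dilation_rate=rate, padding='SAME',
                                  name='gating')
        # output = activation(features) * sigmoid(gating)
        return activation(features) * tf.sigmoid(gating)
```

Because the gate is learned per spatial location, the layer can attenuate features inside holes, which is what makes it suitable for free-form masks.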

xhh232018 (Author) commented

@JiahuiYu Thanks for your help. I see you mentioned it in your paper. Sorry to bother you again, but I still need to confirm some changes for the DeepFill v2 implementation:

  1. You said the input is only the masked image and the encoder-decoder structure is the same as in DeepFill v1. This is my understanding of the gated convolution layer:
    [diagram of the gated convolution layer]
    Is it right? If so, do I not need to concatenate the ones and the mask with the input, like this?
    [screenshot of the DeepFill v1 input concatenation code]
  2. My implementation is based on DeepFill v1, and you said all vanilla convolution layers are changed to gated convolution layers. What about the last layer, whose activation function is None?
    [screenshot of the last convolution layer]
    Should I keep it as is or change it to a gated convolution?
  3. In your paper, you said the contextual attention layer is the same as in v1. Should the input to the contextual attention layer therefore include the binary mask? (In my opinion, the mask should be passed into that layer.)
  4. The GAN loss of DeepFill v1 is based on neuralgym, and you use this setting https://github.com/pfnet-research/sngan_projection/blob/master/updater.py to calculate the GAN loss. Can I define this kind of loss in neuralgym?

Thanks for your help, and looking forward to your reply.

JiahuiYu (Owner) commented Jul 14, 2018

@xhh232018 Hi, first of all, thanks for your interest; I can see you have already read the paper and code carefully, and I appreciate it. For your questions:

  1. Fig. 3 in the paper shows that masks are also concatenated to the input. I also concatenate the ones; the reason is addressed in issue "padding type for generator" #40. Sorry, I forgot to mention concatenating the ones in the paper. (See the sketch after this list.)

  2. Keep the last convolution as it is, rather than making it a gated convolution (in both the coarse network and the refinement network).

  3. The implementation of the contextual attention layer needs the binary mask to indicate which pixels are missing and need to be reconstructed, so your understanding is correct.

  4. I have actually already released the implementation of the SN-GAN loss in the dev branch of neuralgym. :) (A sketch of the hinge formulation also follows below.)
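
Putting points 1 and 2 together, a minimal sketch of the generator's input construction and output head might look like the following (TensorFlow 1.x, as in the DeepFill v1 repo; `build_generator_io` and the variable names are illustrative, not the repo's actual API, and `gated_conv2d` refers to the sketch earlier in this thread):

```python
import tensorflow as tf

def build_generator_io(x, mask):
    # x: incomplete image in [-1, 1], shape [N, H, W, 3], holes zeroed out.
    # mask: binary mask, shape [N, H, W, 1], with 1 inside the holes.

    # Point 1: concatenate a channel of ones and the mask to the input,
    # so the network can tell valid pixels from padding and holes.
    ones = tf.ones_like(x)[:, :, :, 0:1]
    x_in = tf.concat([x, ones, ones * mask], axis=3)  # [N, H, W, 5]

    # ... the gated-convolution encoder-decoder body goes here ...
    feat = x_in  # placeholder standing in for the network body

    # Point 2: the final layer stays a plain (non-gated) convolution
    # with no activation; the output is then clipped to [-1, 1].
    out = tf.layers.conv2d(feat, 3, 3, 1, padding='SAME',
                           activation=None, name='conv_out')
    return tf.clip_by_value(out, -1., 1.)
```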

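For point 4, the hinge GAN loss used in sngan_projection can be sketched as follows. Per the reply above, neuralgym's dev branch already ships an implementation, so treat this as a reference, not the repo's exact code:

```python
import tensorflow as tf

def gan_hinge_loss(pos, neg):
    """Hinge GAN loss as in sngan_projection's updater.

    pos: discriminator logits on real samples, D(x).
    neg: discriminator logits on generated samples, D(G(z)).
    """
    # Discriminator: push real logits above +1 and fake logits below -1.
    d_loss = tf.reduce_mean(tf.nn.relu(1.0 - pos)) + \
             tf.reduce_mean(tf.nn.relu(1.0 + neg))
    # Generator: raise the discriminator's score on fake samples.
    g_loss = -tf.reduce_mean(neg)
    return g_loss, d_loss
```
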
an1018 mentioned this issue on Sep 28, 2018
Repository owner locked as resolved and limited conversation to collaborators on Oct 23, 2018
JiahuiYu added the good first issue label on Aug 3, 2019