A doubt about the SOC process #62

Closed · rose-jinyang opened this issue Feb 8, 2021 · 14 comments
@rose-jinyang

Hello
How are you?
Thanks for your contribution of the MODNet project.
I have a question about the SOC process.
I understand that the SOC process refines the alpha matte by using the predicted segmentation on unlabeled samples.

[image]

If so, this seems to rest on the assumption that the predicted segmentation results on unlabeled (unseen) samples are always good, while the predicted alpha mattes are comparatively poor.
Then what guarantees that MODNet's segmentation results on unlabeled (unseen) images are always good?

@ZHKKKe
Owner

ZHKKKe commented Feb 8, 2021

Hi, thanks for your attention.

The key idea behind SOC is consistency. For unlabeled data, we do not know whether the predicted semantic mask and the predicted alpha matte are correct. However, based on the smoothness assumption in semi-/self-supervised learning, forcing them to make consistent predictions helps the network learn from the unlabeled data. If you want to learn more about the theory behind SOC, I suggest referring to papers on consistency-based semi-supervised learning.
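To make this concrete, here is a rough PyTorch sketch of such a consistency term (it is not the exact SOC loss used in this repo; the blur layer, the 1/16 scale, and the L2/L1 choices are only illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

# Hypothetical blur layer; the repo's own blur layer may differ.
blurer = GaussianBlur(kernel_size=3)

def soc_consistency_loss(pred_semantic, pred_matte):
    with torch.no_grad():
        # Treat the (frozen) matte prediction as a pseudo target for the semantic
        # branch: downsample and blur it into the semantic branch's output space.
        pseudo_semantic = blurer(
            F.interpolate(pred_matte, scale_factor=1 / 16, mode='bilinear'))
        # Symmetrically, upsample the (frozen) semantic prediction to serve as
        # a coarse pseudo target for the matte branch.
        pseudo_matte = F.interpolate(
            pred_semantic, size=pred_matte.shape[2:], mode='bilinear')
    semantic_term = F.mse_loss(pred_semantic, pseudo_semantic)
    matte_term = F.l1_loss(pred_matte, pseudo_matte)
    return semantic_term + matte_term
```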

@yarkable
Contributor

yarkable commented Feb 28, 2021

As mentioned in the paper, the SOC strategy can boost the results. May I ask whether the dataset for SOC training & testing is a sequence of frames from a video? If not, could you give some advice on which type of data is suitable for SOC? @ZHKKKe

@ZHKKKe
Owner

ZHKKKe commented Mar 2, 2021

@yarkable
Nope. You can use data from any specific domain for SOC, e.g., frames from a webcam, or photos shot by a certain phone.
You just need to ensure that (1) all samples for SOC come from the same data domain, and (2) the difference between the data used for SOC and the labeled training data is not too big (e.g., you cannot use cartoon images for SOC).

@yarkable
Contributor

yarkable commented Mar 2, 2021

Got it! Thanks.

@xafha

xafha commented Mar 13, 2021

@ZHKKKe
I still don't fully understand the SOC strategy.
For example, if the predicted matte is taken as the ground truth, the predicted semantics should be kept consistent with it, which leads to the following cases:
When the matte prediction is correct, forcing the semantics to stay consistent with it is reasonable; but when the matte prediction is wrong, the semantics are forced to stay consistent with it all the same. Since training is on unlabeled data, we do not know whether the predicted matte is correct, so why should the semantics still be forced to follow a matte whose prediction is itself wrong? How should I understand whether this is reasonable? Please help explain, thanks.
Forcing the matte to stay consistent with the semantics is, as I understand it, a similar process.

Also, when I train with the SOC strategy on my local CPU the loss is normal, but when I switch to multi-GPU training the loss becomes NaN. Could you give me some advice? I am using a Keras implementation.

@ZHKKKe
Owner

ZHKKKe commented Mar 15, 2021

@xafha

Hi, regarding your questions:
Q1: When the matte prediction is correct, forcing the semantics to agree with it is reasonable; but when the matte prediction is wrong, the semantics are forced to agree all the same. Since training is on unlabeled data, we do not know whether the predicted matte is correct, so why force the semantics to follow a matte that may itself be wrong?
The principle behind SOC comes from the consistency constraint in semi-/self-supervised learning. In short, the predicted matte and the predicted semantics will be inconsistent (you can think of this as high entropy). Adding a consistency constraint pushes their outputs to agree (entropy reduction). In this process, for a converged network, most pixels are optimized in the correct direction, so even if some wrongly predicted pixels introduce perturbations, SOC will still push the network to converge in a better direction. For more on this, please refer to consistency-based semi-supervised learning papers such as Mean Teacher and Dual Student. Feel free to ask if you have further questions.

Q2: Training with the SOC strategy on a local CPU, the loss is normal, but with multi-GPU training the loss becomes NaN. Could you give me some advice?
I am not very sure about this. In my setting, the batch size is 1 during SOC. If you use multiple GPUs, please check whether something goes wrong in the data loading.
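For example, a quick framework-agnostic check like the following (the helper name is only illustrative) can rule out bad samples coming from the loader:

```python
import numpy as np

def check_batch(images, mattes):
    # Flag the usual data-side causes of NaN losses.
    for name, arr in (("image", images), ("matte", mattes)):
        arr = np.asarray(arr, dtype=np.float32)
        if not np.isfinite(arr).all():
            raise ValueError(f"{name} batch contains NaN/Inf values")
    mattes = np.asarray(mattes, dtype=np.float32)
    if mattes.min() < 0.0 or mattes.max() > 1.0:
        raise ValueError("matte batch is not normalized to [0, 1]")
```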

@xafha

xafha commented Mar 18, 2021

@ZHKKKe
Hi, the gt_matte in the code comes from the DataLoader. Is gt_matte normalized from the 0~255 range, e.g. x / 255.0?
What confuses me is that gt_semantic is supposed to represent semantic information, so it should be 0 or 1; but from the code, gt_semantic looks like a smoothed gt_matte with values in the range 0.0~1.0?
gt_semantic = F.interpolate(gt_matte, scale_factor=1/16, mode='bilinear')
gt_semantic = blurer(gt_semantic)
Looking forward to your reply, thanks.

@yarkable
Contributor

@xafha I think it's not a value that is exactly 0 or 1, but a value between 0 and 1, because that gives a smoother target.

@ZHKKKe
Owner

ZHKKKe commented Mar 19, 2021

@xafha
gt_semantic is in the range 0~1. The blurer's job is to remove the fine details from gt_semantic.
Semantic information can be represented either as classification (pixel values of 0 or 1) or as regression (pixel values from 0 to 1). The two representations use different loss functions, but the final results are about the same.
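A minimal PyTorch sketch of the two options (assuming pred_semantic has already been squashed into (0, 1), e.g. by a sigmoid; the dummy tensors just stand in for real network outputs):

```python
import torch
import torch.nn.functional as F

pred_semantic = torch.rand(1, 1, 32, 32)  # stand-in for the semantic branch output in (0, 1)
gt_semantic = torch.rand(1, 1, 32, 32)    # stand-in for the downsampled, blurred matte

# Regression view: keep the soft target in [0, 1] and use an L2 loss.
regression_loss = F.mse_loss(pred_semantic, gt_semantic)

# Classification view: binarize the target and use binary cross-entropy instead.
gt_binary = (gt_semantic > 0.5).float()
classification_loss = F.binary_cross_entropy(pred_semantic, gt_binary)
```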

@xafha

xafha commented Mar 23, 2021

> @xafha
> gt_semantic is in the range 0~1. The blurer's job is to remove the fine details from gt_semantic.
> Semantic information can be represented either as classification (pixel values of 0 or 1) or as regression (pixel values from 0 to 1). The two representations use different loss functions, but the final results are about the same.

```python
import cv2
import numpy as np
import tensorflow as tf

# ---------- data loading / preprocessing ----------

# 1. bgr2rgb
image = cv2.cvtColor(cv2.imread(image_name), cv2.COLOR_BGR2RGB)
matte = cv2.imread(matte_name, 0)

# 2. TODO: interpolation method + padding value, 512x512
image = self._aspect_preserving_resize(image, cv2.INTER_LINEAR, (127, 127, 127))
matte = self._aspect_preserving_resize(matte, cv2.INTER_LINEAR, (0, 0, 0))

# 3. TODO: augmentation
image, matte = self._random_flip(image, matte)

# 4. normalize: image to [-1, 1], matte to [0, 1]
image = image.astype(np.float32) / 255.0
image = (image - 0.5) / 0.5
matte = matte.astype(np.float32) / 255.0

# 5. trimap
trimap = self.gen_trimap(matte)

# 6. Gaussian-blurred semantic target (downsampled, smoothed matte)
semantic = cv2.resize(matte, target_size, interpolation=cv2.INTER_LINEAR)
semantic = cv2.GaussianBlur(semantic, (self.kernel_size, self.kernel_size), 0)

# 7. detail: boundaries == 1 inside the unknown (transition) region of the trimap
boundaries = (trimap < 0.5) + (trimap > 0.5)
boundaries = np.where(boundaries, False, True).astype(np.float32)

# ---------- losses ----------

# 1. calculate the semantic loss (16x downsampled branch)
semantic_loss = tf.reduce_mean(tf.square(gt_semantic - pred_semantic))

# 2. calculate the detail loss (only inside the transition region)
gt_detail = boundaries * gt_matte
pred_boundary_detail = boundaries * pred_detail
detail_loss = tf.reduce_mean(tf.abs(gt_detail - pred_boundary_detail))

# 3. calculate the matte loss (L1 + compositional, extra weight on the transition region)
pred_boundary_matte = boundaries * pred_matte
gt_detail = boundaries * gt_matte
matte_l1_loss = tf.abs(gt_matte - pred_matte) + 4.0 * tf.abs(gt_detail - pred_boundary_matte)
matte_compositional_loss = tf.abs(image * gt_matte - image * pred_matte) + \
                           4.0 * tf.abs(image * gt_detail - image * pred_boundary_matte)
matte_loss = tf.reduce_mean(matte_l1_loss + matte_compositional_loss)
```

@ZHKKKe

Could you please take a look and tell me what might be wrong? After training, my pred_semantic is a pure gray image, and the resulting pred_detail and pred_matte are not alpha images either.

@ZHKKKe
Owner

ZHKKKe commented Apr 1, 2021

@xafha
I tried to read your code. However, I am sorry that it is difficult to find specific problems this way.
I recommend visualizing all of the main intermediate variables for debugging.
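For instance, a small helper like this (the name and the tensor layout are only an illustration) lets you dump gt_semantic, trimap, boundaries, and each prediction as images every few steps:

```python
import cv2
import numpy as np

def dump(name, tensor, step):
    # Save an intermediate variable as an 8-bit grayscale image for inspection.
    arr = np.squeeze(np.asarray(tensor, dtype=np.float32))
    arr = np.clip(arr, 0.0, 1.0) * 255.0
    cv2.imwrite(f"debug_{name}_{step}.png", arr.astype(np.uint8))

# e.g. dump("pred_semantic", pred_semantic, step); dump("pred_matte", pred_matte, step)
```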

@amirgoren

amirgoren commented Apr 8, 2021

@xafha

Did you solve the NaN issue on CUDA?
I'm running the PyTorch version and facing the same issue.

BTW
@ZHKKKe
What should the loss curve look like during the SOC stage?
Can you tell me what size of dataset you used during SOC (and for how many epochs)?

@ZHKKKe
Owner

ZHKKKe commented Apr 9, 2021

@amirgoren
For your questions:
Q1: What should the loss curve look like during the SOC stage?
The training loss will decrease slowly.

Q2: What size of dataset did you use during SOC (and how many epochs)?
We use all frames from 400 video clips for SOC, and train for about 10 epochs.

@upperblacksmith

> Also, when I train with the SOC strategy on my local CPU the loss is normal, but when I switch to multi-GPU training the loss becomes NaN. Could you give me some advice? I am using a Keras implementation.

@xafha Did you ever solve this NaN loss problem?

@ZHKKKe ZHKKKe closed this as completed Jun 21, 2021