A doubt about the SOC process #62

Closed · rose-jinyang opened this issue Feb 8, 2021 · 14 comments
@rose-jinyang

Hello
How are you?
Thanks for your contribution of the MODNet project.
I have a question about the SOC process.
I understand that the SOC process refines the alpha matte by using the predicted segmentation on unlabeled samples.

[image]

If so, this seems to rest on the assumption that the predicted segmentation results on unlabeled (unseen) samples are always good, while the predicted alpha mattes are comparatively poor.
Then what guarantees that MODNet's segmentation results on unlabeled (unseen) images are always good?

@ZHKKKe
Owner

ZHKKKe commented Feb 8, 2021

Hi, thanks for your attention.

The key idea behind SOC is consistency. For unlabeled data, we do not know whether the predicted semantic mask and the predicted alpha matte are correct. However, based on the smoothness assumption in semi-/self-supervised learning, forcing them to make consistent predictions helps the network learn from the unlabeled data. If you want to learn more about the theory behind SOC, I suggest referring to papers on consistency-based semi-supervised learning.
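To make this concrete, here is a rough PyTorch sketch of such a consistency term (it is not the exact SOC loss used in this repo; the blur layer, the 1/16 scale, and the L2/L1 choices are only illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

# Hypothetical blur layer; the repo's own blur layer may differ.
blurer = GaussianBlur(kernel_size=3)

def soc_consistency_loss(pred_semantic, pred_matte):
    with torch.no_grad():
        # Treat the (frozen) matte prediction as a pseudo target for the semantic
        # branch: downsample and blur it into the semantic branch's output space.
        pseudo_semantic = blurer(
            F.interpolate(pred_matte, scale_factor=1 / 16, mode='bilinear'))
        # Symmetrically, upsample the (frozen) semantic prediction to serve as
        # a coarse pseudo target for the matte branch.
        pseudo_matte = F.interpolate(
            pred_semantic, size=pred_matte.shape[2:], mode='bilinear')
    semantic_term = F.mse_loss(pred_semantic, pseudo_semantic)
    matte_term = F.l1_loss(pred_matte, pseudo_matte)
    return semantic_term + matte_term
```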

@yarkable
Contributor

yarkable commented Feb 28, 2021

As mentioned in the paper, the SOC strategy can boost the results. May I ask whether the dataset for SOC training & testing is a sequence of frames from a video? If not, could you give some advice on which type of data is suitable for SOC? @ZHKKKe

@ZHKKKe
Owner

ZHKKKe commented Mar 2, 2021

@yarkable
Nope. You can use data from any specific domain for SOC, e.g., frames from a webcam, or photos shot by a certain phone.
You just need to ensure that (1) all samples for SOC come from the same data domain, and (2) the difference between the data used for SOC and the labeled training data is not too big (e.g., you cannot use cartoon images for SOC).

@yarkable
Contributor

yarkable commented Mar 2, 2021

Got it! Thanks.

@xafha

xafha commented Mar 13, 2021

@ZHKKKe
I still don't fully understand the SOC strategy.
For example, if the predicted matte is taken as the ground truth, the predicted semantics should be kept consistent with it, which leads to the following cases:
When the matte prediction is correct, forcing the semantics to stay consistent with it is reasonable; but when the matte prediction is wrong, the semantics are forced to stay consistent with it all the same. Since training is on unlabeled data, we do not know whether the predicted matte is correct, so why should the semantics still be forced to follow a matte whose prediction is itself wrong? How should I understand whether this is reasonable? Please help explain, thanks.
Forcing the matte to stay consistent with the semantics is, as I understand it, a similar process.

Also, when I train with the SOC strategy on my local CPU the loss is normal, but when I switch to multi-GPU training the loss becomes NaN. Could you give me some advice? I am using a Keras implementation.

@ZHKKKe
Owner

ZHKKKe commented Mar 15, 2021

@xafha

Hi, regarding your questions:
Q1: When the matte prediction is correct, forcing the semantics to agree with it is reasonable; but when the matte prediction is wrong, the semantics are forced to agree all the same. Since training is on unlabeled data, we do not know whether the predicted matte is correct, so why force the semantics to follow a matte that may itself be wrong?
The principle behind SOC comes from the consistency constraint in semi-/self-supervised learning. In short, the predicted matte and the predicted semantics will be inconsistent (you can think of this as high entropy). Adding a consistency constraint pushes their outputs to agree (entropy reduction). In this process, for a converged network, most pixels are optimized in the correct direction, so even if some wrongly predicted pixels introduce perturbations, SOC will still push the network to converge in a better direction. For more on this, please refer to consistency-based semi-supervised learning papers such as Mean Teacher and Dual Student. Feel free to ask if you have further questions.

Q2: Training with the SOC strategy on a local CPU, the loss is normal, but with multi-GPU training the loss becomes NaN. Could you give me some advice?
I am not very sure about this. In my setting, the batch size is 1 during SOC. If you use multiple GPUs, please check whether something goes wrong in the data loading.
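For example, a quick framework-agnostic check like the following (the helper name is only illustrative) can rule out bad samples coming from the loader:

```python
import numpy as np

def check_batch(images, mattes):
    # Flag the usual data-side causes of NaN losses.
    for name, arr in (("image", images), ("matte", mattes)):
        arr = np.asarray(arr, dtype=np.float32)
        if not np.isfinite(arr).all():
            raise ValueError(f"{name} batch contains NaN/Inf values")
    mattes = np.asarray(mattes, dtype=np.float32)
    if mattes.min() < 0.0 or mattes.max() > 1.0:
        raise ValueError("matte batch is not normalized to [0, 1]")
```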

@xafha

xafha commented Mar 18, 2021

@ZHKKKe
Hi, the gt_matte in the code comes from the DataLoader. Is gt_matte normalized from the 0~255 range, e.g. x / 255.0?
What confuses me is that gt_semantic is supposed to represent semantic information, so it should be 0 or 1; but from the code, gt_semantic looks like a smoothed gt_matte with values in the range 0.0~1.0?
gt_semantic = F.interpolate(gt_matte, scale_factor=1/16, mode='bilinear')
gt_semantic = blurer(gt_semantic)
Looking forward to your reply, thanks.

@yarkable
Contributor

@xafha I think it's not a value that is exactly 0 or 1, but a value between 0 and 1, because that gives a smoother target.

@ZHKKKe
Owner

ZHKKKe commented Mar 19, 2021

@xafha
gt_semantic is in the range 0~1. The blurer's job is to remove the fine details from gt_semantic.
Semantic information can be represented either as classification (pixel values of 0 or 1) or as regression (pixel values from 0 to 1). The two representations use different loss functions, but the final results are about the same.
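A minimal PyTorch sketch of the two options (assuming pred_semantic has already been squashed into (0, 1), e.g. by a sigmoid; the dummy tensors just stand in for real network outputs):

```python
import torch
import torch.nn.functional as F

pred_semantic = torch.rand(1, 1, 32, 32)  # stand-in for the semantic branch output in (0, 1)
gt_semantic = torch.rand(1, 1, 32, 32)    # stand-in for the downsampled, blurred matte

# Regression view: keep the soft target in [0, 1] and use an L2 loss.
regression_loss = F.mse_loss(pred_semantic, gt_semantic)

# Classification view: binarize the target and use binary cross-entropy instead.
gt_binary = (gt_semantic > 0.5).float()
classification_loss = F.binary_cross_entropy(pred_semantic, gt_binary)
```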

@xafha

xafha commented Mar 23, 2021

> @xafha
> gt_semantic is in the range 0~1. The blurer's job is to remove the fine details from gt_semantic.
> Semantic information can be represented either as classification (pixel values of 0 or 1) or as regression (pixel values from 0 to 1). The two representations use different loss functions, but the final results are about the same.

```python
import cv2
import numpy as np
import tensorflow as tf

# ---------- data loading / preprocessing ----------

# 1. bgr2rgb
image = cv2.cvtColor(cv2.imread(image_name), cv2.COLOR_BGR2RGB)
matte = cv2.imread(matte_name, 0)

# 2. TODO: interpolation method + padding value, 512x512
image = self._aspect_preserving_resize(image, cv2.INTER_LINEAR, (127, 127, 127))
matte = self._aspect_preserving_resize(matte, cv2.INTER_LINEAR, (0, 0, 0))

# 3. TODO: augmentation
image, matte = self._random_flip(image, matte)

# 4. normalize: image to [-1, 1], matte to [0, 1]
image = image.astype(np.float32) / 255.0
image = (image - 0.5) / 0.5
matte = matte.astype(np.float32) / 255.0

# 5. trimap
trimap = self.gen_trimap(matte)

# 6. Gaussian-blurred semantic target (downsampled, smoothed matte)
semantic = cv2.resize(matte, target_size, interpolation=cv2.INTER_LINEAR)
semantic = cv2.GaussianBlur(semantic, (self.kernel_size, self.kernel_size), 0)

# 7. detail: boundaries == 1 inside the unknown (transition) region of the trimap
boundaries = (trimap < 0.5) + (trimap > 0.5)
boundaries = np.where(boundaries, False, True).astype(np.float32)

# ---------- losses ----------

# 1. calculate the semantic loss (16x downsampled branch)
semantic_loss = tf.reduce_mean(tf.square(gt_semantic - pred_semantic))

# 2. calculate the detail loss (only inside the transition region)
gt_detail = boundaries * gt_matte
pred_boundary_detail = boundaries * pred_detail
detail_loss = tf.reduce_mean(tf.abs(gt_detail - pred_boundary_detail))

# 3. calculate the matte loss (L1 + compositional, extra weight on the transition region)
pred_boundary_matte = boundaries * pred_matte
gt_detail = boundaries * gt_matte
matte_l1_loss = tf.abs(gt_matte - pred_matte) + 4.0 * tf.abs(gt_detail - pred_boundary_matte)
matte_compositional_loss = tf.abs(image * gt_matte - image * pred_matte) + \
                           4.0 * tf.abs(image * gt_detail - image * pred_boundary_matte)
matte_loss = tf.reduce_mean(matte_l1_loss + matte_compositional_loss)
```

@ZHKKKe

Could you please take a look and tell me what might be wrong? After training, my pred_semantic is a pure gray image, and the resulting pred_detail and pred_matte are not alpha images either.

@ZHKKKe
Owner

ZHKKKe commented Apr 1, 2021

@xafha
I tried to read your code. However, I am sorry that it is difficult to find specific problems this way.
I recommend visualizing all of the main intermediate variables for debugging.
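For instance, a small helper like this (the name and the tensor layout are only an illustration) lets you dump gt_semantic, trimap, boundaries, and each prediction as images every few steps:

```python
import cv2
import numpy as np

def dump(name, tensor, step):
    # Save an intermediate variable as an 8-bit grayscale image for inspection.
    arr = np.squeeze(np.asarray(tensor, dtype=np.float32))
    arr = np.clip(arr, 0.0, 1.0) * 255.0
    cv2.imwrite(f"debug_{name}_{step}.png", arr.astype(np.uint8))

# e.g. dump("pred_semantic", pred_semantic, step); dump("pred_matte", pred_matte, step)
```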

@amirgoren

amirgoren commented Apr 8, 2021

@xafha

Did you solve the NaN issue on CUDA?
I'm running the PyTorch version and facing the same issue.

BTW
@ZHKKKe
What should the loss curve look like during the SOC stage?
Can you tell me what size of dataset you used during SOC (and for how many epochs)?

@ZHKKKe
Owner

ZHKKKe commented Apr 9, 2021

@amirgoren
For your questions:
Q1: What should the loss curve look like during the SOC stage?
The training loss will decrease slowly.

Q2: What size of dataset did you use during SOC (and how many epochs)?
We use all frames from 400 video clips for SOC, and train for about 10 epochs.

@upperblacksmith

> Also, when I train with the SOC strategy on my local CPU the loss is normal, but when I switch to multi-GPU training the loss becomes NaN. Could you give me some advice? I am using a Keras implementation.

@xafha Did you ever solve this NaN loss problem?

@ZHKKKe ZHKKKe closed this as completed Jun 21, 2021