
OhemCrossEntropyLoss explanation #3434

Closed
1 task done
EthanAbitbol3 opened this issue Aug 4, 2023 · 5 comments
@EthanAbitbol3

Search before asking

  • I have searched the question and found no related answer.

Please ask your question

Hello and thank you for your work! I had a question regarding the hyperparameters of OhemCrossEntropy. From reading the code and documentation, I understood that there are three hyperparameters, including "threshold," which determines below what probability an example is considered difficult (e.g., with thresh = 0.8, a prediction of 0.6 is a difficult example, while 0.9 is a valid one).

However, I'm having trouble understanding "min_kept." I don't quite grasp how it works. What is the difference between setting it to 10000 or 130000?

Regarding the "ignore_index," I'm confused because I'm getting an error with 255, whereas my annotations are binary, either 0 (background) or 1 (person), and there is no 255.

It would be amazing if someone could shed some light on the subject for me!

@EthanAbitbol3 EthanAbitbol3 added the question Further information is requested label Aug 4, 2023
@Asthestarsfalll
Contributor

The purpose of min_kept in the code is to guarantee that sufficient pixels are available for computing the loss. I think it's particularly useful during the early stages of training, when the model cannot yet produce predictions containing clearly difficult examples (its predictions may be meaningless, just random output). Without min_kept, too few pixels would pass the threshold-based selection, which in turn would slow the model's convergence.
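To make the interplay between thresh and min_kept concrete, here is a minimal NumPy sketch of the selection step. This is an illustration of the idea, not PaddleSeg's actual implementation; the function name `ohem_select` and its exact behavior are assumptions.

```python
import numpy as np

def ohem_select(probs, thresh=0.7, min_kept=4):
    """Return a boolean mask of pixels kept for the loss (illustrative sketch).

    probs: per-pixel probability of the true class, shape (N,).
    A pixel counts as "hard" when its probability is below `thresh`.
    If fewer than `min_kept` pixels are hard, the threshold is relaxed
    so that the `min_kept` lowest-probability pixels are still kept.
    """
    order = np.argsort(probs)                 # hardest (lowest prob) first
    if probs[order[min_kept - 1]] > thresh:
        # Too few pixels fall below thresh: raise it just enough
        # so that at least min_kept pixels survive.
        thresh = probs[order[min_kept - 1]]
    return probs <= thresh

probs = np.array([0.95, 0.9, 0.85, 0.8, 0.6, 0.3])
mask = ohem_select(probs, thresh=0.7, min_kept=4)   # keeps the 4 hardest pixels
```

With min_kept = 10000 versus 130000, the difference is simply the floor on how many pixels survive selection: a larger value forces more (easier) pixels into the loss even when few pixels fall below the threshold.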

> Regarding the "ignore_index," I'm confused because I'm getting an error with 255, whereas my annotations are binary, either 0 (background) or 1 (person), and there is no 255.

The label tensor must have dtype int64 to avoid errors with the value 255. The ignore_index parameter is used to exclude certain classes, such as background or other task-irrelevant classes, from the loss computation.
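The masking performed by ignore_index can be sketched as follows. This is a simplified NumPy illustration of the general convention (255 as the ignored label, int64 labels), not the library's actual code; the helper name `masked_ce` is made up for this example.

```python
import numpy as np

def masked_ce(log_probs, labels, ignore_index=255):
    """Mean cross entropy over pixels whose label != ignore_index (sketch).

    log_probs: (N, C) per-pixel log-probabilities.
    labels:    (N,) integer class labels, cast to int64.
    """
    labels = labels.astype(np.int64)          # labels must be int64
    valid = labels != ignore_index            # drop ignored pixels
    picked = log_probs[valid, labels[valid]]  # log-prob of the true class
    return -picked.mean()

logp = np.log(np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))
labels = np.array([0, 1, 255])                # the last pixel is ignored
loss = masked_ce(logp, labels)                # averaged over 2 valid pixels
```

If your annotations only contain 0 and 1 and never 255, the ignore mask is simply all-true and every pixel contributes to the loss.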

@EthanAbitbol3
Author

Thank you for your help! I have a better understanding now.

@ToddBear
Collaborator

ToddBear commented Aug 7, 2023

The answers above have fully addressed the question. If you have a new question, feel free to open another issue, or continue replying under this one.
We have launched an ISSUE-tackling event for the PaddlePaddle suites; interested developers are welcome to participate: PaddlePaddle/PaddleOCR#10223

@ToddBear ToddBear closed this as completed Aug 7, 2023
@bit-scientist

@Asthestarsfalll thanks for the explanation of ignore_index. Let me ask you a couple of questions here.

  1. What if the model trained with n (# classes) + bg has better performance than the model trained with n only?
  2. Is it always safe to set bg to 255 (i.e. ignore it)?
  3. If the model is trained with bg ignored, how will it behave when tested on an image that contains background? Will it treat the background pixels as ordinary pixels that may be classified as one of the n classes?

Thank you in advance!

@Asthestarsfalll
Contributor

> @Asthestarsfalll thanks for the explanation of ignore_index. Let me ask you a couple of questions here.
>
>   1. What if the model trained with n (# classes) + bg has better performance than the model trained with n only?
>   2. Is it always safe to set bg to 255 (i.e. ignore it)?
>   3. If the model is trained with bg ignored, how will it behave when tested on an image that contains background? Will it treat the background pixels as ordinary pixels that may be classified as one of the n classes?

  1. I think it should be the same as previous works (i.e., not using bg) for a fair comparison. The reason is hard to explain; my guess is that the predictions of foreground and background are mutually exclusive after softmax/argmax, and the bg area is larger than the fg area, so the metrics (mIoU / pixel accuracy) look better. If bg is not used for training, it is not counted in the metrics.
  2. Setting a label to 255 means excluding task-irrelevant classes from the loss computation. So it depends on your task: if you don't want to predict bg, ignore it.
  3. The bg will be ignored in evaluation as well.
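A small sketch of point 3: at evaluation time, pixels whose ground-truth label equals ignore_index are excluded from the metric, while the model's argmax still assigns every pixel (including background) to one of the n classes. This is an illustrative NumPy example of the general convention, not the library's evaluation code.

```python
import numpy as np

pred = np.array([0, 1, 1, 0])      # argmax prediction over n classes
gt   = np.array([0, 1, 255, 255])  # 255 = ignored background pixels

valid = gt != 255                  # mask out ignored pixels
acc = (pred[valid] == gt[valid]).mean()   # accuracy over valid pixels only
```

Here the two background pixels receive a class prediction like any other pixel, but they simply never enter the accuracy (or mIoU) computation.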
