
OhemCrossEntropyLoss explanation #3434

Closed
1 task done
EthanAbitbol3 opened this issue Aug 4, 2023 · 5 comments
@EthanAbitbol3

Search before asking

  • I have searched the question and found no related answer.

Please ask your question

Hello and thank you for your work! I had a question regarding the hyperparameters of OhemCrossEntropy. From reading the code and documentation, I understood that there are three hyperparameters, including "threshold," which determines below what probability an example is considered difficult (e.g., with thresh = 0.8, a prediction of 0.6 is a difficult example, while 0.9 is a valid one).

However, I'm having trouble understanding "min_kept." I don't quite grasp how it works. What is the difference between setting it to 10000 or 130000?

Regarding the "ignore_index," I'm confused because I'm getting an error with 255, whereas my annotations are binary, either 0 (background) or 1 (person), and there is no 255.

It would be amazing if someone could shed some light on the subject for me!

@EthanAbitbol3 EthanAbitbol3 added the question Further information is requested label Aug 4, 2023
@Asthestarsfalll
Contributor

The purpose of min_kept in the code is to guarantee that sufficient pixels are available for computing the loss. I think it's particularly useful during the early stages of training, when the model cannot yet produce predictions containing clearly difficult examples (its predictions may be meaningless, just random output). Without min_kept, too few pixels would pass the threshold-based selection, which in turn would slow the model's convergence.
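To make the interplay between thresh and min_kept concrete, here is a minimal NumPy sketch of the selection step. This is an illustration of the idea, not PaddleSeg's actual implementation; the function name `ohem_select` and its exact behavior are assumptions.

```python
import numpy as np

def ohem_select(probs, thresh=0.7, min_kept=4):
    """Return a boolean mask of pixels kept for the loss (illustrative sketch).

    probs: per-pixel probability of the true class, shape (N,).
    A pixel counts as "hard" when its probability is below `thresh`.
    If fewer than `min_kept` pixels are hard, the threshold is relaxed
    so that the `min_kept` lowest-probability pixels are still kept.
    """
    order = np.argsort(probs)                 # hardest (lowest prob) first
    if probs[order[min_kept - 1]] > thresh:
        # Too few pixels fall below thresh: raise it just enough
        # so that at least min_kept pixels survive.
        thresh = probs[order[min_kept - 1]]
    return probs <= thresh

probs = np.array([0.95, 0.9, 0.85, 0.8, 0.6, 0.3])
mask = ohem_select(probs, thresh=0.7, min_kept=4)   # keeps the 4 hardest pixels
```

With min_kept = 10000 versus 130000, the difference is simply the floor on how many pixels survive selection: a larger value forces more (easier) pixels into the loss even when few pixels fall below the threshold.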

> Regarding the "ignore_index," I'm confused because I'm getting an error with 255, whereas my annotations are binary, either 0 (background) or 1 (person), and there is no 255.

The label tensor must have dtype int64 to avoid errors with the value 255. The ignore_index parameter is used to exclude certain classes, such as background or other task-irrelevant classes, from the loss computation.
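The masking performed by ignore_index can be sketched as follows. This is a simplified NumPy illustration of the general convention (255 as the ignored label, int64 labels), not the library's actual code; the helper name `masked_ce` is made up for this example.

```python
import numpy as np

def masked_ce(log_probs, labels, ignore_index=255):
    """Mean cross entropy over pixels whose label != ignore_index (sketch).

    log_probs: (N, C) per-pixel log-probabilities.
    labels:    (N,) integer class labels, cast to int64.
    """
    labels = labels.astype(np.int64)          # labels must be int64
    valid = labels != ignore_index            # drop ignored pixels
    picked = log_probs[valid, labels[valid]]  # log-prob of the true class
    return -picked.mean()

logp = np.log(np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))
labels = np.array([0, 1, 255])                # the last pixel is ignored
loss = masked_ce(logp, labels)                # averaged over 2 valid pixels
```

If your annotations only contain 0 and 1 and never 255, the ignore mask is simply all-true and every pixel contributes to the loss.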

@EthanAbitbol3
Author

Thank you for your help! I have a better understanding now.

@ToddBear
Collaborator

ToddBear commented Aug 7, 2023

The answers above have fully addressed the question. If you have a new question, feel free to open another issue, or continue replying under this one.
We have launched an ISSUE-tackling event for the PaddlePaddle suites; interested developers are welcome to participate: PaddlePaddle/PaddleOCR#10223

@ToddBear ToddBear closed this as completed Aug 7, 2023
@bit-scientist

@Asthestarsfalll thanks for the explanation of ignore_index. Let me ask you a couple of questions here.

  1. What if the model trained with n (# classes) + bg has better performance than the model trained with n only?
  2. Is it always safe to set bg to 255 (i.e. ignore it)?
  3. If the model is trained with bg ignored, how will it behave when tested on an image that contains background? Will it treat the background pixels as ordinary pixels that may be classified as one of the n classes?

Thank you in advance!

@Asthestarsfalll
Contributor

> @Asthestarsfalll thanks for the explanation of ignore_index. Let me ask you a couple of questions here.
>
>   1. What if the model trained with n (# classes) + bg has better performance than the model trained with n only?
>   2. Is it always safe to set bg to 255 (i.e. ignore it)?
>   3. If the model is trained with bg ignored, how will it behave when tested on an image that contains background? Will it treat the background pixels as ordinary pixels that may be classified as one of the n classes?

  1. I think it should be the same as previous works (i.e., not using bg) for a fair comparison. The reason is hard to explain; my guess is that the predictions of foreground and background are mutually exclusive after softmax/argmax, and the bg area is larger than the fg area, so the metrics (mIoU / pixel accuracy) look better. If bg is not used for training, it is not counted in the metrics.
  2. Setting a label to 255 means excluding task-irrelevant classes from the loss computation. So it depends on your task: if you don't want to predict bg, ignore it.
  3. The bg will be ignored in evaluation as well.
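A small sketch of point 3: at evaluation time, pixels whose ground-truth label equals ignore_index are excluded from the metric, while the model's argmax still assigns every pixel (including background) to one of the n classes. This is an illustrative NumPy example of the general convention, not the library's evaluation code.

```python
import numpy as np

pred = np.array([0, 1, 1, 0])      # argmax prediction over n classes
gt   = np.array([0, 1, 255, 255])  # 255 = ignored background pixels

valid = gt != 255                  # mask out ignored pixels
acc = (pred[valid] == gt[valid]).mean()   # accuracy over valid pixels only
```

Here the two background pixels receive a class prediction like any other pixel, but they simply never enter the accuracy (or mIoU) computation.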
