
A question about performance. #4

Closed
XuZhengzhuo opened this issue Dec 9, 2021 · 3 comments


@XuZhengzhuo

Great job. Your work solves a wider range of LT problems.

But I'm confused by TADE's performance on the vanilla LT test set.

Actually, with the same backbone and training strategy, the following methods adopt almost the same loss, yet their top-1 accuracy varies; for example, on CIFAR100-LT-IR-100:

  • ICLR'21 logit adjustment [43.89%, cf. original paper, Tab. 3]
  • CVPR'21 LADE without test prior [45.6%, cf. this paper, Tab. 8(a)]
  • NeurIPS'20 Balanced Softmax, which can be rewritten as Eq. 3 in this paper [46.1%, cf. this paper, Tab. 8(a)]
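For concreteness, the loss these methods (approximately) share is a softmax cross-entropy whose logits are adjusted by the log class frequencies. A minimal NumPy sketch (the function name and toy numbers below are illustrative, not taken from any of the papers):

```python
import numpy as np

def balanced_softmax_ce(logits, label, class_counts):
    """Balanced-Softmax-style cross-entropy: equivalent to adding the
    log class frequency to each logit before a standard softmax CE."""
    adjusted = logits + np.log(class_counts)  # logit adjustment by log class frequency
    adjusted -= adjusted.max()                # subtract max for numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum())
    return -log_probs[label]

# Toy example: 3 classes with a long-tailed count distribution.
logits = np.array([2.0, 1.0, 0.5])
counts = np.array([100.0, 10.0, 1.0])
loss_head = balanced_softmax_ce(logits, 0, counts)  # head class: small penalty
loss_tail = balanced_softmax_ce(logits, 2, counts)  # tail class: larger penalty
```

The adjustment pushes the model to produce larger margins for tail classes during training, which is the common mechanism behind the three methods above.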

In such a situation, TADE should get the best performance when expert E2 (Eq. 3 in this paper) mainly works. If so, TADE should not outperform the above methods by a large margin, right?

However, TADE's top-1 accuracy is 49.8% (cf. this paper, Tab. 8(a)) and the learned expert weights are [0.40, 0.35, 0.24] (cf. this paper, Tab. 12), so E1 mainly works.

So I am just wondering: how can the improvement of TADE on the vanilla test set be explained?

@Vanint
Owner

Vanint commented Dec 9, 2021

Hi, thanks very much for your attention to our work.

In such a situation, TADE should get the best performance when expert E2 (Eq. 3 in this paper) mainly works.

Yes. As shown in the table below (cf. Table 11 in the appendix), the uniform expert E2 performs the best.

[image: Table 11 in the appendix]

However, TADE's top-1 accuracy is 49.8% (cf. this paper, Tab. 8(a)) and the learned expert weights are [0.40, 0.35, 0.24] (cf. this paper, Tab. 12), so E1 mainly works.

This is quite an interesting phenomenon. We also considered this question when we first saw that the learned weights are not equal on the uniform test distribution of some datasets. It did not meet our initial expectation that the three weights would be roughly equal, making the overall performance the same as the average ensemble (i.e., without our test-time self-supervised aggregation strategy). However, as shown in the table below (cf. Table 13 in the appendix), our test-time aggregation strategy improves performance on the uniform test distribution of CIFAR100-LT-100 by 0.4%.

[image: Table 13 in the appendix]

Therefore, we believe this is not a technical issue that generally leads to performance degradation. Here, we offer one speculation for this phenomenon. As shown in the first table above, the average performance of the forward expert E1 over all classes is higher than that of the backward expert E3 on CIFAR100-LT-100. Given this difference, the optimal weighting scheme on the uniform test distribution is not necessarily the average ensemble; instead, it may correspond to a better trade-off among the three experts. From this perspective, although there is no class imbalance in the uniform test distribution, the results show that our test-time aggregation strategy can adaptively find a better trade-off among experts, leading to better overall performance. This phenomenon, which demonstrates a potential advantage of our test-time aggregation strategy even on the uniform test distribution, also surprised us.

In addition, since the performance improvement on the uniform test distribution from our test-time aggregation strategy is slight, it is fine to directly use the average ensemble (w/o the test-time aggregation) in practice if you know in advance that the actual test distribution is uniform. Happy to discuss further.
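To make the two aggregation schemes concrete, here is a minimal NumPy sketch contrasting the average ensemble with a weighted combination using the learned weights [0.40, 0.35, 0.24] from Table 12 (the per-expert probability vectors are made-up toy values, not from the paper):

```python
import numpy as np

# Hypothetical class-probability outputs of the three experts for one
# test sample (E1: forward, E2: uniform, E3: backward; toy values only).
p_e1 = np.array([0.6, 0.3, 0.1])
p_e2 = np.array([0.4, 0.4, 0.2])
p_e3 = np.array([0.2, 0.3, 0.5])
experts = np.stack([p_e1, p_e2, p_e3])

# Average ensemble: equal weight for every expert.
avg = experts.mean(axis=0)

# Weighted aggregation with the learned weights reported in Table 12,
# renormalized so they sum exactly to 1.
w = np.array([0.40, 0.35, 0.24])
w = w / w.sum()
weighted = w @ experts  # convex combination of expert probabilities

pred_avg = int(np.argmax(avg))            # prediction under average ensemble
pred_weighted = int(np.argmax(weighted))  # prediction under weighted aggregation
```

Since the weights form a convex combination, the weighted output is still a valid probability vector; the two schemes differ only in how much each expert's opinion counts.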

@XuZhengzhuo
Author

Thank you for your patient and detailed reply! This may be worth further study. It seems that a multi-expert architecture alone can improve LT performance, even without special aggregation strategies. It is amazing. Anyway, many thanks again!

@Vanint
Owner

Vanint commented Dec 10, 2021

It seems that just a multi-expert architecture can improve LT.

Yes, this has been demonstrated. You may refer to a recent survey of deep long-tailed learning (https://arxiv.org/pdf/2110.04596.pdf) for more related work on ensemble-based LT methods (cf. Sec. 3.3.4 and Sec. 4 in the survey). Note that the skill-diverse multi-expert framework proposed in this paper has shown superiority over a simple multi-expert architecture like RIDE, which was the previous state-of-the-art method.

without special aggregation strategies.

Aggregation is still important, especially in test-agnostic LT scenarios, which are more practical. Also, I agree with you that this is worth studying further.
