Thanks for the excellent work!
In the code for the equal-token-number calculation, is it correct to use the mean number of visual output tokens across all layers? I think it would be more appropriate to use the mean number of visual input tokens across layers instead. For instance, if pruning occurs after layer 0 and reduces the visual tokens to N, the current calculation records the token count for layer 0 as N rather than the initial 576, even though layer 0 itself processes all 576 tokens.
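To make the difference concrete, here is a minimal sketch of the two ways of counting. It is not the repository's code: the function name `mean_visual_tokens`, the 4-layer model, and N = 144 are all hypothetical values chosen only for illustration.

```python
def mean_visual_tokens(initial_tokens, num_layers, prune_after):
    """Return (mean input tokens, mean output tokens) per layer.

    prune_after maps a layer index to the number of visual tokens kept
    after that layer (hypothetical pruning schedule, for illustration only).
    """
    inputs, outputs = [], []
    current = initial_tokens
    for layer in range(num_layers):
        inputs.append(current)                      # tokens the layer actually processes
        current = prune_after.get(layer, current)   # pruning applied after the layer
        outputs.append(current)                     # tokens the layer passes on
    return sum(inputs) / num_layers, sum(outputs) / num_layers


# Assumed toy setting: 576 initial tokens, 4 layers, pruning to N = 144 after layer 0.
mean_in, mean_out = mean_visual_tokens(576, 4, {0: 144})
print(mean_in, mean_out)
```

With the output-based count, layer 0 contributes N instead of 576, so the mean comes out lower than the number of tokens the layers actually process.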
Thanks again for the excellent work!