What will happen if H2O also uses Clustering via Pooling when comparing? It seems that Clustering via Pooling can improve the effectiveness of such drop token methods.
As we stated in the paper, the generated answers are highly query-dependent, so evicting KV entries during generation may lose information. As a high-level example: suppose a user gives the model a book and the first question is about the first chapter, so the model evicts the other parts. If the user then asks about the last chapter, the model will have very limited knowledge to answer from.
Pooling is a very interesting observation: on easier tasks like the original haystack task, the model performs perfectly even without pooling. But when you switch to more challenging tasks, the method with pooling is significantly better than the one without.
Just a guess.
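To make the pooling point concrete, here is a minimal sketch of how pooling changes which positions a score-based selection keeps. It is not the repository's implementation: the function names `max_pool_1d` and `select_top_k`, the kernel size, and the score values are all hypothetical, and it assumes a single list of per-token attention scores. In principle the same smoothing step could be placed in front of any score-based eviction rule, but whether that helps a method like H2O is untested here.

```python
def max_pool_1d(scores, kernel_size):
    """Max-pool a 1D list with stride 1 and 'same' padding (odd kernel_size assumed)."""
    half = kernel_size // 2
    n = len(scores)
    return [max(scores[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def select_top_k(scores, k):
    """Return the sorted indices of the k highest-scoring positions."""
    # sorted() is stable, so ties keep their original left-to-right order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-token attention scores (illustrative values only).
scores = [0.05, 0.30, 0.90, 0.25, 0.05, 0.35, 0.05, 0.40]

# Without pooling: the three highest individual scores, scattered across the prompt.
without_pooling = select_top_k(scores, 3)

# With pooling: neighbours of the strongest token inherit its score,
# so the selection becomes a contiguous cluster around it.
with_pooling = select_top_k(max_pool_1d(scores, 3), 3)

print(without_pooling)  # [2, 5, 7]
print(with_pooling)     # [1, 2, 3]
```

In this toy case, selection without pooling keeps three isolated tokens, while selection after pooling keeps the strongest token together with its immediate neighbours, which is the "clustering" effect the question refers to.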