Benchmark Results #31
Results for ReRead on LiveBench.
Scatter plot showing how test-time compute scales across different approaches with gpt-4o-mini on AIME 2024.
You can see the original illustration here.
Results on the FRAMES benchmark with the memory plugin.
Results on the AIME 2024 benchmark with optillm (eval script). Metric: AIME (2024) pass@1.
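For readers unfamiliar with the metric, here is a minimal sketch of how pass@1 is commonly computed. It uses the standard unbiased pass@k estimator, which for k = 1 reduces to the fraction of correct samples per problem. The function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical, and this is an assumption about the metric in general, not a description of what the optillm eval script does.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that were correct, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts: list[int], n: int) -> float:
    """Average pass@1 over problems, given correct-sample counts per problem."""
    return sum(pass_at_k(n, c, 1) for c in correct_counts) / len(correct_counts)
```

With a single sample per problem (n = 1), this is just the share of problems answered correctly on the first attempt.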
Results for entropy decoding and CoT decoding on GSM8K with the Qwen2.5-0.5B-Instruct model.