I suspect it is easy to beat the scores you are reporting (and perhaps even approach 100%) with multi-turn and agentic systems such as LLMLingua-2 or GraphReader.
Aggregating such tricks, and understanding what it takes to reach acceptable performance with an LLM, seems important to anyone building a production system.
Would you consider accepting submissions of such agentic systems on your leaderboard? If so, it would also be interesting to report the total tokens consumed and the number of consecutive steps taken for each submission.