LLM Eval Benchmark v1.0.0

Latest

changyufei222 released this 10 Jun 11:38

· 2 commits to main since this release

Tag: v1.0.0

Commit: de0a528

Fixed-set Direct-versus-RAG benchmark release with controlled model protocols, promoted result summaries, stability analysis, reproducibility guidance, bilingual navigation, and citation metadata.

