MindTrial: Evaluate and compare large language models (LLMs) on text-based tasks. Supports multiple providers (OpenAI, Google, Anthropic, DeepSeek), custom tasks defined in YAML, and HTML/CSV reports.
nlp opensource openai customizable yaml-configuration golang-cli mozilla-public-license html-reports csv-reports anthropic ai-tool deepseek llm-evaluation-framework google-gemini-ai llm-benchmarking language-models-ai llm-comparison ai-benchmark ai-evaluation-tools ai-model-comparison
Updated Mar 21, 2025 - Go