The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Evaluate your LLM's responses with Prometheus and GPT-4 💯
🤠 Agent-as-a-Judge and DevAI dataset
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
CodeUltraFeedback: aligning large language models to coding preferences
The repository for the survey of Bias and Fairness in IR with LLMs.
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
xVerify: Efficient Answer Verifier for Large Language Model Evaluations
Harnessing Large Language Models for Curated Code Reviews
Root Signals Python SDK
A set of examples demonstrating how to evaluate generative-AI-augmented systems using traditional information retrieval and LLM-as-a-judge validation techniques
A comprehensive study of the LLM-as-a-judge paradigm in a controlled setup that reveals new results about its strengths and weaknesses.
The official repository for our ACL 2024 paper: Are LLM-based Evaluators Confusing NLG Quality Criteria?
LLM-as-a-judge evals as Semantic Kernel Plugins
A set of tools to create synthetically-generated data from documents
Explore techniques to use small models as jailbreaking judges
Notebooks for evaluating LLM-based applications using the LLM-as-a-judge pattern.
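
For readers new to the topic, below is a minimal sketch of the single-answer grading variant of the LLM-as-a-judge pattern that the repositories above build on: prompt a judge model to grade a candidate answer against a rubric, then parse the numeric score out of its reply. It assumes the openai Python SDK (>=1.0) with OPENAI_API_KEY set in the environment; the model name (gpt-4o-mini), rubric, and prompt wording are illustrative assumptions, not drawn from any listed project.

# Minimal LLM-as-a-judge sketch: grade one answer against a 1-5 rubric.
# Assumes the openai Python SDK (>=1.0) and OPENAI_API_KEY in the environment.
import re

from openai import OpenAI

client = OpenAI()

# Illustrative rubric; real evaluators tune this prompt carefully.
JUDGE_PROMPT = """You are an impartial judge. Rate the answer to the question
on a scale of 1 (poor) to 5 (excellent) for correctness and helpfulness.
Reply with the score on the first line, then a one-sentence justification.

Question: {question}
Answer: {answer}"""


def judge(question: str, answer: str) -> int:
    """Ask the judge model to score a candidate answer; return an int in 1..5."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; swap in your own
        temperature=0,        # deterministic scoring
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    text = response.choices[0].message.content
    match = re.search(r"[1-5]", text)  # pull the first digit 1-5 as the score
    if match is None:
        raise ValueError(f"Judge returned no parsable score: {text!r}")
    return int(match.group())


if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4"))

Production implementations, including several repositories above, extend this with pairwise comparison, reference answers, and controls for known judge biases such as position and verbosity preferences; the sketch covers only the simplest rubric-scoring form.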