- Liquid AI
- London, United Kingdom
- https://mlabonne.github.io/blog
- @maximelabonne
- in/maxime-labonne
- https://huggingface.co/mlabonne
Stars
Implements harmful/harmless refusal removal using pure HF Transformers
Distill thinking datasets more compactly and accurately!
Trying to create an RPG with an LLM as the game master.
Lab Materials for MIT 6.S191: Introduction to Deep Learning
The Open Cookbook for Top-Tier Code Large Language Model
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
A framework for the evaluation of autoregressive code generation language models.
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Arena-Hard-Auto: An automatic LLM benchmark.
The official evaluation suite and dynamic data release for MixEval.
The Universe of Evaluation. All about the evaluation for LLMs.
Simple Python library/structure to ablate features in LLMs that are supported by TransformerLens
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Manage scalable open LLM inference endpoints in Slurm clusters
Evaluate your LLM's responses with Prometheus and GPT-4 💯