Memebench

Memebench is a public preference benchmark for LLM-generated memes. Models are given a real news headline, asked to make a classic image-macro meme, and ranked by blind public A/B votes.

Live benchmark: memebench.net

Memebench is not a scientific benchmark. It measures one narrow thing: whether a model can make a news meme that survives contact with public voters.

How it works

Recent English-language headlines are collected from public RSS feeds and filtered for meme potential.
Eligible models receive the same headline context and the same meme-template snapshot.
Each model must inspect the templates, render drafts through the benchmark tools (which use imgflip for meme rendering), and submit exactly one meme.
Visitors vote blind between two memes for the same headline.
Accepted votes are converted into pairwise comparisons and fitted into a rolling leaderboard.

The leaderboard uses a Bradley-Terry preference model over recent votes, with separate handling for ties, “both bad” votes, and model-accountable generation failures.
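
To make the ranking idea concrete, here is a minimal sketch of a Bradley-Terry fit in TypeScript. It is not the code used by Memebench. The `Comparison` shape, the function names, and the choice to encode a tie as two half-weight comparisons are all assumptions made for illustration, and the sketch ignores “both bad” votes, generation failures, and the rolling time window entirely. The fit uses the standard minorization-maximization update, and the fitted strengths give a win probability of p_a / (p_a + p_b).

```typescript
// Hypothetical comparison record; not the repository's actual schema.
interface Comparison {
  winner: string;
  loser: string;
  // 1 for a clear win. Assumption: a tie is encoded as two records of weight 0.5,
  // one in each direction.
  weight: number;
}

// Fit Bradley-Terry strengths with the minorization-maximization update
//   p_i <- W_i / sum_j ( n_ij / (p_i + p_j) )
// where W_i is the total win weight of model i and n_ij is the total weight of
// all comparisons between i and j. A model with zero wins ends up at strength 0;
// a production fit would typically add a prior or regularization to avoid that.
function fitBradleyTerry(
  comparisons: Comparison[],
  iterations: number = 200,
): Record<string, number> {
  const wins: Record<string, number> = {};
  const games: Record<string, Record<string, number>> = {};

  const addGame = (a: string, b: string, w: number): void => {
    games[a] = games[a] ?? {};
    games[a][b] = (games[a][b] ?? 0) + w;
  };

  for (const c of comparisons) {
    wins[c.winner] = (wins[c.winner] ?? 0) + c.weight;
    wins[c.loser] = wins[c.loser] ?? 0;
    addGame(c.winner, c.loser, c.weight);
    addGame(c.loser, c.winner, c.weight);
  }

  const models = Object.keys(wins);
  let strength: Record<string, number> = {};
  for (const m of models) strength[m] = 1;

  for (let iter = 0; iter < iterations; iter++) {
    const next: Record<string, number> = {};
    for (const i of models) {
      let denom = 0;
      for (const j of Object.keys(games[i])) {
        denom += games[i][j] / (strength[i] + strength[j]);
      }
      next[i] = wins[i] / denom;
    }
    // The overall scale is arbitrary, so rescale to an average strength of 1.
    const total = models.reduce((sum, m) => sum + next[m], 0);
    for (const m of models) next[m] = (next[m] * models.length) / total;
    strength = next;
  }
  return strength;
}

// Modelled probability that a voter prefers model a's meme over model b's.
function winProbability(
  strength: Record<string, number>,
  a: string,
  b: string,
): number {
  return strength[a] / (strength[a] + strength[b]);
}

// Example: model "A" wins three of four votes against model "B".
const demoStrengths = fitBradleyTerry([
  { winner: "A", loser: "B", weight: 1 },
  { winner: "A", loser: "B", weight: 1 },
  { winner: "A", loser: "B", weight: 1 },
  { winner: "B", loser: "A", weight: 1 },
]);
console.log(winProbability(demoStrengths, "A", "B")); // 0.75
```

With only two models the fit simply reproduces the observed win rate. The model becomes useful with many models, where a win against a strong opponent raises a model's strength more than a win against a weak one.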

See docs/BENCHMARK.md for the full methodology.

Reading the leaderboard

A higher rank means voters recently preferred that model's submitted memes more often after accounting for opponent strength and model-accountable non-submissions.

It does not mean the model is better at general reasoning, coding, factual accuracy, safety, long-form writing, or anything else outside this benchmark.

Content note: generated memes may be crude, political, dark, rude, wrong, tasteless, or simply unfunny. “Both bad” is part of the voting interface for a reason.

Repository layout

apps/web              Public Svelte app and admin UI
apps/api              HTTP API
apps/worker           Daily headline, generation, and ranking jobs
packages/core         Shared server-side config, DB, repositories, and jobs
packages/contracts    Shared API schemas and TypeScript types
packages/e2e          Mock-stack end-to-end tests
packages/e2e-mocks    Mock provider services for E2E and local E2E runs
docs/                 Project documentation

Running locally

Requirements: Node.js 24.x, pnpm 10.x, and Docker with Docker Compose.

cp .env.example .env
./local.sh up

Then open:

public site: http://localhost:5173
leaderboard: http://localhost:5173/leaderboard
API health check: http://localhost:3000/healthz
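
If you would rather check the API from a script than from a browser, the following TypeScript sketch polls the health endpoint listed above. It assumes the endpoint answers with a 2xx status when the API is healthy, which this README does not state. The names `isApiHealthy` and `FetchLike` are made up for this example and are not part of the repository.

```typescript
// Minimal response shape this sketch relies on; the standard fetch Response satisfies it.
interface HealthResponse {
  ok: boolean;
  status: number;
}

// The fetch function is passed in so the check can be exercised without a running stack.
type FetchLike = (url: string) => Promise<HealthResponse>;

// Resolves to true when the endpoint answers with a 2xx status, and to false on any
// other status or on a network error (for example while containers are still starting).
// Assumption: a 2xx answer from /healthz means the API is healthy.
async function isApiHealthy(
  fetchFn: FetchLike,
  url: string = "http://localhost:3000/healthz",
): Promise<boolean> {
  try {
    const res = await fetchFn(url);
    return res.ok;
  } catch {
    return false;
  }
}

// Usage with the global fetch built into current Node.js versions:
// isApiHealthy(fetch).then((healthy) =>
//   console.log(healthy ? "API is up" : "API is not ready"),
// );
```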

The local stack can run without real production secrets. Live meme generation requires API credentials for OpenRouter and imgflip.

For local code checks, see docs/DEVELOPMENT.md, which covers setup details, local stack modes, and test commands.

Documentation

Contributing

Memebench is currently a one-person side project. Issues and small focused pull requests are welcome, especially for bugs, documentation, tests, UI polish, and benchmark-methodology clarity.

Please keep changes scoped and run the relevant checks before opening a pull request.

Support

Memebench has recurring upkeep costs: server hosting, storage, and daily model inference.

Support is possible through Buy Me a Coffee. You don't get anything for your money, though, beyond me continuing to be able to run this page :)

License

This repository is licensed under the MIT License, except for the vendored font files under apps/web/static/fonts; see the font-specific license files there.

AI usage transparency note

Memebench is developed with the help of agentic AI coding tools. All code is reviewed, edited, tested, and maintained by me. All decisions and any mistakes are mine.
