This project is a TypeScript-based HTTP API for exploring the poloclub/diffusiondb dataset.
DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users.
The API is deployed at https://diffusiondb-api.fly.dev
Returns a list of images whose prompt text matches the query string. The limit sets the maximum number of images returned.
The query uses SQLite FTS5 full-text search syntax. See https://www.sqlite.org/fts5.html#full_text_query_syntax for more information.
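Because the query is interpreted as FTS5 syntax, raw user input containing characters such as `-`, `:` or an unbalanced `"` may be read as operators or rejected as a syntax error. The following is a minimal client-side sketch of one way to quote free text first. It assumes the server passes the query string to `MATCH` unchanged, which this README does not confirm, and the helper name `to_fts5_phrases` is hypothetical.

```python
def to_fts5_phrases(text: str) -> str:
    """Quote each whitespace-separated term as an FTS5 string.

    FTS5 strings are wrapped in double quotes, and an embedded double
    quote is escaped by doubling it. Quoted terms separated by spaces
    are combined with an implicit AND.
    """
    terms = text.split()
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)


# Example: operator-like characters end up inside quoted strings.
safe_query = to_fts5_phrases("unicorn -blurry")
```

Quoting every term gives up FTS5 features such as `OR`, `NOT` and prefix queries, so this only makes sense for plain keyword search boxes.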
Returns the total number of images in the dataset. This only works if the database is properly configured.
- key: Unique image name
- p: Prompt
- se: Random seed
- c: CFG Scale (guidance scale)
- st: Steps
- sa: Sampler
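The abbreviated field names above can be mapped to more descriptive ones on the client side. This is a sketch only: it assumes each returned image is a JSON object carrying exactly these keys, and the class name `ImageRecord`, the function `from_record` and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    key: str          # unique image name
    prompt: str       # p
    seed: int         # se
    cfg_scale: float  # c (guidance scale)
    steps: int        # st
    sampler: str      # sa


def from_record(rec: dict) -> ImageRecord:
    """Convert one record with the short keys into an ImageRecord."""
    return ImageRecord(
        key=rec["key"],
        prompt=rec["p"],
        seed=int(rec["se"]),
        cfg_scale=float(rec["c"]),
        steps=int(rec["st"]),
        sampler=str(rec["sa"]),
    )


# Made-up sample record, for illustration only.
example = from_record(
    {"key": "example.png", "p": "a unicorn", "se": 42, "c": 7, "st": 50, "sa": "k_lms"}
)
```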
node server.js
flyctl deploy
- deploy API to Fly via GitHub Actions
- OpenAPI spec on /docs, info on /
- have a help page
- link to a blog entry about it
You shouldn't need to create your own. The SQLite database is checked into the repo using Git LFS.
Grab the dataset from poloclub/diffusiondb.
git clone https://huggingface.co/datasets/poloclub/diffusiondb
After downloading the 2 TB of data, you only need the JSON files...
python build_sqlite.py data.db ~/path/to/jsons
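For orientation, here is a rough sketch of what a loader of this kind might do. It is not the actual `build_sqlite.py`. It assumes each JSON file maps an image name to an object with the keys `p`, `se`, `c`, `st` and `sa`, and that rows go into a table named `dreams`. The function name `load_json_dir` and the sample data are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def load_json_dir(conn: sqlite3.Connection, json_dir: Path) -> int:
    """Load every *.json file under json_dir; return the number of records read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dreams "
        "(id TEXT PRIMARY KEY, p TEXT, se INTEGER, c INTEGER, st INTEGER, sa TEXT)"
    )
    records_read = 0
    for path in sorted(json_dir.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records = json.load(fh)  # assumed shape: {image_name: {p, se, c, st, sa}}
        rows = [
            (name, m.get("p"), m.get("se"), m.get("c"), m.get("st"), m.get("sa"))
            for name, m in records.items()
        ]
        # OR IGNORE skips image names that are already present.
        conn.executemany("INSERT OR IGNORE INTO dreams VALUES (?, ?, ?, ?, ?, ?)", rows)
        records_read += len(rows)
    conn.commit()
    return records_read


# Tiny demonstration against a temporary directory and an in-memory database.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"example.png": {"p": "a unicorn", "se": 1, "c": 7.0, "st": 50, "sa": "k_lms"}}
    Path(tmp, "sample.json").write_text(json.dumps(sample), encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    records_read = load_json_dir(conn, Path(tmp))
    row_count = conn.execute("SELECT COUNT(*) FROM dreams").fetchone()[0]
```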
"id TEXT PRIMARY KEY",
"p text",
"se integer",
"c integer",
"st integer",
"sa text",
CREATE VIRTUAL TABLE dreams_fts USING fts5 (p, content=dreams, tokenize = 'porter ascii');
INSERT INTO dreams_fts (rowid, p) SELECT rowid, p FROM dreams;
SELECT * FROM dreams WHERE rowid IN (SELECT rowid FROM dreams_fts WHERE dreams_fts MATCH 'unicorn' ORDER BY rank LIMIT 50);
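The schema, index and search query can be tried end to end with Python's built-in `sqlite3` module, provided the underlying SQLite build includes FTS5. This sketch uses the `dreams` and `dreams_fts` names from the schema above, a few made-up rows, and a hypothetical `search` helper. It illustrates the technique and is not the API's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dreams (
        id TEXT PRIMARY KEY, p TEXT, se INTEGER, c INTEGER, st INTEGER, sa TEXT
    );
    -- External-content FTS5 index over the prompt column, with Porter stemming.
    CREATE VIRTUAL TABLE dreams_fts USING fts5 (p, content=dreams, tokenize = 'porter ascii');
    """
)

# Made-up sample rows, for illustration only.
rows = [
    ("a.png", "a unicorn in a misty forest", 1, 7, 50, "k_lms"),
    ("b.png", "portrait of a robot", 2, 7, 50, "k_lms"),
    ("c.png", "two unicorns on the moon", 3, 7, 50, "k_lms"),
]
conn.executemany("INSERT INTO dreams VALUES (?, ?, ?, ?, ?, ?)", rows)

# Copy rowids explicitly so the index stays aligned with the content table.
conn.execute("INSERT INTO dreams_fts (rowid, p) SELECT rowid, p FROM dreams")


def search(conn: sqlite3.Connection, query: str, limit: int = 50) -> list:
    """Return (id, prompt) pairs whose prompt matches an FTS5 query."""
    sql = """
        SELECT id, p FROM dreams WHERE rowid IN (
            SELECT rowid FROM dreams_fts WHERE dreams_fts MATCH ? ORDER BY rank LIMIT ?
        )
    """
    return conn.execute(sql, (query, limit)).fetchall()


matches = search(conn, "unicorn")
```

Thanks to the Porter stemmer, the query `unicorn` also matches the prompt containing `unicorns`. Note that the inner `ORDER BY rank` decides which rows make it into the limit, but the outer `IN` query does not guarantee that results come back in rank order.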