anotherjesse/diffusiondb-api

DiffusionDB-API

This project is a TypeScript-based HTTP API for exploring the poloclub/diffusiondb dataset.

DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users.

DiffusionDB 2 Million Images

About the API

The API is deployed at https://diffusiondb-api.fly.dev

Methods

GET /search?q={QUERY}&l={LIMIT}

Returns a list of images whose prompt text matches the query. QUERY is a string matched against the prompt text; LIMIT is the maximum number of images returned.

The query uses SQLite FTS5 full-text query syntax. See https://www.sqlite.org/fts5.html#full_text_query_syntax for more information.
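As a rough illustration of calling the search endpoint, here is a minimal Python sketch. The helper names `build_search_url` and `search` are hypothetical, and the assumption that the endpoint responds with JSON is not confirmed by this README; only the base URL and the `q` and `l` parameters come from the text above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://diffusiondb-api.fly.dev"  # deployment URL given in this README


def build_search_url(query: str, limit: int) -> str:
    """Build a /search URL: `q` carries the FTS5 query, `l` the result limit."""
    params = urlencode({"q": query, "l": limit}, quote_via=quote)
    return f"{BASE_URL}/search?{params}"


def search(query: str, limit: int):
    """Fetch matching images. Assumption: the response body is JSON."""
    with urlopen(build_search_url(query, limit), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # FTS5 operators (AND, OR, NOT, "quoted phrases") are passed through in q.
    print(build_search_url('unicorn AND "oil painting"', 5))
```

Percent-encoding the query matters because FTS5 phrases use double quotes and spaces, neither of which is safe in a raw URL.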

GET /stats

Returns the total number of images in the dataset. This only works if the database is properly configured.

Data Fields

  • key: Unique image name
  • p: Prompt
  • se: Random seed
  • c: CFG Scale (guidance scale)
  • st: Steps
  • sa: Sampler

local dev

node server.js

deploying

flyctl deploy

Ideas

  • deploy api to fly via github actions
  • openapi spec on /docs, info on /
  • have a help page
  • link to a blog entry about it

SQLITE database

You shouldn't need to create your own: the SQLite database is checked into the repo using Git LFS.

download the original data

Grab the dataset from poloclub/diffusiondb.

git clone https://huggingface.co/datasets/poloclub/diffusiondb

After downloading the 2TB of data, you only need the json files...

build the sqlite database

python build_sqlite.py data.db ~/path/to/jsons

Dream Schema

"id TEXT PRIMARY KEY",
"p text",
"se integer",
"c integer",
"st integer",
"sa text",
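The contents of build_sqlite.py are not reproduced in this README, so the following is only a sketch of what a loader for this schema could look like. It assumes each DiffusionDB JSON file maps an image name to an object with `p`, `se`, `c`, `st`, and `sa` keys, and that the table is named `dreams`; the function name `build_db` is hypothetical.

```python
import json
import sqlite3
from pathlib import Path

# Column definitions copied from the schema above.
COLUMNS = [
    "id TEXT PRIMARY KEY",
    "p text",
    "se integer",
    "c integer",
    "st integer",
    "sa text",
]


def build_db(db_path: str, json_dir: str) -> int:
    """Load every *.json file under json_dir into `dreams`; return the row count."""
    conn = sqlite3.connect(db_path)
    conn.execute(f"CREATE TABLE IF NOT EXISTS dreams ({', '.join(COLUMNS)})")
    for path in sorted(Path(json_dir).rglob("*.json")):
        with open(path, encoding="utf-8") as fh:
            # Assumed shape: {image_name: {"p": ..., "se": ..., "c": ..., "st": ..., "sa": ...}}
            records = json.load(fh)
        rows = [
            (name, r.get("p"), r.get("se"), r.get("c"), r.get("st"), r.get("sa"))
            for name, r in records.items()
        ]
        # OR REPLACE keeps the load idempotent if the same file is seen twice.
        conn.executemany("INSERT OR REPLACE INTO dreams VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM dreams").fetchone()[0]
    conn.close()
    return count
```

Inserting one file at a time with `executemany` keeps memory use bounded by the largest JSON file rather than by the whole dataset.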

Create FTS5 index

CREATE VIRTUAL TABLE dreams_fts USING fts5 (p, content=dreams, tokenize = 'porter ascii');
INSERT INTO dreams_fts (rowid, p) SELECT rowid, p FROM dreams;

query FTS5 index

SELECT * FROM dreams WHERE rowid IN (SELECT rowid FROM dreams_fts WHERE dreams_fts MATCH 'unicorn' ORDER BY rank LIMIT 50);
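To see the index and query working end to end, here is a small self-contained sketch using Python's built-in `sqlite3` module and an in-memory database. It uses the `dreams` and `dreams_fts` names from the index definition; the sample rows and the helper name `search` are made up for illustration, and it assumes your SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dreams (
        id TEXT PRIMARY KEY, p text, se integer, c integer, st integer, sa text
    );
    -- Illustrative rows only; not taken from the real dataset.
    INSERT INTO dreams VALUES
        ('a.png', 'a unicorn in a misty forest', 1, 7, 50, 'k_lms'),
        ('b.png', 'portrait of a robot', 2, 7, 50, 'k_lms'),
        ('c.png', 'two unicorns on the moon', 3, 7, 50, 'k_lms');
    CREATE VIRTUAL TABLE dreams_fts USING fts5 (p, content=dreams, tokenize = 'porter ascii');
    -- Copy rowid explicitly so each index entry points at its source row.
    INSERT INTO dreams_fts (rowid, p) SELECT rowid, p FROM dreams;
""")


def search(conn, query, limit=50):
    """Return (id, prompt) pairs whose prompt matches the FTS5 query."""
    sql = """
        SELECT id, p FROM dreams
        WHERE rowid IN (
            SELECT rowid FROM dreams_fts
            WHERE dreams_fts MATCH ? ORDER BY rank LIMIT ?
        )
    """
    return conn.execute(sql, (query, limit)).fetchall()


if __name__ == "__main__":
    for row in search(conn, "unicorn"):
        print(row)
```

Because of the `porter` tokenizer, the query `unicorn` also matches the prompt containing `unicorns`. Binding the query as a parameter (`?`) rather than formatting it into the SQL string avoids quoting problems with user-supplied search text.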

About

nodejs http api for diffusiondb
