Since Danbooru2021 hasn't been updated for a while, I decided to extract the latest dataset (as of 2023-11-30) from Danbooru's public cloud storage and work on a data processing pipeline with PostgreSQL.
You can download the compressed dataset from Hugging Face.
- Provide a Parquet version of the dataset