AI-generated tattoos for tattoo recognition experiments.
This repository contains the code and notebooks behind *AIGTatt: From AI-Generated Tattoos to Enhanced Biometric Recognition*. The main idea is simple: public tattoo datasets are scarce, so this project uses generative AI to create synthetic tattoo images, turns them into many realistic variations, and trains tattoo retrieval models on that synthetic data.
The project has two main parts:
- Generate a synthetic tattoo dataset
  - Generate tattoo prompts with GPT.
  - Generate tattoo images with SDXL / fine-tuned SDXL through Replicate.
  - Crop the generated tattoos.
  - Create real-world-style variations with rotation, scale, brightness, noise, and other augmentations.
- Train and evaluate tattoo retrieval models
  - Train an embedding model with EfficientNetV2 or Swin backbones.
  - Use ArcFace loss to learn identity-like tattoo embeddings.
  - Evaluate on tattoo retrieval tasks using verification, open-set identification, and closed-set identification metrics.
The report introduces AIGTatt, a fully synthetic tattoo dataset built with diffusion models. In the final study, the dataset contains:
- 250 unique tattoo images
- 5,000 generated variations
- 5,250 total images
The retrieval models were trained on the synthetic data and tested against real-world tattoo databases such as WebTattoo and BIVTatt. The best runs reached strong closed-set retrieval performance, including top-20 identification rates around 95% on WebTattoo and 98% on BIVTatt.
The goal is not to claim synthetic data fully replaces real tattoo data. The point is to show that AI-generated tattoos can help bootstrap biometric tattoo recognition systems when real datasets are limited, private, or difficult to share.
```text
.
|-- README.md
|-- requirements.txt
|-- prompts.txt
|-- images-to-train/
|-- images-to-train.zip
|-- gen-prompts.ipynb
|-- gen-images-sdxl.ipynb
|-- gen-images-sdxl-fine.ipynb
|-- fine-tune-model.ipynb
|-- process-images.ipynb
`-- tattoo-retrieval-working/
    |-- README.md
    |-- train.py
    |-- test.py
    |-- run_training.sh
    |-- run_testing.sh
    |-- train/
    `-- utils/
        |-- callbacks.py
        |-- data_module.py
        |-- dataset.py
        |-- losses.py
        |-- model.py
        |-- preprocessors/
        `-- ploters/
```
`gen-prompts.ipynb` creates tattoo prompts using the OpenAI API. The prompts follow a controlled structure, for example:

```text
In the style of TOK, a unique, intricate tattoo on the forearm describing an eternal madness.
```

The generated prompts are stored in `prompts.txt`.
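The notebook delegates prompt writing to the OpenAI API, but the controlled structure itself can be sketched with plain string templating. The placement and theme vocabularies below are hypothetical examples for illustration, not the notebook's actual wording:

```python
import random

# Hypothetical slot vocabularies; the real notebook asks GPT to fill
# these slots with far more varied content.
PLACEMENTS = ["forearm", "shoulder", "calf", "back"]
THEMES = ["an eternal madness", "a phoenix reborn", "a silent ocean"]

TEMPLATE = ("In the style of TOK, a unique, intricate tattoo "
            "on the {placement} describing {theme}.")

def make_prompt(rng: random.Random) -> str:
    """Fill the controlled template with a random placement and theme."""
    return TEMPLATE.format(placement=rng.choice(PLACEMENTS),
                           theme=rng.choice(THEMES))

rng = random.Random(0)
prompts = [make_prompt(rng) for _ in range(3)]
for p in prompts:
    print(p)
```

Keeping the `In the style of TOK` trigger phrase fixed is what lets the fine-tuned SDXL model recognize these as tattoo prompts.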
There are two image-generation notebooks:

- `gen-images-sdxl.ipynb` generates tattoos with base SDXL.
- `gen-images-sdxl-fine.ipynb` generates tattoos with the fine-tuned SDXL tattoo model.
The fine-tuned SDXL model was trained on a small curated set of tattoo images from different artists and styles. Fine-tuning aimed to make the outputs more tattoo-like, realistic, and consistent than those of generic image-generation models.
`fine-tune-model.ipynb` shows the Replicate fine-tuning flow for SDXL. It uses `images-to-train.zip` as the training image bundle.
`process-images.ipynb` prepares generated images for retrieval training:
- Crops or zooms into the tattoo area.
- Creates multiple image variations.
- Simulates real-world capture conditions such as rotation, scaling, brightness shifts, and noise.
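A numpy-only sketch of the variation idea; the notebook itself uses richer libraries (OpenCV, imgaug), so the function name and specific transforms here are illustrative:

```python
import numpy as np

def make_variations(img: np.ndarray, rng: np.random.Generator, n: int = 4):
    """Produce n simple variations of an HxWx3 uint8 image.

    Each variation applies a random 90-degree rotation, a brightness
    shift, and additive Gaussian noise -- a crude stand-in for the
    richer capture-condition augmentations used in the notebook.
    """
    out = []
    for _ in range(n):
        var = np.rot90(img, k=int(rng.integers(0, 4))).astype(np.int16)
        var += int(rng.integers(-30, 31))                     # brightness shift
        var += rng.normal(0, 8, var.shape).astype(np.int16)   # sensor noise
        out.append(np.clip(var, 0, 255).astype(np.uint8))
    return out

img = np.full((64, 48, 3), 128, dtype=np.uint8)  # dummy gray "tattoo"
variations = make_variations(img, np.random.default_rng(0))
print(len(variations), variations[0].dtype)
```

Working in `int16` before clipping back to `uint8` avoids wrap-around when the brightness shift or noise pushes pixels outside [0, 255].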
The expected training layout is one folder per tattoo identity:
```text
train/
|-- Tattoo-ID-1/
|   |-- original.png
|   |-- cropped.png
|   |-- variation_0.png
|   `-- ...
|-- Tattoo-ID-2/
|   `-- ...
`-- ...
```
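A folder-per-identity layout maps directly to (path, label) pairs. A minimal, self-contained sketch of how such a layout could be scanned (illustrative only, not the project's `dataset.py`):

```python
import tempfile
from pathlib import Path

def scan_identity_folders(root: Path):
    """Map each image to an integer label derived from its parent folder."""
    classes = sorted(d.name for d in root.iterdir() if d.is_dir())
    label_of = {name: i for i, name in enumerate(classes)}
    samples = [(p, label_of[d.name])
               for d in sorted(root.iterdir()) if d.is_dir()
               for p in sorted(d.glob("*.png"))]
    return samples, label_of

# Build a throwaway layout matching the tree above.
root = Path(tempfile.mkdtemp())
for ident in ["Tattoo-ID-1", "Tattoo-ID-2"]:
    (root / ident).mkdir()
    for name in ["original.png", "cropped.png", "variation_0.png"]:
        (root / ident / name).touch()

samples, label_of = scan_identity_folders(root)
print(len(samples), label_of)
```

Because labels come from folder names, adding a new tattoo identity is just a matter of adding a new folder of images.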
The retrieval code lives in `tattoo-retrieval-working/`.
The main training entry point is:
```bash
python train.py \
  --data_dir ./train \
  --runs_dir ./output \
  --backbone efficientnet_v2_s \
  --num_features 512 \
  --M 0.5 \
  --batch_size 64 \
  --max_epochs 100 \
  --val_split 0.2
```

Supported backbones:

- `mobilenet_v3_large`
- `resnet101`
- `densenet121`
- `efficientnet_v2_s`
- `swin_s`
The model turns each tattoo image into an embedding vector. During training, ArcFace encourages images from the same tattoo identity to be close together and images from different identities to be separated.
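The ArcFace idea fits in a few lines of numpy: logits are cosine similarities between L2-normalized embeddings and per-class weight vectors, and an angular margin `m` is added to the true class before scaling by `s`. This is a simplified sketch of the mechanism, not the project's `losses.py`:

```python
import numpy as np

def arcface_logits(emb, weights, labels, s=64.0, m=0.5):
    """Cosine logits with an additive angular margin m on the target class."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    weights = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(emb @ weights.T, -1.0, 1.0)       # (batch, num_classes)
    theta = np.arccos(cos)
    margin = np.zeros_like(theta)
    margin[np.arange(len(labels)), labels] = m      # margin only on true class
    return s * np.cos(theta + margin)

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))        # 4 embeddings of dimension 8
weights = rng.normal(size=(3, 8))    # class centers for 3 tattoo identities
labels = np.array([0, 1, 2, 0])
logits = arcface_logits(emb, weights, labels)
plain = arcface_logits(emb, weights, labels, m=0.0)
# Non-target logits are plain scaled cosines; target logits carry the
# margin, which for most angles shrinks them and forces the model to
# pull same-identity embeddings into tighter clusters.
```

A softmax cross-entropy on these logits then has to overcome the margin on the true class, which is what produces the identity-like separation described above.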
The testing entry point is:
```bash
python test.py \
  --output_dir ./test-output \
  --images_dir ./test \
  --checkpoint_folder ./output/efficientnet_v2_s_0.5_0.5_512 \
  --csv_file ./results.csv \
  --backbone efficientnet_v2_s \
  --num_features 512 \
  --M 0.5
```

The test dataset should also be organized as one folder per tattoo identity:
```text
test/
|-- Tattoo-ID-1/
|   |-- image1.jpg
|   `-- image2.jpg
|-- Tattoo-ID-2/
|   `-- ...
`-- ...
```
The evaluation code reports metrics such as:
- Verification EER
- FMR / FNMR-style operating points
- Open-set identification EER
- Closed-set Rank-1, Rank-10, and Rank-20 identification rates
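The closed-set Rank-k metric has a compact definition: a probe counts as identified at rank k if its true identity appears among the k most similar gallery entries. A numpy sketch with a toy similarity matrix (illustrative, not the project's evaluation code):

```python
import numpy as np

def rank_k_rate(similarity, probe_labels, gallery_labels, k):
    """Fraction of probes whose true label is among the top-k gallery matches."""
    top_k = np.argsort(-similarity, axis=1)[:, :k]   # best gallery indices
    hits = [probe_labels[i] in gallery_labels[top_k[i]]
            for i in range(len(probe_labels))]
    return float(np.mean(hits))

sim = np.array([[0.9, 0.2, 0.1],    # probe 0: best match is gallery 0 (correct)
                [0.1, 0.3, 0.8],    # probe 1: best match is gallery 2 (wrong)
                [0.2, 0.7, 0.4]])   # probe 2: best match is gallery 1 (correct)
probe_labels = np.array([0, 1, 1])
gallery_labels = np.array([0, 1, 2])
print(rank_k_rate(sim, probe_labels, gallery_labels, 1))  # 2/3 at rank 1
print(rank_k_rate(sim, probe_labels, gallery_labels, 2))  # probe 1 recovered at rank 2
```

This is why the reported rates rise from Rank-1 to Rank-20: widening k gives each probe more chances to include its true identity.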
Create an environment and install the base dependencies:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

The retrieval code expects Python 3.8+ and uses PyTorch Lightning, TorchVision, PyEER, and segmentation-models-pytorch.

Some notebooks need extra packages that are not fully captured in `requirements.txt`, including:

- `openai`
- `replicate`
- `python-dotenv`
- `opencv-python`
- `imgaug`
- `spacy`
- `scikit-learn`
For the API-based notebooks, create a local .env file with your own keys:
```text
OPENAI_API_KEY=...
OPENAI_ORGANIZATION=...
OPENAI_PROJECT=...
REPLICATE_API_KEY=...
```
Do not commit API keys or personal tokens.
A typical end-to-end run looks like this:

1. Curate tattoo style images.
2. Fine-tune SDXL on those images.
3. Generate tattoo prompts.
4. Generate synthetic tattoo images from the prompts.
5. Crop the tattoo regions.
6. Generate variations for each tattoo identity.
7. Train the retrieval model on the synthetic dataset.
8. Test the trained model on real tattoo datasets.
9. Compare retrieval metrics across backbones and hyperparameters.
The report compared several generative models for tattoo generation:
- DALL-E 3
- Stable Diffusion 2.1
- Stable Diffusion 3
- SDXL
- Fine-tuned SDXL
The fine-tuned SDXL model was selected because it produced the strongest qualitative results for tattoo-specific prompts, especially for realism, accuracy, and visual quality.
For retrieval, the main tested backbones were:
- EfficientNetV2
- Swin
The strongest closed-set Rank-1 results reported were:
- WebTattoo: 83.29% with Swin
- BIVTatt: 91.70% with Swin
When considering the top 20 candidates, performance increased to approximately:
- WebTattoo: 95%
- BIVTatt: 98%
These results suggest that fully synthetic tattoo data can be useful for training tattoo retrieval systems, although larger and more diverse datasets would likely improve generalization.
This is a research codebase, so a few parts are still rough:
- Some shell scripts contain hard-coded local paths and should be edited before running.
- The notebooks were written for experimentation, not as a clean production pipeline.
- `requirements.txt` is minimal and does not include every notebook dependency.
- Some evaluation code appears to come from an older model variant, so check tensor outputs carefully before extending the testing callbacks.
- The top-level README is the main project overview; `tattoo-retrieval-working/README.md` is the original retrieval-only README.
If you use this project, cite the related report:
Diogo Carvalho. *AIGTatt: From AI-Generated Tattoos to Enhanced Biometric Recognition*.
The dataset link referenced in the report:
https://bit.ly/4bxk052