Text2SVG

🌐 Available languages: English | Русский

A comprehensive toolkit for SVG mining, optimization, captioning, and text-to-SVG generation. The project consists of three main components working together to create a complete pipeline from raw SVG collection to AI-powered SVG generation.

Text2SVG
├── svg-corpus         
│   ├── mining         # Web scraping SVG files
│   ├── optimization   # Two-stage SVG optimization
│   └── captioning     # VLM-based SVG description generation
├── llm4svg            # LLM fine-tuning for SVG generation
└── svg-encoder        # Semantic SVG encoder

Quick Start

git clone https://github.com/CTLab-ITMO/Text2SVG
cd Text2SVG
pip install .
apt update && apt install libcairo2

SVG Corpus Pipeline


Mining

Open In Colab

A tool for mining SVG files from websites, using URLs taken from HuggingFace datasets. It collects approximately 4 million URLs per month after deduplication.

Growth projection showing sublinear scaling due to deduplication

Installation

Ensure you have run the Quick Start installation steps, then install the mining dependencies:

cd Text2SVG/svg-corpus/mining
pip install -r requirements.txt

Basic Usage

You can choose any dataset from the HuggingFace collection that includes a column named url containing URLs.

python3 main.py --dataset nhagar/fineweb_urls --output-dir /path/to/output

Advanced Options

Depending on your system, you can tune the following arguments:

  • --max-concurrency (default: 500): Controls the maximum number of simultaneous HTTP requests. This is the key parameter for balancing speed vs. system resources.

  • --batch-size (default: 50000): Number of URLs processed together before yielding control.

  • --timeout (default: 1): Maximum time in seconds to wait for each HTTP request. Aggressive timeouts improve throughput by skipping slow sites.

python3 main.py \
  --dataset nhagar/fineweb_urls \
  --column url \
  --output-dir /path/to/output \
  --max-concurrency 500 \
  --batch-size 50000 \
  --timeout 1 \
  --start-offset 0 \
  --debug

Output Format

SVGs are saved in JSONL format, organized by MD5 hash prefixes:

{
  "url": "https://example.com/page",
  "timestamp": "2024-03-21T10:30:45.123456",
  "svg": "<svg>...</svg>"
}
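The records above can be read and written with a few lines of Python. Note that the exact sharding layout below (a two-character MD5 prefix per shard file) is an illustrative assumption; check the mining code for the actual directory structure:

```python
import hashlib
import json

def shard_path(svg: str, output_dir: str = "corpus") -> str:
    # Two-character MD5 prefix sharding is an assumption for illustration;
    # the mining code defines the real layout.
    digest = hashlib.md5(svg.encode("utf-8")).hexdigest()
    return f"{output_dir}/{digest[:2]}.jsonl"

record = {
    "url": "https://example.com/page",
    "timestamp": "2024-03-21T10:30:45.123456",
    "svg": "<svg>...</svg>",
}
line = json.dumps(record)  # one JSON object per line (JSONL)
print(shard_path(record["svg"]))
```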

Example Dataset:

A small part of the collected SVGs is available via the link.

Optimization

Open In Colab

Two-stage optimization pipeline for cleaning and normalizing SVG files:

  1. Stage 1: Path conversion and coordinate normalization to the [0, 256] range
  2. Stage 2: SVGO optimization for size reduction and cleanup
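As a rough sketch of what stage 1's coordinate normalization does, the hypothetical helper below rescales a point set so its bounding box fits [0, 256] (illustrative only; the real pipeline operates on parsed SVG path commands, not bare point lists):

```python
def normalize_points(points, scale=256.0):
    """Rescale 2-D points so their bounding box fits [0, scale].

    Illustrative sketch of stage-1 normalization; the actual pipeline
    works on parsed path commands rather than raw point lists.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Uniform scaling preserves aspect ratio; guard against zero extent
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) * scale / extent, (y - min_y) * scale / extent)
            for x, y in points]

# A 20x20 icon viewBox maps onto the [0, 256] range:
print(normalize_points([(0.0, 0.0), (20.0, 20.0)]))
# → [(0.0, 0.0), (256.0, 256.0)]
```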

Python API

from optimization import optimize_svg_string

# Basic optimization
result = optimize_svg_string(svg_string)

# With custom options
result = optimize_svg_string(
    svg_string=svg_text,
    stage1_opts={
        "normalize_scale": 256.0,
        "cubic_only": True,
        "normalize_to_int": False
    },
    stage2_config_path='./svgo.config.mjs',
    quiet=True
)

Command Line Interface

optimize_dir \
  --input_dir ./raw_svgs \
  --output_dir ./optimized_svgs \
  --cubic_only \
  --normalize_points \
  --normalize_scale 256 \
  --num_threads 4 \
  --svgo_config ./svgo.config.mjs

Optimization Example

Before optimization (648 characters):

<svg version="1.1" xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20">
<title>power</title>
<path d="M10.625 1.681c0-0.345-0.28-0.625-0.625-0.625s-0.625 0.28-0.625 0.625v8.125c0 0.345 0.28 0.625 0.625 0.625s0.625-0.28 0.625-0.625v-8.125z"></path>
<path d="M7.12 2.881c0.318-0.135 0.466-0.502 0.33-0.82s-0.502-0.466-0.82-0.33c-3.156 1.343-5.38 4.436-5.38 8.075 0 4.845 3.905 8.75 8.75 8.75s8.75-3.905 8.75-8.75c0-3.639-2.225-6.732-5.38-8.075-0.318-0.135-0.685 0.013-0.82 0.33s0.013 0.685 0.33 0.82c2.719 1.157 4.62 3.814 4.62 6.925 0 4.155-3.345 7.5-7.5 7.5s-7.5-3.345-7.5-7.5c0-3.111 1.9-5.768 4.62-6.925z"></path>
</svg>


After optimization (343 characters, 47% reduction):

<svg viewBox="0 0 256 256">
  <path d="M136 22Q135 14 128 14T120 22V126Q121 133 128 134 135 133 136 126z"/>
  <path d="M91 37Q98 33 95 26 92 20 85 22C44 39 16 79 16 126 16 188 66 238 128 238S240 188 240 126C240 79 212 39 171 22Q164 20 161 26 158 33 165 37C200 52 224 86 224 126 224 179 181 222 128 222S32 179 32 126C32 86 56 52 91 37"/>
</svg>


Captioning

Vision-language model (VLM) based SVG description generation using LMDeploy. The pipeline supports any vision-language model available through LMDeploy, including InternVL series, Qwen2-VL, and other compatible models.

Installation

Ensure you have run the quick start installation steps, then install the captioning dependencies:

cd Text2SVG/svg-corpus/captioning
pip install lmdeploy timm

Usage

python caption_intern.py \
  --dataset VectorGraphics/svg-corpus \
  --output detailed_captions.jsonl \
  --model OpenGVLab/InternVL3-8B \
  --batch-size 32 \
  --resolution 224 \
  --temperature 0.1 \
  --tp 1 \
  --max-samples 100_000

Output Format

Captions are saved in JSONL format:

{
  "hash": "a1b2c3d4...",
  "caption": "A blue circular icon with a power symbol",
  "source": "InternVL3-2B",
  "prompt": "Describe this object in 10 words or less, concise style. Response in English."
}
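Each caption references its SVG by hash, so pairing captions with corpus entries is a simple keyed join. The in-memory lookup below is a simplified sketch (a real corpus would be streamed from disk); field names follow the example record above:

```python
import json

# Simplified sketch: captions keyed by hash, joined against an
# in-memory corpus lookup.
caption_lines = [
    '{"hash": "a1b2c3d4", "caption": "A blue circular icon with a power symbol"}',
]
svg_by_hash = {"a1b2c3d4": "<svg>...</svg>"}  # hypothetical corpus lookup

pairs = []
for line in caption_lines:
    rec = json.loads(line)
    svg = svg_by_hash.get(rec["hash"])
    if svg is not None:  # skip captions whose SVG is missing
        pairs.append({"svg": svg, "caption": rec["caption"]})
```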

LLM Fine-tuning

Current Generation Examples

Generated SVG examples

Training

Fine-tuning is powered by the Unsloth framework.

Prerequisites

cd Text2SVG/llm4svg
pip install unsloth transformers datasets wandb

Training Configuration

The training script (run_training.py) uses the following key parameters:

  • Base Model: Qwen2.5-Coder-32B-Instruct (configurable)
  • Dataset: VectorGraphics/svg-corpus
  • LoRA Configuration: r=128, alpha=32, targeting all attention layers
  • Training: 1 epoch, batch size 20, gradient accumulation 8 steps
  • Optimization: AdamW 8-bit with cosine scheduler
  • Memory: 4-bit quantization with gradient checkpointing

Running Training

python3 run_training.py

Customization

Modify these parameters in run_training.py:

# Model selection
model_name = "Qwen2.5-Coder-32B-Instruct"

# LoRA configuration
r = 128                   # LoRA rank
lora_alpha = 32           # LoRA alpha
lora_dropout = 0          # LoRA dropout

# Training parameters
per_device_train_batch_size = 20
gradient_accumulation_steps = 8
learning_rate = 5e-5
num_train_epochs = 1
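With these defaults, the effective batch size per optimizer step is the product of the per-device batch size and the accumulation steps:

```python
per_device_train_batch_size = 20
gradient_accumulation_steps = 8

# Gradients are accumulated over 8 micro-batches before each update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 160 samples per optimizer step
```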

Inference

Generate SVG images from text descriptions using the fine-tuned model:

python3 run_eval.py

Example Usage

Edit the prompt in run_eval.py:

inputs = tokenizer([
    inf_prompt.format(
        "A blue circular button with power symbol",  # Your caption
        "",  # Leave empty for generation
    )
], return_tensors="pt").to("cuda")

SVG Encoder

A specialized encoder for vector graphics that creates semantic embeddings of SVG files. The encoder is based on the ModernBERT architecture but trained specifically on the SVG corpus for a better understanding of vector-graphics structure and semantics.

Architecture

The SVG encoder follows ModernBERT's architecture with several key adaptations:

  • Custom tokenizer trained on SVG corpus with discrete tokens for SVG tags, attributes, and coordinate values
  • Context length support up to 8,192 tokens
  • Available in two model sizes: base and large

Training Process

Tokenizer Training

The SVG-specific tokenizer was trained using the transformers library on a large SVG corpus. Key features:

  • Vocabulary size: 50,368 tokens
  • Special tokens for SVG tags (<svg>, <path>, <rect>, etc.)
  • Dedicated tokens for SVG attributes (viewBox, fill, stroke, etc.)
  • Optimized for SVG coordinate values and path data
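To illustrate why dedicated tokens help, the toy tokenizer below keeps known SVG tags and attributes as single tokens instead of letting a generic subword tokenizer fragment them. This is a sketch, not the actual trained tokenizer, and `SVG_SPECIALS` lists only a handful of the 50,368 vocabulary entries:

```python
import re

# A few of the special tokens mentioned above (illustrative subset)
SVG_SPECIALS = ["<svg>", "<path>", "<rect>", "viewBox", "fill", "stroke"]
PATTERN = "(" + "|".join(re.escape(t) for t in SVG_SPECIALS) + ")"

def toy_tokenize(markup: str) -> list[str]:
    tokens = []
    for piece in re.split(PATTERN, markup):
        if piece in SVG_SPECIALS:
            tokens.append(piece)          # kept as a single token
        else:
            tokens.extend(piece.split())  # naive whitespace fallback
    return tokens

print(toy_tokenize('<svg> <path> fill="red"'))
# → ['<svg>', '<path>', 'fill', '="red"']
```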

Model Training

Training follows ModernBERT's staged approach with progressive context length scaling:

Stage     Context Length   Tokens (Base)   Tokens (Large)
Phase 1   1024             12 billion      30 billion
Phase 2   8192             3 billion       9 billion
Phase 3   8192             2 billion       3 billion

Training configurations are available in svg-encoder/training/configs/.

Evaluation Tasks

The model quality is evaluated on the svg-super-glue benchmark, which includes three tasks (all scored by accuracy):

  • AB-test classification: classify SVG images containing a single Latin letter ('a' or 'b'); the challenge is the similar geometric structure within a font domain.

  • Is-optimized detection: determine whether two SVGs are structurally identical (before/after optimization); the challenge is that they are visually identical but have different vector representations.

  • Multi-class classification: classify SVGs into semantic categories (love, food, phone, photography, sun); the challenge is semantic understanding across different styles.

Performance Results

Current validation accuracy (%) on the svg-super-glue tasks:

Model             AB-test   Is-optimized   Multi-class
ModernBERT-base   90.3      88.0           33.2
SVG-BERT-base     96.7      96.0           50.8
SVG-BERT-large    TBD       TBD            60.6

Model Validation

Run evaluation on svg-super-glue tasks:

cd svg-encoder/evaluation/svg-super-glue/model_validation
python train_sweeps.py

Configure hyperparameter sweeps in sweep_config.yaml:

method: grid
parameters:
  learning_rate:
    values: [1e-5, 2e-5, 5e-5]
  batch_size:
    values: [16, 32]
  num_train_epochs:
    values: [3, 5]
  task_name:
    values: ["ab-test", "is_optimized", "multi-class-classification"]
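For a grid sweep, the number of runs per task is simply the product of the value lists (3 × 2 × 2 = 12 here). A quick way to enumerate the combinations, with the grid from sweep_config.yaml inlined for illustration:

```python
from itertools import product

# Grid from sweep_config.yaml, inlined for illustration
grid = {
    "learning_rate": [1e-5, 2e-5, 5e-5],
    "batch_size": [16, 32],
    "num_train_epochs": [3, 5],
}
# One dict per hyperparameter combination, in grid order
runs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(len(runs))  # → 12 runs per task
```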

Technical Stack

  • PyTorch - Core framework
  • Transformers & Tokenizers (HuggingFace) - Model architecture and custom tokenizer
  • ModernBERT - Base architecture and training scripts
  • Weights & Biases - Experiment tracking and visualization
  • Datasets - Data loading and processing

Training Infrastructure

Training was conducted on:

  • CPU: AMD EPYC 9654 (96 cores)
  • GPU: NVIDIA RTX 6000 Ada
  • Memory: Optimized for large-scale SVG corpus processing

