# Introduction
This notebook fine-tunes the [Stable Diffusion v1.4](https://github.com/CompVis/stable-diffusion) model to hallucinate Kanji-like symbols for words that do not have established Kanji. It starts with data preprocessing: KanjiVG stroke data (SVG) is converted into black-on-white raster images, and each image is paired with the English meanings of its character from KANJIDIC2. The notebook then performs LoRA fine-tuning on the pre-trained Stable Diffusion v1.4 model to specialize it for Kanji generation. Finally, it showcases inference results, allowing users to generate and visualize new Kanji characters for given English words.

Note: use a GPU runtime to run this notebook. The inference code in Step 6 moves the pipeline to `"cuda"` and loads it in float16, and the fine-tuning in Step 5 would take a very long time to finish on a CPU.

## Step 1: Install necessary dependencies

In [None]:
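# transformers and peft are required by the Stable Diffusion pipeline and the diffusers LoRA training script
!pip install torch accelerate datasets safetensors pillow cairosvg transformers peft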

In [None]:
!pip install git+https://github.com/huggingface/diffusers

## Step 2: Log in to Hugging Face

In [None]:
!huggingface-cli login  # Requires a Hugging Face access token with write access (used by push_to_hub in Steps 4 and 5)

## Step 3: Mount Google Drive and navigate to a project folder for file access in Colab

In [None]:
import os
from google.colab import drive

# Mount Google Drive to access persistent storage across Colab sessions
drive.mount("/content/drive")

# Navigate to the project directory in Google Drive
os.chdir("/content/drive/MyDrive/path/to/project/directory")

## Step 4: Prepare Training Data

The following dataset files are used:

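*   KANJIDIC2 dictionary file ([kanjidic2.xml](https://www.edrdg.org/kanjidic/kanjidic2.xml.gz)), which provides the English meanings
*   KanjiVG stroke data in SVG form ([kanjivg-20220427.xml](https://github.com/KanjiVG/kanjivg/releases/download/r20220427/kanjivg-20220427.xml.gz)), which provides the glyph shapes

Both links point to gzip archives: decompress them and place the resulting `kanjidic2.xml` and `kanjivg-20220427.xml` files in the project directory specified above.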
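If no shell tool is handy for decompressing the archives, Python's standard library can do it. The sketch below uses a hypothetical helper, `decompress_gz`, that streams a `.gz` file to a file with the `.gz` suffix stripped; it assumes the archives were downloaded into the current (project) directory under their original names.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path, out_path=None):
    """Decompress a .gz file (hypothetical helper) and return the output path."""
    gz_path = Path(gz_path)
    # Default output name: drop the trailing ".gz" (e.g. kanjidic2.xml.gz -> kanjidic2.xml)
    out_path = Path(out_path) if out_path else gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks instead of loading everything into memory
    return out_path


# Assumed archive names, matching the download links above
for archive in ["kanjidic2.xml.gz", "kanjivg-20220427.xml.gz"]:
    if Path(archive).exists():
        print("Decompressed to", decompress_gz(archive))
```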


In [None]:
import xml.etree.ElementTree as ET
import re
import cairosvg
import shutil
from PIL import Image

Generate one SVG file per kanji from the kanjivg-20220427.xml file

In [None]:
# Directory for SVG output
svg_folder = "kanji_svg"
os.makedirs(svg_folder, exist_ok=True)

# SVG definitions
kanji_header = '<svg xmlns="http://www.w3.org/2000/svg" ' \
               'width="128" height="128" ' \
               'viewBox="0 0 128 128">'

kanji_style = 'style="fill:none;' \
              'stroke:#000000;' \
              'stroke-width:3;' \
              'stroke-linecap:round;' \
              'stroke-linejoin:round;">'

# Process kanji from XML
kanjivg_root = ET.parse("kanjivg-20220427.xml").getroot()
for kanji in kanjivg_root:
    kanji_id = kanji.attrib.get("id")
    if kanji_id:
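        # Create SVG content. Only the top-level <g> groups are serialized: nested
        # groups are already contained in them, so ".//g" would duplicate every path
        svg_content = f"{kanji_header}\n"
        for stroke_group in kanji.findall("g"):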
            stroke_group_str = ET.tostring(
                stroke_group,
                encoding="utf-8",
                method="xml"
            ).decode("utf-8")
            svg_content += f"<g {kanji_style}{stroke_group_str}</g>\n"
        svg_content += "</svg>"

        # Save to file
        raw_path = os.path.join(svg_folder, f"{kanji_id}.svg")
        svg_file_path = raw_path.replace("kvg:kanji_", "")
        with open(svg_file_path, "w", encoding="utf-8") as svg_file:
            svg_file.write(svg_content)
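KanjiVG nests `<g>` groups inside each kanji's top-level group, so serializing every group returned by a recursive `findall(".//g")` writes the same `<path>` elements more than once. The toy snippet below is made up (KanjiVG's `kvg:` attributes are omitted for simplicity) and simply counts how many paths end up in the output for a recursive search versus direct children only.

```python
import xml.etree.ElementTree as ET

# Made-up, KanjiVG-like snippet: one top-level group containing a nested group
toy_kanji = ET.fromstring(
    '<kanji id="kvg:kanji_00000">'
    '<g id="outer"><g id="inner"><path d="M10,10 L90,10"/></g>'
    '<path d="M50,10 L50,90"/></g>'
    "</kanji>"
)


def count_serialized_paths(groups):
    """Serialize each group and count the <path> elements in the combined output."""
    return sum(ET.tostring(g, encoding="unicode").count("<path") for g in groups)


recursive = count_serialized_paths(toy_kanji.findall(".//g"))  # outer and inner groups
top_level = count_serialized_paths(toy_kanji.findall("g"))     # direct children only
print(recursive, top_level)  # 3 2
```

The drawing contains only two strokes, so the recursive variant emits one redundant path here; real KanjiVG entries with deeper nesting repeat proportionally more.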

Convert each SVG file to PNG, then flatten the PNG onto a white background and save it as JPG

In [None]:
# Create folder for JPG output
jpg_folder = "kanji_jpg"
os.makedirs(jpg_folder, exist_ok=True)

# Create folder for PNG output
png_folder = "kanji_png"
os.makedirs(png_folder, exist_ok=True)

# Process each SVG file
for svg in os.listdir(svg_folder):
    if svg.endswith(".svg"):
        # Set file paths
        svg_path = os.path.join(svg_folder, svg)
        jpg_path = os.path.join(jpg_folder, svg.replace(".svg", ".jpg"))
        png_path = os.path.join(png_folder, svg.replace(".svg", ".png"))

        # Convert SVG to PNG
        cairosvg.svg2png(url=svg_path, write_to=png_path)

        # Convert PNG to JPG with white background
        with Image.open(png_path) as img:
            with Image.new("RGB", img.size, "WHITE") as background:
                background.paste(img, (0, 0), img)
                background.save(jpg_path, "JPEG")
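The PNGs rendered by cairosvg have a transparent background, and JPEG has no alpha channel, which is why the strokes are pasted onto a white canvas using the image's own alpha as the mask. A minimal sketch of the same compositing on a synthetic 2x1 image (so it runs without cairosvg); `flatten_on_white` is a hypothetical helper name.

```python
from PIL import Image


def flatten_on_white(img):
    """Composite an image with transparency onto an opaque white RGB background."""
    rgba = img.convert("RGBA")  # make sure an alpha band exists to serve as the mask
    background = Image.new("RGB", rgba.size, "white")
    background.paste(rgba, (0, 0), rgba)  # the alpha band of an RGBA mask controls the paste
    return background


# Synthetic image: one fully transparent pixel, one opaque black pixel
demo = Image.new("RGBA", (2, 1), (0, 0, 0, 0))
demo.putpixel((1, 0), (0, 0, 0, 255))
flat = flatten_on_white(demo)
print(flat.getpixel((0, 0)), flat.getpixel((1, 0)))  # (255, 255, 255) (0, 0, 0)
```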

Process the kanjivg-20220427.xml file to map each kanji character to its corresponding file name

In [None]:
# Regular expression to extract kanji element literal
kanji_literal_pattern = re.compile(r'kvg:element="([^"]+)"')
literal_to_filename = {}
is_processing_kanji = False

# Process XML to map kanji literals to filenames
with open("kanjivg-20220427.xml", "r", encoding="utf-8") as kanjivg_file:
    for current_line in kanjivg_file:
        if "<kanji" in current_line:
            is_processing_kanji = True

        if is_processing_kanji:
            kanji_id_match = re.search(r'id="([^"]+)"', current_line)
            kanji_literal_match = kanji_literal_pattern.search(current_line)

            # The kanji's top-level <g> line carries both the id and the kvg:element literal
            if kanji_literal_match and kanji_id_match:
                kanji_literal = kanji_literal_match.group(1)
                kanji_filename = kanji_id_match.group(1).replace("kvg:", "")
                literal_to_filename[kanji_literal] = kanji_filename
                is_processing_kanji = False
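The line-based scan relies on the layout of kanjivg-20220427.xml: a `<kanji id="kvg:kanji_…">` line followed by a top-level `<g>` line that carries both an `id` and a `kvg:element` attribute. The sketch below runs the same logic, wrapped in a hypothetical `map_literals` function, on a few made-up sample lines shaped like that layout (the path data is invented), which makes it easy to check the mapping without the full file.

```python
import re

# Made-up lines mimicking the assumed kanjivg-20220427.xml layout
sample_lines = [
    '<kanji id="kvg:kanji_04e00">\n',
    '<g id="kvg:04e00" kvg:element="一">\n',
    '\t<path id="kvg:04e00-s1" d="M10,50 L90,50"/>\n',
    "</g>\n",
    "</kanji>\n",
]


def map_literals(lines):
    """Map each kanji literal to the file stem taken from its top-level group id."""
    element_pattern = re.compile(r'kvg:element="([^"]+)"')
    id_pattern = re.compile(r'id="([^"]+)"')
    mapping, in_kanji = {}, False
    for line in lines:
        if "<kanji" in line:
            in_kanji = True
        if in_kanji:
            element = element_pattern.search(line)
            id_match = id_pattern.search(line)
            # Only a line with both attributes identifies the kanji's top-level group
            if element and id_match:
                mapping[element.group(1)] = id_match.group(1).replace("kvg:", "")
                in_kanji = False  # ignore nested groups until the next <kanji
    return mapping


print(map_literals(sample_lines))  # {'一': '04e00'}
```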

Extract English meanings for each kanji from kanjidic2.xml and create a corresponding metadata.jsonl file

In [None]:
import json

# Parse kanjidic2.xml and prepare the metadata file
kanjidic2_root = ET.parse("kanjidic2.xml").getroot()
metadata_file_path = os.path.join(jpg_folder, "metadata.jsonl")

with open(metadata_file_path, "w", encoding="utf-8") as metadata:
    # Process each kanji character
    for character in kanjidic2_root.findall("character"):
        literal = character.find("literal").text

        # Skip characters that have no KanjiVG image
        if literal not in literal_to_filename:
            continue

        # English meanings are the <meaning> elements without an m_lang attribute
        meanings = [
            meaning.text
            for meaning in character.findall(".//reading_meaning/rmgroup/meaning")
            if "m_lang" not in meaning.attrib
        ]
        concat_meanings = ", ".join(meanings)

        # json.dumps escapes quotes and backslashes that would break a hand-built JSON string
        entry = {
            "file_name": f"{literal_to_filename[literal]}.jpg",
            "text": f"{concat_meanings} Kanji",
        }
        metadata.write(json.dumps(entry) + "\n")
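For a single KANJIDIC2 entry, the extraction boils down to keeping the `<meaning>` elements that have no `m_lang` attribute (KANJIDIC2 marks non-English meanings with `m_lang`) and joining them into a caption. The sketch below applies this to a toy entry whose second meaning is made up to contain double quotes, showing why serializing with `json.dumps` is safer than formatting the JSON by hand; `metadata_line` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET

# Toy KANJIDIC2-style entry (simplified; the quoted meaning is invented for illustration)
toy_entry = ET.fromstring(
    "<character><literal>一</literal><reading_meaning><rmgroup>"
    '<reading r_type="ja_on">イチ</reading>'
    "<meaning>one</meaning>"
    '<meaning>the "first" one</meaning>'
    '<meaning m_lang="fr">un</meaning>'
    "</rmgroup></reading_meaning></character>"
)


def metadata_line(character, file_stem):
    """Build one metadata.jsonl line from the English meanings of a <character>."""
    meanings = [
        m.text
        for m in character.findall(".//reading_meaning/rmgroup/meaning")
        if "m_lang" not in m.attrib  # no m_lang attribute means English
    ]
    record = {"file_name": f"{file_stem}.jpg", "text": f"{', '.join(meanings)} Kanji"}
    return json.dumps(record)  # embedded quotes are escaped, so the line stays valid JSON


line = metadata_line(toy_entry, "04e00")
print(line)
```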

Remove images in the JPG folder that have no metadata entry

In [None]:
import json

# Collect the file names referenced in metadata.jsonl (the file is read only once)
with open(metadata_file_path, "r", encoding="utf-8") as metadata:
    referenced_files = {json.loads(line)["file_name"] for line in metadata if line.strip()}

# Remove JPG files that don't have metadata entries (exact name match, not substring)
for jpg_file in os.listdir(jpg_folder):
    if jpg_file.endswith(".jpg") and jpg_file not in referenced_files:
        os.remove(os.path.join(jpg_folder, jpg_file))
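The same cleanup can be packaged as a reusable function that parses metadata.jsonl once into a set and compares exact file names, which avoids rereading the metadata for every image and avoids accidental substring matches. A sketch with a hypothetical name, `prune_unreferenced_images`, that also returns what it deleted so the result can be inspected.

```python
import json
import os


def prune_unreferenced_images(folder, metadata_name="metadata.jsonl"):
    """Delete .jpg files in `folder` that no metadata entry references; return their names."""
    with open(os.path.join(folder, metadata_name), encoding="utf-8") as f:
        # Exact file names from the "file_name" field, kept in a set for fast lookups
        referenced = {json.loads(line)["file_name"] for line in f if line.strip()}
    removed = []
    for name in os.listdir(folder):
        if name.endswith(".jpg") and name not in referenced:
            os.remove(os.path.join(folder, name))
            removed.append(name)
    return sorted(removed)
```

For example, `prune_unreferenced_images(jpg_folder)` would return the list of images that had no caption.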

Publish dataset to Hugging Face

In [None]:
from datasets import load_dataset

# Rename the processed image folder to "train" and publish it to the Hugging Face Hub
shutil.move(jpg_folder, "train")
dataset = load_dataset("imagefolder", data_dir="train", split="train")
# Replace with your own <username>/<dataset-name> and update --dataset_name in Step 5 to match
dataset.push_to_hub("Akirashindo39/KANJIDIC2")
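Before uploading, it can be worth sanity-checking the folder that `imagefolder` will read: every `file_name` in metadata.jsonl should exist, and characters without any English meaning end up with the bare caption `" Kanji"`, which carries no information for training. The sketch below reports both; `summarize_metadata` is a hypothetical helper and the report format is an arbitrary choice.

```python
import json
import os


def summarize_metadata(folder, metadata_name="metadata.jsonl"):
    """Count metadata entries and flag missing images and meaning-less captions."""
    entries, missing, empty_captions = 0, [], []
    with open(os.path.join(folder, metadata_name), encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            entries += 1
            if not os.path.exists(os.path.join(folder, record["file_name"])):
                missing.append(record["file_name"])
            # Captions have the form "<meanings> Kanji"; no meanings leaves just " Kanji"
            if record["text"].strip() == "Kanji":
                empty_captions.append(record["file_name"])
    return {"entries": entries, "missing_files": missing, "empty_captions": empty_captions}
```

For example, `summarize_metadata("train")` could be run after the folder has been moved and before `push_to_hub`.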

## Step 5: Fine-tune Stable Diffusion v1.4 model using LoRA (Low-Rank Adaptation)

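Download the [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) script from the diffusers repository into the project directory, then run it with the parameters below. With `--push_to_hub`, the trained LoRA weights are uploaded to the Hugging Face Hub at the end of training. Note that this command can take around 2 hours to finish on a T4 GPU.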
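LoRA keeps each pre-trained weight matrix $W$ frozen and learns a low-rank update $\Delta W = BA$, where $B$ is $d_{out} \times r$ and $A$ is $r \times d_{in}$, so only $r(d_{in} + d_{out})$ parameters per adapted layer are trained instead of $d_{in} d_{out}$. The arithmetic below is illustrative: the width 320 is, to my knowledge, the channel count of the first UNet block in Stable Diffusion v1.x, and rank 4 is what I believe the script's `--rank` option defaults to (the command does not set it); check both against the versions you use.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for a dense update of W versus LoRA's two low-rank factors."""
    full = d_in * d_out            # dense update of a d_out x d_in weight
    lora = rank * (d_in + d_out)   # B (d_out x rank) plus A (rank x d_in)
    return full, lora


full, lora = lora_param_counts(320, 320, rank=4)
print(full, lora, f"{lora / full:.1%}")  # 102400 2560 2.5%
```

This small footprint is what makes fine-tuning on a single T4 feasible and keeps the uploaded weight file small compared with a full model checkpoint.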

In [None]:
# Launch LoRA fine-tuning for text-to-image model with accelerate
!accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="Akirashindo39/KANJIDIC2" \
  --image_column="image" \
  --caption_column="text" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=1 \
  --checkpointing_steps=2000 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="Akirashindo39/kanji-diffusion-v1-4-kanjidic2" \
  --validation_prompt="A kanji meaning Elon Musk" \
  --push_to_hub
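With `--train_batch_size=1` and `--num_train_epochs=1`, the run performs roughly one optimizer step per training image, and `--checkpointing_steps=2000` saves an intermediate checkpoint every 2000 steps. The sketch below estimates both on a single GPU; it assumes the step count is computed as batches per epoch divided by gradient accumulation steps (rounded up) with no `--max_train_steps` override, and the dataset size of 6000 is a hypothetical number, not the real size of the KANJIDIC2 dataset.

```python
import math


def optimizer_steps(num_images, batch_size=1, grad_accum=1, epochs=1):
    """Estimate optimizer steps for a single-GPU run (assumption: no max_train_steps override)."""
    batches_per_epoch = math.ceil(num_images / batch_size)  # last partial batch is kept
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


def checkpoint_steps(total_steps, every=2000):
    """Global steps at which a checkpoint would be written with --checkpointing_steps=every."""
    return list(range(every, total_steps + 1, every))


# Hypothetical dataset size, purely for illustration
total = optimizer_steps(6000)
print(total, checkpoint_steps(total))  # 6000 [2000, 4000, 6000]
```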

## Step 6: Generate new Kanji

Load the base Stable Diffusion v1.4 pipeline, apply the fine-tuned LoRA weights, and move the pipeline to the GPU.

In [None]:
from diffusers import StableDiffusionPipeline
import torch

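# Free cached GPU memory left over from training
torch.cuda.empty_cache()

# Define model path here (a Hub repo id or the local --output_dir from Step 5)
model_path = "Akirashindo39/kanji-diffusion-v1-4-kanjidic2"

# Load the base model in half precision and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True
).to("cuda")

# Apply the LoRA weights saved by train_text_to_image_lora.py
pipe.load_lora_weights(model_path)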

Generate and save a custom kanji character

In [None]:
new_kanji_meaning = "internet" # Enter new kanji meaning here
prompt = f"a Kanji meaning {new_kanji_meaning}"
image = pipe(prompt).images[0]
image.save(f"{new_kanji_meaning}-kanji-v1-4.png")
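The training captions written to metadata.jsonl have the form `"<meanings> Kanji"`, while the prompt above uses `"a Kanji meaning ..."`; which phrasing gives better glyphs is something to compare empirically. The meaning is also used directly in the output file name, so input such as `"Wi-Fi / hotspot"` would produce an invalid path. A small sketch with two hypothetical helpers, one building a caption-style prompt and one turning the meaning into a safe file name.

```python
import re


def caption_style_prompt(meaning):
    """Build a prompt in the same "<meanings> Kanji" form as the training captions."""
    return f"{meaning} Kanji"


def safe_filename(meaning, suffix="-kanji-v1-4.png"):
    """Replace runs of non-alphanumeric characters with '-' to get a filesystem-safe name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", meaning).strip("-").lower() or "kanji"
    return f"{slug}{suffix}"


print(caption_style_prompt("internet"))  # internet Kanji
print(safe_filename("Wi-Fi / hotspot"))  # wi-fi-hotspot-kanji-v1-4.png
```

With the pipeline loaded, these could be used as `pipe(caption_style_prompt(new_kanji_meaning)).images[0].save(safe_filename(new_kanji_meaning))`.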