ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion

Sungho Koh • SeungJu Cha • Hyunwoo Oh • Kwanyoung Lee • Dong-Jin Kim

Hanyang University

NeurIPS 2025

🔍 Overview

ScaleDiff is a training-free, model-agnostic framework that efficiently extends pretrained diffusion models (SDXL, FLUX) to ultra-high resolutions (up to 4K).
It integrates Neighborhood Patch Attention (NPA), Latent Frequency Mixing (LFM), and Structure Guidance (SG) into a single upsample–diffuse–denoise pipeline.
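The upsample–diffuse–denoise loop can be pictured with the control-flow sketch below. It is a hypothetical illustration, not ScaleDiff's code. Every function is a stand-in that only tracks resolution and logs the step. It assumes each upsampling stage doubles the side length, which is consistent with the output-resolution formula height × 2^upsample_stage given under Key Parameters.

```python
# Hypothetical control-flow sketch of a progressive upsample-diffuse-denoise loop.
# Every function here is a stand-in that only tracks resolution and logs the step;
# none of them are ScaleDiff APIs. Assumes each stage doubles the side length.

def generate_base(height, width, log):
    """Stand-in for ordinary generation at the model's native resolution."""
    log.append(f"generate base image at {height}x{width}")
    return height, width


def upsample(size, log, factor=2):
    """Stand-in for upsampling the previous result by `factor` per side."""
    h, w = size[0] * factor, size[1] * factor
    log.append(f"upsample to {h}x{w}")
    return h, w


def diffuse(size, log):
    """Stand-in for re-noising the upsampled image (resolution unchanged)."""
    log.append("re-noise (diffuse) the upsampled image")
    return size


def denoise(size, log):
    """Stand-in for denoising at the current resolution."""
    log.append(f"denoise at {size[0]}x{size[1]}")
    return size


def progressive_pipeline(height=1024, width=1024, upsample_stage=2):
    """Run one base generation, then `upsample_stage` upsample/diffuse/denoise rounds."""
    log = []
    size = generate_base(height, width, log)
    for _ in range(upsample_stage):
        size = denoise(diffuse(upsample(size, log), log), log)
    return size, log


if __name__ == "__main__":
    final_size, steps = progressive_pipeline()
    for step in steps:
        print(step)
    print("final:", final_size)  # (4096, 4096), i.e. 1024 * 2**2 per side
```

With the defaults used in the usage examples (1024 × 1024, `upsample_stage=2`) the sketch ends at 4096 × 4096, matching the "up to 4K" claim above.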

| Component | Role |
|---|---|
| NPA | Efficient attention mechanism that removes redundant patch overlap |
| LFM | Latent–RGB frequency mixing for fine details |
| SG | Structure alignment for global consistency |
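The sketch below shows why removing patch overlap saves compute. It counts patches and a relative self-attention cost (patches × tokens-per-patch²) for a square token grid. It compares 50%-overlapping sliding windows against non-overlapping patches. The grid size, patch size, and stride are assumed values chosen only for illustration. This is cost accounting only and does not implement NPA's attention itself.

```python
# Cost accounting only: overlapping sliding-window patches vs. non-overlapping patches.
# Hypothetical illustration of the redundancy NPA is described as removing;
# it does NOT implement NPA. Grid, patch, and stride values are assumptions.

def windows_per_axis(length: int, patch: int, stride: int) -> int:
    """Number of patch positions along one axis of a square token grid."""
    if (length - patch) % stride != 0:
        raise ValueError("this sketch assumes patches tile the grid exactly")
    return (length - patch) // stride + 1


def attention_cost(length: int, patch: int, stride: int) -> tuple[int, int]:
    """Return (number of patches, relative self-attention cost).

    Self-attention inside one patch scales with (tokens per patch) ** 2,
    so the total relative cost is n_patches * (patch * patch) ** 2.
    """
    n_patches = windows_per_axis(length, patch, stride) ** 2
    tokens = patch * patch
    return n_patches, n_patches * tokens ** 2


if __name__ == "__main__":
    grid, patch = 128, 32                                    # assumed sizes, in tokens
    overlap = attention_cost(grid, patch, stride=patch // 2) # 50% overlap
    no_overlap = attention_cost(grid, patch, stride=patch)   # non-overlapping
    print("overlapping patches:", overlap[0])                # 49
    print("non-overlapping patches:", no_overlap[0])         # 16
    print("cost ratio:", overlap[1] / no_overlap[1])         # 3.0625
```

With 50% overlap in both directions, most tokens are processed by several windows. Tiling the same grid without overlap cuts the attention work by roughly 3× in this example.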

⚙️ Installation

```bash
git clone https://github.com/KSH00906/ScaleDiff.git
cd ScaleDiff
conda create -n scalediff python=3.13
conda activate scalediff
pip install -r requirements.txt
```

🚀 Usage

SDXL

```python
import torch
from SDXL.pipeline_scalediff_sdxl import CustomStableDiffusionXLPipeline

# Load the pretrained SDXL base model in half precision
ckpt_path = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = CustomStableDiffusionXLPipeline.from_pretrained(
    ckpt_path,
    torch_dtype=torch.float16,
).to("cuda")
pipe.vae.enable_tiling()  # Decode large latents tile by tile to limit VRAM use

# Generate a high-resolution image
prompt = "a woman"
negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic"

generator = torch.Generator(device="cuda").manual_seed(77)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    height=1024,            # Starting resolution
    width=1024,
    generator=generator,
    num_inference_steps=50,
    guidance_scale=7.5,
    restart_ratio=0.6,      # Noise addition step ratio
    scale_factor=0.125,     # Downsampling ratio for frequency decomposition
    upsample_stage=2,       # Progressive upsampling stages: 1024 * 2**2 = 4096
)

for i, image in enumerate(images):
    image.save(f"{prompt}_{i}.png")
```

Or run the example script:

```bash
cd SDXL
python run_scalediff_sdxl.py
```

FLUX.1-dev

```python
import torch
from FLUX.pipeline_scalediff_flux import FluxPipeline
from FLUX.transformer_scalediff_flux import FluxTransformer2DModel

# Load FLUX.1-dev weights into ScaleDiff's transformer and pipeline classes (bfloat16)
ckpt = "black-forest-labs/FLUX.1-dev"
transformer = FluxTransformer2DModel.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    subfolder="transformer",
)
pipe = FluxPipeline.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    transformer=transformer,
).to("cuda")
pipe.vae.enable_tiling()  # Decode large latents tile by tile to limit VRAM use

# Generate a high-resolution image
prompt = "a woman"

generator = torch.Generator(device="cuda").manual_seed(77)

images = pipe(
    prompt + ", highly detailed, 4k resolution, best quality",
    height=1024,                # Starting resolution
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=30,
    max_sequence_length=256,
    generator=generator,
    restart_ratio=0.6,          # Noise addition step ratio
    scale_factor=0.25,          # Downsampling ratio for frequency decomposition
    upsample_stage=2,           # Progressive upsampling stages: 1024 * 2**2 = 4096
    query_random_jitter=True,   # Reduce boundary artifacts (FLUX only)
    t5_to_cpu=True,             # Offload the T5 encoder to CPU to save VRAM
)

for i, image in enumerate(images):
    image.save(f"{prompt}_{i}.png")
```

Or run the example script:

```bash
cd FLUX
python run_scalediff_flux.py         # For FLUX.1-dev
python run_scalediff_flux_schnell.py # For FLUX.1-schnell (faster)
```

Key Parameters

| Parameter | Description | Default |
|---|---|---|
| `height`, `width` | Starting resolution | 1024 |
| `restart_ratio` | Noise addition step ratio | 0.4 (SDXL), 0.6 (FLUX) |
| `scale_factor` | Downsampling ratio for frequency decomposition | 0.125 (SDXL), 0.25 (FLUX) |
| `upsample_stage` | Number of progressive upsampling stages. Output resolution: height × 2^upsample_stage | 2 |
| `query_random_jitter` | Reduces boundary artifacts at minimal computational cost (FLUX only) | True |
| `t5_to_cpu` | Offloads the T5 encoder to CPU to save VRAM (FLUX only) | True |
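`scale_factor` is described as the downsampling ratio for frequency decomposition. A generic way to use such a ratio is to split a signal into two bands. The low band comes from average-pooling by 1/scale_factor and upsampling back. The high band is the residual. Bands from two sources can then be mixed. The pure-Python sketch below shows that idea on small 2D grids. It is an illustrative assumption, not ScaleDiff's LFM implementation. Which source (latent or RGB) supplies each band in ScaleDiff is not specified here, and all function names are hypothetical.

```python
# Hypothetical low/high frequency split driven by a downsampling ratio such as
# `scale_factor`. Illustrative only; NOT ScaleDiff's LFM implementation.

def downsample(x, k):
    """Average-pool a 2D list over non-overlapping k x k blocks."""
    h, w = len(x), len(x[0])
    if h % k or w % k:
        raise ValueError("grid size must be divisible by k")
    return [
        [sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(0, w, k)]
        for i in range(0, h, k)
    ]


def upsample(x, k):
    """Nearest-neighbour upsample a 2D list by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in x for _ in range(k)]


def split_frequencies(x, scale_factor):
    """Return (low, high) bands such that low + high reconstructs x."""
    k = round(1 / scale_factor)  # e.g. 0.125 -> 8x8 blocks, 0.25 -> 4x4 blocks
    low = upsample(downsample(x, k), k)
    high = [[a - b for a, b in zip(rx, rl)] for rx, rl in zip(x, low)]
    return low, high


def mix_frequencies(low_source, high_source, scale_factor):
    """Combine the low band of one grid with the high band of another."""
    low, _ = split_frequencies(low_source, scale_factor)
    _, high = split_frequencies(high_source, scale_factor)
    return [[a + b for a, b in zip(rl, rh)] for rl, rh in zip(low, high)]


if __name__ == "__main__":
    detailed = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    flat = [[1.0] * 4 for _ in range(4)]
    low, high = split_frequencies(detailed, scale_factor=0.5)
    print(low[0])                         # [2.5, 2.5, 4.5, 4.5]
    mixed = mix_frequencies(flat, detailed, scale_factor=0.5)
    print(downsample(mixed, 2))           # [[1.0, 1.0], [1.0, 1.0]]
```

Because the high band averages to zero within each block, the mixed grid keeps the coarse structure of `low_source` exactly while taking its fine detail from `high_source`. A smaller `scale_factor` (larger blocks) pushes more content into the high band.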

📝 Citation

```bibtex
@article{koh2025scalediff,
  title={ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion},
  author={Sungho Koh and SeungJu Cha and Hyunwoo Oh and Kwanyoung Lee and Dong-Jin Kim},
  journal={arXiv preprint arXiv:2510.25818},
  year={2025},
}
```
