Flux with Remote Encode #11091

hlky · 2025-03-17T16:47:08Z

What does this PR do?

Img2Img

import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.constants import ENCODE_ENDPOINT_FLUX, DECODE_ENDPOINT_FLUX
from diffusers.utils.remote_utils import remote_decode, remote_encode

device = "cuda"
# vae=None: no VAE is loaded locally; encoding and decoding go through the remote endpoints.
pipe = FluxImg2ImgPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", vae=None, torch_dtype=torch.bfloat16)
pipe = pipe.to(device)
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))

# Encode the input image to latents remotely.
init_image = remote_encode(
    endpoint=ENCODE_ENDPOINT_FLUX,
    image=init_image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
# output_type="latent" skips the local decode and returns the latents.
latents = pipe(
    prompt=prompt, image=init_image, strength=0.95, output_type="latent"
).images

# Decode remotely; height and width are passed alongside the latents.
image = remote_decode(
    endpoint=DECODE_ENDPOINT_FLUX,
    tensor=latents,
    scaling_factor=0.3611,
    shift_factor=0.1159,
    partial_postprocess=True,
    output_type="pt",
    return_type="pil",
    height=1024,
    width=1024,
)
image.save("wizard.png")

Inpaint

import torch
from diffusers import FluxInpaintPipeline
from diffusers.utils import load_image
from diffusers.utils.constants import ENCODE_ENDPOINT_FLUX, DECODE_ENDPOINT_FLUX
from diffusers.utils.remote_utils import remote_decode, remote_encode

# vae=None: no VAE is loaded locally; encoding and decoding go through the remote endpoints.
pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", vae=None, torch_dtype=torch.bfloat16
).to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url)
mask = load_image(mask_url)

# Preprocess locally, then blank out the region to be repainted.
mask_condition = pipe.mask_processor.preprocess(mask, height=1024, width=1024)
init_image = pipe.image_processor.preprocess(init_image, height=1024, width=1024)
masked_image = init_image * (mask_condition < 0.5)

# Encode both the full image and the masked image remotely.
init_image = remote_encode(
    endpoint=ENCODE_ENDPOINT_FLUX,
    image=init_image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

masked_image = remote_encode(
    endpoint=ENCODE_ENDPOINT_FLUX,
    image=masked_image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

latents = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask,
    masked_image_latents=masked_image,
    output_type="latent",
).images

# Decode remotely; height and width are passed alongside the latents.
image = remote_decode(
    endpoint=DECODE_ENDPOINT_FLUX,
    tensor=latents,
    scaling_factor=0.3611,
    shift_factor=0.1159,
    partial_postprocess=True,
    output_type="pt",
    return_type="pil",
    height=1024,
    width=1024,
)
image.save("cat.png")
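
Note on the masking step in the inpaint example above: the expression init_image * (mask_condition < 0.5) keeps pixels outside the mask and zeroes the ones inside it. A minimal pure-Python sketch of that element-wise rule follows. The helper name mask_out is hypothetical, and the value ranges (pixels in [-1, 1], mask in [0, 1]) are assumptions about what the two preprocessors return, not something this PR states.

```python
# Stand-in for `init_image * (mask_condition < 0.5)` on a flat list of values.
# Assumption: pixels are normalized to [-1, 1] and mask values lie in [0, 1],
# with values >= 0.5 marking the region to repaint.


def mask_out(pixels, mask, threshold=0.5):
    """Keep each pixel whose mask value is below `threshold`; zero the rest."""
    if len(pixels) != len(mask):
        raise ValueError("pixels and mask must have the same length")
    return [p if m < threshold else 0.0 for p, m in zip(pixels, mask)]


row = [-1.0, -0.5, 0.25, 1.0]
mask_row = [0.0, 1.0, 1.0, 0.0]  # the two middle pixels are to be repainted
masked_row = mask_out(row, mask_row)
print(masked_row)
```

Under the [-1, 1] assumption, a zeroed pixel is mid-gray rather than black.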

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

hlky · 2025-03-17T16:49:47Z

cc @yiyixuxu Should we make output_type="latent" unpack the latents? The packed latents cannot be passed to latents as input, and we need to pass height and width to remote_decode for packed latents.

diffusers/src/diffusers/pipelines/flux/pipeline_flux_img2img.py

Lines 1013 to 1020 in 33d10af

    
    if output_type == "latent":
        image = latents
    else:
        latents = self._unpack_latents(latents, height, width, self.vae_scale_factor)
        latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor
        image = self.vae.decode(latents, return_dict=False)[0]
        image = self.image_processor.postprocess(image, output_type=output_type)
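
To illustrate why packed latents need height and width, here is a rough sketch of the shape arithmetic behind the unpack step, plus the scale/shift step from the snippet above. It is not the library implementation. It assumes 2x2 latent patches are folded into the channel axis, a vae_scale_factor of 8, and 16 latent channels; none of those values are stated in this thread, and all function names are hypothetical. Only the constants 0.3611 and 0.1159 and the formula latents / scaling_factor + shift_factor come from the text above.

```python
# Constants taken from the examples in this PR.
SCALING_FACTOR = 0.3611
SHIFT_FACTOR = 0.1159


def latent_size(height, width, vae_scale_factor=8):
    """Latent height/width for a pixel size, rounded down to a multiple of 2 (assumed)."""
    return (2 * (height // (vae_scale_factor * 2)),
            2 * (width // (vae_scale_factor * 2)))


def packed_shape(batch, channels, height, width, vae_scale_factor=8):
    """Packed latent shape, assuming 2x2 patches are folded into the channel axis."""
    latent_h, latent_w = latent_size(height, width, vae_scale_factor)
    return (batch, (latent_h // 2) * (latent_w // 2), channels * 4)


def unpacked_shape(packed, height, width, vae_scale_factor=8):
    """Recover (batch, channels, latent_h, latent_w); impossible without the pixel size."""
    batch, num_patches, packed_channels = packed
    latent_h, latent_w = latent_size(height, width, vae_scale_factor)
    if (latent_h // 2) * (latent_w // 2) != num_patches:
        raise ValueError("height/width do not match the packed sequence length")
    return (batch, packed_channels // 4, latent_h, latent_w)


def to_vae_space(value):
    """Decode-side step from the snippet: value / scaling_factor + shift_factor."""
    return value / SCALING_FACTOR + SHIFT_FACTOR


def to_model_space(value):
    """Algebraic inverse of to_vae_space, as would be applied after encoding."""
    return (value - SHIFT_FACTOR) * SCALING_FACTOR


# Two different image sizes produce the same packed shape under these assumptions.
square = packed_shape(1, 16, 1024, 1024)
wide = packed_shape(1, 16, 512, 2048)
print(square, wide)
print(unpacked_shape(square, 1024, 1024))
print(unpacked_shape(wide, 512, 2048))
```

Because both sizes pack to the same shape, the packed tensor alone does not say which spatial layout to restore, which is the ambiguity the height and width arguments resolve.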

HuggingFaceDocBuilderDev · 2025-03-17T16:54:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hlky · 2025-03-17T16:55:29Z

cc @vladmandic Here is Flux img2img

yiyixuxu

awesome!

hlky · 2025-03-18T07:51:35Z

cc @vladmandic Inpaint done

hlky · 2025-03-20T09:47:39Z

Merging to unblock downstream.

Gentle ping @yiyixuxu on #11091 (comment) for a follow-up PR.

Flux img2img remote encode

5f8f9fa

yiyixuxu approved these changes Mar 18, 2025

Flux inpaint

39f3437

hlky added 2 commits March 18, 2025 07:52

-copied from

77772ef

Merge branch 'main' into flux-remote-encode

df5714a

hlky merged commit 9f2d5c9 into huggingface:main Mar 20, 2025
10 of 12 checks passed
