# **AutoEncoder Flux 2**

## Overview
This notebook, developed in Google Colab, validates the AutoEncoder: it loads the weights, encodes a batch of images into latents, and decodes them back to images.

## Hardware Specifications
- **GPU**: NVIDIA T4
- **Platform**: Google Colab

All tests and validations were performed using this configuration.
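
To confirm that a session matches the configuration listed above, a small check along these lines can be run first. This is a minimal sketch: `describe_device` is a hypothetical helper name, not part of the `flux` package, and it only reports what PyTorch sees. It falls back to a plain message when PyTorch or CUDA is unavailable.

```python
import importlib.util


def describe_device() -> str:
    """Return a short description of the accelerator PyTorch would use.

    Hypothetical helper for illustration; not part of the flux package.
    """
    # Avoid an ImportError if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return torch.cuda.get_device_name(0)
    return "cpu"


print(f"Active device: {describe_device()}")
```

If the reported device is `cpu`, the later cells that move tensors to `cuda` would fail, so the runtime type should be switched to a GPU before continuing.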

In [1]:
!uv pip install git+https://github.com/FredyRivera-dev/Flux2-from-scratch.git

Using Python 3.12.12 environment at: /usr
Resolved 77 packages in 1.49s
Prepared 2 packages in 3.70s
Installed 2 packages in 9ms
 + flux==0.1.0 (from git+https://github.com/FredyRivera-dev/Flux2-from-scratch.git@d2f490fa1f1018457b83da0685cb1a2688f1d66e)
 + ftfy==6.3.1


In [1]:
from flux.utils import load_ae

ae = load_ae("./vae/ae.safetensors", "cuda:0")
ae.eval()

Automatically downloading the autoencoder


ae.safetensors:   0%|          | 0.00/336M [00:00<?, ?B/s]

Loading ./vae/ae.safetensors for the AutoEncoder weights


AutoEncoder(
  (encoder): Encoder(
    (quant_conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
    (conv_in): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (down): ModuleList(
      (0): Module(
        (block): ModuleList(
          (0-1): 2 x ResnetBlock(
            (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
            (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 128, eps=1e-06, affine=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (attn): ModuleList()
        (downsample): Downsample(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2))
        )
      )
      (1): Module(
        (block): ModuleList(
          (0): ResnetBlock(
            (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
            (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  

In [5]:
import torch
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((1024, 1024)),  # Resize to expected resolution
    transforms.ToTensor(),           # Convert to tensor [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize to [-1, 1]
])

# Images to use for the encoder test

image_paths = [
    'imagen1.png',
    'imagen2.png',
    'imagen3.png',
]

images = []
for path in image_paths:
    img = Image.open(path).convert('RGB')
    img_tensor = transform(img)
    images.append(img_tensor)

batch = torch.stack(images).to(device='cuda')  # Shape: (3, 3, 1024, 1024)
print(f"Batch shape: {batch.shape}")

with torch.no_grad():
    latents = ae.encode(batch)
    print(f"Latents shape: {latents.shape}")
    print(f"Latents dtype: {latents.dtype}")

    # Decode the latents
    decoded = ae.decode(latents)
    print(f"Decoded shape: {decoded.shape}")

def tensor_to_pil(tensor):
    """Convert a (C, H, W) tensor with values in [-1, 1] to a PIL Image."""
    # Denormalize from [-1, 1] to [0, 1]
    tensor = (tensor + 1) / 2
    # Clamp to ensure values in [0, 1]
    tensor = torch.clamp(tensor, 0, 1)
    # Convert to numpy and PIL
    tensor = tensor.cpu().permute(1, 2, 0).numpy()
    tensor = (tensor * 255).astype('uint8')
    return Image.fromarray(tensor)

for i in range(decoded.shape[0]):
    decoded_img = tensor_to_pil(decoded[i])
    output_path = f'image_{i+1}_decode.png'
    decoded_img.save(output_path)
    print(f"Saved: {output_path}")

print("All decoded images have been saved!")

Batch shape: torch.Size([3, 3, 1024, 1024])
Latents shape: torch.Size([3, 128, 64, 64])
Latents dtype: torch.float32
Decoded shape: torch.Size([3, 3, 1024, 1024])
Saved: image_1_decode.png
Saved: image_2_decode.png
Saved: image_3_decode.png
All decoded images have been saved!
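
The printed shapes can be read as a quick sanity check on the encoder's compression. The sketch below uses only the shapes shown in the output above; the helper names `spatial_factor` and `compression_ratio` are introduced here for illustration and are not part of the `flux` package.

```python
from math import prod

# Shapes copied from the printed output: (batch, channels, height, width)
batch_shape = (3, 3, 1024, 1024)
latent_shape = (3, 128, 64, 64)


def spatial_factor(input_shape, encoded_shape):
    """Return the (height, width) downsampling factors of the encoder."""
    return (
        input_shape[2] // encoded_shape[2],
        input_shape[3] // encoded_shape[3],
    )


def compression_ratio(input_shape, encoded_shape):
    """Ratio of values per image in pixel space to values per image in latent space."""
    return prod(input_shape[1:]) / prod(encoded_shape[1:])


print(spatial_factor(batch_shape, latent_shape))     # (16, 16)
print(compression_ratio(batch_shape, latent_shape))  # 6.0
```

Each latent position therefore covers a 16 x 16 pixel area, and each image is represented with 6 times fewer values than its pixel tensor. The decoded shape matching the batch shape confirms that `decode` restores the original resolution; it says nothing about reconstruction quality, which has to be judged from the saved images.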
