# Flux Playbook

### Note:
This tutorial is intended to run in a NeMo container (version > 24.09). It shows basic usage of the Flux training and inference pipelines as an example. Note that the full Flux model contains 12 billion parameters and requires a substantial amount of VRAM to run the inference script at full size.
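As a rough guide to what "substantial" means here, the weight-only footprint can be estimated from the parameter count. The sketch below assumes 2 bytes per parameter (bf16/fp16) or 4 bytes (fp32); those dtypes and the helper name `weight_memory_gib` are illustrative assumptions, and the estimate ignores the text encoders, the VAE, and activations.

```python
# Rough weight-only memory estimate for the 12B-parameter Flux transformer.
# Assumption: 2 bytes per parameter for bf16/fp16, 4 bytes for fp32.
NUM_PARAMS = 12_000_000_000


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


for dtype_name, nbytes in [("bf16/fp16", 2), ("fp32", 4)]:
    print(f"{dtype_name}: {weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
# prints:
# bf16/fp16: 22.4 GiB
# fp32: 44.7 GiB
```

Actual usage during inference will be higher than these figures, so treat them as a lower bound.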

Important: The Flux checkpoint on Hugging Face requires per-user authentication. Set your own HF token with the proper access before running the inference section of this notebook; otherwise the model will be randomly initialized and the output images will be random noise.

##### Launch a NeMo Docker container
```
docker run --gpus all -it --rm -v <your_nemo_dir>:/opt/NeMo --shm-size=8g \
     -p 8888:8888 --ulimit memlock=-1 --ulimit stack=67108864 \
     nvcr.io/nvidia/nemo:xx.xx
```
Mounting your own copy of the NeMo repo is optional; it is only needed when you have custom changes outside this notebook that you want to test.

### Flux Training with Mock Dataset

For illustration, we first look at how to run the pre-defined unit test recipe, in which the number of Flux transformer layers is set to 1. In this recipe all modules are randomly initialized, so no pre-downloaded checkpoint is needed. We also provide a mock data module that generates image and text embeddings directly, so the text and image encoders are not required.

Let's take a look at the configs in this recipe.

```
@run.cli.factory(target=llm.train)
def unit_test() -> run.Partial:
    '''
    Basic functional test, with mock dataset,
    text/vae encoders not initialized, ddp strategy,
    frozen and trainable layers both set to 1
    '''
    recipe = flux_training()

    # Set the params of the following modules to None when the datamodule already provides image and text embeddings
    recipe.model.flux_params.t5_params = None
    recipe.model.flux_params.clip_params = None
    recipe.model.flux_params.vae_config = None
    recipe.model.flux_params.device = 'cuda'

    # Set number of layers of Flux
    recipe.model.flux_params.flux_config = run.Config(
        FluxConfig,
        num_joint_layers=1,
        num_single_layers=1,
    )

    recipe.data.global_batch_size = 1
    recipe.trainer.strategy.ddp = run.Config(
        DistributedDataParallelConfig,
        check_for_nan_in_grad=True,
        grad_reduce_in_fp32=True,
    )
    recipe.trainer.max_steps = 10
    return recipe
```

In NeMo 2.0, such a pre-defined recipe can be launched as follows:

In [None]:
!torchrun /opt/NeMo/scripts/flux/flux_training.py --yes --factory unit_test

To keep the playbook simple, we use the smallest number of layers above. You can change the config in the pre-defined recipes to test locally with a different number of layers, number of devices, etc. We also provide other pre-defined recipes in the script for reference.
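One way to vary the layer counts and step budget is to override the same attributes that the `unit_test` recipe sets. The sketch below is an illustration only: `scale_recipe` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in that mirrors the attribute layout of the recipe so the snippet runs without NeMo. It assumes the real recipe accepts plain attribute assignment, as the recipe code above does.

```python
from types import SimpleNamespace


def scale_recipe(recipe, num_joint_layers, num_single_layers, max_steps):
    """Override layer counts and the step budget on a recipe-like object in place."""
    flux_config = recipe.model.flux_params.flux_config
    flux_config.num_joint_layers = num_joint_layers
    flux_config.num_single_layers = num_single_layers
    recipe.trainer.max_steps = max_steps
    return recipe


# Stand-in with the same attribute paths as the recipe returned by unit_test().
# Inside the container, the real recipe object would be passed instead.
stand_in = SimpleNamespace(
    model=SimpleNamespace(
        flux_params=SimpleNamespace(
            flux_config=SimpleNamespace(num_joint_layers=1, num_single_layers=1)
        )
    ),
    trainer=SimpleNamespace(max_steps=10),
)

scale_recipe(stand_in, num_joint_layers=2, num_single_layers=4, max_steps=20)
print(stand_in.model.flux_params.flux_config, stand_in.trainer.max_steps)
```

Keeping the overrides in one function makes it easier to compare runs that differ only in model size.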

### Flux Inference
From this point on, download the [FLUX.1-dev checkpoint][flux] from Hugging Face and save it locally before proceeding, or set your own Hugging Face token with the proper access to download it automatically. Otherwise, the notebook will run a randomly initialized dummy model, and the results will be for illustration only because the output will be pure noise.


**Troubleshooting:**
- If you encounter "safetensors header too large" errors, the model files were not fully downloaded.
- Verify file sizes: `ae.safetensors` should be several GB, not just a few KB.
- Use `file /temp/FLUX.1-dev/ae.safetensors` to inspect the file; a result such as ASCII text or HTML indicates a placeholder rather than a real safetensors file.
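The checks above can also be done programmatically. A safetensors file starts with an 8-byte little-endian unsigned integer giving the length of a JSON header, followed by that header; a truncated download or a text placeholder usually decodes to an implausible length. The following is a minimal sketch under that assumption. The helper name `check_safetensors_header` and the 100 MB sanity bound are choices made for this example, not part of the tutorial.

```python
import json
import struct
from pathlib import Path

# Assumed sanity bound for the JSON header size (illustrative choice).
MAX_HEADER_BYTES = 100_000_000


def check_safetensors_header(path):
    """Return (ok, message) after inspecting the safetensors header of `path`."""
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False, f"file is only {size} bytes"
        # First 8 bytes: little-endian u64 length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_BYTES or 8 + header_len > size:
            return False, f"implausible header length {header_len} for a {size}-byte file"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:  # covers invalid UTF-8 and invalid JSON
            return False, "header is not valid JSON"
    if not isinstance(header, dict):
        return False, "header is not a JSON object"
    tensors = [name for name in header if name != "__metadata__"]
    return True, f"{len(tensors)} tensors, {size} bytes"
```

For example, `check_safetensors_header("/temp/FLUX.1-dev/ae.safetensors")` should return `True` as its first element for a complete download.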

[flux]: https://huggingface.co/black-forest-labs/FLUX.1-dev


In [None]:
####  Download checkpoints using Hugging Face Hub
from huggingface_hub import snapshot_download

# Download the entire model repository
# Replace <HF_token> with your actual Hugging Face token
HF_TOKEN = "<HF_token>"  # Replace with your token
model_path = snapshot_download(
    repo_id="black-forest-labs/FLUX.1-dev",
    token=HF_TOKEN,
    local_dir="/temp/FLUX.1-dev",
    local_dir_use_symlinks=False
)

print(f"Model downloaded to: {model_path}")
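To avoid pasting a token into the notebook, the token can be read from the environment instead. The sketch below assumes the token is exported as `HF_TOKEN`, which is the variable name `huggingface_hub` itself reads; `resolve_hf_token` is a hypothetical helper written for this example.

```python
import os

# Placeholder string used in the download cell; treated as "not set".
PLACEHOLDER = "<HF_token>"


def resolve_hf_token(env=os.environ):
    """Return the Hugging Face token from `env`, or raise if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            "Set HF_TOKEN to a token with access to black-forest-labs/FLUX.1-dev"
        )
    return token
```

In the download cell, `HF_TOKEN = resolve_hf_token()` could then replace the hard-coded string, so a missing token fails early instead of producing a randomly initialized model later.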

In [None]:

# Verify the download
!ls -la /temp/FLUX.1-dev/
!file /temp/FLUX.1-dev/ae.safetensors
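The same verification can be done in Python by checking for the entries that the inference command in this playbook reads from the checkpoint directory. This is a sketch only: `missing_entries` is a hypothetical helper, and the expected names are taken from the paths passed to `flux_infer.py` later in this notebook.

```python
from pathlib import Path

# Entries referenced by the inference command in this playbook.
EXPECTED_ENTRIES = ("transformer", "text_encoder", "text_encoder_2", "ae.safetensors")


def missing_entries(checkpoint_dir, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).exists()]
```

An empty list from `missing_entries("/temp/FLUX.1-dev")` means the layout matches what the inference cell expects; it does not confirm that the files themselves are complete.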


Once you have downloaded the checkpoint, specify its path below and run the following cell.
Note that this model contains 12B parameters; it requires a large amount of GPU memory and will otherwise run out of memory (OOM).

In [None]:
#### Optional Cell, only makes sense if your machine has enough device memory and you downloaded valid checkpoint from previous steps
CHECKPOINT_PATH="/temp/FLUX.1-dev"

# Verify the checkpoint path exists and contains the required files
# IPython substitutes {CHECKPOINT_PATH} with the Python variable defined above
!ls -la {CHECKPOINT_PATH}/
!ls -la {CHECKPOINT_PATH}/ae.safetensors

# Run inference with the downloaded checkpoint
!torchrun /opt/NeMo/scripts/flux/flux_infer.py \
  --flux_ckpt {CHECKPOINT_PATH}/transformer \
  --clip_version {CHECKPOINT_PATH}/text_encoder \
  --t5_version {CHECKPOINT_PATH}/text_encoder_2 \
  --vae_ckpt {CHECKPOINT_PATH}/ae.safetensors \
  --do_convert_from_hf \
  --prompts "A cat holding a sign that says hello world" \
  --inference_steps 30

For testing purposes, you can load random weights only and reduce the number of layers to avoid OOM; the output will be just noise in this case.

In [None]:
!torchrun /opt/NeMo/scripts/flux/flux_infer.py \
  --clip_version None \
  --t5_version None \
  --vae_ckpt None \
  --num_joint_layers 4 \
  --num_single_layers 8 \
  --prompts "A cat holding a sign that says hello world" \
  --inference_steps 30
