### Optimizing and Deploying AI Models with Pruna and Hugging Face

`Goal`: Create an end-to-end tutorial that optimizes the black-forest-labs/FLUX.1-schnell model with Pruna, pushes the optimized model to the Hugging Face Hub, and serves it in a Gradio demo.

`Model`: [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)

`Dataset`: [data-is-better-together/open-image-preferences-v1-binarized](https://huggingface.co/datasets/data-is-better-together/open-image-preferences-v1-binarized)

To complete the tutorial, you need to install the pruna SDK along with a few third-party libraries via pip. It is recommended to run this notebook in a new virtual environment.


In [None]:
pip install pruna 

In [None]:
pip install datasets huggingface_hub gradio diffusers
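
Once both install cells have run, a quick sanity check can confirm that every package is visible to the notebook kernel. The snippet below is a minimal sketch that uses only the standard library; the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of Pruna.

```python
from importlib import metadata

# Packages this tutorial installs with pip.
REQUIRED = ["pruna", "datasets", "huggingface_hub", "gradio", "diffusers"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows up as missing even though pip reported success, the kernel is probably running in a different environment from the one pip installed into.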

You need to log in to the Hugging Face Hub to access the model weights. Run the cell below to do so.

In [2]:
from huggingface_hub import notebook_login

notebook_login()



Smash Configuration

To optimize the model, we need to choose which optimization methods Pruna should apply. To learn more, see the [SmashConfig guide](https://docs.pruna.ai/en/stable/docs_pruna/user_manual/configure.html).

We will select a quantizer to lower memory requirements and a cacher that stores intermediate results of computations to speed up subsequent operations.
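
To get a feel for why quantization lowers memory requirements, here is a back-of-the-envelope estimate of weight storage at different bit widths. It is a rough sketch, not a measurement: it assumes the roughly 12 billion parameters that the FLUX.1 model cards state for the transformer, and it ignores quantization metadata (scales, zero points), activations, the text encoders, and the VAE. The helper name `weight_memory_gib` is illustrative.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: about 12 billion parameters in the FLUX.1 transformer.
FLUX_TRANSFORMER_PARAMS = 12e9

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(FLUX_TRANSFORMER_PARAMS, bits)
    print(f"{label:>8}: ~{gib:.1f} GiB")
```

The actual savings depend on which layers the quantizer touches and on the bit width it is configured with, so treat these numbers as an upper bound on what quantization alone can save.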

We will also upload the smashed model to the Hugging Face Hub.

In [None]:
import torch
from diffusers import FluxPipeline
from pruna import smash, SmashConfig

# 1. Load the original FLUX pipeline
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# 2. Configure Pruna smash
smash_config = SmashConfig()
smash_config["quantizer"] = "hqq_diffusers" 
smash_config["cacher"] = "pab"

# 3. Smash the model
smashed_pipe = smash(model=pipe, smash_config=smash_config)

# 4. Push smashed pipeline to the Hub
smashed_pipe.save_to_hub("AINovice2005/smashed-FLUX.1-schnell-pruna")
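
To see whether smashing actually pays off on your hardware, you can time a few generations before and after. The helper below is a minimal sketch built on the standard library only; `time_call` is a hypothetical name, and the commented usage lines assume the `pipe` and `smashed_pipe` objects from the cell above.

```python
import time
from statistics import mean


def time_call(fn, *args, warmup=1, repeats=3, **kwargs):
    """Return the mean wall-clock seconds of fn(*args, **kwargs) over `repeats` runs."""
    # Warm-up runs absorb one-off costs such as compilation or cache filling.
    for _ in range(warmup):
        fn(*args, **kwargs)
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Hypothetical usage (requires a GPU and the objects defined above):
# base_s = time_call(pipe, "a photo of a cat", num_inference_steps=4)
# smashed_s = time_call(smashed_pipe, "a photo of a cat", num_inference_steps=4)
# print(f"base: {base_s:.2f}s, smashed: {smashed_s:.2f}s")
```

Since smashing may modify the pipeline in place, it is safer to time the base pipeline before calling `smash` rather than afterwards.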


Load Dataset

In [None]:
from datasets import load_dataset

# load the binarized Open Image Preferences prompts
ds = load_dataset("data-is-better-together/open-image-preferences-v1-binarized", split="train")

# preview the first 10 prompts
for example in ds.select(range(10)):
    print(example["prompt"])


Evaluate the model

In [None]:
import torch
from datasets import load_dataset
from torch.utils.data import Dataset, DataLoader

from pruna.engine.pruna_model import PrunaModel
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task

# 1. Load only the first 50 examples from the dataset
hf_dataset = load_dataset(
    "data-is-better-together/open-image-preferences-v1-binarized",
    split="train[:50]"
)

# 2. Custom PyTorch dataset that yields the text prompt of each example
#    (the text lives in the 'prompt' column, as previewed in the previous section)
class OpenImagesChosenPromptDataset(Dataset):
    def __init__(self, hf_dataset):
        self.samples = hf_dataset

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return {"prompt": self.samples[idx]["prompt"]}

eval_dataset = OpenImagesChosenPromptDataset(hf_dataset)
eval_dataloader = DataLoader(eval_dataset, batch_size=1)

# 3. Load the Pruna-wrapped model from the Hugging Face Hub
loaded_model = PrunaModel.from_hub(
    "AINovice2005/smashed-FLUX.1-schnell-pruna",  # Hub repo the smashed model was pushed to
).to("cuda" if torch.cuda.is_available() else "cpu")

# 4. Create the EvaluationAgent with task defined
agent = EvaluationAgent(
    pipeline=loaded_model,
    task=Task.TEXT2IMAGE,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu")
)

# 5. Run evaluation with CLIP score
results = agent.evaluate(
    dataloader=eval_dataloader,
    num_batches=50,
    metrics=["clip_score"],
    save_path="./eval_results.json"
)

# 6. Print results
print("📊 Evaluation Results:")
for metric, score in results.items():
    print(f"{metric}: {score}")
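
For intuition about what the `clip_score` metric reports: CLIP score is commonly defined (for example in torchmetrics) as the cosine similarity between the CLIP image embedding and the CLIP text embedding, scaled by 100 and clipped at zero. The sketch below applies that formula to plain Python lists; it assumes the embeddings have already been computed, and it is an illustration of the formula rather than Pruna's own implementation.

```python
import math


def clip_score(image_emb, text_emb):
    """max(100 * cosine_similarity(image_emb, text_emb), 0) for two embedding vectors."""
    if len(image_emb) != len(text_emb):
        raise ValueError("Embeddings must have the same dimension.")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    if norm == 0:
        raise ValueError("Embeddings must be non-zero vectors.")
    return max(100.0 * dot / norm, 0.0)


# Toy vectors standing in for real CLIP embeddings.
print(clip_score([0.2, 0.9, 0.1], [0.25, 0.8, 0.0]))
```

Higher values mean the generated image matches its prompt more closely. Comparing the smashed model's score with the base model's score on the same prompts shows how much prompt alignment the optimization costs, if any.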


Gradio Demo

Now, we will serve the smashed model in a Gradio demo.

In [None]:
import gradio as gr
from pruna.engine.pruna_model import PrunaModel

# Load the smashed FLUX model that was pushed to the Hub earlier
pipe = PrunaModel.from_hub("AINovice2005/smashed-FLUX.1-schnell-pruna")

# Define the generation function
def generate(prompt):
    return pipe(prompt).images[0]

# Create the Gradio interface
gr.Interface(fn=generate, inputs="text", outputs="image").launch()
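
A public demo will receive empty or very long prompts sooner or later. One possible variant of the generation function validates the prompt before calling the pipeline. This is a minimal sketch: `make_generate` and the `max_chars` limit are hypothetical, and the only assumption about the pipeline is that calling it returns an object with an `images` list, as in the cell above.

```python
from typing import Any, Callable


def make_generate(pipe: Callable[..., Any], max_chars: int = 500) -> Callable[[str], Any]:
    """Wrap a text-to-image pipeline so that unusable prompts fail early."""

    def generate(prompt: str) -> Any:
        cleaned = (prompt or "").strip()
        if not cleaned:
            raise ValueError("Prompt must not be empty.")
        if len(cleaned) > max_chars:
            raise ValueError(f"Prompt is longer than {max_chars} characters.")
        # Same call pattern as the demo: take the first generated image.
        return pipe(cleaned).images[0]

    return generate


# Hypothetical usage with the objects from the cell above:
# gr.Interface(fn=make_generate(pipe), inputs="text", outputs="image").launch()
```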
