### Optimizing and Deploying AI Models with Pruna and Hugging Face

`Goal`: Optimize the black-forest-labs/FLUX.1-schnell model with Pruna and deploy it on the Hugging Face Hub, end to end.

`Model`: [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)

`Dataset`: [data-is-better-together/open-image-preferences-v1-binarized](https://huggingface.co/datasets/data-is-better-together/open-image-preferences-v1-binarized)

To complete the tutorial, install the Pruna SDK and a few third-party libraries with pip. We recommend running this notebook in a fresh virtual environment.


In [None]:
%pip install pruna

In [None]:
%pip install datasets huggingface_hub gradio diffusers

You need to log in to the Hugging Face Hub to download the model weights and to push the optimized model. Run the cell below to authenticate.

In [1]:
from huggingface_hub import notebook_login

notebook_login()


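Outside a notebook (a script or a CI job) the login widget is not available. The sketch below is one way to authenticate there, assuming the token has been exported in the `HF_TOKEN` environment variable. The helper names `resolve_hf_token` and `login_from_env` are hypothetical and not part of any library.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token stored in HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env() -> bool:
    """Log in with the token from the environment; return False if none is set."""
    token = resolve_hf_token()
    if token is None:
        return False
    # Imported lazily so resolve_hf_token works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` at the top of a script keeps the token out of the source code.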


Smash Configuration:

To optimize the model, we declare in a `SmashConfig` which optimization methods Pruna should apply. For details, see the [SmashConfig guide](https://docs.pruna.ai/en/stable/docs_pruna/user_manual/configure.html).

We select a quantizer, which lowers memory requirements, and a cacher, which stores intermediate results of computations and reuses them to speed up subsequent operations.
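To get a feel for why quantization lowers memory requirements, the back-of-the-envelope sketch below estimates weight storage at different bit widths. It assumes roughly 12 billion parameters for the FLUX transformer and ignores quantization metadata such as scales and zero points, so the figures are rough lower bounds rather than measurements. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 12 billion parameters in the FLUX transformer.
FLUX_TRANSFORMER_PARAMS = 12e9

for bits in (16, 8, 4):
    size = weight_memory_gb(FLUX_TRANSFORMER_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.0f} GB")
```

Halving the bit width halves the weight footprint. Activations, the text encoders, and the VAE add to the total and are not covered by this estimate.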

We will also upload the smashed model to the Hugging Face Hub.

In [None]:
import torch
from diffusers import FluxPipeline
from pruna import smash, SmashConfig

# 1. Load the original FLUX pipeline
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# 2. Configure Pruna smash
smash_config = SmashConfig()
smash_config["quantizer"] = "hqq_diffusers" 
smash_config["cacher"] = "pab"

# 3. Smash the model
smashed_pipe = smash(model=pipe, smash_config=smash_config)

# 4. Push smashed pipeline to the Hub
smashed_pipe.save_to_hub("AINovice2005/smashed-FLUX.1-schnell-pruna")
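The `pab` cacher refers to Pyramid Attention Broadcast, which reuses attention outputs across neighbouring denoising steps instead of recomputing them. The toy below only illustrates that reuse idea in plain Python. It is not Pruna's implementation, and `StepCache` and `expensive_block` are hypothetical names.

```python
from typing import Callable, Optional


class StepCache:
    """Recompute an expensive function every `interval` steps, reuse it otherwise."""

    def __init__(self, fn: Callable[[int], float], interval: int) -> None:
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.fn = fn
        self.interval = interval
        self.calls = 0  # number of real (uncached) computations
        self._cached: Optional[float] = None

    def __call__(self, step: int) -> float:
        # Refresh on the first call and on every `interval`-th step.
        if self._cached is None or step % self.interval == 0:
            self._cached = self.fn(step)
            self.calls += 1
        return self._cached


def expensive_block(step: int) -> float:
    """Stand-in for an attention block; the real one is far more costly."""
    return 1.0 / (step + 1)


cached_block = StepCache(expensive_block, interval=2)
outputs = [cached_block(step) for step in range(8)]
print(f"{cached_block.calls} real computations for {len(outputs)} steps")
```

The trade-off is the same as in the real method: a larger interval saves more compute but reuses staler results, which can affect image quality.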


Load Dataset

In [None]:
from datasets import load_dataset

# load the binarized Open Image Preferences prompts
ds = load_dataset("data-is-better-together/open-image-preferences-v1-binarized", split="train")

# preview the first 10 prompts
for example in ds.select(range(10)):
    print(example["prompt"])
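For a quick evaluation run it can help to work with a small set of distinct prompts. The sketch below collects them from any iterable of rows that carry a `prompt` field, which is the column printed above. The helper `collect_unique_prompts` is hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def collect_unique_prompts(rows: Iterable[Mapping[str, Any]], limit: int) -> List[str]:
    """Return up to `limit` distinct, non-empty prompts in first-seen order."""
    seen = set()
    prompts: List[str] = []
    for row in rows:
        if len(prompts) >= limit:
            break
        prompt = str(row.get("prompt") or "").strip()
        if not prompt or prompt in seen:
            continue
        seen.add(prompt)
        prompts.append(prompt)
    return prompts


# Small in-memory sample standing in for dataset rows.
sample_rows = [
    {"prompt": "a red fox in the snow"},
    {"prompt": "a red fox in the snow"},
    {"prompt": ""},
    {"prompt": "a lighthouse at dusk"},
]
print(collect_unique_prompts(sample_rows, limit=10))
```

Because iterating over the loaded dataset yields one dictionary per row, `collect_unique_prompts(ds, limit=10)` should work on `ds` as well.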


Evaluate the model

Now, we can evaluate the model using the Pruna Evaluation Agent.

In [11]:
from huggingface_hub import snapshot_download
from pruna import PrunaModel
from pruna.data.pruna_datamodule import PrunaDataModule
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task

# Step 1: Download the smashed FLUX repository to a local directory.
# A repository written by save_to_hub is not a plain diffusers checkpoint:
# loading it with DiffusionPipeline.from_pretrained raises a ValueError
# about a missing 'transformer' component.
save_path = snapshot_download(
    repo_id="AINovice2005/smashed-FLUX.1-schnell-pruna", local_dir="saved_model"
)

# Step 2: Load the local copy using PrunaModel.from_pretrained
smashed_model = PrunaModel.from_pretrained(model_path=save_path)

# Step 3: Set up evaluation task
metrics = ["clip_score", "psnr"]
datamodule = PrunaDataModule.from_string("data-is-better-together/open-image-preferences-v1-binarized")
task = Task(metrics, datamodule=datamodule, device="cuda")  # or "cpu"

# Step 4: Run evaluation
eval_agent = EvaluationAgent(task)
results = eval_agent.evaluate(smashed_model)

# Step 5: Print results
print(results.results)
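Of the two metrics requested, `clip_score` compares an image with its prompt, while `psnr` is reference-based: it needs a second image to compare against, for example the base model's output for the same prompt. How Pruna pairs the images is not shown here. The sketch below only spells out the standard PSNR formula, 10 · log10(MAX² / MSE), on flat pixel sequences, with a hypothetical helper named `psnr`.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value**2 / mse)


reference_pixels = [10, 20, 30, 40]
candidate_pixels = [11, 21, 31, 41]  # every pixel is off by one
print(f"PSNR: {psnr(reference_pixels, candidate_pixels):.2f} dB")
```

Higher values mean the candidate is closer to the reference, and identical inputs give infinity.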

Gradio Demo

We can also serve the smashed model through a Gradio demo.

In [None]:
import gradio as gr
from huggingface_hub import snapshot_download
from pruna import PrunaModel

# Download the smashed repository and load it with Pruna. Loading it with
# DiffusionPipeline.from_pretrained raises a ValueError about a missing
# 'transformer' component.
# Assumption: Pruna restores the pipeline on the device it was smashed on
# (CUDA here), so no explicit device move is made.
model_path = snapshot_download(repo_id="AINovice2005/smashed-FLUX.1-schnell-pruna")
pipe = PrunaModel.from_pretrained(model_path=model_path)

# Inference function
def generate_image(prompt):
    result = pipe(prompt, num_inference_steps=25, guidance_scale=7.5)
    return result.images[0]

# Create Gradio interface
demo = gr.Interface(
    fn=generate_image,
    inputs=gr.Textbox(lines=2, placeholder="Enter your prompt here...", label="Prompt"),
    outputs=gr.Image(type="pil"),
    title="FLUX Smashed Text-to-Image",
    description="Generate high-quality images using a smashed FLUX model optimized with Pruna."
)

# Launch the app
if __name__ == "__main__":
    demo.launch()
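Because loading the model is the fragile step, it can be useful to test the inference function without a GPU or model weights. The sketch below builds the Gradio callback around any callable pipeline, so a stand-in can be swapped in for a dry run. `make_generate_fn` and `fake_pipe` are hypothetical names, and the default step count and guidance scale simply mirror the values used in the demo cell.

```python
from types import SimpleNamespace
from typing import Any, Callable


def make_generate_fn(
    pipe: Callable[..., Any],
    num_inference_steps: int = 25,
    guidance_scale: float = 7.5,
) -> Callable[[str], Any]:
    """Build a Gradio-compatible function around any callable pipeline."""

    def generate_image(prompt: str) -> Any:
        prompt = prompt.strip()
        if not prompt:
            raise ValueError("Prompt must not be empty.")
        result = pipe(
            prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
        )
        return result.images[0]

    return generate_image


# Stand-in pipeline for a dry run; replace it with the loaded model in the demo.
def fake_pipe(prompt: str, **kwargs: Any) -> SimpleNamespace:
    return SimpleNamespace(images=[f"image for: {prompt}"])


generate_image = make_generate_fn(fake_pipe)
print(generate_image("  a lighthouse at dusk  "))
```

In the demo, `make_generate_fn(pipe)` would take the place of the hand-written `generate_image`, with `pipe` being the loaded smashed model.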