# Hugging Face Diffusers - Tutorial

## 1. What is Diffusers?

- A library for using pretrained diffusion models to generate images, audio, and video
- Supports both **inference** and **fine-tuning**


## 2. Installation 


**Tested on Python 3.8+, PyTorch 1.4+, and Flax 0.4.1+.**


There are two options:


### (a) Google Colab

```
# Install Diffusers from source (latest development version)
pip install accelerate
pip install git+https://github.com/huggingface/diffusers
```

### (b) Local install (in a virtual environment)

```
# Remove the old environment if it exists
conda remove --name xDiffusers --all

# Create a new environment with Python 3.11
conda create --name xDiffusers python=3.11

# Activate it
conda activate xDiffusers

# PyTorch with CUDA (if you have an NVIDIA GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Core Hugging Face + Diffusers stack
pip install diffusers transformers accelerate safetensors

# IMPORTANT: make sure diffusers >= 0.35 is installed
pip uninstall -y diffusers
pip install -U "diffusers>=0.35.0"

# FLUX.1
pip install --upgrade diffusers

# DeepFloyd IF
pip install sentencepiece protobuf
pip install git+https://github.com/huggingface/transformers

# ControlNet (optional but very useful)
pip install opencv-python

# Utilities
pip install pillow matplotlib
pip install jupyter notebook tqdm

# Jupyter kernel for this environment
conda install jupyter
conda install ipykernel
python -m ipykernel install --user --name xDiffusers --display-name "Python (xDiffusers)"

```
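After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `check_package` are hypothetical, the version parsing is deliberately simplistic, and the only minimum enforced is the `diffusers>=0.35.0` requirement from the install steps above.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.35.1' or '2.3.0+cu121' into a tuple of at least three ints."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at non-numeric pieces such as 'dev0'
            break
        parts.append(int(digits))
    return tuple(parts + [0] * (3 - len(parts)))


def check_package(name, minimum=None):
    """Return a one-line status string for an installed (or missing) package."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    if minimum is None or parse_version(installed) >= parse_version(minimum):
        return f"{name} {installed}: OK"
    return f"{name} {installed}: too old, need >= {minimum}"


# Only the diffusers minimum comes from the install steps; the rest are just reported.
for package, minimum in [
    ("diffusers", "0.35.0"),
    ("torch", None),
    ("transformers", None),
    ("accelerate", None),
    ("safetensors", None),
]:
    print(check_package(package, minimum))
```

Run it inside the activated `xDiffusers` environment (or a Colab cell); any line reporting "not installed" or "too old" points to the install step that needs repeating.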

## 3. Basic Usage: Pipelines


A **diffusion model** has several parts that work together to turn text or images into new outputs:

**Text Encoder** → converts your prompt into numerical embeddings.

**Scheduler** → controls how noise is removed from a random starting point, step by step.

**UNet / DiT** → the workhorse: the denoising network that predicts what noise to remove at each step and so shapes the image.

**VAE** → compresses images into a smaller “latent” space and decodes latents back into pixels.

👉 **A pipeline is the “all-in-one tool” that packages the text encoder, scheduler, UNet, and VAE into one class, so you can generate images with a single call instead of invoking each part yourself.**
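To make the chaining concrete, here is a toy sketch in plain Python. Every class in it (`ToyTextEncoder`, `ToyDenoiser`, `ToyScheduler`, `ToyVAE`, `ToyPipeline`) is a hypothetical stand-in invented for illustration, not part of Diffusers; the real components are neural networks working on tensors. Only the order of the calls is meant to mirror what a pipeline does.

```python
import random


class ToyTextEncoder:
    """Stand-in text encoder: turns a prompt into a list of numbers."""

    def __call__(self, prompt):
        return [len(word) / 10.0 for word in prompt.split()]


class ToyDenoiser:
    """Stand-in for the UNet / DiT: predicts the 'noise' still in the latent."""

    def __call__(self, latent, embedding):
        target = sum(embedding) / len(embedding)  # what the prompt 'asks for'
        return [x - target for x in latent]


class ToyScheduler:
    """Stand-in scheduler: decides how much predicted noise to remove per step."""

    def __init__(self, step_size=0.5):
        self.step_size = step_size

    def step(self, latent, noise_pred):
        return [x - self.step_size * n for x, n in zip(latent, noise_pred)]


class ToyVAE:
    """Stand-in VAE decoder: maps latent values to 0-255 'pixel' values."""

    def decode(self, latent):
        return [max(0, min(255, round(x * 255))) for x in latent]


class ToyPipeline:
    """Chains the four parts so the caller makes a single call."""

    def __init__(self):
        self.text_encoder = ToyTextEncoder()
        self.denoiser = ToyDenoiser()
        self.scheduler = ToyScheduler()
        self.vae = ToyVAE()

    def __call__(self, prompt, num_inference_steps=30, seed=0):
        rng = random.Random(seed)
        embedding = self.text_encoder(prompt)          # 1. encode the prompt
        latent = [rng.gauss(0, 1) for _ in range(4)]   # 2. start from random noise
        for _ in range(num_inference_steps):           # 3. denoise step by step
            noise_pred = self.denoiser(latent, embedding)
            latent = self.scheduler.step(latent, noise_pred)
        return self.vae.decode(latent)                 # 4. decode latent to pixels


toy_pipe = ToyPipeline()
image = toy_pipe("a cat", num_inference_steps=30)
print(image)
```

The caller only sees `toy_pipe(prompt, ...)`; the encode, denoise, and decode stages happen inside `__call__`, which is the convenience a real pipeline class provides.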

```
from diffusers import StableDiffusionXLPipeline
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # SDXL benefits from fp16 on GPU

# Load all pipeline components (text encoders, UNet, VAE, scheduler) in one call
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
    add_watermarker=False,   # optional: disable invisible watermark
)
pipe = pipe.to(device)

# (Optional) memory helpers on GPU
# if device == "cuda":
#     pipe.enable_attention_slicing()
#     pipe.enable_vae_slicing()

prompt = "black and white drawing of a Rwandan female fruitseller with a basket on her head on a street in Rwanda, with some fruits visible, clean lines, minimal shading"

# Things the model should steer away from
negative = "blurry, low quality, text, watermark, extra fingers, extra limbs"

# Run the full text-to-image pipeline and take the first generated image
image = pipe(
    prompt=prompt,
    negative_prompt=negative,
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]

image.save("sdxl_fruitseller_rw_01.png")
print("done")
```

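Once a pipeline such as `pipe` above is loaded, it can be useful to see which parts it actually bundles. The sketch below assumes the pipeline object exposes a `components` dictionary, as `DiffusionPipeline` objects do in recent Diffusers releases. The helper `describe_components` and the `Fake...` classes are hypothetical and exist only so the example runs without downloading a model.

```python
from types import SimpleNamespace


def describe_components(pipe):
    """Return {component_name: class_name} for a pipeline-like object.

    Assumes `pipe.components` is a dict mapping names to component objects,
    with unused optional components set to None.
    """
    return {
        name: type(component).__name__
        for name, component in pipe.components.items()
        if component is not None
    }


# Stand-in objects so the sketch runs without loading SDXL
class FakeUNet:
    pass


class FakeVAE:
    pass


fake_pipe = SimpleNamespace(
    components={"unet": FakeUNet(), "vae": FakeVAE(), "image_encoder": None}
)
summary = describe_components(fake_pipe)
print(summary)
```

With a real pipeline, calling `describe_components(pipe)` should list entries such as the text encoder, scheduler, UNet, and VAE described earlier in this section; the exact names depend on the pipeline class.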
