# Hugging Face

## Overview
**Hugging Face** is a platform and company that provides tools, libraries, and a hub for building, sharing, and using **machine learning (ML) and artificial intelligence (AI) models**. It is especially well known for its work in **natural language processing (NLP)**, but it also supports **computer vision** and **audio** tasks.

---

## What Hugging Face Provides

### 1. Model Hub
- A large online repository of **pretrained AI models**
- Models for:
  - Text generation
  - Translation
  - Question answering
  - Image classification
  - Speech recognition
- Many models are **open-source and free to use**

---

### 2. Transformers Library
- A popular Python library called `transformers`
- Allows easy use of advanced models like:
  - BERT
  - GPT-style models
  - T5
  - RoBERTa
- Works with frameworks like **PyTorch**, **TensorFlow**, and **JAX**
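
As a rough illustration of how little code the library needs, the sketch below wraps a `transformers` pipeline in a small function. The function name `classify_sentiment`, the optional `classifier` argument, and the choice of the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint are assumptions made for this example, not something the notebook prescribes.

```python
def classify_sentiment(texts, classifier=None):
    """Return (text, label, score) tuples for each input text."""
    if classifier is None:
        # Imported lazily so the function can also be called with a stand-in classifier
        from transformers import pipeline

        classifier = pipeline(
            "sentiment-analysis",
            # Assumed checkpoint for this sketch; any sentiment model from the Hub would do
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )
    results = classifier(texts)
    return [(text, r["label"], round(r["score"], 3)) for text, r in zip(texts, results)]
```

Calling `classify_sentiment(["I love this library"])` downloads the model on first use, so it needs network access and the `transformers` package installed.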

---

### 3. Datasets Library
- Provides ready-to-use **datasets** for ML tasks
- Supports:
  - Text datasets
  - Image datasets
  - Audio datasets
- Helps with efficient loading and preprocessing

---

### 4. Spaces
- A feature for hosting **interactive AI demos**
- Often built using:
  - Gradio
  - Streamlit
- Lets users test models directly in the browser

---

## Common Use Cases
- Chatbots and virtual assistants
- Text summarization
- Language translation
- Sentiment analysis
- Image and speech processing

---

## Why Hugging Face Is Important
- Makes AI **accessible to beginners**
- Encourages **open-source collaboration**
- Reduces time needed to build AI applications
- Widely used in **research, education, and industry**

---

## Simple Analogy
Hugging Face is like **GitHub for AI**, where:
- Models replace code repositories
- Datasets replace sample data
- The community shares and improves AI tools


# Connecting to Hugging Face

You need to log in to the Hugging Face Hub if you have not done so before.

1. If you don't already have one, create a **free** Hugging Face account at https://huggingface.co and navigate to Settings from the user menu at the top right. Then create a new API token with write permissions.  

**IMPORTANT:** When you create your Hugging Face API token, select WRITE permissions by clicking the WRITE tab (not "fine-grained"); otherwise you may run into problems later.

2. Back here in Colab, press the "key" icon on the side panel to the left and add a new secret:  
  In the name field put `HF_TOKEN`  
  In the value field put your actual token: `hf_...`  
  Ensure the notebook access switch is turned ON.

3. Execute the cells below to log in. You'll need to do this in each of your Colab notebooks. Colab secrets are a convenient way to manage credentials without typing them into the notebook.

In [1]:
# Install the dependency packages if they are not already installed

# !pip install -q transformers datasets diffusers
# !pip install nvidia-cufile


In [1]:
import os
from huggingface_hub import login

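# Read the token from an environment variable; the name must match wherever you stored it.
# In Colab, secrets are read with: from google.colab import userdata; userdata.get("HF_TOKEN")
hf_token = os.getenv("HUGGING_FACE_WRITE_TOKEN")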

In [2]:
login(hf_token)
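
The steps above store the token as a Colab secret named `HF_TOKEN`, while the cell reads an environment variable with a different name. One way to cover both cases is a small lookup helper like the sketch below. The helper name `resolve_hf_token` and the lookup order are assumptions for illustration; `google.colab.userdata` is only importable inside Colab, so the code falls back to environment variables elsewhere.

```python
import os


def resolve_hf_token(names=("HF_TOKEN", "HUGGING_FACE_WRITE_TOKEN")):
    """Return the first token found in Colab secrets or environment variables, else None."""
    # 1. Colab secrets (the import fails outside Google Colab)
    try:
        from google.colab import userdata
    except ImportError:
        userdata = None
    if userdata is not None:
        for name in names:
            try:
                token = userdata.get(name)
            except Exception:
                # Secret missing or notebook access not granted
                token = None
            if token:
                return token
    # 2. Plain environment variables
    for name in names:
        token = os.environ.get(name)
        if token:
            return token
    return None
```

The result can be passed straight to `login(...)`. If it is `None`, no token was found under either name and the secret or variable should be checked first.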

In [4]:
from IPython.display import display
from diffusers import AutoPipelineForText2Image
import torch

[2025-12-20 15:19:30,680] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)


/usr/bin/ld: cannot find -lcufile: No such file or directory
collect2: error: ld returned 1 exit status


In [None]:
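# Load SDXL-Turbo in half precision (fp16 weights)
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
# Uncomment to move the pipeline to the GPU; float16 inference is impractical on CPU
# pipe.to("cuda")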

## Inference Steps in Text-to-Image Generation

In text-to-image generation, the number of inference steps is the number of denoising iterations the model runs to produce the final image. At each step, the model removes some of the noise from the current image, starting from pure random noise and moving closer to what the prompt describes.

## How Inference Works in Steps

1. **Initial Noise:** The process begins with a random noise pattern that looks like static on a TV.

2. **Step-by-Step Refinement:**

   * In the first few steps, rough shapes and patterns emerge. The image may still be hard to interpret.

   * As more steps are completed, the model refines and smooths those shapes, adding detail and structure that align with the prompt (e.g., turning the noise into a recognizable class of data scientists).

3. **End of Process:** By the final steps, the image has been transformed from noisy static into a detailed scene based on the given prompt.


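* **More inference steps** → generally better image quality, with diminishing returns (but slower).

* **Fewer inference steps** → quicker generation (but usually lower quality). Distilled models such as SDXL-Turbo are an exception: they are designed to produce good images in 1 to 4 steps.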
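
The trade-off can be seen in a deliberately simplified toy loop. This is not a real diffusion sampler: the "model" here already knows the target and simply closes a fixed fraction of the remaining gap at every step. The names `toy_denoise`, `strength`, and `mean_abs_error` are invented for this sketch.

```python
import random


def toy_denoise(target, num_inference_steps, strength=0.2, seed=0):
    """Start from random noise and move a fixed fraction toward the target each step."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # initial noise, like TV static
    for _ in range(num_inference_steps):
        # Each step closes `strength` of the remaining gap to the target
        image = [px + strength * (t - px) for px, t in zip(image, target)]
    return image


def mean_abs_error(image, target):
    """Average absolute difference between the current image and the target."""
    return sum(abs(px - t) for px, t in zip(image, target)) / len(target)


target = [0.0, 0.25, 0.5, 0.75, 1.0]  # stand-in for "what the prompt describes"
for steps in (1, 5, 30):
    err = mean_abs_error(toy_denoise(target, steps), target)
    print(f"{steps:>2} steps -> mean error {err:.4f}")
```

The printed error shrinks as the step count grows, while the loop does proportionally more work. That mirrors the quality versus speed trade-off described above.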

In [None]:
prompt = "A class of data scientists learning AI engineering in a vibrant high-energy pop-art style"

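# SDXL-Turbo is distilled for 1 to 4 steps and is meant to run without
# classifier-free guidance (guidance_scale=0.0); try other step counts to compare
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0.0).images[0]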

display(image)

In [None]:
from diffusers import DiffusionPipeline
import torch

# Load the SDXL base model in half precision and move it to the GPU
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
base.to("cuda")

# The refiner reuses the base model's second text encoder and VAE to save memory
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

# Total number of steps and the fraction of them run by the base model (80% base, 20% refiner)
n_steps = 40
high_noise_frac = 0.8

prompt = "A class of data scientists learning AI engineering in a vibrant high-energy pop-art style"

# Run both experts: the base model outputs latents, which the refiner finishes denoising
image = base(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_end=high_noise_frac,
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_start=high_noise_frac,
    image=image,
).images[0]

display(image)
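
To make the 80/20 split concrete, the arithmetic can be sketched as below. The helper name `split_steps` is invented for illustration, and the result is only approximate: the actual hand-over point inside `diffusers` depends on the scheduler's timestep cutoff.

```python
def split_steps(n_steps, high_noise_frac):
    """Approximate how many denoising steps the base model and the refiner each run."""
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be between 0 and 1")
    base_steps = round(n_steps * high_noise_frac)  # high-noise part handled by the base model
    refiner_steps = n_steps - base_steps           # remaining low-noise part for the refiner
    return base_steps, refiner_steps


base_steps, refiner_steps = split_steps(40, 0.8)
print(f"base: ~{base_steps} steps, refiner: ~{refiner_steps} steps")
```

With the values used in the cell above (`n_steps = 40`, `high_noise_frac = 0.8`), the base model handles roughly 32 steps and the refiner the remaining 8.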

In [5]:
import IPython
# Restart the kernel; this releases the GPU memory held by the pipelines loaded above
IPython.Application.instance().kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [4]:
from transformers import pipeline
from datasets import load_dataset
# import soundfile as sf
import torch
from IPython.display import Audio

# Text-to-speech pipeline running SpeechT5 on the GPU
synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts", device='cuda')
# Speaker embeddings (x-vectors) determine the voice; index 7306 selects one speaker
embeddings_dataset = load_dataset("matthijs/cmu-arctic-xvectors", split="validation", trust_remote_code=True)
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
speech = synthesiser("Hi to an artificial intelligence engineer, on the way to mastery!", forward_params={"speaker_embeddings": speaker_embedding})

Audio(speech["audio"], rate=speech["sampling_rate"])
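
If the generated speech should be kept as a file rather than only played inline, one option is to write it as a 16-bit WAV with the standard library. The sketch below assumes the audio is a one-dimensional sequence of floats in the range -1 to 1. The helper name `save_wav` and the demo tone are made up for this example, which uses a synthetic sine wave instead of real model output.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file; return the frame count."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against values outside [-1, 1]
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)                  # mono
        wav_file.setsampwidth(2)                  # 2 bytes = 16 bits per sample
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


# Stand-in signal: 0.1 s of a 440 Hz tone at 16 kHz
demo_path = os.path.join(tempfile.gettempdir(), "tone_demo.wav")
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
save_wav(demo_path, tone, 16000)
```

Under the same assumption about the array shape, the output of the cell above could be saved with `save_wav("speech.wav", speech["audio"], speech["sampling_rate"])`.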


In [None]:
import torch
from diffusers import FluxPipeline
from IPython.display import display
from datetime import datetime

start = datetime.now()

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(0)
prompt = "A class of data scientists learning AI engineering in a vibrant high-energy pop-art style"

# Generate the image using the GPU
image = pipe(
    prompt,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=generator
).images[0]

display(image)

stop = datetime.now()


In [None]:
# Cost estimate for Colab

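seconds = (stop - start).total_seconds()
units_per_hour = 5.37  # compute units consumed per hour by the runtime used here
estimated_units = (units_per_hour / 3600) * seconds
estimated_cost = estimated_units * (9.99 / 100)  # assumes $9.99 buys 100 compute units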
print(f"This took {seconds:.1f} seconds and cost an estimated ${estimated_cost:.3f}")

# But there's a catch - you pay for all the time the kernel is active, not just while it's actually calculating!
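
To see how much the catch matters, the same arithmetic can be wrapped in a function and applied to the whole time the kernel is active. The function name `colab_cost_usd` and the 30-minute session are hypothetical; the rate and price are the ones used in the cell above.

```python
def colab_cost_usd(active_seconds, units_per_hour=5.37, usd_per_100_units=9.99):
    """Estimate the cost of keeping a Colab runtime active for the given number of seconds."""
    units = units_per_hour / 3600 * active_seconds
    return units * usd_per_100_units / 100


generation_only = colab_cost_usd(10)        # e.g. 10 seconds of actual image generation
whole_session = colab_cost_usd(30 * 60)     # hypothetical 30 minutes with the kernel active
print(f"generation only: ${generation_only:.4f}, whole session: ${whole_session:.4f}")
```

Since the estimate is linear in time, a session that stays connected for 30 minutes costs 180 times as much as the 10 seconds of generation inside it.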

---

## Conclusion
Hugging Face plays a key role in modern AI development by providing easy access to powerful models, datasets, and tools that help developers and researchers build intelligent applications efficiently.
