# Practical Generative AI Course for Beginners

**Real-life Examples with Hugging Face and Python**

## Course Overview

This course is designed for beginners who want to understand and implement generative AI models using Python and the Hugging Face ecosystem. By the end of this course, you'll be able to build and deploy various generative AI applications.

## Prerequisites

- Basic Python programming knowledge
- Familiarity with pip and virtual environments
- Understanding of basic machine learning concepts (helpful but not required)

### Module 1: Introduction to Generative AI

#### Learning Objectives:

- Understand what generative AI is and how it differs from discriminative AI
- Learn about key generative AI applications and use cases
- Set up your Python environment with necessary libraries

#### Practical Example: Text Generation with GPT-2


In [8]:
# Install required libraries
#!pip install transformers torch


In [4]:
# Install required libraries
# pip install transformers torch

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Generate text
prompt = "Artificial intelligence will"
inputs = tokenizer(prompt, return_tensors="pt")
output_sequences = model.generate(
    inputs["input_ids"],
    max_length=50,
    num_return_sequences=1,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print(generated_text)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Artificial intelligence will be able to do things like search for and find people, and to do things like find out who's in the right place at the right time.

"We're going to be able to do things like that, and


### Module 2: Working with Transformers and Hugging Face

#### Learning Objectives:

- Understand transformer architecture fundamentals
- Navigate the Hugging Face Hub and ecosystem
- Learn to fine-tune pre-trained models for specific tasks

#### Practical Example: Sentiment Analysis with DistilBERT


In [7]:
# pip install transformers datasets torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load pre-trained model for sentiment analysis
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Create a pipeline
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Analyze sentiments
texts = [
    "I absolutely loved this product! It exceeded all my expectations.",
    "The service was terrible and the staff was rude.",
    "It was okay, not great but not bad either."
]

for text in texts:
    result = sentiment_analyzer(text)
    print(f"Text: {text}")
    print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Device set to use cpu


Text: I absolutely loved this product! It exceeded all my expectations.
Sentiment: POSITIVE, Score: 0.9999

Text: The service was terrible and the staff was rude.
Sentiment: NEGATIVE, Score: 0.9997

Text: It was okay, not great but not bad either.
Sentiment: POSITIVE, Score: 0.9977



### Module 3: Image Generation with Diffusion Models

#### Learning Objectives:

- Understand diffusion models and their applications
- Learn to use Stable Diffusion via Hugging Face
- Implement text-to-image generation

#### Practical Example: Text-to-Image Generation with Stable Diffusion


In [24]:
## !pip install diffusers transformers accelerate torch

In [None]:

# pip install diffusers transformers accelerate torch

import torch
from diffusers import StableDiffusionPipeline

# Replace with your actual Hugging Face token
hf_token = "YOUR_HUGGINGFACE_TOKEN"

# Load the model (requires authentication for some models)
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_auth_token=hf_token
)
pipe = pipe.to("cuda")  # Use "cpu" if you don't have a GPU

# Generate an image from a text prompt
prompt = "A serene landscape with mountains and a lake at sunset, digital art"
image = pipe(prompt).images[0]

# Save the generated image
image.save("generated_landscape.png")



In [None]:
# Display image if in notebook
from IPython.display import display
display(image)

### Module 4: Fine-tuning Language Models

#### Learning Objectives:

- Understand the concept of fine-tuning pre-trained models
- Learn techniques for efficient fine-tuning (LoRA, adapters)
- Fine-tune a language model on custom data

#### Practical Example: Fine-tuning GPT-2 for Custom Text Generation


### Explanatory Notes for GPT-2 Fine-tuning Code

#### Overall Structure and Purpose

This code demonstrates how to fine-tune a pre-trained GPT-2 language model on a custom dataset for improved text generation. It's organized into three main functions:

- **`fine_tune_text_generator()`:** Handles the model fine-tuning process
- **`generate_text()`:** Uses the fine-tuned model to generate new text
- **`main()`:** Orchestrates the workflow

#### Key Components Explained

- **Model Setup**
  - The code loads a pre-trained GPT-2 model and tokenizer from Hugging Face
  - It checks for GPU availability and moves the model to the appropriate device
  - Since GPT-2 doesn't have a built-in padding token, it uses the end-of-sequence token as padding

- **Dataset Preparation**
  - Loads the WikiText dataset (specifically `wikitext-2-raw-v1`)
  - Tokenizes the text data with appropriate parameters:
    - Truncates texts to 512 tokens maximum
    - Adds padding
    - Includes attention masks
  - For demonstration purposes, only 1000 examples are used from the training set

- **Training Configuration**
  - Uses `DataCollatorForLanguageModeling` for causal (next token prediction) language modeling
  - Sets up training arguments including:
    - Number of epochs (default 3)
    - Batch size (default 8)
    - Logging and checkpoint saving frequencies
  - The `Trainer` class handles the training loop, optimization, and logging

- **Text Generation**
  - The `generate_text()` function creates new text from prompts using the fine-tuned model
  - Generation parameters control the output:
    - `temperature:` Controls randomness (0.8 provides a good balance)
    - `top_k` and `top_p:` Limit token selection to improve quality
    - `no_repeat_ngram_size:` Prevents exact repetition of phrases
    - `num_return_sequences:` Creates multiple outputs from the same prompt

- **Workflow in `main()`**
  - Fine-tunes the model (with an option to load a previously fine-tuned model)
  - Demonstrates generation with three different prompts
  - Outputs multiple samples for each prompt

#### Implementation Notes

- The code is flexible, allowing customization of dataset, model size, and training parameters
- It demonstrates proper handling of devices (CPU/GPU)
- For efficiency, it includes options to skip re-training if you've already fine-tuned a model
- The generation showcases how to control text generation parameters for different creative outputs

This is a complete pipeline for fine-tuning and using GPT-2 for text generation, following best practices from the Hugging Face ecosystem.


In [None]:
#   use GOOGLE COLAB GPU 
import os
import torch
from datasets import load_dataset
from transformers import (
    GPT2Tokenizer, 
    GPT2LMHeadModel,
    Trainer, 
    TrainingArguments,
    DataCollatorForLanguageModeling
)

def fine_tune_text_generator(
    dataset_name="wikitext",
    dataset_config="wikitext-2-raw-v1",
    model_name="gpt2",
    output_dir="./fine-tuned-gpt2",
    num_epochs=3,
    batch_size=8
):
    """
    Fine-tune a GPT-2 model on a text dataset for improved text generation.
    
    Args:
        dataset_name: Name of the dataset to use from Hugging Face datasets
        dataset_config: Specific configuration of the dataset
        model_name: Pre-trained model to fine-tune
        output_dir: Directory to save the fine-tuned model
        num_epochs: Number of training epochs
        batch_size: Batch size for training
    """
    # Check if GPU is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    
    print(f"Loading {model_name} model and tokenizer...")
    # Load pre-trained model and tokenizer
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    
    # Move model to the appropriate device
    model = model.to(device)
    
    # Add padding token (needed for batched training)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = model.config.eos_token_id
    
    print(f"Loading {dataset_name} dataset...")
    # Load and prepare dataset
    dataset = load_dataset(dataset_name, dataset_config)
    
    # Tokenization function
    def tokenize_function(examples):
        return tokenizer(
            examples["text"],
            truncation=True,
            max_length=512,
            padding="max_length",
            return_attention_mask=True  # Explicitly request attention mask
        )
    
    print("Tokenizing dataset...")
    # Apply tokenization to dataset
    tokenized_dataset = dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=["text"]
    )
    
    # Use only a small subset for this example
    train_dataset = tokenized_dataset["train"].select(range(1000))
    
    # Data collator for language modeling
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False  # We're doing causal language modeling, not masked
    )
    
    # Set up training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        overwrite_output_dir=True,
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        save_steps=1000,
        save_total_limit=2,
        logging_steps=100,
        logging_dir=os.path.join(output_dir, "logs"),
    )
    
    # Set up trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=data_collator,
    )
    
    print("Starting training...")
    # Train model
    trainer.train()
    
    print(f"Saving model to {output_dir}...")
    # Save model
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    
    return model, tokenizer

def generate_text(
    model,
    tokenizer,
    prompt="Once upon a time",
    max_length=100,
    num_samples=3,
    temperature=0.8
):
    """
    Generate text using a fine-tuned GPT-2 model.
    
    Args:
        model: Fine-tuned model
        tokenizer: Tokenizer for the model
        prompt: Starting text for generation
        max_length: Maximum length of generated text
        num_samples: Number of different samples to generate
        temperature: Controls randomness (lower = more deterministic)
    
    Returns:
        List of generated text samples
    """
    # Determine device
    device = model.device
    
    # Encode prompt with attention mask
    inputs = tokenizer(
        prompt, 
        return_tensors="pt",
        padding=True,
        return_attention_mask=True
    )
    
    # Move inputs to the same device as model
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Generate text
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_return_sequences=num_samples,
        temperature=temperature,
        top_k=50,
        top_p=0.95,
        no_repeat_ngram_size=2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )
    
    # Decode and return generated texts
    generated_texts = []
    for output in outputs:
        generated_text = tokenizer.decode(output, skip_special_tokens=True)
        generated_texts.append(generated_text)
    
    return generated_texts

def main():
    # Check if GPU is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    
    # Fine-tune model (comment out to skip fine-tuning if already done)
    model, tokenizer = fine_tune_text_generator(num_epochs=1)  # Reduced epochs for demo
    
    # Or load an already fine-tuned model
    # model_path = "./fine-tuned-gpt2"
    # tokenizer = GPT2Tokenizer.from_pretrained(model_path)
    # model = GPT2LMHeadModel.from_pretrained(model_path)
    # model = model.to(device)  # Make sure to move loaded model to the correct device
    
    # Generate text with different prompts
    prompts = [
        "The future of artificial intelligence",
        "Climate change poses",
        "In the world of quantum computing"
    ]
    
    for prompt in prompts:
        print(f"\n\nPrompt: {prompt}")
        print("-" * 50)
        
        generated_texts = generate_text(model, tokenizer, prompt=prompt)
        
        for i, text in enumerate(generated_texts, 1):
            print(f"Sample {i}:")
            print(text)
            print("-" * 50)

if __name__ == "__main__":
    main()

### Module 5: Building Conversational AI with Hugging Face

#### Learning Objectives:

- Understand dialog systems architecture
- Implement a simple chatbot using pre-trained models
- Learn about retrieval-augmented generation (RAG)

#### Practical Example: Simple Chatbot with Blenderbot


In [None]:
# pip install transformers torch

from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

# Load pre-trained model and tokenizer
model_name = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(model_name)
model = BlenderbotForConditionalGeneration.from_pretrained(model_name)

# Function to generate responses
def generate_response(input_text):
    # Tokenize input text
    inputs = tokenizer(input_text, return_tensors="pt")
    
    # Generate response
    response_ids = model.generate(
        inputs["input_ids"],
        max_length=100,
        num_beams=4,
        early_stopping=True
    )
    
    # Decode response
    response = tokenizer.decode(response_ids[0], skip_special_tokens=True)
    return response

# Simple conversation loop
def chat():
    print("Bot: Hello! I'm a chatbot. Type 'exit' to end our conversation.")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            print("Bot: Goodbye!")
            break
        response = generate_response(user_input)
        print(f"Bot: {response}")

# Run the chatbot
chat()

In [29]:
# pip install transformers pillow torch

from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests

# Load model and processor
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Function to generate captions for images
def generate_caption(image_path_or_url):
    # Load image
    if image_path_or_url.startswith('http'):
        image = Image.open(requests.get(image_path_or_url, stream=True).raw)
    else:
        image = Image.open(image_path_or_url)
    
    # Process image and generate caption
    inputs = processor(image, return_tensors="pt")
    output = model.generate(**inputs, max_length=30)
    caption = processor.decode(output[0], skip_special_tokens=True)
    
    return caption

# Example usage
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
caption = generate_caption(image_url)
print(f"Generated caption: {caption}")

preprocessor_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


tokenizer_config.json:   0%|          | 0.00/506 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/4.56k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

Generated caption: two cats sleeping on a couch


### Module 7: Deploying Generative AI Models

#### Learning Objectives:

- Learn strategies for optimizing and deploying generative AI models


In [None]:
import streamlit as st
from transformers import pipeline
import torch
import time

# Page configuration
st.set_page_config(
    page_title="AI Text Generator",
    page_icon="âœ¨",
    layout="wide"
)

# Custom CSS for better styling
st.markdown("""
<style>
    .main {
        padding: 1rem 1rem;
    }
    .title {
        font-size: 2.5rem;
        font-weight: bold;
        margin-bottom: 1rem;
    }
    .subtitle {
        font-size: 1.2rem;
        margin-bottom: 2rem;
    }
    .stTextInput>div>div>input {
        padding: 0.5rem;
        font-size: 1rem;
    }
    .output-container {
        background-color: #f0f2f6;
        border-radius: 0.5rem;
        padding: 1.5rem;
        margin-top: 1rem;
    }
    .params-section {
        background-color: #f8f9fa;
        border-radius: 0.5rem;
        padding: 1rem;
        margin-bottom: 1rem;
    }
</style>
""", unsafe_allow_html=True)

@st.cache_resource
def load_model(model_name="gpt2"):
    """Load the model and tokenizer (cached to avoid reloading)"""
    try:
        device = 0 if torch.cuda.is_available() else -1
        return pipeline('text-generation', model=model_name, device=device)
    except Exception as e:
        st.error(f"Error loading model: {str(e)}")
        return None

def main():
    # Header
    st.markdown('<p class="title">AI Text Generator</p>', unsafe_allow_html=True)
    st.markdown('<p class="subtitle">Generate creative text using the power of GPT-2!</p>', unsafe_allow_html=True)
    
    # Sidebar for model selection
    with st.sidebar:
        st.header("Model Settings")
        model_option = st.selectbox(
            "Select model",
            options=["gpt2", "gpt2-medium", "gpt2-large", "distilgpt2"],
            index=0,
            help="Larger models provide better quality but take longer to generate"
        )
        
        st.subheader("About")
        st.markdown("""
        This app uses Hugging Face's transformers library to generate text with GPT-2 models.
        
        - **gpt2**: 124M parameters (fastest)
        - **distilgpt2**: 82M parameters (smaller, slightly lower quality)
        - **gpt2-medium**: 355M parameters (better quality, slower)
        - **gpt2-large**: 774M parameters (best quality, slowest)
        """)
    
    # Load the model
    with st.spinner("Loading the model... This might take a moment."):
        generator = load_model(model_option)
    
    if not generator:
        st.warning("Failed to load the model. Please try again.")
        return
        
    # Input section
    st.header("Enter your prompt")
    prompt = st.text_area(
        "Type a starting prompt for the AI to continue",
        value="Once upon a time in a land far away,",
        height=100,
        max_chars=500,
        help="The AI will continue from this text"
    )
    
    # Parameters section
    with st.expander("Advanced Parameters", expanded=False):
        col1, col2 = st.columns(2)
        
        with col1:
            max_length = st.slider(
                "Maximum length",
                min_value=10,
                max_value=512,
                value=150,
                step=10,
                help="Maximum number of tokens to generate"
            )
            
            num_sequences = st.slider(
                "Number of sequences",
                min_value=1,
                max_value=5,
                value=1,
                help="Number of different completions to generate"
            )
        
        with col2:
            temperature = st.slider(
                "Temperature",
                min_value=0.1,
                max_value=1.5,
                value=0.7,
                step=0.1,
                help="Higher values increase randomness, lower values make output more deterministic"
            )
            
            top_p = st.slider(
                "Top-p (nucleus sampling)",
                min_value=0.1,
                max_value=1.0,
                value=0.9,
                step=0.05,
                help="Controls diversity via nucleus sampling"
            )
    
    # Generation button
    if st.button("Generate Text", type="primary"):
        with st.spinner("Generating text... Please wait."):
            try:
                # Track generation time
                start_time = time.time()
                
                # Generate text
                results = generator(
                    prompt,
                    max_length=max_length,
                    num_return_sequences=num_sequences,
                    temperature=temperature,
                    top_p=top_p,
                    do_sample=True,
                    no_repeat_ngram_size=2
                )
                
                # Calculate generation time
                generation_time = time.time() - start_time
                
                # Display results
                st.success(f"Text generated successfully in {generation_time:.2f} seconds!")
                
                for i, result in enumerate(results):
                    generated_text = result['generated_text']
                    
                    # Split into prompt and new text for highlighting
                    new_text = generated_text[len(prompt):]
                    
                    with st.container():
                        st.markdown("### Result " + ("" if num_sequences == 1 else f"{i+1}"))
                        st.markdown('<div class="output-container">', unsafe_allow_html=True)
                        
                        # Display with the prompt and generated text in different styles
                        st.markdown(f"**Prompt:** {prompt}", unsafe_allow_html=True)
                        st.markdown(f"**Generated:** {new_text}", unsafe_allow_html=True)
                        
                        # Add copy button
                        st.text_area(
                            "Full generated text (copy from here)",
                            value=generated_text,
                            height=100,
                            label_visibility="collapsed"
                        )
                        st.markdown('</div>', unsafe_allow_html=True)
                        
            except Exception as e:
                st.error(f"An error occurred during text generation: {str(e)}")
    
    # Information section at the bottom
    st.markdown("---")
    st.markdown("""
    ### Tips for better results:
    
    - Provide a detailed and clear prompt
    - Adjust temperature to control randomness (higher = more creative but potentially less coherent)
    - Experiment with different models for different quality levels
    - For coherent stories, use a lower temperature (0.3-0.7)
    - For creative ideas or brainstorming, use a higher temperature (0.7-1.0)
    """)

if __name__ == "__main__":
    main()