# **Introduction**
---

## Pretrained Models Benefits
- Great starting point for development
- Save time and effort
- Require less data for specific use cases

## Business Use Case Example
**Scenario**: Travel agency chat application

**Requirements**:
- Specific response format and style
- Company-specific tone of voice
- Marketing alignment with brand voice

## Fine-Tuning vs Training From Scratch

### Fine-Tuning Benefits
- **Less time** required
- **Fewer compute resources** needed
- **Less data** required for customization
- Builds on existing pretrained model capabilities

### When to Fine-Tune
- Need specific response format or style
- Want consistent tone of voice
- Require domain-specific behavior
- Have limited training data or resources

## Implementation
Fine-tune base models from Azure AI Foundry model catalog → integrate with chat applications

## Key Takeaway
Fine-tuning customizes pretrained language models to specific needs with less time, data, and compute resources compared to training from scratch, making it ideal for businesses needing specialized response formats and company-specific tone of voice.

# **Understand when to fine-tune a language model**
---

## Model Optimization Strategies

### Three Main Approaches
1. **Prompt Engineering**: Quick and easy way to improve model behavior
2. **Retrieval Augmented Generation (RAG)**: Ground responses in specific data
3. **Fine-Tuning**: Train base model on custom dataset

## Strategy Selection Framework

### Context Optimization (What the model needs to know)
**Use**: RAG and Prompt Engineering

**Goal**: Provide factual, grounded information

**Example**: Travel booking catalog with hotel information

### Model Optimization (How the model needs to act)
**Use**: Fine-tuning (with or without other strategies)

**Goal**: Consistent style, format, and tone

**Example**: Company-specific tone of voice

## Prompt Engineering

### Capabilities
- Change question formatting
- Update system messages
- Quick implementation

### Limitations
- May not lead to consistent results
- Model might ignore instructions
- Behavior can vary

### Few-Shot Learning
**Technique**: Provide examples of desired output
- **One-shot**: Single example
- **Few-shot**: Multiple examples

**Issue**: Model doesn't always follow specified format despite examples
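
A few-shot prompt can be sketched as a message list in which example question/answer pairs sit between the system message and the real question. This is a minimal illustration, assuming a chat-style message format with `role` and `content` keys; the function name `build_few_shot_messages` and the example texts are hypothetical.

```python
def build_few_shot_messages(system_message, examples, question):
    """Return a chat message list: system, then example pairs, then the real question."""
    messages = [{"role": "system", "content": system_message}]
    for user_text, assistant_text in examples:
        # Each example is shown to the model as a prior user/assistant exchange.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


# Hypothetical travel-agency examples, invented for illustration only.
examples = [
    ("What's a must-see in Paris?", "Oh la la! Start with the Eiffel Tower. What else interests you?"),
    ("Any tips for Tokyo?", "Konnichiwa! Try the Senso-ji temple first. What do you enjoy most?"),
]
messages = build_few_shot_messages(
    "You are a cheerful travel assistant.", examples, "What should I see in Rome?"
)
print(len(messages))  # 1 system + 2 * 2 example turns + 1 question = 6
```

With one pair in `examples` this is one-shot; with several it is few-shot. The examples only guide the model at request time, so the format is still not guaranteed.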

## When to Use RAG

### Best For
- Factual responses required
- Grounding in specific data sources
- Domain-specific knowledge bases
- Real-time or updated information

### Example Use Case
Customer questions about available hotels in travel catalog

## When to Fine-Tune

### Best For
- Specific style and format requirements
- Consistent tone of voice needed
- Behavioral consistency required
- Prompt engineering alone insufficient

### Key Advantage
**Maximizes consistency** of model behavior across all responses

## Combined Strategies

### Optimization Combinations
Can combine multiple approaches:
- RAG + Fine-tuning
- Prompt Engineering + RAG
- Prompt Engineering + Fine-tuning
- All three strategies together

### Decision Matrix

| Need | Primary Strategy | Secondary Strategy |
|------|-----------------|-------------------|
| Factual accuracy | RAG | Prompt Engineering |
| Consistent behavior | Fine-tuning | Prompt Engineering |
| Specific knowledge + style | Fine-tuning | RAG |
| Quick improvements | Prompt Engineering | None |

## Implementation Order

### Recommended Approach
1. **Start**: Prompt Engineering (fastest, easiest)
2. **Add context**: RAG (if factual grounding needed)
3. **Ensure consistency**: Fine-tuning (if behavioral consistency required)
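
The recommended order above can be read as a small decision procedure. The sketch below is only a restatement of that list in code; the function name and its two boolean inputs are hypothetical simplifications.

```python
def recommend_strategies(needs_grounding, needs_consistency):
    """Return optimization strategies in the order the notes suggest trying them."""
    strategies = ["prompt engineering"]  # always the starting point
    if needs_grounding:
        strategies.append("RAG")  # adds factual context
    if needs_consistency:
        strategies.append("fine-tuning")  # enforces style, format, tone
    return strategies


print(recommend_strategies(needs_grounding=True, needs_consistency=False))
print(recommend_strategies(needs_grounding=True, needs_consistency=True))
```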

## Key Takeaway
Fine-tuning is best used when you need consistent model behavior regarding style, format, and tone that prompt engineering alone cannot achieve. RAG addresses what the model knows (context), while fine-tuning addresses how the model acts (behavior). Strategies can be combined for optimal results.

# **Prepare your data to fine-tune a chat completion model**
---

## Dataset Requirements

### Data Quality Impact
- Dataset quality directly affects model quality
- Less data needed than training from scratch
- Still need enough data for behavioral consistency
- Amount needed depends on use case

### Three Required Components
1. **System message**: Defines AI behavior and role
2. **User message**: Example user input/question
3. **Assistant's response**: Model's ideal response

## JSONL File Format
In the actual JSONL file, each training example is one JSON object on a single line. The examples below are pretty-printed across multiple lines for readability only.

### Single-Turn Example
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."
    },
    {
      "role": "user",
      "content": "Is Xbox better than PlayStation?"
    },
    {
      "role": "assistant",
      "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"
    }
  ]
}
```

### Multi-Turn Example with Weights
```json
{
  "messages": [
    {
      "role": "system",
      "content": "Marv is a factual chatbot that is also sarcastic."
    },
    {
      "role": "user",
      "content": "What's the capital of France?"
    },
    {
      "role": "assistant",
      "content": "Paris",
      "weight": 0
    },
    {
      "role": "user",
      "content": "Can you be more sarcastic?"
    },
    {
      "role": "assistant",
      "content": "Paris, as if everyone doesn't know that already.",
      "weight": 1
    }
  ]
}
```
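
Because JSONL stores one JSON object per line, pretty-printed examples like the ones above need to be collapsed before upload. A minimal sketch using only the standard `json` module; the helper name `to_jsonl` and the shortened example contents are illustrative.

```python
import json


def to_jsonl(examples):
    """Serialize each example as one compact JSON object per line."""
    return "".join(json.dumps(example, ensure_ascii=False) + "\n" for example in examples)


examples = [
    {
        "messages": [
            {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
            {"role": "user", "content": "What's the capital of France?"},
            {"role": "assistant", "content": "Paris", "weight": 0},
            {"role": "user", "content": "Can you be more sarcastic?"},
            {"role": "assistant", "content": "Paris, as if everyone doesn't know that already.", "weight": 1},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
            {"role": "user", "content": "Is water wet?"},
            {"role": "assistant", "content": "Last time I checked, yes."},
        ]
    },
]

jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Writing `jsonl_text` to a file with UTF-8 encoding gives a dataset with one conversation per line, including multi-turn ones.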

## Weight Parameter Details

### Purpose
Control which assistant messages are used for training fine-tuned model

### Weight Values

**weight: 0**
- Message is **ignored** during training
- Not used to update model parameters
- Use when: Response is correct but not ideal example of desired behavior

**weight: 1**
- Message is **included** for training
- Used to update model parameters
- Use when: Response represents ideal behavior you want model to learn

### Use Cases for Weights

**Selective Training**:
- Include only specific turns in multi-turn conversations
- Focus training on particular response styles
- Filter out acceptable but non-exemplary responses

**Training Efficiency**:
- Reduce training on redundant patterns
- Emphasize important behavioral examples
- Control learning from specific interaction types

**Quality Control**:
- Exclude responses that are correct but lack desired tone
- Include only responses that perfectly match target behavior
- Fine-tune specific aspects of model behavior

### Strategic Weight Usage
- **Initial response (weight: 0)**: Basic factual answer, not showcasing desired style
- **Follow-up response (weight: 1)**: Enhanced response demonstrating target behavior (sarcasm, specific tone)
- **Result**: Model learns the enhanced behavior while maintaining factual accuracy
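
To see which turns of a conversation would actually contribute to training, the weights can be inspected directly. This sketch assumes that an assistant message without a `weight` key is treated as included (weight 1), since the parameter is optional; the helper name `trained_responses` is hypothetical.

```python
def trained_responses(example):
    """Return assistant contents that count toward training.

    Assumption: a missing "weight" key means the message is included (weight 1).
    """
    return [
        message["content"]
        for message in example["messages"]
        if message["role"] == "assistant" and message.get("weight", 1) == 1
    ]


marv = {
    "messages": [
        {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": "Paris", "weight": 0},
        {"role": "user", "content": "Can you be more sarcastic?"},
        {"role": "assistant", "content": "Paris, as if everyone doesn't know that already.", "weight": 1},
    ]
}

print(trained_responses(marv))
```

Only the sarcastic follow-up is returned. The plain "Paris" answer stays in the conversation as context but is skipped for learning.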

## Dataset Creation Best Practices

### Data Sources
- Real chat application history
- Curated example conversations
- Manually created ideal interactions

### Data Preparation Requirements

**Remove Sensitive Information**:
- Personally identifiable information (PII)
- Confidential business data
- Private customer details

**Ensure Diversity**:
- Various topics and scenarios
- Different conversation styles
- Multiple turn lengths
- Edge cases and exceptions

**Quality Over Quantity**:
- High-quality examples more valuable than large datasets
- Each example should demonstrate ideal behavior
- Diverse examples prevent overfitting
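
For the "Remove Sensitive Information" requirement above, a first pass over real chat history could mask obvious identifiers before manual review. The sketch below is deliberately naive: the two regular expressions are illustrative assumptions that catch only simple email addresses and phone-like digit runs, and they are no substitute for a proper PII detection tool.

```python
import re

# Naive patterns for illustration; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace simple email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


print(redact("Reach me at jane.doe@example.com or +1 (555) 010-9999."))
# prints: Reach me at [EMAIL] or [PHONE].
```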

### Multi-Turn Conversations
- Can include multiple conversation turns on single line
- Represents realistic conversation flow
- Shows context handling and follow-up responses

## Dataset Preparation Steps

1. **Understand desired model behaviors**: Define target style, format, tone
2. **Create JSONL format dataset**: Structure with system, user, assistant messages
3. **Ensure high quality**: Include only ideal behavioral examples
4. **Add diversity**: Cover various scenarios and conversation types
5. **Apply weights strategically**: Control which responses train the model
6. **Remove sensitive data**: Protect privacy and confidentiality
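
Before uploading, the structural parts of these steps can be checked automatically. The following is a minimal validator sketch, assuming the message format shown earlier (a `messages` list with `role`, `content`, and an optional `weight` of 0 or 1 on assistant messages). The function name `validate_line` and the exact rules are assumptions, not the service's official validation.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_line(line):
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: must be an object")
            continue
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: content must be a string")
        if "weight" in msg and (role != "assistant" or msg["weight"] not in (0, 1)):
            problems.append(f"message {i}: weight must be 0 or 1 on an assistant message")
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant response to learn from")
    return problems


good = json.dumps({"messages": [
    {"role": "system", "content": "You are a travel assistant."},
    {"role": "user", "content": "What's a must-see in Paris?"},
    {"role": "assistant", "content": "The Eiffel Tower!", "weight": 1},
]})
bad = json.dumps({"messages": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello", "weight": 2},
]})

print(validate_line(good))
print(validate_line(bad))
```

Quality, diversity, and tone still need human review; a check like this only catches formatting mistakes.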

## Key Takeaway
Fine-tuning datasets use JSONL format with system, user, and assistant messages. The optional weight parameter (0=ignore, 1=include) provides granular control over which assistant responses train the model, enabling selective learning of specific behaviors while maintaining conversation context.


# **Explore fine-tuning language models in Azure AI Foundry portal**
---

## Foundation Model Selection

### Available Through Azure AI Foundry Model Catalog
- Pretrained on large amounts of data
- Can fine-tune for various tasks: text classification, translation, chat completion

### Popular Chat Completion Models
- GPT-4
- Llama-2-7b

### Regional Availability Considerations
**Important**: Not all models can be fine-tuned in every region, and available quota can further limit your choice
- Verify model availability in your AI hub region before starting
- Check quota allocation for desired model

## Model Selection Criteria

### Model Capabilities
- Evaluate how well capabilities align with task
- Example: BERT better for understanding short texts

### Pretraining Data
- Consider dataset used for pretraining
- Example: GPT-2 trained on unfiltered internet content (potential biases)

### Limitations and Biases
- Be aware of inherent model limitations
- Understand potential biases in pretrained models

### Language Support
- Check for specific language requirements
- Verify multilingual capabilities if needed

### Model Cards
- Descriptions available in Azure AI Foundry portal
- Detailed information on Hugging Face website
- Referenced in model overview

## Fine-Tuning Job Configuration

### Four Configuration Steps
1. Select base model
2. Select training data
3. (Optional) Select validation data
4. Configure advanced options

## Advanced Training Options

### batch_size
**Definition**: Number of training examples used in single forward and backward pass

**Considerations**:
- Larger batch sizes generally work better for larger datasets
- Default and maximum values are model-specific
- Larger batch size = less frequent parameter updates with lower variance

**Trade-offs**: Update frequency vs. stability

### learning_rate_multiplier
**Definition**: Fine-tuning learning rate = original pretraining learning rate × this multiplier

**Recommendations**:
- Larger learning rates work better with larger batch sizes
- Experiment with range 0.02 to 0.2
- Smaller learning rate helps avoid overfitting

**Use Cases**: Balance between training speed and avoiding overfitting

### n_epochs
**Definition**: Number of complete cycles through entire training dataset

**Purpose**: Controls how many times model sees all training data

**Consideration**: More epochs can improve learning but risk overfitting
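
The interaction between `batch_size` and `n_epochs` can be made concrete with a little arithmetic. This sketch assumes one parameter update per batch and that a final partial batch still counts as a step; the actual training service may schedule updates differently.

```python
import math


def training_steps(n_examples, batch_size, n_epochs):
    """Estimate the number of parameter updates, assuming one update per batch."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * n_epochs


print(training_steps(500, 8, 3))   # 63 steps per epoch * 3 epochs = 189
print(training_steps(500, 32, 3))  # 16 steps per epoch * 3 epochs = 48
```

For the same dataset and epoch count, the larger batch size yields fewer updates, each averaged over more examples.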

### seed
**Definition**: Controls reproducibility of training job

**Characteristics**:
- Same seed + same parameters = same results (usually)
- Rare cases may differ
- Auto-generated if not specified

**Use Case**: Ensure reproducible experiments
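
The four advanced options can be gathered into a single job description so that experiments are easy to record and repeat. This is a sketch only: the helper `build_job_request`, the dict layout, and the placeholder file identifier are assumptions for illustration, and the exact shape expected by any particular SDK or REST API should be checked against its documentation.

```python
def build_job_request(model, training_file, validation_file=None, *,
                      batch_size=None, learning_rate_multiplier=None,
                      n_epochs=None, seed=None):
    """Collect fine-tuning settings into one dict, omitting options left at default."""
    if batch_size is not None and batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if learning_rate_multiplier is not None and learning_rate_multiplier <= 0:
        raise ValueError("learning_rate_multiplier must be positive")
    if n_epochs is not None and n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")

    # Keep only the hyperparameters that were explicitly set.
    hyperparameters = {
        name: value
        for name, value in {
            "batch_size": batch_size,
            "learning_rate_multiplier": learning_rate_multiplier,
            "n_epochs": n_epochs,
        }.items()
        if value is not None
    }
    request = {"model": model, "training_file": training_file}
    if validation_file is not None:
        request["validation_file"] = validation_file
    if hyperparameters:
        request["hyperparameters"] = hyperparameters
    if seed is not None:
        request["seed"] = seed
    return request


# "train-file-id" is a placeholder, not a real identifier.
request = build_job_request(
    "gpt-4o", "train-file-id", learning_rate_multiplier=0.1, n_epochs=3, seed=42
)
print(request)
```

Leaving an option unset keeps it out of the request, which matches the advice to start from default values and change one setting at a time.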

## Training Job Lifecycle

### Job Submission
1. Submit fine-tuning job with configuration
2. Job created to train model
3. Monitor status during training

### Job Completion
- Review input parameters
- Understand how fine-tuned model was created
- Examine training configuration used

### Validation (Optional)
- If validation dataset included
- Review model performance metrics
- Assess generalization on validation data

## Model Deployment and Testing

### Deployment Options
- Fine-tuned model can be deployed once the job completes (validation is optional)
- Creates endpoint for model access

### Testing Process
1. Deploy fine-tuned model
2. Test model performance
3. Assess quality of responses
4. Verify desired behavior

### Integration
- Once satisfied with performance
- Integrate deployed model with chat application
- Use model endpoint in application code

## Complete Workflow

```
Select Base Model
    ↓
Configure Training Data
    ↓
(Optional) Add Validation Data
    ↓
Set Advanced Options
    ↓
Submit Fine-Tuning Job
    ↓
Monitor Training Status
    ↓
Review Performance (if validation data)
    ↓
Deploy Model
    ↓
Test Model
    ↓
Integrate with Application
```

## Best Practices

### Model Selection
- Filter by fine-tuning task in catalog
- Review model cards on Hugging Face
- Consider regional availability and quota
- Evaluate alignment with use case

### Configuration
- Start with default values for advanced options
- Experiment with learning rate (0.02-0.2 range)
- Balance batch size with dataset size
- Use validation data to assess performance

### Testing and Deployment
- Always test before production integration
- Review validation metrics if available
- Verify behavior matches desired outcomes
- Monitor performance after deployment

## Key Takeaway
Fine-tuning in Azure AI Foundry involves selecting an appropriate base model from the catalog, configuring training with your JSONL dataset, setting advanced options (batch_size, learning_rate_multiplier, n_epochs, seed), monitoring the training job, and deploying the model for testing and integration with applications.

# **Quiz**
---
## Question 1: Data Format for Fine-Tuning
**Question**: How must data be formatted for fine-tuning?

**Correct Answer**: JSONL

**Explanation**: JSONL (JSON Lines) format, with one conversation of system, user, and assistant messages per line, is required for fine-tuning chat completion models.

**Wrong Answers**:
- ❌ YAML: Not a supported format for fine-tuning
- ❌ HTML: Web markup language, not training data format

## Question 2: Fine-Tuning Purpose
**Question**: What does fine-tuning optimize in your model?

**Correct Answer**: How the model needs to act

**Explanation**: Fine-tuning optimizes model behavior - style, format, tone, and consistency of responses.

**Wrong Answers**:
- ❌ What the model needs to know: This is addressed by RAG (Retrieval Augmented Generation)
- ❌ Which words aren't allowed: Content filtering, not fine-tuning purpose

## Question 3: Training Cycle Parameter
**Question**: Which advanced option refers to one full cycle through the training dataset?

**Correct Answer**: n_epochs

**Explanation**: An epoch is one complete pass through the entire training dataset.

**Wrong Answers**:
- ❌ seed: Controls reproducibility of training job
- ❌ batch_size: Number of examples in single forward/backward pass

## Key Patterns

- **JSONL Format** = Required data structure for fine-tuning
- **Fine-tuning** = Optimizes how model acts (behavior)
- **n_epochs** = Number of complete training cycles

## Optimization Framework

| Optimization Type | Strategy | Purpose |
|------------------|----------|---------|
| **What model knows** | RAG | Context and factual data |
| **How model acts** | Fine-tuning | Style, format, tone |

## Quick Reference

| Term | Definition |
|-----------|------------|
| **JSONL** | JSON Lines format with system/user/assistant messages |
| **n_epochs** | Complete cycles through training dataset |
| **batch_size** | Examples per training pass |
| **seed** | Reproducibility control |

# **Code Exercise**
---

## Lab Objective
Compare fine-tuned model vs base model to evaluate which better fits specific behavioral requirements (consistent, friendly conversational tone for travel agency chat app)

## Supported Regions for Fine-Tuning
- East US 2
- North Central US
- Sweden Central

*Note: At time of writing, these regions support fine-tuning for gpt-4o models*

## Initial Model Deployment

### Base Model Setup
- **Model**: gpt-4o
- **Deployment type**: Global standard
- **TPM limit**: 50K (or maximum available)
- **Purpose**: Testing baseline before fine-tuning

## Fine-Tuning Configuration

### Dataset Requirements
- **Format**: JSONL file
- **Content**: Training data with system/user/assistant messages
- **Source**: Download training dataset (travel-finetune-hotel.jsonl)

### Fine-Tuning Settings
- **Method**: Supervised
- **Base model**: gpt-4o (default version)
- **Training data**: Upload JSONL file
- **Model suffix**: ft-travel (or custom name)
- **Seed**: Random
- **Duration**: 30+ minutes (variable based on resources)

### Monitoring Progress
- View Fine-tuning page under Build and customize
- Check job status and logs
- Use Refresh button to update view

## Base Model Testing (While Waiting)

### Test Scenarios

**1. Generic Test**
- Prompt: "What can you do?"
- Expected: Generic responses

**2. With Basic System Message**
```
You are an AI assistant that helps people plan their travel.
```
- Result: May offer hotel/flight/car bookings (undesired behavior)

**3. With Improved System Message**
```
You are an AI travel assistant that helps people plan their trips. Your objective is to offer support for travel-related inquiries, such as visa requirements, weather forecasts, local attractions, and cultural norms.
You should not provide any hotel, flight, rental car or restaurant recommendations.
Ask engaging questions to help someone plan their trip and think about what they want to do on their holiday.
```

### Test Questions
1. "Where in Rome should I stay?"
2. "I'm mostly there for the food. Where should I stay to be within walking distance of affordable restaurants?"
3. "What are some local delicacies I should try?"
4. "When is the best time of year to visit in terms of the weather?"
5. "What's the best way to get around the city?"

**Observation**: Note tone and writing style consistency

## Training Data Structure

### JSONL Format Example
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are an AI travel assistant that helps people plan their trips. Your objective is to offer support for travel-related inquiries, such as visa requirements, weather forecasts, local attractions, and cultural norms. You should not provide any hotel, flight, rental car or restaurant recommendations. Ask engaging questions to help someone plan their trip and think about what they want to do on their holiday."
    },
    {
      "role": "user",
      "content": "What's a must-see in Paris?"
    },
    {
      "role": "assistant",
      "content": "Oh la la! You simply must twirl around the Eiffel Tower and snap a chic selfie! After that, consider visiting the Louvre Museum to see the Mona Lisa and other masterpieces. What type of attractions are you most interested in?"
    }
  ]
}
```

### Key Components
- **System message**: Same instruction set across all examples
- **User content**: Travel-related queries
- **Assistant content**: Responses demonstrating desired style and tone
- **Purpose**: Train model on specific conversational style
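
Since every example is meant to carry the same system message, it can be worth checking a dataset for accidental variations. A minimal sketch, assuming the JSONL text has already been read into a string; the function name `system_message_counts` and the sample data are hypothetical.

```python
import json
from collections import Counter


def system_message_counts(jsonl_text):
    """Count how often each distinct system message appears in a JSONL dataset."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        for message in json.loads(line)["messages"]:
            if message["role"] == "system":
                counts[message["content"]] += 1
    return counts


def make_line(system, user, assistant):
    """Build one JSONL line for the sample data below."""
    return json.dumps({"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]})


sample = "\n".join([
    make_line("You are an AI travel assistant.", "What's a must-see in Paris?", "Oh la la! The Eiffel Tower!"),
    make_line("You are an AI travel assistant.", "Best season for Rome?", "Ciao! Spring is lovely."),
    make_line("You are a travel assistant.", "Is Tokyo walkable?", "Very! Which areas interest you?"),
])

counts = system_message_counts(sample)
print(len(counts), "distinct system messages")
```

More than one distinct system message, as in this sample, suggests a typo or a stale copy of the instructions somewhere in the dataset.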

## Fine-Tuned Model Deployment

### When Fine-Tuning Completes
1. Navigate to Fine-tuning page
2. Select fine-tuning job to view details
3. Review Metrics tab for fine-tune metrics

### Deployment Configuration
- **Deployment name**: Custom valid name
- **Deployment type**: Standard
- **TPM limit**: 50K (or maximum available)
- **Content filter**: Default
- **Provisioning**: Wait until state shows "succeeded"

## Testing Fine-Tuned Model

### Setup
1. Open fine-tuned model in playground
2. Verify system message matches training data instructions
3. Use same test questions as base model

### Comparison Criteria
- **Consistency**: Does model maintain desired behavior?
- **Tone**: Is conversational style aligned with training examples?
- **Format**: Are responses structured as intended?
- **Accuracy**: Does model avoid unwanted recommendations?

### Expected Improvements
- More consistent adherence to system instructions
- Better tone matching training examples
- Reduced instances of ignoring instructions
- More predictable response patterns
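
The comparison can also be scripted so that both models receive exactly the same system message and questions. In this sketch the two `ask_*` arguments are hypothetical callables that send a prompt to a deployment and return its reply. They are stubbed here so the example runs offline, and would be replaced by real calls to the deployed endpoints.

```python
def compare_models(ask_base, ask_fine_tuned, system_message, questions):
    """Ask both models the same questions and pair up their answers."""
    return [
        {
            "question": question,
            "base": ask_base(system_message, question),
            "fine_tuned": ask_fine_tuned(system_message, question),
        }
        for question in questions
    ]


# Stand-in functions so the sketch runs offline; replace with real endpoint calls.
def fake_base(system_message, question):
    return f"Base answer to: {question}"


def fake_fine_tuned(system_message, question):
    return f"Fine-tuned answer to: {question}"


questions = [
    "Where in Rome should I stay?",
    "What are some local delicacies I should try?",
]
results = compare_models(fake_base, fake_fine_tuned, "You are an AI travel assistant.", questions)
for row in results:
    print(row["question"], "|", row["base"], "|", row["fine_tuned"])
```

Reading the paired answers side by side makes differences in tone, format, and instruction-following easier to judge than switching between playground sessions.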

## Key Evaluation Points

### Base Model Characteristics
- Responds to system messages
- May inconsistently follow instructions
- Generic conversational style
- Variable behavior across similar prompts

### Fine-Tuned Model Advantages
- **Consistency**: More reliable behavior
- **Style**: Matches training data examples
- **Format**: Adheres to desired structure
- **Tone**: Reflects training examples' personality

## Workflow Summary

```
1. Deploy base gpt-4o model
    ↓
2. Start fine-tuning job (30+ min)
    ↓
3. Test base model with system messages (while waiting)
    ↓
4. Review training data structure
    ↓
5. Monitor fine-tuning progress
    ↓
6. Deploy fine-tuned model
    ↓
7. Test fine-tuned model
    ↓
8. Compare base vs fine-tuned results
```

## Key Takeaway
Fine-tuning provides more consistent behavioral patterns than prompt engineering alone. Training data with system/user/assistant examples teaches the model specific conversational styles and tones. Compare base and fine-tuned models using identical test scenarios to evaluate improvement in consistency and adherence to desired behavior.