# 📘 Motivation for Fine-tuning Models

Welcome to the first notebook in our **Finetune-Your-Own-Model** tutorial series! Before diving into the technical details, let's understand *why* fine-tuning has become such a pivotal technique in modern machine learning.

## What You'll Learn
- What fine-tuning is and how it differs from training from scratch
- When you should consider fine-tuning a pre-trained model
- The tangible benefits and potential limitations
- Real-world examples showing the impact of fine-tuning
- A framework to decide if fine-tuning is right for your project

## 1. Fine-tuning vs. Training from Scratch

### The Traditional Approach: Training from Scratch

Historically, building a machine learning model involved:
- Collecting large amounts of task-specific data
- Designing a model architecture
- Initializing model weights randomly
- Training the model until convergence (often days or weeks)

This approach requires substantial computational resources, extensive datasets, and significant time investment.

### The Modern Approach: Fine-tuning

Fine-tuning leverages **transfer learning** by:
- Starting with a pre-trained model that has learned general features from a large dataset
- Adapting this model to a specific downstream task using a smaller, task-specific dataset
- Updating some or all of the pre-trained weights to better fit the new task

Think of it as teaching a model that already "understands" the world to focus on your specific problem.
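The choice between updating "some or all" of the weights can be sketched in a few lines of NumPy. This is a toy example that assumes a two-layer linear model: a `backbone` matrix stands in for pre-trained weights, and a `head` vector is the task-specific layer. The names `fine_tune` and `mse`, and the random "pre-trained" weights, are hypothetical illustrations rather than a library API. In a framework such as PyTorch, the same effect comes from leaving frozen parameters out of the optimizer.

```python
import numpy as np

def fine_tune(params, X, y, trainable, lr=0.01, steps=200):
    """Gradient descent on mean squared error for the toy model y_hat = X @ backbone @ head.

    Only parameters listed in `trainable` are updated; all others stay frozen.
    """
    params = {name: value.copy() for name, value in params.items()}  # leave the caller's weights untouched
    n = len(y)
    for _ in range(steps):
        features = X @ params["backbone"]          # (n, k) hidden features
        residual = features @ params["head"] - y   # (n,) prediction errors
        grads = {
            "head": 2 / n * features.T @ residual,
            "backbone": 2 / n * X.T @ np.outer(residual, params["head"]),
        }
        for name in trainable:
            params[name] -= lr * grads[name]
    return params

def mse(params, X, y):
    return float(np.mean((X @ params["backbone"] @ params["head"] - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8)
# Hypothetical "pre-trained" weights: a random backbone and an untrained (zero) head
pretrained = {"backbone": rng.normal(size=(8, 4)) / np.sqrt(8), "head": np.zeros(4)}

head_only = fine_tune(pretrained, X, y, trainable=["head"])          # backbone frozen
full = fine_tune(pretrained, X, y, trainable=["head", "backbone"])   # all weights updated
print(mse(pretrained, X, y), mse(head_only, X, y), mse(full, X, y))
```

With `trainable=["head"]`, the backbone comes back unchanged, so the pre-trained model acts as a fixed feature extractor. Adding `"backbone"` to the list turns this into full fine-tuning.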

In [None]:
# Let's visualize the difference between training from scratch and fine-tuning
import plotly.graph_objects as go
import numpy as np

# Create sample data to simulate learning curves (illustrative shapes, not measured results)
np.random.seed(42)  # fixed seed so the simulated noise is reproducible
epochs = np.arange(1, 101)
accuracy_scratch = 70 * (1 - np.exp(-epochs/30)) + 10 + np.random.normal(0, 1, 100)  # Slower convergence
accuracy_finetune = 85 * (1 - np.exp(-epochs/10)) + 10 + np.random.normal(0, 0.5, 100)  # Faster convergence, higher ceiling

# Create the figure
fig = go.Figure()

# Add traces
fig.add_trace(
    go.Scatter(x=epochs, y=accuracy_scratch, name="Training from Scratch", line=dict(color="#EF553B", width=2)),
)

fig.add_trace(
    go.Scatter(x=epochs, y=accuracy_finetune, name="Fine-tuning", line=dict(color="#636EFA", width=2)),
)

# Add annotations
fig.add_annotation(
    x=80, y=accuracy_finetune[79],
    text="Higher performance",
    showarrow=True,
    arrowhead=1,
    ax=50, ay=-40
)

fig.add_annotation(
    x=20, y=accuracy_finetune[19],
    text="Faster convergence",
    showarrow=True,
    arrowhead=1,
    ax=50, ay=30
)

# Add figure title and axis labels
fig.update_layout(
    title="Learning Curves: Fine-tuning vs Training from Scratch",
    xaxis_title="Training Epochs",
    yaxis_title="Accuracy (%)",
    legend=dict(y=0.99, x=0.01),
    template="plotly_white",
    height=500,
    hovermode="x unified"
)

fig.show()

## 2. When Should You Consider Fine-tuning?

Fine-tuning is particularly advantageous in these common scenarios:

### Domain Adaptation
- You need a model for a specific domain (medical, legal, financial) but have limited domain-specific data
- Example: Adapting a general language model to understand medical terminology and relationships

### Task Specialization
- You want to perform a specific task that's related to what the pre-trained model has learned
- Example: Taking an image classification model and fine-tuning it for object detection

### Resource Constraints
- You have limited computational resources or training time
- You have a small dataset that would be insufficient for training from scratch

### Low-Resource Languages or Domains
- You're working with languages or domains that have limited available data
- Example: Fine-tuning a multilingual model for a specific low-resource language

### Incremental Learning
- You want to update an existing model with new data or capabilities without starting over
- Example: Adding recognition of new product categories to an existing product classifier
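For the incremental-learning case, one common pattern is to keep the existing classifier and append output units for the new categories. The sketch below assumes a linear classification head on top of a frozen feature extractor; `expand_classifier_head` and the random weights are hypothetical. Because the old rows are copied unchanged, logits for the existing categories start out exactly as before. The new rows still have to be trained on examples of the new categories, ideally mixed with some old-category data to limit forgetting.

```python
import numpy as np

def expand_classifier_head(weights, bias, num_new_classes, rng, init_scale=0.01):
    """Append output units for new classes to a linear classification head.

    weights: (num_old_classes, num_features); bias: (num_old_classes,).
    Old rows are copied unchanged, so logits for existing classes are preserved;
    the new rows start small and still need to be trained on new-class examples.
    """
    num_features = weights.shape[1]
    new_rows = rng.normal(scale=init_scale, size=(num_new_classes, num_features))
    new_bias = np.zeros(num_new_classes)
    return np.vstack([weights, new_rows]), np.concatenate([bias, new_bias])

rng = np.random.default_rng(42)
old_W, old_b = rng.normal(size=(5, 16)), rng.normal(size=5)  # hypothetical 5-category head
W, b = expand_classifier_head(old_W, old_b, num_new_classes=2, rng=rng)

features = rng.normal(size=(3, 16))  # stand-in features for 3 products from a frozen backbone
logits = features @ W.T + b          # shape (3, 7): 5 existing + 2 new categories
```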

In [None]:
# Visualize the relationship between dataset size and model performance
import plotly.graph_objects as go
import numpy as np

# Create sample data
dataset_sizes = np.array([100, 500, 1000, 5000, 10000, 50000, 100000, 500000])
log_sizes = np.log10(dataset_sizes)

# Performance curves (simulated)
performance_scratch = 30 + 55 * (1 - np.exp(-(log_sizes - 2) / 1.5))
performance_finetune = 65 + 30 * (1 - np.exp(-(log_sizes - 2) / 1))

# Create the figure
fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(
    x=dataset_sizes, 
    y=performance_scratch,
    mode='lines+markers',
    name='Training from Scratch',
    line=dict(color='#EF553B', width=2),
    marker=dict(size=8)
))

fig.add_trace(go.Scatter(
    x=dataset_sizes, 
    y=performance_finetune,
    mode='lines+markers',
    name='Fine-tuning',
    line=dict(color='#636EFA', width=2),
    marker=dict(size=8)
))

# Add a vertical line to indicate a "typical" small dataset scenario
fig.add_shape(
    type="line",
    x0=1000, y0=0,
    x1=1000, y1=100,
    line=dict(color="gray", width=1, dash="dash"),
)

# Add annotation for the small dataset scenario
# (on a log-type axis, Plotly places annotations in log10 units, so x=3 means 1,000 examples)
fig.add_annotation(
    x=np.log10(1000),
    y=45,
    text="Small dataset<br>scenario",
    showarrow=False,
    yshift=10,
    font=dict(size=12)
)

# Add annotation for the performance gap at 1,000 examples (index 2 of dataset_sizes)
fig.add_annotation(
    x=np.log10(1000),
    y=(performance_finetune[2] + performance_scratch[2]) / 2,
    text=f"Performance gap:<br>{performance_finetune[2] - performance_scratch[2]:.1f} points",
    showarrow=True,
    arrowhead=2,
    arrowsize=1,
    arrowwidth=1,
    arrowcolor="#555",
    ax=50,
    ay=0,
    font=dict(size=12)
)

# Update layout
fig.update_layout(
    title='Model Performance vs. Dataset Size',
    xaxis=dict(
        title='Dataset Size (number of examples)',
        type='log',
        tickvals=dataset_sizes,
        ticktext=[str(s) for s in dataset_sizes],
    ),
    yaxis=dict(
        title='Performance (accuracy %)',
        range=[30, 100]
    ),
    legend=dict(y=0.99, x=0.01),
    template="plotly_white",
    height=500,
    hovermode="x unified"
)

fig.show()

## 3. Benefits of Fine-tuning

### Reduced Computational Costs
- **Training from scratch**: May require hundreds of GPUs running for weeks
- **Fine-tuning**: Can often be done on a single GPU in hours or days
- Example: GPT-3's pre-training is estimated to have cost millions of dollars, whereas fine-tuning it on a small dataset through OpenAI's API could cost under $100

### Faster Development Cycles
- Shorter training times mean faster iterations and experimentation
- Enables rapid prototyping and validation of ideas
- Typical fine-tuning runs take hours instead of weeks

### Higher Performance with Less Data
- Pre-trained models already contain general knowledge that transfers to new tasks
- Fine-tuned models often outperform models trained from scratch, especially with limited data
- Can reach strong results, sometimes near the state of the art, with only hundreds or thousands of labeled examples
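Why does a pre-trained starting point help most when data is scarce? A toy linear-regression sketch shows one mechanism. With fewer training examples than parameters, gradient descent only moves the weights within the span of the training inputs. Whatever the initialization holds in the remaining directions therefore survives training unchanged: zeros when starting from scratch, the source task's solution when fine-tuning. The setup assumes the target weights differ only slightly from a hypothetical "pre-trained" source solution; `gradient_descent` and `held_out_mse` are illustrative helpers.

```python
import numpy as np

def gradient_descent(w_init, X, y, lr=0.01, steps=2000):
    """Fit y ≈ X @ w by gradient descent on mean squared error, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        w -= lr * 2 / len(y) * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(0)
d, n_small = 50, 20                               # fewer examples than parameters
w_source = rng.normal(size=d)                     # hypothetical pre-trained solution
w_target = w_source + 0.1 * rng.normal(size=d)    # assumption: target task is closely related

X_train = rng.normal(size=(n_small, d))
y_train = X_train @ w_target
X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_target

def held_out_mse(w):
    return float(np.mean((X_test @ w - y_test) ** 2))

w_scratch = gradient_descent(np.zeros(d), X_train, y_train)   # start from zero weights
w_finetuned = gradient_descent(w_source, X_train, y_train)    # start from pre-trained weights
print(f"from scratch: {held_out_mse(w_scratch):.2f}   fine-tuned: {held_out_mse(w_finetuned):.2f}")
```

Both runs see the same 20 examples, but the warm start generalizes far better. If `w_target` were unrelated to `w_source`, the advantage would shrink or even reverse, which mirrors the domain-mismatch caveat in Section 4.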

### Democratization of Advanced AI
- Makes cutting-edge models accessible to researchers and developers with limited resources
- Enables specialized applications that wouldn't be feasible with from-scratch training

In [None]:
# Visualize the computational cost comparison
import plotly.graph_objects as go
import numpy as np

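# Data for computational resources
# Illustrative order-of-magnitude figures, not measured values: real costs vary widely with hardware, data, and setup
models = ['BERT-base', 'RoBERTa-large', 'GPT-2', 'ViT-Large', 'T5-large']
training_costs = [3000, 15000, 30000, 10000, 25000]  # Approximate GPU hours for pre-training (illustrative)
finetuning_costs = [10, 40, 100, 30, 80]  # Approximate GPU hours for fine-tuning (illustrative)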

# Create figure
fig = go.Figure()

# Add bars
fig.add_trace(go.Bar(
    x=models,
    y=training_costs,
    name='Training from Scratch',
    marker_color='#EF553B'
))

fig.add_trace(go.Bar(
    x=models,
    y=finetuning_costs,
    name='Fine-tuning',
    marker_color='#636EFA'
))

# Update layout
fig.update_layout(
    title='Computational Cost: Training vs. Fine-tuning (Lower is Better)',
    xaxis_title='Model Architecture',
    yaxis_title='GPU Hours (log scale)',
    yaxis_type='log',  # Log scale to show the dramatic difference
    barmode='group',
    template="plotly_white",
    height=500,
    legend=dict(y=0.99, x=0.99, xanchor='right')
)

# Add annotations to highlight the difference
for i in range(len(models)):
    ratio = training_costs[i] / finetuning_costs[i]
    fig.add_annotation(
        x=i,  # on a category axis, annotations can be placed by category index
        y=np.log10(finetuning_costs[i] * 2),  # on a log axis, Plotly expects annotation y in log10 units
        text=f"{ratio:.0f}x fewer GPU hours",
        showarrow=False,
        font=dict(size=10, color='white'),
        bgcolor='rgba(0,0,0,0.5)',
        bordercolor='rgba(0,0,0,0)',
        borderwidth=1,
        borderpad=2,
        opacity=0.8
    )

fig.show()

## 4. Trade-offs and Limitations

While fine-tuning offers many advantages, it's important to understand its limitations:

### Inheriting Biases and Limitations
- Fine-tuned models inherit biases present in the pre-trained model
- Example: A language model trained on internet text may contain social biases that transfer to fine-tuned applications

### Catastrophic Forgetting
- Aggressive fine-tuning can cause the model to "forget" useful general knowledge from pre-training
- Mitigation strategies include:
  - Parameter-efficient fine-tuning (PEFT)
  - Careful learning rate scheduling
  - Regularization techniques
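One of these regularization techniques can be sketched directly: add a penalty on the distance between the current weights and the pre-trained weights. This is the idea behind the L2-SP regularizer from the transfer-learning literature. The NumPy sketch below uses a toy linear model and hypothetical data; `fine_tune_with_anchor` and `anchor_strength` are illustrative names. Distance in weight space is only a rough proxy for how much general knowledge is retained.

```python
import numpy as np

def fine_tune_with_anchor(w_pretrained, X, y, anchor_strength, lr=0.05, steps=500):
    """Gradient descent on MSE + anchor_strength * ||w - w_pretrained||^2 for a linear model.

    The penalty pulls the weights back toward the pre-trained solution (the L2-SP idea);
    anchor_strength=0 reduces to plain, unregularized fine-tuning.
    """
    w = w_pretrained.copy()
    for _ in range(steps):
        grad_task = 2 / len(y) * X.T @ (X @ w - y)
        grad_anchor = 2 * anchor_strength * (w - w_pretrained)
        w -= lr * (grad_task + grad_anchor)
    return w

rng = np.random.default_rng(1)
d = 10
w_pretrained = rng.normal(size=d)               # hypothetical pre-trained weights
X = rng.normal(size=(200, d))
y = X @ (w_pretrained + rng.normal(size=d))     # the new task pulls the weights elsewhere

# How far fine-tuning moves the weights away from their pre-trained values
drift = {
    strength: float(np.linalg.norm(fine_tune_with_anchor(w_pretrained, X, y, strength) - w_pretrained))
    for strength in (0.0, 0.1, 1.0)
}
print(drift)
```

Larger `anchor_strength` values keep the weights closer to their pre-trained values, at the cost of fitting the new task less closely.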

### Domain Mismatch
- If your target domain differs significantly from the pre-training domain, benefits may be limited
- Example: Using a model pre-trained on natural images for medical imaging may require more extensive fine-tuning

### Architectural Constraints
- You're constrained by the architecture of the pre-trained model
- Making significant architectural changes often negates the benefits of pre-training

### Technical Complexity
- Fine-tuning can involve complex hyperparameter tuning
- Requires understanding of the pre-trained model's architecture and capabilities

In [None]:
# Visualize catastrophic forgetting during fine-tuning
import plotly.graph_objects as go
import numpy as np

# Simulated data
epochs = np.arange(1, 51)
target_task_perf = 65 + 30 * (1 - np.exp(-epochs/10))  # Performance on the target task improves
general_knowledge = 95 - 30 * (1 - np.exp(-epochs/15))  # Performance on general tasks declines
balanced_approach = 95 - 10 * (1 - np.exp(-epochs/25))  # Less forgetting with careful tuning

# Create figure
fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(
    x=epochs,
    y=target_task_perf,
    mode='lines',
    name='Target Task Performance',
    line=dict(color='#636EFA', width=2)
))

fig.add_trace(go.Scatter(
    x=epochs,
    y=general_knowledge,
    mode='lines',
    name='General Knowledge (Aggressive Fine-tuning)',
    line=dict(color='#EF553B', width=2)
))

fig.add_trace(go.Scatter(
    x=epochs,
    y=balanced_approach,
    mode='lines',
    name='General Knowledge (Careful Fine-tuning)',
    line=dict(color='#00CC96', width=2, dash='dash')
))

# Add annotation for catastrophic forgetting
fig.add_annotation(
    x=40,
    y=general_knowledge[39],
    text="Catastrophic<br>Forgetting",
    showarrow=True,
    arrowhead=1,
    ax=40,
    ay=-30
)

# Add annotation for the balanced approach
fig.add_annotation(
    x=40,
    y=balanced_approach[39],
    text="Balanced<br>Approach",
    showarrow=True,
    arrowhead=1,
    ax=-40,
    ay=-20
)

# Update layout
fig.update_layout(
    title='The Challenge of Catastrophic Forgetting During Fine-tuning',
    xaxis_title='Fine-tuning Epochs',
    yaxis_title='Performance (%)',
    template="plotly_white",
    height=500,
    hovermode="x unified",
    legend=dict(y=0.99, x=0.01)
)

fig.show()

## 5. Real-World Examples: Before and After Fine-tuning

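Let's look at some representative before-and-after scenarios across different domains. Treat the numbers as illustrative: actual gains depend heavily on the base model, the dataset, and how performance is measured.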

### Example 1: Text Classification - Sentiment Analysis

**Before Fine-tuning**: Using BERT pre-trained on general text
- Accuracy on financial sentiment dataset: 76%
- Struggles with domain-specific terminology and context

**After Fine-tuning**: BERT fine-tuned on financial news and reports
- Accuracy: 92%
- Correctly interprets financial terms and industry-specific sentiment
- Training required only 5,000 labeled examples instead of millions

### Example 2: Image Classification - Medical Imaging

**Before Fine-tuning**: ResNet-50 pre-trained on ImageNet
- Accuracy on chest X-ray classification: 68%
- Fails to identify subtle medical features

**After Fine-tuning**: ResNet-50 fine-tuned on medical images
- Accuracy: 87%
- Successfully identifies pneumonia, COVID-19 markers, and other conditions
- Achieved with only 15,000 labeled medical images

### Example 3: Natural Language Generation - Legal Document Drafting

**Before Fine-tuning**: GPT-2 generating legal text
- Generated text uses casual language, lacks proper structure
- Inaccurate use of legal terminology
- Low usability score from legal professionals: 3.2/10

**After Fine-tuning**: GPT-2 fine-tuned on legal documents
- Generates properly structured legal documents with appropriate terminology
- Maintains formal tone and follows legal writing conventions
- Usability score from legal professionals: 7.8/10
- Fine-tuned with only 10,000 examples of legal documents

In [None]:
# Visualize real-world performance improvements from the examples above
import plotly.graph_objects as go

# Data
domains = ['Financial Sentiment', 'Medical Imaging', 'Legal Document Generation']
before_scores = [76, 68, 32]  # Before fine-tuning: accuracy %, except legal (usability score out of 10, scaled x10)
after_scores = [92, 87, 78]   # After fine-tuning: same units as before_scores
data_sizes = [5000, 15000, 10000]  # Dataset sizes used for fine-tuning
models = ['BERT', 'ResNet-50', 'GPT-2']  # Base models

# Show the base model and dataset size under each domain name
x_labels = [f"{d}<br>{m}, {n:,} examples" for d, m, n in zip(domains, models, data_sizes)]

# Grouped bars compare the before/after scores directly
# (pie slices would imply parts of a whole, which these scores are not)
fig = go.Figure()
fig.add_trace(go.Bar(
    x=x_labels,
    y=before_scores,
    name='Before Fine-tuning',
    marker_color='#EF553B',
    text=before_scores,
    textposition='outside'
))
fig.add_trace(go.Bar(
    x=x_labels,
    y=after_scores,
    name='After Fine-tuning',
    marker_color='#636EFA',
    text=after_scores,
    textposition='outside'
))

# Update layout
fig.update_layout(
    title_text="Performance Improvements After Fine-tuning Across Domains",
    yaxis=dict(title="Score (%; legal = usability score x10)", range=[0, 105]),
    barmode='group',
    height=500,
    template="plotly_white",
    legend=dict(y=1.1, x=0.5, xanchor='center', orientation='h'),
    margin=dict(t=120, b=80)
)

fig.show()

## 6. Cost-Benefit Analysis Framework

To decide whether fine-tuning is the right approach for your project, consider this framework:

### Step 1: Assess Your Resources
- **Data availability**: How much task-specific data do you have?
- **Compute resources**: What GPU/TPU resources can you access?
- **Time constraints**: What's your development timeline?
- **Budget**: How much can you spend on training?

### Step 2: Evaluate Task Similarity
- How similar is your task to what the pre-trained model was trained on?
- Are there domain-specific nuances that the pre-trained model might miss?
- Is there a pre-trained model available that's reasonably close to your domain?

### Step 3: Consider Performance Requirements
- What level of performance do you need for your application?
- Is there a minimum accuracy/quality threshold your solution must meet?
- How important is inference speed vs. absolute accuracy?

### Step 4: Calculate ROI
- Compare the estimated costs (time, compute, data collection) of fine-tuning vs. training from scratch
- Estimate the performance difference between the approaches
- Consider the opportunity cost of longer development time
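Step 4 comes down to simple arithmetic. Here is a minimal sketch that assumes total cost = GPU hours × hourly price + data-labeling cost + an opportunity cost per week of development. The cost model, the `ApproachEstimate`/`compare_costs` names, and every number in the example are hypothetical placeholders for your own estimates. The sketch does not capture the expected performance difference, which still has to be weighed separately.

```python
from dataclasses import dataclass

@dataclass
class ApproachEstimate:
    """Rough cost inputs for one approach; every value is your own estimate."""
    gpu_hours: float
    price_per_gpu_hour: float  # e.g. your cloud provider's hourly rate
    labeling_cost: float       # cost of collecting/labeling data for this approach
    weeks_to_ship: float

    def direct_cost(self) -> float:
        return self.gpu_hours * self.price_per_gpu_hour + self.labeling_cost

def compare_costs(fine_tune: ApproachEstimate, scratch: ApproachEstimate,
                  cost_of_delay_per_week: float) -> dict:
    """Total cost = direct cost + opportunity cost of the development time."""
    totals = {
        name: est.direct_cost() + est.weeks_to_ship * cost_of_delay_per_week
        for name, est in (("fine-tune", fine_tune), ("scratch", scratch))
    }
    totals["savings_from_fine_tuning"] = totals["scratch"] - totals["fine-tune"]
    return totals

# Hypothetical inputs, for illustration only
ft = ApproachEstimate(gpu_hours=40, price_per_gpu_hour=2.0, labeling_cost=5_000, weeks_to_ship=2)
sc = ApproachEstimate(gpu_hours=15_000, price_per_gpu_hour=2.0, labeling_cost=50_000, weeks_to_ship=12)
result = compare_costs(ft, sc, cost_of_delay_per_week=1_000)
print(result)  # {'fine-tune': 7080.0, 'scratch': 92000.0, 'savings_from_fine_tuning': 84920.0}
```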

### Decision Matrix

| Scenario | Recommendation |
|----------|----------------|
| Limited data (<10k examples) | **Fine-tune** - Training from scratch likely won't work well |
| Limited compute | **Fine-tune** - Significantly lower resource requirements |
| Tight timeline | **Fine-tune** - Faster to reach acceptable performance |
| Very different domain from available pre-trained models | **Consider domain-adaptive pre-training** before fine-tuning |
| Need for architectural innovation | **Train from scratch** if your architecture differs significantly |
| Extremely high performance requirements | **Fine-tune the strongest base model you can afford**, then distill it into a smaller model if inference cost matters |

In [None]:
# Interactive decision tool for fine-tuning
import plotly.graph_objects as go
import ipywidgets as widgets
from IPython.display import display

# Create interactive widgets
data_size = widgets.IntSlider(
    value=5000,
    min=100,
    max=100000,
    step=100,
    description='Dataset Size:',
    style={'description_width': 'initial'}
)

domain_similarity = widgets.IntSlider(
    value=70,
    min=0,
    max=100,
    step=5,
    description='Domain Similarity to Pre-trained Data (%):',
    style={'description_width': 'initial'}
)

compute_constraint = widgets.IntSlider(
    value=50,
    min=0,
    max=100,
    step=5,
    description='Compute Constraint Level (%):',
    style={'description_width': 'initial'}
)

time_constraint = widgets.IntSlider(
    value=50,
    min=0,
    max=100,
    step=5,
    description='Time Constraint Level (%):',
    style={'description_width': 'initial'}
)

performance_requirement = widgets.IntSlider(
    value=80,
    min=50,
    max=100,
    step=5,
    description='Performance Requirement (%):',
    style={'description_width': 'initial'}
)

# Function to update the recommendation
def update_recommendation(change):
    # Calculate heuristic scores (the weights below are rough rules of thumb, not a validated model)
    finetune_score = 0
    scratch_score = 0
    
    # Data size factor
    if data_size.value < 1000:
        finetune_score += 30
        scratch_score -= 20
    elif data_size.value < 10000:
        finetune_score += 20
        scratch_score -= 5
    elif data_size.value < 50000:
        finetune_score += 10
        scratch_score += 10
    else:
        finetune_score += 5
        scratch_score += 20
    
    # Domain similarity factor
    if domain_similarity.value < 30:
        finetune_score -= 15
        scratch_score += 15
    elif domain_similarity.value < 60:
        finetune_score += 5
        scratch_score += 5
    else:
        finetune_score += 20
        scratch_score -= 5
    
    # Compute constraint factor
    if compute_constraint.value > 70:  # High constraint
        finetune_score += 25
        scratch_score -= 15
    elif compute_constraint.value > 40:  # Medium constraint
        finetune_score += 15
        scratch_score -= 5
    else:  # Low constraint
        finetune_score += 5
        scratch_score += 10
    
    # Time constraint factor
    if time_constraint.value > 70:  # High constraint
        finetune_score += 25
        scratch_score -= 15
    elif time_constraint.value > 40:  # Medium constraint
        finetune_score += 15
        scratch_score -= 5
    else:  # Low constraint
        finetune_score += 5
        scratch_score += 10
    
    # Performance requirement factor
    if performance_requirement.value > 90:  # Very high
        if domain_similarity.value > 80:  # Similar domain
            finetune_score += 15
            scratch_score += 10
        else:  # Different domain
            finetune_score += 5
            scratch_score += 15
    elif performance_requirement.value > 75:  # High
        finetune_score += 15
        scratch_score += 5
    else:  # Moderate
        finetune_score += 20
        scratch_score -= 5
    
    # Clamp scores to the 0-100 range shown on the gauges
    finetune_score = min(max(finetune_score, 0), 100)
    scratch_score = min(max(scratch_score, 0), 100)
    
    # Redraw the gauge chart, replacing the previous one instead of stacking a new chart below it
    with output:
        output.clear_output(wait=True)
        fig = go.Figure()
        
        fig.add_trace(go.Indicator(
            mode = "gauge+number",
            value = finetune_score,
            title = {'text': "Fine-tuning Score"},
            domain = {'x': [0, 0.45], 'y': [0.3, 1]},  # leave the bottom of the figure for the recommendation
            gauge = {
                'axis': {'range': [0, 100]},
                'bar': {'color': "#636EFA"},
                'steps': [
                    {'range': [0, 33], 'color': "#EF553B"},
                    {'range': [33, 66], 'color': "#FFA15A"},
                    {'range': [66, 100], 'color': "#00CC96"}
                ]
            }
        ))
        
        fig.add_trace(go.Indicator(
            mode = "gauge+number",
            value = scratch_score,
            title = {'text': "Training from Scratch Score"},
            domain = {'x': [0.55, 1], 'y': [0.3, 1]},
            gauge = {
                'axis': {'range': [0, 100]},
                'bar': {'color': "#EF553B"},
                'steps': [
                    {'range': [0, 33], 'color': "#EF553B"},
                    {'range': [33, 66], 'color': "#FFA15A"},
                    {'range': [66, 100], 'color': "#00CC96"}
                ]
            }
        ))
        
        # Determine recommendation (a gap of more than 20 points counts as "strong")
        if finetune_score > scratch_score + 20:
            recommendation = "<b>Strong recommendation: Fine-tune</b><br>Fine-tuning is clearly the better approach for your scenario."
            rec_color = "#00CC96"
        elif finetune_score > scratch_score:
            recommendation = "<b>Recommendation: Fine-tune</b><br>Fine-tuning has an edge, but consider your specific requirements."
            rec_color = "#00CC96"
        elif scratch_score > finetune_score + 20:
            recommendation = "<b>Strong recommendation: Train from scratch</b><br>Your scenario favors training a custom model from scratch."
            rec_color = "#EF553B"
        elif scratch_score > finetune_score:
            recommendation = "<b>Recommendation: Train from scratch</b><br>Training from scratch has a slight advantage, but fine-tuning could still work."
            rec_color = "#EF553B"
        else:
            recommendation = "<b>Toss-up</b><br>Both approaches score equally for your scenario."
            rec_color = "#FFA15A"
        
        # Add recommendation annotation below the gauges
        fig.add_annotation(
            xref="paper", yref="paper",
            x=0.5, y=0.05,
            text=recommendation,
            showarrow=False,
            font=dict(size=14, color=rec_color),
            align="center",
            bgcolor="rgba(255, 255, 255, 0.8)",
            bordercolor=rec_color,
            borderwidth=2,
            borderpad=4,
            width=400
        )
        
        fig.update_layout(
            title="Fine-tuning Decision Assistant",
            height=450,
            margin=dict(t=80, b=30, l=50, r=50),
            template="plotly_white"
        )
        
        fig.show()

# Connect the update function to the widgets
data_size.observe(update_recommendation, names='value')
domain_similarity.observe(update_recommendation, names='value')
compute_constraint.observe(update_recommendation, names='value')
time_constraint.observe(update_recommendation, names='value')
performance_requirement.observe(update_recommendation, names='value')

# Create output area for the chart
output = widgets.Output()

# Display everything
display(widgets.VBox([
    widgets.HTML(value="<h3>Adjust the parameters below to see if fine-tuning is right for your project:</h3>"),
    data_size,
    domain_similarity,
    compute_constraint,
    time_constraint,
    performance_requirement,
    output
]))

# Initialize the chart
update_recommendation(None)

## Summary: When to Fine-tune

Fine-tuning is typically the right approach when:

✅ You have limited labeled data for your specific task  
✅ Your task is related to what the pre-trained model has learned  
✅ You have computational constraints (time, budget, hardware)  
✅ You need to iterate quickly  
✅ You're working with a low-resource domain or language  

Training from scratch might be better when:

❌ Your domain is radically different from available pre-trained models  
❌ You need a significantly different model architecture  
❌ You have abundant domain-specific data  
❌ You need to avoid inheriting biases from pre-trained models  

## Next Steps

Now that you understand the motivation and benefits of fine-tuning, let's move on to the next notebook: **Data Collection and Preparation**. This is arguably the most critical step in the fine-tuning process, as the quality and structure of your dataset will significantly impact the performance of your fine-tuned model.