# 🔄 Fine-Tune LLMs with Replicate

This notebook guides you through fine-tuning a Large Language Model using Replicate.

**Why Replicate?**
- Pay-per-use pricing (no upfront costs)
- Simple API and deployment
- Good for experimentation
- Automatic model hosting after training

**What you'll learn:**
1. Prepare training data in Replicate's format
2. Create a fine-tuning job (a "training" in Replicate's terms)
3. Monitor training progress
4. Use your fine-tuned model

**Prerequisites:**
- Replicate account (https://replicate.com)
- API token from Replicate

## 1. Setup & Installation

In [None]:
# Install required packages
!pip install replicate pandas

In [None]:
import os
import json
import replicate
import pandas as pd
from pathlib import Path

# Set your Replicate API token
try:
    from google.colab import userdata
    REPLICATE_API_TOKEN = userdata.get('REPLICATE_API_TOKEN')
except Exception:  # not running in Colab, or the secret is not set
    REPLICATE_API_TOKEN = input("Enter your Replicate API token: ")

os.environ['REPLICATE_API_TOKEN'] = REPLICATE_API_TOKEN

print("✅ Replicate client configured")

## 2. Dataset Preparation

Replicate uses a simple **JSONL format** for fine-tuning:

```json
{"prompt": "<prompt text>", "completion": "<completion text>"}
```

Or for chat-style models:
```json
{"text": "<s>[INST] <<SYS>>\nSystem prompt\n<</SYS>>\n\nUser message [/INST] Assistant response </s>"}
```

### Key Requirements:
- JSONL format (one JSON object per line)
- Hosted at a URL Replicate can fetch (we'll use a GitHub Gist or Replicate's file upload)
- Minimum ~10 examples, recommended 100+
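
Before uploading, it can help to check a file against the requirements above. The following is a minimal sketch, not Replicate's own validation: the helper name `validate_jsonl` is hypothetical, and it only assumes the two record shapes shown in this section (`text`, or `prompt` plus `completion`).

```python
import json

def validate_jsonl(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        # Accept either the chat-style "text" field or a prompt/completion pair
        has_text = isinstance(record.get("text"), str)
        has_pair = isinstance(record.get("prompt"), str) and isinstance(record.get("completion"), str)
        if not (has_text or has_pair):
            problems.append((number, "expected 'text' or 'prompt' + 'completion' string fields"))
    return problems

sample_lines = [
    '{"prompt": "Say hi", "completion": "Hi!"}',
    '{"text": "<s>[INST] Say hi [/INST] Hi! </s>"}',
    '{"prompt": "missing completion"}',
    'not json',
]
for line_number, problem in validate_jsonl(sample_lines):
    print(f"line {line_number}: {problem}")
```

To check a saved file, pass the open file object, for example `validate_jsonl(open("replicate_training_data.jsonl"))`.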

In [None]:
# ============================================================
# Sample Dataset: Code Review Assistant
# Replace with your own domain-specific data!
# ============================================================

SYSTEM_PROMPT = """You are an expert code reviewer. You provide constructive, actionable feedback on code.
Focus on:
- Code correctness and potential bugs
- Performance considerations
- Readability and maintainability
- Security concerns
Be specific and provide examples when suggesting improvements."""

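# Llama 2 chat format ([INST] / <<SYS>> markers). Llama 3 models use a different
# chat template, so check the format your base model expects before training.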
def format_for_llama(system: str, user: str, assistant: str) -> dict:
    """Format data for Llama-style fine-tuning."""
    text = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {assistant} </s>"
    return {"text": text}

# Alternative: Simple prompt/completion format
def format_simple(user: str, assistant: str) -> dict:
    """Simple prompt-completion format."""
    return {
        "prompt": f"### Instruction:\n{user}\n\n### Response:\n",
        "completion": assistant
    }

sample_data = [
    format_for_llama(
        system=SYSTEM_PROMPT,
        user="""Review this Python function:
```python
def get_user(id):
    conn = sqlite3.connect('db.sqlite')
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {id}")
    return cursor.fetchone()
```""",
        assistant="""I've identified several critical issues with this code:

## 🔴 Critical: SQL Injection Vulnerability
The f-string interpolation makes this vulnerable to SQL injection:
```python
# Dangerous - an attacker could pass "1 OR 1=1" and match every row
cursor.execute(f"SELECT * FROM users WHERE id = {id}")
```

**Fix:** Use parameterized queries:
```python
cursor.execute("SELECT * FROM users WHERE id = ?", (id,))
```

## 🟡 Resource Leak
The connection is never closed, which can exhaust database connections.

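**Fix:** Close the connection with `contextlib.closing` (a bare `with sqlite3.connect(...)` only manages the transaction, it does not close the connection):
```python
from contextlib import closing

def get_user(user_id: int) -> Optional[tuple]:
    with closing(sqlite3.connect('db.sqlite')) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
        return cursor.fetchone()
```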

## 🟢 Minor Improvements
- Rename `id` to `user_id` (shadows built-in)
- Add type hints
- Consider returning `None` explicitly when not found"""
    ),
    format_for_llama(
        system=SYSTEM_PROMPT,
        user="""Review this JavaScript code:
```javascript
async function fetchData() {
    const response = await fetch('/api/data');
    const data = await response.json();
    return data;
}
```""",
        assistant="""This code works but lacks important error handling. Here's my review:

## 🟡 Missing Error Handling
The function doesn't handle failed requests or invalid JSON.

**Issues:**
1. No check for `response.ok` (HTTP errors like 404, 500)
2. No try-catch for network failures or JSON parse errors
3. No timeout handling for slow responses

**Improved version:**
```javascript
async function fetchData() {
    try {
        const response = await fetch('/api/data');
        
        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }
        
        const data = await response.json();
        return data;
    } catch (error) {
        console.error('Failed to fetch data:', error);
        throw error; // Re-throw to let caller handle
    }
}
```

## 🟢 Good Practices Already Present
- Uses async/await (readable)
- Simple, single-purpose function

Consider adding a timeout with `AbortController` for production code."""
    ),
    format_for_llama(
        system=SYSTEM_PROMPT,
        user="""Review this React component:
```jsx
function UserList({ users }) {
    return (
        <ul>
            {users.map(user => (
                <li>{user.name}</li>
            ))}
        </ul>
    );
}
```""",
        assistant="""This is a clean component with one important fix needed:

## 🟡 Missing Key Prop
React requires a unique `key` prop when rendering lists for efficient reconciliation.

**Current issue:**
```jsx
<li>{user.name}</li>  // Missing key - React will warn
```

**Fix:**
```jsx
<li key={user.id}>{user.name}</li>
```

## 🟡 Edge Case: Empty/Null Users
Consider handling cases where `users` might be undefined or empty.

**Improved version:**
```jsx
function UserList({ users = [] }) {
    if (users.length === 0) {
        return <p>No users found.</p>;
    }
    
    return (
        <ul>
            {users.map(user => (
                <li key={user.id}>{user.name}</li>
            ))}
        </ul>
    );
}
```

## 🟢 What's Good
- Functional component (modern React)
- Props destructuring
- Simple, focused responsibility"""
    )
]

print(f"📊 Sample dataset with {len(sample_data)} examples")

### 2.1 Your Contribution: Add Training Examples

`★ Insight ─────────────────────────────────────`
**Why your examples matter for fine-tuning:**
- The model learns your specific style and domain expertise
- Edge cases teach the model what to do in unusual situations
- Consistency in format leads to consistent outputs
`─────────────────────────────────────────────────`

**Your task:** Add examples that cover code you commonly review. Think about:
- Languages your team uses most
- Common mistakes you see in PRs
- Your team's specific style guidelines
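
Since consistency of format matters, a quick audit of the examples before training can catch accidental duplicates or records that miss the template. This is a rough sketch under stated assumptions: `audit_examples` is a hypothetical helper, it only handles `text`-style records, and it checks for the `[INST]` markers used by this notebook's `format_for_llama`.

```python
from collections import Counter

def audit_examples(examples):
    """Summarize duplicates, template consistency, and size of 'text'-style examples."""
    texts = [example["text"] for example in examples]
    counts = Counter(texts)
    # Every copy beyond the first occurrence counts as one duplicate
    duplicates = sum(count - 1 for count in counts.values())
    malformed = [
        index for index, text in enumerate(texts)
        if not (text.startswith("<s>[INST]") and "[/INST]" in text and text.rstrip().endswith("</s>"))
    ]
    lengths = [len(text) for text in texts]
    return {
        "total": len(texts),
        "duplicates": duplicates,
        "malformed_indices": malformed,
        "min_chars": min(lengths) if lengths else 0,
        "max_chars": max(lengths) if lengths else 0,
    }

demo = [
    {"text": "<s>[INST] Review: x = 1 [/INST] Looks fine. </s>"},
    {"text": "<s>[INST] Review: x = 1 [/INST] Looks fine. </s>"},  # exact duplicate
    {"text": "Review: y = 2 -> OK"},  # missing the template markers
]
print(audit_examples(demo))
```

Running it on `all_training_data` once your own examples are added gives a quick view of whether anything slipped through in a different shape.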

In [None]:
# ============================================================
# TODO: Add your training examples
# ============================================================

def create_review_example(code_snippet: str, review_response: str) -> dict:
    """Helper to create a code review training example."""
    user_message = f"Review this code:\n```\n{code_snippet}\n```"
    return format_for_llama(SYSTEM_PROMPT, user_message, review_response)

# Add your examples:
my_examples = [
    # Example: Uncomment and customize
    # create_review_example(
    #     code_snippet="""def process(data):
    #         for item in data:
    #             print(item)""",
    #     review_response="Here's my review..."
    # ),
]

# Combine all data
all_training_data = sample_data + my_examples
print(f"📊 Total training examples: {len(all_training_data)}")

In [None]:
# Save to JSONL file

TRAIN_FILE = "replicate_training_data.jsonl"

with open(TRAIN_FILE, 'w') as f:
    for example in all_training_data:
        f.write(json.dumps(example) + '\n')

print(f"✅ Saved {len(all_training_data)} examples to {TRAIN_FILE}")

# Preview
print("\n📄 First example preview:")
print(json.dumps(all_training_data[0], indent=2)[:500] + "...")

## 3. Upload Training Data

Replicate needs a publicly accessible URL for your training data. Options:

1. **GitHub Gist** (free, easy)
2. **AWS S3** (scalable; the object must be public or shared through a presigned URL)
3. **Google Cloud Storage** (if using GCP)
4. **Replicate's file upload** (via API)

We'll try a public GitHub Gist first and fall back to Replicate's file upload:
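
Whichever option you pick, it is worth confirming that the URL can be fetched without credentials before starting a training. A minimal sketch using only the standard library follows; `is_publicly_reachable` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.request

def is_publicly_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if an unauthenticated GET to `url` answers with HTTP 200."""
    try:
        request = urllib.request.Request(url, method="GET")
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs
        return False

# Usage (performs a real network request):
# is_publicly_reachable("https://gist.githubusercontent.com/...")
```

A `False` result for an S3 or GCS link usually means the object is private and needs a public or presigned URL.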

In [None]:
# Option 1: Upload to a public GitHub Gist
# Anyone with the link can read a public Gist, so don't use this for sensitive data.
# For production, use your own cloud storage.

import requests
from typing import Optional

def upload_to_github_gist(filename: str, content: str, description: str = "Training data") -> Optional[str]:
    """Upload content to a GitHub Gist and return the raw URL.
    
    Note: Requires GITHUB_TOKEN environment variable with 'gist' scope.
    """
    token = os.environ.get('GITHUB_TOKEN')
    if not token:
        return None
    
    url = "https://api.github.com/gists"
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json"
    }
    
    data = {
        "description": description,
        "public": True,
        "files": {
            filename: {"content": content}
        }
    }
    
    response = requests.post(url, headers=headers, json=data, timeout=30)
    
    if response.status_code == 201:
        gist = response.json()
        raw_url = gist['files'][filename]['raw_url']
        return raw_url
    else:
        print(f"Failed to create gist: {response.status_code}")
        return None

# Read the training file
with open(TRAIN_FILE, 'r') as f:
    training_content = f.read()

# Try to upload to Gist
TRAINING_URL = upload_to_github_gist(TRAIN_FILE, training_content)

if TRAINING_URL:
    print(f"✅ Training data uploaded to: {TRAINING_URL}")
else:
    print("⚠️ Could not auto-upload. Please:")
    print("   1. Create a GitHub Gist manually at https://gist.github.com")
    print("   2. Paste your training data content")
    print("   3. Get the 'Raw' URL and set TRAINING_URL below")
    
    # Manual URL entry
    # TRAINING_URL = "https://gist.githubusercontent.com/..."

In [None]:
# Alternative: Upload directly to Replicate via files API
# This creates a temporary URL valid for the training

def upload_to_replicate(file_path: str) -> str:
    """Upload a file to Replicate and return the URL."""
    with open(file_path, 'rb') as f:
        file_response = replicate.files.create(f)
    return file_response.urls['get']

# Upload if Gist didn't work
if not TRAINING_URL:
    try:
        TRAINING_URL = upload_to_replicate(TRAIN_FILE)
        print(f"✅ Uploaded to Replicate: {TRAINING_URL}")
    except Exception as e:
        print(f"❌ Upload failed: {e}")
        print("   Please manually set TRAINING_URL to a publicly accessible URL")

## 4. Start Fine-Tuning

### Available Base Models on Replicate:

| Model | Replicate Name | Best For |
|-------|---------------|----------|
| Llama 3.1 8B | `meta/meta-llama-3.1-8b-instruct` | General purpose |
| Llama 3.1 70B | `meta/meta-llama-3.1-70b-instruct` | High quality |
| Mistral 7B | `mistralai/mistral-7b-instruct-v0.2` | Fast, efficient |
| Code Llama | `meta/codellama-34b-instruct` | Code generation |

In [None]:
# ============================================================
# Configure Fine-Tuning
# ============================================================

# Destination for your fine-tuned model
# Format: "your-username/model-name"
REPLICATE_USERNAME = input("Enter your Replicate username: ")
MODEL_NAME = "code-reviewer-llama"  # Change this!

DESTINATION = f"{REPLICATE_USERNAME}/{MODEL_NAME}"

# Base model to fine-tune
BASE_MODEL = "meta/meta-llama-3-8b-instruct"  # Good balance of speed/quality

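print("📋 Configuration:")
print(f"   Base model: {BASE_MODEL}")
print(f"   Destination: {DESTINATION}")
if TRAINING_URL:
    print(f"   Training data: {TRAINING_URL[:50]}...")
else:
    print("   Training data: not set (assign TRAINING_URL before starting the training)")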

In [None]:
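# Create the fine-tuning training
# Note: the destination model must already exist on Replicate
# (create it in the web UI or through the API before running this cell).

print("🚀 Starting fine-tuning...")

# Look up the base model's current version instead of hard-coding a hash.
# You can also copy a specific version ID from the model's page on Replicate.
base_version = replicate.models.get(BASE_MODEL).latest_version

training = replicate.trainings.create(
    version=f"{BASE_MODEL}:{base_version.id}",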
    input={
        "train_data": TRAINING_URL,
        "num_train_epochs": 3,
        "train_batch_size": 4,
        "learning_rate": 1e-5,
    },
    destination=DESTINATION
)

TRAINING_ID = training.id
print("✅ Training created!")
print(f"   Training ID: {TRAINING_ID}")
print(f"   Status: {training.status}")

### Alternative: Use Replicate's Web UI

If the API approach has issues, you can also fine-tune via Replicate's web interface:

1. Go to https://replicate.com/create-training
2. Select your base model
3. Upload or link your training data
4. Configure hyperparameters
5. Start training
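
To get a feel for what `num_train_epochs` and `train_batch_size` mean for a dataset of a given size, the number of optimizer steps can be estimated with simple arithmetic. This is a back-of-the-envelope sketch that assumes no gradient accumulation and that the last partial batch is kept; the actual trainer behind a given base model may differ.

```python
import math

def estimate_optimizer_steps(num_examples, batch_size, epochs):
    """Steps per epoch (rounded up) multiplied by the number of epochs."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

# The 3 sample examples with batch size 4 and 3 epochs give only 3 steps in total
print(estimate_optimizer_steps(num_examples=3, batch_size=4, epochs=3))
# 100 examples with the same settings give 75 steps
print(estimate_optimizer_steps(num_examples=100, batch_size=4, epochs=3))
```

With only a handful of examples the model sees very few updates, which is one reason the recommended dataset size is 100+ examples.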

## 5. Monitor Training Progress

In [None]:
import time

def check_training_status(training_id: str):
    """Check the status of a training."""
    training = replicate.trainings.get(training_id)
    return training

def monitor_training(training_id: str, poll_interval: int = 60):
    """Monitor training until completion."""
    print(f"📊 Monitoring training {training_id}...")
    print("   (Fine-tuning can take 15-120+ minutes)\n")
    
    while True:
        training = check_training_status(training_id)
        status = training.status
        
        print(f"   [{time.strftime('%H:%M:%S')}] Status: {status}")
        
        if training.logs:
            # Show last few lines of logs
            recent_logs = training.logs.split('\n')[-3:]
            for log in recent_logs:
                if log.strip():
                    print(f"                  {log[:80]}")
        
        if status == "succeeded":
            print("\n✅ Training completed!")
            print(f"   Model version: {training.output}")
            return training
        elif status in ["failed", "canceled"]:
            print(f"\n❌ Training {status}")
            if training.error:
                print(f"   Error: {training.error}")
            return training
        
        time.sleep(poll_interval)

In [None]:
# Quick status check
if 'TRAINING_ID' in dir():
    training_status = check_training_status(TRAINING_ID)
    print(f"Training ID: {TRAINING_ID}")
    print(f"Status: {training_status.status}")
    
    if training_status.status == "succeeded":
        print(f"Model: {training_status.output}")
else:
    print("No training ID found. Set TRAINING_ID manually if you have one.")

In [None]:
# Monitor until completion (uncomment to run)
# completed_training = monitor_training(TRAINING_ID)

## 6. Test Your Fine-Tuned Model

In [None]:
# Get your fine-tuned model
if 'TRAINING_ID' in dir():
    training_info = check_training_status(TRAINING_ID)
    if training_info.status == "succeeded":
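        output = training_info.output
        # The training output may be a dict with a "version" key; otherwise use it as-is
        FINE_TUNED_MODEL = output.get("version") if isinstance(output, dict) else output
        print(f"🎯 Fine-tuned model: {FINE_TUNED_MODEL}")
    else:
        print(f"⚠️ Training not complete. Status: {training_info.status}")
        FINE_TUNED_MODEL = None
else:
    # Manually set if you know your model version.
    # ":latest" is a placeholder: replace it with the version hash from your model's page.
    FINE_TUNED_MODEL = f"{DESTINATION}:latest"
    print(f"🎯 Using model: {FINE_TUNED_MODEL}")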

In [None]:
def run_model(model_version: str, prompt: str, system_prompt: str = None) -> str:
    """Run inference on a Replicate model."""
    
    # Format for Llama chat
    if system_prompt:
        full_prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]"
    else:
        full_prompt = f"<s>[INST] {prompt} [/INST]"
    
    output = replicate.run(
        model_version,
        input={
            "prompt": full_prompt,
            "max_new_tokens": 1024,
            "temperature": 0.7,
            "top_p": 0.9,
        }
    )
    
    # Collect streaming output
    result = "".join(output)
    return result

In [None]:
# Test the fine-tuned model

test_code_snippets = [
    """def calculate_total(items):
    total = 0
    for i in range(len(items)):
        total = total + items[i]['price'] * items[i]['quantity']
    return total""",
    
    """const getUserData = async (userId) => {
    const response = await fetch(`/api/users/${userId}`);
    return response.json();
}""",
]

if FINE_TUNED_MODEL:
    print("🧪 Testing fine-tuned model\n")
    print("=" * 70)
    
    for code in test_code_snippets:
        prompt = f"Review this code:\n```\n{code}\n```"
        
        print("\n📝 Code to review:")
        print(code)
        print("-" * 50)
        
        response = run_model(FINE_TUNED_MODEL, prompt, SYSTEM_PROMPT)
        print(f"🤖 Review:\n{response}")
        print("=" * 70)
else:
    print("⚠️ Please set FINE_TUNED_MODEL first")

## 7. Deploy as API

Your fine-tuned model is automatically deployed on Replicate. You can use it via:

### Python
```python
import replicate

output = replicate.run(
    "your-username/code-reviewer-llama:version-hash",
    input={"prompt": "Review this code..."}
)
```

### cURL
```bash
curl -s -X POST \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"version": "...", "input": {"prompt": "..."}}' \
  https://api.replicate.com/v1/predictions
```

### JavaScript
```javascript
const Replicate = require('replicate');
const replicate = new Replicate();

const output = await replicate.run(
  "your-username/code-reviewer-llama:version-hash",
  { input: { prompt: "Review this code..." } }
);
```

In [None]:
# Generate deployment code snippet

if FINE_TUNED_MODEL:
    print("🚀 Your model is ready to use!")
    print(f"\nModel: {FINE_TUNED_MODEL}")
    print("\n📋 Quick usage:")
    print(f"""
import replicate

output = replicate.run(
    "{FINE_TUNED_MODEL}",
    input={{
        "prompt": "<s>[INST] Review this code: ... [/INST]",
        "max_new_tokens": 1024,
        "temperature": 0.7
    }}
)

print("".join(output))
""")

## 📚 Resources

- [Replicate Documentation](https://replicate.com/docs)
- [Fine-Tuning Guide](https://replicate.com/docs/guides/fine-tune-a-language-model)
- [Model Library](https://replicate.com/collections/language-models)
- [Pricing](https://replicate.com/pricing)

## 💡 Tips

1. **Start small**: Test with ~50 examples before scaling up
2. **Quality > Quantity**: Well-crafted examples beat more mediocre ones
3. **Monitor costs**: Replicate charges per-second for training
4. **Version control**: Keep your training data versioned
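
Because billing is per second, a rough cost estimate is just duration times rate. The sketch below uses the 15 to 120 minute range mentioned in the monitoring section; `price_per_second` is a placeholder value for illustration only, not a real Replicate price, so look up the current rate for your hardware on the pricing page.

```python
def estimate_training_cost(minutes, price_per_second):
    """Estimated cost for a training that runs for `minutes` at a per-second rate."""
    return minutes * 60 * price_per_second

PLACEHOLDER_PRICE_PER_SECOND = 0.001  # hypothetical rate, replace with the real one

for minutes in (15, 120):
    cost = estimate_training_cost(minutes, PLACEHOLDER_PRICE_PER_SECOND)
    print(f"{minutes} min -> {cost:.2f} (in the currency of the rate)")
```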