# ArGen GRPO Fine-Tuning Example

This notebook demonstrates how to fine-tune a large language model (LLM) with Group Relative Policy Optimization (GRPO) on the Predibase platform, using reward functions based on three Dharmic ethical principles: Ahimsa (non-harm), Satya (truthfulness), and Dharma (duty).
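
As background, GRPO samples a group of completions for the same prompt, scores each with the reward functions, and uses each reward relative to the rest of the group as the training signal. The following is a minimal sketch of that normalization step only, independent of Predibase. The function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, and the reward values are placeholders.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Placeholder rewards for four completions sampled from one prompt
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that score above the group mean get a positive advantage and are reinforced; those below it get a negative one.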

## Setup

First, let's install the required packages and import the necessary modules.

In [None]:
# Install required packages
!pip install -e ..
!pip install predibase

In [None]:
# Import required modules
import os
import json
import predibase as pb
from src.reward_functions import ahimsa_reward, satya_reward, dharma_reward
from src.predibase import create_grpo_config, submit_grpo_job
from src.data_utils.dataset import load_jsonl_dataset, prepare_dataset_for_predibase

## Authentication

Authenticate with Predibase using your API key. You can get your API key from the Predibase platform.

In [None]:
# Set your Predibase API key.
# Prefer exporting it before starting Jupyter: export PREDIBASE_API_KEY=your_api_key
# setdefault keeps an exported key and only falls back to the placeholder below.
os.environ.setdefault("PREDIBASE_API_KEY", "your_api_key_here")

# Initialize the Predibase client
pb.login()
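
To keep the key out of the saved notebook, one option is to read it from the environment and prompt for it only when it is missing. This is a sketch using the standard library; `resolve_api_key` is a hypothetical helper, not part of the Predibase SDK or this repository.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="PREDIBASE_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so it never appears in cell output
        key = prompt_fn(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key


# Usage (prompts interactively only when the variable is unset):
# api_key = resolve_api_key()
```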

## Load and Prepare Dataset

Load the healthcare dataset and prepare it for Predibase.

In [None]:
# Load the dataset
dataset_path = "../data/healthcare_examples.jsonl"
examples = load_jsonl_dataset(dataset_path)

# Display a sample
examples[0]
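
For readers without the repository at hand, a JSONL loader can be sketched as below. This illustrates the file format (one JSON object per line) and is not the actual implementation of `load_jsonl_dataset`; the helper name `load_jsonl` and the sample records are made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line and return them as a list."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err.msg})") from err
    return records


# Demo on a throwaway file; field names mirror those used in this notebook
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    sample_path.write_text(
        '{"prompt": "What helps a mild fever?", "role": "healthcare_assistant"}\n'
        "\n"
        '{"prompt": "Is this rash serious?", "role": "healthcare_assistant"}\n',
        encoding="utf-8",
    )
    records = load_jsonl(sample_path)

print(len(records))
```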

In [None]:
# Prepare the dataset for Predibase
prepared_dataset_path = "../data/healthcare_examples_prepared.jsonl"
prepare_dataset_for_predibase(
    examples=examples,
    output_path=prepared_dataset_path,
    prompt_field="prompt",
    context_fields=["role", "patient_context"]
)

# Upload the dataset to Predibase
dataset = pb.datasets.create(
    name="healthcare_examples",
    source=prepared_dataset_path,
    description="Healthcare examples for ArGen GRPO fine-tuning"
)
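
The preparation step is not shown in this notebook. One plausible shape for it, assuming the prepared file keeps a `prompt` column plus the context fields as extra columns that the reward functions later receive through `example`, is sketched here. `build_record` is a hypothetical helper and the sample example is invented; the real `prepare_dataset_for_predibase` may differ.

```python
import json


def build_record(example, prompt_field="prompt", context_fields=("role", "patient_context")):
    """Keep the prompt and the selected context fields; drop everything else."""
    record = {"prompt": example[prompt_field]}
    for field in context_fields:
        # Missing context is stored as an empty string so every row has the same columns
        record[field] = example.get(field, "")
    return record


raw_example = {
    "prompt": "I have had a cough for two weeks. What should I do?",
    "role": "healthcare_assistant",
    "patient_context": "Adult, non-smoker",
    "internal_id": 17,  # not a prompt or context field, so it is dropped
}
record = build_record(raw_example)
print(json.dumps(record))
```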

## Define Reward Functions

We'll use the pre-defined reward functions based on Dharmic principles.

In [None]:
# Define the reward functions
reward_functions = {
    "ahimsa": ahimsa_reward,
    "satya": satya_reward,
    "dharma": dharma_reward
}

# Examine the reward functions
print(ahimsa_reward.__doc__)
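
The reward functions in this notebook are called as `reward(prompt, completion, example)` and return a number. To make that contract concrete, here is a deliberately simple toy reward with the same signature. It is not the logic of `ahimsa_reward`; the function name and keyword list are illustrative assumptions.

```python
def toy_referral_reward(prompt: str, completion: str, example: dict) -> float:
    """Return 1.0 if the completion points the user to a clinician, else 0.0."""
    text = completion.lower()
    referral_terms = ("doctor", "physician", "healthcare provider", "medical professional")
    return 1.0 if any(term in text for term in referral_terms) else 0.0


score = toy_referral_reward(
    "I have a headache that won't go away after 3 days. What should I do?",
    "Please see a doctor if the headache persists or gets worse.",
    {"role": "healthcare_assistant"},
)
print(score)
```

A keyword check like this is easy to game, which is one reason to combine several reward functions rather than rely on a single one.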

## Configure and Submit GRPO Fine-Tuning Job

Configure the GRPO fine-tuning job and submit it to Predibase.

In [None]:
# Create the GRPO configuration
grpo_config = create_grpo_config(
    base_model="microsoft/phi-3-mini-4k-instruct",
    reward_functions=reward_functions,
    learning_rate=5e-5,
    epochs=3,
    batch_size=8
)

In [None]:
# Submit the GRPO fine-tuning job
job_id = submit_grpo_job(
    config=grpo_config,
    dataset="healthcare_examples",
    repo="argen-healthcare",
    description="ArGen GRPO fine-tuning with Dharmic principles for healthcare"
)

print(f"GRPO fine-tuning job submitted with ID: {job_id}")

## Monitor Training Progress

Monitor the progress of the GRPO fine-tuning job.

In [None]:
# Get the adapter
adapter = pb.adapters.get(job_id)

# Check the status
print(f"Status: {adapter.status}")

## Test the Fine-Tuned Model

Once the fine-tuning job is complete, test the fine-tuned model.

In [None]:
# Wait for the job to complete
# This may take a while, so you might want to check the status on the Predibase platform
import time

while adapter.status not in ["COMPLETED", "FAILED", "CANCELLED"]:
    print(f"Status: {adapter.status}")
    time.sleep(60)  # Check every minute
    adapter = pb.adapters.get(job_id)

print(f"Final status: {adapter.status}")
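
The loop above runs until the job reaches a terminal state, which can leave a notebook blocked indefinitely if the job stalls. A sketch of the same polling pattern with a timeout follows. `wait_for_status` is a hypothetical helper that takes any zero-argument callable returning the current status, and the non-terminal status names in the demo are invented.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_status(get_status, interval_s=60.0, timeout_s=6 * 3600,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state is seen or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a scripted status sequence instead of a real job
fake_statuses = iter(["QUEUED", "TRAINING", "TRAINING", "COMPLETED"])
final_status = wait_for_status(lambda: next(fake_statuses), interval_s=0, sleep=lambda s: None)
print(final_status)
```

With the notebook's objects, the callable could be `lambda: pb.adapters.get(job_id).status`.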

In [None]:
# If the job completed successfully, test the fine-tuned model
if adapter.status == "COMPLETED":
    # Create a deployment
    deployment = pb.deployments.create(
        name="argen-healthcare-deployment",
        adapter=job_id,
        description="ArGen healthcare deployment"
    )
    
    # Test the model
    test_prompt = "I have a headache that won't go away after 3 days. What should I do?"
    
    response = deployment.generate(
        prompt=test_prompt,
        max_tokens=500,
        temperature=0.7
    )
    
    print(f"Prompt: {test_prompt}")
    print(f"Response: {response.text}")

## Compare with Base Model

Compare the fine-tuned model with the base model.

In [None]:
# Create a deployment for the base model
base_deployment = pb.deployments.create(
    name="phi-3-mini-base",
    model="microsoft/phi-3-mini-4k-instruct",
    description="Base Phi-3 Mini model"
)

# Test the base model
# Note: test_prompt and response are defined in the previous cell only if the job completed
base_response = base_deployment.generate(
    prompt=test_prompt,
    max_tokens=500,
    temperature=0.7
)

print(f"Prompt: {test_prompt}")
print(f"Base Model Response: {base_response.text}")
print(f"Fine-tuned Model Response: {response.text}")

## Evaluate Reward Scores

Evaluate the reward scores for both the base model and the fine-tuned model.

In [None]:
# Evaluate reward scores
def evaluate_rewards(prompt, completion, example=None):
    # Copy the input so neither the caller's dict nor a shared default is mutated
    example = dict(example or {})
    # Add default values for example
    example.setdefault("role", "healthcare_assistant")
    example.setdefault("patient_context", "Adult with persistent headache")
    
    # Calculate rewards
    ahimsa_score = ahimsa_reward(prompt, completion, example)
    satya_score = satya_reward(prompt, completion, example)
    dharma_score = dharma_reward(prompt, completion, example)
    
    # Average the three scores; the mean is reported under the "total" key
    total_score = (ahimsa_score + satya_score + dharma_score) / 3
    
    return {
        "ahimsa": ahimsa_score,
        "satya": satya_score,
        "dharma": dharma_score,
        "total": total_score
    }

# Evaluate base model
base_scores = evaluate_rewards(test_prompt, base_response.text)

# Evaluate fine-tuned model
ft_scores = evaluate_rewards(test_prompt, response.text)

# Display scores
print("Base Model Scores:")
for key, value in base_scores.items():
    print(f"  {key}: {value:.4f}")

print("\nFine-tuned Model Scores:")
for key, value in ft_scores.items():
    print(f"  {key}: {value:.4f}")
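
Printing the two score tables side by side leaves the comparison to the reader. A small helper can report the per-principle change directly. This is a sketch; `score_deltas` is a hypothetical name, and the numbers below are placeholders chosen for the demo, not results from any training run.

```python
def score_deltas(base_scores, tuned_scores):
    """Return tuned minus base for every key present in both score dicts."""
    return {
        key: tuned_scores[key] - base_scores[key]
        for key in base_scores
        if key in tuned_scores
    }


# Placeholder values only; substitute base_scores and ft_scores from the cell above
placeholder_base = {"ahimsa": 0.5, "satya": 0.5, "dharma": 0.25}
placeholder_tuned = {"ahimsa": 0.75, "satya": 0.5, "dharma": 0.5}

deltas = score_deltas(placeholder_base, placeholder_tuned)
for key, delta in deltas.items():
    print(f"  {key}: {delta:+.4f}")
```

Because generation uses `temperature=0.7`, scores from a single prompt vary between runs, so averaging the deltas over many prompts gives a steadier picture.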

## Conclusion

In this notebook, we demonstrated how to fine-tune a Large Language Model using GRPO with Dharmic ethical principles (Ahimsa, Satya, Dharma) on the Predibase platform. We showed how to:

1. Prepare a healthcare dataset for fine-tuning
2. Define reward functions based on Dharmic principles
3. Configure and submit a GRPO fine-tuning job
4. Monitor the training progress
5. Test and evaluate the fine-tuned model
6. Compare the fine-tuned model with the base model

The fine-tuned model should score higher than the base model on the Ahimsa, Satya, and Dharma rewards for healthcare prompts; the reward comparison in the previous section is how to check this.