# GRPO Fine-Tuning Demo Companion Notebook

This notebook demonstrates how to load the fine-tuned model and run inference tests, and it outlines options for monitoring training performance metrics.

It is designed to complement the GRPO fine-tuning script and serves as a guide for further experimentation.

In [None]:
# Install necessary libraries (if not already installed)
# %pip installs into the environment of the running kernel
%pip install transformers datasets trl torch wandb

# (Optional) Install gradio if you plan to use it for interactive visualizations
%pip install gradio

## Load the Fine-Tuned Model

The following code loads the fine-tuned model and tokenizer. Update `MODEL_NAME_OR_PATH` as needed.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Update MODEL_NAME_OR_PATH based on your training output
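MODEL_NAME_OR_PATH = "outputs/Qwen-1.5B-GRPO"

# Use the GPU when one is available; otherwise fall back to CPU (slow for generation)
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME_OR_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME_OR_PATH, torch_dtype=torch.bfloat16).to(device)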

print('Model and tokenizer loaded successfully.')

## Run Inference Tests

The cell below defines a `generate_response` helper and runs it on an example prompt. Edit `prompt` and re-run the cell to try other inputs.

In [None]:
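def generate_response(prompt, max_new_tokens=256):
    """Sample a completion for `prompt` and return the decoded text (prompt included)."""
    # Move the inputs to whichever device the model is on
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    # max_new_tokens limits only the generated tokens; max_length would also count the prompt
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.95, top_k=50)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response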

# Example usage
prompt = "<reasoning>What is 7+7?</reasoning><answer>"
print(generate_response(prompt))
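Because the decoded response includes the prompt, it can be handy to pull out only the answer portion. The following is a minimal sketch, assuming the model writes its answer after an `<answer>` tag as plain text and may or may not close it with `</answer>`. The helper name `extract_answer` is hypothetical and not part of the training script.

```python
import re

# Capture everything after the first <answer> tag, up to </answer> or the end of the text.
ANSWER_RE = re.compile(r"<answer>(.*?)(?:</answer>|$)", re.DOTALL)


def extract_answer(text):
    """Return the stripped answer text, or None if no <answer> tag is present."""
    match = ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


# Hand-written sample string standing in for a model response (not real model output).
sample = "<reasoning>What is 7+7?</reasoning><answer>14</answer>"
print(extract_answer(sample))
```

In the notebook you could call `extract_answer(generate_response(prompt))`. If the tags were registered as special tokens in your tokenizer, `skip_special_tokens=True` would strip them and this helper would return `None`.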

## Performance Monitoring

For live training metrics, consider integrating a TensorBoard or Gradio dashboard.

For example, you can log metrics to WandB during training and watch them in real time on the WandB dashboard.

To emit metrics to TensorBoard instead, add the corresponding logging code or callbacks to the training script.
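For a quick offline look at a finished run, the logged metrics can also be read back from disk. The following is a minimal sketch, assuming the training script used a Hugging Face `Trainer` subclass (as TRL's `GRPOTrainer` is) and that a checkpoint folder contains a `trainer_state.json` file with a `log_history` list. The helper names and the `"reward"` metric key are assumptions; check the keys your run actually logged.

```python
import json
from pathlib import Path


def load_log_history(state_path):
    """Read the `log_history` list from a trainer_state.json file."""
    with Path(state_path).open() as f:
        return json.load(f).get("log_history", [])


def extract_metric(log_history, key):
    """Return (steps, values) for every log entry that contains `key`."""
    steps, values = [], []
    for entry in log_history:
        if key in entry and "step" in entry:
            steps.append(entry["step"])
            values.append(entry[key])
    return steps, values


# Tiny hand-written history standing in for a real log_history (not real results).
example_history = [
    {"step": 10, "loss": 0.02, "reward": 0.5},
    {"step": 20, "loss": 0.01, "reward": 1.25},
    {"step": 20, "eval_loss": 0.03},
]
steps, rewards = extract_metric(example_history, "reward")
print(steps, rewards)
```

With a real run, pass the path of the `trainer_state.json` inside one of the checkpoint folders under your output directory to `load_log_history`, then plot `steps` against the extracted values with the plotting library of your choice.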

## Instructions

1. Run the cells sequentially.
2. Update `MODEL_NAME_OR_PATH` if necessary based on your model output directory.
3. Use the inference cell to experiment with different prompts.
4. Refer to the README for additional details on training modifications and performance monitoring enhancements.