# Azure Reinforcement Learning (GRPO) with Speculative Decoding - Simplified

This notebook demonstrates the complete RL training and speculative decoding workflow in **~5 lines of code** per section.

All complexity is abstracted to `rl_spec_dec_utils.py`.

## 1. Setup Workspace

In [None]:
from rl_spec_dec_utils import (
    setup_workspace,
    run_rl_training_pipeline,
    run_draft_model_pipeline,
    prepare_combined_model_for_deployment,
    deploy_speculative_decoding_endpoint,
    test_deployment,
)

# Setup Azure ML workspace and registry connections
ml_client, registry_ml_client = setup_workspace(registry_name="test_centralus")


## 2. Run RL Training Pipeline (GRPO)

In [None]:
# Run complete RL training pipeline: verify datasets, register data, train model, register model
rl_job, status, registered_model = run_rl_training_pipeline(
    ml_client=ml_client,
    registry_ml_client=registry_ml_client,
    base_model_id="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    compute_cluster="h100-dedicated",
    training_config={"trainer_total_epochs": 15, "actor_optim_lr": 3e-6},  # grpo, reinforce_plus_plus
    monitor=False  # False submits the job and returns immediately; set to True to wait for completion
)
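
For readers new to GRPO, the core idea is that each sampled completion is scored relative to the other completions for the same prompt, so no separate value model is needed. The snippet below is a minimal sketch of that group-relative normalization only. The reward values, the group size, and the `group_relative_advantages` helper are illustrative assumptions; the notebook does not show how `rl_spec_dec_utils.py` or the underlying trainer computes rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four completions for one prompt,
# scored 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and are reinforced; those below it are pushed down. When all completions in a group receive the same reward, every advantage is zero and that prompt contributes no learning signal.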

## 3. Create Draft Model for Speculative Decoding

In [None]:
# Train EAGLE3 draft model for speculative decoding
draft_job, draft_status = run_draft_model_pipeline(
    ml_client=ml_client,
    registry_ml_client=registry_ml_client,
    compute_cluster="h100-dedicated",
    num_epochs=1,
    monitor=False  # Set to True to wait for completion
)
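
With `monitor=False` the call returns as soon as the job is submitted, so later cells that consume the job's outputs need the job to have finished first. One way to block until then is a small polling loop. This is a sketch, not part of `rl_spec_dec_utils.py`: `wait_for_job` is a hypothetical helper, and the usage line assumes `draft_job` is an Azure ML job entity whose status can be read back with `ml_client.jobs.get(...)`.

```python
import time

# Job states treated as final in this sketch.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=60.0, timeout_seconds=None, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, then return that state."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage in this notebook (assumption: draft_job is an azure.ai.ml job entity):
# final_status = wait_for_job(lambda: ml_client.jobs.get(draft_job.name).status)
```

`ml_client.jobs.stream(draft_job.name)` is an alternative that blocks while streaming the job logs to the notebook output.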

## 4. Prepare Combined Model for Deployment

In [None]:
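# Download the draft and base models, combine them, and register the result for deployment.
# The draft training job from section 3 must have completed before its outputs can be downloaded.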
combined_model = prepare_combined_model_for_deployment(
    ml_client=ml_client,
    draft_job_name=draft_job.name,
    base_model_hf_id="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    model_name="grpo-speculative-decoding",
)

## 5. Deploy Speculative Decoding Endpoint

In [None]:
# Deploy managed online endpoint with speculative decoding
endpoint_name = deploy_speculative_decoding_endpoint(
    ml_client=ml_client,
    combined_model=combined_model,
    instance_type="Standard_NC24ads_A100_v4"
)
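
To make the draft/target interaction concrete, here is a toy, greedy version of the speculative decoding loop: a cheap draft proposes several tokens, the target checks them, and the longest matching prefix is kept. It is a sketch under simplifying assumptions, not what the endpoint runs. The two "models" are hypothetical integer functions, verification is done token by token instead of in one batched forward pass, and EAGLE3 itself drafts from the target model's hidden features rather than from a standalone token-level model.

```python
def greedy_decode(next_token, prompt, n_new):
    """Plain autoregressive decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, n_new, k=4):
    """Greedy speculative decoding: draft proposes up to k tokens, target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The draft proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if tok != expected:
                # 3. First mismatch: keep the accepted prefix plus the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every proposed token matched the target.
            tokens.extend(proposal)
    return tokens


def target_next(tokens):
    # Hypothetical "large" model: a deterministic rule over the last token.
    return (tokens[-1] * 3 + 1) % 7


def draft_next(tokens):
    # Hypothetical "small" model: agrees with the target except after token 5.
    return 0 if tokens[-1] == 5 else target_next(tokens)


reference = greedy_decode(target_next, [1], 12)
speculative = speculative_decode(target_next, draft_next, [1], 12, k=4)
print(speculative == reference)
```

In this greedy setting the output is identical to decoding with the target alone; the draft only changes how many target passes are needed. The gain therefore depends on how often the draft's proposals are accepted.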

## 6. Test Deployment

In [None]:
# Test the deployed endpoint with a financial reasoning question
result = test_deployment(ml_client, endpoint_name)
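
A single request shows that the endpoint responds, but a speedup claim needs a comparison against a baseline deployment without the draft model. The sketch below shows one way to summarize such a comparison as a ratio of median throughput. The `speedup` helper and the timings are placeholders for illustration, not measurements from this notebook, and how `test_deployment` reports latency is not shown here.

```python
from statistics import median


def tokens_per_second(n_tokens, seconds):
    """Throughput of one request."""
    return n_tokens / seconds


def speedup(baseline_runs, speculative_runs):
    """Ratio of median throughput; each run is (generated_tokens, seconds)."""
    base = median(tokens_per_second(n, s) for n, s in baseline_runs)
    spec = median(tokens_per_second(n, s) for n, s in speculative_runs)
    return spec / base


# Placeholder timings (not real measurements): same prompts, same output length.
baseline_runs = [(256, 8.0), (256, 8.0), (256, 8.0)]
speculative_runs = [(256, 4.0), (256, 4.0), (256, 4.0)]
ratio = speedup(baseline_runs, speculative_runs)
print(f"speedup: {ratio:.2f}x")
```

Using the median over several requests makes the comparison less sensitive to a single slow call, such as the first request after the endpoint starts.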

## 7. Cleanup (Optional)

In [None]:
# Uncomment to delete endpoint and free up resources
# ml_client.online_endpoints.begin_delete(name=endpoint_name).wait()
# print(f"✓ Endpoint deleted: {endpoint_name}")

## Summary

This simplified notebook demonstrates the complete workflow in **~30 lines of code**:

1. ✅ **Setup**: Connected to Azure ML workspace and registry
2. ✅ **RL Training**: Trained GRPO model on FinQA dataset
3. ✅ **Draft Model**: Created EAGLE3 draft model for speculative decoding
4. ✅ **Model Preparation**: Combined base and draft models
5. ✅ **Deployment**: Deployed speculative decoding endpoint
6. ✅ **Testing**: Queried the deployed endpoint to check for the expected 2-3x faster inference

All implementation details are abstracted in `rl_spec_dec_utils.py`.