# LMEval Custom Resource Generation Examples

This notebook demonstrates how to generate LMEval Custom Resources (CRs) without deploying them to Kubernetes.

Start by installing the necessary dependencies:

```bash
pip install llama-stack-provider-lmeval
```

In [None]:
# Setup and imports
import json
import sys
import os
from typing import Optional, Dict, Any, List

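# Make a local ./src checkout importable; os.path.dirname('.') is an empty string, so use the working directory
sys.path.insert(0, os.path.join(os.getcwd(), 'src'))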

from llama_stack_provider_lmeval.lmeval import (
    LMEvalCRBuilder, LMEvalCR, LMEvalSpec, LMEvalMetadata, TaskList, ModelArg
)

def dict_to_yaml(d: Dict[Any, Any], indent: int = 0) -> str:
    """Convert a dictionary to a YAML-like string for display (string values are not quoted)."""
    yaml_str = ""
    for key, value in d.items():
        if isinstance(value, dict):
            yaml_str += "  " * indent + f"{key}:\n"
            yaml_str += dict_to_yaml(value, indent + 1)
        elif isinstance(value, list):
            yaml_str += "  " * indent + f"{key}:\n"
            for item in value:
                if isinstance(item, dict):
                    yaml_str += "  " * (indent + 1) + "-\n"
                    yaml_str += dict_to_yaml(item, indent + 2)
                else:
                    yaml_str += "  " * (indent + 1) + f"- {item}\n"
        elif isinstance(value, bool):
            yaml_str += "  " * indent + f"{key}: {str(value).lower()}\n"
        elif value is None:
            yaml_str += "  " * indent + f"{key}: null\n"
        else:
            yaml_str += "  " * indent + f"{key}: {value}\n"
    return yaml_str

def display_cr_yaml(cr_dict: Dict[Any, Any]) -> None:
    """Display CR as YAML with nice formatting."""
    yaml_output = dict_to_yaml(cr_dict)
    print("```yaml")
    print(yaml_output.rstrip())
    print("```")
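
One caveat with the helper above: it prints string values without quotes, so a string such as `"1"` appears as `1`, which a YAML parser would read back as an integer. Below is a minimal sketch of a variant that keeps the distinction, assuming it is acceptable to emit scalars in JSON form (JSON scalars are also valid YAML). The name `dict_to_typed_yaml` is hypothetical and not part of the provider.

```python
import json
from typing import Any, Dict


def dict_to_typed_yaml(d: Dict[str, Any], indent: int = 0) -> str:
    """Render a nested dict as YAML, emitting scalars in JSON form.

    Strings are always double-quoted, so "1" stays a string and 1 stays an int.
    """
    pad = "  " * indent
    lines = []
    for key, value in d.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.append(dict_to_typed_yaml(value, indent + 1))
        elif isinstance(value, list) and value:
            lines.append(f"{pad}{key}:")
            for item in value:
                if isinstance(item, dict) and item:
                    lines.append(f"{pad}  -")
                    lines.append(dict_to_typed_yaml(item, indent + 2))
                else:
                    lines.append(f"{pad}  - {json.dumps(item)}")
        else:
            # json.dumps yields "text", numbers, true/false, null, {} or []
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return "\n".join(lines)


# Hand-written fragment shaped like part of an LMEval spec
print(dict_to_typed_yaml({"spec": {"batchSize": "1", "logSamples": True, "limit": None}}))
```

The trade-off is that every string is quoted, including names and URLs, which is noisier than the hand-written expected YAML but unambiguous.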


In [2]:
# Method 1: Using the existing LMEvalCRBuilder
# This reuses the provider's LMEvalCRBuilder, the same class that builds CRs in production

def create_mock_benchmark_config():
    """Create a mock benchmark config for testing purposes."""
    class MockEvalCandidate:
        def __init__(self):
            self.type = "model"
            self.model = "microsoft/Phi-3-mini-4k-instruct"
            self.sampling_params = {}

    class MockBenchmarkConfig:
        def __init__(self):
            self.eval_candidate = MockEvalCandidate()
            self.dataset = {"identifier": "dk_bench"}
            self.scoring_params = {}
            self.env_vars = []
            self.metadata = {}
    return MockBenchmarkConfig()

# Create the CR builder
namespace = "lmeval-demo"
cr_builder = LMEvalCRBuilder(namespace=namespace)
mock_config = create_mock_benchmark_config()

print(f"LMEvalCRBuilder initialised for namespace: {namespace}")


LMEvalCRBuilder initialised for namespace: lmeval-demo


## Example 1: Basic LMEval CR

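This generates a standard evaluation CR for the Phi-3 model with dk_bench tasks. The expected output below is illustrative: in the recorded run the builder generates its own job name (`lmeval-llama-stack-job-<suffix>`) and lists the `benchmark_id` under `taskNames`, and the display helper prints string values such as `"1"` without quotes and unset fields as `null`.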

### Expected YAML Output:
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: basic-evaluation-001
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
    - dk_bench
  logSamples: true
  batchSize: "1"
  modelArgs:
  - name: model
    value: microsoft/Phi-3-mini-4k-instruct
  - name: num_concurrent
    value: "1"
```


In [3]:
print("🔄 Generating Basic LMEval CR...")

# Generate basic CR
basic_cr = cr_builder.create_cr(
    benchmark_id="basic-evaluation-001",
    task_config=mock_config,
    base_url=None,
    limit=None
)

print("📋 Basic LMEval CR (YAML format):")
print("=" * 50)
display_cr_yaml(basic_cr)


🔄 Generating Basic LMEval CR...


📋 Basic LMEval CR (YAML format):
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: lmeval-llama-stack-job-8d14ea58
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
      - basic-evaluation-001
  logSamples: true
  batchSize: 1
  limit: null
  modelArgs:
    -
      name: model
      value: microsoft/Phi-3-mini-4k-instruct
    -
      name: num_concurrent
      value: 1
  pod: null
  offline: null
```
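
The generated output includes unset fields (`limit`, `pod`, `offline`) as `null`, whereas the expected YAML omits them. Below is a minimal sketch of a helper that drops `None` values before display. `prune_none` is a hypothetical name, not part of the provider, and the sample dictionary is hand-written to resemble the spec above.

```python
from typing import Any


def prune_none(value: Any) -> Any:
    """Return a copy of value with None-valued dict entries removed, recursively."""
    if isinstance(value, dict):
        return {k: prune_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [prune_none(item) for item in value]
    return value


# Hand-written dict shaped like the generated spec
spec = {
    "batchSize": "1",
    "limit": None,
    "modelArgs": [{"name": "model", "value": "microsoft/Phi-3-mini-4k-instruct"}],
    "pod": None,
    "offline": None,
}
cleaned = prune_none(spec)
print(sorted(cleaned))
```

In this notebook it could be applied as `display_cr_yaml(prune_none(basic_cr))`. When building from the Pydantic models directly, Pydantic v2's `model_dump(exclude_none=True)` gives a similar result.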


## Example 2: External Model Service CR

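This generates a CR that points to an external model service, such as a deployed Phi-3 predictor. The builder appends the completions path (`/v1/openai/v1/completions`) to the `base_url` you pass in.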

### Expected YAML Output:
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: external-model-evaluation-001
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
    - dk_bench
  logSamples: true
  batchSize: "1"
  modelArgs:
  - name: model
    value: microsoft/Phi-3-mini-4k-instruct
  - name: base_url
    value: https://phi-3-predictor-example.apps.cluster.local/v1/openai/v1/completions
  - name: num_concurrent
    value: "1"
```


In [4]:
print("Generating External Model Service CR...")

# Generate CR with external model service
external_cr = cr_builder.create_cr(
    benchmark_id="external-model-evaluation-001",
    task_config=mock_config,
    base_url="https://phi-3-predictor-example.apps.cluster.local",
    limit=None
)

print("External Model Service CR (YAML format):")
print("=" * 50)
display_cr_yaml(external_cr)


Generating External Model Service CR...


External Model Service CR (YAML format):
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: lmeval-llama-stack-job-32373d38
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
      - external-model-evaluation-001
  logSamples: true
  batchSize: 1
  limit: null
  modelArgs:
    -
      name: model
      value: microsoft/Phi-3-mini-4k-instruct
    -
      name: base_url
      value: https://phi-3-predictor-example.apps.cluster.local/v1/openai/v1/completions
    -
      name: num_concurrent
      value: 1
  pod: null
  offline: null
```
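
Comparing the `base_url` passed to `create_cr` with the generated `modelArgs` entry, the builder appears to append `/v1/openai/v1/completions` to the service root. Below is a sketch of that normalisation under that assumption. The suffix is read from the output above, and `completions_url` is a hypothetical helper, not the builder's actual code.

```python
# Assumed suffix, taken from the generated output above
COMPLETIONS_SUFFIX = "/v1/openai/v1/completions"


def completions_url(base_url: str) -> str:
    """Append the completions path to a service root without duplicating it."""
    root = base_url.rstrip("/")
    if root.endswith(COMPLETIONS_SUFFIX):
        return root
    return root + COMPLETIONS_SUFFIX


print(completions_url("https://phi-3-predictor-example.apps.cluster.local/"))
```

Stripping the trailing slash first avoids a double slash in the result, and the `endswith` check makes the helper safe to call twice on the same value.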


## Example 3: Limited Sample Evaluation CR

This generates a CR with a limit on the number of samples to evaluate.

### Expected YAML Output:
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: limited-evaluation-001
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
    - dk_bench
  logSamples: true
  batchSize: "1"
  limit: "50"
  modelArgs:
  - name: model
    value: microsoft/Phi-3-mini-4k-instruct
  - name: num_concurrent
    value: "1"
```


In [5]:
print("Generating Limited Sample Evaluation CR...")

# Generate CR with sample limit
limited_cr = cr_builder.create_cr(
    benchmark_id="limited-evaluation-001",
    task_config=mock_config,
    base_url=None,
    limit="50"  # Only evaluate 50 samples
)

print("Limited Sample Evaluation CR (YAML format):")
print("=" * 50)
display_cr_yaml(limited_cr)


Generating Limited Sample Evaluation CR...


Limited Sample Evaluation CR (YAML format):
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: lmeval-llama-stack-job-2260943c
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
      - limited-evaluation-001
  logSamples: true
  batchSize: 1
  limit: 50
  modelArgs:
    -
      name: model
      value: microsoft/Phi-3-mini-4k-instruct
    -
      name: num_concurrent
      value: 1
  pod: null
  offline: null
```
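
The example passes `limit` as the string `"50"`, and the expected YAML also shows it as a string. Below is a minimal sketch of a helper that normalises a limit before it is handed to `create_cr`, assuming the field should be a string holding a positive number. `normalize_limit` is hypothetical, and the validation rule is an assumption rather than the provider's documented behaviour.

```python
from typing import Optional, Union


def normalize_limit(limit: Union[int, float, str, None]) -> Optional[str]:
    """Return limit as a string, or None when no limit is set."""
    if limit is None:
        return None
    number = float(limit)  # raises ValueError for non-numeric strings
    if number <= 0:
        raise ValueError(f"limit must be positive, got {limit!r}")
    # Keep whole numbers free of a trailing ".0"
    return str(int(number)) if number.is_integer() else str(number)


for raw in (50, "50", 0.5, None):
    print(repr(raw), "->", repr(normalize_limit(raw)))
```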


## Method 2: Using Pydantic Models Directly

This shows how to build CRs from the Pydantic models directly, including configurations such as offline storage and custom tasks.

In [6]:
# Helper function for direct CR construction using Pydantic models
def create_basic_cr_dict(
    name: str = "example-lmeval",
    namespace: str = "default",
    model_name: str = "microsoft/Phi-3-mini-4k-instruct",
    task_names: Optional[List[str]] = None,
    base_url: Optional[str] = None,
    limit: Optional[str] = None
) -> Dict[str, Any]:
    """Create a basic LMEval CR dictionary structure."""
    if task_names is None:
        task_names = ["dk_bench"]

    # Create model args
    model_args = [ModelArg(name="model", value=model_name)]
    if base_url:
        base_url = base_url.rstrip("/")
        openai_base_url = f"{base_url}/v1"
        model_args.append(ModelArg(name="base_url", value=openai_base_url))
    model_args.append(ModelArg(name="num_concurrent", value="1"))

    # Create task list
    task_list = TaskList(taskNames=task_names)

    # Create spec
    spec = LMEvalSpec(
        model="local-completions",
        taskList=task_list,
        logSamples=True,
        batchSize="1",
        limit=limit,
        modelArgs=model_args
    )

    # Create metadata
    metadata = LMEvalMetadata(name=name, namespace=namespace)

    # Create the full CR
    cr = LMEvalCR(metadata=metadata, spec=spec)

    return cr.model_dump()

## Example 4: CR with Offline Storage

This creates a CR configured to use persistent storage for offline evaluation data.

### Expected YAML Output:
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: offline-evaluation
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
    - dk_bench
  logSamples: true
  batchSize: "1"
  modelArgs:
  - name: model
    value: microsoft/Phi-3-mini-4k-instruct
  - name: num_concurrent
    value: "1"
  offline:
    storage:
      pvcName: lmeval-storage-pvc
```


In [7]:
print("Generating CR with Offline Storage...")

# Create CR with offline storage
offline_cr = create_basic_cr_dict(
    name="offline-evaluation",
    namespace="lmeval-demo",
    model_name="microsoft/Phi-3-mini-4k-instruct",
    task_names=["dk_bench"]
)

# Add offline storage configuration
offline_cr["spec"]["offline"] = {"storage": {"pvcName": "lmeval-storage-pvc"}}

print("CR with Offline Storage (YAML format):")
print("=" * 50)
display_cr_yaml(offline_cr)

print("\nNote: The offline.storage.pvcName references a PersistentVolumeClaim")


Generating CR with Offline Storage...
CR with Offline Storage (YAML format):
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: offline-evaluation
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
      - dk_bench
  logSamples: true
  batchSize: 1
  limit: null
  modelArgs:
    -
      name: model
      value: microsoft/Phi-3-mini-4k-instruct
    -
      name: num_concurrent
      value: 1
  pod: null
  offline:
    storage:
      pvcName: lmeval-storage-pvc
```

Note: The offline.storage.pvcName references a PersistentVolumeClaim
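
The cell above mutates the CR dictionary in place. Below is a sketch of a small helper that returns a modified copy instead and checks the claim name first, assuming the PVC name must be a valid Kubernetes DNS subdomain name (lowercase alphanumerics, `-` and `.`, at most 253 characters). `with_offline_storage` is a hypothetical name and not part of the provider.

```python
import copy
import re
from typing import Any, Dict

# DNS-1123 subdomain pattern, the naming rule Kubernetes applies to most object names
_DNS_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def with_offline_storage(cr: Dict[str, Any], pvc_name: str) -> Dict[str, Any]:
    """Return a copy of cr with spec.offline.storage.pvcName set."""
    if len(pvc_name) > 253 or not _DNS_SUBDOMAIN.fullmatch(pvc_name):
        raise ValueError(f"invalid PVC name: {pvc_name!r}")
    updated = copy.deepcopy(cr)  # leave the caller's dictionary untouched
    updated.setdefault("spec", {})["offline"] = {"storage": {"pvcName": pvc_name}}
    return updated


# Hand-written stand-in for a generated CR
base = {"spec": {"model": "local-completions", "offline": None}}
offline = with_offline_storage(base, "lmeval-storage-pvc")
print(offline["spec"]["offline"])
```

Returning a copy keeps the base CR reusable for other variants, which suits a notebook where cells may be re-run in any order.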


## Example 5: CR with Custom Tasks from Git

This creates a CR that loads custom evaluation tasks from a Git repository.

### Expected YAML Output:
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: custom-tasks-evaluation
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
    - custom_task_1
    - custom_task_2
    customTasks:
      source:
        git:
          url: https://github.com/example/custom-tasks.git
          revision: main
          directory: custom_tasks
  logSamples: true
  batchSize: "1"
  modelArgs:
  - name: model
    value: microsoft/Phi-3-mini-4k-instruct
  - name: num_concurrent
    value: "1"
```


In [8]:
print("Generating CR with Custom Tasks from Git...")

# Create CR with custom tasks
custom_cr = create_basic_cr_dict(
    name="custom-tasks-evaluation",
    namespace="lmeval-demo",
    model_name="microsoft/Phi-3-mini-4k-instruct",
    task_names=["custom_task_1", "custom_task_2"]
)

# Add custom tasks configuration
custom_cr["spec"]["taskList"]["customTasks"] = {
    "source": {
        "git": {
            "url": "https://github.com/example/custom-tasks.git",
            "revision": "main",
            "directory": "custom_tasks"
        }
    }
}

print("CR with Custom Tasks from Git (YAML format):")
print("=" * 50)
display_cr_yaml(custom_cr)

print("\nNote: Custom tasks are loaded from the specified Git repository")


Generating CR with Custom Tasks from Git...
CR with Custom Tasks from Git (YAML format):
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: custom-tasks-evaluation
  namespace: lmeval-demo
spec:
  allowOnline: true
  allowCodeExecution: true
  model: local-completions
  taskList:
    taskNames:
      - custom_task_1
      - custom_task_2
    customTasks:
      source:
        git:
          url: https://github.com/example/custom-tasks.git
          revision: main
          directory: custom_tasks
  logSamples: true
  batchSize: 1
  limit: null
  modelArgs:
    -
      name: model
      value: microsoft/Phi-3-mini-4k-instruct
    -
      name: num_concurrent
      value: 1
  pod: null
  offline: null
```

Note: Custom tasks are loaded from the specified Git repository
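
Since the goal is to generate CRs without deploying them, it can be useful to save a generated CR for later review. Below is a minimal sketch that writes a CR dictionary to a JSON file using only the standard library. `save_cr`, the sample dictionary, and the file name are hypothetical. Kubernetes manifests can be written in JSON as well as YAML, so a file like this could later be passed to `kubectl apply -f` if you decide to deploy.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_cr(cr: Dict[str, Any], path: str) -> Path:
    """Write a CR dictionary to path as indented JSON and return the path."""
    target = Path(path)
    target.write_text(json.dumps(cr, indent=2) + "\n", encoding="utf-8")
    return target


# Hand-written stand-in for a generated CR
example_cr = {
    "apiVersion": "trustyai.opendatahub.io/v1alpha1",
    "kind": "LMEvalJob",
    "metadata": {"name": "custom-tasks-evaluation", "namespace": "lmeval-demo"},
    "spec": {"model": "local-completions", "batchSize": "1"},
}

# Write to a temporary directory and read the file back to check the round trip
with tempfile.TemporaryDirectory() as tmp:
    saved = save_cr(example_cr, f"{tmp}/custom-tasks-evaluation.json")
    reloaded = json.loads(saved.read_text(encoding="utf-8"))

print(reloaded["kind"])
```

JSON is used here because it needs no extra dependency and, unlike the display helper, keeps string values such as `"1"` quoted.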
