# Pipeline Experimentation: OpenAI vs Cerebras

This notebook runs the same pipeline with two executors and compares their behavior:
- **Cerebras**: Text-only processing
- **OpenAI**: Text and image processing

## Setup

Make sure you have the required environment variables set:
```bash
CEREBRAS_API_KEY="your_cerebras_key"
CEREBRAS_API_URL=https://api.cerebras.ai/v1/completions
OPENAI_API_KEY="your_openai_key"
```

In [2]:
import os
import json
from llm_pipeline.pipeline import Pipeline
from llm_pipeline.registry.template_registry import TemplateRegistry
from llm_pipeline.config.models import PipelineConfig, ExecutorType
from llm_pipeline.agents.default_agent import DefaultAgent

# Check API keys
print("API Key Status:")
print(f"CEREBRAS_API_KEY: {'✅ Set' if os.getenv('CEREBRAS_API_KEY') else '❌ Missing'}")
print(f"OPENAI_API_KEY: {'✅ Set' if os.getenv('OPENAI_API_KEY') else '❌ Missing'}")

API Key Status:
CEREBRAS_API_KEY: ✅ Set
OPENAI_API_KEY: ✅ Set


## Initialize Pipeline Components


In [4]:
# Set up registry and agent
registry = TemplateRegistry(base_dir="templates")
agent = DefaultAgent()
pipeline = Pipeline(registry, agent)

# Common output schema for consistency
output_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "reasoning": {"type": "string"}
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False
}

print("Pipeline components initialized successfully!")


Pipeline components initialized successfully!
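
To make the schema's constraints concrete, here is a minimal, standard-library-only sketch of the checks it implies. `check_answer` is a hypothetical helper written for illustration. It is not part of `llm_pipeline`, whose own validation may work differently (for example, through a full JSON Schema validator).

```python
# Hypothetical helper, not part of llm_pipeline: hand-rolled checks that
# mirror the output_schema defined above.
ALLOWED_KEYS = {"answer", "confidence", "reasoning"}
REQUIRED_KEYS = {"answer", "confidence"}


def check_answer(payload):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    keys = set(payload)
    # "required": every listed key must be present.
    for key in sorted(REQUIRED_KEYS - keys):
        errors.append(f"missing required key: {key}")
    # "additionalProperties": False means no keys beyond the declared ones.
    for key in sorted(keys - ALLOWED_KEYS):
        errors.append(f"unexpected key: {key}")
    for key in ("answer", "reasoning"):
        if key in payload and not isinstance(payload[key], str):
            errors.append(f"{key} must be a string")
    if "confidence" in payload:
        value = payload["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("confidence must be a number")
        elif not 0 <= value <= 1:
            errors.append("confidence must be between 0 and 1")
    return errors


print(check_answer({"answer": "Chlorophyll", "confidence": 0.95}))
print(check_answer({"answer": "Chlorophyll", "confidence": 1.5, "extra": True}))
```

The first call describes a payload that satisfies the schema; the second violates both the `confidence` range and `additionalProperties`.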


## Test Data Setup

In [6]:
from base64 import b64encode
import requests

# Test inputs
text_input = {
    "context": "Photosynthesis is the process by which green plants convert sunlight into chemical energy using chlorophyll.",
    "question": "What is photosynthesis and what molecule is essential for this process?",
    "template_name": "v1"
}

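# Download a test image (a photo of a halved red apple) and base64-encode it
image_url = "https://www.shutterstock.com/image-photo/red-apple-cut-half-water-600nw-2532255795.jpg"
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
test_image_b64 = b64encode(response.content).decode("utf-8")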

image_input = {
    "context": "You are analyzing an image to provide insights about its content.",
    "question": "What do you see in this image? Describe any colors, shapes, or patterns.",
    "template_name": "v1",
    "images": [test_image_b64]
}

print("Test data prepared:")
print(f"- Text input: {len(text_input['context'])} chars of context")
print(f"- Image input: {len(image_input['images'])} image(s), {len(test_image_b64)} chars base64")




Test data prepared:
- Text input: 108 chars of context
- Image input: 1 image(s), 56604 chars base64
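
Downloading the test image makes this cell depend on network access and on the URL staying available. As a minimal sketch of an alternative, assuming the pipeline accepts any base64-encoded image string the way it does above, the image can be read from a local file instead. `encode_image_file`, `base64_length`, and the file name in the comment are hypothetical.

```python
from base64 import b64encode
from pathlib import Path


def encode_image_file(path):
    """Read a local image file and return its contents as a base64 string."""
    return b64encode(Path(path).read_bytes()).decode("utf-8")


def base64_length(num_bytes):
    """Length of the padded base64 encoding of num_bytes raw bytes."""
    return 4 * ((num_bytes + 2) // 3)


# Hypothetical usage with a local copy of the test image:
# image_input["images"] = [encode_image_file("apple.jpg")]
```

Base64 turns every 3 bytes into 4 characters, so the 56604 characters reported above correspond to roughly 42 KB of raw image data.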


## Experiment 1: Cerebras Text Processing


In [8]:
# Cerebras configuration
cerebras_config = PipelineConfig(
    default_model="llama3.1-8b",
    default_executor=ExecutorType.CEREBRAS,
    template_namespace="templates",
    timeout_seconds=30,
    max_output_tokens=512,
    json_retry_attempts=2,
    output_schema=output_schema
)

print("🧠 Testing Cerebras with text input...")
try:
    cerebras_result = pipeline.run(text_input, cerebras_config)
    
    print("\n✅ Cerebras Result:")
    print(f"Model: {cerebras_result['model']}")
    print(f"Template: {cerebras_result['template']['name']}")
    print(f"Validation: {cerebras_result['validation']}")
    print(f"Usage: {cerebras_result['usage']}")
    print(f"Answer: {json.dumps(cerebras_result['answer'], indent=2)}")
    
except Exception as e:
    print(f"❌ Cerebras test failed: {e}")


🧠 Testing Cerebras with text input...

✅ Cerebras Result:
Model: llama3.1-8b
Template: v1
Validation: {'valid': True, 'errors': []}
Usage: {'prompt_tokens': 216, 'completion_tokens': 512, 'total_tokens': 728, 'prompt_tokens_details': {'cached_tokens': 0}}
Answer: {
  "answer": "Photons are being absorbed by plant cells. The energy is used to convert carbon dioxide and water into glucose and oxygen. Chlorophyll is the green pigment found on leaves and is important in this process as it absorbs light and transfers it to the plant's cells.",
  "confidence": 0.9,
  "reasoning": "Photosynthesis is the process by which green plants convert sunlight into chemical energy. Chlorophyll is the green pigment found on leaves and is important in this process as it absorbs light and transfers it to the plant's cells."
}
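
The run above reports `completion_tokens: 512`, exactly the configured `max_output_tokens`. That suggests the generation may have stopped at the token limit, even though the parsed answer passed validation. Below is a minimal sketch of a check for this, assuming only the `usage` dictionary shape printed above; `hit_token_limit` is a hypothetical helper, not a pipeline function.

```python
def hit_token_limit(usage, max_output_tokens):
    """Return True if the completion used up the whole output-token budget."""
    return usage.get("completion_tokens", 0) >= max_output_tokens


# Usage values copied from the Cerebras output above.
cerebras_usage = {"prompt_tokens": 216, "completion_tokens": 512, "total_tokens": 728}

if hit_token_limit(cerebras_usage, max_output_tokens=512):
    print("Completion reached max_output_tokens; the output may be truncated.")
```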


## Experiment 2: OpenAI Text Processing


In [9]:
# OpenAI configuration for text
openai_text_config = PipelineConfig(
    default_model="gpt-4o-mini",
    default_executor=ExecutorType.OPENAI,
    template_namespace="templates",
    timeout_seconds=30,
    max_output_tokens=512,
    json_retry_attempts=2,
    output_schema=output_schema
)

print("🤖 Testing OpenAI with text input...")
try:
    openai_text_result = pipeline.run(text_input, openai_text_config)
    
    print("\n✅ OpenAI Text Result:")
    print(f"Model: {openai_text_result['model']}")
    print(f"Template: {openai_text_result['template']['name']}")
    print(f"Validation: {openai_text_result['validation']}")
    print(f"Usage: {openai_text_result['usage']}")
    print(f"Answer: {json.dumps(openai_text_result['answer'], indent=2)}")
    
except Exception as e:
    print(f"❌ OpenAI text test failed: {e}")


🤖 Testing OpenAI with text input...

✅ OpenAI Text Result:
Model: gpt-4o-mini
Template: v1
Validation: {'valid': True, 'errors': []}
Usage: {'prompt_tokens': 221, 'completion_tokens': 69, 'total_tokens': 290}
Answer: {
  "answer": "Photosynthesis is the process by which green plants convert sunlight into chemical energy using chlorophyll. The essential molecule for this process is chlorophyll.",
  "confidence": 0.95,
  "reasoning": "The definition and the essential role of chlorophyll in photosynthesis are well-established concepts in biology."
}


## Experiment 3: OpenAI Vision Processing


In [10]:
# OpenAI configuration for vision
openai_vision_config = PipelineConfig(
    default_model="gpt-4o-mini",  # Vision-capable model
    default_executor=ExecutorType.OPENAI,
    template_namespace="templates",
    timeout_seconds=45,  # Longer timeout for vision
    max_output_tokens=512,
    json_retry_attempts=2,
    output_schema=output_schema
)

print("👁️ Testing OpenAI with image input...")
try:
    openai_vision_result = pipeline.run(image_input, openai_vision_config)
    
    print("\n✅ OpenAI Vision Result:")
    print(f"Model: {openai_vision_result['model']}")
    print(f"Template: {openai_vision_result['template']['name']}")
    print(f"Validation: {openai_vision_result['validation']}")
    print(f"Usage: {openai_vision_result['usage']}")
    print(f"Answer: {json.dumps(openai_vision_result['answer'], indent=2)}")
    
except Exception as e:
    print(f"❌ OpenAI vision test failed: {e}")


👁️ Testing OpenAI with image input...

✅ OpenAI Vision Result:
Model: gpt-4o-mini
Template: v1
Validation: {'valid': True, 'errors': []}
Usage: {'prompt_tokens': 14386, 'completion_tokens': 61, 'total_tokens': 14447}
Answer: {
  "answer": "Red apple with a visible cut showing its white interior and seeds, accompanied by green leaves.",
  "confidence": 0.9,
  "reasoning": "The description is based on the common appearance of apples and the specific colors and shapes typically associated with them."
}
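
The three successful runs differ mostly in token usage. The sketch below tabulates the `usage` values printed in Experiments 1 to 3, copied here as literals so that it runs on its own.

```python
# Usage figures copied from the outputs of Experiments 1-3.
runs = {
    "Cerebras text": {"prompt_tokens": 216, "completion_tokens": 512, "total_tokens": 728},
    "OpenAI text": {"prompt_tokens": 221, "completion_tokens": 69, "total_tokens": 290},
    "OpenAI vision": {"prompt_tokens": 14386, "completion_tokens": 61, "total_tokens": 14447},
}

print(f"{'run':<15}{'prompt':>8}{'completion':>12}{'total':>8}")
for name, usage in runs.items():
    print(
        f"{name:<15}{usage['prompt_tokens']:>8}"
        f"{usage['completion_tokens']:>12}{usage['total_tokens']:>8}"
    )

# Rough prompt-token overhead of the image: vision prompt minus text prompt.
# The two prompts also differ in wording, so treat this as an estimate.
image_overhead = runs["OpenAI vision"]["prompt_tokens"] - runs["OpenAI text"]["prompt_tokens"]
print(f"Approximate image overhead: {image_overhead} prompt tokens")
```

Nearly all of the vision run's prompt tokens come from the image, which is worth keeping in mind when estimating cost.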


## Experiment 4: Error Handling - Images with Cerebras


In [11]:
print("🚫 Testing Cerebras with image input (should fail gracefully)...")
try:
    cerebras_image_result = pipeline.run(image_input, cerebras_config)
    print("❓ Unexpected: Cerebras accepted images")
    print(f"Result: {json.dumps(cerebras_image_result['answer'], indent=2)}")
    
except ValueError as e:
    print(f"✅ Expected error: {e}")
except Exception as e:
    print(f"❌ Unexpected error type: {type(e).__name__}: {e}")


🚫 Testing Cerebras with image input (should fail gracefully)...
✅ Expected error: The 'images' parameter is not supported for Cerebras API.
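
Rather than relying on the executor to reject the request, a caller could inspect the payload first. The following is a minimal sketch that assumes only what the experiments above show: Cerebras rejects `images`, and OpenAI accepts them. `choose_executor` and the plain string names are hypothetical stand-ins; in this notebook they would map to `ExecutorType` values and the matching `PipelineConfig`.

```python
# Hypothetical pre-flight check. Executor names are plain strings here so the
# sketch runs without llm_pipeline installed.
IMAGE_CAPABLE = {"openai"}


def choose_executor(payload, preferred="cerebras", fallback="openai"):
    """Return the preferred executor unless the payload has images it cannot handle."""
    has_images = bool(payload.get("images"))
    if has_images and preferred not in IMAGE_CAPABLE:
        return fallback
    return preferred


print(choose_executor({"question": "What is photosynthesis?"}))
print(choose_executor({"question": "What do you see?", "images": ["<base64 data>"]}))
```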
