# LSM-005: Prompt Hub and Version Control

## 🎯 Learning Objectives

By the end of this notebook, you will:
- Master the LangSmith Prompt Hub for collaborative prompt development
- Implement prompt versioning and rollback strategies
- Build systematic prompt optimization workflows
- Create prompt templates with dynamic parameters
- Set up team collaboration workflows for prompt engineering
- Use A/B testing for prompt optimization
- Implement prompt performance monitoring and analytics

## 🛠️ Setup and Dependencies

Let's start by setting up our prompt engineering environment.

In [None]:
# Install required packages for prompt engineering
!pip install langsmith langchain langchain-openai langchainhub
!pip install python-dotenv pandas numpy matplotlib seaborn
!pip install jinja2 pydantic typing-extensions

In [None]:
import os
import json
import time
from typing import List, Dict, Any, Optional, Union
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from jinja2 import Template
from pydantic import BaseModel, Field

from dotenv import load_dotenv
from langsmith import Client, traceable
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.prompts import (
    ChatPromptTemplate, 
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    MessagesPlaceholder,
    PromptTemplate
)
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser

# Load environment variables
load_dotenv()

# Initialize clients
client = Client()
llm = ChatOpenAI(temperature=0.1, model="gpt-3.5-turbo")

print("✅ Prompt Engineering environment ready")
print(f"📊 Project: {os.getenv('LANGSMITH_PROJECT', 'Not set')}")

## 🎨 Advanced Prompt Design Patterns

Let's start by exploring advanced prompt design patterns and templates.

In [None]:
# Advanced prompt design patterns

class PromptPattern(BaseModel):
    """Structured prompt pattern definition"""
    name: str = Field(description="Pattern name")
    description: str = Field(description="Pattern description")
    template: str = Field(description="Prompt template")
    variables: List[str] = Field(description="Required variables")
    use_cases: List[str] = Field(description="Common use cases")
    examples: List[Dict[str, str]] = Field(description="Example inputs/outputs")

class AdvancedPromptPatterns:
    """Collection of advanced prompt engineering patterns"""
    
    def __init__(self):
        self.patterns = self._initialize_patterns()
    
    def _initialize_patterns(self) -> Dict[str, PromptPattern]:
        """Initialize collection of prompt patterns"""
        patterns = {}
        
        # Chain of Thought Pattern
        patterns["chain_of_thought"] = PromptPattern(
            name="Chain of Thought",
            description="Step-by-step reasoning pattern for complex problems",
            template="""Let's work through this step-by-step.

Problem: {problem}

Please provide your reasoning in clear steps:
Step 1: [First step of reasoning]
Step 2: [Second step of reasoning]
...
Final Answer: [Your conclusion]

Remember to show your work and explain each step clearly.""",
            variables=["problem"],
            use_cases=["Mathematical problems", "Logical reasoning", "Complex analysis"],
            examples=[
                {
                    "problem": "If a train travels 60 mph for 2 hours, then 80 mph for 1.5 hours, what is the average speed?",
                    "expected_steps": "Calculate total distance, total time, then average"
                }
            ]
        )
        
        # Few-Shot Learning Pattern
        patterns["few_shot"] = PromptPattern(
            name="Few-Shot Learning",
            description="Learning pattern from examples",
            template="""Here are some examples of {task_description}:

{examples}

Now, please {task_instruction} for the following:

Input: {input}
Output:""",
            variables=["task_description", "examples", "task_instruction", "input"],
            use_cases=["Classification", "Text transformation", "Style adaptation"],
            examples=[
                {
                    "task_description": "sentiment classification",
                    "examples": "Input: I love this! Output: Positive\nInput: This is terrible. Output: Negative",
                    "input": "This product is amazing!"
                }
            ]
        )
        
        # Role-Playing Pattern
        patterns["role_playing"] = PromptPattern(
            name="Role-Playing",
            description="AI assumes a specific role or persona",
            template="""You are {role_description}. {role_context}

Your characteristics:
- {characteristic_1}
- {characteristic_2}
- {characteristic_3}

User Request: {user_request}

Please respond in character, maintaining your role throughout the conversation.""",
            variables=["role_description", "role_context", "characteristic_1", "characteristic_2", "characteristic_3", "user_request"],
            use_cases=["Customer service", "Educational tutoring", "Creative writing"],
            examples=[
                {
                    "role_description": "a helpful Python programming tutor",
                    "characteristic_1": "Patient and encouraging",
                    "user_request": "Help me understand loops"
                }
            ]
        )
        
        # Constraint-Based Pattern
        patterns["constraint_based"] = PromptPattern(
            name="Constraint-Based",
            description="Output must satisfy specific constraints",
            template="""Please {task} following these constraints:

REQUIRED CONSTRAINTS:
{constraints}

ADDITIONAL GUIDELINES:
{guidelines}

Task Input: {input}

Please ensure your response strictly adheres to all constraints listed above.""",
            variables=["task", "constraints", "guidelines", "input"],
            use_cases=["Structured output", "Format compliance", "Content guidelines"],
            examples=[
                {
                    "task": "write a summary",
                    "constraints": "- Exactly 50 words\n- Include 3 key points\n- No technical jargon",
                    "input": "Complex technical document"
                }
            ]
        )
        
        return patterns
    
    def get_pattern(self, pattern_name: str) -> Optional[PromptPattern]:
        """Get a specific prompt pattern"""
        return self.patterns.get(pattern_name)
    
    def list_patterns(self) -> List[str]:
        """List available patterns"""
        return list(self.patterns.keys())
    
    def demonstrate_pattern(self, pattern_name: str, variables: Dict[str, str]):
        """Demonstrate a pattern with provided variables"""
        pattern = self.get_pattern(pattern_name)
        if not pattern:
            print(f"Pattern '{pattern_name}' not found.")
            return
        
        try:
            # Templates use str.format-style {name} placeholders, not Jinja2 {{ name }}
            filled_prompt = pattern.template.format(**variables)
            
            print(f"🎨 Pattern: {pattern.name}")
            print(f"📝 Description: {pattern.description}")
            print(f"🔧 Use Cases: {', '.join(pattern.use_cases)}")
            print("\n📋 Generated Prompt:")
            print("=" * 50)
            print(filled_prompt)
            print("=" * 50)
            
        except KeyError as e:
            print(f"Missing template variable: {e}")
        except Exception as e:
            print(f"Error demonstrating pattern: {e}")

# Initialize prompt patterns
prompt_patterns = AdvancedPromptPatterns()

print("🎨 Advanced Prompt Patterns Library Initialized")
print(f"📚 Available patterns: {', '.join(prompt_patterns.list_patterns())}")

# Demonstrate Chain of Thought pattern
print("\n🧠 Demonstrating Chain of Thought Pattern:")
prompt_patterns.demonstrate_pattern(
    "chain_of_thought",
    {"problem": "A company's revenue increased by 20% in Q1, then decreased by 15% in Q2. If the original revenue was $100,000, what is the revenue after Q2?"}
)
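
Each `PromptPattern` declares its required `variables`, and several of the example entries above supply only some of them. A minimal sketch of a pre-render check follows, assuming the templates use `str.format`-style `{name}` placeholders; the helper names `placeholders` and `missing_variables` are hypothetical and not part of the library above.

```python
from string import Formatter
from typing import Dict, List, Set


def placeholders(template: str) -> Set[str]:
    """Return the named {placeholders} found in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def missing_variables(template: str, variables: Dict[str, str]) -> List[str]:
    """List placeholders the caller has not supplied, sorted for stable output."""
    return sorted(placeholders(template) - set(variables))


# Shortened role-playing template with one variable left out on purpose
template = "You are {role_description}. {role_context}\n\nUser Request: {user_request}"
supplied = {
    "role_description": "a helpful Python programming tutor",
    "user_request": "Help me understand loops",
}

print(missing_variables(template, supplied))  # → ['role_context']
```

Running a check like this before rendering turns a late `KeyError` into an explicit list of what is still needed.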

## 🏗️ Prompt Hub Integration

Now let's explore how to work with the LangSmith Prompt Hub for collaborative prompt development.
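
The manager in the next cell stores prompts as local JSON files for demonstration. For comparison, here is a rough sketch of the same round trip against the hosted hub, assuming a recent `langsmith` SDK in which `Client.push_prompt` and `Client.pull_prompt` are available and `LANGSMITH_API_KEY` is set. The `hub_identifier` helper and the commit hash `abc123` are hypothetical, and `push_and_pull` is defined but not called because it needs network access.

```python
import re
from typing import Optional


def hub_identifier(name: str, commit: Optional[str] = None) -> str:
    """Build a hub-style identifier such as 'customer-service-response:abc123'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{slug}:{commit}" if commit else slug


def push_and_pull(name: str, template: str):
    """Push a template as a new commit, then pull the latest commit back."""
    # Imported lazily so hub_identifier works without these packages installed
    from langchain_core.prompts import ChatPromptTemplate
    from langsmith import Client

    client = Client()
    prompt = ChatPromptTemplate.from_template(template)
    url = client.push_prompt(hub_identifier(name), object=prompt)
    latest = client.pull_prompt(hub_identifier(name))
    return url, latest


print(hub_identifier("Customer Service Response"))
print(hub_identifier("Customer Service Response", commit="abc123"))
```

Pinning a commit in the identifier is what lets a deployment keep using a known prompt version while newer commits are still under review.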

In [None]:
# Prompt Hub integration and management

class PromptHubManager:
    """Manager for LangSmith Prompt Hub operations"""
    
    def __init__(self, client: Client):
        self.client = client
        self.llm = ChatOpenAI(temperature=0.3, model="gpt-3.5-turbo")
    
    def create_prompt_template(self, name: str, template: str, description: str, 
                             variables: List[str], tags: Optional[List[str]] = None) -> Optional[str]:
        """Create a new prompt template in the hub"""
        try:
            # For demonstration purposes, we'll create local prompt templates
            # In actual implementation, you'd use the LangSmith Hub API
            
            prompt_data = {
                "name": name,
                "description": description,
                "template": template,
                "variables": variables,
                "tags": tags or [],
                "created_at": datetime.now().isoformat(),
                "version": "1.0.0"
            }
            
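            # Save locally for demo: the base file always holds the latest version,
            # and a versioned snapshot keeps 1.0.0 loadable after later updates
            slug = name.replace(' ', '_').lower()
            filename = f"prompt_{slug}.json"
            snapshot = f"prompt_{slug}_v{prompt_data['version'].replace('.', '_')}.json"
            for path in (filename, snapshot):
                with open(path, 'w') as f:
                    json.dump(prompt_data, f, indent=2)
            
            print(f"✅ Prompt template '{name}' created: {filename}")
            return filename
            
        except Exception as e:
            print(f"❌ Error creating prompt template: {e}")
            return None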
    
    def version_prompt(self, base_name: str, new_template: str, 
                      version_notes: str, variables: List[str]) -> str:
        """Create a new version of an existing prompt"""
        try:
            base_filename = f"prompt_{base_name.replace(' ', '_').lower()}.json"
            
            # Load base prompt
            try:
                with open(base_filename, 'r') as f:
                    base_prompt = json.load(f)
            except FileNotFoundError:
                print(f"❌ Base prompt '{base_name}' not found")
                return None
            
            # Create new version
            current_version = base_prompt.get("version", "1.0.0")
            version_parts = current_version.split(".")
            new_minor = int(version_parts[1]) + 1
            new_version = f"{version_parts[0]}.{new_minor}.0"
            
            new_prompt = {
                **base_prompt,
                "template": new_template,
                "variables": variables,
                "version": new_version,
                "updated_at": datetime.now().isoformat(),
                "version_notes": version_notes,
                "previous_version": current_version
            }
            
            # Save new version
            new_filename = f"prompt_{base_name.replace(' ', '_').lower()}_v{new_version.replace('.', '_')}.json"
            with open(new_filename, 'w') as f:
                json.dump(new_prompt, f, indent=2)
            
            # Update base prompt
            with open(base_filename, 'w') as f:
                json.dump(new_prompt, f, indent=2)
            
            print(f"✅ New version {new_version} created: {new_filename}")
            return new_filename
            
        except Exception as e:
            print(f"❌ Error versioning prompt: {e}")
            return None
    
    def load_prompt_template(self, name: str, version: str = None) -> Optional[Dict]:
        """Load a prompt template from the hub"""
        try:
            if version:
                filename = f"prompt_{name.replace(' ', '_').lower()}_v{version.replace('.', '_')}.json"
            else:
                filename = f"prompt_{name.replace(' ', '_').lower()}.json"
            
            with open(filename, 'r') as f:
                return json.load(f)
                
        except FileNotFoundError:
            print(f"❌ Prompt template '{name}' not found")
            return None
        except Exception as e:
            print(f"❌ Error loading prompt template: {e}")
            return None
    
    def compare_prompt_versions(self, name: str, version1: str, version2: str):
        """Compare two versions of a prompt"""
        prompt1 = self.load_prompt_template(name, version1)
        prompt2 = self.load_prompt_template(name, version2)
        
        if not prompt1 or not prompt2:
            print("❌ Could not load one or both prompt versions")
            return
        
        print(f"📊 Comparing {name} v{version1} vs v{version2}:\n")
        
        print(f"Version {version1}:")
        print("-" * 40)
        print(prompt1["template"][:200] + "..." if len(prompt1["template"]) > 200 else prompt1["template"])
        
        print(f"\nVersion {version2}:")
        print("-" * 40)
        print(prompt2["template"][:200] + "..." if len(prompt2["template"]) > 200 else prompt2["template"])
        
        print(f"\nChanges:")
        print(f"- Variables: {prompt1.get('variables', [])} → {prompt2.get('variables', [])}")
        print(f"- Version notes: {prompt2.get('version_notes', 'No notes provided')}")

# Initialize Prompt Hub Manager
prompt_hub = PromptHubManager(client)

print("🏗️ Prompt Hub Manager initialized")

# Create sample prompt templates
print("\n📝 Creating Sample Prompt Templates...")

# Customer Service Prompt
customer_service_template = """
You are a helpful customer service representative for {company_name}.

Customer Context:
- Customer Name: {customer_name}
- Issue Type: {issue_type}
- Priority: {priority}

Customer Message: {customer_message}

Please respond professionally and helpfully. Follow these guidelines:
1. Acknowledge the customer's concern
2. Provide a clear solution or next steps
3. Offer additional assistance
4. Maintain a friendly, professional tone

Response:
"""

prompt_hub.create_prompt_template(
    name="Customer Service Response",
    template=customer_service_template,
    description="Professional customer service response template",
    variables=["company_name", "customer_name", "issue_type", "priority", "customer_message"],
    tags=["customer_service", "support", "professional"]
)

# Content Generation Prompt
content_generation_template = """
Create engaging {content_type} content for {target_audience}.

Topic: {topic}
Tone: {tone}
Length: {length}
Key Points to Include:
{key_points}

Additional Requirements:
{additional_requirements}

Please ensure the content is:
- Engaging and relevant to the target audience
- Well-structured and easy to read
- Optimized for the specified tone and length
- Includes all key points naturally

Content:
"""

prompt_hub.create_prompt_template(
    name="Content Generation",
    template=content_generation_template,
    description="Flexible content generation template for various formats",
    variables=["content_type", "target_audience", "topic", "tone", "length", "key_points", "additional_requirements"],
    tags=["content", "marketing", "creative"]
)

print("\n✅ Sample prompt templates created successfully!")
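
`PromptHubManager` writes new versions but has no way to go back to an earlier one. A minimal rollback sketch follows, assuming the same local file layout (`prompt_<slug>.json` for the current version and `prompt_<slug>_v<version>.json` for snapshots) and that a snapshot exists for the target version. The function name `rollback_prompt` and the `Demo` prompt are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def rollback_prompt(name: str, version: str, directory: str = ".") -> dict:
    """Restore the base prompt file from a versioned snapshot and return its data."""
    slug = name.replace(" ", "_").lower()
    folder = Path(directory)
    snapshot = folder / f"prompt_{slug}_v{version.replace('.', '_')}.json"
    base = folder / f"prompt_{slug}.json"

    data = json.loads(snapshot.read_text())  # raises FileNotFoundError if absent
    base.write_text(json.dumps(data, indent=2))
    return data


# Demo in a throwaway directory so no real prompt files are touched
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    old = {"version": "1.0.0", "template": "Hi {name}"}
    new = {"version": "1.1.0", "template": "Hello {name}!"}
    (tmp_path / "prompt_demo_v1_0_0.json").write_text(json.dumps(old))
    (tmp_path / "prompt_demo.json").write_text(json.dumps(new))

    restored = rollback_prompt("Demo", "1.0.0", directory=tmp)
    current = json.loads((tmp_path / "prompt_demo.json").read_text())
    print(restored["version"], current["template"])  # → 1.0.0 Hi {name}
```

Because the snapshot files are never modified, rolling back is just a copy, and rolling forward again works the same way.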

## 🔄 Prompt Version Control and A/B Testing

Let's implement systematic prompt optimization through version control and A/B testing.
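
Comparing prompt versions needs a score for each response. The sketch below is one simple, deterministic option, assuming quality can be approximated by required-phrase coverage and a word limit. The function `rubric_score`, its halving penalty, and the sample reply are illustrative assumptions, not an established metric.

```python
from typing import List


def rubric_score(response: str, required_phrases: List[str], max_words: int = 200) -> float:
    """Score in [0, 1]: share of required phrases present, halved if over the word limit."""
    if not required_phrases:
        return 0.0
    text = response.lower()
    hits = sum(1 for phrase in required_phrases if phrase.lower() in text)
    coverage = hits / len(required_phrases)
    penalty = 0.5 if len(response.split()) > max_words else 1.0
    return coverage * penalty


reply = "Hi Sarah, I'm sorry about the cracked screen. We will ship a replacement today."
print(rubric_score(reply, ["sarah", "sorry", "replacement", "refund"]))  # → 0.75
```

A rubric like this is crude, but it is reproducible, which makes differences between two prompt versions easier to attribute to the prompts themselves.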

In [None]:
# Prompt optimization and A/B testing framework

class PromptOptimizer:
    """Framework for systematic prompt optimization"""
    
    def __init__(self, client: Client, prompt_hub: PromptHubManager):
        self.client = client
        self.prompt_hub = prompt_hub
        self.llm = ChatOpenAI(temperature=0.1, model="gpt-3.5-turbo")
    
    @traceable(run_type="chain", name="prompt_test", tags=["optimization", "testing"])
    def test_prompt_version(self, prompt_name: str, version: str, 
                           test_inputs: List[Dict], test_name: str) -> Dict[str, Any]:
        """Test a specific prompt version with given inputs"""
        
        prompt_data = self.prompt_hub.load_prompt_template(prompt_name, version)
        if not prompt_data:
            return {"error": "Prompt not found"}
        
        results = []
        
        for i, test_input in enumerate(test_inputs):
            try:
                # Fill the prompt template (placeholders are str.format-style {name})
                filled_prompt = prompt_data["template"].format(**test_input)
                
                # Test with LLM
                start_time = time.time()
                response = self.llm.invoke([HumanMessage(content=filled_prompt)])
                end_time = time.time()
                
                results.append({
                    "test_case": i + 1,
                    "input": test_input,
                    "output": response.content,
                    "latency": round(end_time - start_time, 3),
                    "success": True
                })
                
            except Exception as e:
                results.append({
                    "test_case": i + 1,
                    "input": test_input,
                    "error": str(e),
                    "success": False
                })
        
        # Average only over calls that completed; avoids a NaN when every call failed
        latencies = [r["latency"] for r in results if "latency" in r]
        
        return {
            "prompt_name": prompt_name,
            "version": version,
            "test_name": test_name,
            "total_tests": len(test_inputs),
            "successful_tests": sum(1 for r in results if r["success"]),
            "average_latency": sum(latencies) / len(latencies) if latencies else 0.0,
            "results": results
        }
    
    def run_ab_test(self, prompt_name: str, version_a: str, version_b: str, 
                   test_inputs: List[Dict], test_name: str) -> Dict[str, Any]:
        """Run A/B test between two prompt versions"""
        
        print(f"🧪 Running A/B test: {prompt_name} v{version_a} vs v{version_b}")
        
        # Test version A
        results_a = self.test_prompt_version(
            prompt_name, version_a, test_inputs, f"{test_name}_version_a"
        )
        
        # Test version B
        results_b = self.test_prompt_version(
            prompt_name, version_b, test_inputs, f"{test_name}_version_b"
        )
        
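        # Stop early if a version failed to load or no inputs were supplied
        for res in (results_a, results_b):
            if "error" in res or not res.get("total_tests"):
                raise ValueError(f"A/B test aborted: {res.get('error', 'no test inputs')}")
        
        # Compare results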
        comparison = {
            "test_name": test_name,
            "prompt_name": prompt_name,
            "version_a": {
                "version": version_a,
                "success_rate": results_a["successful_tests"] / results_a["total_tests"],
                "avg_latency": results_a.get("average_latency", 0),
                "results": results_a
            },
            "version_b": {
                "version": version_b,
                "success_rate": results_b["successful_tests"] / results_b["total_tests"],
                "avg_latency": results_b.get("average_latency", 0),
                "results": results_b
            }
        }
        
        # Determine winner
        if comparison["version_a"]["success_rate"] > comparison["version_b"]["success_rate"]:
            winner = "version_a"
        elif comparison["version_b"]["success_rate"] > comparison["version_a"]["success_rate"]:
            winner = "version_b"
        else:
            # Tie-breaker: lower latency wins
            winner = "version_a" if comparison["version_a"]["avg_latency"] < comparison["version_b"]["avg_latency"] else "version_b"
        
        comparison["winner"] = winner
        comparison["recommendation"] = f"Version {comparison[winner]['version']} performs better"
        
        return comparison
    
    def generate_prompt_variations(self, base_prompt: str, variation_types: List[str]) -> List[Dict]:
        """Generate prompt variations for testing"""
        variations = [{"type": "original", "prompt": base_prompt}]
        
        for variation_type in variation_types:
            try:
                if variation_type == "more_specific":
                    # Add more specific instructions
                    variation = base_prompt + "\n\nPlease be specific and provide detailed examples in your response."
                
                elif variation_type == "more_concise":
                    # Request more concise output
                    variation = base_prompt + "\n\nPlease provide a concise, to-the-point response."
                
                elif variation_type == "step_by_step":
                    # Add step-by-step instruction
                    variation = "Let's approach this step-by-step.\n\n" + base_prompt + "\n\nBreak down your response into clear steps."
                
                elif variation_type == "with_examples":
                    # Request examples
                    variation = base_prompt + "\n\nPlease include relevant examples to illustrate your points."
                
                elif variation_type == "creative":
                    # Encourage creativity
                    variation = base_prompt + "\n\nFeel free to be creative and think outside the box in your response."
                
                else:
                    continue
                
                variations.append({"type": variation_type, "prompt": variation})
                
            except Exception as e:
                print(f"Warning: Could not generate {variation_type} variation: {e}")
        
        return variations
    
    def optimize_prompt_iteratively(self, base_prompt: str, test_inputs: List[Dict], 
                                  iterations: int = 3) -> Dict[str, Any]:
        """Iteratively optimize a prompt through multiple rounds of testing"""
        
        print(f"🔄 Starting iterative prompt optimization ({iterations} iterations)")
        
        current_best = base_prompt
        optimization_history = []
        
        for iteration in range(iterations):
            print(f"\n📊 Iteration {iteration + 1}/{iterations}")
            
            # Generate variations
            variations = self.generate_prompt_variations(
                current_best, 
                ["more_specific", "more_concise", "step_by_step", "with_examples"]
            )
            
            # Test each variation
            best_score = 0
            best_variation = None
            
            for var in variations:
                try:
                    # Simulate testing (in real scenario, you'd use actual evaluation metrics)
                    test_score = np.random.uniform(0.6, 0.95)  # Simulated score
                    
                    if test_score > best_score:
                        best_score = test_score
                        best_variation = var
                    
                    print(f"  - {var['type']}: {test_score:.3f}")
                    
                except Exception as e:
                    print(f"  - {var['type']}: Error - {e}")
            
            if best_variation:
                current_best = best_variation["prompt"]
                optimization_history.append({
                    "iteration": iteration + 1,
                    "best_type": best_variation["type"],
                    "score": best_score,
                    "prompt": current_best
                })
                print(f"  ✅ Best: {best_variation['type']} (score: {best_score:.3f})")
        
        return {
            "original_prompt": base_prompt,
            "optimized_prompt": current_best,
            "optimization_history": optimization_history,
            "total_iterations": iterations,
            "improvement_achieved": len(optimization_history) > 0
        }

# Initialize Prompt Optimizer
optimizer = PromptOptimizer(client, prompt_hub)

print("🔄 Prompt Optimizer initialized")

# Create an improved version of the customer service prompt
print("\n📝 Creating improved customer service prompt version...")

improved_customer_service_template = """
You are an expert customer service representative for {company_name}.

Customer Profile:
- Name: {customer_name}
- Issue Category: {issue_type}
- Priority Level: {priority}
- Previous Interactions: [Check if customer has contacted before]

Customer's Message: "{customer_message}"

Response Framework:
1. ACKNOWLEDGE: Personally acknowledge their specific concern
2. EMPATHIZE: Show understanding of their situation
3. SOLVE: Provide clear, actionable solution steps
4. FOLLOW-UP: Offer additional help and next steps
5. PERSONALIZE: Use customer's name and reference their specific situation

Tone Guidelines:
- Warm and professional
- Confident in solutions
- Proactive in offering help

Your Response:
"""

# Create new version
prompt_hub.version_prompt(
    base_name="Customer Service Response",
    new_template=improved_customer_service_template,
    version_notes="Added customer profile section, structured response framework, and enhanced tone guidelines",
    variables=["company_name", "customer_name", "issue_type", "priority", "customer_message"]
)

# Run A/B test between versions
print("\n🧪 Running A/B test between prompt versions...")

test_inputs = [
    {
        "company_name": "TechCorp",
        "customer_name": "Sarah Johnson",
        "issue_type": "Product Defect",
        "priority": "High",
        "customer_message": "I received my order yesterday and the screen is cracked. This is unacceptable for a premium product."
    },
    {
        "company_name": "TechCorp",
        "customer_name": "Mike Chen",
        "issue_type": "Billing Question",
        "priority": "Medium",
        "customer_message": "I was charged twice for my subscription this month. Can you help me understand why?"
    }
]

try:
    ab_test_results = optimizer.run_ab_test(
        prompt_name="Customer Service Response",
        version_a="1.0.0",
        version_b="1.1.0",
        test_inputs=test_inputs,
        test_name="customer_service_optimization"
    )
    
    print("\n📊 A/B Test Results:")
    print(f"🏆 Winner: {ab_test_results['winner']}")
    print(f"💡 Recommendation: {ab_test_results['recommendation']}")
    
    # Display metrics comparison
    print("\n📈 Performance Metrics:")
    print(f"Version A (1.0.0): {ab_test_results['version_a']['success_rate']:.2%} success, {ab_test_results['version_a']['avg_latency']:.3f}s avg latency")
    print(f"Version B (1.1.0): {ab_test_results['version_b']['success_rate']:.2%} success, {ab_test_results['version_b']['avg_latency']:.3f}s avg latency")
    
except Exception as e:
    print(f"❌ A/B test failed: {e}")

print("\n✅ Prompt optimization workflow completed!")
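
The A/B test above declares a winner from raw success rates on two inputs, which is too few to separate a real difference from noise. One way to check a larger run is a two-proportion z-test, sketched here with the standard library only. The function name and the counts are hypothetical, and the normal approximation is only reasonable with at least a few dozen test cases per version.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates B - A."""
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both versions succeeded (or failed) on every input: no detectable difference
        return 0.0, 1.0
    z = (success_b / n_b - success_a / n_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: version A passed 40 of 100 cases, version B passed 60 of 100
z, p = two_proportion_z_test(40, 100, 60, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap in success rates is unlikely to be chance alone; with identical rates the function returns `(0.0, 1.0)`, matching the case where the latency tie-breaker is the only thing separating the versions.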

## 🤝 Team Collaboration Workflows

Let's implement collaborative workflows for prompt engineering teams.
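
A review workflow is most useful when approval is gated on explicit rules rather than a warning. The sketch below shows one possible gate, assuming a review record with `reviewers` and `reviews` lists like the one built in the next cell. The function `approval_gate`, the 4.0 threshold, and the reviewer names are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Tuple


def approval_gate(review: Dict[str, Any], min_avg_rating: float = 4.0) -> Tuple[bool, List[str]]:
    """Return (ok, reasons); ok is True only when every blocking check passes."""
    reasons = []
    reviewed = {r["reviewer"] for r in review["reviews"]}
    missing = sorted(set(review["reviewers"]) - reviewed)
    if missing:
        reasons.append(f"missing reviews from: {', '.join(missing)}")

    ratings = [r["rating"] for r in review["reviews"] if r.get("rating") is not None]
    if not ratings:
        reasons.append("no ratings submitted")
    elif mean(ratings) < min_avg_rating:
        reasons.append(f"average rating {mean(ratings):.1f} is below {min_avg_rating}")

    return (not reasons, reasons)


review = {
    "reviewers": ["alice", "bob"],
    "reviews": [{"reviewer": "alice", "rating": 5}],
}
print(approval_gate(review))  # → (False, ['missing reviews from: bob'])
```

Returning the reasons alongside the decision lets the caller show reviewers exactly what is still blocking approval.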

In [None]:
# Team collaboration tools for prompt engineering

class PromptCollaboration:
    """Tools for collaborative prompt engineering"""
    
    def __init__(self, client: Client):
        self.client = client
        self.reviews = []  # In-memory storage for demo
        self.approval_workflow = []
    
    def submit_prompt_for_review(self, prompt_name: str, version: str, 
                                author: str, reviewers: List[str], 
                                description: str) -> str:
        """Submit a prompt for peer review"""
        
        review_id = f"review_{len(self.reviews) + 1}"
        
        review_request = {
            "id": review_id,
            "prompt_name": prompt_name,
            "version": version,
            "author": author,
            "reviewers": reviewers,
            "description": description,
            "status": "pending",
            "created_at": datetime.now().isoformat(),
            "reviews": [],
            "comments": []
        }
        
        self.reviews.append(review_request)
        
        print(f"📝 Review request {review_id} created for '{prompt_name}' v{version}")
        print(f"👥 Reviewers assigned: {', '.join(reviewers)}")
        
        return review_id
    
    def add_review_comment(self, review_id: str, reviewer: str, 
                          comment: str, rating: int = None, 
                          suggestions: List[str] = None) -> bool:
        """Add a review comment to a prompt"""
        
        review = next((r for r in self.reviews if r["id"] == review_id), None)
        if not review:
            print(f"❌ Review {review_id} not found")
            return False
        
        if reviewer not in review["reviewers"]:
            print(f"❌ {reviewer} is not assigned as a reviewer for this prompt")
            return False
        
        review_comment = {
            "reviewer": reviewer,
            "comment": comment,
            "rating": rating,  # 1-5 scale
            "suggestions": suggestions or [],
            "timestamp": datetime.now().isoformat()
        }
        
        review["reviews"].append(review_comment)
        
        print(f"✅ Review added by {reviewer} for {review_id}")
        if rating:
            print(f"⭐ Rating: {rating}/5")
        
        return True
    
    def approve_prompt(self, review_id: str, approver: str, 
                      approval_notes: str = "") -> bool:
        """Approve a prompt for production use"""
        
        review = next((r for r in self.reviews if r["id"] == review_id), None)
        if not review:
            print(f"❌ Review {review_id} not found")
            return False
        
        # Check if all reviewers have provided feedback
        reviewers_who_reviewed = {r["reviewer"] for r in review["reviews"]}
        missing_reviewers = set(review["reviewers"]) - reviewers_who_reviewed
        
        if missing_reviewers:
            print(f"⚠️  Warning: Still waiting for reviews from: {', '.join(missing_reviewers)}")
        
        # Calculate average rating
        ratings = [r["rating"] for r in review["reviews"] if r["rating"]]
        avg_rating = np.mean(ratings) if ratings else None
        
        approval = {
            "review_id": review_id,
            "approver": approver,
            "approval_notes": approval_notes,
            "timestamp": datetime.now().isoformat(),
            "average_rating": avg_rating,
            "total_reviews": len(review["reviews"])
        }
        
        review["status"] = "approved"
        review["approval"] = approval
        self.approval_workflow.append(approval)
        
        print(f"✅ Prompt approved by {approver}")
        if avg_rating:
            print(f"📊 Average rating: {avg_rating:.1f}/5.0")
        
        return True
    
    def generate_review_report(self, review_id: str) -> Dict[str, Any]:
        """Generate a comprehensive review report"""
        
        review = next((r for r in self.reviews if r["id"] == review_id), None)
        if not review:
            return {"error": f"Review {review_id} not found"}
        
        # Compile suggestions
        all_suggestions = []
        for r in review["reviews"]:
            all_suggestions.extend(r["suggestions"])
        
        # Calculate metrics
        ratings = [r["rating"] for r in review["reviews"] if r["rating"]]
        
        report = {
            "review_summary": {
                "id": review_id,
                "prompt_name": review["prompt_name"],
                "version": review["version"],
                "author": review["author"],
                "status": review["status"],
                "created_at": review["created_at"]
            },
            "review_metrics": {
                "total_reviewers": len(review["reviewers"]),
                "reviews_completed": len(review["reviews"]),
                "average_rating": np.mean(ratings) if ratings else None,
                "rating_distribution": {i: ratings.count(i) for i in range(1, 6)} if ratings else {},
                "completion_rate": len(review["reviews"]) / len(review["reviewers"]) if review["reviewers"] else 0
            },
            "feedback_summary": {
                "total_suggestions": len(all_suggestions),
                "common_themes": self._extract_common_themes(all_suggestions),
                "individual_reviews": review["reviews"]
            },
            "recommendations": self._generate_recommendations(review)
        }
        
        return report
    
    def _extract_common_themes(self, suggestions: List[str]) -> List[str]:
        """Extract common themes from review suggestions"""
        # Simple keyword-based theme extraction
        themes = []
        keywords = ["clarity", "specificity", "examples", "tone", "structure", "length"]
        
        for keyword in keywords:
            if any(keyword.lower() in suggestion.lower() for suggestion in suggestions):
                themes.append(keyword.title())
        
        return themes
    
    def _generate_recommendations(self, review: Dict) -> List[str]:
        """Generate actionable recommendations based on review feedback"""
        recommendations = []
        
        ratings = [r["rating"] for r in review["reviews"] if r["rating"]]
        if ratings:
            avg_rating = np.mean(ratings)
            if avg_rating < 3.0:
                recommendations.append("Consider significant revisions before production deployment")
            elif avg_rating < 4.0:
                recommendations.append("Address reviewer feedback and consider minor revisions")
            else:
                recommendations.append("Prompt is ready for production with excellent review scores")
        
        # Check completion rate
        completion_rate = len(review["reviews"]) / len(review["reviewers"]) if review["reviewers"] else 0
        if completion_rate < 0.8:
            recommendations.append("Consider getting additional reviews before final approval")
        
        return recommendations
    
    def create_collaboration_dashboard(self) -> Dict[str, Any]:
        """Create a dashboard view of all collaboration activities"""
        
        dashboard = {
            "overview": {
                "total_reviews": len(self.reviews),
                "pending_reviews": len([r for r in self.reviews if r["status"] == "pending"]),
                "approved_prompts": len([r for r in self.reviews if r["status"] == "approved"]),
                "total_approvals": len(self.approval_workflow)
            },
            "active_reviews": [
                {
                    "id": r["id"],
                    "prompt_name": r["prompt_name"],
                    "author": r["author"],
                    "reviewers": r["reviewers"],
                    "reviews_completed": len(r["reviews"]),
                    "status": r["status"]
                }
                for r in self.reviews if r["status"] == "pending"
            ],
            "recent_approvals": self.approval_workflow[-5:],  # Last 5 approvals
            "team_metrics": self._calculate_team_metrics()
        }
        
        return dashboard
    
    def _calculate_team_metrics(self) -> Dict[str, Any]:
        """Calculate team collaboration metrics"""
        all_reviewers = set()
        all_authors = set()
        review_times = []
        
        for review in self.reviews:
            all_authors.add(review["author"])
            all_reviewers.update(review["reviewers"])
            
            # Calculate review completion time (simplified)
            if review["reviews"]:
                created_time = datetime.fromisoformat(review["created_at"])
                last_review_time = max(
                    datetime.fromisoformat(r["timestamp"]) 
                    for r in review["reviews"]
                )
                review_times.append((last_review_time - created_time).total_seconds() / 3600)  # Hours
        
        return {
            "active_team_members": len(all_reviewers | all_authors),
            "active_reviewers": len(all_reviewers),
            "active_authors": len(all_authors),
            "avg_review_time_hours": np.mean(review_times) if review_times else 0
        }

# Initialize collaboration tools
collaboration = PromptCollaboration(client)

print("🤝 Prompt Collaboration Tools initialized")

# Demo collaboration workflow
print("\n📋 Demonstrating Collaboration Workflow...")

# Submit prompt for review
review_id = collaboration.submit_prompt_for_review(
    prompt_name="Customer Service Response",
    version="1.1.0",
    author="Alice Johnson",
    reviewers=["Bob Smith", "Carol Davis", "David Wilson"],
    description="Improved customer service prompt with structured response framework"
)

# Add review comments
print("\n💬 Adding review comments...")

collaboration.add_review_comment(
    review_id=review_id,
    reviewer="Bob Smith",
    comment="Great improvement! The structured framework makes responses more consistent. Consider adding examples for complex scenarios.",
    rating=4,
    suggestions=["Add examples for edge cases", "Consider tone guidance for different priority levels"]
)

collaboration.add_review_comment(
    review_id=review_id,
    reviewer="Carol Davis",
    comment="The response framework is excellent. The ACKNOWLEDGE-EMPATHIZE-SOLVE-FOLLOW-UP structure is very clear.",
    rating=5,
    suggestions=["Perfect as is", "Maybe add personalization tips"]
)

collaboration.add_review_comment(
    review_id=review_id,
    reviewer="David Wilson",
    comment="Good structure but might be too lengthy for simple issues. Consider a simplified version for low-priority cases.",
    rating=3,
    suggestions=["Create simplified version", "Add conditional logic for issue complexity"]
)

# Generate review report
print("\n📊 Generating Review Report...")
report = collaboration.generate_review_report(review_id)

print(f"\n📋 Review Report for {report['review_summary']['prompt_name']} v{report['review_summary']['version']}:")
print(f"📈 Completion Rate: {report['review_metrics']['completion_rate']:.1%}")
print(f"⭐ Average Rating: {report['review_metrics']['average_rating']:.1f}/5.0")
print(f"💡 Common Themes: {', '.join(report['feedback_summary']['common_themes'])}")
print(f"🎯 Recommendations:")
for rec in report['recommendations']:
    print(f"  - {rec}")

# Approve the prompt
print("\n✅ Approving prompt...")
collaboration.approve_prompt(
    review_id=review_id,
    approver="Emma Thompson",
    approval_notes="Approved for production with minor suggestions for future iterations"
)

# Show collaboration dashboard
print("\n📊 Collaboration Dashboard:")
dashboard = collaboration.create_collaboration_dashboard()
print(f"📈 Overview: {dashboard['overview']['total_reviews']} total reviews, {dashboard['overview']['approved_prompts']} approved")
print(f"👥 Team: {dashboard['team_metrics']['active_team_members']} active members")

print("\n✅ Collaboration workflow demonstration complete!")
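
The demo above approves the prompt without checking the review results first. As a rough sketch, an approval gate could reuse the same thresholds that `_generate_recommendations` applies (80% completion, average rating of 4.0). The function name `check_approval_gate` and its parameters are hypothetical and not part of the notebook's `PromptCollaboration` class; the review dict is assumed to have the `reviewers` and `reviews` keys used in the demo.

```python
from statistics import mean
from typing import Any, Dict, List, Tuple


def check_approval_gate(
    review: Dict[str, Any],
    min_completion: float = 0.8,
    min_avg_rating: float = 4.0,
) -> Tuple[bool, List[str]]:
    """Return (ok, reasons); reasons lists every threshold the review misses."""
    reasons: List[str] = []
    reviewers = review.get("reviewers", [])
    reviews = review.get("reviews", [])

    # Share of assigned reviewers who have submitted a review
    completion = len(reviews) / len(reviewers) if reviewers else 0.0
    if completion < min_completion:
        reasons.append(
            f"completion rate {completion:.0%} is below {min_completion:.0%}"
        )

    # Only count reviews that actually carry a rating
    ratings = [r["rating"] for r in reviews if r.get("rating") is not None]
    if not ratings:
        reasons.append("no ratings submitted")
    elif mean(ratings) < min_avg_rating:
        reasons.append(
            f"average rating {mean(ratings):.1f} is below {min_avg_rating:.1f}"
        )

    return (not reasons, reasons)


# Same shape as the demo review: three reviewers with ratings 4, 5 and 3
demo_review = {
    "reviewers": ["Bob Smith", "Carol Davis", "David Wilson"],
    "reviews": [{"rating": 4}, {"rating": 5}, {"rating": 3}],
}
ok, reasons = check_approval_gate(demo_review)
print(ok, reasons)
```

Returning the list of reasons rather than a bare boolean makes it easier to show the approver what is still missing.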

## 📊 Prompt Performance Analytics

Let's implement comprehensive analytics for prompt performance monitoring.

In [None]:
# Prompt performance analytics and monitoring

import numpy as np  # Used below for means, medians, and linear trend fitting

class PromptAnalytics:
    """Advanced analytics for prompt performance monitoring"""
    
    def __init__(self, client: Client):
        self.client = client
        self.performance_data = []  # In-memory storage for demo
    
    # "analytics" is not a recognized LangSmith run type; "chain" is the general-purpose one
    @traceable(run_type="chain", tags=["performance-tracking"])
    def track_prompt_performance(self, prompt_name: str, version: str, 
                                metrics: Dict[str, Any]) -> None:
        """Track performance metrics for a prompt"""
        
        performance_record = {
            "prompt_name": prompt_name,
            "version": version,
            "timestamp": datetime.now().isoformat(),
            "metrics": metrics
        }
        
        self.performance_data.append(performance_record)
        print(f"📊 Performance metrics tracked for {prompt_name} v{version}")
    
    def analyze_prompt_trends(self, prompt_name: str, days_back: int = 30) -> Dict[str, Any]:
        """Analyze performance trends for a prompt over time"""
        
        # Filter data for the specified prompt and time period
        cutoff_date = datetime.now().timestamp() - (days_back * 24 * 3600)
        
        relevant_data = [
            record for record in self.performance_data
            if (record["prompt_name"] == prompt_name and 
                datetime.fromisoformat(record["timestamp"]).timestamp() >= cutoff_date)
        ]
        
        if not relevant_data:
            return {"error": f"No data found for {prompt_name} in the last {days_back} days"}
        
        # Aggregate metrics
        all_metrics = [record["metrics"] for record in relevant_data]
        
        # Calculate trends for common metrics
        trend_analysis = {
            "prompt_name": prompt_name,
            "analysis_period_days": days_back,
            "total_data_points": len(relevant_data),
            "date_range": {
                "start": min(record["timestamp"] for record in relevant_data),
                "end": max(record["timestamp"] for record in relevant_data)
            },
            "metric_trends": {}
        }
        
        # Analyze each metric
        metric_names = set()
        for metrics in all_metrics:
            metric_names.update(metrics.keys())
        
        for metric_name in metric_names:
            values = [metrics.get(metric_name) for metrics in all_metrics if metric_name in metrics]
            numeric_values = [v for v in values if isinstance(v, (int, float))]
            
            if numeric_values:
                trend_analysis["metric_trends"][metric_name] = {
                    "average": np.mean(numeric_values),
                    "median": np.median(numeric_values),
                    "std_deviation": np.std(numeric_values),
                    "min": min(numeric_values),
                    "max": max(numeric_values),
                    "trend": self._calculate_trend(numeric_values),
                    "data_points": len(numeric_values)
                }
        
        return trend_analysis
    
    def _calculate_trend(self, values: List[float]) -> str:
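        """Calculate trend direction (improving, declining, stable).

        A positive slope is labelled "improving", which only holds for metrics
        where higher is better (e.g. accuracy). For latency or cost, a positive
        slope means the metric is getting worse. The 0.01 threshold is absolute,
        so small-valued metrics such as cost will usually read as "stable".
        """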
        if len(values) < 2:
            return "insufficient_data"
        
        # Simple linear trend calculation
        x = list(range(len(values)))
        slope = np.polyfit(x, values, 1)[0]
        
        if slope > 0.01:  # Threshold for significant positive trend
            return "improving"
        elif slope < -0.01:  # Threshold for significant negative trend
            return "declining"
        else:
            return "stable"
    
    def compare_prompt_versions_performance(self, prompt_name: str, 
                                          versions: List[str]) -> Dict[str, Any]:
        """Compare performance between different prompt versions"""
        
        version_data = {}
        
        for version in versions:
            version_records = [
                record for record in self.performance_data
                if (record["prompt_name"] == prompt_name and 
                    record["version"] == version)
            ]
            
            if version_records:
                all_metrics = [record["metrics"] for record in version_records]
                version_data[version] = self._aggregate_metrics(all_metrics)
        
        if not version_data:
            return {"error": "No performance data found for specified versions"}
        
        # Generate comparison
        comparison = {
            "prompt_name": prompt_name,
            "versions_compared": list(version_data.keys()),
            "version_metrics": version_data,
            "recommendations": self._generate_version_recommendations(version_data)
        }
        
        return comparison
    
    def _aggregate_metrics(self, all_metrics: List[Dict]) -> Dict[str, Any]:
        """Aggregate metrics from multiple records"""
        aggregated = {}
        
        # Get all metric names
        metric_names = set()
        for metrics in all_metrics:
            metric_names.update(metrics.keys())
        
        # Aggregate each metric
        for metric_name in metric_names:
            values = [metrics.get(metric_name) for metrics in all_metrics if metric_name in metrics]
            numeric_values = [v for v in values if isinstance(v, (int, float))]
            
            if numeric_values:
                aggregated[metric_name] = {
                    "mean": np.mean(numeric_values),
                    "median": np.median(numeric_values),
                    "std": np.std(numeric_values),
                    "count": len(numeric_values)
                }
        
        return aggregated
    
    def _generate_version_recommendations(self, version_data: Dict) -> List[str]:
        """Generate recommendations based on version comparison"""
        recommendations = []
        
        if len(version_data) < 2:
            recommendations.append("Need at least 2 versions for meaningful comparison")
            return recommendations
        
        # Compare key metrics between versions
        versions = list(version_data.keys())
        
        # Look for common metrics to compare
        common_metrics = set(version_data[versions[0]].keys())
        for version in versions[1:]:
            common_metrics &= set(version_data[version].keys())
        
        if "accuracy" in common_metrics or "success_rate" in common_metrics:
            metric_name = "accuracy" if "accuracy" in common_metrics else "success_rate"
            best_version = max(versions, key=lambda v: version_data[v][metric_name]["mean"])
            recommendations.append(f"Version {best_version} shows highest {metric_name}")
        
        if "latency" in common_metrics or "response_time" in common_metrics:
            metric_name = "latency" if "latency" in common_metrics else "response_time"
            fastest_version = min(versions, key=lambda v: version_data[v][metric_name]["mean"])
            recommendations.append(f"Version {fastest_version} shows lowest {metric_name}")
        
        return recommendations
    
    def create_performance_dashboard(self) -> Dict[str, Any]:
        """Create a comprehensive performance dashboard"""
        
        if not self.performance_data:
            return {"message": "No performance data available"}
        
        # Overall statistics
        unique_prompts = len(set(record["prompt_name"] for record in self.performance_data))
        unique_versions = len(set(f"{record['prompt_name']}:{record['version']}" for record in self.performance_data))
        
        # Recent activity
        recent_cutoff = datetime.now().timestamp() - (7 * 24 * 3600)  # 7 days
        recent_data = [
            record for record in self.performance_data
            if datetime.fromisoformat(record["timestamp"]).timestamp() >= recent_cutoff
        ]
        
        # Top performing prompts
        prompt_performance = {}
        for record in self.performance_data:
            prompt_name = record["prompt_name"]
            if prompt_name not in prompt_performance:
                prompt_performance[prompt_name] = []
            prompt_performance[prompt_name].append(record["metrics"])
        
        dashboard = {
            "overview": {
                "total_data_points": len(self.performance_data),
                "unique_prompts": unique_prompts,
                "unique_versions": unique_versions,
                "recent_activity_7d": len(recent_data)
            },
            "recent_activity": recent_data[-10:],  # Last 10 records
            "prompt_summary": {
                prompt_name: {
                    "total_measurements": len(metrics_list),
                    "latest_metrics": metrics_list[-1] if metrics_list else None
                }
                for prompt_name, metrics_list in prompt_performance.items()
            }
        }
        
        return dashboard
    
    def visualize_performance_trends(self, prompt_name: str):
        """Create visualizations of prompt performance trends"""
        
        # Filter data for the prompt
        prompt_data = [
            record for record in self.performance_data
            if record["prompt_name"] == prompt_name
        ]
        
        if not prompt_data:
            print(f"No data found for prompt: {prompt_name}")
            return
        
        # Extract timestamps and metrics
        timestamps = [datetime.fromisoformat(record["timestamp"]) for record in prompt_data]
        
        # Plot trends for numeric metrics
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        axes = axes.flatten()
        
        metrics_to_plot = ["accuracy", "latency", "success_rate", "cost"]
        
        for i, metric_name in enumerate(metrics_to_plot):
            if i >= len(axes):
                break
            
            values = []
            metric_timestamps = []
            
            for j, record in enumerate(prompt_data):
                if metric_name in record["metrics"] and isinstance(record["metrics"][metric_name], (int, float)):
                    values.append(record["metrics"][metric_name])
                    metric_timestamps.append(timestamps[j])
            
            if values:
                axes[i].plot(metric_timestamps, values, marker='o', linewidth=2, markersize=6)
                axes[i].set_title(f'{metric_name.title()} Trend for {prompt_name}')
                axes[i].set_xlabel('Time')
                axes[i].set_ylabel(metric_name.title())
                axes[i].grid(True, alpha=0.3)
                axes[i].tick_params(axis='x', rotation=45)
                
                # Add trend line
                if len(values) > 1:
                    z = np.polyfit(range(len(values)), values, 1)
                    p = np.poly1d(z)
                    axes[i].plot(metric_timestamps, p(range(len(values))), "--", alpha=0.7, color='red')
            else:
                axes[i].text(0.5, 0.5, f'No {metric_name} data', transform=axes[i].transAxes, 
                           ha='center', va='center', fontsize=12)
                axes[i].set_title(f'{metric_name.title()} - No Data')
        
        plt.tight_layout()
        plt.show()

# Initialize analytics
analytics = PromptAnalytics(client)

print("📊 Prompt Analytics initialized")

# Simulate performance data collection
print("\n📈 Simulating performance data collection...")

# Generate sample performance data
import random
from datetime import timedelta

prompts_to_track = [
    ("Customer Service Response", "1.0.0"),
    ("Customer Service Response", "1.1.0"),
    ("Content Generation", "1.0.0")
]

base_time = datetime.now() - timedelta(days=30)

for prompt_name, version in prompts_to_track:
    for day in range(30):
        # Generate realistic performance metrics
        metrics = {
            "accuracy": random.uniform(0.75, 0.95),
            "latency": random.uniform(0.8, 2.5),
            "success_rate": random.uniform(0.85, 0.98),
            "cost": random.uniform(0.001, 0.005),
            "user_satisfaction": random.uniform(3.5, 4.8)
        }
        
        # Simulate version improvements
        if version == "1.1.0":
            metrics["accuracy"] += 0.05  # Improved version
            metrics["success_rate"] = min(1.0, metrics["success_rate"] + 0.03)  # A rate cannot exceed 1.0
            metrics["user_satisfaction"] += 0.2
        
        # Add a gradual upward trend over time, capping accuracy at 0.98
        improvement_factor = day / 30 * 0.1
        metrics["accuracy"] = min(0.98, metrics["accuracy"] + improvement_factor)
        
        analytics.performance_data.append({
            "prompt_name": prompt_name,
            "version": version,
            "timestamp": (base_time + timedelta(days=day)).isoformat(),
            "metrics": metrics
        })

print(f"✅ Generated {len(analytics.performance_data)} performance data points")

# Analyze trends
print("\n📊 Analyzing Performance Trends...")

trend_analysis = analytics.analyze_prompt_trends("Customer Service Response", days_back=30)
if "error" not in trend_analysis:
    print(f"\n📈 Trend Analysis for Customer Service Response:")
    print(f"📊 Data Points: {trend_analysis['total_data_points']}")
    
    for metric_name, trend_data in trend_analysis['metric_trends'].items():
        print(f"\n{metric_name.upper()}:")
        print(f"  Average: {trend_data['average']:.3f}")
        print(f"  Trend: {trend_data['trend']}")
        print(f"  Range: {trend_data['min']:.3f} - {trend_data['max']:.3f}")

# Compare versions
print("\n🆚 Comparing Prompt Versions...")

version_comparison = analytics.compare_prompt_versions_performance(
    "Customer Service Response", ["1.0.0", "1.1.0"]
)

if "error" not in version_comparison:
    print(f"\n📊 Version Comparison Results:")
    for version, metrics in version_comparison['version_metrics'].items():
        print(f"\nVersion {version}:")
        for metric_name, metric_data in metrics.items():
            print(f"  {metric_name}: {metric_data['mean']:.3f} (±{metric_data['std']:.3f})")
    
    print(f"\n💡 Recommendations:")
    for rec in version_comparison['recommendations']:
        print(f"  - {rec}")

# Create dashboard
print("\n📊 Performance Dashboard:")
dashboard = analytics.create_performance_dashboard()
print(f"📈 Overview: {dashboard['overview']['total_data_points']} data points across {dashboard['overview']['unique_prompts']} prompts")
print(f"🔄 Recent Activity: {dashboard['overview']['recent_activity_7d']} measurements in last 7 days")

# Visualize trends
print("\n📈 Generating Performance Visualizations...")
analytics.visualize_performance_trends("Customer Service Response")

print("\n✅ Prompt performance analytics demonstration complete!")
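
One thing to watch in the trend analysis above: `_calculate_trend` reports any rising metric as "improving", which reads the wrong way for latency or cost. The sketch below shows one possible direction-aware variant. The names `LOWER_IS_BETTER`, `linear_slope`, and `classify_trend` are hypothetical, the set of "lower is better" metrics is an assumption, and the slope is computed with the standard library instead of `np.polyfit` so the example stays self-contained.

```python
from typing import Sequence

# Assumption: metrics where a falling value is the desirable direction
LOWER_IS_BETTER = {"latency", "response_time", "cost"}


def linear_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def classify_trend(
    metric_name: str, values: Sequence[float], threshold: float = 0.01
) -> str:
    """Label a trend, taking the metric's preferred direction into account."""
    if len(values) < 2:
        return "insufficient_data"
    slope = linear_slope(values)
    if abs(slope) <= threshold:
        return "stable"
    rising = slope > 0
    # Rising is good unless the metric is one where lower is better
    is_better = rising != (metric_name in LOWER_IS_BETTER)
    return "improving" if is_better else "declining"


print(classify_trend("accuracy", [0.80, 0.85, 0.90]))
print(classify_trend("latency", [1.0, 1.5, 2.0]))
```

The threshold is still an absolute slope per data point, as in the notebook, so it would need rescaling for metrics with very small values such as per-request cost.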

## 💡 Key Takeaways and Best Practices

### ✅ What You've Mastered

1. **Advanced Prompt Design Patterns**:
   - Chain of Thought reasoning
   - Few-shot learning templates
   - Role-playing and persona-based prompts
   - Constraint-based structured outputs

2. **Prompt Hub Integration**:
   - Template creation and management
   - Version control and rollback strategies
   - A/B testing frameworks
   - Collaborative development workflows

3. **Team Collaboration**:
   - Peer review processes
   - Approval workflows
   - Feedback collection and analysis
   - Team performance metrics

4. **Performance Analytics**:
   - Comprehensive metrics tracking
   - Trend analysis and visualization
   - Version comparison frameworks
   - Dashboard summaries of recent performance activity

### 🎯 Best Practices for Production

1. **Prompt Design**:
   - Start with proven patterns and adapt to your needs
   - Use clear, specific instructions
   - Include examples when possible
   - Test with diverse inputs and edge cases

2. **Version Management**:
   - Maintain clear version numbering
   - Document all changes with rationale
   - Test new versions thoroughly before deployment
   - Keep rollback plans ready

3. **Collaboration Workflows**:
   - Establish clear review criteria
   - Involve domain experts in reviews
   - Use structured feedback forms
   - Maintain approval audit trails

4. **Performance Monitoring**:
   - Track key business metrics continuously
   - Set up alerts for performance degradation
   - Run trend analysis regularly and act on the results
   - Compare versions systematically
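
As an illustration of the "alerts for performance degradation" practice, here is a minimal sketch that compares the most recent window of a metric against an earlier baseline window. The function name `degradation_alert`, the 7-point window, and the 5% tolerance are all assumptions chosen for the example, not values from the notebook or from LangSmith.

```python
from statistics import mean
from typing import Optional, Sequence


def degradation_alert(
    metric_name: str,
    values: Sequence[float],
    window: int = 7,
    max_relative_drop: float = 0.05,
    higher_is_better: bool = True,
) -> Optional[str]:
    """Return an alert message if the recent window is worse than the baseline.

    values is ordered oldest first. The baseline is the mean of the first
    `window` values; the recent figure is the mean of the last `window` values.
    """
    if len(values) < 2 * window:
        return None  # Not enough history to compare two full windows
    baseline = mean(values[:window])
    recent = mean(values[-window:])
    if baseline == 0:
        return None  # Relative change is undefined for a zero baseline
    change = (recent - baseline) / abs(baseline)
    # Positive `worsening` means the metric moved in the undesirable direction
    worsening = -change if higher_is_better else change
    if worsening > max_relative_drop:
        return (
            f"{metric_name} degraded by {worsening:.1%} "
            f"(baseline {baseline:.3f}, recent {recent:.3f})"
        )
    return None


accuracy_history = [0.90] * 7 + [0.80] * 7
print(degradation_alert("accuracy", accuracy_history))
```

In practice the returned message would be routed to whatever notification channel the team already uses; for latency or cost, pass `higher_is_better=False`.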

### 🔧 Advanced Tips

- **Template Variables**: Use descriptive variable names and provide examples
- **Conditional Logic**: Consider different prompts for different scenarios
- **Performance Baselines**: Establish baselines before optimization
- **User Feedback**: Incorporate real user feedback into prompt iterations
- **Documentation**: Maintain comprehensive prompt documentation
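
To make the template-variable tip concrete, the sketch below lists the placeholders a template expects and reports any that were not supplied before the prompt is rendered. It assumes `str.format`-style `{placeholders}` and uses only the standard library; the helper names `template_variables` and `missing_variables` and the sample template are hypothetical.

```python
from string import Formatter
from typing import Dict, List, Set


def template_variables(template: str) -> Set[str]:
    """Names of the {placeholders} used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def missing_variables(template: str, values: Dict[str, str]) -> List[str]:
    """Placeholders in the template that have no entry in values, sorted."""
    return sorted(template_variables(template) - set(values))


template = (
    "You are a {role}. Answer the customer's question "
    "about {product}: {question}"
)
supplied = {"role": "support agent", "question": "How do I reset my password?"}
print(missing_variables(template, supplied))  # → ['product']
```

Running a check like this before formatting turns a `KeyError` at render time into an explicit list of what the caller forgot to provide.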

### 🚨 Common Pitfalls to Avoid

- **Over-optimization**: Don't optimize for metrics that don't reflect real value
- **Version Confusion**: Always clearly identify which version is in production
- **Insufficient Testing**: Test with realistic, diverse data
- **Ignoring Edge Cases**: Consider unusual inputs and failure scenarios
- **Solo Development**: Always involve others in prompt review and testing
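
One way to guard against version confusion is to keep a single, explicit record of which version is live, plus the history needed to roll back. The class below is an in-memory sketch under that assumption. `PromptRegistry` and its methods are hypothetical and are not the LangSmith Prompt Hub API.

```python
from typing import Dict, List, Optional


class PromptRegistry:
    """In-memory registry that tracks which version of each prompt is live."""

    def __init__(self) -> None:
        # prompt name -> {version: template text}
        self._versions: Dict[str, Dict[str, str]] = {}
        # prompt name -> versions promoted to production, oldest first
        self._history: Dict[str, List[str]] = {}

    def register(self, name: str, version: str, template: str) -> None:
        """Store a new version; existing versions are never overwritten."""
        versions = self._versions.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name} v{version} is already registered")
        versions[version] = template

    def promote(self, name: str, version: str) -> None:
        """Mark a registered version as the one running in production."""
        if version not in self._versions.get(name, {}):
            raise KeyError(f"{name} v{version} is not registered")
        self._history.setdefault(name, []).append(version)

    def production_version(self, name: str) -> Optional[str]:
        """Return the live version, or None if nothing has been promoted."""
        history = self._history.get(name, [])
        return history[-1] if history else None

    def rollback(self, name: str) -> str:
        """Revert to the previously promoted version and return it."""
        history = self._history.get(name, [])
        if len(history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        history.pop()
        return history[-1]


registry = PromptRegistry()
registry.register("Customer Service Response", "1.0.0", "You are a helpful agent.")
registry.register("Customer Service Response", "1.1.0", "You are a helpful, concise agent.")
registry.promote("Customer Service Response", "1.0.0")
registry.promote("Customer Service Response", "1.1.0")
print(registry.production_version("Customer Service Response"))
```

Refusing to overwrite a registered version keeps every promoted version reproducible, which is what makes the rollback meaningful.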

## 🚀 What's Next?

You've mastered collaborative prompt engineering! Continue to:

- **LSM-006: Production Monitoring** - Set up enterprise-grade monitoring with OpenTelemetry integration
- **LSM-007: Advanced Patterns** - Explore complex use cases and integration patterns
- **LSM-008: Tips and FAQs** - Learn pro tips and troubleshooting techniques

---

**Ready for enterprise-grade monitoring?** Continue to **LSM-006: Production Monitoring** to master production-grade operations and monitoring! 🏭