# AI Applications in Production

## Learning Objectives
By the end of this lesson, you will be able to:
- Build and deploy complete AI systems
- Implement MLOps practices for production
- Create real-world AI applications
- Monitor and maintain AI systems in production

## Core Concepts
- **MLOps**: Managing machine learning models in production
- **Model Registry**: Version control for AI models
- **Monitoring**: Tracking model performance over time
- **A/B Testing**: Comparing different models in production
- **Deployment**: Making AI models available to users

## 1. Building AI-Powered Applications

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle
import json
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("🚀 AI APPLICATIONS IN PRODUCTION")

# Simulate a complete AI application pipeline
print("🏗️ BUILDING END-TO-END AI SYSTEM")

# 1. Data Collection and Preprocessing
print("\n1. DATA PIPELINE:")

def collect_user_data():
    """Simulate collecting user interaction data"""
    np.random.seed(42)
    data = {
        'user_id': range(1000, 1100),
        'age': np.random.randint(18, 65, 100),
        'income': np.random.normal(50000, 15000, 100),
        'clicks': np.random.poisson(5, 100),
        'time_on_site': np.random.exponential(10, 100),
        'purchases': np.random.binomial(1, 0.3, 100)
    }
    return pd.DataFrame(data)

user_data = collect_user_data()
print(f"Collected data: {user_data.shape[0]} users, {user_data.shape[1]} features")
print(user_data.head())

# 2. Model Training and Validation
print(f"\n2. MODEL DEVELOPMENT:")

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Prepare features
features = ['age', 'income', 'clicks', 'time_on_site']
X = user_data[features]
y = user_data['purchases']

# Train model with validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Validate model
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)

print(f"Model Performance:")
print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")

# Cross-validation for robustness
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validation accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

# 3. Model Deployment Simulation
print(f"\n3. MODEL DEPLOYMENT:")

# Save model for production
model_filename = 'purchase_prediction_model.pkl'
with open(model_filename, 'wb') as f:
    pickle.dump(model, f)

print(f"✅ Model saved as {model_filename}")

# Create API-like prediction function
def predict_purchase_probability(age, income, clicks, time_on_site):
    """Production prediction function"""
    # Load model (in production, this would be done once at startup)
    with open(model_filename, 'rb') as f:
        loaded_model = pickle.load(f)
    
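    # Make prediction (a DataFrame keeps the feature names the model was trained with)
    row = pd.DataFrame(
        [[age, income, clicks, time_on_site]],
        columns=['age', 'income', 'clicks', 'time_on_site']
    )
    probability = loaded_model.predict_proba(row)[0][1]
    prediction = loaded_model.predict(row)[0]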
    
    return {
        'will_purchase': bool(prediction),
        'probability': float(probability),
        'timestamp': datetime.now().isoformat()
    }

# Test the API function
test_user = predict_purchase_probability(age=35, income=60000, clicks=8, time_on_site=15)
print(f"Test prediction: {test_user}")

# 4. Real-time monitoring
print(f"\n4. MONITORING AND MAINTENANCE:")

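def monitor_model_performance(new_data, model, threshold=0.7):
    """Flag degraded accuracy on recent labeled data (a simple proxy for model drift)"""
    current_accuracy = accuracy_score(new_data['actual'], new_data['predicted'])
    
    # Alert when accuracy falls below the minimum acceptable level
    if current_accuracy < threshold: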
        return {"status": "ALERT", "message": "Model performance degraded", "accuracy": current_accuracy}
    else:
        return {"status": "OK", "message": "Model performing well", "accuracy": current_accuracy}

# Simulate monitoring
monitoring_data = {
    'actual': np.random.binomial(1, 0.3, 50),
    'predicted': np.random.binomial(1, 0.3, 50)
}

monitor_result = monitor_model_performance(monitoring_data, model, threshold=0.7)
print(f"Monitoring result: {monitor_result}")

# 5. A/B Testing Framework
print(f"\n5. A/B TESTING:")

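def ab_test_models(model_a, model_b, test_data, metric='accuracy'):
    """Compare two models offline on the same labeled data (a stand-in for a live A/B test)"""
    predictions_a = model_a.predict(test_data[features])
    predictions_b = model_b.predict(test_data[features])
    
    if metric == 'accuracy':
        score_a = accuracy_score(test_data['purchases'], predictions_a)
        score_b = accuracy_score(test_data['purchases'], predictions_b)
    else:
        # Fail loudly instead of hitting an undefined score further down
        raise ValueError(f"Unsupported metric: {metric}")
    
    if score_a == score_b:
        winner = 'Tie'
    else:
        winner = 'Model A' if score_a > score_b else 'Model B'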
    
    return {
        'model_a_score': score_a,
        'model_b_score': score_b,
        'winner': winner,
        'improvement': abs(score_a - score_b)
    }

# Compare with a simpler model
simple_model = RandomForestClassifier(n_estimators=10, random_state=42)
simple_model.fit(X_train, y_train)

# Evaluate on held-out rows only; rows seen during training would inflate both scores
ab_results = ab_test_models(model, simple_model, user_data.loc[X_test.index])
print(f"A/B Test Results: {ab_results}")

print(f"\n✅ PRODUCTION AI SYSTEM COMPONENTS:")
print(f"• Data pipeline for continuous learning")
print(f"• Model validation and testing")
print(f"• Deployment with API interface")
print(f"• Real-time monitoring and alerts")
print(f"• A/B testing for model improvements")
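
A raw score difference between two models, as returned by `ab_test_models`, does not say whether the gap could be due to chance. The block below is a minimal sketch of one common way to check this, a two-sided two-proportion z-test. The function name `two_proportion_z_test` and the counts in the example are hypothetical and are not part of the lesson's pipeline.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two success rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both models perform equally
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both rates are exactly 0 or exactly 1: no evidence of a difference
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Hypothetical counts: correct predictions out of the users served by each model
z, p = two_proportion_z_test(successes_a=430, n_a=500, successes_b=400, n_b=500)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With only a few dozen evaluation rows, as in the simulated data above, such a test will rarely reach significance, which is itself a useful signal that more traffic is needed before declaring a winner.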

## 2. MLOps and Model Management
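
The retraining check in this section uses a simulated `data_drift_score`. As a rough illustration of how such a score could be computed, the sketch below implements the Population Stability Index (PSI) for a single numeric feature with equal-width bins. The function name, the bin count and the synthetic income samples are assumptions made for this example; a commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift.

```python
import math
import random

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if the reference sample is constant
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the reference range into the edge bins
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic income samples (hypothetical): training data, fresh data from the
# same distribution, and fresh data whose mean has shifted
random.seed(0)
reference = [random.gauss(50_000, 15_000) for _ in range(2000)]
same = [random.gauss(50_000, 15_000) for _ in range(2000)]
shifted = [random.gauss(60_000, 15_000) for _ in range(2000)]

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI (same distribution):    {psi_same:.3f}")
print(f"PSI (shifted distribution): {psi_shifted:.3f}")
```

A score like this could replace the hard-coded value in `check_for_retraining`, computed per feature and compared against `data_drift_threshold`.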

## 3. Real-World AI Projects
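
For rare-event projects such as fraud detection, the number of flagged cases divided by the number of actual cases is not a detection rate, because flagged cases include false alarms. The sketch below computes precision and recall directly from true and predicted labels. The helper name `precision_recall` and the ten labels are made up for illustration; in practice the labels should come from data the model was not trained on.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed fraud
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical held-out labels: 3 actual fraud cases, 4 transactions flagged
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
flag_ratio = sum(y_pred) / sum(y_true)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, Flagged/actual: {flag_ratio:.2f}")
```

Here the flagged-to-actual ratio exceeds 1 even though one of the three fraud cases was missed, which is why recall is the safer figure to report as a detection rate.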

## Practice Exercises

In [None]:
# MLOps: Managing AI Models in Production
print("⚙️ MLOPS: MANAGING AI IN PRODUCTION")

# Version control for models
print("\n📁 MODEL VERSIONING:")

class ModelRegistry:
    def __init__(self):
        self.models = {}
        self.metadata = {}
    
    def register_model(self, name, version, model, metrics):
        key = f"{name}_v{version}"
        self.models[key] = model
        self.metadata[key] = {
            'version': version,
            'metrics': metrics,
            'timestamp': datetime.now().isoformat(),
            'status': 'registered'
        }
        print(f"Registered {key} with accuracy: {metrics.get('accuracy', 'N/A')}")
    
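    def get_best_model(self, name, metric='accuracy'):
        # Match on "<name>_v" so that models whose names share a prefix are not mixed up
        versions = [k for k in self.models if k.startswith(f"{name}_v")]
        if not versions:
            # Keep the return shape consistent with the (model, metadata) tuple below
            return None, None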
        
        best = max(versions, key=lambda v: self.metadata[v]['metrics'].get(metric, 0))
        return self.models[best], self.metadata[best]
    
    def promote_to_production(self, name, version):
        key = f"{name}_v{version}"
        if key in self.models:
            self.metadata[key]['status'] = 'production'
            print(f"Promoted {key} to production")

# Use model registry
registry = ModelRegistry()
registry.register_model('purchase_predictor', '1.0', model, {'accuracy': accuracy, 'precision': precision})

# Automated model retraining
print(f"\n🔄 AUTOMATED RETRAINING:")

def check_for_retraining(performance_threshold=0.8, data_drift_threshold=0.1):
    """Check if model needs retraining"""
    current_performance = 0.75  # Simulated current performance
    data_drift_score = 0.15     # Simulated drift measurement
    
    needs_retraining = (
        current_performance < performance_threshold or 
        data_drift_score > data_drift_threshold
    )
    
    return {
        'needs_retraining': needs_retraining,
        'current_performance': current_performance,
        'data_drift': data_drift_score,
        'thresholds': {
            'performance': performance_threshold,
            'drift': data_drift_threshold
        }
    }

retraining_check = check_for_retraining()
print(f"Retraining check: {retraining_check}")

# Real-World AI Project Examples
print(f"\n🌍 REAL-WORLD AI PROJECT SCENARIOS:")

# Project 1: Customer Churn Prevention
print(f"\n1. CUSTOMER CHURN PREVENTION:")
def churn_prevention_system():
    """Complete churn prevention pipeline"""
    
    # Simulate customer behavior data
    customers = pd.DataFrame({
        'customer_id': range(1, 501),
        'months_active': np.random.randint(1, 36, 500),
        'support_tickets': np.random.poisson(2, 500),
        'monthly_spending': np.random.normal(150, 50, 500),
        'login_frequency': np.random.poisson(10, 500)
    })
    
    # Create churn labels (customers with low engagement)
    customers['will_churn'] = (
        (customers['login_frequency'] < 5) | 
        (customers['support_tickets'] > 5) |
        (customers['monthly_spending'] < 100)
    ).astype(int)
    
    # Train churn model
    features = ['months_active', 'support_tickets', 'monthly_spending', 'login_frequency']
    X = customers[features]
    y = customers['will_churn']
    
    churn_model = RandomForestClassifier(n_estimators=30, random_state=42)
    churn_model.fit(X, y)
    
    # Identify high-risk customers
    probabilities = churn_model.predict_proba(X)[:, 1]
    high_risk = customers[probabilities > 0.7]
    
    return {
        'total_customers': len(customers),
        'high_risk_customers': len(high_risk),
        'potential_revenue_loss': high_risk['monthly_spending'].sum() * 12,
        # Accuracy on the training data itself, so this is an optimistic estimate
        'model_accuracy': churn_model.score(X, y)
    }

churn_results = churn_prevention_system()
print(f"Churn prevention results: {churn_results}")

# Project 2: Inventory Optimization
print(f"\n2. INVENTORY OPTIMIZATION:")
def inventory_optimization():
    """AI-powered inventory management"""
    
    # Simulate product sales data
    products = pd.DataFrame({
        'product_id': range(1, 101),
        'avg_daily_sales': np.random.poisson(15, 100),
        'seasonality_factor': np.random.uniform(0.8, 1.2, 100),
        'supplier_lead_time': np.random.randint(3, 14, 100),
        'unit_cost': np.random.uniform(10, 100, 100)
    })
    
    # Calculate optimal stock levels
    products['safety_stock'] = products['avg_daily_sales'] * products['supplier_lead_time'] * 1.5
    products['reorder_point'] = products['avg_daily_sales'] * products['supplier_lead_time'] + products['safety_stock']
    products['optimal_stock'] = products['reorder_point'] * 2
    
    # Calculate potential savings
    current_stock_value = products['unit_cost'].sum() * 100  # Assume 100 units per product
    optimal_stock_value = (products['unit_cost'] * products['optimal_stock']).sum()
    savings = current_stock_value - optimal_stock_value
    
    return {
        'products_analyzed': len(products),
        'current_inventory_value': current_stock_value,
        'optimized_inventory_value': optimal_stock_value,
        'potential_savings': max(0, savings),
        # Negative when the recommended stock exceeds the assumed 100 units per product
        'avg_stock_reduction': (100 - products['optimal_stock'].mean()) / 100
    }

inventory_results = inventory_optimization()
print(f"Inventory optimization results: {inventory_results}")

# Project 3: Fraud Detection
print(f"\n3. FRAUD DETECTION SYSTEM:")
def fraud_detection_system():
    """Real-time fraud detection"""
    
    # Simulate transaction data
    transactions = pd.DataFrame({
        'amount': np.random.lognormal(4, 1, 1000),
        'hour_of_day': np.random.randint(0, 24, 1000),
        'merchant_risk_score': np.random.uniform(0, 1, 1000),
        'user_spending_pattern': np.random.normal(0, 1, 1000),
        'location_risk': np.random.uniform(0, 1, 1000)
    })
    
    # Create fraud labels (rare events)
    fraud_probability = (
        0.1 * (transactions['amount'] > 1000) +
        0.05 * (transactions['merchant_risk_score'] > 0.8) +
        0.03 * (transactions['location_risk'] > 0.9)
    )
    
    transactions['is_fraud'] = np.random.binomial(1, fraud_probability)
    
    # Train fraud detection model
    features = ['amount', 'hour_of_day', 'merchant_risk_score', 'user_spending_pattern', 'location_risk']
    X = transactions[features]
    y = transactions['is_fraud']
    
    fraud_model = RandomForestClassifier(n_estimators=50, random_state=42, class_weight='balanced')
    fraud_model.fit(X, y)
    
    # Evaluate fraud detection (on the training data, so these figures are optimistic)
    predictions = fraud_model.predict(X)
    fraud_detected = np.sum(predictions)  # transactions flagged by the model
    actual_fraud = np.sum(y)
    
    return {
        'total_transactions': len(transactions),
        'actual_fraud_cases': int(actual_fraud),
        'detected_fraud_cases': int(fraud_detected),
        # Detection rate is recall: the share of actual fraud cases that were flagged
        'detection_rate': recall_score(y, predictions, zero_division=0),
        'model_precision': precision_score(y, predictions, zero_division=0)
    }

fraud_results = fraud_detection_system()
print(f"Fraud detection results: {fraud_results}")

# Practice Exercises
print(f"\n📚 COMPREHENSIVE PRACTICE EXERCISES:")

# Exercise 1: Build a recommendation system
print(f"\nExercise 1: Recommendation System")
print("Build a system that recommends products based on user behavior")
print("- Collaborative filtering (user-based)")
print("- Content-based filtering (item features)")
print("- Hybrid approach combining both methods")

# Exercise 2: Time series forecasting
print(f"\nExercise 2: Sales Forecasting")
print("Predict future sales using historical data")
print("- Handle seasonality and trends")
print("- Use ARIMA or neural networks")
print("- Evaluate forecast accuracy")

# Exercise 3: Natural language processing
print(f"\nExercise 3: Customer Service Chatbot")
print("Build an AI assistant for customer support")
print("- Intent classification")
print("- Entity extraction")
print("- Response generation")

# Exercise 4: Computer vision application
print(f"\nExercise 4: Quality Control System")
print("Detect defects in manufacturing using images")
print("- Image preprocessing")
print("- Defect classification")
print("- Real-time processing")

# Exercise 5: End-to-end MLOps
print(f"\nExercise 5: Complete MLOps Pipeline")
print("Build a production-ready ML system")
print("- Automated data pipelines")
print("- Model versioning and deployment")
print("- Monitoring and alerting")
print("- A/B testing framework")

# Career path guidance
print(f"\n🎯 CAREER PATHS IN AI:")
print(f"\n1. DATA SCIENTIST:")
print(f"   • Build models and analyze data")
print(f"   • Focus on statistics and algorithms")
print(f"   • Skills: Python, SQL, Statistics, ML")

print(f"\n2. ML ENGINEER:")
print(f"   • Deploy models to production")
print(f"   • Focus on scalability and reliability")
print(f"   • Skills: Python, Docker, Kubernetes, MLOps")

print(f"\n3. AI RESEARCHER:")
print(f"   • Develop new AI techniques")
print(f"   • Focus on innovation and research")
print(f"   • Skills: Deep learning, Mathematics, Research")

print(f"\n4. AI PRODUCT MANAGER:")
print(f"   • Guide AI product development")
print(f"   • Focus on business impact")
print(f"   • Skills: Business, AI understanding, Strategy")

# Final project ideas
print(f"\n🚀 CAPSTONE PROJECT IDEAS:")
print(f"1. Personal finance AI advisor")
print(f"2. Smart home automation system")
print(f"3. Health monitoring application")
print(f"4. Educational AI tutor")
print(f"5. Business process optimization tool")

print(f"\n🎉 CONGRATULATIONS!")
print(f"You've completed the comprehensive AI/ML curriculum!")
print(f"You now have the skills to:")
print(f"✅ Build end-to-end ML systems")
print(f"✅ Deploy models to production")
print(f"✅ Apply AI to real business problems")
print(f"✅ Continue learning advanced topics")
print(f"✅ Start your career in AI/ML")

print(f"\n🌟 KEEP LEARNING:")
print(f"• Practice with real datasets")
print(f"• Contribute to open source projects")
print(f"• Build a portfolio on GitHub")
print(f"• Join AI communities and competitions")
print(f"• Stay updated with latest research")

## Build an AI Assistant

Create a sophisticated AI assistant using RAG and modern LLM techniques.
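
Before the full tutorial, here is a minimal sketch of the core RAG step: retrieve the most relevant documents, then place them in the prompt ahead of the question. The helper names (`tokenize`, `retrieve`, `build_prompt`), the prompt wording and the two sample documents are assumptions made for illustration. No real LLM is called; the resulting prompt string is what would be sent to one.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens, dropping punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by the number of tokens they share with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no tokens with the query
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, retrieved):
    """Combine numbered context passages and the question into a single prompt."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(retrieved, 1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

documents = [
    "Python is a high-level programming language known for its readability.",
    "Neural networks consist of interconnected nodes that process information.",
]

question = "Who uses Python?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)
```

Numbering the passages makes it possible to ask the model to cite which passage supports its answer, and the instruction to admit insufficient context is one simple way to discourage made-up answers.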

In [None]:
"""
COMPLETE MODERN AI APPLICATIONS TUTORIAL
=======================================

This tutorial teaches you to build modern AI systems like ChatGPT, 
document Q&A systems, and AI agents.
"""

# =============================================================================
# PART 1: UNDERSTANDING LARGE LANGUAGE MODELS (LLMs)
# =============================================================================

print("PART 1: UNDERSTANDING LARGE LANGUAGE MODELS")
print("=" * 60)

# LLMs are massive neural networks trained on enormous amounts of text
print("What are Large Language Models?")
print("• Massive neural networks (billions to hundreds of billions of parameters)")
print("• Trained on internet-scale text data")
print("• Learn to predict the next token (a word or word piece) in a sequence")
print("• Can understand context and generate human-like text")

print("\nHow LLMs Work (Simplified):")
print("1. Input: 'The weather today is'")
print("2. Model predicts next word probabilities:")
print("   - 'sunny': 30%")
print("   - 'rainy': 25%") 
print("   - 'cloudy': 20%")
print("   - 'nice': 15%")
print("   - other: 10%")
print("3. Select a word (the most probable one, or a sample from the distribution)")
print("4. Repeat with new sequence: 'The weather today is sunny'")

# Simulate a simple language model
class SimpleLanguageModel:
    """Simplified language model for educational purposes"""
    
    def __init__(self):
        # Simple word transition probabilities
        self.transitions = {
            'the': {'weather': 0.3, 'cat': 0.2, 'sun': 0.2, 'book': 0.3},
            'weather': {'is': 0.7, 'today': 0.3},
            'today': {'is': 0.8, 'was': 0.2},
            'is': {'sunny': 0.4, 'rainy': 0.3, 'cloudy': 0.2, 'good': 0.1},
            'cat': {'is': 0.5, 'sleeps': 0.3, 'runs': 0.2},
            'sun': {'is': 0.6, 'shines': 0.4}
        }
    
    def predict_next_word(self, current_word):
        """Predict next word given current word"""
        # An empty dict signals that the model knows no continuation, which lets
        # generate_text stop instead of appending a placeholder word
        return self.transitions.get(current_word.lower(), {})
    
    def generate_text(self, start_word, length=5):
        """Generate text starting from a word"""
        result = [start_word]
        current = start_word
        
        for _ in range(length - 1):
            predictions = self.predict_next_word(current)
            if predictions:
                # Choose word with highest probability
                next_word = max(predictions.items(), key=lambda x: x[1])[0]
                result.append(next_word)
                current = next_word
            else:
                break
                
        return ' '.join(result)

# Test our simple model (named language_model so it does not overwrite the
# RandomForest simple_model defined earlier in the notebook)
language_model = SimpleLanguageModel()

print("\nSimple Language Model Demo:")
start_words = ['the', 'cat', 'weather']
for word in start_words:
    generated = language_model.generate_text(word, 5)
    predictions = language_model.predict_next_word(word)
    print(f"Starting with '{word}':")
    print(f"  Generated: {generated}")
    print(f"  Next word probabilities: {predictions}")

# =============================================================================
# PART 2: PROMPT ENGINEERING - TALKING TO AI
# =============================================================================

print("\n\nPART 2: PROMPT ENGINEERING - HOW TO TALK TO AI")
print("=" * 60)

print("Prompt engineering is the art of writing effective instructions for AI models.")

# Examples of good vs bad prompts
prompt_examples = {
    "Bad Prompts": [
        "Write something about dogs",
        "Help me",
        "What should I do?"
    ],
    "Good Prompts": [
        "Write a 100-word paragraph about why dogs make good pets, including specific benefits like companionship and security.",
        "I'm learning Python and getting a 'KeyError' when accessing dictionary keys. Explain what this means and show me how to handle it safely.",
        "I want to start exercising but have only 30 minutes per day. Create a weekly workout plan for a beginner."
    ]
}

print("Prompt Quality Examples:")
print("-" * 40)
for category, prompts in prompt_examples.items():
    print(f"\n{category}:")
    for i, prompt in enumerate(prompts, 1):
        print(f"  {i}. {prompt}")

# Prompt engineering techniques
techniques = {
    "Be Specific": "Instead of 'write code', say 'write a Python function that calculates compound interest'",
    "Provide Context": "I'm a beginner in Python. Explain functions with simple examples.",
    "Use Examples": "Format the output like this: Name: John, Age: 25, City: NYC",
    "Set Constraints": "Respond in exactly 3 bullet points, each under 20 words",
    "Chain of Thought": "Think step by step: 1) Analyze the problem 2) Plan the solution 3) Write the code"
}

print(f"\nPrompt Engineering Techniques:")
print("-" * 40)
for technique, example in techniques.items():
    print(f"• {technique}: {example}")

# =============================================================================
# PART 3: RETRIEVAL AUGMENTED GENERATION (RAG) SYSTEMS
# =============================================================================

print("\n\nPART 3: RAG SYSTEMS - GIVING AI ACCESS TO KNOWLEDGE")
print("=" * 60)

print("Problem: LLMs have knowledge cutoffs and can't access new information.")
print("Solution: RAG = Retrieval Augmented Generation")
print("\nRAG Process:")
print("1. User asks a question")
print("2. Search relevant documents/information")
print("3. Provide found information + question to LLM")
print("4. LLM generates answer based on retrieved context")

# Simulate a simple RAG system
class SimpleRAGSystem:
    """Simplified RAG system for educational purposes"""
    
    def __init__(self):
        # Knowledge base (in practice, this would be much larger)
        self.knowledge_base = [
            {
                "id": 1,
                "text": "Python is a high-level programming language known for its readability and simplicity. It was created by Guido van Rossum in 1991.",
                "topic": "python_basics"
            },
            {
                "id": 2, 
                "text": "Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.",
                "topic": "machine_learning"
            },
            {
                "id": 3,
                "text": "Neural networks are computing systems inspired by biological neural networks. They consist of interconnected nodes that process information.",
                "topic": "neural_networks"
            },
            {
                "id": 4,
                "text": "Data science involves extracting insights from large amounts of data using statistical methods, programming, and domain expertise.",
                "topic": "data_science"
            }
        ]
    
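    def search_knowledge(self, query, top_k=2):
        """Search for relevant documents (simplified keyword matching)"""
        import re
        # Strip punctuation and common filler words so that "Python?" matches
        # "Python" and words like "is" do not count as evidence of relevance
        stop_words = {'a', 'an', 'and', 'the', 'is', 'are', 'do', 'of', 'to', 'by',
                      'in', 'for', 'it', 'its', 'what', 'how', 'tell', 'me', 'about'}
        
        def tokenize(text):
            return set(re.findall(r"[a-z0-9]+", text.lower())) - stop_words
        
        query_words = tokenize(query)
        scored_docs = []
        
        for doc in self.knowledge_base:
            # Simple scoring: count of matching content words. This stays crude:
            # any single shared word is enough for a document to be returned
            score = len(query_words & tokenize(doc["text"]))
            scored_docs.append((score, doc))
        
        # Sort by score and return top_k
        scored_docs.sort(key=lambda x: x[0], reverse=True)
        return [doc for score, doc in scored_docs[:top_k] if score > 0]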
    
    def generate_answer(self, query, retrieved_docs):
        """Generate answer using query and retrieved documents"""
        if not retrieved_docs:
            return "I don't have information about that topic in my knowledge base."
        
        # Combine retrieved information
        context = " ".join([doc["text"] for doc in retrieved_docs])
        
        # Simulate LLM response (in practice, this would call a real LLM)
        answer = f"Based on the available information: {context}\n\nTo answer your question '{query}': "
        
        # Simple rule-based response generation
        if "what is" in query.lower() and "python" in query.lower():
            answer += "Python is a high-level programming language known for its readability and simplicity, created by Guido van Rossum in 1991."
        elif "machine learning" in query.lower():
            answer += "Machine learning is a subset of AI that enables computers to learn from experience without explicit programming."
        else:
            answer += "The retrieved documents contain relevant information about your query."
            
        return answer
    
    def ask_question(self, query):
        """Complete RAG pipeline"""
        print(f"Query: {query}")
        
        # Step 1: Retrieve relevant documents
        retrieved_docs = self.search_knowledge(query)
        print(f"Retrieved {len(retrieved_docs)} relevant documents")
        
        for i, doc in enumerate(retrieved_docs, 1):
            print(f"  Doc {i}: {doc['text'][:50]}...")
        
        # Step 2: Generate answer
        answer = self.generate_answer(query, retrieved_docs)
        print(f"Answer: {answer}")
        
        return answer

# Test RAG system
rag_system = SimpleRAGSystem()

print("RAG System Demo:")
print("-" * 40)

test_queries = [
    "What is Python?",
    "Tell me about machine learning",
    "How do neural networks work?",
    "What is quantum computing?"  # Not in knowledge base
]

for query in test_queries:
    print(f"\n" + "="*50)
    rag_system.ask_question(query)

# =============================================================================
# PART 4: VECTOR EMBEDDINGS AND SEMANTIC SEARCH
# =============================================================================

print("\n\nPART 4: VECTOR EMBEDDINGS - SEMANTIC UNDERSTANDING")
print("=" * 60)

print("Vector embeddings convert text into numerical vectors that capture meaning.")
print("Similar concepts have similar vectors, enabling semantic search.")

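# Simulate text embeddings with TF-IDF vectors. TF-IDF measures weighted word
# overlap rather than meaning; production systems use learned dense embeddings
# to match texts that share no words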
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class SimpleEmbeddingSystem:
    """Simple text embedding system using TF-IDF"""
    
    def __init__(self):
        self.vectorizer = TfidfVectorizer(stop_words='english', max_features=100)
        self.documents = []
        self.embeddings = None
    
    def add_documents(self, docs):
        """Add documents to the system"""
        self.documents = docs
        # Create embeddings for all documents
        self.embeddings = self.vectorizer.fit_transform(docs)
        print(f"Created embeddings for {len(docs)} documents")
        print(f"Embedding dimension: {self.embeddings.shape[1]}")
    
    def search_similar(self, query, top_k=3):
        """Find most similar documents to query"""
        # Convert query to embedding
        query_embedding = self.vectorizer.transform([query])
        
        # Calculate similarities
        similarities = cosine_similarity(query_embedding, self.embeddings)[0]
        
        # Get top_k most similar
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        results = []
        for idx in top_indices:
            if similarities[idx] > 0:  # Only return if there's some similarity
                results.append({
                    'document': self.documents[idx],
                    'similarity': similarities[idx],
                    'index': idx
                })
        
        return results

# Test embedding system
docs = [
    "Python is a programming language used for web development and data science",
    "Machine learning algorithms can predict future outcomes from historical data", 
    "Neural networks are inspired by the human brain and used in deep learning",
    "Data visualization helps us understand patterns in large datasets",
    "Web development involves creating websites and web applications",
    "Artificial intelligence aims to create machines that can think like humans",
    "Statistics is the foundation of data analysis and machine learning"
]

embedding_system = SimpleEmbeddingSystem()
embedding_system.add_documents(docs)

print("\nSemantic Search Demo:")
print("-" * 40)

test_queries = [
    "programming languages for data analysis",
    "artificial intelligence and neural networks", 
    "statistical analysis of data",
    "building websites"
]

for query in test_queries:
    print(f"\nQuery: '{query}'")
    results = embedding_system.search_similar(query, top_k=2)
    
    for i, result in enumerate(results, 1):
        print(f"  {i}. Similarity: {result['similarity']:.3f}")
        print(f"     Document: {result['document']}")

# =============================================================================
# PART 5: AI AGENTS - AUTONOMOUS AI SYSTEMS
# =============================================================================

print("\n\nPART 5: AI AGENTS - AUTONOMOUS AI SYSTEMS")
print("=" * 60)

print("AI Agents can:")
print("• Make decisions based on goals")
print("• Use tools and take actions")
print("• Learn from experience")
print("• Interact with environments")

# Simple AI agent that can use tools
class SimpleAIAgent:
    """Basic AI agent with tool usage capabilities"""
    
    def __init__(self, name):
        self.name = name
        self.tools = {
            'calculator': self.calculator_tool,
            'search': self.search_tool,
            'translator': self.translator_tool
        }
        self.memory = []
    
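    def calculator_tool(self, expression):
        """Simple calculator tool"""
        try:
            # Only digits, basic operators, parentheses, dots and spaces are allowed
            allowed_chars = set('0123456789+-*/(). ')
            if not all(c in allowed_chars for c in expression):
                return "Error: Invalid characters in expression"
            # Reject exponentiation, which could make evaluation arbitrarily expensive
            if '**' in expression:
                return "Error: Exponentiation is not supported"
            
            # eval is tolerable here only because of the whitelist above;
            # builtins are disabled as an extra precaution
            result = eval(expression, {"__builtins__": {}}, {})
            return f"Calculation result: {result}"
        except Exception:
            return "Error: Invalid mathematical expression"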
    
    def search_tool(self, query):
        """Simulated search tool"""
        # Simulate search results
        ai_summary = 'Artificial Intelligence is the simulation of human intelligence in machines'
        search_results = {
            'python': 'Python is a programming language created by Guido van Rossum',
            'ai': ai_summary,
            'artificial intelligence': ai_summary,
            'weather': 'Current weather information would be retrieved from a weather API'
        }
        
        # Match whole words or phrases so that "ai" does not fire inside words like "explain"
        import re
        for keyword, text in search_results.items():
            if re.search(rf'\b{re.escape(keyword)}\b', query.lower()):
                return f"Search result: {text}"
        
        return "Search result: No specific information found for this query"
    
    def translator_tool(self, text):
        """Simulated translation tool"""
        # Simple word translations
        translations = {
            'hello': 'hola (Spanish)',
            'thank you': 'gracias (Spanish)', 
            'goodbye': 'adiós (Spanish)'
        }
        
        for english, translation in translations.items():
            if english in text.lower():
                return f"Translation: {translation}"
        
        return "Translation: Translation not available for this phrase"
    
    def parse_task(self, task):
        """Parse user task and determine which tool to use"""
        task_lower = task.lower()
        
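        if any(op in task_lower for op in ['+', '-', '*', '/', 'calculate', 'math']):
            # Extract the mathematical expression. It must start with a digit or "("
            # so that a stray space or hyphen is not mistaken for an expression
            import re
            math_pattern = r'[\d(][\d+\-*/().\s]*'
            match = re.search(math_pattern, task)
            if match:
                return 'calculator', match.group().strip()
        
        # Checked independently, so a task containing a hyphen can still reach the other tools
        if any(word in task_lower for word in ['search', 'find', 'what is', 'tell me about']):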
            return 'search', task
        
        elif any(word in task_lower for word in ['translate', 'translation']):
            return 'translator', task
        
        return None, task
    
    def execute_task(self, task):
        """Execute a task using appropriate tools"""
        print(f"{self.name}: Received task - '{task}'")
        
        # Parse task to determine tool and input
        tool_name, tool_input = self.parse_task(task)
        
        if tool_name and tool_name in self.tools:
            print(f"{self.name}: Using {tool_name} tool")
            result = self.tools[tool_name](tool_input)
            self.memory.append({'task': task, 'tool': tool_name, 'result': result})
            print(f"{self.name}: {result}")
        else:
            result = f"I'm not sure how to help with that task. Available tools: {list(self.tools.keys())}"
            print(f"{self.name}: {result}")
        
        return result

# Create and test AI agent
agent = SimpleAIAgent("Assistant")

print("AI Agent Demo:")
print("-" * 40)

test_tasks = [
    "Calculate 25 * 4 + 10",
    "Search for information about Python",
    "Translate hello to Spanish",
    "What is 100 / 5?",
    "Find information about artificial intelligence",
    "Help me write a poem"  # Task the agent can't handle
]

for task in test_tasks:
    print("\n" + "=" * 50)
    agent.execute_task(task)

print(f"\nAgent Memory ({len(agent.memory)} tasks completed):")
for i, memory in enumerate(agent.memory, 1):
    print(f"{i}. Tool: {memory['tool']}, Task: {memory['task'][:30]}...")

# =============================================================================
# PART 6: MULTI-MODAL AI (TEXT + IMAGES)
# =============================================================================

print("\n\nPART 6: MULTI-MODAL AI - TEXT + IMAGES")
print("=" * 60)

print("Multi-modal AI can understand and generate multiple types of content:")
print("• Text + Images (like GPT-4 Vision)")
print("• Text + Audio (like speech recognition)")
print("• Text + Video (like video understanding)")

# Simulate a multi-modal AI system
class SimpleMultiModalAI:
    """Simulated multi-modal AI that processes text and images"""
    
    def __init__(self):
        self.image_categories = {
            'high_contrast': 'This appears to be a high-contrast image, possibly containing text or clear shapes',
            'low_contrast': 'This appears to be a low-contrast image, possibly a photograph or natural scene',
            'geometric': 'This image contains geometric patterns or shapes',
            'random': 'This appears to be a random or noisy image'
        }
    
    def analyze_image(self, image_array):
        """Analyze image and return description"""
        # Simple image analysis based on statistical properties
        mean_intensity = np.mean(image_array)
        std_intensity = np.std(image_array)
        
        # Classify based on simple heuristics (thresholds assume intensities scaled to [0, 1])
        if std_intensity > 0.3:
            category = 'high_contrast'
        elif std_intensity < 0.1:
            category = 'low_contrast'  
        elif mean_intensity > 0.7:
            category = 'geometric'
        else:
            category = 'random'
        
        analysis = {
            'category': category,
            'description': self.image_categories[category],
            'mean_intensity': mean_intensity,
            'contrast': std_intensity,
            'size': image_array.shape
        }
        
        return analysis
    
    def process_text_and_image(self, text_query, image_array):
        """Process both text and image together"""
        # Analyze image
        image_analysis = self.analyze_image(image_array)
        
        # Generate response based on text query and image analysis
        if 'describe' in text_query.lower():
            # The stored description is already a full sentence, so use it directly
            response = f"{image_analysis['description']}. "
            response += f"It has dimensions {image_analysis['size']} with average intensity {image_analysis['mean_intensity']:.2f}."
        
        elif 'what' in text_query.lower() and 'see' in text_query.lower():
            response = f"I see a {image_analysis['category'].replace('_', ' ')} image. {image_analysis['description']}"
        
        else:
            response = f"Based on the image analysis: {image_analysis['description']}"
        
        return response, image_analysis

# Test multi-modal AI
multimodal_ai = SimpleMultiModalAI()

# Create test images
test_images = {
    'High Contrast': np.random.choice([0, 1], size=(10, 10)),  # Binary image
    'Low Contrast': np.random.normal(0.5, 0.05, size=(10, 10)),  # Low variance
    'Geometric': np.zeros((10, 10))  # Will be modified to create patterns
}

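# Create geometric pattern: a 6x6 white square on a black background.
# Its pixel standard deviation is about 0.48, so analyze_image() labels it
# 'high_contrast' (the std check runs before the 'geometric' branch).
test_images['Geometric'][2:8, 2:8] = 1  # Square in the middle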

print("Multi-Modal AI Demo:")
print("-" * 40)

test_queries = [
    "Describe what you see in this image",
    "What can you tell me about this picture?",
    "Analyze the visual content"
]

for img_name, img_array in test_images.items():
    print(f"\nTesting with {img_name} image:")
    
    # Visualize the test image
    plt.figure(figsize=(4, 3))
    plt.imshow(img_array, cmap='gray')
    plt.title(f'{img_name} Test Image')
    plt.axis('off')
    plt.show()
    
    query = test_queries[0]  # Use first query for demo
    response, analysis = multimodal_ai.process_text_and_image(query, img_array)
    
    print(f"Query: {query}")
    print(f"Response: {response}")
    print(f"Technical analysis: Mean={analysis['mean_intensity']:.3f}, Contrast={analysis['contrast']:.3f}")

# =============================================================================
# SUMMARY AND NEXT STEPS
# =============================================================================

print("\n\nSUMMARY: MODERN AI APPLICATIONS MASTERED")
print("=" * 60)

print("""
✅ Core AI Concepts:
   • Large Language Models (LLMs) and text generation
   • Prompt engineering for effective AI communication
   • Retrieval Augmented Generation (RAG) systems
   • Vector embeddings and semantic search
   • AI agents with tool usage capabilities
   • Multi-modal AI (text + images)

✅ Practical Systems Built:
   • Simple language model with word prediction
   • RAG system for document question-answering
   • Embedding-based semantic search engine
   • AI agent that can use multiple tools
   • Multi-modal AI for text and image processing

✅ Real-World Applications:
   • Document Q&A systems (like ChatPDF)
   • Semantic search engines
   • AI assistants with tool access
   • Content generation and analysis
   • Knowledge retrieval systems

✅ Technical Skills:
   • Understanding transformer architectures
   • Building RAG pipelines
   • Implementing semantic search
   • Creating AI agents with tool usage
   • Processing multiple data modalities

🎯 Industry Applications:
   • Customer service chatbots
   • Document analysis systems
   • Code generation assistants
   • Research and knowledge management
   • Content creation and editing tools

🚀 Next Steps:
   • Explore real LLM APIs (OpenAI, Anthropic, Hugging Face)
   • Build production RAG systems with vector databases
   • Learn about fine-tuning and model customization
   • Understand AI safety and alignment
   • Create full-stack AI applications
""")

print("🎉 Congratulations! You now understand modern AI systems!")
print("You have the foundation to build cutting-edge AI applications!")
print("From machine learning basics to modern AI agents - you've mastered the full spectrum!")

## RAG System Implementation

Build a Retrieval Augmented Generation system for question answering.
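
As one possible starting point, the sketch below shows the two RAG stages in miniature: retrieve the most similar document, then build an answer from it. It is a toy under stated assumptions, not a production design: retrieval uses plain word-count vectors with cosine similarity instead of learned embeddings, the "generation" step is a string template standing in for an LLM call, and the names `ToyRAG`, `tokenize`, and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two Counter word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRAG:
    """Hypothetical minimal RAG: word-count retrieval plus a templated answer."""

    def __init__(self, documents):
        self.documents = list(documents)
        # Index step: one word-count vector per document
        self.vectors = [Counter(tokenize(doc)) for doc in self.documents]

    def retrieve(self, question, k=1):
        """Return up to k (document, score) pairs with non-zero similarity."""
        query = Counter(tokenize(question))
        scores = [cosine_similarity(query, vec) for vec in self.vectors]
        ranked = sorted(zip(self.documents, scores),
                        key=lambda pair: pair[1], reverse=True)
        return [(doc, score) for doc, score in ranked[:k] if score > 0]

    def answer(self, question, k=1):
        """Placeholder generation step: a real system would prompt an LLM here."""
        hits = self.retrieve(question, k)
        if not hits:
            return "No relevant document found."
        context = " ".join(doc for doc, _ in hits)
        return f"Based on the retrieved context: {context}"


documents = [
    "Python is a programming language created by Guido van Rossum",
    "Artificial Intelligence is the simulation of human intelligence in machines",
    "RAG combines retrieval with text generation",
]
rag = ToyRAG(documents)
print(rag.answer("Who created Python?"))
print(rag.answer("quantum chess"))
```

Swapping the word-count vectors for real embeddings and the template for a model call would keep the same `retrieve` then `answer` structure.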

In [None]:
# Your RAG system code here

## AI Agent Framework

Design and implement an autonomous AI agent that can perform tasks.
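
A minimal sketch of one way to structure such an agent is shown below, assuming a keyword-based dispatcher: tools are registered with trigger keywords, the first matching tool handles the task, and each call is recorded in memory. The names `ToolAgent`, `register`, `run`, `safe_eval`, and `calculator` are hypothetical. The calculator walks the `ast` parse tree and accepts only numbers and the four basic operators, which is an alternative to calling `eval` on user text.

```python
import ast
import operator
import re

# Only these binary operators are allowed in expressions
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expression):
    """Evaluate +, -, *, / arithmetic without eval(); raise ValueError otherwise."""
    def _eval(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def calculator(task):
    """Find the first arithmetic expression in the task text and evaluate it."""
    match = re.search(r"\(?\d[\d+\-*/().\s]*", task)
    if not match:
        return "Error: no arithmetic expression found"
    try:
        return f"Calculation result: {safe_eval(match.group().strip())}"
    except (ValueError, SyntaxError, ZeroDivisionError):
        return "Error: Invalid mathematical expression"


class ToolAgent:
    """Hypothetical agent: keyword dispatch over registered tools, with memory."""

    def __init__(self, name):
        self.name = name
        self.tools = {}   # tool name -> (trigger keywords, function)
        self.memory = []  # one record per handled task

    def register(self, tool_name, keywords, func):
        self.tools[tool_name] = (tuple(keywords), func)

    def run(self, task):
        """Dispatch to the first tool whose keywords match; None if no tool fits."""
        task_lower = task.lower()
        for tool_name, (keywords, func) in self.tools.items():
            if any(keyword in task_lower for keyword in keywords):
                result = func(task)
                self.memory.append({"task": task, "tool": tool_name, "result": result})
                return result
        return None


demo_agent = ToolAgent("Helper")
demo_agent.register("calculator", ["calculate", "+", "-", "*", "/"], calculator)
print(demo_agent.run("Calculate 25 * 4 + 10"))
print(demo_agent.run("Help me write a poem"))
```

Keeping tools in a registry means new capabilities can be added with `register` instead of editing the dispatch logic.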

In [None]:
# Your AI agent code here