# SKLEARN PIPELINE OPTIMIZATION

### This notebook demonstrates how to use a simple ThoughtAction object to significantly improve a sklearn pipeline. It serves as an introduction to using the ThoughtAction abstraction in a dedicated example, and as a foundation for the CoTAEngine, where we build chain-of-thought-actions (CoTA) by chaining ThoughtAction objects together for complex use cases. In this notebook, we take the digits dataset, train an sklearn pipeline that performs PCA -> Random Forest classification, and optimize it. 

### Various hyperparameter optimization techniques exist (e.g. Bayesian optimization, grid search, genetic algorithms), but those techniques are computationally expensive and have no inherent mechanism for understanding problem context. 
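
### To make the cost argument concrete, the sketch below counts how many model fits an exhaustive grid search would need. The grid values and the number of cross-validation folds are illustrative assumptions, not settings used elsewhere in this notebook.

```python
from itertools import product

# Hypothetical search grid over the three hyperparameters tuned in this notebook.
grid = {
    "n_components": [5, 10, 20, 30, 40],
    "num_trees": [50, 100, 200, 400],
    "max_depth": [5, 10, 15, 20],
}
cv_folds = 5  # assumed number of cross-validation folds

# Every combination of grid values is one candidate configuration.
candidates = list(product(*grid.values()))

# Grid search fits each candidate once per fold.
grid_search_fits = len(candidates) * cv_folds

print(f"candidates: {len(candidates)}")       # candidates: 80
print(f"fits required: {grid_search_fits}")   # fits required: 400
```

### By comparison, the demo in this notebook fits the pipeline twice: once with the initial parameters and once with the parameters the LLM recommends.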

### Specifically, consider trying to optimize hyperparameters for models trained on synthetic datasets vs. "real world" datasets. The classic HP optimizers above only understand floats; they have no understanding of problem context. In this example, we show how to incorporate context into the optimization. We use prompting to optimize: a single LLM API call suggests new hyperparameters to increase performance. We can extend this by incorporating domain expertise itself into an error signal. 

### LLMs often hallucinate code. This can be controlled by having them operate on fixed templates: when the LLM only supplies values for predefined placeholders, the code that runs can change only in ways the template allows. 
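
### A minimal sketch of the fixed-template idea, assuming a `str.format`-style template. The helper names `template_fields` and `fill_template` are hypothetical and are not part of cotarag. The template is filled only when the supplied parameters match its placeholders exactly and every value is a plain integer, so nothing other than a number can reach the generated code.

```python
from string import Formatter


def template_fields(template: str) -> set:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def fill_template(template: str, **params) -> str:
    """Fill the template only if params match its placeholders and are all ints."""
    fields = template_fields(template)
    supplied = set(params)
    if fields != supplied:
        raise ValueError(
            f"missing: {sorted(fields - supplied)}, unexpected: {sorted(supplied - fields)}"
        )
    for name, value in params.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    return template.format(**params)


# Hypothetical two-line template in the style of the pipeline used in this notebook.
template = (
    "pca = PCA(n_components={n_components})\n"
    "rf = RandomForestClassifier(n_estimators={num_trees})\n"
)

code = fill_template(template, n_components=10, num_trees=100)
print(code)
```

### Rejecting non-integer values matters whenever the filled template is later executed, because a free-form string from an LLM would otherwise be interpolated directly into the program.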

## Getting started
### This notebook uses the getpass module, which keeps your keys secure so that we can all run this notebook without exposing secrets.
### Remember, if you do not wish to use Claude, you will simply need to use the OpenAIEngine OR subclass QueryEngine to use another LLM! 
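
### One way to use getpass for this, sketched under the assumption that the key is read from the `CLAUDE_API_KEY` environment variable as in the cells below. The helper name `ensure_api_key` is hypothetical. It prompts without echoing input, and only when the variable is not already set.

```python
import os
from getpass import getpass


def ensure_api_key(env_var: str = "CLAUDE_API_KEY", prompt=getpass) -> str:
    """Return the key from the environment, prompting for it (no echo) if unset."""
    key = os.environ.get(env_var)
    if not key:
        key = prompt(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key


# In a notebook cell, call ensure_api_key() once before building the pipeline.
```

### The `prompt` argument exists only so the helper can be exercised without a terminal; in normal use the default `getpass` is what keeps the key out of the notebook's saved output.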


In [1]:
import os
import io
import json
import sys 
# Add the project root to Python path
# Get the current working directory of the notebook
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)
from contextlib import redirect_stdout
from cotarag.cota_engine.thought_actions import LLMThoughtAction





## MLPipeline as a LLMThoughtAction

#### Here, we will subclass LLMThoughtAction, a pattern familiar to PyTorch users, where all neural nets subclass torch.nn.Module. 
#### We must implement the "thought" method and the "action" method. 
#### All ThoughtActions (LLMThoughtAction subclasses this base class) have a built-in `__call__` method which executes a (thought -> action) process. 
#### Note this pattern is not **explicitly** forced on the user, but is a convenience when building ThoughtAction chains. 
#### The key distinction between ThoughtAction and LLMThoughtAction is that the former does not explicitly require an API call to an LLM while the latter expects one. This keeps the reasoning behind the code clearer. It also means we have *full* control over which parts of our pipeline use an LLM and which do not, a limitation LangChain fails to address adequately for reasons beyond the scope of this notebook. 
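
#### The (thought -> action) pattern can be illustrated with a stand-in base class. This is a minimal sketch and not the cotarag implementation: `SketchThoughtAction`, `SquarePlusOne`, and `run_chain` are hypothetical names, and the real classes may differ in signature and behavior.

```python
class SketchThoughtAction:
    """Illustrative stand-in for a ThoughtAction-style base class."""

    def thought(self, input_data):
        raise NotImplementedError

    def action(self, thought_output):
        raise NotImplementedError

    def __call__(self, input_data):
        # The thought step prepares a plan; the action step executes it.
        return self.action(self.thought(input_data))


class SquarePlusOne(SketchThoughtAction):
    """Toy step with no LLM involved: square in thought, add one in action."""

    def thought(self, input_data):
        return input_data * input_data

    def action(self, thought_output):
        return thought_output + 1


def run_chain(steps, input_data):
    """Feed the output of each step into the next one."""
    for step in steps:
        input_data = step(input_data)
    return input_data


print(run_chain([SquarePlusOne(), SquarePlusOne()], 3))  # 101
```

#### Because each step is an ordinary callable, a chain can mix steps that call an LLM with steps that do not.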

In [None]:
os.environ["CLAUDE_API_KEY"] = "<API-KEY>"

In [3]:
class MLPipeline(LLMThoughtAction):
    def __init__(self, api_key=None, query_engine=None):
        # Initialize parent class for LLM capabilities
        super().__init__(api_key=api_key, query_engine=query_engine)
        
    def thought(self, input_data):
        # Extract parameters from input_data (this will be a program treated as a parameterized string) 
        if not isinstance(input_data, dict):
            raise ValueError("input_data must be a dictionary with parameters")
            
        n_components = input_data.get('n_components')
        num_trees = input_data.get('num_trees')
        max_depth = input_data.get('max_depth')
        
        if any(x is None for x in [n_components, num_trees, max_depth]):
            raise ValueError("Missing required parameters: n_components, num_trees, max_depth")
            # NOTE: This means that if we use LLMs to generate code itself, we can catch missing template parameters early
            # TIP: Have LLMs generate controlled code templates and use ThoughtActions to populate them. This constraint reduces hallucinatory outputs
            
        # Note that code templates are just Python files read as strings, e.g. code_template = open('my_python_script.py', 'r').read()
        # The code template is provided here as a string literal for illustration
        # Note that the code becomes parameterized, so it changes only in predictable ways defined by the user.
        code_template = """
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import json

# Reasoning: Load and prepare data
data = load_digits()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reasoning: Create and execute pipeline
scaler = StandardScaler()
pca = PCA(n_components={n_components})
rf = RandomForestClassifier(n_estimators={num_trees}, max_depth={max_depth}, random_state=42)

# Reasoning: Fit and transform
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Reasoning: Train and predict
rf.fit(X_train_pca, y_train)
y_pred = rf.predict(X_test_pca)

# Reasoning: Calculate metrics
metrics = {{
    'accuracy': accuracy_score(y_test, y_pred),
    'precision': precision_score(y_test, y_pred, average='weighted'),
    'recall': recall_score(y_test, y_pred, average='weighted'),
    'f1': f1_score(y_test, y_pred, average='weighted'),
    'confusion_matrix': confusion_matrix(y_test, y_pred).tolist(),
    'feature_importance': rf.feature_importances_.tolist(),
    'explained_variance_ratio': pca.explained_variance_ratio_.tolist(),
    'current_params': {{
        'n_components': {n_components},
        'num_trees': {num_trees},
        'max_depth': {max_depth}
    }}
}}

# Reasoning: Store results for analysis
result = json.dumps(metrics, indent=2)
"""
        # Format the template with provided arguments
        # NOTE: This logic extends to any python program, and can be adapted to any other program executable via the subprocess module. 
        try:
            formatted_code = code_template.format(
                n_components=n_components,
                num_trees=num_trees,
                max_depth=max_depth
            )
        except KeyError as e:
            raise ValueError(f"Missing required argument: {e}")
            
        return formatted_code
        
    def action(self, code):
        # Execute the code and capture output
        try:
            # Reasoning: Create a string buffer to capture stdout
            output_buffer = io.StringIO()
            
            # Reasoning: Execute code and capture output
            with redirect_stdout(output_buffer):
                # Reasoning: Execute in a new namespace to avoid pollution
                namespace = {}
                exec(code, namespace)
                
            # Reasoning: Get metrics as JSON string
            metrics_str = namespace.get('result', '{}')
            
            # Reasoning: Analyze results using LLM
            # NOTE: Actions can be LLM evaluations; this enables meta-reasoning, which will be covered in a later notebook. 
            analysis_prompt = f"""Analyze the following ML pipeline results and suggest improvements:

Current Results:
{metrics_str}

Please provide your analysis in two parts:

PART 1 - Performance Analysis:
1. Performance Summary:
   - Overall accuracy and key metrics
   - Strengths and weaknesses
   - Class-wise performance (if applicable)

2. PCA Analysis:
   - Impact of current n_components
   - Explained variance insights

3. Random Forest Analysis:
   - Current tree configuration effectiveness
   - Feature importance insights

4. General Recommendations:
   - Potential model architecture changes
   - Additional features or preprocessing to consider

PART 2 - Hyperparameter Recommendations:
Format your hyperparameter recommendations exactly as follows:
max_depth = [suggested_value]
num_trees = [suggested_value]
n_components = [suggested_value]

Analysis:"""

            # Reasoning: Get LLM analysis
            analysis = self.query_engine.generate_response(analysis_prompt)
            
            # Reasoning: Extract hyperparameter recommendations
            hp_recommendations = {}
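            for line in analysis.split('\n'):
                if '=' in line:
                    # Split on the first '=' only, so lines containing several '=' do not raise
                    param, value = line.split('=', 1)
                    param = param.strip()
                    # Drop brackets the model may copy from the "[suggested_value]" format in the prompt
                    value = value.strip().strip('[]').strip()
                    if param in ['max_depth', 'num_trees', 'n_components']:
                        try:
                            hp_recommendations[param] = int(value)
                        except ValueError:
                            hp_recommendations[param] = value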
            
            # Reasoning: Return metrics, analysis, and structured recommendations
            return {
                'metrics': json.loads(metrics_str),
                'analysis': analysis,
                'hp_recommendations': hp_recommendations
            }
            
        except Exception as e:
            # Reasoning: Handle execution errors
            error_msg = f"Error executing ML pipeline: {str(e)}"
            return {
                'error': error_msg,
                'analysis': None,
                'hp_recommendations': None
            }
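
#### The recommendation parsing in `action` can be made stricter. The sketch below is one possible alternative, not part of cotarag: `parse_hp_recommendations` and `require_all` are hypothetical helpers that accept only integer values and report any parameter the LLM left out. Accepting only integers matters here because the recommended values are later formatted into the code template and executed.

```python
import re

# Matches lines such as "max_depth = 15" or "num_trees = [200]",
# tolerating list markers, asterisks, or backticks around the assignment.
HP_PATTERN = re.compile(
    r"^[\s*`-]*(max_depth|num_trees|n_components)\s*=\s*\[?\s*(\d+)\s*\]?[\s*`]*$",
    re.MULTILINE,
)


def parse_hp_recommendations(analysis: str) -> dict:
    """Extract integer hyperparameter suggestions; non-matching lines are ignored."""
    return {name: int(value) for name, value in HP_PATTERN.findall(analysis)}


def require_all(recs: dict, required=("max_depth", "num_trees", "n_components")) -> dict:
    """Raise if any required hyperparameter is absent from the parsed result."""
    missing = [name for name in required if name not in recs]
    if missing:
        raise ValueError(f"LLM response is missing recommendations for: {missing}")
    return recs


# Hypothetical LLM response used only to exercise the parser.
sample = """PART 2 - Hyperparameter Recommendations:
max_depth = 15
num_trees = [200]
n_components = 30
Tuning max_depth = deeper trees may help
"""

print(require_all(parse_hp_recommendations(sample)))
```

#### The last line of the sample mentions `max_depth` in prose and is skipped, since the pattern requires the parameter name at the start of the line and an integer value.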

In [4]:
def main():
    # Reasoning: Check for API key
    api_key = os.environ.get("CLAUDE_API_KEY")
    if not api_key:
        raise ValueError("CLAUDE_API_KEY environment variable not set")
    
    print("\n=== ML Pipeline Thought-Action Demo ===")
    
    # Reasoning: Create pipeline with initial parameters
    print("\n1. Initializing ML Pipeline...")
    pipeline = MLPipeline(api_key=api_key)
    
    # Reasoning: First iteration with default parameters
    print("\n2. Running first iteration...")
    print("   Parameters:")
    print("   - n_components: 10")
    print("   - num_trees: 100")
    print("   - max_depth: 5")
    
    result1 = pipeline(input_data={
        'n_components': 10,
        'num_trees': 100,
        'max_depth': 5
    })
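    # Reasoning: Stop early if the pipeline reported an execution error
    if 'error' in result1:
        raise RuntimeError(result1['error'])
    # Reasoning: Print first iteration results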
    print("\n3. First Iteration Results:")
    print("   Metrics:")
    metrics = result1['metrics']
    print(f"   - Accuracy: {metrics['accuracy']:.4f}")
    print(f"   - F1 Score: {metrics['f1']:.4f}")
    
    print("\n   Analysis:")
    print(result1['analysis'])
    
    # Reasoning: Get hyperparameter recommendations
    hp_recs = result1['hp_recommendations']
    print("\n   Recommended Parameters for Next Iteration:")
    print(f"   - n_components: {hp_recs['n_components']}")
    print(f"   - num_trees: {hp_recs['num_trees']}")
    print(f"   - max_depth: {hp_recs['max_depth']}")
    
    # Reasoning: Second iteration with recommended parameters
    print("\n4. Running second iteration with recommended parameters...")
    result2 = pipeline(input_data={
        'n_components': hp_recs['n_components'],
        'num_trees': hp_recs['num_trees'],
        'max_depth': hp_recs['max_depth']
    })
    
    # Reasoning: Print second iteration results
    print("\n5. Second Iteration Results:")
    print("   Metrics:")
    metrics = result2['metrics']
    print(f"   - Accuracy: {metrics['accuracy']:.4f}")
    print(f"   - F1 Score: {metrics['f1']:.4f}")
    
    print("\n   Analysis:")
    print(result2['analysis'])
    
    # Reasoning: Compare iterations
    print("\n6. Performance Comparison:")
    print(f"   First Iteration Accuracy: {result1['metrics']['accuracy']:.4f}")
    print(f"   Second Iteration Accuracy: {result2['metrics']['accuracy']:.4f}")
    improvement = (result2['metrics']['accuracy'] - result1['metrics']['accuracy']) * 100
    print(f"   Improvement: {improvement:+.2f}%")
    
    print("\n=== Demo Complete ===")

if __name__ == "__main__":
    main() 


=== ML Pipeline Thought-Action Demo ===

1. Initializing ML Pipeline...

2. Running first iteration...
   Parameters:
   - n_components: 10
   - num_trees: 100
   - max_depth: 5

3. First Iteration Results:
   Metrics:
   - Accuracy: 0.8778
   - F1 Score: 0.8788

   Analysis:
# Analysis of ML Pipeline Results

## PART 1 - Performance Analysis

### 1. Performance Summary

**Overall Metrics:**
- Accuracy: 87.78%
- Precision: 88.31%
- Recall: 87.78%
- F1 Score: 87.88%

**Strengths:**
- Overall good performance with balanced precision and recall
- Most classes show strong classification performance, particularly classes 0, 4, 5, 6, and 7
- Model achieves above 85% on all key metrics

**Weaknesses:**
- Confusion between specific class pairs, particularly:
  - Class 3 and Class 9 (5 instances of class 3 predicted as 9)
  - Class 9 and Class 3 (6 instances of class 9 predicted as 3)
  - Class 5 and Class 9 (4 instances of class 5 predicted as 9)
  - Class 8 has relatively lower performance w