# LLM-Aided Testbench Generation

## Overview

This notebook demonstrates a complete **LLM-aided testbench generation system** for Verilog hardware designs. The system automates the creation of comprehensive testbenches with golden reference outputs using a 5-step pipeline:

- **Steps 1-2**: Accept a natural language description and the Verilog code
- **Step 3**: Generate a testbench with comprehensive test patterns using the LLM
- **Step 4**: Create a Python golden model from the description and compute expected outputs
- **Step 5**: Update the testbench with verification logic and run the simulation

### Features

- 🤖 **LLM-Powered**: Uses GPT-4 to generate intelligent testbenches
- 🎯 **Comprehensive Testing**: Covers corner cases, boundary values, and random patterns
- 🔍 **Automatic Verification**: Built-in pass/fail checking and test summaries
- 📝 **Self-Contained**: All code included in this notebook for easy execution
- ⚡ **Ready to Run**: Execute all cells to see the complete workflow

This notebook is self-contained: every class is defined inline, and the only external Python package it needs is `openai`.

## Section 1: Installation and Setup

First, we install the required dependencies and set up the environment. The only external dependency is the OpenAI API client.

In [None]:
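# Install required packages. The requirement is quoted so the shell does not
# treat ">" as a redirect; the openai.OpenAI client used below needs openai>=1.0.0.
!pip install "openai>=1.0.0" -q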

# Import standard libraries
import os
import json
import sys
import subprocess
from typing import Dict, Any, List, Optional
import re

print("✓ Dependencies installed successfully")

## Section 2: API Key Configuration

To use the full LLM-powered generation, you need to set your OpenAI API key. You can either:
1. Set it as an environment variable: `export OPENAI_API_KEY='your-api-key'`
2. Directly set it in the cell below

**Note**: Without an API key, the system will run in mock/demo mode for demonstration purposes.
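
Hard-coding the key in a cell risks saving it with the notebook. As a minimal sketch of an alternative, assuming an interactive session, the helper below prompts for the key only when the environment variable is unset. The name `ensure_api_key` and its `prompt` parameter are hypothetical and are not used elsewhere in this notebook.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_api_key(env: MutableMapping[str, str] = os.environ,
                   prompt: Callable[[str], str] = getpass) -> bool:
    """Return True if a key is available after (optionally) prompting."""
    if env.get("OPENAI_API_KEY"):
        return True  # already configured, do not prompt
    key = prompt("OpenAI API key (leave blank for demo mode): ").strip()
    if key:
        env["OPENAI_API_KEY"] = key
        return True
    return False  # no key entered: caller should run in demo mode
```

Passing `prompt` as a parameter keeps the helper testable without a terminal; in a notebook, calling `ensure_api_key()` with the defaults would prompt without echoing the key.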

In [None]:
# Set your OpenAI API key here (or use environment variable)
# os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

# Check if API key is set
api_key = os.environ.get('OPENAI_API_KEY', '')
if api_key:
    print("✓ OpenAI API key is configured")
else:
    print("⚠ Warning: OpenAI API key not set. Running in demo mode.")
    print("  Set your API key with: os.environ['OPENAI_API_KEY'] = 'your-key'")

## Section 3: Create Output Directory

We'll create a directory to store all generated files (testbenches, golden models, etc.).

In [None]:
# Create output directory
output_dir = "notebook_output"
os.makedirs(output_dir, exist_ok=True)
print(f"✓ Output directory created: {output_dir}")

## Section 4: LLM Client Implementation

The `LLMClient` class handles all interactions with the OpenAI API. It provides a unified interface for generating text using GPT models.
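
Because the downstream classes only call `generate()` and `is_available()`, a stand-in object with those two methods can exercise them offline. The sketch below illustrates that idea; `StubLLMClient` and its canned-response mapping are hypothetical and do not appear elsewhere in this notebook.

```python
from typing import Dict, List, Optional


class StubLLMClient:
    """Offline stand-in exposing the same two methods as LLMClient."""

    def __init__(self, canned: Dict[str, str], default: str = ""):
        self.canned = canned          # substring of prompt -> canned response
        self.default = default        # returned when no substring matches
        self.calls: List[str] = []    # prompts seen, for later inspection

    def generate(self, prompt: str, system_prompt: Optional[str] = None,
                 temperature: float = 0.7, max_tokens: int = 4000) -> str:
        self.calls.append(prompt)
        for needle, response in self.canned.items():
            if needle in prompt:
                return response
        return self.default

    def is_available(self) -> bool:
        return True


stub = StubLLMClient({"golden": "def f(): pass"}, default="n/a")
print(stub.generate("write a golden model"))
```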

In [None]:
"""
LLM Client for interacting with language models.
The `provider` argument leaves room for other backends, but only OpenAI is implemented.
"""

import os
import json
from typing import Optional, Dict, Any
import openai


class LLMClient:
    """Client for interacting with LLM APIs."""
    
    def __init__(self, api_key: Optional[str] = None, model: str = "gpt-4", provider: str = "openai"):
        """
        Initialize LLM client.
        
        Args:
            api_key: API key for the LLM provider (if None, reads from environment)
            model: Model name to use
            provider: LLM provider (only 'openai' is currently implemented)
        """
        self.provider = provider
        self.model = model
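        self.api_key = api_key or os.getenv("OPENAI_API_KEY")
        # Build the client only when a key exists: openai.OpenAI raises without one,
        # which would break demo mode.
        self.client = openai.OpenAI(api_key=self.api_key) if self.api_key else None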
    
    def generate(self, prompt: str, system_prompt: Optional[str] = None, 
                 temperature: float = 0.7, max_tokens: int = 4000) -> str:
        """
        Generate text using the LLM.
        
        Args:
            prompt: User prompt
            system_prompt: System prompt for the model
            temperature: Sampling temperature (0-2 for the OpenAI API; lower is more deterministic)
            max_tokens: Maximum tokens to generate
            
        Returns:
            Generated text response
        """
        try:
            if self.provider == "openai":
                messages = []
                if system_prompt:
                    messages.append({"role": "system", "content": system_prompt})
                messages.append({"role": "user", "content": prompt})

                response = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    temperature=temperature,
                    max_tokens=max_tokens,
                )
                return response.choices[0].message.content
            else:
                raise ValueError(f"Unsupported provider: {self.provider}")
        except Exception as e:
            print(f"Error generating response: {e}")
            return f"Error: {str(e)}"
    
    def is_available(self) -> bool:
        """Check if the LLM client is properly configured."""
        return self.api_key is not None and len(self.api_key) > 0


## Section 5: Testbench Generator (Step 3)

The `TestbenchGenerator` class generates Verilog testbenches with comprehensive test patterns. It:
- Extracts module information (name, inputs, outputs) from Verilog code
- Uses LLM to generate comprehensive test patterns covering corner cases, boundary values, and random values
- Creates a testbench skeleton without expected outputs (those come in Step 4)
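
The response format requested from the model (a `TESTBENCH_CODE` section followed by a `TEST_PATTERNS` section, each in a fenced block) can be parsed without `eval`. The following is a minimal sketch using a hand-written sample response; `parse_response` is a hypothetical helper, not the method used by the class below.

```python
import json
import re
from typing import Any, Dict, List, Tuple

FENCE = "`" * 3  # built at runtime so this example nests safely in Markdown


def parse_response(text: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (testbench_code, test_patterns) from a formatted LLM reply."""
    def block(lang: str) -> str:
        pattern = re.escape(FENCE + lang) + r"\s*\n(.*?)" + re.escape(FENCE)
        match = re.search(pattern, text, re.DOTALL)
        return match.group(1).strip() if match else ""

    code = block("verilog")
    raw = block("json")
    try:
        patterns = json.loads(raw) if raw else []
    except json.JSONDecodeError:
        patterns = []  # malformed JSON: fall back to no patterns
    return code, patterns


sample = "\n".join([
    "TESTBENCH_CODE:",
    FENCE + "verilog",
    "module tb; endmodule",
    FENCE,
    "TEST_PATTERNS:",
    FENCE + "json",
    '[{"a": "0101", "b": "0011"}]',
    FENCE,
])
code, patterns = parse_response(sample)
print(code)
print(patterns)
```

Using `json.loads` means a malformed or malicious response can at worst fail to parse; it cannot execute code.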

In [None]:
"""
Step 3: Generate testbench with test patterns (without golden outputs).
"""

import json
from typing import Dict, Any, List
# LLMClient is defined in the previous cell; a relative import would fail inside a notebook.


class TestbenchGenerator:
    """Generate Verilog testbench with test patterns using LLM."""
    
    def __init__(self, llm_client: LLMClient):
        """
        Initialize testbench generator.
        
        Args:
            llm_client: LLM client instance
        """
        self.llm_client = llm_client
    
    def generate_testbench(self, description: str, verilog_code: str) -> Dict[str, Any]:
        """
        Generate testbench with comprehensive test patterns.
        
        Args:
            description: Natural language description of the Verilog module
            verilog_code: Verilog code to be tested
            
        Returns:
            Dictionary containing:
                - testbench_code: Verilog testbench code (without expected outputs)
                - test_patterns: List of test input patterns
                - module_info: Information about module (name, inputs, outputs)
        """
        # First, extract module information
        module_info = self._extract_module_info(verilog_code)
        
        # Generate comprehensive test patterns
        system_prompt = """You are an expert in Verilog testbench generation. 
Your task is to generate comprehensive test patterns for a given Verilog module.
Generate test patterns that cover:
1. All corner cases
2. Boundary values
3. Typical use cases
4. Edge cases
5. Random values for thorough testing"""

        user_prompt = f"""Given the following Verilog module and its natural language description, 
generate a comprehensive Verilog testbench that includes ALL possible test patterns.

Natural Language Description:
{description}

Verilog Module Code:
{verilog_code}

Generate a Verilog testbench that:
1. Declares all necessary signals
2. Instantiates the module under test
3. Includes a systematic set of test patterns covering all cases
4. Uses $display to show inputs for each test (all $display statements are in the initial block and before $finish statement.)
5. Does NOT include expected outputs or assertions yet (we will add those later)
6. Numbers each test case

Please provide:
1. The complete testbench code
2. A list of test patterns in JSON format with test number and input values

Format your response as:
TESTBENCH_CODE:
```verilog
[testbench code here]
```

TEST_PATTERNS (a list of dictionaries. Each dictionary contains only input signal names mapped to their values as plain binary strings (no prefixes like 0b, no spaces). Do not include any test_number field):
```json
[array of test patterns]
```
"""

        response = self.llm_client.generate(user_prompt, system_prompt, max_tokens=4000)
        
        # Parse the response
        testbench_code = self._extract_section(response, "TESTBENCH_CODE:", "```verilog", "```")
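        # The model may echo the long parenthetical after TEST_PATTERNS, so match the bare keyword
        test_patterns_json = self._extract_section(response, "TEST_PATTERNS", "```json", "```")
        
        try:
            # Parse as JSON rather than eval(): the text comes from an untrusted LLM response
            test_patterns = json.loads(test_patterns_json) if test_patterns_json else []
        except json.JSONDecodeError:
            test_patterns = []
            print("Warning: Could not parse test patterns JSON")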
        
        return {
            "testbench_code": testbench_code,
            "test_patterns": test_patterns,
            "module_info": module_info,
            "raw_response": response
        }
    
    def _extract_module_info(self, verilog_code: str) -> Dict[str, Any]:
        """
        Extract module information (name, inputs, outputs) from Verilog code.
        
        Args:
            verilog_code: Verilog module code
            
        Returns:
            Dictionary with module information
        """
        lines = verilog_code.strip().split('\n')
        module_name = ""
        inputs = []
        outputs = []
        
        for line in lines:
            line = line.strip()
            if line.startswith('module'):
                # Extract module name
                parts = line.split()
                if len(parts) >= 2:
                    module_name = parts[1].split('(')[0]
            elif 'input' in line:
                # Extract input signals
                input_part = line.replace('input', '').replace(';', '').replace(',', '').strip()
                if input_part:
                    inputs.append(input_part)
            elif 'output' in line:
                # Extract output signals
                output_part = line.replace('output', '').replace(';', '').replace(',', '').strip()
                if output_part:
                    outputs.append(output_part)
        
        return {
            "module_name": module_name,
            "inputs": inputs,
            "outputs": outputs
        }
    
    def _extract_section(self, text: str, marker: str, start_delim: str, end_delim: str) -> str:
        """
        Extract a section from the LLM response between delimiters.
        
        Args:
            text: Full response text
            marker: Section marker to find
            start_delim: Start delimiter (e.g., "```verilog")
            end_delim: End delimiter (e.g., "```")
            
        Returns:
            Extracted section content
        """
        try:
            # Find the marker
            marker_idx = text.find(marker)
            if marker_idx == -1:
                return ""
            
            # Find the start delimiter after the marker
            start_idx = text.find(start_delim, marker_idx)
            if start_idx == -1:
                return ""
            start_idx += len(start_delim)
            
            # Find the end delimiter
            end_idx = text.find(end_delim, start_idx)
            if end_idx == -1:
                return ""
            
            return text[start_idx:end_idx].strip()
        except Exception as e:
            print(f"Error extracting section: {e}")
            return ""


## Section 6: Golden Model Generator (Step 4)

The `GoldenModelGenerator` class creates a Python reference implementation from the natural language description. It:
- Converts the description into executable Python code
- Runs all test patterns through the Python model
- Captures expected outputs for verification
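
To make this flow concrete, the sketch below runs a hand-written golden function over binary-string patterns. The 4-bit adder, the function name `adder_golden`, and the signal names are illustrative assumptions; in the pipeline the function source comes from the LLM and is run with `exec`, which does not sandbox the code.

```python
from typing import Any, Dict, List

# Stand-in for LLM output: a golden model for a hypothetical 4-bit adder.
GOLDEN_SOURCE = '''
def adder_golden(a, b):
    total = a + b
    return {"sum": total & 0xF, "carry": (total >> 4) & 1}
'''


def run_golden(source: str, func_name: str,
               patterns: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Execute `source`, then evaluate `func_name` on each pattern."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # NOTE: runs arbitrary code; not a sandbox
    golden = namespace[func_name]
    results = []
    for pattern in patterns:
        # Patterns arrive as plain binary strings, e.g. "0101" -> 5
        inputs = {name: int(bits, 2) for name, bits in pattern.items()}
        results.append({"inputs": inputs, "expected_outputs": golden(**inputs)})
    return results


results = run_golden(GOLDEN_SOURCE, "adder_golden",
                     [{"a": "0001", "b": "0010"}, {"a": "1111", "b": "0001"}])
for row in results:
    print(row)
```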

In [None]:
"""
Step 4: Generate Python golden model and compute golden outputs.
"""

import sys
import io
import json
from typing import Dict, Any, List
# LLMClient is defined in an earlier cell; a relative import would fail inside a notebook.


class GoldenModelGenerator:
    """Generate Python golden model and compute expected outputs."""
    
    def __init__(self, llm_client: LLMClient):
        """
        Initialize golden model generator.
        
        Args:
            llm_client: LLM client instance
        """
        self.llm_client = llm_client
    
    def generate_python_model(self, description: str, module_info: Dict[str, Any]) -> str:
        """
        Generate Python implementation based on natural language description.
        
        Args:
            description: Natural language description of the module
            module_info: Module information (name, inputs, outputs)
            
        Returns:
            Python code implementing the module functionality
        """
        system_prompt = """You are an expert in hardware design and Python programming.
Your task is to create a Python function that implements the exact same functionality
as described in the natural language specification."""

        user_prompt = f"""Given the following natural language description of a hardware module,
create a Python function that implements this functionality.

Natural Language Description:
{description}

Module Information:
- Module Name: {module_info.get('module_name', 'unknown')}
- Inputs: {module_info.get('inputs', [])}
- Outputs: {module_info.get('outputs', [])}

Create a Python function named '{module_info.get('module_name', 'module')}_golden' that:
1. Takes the input signals as parameters
2. Computes and returns the output signals
3. Implements the exact functionality described
4. Handles all edge cases properly
5. Returns outputs as a dictionary with output signal names as keys

Provide ONLY the Python function code, no explanations.
Start with 'def {module_info.get('module_name', 'module')}_golden(' and include complete implementation.
"""

        response = self.llm_client.generate(user_prompt, system_prompt, temperature=0.2, max_tokens=3000)
        
        # Extract Python code
        python_code = self._extract_python_code(response)
        
        return python_code
    
    def compute_golden_outputs(self, python_code: str, test_patterns: List[Dict[str, Any]], 
                               module_info: Dict[str, Any]) -> List[Dict[str, Any]]:
        """
        Execute the Python golden model with test patterns to get expected outputs.
        
        Args:
            python_code: Python golden model code
            test_patterns: List of test input patterns
            module_info: Module information
            
        Returns:
            List of test patterns with golden outputs added
        """
        results = []
        
        # Execute the generated code in a fresh namespace.
        # NOTE: exec() is not a sandbox; the LLM-generated code runs with full privileges.
        namespace = {}
        try:
            exec(python_code, namespace)
        except Exception as e:
            print(f"Error executing Python code: {e}")
            return results
        
        # Find the golden function
        function_name = f"{module_info.get('module_name', 'module')}_golden"
        golden_func = namespace.get(function_name)
        
        if not golden_func:
            print(f"Error: Could not find function {function_name}")
            return results
        
        # Run each test pattern through the golden model
        for pattern in test_patterns:
            try:
                # Extract input values from the pattern
                inputs = pattern.get('inputs', pattern)
                
                # Call the golden function
                if isinstance(inputs, dict):
                    outputs = golden_func(**inputs)
                else:
                    # If inputs is not a dict, try to call with positional args
                    outputs = golden_func(*inputs.values()) if hasattr(inputs, 'values') else golden_func(inputs)
                
                # Add outputs to the pattern
                result = pattern.copy()
                result['expected_outputs'] = outputs
                results.append(result)
            except Exception as e:
                print(f"Error computing golden output for pattern {pattern}: {e}")
                result = pattern.copy()
                result['expected_outputs'] = None
                result['error'] = str(e)
                results.append(result)
        
        return results
    
    def _extract_python_code(self, text: str) -> str:
        """
        Extract Python code from LLM response.
        
        Args:
            text: LLM response text
            
        Returns:
            Extracted Python code
        """
        # Try to find code between ```python and ```
        if "```python" in text:
            start = text.find("```python") + len("```python")
            end = text.find("```", start)
            if end != -1:
                return text[start:end].strip()
        
        # Try to find code between ``` and ```
        if "```" in text:
            parts = text.split("```")
            if len(parts) >= 3:
                return parts[1].strip()
        
        # If no code blocks found, look for def statement
        if "def " in text:
            lines = text.split('\n')
            code_lines = []
            in_function = False
            for line in lines:
                if line.strip().startswith('def '):
                    in_function = True
                if in_function:
                    code_lines.append(line)
            return '\n'.join(code_lines)
        
        return text.strip()


## Section 7: Testbench Updater (Step 5)

The `TestbenchUpdater` class enhances the testbench with verification logic. It:
- Injects expected outputs from the golden model
- Adds pass/fail checking for each test case
- Implements test summary reporting
- Maintains proper Verilog formatting
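
As an illustration of the pass/fail check injected per test case, the sketch below renders one comparison block for one output signal. `render_check` is a hypothetical helper; it assumes the expected value is a non-negative integer and emits a sized binary literal, which differs slightly from the rule-based code below, where the Python value is interpolated directly.

```python
from typing import List


def render_check(signal: str, expected: int, width: int,
                 indent: int = 4) -> List[str]:
    """Return Verilog lines comparing `signal` with a sized binary literal."""
    pad = " " * indent
    literal = f"{width}'b{expected:0{width}b}"  # e.g. 4'b0011
    return [
        f"{pad}if ({signal} === {literal}) begin",
        f'{pad}    $display("  PASS {signal} = %b", {signal});',
        f"{pad}    passed_tests = passed_tests + 1;",
        f"{pad}end else begin",
        f'{pad}    $display("  FAIL {signal} = %b (expected {literal})", {signal});',
        f"{pad}    failed_tests = failed_tests + 1;",
        f"{pad}end",
    ]


print("\n".join(render_check("sum", 3, width=4)))
```

A sized literal keeps the comparison width explicit, so a `===` check does not depend on how the simulator extends an unsized decimal constant.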

In [None]:
"""
Step 5: Update testbench with golden outputs.
"""

from typing import Dict, Any, List
import re
import json
# LLMClient is defined in an earlier cell; a relative import would fail inside a notebook.


class TestbenchUpdater:
    """Update generated testbench with golden outputs using LLM."""
    
    def __init__(self, llm_client: LLMClient):
        """
        Initialize testbench updater.
        
        Args:
            llm_client: LLM client instance
        """
        self.llm_client = llm_client
    
    def update_testbench(self, testbench_code: str, test_patterns_with_outputs: List[Dict[str, Any]], 
                        module_info: Dict[str, Any]) -> str:
        """
        Update testbench code to include expected outputs and verification.
        
        Args:
            testbench_code: Original testbench code without expected outputs
            test_patterns_with_outputs: Test patterns with golden outputs
            module_info: Module information
            
        Returns:
            Updated testbench code with assertions and expected outputs
        """
        # Use LLM if available, otherwise fall back to rule-based approach
        if self.llm_client.is_available():
            updated_code = self._llm_update_testbench(testbench_code, test_patterns_with_outputs, module_info)
        else:
            # Fallback to rule-based approach
            updated_code = self._add_verification_logic(testbench_code, test_patterns_with_outputs, module_info)
        
        return updated_code
    
    def _llm_update_testbench(self, testbench_code: str, test_patterns_with_outputs: List[Dict[str, Any]], 
                             module_info: Dict[str, Any]) -> str:
        """
        Use LLM to update testbench with verification logic.
        
        Args:
            testbench_code: Original testbench code
            test_patterns_with_outputs: Test patterns with golden outputs
            module_info: Module information
            
        Returns:
            Updated testbench code with verification logic
        """
        system_prompt = """You are an expert in Verilog testbench development and verification.
Your task is to update a testbench by adding comprehensive verification logic and expected output checks.
You should:
1. Add verification for each test case using the provided expected outputs
2. Track passed and failed test counts
3. Display clear pass/fail messages for each output signal
4. Generate a comprehensive test summary at the end
5. Maintain the original testbench structure and style
6. Use proper Verilog syntax and best practices"""

        # Prepare test patterns data for the LLM
        patterns_str = json.dumps(test_patterns_with_outputs, indent=2)
        
        user_prompt = f"""Given the following Verilog testbench and test patterns with expected outputs,
update the testbench to include verification logic that checks the actual outputs against the expected outputs.

Original Testbench Code:
```verilog
{testbench_code}
```

Module Information:
- Module Name: {module_info.get('module_name', 'unknown')}
- Inputs: {module_info.get('inputs', [])}
- Outputs: {module_info.get('outputs', [])}

Test Patterns with Expected Outputs:
```json
{patterns_str}
```

Please update the testbench to:
1. Add integer variables 'passed_tests' and 'failed_tests' at the beginning of the initial block (initialized to 0)
2. After each test case (identified by $display statements), add a delay (#10) for outputs to settle
3. For each output signal, compare the actual value against the expected value from the test patterns
4. Display "✓" for passing checks and "✗" for failing checks with actual and expected values
5. Increment passed_tests for each passing check and failed_tests for each failing check
6. At the end of the initial block (before 'end'), add a test summary showing:
   - Total tests run
   - Number passed
   - Number failed
7. Preserve all original test case displays and structure
8. Use proper indentation and formatting

Provide ONLY the complete updated testbench code, no explanations.
Format your response as:
```verilog
[updated testbench code here]
```
"""

        response = self.llm_client.generate(user_prompt, system_prompt, max_tokens=8000)
        
        # Extract the Verilog code from the response
        updated_code = self._extract_verilog_code(response)
        
        # If extraction failed, fall back to original with rule-based update
        if not updated_code or len(updated_code) < len(testbench_code) // 2:
            print("Warning: LLM response extraction failed, using rule-based approach")
            updated_code = self._add_verification_logic(testbench_code, test_patterns_with_outputs, module_info)
        
        return updated_code
    
    def _extract_verilog_code(self, text: str) -> str:
        """
        Extract Verilog code from LLM response.
        
        Args:
            text: LLM response text
            
        Returns:
            Extracted Verilog code
        """
        # Try to find code between ```verilog and ```
        if "```verilog" in text:
            start = text.find("```verilog") + len("```verilog")
            end = text.find("```", start)
            if end != -1:
                return text[start:end].strip()
        
        # Try to find code between ``` and ```
        if "```" in text:
            parts = text.split("```")
            if len(parts) >= 3:
                # Get the first code block
                code = parts[1].strip()
                # If it starts with a language identifier, remove it
                if code.startswith("verilog\n") or code.startswith("verilog "):
                    code = code.split('\n', 1)[1] if '\n' in code else code
                return code.strip()
        
        # If no code blocks found, look for module or testbench keywords
        if "module " in text or "initial " in text:
            return text.strip()
        
        return ""
    
    def _add_verification_logic(self, testbench_code: str, test_patterns: List[Dict[str, Any]], 
                               module_info: Dict[str, Any]) -> str:
        """
        Add verification logic to the testbench.
        
        Args:
            testbench_code: Original testbench code
            test_patterns: Test patterns with expected outputs
            module_info: Module information
            
        Returns:
            Testbench code with verification logic added
        """
        lines = testbench_code.split('\n')
        updated_lines = []
        
        # Track if we're in the initial block
        in_initial = False
        indent_level = 0
        test_case_num = 0
        
        for i, line in enumerate(lines):
            stripped = line.strip()
            
            # Detect initial block
            if 'initial' in stripped and 'begin' in stripped:
                in_initial = True
                updated_lines.append(line)
                # Add test result tracking variables after initial begin
                updated_lines.append("    integer passed_tests = 0;")
                updated_lines.append("    integer failed_tests = 0;")
                updated_lines.append("")
                continue
            
            # Check for test case markers (e.g., $display for test cases)
            if in_initial and '$display' in stripped and ('Test' in stripped or 'test' in stripped):
                # This is likely a test case display
                updated_lines.append(line)
                
                # Add delay to let outputs settle
                indent = len(line) - len(line.lstrip())
                indent_str = ' ' * indent
                updated_lines.append(f"{indent_str}#10; // Wait for outputs to settle")
                
                # Add verification for this test case if we have expected outputs
                if test_case_num < len(test_patterns):
                    pattern = test_patterns[test_case_num]
                    if 'expected_outputs' in pattern and pattern['expected_outputs']:
                        verification_lines = self._generate_verification(
                            pattern, module_info, indent
                        )
                        updated_lines.extend(verification_lines)
                    test_case_num += 1
                continue
            
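            # Check for end of initial block. This matches a bare `end` but not
            # `endcase`/`endmodule`; a nested begin/end inside the block would still trigger it early.
            if in_initial and re.match(r'end\b', stripped):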
                # Add final summary before the end
                indent = len(line) - len(line.lstrip())
                indent_str = ' ' * indent
                updated_lines.append("")
                updated_lines.append(f"{indent_str}// Test Summary")
                updated_lines.append(f'{indent_str}$display("\\n========== Test Summary ==========");')
                updated_lines.append(f'{indent_str}$display("Total Tests: %0d", passed_tests + failed_tests);')
                updated_lines.append(f'{indent_str}$display("Passed: %0d", passed_tests);')
                updated_lines.append(f'{indent_str}$display("Failed: %0d", failed_tests);')
                updated_lines.append(f'{indent_str}$display("==================================\\n");')
                updated_lines.append("")
                updated_lines.append(line)
                in_initial = False
                continue
            
            updated_lines.append(line)
        
        return '\n'.join(updated_lines)
    
    def _generate_verification(self, pattern: Dict[str, Any], module_info: Dict[str, Any], 
                              indent: int) -> List[str]:
        """
        Generate verification code for a single test pattern.
        
        Args:
            pattern: Test pattern with expected outputs
            module_info: Module information
            indent: Indentation level
            
        Returns:
            List of verification code lines
        """
        lines = []
        indent_str = ' ' * indent
        
        expected = pattern.get('expected_outputs', {})
        if not expected:
            return lines
        
        # Get output signals
        outputs = module_info.get('outputs', [])
        
        # Generate verification for each output
        for output in outputs:
            output_name = output.split('[')[0].strip()  # Remove bit width if present
            output_name = output_name.split()[-1]  # Get the signal name
            
            if output_name in expected:
                expected_value = expected[output_name]
                
                # Check if expected value is boolean
                if isinstance(expected_value, bool):
                    expected_value = 1 if expected_value else 0
                
                # Generate comparison
                lines.append(f"{indent_str}if ({output_name} === {expected_value}) begin")
                lines.append(f'{indent_str}    $display("  ✓ {output_name} = %b (expected: {expected_value})", {output_name});')
                lines.append(f"{indent_str}    passed_tests = passed_tests + 1;")
                lines.append(f"{indent_str}end else begin")
                lines.append(f'{indent_str}    $display("  ✗ {output_name} = %b (expected: {expected_value})", {output_name});')
                lines.append(f"{indent_str}    failed_tests = failed_tests + 1;")
                lines.append(f"{indent_str}end")
        
        return lines

## Section 8: Pipeline Orchestrator

The `TestbenchPipeline` class coordinates all steps in the generation process. It:
- Manages the flow from description to final testbench
- Handles file I/O for all artifacts
- Provides progress reporting
- Implements graceful degradation when LLM is unavailable
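
The graceful-degradation behaviour amounts to choosing, per step, between an LLM-backed callable and a local fallback. A minimal sketch of that pattern follows; `with_fallback` and the two toy generators are hypothetical names used only for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(available: bool, primary: Callable[[], T],
                  fallback: Callable[[], T], label: str) -> T:
    """Run `primary` when the LLM is available, otherwise `fallback`."""
    if not available:
        print(f"WARNING: LLM client not configured. Using mock {label}.")
        return fallback()
    return primary()


def llm_testbench() -> str:
    return "// testbench produced by the LLM"


def mock_testbench() -> str:
    return "// canned demo testbench"


print(with_fallback(False, llm_testbench, mock_testbench, "generation"))
```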

In [None]:
"""
Main pipeline orchestrating the entire testbench generation process.
"""

import json
import os
from typing import Dict, Any, Optional
# LLMClient, TestbenchGenerator, GoldenModelGenerator and TestbenchUpdater are
# defined in earlier cells; relative imports would fail inside a notebook.
import subprocess

class TestbenchPipeline:
    """
    Main pipeline for LLM-aided testbench generation.
    
    Steps (numbered as in the progress messages printed by run()):
    1-2. Accept natural language description and Verilog code
    3. Generate testbench with test patterns (no golden outputs)
    4. Generate Python golden model from description and execute it
       with the test patterns to get expected outputs
    5. Update testbench with golden outputs and verification logic
    """
    
    def __init__(self, api_key: Optional[str] = None, model: str = "gpt-4", provider: str = "openai"):
        """
        Initialize the pipeline.
        
        Args:
            api_key: API key for LLM provider
            model: Model name to use
            provider: LLM provider name
        """
        self.llm_client = LLMClient(api_key, model, provider)
        self.testbench_gen = TestbenchGenerator(self.llm_client)
        self.golden_gen = GoldenModelGenerator(self.llm_client)
        self.testbench_updater = TestbenchUpdater(self.llm_client)
        
    def run(self, description: str, verilog_code: str, output_dir: str = "output") -> Dict[str, Any]:
        """
        Run the complete testbench generation pipeline.
        
        Args:
            description: Natural language description of the module
            verilog_code: Verilog code to be tested
            output_dir: Directory to save output files
            
        Returns:
            Dictionary containing all generated artifacts
        """
        print("=" * 80)
        print("LLM-Aided Testbench Generation Pipeline")
        print("=" * 80)
        
        # Create output directory
        os.makedirs(output_dir, exist_ok=True)
        
        # Step 1 & 2: Input handling (description and verilog code are already provided)
        print("\n[Step 1-2] Input: Natural language description and Verilog code received")
        print(f"Description length: {len(description)} characters")
        print(f"Verilog code length: {len(verilog_code)} characters")
        
        # Step 3: Generate testbench with test patterns
        print("\n[Step 3] Generating testbench with test patterns...")
        if not self.llm_client.is_available():
            print("WARNING: LLM client not configured. Using mock generation.")
            testbench_result = self._mock_testbench_generation(verilog_code)
        else:
            testbench_result = self.testbench_gen.generate_testbench(description, verilog_code)
        
        print(f"  - Generated testbench with {len(testbench_result['test_patterns'])} test patterns")
        print(f"  - Module: {testbench_result['module_info']['module_name']}")
        
        # Save initial testbench (without golden outputs)
        initial_tb_path = os.path.join(output_dir, "testbench_initial.v")
        with open(initial_tb_path, 'w') as f:
            f.write(testbench_result['testbench_code'])
        print(f"  - Saved initial testbench to: {initial_tb_path}")
        
        # Step 4: Generate Python golden model and compute golden outputs
        print("\n[Step 4] Generating Python golden model and computing expected outputs...")
        if not self.llm_client.is_available():
            print("WARNING: LLM client not configured. Using mock golden model.")
            python_code = self._mock_python_model(testbench_result['module_info'])
        else:
            python_code = self.golden_gen.generate_python_model(
                description, 
                testbench_result['module_info']
            )
        
        print(f"  - Generated Python golden model ({len(python_code)} characters)")
        
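        # Convert string-valued pattern fields to integers in place.
        # Strings are assumed to be plain binary (e.g. "1010"); anything else
        # makes int(..., 2) raise ValueError.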
        patterns = []
        for pattern in testbench_result['test_patterns']:
            for key in pattern:
                if isinstance(pattern[key], str):
                    pattern[key] = int(pattern[key], 2)
            patterns.append(pattern)
        testbench_result['test_patterns'] = patterns

        # Save Python golden model
        python_path = os.path.join(output_dir, "golden_model.py")
        with open(python_path, 'w') as f:
            f.write(python_code)
        print(f"  - Saved Python golden model to: {python_path}")
        
        # Compute golden outputs
        print("  - Computing golden outputs for all test patterns...")
        test_patterns_with_outputs = self.golden_gen.compute_golden_outputs(
            python_code,
            testbench_result['test_patterns'],
            testbench_result['module_info']
        )
        
        successful_patterns = sum(1 for p in test_patterns_with_outputs 
                                 if 'expected_outputs' in p and p['expected_outputs'] is not None)
        print(f"  - Successfully computed outputs for {successful_patterns}/{len(test_patterns_with_outputs)} patterns")
        
        # Save test patterns with golden outputs
        patterns_path = os.path.join(output_dir, "test_patterns_with_golden.json")
        with open(patterns_path, 'w') as f:
            json.dump(test_patterns_with_outputs, f, indent=2)
        print(f"  - Saved test patterns with golden outputs to: {patterns_path}")
        
        # Step 5: Update testbench with golden outputs
        print("\n[Step 5] Updating testbench with golden outputs and verification logic...")
        final_testbench = self.testbench_updater.update_testbench(
            testbench_result['testbench_code'],
            test_patterns_with_outputs,
            testbench_result['module_info']
        )
        
        # Save final testbench
        final_tb_path = os.path.join(output_dir, "testbench_final.v")
        with open(final_tb_path, 'w') as f:
            f.write(final_testbench)
        print(f"  - Saved final testbench to: {final_tb_path}")
        
        print("\n" + "=" * 80)
        print("Pipeline completed successfully!")
        print("=" * 80)
        print(f"\nGenerated files in '{output_dir}':")
        print("  - testbench_initial.v    : Initial testbench with test patterns")
        print("  - golden_model.py        : Python reference implementation")
        print("  - test_patterns_with_golden.json : Test patterns with expected outputs")
        print("  - testbench_final.v      : Final testbench with verification")
        print()
        
        return {
            'description': description,
            'verilog_code': verilog_code,
            'module_info': testbench_result['module_info'],
            'test_patterns': testbench_result['test_patterns'],
            'initial_testbench': testbench_result['testbench_code'],
            'python_golden_model': python_code,
            'test_patterns_with_outputs': test_patterns_with_outputs,
            'final_testbench': final_testbench,
            'output_dir': output_dir
        }
    
    def _mock_testbench_generation(self, verilog_code: str) -> Dict[str, Any]:
        """Mock testbench generation when LLM is not available."""
        return {
            'testbench_code': '// Mock testbench - LLM not configured\n' + verilog_code,
            'test_patterns': [],
            'module_info': {
                'module_name': 'unknown',
                'inputs': [],
                'outputs': []
            }
        }
    
    def _mock_python_model(self, module_info: Dict[str, Any]) -> str:
        """Mock Python model generation when LLM is not available."""
        return f"# Mock Python model - LLM not configured\ndef {module_info['module_name']}_golden():\n    pass\n"
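
The `run` method above converts string-valued pattern fields with `int(value, 2)`, which only accepts plain binary strings. If the LLM returns Verilog-style literals instead, a more tolerant converter could be used. The sketch below is one possible approach and is not part of the pipeline; the name `pattern_value_to_int` is hypothetical, and it assumes unsigned literals without `x` or `z` digits.

```python
import re

# Matches unsigned Verilog literals such as 4'b1010, 8'hFF or 'd12.
# Assumption: no signed ('sb) literals and no x/z digits.
_VERILOG_LITERAL = re.compile(r"^(\d+)?'([bBoOdDhH])([0-9a-fA-F]+)$")
_BASES = {"b": 2, "o": 8, "d": 10, "h": 16}


def pattern_value_to_int(value):
    """Convert a test pattern field to an int (hypothetical helper)."""
    if isinstance(value, int):
        return value
    text = value.strip().replace("_", "")  # Verilog allows 1010_0101
    match = _VERILOG_LITERAL.match(text)
    if match:
        base = _BASES[match.group(2).lower()]
        return int(match.group(3), base)
    # Fall back to the pipeline's behaviour: treat the string as binary.
    # Raises ValueError for anything that is not binary.
    return int(text, 2)


print(pattern_value_to_int("1010"))
print(pattern_value_to_int("4'b1010"))
print(pattern_value_to_int("8'hFF"))
```

Failing loudly with `ValueError` on unrecognised input is deliberate here: a silently wrong input value would produce a wrong golden output.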

## Section 9: Example 1 - 2-to-1 Multiplexer

Now let's run a complete example! We'll generate a testbench for a simple 2-to-1 multiplexer.

### Natural Language Description

In [None]:
# Natural language description of the module
mux_description = """
A 2-to-1 multiplexer (MUX).

The module takes two 1-bit input signals (a and b) and one 1-bit select signal (sel).

Functionality:
- Input 'a': First data input (1-bit)
- Input 'b': Second data input (1-bit)
- Input 'sel': Select signal (1-bit)
- Output 'y': Selected output (1-bit)

When sel is 0, the output y should be equal to input a.
When sel is 1, the output y should be equal to input b.

This is a combinational logic circuit with no state or memory.

"""

print("Natural Language Description:")
print("=" * 80)
print(mux_description)
print("=" * 80)

### Verilog Module Code

In [None]:
# Verilog module to be tested
mux_verilog_code = """
module mux2to1 (
    input wire a,
    input wire b,
    input wire sel,
    output wire y
);
    assign y = sel ? b : a;
endmodule

"""

print("Verilog Module:")
print("=" * 80)
print(mux_verilog_code)
print("=" * 80)

# Save the Verilog code for later simulation
with open(f"{output_dir}/mux2to1.v", "w") as f:
    f.write(mux_verilog_code)

### Run the Complete Pipeline

Now we execute the 5-step pipeline to generate the testbench:

In [None]:
# Initialize and run the pipeline
print("\n" + "=" * 80)
print("Running LLM-Aided Testbench Generation Pipeline for MUX")
print("=" * 80 + "\n")

# Create pipeline instance
pipeline = TestbenchPipeline(
    api_key=os.environ.get('OPENAI_API_KEY'),
    model='gpt-4o',
    provider='openai'
)

# Run the complete pipeline
try:
    result = pipeline.run(
        description=mux_description,
        verilog_code=mux_verilog_code,
        output_dir=f"{output_dir}/mux"
    )
    print("\n✓ MUX testbench generation completed successfully!")
    print(f"\nGenerated files are in: {output_dir}/mux/")
except Exception as e:
    print(f"\n✗ Error: {e}")
    import traceback
    traceback.print_exc()

### View Generated Files

Let's examine what was generated:

In [None]:
# List generated files
import os

print("Generated Files:")
print("=" * 80)
mux_output_dir = f"{output_dir}/mux"
if os.path.exists(mux_output_dir):
    for filename in sorted(os.listdir(mux_output_dir)):
        filepath = os.path.join(mux_output_dir, filename)
        if os.path.isfile(filepath):
            size = os.path.getsize(filepath)
            print(f"  - {filename:35s} ({size:,} bytes)")
else:
    print("  No files generated yet")
print("=" * 80)

### Examine the Initial Testbench

The initial testbench includes test patterns but not the expected outputs:

In [None]:
# Display initial testbench (first 50 lines)
try:
    with open(f"{output_dir}/mux/testbench_initial.v", "r") as f:
        lines = f.readlines()
        print("Initial Testbench (first 50 lines):")
        print("=" * 80)
        for i, line in enumerate(lines[:50], 1):
            print(f"{i:3d}: {line}", end="")
        if len(lines) > 50:
            print(f"\n... ({len(lines) - 50} more lines)")
        print("=" * 80)
except FileNotFoundError:
    print("Initial testbench file not found")

### Examine the Python Golden Model

The golden model implements the expected behavior in Python:

In [None]:
# Display Python golden model
try:
    with open(f"{output_dir}/mux/golden_model.py", "r") as f:
        golden_code = f.read()
        print("Python Golden Model:")
        print("=" * 80)
        print(golden_code)
        print("=" * 80)
except FileNotFoundError:
    print("Golden model file not found")
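
For comparison with whatever the LLM generates, a hand-written reference for the multiplexer is only a few lines. This is a sketch based on the description above, not the model the pipeline produces; the function name `mux2to1_reference` is an assumption chosen for illustration.

```python
def mux2to1_reference(a: int, b: int, sel: int) -> int:
    """Return b when sel is 1, otherwise a. All values are single bits."""
    return (b if sel & 1 else a) & 1


# Three 1-bit inputs give only 8 combinations, so the full truth table
# can be enumerated and compared against the generated golden model.
for sel in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            y = mux2to1_reference(a, b, sel)
            print(f"sel={sel} a={a} b={b} -> y={y}")
```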

### Examine Test Patterns with Expected Outputs

The test patterns JSON file contains all test cases with their expected outputs:

In [None]:
# Display test patterns with golden outputs
try:
    with open(f"{output_dir}/mux/test_patterns_with_golden.json", "r") as f:
        patterns = json.load(f)
        print(f"Test Patterns with Golden Outputs ({len(patterns)} patterns):")
        print("=" * 80)
        for i, pattern in enumerate(patterns[:5], 1):  # Show first 5
            print(f"\nPattern {i}:")
            print(f"  Inputs:  {pattern.get('inputs', pattern)}")
            print(f"  Expected: {pattern.get('expected_outputs', 'N/A')}")
        if len(patterns) > 5:
            print(f"\n... ({len(patterns) - 5} more patterns)")
        print("=" * 80)
except FileNotFoundError:
    print("Test patterns file not found")
except json.JSONDecodeError:
    print("Error decoding test patterns JSON")

### Examine the Final Testbench with Verification

The final testbench includes all verification logic:

In [None]:
# Display final testbench (first 60 lines)
try:
    with open(f"{output_dir}/mux/testbench_final.v", "r") as f:
        lines = f.readlines()
        print(f"Final Testbench with Verification ({len(lines)} lines total):")
        print("=" * 80)
        for i, line in enumerate(lines[:60], 1):
            print(f"{i:3d}: {line}", end="")
        if len(lines) > 60:
            print(f"\n... ({len(lines) - 60} more lines)")
        print("=" * 80)
except FileNotFoundError:
    print("Final testbench file not found")

### Simulate the Testbench

If you have Icarus Verilog installed, we can simulate the testbench to see the results:

In [None]:
# Simulate the testbench with Icarus Verilog
try:
    # Compile. Arguments are passed as a list (no shell) so that a missing
    # iverilog binary raises FileNotFoundError and reaches the handler below.
    print("Compiling testbench...")
    compile_result = subprocess.run(
        ["iverilog", "-g2012", "-o", f"{output_dir}/mux/sim",
         f"{output_dir}/mux2to1.v", f"{output_dir}/mux/testbench_final.v"],
        capture_output=True,
        text=True,
        timeout=30
    )
    
    if compile_result.returncode != 0:
        print("Compilation failed:")
        print(compile_result.stderr)
    else:
        print("✓ Compilation successful\n")
        
        # Run simulation
        print("Running simulation...")
        sim_result = subprocess.run(
            ["vvp", f"{output_dir}/mux/sim"],
            capture_output=True,
            text=True,
            timeout=30
        )
        
        print("Simulation Output:")
        print("=" * 80)
        print(sim_result.stdout)
        print("=" * 80)
        
        if sim_result.stderr:
            print("Warnings/Errors:")
            print(sim_result.stderr)
            
except subprocess.TimeoutExpired:
    print("Simulation timed out")
except FileNotFoundError:
    print("iverilog not found. Please install Icarus Verilog to run simulations.")
    print("On Ubuntu/Debian: sudo apt-get install iverilog")
    print("On macOS: brew install icarus-verilog")
except Exception as e:
    print(f"Simulation error: {e}")

## Section 10: Example 2 - 4-bit Adder

Let's try a second example with a slightly more complex module: a 4-bit adder with a carry output.

### Natural Language Description

In [None]:
# Natural language description of the 4-bit adder
adder_description = """
A simple 4-bit adder module.

The module takes two 4-bit input signals (a and b) and produces a 4-bit sum output and a 1-bit carry output.

Functionality:
- Input 'a': 4-bit unsigned number
- Input 'b': 4-bit unsigned number  
- Output 'sum': 4-bit result of a + b (lower 4 bits)
- Output 'carry': 1-bit carry-out flag (set to 1 if result exceeds 15)

The adder performs unsigned addition of the two 4-bit inputs.
If the result is greater than 15 (0xF), the carry output should be set to 1.

"""

print("Natural Language Description:")
print("=" * 80)
print(adder_description)
print("=" * 80)
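
The arithmetic in this description can be checked by hand: the sum output is the low 4 bits of `a + b`, and the carry is bit 4 of the same result. The sketch below works through that calculation; `adder4bit_reference` is a hypothetical name, and the function is written from the description rather than taken from the generated golden model.

```python
from typing import Tuple


def adder4bit_reference(a: int, b: int) -> Tuple[int, int]:
    """Return (sum, carry) for unsigned 4-bit addition."""
    total = (a & 0xF) + (b & 0xF)   # at most 15 + 15 = 30, fits in 5 bits
    sum_out = total & 0xF           # lower 4 bits
    carry = (total >> 4) & 1        # set when total exceeds 15
    return sum_out, carry


# Boundary cases: no carry at exactly 15, carry from 16 upwards.
for a, b in [(7, 8), (9, 8), (15, 1), (15, 15)]:
    s, c = adder4bit_reference(a, b)
    print(f"{a} + {b} -> sum={s}, carry={c}")
```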

### Verilog Module Code

In [None]:
# Verilog module to be tested
adder_verilog_code = """
module adder4bit (
    input wire [3:0] a,
    input wire [3:0] b,
    output wire [3:0] sum,
    output wire carry
);
    wire [4:0] result;
    assign result = a + b;
    assign sum = result[3:0];
    assign carry = result[4];
endmodule

"""

print("Verilog Module:")
print("=" * 80)
print(adder_verilog_code)
print("=" * 80)

# Save the Verilog code
with open(f"{output_dir}/adder4bit.v", "w") as f:
    f.write(adder_verilog_code)

### Run the Pipeline for Adder

Execute the complete pipeline for the 4-bit adder:

In [None]:
# Run pipeline for the adder
print("\n" + "=" * 80)
print("Running LLM-Aided Testbench Generation Pipeline for 4-bit Adder")
print("=" * 80 + "\n")

try:
    result = pipeline.run(
        description=adder_description,
        verilog_code=adder_verilog_code,
        output_dir=f"{output_dir}/adder"
    )
    print("\n✓ Adder testbench generation completed successfully!")
    print(f"\nGenerated files are in: {output_dir}/adder/")
except Exception as e:
    print(f"\n✗ Error: {e}")
    import traceback
    traceback.print_exc()

### Simulate the Adder Testbench

Run the simulation for the adder:

In [None]:
# Simulate the adder testbench
try:
    # Compile. Arguments are passed as a list (no shell) so that a missing
    # iverilog binary raises FileNotFoundError and reaches the handler below.
    print("Compiling adder testbench...")
    compile_result = subprocess.run(
        ["iverilog", "-g2012", "-o", f"{output_dir}/adder/sim",
         f"{output_dir}/adder4bit.v", f"{output_dir}/adder/testbench_final.v"],
        capture_output=True,
        text=True,
        timeout=30
    )
    
    if compile_result.returncode != 0:
        print("Compilation failed:")
        print(compile_result.stderr)
    else:
        print("✓ Compilation successful\n")
        
        # Run simulation
        print("Running simulation...")
        sim_result = subprocess.run(
            ["vvp", f"{output_dir}/adder/sim"],
            capture_output=True,
            text=True,
            timeout=30
        )
        
        print("Simulation Output:")
        print("=" * 80)
        print(sim_result.stdout)
        print("=" * 80)
            
except FileNotFoundError:
    print("iverilog not found. Skipping simulation.")
except Exception as e:
    print(f"Simulation error: {e}")
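
The two simulation cells differ only in their file paths, so the compile-and-run sequence could be factored into one helper. The following is a minimal sketch, not code from the repository: the function name `simulate` and its `compiler` and `runner` parameters are assumptions, while the `iverilog -g2012 -o` and `vvp` invocations are the ones used in the cells above.

```python
import shutil
import subprocess
from typing import List, Optional


def simulate(sources: List[str], sim_path: str,
             compiler: str = "iverilog", runner: str = "vvp",
             timeout: int = 30) -> Optional[str]:
    """Compile Verilog sources and run the result (hypothetical helper).

    Returns the simulator's stdout, or None if the tools are not installed.
    Raises RuntimeError if compilation fails.
    """
    # Check for the tools up front instead of relying on an exception.
    if shutil.which(compiler) is None or shutil.which(runner) is None:
        return None

    compiled = subprocess.run(
        [compiler, "-g2012", "-o", sim_path, *sources],
        capture_output=True, text=True, timeout=timeout,
    )
    if compiled.returncode != 0:
        raise RuntimeError(f"Compilation failed:\n{compiled.stderr}")

    result = subprocess.run(
        [runner, sim_path],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

With this in place, the adder cell would reduce to a call such as `simulate([f"{output_dir}/adder4bit.v", f"{output_dir}/adder/testbench_final.v"], f"{output_dir}/adder/sim")`.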

## Section 11: Summary and Next Steps

### What We've Accomplished

In this notebook, we've demonstrated a complete LLM-aided testbench generation system that:

1. ✅ Accepts natural language descriptions and Verilog code
2. ✅ Generates comprehensive test patterns using an LLM
3. ✅ Creates Python golden models for reference
4. ✅ Computes expected outputs automatically
5. ✅ Produces Verilog testbenches with full verification logic
6. ✅ Runs simulations to validate the design

### Generated Artifacts

For each module, the pipeline produces:
- **testbench_initial.v**: Testbench with test patterns (no verification)
- **golden_model.py**: Python reference implementation
- **test_patterns_with_golden.json**: Test data with expected outputs
- **testbench_final.v**: Complete testbench with verification logic
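
To check how many patterns in `test_patterns_with_golden.json` actually received an expected output, the same test the pipeline applies when printing its progress can be run on the saved file. The helper below is a sketch; the name `summarize_patterns` is hypothetical, and it assumes each pattern is a dictionary with an optional `expected_outputs` key, as in the pipeline code above.

```python
from typing import Any, Dict, List, Tuple


def summarize_patterns(patterns: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (patterns with a computed expected output, total patterns)."""
    computed = sum(
        1 for p in patterns if p.get("expected_outputs") is not None
    )
    return computed, len(patterns)


# Illustrative data only; real patterns come from json.load() on the file.
example = [
    {"a": 0, "b": 1, "sel": 1, "expected_outputs": {"y": 1}},
    {"a": 1, "b": 0, "sel": 0, "expected_outputs": None},
    {"a": 1, "b": 1, "sel": 0},
]
computed, total = summarize_patterns(example)
print(f"{computed}/{total} patterns have expected outputs")
```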

### Next Steps

You can extend this system by:
- Adding support for sequential circuits and FSMs
- Implementing coverage-driven test generation
- Integrating with formal verification tools
- Adding waveform generation and analysis
- Supporting SystemVerilog and UVM testbenches

### Using Your Own Modules

To generate testbenches for your own Verilog modules:

1. Prepare a clear natural language description
2. Provide the Verilog module code
3. Run the pipeline:
   ```python
   result = pipeline.run(
       description=your_description,
       verilog_code=your_verilog_code,
       output_dir="your_output_dir"
   )
   ```

### Resources

- **Repository**: [github.com/FCHXWH823/LLM-aided-Testbench-Generation](https://github.com/FCHXWH823/LLM-aided-Testbench-Generation)
- **Documentation**: See README.md and USAGE_GUIDE.md in the repository
- **Examples**: Check the `examples/` directory for more samples

Thank you for using LLM-Aided Testbench Generation! 🚀

## Appendix: View All Generated Files

List all files generated during this notebook session:

In [None]:
# List all generated files
import os

print("All Generated Files:")
print("=" * 80)

def list_files(directory, prefix=""):
    """Recursively list all files in a directory"""
    if not os.path.exists(directory):
        print(f"{prefix}(Directory not found)")
        return
        
    for item in sorted(os.listdir(directory)):
        path = os.path.join(directory, item)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            print(f"{prefix}{item:35s} ({size:,} bytes)")
        elif os.path.isdir(path):
            print(f"{prefix}{item}/")
            list_files(path, prefix + "  ")

list_files(output_dir)
print("=" * 80)