# Context-Aware Documentation Generator - Examples and Usage

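This notebook walks through six examples of the Context-Aware Code Documentation Generator: code parsing, RAG search, docstring generation, GitHub repository processing, multi-language parsing, and a complete end-to-end workflow.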

## Setup

In [None]:
# Standard-library imports used to locate the project root (no packages are installed here)
import sys
import os
from pathlib import Path

# Add project root to path (step up one level when run from the notebooks/ folder)
project_root = Path.cwd()
if project_root.name == 'notebooks':
    project_root = project_root.parent
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

print(f"Project root: {project_root}")

In [None]:
# Import our modules
from src.parser import create_parser
from src.rag import create_rag_system
from src.llm import create_documentation_generator
from src.git_handler import create_git_handler

import json
import tempfile
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

## Example 1: Basic Code Parsing

In [None]:
# Create sample Python code file
sample_python_code = '''
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

class Calculator:
    def __init__(self):
        self.history = []
    
    def add(self, a, b):
        result = a + b
        self.history.append(f"{a} + {b} = {result}")
        return result
    
    def multiply(self, a, b):
        result = a * b
        self.history.append(f"{a} * {b} = {result}")
        return result
'''

# Write to temporary file
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
    f.write(sample_python_code)
    temp_file_path = f.name

print(f"Created temporary file: {temp_file_path}")
print("Sample code:")
print(sample_python_code)

In [None]:
# Initialize parser
parser = create_parser()

# Parse the file
parsed_result = parser.parse_file(temp_file_path)

if parsed_result:
    print("\n📊 Parsing Results:")
    print(f"Language: {parsed_result['language']}")
    print(f"Functions found: {len(parsed_result['functions'])}")
    print(f"Classes found: {len(parsed_result['classes'])}")
    
    print("\n🔧 Functions:")
    for func in parsed_result['functions']:
        print(f"  - {func['name']} (lines {func['start_line']}-{func['end_line']})")
    
    print("\n📦 Classes:")
    for cls in parsed_result['classes']:
        print(f"  - {cls['name']} (lines {cls['start_line']}-{cls['end_line']})")
else:
    print("❌ Failed to parse file")

# Cleanup
os.unlink(temp_file_path)
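
The result dictionary above exposes `name`, `start_line`, and `end_line` for each function and class. As a rough illustration of that shape only, the sketch below extracts the same fields with Python's standard `ast` module. This is not how the project's parser works (the summary attributes parsing to tree-sitter); `extract_definitions` is a hypothetical helper, and it counts methods as functions, which the real parser may or may not do.

```python
import ast


def extract_definitions(source: str) -> dict:
    """Collect function and class names with their line spans (stdlib-only sketch)."""
    tree = ast.parse(source)
    result = {"functions": [], "classes": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            key = "functions"
        elif isinstance(node, ast.ClassDef):
            key = "classes"
        else:
            continue
        result[key].append({
            "name": node.name,
            "start_line": node.lineno,
            "end_line": node.end_lineno,  # available since Python 3.8
        })
    return result


sample = (
    "def fibonacci(n):\n"
    "    return n if n <= 1 else fibonacci(n - 1) + fibonacci(n - 2)\n"
    "\n"
    "class Calculator:\n"
    "    def add(self, a, b):\n"
    "        return a + b\n"
)

info = extract_definitions(sample)
for kind, items in info.items():
    for item in items:
        print(f"{kind}: {item['name']} (lines {item['start_line']}-{item['end_line']})")
```

Only Python source can be handled this way, which is one reason a multi-language tool would reach for a grammar-based parser instead.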

## Example 2: RAG System Demonstration

In [None]:
# Create a mock parsed codebase
mock_codebase = {
    'files': {
        'math_utils.py': {
            'language': 'python',
            'file_path': 'math_utils.py',
            'functions': [
                {
                    'name': 'add',
                    'text': 'def add(a, b):\n    return a + b',
                    'start_line': 1,
                    'end_line': 2
                },
                {
                    'name': 'multiply',
                    'text': 'def multiply(a, b):\n    return a * b',
                    'start_line': 4,
                    'end_line': 5
                }
            ],
            'classes': [],
            'imports': ['import math'],
            'comments': []
        },
        'string_utils.py': {
            'language': 'python',
            'file_path': 'string_utils.py',
            'functions': [
                {
                    'name': 'reverse_string',
                    'text': 'def reverse_string(s):\n    return s[::-1]',
                    'start_line': 1,
                    'end_line': 2
                }
            ],
            'classes': [],
            'imports': [],
            'comments': []
        }
    },
    'summary': {
        'total_files': 2,
        'languages': ['python'],
        'total_functions': 3,
        'total_classes': 0
    }
}

print("Mock codebase created with:")
print(f"- {mock_codebase['summary']['total_files']} files")
print(f"- {mock_codebase['summary']['total_functions']} functions")
print(f"- Languages: {', '.join(mock_codebase['summary']['languages'])}")

In [None]:
# Initialize RAG system
print("Initializing RAG system...")
rag_system = create_rag_system()

# Prepare code chunks
print("Preparing code chunks...")
code_chunks = rag_system.prepare_code_chunks(mock_codebase)

print(f"\n📊 Prepared {len(code_chunks)} code chunks:")
for i, chunk in enumerate(code_chunks):
    print(f"  {i+1}. {chunk['type']} - {chunk['metadata'].get('name', 'N/A')}")

# Build index
print("\nBuilding FAISS index...")
rag_system.build_index(code_chunks)
print("✅ Index built successfully!")

In [None]:
# Test search functionality
search_queries = [
    "mathematical operations",
    "string manipulation",
    "addition function",
    "reverse text"
]

print("🔍 Testing search functionality:")
print("=" * 50)

for query in search_queries:
    print(f"\nQuery: '{query}'")
    results = rag_system.search(query, k=2)
    
    for i, result in enumerate(results, 1):
        chunk = result['chunk']
        score = result['score']
        print(f"  {i}. Score: {score:.3f} | Type: {chunk['type']} | {chunk['metadata'].get('name', 'N/A')}")
        print(f"     Content preview: {chunk['content'][:100]}...")
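
The scores printed above come from the project's own index (the log message mentions FAISS). To make the ranking idea concrete without any dependencies, here is a toy sketch that ranks chunks by cosine similarity over word counts. It is a stand-in under stated assumptions: real retrieval systems typically compare learned embeddings rather than word overlap, and `tokenize`, `cosine`, and `toy_search` are hypothetical names that only mimic the `chunk`/`score` result shape used in this notebook.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; non-letters split identifiers such as reverse_string."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_search(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query, highest score first."""
    q = tokenize(query)
    scored = [{"chunk": c, "score": cosine(q, tokenize(c["content"]))} for c in chunks]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:k]


chunks = [
    {"type": "function", "metadata": {"name": "add"},
     "content": "def add(a, b):\n    return a + b"},
    {"type": "function", "metadata": {"name": "reverse_string"},
     "content": "def reverse_string(s):\n    return s[::-1]"},
]

hits = toy_search("reverse string", chunks, k=2)
for hit in hits:
    print(f"{hit['score']:.3f} {hit['chunk']['metadata']['name']}")
```

Word overlap cannot match a query like "mathematical operations" to `add`, which is the gap that embedding-based search is meant to close.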

## Example 3: Documentation Generation

In [None]:
# Initialize documentation generator
print("Initializing documentation generator...")
print("⚠️  This may take a few minutes to download the model on first run")

try:
    doc_generator = create_documentation_generator()
    print("✅ Documentation generator initialized successfully!")
except Exception as e:
    print(f"❌ Error initializing generator: {e}")
    print("This might be due to memory constraints or network issues.")
    doc_generator = None

In [None]:
# Test docstring generation
if doc_generator:
    test_functions = [
        {
            'code': 'def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quicksort(left) + middle + quicksort(right)',
            'language': 'python',
            'name': 'quicksort'
        },
        {
            'code': 'class BankAccount:\n    def __init__(self, balance=0):\n        self._balance = balance\n    \n    def deposit(self, amount):\n        self._balance += amount\n        return self._balance\n    \n    def withdraw(self, amount):\n        if amount <= self._balance:\n            self._balance -= amount\n            return self._balance\n        raise ValueError("Insufficient funds")',
            'language': 'python',
            'name': 'BankAccount'
        }
    ]
    
    print("🔖 Generating docstrings:")
    print("=" * 50)
    
    for func in test_functions:
        print(f"\n📝 Generating documentation for: {func['name']}")
        print(f"Language: {func['language']}")
        print("\nOriginal code:")
        print(func['code'])
        
        try:
            # Get context from RAG
            context = rag_system.get_context_for_documentation(
                func['code'], 
                'class' if func['code'].lstrip().startswith('class ') else 'function'
            )
            
            # Generate docstring
            docstring = doc_generator.generate_docstring(
                code=func['code'],
                language=func['language'],
                context=context,
                style='google'
            )
            
            print("\n🎯 Generated docstring:")
            print(docstring)
            
        except Exception as e:
            print(f"❌ Error generating docstring: {e}")
        
        print("-" * 50)
else:
    print("⏭️  Skipping docstring generation (model not available)")
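
When judging the generated output, it helps to have a reference for what `style='google'` is aiming at. The block below is a hand-written Google-style docstring for the `quicksort` sample. It is not output from the model, so the generator's wording and level of detail will differ.

```python
def quicksort(arr):
    """Sort a list using the quicksort algorithm.

    Picks the middle element as the pivot, partitions the input into
    smaller, equal, and larger elements, and sorts the outer partitions
    recursively. The input list is not modified.

    Args:
        arr: A list of mutually comparable items.

    Returns:
        A list with the items of ``arr`` in ascending order. Lists of
        length 0 or 1 are returned as-is.
    """
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)


print(quicksort([3, 1, 2, 3]))
print(quicksort.__doc__.splitlines()[0])
```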

## Example 4: Processing a GitHub Repository

In [None]:
# Example of processing a small GitHub repository
# Using a simple, lightweight repository for demonstration

# Initialize git handler
git_handler = create_git_handler()

# Example repository (replace with any small public repo)
repo_url = "https://github.com/octocat/Hello-World.git"

print(f"📥 Attempting to clone repository: {repo_url}")
print("⚠️  This requires an internet connection and an accessible repository")

try:
    # Clone repository
    repo_path = git_handler.clone_repository(repo_url)
    print(f"✅ Repository cloned to: {repo_path}")
    
    # Get repository info
    repo_info = git_handler.get_repository_info(repo_path)
    print("\n📊 Repository Information:")
    print(f"  - Files: {repo_info.get('files_count', 0)}")
    print(f"  - Size: {repo_info.get('total_size_mb', 0)} MB")
    print(f"  - Languages: {', '.join(repo_info.get('languages', []))}")
    
    # Parse the codebase
    if repo_info.get('files_count', 0) > 0:
        print("\n🔍 Parsing codebase...")
        parsed_codebase = parser.parse_codebase(repo_path)
        
        print("\n📈 Parsing Results:")
        print(f"  - Files processed: {parsed_codebase['summary']['total_files']}")
        print(f"  - Functions found: {parsed_codebase['summary']['total_functions']}")
        print(f"  - Classes found: {parsed_codebase['summary']['total_classes']}")
        print(f"  - Languages: {', '.join(parsed_codebase['summary']['languages'])}")
        
        # Show some file details
        if parsed_codebase['files']:
            print("\n📁 Files analyzed:")
            for file_path, file_data in list(parsed_codebase['files'].items())[:3]:  # Show first 3 files
                print(f"  - {file_path} ({file_data['language']})")
                if file_data['functions']:
                    print(f"    Functions: {[f['name'] for f in file_data['functions']]}")
                if file_data['classes']:
                    print(f"    Classes: {[c['name'] for c in file_data['classes']]}")
    
    # Cleanup
    git_handler.cleanup(repo_path)
    print("\n🧹 Cleaned up temporary files")
    
except Exception as e:
    print(f"❌ Error processing repository: {e}")
    print("This might be due to network issues, repository access, or parsing errors.")
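
What `clone_repository` does internally is not shown in this notebook. One plausible approach, sketched here purely as an assumption, is a shallow clone into a temporary directory through the `git` command line. `build_clone_command` and `shallow_clone` are hypothetical helpers; `--depth 1` is a standard `git clone` option that fetches only the most recent commit.

```python
import subprocess
import tempfile
from pathlib import Path


def build_clone_command(repo_url: str, dest: str) -> list:
    """Build the argument list for a quiet, shallow clone."""
    return ["git", "clone", "--depth", "1", "--quiet", repo_url, dest]


def shallow_clone(repo_url: str, timeout: int = 120) -> Path:
    """Clone repo_url into a fresh temporary directory and return its path.

    Raises subprocess.CalledProcessError if git exits with an error and
    subprocess.TimeoutExpired if the clone takes longer than `timeout` seconds.
    The caller is responsible for deleting the directory afterwards.
    """
    dest = tempfile.mkdtemp(prefix="repo_")
    subprocess.run(build_clone_command(repo_url, dest), check=True, timeout=timeout)
    return Path(dest)


# Show the command without touching the network.
print(" ".join(build_clone_command("https://github.com/octocat/Hello-World.git", "demo_dir")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual repository URLs.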

## Example 5: Multi-Language Support

In [None]:
# Test multi-language parsing
code_samples = {
    'python': '''
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
''',
    # Raw string so the regex escape \d reaches the JavaScript source unchanged
    'javascript': r'''
function validatePassword(password) {
    const minLength = 8;
    const hasUpperCase = /[A-Z]/.test(password);
    const hasLowerCase = /[a-z]/.test(password);
    const hasNumbers = /\d/.test(password);
    const hasSpecialChar = /[!@#$%^&*(),.?":{}|<>]/.test(password);
    
    return password.length >= minLength && 
           hasUpperCase && 
           hasLowerCase && 
           hasNumbers && 
           hasSpecialChar;
}
''',
    'java': '''
public class LinkedList<T> {
    private Node<T> head;
    private int size;
    
    private static class Node<T> {
        T data;
        Node<T> next;
        
        Node(T data) {
            this.data = data;
        }
    }
    
    public void add(T data) {
        Node<T> newNode = new Node<>(data);
        if (head == null) {
            head = newNode;
        } else {
            Node<T> current = head;
            while (current.next != null) {
                current = current.next;
            }
            current.next = newNode;
        }
        size++;
    }
}
'''
}

print("🌍 Testing multi-language support:")
print("=" * 50)

for language, code in code_samples.items():
    print(f"\n📝 Language: {language.upper()}")
    
    # Create temporary file
    extensions = {'python': '.py', 'javascript': '.js', 'java': '.java'}
    with tempfile.NamedTemporaryFile(mode='w', suffix=extensions[language], delete=False) as f:
        f.write(code)
        temp_file = f.name
    
    try:
        # Parse the file
        parsed = parser.parse_file(temp_file)
        
        if parsed:
            print("✅ Parsed successfully")
            print(f"   Language detected: {parsed['language']}")
            print(f"   Functions: {len(parsed['functions'])}")
            print(f"   Classes: {len(parsed['classes'])}")
            
            # Show function/class names
            if parsed['functions']:
                func_names = [f['name'] for f in parsed['functions']]
                print(f"   Function names: {', '.join(func_names)}")
            
            if parsed['classes']:
                class_names = [c['name'] for c in parsed['classes']]
                print(f"   Class names: {', '.join(class_names)}")
        else:
            print(f"❌ Failed to parse {language} code")
            
    except Exception as e:
        print(f"❌ Error parsing {language}: {e}")
    
    finally:
        # Cleanup
        if os.path.exists(temp_file):
            os.unlink(temp_file)
    
    print("-" * 30)
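
The loop above gives each temporary file a language-specific suffix, presumably so the parser can choose a grammar from it. How the project's parser maps suffixes to languages is not shown here; a minimal sketch of extension-based detection, with a hypothetical `detect_language` helper and an illustrative mapping, could look like this:

```python
from pathlib import Path
from typing import Optional

# Illustrative mapping only; the project's real list of languages may differ.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".java": "java",
}


def detect_language(file_path: str) -> Optional[str]:
    """Return the language for a file suffix, or None if it is not recognized."""
    return EXTENSION_TO_LANGUAGE.get(Path(file_path).suffix.lower())


for name in ["search.py", "validate.JS", "LinkedList.java", "notes.txt"]:
    print(f"{name}: {detect_language(name)}")
```

Returning `None` for unknown suffixes lets a caller skip the file instead of failing, which matches the `if parsed:` checks used in this notebook.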

## Example 6: Complete Workflow Demonstration

In [None]:
# Create a small mock project structure
import shutil  # os and tempfile are already imported in the setup cells

# Create temporary directory
temp_project_dir = tempfile.mkdtemp(prefix='demo_project_')
print(f"📁 Created temporary project directory: {temp_project_dir}")

# Create mock project files
files_to_create = {
    'main.py': '''
#!/usr/bin/env python3
"""
Main entry point for the application.
"""

from utils import helper_function
from models import DataProcessor

def main():
    processor = DataProcessor()
    result = processor.process([1, 2, 3, 4, 5])
    print(f"Result: {result}")
    
    helper_result = helper_function("test")
    print(f"Helper result: {helper_result}")

if __name__ == "__main__":
    main()
''',
    'utils.py': '''
def helper_function(input_string):
    return input_string.upper()

def calculate_average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)

def find_max(numbers):
    return max(numbers) if numbers else None
''',
    'models.py': '''
class DataProcessor:
    def __init__(self):
        self.processed_count = 0
    
    def process(self, data):
        self.processed_count += 1
        return [x * 2 for x in data]
    
    def get_stats(self):
        return {"processed_count": self.processed_count}

class Config:
    DEBUG = True
    VERSION = "1.0.0"
'''
}

# Write files
for filename, content in files_to_create.items():
    file_path = os.path.join(temp_project_dir, filename)
    with open(file_path, 'w') as f:
        f.write(content.strip())
    print(f"  ✅ Created {filename}")

print(f"\n📊 Demo project created with {len(files_to_create)} files")

In [None]:
# Run complete workflow on the demo project
print("🚀 Running complete documentation generation workflow:")
print("=" * 60)

try:
    # Step 1: Parse the entire codebase
    print("\n1️⃣ Parsing codebase...")
    parsed_codebase = parser.parse_codebase(temp_project_dir)
    
    print(f"   ✅ Parsed {parsed_codebase['summary']['total_files']} files")
    print(f"   📊 Found {parsed_codebase['summary']['total_functions']} functions")
    print(f"   📦 Found {parsed_codebase['summary']['total_classes']} classes")
    
    # Step 2: Build RAG index
    print("\n2️⃣ Building RAG index...")
    code_chunks = rag_system.prepare_code_chunks(parsed_codebase)
    rag_system.build_index(code_chunks)
    print(f"   ✅ Built index with {len(code_chunks)} chunks")
    
    # Step 3: Generate documentation (if model is available)
    if doc_generator:
        print("\n3️⃣ Generating documentation...")
        
        doc_count = 0
        for file_path, file_data in parsed_codebase['files'].items():
            print(f"\n   📝 Processing {os.path.basename(file_path)}:")
            
            # Document functions
            for func in file_data['functions'][:2]:  # Limit to first 2 functions per file
                try:
                    context = rag_system.get_context_for_documentation(
                        func.get('text', ''), 'function'
                    )
                    
                    docstring = doc_generator.generate_docstring(
                        code=func.get('text', ''),
                        language=file_data['language'],
                        context=context[:100],  # Limit context for demo
                        style='google'
                    )
                    
                    print(f"     🔧 {func['name']}: Generated docstring ({len(docstring)} chars)")
                    doc_count += 1
                    
                except Exception as e:
                    print(f"     ❌ Error documenting {func['name']}: {e}")
            
            # Document classes
            for cls in file_data['classes'][:1]:  # Limit to first class per file
                try:
                    context = rag_system.get_context_for_documentation(
                        cls.get('text', ''), 'class'
                    )
                    
                    docstring = doc_generator.generate_docstring(
                        code=cls.get('text', ''),
                        language=file_data['language'],
                        context=context[:100],  # Limit context for demo
                        style='google'
                    )
                    
                    print(f"     📦 {cls['name']}: Generated docstring ({len(docstring)} chars)")
                    doc_count += 1
                    
                except Exception as e:
                    print(f"     ❌ Error documenting {cls['name']}: {e}")
        
        print(f"\n   ✅ Generated documentation for {doc_count} items")
        
        # Generate markdown documentation
        print("\n4️⃣ Generating project README...")
        try:
            markdown_docs = doc_generator.generate_markdown_docs(
                parsed_codebase, 
                "Demo project showcasing multi-file documentation generation"
            )
            print(f"   ✅ Generated README ({len(markdown_docs)} characters)")
            print(f"   📄 README preview: {markdown_docs[:150]}...")
        except Exception as e:
            print(f"   ❌ Error generating README: {e}")
    
    else:
        print("\n3️⃣ ⏭️  Skipping documentation generation (model not available)")
    
    print("\n🎉 Workflow completed successfully!")
    
except Exception as e:
    print(f"\n❌ Workflow error: {e}")

finally:
    # Cleanup
    if os.path.exists(temp_project_dir):
        shutil.rmtree(temp_project_dir)
        print("\n🧹 Cleaned up temporary project directory")

## Summary and Next Steps

This notebook demonstrated the key capabilities of the Context-Aware Code Documentation Generator:

### ✅ What We Covered:
1. **Multi-language parsing** with tree-sitter
2. **RAG system** for contextual understanding
3. **Documentation generation** with the Phi-3 model
4. **GitHub repository processing**
5. **End-to-end workflow** demonstration

### 🚀 Next Steps:
1. **Scale up**: Try with larger, real-world repositories
2. **Fine-tune**: Use the training notebook to customize the model
3. **Integrate**: Set up the web interface for easier usage
4. **Customize**: Adapt prompts and styles for your specific needs
5. **Deploy**: Use the CLI tool for batch processing

### 💡 Tips for Better Results:
- Use repositories with clean, well-structured code
- Ensure sufficient GPU memory for larger codebases
- Fine-tune the model on domain-specific code
- Experiment with different documentation styles
- Leverage the RAG system for better context awareness

### 🔧 For Production Use:
- Set up proper environment variables
- Configure logging and monitoring
- Implement error handling and retries
- Add caching for frequently processed repositories
- Consider using the FastAPI backend for scalability
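
As one way to approach the "error handling and retries" item, the sketch below wraps a flaky operation in a retry loop with exponential backoff. It is a minimal illustration rather than part of the project: `retry` is a hypothetical helper, the delays are arbitrary, and a production version would usually catch narrower exception types.

```python
import random
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay * 2**n seconds (plus a little jitter) after the n-th
    failure and re-raises the last exception when every attempt has failed.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


# Simulated operation that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry(flaky, attempts=5, base_delay=0.01)
print(result, calls["count"])
```

In this notebook the same wrapper could be applied to the network-bound step, for example `retry(lambda: git_handler.clone_repository(repo_url))`.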