# Code2Vec-BugHunter: Deep Learning for Bug Detection

This notebook demonstrates how to use the Code2Vec-BugHunter system to detect bugs in Python code using deep learning.

## 1. Setup

First, let's import the necessary modules and set up the environment.

In [None]:
import sys
import os
import logging
import matplotlib.pyplot as plt
import numpy as np

# Add the parent directory to the path to import modules
sys.path.append('..')

# Import project modules
from model import Code2VecBugHunter
from inference import run_inference
from utils.visualization import visualize_attention
from utils.ast_utils import normalize_and_parse_code, extract_ast_paths

# Configure logging
logging.basicConfig(level=logging.INFO)

# Ensure the model directory exists
os.makedirs('../models', exist_ok=True)

## 2. Load or Create Model

We'll load a pre-trained model if one is available. Otherwise we create an untrained dummy model so the rest of the notebook still runs; its predictions are not meaningful.

In [None]:
model_path = '../models/code2vec_bughunter.pt'

if not os.path.exists(model_path):
    print("No pre-trained model found. Creating a dummy model for demonstration...")
    
    # Create a dummy model
    model = Code2VecBugHunter(
        path_vocab_size=100,
        embedding_dim=128,
        hidden_dim=128,
        num_layers=2
    )
    
    model.save(model_path)
    print(f"Dummy model created and saved to {model_path}")
else:
    print(f"Loading pre-trained model from {model_path}")
    model = Code2VecBugHunter.load(model_path)
    print("Model loaded successfully")

## 3. Code Analysis Examples

Let's analyze some example code snippets to see how the bug detection works.

### Example 1: Missing bounds check (IndexError)

In [None]:
code_example_1 = """
def get_element(arr, index):
    return arr[index]  # Missing bounds check
"""

result_1 = run_inference(model_path, code_example_1)

print("Bug Detection Result:")
print(f"  - Buggy: {'Yes' if result_1['is_buggy'] else 'No'}")
print(f"  - Confidence: {result_1['confidence']:.4f}")
print(f"  - Top attention areas:")
for i, (node, weight) in enumerate(result_1['attention'][:5], 1):
    print(f"    {i}. {node}: {weight:.4f}")

### Example 2: Missing dictionary key (KeyError)

In [None]:
code_example_2 = """
def process_data(data):
    result = data['key']  # KeyError if key doesn't exist
    return result * 2
"""

result_2 = run_inference(model_path, code_example_2)

print("Bug Detection Result:")
print(f"  - Buggy: {'Yes' if result_2['is_buggy'] else 'No'}")
print(f"  - Confidence: {result_2['confidence']:.4f}")
print(f"  - Top attention areas:")
for i, (node, weight) in enumerate(result_2['attention'][:5], 1):
    print(f"    {i}. {node}: {weight:.4f}")

### Example 3: Division by zero

In [None]:
code_example_3 = """
def calculate_average(numbers):
    total = sum(numbers)
    return total / len(numbers)  # Division by zero if empty
"""

result_3 = run_inference(model_path, code_example_3)

print("Bug Detection Result:")
print(f"  - Buggy: {'Yes' if result_3['is_buggy'] else 'No'}")
print(f"  - Confidence: {result_3['confidence']:.4f}")
print(f"  - Top attention areas:")
for i, (node, weight) in enumerate(result_3['attention'][:5], 1):
    print(f"    {i}. {node}: {weight:.4f}")

### Example 4: Safe code (no bugs)

In [None]:
code_example_4 = """
def get_element(arr, index):
    if 0 <= index < len(arr):
        return arr[index]
    return None
"""

result_4 = run_inference(model_path, code_example_4)

print("Bug Detection Result:")
print(f"  - Buggy: {'Yes' if result_4['is_buggy'] else 'No'}")
print(f"  - Confidence: {result_4['confidence']:.4f}")
print(f"  - Top attention areas:")
for i, (node, weight) in enumerate(result_4['attention'][:5], 1):
    print(f"    {i}. {node}: {weight:.4f}")
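The four example cells above repeat the same reporting code. A small helper could replace it; the sketch below assumes only the result layout those cells already rely on (`is_buggy`, `confidence`, and `attention` as a list of `(node, weight)` pairs). The name `format_result` and the sample values are hypothetical and are not part of the project.

```python
def format_result(result, top_k=5):
    """Return a printable report for one inference result.

    Assumes the dict layout used in the cells above: 'is_buggy' (bool),
    'confidence' (float) and 'attention' (list of (node, weight) pairs).
    """
    lines = [
        "Bug Detection Result:",
        f"  - Buggy: {'Yes' if result['is_buggy'] else 'No'}",
        f"  - Confidence: {result['confidence']:.4f}",
        "  - Top attention areas:",
    ]
    for i, (node, weight) in enumerate(result["attention"][:top_k], 1):
        lines.append(f"    {i}. {node}: {weight:.4f}")
    return "\n".join(lines)


# Hypothetical result with made-up values, only to show the layout.
example_result = {
    "is_buggy": True,
    "confidence": 0.5,
    "attention": [("Subscript", 0.25), ("Return", 0.125)],
}
print(format_result(example_result))
```

With a helper like this, each example cell reduces to a `run_inference` call followed by `print(format_result(result))`.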

## 4. Visualizing the Model's Attention

Plotting the attention weights shows which path contexts the model weighted most heavily when it classified a snippet, which helps explain why it flags certain code as buggy.

In [None]:
# Plot attention weights for the first example
attention_weights = [weight for _, weight in result_1['attention'][:10]]
labels = [f"Path {i+1}" for i in range(len(attention_weights))]

plt.figure(figsize=(10, 6))
plt.bar(labels, attention_weights, color='royalblue')
plt.xlabel('Path Context')
plt.ylabel('Attention Weight')
plt.title('Attention Distribution for Example 1')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
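To interpret the bar heights, it helps to know where such weights usually come from. In the original code2vec architecture, each path context receives one raw score, and a softmax turns the scores into non-negative weights that sum to 1 over all path contexts. Whether `Code2VecBugHunter` computes its attention exactly this way is an assumption; the sketch below uses made-up scores rather than model output.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for four path contexts (not model output).
raw_scores = [2.0, 1.0, 0.5, 0.5]
weights = softmax(raw_scores)

# Rank path indices from most to least attended.
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
for rank, idx in enumerate(ranking, 1):
    print(f"{rank}. Path {idx + 1}: {weights[idx]:.4f}")
```

Because the plot above shows only the top 10 paths, its bars need not sum to 1 when a snippet yields more than 10 path contexts.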

## 5. Understanding AST Paths

Let's look at how source code is converted to AST paths for the model input.

In [None]:
# Parse code to AST
ast_tree = normalize_and_parse_code(code_example_1)

# Extract paths
paths = extract_ast_paths(ast_tree, max_paths=10, max_length=8)

# Display the paths
print(f"AST Paths extracted from the code (showing {len(paths)} paths):")
for i, path in enumerate(paths, 1):
    print(f"Path {i}:")
    print(f"  Start Token: {path['start_token']}")
    print(f"  Path: {path['path']}")
    print(f"  End Token: {path['end_token']}")
    print()
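The implementation of `extract_ast_paths` is not shown in this notebook. As a rough illustration of the idea, the sketch below uses only the standard-library `ast` module: it treats names, arguments, and constants as terminals, then connects each pair of terminals by walking up to their lowest common ancestor and back down. The function names, the `^` / `v` path notation, and the exact meaning given to `max_paths` and `max_length` here are assumptions and may differ from the project's code.

```python
import ast
from itertools import combinations


def terminal_routes(node, route=()):
    """Yield (token, route) pairs; route is the tuple of nodes from the root."""
    route = route + (node,)
    if isinstance(node, ast.Name):
        yield node.id, route
    elif isinstance(node, ast.arg):
        yield node.arg, route
    elif isinstance(node, ast.Constant):
        yield repr(node.value), route
    else:
        for child in ast.iter_child_nodes(node):
            yield from terminal_routes(child, route)


def sketch_ast_paths(source, max_paths=10, max_length=8):
    """Return up to max_paths terminal-to-terminal paths of at most max_length nodes."""
    terminals = list(terminal_routes(ast.parse(source)))
    paths = []
    for (tok_a, route_a), (tok_b, route_b) in combinations(terminals, 2):
        if len(paths) >= max_paths:
            break
        # Count the nodes both routes share; the last one is the common ancestor.
        shared = 0
        for a, b in zip(route_a, route_b):
            if a is not b:
                break
            shared += 1
        up = [type(n).__name__ for n in reversed(route_a[shared:])]
        top = type(route_a[shared - 1]).__name__
        down = [type(n).__name__ for n in route_b[shared:]]
        if len(up) + 1 + len(down) > max_length:
            continue  # skip paths that are too long
        path = " ^ ".join(up + [top]) + " v " + " v ".join(down)
        paths.append({"start_token": tok_a, "path": path, "end_token": tok_b})
    return paths


snippet = "def get_element(arr, index):\n    return arr[index]\n"
for p in sketch_ast_paths(snippet, max_paths=3):
    print(p["start_token"], "|", p["path"], "|", p["end_token"])
```

Routes are compared by node identity rather than by type name, so two different nodes of the same type are never mistaken for a common ancestor.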

## 6. Try Your Own Code

Let's analyze a custom code sample.

In [None]:
# Enter your own code to analyze
custom_code = """
def find_max(numbers):
    max_value = numbers[0]  # Will fail if numbers is empty
    for num in numbers:
        if num > max_value:
            max_value = num
    return max_value
"""

custom_result = run_inference(model_path, custom_code)

print("Bug Detection Result:")
print(f"  - Buggy: {'Yes' if custom_result['is_buggy'] else 'No'}")
print(f"  - Confidence: {custom_result['confidence']:.4f}")
print(f"  - Top attention areas:")
for i, (node, weight) in enumerate(custom_result['attention'][:5], 1):
    print(f"    {i}. {node}: {weight:.4f}")

## 7. Summary

In this notebook, we've demonstrated:
1. Loading and using the Code2Vec-BugHunter model
2. Analyzing code for potential bugs
3. Visualizing model attention to understand bug detection
4. Converting source code to AST paths for model input

Because code is represented as AST paths, the model can learn structural patterns associated with bugs from examples rather than relying on hand-written rules.