An AI-powered agent that analyzes stack traces together with GitHub commit history to identify the commit most likely to have introduced an issue. It helps developers quickly pinpoint which commit caused a bug by using AI models (Gemini, Claude, or Claude via AWS Bedrock) to reason about the relationship between a stack trace and recent commits.
- Stack Trace Analysis: Parse and extract meaningful information from stack traces in multiple languages (Python, Java, JavaScript, C#)
- GitHub Integration: Fetch commit history, diffs, and metadata from GitHub repositories
- Multi-Model AI Support: Use Gemini, Claude, or AWS Bedrock to analyze relationships between commits and stack traces
- Root Cause Identification: Identify the most likely commit that caused the issue, with confidence scores
- Intelligent Fallback: Automatic fallback between AI models for maximum reliability
- Web Interface: Streamlit-based UI for easy interaction
- API Endpoint: FastAPI backend for programmatic access
- Export Results: Export analysis results in JSON, CSV, or Markdown formats
- Batch Processing: Analyze multiple stack traces in one run
- Statistics: Repository commit statistics and analysis metrics
- Enterprise Ready: AWS Bedrock integration for enterprise environments
```bash
# Clone the repository
git clone <your-repo-url>
cd commit_analyzer_agent

# Install dependencies
pip install -r requirements.txt

# Run setup script
python setup.py
```
Create a `.env` file with your API keys:
```bash
# GitHub API Configuration
GITHUB_TOKEN=your_github_personal_access_token_here

# AI Model API Keys
GEMINI_API_KEY=your_gemini_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# AWS Bedrock Configuration (alternative to the direct Anthropic API)
AWS_ACCESS_KEY_ID=your_aws_access_key_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_key_here
AWS_REGION=us-west-2
BEDROCK_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0

# Application Configuration
DEFAULT_MODEL=gemini  # Options: gemini, claude, bedrock
MAX_COMMITS_TO_ANALYZE=50
MAX_STACK_TRACE_LENGTH=10000
```
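These settings can be read at runtime with `os.getenv`; a minimal sketch using only the variable names defined above (the helper name `load_config` is illustrative, not part of the project's API):

```python
import os

def load_config() -> dict:
    """Read configuration from the environment, with the documented defaults."""
    return {
        "github_token": os.getenv("GITHUB_TOKEN", ""),
        "default_model": os.getenv("DEFAULT_MODEL", "gemini"),
        "max_commits": int(os.getenv("MAX_COMMITS_TO_ANALYZE", "50")),
        "max_trace_len": int(os.getenv("MAX_STACK_TRACE_LENGTH", "10000")),
    }
```

With a `.env` file in place, `python-dotenv`'s `load_dotenv()` can populate the environment before this helper runs.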
- GitHub Token: Create a personal access token
- Gemini API Key: Get from Google AI Studio
- Claude API Key: Get from Anthropic Console
- AWS Credentials: Configure AWS credentials for Bedrock access
```bash
streamlit run app.py
```
```bash
uvicorn api:app --reload
```
```python
from commit_analyzer import CommitAnalyzer

# Initialize analyzer
analyzer = CommitAnalyzer(model_name="gemini")

# Analyze stack trace
stack_trace = """
Traceback (most recent call last):
  File "app.py", line 25, in <module>
    result = process_data(data)
  File "utils.py", line 42, in process_data
    return data['value'] / data['count']
KeyError: 'count'
"""

results = analyzer.analyze_stack_trace(
    stack_trace=stack_trace,
    repo_url="https://github.com/user/repo",
    branch="main",
    lookback_days=30,
    max_commits=50
)

# Get most likely commit
most_likely = results.get('most_likely_commit')
if most_likely:
    print(f"Most likely commit: {most_likely['commit_sha']}")
    print(f"Confidence: {most_likely['confidence_score']:.2%}")
    print(f"Reasoning: {most_likely['reasoning']}")
```
The Commit Analyzer Agent supports multiple AI models for analysis, each with different strengths:
- Model: `gemini-pro`
- API Key: `GEMINI_API_KEY`
- Strengths: Fast analysis, good code understanding, cost-effective
- Use Case: Quick analysis, development environments
- Model: `claude-3-sonnet-20240229`
- API Key: `ANTHROPIC_API_KEY`
- Strengths: Detailed reasoning, excellent code analysis, comprehensive explanations
- Use Case: Deep analysis, production environments
- Model: `anthropic.claude-3-sonnet-20240229-v1:0`
- AWS Credentials: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- Region: `AWS_REGION` (default: us-west-2)
- Strengths: Enterprise integration, AWS security, compliance, cost management
- Use Case: Enterprise environments, AWS infrastructure
```python
# Use Gemini
agent = CommitAnalyzerAgent(model_name="gemini")

# Use Claude (direct API)
agent = CommitAnalyzerAgent(model_name="claude")

# Use Claude via AWS Bedrock
agent = CommitAnalyzerAgent(model_name="bedrock")
```
The agent includes intelligent fallback:
- If the selected model fails, it automatically tries other available models
- If Bedrock fails, it falls back to direct Claude API, then Gemini
- If Claude fails, it falls back to Gemini
- Clear error messages indicate which model is being used
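Conceptually, the fallback chain is just an ordered list of candidates tried in turn. A minimal sketch of the idea (not the agent's actual implementation; `call_model` is a hypothetical stand-in for the real API calls):

```python
# Fallback order per selected model, mirroring the rules above.
FALLBACK_ORDER = {
    "bedrock": ["bedrock", "claude", "gemini"],
    "claude": ["claude", "gemini"],
    "gemini": ["gemini"],
}

def analyze_with_fallback(prompt: str, model: str, call_model) -> tuple:
    """Try each candidate model in order; return (model_used, response)."""
    errors = []
    for candidate in FALLBACK_ORDER.get(model, [model]):
        try:
            return candidate, call_model(candidate, prompt)
        except Exception as exc:  # a real agent would catch narrower error types
            errors.append(f"{candidate}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

The returned `model_used` value is what lets the agent report which model actually produced the answer.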
The agent can parse stack traces from various programming languages:
```python
stack_trace = """
Traceback (most recent call last):
  File "app.py", line 25, in <module>
    result = process_data(data)
  File "utils.py", line 42, in process_data
    return data['value'] / data['count']
KeyError: 'count'
"""
```
```text
TypeError: Cannot read property 'length' of undefined
    at processArray (app.js:15:8)
    at main (app.js:8:4)
    at Object.<anonymous> (app.js:25:1)
```
```text
Exception in thread "main" java.lang.NullPointerException
    at com.example.App.processData(App.java:42)
    at com.example.App.main(App.java:15)
```
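Under the hood, a parser can distinguish these languages with simple pattern heuristics on the trace text. An illustrative sketch, not the project's real parsing logic:

```python
import re

def detect_language(stack_trace: str) -> str:
    """Guess a stack trace's source language from telltale patterns."""
    if "Traceback (most recent call last):" in stack_trace:
        return "python"
    # Java frames look like: at com.example.App.main(App.java:15)
    if re.search(r"\bat [\w$.]+\([\w$]+\.java:\d+\)", stack_trace):
        return "java"
    # JS frames look like: at processArray (app.js:15:8)
    if re.search(r"\bat .+ \(.+\.js:\d+:\d+\)", stack_trace):
        return "javascript"
    # C# frames look like: at Ns.Cls.Method() in Program.cs:line 42
    if re.search(r"\.cs:line \d+", stack_trace):
        return "csharp"
    return "unknown"
```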
```bash
curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_trace": "your stack trace here",
    "repo_url": "https://github.com/user/repo",
    "branch": "main",
    "lookback_days": 30,
    "max_commits": 50,
    "model": "gemini"
  }'
```
```bash
curl -X POST "http://localhost:8000/batch-analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_traces": ["trace1", "trace2"],
    "repo_url": "https://github.com/user/repo"
  }'
```
```bash
curl -X POST "http://localhost:8000/search-commits" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/user/repo",
    "query": "fix bug"
  }'
```
```bash
curl -X POST "http://localhost:8000/analyze-commits-with-stacktrace" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_trace": "your stack trace here",
    "commits": [
      {
        "sha": "abc123def456",
        "message": "Fix database connection issue",
        "author": "john@company.com",
        "date": "2024-01-15T10:30:00",
        "files_changed": ["database/connection.py"],
        "additions": 25,
        "deletions": 8,
        "branch": "main",
        "repo_url": "https://github.com/company/repo"
      }
    ],
    "model": "gemini",
    "include_reasoning": true,
    "include_suggestions": true
  }'
```
```python
from utils import ResultExporter

# Export to JSON
json_data = ResultExporter.to_json(results, "results.json")

# Export to CSV
csv_data = ResultExporter.to_csv(results, "results.csv")

# Export to Markdown
md_data = ResultExporter.to_markdown(results, "results.md")
```
```text
commit_analyzer_agent/
├── commit_analyzer.py      # Main analysis logic
├── stack_trace_parser.py   # Stack trace parsing utilities
├── github_client.py        # GitHub API integration
├── ai_models.py            # AI model integration (Gemini/Claude)
├── utils.py                # Utility functions
├── app.py                  # Streamlit web interface
├── api.py                  # FastAPI backend
├── setup.py                # Setup script
├── requirements.txt        # Dependencies
├── examples/               # Example scripts
│   ├── basic_usage.py
│   └── quick_test.py
└── training/               # Model training
    └── train_model.py
```
- Uses Google's Gemini Pro model
- Requires `GEMINI_API_KEY`
- Good for general analysis and reasoning
- Uses Claude 3 Sonnet model
- Requires `ANTHROPIC_API_KEY`
- Excellent for detailed analysis and explanations
```python
# Use Gemini
analyzer = CommitAnalyzer(model_name="gemini")

# Use Claude
analyzer = CommitAnalyzer(model_name="claude")
```
The agent provides comprehensive analysis results:
```json
{
  "stack_trace_analysis": {
    "error_type": "KeyError",
    "error_message": "KeyError: 'count'",
    "language": "python",
    "framework": null,
    "affected_files": ["utils.py"],
    "function_names": ["process_data"],
    "line_numbers": [42]
  },
  "commits_analyzed": 25,
  "analysis_results": [
    {
      "commit_sha": "abc123def",
      "confidence_score": 0.85,
      "reasoning": "This commit modifies data access patterns...",
      "relevant_changes": ["Changed data access from dict[key] to dict.get(key)"],
      "impact_assessment": "High impact - directly affects data access",
      "suggested_fixes": ["Add default value handling", "Use dict.get() method"],
      "model_used": "gemini"
    }
  ],
  "most_likely_commit": {
    "commit_sha": "abc123def",
    "confidence_score": 0.85,
    "reasoning": "This commit modifies data access patterns...",
    "relevant_changes": ["Changed data access from dict[key] to dict.get(key)"],
    "impact_assessment": "High impact - directly affects data access",
    "suggested_fixes": ["Add default value handling", "Use dict.get() method"],
    "model_used": "gemini"
  },
  "model_used": "gemini",
  "analysis_timestamp": "2024-01-15T10:30:00"
}
```
The agent provides confidence scores (0.0 to 1.0) for each commit:
- 0.8-1.0: Very likely to be the root cause
- 0.6-0.8: Likely to be the root cause
- 0.4-0.6: Possibly related
- 0.2-0.4: Unlikely to be related
- 0.0-0.2: Very unlikely to be related
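These bands translate directly into labels; a small helper (label wording taken from the list above, the function itself is illustrative):

```python
def confidence_label(score: float) -> str:
    """Map a 0.0-1.0 confidence score onto the documented bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.8:
        return "very likely root cause"
    if score >= 0.6:
        return "likely root cause"
    if score >= 0.4:
        return "possibly related"
    if score >= 0.2:
        return "unlikely to be related"
    return "very unlikely to be related"
```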
The web interface includes a dedicated tab for analyzing specific commits against a stacktrace. This feature allows you to:
- Manual Commit Entry: Add commits one by one with detailed information
- JSON Upload: Upload a JSON file containing commit data
- Repository Fetch: Automatically fetch recent commits from a repository
- Model Selection: Choose between Gemini, Claude, or Bedrock models
- Analysis Options: Include/exclude detailed reasoning and suggested fixes
- Ranked Results: View commits ranked by confidence score
- Visual Charts: See confidence distribution and ranking charts
- Export Results: Download analysis results in JSON format
- Navigate to the "Commit Analysis" tab in the web interface
- Enter a stack trace in the left panel
- Choose your AI model (Gemini, Claude, or Bedrock)
- Select a commit input method:
  - Manual Entry: Add commits with SHA, message, author, etc.
  - JSON Upload: Upload a JSON file with commit data
  - Repository Fetch: Use commits from repository settings
- Click "Analyze Commits" to start the analysis
- View results, including:
  - Most likely root-cause commit
  - Confidence scores and reasoning
  - Suggested fixes
  - All ranked commits
  - Confidence distribution chart
```json
[
  {
    "sha": "abc123def456",
    "message": "Fix database connection timeout issue",
    "author": "john@company.com",
    "date": "2024-01-15T10:30:00",
    "files_changed": [
      "database/connection.py",
      "database/postgres.py",
      "config/database.yml"
    ],
    "additions": 25,
    "deletions": 8,
    "branch": "main",
    "repo_url": "https://github.com/company/user-service"
  }
]
```
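Before uploading, a file in this shape can be sanity-checked with a few lines of Python. The set of required fields below is an assumption inferred from the example above, not a spec enforced by the tool:

```python
import json

# Assumed minimum fields, based on the sample commit object above.
REQUIRED_FIELDS = {"sha", "message", "author", "date", "files_changed"}

def validate_commits(raw: str) -> list:
    """Parse a commit-list JSON string and check each entry's fields."""
    commits = json.loads(raw)
    if not isinstance(commits, list):
        raise ValueError("top-level JSON must be a list of commits")
    for i, commit in enumerate(commits):
        missing = REQUIRED_FIELDS - commit.keys()
        if missing:
            raise ValueError(f"commit {i} missing fields: {sorted(missing)}")
    return commits
```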
Use the provided test script to verify the API functionality:
```bash
python test_commit_analysis_api.py
```
This will test the `/analyze-commits-with-stacktrace` endpoint with sample data and display the results.
The commit analyzer now supports analyzing stack traces against specific commit SHAs instead of just using a time-based approach. This allows you to:
- Target specific commits: Analyze only the commits you suspect might be related to the issue
- Reduce analysis time: Focus on a smaller set of commits for faster results
- Improve accuracy: Get more targeted analysis when you have specific suspects
- Select "Specific Commit SHAs" in the Analysis Mode section
- Enter commit SHAs (one per line) in the text area:

  ```text
  abc1234def5678
  ghi9012jkl3456
  mno7890pqr1234
  ```

- Run analysis as usual - the system will analyze only the specified commits
```python
from commit_analyzer import CommitAnalyzer

analyzer = CommitAnalyzer("gemini")

# Define specific commit SHAs to analyze
commit_shas = [
    "abc1234def5678",
    "ghi9012jkl3456",
    "mno7890pqr1234"
]

# Analyze against specific commits
results = analyzer.analyze_stack_trace_with_specific_commits(
    stack_trace=stack_trace,
    repo_url="https://github.com/user/repo",
    commit_shas=commit_shas,
    branch="main"
)

# Get the most likely commit
most_likely = results.get('most_likely_commit')
if most_likely:
    print(f"Most likely commit: {most_likely['commit_sha']}")
    print(f"Confidence: {most_likely['confidence_score']:.2%}")
```
```bash
curl -X POST "http://localhost:8000/analyze-specific-commits" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_trace": "Your stack trace here...",
    "repo_url": "https://github.com/user/repo",
    "commit_shas": ["abc1234", "def5678", "ghi9012"],
    "branch": "main",
    "model": "gemini"
  }'
```
- Precision: Analyze only commits you suspect
- Speed: Faster analysis with fewer commits
- Focus: Get more detailed analysis of specific commits
- Control: Full control over which commits to analyze
- Start with suspects: Begin with commits you suspect might be related
- Include context: Add commits around the suspected time period
- Validate SHAs: Ensure commit SHAs are valid and accessible
- Use short SHAs: 7+ characters are sufficient (full SHA not required)
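SHAs can be validated locally before submission with a simple regex (7-40 lowercase hex characters, matching the short-SHA rule above; the helper is illustrative, not part of the tool):

```python
import re

# Git SHAs are hex; abbreviated SHAs of 7+ characters are accepted above.
SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")

def is_valid_sha(sha: str) -> bool:
    """Accept abbreviated (7+) or full (40-character) lowercase hex SHAs."""
    return bool(SHA_RE.fullmatch(sha))
```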
```python
analyzer = CommitAnalyzer(model_name="claude")

results = analyzer.analyze_stack_trace(
    stack_trace=stack_trace,
    repo_url=repo_url,
    branch="develop",    # Custom branch
    lookback_days=60,    # Look back 60 days
    max_commits=100      # Analyze up to 100 commits
)
```
```python
from utils import CommitAnalyzer as Utils

# Filter by confidence threshold
high_confidence = Utils.filter_by_confidence(results['analysis_results'], threshold=0.7)

# Sort by confidence
sorted_results = Utils.sort_by_confidence(results['analysis_results'])

# Get top commits
top_5 = Utils.get_top_commits(results['analysis_results'], top_n=5)
```
```python
from utils import PerformanceMonitor

monitor = PerformanceMonitor()
monitor.start()

# Perform analysis
results = analyzer.analyze_stack_trace(stack_trace, repo_url)

monitor.end()
performance = monitor.get_summary()
print(f"Analysis took {performance['duration_seconds']} seconds")
```
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```
```bash
# Production settings
export GITHUB_TOKEN=your_token
export GEMINI_API_KEY=your_key
export ANTHROPIC_API_KEY=your_key
export DEFAULT_MODEL=gemini
export MAX_COMMITS_TO_ANALYZE=100
export LOG_LEVEL=INFO
```
The agent includes a training framework for fine-tuning models:
```bash
# Generate training data
python training/train_model.py

# This will:
# 1. Generate synthetic training data
# 2. Train the model on commit-stack trace relationships
# 3. Evaluate model performance
# 4. Save the trained model
```
```bash
# Run basic test
python examples/quick_test.py

# Run comprehensive tests
python -m pytest tests/

# Test API endpoints
curl http://localhost:8000/health
```
See the `examples/` directory for complete usage examples:

- `basic_usage.py`: Basic analysis workflow
- `quick_test.py`: Quick test script
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details
- Issues: Create an issue on GitHub
- Documentation: Check the README and docstrings
- API Docs: Visit `http://localhost:8000/docs` when running the API server
- Support for more programming languages
- Integration with CI/CD pipelines
- Real-time monitoring and alerts
- Advanced ML models for better accuracy
- Integration with issue tracking systems
- Support for private repositories
- Batch processing improvements
- Custom model training interface
Happy debugging!