Vars-07/commit_analyser

🔍 Commit Analyzer Agent

An AI-powered agent that analyzes stack traces and GitHub commits to identify potential root cause commits for issues. This tool helps developers quickly pinpoint which commit introduced a bug by leveraging AI models (Gemini or Claude) to analyze relationships between stack traces and commit history.

✨ Features

  • 🔍 Stack Trace Analysis: Parse and extract meaningful information from stack traces in multiple languages (Python, Java, JavaScript, C#)
  • 📊 GitHub Integration: Fetch commit history, diffs, and metadata from GitHub repositories
  • 🤖 Multi-Model AI Support: Use Gemini, Claude, or AWS Bedrock to analyze relationships between commits and stack traces
  • 🎯 Root Cause Identification: Identify the most likely commit that caused the issue with confidence scores
  • 🔄 Intelligent Fallback: Automatic fallback between AI models for maximum reliability
  • 🌐 Web Interface: Beautiful Streamlit-based UI for easy interaction
  • 🔌 API Endpoint: FastAPI backend for programmatic access
  • 📈 Export Results: Export analysis results in JSON, CSV, or Markdown formats
  • 🔄 Batch Processing: Analyze multiple stack traces in batch
  • 📊 Statistics: Repository commit statistics and analysis metrics
  • 🏢 Enterprise Ready: AWS Bedrock integration for enterprise environments

🚀 Quick Start

1. Installation

# Clone the repository
git clone <your-repo-url>
cd commit_analyzer_agent

# Install dependencies
pip install -r requirements.txt

# Run setup script
python setup.py

2. Configuration

Create a .env file with your API keys:

# GitHub API Configuration
GITHUB_TOKEN=your_github_personal_access_token_here

# AI Model API Keys
GEMINI_API_KEY=your_gemini_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# AWS Bedrock Configuration (alternative to direct Anthropic API)
AWS_ACCESS_KEY_ID=your_aws_access_key_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_key_here
AWS_REGION=us-west-2
BEDROCK_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0

# Application Configuration
DEFAULT_MODEL=gemini  # Options: gemini, claude
MAX_COMMITS_TO_ANALYZE=50
MAX_STACK_TRACE_LENGTH=10000
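
As a rough illustration of how these settings might be consumed, the sketch below reads them from a mapping and applies the defaults shown above. The function name `load_settings` and the returned key names are hypothetical, and the sketch assumes the values have already been exported into the process environment; it is not the project's actual configuration loader.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the settings named in the .env example, applying its defaults.

    Hypothetical helper: the key names in the returned dict are assumptions.
    """
    env = os.environ if env is None else env
    return {
        "github_token": env.get("GITHUB_TOKEN"),
        "default_model": env.get("DEFAULT_MODEL", "gemini"),
        # Numeric settings arrive as strings and need converting.
        "max_commits": int(env.get("MAX_COMMITS_TO_ANALYZE", "50")),
        "max_stack_trace_length": int(env.get("MAX_STACK_TRACE_LENGTH", "10000")),
        "aws_region": env.get("AWS_REGION", "us-west-2"),
    }


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"DEFAULT_MODEL": "claude", "MAX_COMMITS_TO_ANALYZE": "100"})
print(settings["default_model"], settings["max_commits"])
```

Accepting a mapping rather than reading `os.environ` directly makes the defaults easy to check in isolation.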

3. Get API Keys

4. Usage

Web Interface

streamlit run app.py

API Server

uvicorn api:app --reload

Python Script

from commit_analyzer import CommitAnalyzer

# Initialize analyzer
analyzer = CommitAnalyzer(model_name="gemini")

# Analyze stack trace
stack_trace = """
Traceback (most recent call last):
  File "app.py", line 25, in <module>
    result = process_data(data)
  File "utils.py", line 42, in process_data
    return data['value'] / data['count']
KeyError: 'count'
"""

results = analyzer.analyze_stack_trace(
    stack_trace=stack_trace,
    repo_url="https://github.com/user/repo",
    branch="main",
    lookback_days=30,
    max_commits=50
)

# Get most likely commit
most_likely = results.get('most_likely_commit')
if most_likely:
    print(f"Most likely commit: {most_likely['commit_sha']}")
    print(f"Confidence: {most_likely['confidence_score']:.2%}")
    print(f"Reasoning: {most_likely['reasoning']}")

🤖 AI Models

The Commit Analyzer Agent supports multiple AI models for analysis, each with different strengths:

Gemini (Google)

  • Model: gemini-pro
  • API Key: GEMINI_API_KEY
  • Strengths: Fast analysis, good code understanding, cost-effective
  • Use Case: Quick analysis, development environments

Claude (Anthropic)

  • Model: claude-3-sonnet-20240229
  • API Key: ANTHROPIC_API_KEY
  • Strengths: Detailed reasoning, excellent code analysis, comprehensive explanations
  • Use Case: Deep analysis, production environments

Claude via AWS Bedrock

  • Model: anthropic.claude-3-sonnet-20240229-v1:0
  • AWS Credentials: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  • Region: AWS_REGION (default: us-west-2)
  • Strengths: Enterprise integration, AWS security, compliance, cost management
  • Use Case: Enterprise environments, AWS infrastructure

Model Selection

from commit_analyzer import CommitAnalyzer

# Use Gemini
analyzer = CommitAnalyzer(model_name="gemini")

# Use Claude (direct API)
analyzer = CommitAnalyzer(model_name="claude")

# Use Claude via AWS Bedrock
analyzer = CommitAnalyzer(model_name="bedrock")

Fallback Mechanism

The agent includes intelligent fallback:

  1. If the selected model fails, it automatically tries other available models
  2. If Bedrock fails, it falls back to direct Claude API, then Gemini
  3. If Claude fails, it falls back to Gemini
  4. Clear error messages indicate which model is being used
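
The fallback order described above can be sketched as a simple loop over candidate models. This is an illustration only: `analyze_with_fallback`, `FALLBACK_ORDER`, and the stand-in backends are hypothetical names, and the order used when Gemini is the selected model is an assumption because the list does not spell it out.

```python
from typing import Callable, Dict, List, Tuple

# Order for "bedrock" and "claude" follows the list above; the "gemini"
# entry is an assumption.
FALLBACK_ORDER: Dict[str, List[str]] = {
    "bedrock": ["bedrock", "claude", "gemini"],
    "claude": ["claude", "gemini"],
    "gemini": ["gemini", "claude", "bedrock"],
}


def analyze_with_fallback(
    preferred: str,
    backends: Dict[str, Callable[[str], str]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each configured backend in order; return (model_name, response)."""
    errors = []
    for name in FALLBACK_ORDER[preferred]:
        backend = backends.get(name)
        if backend is None:  # model not configured, skip it
            continue
        try:
            return name, backend(prompt)
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def failing_backend(prompt: str) -> str:
    """Stand-in for a model call that is currently unavailable."""
    raise ConnectionError("service unavailable")


def working_backend(prompt: str) -> str:
    """Stand-in for a model call that succeeds."""
    return "analysis for: " + prompt


used, response = analyze_with_fallback(
    "bedrock",
    {"bedrock": failing_backend, "gemini": working_backend},
    "stack trace",
)
print(used)  # the model that actually produced the response
```

Returning the model name alongside the response is one way to make it clear which model was used after a fallback.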

📖 Detailed Usage

Stack Trace Analysis

The agent can parse stack traces from various programming languages:

Python

stack_trace = """
Traceback (most recent call last):
  File "app.py", line 25, in <module>
    result = process_data(data)
  File "utils.py", line 42, in process_data
    return data['value'] / data['count']
KeyError: 'count'
"""

JavaScript

TypeError: Cannot read property 'length' of undefined
    at processArray (app.js:15:8)
    at main (app.js:8:4)
    at Object.<anonymous> (app.js:25:1)

Java

Exception in thread "main" java.lang.NullPointerException
    at com.example.App.processData(App.java:42)
    at com.example.App.main(App.java:15)

API Usage

Analyze Stack Trace

curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_trace": "your stack trace here",
    "repo_url": "https://github.com/user/repo",
    "branch": "main",
    "lookback_days": 30,
    "max_commits": 50,
    "model": "gemini"
  }'
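
The same request can be built from Python with only the standard library. The sketch below mirrors the curl payload; `build_analyze_request` is a hypothetical helper, and the base URL assumes the API server is running locally on port 8000 as in the curl example.

```python
import json
import urllib.request


def build_analyze_request(
    stack_trace: str,
    repo_url: str,
    base_url: str = "http://localhost:8000",
    branch: str = "main",
    lookback_days: int = 30,
    max_commits: int = 50,
    model: str = "gemini",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /analyze endpoint."""
    payload = {
        "stack_trace": stack_trace,
        "repo_url": repo_url,
        "branch": branch,
        "lookback_days": lookback_days,
        "max_commits": max_commits,
        "model": model,
    }
    return urllib.request.Request(
        f"{base_url}/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("your stack trace here", "https://github.com/user/repo")

# Sending the request requires the API server to be running:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```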

Batch Analysis

curl -X POST "http://localhost:8000/batch-analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_traces": ["trace1", "trace2"],
    "repo_url": "https://github.com/user/repo"
  }'

Search Commits

curl -X POST "http://localhost:8000/search-commits" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/user/repo",
    "query": "fix bug"
  }'

Analyze Specific Commits with Stacktrace

curl -X POST "http://localhost:8000/analyze-commits-with-stacktrace" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_trace": "your stack trace here",
    "commits": [
      {
        "sha": "abc123def456",
        "message": "Fix database connection issue",
        "author": "john@company.com",
        "date": "2024-01-15T10:30:00",
        "files_changed": ["database/connection.py"],
        "additions": 25,
        "deletions": 8,
        "branch": "main",
        "repo_url": "https://github.com/company/repo"
      }
    ],
    "model": "gemini",
    "include_reasoning": true,
    "include_suggestions": true
  }'

Export Results

from utils import ResultExporter

# Export to JSON
json_data = ResultExporter.to_json(results, "results.json")

# Export to CSV
csv_data = ResultExporter.to_csv(results, "results.csv")

# Export to Markdown
md_data = ResultExporter.to_markdown(results, "results.md")
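
For readers curious what a CSV export of the results could look like, here is a standard-library sketch. It is not the `ResultExporter` implementation; `results_to_csv` and the chosen column set are assumptions, and the input follows the result structure documented in the Analysis Results section.

```python
import csv
import io


def results_to_csv(results: dict) -> str:
    """Flatten the per-commit analysis results into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["commit_sha", "confidence_score", "model_used", "reasoning"],
        extrasaction="ignore",  # drop fields that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    for row in results.get("analysis_results", []):
        writer.writerow(row)
    return buffer.getvalue()


sample_results = {
    "analysis_results": [
        {
            "commit_sha": "abc123def",
            "confidence_score": 0.85,
            "reasoning": "Modifies data access",
            "impact_assessment": "High impact",
            "model_used": "gemini",
        }
    ]
}
csv_text = results_to_csv(sample_results)
print(csv_text)
```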

🏗️ Architecture

commit_analyzer_agent/
├── commit_analyzer.py      # Main analysis logic
├── stack_trace_parser.py   # Stack trace parsing utilities
├── github_client.py        # GitHub API integration
├── ai_models.py            # AI model integration (Gemini/Claude)
├── utils.py                # Utility functions
├── app.py                  # Streamlit web interface
├── api.py                  # FastAPI backend
├── setup.py                # Setup script
├── requirements.txt        # Dependencies
├── examples/               # Example scripts
│   ├── basic_usage.py
│   └── quick_test.py
└── training/               # Model training
    └── train_model.py

🤖 AI Model Integration

Gemini (Google)

  • Uses Google's Gemini Pro model
  • Requires GEMINI_API_KEY
  • Good for general analysis and reasoning

Claude (Anthropic)

  • Uses Claude 3 Sonnet model
  • Requires ANTHROPIC_API_KEY
  • Excellent for detailed analysis and explanations

Model Selection

# Use Gemini
analyzer = CommitAnalyzer(model_name="gemini")

# Use Claude
analyzer = CommitAnalyzer(model_name="claude")

📊 Analysis Results

The agent provides comprehensive analysis results:

{
  "stack_trace_analysis": {
    "error_type": "KeyError",
    "error_message": "KeyError: 'count'",
    "language": "python",
    "framework": null,
    "affected_files": ["utils.py"],
    "function_names": ["process_data"],
    "line_numbers": [42]
  },
  "commits_analyzed": 25,
  "analysis_results": [
    {
      "commit_sha": "abc123def",
      "confidence_score": 0.85,
      "reasoning": "This commit modifies data access patterns...",
      "relevant_changes": ["Changed data access from dict[key] to dict.get(key)"],
      "impact_assessment": "High impact - directly affects data access",
      "suggested_fixes": ["Add default value handling", "Use dict.get() method"],
      "model_used": "gemini"
    }
  ],
  "most_likely_commit": {
    "commit_sha": "abc123def",
    "confidence_score": 0.85,
    "reasoning": "This commit modifies data access patterns...",
    "relevant_changes": ["Changed data access from dict[key] to dict.get(key)"],
    "impact_assessment": "High impact - directly affects data access",
    "suggested_fixes": ["Add default value handling", "Use dict.get() method"],
    "model_used": "gemini"
  },
  "model_used": "gemini",
  "analysis_timestamp": "2024-01-15T10:30:00"
}

🎯 Confidence Scoring

The agent provides confidence scores (0.0 to 1.0) for each commit:

  • 0.8-1.0: Very likely to be the root cause
  • 0.6-0.8: Likely to be the root cause
  • 0.4-0.6: Possibly related
  • 0.2-0.4: Unlikely to be related
  • 0.0-0.2: Very unlikely to be related
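
The bands above share their boundary values (0.8 appears in two of them, for example), so code that turns a score into a label has to pick a side. The sketch below assigns each boundary to the higher band; that choice, and the name `confidence_label`, are assumptions rather than documented behavior.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to the band descriptions above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    # Assumption: a boundary value such as 0.8 belongs to the higher band.
    if score >= 0.8:
        return "Very likely to be the root cause"
    if score >= 0.6:
        return "Likely to be the root cause"
    if score >= 0.4:
        return "Possibly related"
    if score >= 0.2:
        return "Unlikely to be related"
    return "Very unlikely to be related"


print(confidence_label(0.85))
```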

🎯 Commit Analysis with Stacktrace

The web interface includes a dedicated tab for analyzing specific commits against a stack trace.

Features

  • Manual Commit Entry: Add commits one by one with detailed information
  • JSON Upload: Upload a JSON file containing commit data
  • Repository Fetch: Automatically fetch recent commits from a repository
  • Model Selection: Choose between Gemini, Claude, or Bedrock models
  • Analysis Options: Include/exclude detailed reasoning and suggested fixes
  • Ranked Results: View commits ranked by confidence score
  • Visual Charts: See confidence distribution and ranking charts
  • Export Results: Download analysis results in JSON format

Usage

  1. Navigate to the "🎯 Commit Analysis" tab in the web interface
  2. Enter a stack trace in the left panel
  3. Choose your AI model (Gemini, Claude, or Bedrock)
  4. Select commit input method:
    • Manual Entry: Add commits with SHA, message, author, etc.
    • JSON Upload: Upload a JSON file with commit data
    • Repository Fetch: Use commits from repository settings
  5. Click "🚀 Analyze Commits" to start the analysis
  6. View results including:
    • Most likely root cause commit
    • Confidence scores and reasoning
    • Suggested fixes
    • All ranked commits
    • Confidence distribution chart

Sample JSON Format

[
  {
    "sha": "abc123def456",
    "message": "Fix database connection timeout issue",
    "author": "john@company.com",
    "date": "2024-01-15T10:30:00",
    "files_changed": [
      "database/connection.py",
      "database/postgres.py",
      "config/database.yml"
    ],
    "additions": 25,
    "deletions": 8,
    "branch": "main",
    "repo_url": "https://github.com/company/user-service"
  }
]
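
Before uploading a file in this format, it can help to check that every commit object has the expected fields. The sketch below treats every field in the sample as required, which is an assumption; the API may accept fewer. `REQUIRED_FIELDS` and `validate_commits` are hypothetical names.

```python
import json
from typing import List

# Assumption: every field shown in the sample is required, with these types.
REQUIRED_FIELDS = {
    "sha": str,
    "message": str,
    "author": str,
    "date": str,
    "files_changed": list,
    "additions": int,
    "deletions": int,
    "branch": str,
    "repo_url": str,
}


def validate_commits(raw_json: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    commits = json.loads(raw_json)
    if not isinstance(commits, list):
        raise ValueError("expected a JSON array of commit objects")
    problems = []
    for index, commit in enumerate(commits):
        if not isinstance(commit, dict):
            problems.append(f"commit {index}: not a JSON object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in commit:
                problems.append(f"commit {index}: missing '{field}'")
            elif not isinstance(commit[field], expected_type):
                problems.append(f"commit {index}: '{field}' has the wrong type")
    return problems


sample = '[{"sha": "abc123def456", "message": "Fix timeout", "additions": "25"}]'
for problem in validate_commits(sample):
    print(problem)
```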

Testing the API

Use the provided test script to verify the API functionality:

python test_commit_analysis_api.py

This will test the /analyze-commits-with-stacktrace endpoint with sample data and display the results.

🎯 Specific Commits Analysis

Overview

The commit analyzer now supports analyzing stack traces against specific commit SHAs instead of just using a time-based approach. This allows you to:

  • Target specific commits: Analyze only the commits you suspect might be related to the issue
  • Reduce analysis time: Focus on a smaller set of commits for faster results
  • Improve accuracy: Get more targeted analysis when you have specific suspects

Usage

Web Interface

  1. Select "Specific Commit SHAs" in the Analysis Mode section
  2. Enter commit SHAs (one per line) in the text area:
    abc1234def5678
    ghi9012jkl3456
    mno7890pqr1234
    
  3. Run analysis as usual - the system will analyze only the specified commits

Python API

from commit_analyzer import CommitAnalyzer

analyzer = CommitAnalyzer("gemini")

# Define specific commit SHAs to analyze
commit_shas = [
    "abc1234def5678",
    "ghi9012jkl3456", 
    "mno7890pqr1234"
]

# Analyze against specific commits
results = analyzer.analyze_stack_trace_with_specific_commits(
    stack_trace=stack_trace,
    repo_url="https://github.com/user/repo",
    commit_shas=commit_shas,
    branch="main"
)

# Get the most likely commit
most_likely = results.get('most_likely_commit')
if most_likely:
    print(f"Most likely commit: {most_likely['commit_sha']}")
    print(f"Confidence: {most_likely['confidence_score']:.2%}")

REST API

curl -X POST "http://localhost:8000/analyze-specific-commits" \
  -H "Content-Type: application/json" \
  -d '{
    "stack_trace": "Your stack trace here...",
    "repo_url": "https://github.com/user/repo",
    "commit_shas": ["abc1234", "def5678", "ghi9012"],
    "branch": "main",
    "model": "gemini"
  }'

Benefits

  • 🎯 Precision: Analyze only commits you suspect
  • ⚡ Speed: Faster analysis with fewer commits
  • 🔍 Focus: Get more detailed analysis of specific commits
  • 📊 Control: Full control over which commits to analyze

Best Practices

  1. Start with suspects: Begin with commits you suspect might be related
  2. Include context: Add commits around the suspected time period
  3. Validate SHAs: Ensure commit SHAs are valid and accessible
  4. Use short SHAs: 7+ characters are sufficient (full SHA not required)
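
A quick format check can catch malformed SHAs before any request is made. The sketch below accepts 7 to 40 hexadecimal characters, which assumes SHA-1 commit IDs; `split_valid_shas` is a hypothetical helper, and it checks the format only, not whether a commit exists in the repository.

```python
import re
from typing import List, Tuple

# Assumption: SHA-1 commit IDs, abbreviated to no fewer than 7 characters.
SHA_PATTERN = re.compile(r"[0-9a-fA-F]{7,40}")


def split_valid_shas(text: str) -> Tuple[List[str], List[str]]:
    """Split one-SHA-per-line input into (valid, invalid) lists."""
    valid, invalid = [], []
    for line in text.splitlines():
        sha = line.strip()
        if not sha:  # ignore blank lines
            continue
        if SHA_PATTERN.fullmatch(sha):
            valid.append(sha)
        else:
            invalid.append(sha)
    return valid, invalid


valid, invalid = split_valid_shas("abc1234def5678\nghi9012jkl3456\n\nabc12")
print(valid, invalid)
```

Note that placeholder values such as `ghi9012jkl3456` in the examples above are not valid hexadecimal and are rejected by this check; real SHAs should be substituted for them.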

🔧 Advanced Configuration

Custom Analysis Parameters

analyzer = CommitAnalyzer(model_name="claude")

results = analyzer.analyze_stack_trace(
    stack_trace=stack_trace,
    repo_url=repo_url,
    branch="develop",           # Custom branch
    lookback_days=60,          # Look back 60 days
    max_commits=100            # Analyze up to 100 commits
)

Filtering Results

from utils import CommitAnalyzer as Utils

# Filter by confidence threshold
high_confidence = Utils.filter_by_confidence(results['analysis_results'], threshold=0.7)

# Sort by confidence
sorted_results = Utils.sort_by_confidence(results['analysis_results'])

# Get top commits
top_5 = Utils.get_top_commits(results['analysis_results'], top_n=5)
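
Because `analysis_results` is a list of dictionaries with a `confidence_score` key, the same filtering and ranking can also be done in plain Python. The helpers below are an illustrative sketch with hypothetical names, not the implementations in utils.py; whether the project's threshold comparison is inclusive is an assumption.

```python
from typing import Dict, List


def keep_confident(results: List[Dict], threshold: float) -> List[Dict]:
    """Keep entries whose confidence_score is at or above the threshold."""
    return [r for r in results if r["confidence_score"] >= threshold]


def rank_by_confidence(results: List[Dict]) -> List[Dict]:
    """Return a new list ordered from highest to lowest confidence."""
    return sorted(results, key=lambda r: r["confidence_score"], reverse=True)


def top_commits(results: List[Dict], top_n: int) -> List[Dict]:
    """Return the top_n highest-confidence entries."""
    return rank_by_confidence(results)[:top_n]


sample_results = [
    {"commit_sha": "aaa1111", "confidence_score": 0.35},
    {"commit_sha": "bbb2222", "confidence_score": 0.85},
    {"commit_sha": "ccc3333", "confidence_score": 0.70},
]
print([r["commit_sha"] for r in keep_confident(sample_results, 0.7)])
print([r["commit_sha"] for r in top_commits(sample_results, 2)])
```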

Performance Monitoring

from utils import PerformanceMonitor

monitor = PerformanceMonitor()
monitor.start()

# Perform analysis
results = analyzer.analyze_stack_trace(stack_trace, repo_url)

monitor.end()
performance = monitor.get_summary()
print(f"Analysis took {performance['duration_seconds']} seconds")

🚀 Deployment

Docker Deployment

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8000

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

Environment Variables

# Production settings
export GITHUB_TOKEN=your_token
export GEMINI_API_KEY=your_key
export ANTHROPIC_API_KEY=your_key
export DEFAULT_MODEL=gemini
export MAX_COMMITS_TO_ANALYZE=100
export LOG_LEVEL=INFO

📈 Model Training

The agent includes a training framework for fine-tuning models:

# Generate training data
python training/train_model.py

# This will:
# 1. Generate synthetic training data
# 2. Train the model on commit-stack trace relationships
# 3. Evaluate model performance
# 4. Save trained model

🧪 Testing

# Run basic test
python examples/quick_test.py

# Run comprehensive tests
python -m pytest tests/

# Test API endpoints
curl http://localhost:8000/health

📚 Examples

See the examples/ directory for complete usage examples:

  • basic_usage.py: Basic analysis workflow
  • quick_test.py: Quick test script

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details

🆘 Support

  • Issues: Create an issue on GitHub
  • Documentation: Check the README and docstrings
  • API Docs: Visit http://localhost:8000/docs when running the API server

🔮 Future Enhancements

  • Support for more programming languages
  • Integration with CI/CD pipelines
  • Real-time monitoring and alerts
  • Advanced ML models for better accuracy
  • Integration with issue tracking systems
  • Support for private repositories
  • Batch processing improvements
  • Custom model training interface

Happy debugging! 🐛✨

About

Commit Analyser helps you identify the commit most likely to have caused an issue.
