🔥 Git Spark

Enterprise-Grade Git Repository Analytics & Reporting

Transform your Git history into actionable insights with beautiful, interactive reports

🎨 Live Demo • 📖 Documentation • 🚀 Quick Start • 💡 Examples

🌟 Why Git Spark?

Git Spark provides transparent, honest insights into Git repository health, team organization patterns, and code quality through comprehensive analysis of commit history. Built for enterprise environments with performance, reliability, security, and analytical integrity in mind.

✨ What Makes Git Spark Special

🎯 Analytical Honesty - Report only what can be accurately derived from Git data, never guess or fabricate metrics
🎨 Beautiful Reports - Interactive HTML dashboards with GitHub-style contributions calendar and dark mode
⚡ Lightning Fast - Process 100k+ commits in minutes with intelligent caching and streaming
🔒 Security First - CSP/SRI hardened reports, input validation, and optional email redaction
📊 Multiple Formats - HTML, JSON, Markdown, CSV, and console output for any workflow
🛠️ Developer Friendly - Intuitive CLI and TypeScript API with comprehensive documentation

Current Version: v1.0 - Full-featured analytics with daily trends, contributions calendar, and complete transparency

🎨 Live Demo & Examples

📊 View Live Interactive Demo →

Experience the full Git Spark analytics dashboard with real data:

Interactive charts and visualizations
GitHub-style contributions calendar
Detailed author profiles and metrics
Dark mode toggle
Real-time data export

Sample reports generated from open-source repositories to showcase all features

✨ Features

📊 Comprehensive Analytics

Repository Health Scoring - Overall health assessment with actionable recommendations
Daily Activity Trends - Comprehensive day-by-day analysis including all days in the specified range (not just active days)
GitHub-style Contributions Calendar - Visual activity heatmap with GitHub-style color coding and intensity levels
Team Organization Analysis - File ownership patterns, developer specialization, and team structure
Code Quality Metrics - Risk assessment, hotspot identification, and governance scoring
Timeline Visualization - Activity patterns and trend analysis
Bus Factor Calculation - Knowledge concentration and team resilience metrics

🔍 Analytical Integrity Promise: All metrics are transparently calculated from Git repository data only. We clearly document what each metric measures, its limitations, and what it cannot determine. No guesswork, no extrapolation beyond available data.

🎯 Enterprise-Ready

High Performance - Process 100k+ commits efficiently with memory optimization
Multiple Output Formats - HTML, JSON, Markdown, CSV, and console output
Configurable Analysis - Customizable thresholds, weights, and filtering options
Security Focused - Input validation, safe file handling, and email redaction
CI/CD Integration - JSON output for automated analysis and reporting
Email Privacy - Optional email redaction via --redact-emails flag

🛠️ Developer Experience

CLI Interface - Intuitive command-line tool with progress indicators and multiple commands
Programmatic API - TypeScript/JavaScript library for custom integrations
Interactive Reports - Rich, security‑hardened HTML reports with advanced visualizations
Comprehensive Documentation - Examples, tutorials, and best practices
Live Development Server - Built-in HTTP server for local report viewing

🖥️ Interactive HTML Report (v1.0)

Enterprise-focused, accessible, and secure analytics dashboard:

Multi‑Series Timeline – Commits, churn (lines changed), and active authors with dataset toggles
Daily Activity Trends – Comprehensive daily analysis showing all days in the specified range with activity summaries
GitHub-style Contributions Calendar – Visual activity heatmap with color-coded intensity levels and interactive tooltips
Risk Factors Bar Chart – Visual breakdown of churn, recency, ownership, coupling potential, and knowledge concentration inputs
Governance Radar Chart – Conventional commit adherence, traceability, message quality, WIP/revert penalties
Dark Mode Toggle (Persistent) – Remembers preference via localStorage; charts dynamically re-theme
One‑Click Data Export – Download embedded JSON or CSV bundles directly from the report (offline capable)
Progressive Table Pagination – “Show more” incremental reveal for large author/file sets (performance friendly)
Dataset Toggles & Live Updates – Enable/disable series without reloading page
Open Graph Preview Image – Auto‑generated SVG summary for social sharing & link unfurling
Accessibility Enhancements – ARIA live region announcements for sorting; keyboard focus management; reduced‑motion compliance
Security‑First Delivery – Strict CSP with SHA‑256 hashed inline script & style blocks (no unsafe-inline), SRI‑pinned Chart.js, single trusted CDN origin
Email Redaction Option – Controlled via CLI flag (--redact-emails) for privacy sensitive audits
Transparent Metrics Documentation – Every team metric includes comprehensive limitations and data source explanations

Analytical Integrity: All analytics data are embedded (no external calls) ensuring the report is a self-contained artifact suitable for air‑gapped review workflows. Every metric includes honest explanations of what Git data can and cannot reveal.

🚀 Quick Start

Installation

# Global installation for CLI usage
npm install -g git-spark

# Local installation for programmatic usage
npm install git-spark

Basic Usage

# Analyze current repository (last 30 days)
git-spark --days=30

# Generate HTML report with built-in command
git-spark html --days=30 --output=./reports

# Serve HTML report with live HTTP server
git-spark html --days=30 --serve --port=3000

# Open HTML report in browser automatically
git-spark html --days=30 --open

# Analyze specific date range
git-spark --since=2024-01-01 --until=2024-12-31 --format=html

# Quick health check
git-spark health

# Validate environment and Git repository
git-spark validate

# Enable heavy analysis for detailed insights  
git-spark --heavy --format=html --output=./reports

# Analyze with comprehensive daily trends (shows all days, not just active days)
git-spark --days=60 --format=html --output=./reports

# Generate report with email redaction for privacy
git-spark --days=30 --format=html --redact-emails

Programmatic Usage

import { GitSpark, analyze } from 'git-spark';

// Quick analysis
const report = await analyze('/path/to/repo', { days: 30 });

// Advanced usage with options
const gitSpark = new GitSpark({
  repoPath: '/path/to/repo',
  since: '2024-01-01',
  format: 'html',
  output: './reports'
});

const report = await gitSpark.analyze();
await gitSpark.export('html', './reports');

📖 Documentation

CLI Commands

`git-spark [options]`

Main analysis command with comprehensive options:

git-spark [options]

Options:
  -d, --days <number>        analyze last N days
  -s, --since <date>         start date (YYYY-MM-DD)
  -u, --until <date>         end date (YYYY-MM-DD)
  -f, --format <format>      output format (html|json|console|markdown|csv)
  -o, --output <path>        output directory (default: "./reports")
  -c, --config <path>        configuration file
  -b, --branch <name>        analyze specific branch
  -a, --author <name>        filter by author
  -p, --path <glob>          filter by file path pattern
  --heavy                    enable expensive analyses
  --log-level <level>        logging verbosity (error|warn|info|debug|verbose)
  --no-cache                 disable caching
  --redact-emails            redact email addresses in reports
  -h, --help                 display help for command

`git-spark analyze`

Detailed analysis with additional options:

git-spark analyze [options]

Options:
  -r, --repo <path>          repository path (default: current directory)
  [all main command options]

`git-spark health`

Quick repository health assessment:

git-spark health [options]

Options:
  -r, --repo <path>          repository path (default: current directory)

`git-spark html`

Generate comprehensive HTML report with additional options:

git-spark html [options]

Options:
  -r, --repo <path>          repository path (default: current directory)
  -d, --days <number>        analyze last N days
  -s, --since <date>         start date (YYYY-MM-DD)
  -u, --until <date>         end date (YYYY-MM-DD)
  -o, --output <path>        output directory (default: "./reports")
  -b, --branch <name>        analyze specific branch
  -a, --author <name>        filter by author
  -p, --path <glob>          filter by file path pattern
  --open                     open HTML report in browser after generation
  --serve                    start HTTP server to serve the report
  --port <number>            port for HTTP server (default: 3000)
  --heavy                    enable expensive analyses for detailed insights

Examples:

# Generate HTML report for last 30 days
git-spark html --days=30

# Generate and open in browser
git-spark html --days=30 --open

# Generate and serve on local web server
git-spark html --days=60 --serve --port=8080

# Heavy analysis with detailed insights
git-spark html --days=90 --heavy --output=./detailed-reports

`git-spark validate`

Environment and requirements validation:

git-spark validate

Daily Trends Analysis Examples

# Analyze last 7 days with comprehensive daily trends
git-spark --days=7 --format=html

# Extended 60-day analysis with contributions calendar
git-spark --days=60 --format=html --output=./reports

# Generate JSON with complete daily trends data for external processing
git-spark --days=30 --format=json --output=./data

# Heavy analysis with all features including detailed daily patterns
git-spark --days=30 --heavy --format=html

Configuration

Create a .git-spark.json configuration file to customize analysis:

Note: Configuration file support is available for basic analysis options. The configuration system supports custom thresholds, weights, and exclusion patterns for fine-tuned analysis.

{
  "version": "1.0",
  "analysis": {
    "excludePaths": [
      "node_modules/**",
      "dist/**",
      "build/**",
      ".git/**"
    ],
    "excludeAuthors": [
      "dependabot[bot]",
      "github-actions[bot]"
    ],
    "thresholds": {
      "largeCommitLines": 500,
      "smallCommitLines": 50,
      "staleBranchDays": 30,
      "largeFileKB": 300,
      "hotspotAuthorThreshold": 3
    },
    "weights": {
      "risk": {
        "churn": 0.35,
        "recency": 0.25,
        "ownership": 0.20,
        "entropy": 0.10,
        "coupling": 0.10
      }
    }
  },
  "output": {
    "defaultFormat": "html",
    "outputDir": "./reports",
    "redactEmails": false
  },
  "performance": {
    "maxBuffer": 200,
    "enableCaching": true,
    "cacheDir": ".git-spark-cache",
    "chunkSize": 1000
  }
}

API Reference

`GitSpark` Class

class GitSpark {
  constructor(options: GitSparkOptions, progressCallback?: ProgressCallback)
  
  async analyze(): Promise<AnalysisReport>
  async export(format: OutputFormat, outputPath: string): Promise<void>
  
  static getDefaultConfig(): GitSparkConfig
}

`GitSparkOptions` Interface

interface GitSparkOptions {
  repoPath?: string;           // Repository path
  since?: string;              // Start date (YYYY-MM-DD)
  until?: string;              // End date (YYYY-MM-DD)
  days?: number;               // Last N days
  branch?: string;             // Specific branch
  author?: string;             // Author filter
  path?: string;               // Path filter
  format?: OutputFormat;       // Output format
  output?: string;             // Output directory
  config?: string;             // Config file path
  heavy?: boolean;             // Enable expensive analyses
  logLevel?: LogLevel;         // Log level
  noCache?: boolean;           // Disable caching
}

Quick Functions

// Quick analysis function
async function analyze(
  repoPath?: string, 
  options?: Partial<GitSparkOptions>
): Promise<AnalysisReport>

// Quick export function
async function exportReport(
  report: AnalysisReport,
  format: OutputFormat,
  outputPath: string
): Promise<void>

� Analytical Integrity & Data Limitations

Our Commitment to Honest Analytics

Git Spark is built on a foundation of complete transparency about what can and cannot be determined from Git repository data alone. We never guess, estimate, or fabricate metrics from unavailable data sources.

What Git Data CAN Provide

✅ Commit metadata: Author, committer, timestamp, message
✅ File changes: Additions, deletions, modifications
✅ Branch and merge history: Repository structure and workflow patterns
✅ Temporal patterns: When changes occurred based on commit timing
✅ Contribution patterns: Who worked on what files and when

What Git Data CANNOT Provide

❌ Code review data: No reviewer information, approval status, or review comments
❌ Pull/merge request metadata: No PR numbers, descriptions, or review workflows
❌ Issue tracking: No bug reports, feature requests, or issue relationships
❌ Deployment information: No production deployments, rollbacks, or environment data
❌ Team structure: No organizational hierarchy, roles, or responsibilities
❌ Work hours/timezone: No actual working hours, vacation schedules, or availability
❌ Performance metrics: No build times, test results, or runtime performance

Our Honest Metric Approach

Transparent Naming: Metric names clearly indicate data source limitations
Comprehensive Documentation: Every metric includes limitation warnings
Platform Detection: We identify hosting platforms but acknowledge Git data is fundamentally the same
Educational Focus: We help users understand what metrics do and don't measure
No False Claims: We never imply Git data provides complete team performance insights

User Education & Responsible Usage

All team-related metrics include detailed explanations of:

What the metric actually measures from Git data
Known limitations and potential misinterpretations
Recommended approaches for supplementing Git analytics
Warnings against using metrics for performance reviews without context

�📊 Report Formats

HTML Reports

Interactive reports with transparent analytics and comprehensive limitations documentation:

Executive Summary - Health rating with activity index breakdown and clear data source explanations
Limitations Section - Comprehensive documentation of what Git data can and cannot reveal (positioned before detailed metrics)
Top Contributors - Author metrics table with detailed activity patterns
Team Activity Patterns - Aggregate repository metrics showing overall activity distribution
File Activity Hotspots - Source code files with highest activity (filtered for relevant code files)
Author Activity Details - Detailed profile cards for each contributor with commit patterns, file focus, and insights
Daily Activity Trends - Comprehensive day-by-day analysis with GitHub-style contributions calendar (optional)
Calculation Documentation - Transparent methodology for all metrics including formulas and measurement principles
Report Metadata - Generation details, Git branch information, and processing statistics
Interactive visualizations with dark mode support
Progressive table pagination for performance
Sortable columns with accessibility features
Export capabilities for downstream analysis
CSP/SRI security hardening

Transparency First: Every metric in the HTML report includes clear explanations of what it measures, its data sources, and its limitations. The limitations section is prominently positioned before detailed calculations to ensure users understand data constraints upfront.

JSON Reports

Machine-readable format for:

CI/CD integration
Custom tooling integration
Data processing and analysis
API consumption

Console Output

Terminal-friendly format with:

Color-coded health indicators
Tabular data presentation
Progress indicators
Quick insights and recommendations

Markdown Reports

Documentation-friendly format for:

README integration
Wiki documentation
Version control tracking
Collaboration and sharing

CSV Exports

Spreadsheet-compatible format with separate files for:

authors.csv - Author statistics and metrics
files.csv - File-level analysis and risk scores
timeline.csv - Daily activity and trends (includes all days in analysis period)

🔍 Analysis Details

Repository Health Score

Composite metric based on:

Commit Frequency - Regular development activity
Author Diversity - Distributed knowledge and contributions
Commit Size Distribution - Balanced change patterns
Governance Adherence - Code quality and standards compliance

Daily Activity Trends

Comprehensive daily analysis providing:

Complete Date Range Coverage - Shows all days in the specified period, including days with zero activity
Activity Metrics - Commits, authors, file changes, and code volume per day
GitHub-style Contributions Calendar - Visual heatmap with intensity levels (0-4) matching GitHub's color scheme
Interactive Tooltips - Hover to see exact commit counts and dates
Week-based Organization - Calendar view organized by weeks for easy pattern recognition
JSON Export Support - All daily trends data available for external processing and analysis

Enhanced Coverage: Unlike traditional analytics that only show active days, Git Spark's daily trends include every day in your analysis period, providing complete visibility into work patterns and identifying both active and quiet periods.

Risk Analysis

File-level risk assessment (activity scoring) considering:

Code Churn - Frequency and volume of changes
Author Count - Number of different contributors
Recency - How recently files were modified
Ownership Distribution - Knowledge concentration across files

Risk metrics help identify files that may need attention due to high activity levels, but do not indicate code quality or defect likelihood.

Team Analytics

Team organization and specialization insights covering:

Developer Specialization - Measures how unique each developer's file set is compared to others, promoting clear areas of responsibility
File Ownership Clarity - Percentage of files with single-author ownership, indicating clear responsibility boundaries
Organization Efficiency - Low file overlap between developers, suggesting better task distribution and reduced conflicts
Commit Time Patterns - Work timing analysis based on commit timestamps (not actual working hours)
Team Active Coverage - Days with multiple contributors (estimated pattern, not actual vacation coverage)

⚠️ Important Approach: The Team Organization Score measures specialization and clear ownership rather than traditional collaboration. High scores indicate well-organized teams with distinct areas of responsibility, which typically reduces conflicts and improves efficiency. Very high scores may sometimes indicate knowledge silos, while very low scores suggest unclear ownership or coordination issues. All metrics include comprehensive limitation documentation to prevent misinterpretation.

🏗️ CI/CD Integration

GitHub Actions

name: Repository Analysis
on: [push, pull_request]

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Full history for analysis
      
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      
      - name: Install git-spark
        run: npm install -g git-spark
      
      - name: Run analysis
        run: git-spark --format=json --output=./reports
      
      - name: Upload reports
        uses: actions/upload-artifact@v3
        with:
          name: git-spark-reports
          path: ./reports/

GitLab CI

repository_analysis:
  stage: analysis
  image: node:18
  script:
    - npm install -g git-spark
    - git-spark --format=json --output=./reports
  artifacts:
    paths:
      - reports/
    expire_in: 1 week
  only:
    - main
    - develop

🔒 Security Considerations

Git Spark is designed with security in mind:

Input Validation - All user inputs are validated and sanitized
Path Traversal Protection - Safe file path handling
Email Redaction - Optional email address anonymization
Buffer Limits - Configurable limits to prevent DoS attacks
No Arbitrary Execution - Git commands are parameterized and safe
Dependency Security - Minimal dependencies with security auditing
Strict Content Security Policy - Inline script & style blocks hashed (SHA‑256); no unsafe-inline or dynamic eval
Subresource Integrity (SRI) - External Chart.js resource locked to integrity hash
Single External Origin - Minimizes supply chain surface
Escaped Dynamic Content - All user / repo derived strings safely encoded in HTML output

🎯 Performance

Optimized for large repositories:

Streaming Processing - Handle massive repositories without memory issues
Intelligent Caching - Avoid redundant Git operations
Chunked Analysis - Process commits in configurable batches
Memory Management - Efficient data structures and garbage collection
Progress Tracking - Real-time progress indicators for long operations

Benchmarks:

10k commits: ~10 seconds
100k commits: ~2 minutes
Memory usage: <500MB for 100k commits

🧪 Testing

# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific test suite
npm test -- --testNamePattern="GitSpark"

# Run integration tests
npm run test:integration

🤝 Contributing

We welcome contributions! Please see our GitHub Issues to get started or open a new issue to discuss your ideas.

Development Setup

# Clone repository
git clone https://github.com/MarkHazleton/git-spark.git
cd git-spark

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run tests
npm test

# Start development mode
npm run dev

Code Standards

TypeScript - Full type safety and modern JavaScript features
ESLint + Prettier - Consistent code formatting and quality
Jest Testing - Comprehensive test coverage (>80%)
Semantic Versioning - Clear version management
Conventional Commits - Structured commit messages

📋 Roadmap

✅ Completed (v1.0)

Core Analytics Engine - Comprehensive Git repository analysis
Multiple Output Formats - HTML, JSON, Markdown, CSV, and console formats
Transparent Team Metrics - Honest metric terminology with comprehensive limitations documentation
Analytical Integrity Framework - Clear separation between what Git data can and cannot provide
Enhanced User Education - Comprehensive warnings and guidance about metric interpretation
Daily Activity Trends - Comprehensive daily analysis showing all days in specified range (including zero-activity days)
GitHub-style Contributions Calendar - Interactive activity heatmap with color-coded intensity levels and tooltips
Activity Index Calculation - Transparent breakdown of commit frequency, author participation, and consistency components
Author Profile Cards - Detailed individual contributor analysis with commit patterns and file focus
Secure HTML Reports - Strict CSP + SHA-256 hashed inline content (no unsafe-inline)
Dark Mode - Persistent theme preference with adaptive styling
Progressive Tables - Pagination & performance safeguards for large datasets
Sortable Data Tables - Column sorting with ARIA live announcements
Accessibility Features - ARIA live regions, keyboard navigation, reduced motion support
OG Image Generation - Auto-generated SVG summaries for social/link previews
Email Redaction - Privacy-focused email anonymization option
CLI Commands - Main analysis, health check, validation, and dedicated HTML report generation
HTTP Server - Built-in web server for local report viewing (--serve option)
Auto-Open Browser - Automatic browser launch after report generation (--open option)

These capabilities establish a foundation of analytical honesty and transparency that guides all development.

v1.1 (Planned)

Branch comparison and diff analysis (--compare option)
Continuous monitoring mode (--watch option)
Historical trend analysis and forecasting
Advanced temporal coupling analysis
Custom risk scoring models

v1.2 (Future)

API server mode for remote analysis
Machine learning-based anomaly detection
Integration with code quality tools (SonarQube, CodeClimate)
Real-time monitoring and alerting
Multi-repository analysis and benchmarking
Advanced visualization with D3.js

v2.0 (Vision)

Web dashboard and UI
Database persistence (SQLite/PostgreSQL)
User authentication and authorization
Team management and permissions
Webhook integrations and notifications

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Git Community - For the powerful version control system
Open Source Contributors - For the excellent libraries and tools
Enterprise Users - For feedback and requirements validation
TypeScript Team - For the robust type system

📞 Support

Documentation: GitHub Wiki
Issues: GitHub Issues
Discussions: GitHub Discussions
Email: mark@markhazleton.com

👨‍💻 About the Author

Mark Hazleton is a passionate software architect and developer with decades of experience building enterprise-scale solutions. As the creator of the WebSpark family of tools and frameworks, Mark is committed to empowering developers with practical, honest, and high-quality open-source solutions.

🌐 Connect with Mark

Website: markhazleton.com - Portfolio, blog, and technical articles
GitHub: @MarkHazleton - Open source projects and contributions
LinkedIn: Mark Hazleton - Professional network
Email: mark@markhazleton.com - Direct contact

💼 Professional Focus

Mark specializes in:

Enterprise application architecture and design
Open-source tooling and developer productivity
Code quality, analytics, and automation
Mentoring and knowledge sharing in the developer community

⚡ The WebSpark Family

Git Spark is part of the WebSpark ecosystem - a growing family of tools, frameworks, and demonstrations designed to solve real-world development challenges with elegance and precision.

🎯 Other WebSpark Projects

WebSpark Demos - Interactive demonstrations of modern web technologies and architectural patterns
WebSpark Tools - Productivity utilities for developers and teams
WebSpark Frameworks - Reusable components and libraries for enterprise applications

🚀 The WebSpark Philosophy

The WebSpark family is built on core principles:

✨ Practical Excellence - Tools that solve real problems elegantly
🔍 Transparency First - Honest about capabilities and limitations
🎓 Education Focused - Empowering developers with knowledge
🤝 Community Driven - Open source and collaborative
🏢 Enterprise Ready - Production-grade quality and reliability

🌟 Explore More

Visit markhazleton.com to explore the full WebSpark ecosystem and discover tools that can transform your development workflow.

Built with ❤️ and unwavering commitment to analytical honesty for the developer community

Git Spark prioritizes transparency, accuracy, and user education above all else. We believe developers deserve honest, reliable analytics that clearly communicate both capabilities and limitations.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
.github		.github
.husky		.husky
.vscode		.vscode
bin		bin
copilot		copilot
docs		docs
scripts		scripts
src		src
test-charts		test-charts
test-contributions-graph		test-contributions-graph
test-final-json		test-final-json
test-output-html		test-output-html
test		test
.git-spark.json		.git-spark.json
.gitignore		.gitignore
.npmignore		.npmignore
.prettierrc		.prettierrc
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
eslint.config.cjs		eslint.config.cjs
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

License

markhazleton/git-spark

Folders and files

Latest commit

History

Repository files navigation