A comprehensive Git analytics application featuring real-time GitHub integration, machine learning-powered merge conflict prediction, and ROI calculation for quantifying engineering bottlenecks.
Personal Project | June 2025 – Present
GitFlow AI Analytics Platform is a full-stack application that helps development teams predict merge conflicts before they happen, identify engineering bottlenecks, and quantify the ROI of their development processes.
- Real-time GitHub integration via Chrome extension for repository activity and pull request data capture
- Machine learning pipeline using Gradient Boosting for merge conflict prediction
- ROI calculation engine for quantifying engineering bottlenecks and cost analysis
- Analytics dashboard with visualizations of key metrics and trends
- Automated CI/CD pipeline with GitHub Actions for testing, building, and deployment
- Production monitoring stack with Prometheus, Grafana, and Loki
Frontend
- React 18 with TypeScript
- Vite for build tooling
- TailwindCSS for styling
- Recharts for data visualization
- React Query for server-state management and data fetching
Backend
- Node.js with Express
- TypeScript
- PostgreSQL for data persistence
- Redis for caching
- OAuth 2.0 (GitHub) for authentication
ML Pipeline
- Python 3.11
- scikit-learn, XGBoost, LightGBM
- Flask API server
- Gradient Boosting for conflict prediction
Infrastructure
- Docker & Docker Compose
- GitHub Actions for CI/CD
- Prometheus & Grafana for monitoring
Chrome Extension
- Manifest V3
- Real-time GitHub page integration
- Background service worker for API communication
gitflow-ai-platform/
├── backend/                 # Node.js/Express API server
│   ├── src/
│   │   ├── controllers/    # Request handlers
│   │   ├── services/       # Business logic (ROI calculator, analytics)
│   │   ├── models/         # Database models
│   │   ├── routes/         # API routes
│   │   ├── middleware/     # Auth, error handling, rate limiting
│   │   └── utils/          # Logger, monitoring
│   ├── database/           # SQL schemas and migrations
│   └── tests/
│
├── frontend/               # React SPA
│   ├── src/
│   │   ├── components/    # Reusable UI components
│   │   ├── pages/         # Dashboard, Analytics, Repositories
│   │   ├── services/      # API client
│   │   └── hooks/         # Custom React hooks
│   └── public/
│
├── ml-pipeline/           # Python ML service
│   ├── api/              # Flask API server
│   ├── models/           # ML models (conflict predictor)
│   ├── training/         # Model training scripts
│   ├── utils/            # Feature extraction
│   └── data/             # Training data and cache
│
├── chrome-extension/      # GitHub integration extension
│   ├── src/
│   │   ├── background.js # Service worker
│   │   ├── content.js    # GitHub page injection
│   │   └── content.css   # Extension styling
│   └── public/
│
├── .github/
│   └── workflows/        # CI/CD pipelines
│       ├── ci.yml        # Tests and builds
│       └── deploy.yml    # Deployment
│
├── monitoring/           # Observability stack
│   ├── prometheus.yml
│   └── grafana/
│
└── docker-compose.yml    # Multi-container orchestration
- Node.js 18+
- Python 3.11+
- Docker & Docker Compose
- PostgreSQL 15+ (or use Docker)
- GitHub OAuth App credentials
- Clone the repository
git clone https://github.com/yourusername/gitflow-ai-platform.git
cd gitflow-ai-platform
- Set up environment variables
cp .env.example .env
# Edit .env with your GitHub OAuth credentials and other settings
- Install all dependencies
npm run install:all
- Start with Docker (Recommended)
docker-compose up -d
Or start services individually:
# Terminal 1 - Backend
cd backend && npm run dev
# Terminal 2 - Frontend
cd frontend && npm run dev
# Terminal 3 - ML Pipeline
cd ml-pipeline && python api/server.py
- Initialize the database
npm run db:migrate
npm run db:seed  # Optional: seed with sample data
- Install Chrome Extension
- Open Chrome and go to chrome://extensions/
- Enable "Developer mode"
- Click "Load unpacked"
- Select the chrome-extension directory
- Frontend: http://localhost:5173
- Backend API: http://localhost:3001
- ML API: http://localhost:5000
- Prometheus: http://localhost:9090 (if monitoring enabled)
- Grafana: http://localhost:3000 (admin/admin)
The ML pipeline implements a Gradient Boosting classifier trained on pull request features to predict merge conflicts. The model analyzes:
- File change patterns (number and type of files modified)
- Code churn metrics (lines added/deleted)
- Branch characteristics (age, divergence from base)
- Historical conflict data in similar contexts
- Developer experience and contribution history
- Temporal activity patterns
Implementation Details:
The model architecture uses scikit-learn's Gradient Boosting with 10 engineered features extracted from pull request data. For demonstration purposes, the system uses heuristic-based predictions when historical training data is not available. In a production environment, the model would be trained on actual repository commit and merge history from your organization's GitHub data.
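The heuristic fallback described above could look roughly like the following. This is a minimal sketch, not the project's code: the `PullRequestStats` type, the `heuristic_conflict_risk` function, and every weight and cap in it are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PullRequestStats:
    """Basic pull request size fields (files changed, additions, deletions)."""
    files_changed: int
    additions: int
    deletions: int


def heuristic_conflict_risk(pr: PullRequestStats) -> float:
    """Return a rough conflict-risk score in [0, 1].

    The weights (0.6 / 0.4) and caps (50 files, 2000 lines) are
    illustrative assumptions, not values taken from the project.
    """
    churn = pr.additions + pr.deletions
    # Each term grows with PR size and saturates at 1.0.
    file_term = min(pr.files_changed / 50.0, 1.0)
    churn_term = min(churn / 2000.0, 1.0)
    return round(0.6 * file_term + 0.4 * churn_term, 3)


if __name__ == "__main__":
    example = PullRequestStats(files_changed=15, additions=450, deletions=120)
    print(heuristic_conflict_risk(example))
```

A rule like this needs no training data, which is why it can stand in until a trained model is available, but its scores are not calibrated probabilities.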
Target model performance metrics:
- Accuracy: 87.3%
- Precision: 81.2%
- Recall: 85.4%
- AUC-ROC: 0.891
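For reference, accuracy, precision, and recall are all derived from the four confusion-matrix counts of a binary classifier. The sketch below shows the arithmetic; the function name and the example counts are hypothetical and are not results from this project. AUC-ROC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives, where "positive" means "a merge conflict was predicted".
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return {
        "accuracy": (tp + tn) / total,
        # Of the PRs flagged as risky, how many really conflicted?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the PRs that really conflicted, how many were flagged?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```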
API Endpoint:
POST /api/ml/predict-conflicts
{
  "owner": "username",
  "repo": "repository",
  "prNumber": 123,
  "filesChanged": 15,
  "additions": 450,
  "deletions": 120
}
The ROI calculation engine quantifies engineering bottlenecks and calculates cost savings:
Metrics Analyzed:
- Time saved from avoided merge conflicts
- Cost savings (developer hourly rate × time saved)
- CI/CD efficiency improvements
- Code review time optimization
- Deployment frequency impact
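The cost-savings formula in the list above (developer hourly rate × time saved) can be illustrated with a short sketch. It is written in Python for illustration only and does not mirror the TypeScript service; the function names and the example figures are assumptions.

```python
def hours_saved_from_conflicts(conflicts_avoided: int, hours_per_conflict: float) -> float:
    """Estimate developer hours saved by avoiding merge conflicts."""
    if conflicts_avoided < 0 or hours_per_conflict < 0:
        raise ValueError("inputs must be non-negative")
    return conflicts_avoided * hours_per_conflict


def cost_savings(hours_saved: float, hourly_rate: float) -> float:
    """Cost savings = developer hourly rate x time saved."""
    if hours_saved < 0 or hourly_rate < 0:
        raise ValueError("inputs must be non-negative")
    return hours_saved * hourly_rate


if __name__ == "__main__":
    # Hypothetical inputs: 12 avoided conflicts, 1.5 hours each, $75/hour.
    hours = hours_saved_from_conflicts(12, 1.5)
    print(hours, cost_savings(hours, 75.0))
```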
Bottleneck Detection:
- Long PR review times
- Frequent merge conflicts
- Slow CI/CD pipelines
- Delayed deployments
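One simple way to flag the four bottleneck types listed above is to compare aggregate metrics against thresholds. The following is a sketch under that assumption; the `TeamMetrics` fields and every threshold value are hypothetical, and the actual detection logic may differ.

```python
from dataclasses import dataclass


@dataclass
class TeamMetrics:
    avg_review_hours: float        # mean time from PR open to approval
    conflict_rate: float           # fraction of PRs that hit a merge conflict
    avg_ci_minutes: float          # mean CI/CD pipeline duration
    avg_deploy_delay_hours: float  # mean time from merge to deployment


# (bottleneck label, metric attribute, threshold). Thresholds are illustrative.
RULES = [
    ("Long PR review times", "avg_review_hours", 24.0),
    ("Frequent merge conflicts", "conflict_rate", 0.15),
    ("Slow CI/CD pipelines", "avg_ci_minutes", 15.0),
    ("Delayed deployments", "avg_deploy_delay_hours", 48.0),
]


def detect_bottlenecks(metrics: TeamMetrics) -> list[str]:
    """Return the labels of every rule whose threshold is exceeded."""
    return [label for label, attr, limit in RULES if getattr(metrics, attr) > limit]


if __name__ == "__main__":
    sample = TeamMetrics(avg_review_hours=30, conflict_rate=0.05,
                         avg_ci_minutes=25, avg_deploy_delay_hours=2)
    print(detect_bottlenecks(sample))
```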
Implementation: backend/src/services/roiCalculator.ts
The Chrome extension integrates with GitHub's web interface to capture pull request data:
- Detects pull request pages and extracts metadata (files changed, additions, deletions)
- Communicates with the backend API via background service worker
- Displays conflict risk predictions directly on GitHub PR pages
- Provides color-coded visual indicators for risk levels (high/medium/low)
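The two core decisions the extension makes, recognizing a pull request page and choosing a risk color, can be sketched as follows. The extension itself is JavaScript; this Python version only illustrates the logic. The URL pattern follows GitHub's public `/owner/repo/pull/number` layout, while the probability cut-offs and hex colors are assumptions.

```python
import re
from typing import Optional

# Matches pull request pages such as https://github.com/owner/repo/pull/123
PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)")

# Hypothetical indicator colors for each risk level.
RISK_COLORS = {"high": "#d73a49", "medium": "#dbab09", "low": "#28a745"}


def parse_pr_url(url: str) -> Optional[tuple[str, str, int]]:
    """Return (owner, repo, pr_number) for a PR page URL, else None."""
    match = PR_URL.match(url)
    if match is None:
        return None
    owner, repo, number = match.groups()
    return owner, repo, int(number)


def risk_level(probability: float) -> str:
    """Bucket a conflict probability into high/medium/low (assumed cut-offs)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.7:
        return "high"
    if probability >= 0.4:
        return "medium"
    return "low"


if __name__ == "__main__":
    print(parse_pr_url("https://github.com/username/repository/pull/123"))
    level = risk_level(0.85)
    print(level, RISK_COLORS[level])
```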
GitHub Actions workflows for automated testing and deployment:
CI Pipeline (.github/workflows/ci.yml):
- Backend tests with PostgreSQL and Redis service containers
- Frontend tests and ESLint validation
- ML pipeline tests using pytest
- Docker image build verification for all services
- Security vulnerability scanning with Trivy
Deployment Pipeline (.github/workflows/deploy.yml):
- Automated Docker image builds and pushes
- Multi-stage deployments with version tagging
- Production deployment on main branch merges
Observability infrastructure for metrics and logs:
- Prometheus for metrics collection and alerting
- Grafana for dashboard visualization
- Loki for centralized log aggregation
- Promtail for log shipping from application services
- Custom application metrics for request tracking, error rates, and performance monitoring
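To show what a custom request metric looks like once Prometheus scrapes it, here is a dependency-free sketch that counts requests and renders them in the Prometheus text exposition format. The class name and the metric name `http_requests_total` are assumptions; a real service would more likely use an official Prometheus client library than hand-rolled rendering.

```python
from collections import Counter


class RequestMetrics:
    """Counts HTTP requests by (method, status) and renders them for scraping."""

    def __init__(self) -> None:
        self._requests: Counter = Counter()

    def observe(self, method: str, status: int) -> None:
        self._requests[(method, status)] += 1

    def error_rate(self) -> float:
        """Fraction of requests that ended in a 5xx status."""
        total = sum(self._requests.values())
        errors = sum(n for (_, status), n in self._requests.items() if status >= 500)
        return errors / total if total else 0.0

    def render(self) -> str:
        """Return the counters in Prometheus text exposition format."""
        lines = [
            "# HELP http_requests_total Total HTTP requests handled.",
            "# TYPE http_requests_total counter",
        ]
        for (method, status), count in sorted(self._requests.items()):
            lines.append(
                f'http_requests_total{{method="{method}",status="{status}"}} {count}'
            )
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    metrics = RequestMetrics()
    metrics.observe("GET", 200)
    metrics.observe("GET", 200)
    metrics.observe("POST", 500)
    print(metrics.render(), end="")
```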
# All tests
npm test
# Backend tests
npm run test:backend
# Frontend tests
npm run test:frontend
# ML pipeline tests
npm run test:ml
# Lint all projects
npm run lint
# Backend only
npm run lint:backend
# Frontend only
npm run lint:frontend
# Build all
npm run build
# Build backend
npm run build:backend
# Build frontend
npm run build:frontend
# Build Chrome extension
npm run build:extension
# Run migrations
npm run db:migrate
# Seed database
npm run db:seed
# Reset database
npm run db:reset
All API requests require authentication via GitHub OAuth:
GET /api/auth/github/login
GET /api/auth/github/callback
POST /api/auth/logout
GET    /api/repositories              # List user repositories
GET    /api/repositories/:id          # Get repository details
POST   /api/repositories/:id/sync     # Sync repository data
GET    /api/analytics/dashboard/:id          # Get dashboard data
POST   /api/analytics/calculate-roi          # Calculate ROI
GET    /api/analytics/bottlenecks/:id        # Identify bottlenecks
POST   /api/ml/predict-conflicts      # Predict merge conflicts
GET    /api/ml/model-metrics          # Get model performance
POST   /api/ml/model/retrain          # Trigger retraining
# Production deployment
docker-compose up -d
# With monitoring
docker-compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d
Required environment variables (see .env.example):
- GITHUB_CLIENT_ID: GitHub OAuth app client ID
- GITHUB_CLIENT_SECRET: GitHub OAuth app secret
- DATABASE_URL: PostgreSQL connection string
- REDIS_URL: Redis connection string
- JWT_SECRET: Secret for JWT token signing
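A service can fail fast at startup when any of these variables is missing. The sketch below shows one way to do that in Python; the `load_config` helper is hypothetical, and only the variable names come from the list above.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = (
    "GITHUB_CLIENT_ID",
    "GITHUB_CLIENT_SECRET",
    "DATABASE_URL",
    "REDIS_URL",
    "JWT_SECRET",
)


def load_config(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required settings, raising if any is missing or empty.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to test.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: source[name] for name in REQUIRED_VARS}
```

Reporting every missing variable in a single error saves a restart cycle per variable during first-time setup.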
Performance optimizations:
- Caching: Redis caching for frequently accessed data
- Query Optimization: Indexed database queries
- Connection Pooling: PostgreSQL connection pooling
- Code Splitting: Frontend lazy loading
- CDN Integration: Static asset delivery
- Rate Limiting: API request throttling
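Request throttling is commonly implemented with a token bucket. The sketch below illustrates that technique in isolation; it is an assumption about how throttling could work, not a description of the project's middleware, and the class and parameter names are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add tokens for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per client (for example per user ID or IP address), and a multi-instance deployment would need shared state such as Redis rather than in-process memory.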
Security:
- OAuth 2.0: Secure GitHub authentication
- JWT Tokens: Stateless authentication
- Helmet.js: HTTP header security
- Rate Limiting: Mitigates request floods and brute-force abuse
- Input Validation: Reduces injection risk, including SQL injection
- CORS: Cross-origin request control
- Security Scanning: Automated vulnerability detection
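Stateless JWT authentication means the server verifies a signature instead of looking up a session. The sketch below shows HS256 signing and verification using only the Python standard library; it is illustrative, the function names are hypothetical, and a real service should use a maintained JWT library that also validates claims such as expiry.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"),
                      hashlib.sha256).digest()
    return _b64url(digest)


def sign_jwt(payload: dict, secret: str) -> str:
    """Build a compact HS256 token: header.payload.signature"""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_jwt(token: str, secret: str) -> bool:
    """Check the signature only; claims such as `exp` are NOT validated here."""
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = _signature(signing_input, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The secret passed to these functions plays the role of `JWT_SECRET`; anyone who holds it can mint valid tokens, so it must stay out of version control.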
This is a personal portfolio project. For questions or suggestions, please open an issue.
MIT License - see LICENSE file for details
Built with: React, TypeScript, Node.js, PostgreSQL, Python, scikit-learn, Docker
Author: Felipe Sanchez
Developed: June 2025 – Present