Repolens

Evaluate GitHub repositories like a senior engineer — with deterministic, explainable scoring.

Next.js TypeScript Tailwind CSS Jest License: MIT

Features · Architecture · Getting Started · API Reference · Scoring Methodology · Tech Stack · Contributing


Overview

Repolens (previously GitHub Project Analyzer) is a production-grade web application that evaluates GitHub repositories using recruiter-inspired engineering metrics. It produces a deterministic, explainable 0–100 score with a hiring confidence level, risk flags, and optionally AI-enhanced improvement suggestions via Google Gemini.

It simulates how a technical recruiter or senior engineer evaluates a candidate's GitHub project — not by looking at stars or commit count alone, but by analyzing:

  • README quality — Is the project well-documented? Are there installation instructions, usage examples, architecture sections?
  • Commit discipline — Are commits spread over time or crammed in one day? Are messages meaningful?
  • Tech stack maturity — Does the project use testing, linting, TypeScript, CI/CD?
  • Architectural complexity — Is the codebase well-structured with separation of concerns?

What this is NOT

  • A GitHub stats viewer (stars, forks, contributors)
  • A simple API wrapper around GitHub's REST API
  • An AI-generated score — AI is used only for enhancing suggestion text, never for scoring

Features

  • Deterministic scoring — Same repository always produces the same score. No randomness, no AI in the scoring pipeline.
  • Explainability — Every point earned or lost traces back to a specific metric with a specific threshold.
  • Actionable output — Reports don't just say "your README is bad" — they specify exactly what sections are missing and what to add.
  • 4 independent analyzers — README, Commits, Tech Stack, and Complexity — each producing structured findings, strengths, and suggestions.
  • Hiring confidence levels — Low (0–40), Moderate (41–70), Strong (71–100) with detailed reasoning.
  • Risk flag detection — Critical, warning, and info-level flags surfaced from analyzer metadata.
  • AI-enhanced suggestions (optional) — Google Gemini rewrites raw suggestions into professional, prioritized improvement roadmaps.
  • Full-stack — API routes + interactive dashboard with charts, animations, and JSON export.
  • Rate-limit aware — GitHub API client with exponential backoff, retry logic, and rate limit tracking.
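
The retry behaviour named in the last feature can be pictured with a small helper. This is a minimal sketch, not the project's client.ts: the names backoffDelayMs and withRetry, the 500 ms base delay, the 8 s cap, and the default of 3 retries are all assumptions chosen for illustration.

```typescript
// Hypothetical defaults: the README does not state the client's real values.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8000;

/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs. */
function backoffDelayMs(
  attempt: number,
  baseMs: number = BASE_DELAY_MS,
  maxMs: number = MAX_DELAY_MS,
): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Runs `task`, waiting exponentially longer after each failure. */
async function withRetry<T>(task: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Retries exhausted: surface the last error to the caller.
      if (attempt >= maxRetries) throw err;
      await new Promise<void>((resolve) =>
        setTimeout(resolve, backoffDelayMs(attempt)),
      );
    }
  }
}
```

With these assumed defaults the waits grow as 500 ms, 1 s, 2 s, 4 s and then stay at 8 s. A production client would typically also honour the rate-limit reset time reported by GitHub rather than rely on backoff alone.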

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                        Frontend (Next.js App Router)                    │
│                                                                         │
│  page.tsx ───────────────────────────────────────────────────────────── │
│  │  SearchBar → POST /api/analyze → Display Results                     │
│  │                                                                      │
│  │  ScoreOverview   RadarChart   CommitTimeline                         │
│  │  CategoryBreakdown   TechStackBadges   ImprovementRoadmap            │
│  │  ExportButton                                                        │
└─────────────┬───────────────────────────────────────────────────────────┘
              │ HTTP POST /api/analyze
              ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     API Layer (app/api/analyze/route.ts)                │
│                                                                         │
│  1. Parse & validate request body (Zod)                                 │
│  2. Call orchestrator                                                   │
│  3. Return AnalysisReport or structured error                           │
└─────────────┬───────────────────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Orchestrator (lib/services/orchestrator.ts)         │
│                                                                         │
│  Stage 1: Resolve repo identifier (URL or owner/repo)                   │
│  Stage 2: Fetch all repo data from GitHub API (parallel)                │
│  Stage 3: Run all 4 analyzers (synchronous, pure functions)             │
│  Stage 4: Compute weighted score + hiring confidence + risk flags       │
│  Stage 5: Optionally enhance with Gemini AI                             │
│  Stage 6: Assemble and return AnalysisReport                            │
└──────┬──────────────┬───────────────┬───────────────┬───────────────────┘
       │              │               │               │
       ▼              ▼               ▼               ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌─────────────────┐
│   GitHub   │ │  Analyzer  │ │  Scoring   │ │   AI Layer      │
│  Service   │ │  Engine    │ │  Engine    │ │   (Optional)    │
│            │ │            │ │            │ │                 │
│ client.ts  │ │ readme     │ │ engine.ts  │ │ geminiClient.ts │
│ repoSvc.ts │ │ commits    │ │ normalizer │ │ enhancer.ts     │
│ types.ts   │ │ techStack  │ │            │ │                 │
│            │ │ complexity │ │ Weights:   │ │ Never affects   │
│ Axios +    │ │            │ │ R:25%      │ │ scoring.        │
│ retry +    │ │ All pure   │ │ C:20%      │ │ Only rewrites   │
│ rate limit │ │ functions  │ │ T:30%      │ │ suggestions.    │
│            │ │            │ │ X:25%      │ │                 │
└────────────┘ └────────────┘ └────────────┘ └─────────────────┘

Data Flow

User enters GitHub URL
  → POST /api/analyze { repoUrl: "https://github.com/owner/repo" }
  → Zod validates input (repoUrl or owner+repo)
  → Orchestrator resolves repo identifier
  → 6 parallel GitHub API fetches (metadata, commits, tree, README, languages, package.json)
  → 4 analyzers run on fetched data (pure functions, no side effects)
  → Scoring engine normalizes + weights results → deterministic 0–100 score
  → (Optional) Gemini AI enhances suggestions into a professional roadmap
  → Final AnalysisReport returned as JSON
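
The "resolve repo identifier" step in the flow above accepts either a URL or an owner/repo pair. The sketch below shows one way that resolution could work. It is an illustration only: resolveRepoIdentifier, the two interfaces, and the URL pattern are hypothetical, and the project itself validates input with Zod and keeps its own URL regex in lib/config/github.ts.

```typescript
interface RepoIdentifier {
  owner: string;
  repo: string;
}

// Hypothetical request shape mirroring the two documented request bodies.
interface AnalyzeRequest {
  repoUrl?: string;
  owner?: string;
  repo?: string;
}

// Matches https://github.com/<owner>/<repo>, allowing ".git" or a trailing slash.
// Deep links such as /tree/main are rejected in this sketch.
const GITHUB_URL_PATTERN =
  /^https?:\/\/(?:www\.)?github\.com\/([^\/\s]+)\/([^\/\s]+?)(?:\.git)?\/?$/;

function resolveRepoIdentifier(body: AnalyzeRequest): RepoIdentifier | null {
  if (body.repoUrl) {
    const match = GITHUB_URL_PATTERN.exec(body.repoUrl.trim());
    if (!match) return null;
    const [, owner, repo] = match;
    return owner && repo ? { owner, repo } : null;
  }
  if (body.owner && body.repo) {
    return { owner: body.owner, repo: body.repo };
  }
  return null; // neither input form was supplied
}
```

Returning null instead of throwing keeps the function pure; the caller decides how to turn a failed resolution into a 400 response.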

Getting Started

Prerequisites

  • Node.js >= 18.x
  • npm >= 9.x (or pnpm / yarn)
  • A GitHub Personal Access Token with the public_repo scope (or repo for private repos)
  • (Optional) A Google Gemini API key for AI-enhanced suggestions

Installation

# 1. Clone the repository
git clone https://github.com/divyanshu12-fullstack/github-analyzer.git
cd github-analyzer

# 2. Install dependencies
npm install

# 3. Set up environment variables
cp .env.example .env.local

Environment Variables

Edit .env.local with your keys:

# Required — GitHub Personal Access Token
GITHUB_TOKEN=ghp_your_token_here

# Optional — Google Gemini API key (enables AI suggestions)
GEMINI_API_KEY=your_gemini_key_here

# Optional — Override log level (trace | debug | info | warn | error | fatal)
LOG_LEVEL=debug

| Variable | Required | Description |
| --- | --- | --- |
| GITHUB_TOKEN | Yes | GitHub PAT with public_repo scope. Rate limit: 5,000 req/hr with token, 60/hr without. |
| GEMINI_API_KEY | No | Enables AI suggestion enhancement. App works fully without it (aiSuggestions: null). |
| LOG_LEVEL | No | Defaults to debug in dev, info in production. |

Running Locally

# Development server (with hot reload)
npm run dev

# Production build
npm run build
npm start

Open http://localhost:3000 to use the dashboard.

Available Scripts

| Script | Command | Description |
| --- | --- | --- |
| npm run dev | next dev | Start development server with hot reload |
| npm run build | next build | Create optimized production build |
| npm start | next start | Start production server |
| npm run lint | eslint . | Run ESLint |
| npm run lint:fix | eslint . --fix | Auto-fix lint issues |
| npm run format | prettier --write . | Format all files with Prettier |
| npm run format:check | prettier --check . | Check formatting without writing |
| npm test | jest | Run test suite |
| npm run test:watch | jest --watch | Run tests in watch mode |
| npm run test:coverage | jest --coverage | Run tests with coverage report |

API Reference

POST /api/analyze

Analyze a GitHub repository and return a full scoring report.

Request Body:

{ "repoUrl": "https://github.com/vercel/next.js" }

Or alternatively:

{ "owner": "vercel", "repo": "next.js" }

Success Response (200):

{
  "success": true,
  "data": {
    "repoName": "next.js",
    "repoUrl": "https://github.com/vercel/next.js",
    "repoDescription": "The React Framework",
    "repoPrimaryLanguage": "TypeScript",
    "repoStars": 128000,
    "totalScore": 62.1,
    "categoryScores": {
      "readme": { "normalizedScore": 78, "weight": 0.25, "weightedScore": 19.5, "raw": { "..." : "..." } },
      "commits": { "normalizedScore": 55, "weight": 0.20, "weightedScore": 11.0, "raw": { "..." : "..." } },
      "techStack": { "normalizedScore": 70, "weight": 0.30, "weightedScore": 21.0, "raw": { "..." : "..." } },
      "complexity": { "normalizedScore": 42, "weight": 0.25, "weightedScore": 10.6, "raw": { "..." : "..." } }
    },
    "hiringConfidence": { "level": "Moderate", "score": 62.1, "reasoning": "..." },
    "riskFlags": [ { "severity": "warning", "message": "...", "category": "commits" } ],
    "aiSuggestions": null,
    "analyzedAt": "2026-02-20T10:30:00.000Z",
    "analysisVersion": "1.0.0",
    "processingTimeMs": 4523
  }
}

Error Responses:

| Status | Condition |
| --- | --- |
| 400 | Validation failed (missing/invalid URL) |
| 404 | Repository not found |
| 415 | Wrong content type (non-JSON) |
| 429 | GitHub API rate limit exceeded (includes Retry-After header) |
| 500 | Unexpected server error |
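
A caller of POST /api/analyze might map these statuses onto its own error handling roughly as follows. This is a client-side sketch under stated assumptions: classifyFailure and the AnalyzeFailure shape are hypothetical, and the Retry-After header is assumed to carry a number of seconds (the HTTP-date form is not handled).

```typescript
type AnalyzeErrorKind =
  | "validation"
  | "not_found"
  | "content_type"
  | "rate_limited"
  | "server";

interface AnalyzeFailure {
  kind: AnalyzeErrorKind;
  retryAfterSeconds: number | null; // only populated for 429 responses
}

function classifyFailure(
  httpStatus: number,
  retryAfterHeader: string | null,
): AnalyzeFailure {
  // Statuses from the error table; anything unlisted is treated as a server error.
  const kinds: Record<number, AnalyzeErrorKind> = {
    400: "validation",
    404: "not_found",
    415: "content_type",
    429: "rate_limited",
  };
  const kind = kinds[httpStatus] ?? "server";

  let retryAfterSeconds: number | null = null;
  if (kind === "rate_limited" && retryAfterHeader !== null) {
    const seconds = parseInt(retryAfterHeader, 10);
    retryAfterSeconds = isNaN(seconds) ? null : seconds;
  }
  return { kind, retryAfterSeconds };
}
```

A UI could use retryAfterSeconds to disable the search bar until the rate limit resets instead of letting the user retry immediately.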

GET /api/health

Health check endpoint for monitoring and deployment verification.

Response (200):

{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2026-02-20T10:30:00.000Z",
  "services": {
    "github": {
      "configured": true,
      "rateLimit": { "remaining": 4985, "limit": 5000, "resetsAt": "2026-02-20T11:00:00.000Z" }
    },
    "gemini": { "configured": true }
  }
}

Scoring Methodology

Category Weights

| Category | Weight | Rationale |
| --- | --- | --- |
| Tech Stack | 30% | Directly reflects engineering maturity: testing, TypeScript, CI/CD, linting |
| README | 25% | A good README signals the developer thinks about users and documentation |
| Complexity | 25% | Structural quality shows software engineering fundamentals |
| Commits | 20% | Important but slightly less since commit patterns can vary by workflow |
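
The weighting reduces to a dot product of normalized category scores and these weights. The sketch below is illustrative only: SCORING_WEIGHTS is the name used in the thresholds config, but its shape here, the computeTotalScore function, and the rounding to one decimal place are assumptions.

```typescript
type Category = "readme" | "commits" | "techStack" | "complexity";

// Weights from the Category Weights table; they sum to 1.
const SCORING_WEIGHTS: Record<Category, number> = {
  readme: 0.25,
  commits: 0.2,
  techStack: 0.3,
  complexity: 0.25,
};

/** Combines normalized 0-100 category scores into a single 0-100 total. */
function computeTotalScore(normalized: Record<Category, number>): number {
  const categories: Category[] = ["readme", "commits", "techStack", "complexity"];
  let sum = 0;
  for (const category of categories) {
    sum += normalized[category] * SCORING_WEIGHTS[category];
  }
  // Assumption: totals are reported with one decimal place.
  return Math.round(sum * 10) / 10;
}
```

For normalized scores of 80 (README), 50 (commits), 70 (tech stack) and 40 (complexity), the weighted parts are 20 + 10 + 21 + 10, giving a total of 61.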

Hiring Confidence Levels

| Score | Level | Meaning |
| --- | --- | --- |
| 0–40 | Low | Project needs significant work before it demonstrates engineering quality |
| 41–70 | Moderate | Decent fundamentals with clear room for improvement |
| 71–100 | Strong | Production-quality engineering signals across most categories |
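
Mapping a total score to one of these levels is a simple threshold check. The sketch below assumes that fractional scores between two bands (for example 40.5) fall into the higher band; the table does not say how the project treats them, and hiringConfidenceLevel is a hypothetical name.

```typescript
type ConfidenceLevel = "Low" | "Moderate" | "Strong";

function hiringConfidenceLevel(score: number): ConfidenceLevel {
  if (score <= 40) return "Low";
  if (score <= 70) return "Moderate";
  return "Strong";
}
```

Under this reading a total of 62.1 is "Moderate", and a repository needs more than 70 points to be rated "Strong".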

Analyzer Breakdown

README Analyzer (max 100 pts)

| Metric | Points | Description |
| --- | --- | --- |
| Presence | 5 | README exists and is non-empty |
| Word count | 20 | Tiered: <100 (5), 100–299 (10), 300–799 (15), 800+ (20) |
| Required sections | 15 | 5 pts each for Installation, Usage, Features |
| Bonus sections | 12+ | 3 pts each for Contributing, License, Deployment, Architecture, etc. |
| Code blocks | 15 | First block (10), 3+ blocks (+5) |
| Screenshots/demo | 8 | Image links or demo URL patterns |
| Badges | 4 | Build/coverage badge patterns |
| Architecture section | 8 | Heading match for architecture/design/structure |
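
Two of these rows, word count and required sections, can be sketched as pure functions. The tier boundaries and point values come from the table; the function names, the whitespace-based word split, the Markdown-heading detection, and the zero score for an empty README are assumptions rather than the analyzer's actual rules.

```typescript
/** Points for README length, using the tiers from the table. */
function wordCountPoints(readme: string): number {
  const trimmed = readme.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  if (words === 0) return 0; // assumption: an empty README earns nothing here
  if (words < 100) return 5;
  if (words < 300) return 10;
  if (words < 800) return 15;
  return 20;
}

const REQUIRED_SECTIONS = ["installation", "usage", "features"];

/** 5 points for each required section that appears in a Markdown heading. */
function requiredSectionPoints(readme: string): number {
  // Only lines that start with 1-6 '#' characters count as headings.
  const headings = readme
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.toLowerCase());

  let points = 0;
  for (const section of REQUIRED_SECTIONS) {
    if (headings.some((heading) => heading.includes(section))) points += 5;
  }
  return points;
}
```

Because matching is keyword-based, a heading such as "Getting Started" would not count as "Installation" in this sketch, which is the same kind of limitation the README notes for pattern-based analysis.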

Commit Analyzer (max 100 pts)

| Metric | Points | Description |
| --- | --- | --- |
| Commit count | 15 | Tiered: 1–4 (3), 5–14 (7), 15–29 (10), 30+ (15) |
| Message length | 20 | Average characters: <10 (0), 10–29 (5), 30–49 (10), 50–72 (20), >72 (15) |
| Conventional commits | 20 | Ratio of prefixed messages (feat:, fix:, docs:, etc.) |
| Temporal spread | 25 | Unique active days: 1 (5), 2–3 (10), 4–6 (15), 7–13 (20), 14+ (25) |
| Penalty: Burst | -10 | >10 commits in a single day (sliding window) |
| Penalty: Duplicates | -10 | >20% identical commit messages |
| Penalty: Low-effort | -5 | >30% messages matching "update", "fix", "wip", etc. |
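
The conventional-commit ratio and the temporal-spread tiers could be computed along the following lines. This is a sketch under assumptions: the CommitInfo shape, the exact prefix list beyond feat, fix and docs, and the use of UTC calendar days are not specified by the README, and how the ratio is converted into points is left out because the table does not state it.

```typescript
interface CommitInfo {
  message: string;
  date: string; // ISO 8601 UTC timestamp, e.g. "2026-02-20T10:30:00Z"
}

// Common Conventional Commits types, with optional scope and "!" marker.
const CONVENTIONAL_PREFIX =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]*\))?!?:\s/;

/** Fraction of commits (0 to 1) whose message starts with a conventional prefix. */
function conventionalRatio(commits: CommitInfo[]): number {
  if (commits.length === 0) return 0;
  const matching = commits.filter((c) => CONVENTIONAL_PREFIX.test(c.message)).length;
  return matching / commits.length;
}

/** Points for the number of distinct days with at least one commit. */
function temporalSpreadPoints(commits: CommitInfo[]): number {
  const days = new Set<string>();
  for (const commit of commits) {
    days.add(commit.date.slice(0, 10)); // "YYYY-MM-DD" part of the timestamp
  }
  const activeDays = days.size;
  if (activeDays === 0) return 0; // assumption: no commits, no points
  if (activeDays === 1) return 5;
  if (activeDays <= 3) return 10;
  if (activeDays <= 6) return 15;
  if (activeDays <= 13) return 20;
  return 25;
}
```

Note that a bare "fix" message does not match the prefix pattern, since it lacks the colon and description, so it would count toward the low-effort penalty rather than the conventional-commit ratio.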

Tech Stack Analyzer (max 100 pts)

| Metric | Points | Description |
| --- | --- | --- |
| Framework detection | 15 | React, Next.js, Express, Vue, Angular, Svelte, etc. |
| State management | 10 | Redux, Zustand, Jotai, MobX, Pinia, etc. |
| Testing | 20 | Jest, Vitest, Cypress, Playwright + test file detection |
| Linting | 10 | ESLint, Prettier + config file presence |
| TypeScript | 15 | typescript in deps or tsconfig.json in tree |
| Environment config | 5 | .env files in tree |
| CI/CD | 10 | .github/workflows, .gitlab-ci.yml, Jenkinsfile, etc. |
| Containerization | 5 | Dockerfile or docker-compose.yml |

Outputs a maturity tier: Basic / Intermediate / Advanced
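
Each row is a presence check against package.json or the file tree. As one example, the TypeScript rule ("typescript in deps or tsconfig.json in tree") might look like the sketch below. The PackageJson shape and usesTypeScript are hypothetical, and accepting a tsconfig.json in any subfolder is an assumption made here to cover monorepos.

```typescript
// Minimal view of package.json; only the dependency maps matter here.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

function usesTypeScript(pkg: PackageJson | null, treePaths: string[]): boolean {
  const deps = {
    ...(pkg?.dependencies ?? {}),
    ...(pkg?.devDependencies ?? {}),
  };
  const inDeps = "typescript" in deps;

  // tsconfig.json at the repository root or inside any subfolder.
  const inTree = treePaths.some(
    (path) => path === "tsconfig.json" || path.endsWith("/tsconfig.json"),
  );
  return inDeps || inTree;
}
```

Passing null for pkg covers repositories without a package.json, in which case only the file tree is consulted.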

Complexity Analyzer (max 100 pts)

| Metric | Points | Description |
| --- | --- | --- |
| Folder depth | 15 | Max nesting: 1–2 (3), 3–4 (7), 5–6 (11), 7+ (15) |
| File count | 10 | Code files: <5 (2), 5–19 (5), 20–49 (7), 50+ (10) |
| Separation of concerns | 20 | Detects: models, utils, helpers, middleware, hooks, lib, services |
| API routes | 10 | Detects: api, routes, graphql, resolvers folders |
| Authentication | 10 | Detects: auth, login, signup, session folders |
| Error handling | 10 | Detects: error, errors, exception patterns |
| Config abstraction | 10 | Detects: config, constants, env, settings folders |
| Modularity bonus | 15 | Shannon entropy of file distribution across folders |
| Penalty: Anti-patterns | -5 each (max 3) | Files >500 lines detected in tree |
| Penalty: Monolith | -15 | Single file >40% of total codebase |
Key Principle: Determinism

The scoring pipeline is 100% deterministic. Same repo data always produces the same score. AI is never involved in scoring — it only rewrites suggestion text and generates the improvement roadmap.


Tech Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Framework | Next.js 16 (App Router) | Full-stack in one project, API routes + React UI |
| Language | TypeScript 5 (strict) | Type safety across the entire stack |
| Styling | Tailwind CSS v4 | Utility-first, fast iteration |
| GitHub API | Axios | Interceptors for rate-limit tracking, retry support |
| Validation | Zod | Runtime type safety on API boundaries, TS inference |
| Logging | Pino + pino-pretty | Fast, low-overhead Node.js logger, structured JSON in prod |
| AI | Google Gemini (@google/generative-ai) | Suggestion enhancement only, generous free tier |
| Charts | Recharts | Built for React, good radar chart support |
| Icons | Lucide React | Consistent, tree-shakeable icon set |
| Animation | Framer Motion | Declarative animations with mount/unmount support |
| Toasts | Sonner | Toast notifications with minimal setup |
| Testing | Jest + ts-jest | Full TypeScript support, standard testing stack |

Project Structure

github-analyzer/
├── app/
│   ├── api/
│   │   ├── analyze/route.ts          # POST /api/analyze — main analysis endpoint
│   │   └── health/route.ts           # GET /api/health — status check
│   ├── globals.css                    # Global styles & design tokens
│   ├── layout.tsx                     # Root layout with metadata
│   └── page.tsx                       # Dashboard page
├── components/
│   ├── ui/                            # Reusable UI primitives
│   │   ├── Badge.tsx, Button.tsx, Card.tsx,
│   │   ├── Input.tsx, Progress.tsx, Skeleton.tsx
│   └── analyzer/                      # Dashboard-specific components
│       ├── SearchBar.tsx              # GitHub URL input + validation
│       ├── ScoreOverview.tsx          # Score gauge + hiring confidence
│       ├── CategoryBreakdown.tsx      # 4 category cards with details
│       ├── RadarChart.tsx             # Spider diagram of scores
│       ├── CommitTimeline.tsx         # Commit distribution bar chart
│       ├── TechStackBadges.tsx        # Detected tech as pill badges
│       ├── ImprovementRoadmap.tsx     # AI roadmap or raw suggestions
│       └── ExportButton.tsx           # Download analysis as JSON
├── lib/
│   ├── config/
│   │   ├── thresholds.ts             # All scoring weights & point allocations
│   │   └── github.ts                 # GitHub API config, URL regex, cache config
│   ├── services/
│   │   ├── github/
│   │   │   ├── client.ts             # Axios instance, retry, rate limit tracking
│   │   │   ├── repoService.ts        # 6 parallel data fetchers
│   │   │   └── types.ts              # Zod schemas for API responses
│   │   ├── analyzers/
│   │   │   ├── readmeAnalyzer.ts     # README quality (8 metrics, max 100)
│   │   │   ├── commitAnalyzer.ts     # Commit discipline (6 metrics + 3 penalties)
│   │   │   ├── techStackAnalyzer.ts  # Tech maturity (8 metrics + tiers)
│   │   │   └── complexityAnalyzer.ts # Structural complexity (10 metrics + 2 penalties)
│   │   ├── scoring/
│   │   │   ├── engine.ts             # Weighted scoring + confidence + risk flags
│   │   │   └── normalizer.ts         # Score normalization utilities
│   │   ├── ai/
│   │   │   ├── geminiClient.ts       # Gemini SDK wrapper with retry/timeout
│   │   │   └── enhancer.ts           # AI suggestion rewriting + roadmap
│   │   └── orchestrator.ts           # 6-stage analysis pipeline
│   └── utils/
│       ├── errors.ts                  # Custom error classes
│       └── logger.ts                  # Pino structured logging
├── types/
│   ├── github.ts                      # GitHub API TypeScript interfaces
│   └── analysis.ts                    # Analysis/scoring/report interfaces
├── __tests__/                         # Jest test suite
│   ├── analyzers/                     # Analyzer unit tests
│   ├── scoring/                       # Scoring engine tests
│   └── api/                           # API route tests
├── .env.example                       # Environment variable template
├── jest.config.ts                     # Jest + ts-jest configuration
├── tsconfig.json                      # Strict TypeScript + path aliases
├── next.config.ts                     # Next.js configuration
└── package.json                       # Dependencies + scripts

Testing

# Run all tests
npm test

# Watch mode
npm run test:watch

# With coverage report
npm run test:coverage

Tests cover all 4 analyzers, the scoring engine, and API routes with edge cases including empty inputs, maximum scores, penalty triggers, and boundary conditions.


Known Limitations

  1. Last 100 commits only — GitHub API pagination limit per request
  2. File tree may be truncated — Trees API caps at ~100K entries for very large repos
  3. README analysis is pattern-based — Heading keyword matching, not semantic NLP
  4. No runtime analysis — Cannot verify code compiles, runs, or passes tests
  5. Node.js/JS ecosystem focus: detection is package.json-based; Python/Go/Rust dependencies are not analyzed
  6. Single-repo only — No multi-repo portfolio scoring (designed for future extension)

Contributing

Contributions are welcome! To get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feat/my-feature
  3. Make changes and ensure tests pass: npm test
  4. Ensure code is formatted: npm run format
  5. Ensure linting passes: npm run lint
  6. Submit a pull request

Adding a New Analyzer

The architecture is designed for easy extensibility. To add a new analysis category:

  1. Create lib/services/analyzers/yourAnalyzer.ts implementing (input) => AnalyzerResult
  2. Add thresholds to lib/config/thresholds.ts
  3. Register in the orchestrator (lib/services/orchestrator.ts)
  4. Add weight to SCORING_WEIGHTS in thresholds config
  5. Add tests in __tests__/analyzers/
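
Step 1 above could start from a skeleton like this one. Everything in it is hypothetical: the AnalyzerResult fields are guessed from the "findings, strengths, and suggestions" description in Features (the real interface lives in types/analysis.ts and may differ), and the license check is only a stand-in for a real analysis category.

```typescript
// Hypothetical shape; check types/analysis.ts for the real AnalyzerResult.
interface AnalyzerResult {
  score: number;
  maxScore: number;
  findings: string[];
  strengths: string[];
  suggestions: string[];
}

// Hypothetical input: file paths from the repository tree.
interface LicenseAnalyzerInput {
  treePaths: string[];
}

// Root-level LICENSE, LICENCE or COPYING, with an optional .md/.txt extension.
const LICENSE_FILE = /^(licen[sc]e|copying)(\.(md|txt))?$/i;

/** Pure function: same input always yields the same result. */
function analyzeLicense(input: LicenseAnalyzerInput): AnalyzerResult {
  const licenseFile = input.treePaths.find((path) => LICENSE_FILE.test(path));
  if (licenseFile) {
    return {
      score: 100,
      maxScore: 100,
      findings: [`Found ${licenseFile}`],
      strengths: ["Repository declares a license"],
      suggestions: [],
    };
  }
  return {
    score: 0,
    maxScore: 100,
    findings: ["No license file at the repository root"],
    strengths: [],
    suggestions: ["Add a LICENSE file (for example, MIT)"],
  };
}
```

Keeping the analyzer free of I/O, as the existing analyzers are described, means it can be unit-tested with plain arrays of paths and preserves the determinism of the scoring pipeline.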

License

This project is open source and available under the MIT License.


Built with Next.js, TypeScript, and a passion for engineering quality.
