The country-based leaderboard can be a valuable feature for DevImpact. It can help users discover impactful open-source developers by region and make the project more useful for exploring public GitHub activity.
This discussion is about how we can build the leaderboard in a scalable, reliable, and production-ready way.
The main goal is to avoid calculating scores for many users during page load or inside a public API request. Leaderboard scoring can be expensive because each GitHub user may require multiple GitHub API calls for repositories, pull requests, issues, and discussions.
Main problem
A country leaderboard can contain many users.
If we calculate all scores live when someone opens a country page, this can cause several problems:
Too many GitHub API requests
GitHub rate-limit issues
GitHub GraphQL resource-limit issues
Slow page loads
Vercel/serverless timeout risk
Expensive repeated requests
Poor user experience
Possible abuse of public scoring endpoints
For example, if one country has 200 users and each user requires multiple GitHub API calls, one page visit could trigger hundreds or thousands of GitHub requests.
That is not safe for production.
Core principle
The leaderboard page should read cached results only.
It should not calculate scores live.
The score calculation should happen separately in a scheduled/background process.
Recommended flow:
Scheduled job / worker / GitHub Actions / VPS cron
→ fetch country users
→ calculate scores gradually
→ store results in cache/database
/api/leaderboard/[country]
→ receive only the country slug
→ read cached leaderboard data
→ return paginated results
/leaderboard/[country]
→ display cached leaderboard results
This keeps the UI fast, protects GitHub API limits, and gives us more control over score refreshes.
Proposed architecture
1. Country leaderboard page
Route:
/leaderboard/[country]
Responsibilities:
Validate the country slug
Fetch leaderboard data from /api/leaderboard/[country]
Show loading, empty, error, and cache-not-ready states
Display leaderboard rows
Support pagination
Show metadata such as lastUpdatedAt and scoreVersion
The page should not send a list of usernames to the scoring API.
Avoid replacing good cache with partial failed data
We should avoid overwriting a working leaderboard with incomplete results if a refresh job fails halfway.
Recommended flow:
1. Read current active leaderboard cache.
2. Build new results in a temporary cache key.
3. Complete scoring for the country.
4. Validate the final result.
5. Replace the active cache only if the refresh succeeds.
6. If the refresh fails, keep the old cache.
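The steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the in-memory `Map` stands in for Redis or a database, and the key names (`:active`, `:temp`), the `isValid` rule, and the `refreshCountry` helper are all assumptions made for the example.

```typescript
// Hypothetical in-memory stand-in for Redis or a database.
type LeaderboardEntry = { username: string; finalScore: number };

const cache = new Map<string, LeaderboardEntry[]>();

// Assumed key names; the real naming scheme is still to be decided.
const activeKey = (country: string) => `leaderboard:${country}:active`;
const tempKey = (country: string) => `leaderboard:${country}:temp`;

// Assumed validation rule: a non-empty list where every score is a finite number.
function isValid(entries: LeaderboardEntry[]): boolean {
  return entries.length > 0 && entries.every((e) => Number.isFinite(e.finalScore));
}

// Builds new results under a temporary key and promotes them only on success.
// Returns true if the active cache was replaced, false if the old cache was kept.
function refreshCountry(country: string, build: () => LeaderboardEntry[]): boolean {
  try {
    const fresh = build(); // score the whole country
    cache.set(tempKey(country), fresh); // stage the new results
    if (!isValid(fresh)) return false; // validation failed: keep the old cache
    cache.set(activeKey(country), fresh); // promote to the active key
    return true;
  } catch {
    return false; // refresh failed: the active key was never touched
  } finally {
    cache.delete(tempKey(country)); // always clean up the staging key
  }
}
```

Because the active key is only written in the last step, a crash or a thrown error anywhere earlier leaves readers on the previous leaderboard.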
Duplicate usernames
Country source data may contain duplicate usernames.
Before scoring, usernames should be normalized:
trim
lowercase
remove duplicates
This avoids duplicated GitHub API calls and duplicated leaderboard rows.
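A minimal sketch of that normalization step. The function name `normalizeUsernames` is hypothetical, and skipping entries that are empty after trimming is an extra assumption not stated above.

```typescript
// Trims, lowercases, and de-duplicates usernames while preserving first-seen order.
function normalizeUsernames(raw: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const name of raw) {
    const normalized = name.trim().toLowerCase();
    // Assumption: blank entries are dropped rather than scored.
    if (normalized === "" || seen.has(normalized)) continue;
    seen.add(normalized);
    result.push(normalized);
  }
  return result;
}
```

For example, `normalizeUsernames([" Alice", "alice ", "BOB", ""])` yields `["alice", "bob"]`, so the job scores each account once.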
Rate-limit and concurrency control
The scheduled job should avoid running too many GitHub requests at the same time.
Recommended protections:
Limit concurrency
Add delay between batches if needed
Retry failed requests carefully
Stop safely when GitHub rate limits are close
Store progress/logs
Avoid refreshing the same country twice at the same time
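One simple way to limit concurrency is to process users in fixed-size batches, as in the sketch below. `chunk` and `runInBatches` are hypothetical helpers; retries, delays between batches, and rate-limit checks are left out to keep the example short.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `worker` on every item, with at most `concurrency` calls in flight at once.
// Note: if a worker rejects, the whole run rejects, so workers should catch
// their own errors and return a failure value instead.
async function runInBatches<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, concurrency)) {
    // Each batch starts only after the previous one has fully finished.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Batching is less efficient than a sliding-window pool, because each batch waits for its slowest request, but it is easy to reason about and makes it simple to insert a pause or a rate-limit check between batches.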
A lock key can help:
leaderboard:refresh-lock:{country}
Example:
leaderboard:refresh-lock:yemen
This prevents duplicate jobs from scoring the same country at the same time.
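A minimal sketch of such a lock, using an in-memory `Map` as a stand-in for the real store. The helper names and the expiry (TTL) behaviour are assumptions; the TTL is there so that a crashed job cannot hold the lock forever. With Redis, the same idea is commonly expressed with `SET` using the `NX` and `PX` options.

```typescript
// Lock key -> expiry timestamp in milliseconds (in-memory stand-in for a shared store).
const locks = new Map<string, number>();

const lockKey = (country: string) => `leaderboard:refresh-lock:${country}`;

// Returns true if the lock was acquired, false if another job still holds it.
function acquireLock(country: string, ttlMs: number, now: number = Date.now()): boolean {
  const key = lockKey(country);
  const expiresAt = locks.get(key);
  if (expiresAt !== undefined && expiresAt > now) return false; // still held
  locks.set(key, now + ttlMs); // free or expired: take it
  return true;
}

// Releases the lock when the refresh finishes.
function releaseLock(country: string): void {
  locks.delete(lockKey(country));
}
```

An in-memory map only protects a single process. Jobs running on different machines would need the lock in a shared store.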
GitHub fetch limits
For normal scoring, we should keep GitHub fetch limits reasonable.
Suggested temporary limits:
PRs: 100
Issues: 100
Discussions: 50
These values are safer than fetching very large amounts for every user.
Later, we can support deeper fetching in scheduled jobs, where we can control rate limits and cache results.
Difference between compare scoring and leaderboard scoring
The normal compare feature and the leaderboard feature should not be treated exactly the same.
Compare feature
Used when a user compares two GitHub profiles.
Recommended behavior:
- Real-time scoring is acceptable
- Usually only 2 users
- Should be cached
- Should stay fast
Leaderboard feature
Used to rank many users in a country.
Recommended behavior:
- Should not score live
- Should use cached results
- Should be refreshed in background
- Should support pagination
- Should protect GitHub rate limits
This separation is important.
Proposed API design
Get country leaderboard
GET /api/leaderboard/[country]?page=1&pageSize=25
Reads cached results only.
Does not calculate scores live.
Refresh country leaderboard
This should not be public.
Possible options:
POST /api/internal/leaderboard/[country]/refresh
or a direct script:
pnpm leaderboard:refresh yemen
or:
pnpm leaderboard:refresh --all
This refresh logic can be used by GitHub Actions, a VPS cron job, or another worker.
Suggested implementation phases
Phase 1: Safe leaderboard foundation
Goal: keep the UI/routes safe while we prepare the backend architecture.
Tasks:
Disable live country scoring
Keep the UI/routes
Add temporary “leaderboard data will be available soon” state
Reduce GitHub fetch limits
Localize the header link
Make the header link responsive
Validate country slugs
Phase 2: Cached leaderboard API
Goal: make the country page read from cached data.
Tasks:
Add /api/leaderboard/[country]
Add cache read logic
Add not_ready response
Add pagination
Add lastUpdatedAt
Add scoreVersion
Add empty/error UI states
Phase 3: Scheduled score refresh
Goal: calculate country scores outside page requests.
Tasks:
Extract leaderboard refresh service
Add country user fetching
Normalize usernames
Score users in batches
Store results in cache
Track failed users
Add deterministic sorting
Add refresh metadata
Avoid replacing good cache with failed partial data
Phase 4: Production hardening
Goal: make the leaderboard safe and reliable in production.
Tasks:
Add rate limiting
Add concurrency control
Add refresh locks
Add retry logic
Add logs
Add admin/manual refresh command
Add tests for cache, pagination, failures, and stale data
Open questions
We still need to decide:
Where should cached leaderboard data be stored?
Redis
database
JSON files generated by scheduled jobs
another storage option
How often should leaderboards refresh?
daily
weekly
manually
based on country size
Should all countries refresh equally?
large countries may need slower/batched refreshes
smaller countries can refresh faster
Should the leaderboard show all users or only top users?
top 25
top 50
top 100
paginated full list
Should failed users be visible in admin/debug output only, or also exposed in API metadata?
Final direction
The leaderboard should be built around this principle:
Do expensive GitHub scoring in the background.
Serve leaderboard pages from cache.
Keep public API requests fast and safe.
This gives DevImpact a much better production foundation and avoids turning the leaderboard into an expensive live GitHub API workload.
Building a production-ready country leaderboard
2. Public leaderboard API
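Route:
/api/leaderboard/[country]
Example:
GET /api/leaderboard/[country]?page=1&pageSize=25
Responsibilities:
Receive only the country slug
Read cached leaderboard data
Return paginated results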
Example response:
{
  "success": true,
  "status": "ready",
  "country": "yemen",
  "page": 1,
  "pageSize": 25,
  "total": 120,
  "lastUpdatedAt": "2026-06-13T10:00:00Z",
  "scoreVersion": "v1.0.0",
  "users": [
    {
      "rank": 1,
      "username": "example",
      "name": "Example User",
      "avatarUrl": "https://github.com/example.png",
      "profileUrl": "https://github.com/example",
      "repoScore": 120.5,
      "prScore": 90.2,
      "contributionScore": 12.4,
      "finalScore": 87.6
    }
  ]
}
Cache-not-ready state
When a country has not been scored yet, the API should not throw an error.
It should return a clean response:
{
  "success": true,
  "status": "not_ready",
  "country": "yemen",
  "users": [],
  "message": "Leaderboard data is not available yet."
}
The UI can then show a temporary “leaderboard data will be available soon” state.
This is better than showing a failure state.
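The two response shapes can be modelled as a discriminated union on `status`, as in this sketch. The type names, the trimmed-down user shape, and the `buildResponse` helper are assumptions for illustration; pagination is omitted here.

```typescript
// Trimmed-down user shape; the real response carries more fields.
type LeaderboardUser = { rank: number; username: string; finalScore: number };

type CachedLeaderboard = {
  lastUpdatedAt: string;
  scoreVersion: string;
  users: LeaderboardUser[];
};

type LeaderboardResponse =
  | {
      success: true;
      status: "ready";
      country: string;
      total: number;
      lastUpdatedAt: string;
      scoreVersion: string;
      users: LeaderboardUser[];
    }
  | { success: true; status: "not_ready"; country: string; users: LeaderboardUser[]; message: string };

// Turns a cache lookup result into an API response without ever scoring live.
function buildResponse(country: string, cached: CachedLeaderboard | undefined): LeaderboardResponse {
  if (!cached) {
    // Cache miss is a normal state, not an error.
    return {
      success: true,
      status: "not_ready",
      country,
      users: [],
      message: "Leaderboard data is not available yet.",
    };
  }
  return {
    success: true,
    status: "ready",
    country,
    total: cached.users.length,
    lastUpdatedAt: cached.lastUpdatedAt,
    scoreVersion: cached.scoreVersion,
    users: cached.users,
  };
}
```

On the client, checking `response.status` narrows the type, so the UI can only read `lastUpdatedAt` and `scoreVersion` once it has confirmed the data is ready.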
Separate score calculation from the API endpoint
The leaderboard endpoint should not calculate country scores.
We should extract the leaderboard scoring logic into reusable service functions.
Example structure:
Possible responsibilities:
This separation lets us use the scoring logic from different places, such as GitHub Actions, a VPS cron job, or another worker.
Scheduled scoring job
The leaderboard should be refreshed in a scheduled job instead of during page load.
Possible options:
Option 1: GitHub Actions schedule
Good for early versions.
Pros:
Cons:
Option 2: VPS cron job
Good for more control.
Pros:
Cons:
Option 3: background worker / queue
Best long-term option.
Pros:
Cons:
For the first production version, GitHub Actions or a simple VPS cron job is probably enough.
Cache key design
Leaderboard cache keys should include the country slug and score version.
Example:
leaderboard:{country}:scoreVersion:{version}
This is important because when the scoring algorithm changes, old leaderboard results may no longer be valid.
We should avoid mixing scores from different scoring versions.
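A small helper keeps key construction in one place. The function name is hypothetical; the key pattern is the one given above.

```typescript
// Builds the cache key for one country and one scoring version.
function leaderboardCacheKey(country: string, version: string): string {
  return `leaderboard:${country}:scoreVersion:${version}`;
}
```

With this scheme, `leaderboardCacheKey("yemen", "v1.0.0")` gives `leaderboard:yemen:scoreVersion:v1.0.0`. Bumping the version changes every key, so readers of the new version simply miss the cache until a refresh runs, rather than seeing scores from the old algorithm.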
Metadata to store with leaderboard results
Each cached leaderboard should include metadata:
{
  "country": "yemen",
  "scoreVersion": "v1.0.0",
  "lastUpdatedAt": "2026-06-13T10:00:00Z",
  "totalUsers": 120,
  "successfulUsers": 115,
  "failedUsers": 5
}
This helps with:
Handling failed users
Some GitHub users may fail during refresh.
Examples:
A single failed user should not fail the whole country leaderboard.
Recommended behavior:
{
  "failedUsers": [
    {
      "username": "example",
      "reason": "not_found"
    }
  ]
}
The refresh job should continue scoring the remaining users.
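The sketch below shows one way to isolate failures per user. It assumes a synchronous `scoreUser` callback to keep the example short, whereas real scoring would be asynchronous. The names `scoreAll`, `ScoredUser`, and `FailedUser` are hypothetical, and using the error message as the `reason` is also an assumption.

```typescript
type ScoredUser = { username: string; finalScore: number };
type FailedUser = { username: string; reason: string };

// Scores every user, recording failures instead of aborting the whole run.
function scoreAll(
  usernames: string[],
  scoreUser: (username: string) => number,
): { users: ScoredUser[]; failedUsers: FailedUser[] } {
  const users: ScoredUser[] = [];
  const failedUsers: FailedUser[] = [];
  for (const username of usernames) {
    try {
      users.push({ username, finalScore: scoreUser(username) });
    } catch (err) {
      // Assumption: the thrown error's message doubles as the failure reason.
      const reason = err instanceof Error ? err.message : "unknown";
      failedUsers.push({ username, reason });
    }
  }
  return { users, failedUsers };
}
```

The lengths of `users` and `failedUsers` map directly onto the `successfulUsers` and `failedUsers` counts in the stored metadata.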
This prevents broken or incomplete data from replacing good data.
Pagination
Country leaderboard results should be paginated.
Example:
Default values:
This prevents returning hundreds or thousands of users in one response.
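A minimal pagination sketch over an already-cached list. The defaults of page 1 and page size 25 mirror the example request used in this discussion; the upper bound of 100 and the fallback behaviour for invalid input are assumptions.

```typescript
// Returns one page of a cached list, clamping invalid or oversized parameters.
function paginate<T>(items: T[], page: number = 1, pageSize: number = 25, maxPageSize: number = 100) {
  // Fall back to defaults when the caller passes zero, negatives, or non-integers.
  const safePage = Number.isInteger(page) && page > 0 ? page : 1;
  const safeSize =
    Number.isInteger(pageSize) && pageSize > 0 ? Math.min(pageSize, maxPageSize) : 25;
  const start = (safePage - 1) * safeSize;
  return {
    page: safePage,
    pageSize: safeSize,
    total: items.length,
    items: items.slice(start, start + safeSize),
  };
}
```

Clamping the page size on the server matters because the query string is user-controlled; without it, `pageSize=100000` would defeat the purpose of paginating.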
Deterministic sorting
Leaderboard sorting should be stable.
Suggested sorting order:
This prevents users with equal scores from changing order randomly between refreshes.
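One possible ordering, shown only as an assumption, is `finalScore` descending with `username` ascending as the tie-breaker. The `sortLeaderboard` name is hypothetical.

```typescript
type Ranked = { username: string; finalScore: number };

// Sorts by score (highest first), then by username so ties always resolve the same way.
function sortLeaderboard(users: Ranked[]): Ranked[] {
  // Copy first so the cached input array is not mutated.
  return [...users].sort((a, b) => {
    if (b.finalScore !== a.finalScore) return b.finalScore - a.finalScore;
    // Plain code-unit comparison: independent of the server's locale settings.
    if (a.username < b.username) return -1;
    if (a.username > b.username) return 1;
    return 0;
  });
}
```

Because usernames are unique after normalization, the comparator never returns 0 for two different users, so the result does not depend on input order.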
Country slug validation
Before reading cache or starting a refresh job, we should validate that the country slug exists in our known country list.
Invalid country slugs should return 404.
Example:
This avoids unnecessary work and gives cleaner behavior.
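A minimal validation sketch. The contents of `KNOWN_COUNTRIES` are placeholders (only "yemen" appears in this discussion), and `resolveCountry` is a hypothetical helper that a route handler could call before touching the cache.

```typescript
// Placeholder list; the real project would load its own known country slugs.
const KNOWN_COUNTRIES = new Set<string>(["yemen", "egypt", "germany"]);

type CountryLookup = { status: 200; country: string } | { status: 404 };

// Accepts only slugs from the known list; everything else maps to 404.
function resolveCountry(slug: string): CountryLookup {
  return KNOWN_COUNTRIES.has(slug) ? { status: 200, country: slug } : { status: 404 };
}
```

Checking against an allowlist, rather than pattern-matching the slug, also means arbitrary user input is never used to build a cache key.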