feat: Replace Kickstarter GraphQL/REST APIs with ScrapingBee #11
Merged
Replace non-functional Kickstarter GraphQL and REST API clients with ScrapingBee-based implementation.

New components:
- ScrapingBeeClient: HTTP client with rate limiting and retry logic
- KickstarterScrapingService: Main service implementing Search, DiscoverCampaigns, and FetchCategories
- HTML parser: Parse Kickstarter discover pages (5 credits/page)
- Hardcoded categories: Zero-cost category data

Changes:
- Update handlers to use KickstarterScrapingService
- Update cron service for nightly batch crawl
- Add ScrapingBee configuration (API key, max concurrent)
- Add goquery dependency for HTML parsing

Cost optimization:
- Nightly crawl uses HTML parsing (5 credits × 150 pages = 750/night)
- Search uses AI extraction (10 credits/request)
- Categories are hardcoded (0 credits)
- Estimated: ~52K credits/month

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
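The "rate limiting and retry logic" in ScrapingBeeClient could look roughly like the sketch below. It is not the code from this PR: the names `limiter`, `withRetry`, and `errTransient` are hypothetical, the limiter is a plain buffered-channel semaphore sized by the "max concurrent" setting, and exponential backoff is an assumed retry policy.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical stand-in for a retryable upstream failure.
var errTransient = errors.New("transient failure")

// limiter caps the number of in-flight requests (assumed to mirror "max concurrent").
type limiter chan struct{}

func newLimiter(maxConcurrent int) limiter {
	return make(limiter, maxConcurrent)
}

// do runs fn while holding one of the limiter's slots.
func (l limiter) do(fn func() error) error {
	l <- struct{}{}
	defer func() { <-l }()
	return fn()
}

// withRetry calls fn up to attempts times, doubling the delay after each failure.
func withRetry(attempts int, baseDelay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := baseDelay
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	lim := newLimiter(2)
	calls := 0
	err := lim.do(func() error {
		return withRetry(3, 10*time.Millisecond, func() error {
			calls++
			if calls < 3 {
				return errTransient
			}
			return nil
		})
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Holding the limiter slot for the whole retry loop keeps a failing request from multiplying concurrent load while it backs off.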
Fixes three critical issues identified in code review:

1. P1: Fix pagination - preserve cursor state across requests
   - Parse page number from cursor format "page:N"
   - Build discover URL with correct page parameter
   - Generate NextCursor for full pages
   - iOS app will now correctly paginate instead of showing duplicates

2. P1: Use canonical project URLs from Kickstarter
   - Prioritize urls.web.project over slug-based reconstruction
   - Prevents 404s for "Back this project" and share actions
   - AI query now explicitly requests project_url field
   - parseAIResponse uses full URL when available

3. P1: Fail fast if SCRAPINGBEE_API_KEY is missing
   - Validate API key at startup before service initialization
   - Log fatal error with clear message if key is not set
   - Prevents 500s in production after deployment
   - Ensures proper configuration before accepting traffic

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
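To illustrate the "page:N" cursor handling from fix 1, here is a minimal sketch. The function names, the fallback to page 1 for an empty or malformed cursor, and the discover URL path are assumptions for illustration, not taken from the PR's code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

const cursorPrefix = "page:"

// parseCursor returns the page number encoded in a "page:N" cursor.
// An empty or malformed cursor falls back to page 1 (an assumption of this sketch).
func parseCursor(cursor string) int {
	if !strings.HasPrefix(cursor, cursorPrefix) {
		return 1
	}
	n, err := strconv.Atoi(strings.TrimPrefix(cursor, cursorPrefix))
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// discoverURL builds a discover-page URL for the given page.
// The base path is an assumed example, not confirmed by the PR.
func discoverURL(page int) string {
	q := url.Values{}
	q.Set("page", strconv.Itoa(page))
	return "https://www.kickstarter.com/discover/advanced?" + q.Encode()
}

// nextCursor reports the cursor for the following page. A page that came back
// short of pageSize is treated as the last one, so no cursor is produced.
func nextCursor(page, returned, pageSize int) (string, bool) {
	if returned < pageSize {
		return "", false
	}
	return cursorPrefix + strconv.Itoa(page+1), true
}

func main() {
	page := parseCursor("page:3")
	fmt.Println(page)              // prints 3
	fmt.Println(discoverURL(page)) // prints https://www.kickstarter.com/discover/advanced?page=3
	c, ok := nextCursor(page, 20, 20)
	fmt.Println(c, ok) // prints "page:4 true"
}
```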
✅ P1 Issues Fixed

Fixed all three priority-1 issues identified in code review:

1. ✅ Pagination Now Works Correctly
Issue: Cursor was ignored, always returning page 1, causing iOS app to show duplicates.
Fix:
Result: iOS DiscoverViewModel/SearchView will now properly paginate through results.

2. ✅ Project URLs Are Now Canonical
Issue: Building URLs from slug alone (
Fix:
Result: "Back this project" and share actions will now work correctly.

3. ✅ Fail Fast on Missing API Key
Issue: Service would start without API key, pass health checks, then return 500s after deployment.
Fix:
Result: Deployment will fail early if API key is not provisioned, preventing production issues.

All changes tested and code compiles successfully. Ready for deployment once |
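A startup check of the kind described in item 3 might be sketched as follows. This is an illustration under assumptions, not the PR's code: `requireAPIKey` and `errMissingAPIKey` are hypothetical names, and the lookup function is injected so the sketch runs without a real environment. In an actual `main`, one would presumably pass `os.Getenv` and call `log.Fatalf` on error before initializing any service.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errMissingAPIKey is a hypothetical sentinel error for this sketch.
var errMissingAPIKey = errors.New("SCRAPINGBEE_API_KEY is not set")

// requireAPIKey reads the key through getenv and rejects empty or
// whitespace-only values, so misconfiguration surfaces at startup.
func requireAPIKey(getenv func(string) string) (string, error) {
	key := strings.TrimSpace(getenv("SCRAPINGBEE_API_KEY"))
	if key == "" {
		return "", errMissingAPIKey
	}
	return key, nil
}

func main() {
	// Stand-ins for os.Getenv so the sketch runs anywhere.
	unset := func(string) string { return "" }
	set := func(string) string { return "example-key" }

	_, err := requireAPIKey(unset)
	fmt.Println(err) // prints "SCRAPINGBEE_API_KEY is not set"

	key, err := requireAPIKey(set)
	fmt.Println(key, err) // prints "example-key <nil>"
}
```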
- Emit null (not "") for next_cursor when HasNextPage is false so iOS Optional<String> decodes as nil and hasMore stays false
- Add SCRAPINGBEE_API_KEY secret to ECS task definition secrets in deploy-backend.yml (resolve ARN + inject) to prevent boot failure
- Extract creator.slug from data-project JSON and use creator_slug+slug for fallback URL construction; remove broken single-slug fallback
- Accept creator_slug from AI response struct and build full URL from it; leave ProjectURL empty rather than synthesising a broken URL
- Update AI query to explicitly request creator_slug and project_url fields
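Two of the rules in this commit can be sketched together: serializing next_cursor as JSON null, and building a project URL only when both slugs are known. The struct, the `has_next_page` field name, the helper names, and the `/projects/<creator_slug>/<slug>` URL shape are assumptions for illustration; only `next_cursor`, `creator_slug`, and `project_url` come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// page is a hypothetical response envelope. NextCursor is a pointer without
// omitempty, so a nil value is encoded as JSON null rather than "" or omitted.
type page struct {
	HasNextPage bool    `json:"has_next_page"`
	NextCursor  *string `json:"next_cursor"`
}

// newPage leaves NextCursor nil whenever there is no next page.
func newPage(hasNext bool, cursor string) page {
	p := page{HasNextPage: hasNext}
	if hasNext {
		p.NextCursor = &cursor
	}
	return p
}

// encode marshals a page, returning "" on error to keep the sketch short.
func encode(p page) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

// projectURL prefers the canonical URL, falls back to creator_slug + slug,
// and otherwise returns "" instead of synthesizing a URL that may 404.
func projectURL(canonical, creatorSlug, slug string) string {
	if canonical != "" {
		return canonical
	}
	if creatorSlug != "" && slug != "" {
		return "https://www.kickstarter.com/projects/" +
			url.PathEscape(creatorSlug) + "/" + url.PathEscape(slug)
	}
	return ""
}

func main() {
	fmt.Println(encode(newPage(false, "")))      // {"has_next_page":false,"next_cursor":null}
	fmt.Println(encode(newPage(true, "page:2"))) // {"has_next_page":true,"next_cursor":"page:2"}
	fmt.Println(projectURL("", "some-creator", "some-project"))
	fmt.Println(projectURL("", "", "some-project") == "") // true
}
```

A client decoding `next_cursor` into an optional string then sees no value on the last page, which matches the intent stated in the first bullet.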
- Keep ScrapingBee service (drop graph/REST + Webshare proxy)
- Add hot sort path from develop (DB velocity_24h query)
- Add DB fallback on ScrapingBee error from develop
- Add startup crawl trigger from develop
- Remove ProxyURL / WEBSHARE_PROXY_URL (not used)
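The "DB fallback on ScrapingBee error" pattern can be sketched as a small wrapper that tries one source and falls back to another. Everything here is hypothetical (the `source` interface, `fallbackSource`, the stub type, and the error value); the actual handler wiring in the PR may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errScrapeFailed is a hypothetical error standing in for a ScrapingBee failure.
var errScrapeFailed = errors.New("scrape failed")

type campaign struct{ Name string }

// source is a hypothetical abstraction over "where campaigns come from".
type source interface {
	Discover(page int) ([]campaign, error)
}

// fallbackSource tries primary first and consults fallback only on error.
type fallbackSource struct {
	primary  source
	fallback source
}

func (f fallbackSource) Discover(page int) ([]campaign, error) {
	items, err := f.primary.Discover(page)
	if err == nil {
		return items, nil
	}
	fbItems, fbErr := f.fallback.Discover(page)
	if fbErr != nil {
		return nil, fmt.Errorf("primary: %v; fallback: %w", err, fbErr)
	}
	return fbItems, nil
}

// stub returns fixed results; it stands in for both the scraper and the DB.
type stub struct {
	items []campaign
	err   error
}

func (s stub) Discover(int) ([]campaign, error) { return s.items, s.err }

func main() {
	src := fallbackSource{
		primary:  stub{err: errScrapeFailed},
		fallback: stub{items: []campaign{{Name: "cached campaign"}}},
	}
	items, err := src.Discover(1)
	fmt.Println(len(items), err) // prints "1 <nil>"
}
```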
Summary
Replaces non-functional Kickstarter GraphQL and REST API clients with ScrapingBee-based implementation. The original APIs stopped working, requiring a complete reimplementation using web scraping.
Changes
New Components
ScrapingBeeClient (scrapingbee_client.go) - HTTP client with rate limiting and retry logic
KickstarterScrapingService (kickstarter_scraping.go) - Main service implementing:
- Search() - AI extraction for user searches (10 credits)
- DiscoverCampaigns() - HTML parsing for batch crawl (5 credits)
- FetchCategories() - Hardcoded categories (0 credits)
HTML Parser (kickstarter_parser.go) - Extracts campaign data from discover pages using goquery
Hardcoded Categories (categories.go) - 15 root Kickstarter categories

Modified Components
- cmd/api/main.go - Wire up ScrapingBee service instead of graph/REST clients
- internal/service/cron.go - Updated to use scraping service for nightly crawl
- internal/handler/campaigns.go - Updated handlers to use new service
- internal/config/config.go - Added ScrapingBee configuration (API key, max concurrent)
- .env.example - Added SCRAPINGBEE_API_KEY and SCRAPINGBEE_MAX_CONCURRENT

Testing
✅ Categories endpoint - Returns 15 hardcoded categories (0 credits)
✅ Campaigns listing - Successfully fetches campaigns from discover pages (5 credits/page)
✅ Compilation - Code builds successfully
✅ API integration - Tested with real ScrapingBee API
Test Results
Cost Optimization
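The credit figures quoted in the commit message (5 credits × 150 pages = 750/night, 10 credits per search) can be checked with a short calculation. The 30-day month and the search count below are assumptions: the source gives only the ~52K/month estimate, not the search volume behind it.

```go
package main

import "fmt"

const (
	creditsPerHTMLPage = 5   // discover page via HTML parsing
	creditsPerSearch   = 10  // search via AI extraction
	pagesPerNight      = 150 // nightly crawl size
)

// nightlyCrawlCredits is the cost of one nightly batch crawl.
func nightlyCrawlCredits() int {
	return creditsPerHTMLPage * pagesPerNight
}

// monthlyCredits totals crawl and search cost; days and searches are inputs
// because the source does not state them.
func monthlyCredits(days, searches int) int {
	return days*nightlyCrawlCredits() + searches*creditsPerSearch
}

func main() {
	fmt.Println(nightlyCrawlCredits()) // prints 750
	fmt.Println(monthlyCredits(30, 0)) // prints 22500
	// Hypothetical search volume chosen only to show what ~52K would imply.
	fmt.Println(monthlyCredits(30, 2950)) // prints 52000
}
```

Under these assumptions the crawl alone accounts for about 22.5K credits per month, so the remainder of the estimate would correspond to search traffic.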
Dependencies Added
github.com/PuerkitoBio/goquery v1.11.0 - HTML parsing
Breaking Changes
None - maintains same API interface and response format
Future Improvements
Test Plan
🤖 Generated with Claude Code