JobLoop is an intelligent, high-performance web scraping platform that automates the discovery of startup companies and their job openings from job provider portals. Built with concurrency at its core, JobLoop simultaneously scrapes company data, discovers testimonial images using AI vision, and aggregates job listings—all while serving real-time data through a REST API.
JobLoop crawls startup directories like Y Combinator and Peerlist to discover seed companies. For each seed company, it scrapes testimonial images from their websites, uses Anthropic's Claude Vision AI to extract company names mentioned in testimonials, then uses Claude Search to discover URLs for those companies, creating a growing network of discovered companies. Simultaneously, it scrapes job postings from all companies. All data is stored in PostgreSQL and exposed via a clean REST API.
- Multi-Source Scraping: Seeds initial company discovery from Y Combinator and Peerlist
- Recursive Company Discovery: Extracts new companies from testimonials, creating a self-expanding network
- Concurrent Processing: Scrapes jobs and testimonials in parallel using Go's goroutines
- AI-Powered Vision: Uses Anthropic's Claude Vision API to analyze testimonial images and extract company names
- Claude Search Integration: Leverages Anthropic's Claude Search to find URLs for discovered companies
- Headless Browser Automation: Playwright-powered scraping handles JavaScript-rendered content
- RESTful API: Query companies, jobs, and statistics through well-defined endpoints
- PostgreSQL Storage: Robust relational database with proper indexing and constraints
- Docker Support: Containerized deployment with all dependencies included
- Structured Logging: JSON-based logging with zerolog for production monitoring
```
                      JobLoop Scraper

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│ Y Combinator │   │   Peerlist   │   │   HTTP API   │
│   Scraper    │   │   Scraper    │   │    Server    │
└──────┬───────┘   └──────┬───────┘   └──────────────┘
       │                  │
       └────────┬─────────┘
                ▼
    ┌───────────────────────┐
    │  Playwright Browser   │
    │      (Chromium)       │
    └───────────┬───────────┘
                ▼
    ┌───────────────────────┐
    │  SEED COMPANIES (DB)  │◄──────────────────┐
    │   (Root Companies)    │                   │
    └───────────┬───────────┘                   │
                │                               │
       ┌────────┴────────┐                      │
       ▼                 ▼                      │
┌─────────────┐   ┌──────────────┐              │
│     Job     │   │ Testimonial  │              │
│   Scraper   │   │   Scraper    │              │
└──────┬──────┘   └──────┬───────┘              │
       │                 ▼                      │
       │          ┌──────────────┐              │
       │          │  Anthropic   │              │
       │          │  Vision API  │              │
       │          │ (Extract Co.)│              │
       │          └──────┬───────┘              │
       │                 ▼                      │
       │          ┌──────────────┐              │
       │          │  Anthropic   │              │
       │          │ Claude Search│              │
       │          │  (Find URLs) │              │
       │          └──────┬───────┘              │
       │                 │                      │
       │                 └──────────────────────┘
       │                  (New Seed Companies)
       ▼
┌─────────────────────────┐
│   PostgreSQL Database   │
│  - seed_companies       │
│  - jobs                 │
│  - testimonial_companies│
└─────────────────────────┘
```
- Language: Go 1.25
- Database: PostgreSQL with GORM ORM
- Browser Automation: Playwright (Chromium)
- AI/ML:
  - Anthropic Claude Vision API (testimonial analysis)
  - Anthropic Claude Search (company URL discovery)
- Logging: zerolog with file rotation (lumberjack)
- Containerization: Docker with multi-stage builds
- Concurrency: Native Go goroutines, sync primitives, and errgroups
Before setting up JobLoop locally, ensure you have the following installed:
- Go 1.25+
- PostgreSQL 14+
- Docker & Docker Compose (optional, for containerized setup)
- Git
You'll need to obtain the following API key:
Anthropic API Key - For Claude Vision AI and Claude Search
- Sign up at Anthropic Console
- Create a new API key
- This single key is used for both Vision API (testimonial analysis) and Search API (URL discovery)
```bash
git clone https://github.com/chandhuDev/JobLoop.git
cd JobLoop
```

Create the database:

```bash
psql -U postgres
CREATE DATABASE jobloop;
\q
```

Or run PostgreSQL in Docker instead:

```bash
docker run -d \
  --name jobloop-postgres \
  -e POSTGRES_PASSWORD=yourpassword \
  -e POSTGRES_DB=jobloop \
  -p 5432:5432 \
  postgres:15-alpine
```

Create a .env file in the project root:
```bash
cp .env.example .env  # If example exists, otherwise create manually
```

Edit .env with your configuration:
```env
# Required API Key
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Database Configuration
DB_LOCAL_HOST=localhost
DB_USER=postgres
DB_PASSWORD=yourpassword
DB_NAME=jobloop

# Optional: For Docker deployments
DB_HOST=jobloop-postgres
DB_PORT=5432
```

Install Go dependencies:

```bash
go mod download
```

Install the Playwright browser:

```bash
go run github.com/playwright-community/playwright-go/cmd/playwright install --with-deps chromium
```

This downloads the Chromium browser and required system dependencies.
```bash
# Build the application
go build -o bin/jobloop ./cmd/api/

# Run it
./bin/jobloop
```

Or run directly:

```bash
go run ./cmd/api/main.go
```

You should see output like:

```json
{"level":"info","time":"2026-02-03T...","message":"seed company scraper started"}
{"level":"info","time":"2026-02-03T...","message":"Starting HTTP server","addr":":8081"}
{"level":"info","time":"2026-02-03T...","message":"worker started for ycombinator"}
```
Test the API:
```bash
# Health check
curl http://localhost:8081/health

# Get statistics
curl http://localhost:8081/api/state

# List companies (after scraping completes)
curl http://localhost:8081/api/companies?limit=10

# List jobs
curl http://localhost:8081/api/jobs?limit=10
```

To run JobLoop in Docker:

```bash
# Build the Docker image
docker build -t jobloop:latest .

# Run the container
docker run -d \
  --name jobloop \
  -p 8081:8081 \
  -e ANTHROPIC_API_KEY=your_key \
  -e DB_HOST=your_postgres_host \
  -e DB_USER=postgres \
  -e DB_PASSWORD=yourpassword \
  -e DB_NAME=jobloop \
  jobloop:latest
```

Create docker-compose.yml:
```yaml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    container_name: jobloop-postgres
    environment:
      POSTGRES_DB: jobloop
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: yourpassword
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  jobloop:
    build: .
    container_name: jobloop-app
    ports:
      - "8081:8081"
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      DB_HOST: postgres
      DB_PORT: 5432
      DB_USER: postgres
      DB_PASSWORD: yourpassword
      DB_NAME: jobloop
    depends_on:
      postgres:
        condition: service_healthy
    restart: unless-stopped

volumes:
  postgres_data:
```

Run with:

```bash
docker-compose up -d
```

The HTTP server runs on port 8081 and provides the following endpoints:
`GET /health`

Response:

```json
{
  "status": "ok",
  "time": "2026-02-03T10:30:00Z"
}
```

`GET /api/state`

Response:

```json
{
  "companies": 50,
  "jobs": 1247,
  "timestamp": "2026-02-03T10:30:00Z"
}
```

`GET /api/companies?limit=50&offset=0`

Query Parameters:

- `limit` (optional): Number of results (1-100, default: 50)
- `offset` (optional): Pagination offset (default: 0)
Response:
```json
{
  "data": [
    {
      "id": 1,
      "company_name": "Acme Corp",
      "company_url": "https://acme.com",
      "visited": true,
      "testimonial_scraped": true,
      "job_scraped": true,
      "created_at": "2026-02-03T10:00:00Z"
    }
  ],
  "total": 50,
  "limit": 50,
  "offset": 0
}
```

`GET /api/jobs?limit=50&offset=0&company_id=1`

Query Parameters:

- `limit` (optional): Number of results (1-100, default: 50)
- `offset` (optional): Pagination offset (default: 0)
- `company_id` (optional): Filter by specific company
Response:
```json
{
  "data": [
    {
      "id": 1,
      "seed_company_id": 1,
      "job_title": "Senior Software Engineer",
      "job_url": "https://acme.com/careers/senior-swe",
      "created_at": "2026-02-03T10:15:00Z"
    }
  ],
  "total": 25,
  "limit": 50,
  "offset": 0
}
```

```
JobLoop/
├── cmd/
│   └── api/
│       └── main.go                  # Application entry point
├── internal/
│   ├── config/                      # Configuration files
│   ├── database/
│   │   └── database_service.go      # Database connection & setup
│   ├── interfaces/                  # Interface definitions
│   ├── logger/
│   │   └── logger.go                # Structured logging setup
│   ├── models/                      # Data models & DTOs
│   ├── repository/                  # Database operations
│   │   ├── job_repo.go
│   │   ├── seed_company_repo.go
│   │   └── testimonial_repo.go
│   ├── schema/
│   │   └── schema.go                # GORM database schemas
│   └── service/
│       ├── browser_service.go       # Playwright browser management
│       ├── error_service.go         # Error handling
│       ├── http_handler_service.go  # API endpoints
│       ├── scraper_service.go       # Job scraping logic
│       ├── search_service.go        # Claude Search integration
│       ├── seed_company_service.go  # Company scraping
│       ├── testimonial_service.go   # Testimonial scraping
│       └── vision_service.go        # Claude Vision AI integration
├── logs/                            # Application logs (gitignored)
├── .env                             # Environment variables (gitignored)
├── .gitignore
├── Dockerfile                       # Docker build configuration
├── go.mod                           # Go module definition
├── go.sum                           # Dependency checksums
└── README.md                        # This file
```
JobLoop starts by scraping seed companies from:
- Y Combinator Companies Directory (`/companies`)
- Peerlist Jobs Board (`/jobs`)
For each source, it:
- Uses Playwright to navigate to the listing page
- Waits for JavaScript-rendered content to load
- Extracts company names and URLs
- Stores companies in PostgreSQL as seed companies with unique constraints
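The extraction step can be sketched as follows. In the real scraper, Playwright evaluates a CSS selector against the rendered DOM; this stdlib-only sketch runs an equivalent pattern over a static HTML snippet, and `extractCompanyLinks` is an illustrative name, not the actual function:

```go
package main

import (
	"fmt"
	"regexp"
)

// extractCompanyLinks pulls company slugs out of rendered HTML using the same
// kind of pattern as the configured selector a[href^="/companies/"]. The real
// scraper queries the live DOM through Playwright; this regexp-based version
// exists only to illustrate the step.
func extractCompanyLinks(html string) map[string]string {
	re := regexp.MustCompile(`<a[^>]+href="(/companies/([^"/]+))"`)
	companies := make(map[string]string) // slug -> URL path
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		companies[m[2]] = m[1]
	}
	return companies
}

func main() {
	html := `<a class="c" href="/companies/acme">Acme</a>
	         <a class="c" href="/companies/globex">Globex</a>`
	fmt.Println(extractCompanyLinks(html))
}
```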
For each seed company, JobLoop concurrently:
- Searches for the company's careers page
- Scrapes available job listings (title, URL)
- Stores jobs with a composite unique index on `(seed_company_id, job_title)`
- Handles missing careers pages gracefully
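The effect of the composite unique index can be illustrated with an in-memory sketch; the real service leans on the database constraint at insert time rather than a helper like this:

```go
package main

import "fmt"

// jobKey mirrors the composite unique index on (seed_company_id, job_title):
// the same title may appear under different companies, but only once per company.
type jobKey struct {
	SeedCompanyID int
	JobTitle      string
}

// dedupeJobs drops duplicates the way the database constraint would.
// Illustrative only; it is not part of the repository code.
func dedupeJobs(jobs []jobKey) []jobKey {
	seen := make(map[jobKey]bool)
	var out []jobKey
	for _, j := range jobs {
		if !seen[j] {
			seen[j] = true
			out = append(out, j)
		}
	}
	return out
}

func main() {
	jobs := []jobKey{
		{1, "Senior Software Engineer"},
		{1, "Senior Software Engineer"}, // duplicate within company 1: dropped
		{2, "Senior Software Engineer"}, // same title, different company: kept
	}
	fmt.Println(len(dedupeJobs(jobs))) // 2
}
```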
In parallel, the testimonial scraper creates a self-expanding company network:
- Scrape Testimonials: For each seed company, scrapes testimonial images from their website
- Extract Companies: Uses Claude Vision API to analyze testimonial images and extract company names mentioned
- Find URLs: Uses Claude Search to discover URLs for the extracted company names
- Create New Seeds: Stores discovered companies as new seed companies in the database
- Repeat: These new seed companies feed back into the job scraping and testimonial discovery cycle
This creates a recursive discovery loop where companies lead to more companies.
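The shape of that loop is a breadth-first traversal with a visited set. A simplified sketch, where the `discover` callback stands in for the entire testimonial-scrape, Vision, and Search stage:

```go
package main

import "fmt"

// discoverAll simulates the recursive discovery loop: each company may yield
// new companies, which feed back into the queue as new seeds, while a visited
// set prevents rescraping. Purely illustrative; the real pipeline persists
// seeds in PostgreSQL instead of an in-memory queue.
func discoverAll(seeds []string, discover func(string) []string) []string {
	visited := make(map[string]bool)
	queue := append([]string(nil), seeds...)
	var order []string
	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		if visited[c] {
			continue
		}
		visited[c] = true
		order = append(order, c)
		queue = append(queue, discover(c)...) // new seeds feed back in
	}
	return order
}

func main() {
	// Toy graph: acme's testimonials mention globex and initech, and
	// globex's testimonials mention acme (a cycle the visited set breaks).
	graph := map[string][]string{
		"acme":   {"globex", "initech"},
		"globex": {"acme"},
	}
	result := discoverAll([]string{"acme"}, func(c string) []string { return graph[c] })
	fmt.Println(result) // [acme globex initech]
}
```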
- Goroutines: Each company scraper runs in its own goroutine
- Wait Groups: Coordinates completion of scraping batches
- Channels: Passes seed company data between scraper stages
- Error Groups: Manages HTTP server and scraper lifecycles
- Atomic Counters: Limits max companies processed per batch (configurable)
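A stripped-down sketch of this pattern, combining goroutines, a WaitGroup, and an atomic batch cap. Names and the cap value are illustrative; the real service additionally wires channels and errgroups, which this sketch omits:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// scrapeBatch launches one goroutine per company, joins the batch with a
// WaitGroup, and uses an atomic counter to cap how many companies a batch
// may process. Illustrative sketch of the concurrency pattern only.
func scrapeBatch(companies []string, maxPerBatch int64) int64 {
	var processed int64
	var wg sync.WaitGroup
	for _, c := range companies {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			// Reserve a slot; give it back and bail out if over the cap.
			if atomic.AddInt64(&processed, 1) > maxPerBatch {
				atomic.AddInt64(&processed, -1)
				return
			}
			_ = name // the real worker would scrape jobs/testimonials here
		}(c)
	}
	wg.Wait()
	return atomic.LoadInt64(&processed)
}

func main() {
	companies := []string{"a", "b", "c", "d", "e"}
	fmt.Println(scrapeBatch(companies, 3)) // at most 3 processed
}
```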
```
┌───────────────────────────────────────────┐
│          Initial Seed Companies           │
│      (Y Combinator, Peerlist, etc.)       │
└─────────────────────┬─────────────────────┘
                      ▼
       ┌─────────────────────────────┐
       │ Store as Seed Companies (DB)│
       └──────────────┬──────────────┘
           ┌──────────┴──────────┐
           ▼                     ▼
  ┌──────────────────┐  ┌──────────────────┐
  │   Job Scraper    │  │   Testimonial    │
  │                  │  │     Scraper      │
  │ - Find careers   │  │                  │
  │ - Extract jobs   │  │ 1. Scrape images │
  │ - Store in DB    │  │ 2. Vision API    │
  └──────────────────┘  │    (extract co.) │
                        │ 3. Claude Search │
                        │    (find URLs)   │
                        └────────┬─────────┘
                                 ▼
                        ┌──────────────────┐
                        │  New Companies   │
                        │  (Back to DB as  │
                        │  Seed Companies) │
                        └────────┬─────────┘
                                 │
                                 └─────► Cycle Repeats
```
Edit `internal/service/seed_company_service.go` to adjust:

```go
// Y Combinator scraper
const maxCompanies = 50 // Line 123

// Peerlist scraper (in UploadSeedCompanyToChannel)
const maxCompanies = 15 // Line 236
```

Modify `cmd/api/main.go` (lines 140-153) to add/remove sources:
```go
SeedCompanyConfigs := []models.SeedCompany{
	{
		Name:     "Y Combinator",
		URL:      "http://www.ycombinator.com/companies",
		Selector: `a[href^="/companies/"]`,
		WaitTime: 3 * time.Second,
	},
	// Add more sources here
}
```

The application auto-migrates three tables:

- `seed_companies` - Root companies and recursively discovered companies
- `jobs` - Job listings scraped from seed companies
- `testimonial_companies` - Companies extracted from testimonials (before becoming seed companies)
```bash
go test ./...
```

Production build:

```bash
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
  -ldflags="-w -s" \
  -o bin/jobloop ./cmd/api/
```

Logs are written to:
- stdout (JSON format for production)
- logs/app.log (file rotation enabled, max 100MB, 30 days retention)
View live logs:
```bash
tail -f logs/app.log | jq
```

We welcome contributions to JobLoop! Here's how to get started:
```bash
# Fork the repository on GitHub, then:
git clone https://github.com/YOUR_USERNAME/JobLoop.git
cd JobLoop
git remote add upstream https://github.com/chandhuDev/JobLoop.git
```

Create a feature branch:

```bash
git checkout -b feature/your-feature-name
```

Follow the Installation & Setup section above.
- Write clean, idiomatic Go code
- Follow existing code structure and naming conventions
- Add comments for complex logic
- Update tests if applicable
```bash
# Run the application
go run ./cmd/api/main.go

# Verify API endpoints
curl http://localhost:8081/health
curl http://localhost:8081/api/companies
```

Commit your changes:

```bash
git add .
git commit -m "feat: add your feature description"
```

Follow Conventional Commits:

- `feat:` - New feature
- `fix:` - Bug fix
- `docs:` - Documentation changes
- `refactor:` - Code refactoring
- `test:` - Adding tests
- `chore:` - Maintenance tasks

Push your branch:

```bash
git push origin feature/your-feature-name
```

Then open a Pull Request on GitHub with:
- Clear description of changes
- Any related issue numbers
- Screenshots (if UI changes)
- Code Style: Run `gofmt` and `golint` before committing
- Error Handling: Always handle errors explicitly, never use `_` unless justified
- Logging: Use structured logging with appropriate levels (Info, Warn, Error)
- Concurrency: Document any goroutines, channels, or sync primitives
- Database: Use GORM best practices, avoid N+1 queries
- API: Maintain backward compatibility for existing endpoints
```bash
make build   # Build binary
make run     # Run application
make test    # Run tests
make docker  # Build Docker image
make clean   # Clean build artifacts
```

Cause: Y Combinator's page is JavaScript-rendered and takes time to load.
Solution: The selector might have changed. Inspect the page manually:
- Visit `https://www.ycombinator.com/companies`
- Right-click on a company → Inspect
- Find the correct CSS selector
- Update it in `cmd/api/main.go` line 144
Cause: Missing system dependencies.
Solution:
```bash
# Re-install with dependencies
go run github.com/playwright-community/playwright-go/cmd/playwright install --with-deps chromium

# On Linux, you may need:
sudo apt-get install -y libgbm1 libnss3 libatk1.0-0
```

Cause: PostgreSQL not running or wrong credentials.
Solution:
```bash
# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Verify credentials in .env match your PostgreSQL setup
# Try connecting manually:
psql -h localhost -U postgres -d jobloop
```

Cause: Too many concurrent requests to Anthropic APIs.
Solution: Reduce the `maxCompanies` limits or add delays between requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Playwright Go - Browser automation
- Anthropic - Claude Vision AI and Claude Search
- GORM - Go ORM library
- zerolog - Structured logging
For issues, questions, or contributions:
- Open an issue on GitHub Issues
- Start a discussion on GitHub Discussions
Built with ❤️ using Go, PostgreSQL, and AI