JobLoop

JobLoop is a concurrent web scraping platform, written in Go, that automates the discovery of startup companies and their job openings from job provider portals. It scrapes company data, analyzes testimonial images with AI vision to discover further companies, and aggregates job listings in parallel, while serving the collected data through a REST API.

Overview

JobLoop crawls startup directories like Y Combinator and Peerlist to discover seed companies. For each seed company, it scrapes testimonial images from their websites, uses Anthropic's Claude Vision AI to extract company names mentioned in testimonials, then uses Claude Search to discover URLs for those companies, creating a growing network of discovered companies. Simultaneously, it scrapes job postings from all companies. All data is stored in PostgreSQL and exposed via a clean REST API.

Key Features

Multi-Source Scraping: Seeds initial company discovery from Y Combinator and Peerlist
Recursive Company Discovery: Extracts new companies from testimonials, creating a self-expanding network
Concurrent Processing: Scrapes jobs and testimonials in parallel using Go's goroutines
AI-Powered Vision: Uses Anthropic's Claude Vision API to analyze testimonial images and extract company names
Claude Search Integration: Leverages Anthropic's Claude Search to find URLs for discovered companies
Headless Browser Automation: Playwright-powered scraping handles JavaScript-rendered content
RESTful API: Query companies, jobs, and statistics through well-defined endpoints
PostgreSQL Storage: Robust relational database with proper indexing and constraints
Docker Support: Containerized deployment with all dependencies included
Structured Logging: JSON-based logging with zerolog for production monitoring

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        JobLoop Scraper                           │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │ Y Combinator │    │  Peerlist    │    │   HTTP API   │      │
│  │   Scraper    │    │   Scraper    │    │   Server     │      │
│  └──────┬───────┘    └──────┬───────┘    └──────────────┘      │
│         │                   │                                   │
│         └───────┬───────────┘                                   │
│                 ▼                                                │
│     ┌───────────────────────┐                                   │
│     │ Playwright Browser    │                                   │
│     │    (Chromium)         │                                   │
│     └───────────────────────┘                                   │
│                 │                                                │
│                 ▼                                                │
│     ┌───────────────────────┐                                   │
│     │  SEED COMPANIES (DB)  │◄────────────────┐                │
│     │  (Root Companies)     │                 │                │
│     └───────────┬───────────┘                 │                │
│                 │                              │                │
│         ┌───────┴────────┐                     │                │
│         ▼                ▼                     │                │
│  ┌─────────────┐  ┌──────────────┐            │                │
│  │     Job     │  │ Testimonial  │            │                │
│  │   Scraper   │  │   Scraper    │            │                │
│  └──────┬──────┘  └──────┬───────┘            │                │
│         │                │                     │                │
│         │                ▼                     │                │
│         │         ┌──────────────┐             │                │
│         │         │   Anthropic  │             │                │
│         │         │ Vision API   │             │                │
│         │         │ (Extract Co.)│             │                │
│         │         └──────┬───────┘             │                │
│         │                │                     │                │
│         │                ▼                     │                │
│         │         ┌──────────────┐             │                │
│         │         │   Anthropic  │             │                │
│         │         │ Claude Search│             │                │
│         │         │ (Find URLs)  │             │                │
│         │         └──────┬───────┘             │                │
│         │                │                     │                │
│         │                └─────────────────────┘                │
│         │                     (New Seed Companies)              │
│         ▼                                                       │
│  ┌─────────────────────────┐                                   │
│  │   PostgreSQL Database   │                                   │
│  │  - seed_companies       │                                   │
│  │  - jobs                 │                                   │
│  │  - testimonial_companies│                                   │
│  └─────────────────────────┘                                   │
└──────────────────────────────────────────────────────────────────┘

Technology Stack

Language: Go 1.25
Database: PostgreSQL with GORM ORM
Browser Automation: Playwright (Chromium)
AI/ML:
- Anthropic Claude Vision API (testimonial analysis)
- Anthropic Claude Search (company URL discovery)
Logging: zerolog with file rotation (lumberjack)
Containerization: Docker with multi-stage builds
Concurrency: Native Go goroutines, sync primitives, and errgroups

Prerequisites

Before setting up JobLoop locally, ensure you have the following installed:

Go 1.25+
PostgreSQL 14+
Docker & Docker Compose (optional, for containerized setup)
Git

API Keys Required

You'll need to obtain the following API key:

Anthropic API Key - For Claude Vision AI and Claude Search

Sign up at Anthropic Console
Create a new API key
This single key is used for both Vision API (testimonial analysis) and Search API (URL discovery)

Installation & Setup

1. Clone the Repository

git clone https://github.com/chandhuDev/JobLoop.git
cd JobLoop

2. Set Up PostgreSQL Database

Option A: Local PostgreSQL

# Create the database
psql -U postgres
CREATE DATABASE jobloop;
\q

Option B: Using Docker

docker run -d \
  --name jobloop-postgres \
  -e POSTGRES_PASSWORD=yourpassword \
  -e POSTGRES_DB=jobloop \
  -p 5432:5432 \
  postgres:15-alpine

3. Configure Environment Variables

Create a .env file in the project root:

cp .env.example .env  # If example exists, otherwise create manually

Edit .env with your configuration:

# Required API Key
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Database Configuration
DB_LOCAL_HOST=localhost
DB_USER=postgres
DB_PASSWORD=yourpassword
DB_NAME=jobloop

# Optional: For Docker deployments
DB_HOST=jobloop-postgres
DB_PORT=5432

4. Install Go Dependencies

go mod download

5. Install Playwright Browsers

go run github.com/playwright-community/playwright-go/cmd/playwright install --with-deps chromium

This downloads the Chromium browser and required system dependencies.

6. Run the Application

# Build the application
go build -o bin/jobloop ./cmd/api/

# Run it
./bin/jobloop

Or run directly:

go run ./cmd/api/main.go

You should see output like:

{"level":"info","time":"2026-02-03T...","message":"seed company scraper started"}
{"level":"info","time":"2026-02-03T...","message":"Starting HTTP server","addr":":8081"}
{"level":"info","time":"2026-02-03T...","message":"worker started for ycombinator"}

7. Verify Installation

Test the API:

# Health check
curl http://localhost:8081/health

# Get statistics
curl http://localhost:8081/api/state

# List companies (after scraping completes)
curl "http://localhost:8081/api/companies?limit=10"

# List jobs
curl "http://localhost:8081/api/jobs?limit=10"

Docker Deployment

Build and Run with Docker

# Build the Docker image
docker build -t jobloop:latest .

# Run the container
docker run -d \
  --name jobloop \
  -p 8081:8081 \
  -e ANTHROPIC_API_KEY=your_key \
  -e DB_HOST=your_postgres_host \
  -e DB_USER=postgres \
  -e DB_PASSWORD=yourpassword \
  -e DB_NAME=jobloop \
  jobloop:latest

Using Docker Compose (Recommended)

Create docker-compose.yml:

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    container_name: jobloop-postgres
    environment:
      POSTGRES_DB: jobloop
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: yourpassword
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  jobloop:
    build: .
    container_name: jobloop-app
    ports:
      - "8081:8081"
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      DB_HOST: postgres
      DB_PORT: 5432
      DB_USER: postgres
      DB_PASSWORD: yourpassword
      DB_NAME: jobloop
    depends_on:
      postgres:
        condition: service_healthy
    restart: unless-stopped

volumes:
  postgres_data:

Run with:

docker-compose up -d

API Reference

The HTTP server runs on port 8081 and provides the following endpoints:

Health Check

GET /health

Response:

{
  "status": "ok",
  "time": "2026-02-03T10:30:00Z"
}

Get Database Statistics

GET /api/state

Response:

{
  "companies": 50,
  "jobs": 1247,
  "timestamp": "2026-02-03T10:30:00Z"
}

List Companies

GET /api/companies?limit=50&offset=0

Query Parameters:

limit (optional): Number of results (1-100, default: 50)
offset (optional): Pagination offset (default: 0)

Response:

{
  "data": [
    {
      "id": 1,
      "company_name": "Acme Corp",
      "company_url": "https://acme.com",
      "visited": true,
      "testimonial_scraped": true,
      "job_scraped": true,
      "created_at": "2026-02-03T10:00:00Z"
    }
  ],
  "total": 50,
  "limit": 50,
  "offset": 0
}

List Jobs

GET /api/jobs?limit=50&offset=0&company_id=1

Query Parameters:

limit (optional): Number of results (1-100, default: 50)
offset (optional): Pagination offset (default: 0)
company_id (optional): Filter by specific company

Response:

{
  "data": [
    {
      "id": 1,
      "seed_company_id": 1,
      "job_title": "Senior Software Engineer",
      "job_url": "https://acme.com/careers/senior-swe",
      "created_at": "2026-02-03T10:15:00Z"
    }
  ],
  "total": 25,
  "limit": 50,
  "offset": 0
}
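
A Go client might decode the response above into typed structs along these lines. The field names follow the JSON shown; the type names Job and JobsPage and the HasMore helper are hypothetical and exist only for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Job mirrors one element of the "data" array in the /api/jobs response.
type Job struct {
	ID            int       `json:"id"`
	SeedCompanyID int       `json:"seed_company_id"`
	JobTitle      string    `json:"job_title"`
	JobURL        string    `json:"job_url"`
	CreatedAt     time.Time `json:"created_at"`
}

// JobsPage mirrors the paginated envelope around the job list.
type JobsPage struct {
	Data   []Job `json:"data"`
	Total  int   `json:"total"`
	Limit  int   `json:"limit"`
	Offset int   `json:"offset"`
}

// HasMore reports whether further results exist beyond this page.
func (p JobsPage) HasMore() bool {
	return p.Offset+len(p.Data) < p.Total
}

// decodeJobsPage parses a raw /api/jobs response body.
func decodeJobsPage(body []byte) (JobsPage, error) {
	var page JobsPage
	if err := json.Unmarshal(body, &page); err != nil {
		return JobsPage{}, fmt.Errorf("decode jobs page: %w", err)
	}
	return page, nil
}

func main() {
	body := []byte(`{"data":[{"id":1,"seed_company_id":1,
		"job_title":"Senior Software Engineer",
		"job_url":"https://acme.com/careers/senior-swe",
		"created_at":"2026-02-03T10:15:00Z"}],
		"total":25,"limit":50,"offset":0}`)
	page, err := decodeJobsPage(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(page.Data[0].JobTitle, page.HasMore())
}
```

To walk all results, a client would repeat the request with offset increased by the number of items received until HasMore returns false.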

Project Structure

JobLoop/
├── cmd/
│   └── api/
│       └── main.go              # Application entry point
├── internal/
│   ├── config/                  # Configuration files
│   ├── database/
│   │   └── database_service.go  # Database connection & setup
│   ├── interfaces/              # Interface definitions
│   ├── logger/
│   │   └── logger.go           # Structured logging setup
│   ├── models/                  # Data models & DTOs
│   ├── repository/              # Database operations
│   │   ├── job_repo.go
│   │   ├── seed_company_repo.go
│   │   └── testimonial_repo.go
│   ├── schema/
│   │   └── schema.go           # GORM database schemas
│   └── service/
│       ├── browser_service.go   # Playwright browser management
│       ├── error_service.go     # Error handling
│       ├── http_handler_service.go  # API endpoints
│       ├── scraper_service.go   # Job scraping logic
│       ├── search_service.go    # Claude Search integration
│       ├── seed_company_service.go  # Company scraping
│       ├── testimonial_service.go   # Testimonial scraping
│       └── vision_service.go    # Claude Vision AI integration
├── logs/                        # Application logs (gitignored)
├── .env                        # Environment variables (gitignored)
├── .gitignore
├── Dockerfile                   # Docker build configuration
├── go.mod                      # Go module definition
├── go.sum                      # Dependency checksums
└── README.md                   # This file

How It Works

1. Seed Company Discovery (Initial Bootstrap)

JobLoop starts by scraping seed companies from:

Y Combinator Companies Directory (/companies)
Peerlist Jobs Board (/jobs)

For each source, it:

Uses Playwright to navigate to the listing page
Waits for JavaScript-rendered content to load
Extracts company names and URLs
Stores companies in PostgreSQL as seed companies with unique constraints

2. Job Scraping (Concurrent)

For each seed company, JobLoop concurrently:

Searches for the company's careers page
Scrapes available job listings (title, URL)
Stores jobs with a composite unique index on (seed_company_id, job_title)
Handles missing careers pages gracefully
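
The effect of a composite unique index on (seed_company_id, job_title) can be illustrated in memory: the same title may appear at different companies, but only once per company. The struct tags below are an assumption about how such an index might be declared with GORM (sharing one index name across two fields produces a composite index); the index name, the jobKey type, and dedupeJobs are hypothetical and not taken from the project's schema.go.

```go
package main

import "fmt"

// Job carries the fields relevant to the uniqueness rule. The gorm tags are
// illustrative only; the project's real schema may differ.
type Job struct {
	ID            uint   `gorm:"primaryKey"`
	SeedCompanyID uint   `gorm:"uniqueIndex:idx_jobs_company_title"`
	JobTitle      string `gorm:"uniqueIndex:idx_jobs_company_title"`
	JobURL        string
}

// jobKey is the in-memory equivalent of the composite index key.
type jobKey struct {
	companyID uint
	title     string
}

// dedupeJobs keeps the first job for each (company, title) pair, which is
// what the database constraint would enforce on insert.
func dedupeJobs(jobs []Job) []Job {
	seen := make(map[jobKey]struct{}, len(jobs))
	out := make([]Job, 0, len(jobs))
	for _, j := range jobs {
		k := jobKey{companyID: j.SeedCompanyID, title: j.JobTitle}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, j)
	}
	return out
}

func main() {
	jobs := []Job{
		{SeedCompanyID: 1, JobTitle: "Engineer"},
		{SeedCompanyID: 2, JobTitle: "Engineer"}, // same title, other company: kept
		{SeedCompanyID: 1, JobTitle: "Engineer"}, // duplicate: dropped
	}
	fmt.Println(len(dedupeJobs(jobs))) // prints "2"
}
```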

3. Recursive Company Discovery via Testimonials (The Growth Engine)

In parallel, the testimonial scraper creates a self-expanding company network:

Scrape Testimonials: For each seed company, scrapes testimonial images from their website
Extract Companies: Uses Claude Vision API to analyze testimonial images and extract company names mentioned
Find URLs: Uses Claude Search to discover URLs for the extracted company names
Create New Seeds: Stores discovered companies as new seed companies in the database
Repeat: These new seed companies feed back into the job scraping and testimonial discovery cycle

This creates a recursive discovery loop where companies lead to more companies.
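
The loop can be sketched as a breadth-first expansion over a queue of companies. In this simplified model the discover callback stands in for the whole testimonial pipeline (image scraping, Vision extraction, URL search); the names crawl and discoverFunc and the maxCompanies cap are hypothetical, and the real system persists companies in PostgreSQL instead of an in-memory map.

```go
package main

import "fmt"

// discoverFunc stands in for the testimonial pipeline: given one company,
// it returns the companies mentioned in that company's testimonials.
type discoverFunc func(company string) []string

// crawl expands the seed set breadth-first. A company is processed at most
// once, and the crawl stops when nothing new is found or maxCompanies is hit.
func crawl(seeds []string, discover discoverFunc, maxCompanies int) []string {
	seen := make(map[string]bool)
	var order []string
	queue := append([]string(nil), seeds...)
	for len(queue) > 0 && len(order) < maxCompanies {
		c := queue[0]
		queue = queue[1:]
		if seen[c] {
			continue // already a seed company; the unique constraint would reject it
		}
		seen[c] = true
		order = append(order, c)
		queue = append(queue, discover(c)...)
	}
	return order
}

func main() {
	// A fake testimonial graph used only for demonstration.
	graph := map[string][]string{
		"acme":   {"globex", "initech"},
		"globex": {"acme", "umbrella"},
	}
	found := crawl([]string{"acme"}, func(c string) []string { return graph[c] }, 10)
	fmt.Println(found) // prints "[acme globex initech umbrella]"
}
```

Tracking visited companies is what keeps the loop from cycling when two companies mention each other.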

4. Concurrency Model

Goroutines: Each company scraper runs in its own goroutine
Wait Groups: Coordinates completion of scraping batches
Channels: Passes seed company data between scraper stages
Error Groups: Manages HTTP server and scraper lifecycles
Atomic Counters: Limits max companies processed per batch (configurable)

5. Data Flow

┌─────────────────────────────────────────────────────────────┐
│                    Initial Seed Companies                   │
│              (Y Combinator, Peerlist, etc.)                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
         ┌───────────────────────────────┐
         │  Store as Seed Companies (DB) │
         └───────────┬───────────────────┘
                     │
         ┌───────────┴───────────┐
         ▼                       ▼
┌──────────────────┐    ┌──────────────────┐
│   Job Scraper    │    │ Testimonial      │
│                  │    │ Scraper          │
│ - Find careers   │    │                  │
│ - Extract jobs   │    │ 1. Scrape images │
│ - Store in DB    │    │ 2. Vision API    │
└──────────────────┘    │    (extract co.) │
                        │ 3. Claude Search │
                        │    (find URLs)   │
                        └────────┬─────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ New Companies    │
                        │ (Back to DB as   │
                        │  Seed Companies) │
                        └────────┬─────────┘
                                 │
                                 └─────► Cycle Repeats

Configuration

Scraper Limits

Edit internal/service/seed_company_service.go to adjust:

// Y Combinator scraper
const maxCompanies = 50  // Line 123

// Peerlist scraper (in UploadSeedCompanyToChannel)
const maxCompanies = 15  // Line 236

Scraper Sources

Modify cmd/api/main.go (lines 140-153) to add/remove sources:

SeedCompanyConfigs := []models.SeedCompany{
    {
        Name:     "Y Combinator",
        URL:      "https://www.ycombinator.com/companies",
        Selector: `a[href^="/companies/"]`,
        WaitTime: 3 * time.Second,
    },
    // Add more sources here
}

Database Schema

The application auto-migrates three tables:

seed_companies - Root companies and recursively discovered companies
jobs - Job listings scraped from seed companies
testimonial_companies - Companies extracted from testimonials (before becoming seed companies)

Development

Running Tests

go test ./...

Build for Production

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
  -ldflags="-w -s" \
  -o bin/jobloop ./cmd/api/

Logging

Logs are written to:

stdout (JSON format for production)
logs/app.log (file rotation enabled, max 100MB, 30 days retention)

View live logs:

tail -f logs/app.log | jq

Contributing

We welcome contributions to JobLoop! Here's how to get started:

1. Fork & Clone

# Fork the repository on GitHub, then:
git clone https://github.com/YOUR_USERNAME/JobLoop.git
cd JobLoop
git remote add upstream https://github.com/chandhuDev/JobLoop.git

2. Create a Feature Branch

git checkout -b feature/your-feature-name

3. Set Up Development Environment

Follow the Installation & Setup section above.

4. Make Your Changes

Write clean, idiomatic Go code
Follow existing code structure and naming conventions
Add comments for complex logic
Update tests if applicable

5. Test Your Changes

# Run the application
go run ./cmd/api/main.go

# Verify API endpoints
curl http://localhost:8081/health
curl http://localhost:8081/api/companies

6. Commit Your Changes

git add .
git commit -m "feat: add your feature description"

Follow Conventional Commits:

feat: - New feature
fix: - Bug fix
docs: - Documentation changes
refactor: - Code refactoring
test: - Adding tests
chore: - Maintenance tasks

7. Push & Create Pull Request

git push origin feature/your-feature-name

Then open a Pull Request on GitHub with:

Clear description of changes
Any related issue numbers
Screenshots (if UI changes)

Development Guidelines

Code Style: Run gofmt and a linter before committing (golint is deprecated; go vet and staticcheck are maintained alternatives)
Error Handling: Always handle errors explicitly, never use _ unless justified
Logging: Use structured logging with appropriate levels (Info, Warn, Error)
Concurrency: Document any goroutines, channels, or sync primitives
Database: Use GORM best practices, avoid N+1 queries
API: Maintain backward compatibility for existing endpoints

Useful Make Commands (if Makefile exists)

make build    # Build binary
make run      # Run application
make test     # Run tests
make docker   # Build Docker image
make clean    # Clean build artifacts

Troubleshooting

Issue: "Found companies 0" for Y Combinator

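Cause: Y Combinator's page is JavaScript-rendered, so the scraper may run before the content has loaded, or the CSS selector may no longer match the page.

Solution: First try increasing WaitTime for the source in cmd/api/main.go. If that does not help, the selector has probably changed. Inspect the page manually: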

Visit https://www.ycombinator.com/companies
Right-click on a company → Inspect
Find the correct CSS selector
Update in cmd/api/main.go line 144

Issue: Playwright browser fails to launch

Cause: Missing system dependencies.

Solution:

# Re-install with dependencies
go run github.com/playwright-community/playwright-go/cmd/playwright install --with-deps chromium

# On Linux, you may need:
sudo apt-get install -y libgbm1 libnss3 libatk1.0-0

Issue: Database connection failed

Cause: PostgreSQL not running or wrong credentials.

Solution:

# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Verify credentials in .env match your PostgreSQL setup
# Try connecting manually:
psql -h localhost -U postgres -d jobloop

Issue: Rate limiting from APIs

Cause: Too many concurrent requests to Anthropic APIs.

Solution: Reduce maxCompanies limits or add delays between requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Playwright Go - Browser automation
Anthropic - Claude Vision AI and Claude Search
GORM - Go ORM library
zerolog - Structured logging

Support

For issues, questions, or contributions:

Open an issue on GitHub Issues
Start a discussion on GitHub Discussions

Built with ❤️ using Go, PostgreSQL, and AI
