Production-ready internship aggregator with AI-powered job matching
Live Demo: jobfinder.asf0.dev
Try the live app to search internships, apply filters, and test resume-based job matching.
InternNexus aggregates internship opportunities from multiple job boards (including Greenhouse, Lever, and Ashby) and provides intelligent filtering, categorization, visa sponsorship search, and AI-powered resume matching.
- 15,000+ Jobs from 145+ companies
- Multi-Source Aggregation: Greenhouse, Lever, Workday, Ashby, SmartRecruiters
- Hybrid Search: Keyword + semantic (vector) search combined for best results
- Boolean Search: Advanced syntax (AND, OR, NOT, "exact", field:value)
- AI-Powered Matching: Resume-to-job matching using local LLM embeddings
- Smart Categorization: Automatic job categorization (Software Engineering, Data Science, PM, etc.)
- Advanced Filtering: Category, location, visa sponsorship, FAANG+, work mode
- Pipeline Resume: Interrupted runs can be resumed from last successful step
- Production Ready: Rate limiting, JWT auth, OAuth, comprehensive testing
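
The feature list says keyword and semantic results are combined, but not how. As a minimal sketch, assuming reciprocal rank fusion (a common way to merge ranked lists, not confirmed as this project's method), two result lists could be fused as follows. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the job IDs are all illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one, best first.

    Each list contributes 1 / (k + rank) per item, so items that rank
    well in both the keyword and the vector list rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, job_id in enumerate(ranking, start=1):
            scores[job_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda job_id: (-scores[job_id], job_id))


keyword_hits = ["job-a", "job-b", "job-c"]  # hypothetical keyword results
vector_hits = ["job-b", "job-d", "job-a"]   # hypothetical semantic results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Rank-based fusion avoids having to put keyword scores and vector distances on a common scale, which is why it is often a reasonable first choice.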
    ┌─────────────┐     ┌──────────────┐     ┌─────────────┐
    │  Frontend   │────▶│   Backend    │────▶│ PostgreSQL  │
    │  (Next.js)  │     │  (FastAPI)   │     │ + pgvector  │
    └─────────────┘     └──────────────┘     └─────────────┘
                               │
                               ▼
                ┌──────────────────────────────┐
                │      External Services       │
                │ OpenAI-compatible embeddings │
                │  Greenhouse / Lever / Ashby  │
                └──────────────────────────────┘
Tech Stack:
- Frontend: Next.js 16, TypeScript, Tailwind CSS
- Backend: FastAPI, SQLAlchemy 2.0, Pydantic
- Database: PostgreSQL 18 + pgvector extension
- Cache: In-memory TTL cache (optional external Redis)
- AI: Ollama or an OpenAI-compatible API (embeddings)
- Geo: pycountry (ISO country/state lookups)
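
To illustrate what an in-memory TTL cache does, here is a minimal sketch. It is not the project's cache implementation; the class name `TTLCache` and its methods are hypothetical, and the injectable `clock` parameter exists only so expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Store the value together with its absolute expiry time.
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return default
        return value
```

A cache like this lives inside one process, so each backend worker has its own copy. That is presumably why an external Redis is offered as an option.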
- Docker and docker compose
- pnpm (frontend package manager)
- uv (Python package manager)
- Python 3.12+
- OpenAI-compatible API or Ollama (for embeddings)
git clone <repository-url>
cd internjobs
cp .env.example .env
# Edit .env with your settings
docker compose up -d db
cd backend
uv sync --group dev
# Run database migrations
uv run alembic -c alembic.ini upgrade head
# Start the backend server
uv run uvicorn app.main:app --reload

Note: pycountry will be installed automatically for location normalization.

cd frontend
pnpm install
pnpm dev
cd pipeline
uv sync --group dev
uv run internnexus-pipeline --skip-discover

Done! Visit http://localhost:3000

For day-to-day development, run only Postgres in Docker and run the app services in terminals. Set POSTGRES_HOST=localhost in the repo root .env when using the local Docker database.
docker compose up -d db

Terminal 1:
cd backend
uv run alembic -c alembic.ini upgrade head
uv run uvicorn app.main:app --reload

Terminal 2:
cd pipeline
uv run internnexus-pipeline --skip-discover

Terminal 3:
cd frontend
pnpm dev

After signing in once with Google, promote your user from the local database. Replace the email value with the Google email you used to sign in.

docker exec jobs-db sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "insert into admins (id, user_id, role, granted_by, notes) select gen_random_uuid(), id, '\''super_admin'\'', id, '\''local dev bootstrap'\'' from users where email = '\''you@example.com'\'' on conflict (user_id) do update set role = excluded.role, granted_by = excluded.granted_by, granted_at = now(), notes = excluded.notes;"'

The ingestion system runs 7 sequential steps:

| Step | Action | Description |
|---|---|---|
| 1 | Discover | Verify companies have active job boards |
| 2 | Sync inactive | Mark existing jobs inactive before refresh |
| 3 | Ingest | Fetch jobs from APIs, deduplicate, and upsert |
| 4 | Delete inactive | Remove jobs no longer present upstream |
| 5 | Cleanup | Normalize location data (city/state/country) |
| 6 | Classify | Categorize jobs with the configured model |
| 7 | Embed | Generate vector embeddings for matching |
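
The step order in the table, together with the ability to resume an interrupted run, can be sketched roughly as below. This is an illustration under stated assumptions, not the pipeline's actual code: the JSON checkpoint file, the `run_pipeline` function, and the `handlers` mapping are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

# Step names follow the table above; the checkpoint file is hypothetical.
STEPS = ["discover", "sync_inactive", "ingest", "delete_inactive",
         "cleanup", "classify", "embed"]


def run_pipeline(handlers: dict[str, Callable[[], None]],
                 checkpoint: Path, resume: bool = False) -> list[str]:
    """Run steps in order and return the names of the steps executed.

    After each successful step its name is written to `checkpoint`, so a
    later call with resume=True starts at the step after the last success.
    """
    start = 0
    if resume and checkpoint.exists():
        last_done = json.loads(checkpoint.read_text())["last_done"]
        start = STEPS.index(last_done) + 1
    executed = []
    for name in STEPS[start:]:
        handlers[name]()  # an exception here leaves the checkpoint untouched
        checkpoint.write_text(json.dumps({"last_done": name}))
        executed.append(name)
    return executed
```

Recording progress only after a step succeeds means a failed step is retried in full on resume, so each step should be safe to run twice.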
# Run from pipeline/
cd pipeline
uv run internnexus-pipeline
# Skip discovery (faster, uses cached companies)
uv run internnexus-pipeline --skip-discover
# Run continuously (interval from config)
uv run internnexus-pipeline -c
# Run with custom interval
uv run internnexus-pipeline -c --interval 3600
# Single step execution
uv run internnexus-pipeline --step discover
uv run internnexus-pipeline --step ingest
uv run internnexus-pipeline --step cleanup
uv run internnexus-pipeline --step embed
# Utility commands
uv run internnexus-pipeline --dry-run # Preview without changes
uv run internnexus-pipeline --resume # Resume failed run
uv run internnexus-pipeline --check # Health checks only
uv run internnexus-pipeline --fresh # Clear incomplete runs
# Re-process ALL locations (careful!)
uv run internnexus-pipeline --step cleanup --all

Documentation is still lightweight. For now, use README.md, backend/.env.example, and the code in backend/, frontend/, and pipeline/ as the primary reference.

# Backend
cd backend && uv run pytest tests
cd backend && uv run pytest tests --cov=app
# Pipeline
cd pipeline && uv run pytest tests
# Frontend
cd frontend && pnpm run lint && pnpm test

Key environment variables:

# Database
POSTGRES_DB=internnexus
POSTGRES_USER=internnexus
POSTGRES_PASSWORD=secure_password
# Redis (optional; leave empty for in-memory cache)
REDIS_URL=
# Auth (min 32 characters)
AUTH_SECRET=your-super-secret-key-min-32-chars
# AI Provider
EMBEDDING_PROVIDER=ollama
OPENAI_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text

Use EMBEDDING_PROVIDER=openai-compatible with an OpenAI-compatible embeddings endpoint.
See .env.example for additional configuration options.
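
As a rough sketch of how these variables might be read and validated at startup, consider the loader below. It is not the backend's actual settings code: the `Settings` dataclass, the `load_settings` function, the connection URL format, and port 5432 (the PostgreSQL default) are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: Optional[str]
    auth_secret: str
    embedding_provider: str
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    secret = env.get("AUTH_SECRET", "")
    if len(secret) < 32:
        raise ValueError("AUTH_SECRET must be at least 32 characters")
    host = env.get("POSTGRES_HOST", "localhost")
    # Percent-encode the password so characters like '@' do not break the URL.
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    url = (f"postgresql://{env['POSTGRES_USER']}:{password}"
           f"@{host}:5432/{env['POSTGRES_DB']}")  # port is an assumption
    return Settings(
        database_url=url,
        redis_url=env.get("REDIS_URL") or None,  # empty means in-memory cache
        auth_secret=secret,
        embedding_provider=env.get("EMBEDDING_PROVIDER", "ollama"),
        embedding_model=env.get("EMBEDDING_MODEL", "nomic-embed-text"),
    )
```

Failing fast on a short `AUTH_SECRET` surfaces a misconfiguration at startup rather than at the first sign-in.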
InternNexus supports advanced boolean search syntax:
| Query | Result |
|---|---|
| python | Hybrid search (keyword + semantic) |
| python AND remote | Both terms required |
| python OR java | Either term |
| python NOT senior | Exclude senior roles |
| "software engineer" | Exact phrase match |
| title:python | Search only in title |
| company:google | Search only in company |
Example: title:python AND remote NOT senior → Python remote roles, excluding senior positions.
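
To make the syntax concrete, here is a minimal sketch of evaluating such a query against a single job record. It is not the project's parser. It assumes, without confirmation from this README, that operators have no precedence and are applied left to right, that adjacent terms are ANDed, and that matching is a case-insensitive substring test. The `matches` function and the job fields are hypothetical.

```python
import re

# A token is an optionally field-prefixed quoted phrase, or a bare word.
TOKEN = re.compile(r'(?:\w+:)?"[^"]*"|\S+')


def matches(query: str, job: dict[str, str]) -> bool:
    """Evaluate a boolean query against one job, left to right."""

    def term_hit(term: str) -> bool:
        field, sep, value = term.partition(":")
        if sep and field in job:
            haystack, needle = job[field], value      # field:value search
        else:
            haystack, needle = " ".join(job.values()), term
        return needle.strip('"').lower() in haystack.lower()

    result, op, negate = True, "AND", False
    for token in TOKEN.findall(query):
        if token in ("AND", "OR"):
            op = token
        elif token == "NOT":
            negate = True
        else:
            hit = term_hit(token) != negate  # XOR flips the hit when negated
            result = (result and hit) if op == "AND" else (result or hit)
            op, negate = "AND", False  # adjacent terms default to AND
    return result


job = {"title": "Python Backend Intern", "company": "Acme", "location": "Remote"}
ok = matches("title:python AND remote NOT senior", job)
```

A production parser would more likely build an expression tree with defined precedence and translate it into a database query instead of filtering rows in Python.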
We welcome contributions! Please follow the standard GitHub fork and pull request workflow.
# 1. Fork and clone
git clone https://github.com/your-username/internjobs.git
# 2. Create branch
git checkout -b feature/your-feature
# 3. Make changes and run the checks for the surfaces you touched
cd backend && uv run pytest tests
cd pipeline && uv run pytest tests
cd frontend && pnpm run lint && pnpm test
# 4. Commit and push
git commit -m "Add your feature"
git push origin feature/your-feature
# 5. Create Pull Request

MIT License - see LICENSE file

- SimplifyJobs for job data sources
- FastAPI for the excellent framework
- llama.cpp for local AI capabilities
Built for job seekers everywhere