Skip to content

asf0/internnexus

Repository files navigation

InternNexus

Python FastAPI Next.js PostgreSQL pnpm License: MIT

Production-ready internship aggregator with AI-powered job matching

Live Demo: jobfinder.asf0.dev

Try the live app to search internships, apply filters, and test resume-based job matching.

InternNexus aggregates internship opportunities from multiple job boards (Greenhouse, Lever, Ashby) and provides intelligent filtering, categorization, visa sponsorship search, and AI-powered resume matching.


✨ Features

  • 15,000+ Jobs from 145+ companies
  • Multi-Source Aggregation: Greenhouse, Lever, Workday, Ashby, SmartRecruiters
  • Hybrid Search: Keyword + semantic (vector) search combined for best results
  • Boolean Search: Advanced syntax (AND, OR, NOT, "exact", field:value)
  • AI-Powered Matching: Resume-to-job matching using local LLM embeddings
  • Smart Categorization: Automatic job categorization (Software Engineering, Data Science, PM, etc.)
  • Advanced Filtering: Category, location, visa sponsorship, FAANG+, work mode
  • Pipeline Resume: Interrupted runs can be resumed from last successful step
  • Production Ready: Rate limiting, JWT auth, OAuth, comprehensive testing

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend  │────▢│    Backend   │────▢│  PostgreSQL β”‚
β”‚  (Next.js)  β”‚     β”‚   (FastAPI)  β”‚     β”‚  + pgvector β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚        External Services    β”‚
               β”‚  OpenAI-compatible embeddings β”‚
               β”‚  Greenhouse / Lever / Ashby β”‚
               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tech Stack:

  • Frontend: Next.js 16, TypeScript, Tailwind CSS
  • Backend: FastAPI, SQLAlchemy 2.0, Pydantic
  • Database: PostgreSQL 18 + pgvector extension
  • Cache: In-memory TTL cache (optional external Redis)
  • AI: Ollama or an OpenAI-compatible API (embeddings)
  • Geo: pycountry (ISO country/state lookups)

πŸš€ Quick Start

Prerequisites

  • Docker and docker compose
  • pnpm (frontend package manager)
  • uv (Python package manager)
  • Python 3.12+
  • OpenAI-compatible API or Ollama (for embeddings)

1. Clone & Setup

git clone <repository-url>
cd internjobs
cp .env.example .env
# Edit .env with your settings

2. Start Infrastructure

docker compose up -d db

3. Install & Run Backend

cd backend
uv sync --group dev

# Run database migrations
uv run alembic -c alembic.ini upgrade head

# Start the backend server
uv run uvicorn app.main:app --reload

Note: pycountry will be installed automatically for location normalization.

4. Install & Run Frontend

cd frontend
pnpm install
pnpm dev

5. Ingest Jobs

cd pipeline
uv sync --group dev
uv run internnexus-pipeline --skip-discover

Done! Visit http://localhost:3000

Local Terminal Workflow

For day-to-day development, run only Postgres in Docker and run the app services in terminals. Set POSTGRES_HOST=localhost in the repo root .env when using the local Docker database.

docker compose up -d db

Terminal 1:

cd backend
uv run alembic -c alembic.ini upgrade head
uv run uvicorn app.main:app --reload

Terminal 2:

cd pipeline
uv run internnexus-pipeline --skip-discover

Terminal 3:

cd frontend
pnpm dev

Bootstrap Admin Access

After signing in once with Google, promote your user from the local database. Replace the email value with the Google email you used to sign in.

docker exec jobs-db sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "insert into admins (id, user_id, role, granted_by, notes) select gen_random_uuid(), id, '\''super_admin'\'', id, '\''local dev bootstrap'\'' from users where email = '\''you@example.com'\'' on conflict (user_id) do update set role = excluded.role, granted_by = excluded.granted_by, granted_at = now(), notes = excluded.notes;"'

πŸ“Š Data Pipeline

The ingestion system runs 7 sequential steps:

Step Action Description
1 Discover Verify companies have active job boards
2 Sync inactive Mark existing jobs inactive before refresh
3 Ingest Fetch jobs from APIs, deduplicate, and upsert
4 Delete inactive Remove jobs no longer present upstream
5 Cleanup Normalize location data (city/state/country)
6 Classify Categorize jobs with the configured model
7 Embed Generate vector embeddings for matching
# Run from pipeline/
cd pipeline
uv run internnexus-pipeline

# Skip discovery (faster, uses cached companies)
uv run internnexus-pipeline --skip-discover

# Run continuously (interval from config)
uv run internnexus-pipeline -c

# Run with custom interval
uv run internnexus-pipeline -c --interval 3600

# Single step execution
uv run internnexus-pipeline --step discover
uv run internnexus-pipeline --step ingest
uv run internnexus-pipeline --step cleanup
uv run internnexus-pipeline --step embed

# Utility commands
uv run internnexus-pipeline --dry-run    # Preview without changes
uv run internnexus-pipeline --resume     # Resume failed run
uv run internnexus-pipeline --check      # Health checks only
uv run internnexus-pipeline --fresh      # Clear incomplete runs

# Re-process ALL locations (careful!)
uv run internnexus-pipeline --step cleanup --all

πŸ“š Documentation

Documentation is still lightweight. For now, use README.md, backend/.env.example, and the code in backend/, frontend/, and pipeline/ as the primary reference.


πŸ§ͺ Testing

# Backend
cd backend && uv run pytest tests
cd backend && uv run pytest tests --cov=app

# Pipeline
cd pipeline && uv run pytest tests

# Frontend
cd frontend && pnpm run lint && pnpm test

πŸ”§ Configuration

Key environment variables:

# Database
POSTGRES_DB=internnexus
POSTGRES_USER=internnexus
POSTGRES_PASSWORD=secure_password

# Redis (optional; leave empty for in-memory cache)
REDIS_URL=

# Auth (min 32 characters)
AUTH_SECRET=your-super-secret-key-min-32-chars

# AI Provider
EMBEDDING_PROVIDER=ollama
OPENAI_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text

Use EMBEDDING_PROVIDER=openai-compatible with an OpenAI-compatible embeddings endpoint.

See .env.example for additional configuration options.


πŸ” Search Syntax

InternNexus supports advanced boolean search syntax:

Query Result
python Hybrid search (keyword + semantic)
python AND remote Both terms required
python OR java Either term
python NOT senior Exclude senior roles
"software engineer" Exact phrase match
title:python Search only in title
company:google Search only in company

Example: title:python AND remote NOT senior β†’ Python remote roles, excluding senior positions.


🀝 Contributing

We welcome contributions! Please follow standard GitHub fork and PR workflow.

Development Setup

# 1. Fork and clone
git clone https://github.com/your-username/internjobs.git

# 2. Create branch
git checkout -b feature/your-feature

# 3. Make changes and run the checks for the surfaces you touched
cd backend && uv run pytest tests
cd pipeline && uv run pytest tests
cd frontend && pnpm run lint && pnpm test

# 4. Commit and push
git commit -m "Add your feature"
git push origin feature/your-feature

# 5. Create Pull Request

πŸ“„ License

MIT License - see LICENSE file


πŸ™ Acknowledgments


Built for job seekers everywhere

About

Full-stack internship search platform with AI resume matching, semantic search, and multi-source job aggregation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors