
DavideCamp/match_cv


match-cv

A CV ingestion and matching service built with Django, pgvector, Datapizza pipelines, and Celery.

Scoring uses category weights: skill, experience, education. The API requires all three and they must sum to 1.0 (example: 0.4 + 0.4 + 0.2). Higher weight means stronger impact on the final ranking.
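The weight rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's actual validator: the function name `validate_weights` and the floating-point tolerance are assumptions made for illustration.

```python
import math

# The three categories named in this README.
REQUIRED_CATEGORIES = ("skill", "experience", "education")


def validate_weights(weights: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: raise ValueError unless all three weights
    are present and sum to 1.0 (within a small tolerance, an assumption)."""
    missing = [name for name in REQUIRED_CATEGORIES if name not in weights]
    if missing:
        raise ValueError(f"missing weights: {', '.join(missing)}")

    total = sum(weights[name] for name in REQUIRED_CATEGORIES)
    # Compare with a tolerance because sums such as 0.1 + 0.7 + 0.2
    # are not always exactly 1.0 in binary floating point.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return weights
```

For example, `validate_weights({"skill": 0.4, "experience": 0.4, "education": 0.2})` returns the dictionary unchanged, while a payload that omits `education` raises `ValueError`.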




🏗️ Architecture Overview

Search flow

Split the job offer into categories (skill, experience, education) -> retrieve per category in parallel, combining semantic search with full-text search on metadata -> merge results by document -> apply weighted scoring.
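The last two steps (merge by document, then weighted scoring) can be illustrated with a small sketch. It is not the project's implementation: it assumes each category retrieval yields a per-document score in [0, 1], that a document missing from a category scores 0 there, and that the final score is a plain weighted sum. The name `merge_and_score` is hypothetical.

```python
from collections import defaultdict


def merge_and_score(
    category_hits: dict[str, dict[str, float]],
    weights: dict[str, float],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Hypothetical sketch of merge-by-document plus weighted scoring.

    category_hits maps category -> {document_id: score}.
    """
    # Merge: collect every category score under its document id.
    per_document: dict[str, dict[str, float]] = defaultdict(dict)
    for category, hits in category_hits.items():
        for document_id, score in hits.items():
            per_document[document_id][category] = score

    # Score: weighted sum; a category with no hit contributes 0 (assumption).
    ranked = [
        (
            document_id,
            sum(weights[c] * scores.get(c, 0.0) for c in weights),
        )
        for document_id, scores in per_document.items()
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

With weights that favour experience, a document with a strong experience score outranks one that only matches on skill, which is the "higher weight means stronger impact" behaviour described for the API.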


Upload flow

  • Single upload: API -> serializer -> metadata extraction -> embedding -> vector store write
  • Bulk upload: API creates batch/items -> Celery task per item -> status polling endpoint


⚙️ Requirements

  • Python 3.13+
  • uv
  • Docker (recommended for PostgreSQL + Redis)

🚀 Quick Start (Local)

  1. Create and activate a virtualenv.
uv venv
source .venv/bin/activate
  2. Install dependencies.
uv sync
  3. Create a local env file from the template.
cp .env.example .env
  4. Configure environment variables in .env.
OPENAI_API_KEY=your_key
EMBEDDING_MODEL_NAME=text-embedding-3-small
  5. Start infrastructure.
docker compose up -d
  6. Run migrations.
python manage.py migrate
  7. Start the Django server.
python manage.py runserver
  8. Start a Celery worker (new terminal).
celery -A src.config.celery worker -l info

🧪 Testing

Run tests:

pytest

Run tests with coverage:

pytest --cov --cov-report=html

🎨 Formatting

ruff format --config ./ruff.toml .

🔌 API Endpoints

Base prefix: /api/

1. Upload single CV

  • Method: POST /api/cv-documents/
  • Content-Type: multipart/form-data
  • File field: source_file

Example:

curl -X POST http://127.0.0.1:8000/api/cv-documents/ \
  -F "source_file=@/absolute/path/cv.pdf"

Responses:

  • 201 document created and ingested synchronously
  • 400 validation error

2. Bulk upload CVs (async)

  • Method: POST /api/cv-documents/bulk/
  • Content-Type: multipart/form-data
  • File field: files (repeat the field once per file)

Example:

curl -X POST http://127.0.0.1:8000/api/cv-documents/bulk/ \
  -F "files=@/absolute/path/cv1.pdf" \
  -F "files=@/absolute/path/cv2.pdf"

Responses:

  • 202 batch accepted; returns batch_id and a list of upload_item_id values
  • 400 invalid multipart payload

3. Bulk upload batch status

  • Method: GET /api/cv-documents/bulk/<batch_id>/status/

Response contains:

  • batch status (PENDING|RUNNING|SUCCESS|FAILED|PARTIAL)
  • counters (total_files, processed_files, failed_files)
  • per-item status and error_message
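A client would typically poll this endpoint until the batch finishes. The sketch below shows one way to do that with the standard library only. It makes several assumptions: that the batch state is exposed under a top-level `status` key in the JSON response, that SUCCESS, FAILED, and PARTIAL are terminal states, and that the helper names `fetch_batch_status` and `wait_for_batch` (which are not part of the project) suit your client.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal states; PENDING and RUNNING mean "keep polling".
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "PARTIAL"}


def fetch_batch_status(base_url: str, batch_id: str) -> dict:
    """Hypothetical helper: GET the status endpoint and decode the JSON body."""
    url = f"{base_url}/api/cv-documents/bulk/{batch_id}/status/"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_batch(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the batch reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        payload = fetch()
        # "status" as the key name is an assumption about the response shape.
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"batch not finished after {max_attempts} attempts")
```

Usage would look like `wait_for_batch(lambda: fetch_batch_status("http://127.0.0.1:8000", batch_id))`. Passing `fetch` and `sleep` as callables keeps the loop testable without a running server.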

4. Run matching pipeline

  • Method: POST /api/search-runs/
  • Content-Type: application/json

Example payload:

{
  "job_offer_text": "Looking for backend engineer with Python and 5+ years",
  "weights": {
    "skill": 0.1,
    "experience": 0.7,
    "education": 0.2
  },
  "top_k": 10
}

Responses:

  • 200 ranked candidate list
  • 400 request validation error
  • 500 pipeline/runtime error
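For clients that prefer Python over curl, the same request can be built with the standard library. This is a minimal sketch: the function name `build_search_request` is hypothetical, and the base URL is the local development address used in the examples above.

```python
import json
import urllib.request


def build_search_request(
    base_url: str,
    job_offer_text: str,
    weights: dict[str, float],
    top_k: int = 10,
) -> urllib.request.Request:
    """Hypothetical helper: build a POST /api/search-runs/ request."""
    body = json.dumps(
        {
            "job_offer_text": job_offer_text,
            "weights": weights,
            "top_k": top_k,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/search-runs/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the body with `json.load`. The 400 and 500 responses listed above surface as `urllib.error.HTTPError`, whose `code` attribute carries the status.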

📝 Notes

  • CV upload endpoints require multipart file upload; JSON file paths are not accepted.
  • If no Celery worker is running, bulk upload items remain in PENDING.
  • Vector metadata must be JSON-serializable; UUID handling is normalized in vector store code.
