CV ingestion and matching service built with Django, pgvector, Datapizza pipelines, and Celery
Scoring uses category weights: skill, experience, education.
The API requires all three and they must sum to 1.0 (example: 0.4 + 0.4 + 0.2).
Higher weight means stronger impact on the final ranking.
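The weight rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's actual serializer: the function name `validate_weights` and the tolerance value are assumptions made for illustration.

```python
import math

REQUIRED_CATEGORIES = ("skill", "experience", "education")


def validate_weights(weights: dict[str, float]) -> dict[str, float]:
    """Return the weights unchanged if all three are present and sum to 1.0.

    Hypothetical helper: the real API may validate differently.
    """
    missing = [c for c in REQUIRED_CATEGORIES if c not in weights]
    if missing:
        raise ValueError(f"missing weights: {', '.join(missing)}")

    total = sum(weights[c] for c in REQUIRED_CATEGORIES)
    # Compare with a tolerance, because floating-point sums such as
    # 0.1 + 0.7 + 0.2 may not equal 1.0 exactly.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return weights


validate_weights({"skill": 0.4, "experience": 0.4, "education": 0.2})
```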
Search flow: split the job offer by category (skill, experience, education) -> retrieve candidates for each category in parallel, combining semantic search with full-text search on metadata -> merge results by document -> apply weighted scoring.
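The merge and weighted-scoring steps can be illustrated with a small sketch. It assumes each category retrieval yields one score per document in the range 0 to 1, and that a document absent from a category contributes 0 for that category. Both are assumptions, and `merge_and_score` is a hypothetical name, not a function from this repository.

```python
from collections import defaultdict


def merge_and_score(
    category_hits: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Merge per-category hits by document and rank by weighted score."""
    per_document: dict[str, dict[str, float]] = defaultdict(dict)
    for category, hits in category_hits.items():
        for document_id, score in hits.items():
            per_document[document_id][category] = score

    ranked = []
    for document_id, scores in per_document.items():
        # Assumption: a category with no hit for this document counts as 0.
        final = sum(weights[c] * scores.get(c, 0.0) for c in weights)
        ranked.append((document_id, final))

    # Highest weighted score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


hits = {
    "skill": {"cv-a": 0.9, "cv-b": 0.5},
    "experience": {"cv-a": 0.2, "cv-b": 0.8},
    "education": {"cv-b": 0.5},
}
weights = {"skill": 0.1, "experience": 0.7, "education": 0.2}
ranking = merge_and_score(hits, weights)
```

With experience weighted at 0.7, `cv-b` ranks ahead of `cv-a` even though `cv-a` has the stronger skill score.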
- Single upload: API -> serializer -> metadata extraction -> embedding -> vector store write
- Bulk upload: API creates batch/items -> Celery task per item -> status polling endpoint
- Python 3.13+
- uv
- Docker (recommended for PostgreSQL + Redis)
- Create and activate virtualenv.
uv venv
source .venv/bin/activate
- Install dependencies.
uv sync
- Create local env file from template.
cp .env.example .env
- Configure environment variables in .env.
OPENAI_API_KEY=your_key
EMBEDDING_MODEL_NAME=text-embedding-3-small
- Start infrastructure.
docker compose up -d
- Run migrations.
python manage.py migrate
- Start Django server.
python manage.py runserver
- Start Celery worker (new terminal).
celery -A src.config.celery worker -l info

Run tests:
pytest
Run tests with coverage:
pytest --cov --cov-report=html
Format code:
ruff format --config ./ruff.toml .

Base prefix: /api/
- Method: POST /api/cv-documents/
- Content-Type: multipart/form-data
- File field: source_file
Example:
curl -X POST http://127.0.0.1:8000/api/cv-documents/ \
  -F "source_file=@/absolute/path/cv.pdf"
Responses:
- 201 document created and ingested synchronously
- 400 validation error
- Method: POST /api/cv-documents/bulk/
- Content-Type: multipart/form-data
- File field: repeated files
Example:
curl -X POST http://127.0.0.1:8000/api/cv-documents/bulk/ \
  -F "files=@/absolute/path/cv1.pdf" \
  -F "files=@/absolute/path/cv2.pdf"
Responses:
- 202 returns batch_id and upload_item_id list
- 400 invalid multipart payload
- Method: GET /api/cv-documents/bulk/<batch_id>/status/
Response contains:
- batch status (PENDING|RUNNING|SUCCESS|FAILED|PARTIAL)
- counters (total_files, processed_files, failed_files)
- per-item status and error_message
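A client can poll this endpoint until the batch stops changing. The sketch below uses only the standard library and rests on two assumptions not confirmed by this README: that the batch status sits under a top-level `status` key in the JSON response, and that SUCCESS, FAILED, and PARTIAL are terminal states. `fetch_status` and `wait_for_batch` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: these statuses mean the batch will not change any further.
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "PARTIAL"}


def fetch_status(base_url: str, batch_id: str) -> dict:
    """GET the batch status endpoint and return the parsed JSON body."""
    url = f"{base_url}/api/cv-documents/bulk/{batch_id}/status/"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_batch(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the batch reaches a terminal status."""
    for _ in range(max_polls):
        payload = fetch()
        # Assumption: the batch status is exposed under the "status" key.
        if payload["status"] in TERMINAL_STATUSES:
            return payload
        sleep(interval)
    raise TimeoutError(f"batch not finished after {max_polls} polls")
```

Typical use would be `wait_for_batch(lambda: fetch_status("http://127.0.0.1:8000", batch_id))`. Passing the fetch function as a parameter keeps the polling loop testable without a running server.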
- Method: POST /api/search-runs/
- Content-Type: application/json
Example payload:
{
  "job_offer_text": "Looking for backend engineer with Python and 5+ years",
  "weights": {
    "skill": 0.1,
    "experience": 0.7,
    "education": 0.2
  },
  "top_k": 10
}
Responses:
- 200 ranked candidate list
- 400 request validation error
- 500 pipeline/runtime error
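The same request can be issued from Python with the standard library. This is a sketch under assumptions: the helper names `build_search_request` and `run_search` are hypothetical, the base URL is the local development address used in the curl examples, and the response is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request


def build_search_request(
    base_url: str,
    job_offer_text: str,
    weights: dict[str, float],
    top_k: int = 10,
) -> urllib.request.Request:
    """Build a POST request for /api/search-runs/ with a JSON body."""
    payload = {
        "job_offer_text": job_offer_text,
        "weights": weights,
        "top_k": top_k,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/search-runs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request: urllib.request.Request):
    """Send the request; urlopen raises HTTPError on 400 and 500 responses."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_search_request(
    "http://127.0.0.1:8000",
    "Looking for backend engineer with Python and 5+ years",
    {"skill": 0.1, "experience": 0.7, "education": 0.2},
)
```

Calling `run_search(request)` requires the Django server to be running.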
- CV upload endpoints require multipart file upload; JSON file paths are not accepted.
- If the Celery worker is not running, bulk upload items remain in PENDING.
- Vector metadata must be JSON-serializable; UUID handling is normalized in vector store code.
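To illustrate the last note: `json.dumps` rejects `uuid.UUID` values, so metadata has to be converted before it is stored. The sketch below shows one possible normalization and is not the repository's vector store code. `normalize_metadata` and the metadata keys are hypothetical.

```python
import json
import uuid


def normalize_metadata(value):
    """Recursively convert UUID values to strings so json.dumps accepts the result."""
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, dict):
        return {key: normalize_metadata(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_metadata(item) for item in value]
    return value


# Hypothetical metadata shape, used only for illustration.
document_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
metadata = {"document_id": document_id, "related": [document_id], "pages": 2}
serialized = json.dumps(normalize_metadata(metadata))
```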

