A robust FastAPI-based backend service for candidate management, CV parsing, and workflow automation.
- 🚀 High-performance FastAPI backend
- 🤖 AI-powered CV parsing for PDF and DOCX (document loaders & Unstructured OCR)
- 🔒 Rate limiting and security middleware
- 📊 PostgreSQL database with SQLAlchemy ORM (Async)
- 🔄 Asynchronous request handling
- 📝 Structured logging system with correlation IDs
- 🌐 CORS support
- 🔍 Request ID tracking
- 🐳 Docker support
- 📄 Pydantic schema validation
- 📚 Database Migrations using Alembic
- 📈 Pinecone vector database integration
- 📦 Poetry dependency management
- 🔑 Redis for rate limiting
- 🤖 LangChain & LangGraph integration for AI workflows
- 📄 Document processing capabilities
- 🔐 AWS S3 integration for file storage (Not enabled)
- 📝 Streaming and non-streaming chat endpoints
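Two of the features above — structured logging and request correlation IDs — follow a common pattern that can be sketched with the standard library alone. The names below (`CorrelationFilter`, `JsonFormatter`) are illustrative, not taken from the project's `utils/logger.py`:

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Holds the current request's correlation ID; in the real service a
# middleware would set this at the start of each request.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Injects the correlation ID from the context into every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True

class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "-"),
        })

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.addFilter(CorrelationFilter())

# Simulate one request: assign an ID, then log under it.
correlation_id.set(str(uuid.uuid4()))
logger.info("candidate parsed")
```

Because the ID lives in a `ContextVar`, it stays correct across concurrently handled async requests.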
```
api/
├── agent/            # AI agent implementation
│   ├── workflow.py   # Candidate processing workflow
│   ├── tools.py      # Agent tools
│   └── prompts.py    # Agent prompts
├── api/              # API routes
│   ├── v1/           # API version 1 endpoints
│   ├── deps.py       # Dependencies
│   └── router.py     # Main router
├── core/             # Core configuration
│   └── config.py     # Settings management
├── crud/             # Database operations
│   ├── candidates.py # Candidate CRUD operations
│   └── sections.py   # Section CRUD operations
├── models/           # SQLAlchemy models
│   ├── candidates.py
│   ├── education.py
│   ├── experience.py
│   ├── projects.py
│   └── skills.py
├── schema/           # Pydantic schemas
│   ├── agent.py
│   ├── candidates.py
│   ├── education.py
│   └── responses.py
├── services/         # Business logic
│   └── documents.py  # Document processing
└── utils/            # Utilities
    ├── helpers.py
    ├── logger.py
    ├── s3_client.py
    └── middlewares/
```
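The settings management in `core/config.py` reads configuration from environment variables. The project likely uses Pydantic settings for this; the stdlib sketch below only illustrates the pattern, and the field defaults are hypothetical:

```python
import os
from dataclasses import dataclass, field

# Simplified stand-in for core/config.py: each field pulls its value
# from the environment at construction time, with an illustrative default.
@dataclass(frozen=True)
class Settings:
    env: str = field(default_factory=lambda: os.getenv("ENV", "dev"))
    project_name: str = field(
        default_factory=lambda: os.getenv("PROJECT_NAME", "cv-parser"))
    database_port: int = field(
        default_factory=lambda: int(os.getenv("DATABASE_PORT", "5432")))
    redis_port: int = field(
        default_factory=lambda: int(os.getenv("REDIS_PORT", "6379")))

def get_settings() -> Settings:
    """Build settings from the current environment (no caching, for clarity)."""
    return Settings()

settings = get_settings()
```

Centralizing configuration this way keeps every module reading from one typed object instead of scattering `os.getenv` calls.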
- Python 3.12+
- Docker and Docker Compose (optional)
- Redis
- PostgreSQL
- OpenAI API key
- Pinecone API key (https://www.pinecone.io/)
- Unstructured API key (https://docs.unstructured.io/api-reference/api-services/free-api)
- Clone the repository:

```bash
git clone https://github.com/andrew-sameh/agentic-cv-parser.git
cd agentic-cv-parser
```

- Install dependencies:

```bash
poetry install
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your configuration
```

- Run the application:

```bash
poetry run python main.py
```

Alternatively, without Poetry:

- Create a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your configuration
```

- Run the application:

```bash
python main.py
```

To run with Docker instead:

1. Clone the repository
2. Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your configuration
```

3. Start the application:

```bash
docker-compose -p cv-parser up -d
# or
docker compose -p cv-parser up -d
```

The following environment variables need to be configured in your `.env` file:
- `ENV`: Environment (dev/prod)
- `PROJECT_NAME`: Project name
- `VERSION`: API version
- `DESCRIPTION`: API description
- `LOG_LEVEL`: Logging level
- `LOG_JSON_ENABLE`: Enable JSON logging
- `BACKEND_CORS_ORIGINS`: Allowed CORS origins
- `DATABASE_USER`: PostgreSQL username
- `DATABASE_PASSWORD`: PostgreSQL password
- `DATABASE_NAME`: Database name
- `DATABASE_HOSTNAME`: Database host
- `DATABASE_PORT`: Database port
- `REDIS_HOST`: Redis host
- `REDIS_PORT`: Redis port
- `REDIS_DB`: Redis database number
- `AWS_S3_BUCKET_NAME`: S3 bucket name
- `AWS_S3_ACCESS_KEY_ID`: AWS access key
- `AWS_S3_SECRET_ACCESS_KEY`: AWS secret key
- `AWS_S3_REGION_NAME`: AWS region
- `AWS_S3_BASE_FOLDER`: Base folder in S3
- `OPENAI_API_KEY`: OpenAI API key
- `LLM_MODEL`: Language model to use
- `EMBEDDING_MODEL`: Embedding model
- `UNSTRUCTURED_API_KEY`: Unstructured API key
- `PINECONE_API_KEY`: Pinecone API key
- `PINECONE_INDEX_NAME`: Pinecone index name
- `EMBEDDING_SEARCH_TYPE`: Search type
- `EMBEDDING_SCORE_THRESHOLD`: Similarity threshold
- `EMBEDDING_TOPK`: Top K results
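For reference, a filled-in `.env` might look like the following. Every value below is a placeholder chosen for illustration, not a real default or recommended setting:

```env
ENV=dev
PROJECT_NAME=agentic-cv-parser
VERSION=0.1.0
DESCRIPTION=Candidate management and CV parsing API
LOG_LEVEL=INFO
LOG_JSON_ENABLE=true
BACKEND_CORS_ORIGINS=["http://localhost:3000"]
DATABASE_USER=postgres
DATABASE_PASSWORD=changeme
DATABASE_NAME=cv_parser
DATABASE_HOSTNAME=localhost
DATABASE_PORT=5432
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
UNSTRUCTURED_API_KEY=...
PINECONE_API_KEY=...
PINECONE_INDEX_NAME=cv-parser
EMBEDDING_SEARCH_TYPE=similarity_score_threshold
EMBEDDING_SCORE_THRESHOLD=0.5
EMBEDDING_TOPK=5
```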
1. Start PostgreSQL and Redis using Docker:

```bash
docker-compose up -d db redis
```

2. Initialize the database:

```bash
alembic upgrade head
```

Once the application is running, visit:

- Swagger UI: http://localhost:8000/
- ReDoc: http://localhost:8000/redoc
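The `docker-compose up -d db redis` step above assumes a `docker-compose.yml` that defines at least `db` and `redis` services. A minimal sketch, with image tags, ports, and credential wiring chosen for illustration only:

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: ${DATABASE_USER}
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
      POSTGRES_DB: ${DATABASE_NAME}
    ports:
      - "5432:5432"
  redis:
    image: redis:7
    ports:
      - "6379:6379"
```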
The agent system (`agent/`) handles chat functionalities.
API endpoints are organized in the `api/` directory with versioning support.
SQLAlchemy models in `models/` define the database schema for:
- Candidates
- Education
- Experience
- Projects
- Skills
- Certifications
- Document processing (`services/documents.py`) handles document parsing and extraction.
- S3 integration for file storage (not used)
- Structured logging
- Rate limiting
- Request correlation
- Error handling
- Redis integration
In progress
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request