AI & Data Engineer · I build systems that make machines understand documents, and data teams understand their business.
I sit at the intersection of AI engineering and data infrastructure: deploying LLM-powered retrieval systems, architecting modern data stacks, and translating raw transactional data into executive-ready insight.
- RAG & LLM Systems: production Persian chatbot with +38% retrieval precision and ~60% hallucination reduction (FastAPI · Qdrant · OpenAI API)
- Data Engineering: end-to-end ELT pipelines with orchestration, transformation, and quality contracts (Airflow · dbt · PostgreSQL · Docker)
- Analytics & ML: churn prediction, RFM segmentation, and CLV modelling, surfaced through Power BI dashboards for stakeholders
Languages & Querying
Data Engineering
AI · LLM · NLP
Analytics & ML
Context-aware document retrieval system deployed for internal operations and external end-users.
Built a full-stack RAG pipeline over a mixed Persian/English document corpus. Implemented a retrieval evaluation framework (Precision@K, LLM-as-judge faithfulness scoring) across 1,000+ documents and 150 domain-specific queries.
Results: Retrieval precision 61% → 84% (+38% relative) · Hallucination rate reduced ~60%
FastAPI Qdrant OpenAI API OCR Docker Python Sentence Transformers Multilingual Embeddings
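To make the retrieval metric concrete, here is a minimal sketch of how Precision@K and the relative lift quoted above could be computed. It is not the project's evaluation code: the function names, document IDs, and relevance labels are hypothetical, and only the 61% and 84% figures come from the results line.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved document IDs that are labelled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def relative_lift(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


# Hypothetical query: 3 of the top 5 retrieved documents are relevant.
ranked = ["d1", "d7", "d3", "d9", "d4"]
labelled_relevant = {"d1", "d3", "d4"}
print(precision_at_k(ranked, labelled_relevant, k=5))   # 0.6

# 61% -> 84% is a lift of roughly 38% relative to the baseline.
print(round(relative_lift(0.61, 0.84) * 100))           # 38
```

In a full evaluation the per-query scores would be averaged over the whole query set.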
End-to-end streaming architecture for real-time financial transaction monitoring and fraud alerting.
Built a decoupled Kafka → Spark → PostgreSQL pipeline capable of handling high-throughput transaction streams. Implemented a tier-one deterministic rule engine with micro-batch processing, secure credential injection via environment variables, and a JDBC sink writing flagged records to an auditable fraud_alerts table. Containerized full infrastructure with single-command spin-up via Docker Compose.
Apache Spark Apache Kafka Confluent Platform PostgreSQL PySpark Docker Docker Compose Python
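The idea of a deterministic rule engine applied per micro-batch can be sketched in plain Python. This is an illustration only, not the Spark job itself: the rule names, thresholds, and transaction fields (`txn_id`, `amount`, `country`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Transaction = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A named, side-effect-free predicate over a single transaction."""
    name: str
    predicate: Callable[[Transaction], bool]


# Hypothetical tier-one rules; real thresholds would come from the business.
RULES: List[Rule] = [
    Rule("large_amount", lambda t: float(t["amount"]) > 10_000),
    Rule("blocked_country", lambda t: t["country"] in {"XX"}),
]


def flag_batch(batch: Iterable[Transaction], rules: List[Rule] = RULES) -> List[dict]:
    """Apply every rule to every transaction in one micro-batch.

    Returns one alert per flagged transaction, listing the rules that fired,
    which is the kind of record a sink would write to an alerts table.
    """
    alerts = []
    for txn in batch:
        fired = [rule.name for rule in rules if rule.predicate(txn)]
        if fired:
            alerts.append({"txn_id": txn["txn_id"], "rules": fired})
    return alerts


batch = [
    {"txn_id": "t1", "amount": 120.0, "country": "DE"},
    {"txn_id": "t2", "amount": 25_000.0, "country": "DE"},
    {"txn_id": "t3", "amount": 15_000.0, "country": "XX"},
]
for alert in flag_batch(batch):
    print(alert)
```

Because each rule is a pure function of one record, the same input batch always yields the same alerts, which keeps the output auditable.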
Olist Modern Data Stack Pipeline
Production-grade ELT pipeline with automated orchestration and data quality contracts.
Architected a full modern data stack on the Olist e-commerce dataset. Modular dbt project with staging + mart layers, schema tests (not_null, unique, accepted_values), and source freshness checks. Single-command environment spin-up via Docker Compose.
Apache Airflow dbt PostgreSQL Docker Git
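In dbt these schema tests are declared in YAML and compiled to SQL. The sketch below only illustrates, in plain Python, what each of the three named tests asserts about a column. The sample rows and column names are hypothetical, and each helper returns the offending rows or values, so an empty list means the test passes.

```python
from collections import Counter
from typing import Dict, List, Set

Row = Dict[str, object]


def not_null(rows: List[Row], column: str) -> List[Row]:
    """Rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows: List[Row], column: str) -> List[object]:
    """Non-null values that appear more than once in the column."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows: List[Row], column: str, allowed: Set[object]) -> List[Row]:
    """Rows whose non-null value falls outside the allowed set."""
    return [r for r in rows if r.get(column) is not None and r[column] not in allowed]


# Hypothetical staging rows with one problem per test.
rows = [
    {"order_id": "o1", "status": "delivered"},
    {"order_id": "o2", "status": "shipped"},
    {"order_id": "o2", "status": "teleported"},
    {"order_id": None, "status": "delivered"},
]
print(len(not_null(rows, "order_id")))                                  # 1
print(unique(rows, "order_id"))                                         # ['o2']
print(len(accepted_values(rows, "status", {"delivered", "shipped"})))   # 1
```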
End-to-end customer analytics on 100K+ real orders across 9 relational tables (2016–2018).
Built a Python ETL pipeline into PostgreSQL, RFM segmentation framework (Champions / At-Risk / Lost cohorts), Logistic Regression churn classifier (Precision: 0.80, Recall: 1.00), and CLV scoring, all surfaced through four cross-filtered Power BI dashboards.
Python PostgreSQL scikit-learn SQLAlchemy Power BI DAX
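As an illustration of the RFM step, the sketch below derives recency, frequency, and monetary values per customer and maps them onto the three cohorts named above. It is a simplified stand-in, not the project's code: the cut-offs, the sample orders, and the reference date are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Order = Tuple[str, date, float]  # (customer_id, order_date, amount)


def rfm_table(orders: Iterable[Order], as_of: date) -> Dict[str, dict]:
    """Aggregate orders into recency (days), frequency, and monetary per customer."""
    agg = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for customer_id, order_date, amount in orders:
        row = agg[customer_id]
        row["last"] = order_date if row["last"] is None else max(row["last"], order_date)
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        cid: {
            "recency_days": (as_of - r["last"]).days,
            "frequency": r["frequency"],
            "monetary": r["monetary"],
        }
        for cid, r in agg.items()
    }


def segment(row: dict) -> str:
    """Assign a cohort using hypothetical cut-offs (30 and 180 days, 3 orders)."""
    if row["recency_days"] <= 30 and row["frequency"] >= 3:
        return "Champions"
    if row["recency_days"] > 180:
        return "Lost"
    return "At-Risk"


orders = [
    ("a", date(2018, 8, 1), 100.0),
    ("a", date(2018, 8, 20), 50.0),
    ("a", date(2018, 8, 25), 25.0),
    ("b", date(2018, 5, 1), 80.0),
    ("c", date(2017, 1, 15), 40.0),
]
table = rfm_table(orders, as_of=date(2018, 9, 1))
for customer_id, row in sorted(table.items()):
    print(customer_id, row, segment(row))
```

Fixed thresholds keep the example short; quantile-based scoring is a common alternative when the distribution of customers is skewed.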
Published open-source dataset capturing demographics, purchasing patterns, device usage, and satisfaction metrics, designed for ML, analytics, and educational use.
Python Pandas Open Source
- M.Sc. in IT Management (E-Business) · Shahid Beheshti University, Tehran
- AI Developer @ Barmana, building LLM retrieval systems in production
- Open to Data Engineering, AI/ML Engineering, and Analytics Engineering opportunities