
Hi 👋, I'm Sidhant Thakur

AI/ML Engineer | LangGraph · RAG · Agentic AI · LLMs | Merck | DePaul University

👨‍💻 About Me

AI/ML Engineer and Data Scientist with 4+ years of experience building and shipping ML and GenAI systems across pharma, healthcare, and enterprise. I build production-grade agentic AI systems and LLM pipelines, not just experiments.

🏢 Currently: AI/ML Engineer @ Merck — production RAG pipelines, LLM summarization, embedding search, and anomaly detection
🎓 M.S. Data Science — DePaul University
🔨 Built: Agentic RAG system with LangGraph + CRAG + HITL, and a hybrid RAG system with a CI evaluation pipeline
🎯 Target roles: AI Engineer · GenAI Engineer · ML Engineer · LLM Engineer

🎯 Role Match

Role	What they want	What I have
AI Engineer	RAG, LLM APIs, vector DBs, production pipelines	✅ Production RAG + monitoring + LangGraph agent shipped
GenAI Engineer	LangChain, LangGraph, agents, CRAG, HITL	✅ 6-node LangGraph agent with CRAG + real HITL + LangSmith tracing
ML Engineer	PyTorch, MLOps, cloud deployment, pipelines	✅ 5+ years PyTorch/TF, AWS/Azure/GCP, MLflow, Kafka, Spark

🚀 Featured Projects

1. AdaptiveRAG — Agentic RAG System with LangGraph, CRAG, and HITL

A LangGraph agent that routes each query to web search, document retrieval, or both, grades the retrieved results before answering, and pauses for human input when unsure.

🔀 Query routing — LLM classifies each query to web search, document retrieval, or hybrid
✅ CRAG — 100% block rate on low-confidence retrievals before hitting the LLM
⏸️ Real HITL — LangGraph interrupt_before pauses graph mid-execution for human clarification
🔭 LangSmith tracing — all 6 nodes traced end-to-end for 100% of executions
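
The routing and grading steps listed above can be sketched in plain Python. This is a minimal illustration and not the project's code: the keyword-based `route` function stands in for the LLM classifier, the `Doc.score` field is assumed to come from a grader, and `RELEVANCE_THRESHOLD`, `grade`, and `next_step` are hypothetical names chosen for this example. The real agent uses LangGraph nodes and an interrupt for the human step, which are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the README does not state the real threshold.
RELEVANCE_THRESHOLD = 0.5


@dataclass
class Doc:
    text: str
    score: float  # relevance grade in [0, 1], assumed to come from a grader


def route(query: str) -> str:
    """Toy keyword router standing in for the LLM query classifier."""
    q = query.lower()
    wants_web = any(word in q for word in ("latest", "today", "news"))
    wants_docs = any(word in q for word in ("document", "report", "policy"))
    if wants_web and wants_docs:
        return "hybrid"
    if wants_web:
        return "web"
    # Assumption: fall back to document retrieval when nothing else matches.
    return "documents"


def grade(docs: list[Doc]) -> list[Doc]:
    """Keep only retrievals at or above the relevance threshold."""
    return [d for d in docs if d.score >= RELEVANCE_THRESHOLD]


def next_step(docs: list[Doc]) -> tuple[str, list[Doc]]:
    """Generate from the graded docs, or ask a human if none survive."""
    kept = grade(docs)
    if kept:
        return ("generate", kept)
    return ("ask_human", [])
```

With this sketch, `next_step([Doc("a", 0.9), Doc("b", 0.2)])` proceeds to generation with only the first document, while a list containing only low-scoring documents returns `"ask_human"`, which mirrors the point where the graph would pause for clarification.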

2. StudyRAG — Production RAG System with Hybrid Retrieval and Observability

Ask questions over your own documents — hybrid AI search with full observability and CI-gated evaluation

⚡ Hybrid retrieval — BM25 + vector search via Reciprocal Rank Fusion across 107 document chunks
🎯 Cross-encoder reranking — ms-marco-MiniLM-L-6-v2 with citation enforcement (100% refusal rate on unsupported queries)
🔭 LangSmith tracing + SQLite metrics store + live Streamlit dashboard
🧪 Ragas CI gate — 50-item golden dataset, blocks deployments on metric regression
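
For readers unfamiliar with Reciprocal Rank Fusion, the hybrid retrieval step above can be sketched as follows. This is a generic illustration of the technique and not the project's implementation: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for this example. Each chunk receives `1 / (k + rank)` from every ranking it appears in, and the summed scores decide the fused order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; 60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk IDs from a BM25 ranking and a vector-search ranking.
bm25_ranking = ["c3", "c1", "c7"]
vector_ranking = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both rankings (`c1`, `c3`) rise to the top, which is the main appeal of RRF: it needs only ranks, so BM25 scores and vector similarities never have to be put on a common scale.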

Area	Technologies
🔍 RAG & Retrieval	RAG · Hybrid Search · BM25 · Cross-Encoder Reranking · FAISS · Pinecone · ChromaDB · CRAG · Self-RAG
🤖 Agents & Graphs	LangGraph · Query Routing · HITL · Conditional Edges · MemorySaver · Subgraphs · MCP · Tavily
🔭 Observability	LangSmith · Tracing · Latency Metrics · Query Logging · Ragas · SQLite · Streamlit Dashboards
💬 LLM Frameworks	LangChain · OpenAI GPT-4o · Google Gemini · Claude · Prompt Engineering · CoT · Few-Shot
🎯 Fine-Tuning	LoRA · QLoRA · Hugging Face Transformers · Parameter-Efficient Fine-Tuning
📊 ML & Data Science	PyTorch · TensorFlow · Scikit-learn · XGBoost · Random Forests · SVM · CNN · NER · SHAP · Time-Series

🛠️ Languages and Tools

[Badge icons omitted. Categories: AI / GenAI / LLM · Languages · Machine Learning · Cloud & MLOps · Visualization & Monitoring]

💼 Experience Highlights

Company	Role	AI Impact
Merck · 2025–Present	AI/ML Engineer	Production RAG ↓45% evidence-gathering · LLM summarization ↓45% review time · Embedding search ↓40% lookup time
Blue Cross Blue Shield · 2024–2025	Data Scientist	LLM Q&A ↓35% SQL requests · Embedding search ↓50% lookup time · LLM summarization ↓60% review time
Dell Technologies · 2019–2021	Data Scientist	Predictive failure models across 100+ product lines · Random Forest segmentation · Pricing analysis on 3M+ transactions

📫 Connect with me

sidhantthakur222@gmail.com
