  • DePaul University
  • Chicago
Sidthak/README.md

Hi 👋, I'm Sidhant Thakur

AI/ML Engineer  |  LangGraph · RAG · Agentic AI · LLMs  |  Merck  |  DePaul University

   

   


👨‍💻 About Me

AI/ML Engineer and Data Scientist with 4+ years of experience building and shipping ML and GenAI systems across pharma, healthcare, and enterprise. My focus is production-grade agentic AI systems and LLM pipelines, not just experiments.

  • 🏢 Currently: AI/ML Engineer @ Merck — production RAG pipelines, LLM summarization, embedding search, and anomaly detection
  • 🎓 M.S. Data Science — DePaul University
  • 🔨 Built: Agentic RAG system with LangGraph + CRAG + HITL, and a hybrid RAG system with a CI evaluation pipeline
  • 🎯 Target roles: AI Engineer · GenAI Engineer · ML Engineer · LLM Engineer

🎯 Role Match

| Role | What they want | What I have |
| --- | --- | --- |
| AI Engineer | RAG, LLM APIs, vector DBs, production pipelines | ✅ Production RAG + monitoring + LangGraph agent shipped |
| GenAI Engineer | LangChain, LangGraph, agents, CRAG, HITL | ✅ 6-node LangGraph agent with CRAG + real HITL + LangSmith tracing |
| ML Engineer | PyTorch, MLOps, cloud deployment, pipelines | ✅ 5+ years PyTorch/TF, AWS/Azure/GCP, MLflow, Kafka, Spark |

🚀 Featured Projects

1. AdaptiveRAG — Agentic RAG System with LangGraph, CRAG, and HITL

A LangGraph agent that routes each query to web search, document retrieval, or both, grades the retrieved results before answering, and pauses for human input when unsure.

  • 🔀 Query routing: an LLM classifies each query as web search, document retrieval, or hybrid
  • CRAG grading: blocks 100% of low-confidence retrievals before they reach the LLM
  • ⏸️ Real HITL: LangGraph interrupt_before pauses the graph mid-execution for human clarification
  • 🔭 LangSmith tracing: all 6 nodes traced end-to-end for 100% of executions
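
The route-then-grade flow described above can be sketched roughly as follows. This is a minimal illustration, not the AdaptiveRAG code: `route`, `grade`, `crag_gate`, and `MIN_RELEVANCE` are hypothetical names, and simple keyword heuristics stand in for the LLM router and the LLM grader so the example runs without any API keys.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the threshold AdaptiveRAG actually uses is not stated.
MIN_RELEVANCE = 0.5


@dataclass
class Retrieved:
    text: str
    source: str  # "web" or "documents"


def route(query: str) -> str:
    """Toy stand-in for the LLM router: returns 'web', 'documents', or 'hybrid'."""
    q = query.lower()
    wants_web = any(w in q for w in ("latest", "today", "news"))
    wants_docs = any(w in q for w in ("my notes", "uploaded", "document"))
    if wants_web and wants_docs:
        return "hybrid"
    if wants_web:
        return "web"
    return "documents"


def grade(query: str, doc: Retrieved) -> float:
    """Toy relevance grader: fraction of query terms that appear in the document."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(doc.text.lower().split())
    return len(terms & words) / len(terms)


def crag_gate(query: str, docs: list[Retrieved]) -> tuple[str, list[Retrieved]]:
    """Keep only documents that pass the grader; hand off to a human when none do."""
    kept = [d for d in docs if grade(query, d) >= MIN_RELEVANCE]
    action = "generate" if kept else "ask_human"
    return action, kept
```

In a LangGraph version, the `"ask_human"` branch would presumably be the conditional edge leading to the node that the graph interrupts before, while `"generate"` would lead to the answer node.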

2. StudyRAG — Production RAG System with Hybrid Retrieval and Observability

Ask questions over your own documents — hybrid AI search with full observability and CI-gated evaluation

  • Hybrid retrieval: BM25 and vector search fused with Reciprocal Rank Fusion across 107 document chunks
  • 🎯 Cross-encoder reranking: ms-marco-MiniLM-L-6-v2 with citation enforcement (100% refusal rate on unsupported queries)
  • 🔭 LangSmith tracing + SQLite metrics store + live Streamlit dashboard
  • 🧪 Ragas CI gate: 50-item golden dataset, blocks deployments on metric regression
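
For readers unfamiliar with Reciprocal Rank Fusion, the hybrid retrieval step can be sketched as below. This is an illustrative version under stated assumptions, not the StudyRAG implementation: the function name and chunk ids are hypothetical, and `k=60` is the commonly used RRF constant rather than a value confirmed by this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk scores 1 / (k + rank) in every list where it appears
    (rank starts at 1), and the scores are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical top-3 results from the keyword and vector retrievers.
bm25_hits = ["c1", "c2", "c3"]
vector_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['c1', 'c3', 'c2', 'c4']
```

Because RRF uses only rank positions, BM25 scores and embedding similarities never need to be normalized onto a shared scale. The fused list would then go to the cross-encoder for reranking.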


| Area | Technologies |
| --- | --- |
| 🔍 RAG & Retrieval | RAG · Hybrid Search · BM25 · Cross-Encoder Reranking · FAISS · Pinecone · ChromaDB · CRAG · Self-RAG |
| 🤖 Agents & Graphs | LangGraph · Query Routing · HITL · Conditional Edges · MemorySaver · Subgraphs · MCP · Tavily |
| 🔭 Observability | LangSmith · Tracing · Latency Metrics · Query Logging · Ragas · SQLite · Streamlit Dashboards |
| 💬 LLM Frameworks | LangChain · OpenAI GPT-4o · Google Gemini · Claude · Prompt Engineering · CoT · Few-Shot |
| 🎯 Fine-Tuning | LoRA · QLoRA · Hugging Face Transformers · Parameter-Efficient Fine-Tuning |
| 📊 ML & Data Science | PyTorch · TensorFlow · Scikit-learn · XGBoost · Random Forests · SVM · CNN · NER · SHAP · Time-Series |

🛠️ Languages and Tools

AI / GenAI / LLM

Languages

Machine Learning

Cloud & MLOps

Visualization & Monitoring


💼 Experience Highlights

| Company | Role | AI Impact |
| --- | --- | --- |
| Merck · 2025–Present | AI/ML Engineer | Production RAG ↓45% evidence-gathering · LLM summarization ↓45% review time · Embedding search ↓40% lookup time |
| Blue Cross Blue Shield · 2024–2025 | Data Scientist | LLM Q&A ↓35% SQL requests · Embedding search ↓50% lookup time · LLM summarization ↓60% review time |
| Dell Technologies · 2019–2021 | Data Scientist | Predictive failure models across 100+ product lines · Random Forest segmentation · Pricing analysis on 3M+ transactions |


📫 Connect with me

sidhant-thakur-67b2a2169   sidhantthakur222@gmail.com

Popular repositories

  1. Rag (Public)

    Production-grade RAG system for study notes

    Python 1

  2. adaptiverag (Public)

    Python 1

  3. Black-friday-Sales-Predection (Public)

    HTML

  4. Advance-Machine-Learning (Public)

    Jupyter Notebook

  5. Customer-Analysis- (Public)

    The purpose of this report is to visualize the sales dataset so that customer analysis can be performed on it. This report will use appropriate charts to represent revenue based on a variety of var…

  6. Arctic-Ice-Analyzer-Sea-Ice-Time-Series-Data (Public)

    🌊 Exploring sea ice trends to address rising sea levels caused by the greenhouse effect. Join us in understanding this critical issue. 🌍 #ClimateData 📈

    R