Ankit455/README.md

Hi 👋, I'm Ankit Dhandharia

Data Engineer · AI/ML Engineer · Software Engineer · Data Scientist


🧑‍💻 About Me

AWS Certified Data Engineer with 1.5+ years of production experience building scalable pipelines, streaming architectures, and ML-ready data systems. I bridge the gap between raw data and real decisions — whether that's a Medallion Architecture warehouse in Snowflake, a RAG pipeline powered by LangChain, or a low-latency FastAPI service at scale.

  • 🎓 MSCS @ Stevens Institute of Technology, Hoboken NJ — GPA: 3.709 / 4.0 (Exp. May 2026)
  • ☁️ AWS Certified Data Engineer – Associate (Mar 2026)
  • 📍 Based in Union City, NJ | Open to full-time roles in Data / AI / Software Engineering
  • ⚡ Reduced pipeline latency by 35%, cut compute costs by 40%, processed 500k+ records/day
  • 🌐 View my interactive portfolio →

🛠 Tech Stack

Data Engineering

Apache Airflow · Apache Spark · dbt · Apache Kafka · Snowflake · Databricks · Hadoop

Cloud & DevOps

AWS · Docker · Kubernetes · Terraform · GitHub Actions

AI / ML

LangChain · HuggingFace · TensorFlow · Scikit-learn · ChromaDB

Languages & Backend

Python · SQL · Scala · FastAPI · JavaScript · TypeScript


📊 Impact at a Glance

Metric                     | Result
Daily records processed    | 500,000+ across 40+ clients
Pipeline latency reduction | 35% via parallelism & scheduling
Compute cost savings       | 40% via Incremental Materialization
Manual effort eliminated   | 15+ hrs/month through Airflow automation
Legacy data migrated       | 200GB+ to modern cloud frameworks
SQL performance gain       | 25% through indexing & columnar storage

🏗 What I'm Currently Working On

[▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░]  Streaming pipeline with Kafka + Spark + dbt
[▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░]  RAG system with local LLMs and ChromaDB
[▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░]  AWS Certified Solutions Architect prep
[▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░]  MSCS coursework @ Stevens (GPA 3.709)

🚀 Featured Projects

🔁 End-to-End Streaming Data Pipeline

AWS S3 · Kafka · Spark · Snowflake · dbt Core · Airflow · Docker · GitHub Actions

  • Medallion Architecture (Bronze/Silver/Gold) with Kafka + AWS Kinesis for real-time ingestion
  • Metadata-driven Gold layer with dbt Jinja macros; SCD Type 2 snapshots for point-in-time accuracy
  • Prometheus monitoring + dbt freshness checks; 40% compute cost reduction via Incremental Materialization
  • Containerized with Docker; automated deployments via GitHub Actions CI/CD
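
For readers unfamiliar with SCD Type 2, here is a minimal sketch of the idea in plain Python. It is not the project's dbt snapshot code: the names `CustomerVersion`, `apply_scd2`, and `city_as_of`, and the customer/city data, are all hypothetical and only illustrate how closing the old row and opening a new one preserves point-in-time accuracy.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    # Hypothetical record shape; a real snapshot table would have more columns.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerVersion], customer_id: int,
               city: str, as_of: date) -> List[CustomerVersion]:
    """Return a new history with the incoming value applied as SCD Type 2."""
    result = []
    needs_new_version = True
    for row in history:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city == city:
            needs_new_version = False      # nothing changed; keep the row open
            result.append(row)
        elif is_current:
            result.append(replace(row, valid_to=as_of))  # close the old version
        else:
            result.append(row)
    if needs_new_version:
        result.append(CustomerVersion(customer_id, city, as_of))
    return result


def city_as_of(history: List[CustomerVersion], customer_id: int,
               when: date) -> Optional[str]:
    """Point-in-time lookup: the version whose validity window contains `when`."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.city
    return None


history = apply_scd2([], 1, "Hoboken", date(2024, 1, 1))
history = apply_scd2(history, 1, "Union City", date(2024, 6, 1))
```

After these two calls, `city_as_of(history, 1, date(2024, 3, 1))` resolves to the earlier version, while a date after June resolves to the current one. Keeping both rows rather than overwriting is what makes historical queries reproducible.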

🤖 Scalable RAG Pipeline for Document Analysis

Python · FastAPI · LangChain · ChromaDB · HuggingFace · SQLite

  • Zero-cost RAG backend using local HuggingFace embeddings (384-dim) — no cloud API dependency
  • Persistent ChromaDB + SQLite vector store; sub-second retrieval over 400+ page documents
  • Full OOP-driven modular design with REST API endpoints via FastAPI for production inference
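
The retrieval step of a RAG pipeline can be sketched as a nearest-neighbour search over embedding vectors. The example below is an illustration only, not the project's code: it uses toy 3-dimensional vectors in place of the 384-dimensional embeddings, a plain list in place of ChromaDB, and the hypothetical helper names `cosine_similarity` and `top_k`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy (text, embedding) pairs; real embeddings would come from a model.
chunks = [
    ("chunk about pipelines", [1.0, 0.0, 0.0]),
    ("chunk about cooking", [0.0, 1.0, 0.0]),
    ("chunk about streaming", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

A vector store performs the same ranking with an index instead of a linear scan, which is what keeps lookups fast as the number of chunks grows.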

💼 Experience

Data Engineer · Rajlaxmi Solutions Private Limited · May 2023 – June 2024

  • Production ELT pipelines processing 500k+ daily records in Python, Spark & Airflow
  • Snowflake warehouse design with Medallion Architecture & dimensional modeling
  • Schema validation, anomaly detection & SLA alerting — contributed to an Industry Excellence Award

Data Analyst Intern · Rajlaxmi Solutions Private Limited · Jan – May 2023

  • Migrated 200GB+ legacy databases to cloud; 25% SQL performance improvement
  • Automated Airflow reporting workflows — eliminated 15+ hrs/month of manual work

🏅 Certifications

  • 🏆 AWS Certified Data Engineer – Associate — Amazon Web Services, April 2026
  • 🏆 AWS Academy Data Engineering — Amazon Web Services, April 2026

📬 Get In Touch


Fun fact: I once eliminated 15 hours of manual work a month with a single Airflow DAG — and haven't looked back.

Pinned

  1. AWS_DBT_Snowflake (Public)

    End-to-end data engineering pipeline using dbt Core, Snowflake & AWS S3 — implements the Medallion Architecture (Bronze → Silver → Gold) with incremental loading, SCD Type 2 snapshots, custom Jinja…

    Python

  2. Mini-Search-Engine (Public)

    This project implements a simplified search engine based on Section 23.6 of the textbook, designed to index and search HTML documents from a small website. The implementation uses inverted indexing…

    HTML

  3. Polars-vs-Pandas (Public)

    Jupyter Notebook

  4. RAG_Project (Public)

    A progressive RAG (Retrieval-Augmented Generation) project using LangChain, ChromaDB, HuggingFace Embeddings & OpenRouter — from plain text to PDF-based QA with persistent vector stores.

    Jupyter Notebook
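
The inverted indexing mentioned for Mini-Search-Engine can be illustrated with a short sketch. This is a generic version written for this page under stated assumptions, not the repository's implementation: the page names, the `TextExtractor` class, and the `build_index` and `search` helpers are hypothetical, and queries are treated as a simple AND over terms.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Set


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def tokenize_html(html: str) -> List[str]:
    """Lower-cased alphanumeric tokens from the visible text of a page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of pages that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, html in pages.items():
        for term in tokenize_html(html):
            index[term].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Pages containing every query term (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical two-page site used only for illustration.
pages = {
    "a.html": "<h1>Data pipelines</h1><p>Spark and Kafka</p>",
    "b.html": "<p>Kafka streams</p>",
}
index = build_index(pages)
```

With this toy site, `search(index, "kafka")` matches both pages and `search(index, "spark kafka")` matches only `a.html`, because the lookup intersects the posting sets of each term.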