
Semantic search for fast.ai lectures.


DanteOz/fastsearch


FastSearch

FastSearch is an end-to-end semantic search engine built to help tens of thousands of students search the popular fast.ai machine learning video corpus (~300 hours). It performs low-latency retrieval and ranking of lecture transcripts (ONNX) using bi- and cross-encoder models trained with cross-architecture knowledge distillation (PyTorch) on a custom dataset of ~1,000 fast.ai questions and ~27,000 lecture segments. It is backed by a data pipeline (Dagster) that scrapes and transcribes new video lectures (OpenAI Whisper) and incrementally updates an ANN search index (Qdrant). User queries and result feedback are tracked for model retraining. The project is deployed with a fully custom CI/CD and MLOps pipeline (GitHub Actions) following IaC best practices (AWS CDK). On a push to the model registry (Hugging Face), the MLOps pipeline launches a backfill over the embedding/indexing pipeline and redeploys the backend container with the updated model weights.
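The two-stage retrieve-then-rerank flow described above can be illustrated with a minimal, dependency-free sketch. This is not FastSearch's code: `embed`, `cross_score`, `search`, and the sample segments are hypothetical stand-ins. A bag-of-words vector plays the role of the bi-encoder embedding, a brute-force cosine scan plays the role of the ANN index, and a simple joint scoring function plays the role of the cross-encoder.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: encodes text independently as word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cross_score(query, segment):
    """Toy stand-in for a cross-encoder: scores the (query, segment) pair jointly.

    Score = fraction of query terms present in the segment,
    plus 1 for every adjacent query word pair that also appears in the segment.
    """
    q = query.lower().split()
    s = segment.lower().split()
    if not q:
        return 0.0
    overlap = sum(1 for t in q if t in s) / len(q)
    segment_bigrams = set(zip(s, s[1:]))
    bonus = sum(1 for bg in zip(q, q[1:]) if bg in segment_bigrams)
    return overlap + bonus


def search(query, segments, index, top_k=3, top_n=2):
    """Stage 1: cheap vector retrieval of top_k candidates.
    Stage 2: rerank only those candidates with the costlier pairwise scorer."""
    q_vec = embed(query)
    candidates = sorted(
        range(len(segments)),
        key=lambda i: cosine(q_vec, index[i]),
        reverse=True,
    )[:top_k]
    reranked = sorted(
        candidates, key=lambda i: cross_score(query, segments[i]), reverse=True
    )
    return [segments[i] for i in reranked[:top_n]]


# Hypothetical lecture segments; the index is precomputed once, offline.
SEGMENTS = [
    "the learning rate finder helps pick a good learning rate",
    "data augmentation flips and rotates training images",
    "rate limits apply when learning to call the api",
    "weight decay is a form of regularization",
]
INDEX = [embed(s) for s in SEGMENTS]

results = search("learning rate", SEGMENTS, INDEX)
for segment in results:
    print(segment)
```

The split matters for latency: segment vectors are computed ahead of time, so only the query is encoded at request time, and the pairwise scorer runs on just the `top_k` candidates rather than the whole corpus.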





Important

Since the write-up, FastSearch has migrated from PlanetScale (MySQL) to Neon (Postgres).
