FastSearch

FastSearch is an end-to-end semantic search engine built to help tens of thousands of students search the popular fast.ai machine learning video corpus (~300 hours). It performs low-latency retrieval and ranking of lecture transcripts (ONNX), using bi- and cross-encoder models trained with cross-architecture knowledge distillation (PyTorch) on a custom dataset of ~1,000 fast.ai questions and ~27,000 lecture segments. It is backed by a data pipeline (Dagster) that scrapes and transcribes new video lectures (OpenAI Whisper) and incrementally updates an ANN search index (Qdrant). User queries and result feedback are tracked for model retraining. The system is deployed with a fully custom CI/CD and MLOps pipeline (GitHub Actions) following IaC best practices (AWS CDK). On a push to the model registry (Hugging Face), the MLOps pipeline launches a backfill over the embedding/indexing pipeline and redeploys the backend container with the updated model weights.
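
The retrieval-and-ranking step described above follows the common two-stage pattern: a bi-encoder embeds the query and each segment independently so candidates can be fetched cheaply, then a cross-encoder scores each (query, segment) pair jointly to reorder the short list. The sketch below illustrates only that control flow and is not FastSearch's actual code. The functions `embed`, `cross_score`, `retrieve`, and `search` are hypothetical names, the bag-of-words and bigram scorers are toy stand-ins for the trained models, and the brute-force scan stands in for the ANN index.

```python
import math
from collections import Counter

# Toy corpus standing in for lecture segments (illustrative data only).
SEGMENTS = [
    "rate of learning",
    "set the learning rate before training the model",
    "convolutional layers slide a kernel over the image",
    "students learn at their own rate",
]

def embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def bigrams(text: str) -> set:
    """Set of adjacent word pairs in the text."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))

def cross_score(query: str, segment: str) -> float:
    """Toy stand-in for a cross-encoder: looks at query and segment together
    and returns the fraction of query bigrams that occur in the segment."""
    q = bigrams(query)
    return len(q & bigrams(segment)) / len(q) if q else 0.0

# Segment vectors are computed once, ahead of query time.
INDEX = [embed(s) for s in SEGMENTS]

def retrieve(query: str, k: int = 3) -> list:
    """Stage 1: top-k segment ids by embedding similarity.
    A brute-force scan here; an ANN index would replace it at scale."""
    q = embed(query)
    ranked = sorted(range(len(INDEX)), key=lambda i: cosine(q, INDEX[i]), reverse=True)
    return ranked[:k]

def search(query: str, k: int = 3) -> list:
    """Stage 2: rerank the stage-1 candidates with the pairwise scorer."""
    candidates = retrieve(query, k)
    return sorted(candidates, key=lambda i: cross_score(query, SEGMENTS[i]), reverse=True)

if __name__ == "__main__":
    for i in search("learning rate"):
        print(i, SEGMENTS[i])
```

In this toy corpus the first stage ranks "rate of learning" highest because it shares the most words with the query, and the second stage promotes the segment that actually contains the phrase "learning rate". That is the motivation for the split: the expensive pairwise scorer only runs on the `k` candidates rather than on every segment.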

Important

Since the write-up, FastSearch has migrated from PlanetScale (MySQL) to Neon (Postgres).

Folders and files

.github
assets
backend
frontend
infra
pipeline
.env.template
.gitattributes
.gitignore
README.md
cdk.json
pixi.lock
pixi.toml

DanteOz/fastsearch
