Turn any meeting, lecture, or tutorial into a searchable, multimodal knowledge base—with deep AI insight into both speech and visuals.
- Upload or connect remote meeting/lecture recordings
- AI-generated summaries for every segment
- Visual timeline with slide/diagram detection
- “Ask the Meeting”: Natural language Q&A (audio + visual context)
- Tag and filter by topics, speakers, slides, or screen shares
- Export highlights and action items
- Secure, scalable, and privacy-first
- Clone the repo
- Install backend and frontend dependencies
- Set up HuggingFace keys and auth config
- Run the backend and frontend locally (Docker Compose or Railway)
- Connect your Zoom/Meet/YouTube account or upload a file
- Open `localhost:3000` to use RecallLens!
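Before starting the stack, it can help to confirm the required keys are actually set. The sketch below is a hypothetical preflight check; the variable names are placeholders, and the authoritative list lives in `.env.example`.

```python
import os

# Preflight check sketch. These variable names are illustrative placeholders;
# consult .env.example for the real keys the stack expects.
REQUIRED_VARS = ["HUGGINGFACE_API_KEY", "AUTH0_DOMAIN", "AUTH0_CLIENT_ID"]

def missing_config(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Report anything that still needs filling in before `docker-compose up`.
for name in missing_config():
    print(f"Missing: set {name} in your .env")
```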
| Section/Component | Purpose / Role | Technology / Examples |
|---|---|---|
| User / Client | Initiates uploads, connects video sources, interacts with results | Web browser, mobile app |
| Frontend | User interface for uploads, search, results display | React, Tailwind CSS |
| API Gateway | Handles routing, integrates auth, connects frontend & backend | API layer, handles Auth0/Firebase |
| Auth | Authenticates users, manages sessions | Auth0, Firebase |
| Integration APIs | Connects to Zoom, YouTube, LMS for source videos | Zoom API, YouTube API, LMS API |
| Backend | Orchestrates all processing, manages data flow | FastAPI (Python) |
| Task Queue | Runs async, distributed jobs for processing | Celery |
| Video Processor | Extracts frames, audio, visuals from video | FFmpeg, OpenCV |
| ML Processing | Runs AI/ML models: transcription, summarization, QA, tagging | HuggingFace Transformers |
| Frame/Image Extraction | Analyzes visuals (slides, diagrams, shared docs) | OpenCV, ML models |
| Storage | Stores all files, metadata, and search indexes | S3 (files), PostgreSQL (metadata), Elasticsearch (semantic search) |
| Search / Retrieval | Provides fast, structured search & retrieval | Elasticsearch, API queries |
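To make the data flow above concrete, here is a toy, in-memory walk through the pipeline: ingest segments, "process" them, and query an index. Every stage is a stub; the real system swaps these for FFmpeg/OpenCV extraction, HuggingFace models, S3, and Elasticsearch, with the processing running as Celery tasks.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One processed slice of a recording (transcript plus tags)."""
    start_s: float
    end_s: float
    transcript: str
    tags: list = field(default_factory=list)

class Index:
    """Stand-in for the Elasticsearch index: naive substring search."""
    def __init__(self):
        self.segments = []

    def add(self, seg):
        self.segments.append(seg)

    def search(self, query):
        q = query.lower()
        return [s for s in self.segments if q in s.transcript.lower()]

def process_recording(raw_segments, index):
    # Video Processor + ML Processing stages, stubbed out. In production this
    # runs asynchronously so the upload request can return immediately.
    for start, end, text in raw_segments:
        index.add(Segment(start, end, text))

index = Index()
process_recording(
    [(0.0, 30.0, "Welcome to the Q3 roadmap review"),
     (30.0, 60.0, "This slide shows the architecture diagram")],
    index,
)
hits = index.search("architecture")
```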
- Frontend: React (Next.js) + TailwindCSS
- Backend: FastAPI + Celery + PostgreSQL + S3 + Elasticsearch
- AI Pipelines:
- Speech-to-text
- Video/image-to-text
- Visual Q&A
- Video summarization/classification
- Integrations: Zoom, YouTube, LMS APIs
- Auth: Auth0/Firebase
| Task | Example Model(s) | Purpose |
|---|---|---|
| Speech-to-Text | Whisper (whisper-large) | Transcribe meetings/lectures |
| Video-Text-to-Text | BLIP-2, Flamingo | Summarize video segments |
| Visual QA | OFA, BLIP-2 | Answer “what’s on screen” queries |
| Image-to-Text | Donut, Pix2Struct | Describe slides, diagrams |
| Doc QA | LayoutLMv3 | Extract answers from shared documents |
| Video Classification | TimeSformer, MViT | Tag “presentation”, “Q&A”, “discussion”, etc. |
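One way the backend might route a processing task to a checkpoint from the table above is a simple registry. The mapping below is a sketch: the checkpoint IDs are public Hugging Face Hub identifiers chosen for illustration, not pinned project dependencies.

```python
# Illustrative task-to-checkpoint registry; IDs are example Hub checkpoints,
# not the project's pinned model versions.
MODEL_REGISTRY = {
    "speech_to_text": "openai/whisper-large-v3",
    "video_summary": "Salesforce/blip2-opt-2.7b",
    "visual_qa": "Salesforce/blip2-opt-2.7b",
    "image_to_text": "naver-clova-ix/donut-base",
    "doc_qa": "microsoft/layoutlmv3-base",
    "video_classification": "facebook/timesformer-base-finetuned-k400",
}

def checkpoint_for(task: str) -> str:
    """Resolve a processing task name to a model checkpoint ID."""
    try:
        return MODEL_REGISTRY[task]
    except KeyError:
        raise ValueError(f"Unknown task {task!r}; known: {sorted(MODEL_REGISTRY)}")
```

Centralizing the mapping makes it easy to swap checkpoints (or point at the HuggingFace Inference API) without touching the task code.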
- Save hours by jumping to the exact moment you need.
- Don’t miss key visuals: Find slides, diagrams, or shared docs instantly.
- Better remote learning: Retrieve context-rich moments from any lecture or training.
| Layer | Technology |
|---|---|
| Frontend | React (Next.js), TailwindCSS |
| Backend | FastAPI, Celery, PostgreSQL |
| ML | HuggingFace Transformers/Inference API |
| Video | FFmpeg, OpenCV |
| Storage | S3 |
| Search | Elasticsearch |
| Auth | Auth0/Firebase |
| Deploy | Docker, K8s, AWS/Railway |
| Integrate | Zoom/Meet/YouTube APIs |
- Multimodal AI integration (audio, video, images)
- Scalable cloud-native architecture
- Asynchronous, resilient processing
- Real UX for real pain points (remote work, e-learning)
- SaaS/product potential (Notion, Slack, enterprise integrations)
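The "asynchronous, resilient processing" bullet above boils down to retrying transient failures with backoff. A minimal sketch of that idea is below; in the real pipeline, Celery's `autoretry_for`/`retry_backoff` task options play this role, and the flaky stage here is hypothetical.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the last failure
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical flaky stage: fails twice (say, a transient S3 timeout), then succeeds.
calls = {"n": 0}

def flaky_transcode():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"
```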
- Use provided scripts for local demo with sample videos (public datasets listed below)
- Configure HuggingFace and API keys in `.env.example`
- See `CONTRIBUTING.md` for dev setup and architecture details
(Coming soon: GIFs and walkthroughs!)
MIT License.
Questions? PRs welcome!