Study Space is a personal academic workspace for turning your own material into searchable, interactive study sessions. Upload documents, organize them into folders, chat against indexed content with a transparent retrieval trace, generate saved study sets for later review, and analyze collections of exam papers with Topic Miner.
Study Space combines a FastAPI backend, a React frontend, MongoDB for structured state, and ChromaDB for semantic retrieval. The result is a user-scoped workspace where uploaded documents and saved practice material stay tied to the authenticated user, AI responses cite their evidence, and revision tools sit next to the source material instead of in a separate app.
- Upload and organize PDFs, DOCX, PPTX, XLSX, Markdown, HTML, images, audio, and more
- Chat with transparent RAG using visible queries, retrieval runs, fused evidence, and cited sources
- Generate and revisit study sets including flashcards, MCQ quizzes, written self-checks, and mixed practice
- Mine exam folders for recurring topics, patterns, and example questions
- Track academic context with extracted metadata, tags, notes, and calendar-friendly event views
- Stay accessible with voice input, high contrast mode, larger text, reduced motion, and stronger focus states
Study Space uses a retrieval-planned RAG pipeline rather than a single vector search or a fully autonomous agent loop.
- The user sends a question to `POST /chat`.
- The backend builds a compact catalog of the user's visible files and tags.
- Gemini (`gemini-3.1-flash-lite-preview`) plans up to three retrieval steps.
- Retrieval runs execute in ChromaDB using broad, focused, or full-document strategies.
- Results are fused with reciprocal-rank fusion and deduplicated.
- Gemini answers from the fused evidence set and returns a `trace` payload with sources.
- The frontend renders the reasoning trail inline so the user can inspect how the answer was built.
That gives the app an agentic-style planning step while keeping execution constrained, inspectable, and grounded in the user's own material.
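
The fusion step in the pipeline above can be sketched in a few lines. This is a minimal illustration of reciprocal-rank fusion with deduplication, not the code in `app/core/rag.py`: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample chunk IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60, limit=None):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each run is a list of chunk IDs ordered best-first. A chunk earns
    1 / (k + rank) from every run it appears in; k=60 is a commonly used
    constant and is an assumption here, not a value taken from the project.
    """
    scores = defaultdict(float)
    for run in runs:
        seen = set()
        rank = 0
        for chunk_id in run:
            if chunk_id in seen:
                continue  # deduplicate repeats inside a single run
            seen.add(chunk_id)
            rank += 1
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by chunk ID for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if limit is None else fused[:limit]


# Hypothetical results from three planned retrieval steps.
runs = [
    ["c1", "c2", "c3"],  # broad
    ["c2", "c4"],        # focused
    ["c2", "c1"],        # full-document
]
fused = reciprocal_rank_fusion(runs)
```

Because every run contributes to a shared score table, a chunk that appears in several runs rises to the top while still appearing only once in the fused evidence set.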
- Drag-and-drop uploads with background processing and job progress
- Folder organization and editable tags
- Inline access to owned study documents and exam papers
- Personal notes linked to the workspace
- Multi-query retrieval over user-scoped content
- Visible retrieval trace with generated queries and fused results
- Source-aware answers backed by chunk evidence and optional full-document fallback
- Search scope that stays limited to the authenticated user's data
- Auto-saved study sets from selected documents
- Flashcards, MCQ quizzes, written self-checks, and mixed practice modes
- Saved set library for reopening or deleting generated practice material
- Local-only written answer drafts for self-checking without storing attempts
- Metadata extraction for deadlines, events, and academic context
- Separate workflow for exam-paper folders
- Batch analysis across multiple PDFs
- Theme extraction, recurring topics, and synthesized study guidance
- Voice input support
- Higher contrast mode
- Larger text
- Reduced motion
- Stronger focus states and better keyboard support
| Layer | Implementation |
|---|---|
| Frontend | React 18 + Vite app under frontend/src/app, built into backend-served static assets |
| API assembly | FastAPI entrypoint in app/main.py with router registration from app/api/routers |
| Service wiring | Runtime dependencies and shared services in app/api/deps.py and app/services/ |
| Domain logic | Retrieval, ingestion, metadata extraction, study generation, and topic mining in app/core/ |
| Structured data | MongoDB access via app/db/mongo.py and app/db/repository.py |
| Vector retrieval | ChromaDB indexing and search via app/db/vector_store.py |
| Embeddings | all-MiniLM-L6-v2 via sentence-transformers |
| Primary LLM | Google Gemini via google-genai |
- Gemini `gemini-3.1-flash-lite-preview` powers chat, saved study set generation, metadata extraction, and Topic Miner flows.
- `facebook/bart-large-mnli` is used for document classification.
- `all-MiniLM-L6-v2` produces embeddings for semantic retrieval in ChromaDB.
- MongoDB records are scoped by authenticated user identity.
- Saved study sets are stored as user-owned MongoDB records and included in account export/deletion flows.
- Study documents live under `app/users/<username>/uploads/`.
- Processed markdown lives under `app/users/<username>/processed/`.
- Exam papers live under `app/users/<username>/exam_papers/`.
- ChromaDB is shared physically, but every indexed chunk stores `owner_username`.
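
The storage layout and ownership rule above could be expressed roughly as follows. This is a sketch under stated assumptions, not the contents of `app/services/storage.py` or `app/db/vector_store.py`: the helpers `user_storage_dir`, `owner_filter`, and `visible_chunks` are hypothetical names, and the in-memory list stands in for a real vector query.

```python
from pathlib import Path

USERS_ROOT = Path("app/users")
STORAGE_KINDS = ("uploads", "processed", "exam_papers")


def user_storage_dir(username: str, kind: str, root: Path = USERS_ROOT) -> Path:
    """Return the per-user directory for one kind of stored file."""
    if kind not in STORAGE_KINDS:
        raise ValueError(f"unknown storage kind: {kind!r}")
    # Reject names that could escape the per-user directory.
    if not username or username in {".", ".."} or "/" in username or "\\" in username:
        raise ValueError(f"unsafe username: {username!r}")
    return root / username / kind


def owner_filter(username: str) -> dict:
    """Metadata filter in the shape ChromaDB accepts for a `where` argument."""
    return {"owner_username": username}


def visible_chunks(chunks, username):
    """In-memory stand-in for a filtered vector query (illustration only)."""
    wanted = owner_filter(username)
    return [
        chunk
        for chunk in chunks
        if all(chunk["metadata"].get(key) == value for key, value in wanted.items())
    ]


# Hypothetical indexed chunks from two users sharing one physical collection.
chunks = [
    {"id": "c1", "metadata": {"owner_username": "alice"}},
    {"id": "c2", "metadata": {"owner_username": "bob"}},
]
alice_only = visible_chunks(chunks, "alice")
```

Applying the same owner filter to every query is what lets a physically shared collection behave like a per-user index.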
| Requirement | Notes |
|---|---|
| Python 3.12 | Matches the runtime image in Dockerfile |
| Node.js 20+ | Used for the Vite frontend build |
| MongoDB | Local instance or remote connection string |
| `GEMINI_API_KEY` | Required for chat and generation features |
| FFmpeg | Needed for local audio-file processing; already included in Docker |

Install the Python dependencies and build the frontend:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    cd frontend
    npm install
    npm run build
    cd ..

Set the required environment variables in your shell or in an untracked local `.env` file:

    export GEMINI_API_KEY="your_key_here"
    export MONGODB_URI="mongodb://localhost:27017"
    export MONGODB_DB_NAME="studyspace"

Optional settings:

    export SESSION_TTL_DAYS=7
    export SESSION_COOKIE_SECURE=false
    export MONGODB_APP_NAME=studyspace-api
    export MONGODB_SERVER_SELECTION_TIMEOUT_MS=5000

Run the app:

    uvicorn app.main:app --reload

Then open http://127.0.0.1:8000, create an account, and start uploading study material.
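
For orientation, the environment variables above could be read into a typed settings object along these lines. This is only a sketch, not the project's `app/config.py`: the `Settings` class and `load_settings` function are hypothetical, and treating the optional values shown above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("GEMINI_API_KEY", "MONGODB_URI", "MONGODB_DB_NAME")


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    mongodb_uri: str
    mongodb_db_name: str
    # Assumed defaults, mirroring the optional values listed above.
    session_ttl_days: int = 7
    session_cookie_secure: bool = False
    mongodb_app_name: str = "studyspace-api"
    mongodb_server_selection_timeout_ms: int = 5000


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        mongodb_uri=env["MONGODB_URI"],
        mongodb_db_name=env["MONGODB_DB_NAME"],
        session_ttl_days=int(env.get("SESSION_TTL_DAYS", "7")),
        session_cookie_secure=_as_bool(env.get("SESSION_COOKIE_SECURE", "false")),
        mongodb_app_name=env.get("MONGODB_APP_NAME", "studyspace-api"),
        mongodb_server_selection_timeout_ms=int(
            env.get("MONGODB_SERVER_SELECTION_TIMEOUT_MS", "5000")
        ),
    )
```

Failing fast on a missing required variable surfaces configuration mistakes at startup rather than on the first chat request.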
The repository includes a multi-stage Docker build and a Docker Compose stack for local deployment.

    cp .env.docker.example .env

Set `GEMINI_API_KEY` in `.env`, then run:

    docker compose up --build

That starts:

- `app` on `http://127.0.0.1:8001` by default (`HOST_PORT` in `compose.yaml`)
- `mongo` as the internal database service

Persistent data is stored in named volumes for:
- MongoDB data
- Chroma embeddings
- User uploads and processed files
The default image is CPU-only. It installs CPU PyTorch wheels and excludes the optional Docling ASR extras, so regular local runs do not pull CUDA or NVIDIA packages. To run the GPU variant instead:

    docker compose --profile gpu up --build app-gpu mongo

Use this only if your host has the NVIDIA Container Toolkit configured. The GPU profile builds a separate image that includes the optional GPU/ASR dependency set.

Stop the stack:

    docker compose down

Stop the stack and also remove its named volumes, which deletes the stored data:

    docker compose down -v

Run the main test suite:

    ./.venv/bin/python -m pytest tests

Run coverage:

    ./.venv/bin/python -m coverage run --source=app -m pytest tests
    ./.venv/bin/python -m coverage report -m

MongoDB integration tests require `MONGODB_TEST_URI`:

    MONGODB_TEST_URI="mongodb://localhost:27017" ./.venv/bin/python -m pytest tests/test_mongo_db.py

Run the frontend Playwright E2E suite:

    cd frontend
    npm run test:e2e

The Playwright suite starts a local Vite server and exercises mocked browser flows under `frontend/e2e/`, so it does not require the full backend stack for the covered UI journeys.

Available E2E scripts:

    cd frontend
    npm run test:e2e
    npm run test:e2e:headed

For a fuller breakdown of the test suite, see `README_TESTS.md`.

Topic Miner is a separate exam-analysis workspace. Its flow is:
- Create an exam folder.
- Upload exam PDFs into that folder.
- Run folder-level analysis.
- Extract topic structure from each paper.
- Synthesize recurring themes and example questions across the folder.
- Reopen saved analyses later; they are marked stale when folder contents change.
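
One way to detect that a saved analysis has gone stale is to store a fingerprint of the folder contents alongside it and compare on reopen. The sketch below illustrates that idea only; how Topic Miner actually tracks staleness is not described here, and `folder_fingerprint` and `is_stale` are hypothetical names.

```python
import hashlib


def folder_fingerprint(papers):
    """Hash a folder's contents from (filename, content_bytes) pairs.

    Sorting makes the result independent of upload order, so only a real
    change to the set of papers or their bytes alters the fingerprint.
    """
    digest = hashlib.sha256()
    for name, content in sorted(papers):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between the name and the content hash
        digest.update(hashlib.sha256(content).digest())
    return digest.hexdigest()


def is_stale(saved_fingerprint, papers):
    """True when the folder no longer matches the fingerprint saved with an analysis."""
    return saved_fingerprint != folder_fingerprint(papers)


# Hypothetical folder contents at analysis time and after a later upload.
at_analysis = [("2022.pdf", b"paper one"), ("2023.pdf", b"paper two")]
saved = folder_fingerprint(at_analysis)
after_upload = at_analysis + [("2024.pdf", b"paper three")]
```

With this approach, `is_stale(saved, at_analysis)` stays false while the folder is unchanged, and `is_stale(saved, after_upload)` becomes true once a paper is added, removed, or replaced.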

The repository is laid out as follows:

    app/
      main.py                   FastAPI entry point and application assembly
      auth.py                   Session auth and password hashing
      config.py                 Environment-driven app configuration
      api/
        deps.py                 Shared dependency wiring and runtime context
        schemas.py              Request/response models
        routers/                Auth, chat, documents, exams, study, uploads, UI, account
      core/
        ingestion.py            Document processing and extraction
        rag.py                  Retrieval-planned RAG orchestration
        metadata_extractor.py   Academic metadata extraction
        study_set_generator.py  Saved flashcard, MCQ, written, and mixed practice generation
        topic_miner.py          Exam-paper analysis
        workspace_catalog.py    Workspace inventory used by retrieval planning
      services/
        jobs.py                 Background upload and topic-mining job management
        ownership.py            User-scoped access validation helpers
        storage.py              User file storage paths and persistence helpers
      db/
        repository.py           Database interface used across the app
        mongo.py                MongoDB integration
        metadata.py             Metadata persistence helpers
        vector_store.py         ChromaDB indexing and search
    frontend/
      src/
        app/                    Modular React app
          components/           Chat, layout, modal, and accessibility UI pieces
          hooks/                Shared React hooks
          screens/              Standalone screens
          sections/             Workspace and studio section composition
      e2e/                      Playwright end-to-end suite with mocked API flows
    requirements/
      base.txt                  Shared Python dependencies
      cpu.txt                   CPU runtime dependencies
      gpu.txt                   Optional GPU runtime dependencies
    assets/
      studyspace_banner.png     README hero banner
      preview.png               Workspace preview image
    tests/                      Backend pytest suite
      conftest.py               Shared fixtures and test helpers
    compose.yaml                Local Mongo + app compose stack
    Dockerfile                  Multi-stage frontend/backend image build

Authentication:

- `POST /auth/signup`
- `POST /auth/signin`
- `POST /auth/logout`
- `GET /auth/me`

Workspace:

- `POST /upload`
- `GET /upload-jobs`
- `POST /chat`
- `GET /documents`
- `GET /folders`
- `GET /tags`
- `GET /notes`
- `POST /study-sets/generate`
- `GET /study-sets`
- `GET /study-sets/{study_set_id}`
- `DELETE /study-sets/{study_set_id}`
- `POST /quiz/generate`
- `POST /flashcards/generate`
- `GET /metadata`

The `/study-sets/*` endpoints are the current frontend path for generated revision material. The older `/quiz/generate` and `/flashcards/generate` endpoints remain available for compatibility.
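
As a rough illustration of calling these endpoints from a script, the sketch below builds requests with the standard library without sending them. The request body field `question`, the cookie value, and the helper `build_request` are assumptions for the example; the real request models live in `app/api/schemas.py` and may differ.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn address from the setup steps


def build_request(method, path, payload=None, cookie=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the listed endpoints."""
    headers = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if cookie is not None:
        # Session cookie from sign-in; the cookie name is not documented here.
        headers["Cookie"] = cookie
    return Request(url=f"{base_url}{path}", data=data, headers=headers, method=method)


# Hypothetical payload shape: the `question` field name is an assumption.
chat_request = build_request("POST", "/chat", {"question": "Summarise week 3"})
list_request = build_request("GET", "/study-sets", cookie="session=<value>")
```

Passing either object to `urllib.request.urlopen` would send it, provided the server is running and the session cookie is valid.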
Topic Miner:

- `GET /exam-folders`
- `POST /exam-folders`
- `POST /exam-folders/{folder_id}/analyze`
- `GET /exam-folders/{folder_id}/analysis`
- `GET /exam-papers`
- `POST /exam-papers/upload`

Legacy `db.json` is not used at runtime, but you can still import old data into MongoDB:

    python scripts/migrate_json_to_mongo.py \
      --json-path db.json \
      --mongo-uri "$MONGODB_URI" \
      --db-name "$MONGODB_DB_NAME"

Preview counts without writing:

    python scripts/migrate_json_to_mongo.py \
      --json-path db.json \
      --mongo-uri "$MONGODB_URI" \
      --db-name "$MONGODB_DB_NAME" \
      --dry-run
