Study Space

Study Space is a personal academic workspace for turning your own material into searchable, interactive study sessions. Upload documents, organize them into folders, chat against indexed content with a transparent retrieval trace, generate saved study sets for later review, and analyze collections of exam papers with Topic Miner.

Overview

Study Space combines a FastAPI backend, a React frontend, MongoDB for structured state, and ChromaDB for semantic retrieval. The result is a user-scoped workspace where uploaded documents and saved practice material stay tied to the authenticated user, AI responses cite their evidence, and revision tools sit next to the source material instead of in a separate app.

What it does well

Upload and organize PDFs, DOCX, PPTX, XLSX, Markdown, HTML, images, audio, and more
Chat with transparent RAG using visible queries, retrieval runs, fused evidence, and cited sources
Generate and revisit study sets including flashcards, MCQ quizzes, written self-checks, and mixed practice
Mine exam folders for recurring topics, patterns, and example questions
Track academic context with extracted metadata, tags, notes, and calendar-friendly event views
Stay accessible with voice input, high contrast mode, larger text, reduced motion, and stronger focus states

Workspace Preview

Agentic RAG, Without the Black Box

Study Space uses a retrieval-planned RAG pipeline rather than a single vector search or a fully autonomous agent loop.

1. The user sends a question to POST /chat.
2. The backend builds a compact catalog of the user's visible files and tags.
3. Gemini (gemini-3.1-flash-lite-preview) plans up to three retrieval steps.
4. Retrieval runs execute in ChromaDB using broad, focused, or full-document strategies.
5. Results are fused with reciprocal-rank fusion and deduplicated.
6. Gemini answers from the fused evidence set and returns a trace payload with sources.
7. The frontend renders the reasoning trail inline so the user can inspect how the answer was built.

That gives the app an agentic-style planning step while keeping execution constrained, inspectable, and grounded in the user's own material.
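
To make the fusion step concrete, here is a minimal sketch of reciprocal-rank fusion over several retrieval runs. It is an illustration, not the code in `app/core/rag.py`: the function name, the use of plain chunk IDs, and the constant `k=60` (a commonly used default for this technique) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60):
    """Fuse ranked lists of chunk IDs into one deduplicated ranking.

    `runs` is a list of ranked lists (best hit first). Each chunk earns
    1 / (k + rank) from every run it appears in; `k` is an assumed constant.
    """
    scores = defaultdict(float)
    for run in runs:
        seen = set()
        for rank, chunk_id in enumerate(run, start=1):
            if chunk_id in seen:
                continue  # count a chunk at most once per run
            seen.add(chunk_id)
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for stable output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs returned by three retrieval runs.
runs = [["a", "b", "c"], ["b", "a", "d"], ["b", "e"]]
fused = reciprocal_rank_fusion(runs)
```

Because scores are keyed by chunk ID, a chunk returned by several runs appears once in the fused list, and agreement between runs pushes it toward the top.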

Feature Highlights

Document Workspace

Drag-and-drop uploads with background processing and job progress
Folder organization and editable tags
Inline access to owned study documents and exam papers
Personal notes linked to the workspace

Transparent Study Chat

Multi-query retrieval over user-scoped content
Visible retrieval trace with generated queries and fused results
Source-aware answers backed by chunk evidence and optional full-document fallback
Search scope that stays limited to the authenticated user's data

Revision Tools

Auto-saved study sets from selected documents
Flashcards, MCQ quizzes, written self-checks, and mixed practice modes
Saved set library for reopening or deleting generated practice material
Local-only written answer drafts for self-checking without storing attempts
Metadata extraction for deadlines, events, and academic context

Topic Miner

Separate workflow for exam-paper folders
Batch analysis across multiple PDFs
Theme extraction, recurring topics, and synthesized study guidance

Accessibility

Voice input support
Higher contrast mode
Larger text
Reduced motion
Stronger focus states and better keyboard support

Architecture

| Layer | Implementation |
| --- | --- |
| Frontend | React 18 + Vite app under `frontend/src/app`, built into backend-served static assets |
| API assembly | FastAPI entrypoint in `app/main.py` with router registration from `app/api/routers` |
| Service wiring | Runtime dependencies and shared services in `app/api/deps.py` and `app/services/` |
| Domain logic | Retrieval, ingestion, metadata extraction, study generation, and topic mining in `app/core/` |
| Structured data | MongoDB access via `app/db/mongo.py` and `app/db/repository.py` |
| Vector retrieval | ChromaDB indexing and search via `app/db/vector_store.py` |
| Embeddings | `all-MiniLM-L6-v2` via `sentence-transformers` |
| Primary LLM | Google Gemini via `google-genai` |

Model stack

Gemini (gemini-3.1-flash-lite-preview) powers chat, saved study set generation, metadata extraction, and Topic Miner flows.
facebook/bart-large-mnli is used for document classification.
all-MiniLM-L6-v2 produces embeddings for semantic retrieval in ChromaDB.

User isolation

MongoDB records are scoped by authenticated user identity.
Saved study sets are stored as user-owned MongoDB records and included in account export/deletion flows.
Study documents live under app/users/<username>/uploads/.
Processed markdown lives under app/users/<username>/processed/.
Exam papers live under app/users/<username>/exam_papers/.
ChromaDB is a single shared store, but every indexed chunk records owner_username so retrieval can be restricted to the authenticated user's chunks.
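
The isolation rules above can be sketched as two small helpers: one that derives a user's storage directories and one that builds the metadata filter for vector search. The helper names and the username validation are hypothetical; only the directory layout and the `owner_username` field come from this README.

```python
from pathlib import Path

USERS_ROOT = Path("app/users")


def user_dirs(username: str) -> dict[str, Path]:
    """Return the per-user storage directories described above."""
    # Assumed safeguard: reject names that could escape the user's directory.
    if not username or username in {".", ".."} or "/" in username or "\\" in username:
        raise ValueError(f"unsafe username: {username!r}")
    base = USERS_ROOT / username
    return {
        "uploads": base / "uploads",
        "processed": base / "processed",
        "exam_papers": base / "exam_papers",
    }


def owner_filter(username: str) -> dict[str, str]:
    """Metadata filter that limits vector search to one user's chunks."""
    return {"owner_username": username}


dirs = user_dirs("alice")
where = owner_filter("alice")
```

A dictionary of this shape is what ChromaDB's `collection.query(..., where=...)` accepts as a metadata filter, so passing it on every query would keep a shared store scoped to one user.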

Quick Start

Prerequisites

| Requirement | Notes |
| --- | --- |
| Python 3.12 | Matches the runtime image in `Dockerfile` |
| Node.js 20+ | Used for the Vite frontend build |
| MongoDB | Local instance or remote connection string |
| `GEMINI_API_KEY` | Required for chat and generation features |
| FFmpeg | Needed for local audio-file processing; already included in Docker |

Local setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

cd frontend
npm install
npm run build
cd ..

Set the required environment variables in your shell or in an untracked local .env file:

export GEMINI_API_KEY="your_key_here"
export MONGODB_URI="mongodb://localhost:27017"
export MONGODB_DB_NAME="studyspace"

Optional settings:

export SESSION_TTL_DAYS=7
export SESSION_COOKIE_SECURE=false
export MONGODB_APP_NAME=studyspace-api
export MONGODB_SERVER_SELECTION_TIMEOUT_MS=5000
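
For orientation, the variables above could be read at startup roughly as follows. This is a sketch, not the contents of `app/config.py`: the `Settings` class, `load_settings`, and the fallback values (copied from the example exports above) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("GEMINI_API_KEY", "MONGODB_URI", "MONGODB_DB_NAME")


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    mongodb_uri: str
    mongodb_db_name: str
    session_ttl_days: int
    session_cookie_secure: bool
    mongodb_app_name: str
    mongodb_server_selection_timeout_ms: int


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        mongodb_uri=env["MONGODB_URI"],
        mongodb_db_name=env["MONGODB_DB_NAME"],
        # Fallbacks mirror the example values above (assumed, not confirmed).
        session_ttl_days=int(env.get("SESSION_TTL_DAYS", "7")),
        session_cookie_secure=_as_bool(env.get("SESSION_COOKIE_SECURE", "false")),
        mongodb_app_name=env.get("MONGODB_APP_NAME", "studyspace-api"),
        mongodb_server_selection_timeout_ms=int(
            env.get("MONGODB_SERVER_SELECTION_TIMEOUT_MS", "5000")
        ),
    )
```

Failing fast on a missing required variable surfaces configuration mistakes at startup rather than on the first chat or upload request.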

Run the app:

uvicorn app.main:app --reload

Then open http://127.0.0.1:8000, create an account, and start uploading study material.

Docker

The repository includes a multi-stage Docker build and a Docker Compose stack for local deployment.

Start with Compose

cp .env.docker.example .env

Set GEMINI_API_KEY in .env, then run:

docker compose up --build

That starts:

app on http://127.0.0.1:8001 by default (HOST_PORT in compose.yaml)
mongo as the internal database service

Persistent data is stored in named volumes for:

MongoDB data
Chroma embeddings
User uploads and processed files

The default image is CPU-only. It installs CPU PyTorch wheels and excludes the optional Docling ASR extras, so regular local runs do not pull CUDA or NVIDIA packages.

Optional GPU profile

docker compose --profile gpu up --build app-gpu mongo

Use this only if your host has NVIDIA Container Toolkit configured. The GPU profile builds a separate image that includes the optional GPU/ASR dependency set.

Stop or reset

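docker compose down       # stop and remove containers; named volumes are kept
docker compose down -v    # also delete the named volumes (MongoDB data, Chroma embeddings, uploads)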

Testing

Run the main test suite:

./.venv/bin/python -m pytest tests

Run coverage:

./.venv/bin/python -m coverage run --source=app -m pytest tests
./.venv/bin/python -m coverage report -m

MongoDB integration tests require MONGODB_TEST_URI:

MONGODB_TEST_URI="mongodb://localhost:27017" ./.venv/bin/python -m pytest tests/test_mongo_db.py

Run the frontend Playwright E2E suite:

cd frontend
npm run test:e2e

The Playwright suite starts a local Vite server and exercises mocked browser flows under frontend/e2e/, so it does not require the full backend stack for the covered UI journeys.

Available E2E scripts:

cd frontend
npm run test:e2e
npm run test:e2e:headed

For a fuller breakdown of the test suite, see README_TESTS.md.

Topic Miner Workflow

Topic Miner is a separate exam-analysis workspace. Its flow is:

1. Create an exam folder.
2. Upload exam PDFs into that folder.
3. Run folder-level analysis.
4. Extract topic structure from each paper.
5. Synthesize recurring themes and example questions across the folder.
6. Reopen saved analyses later; they are marked stale when folder contents change.
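
One way to implement the stale check in the last step is to store a fingerprint of the folder contents with each saved analysis and compare it when the analysis is reopened. The sketch below is an assumption about how that could work, not a description of `app/core/topic_miner.py`; the function names and the `(filename, content_hash)` input shape are hypothetical.

```python
import hashlib


def folder_fingerprint(papers):
    """Hash the folder contents, independent of listing order.

    `papers` is an iterable of (filename, content_hash) pairs.
    """
    digest = hashlib.sha256()
    for name, content_hash in sorted(papers):
        # NUL and newline separators keep adjacent fields from running together.
        digest.update(f"{name}\0{content_hash}\n".encode("utf-8"))
    return digest.hexdigest()


def is_stale(saved_fingerprint, current_papers):
    """True when the folder has changed since the analysis was saved."""
    return saved_fingerprint != folder_fingerprint(current_papers)


# Hypothetical folder state at analysis time.
saved = folder_fingerprint([("2022.pdf", "h1"), ("2023.pdf", "h2")])
```

Adding, removing, or replacing a paper changes the fingerprint, while listing the same papers in a different order does not.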

Project Layout

app/
  main.py                 FastAPI entry point and application assembly
  auth.py                 Session auth and password hashing
  config.py               Environment-driven app configuration
  api/
    deps.py               Shared dependency wiring and runtime context
    schemas.py            Request/response models
    routers/              Auth, chat, documents, exams, study, uploads, UI, account
  core/
    ingestion.py          Document processing and extraction
    rag.py                Retrieval-planned RAG orchestration
    metadata_extractor.py Academic metadata extraction
    study_set_generator.py Saved flashcard, MCQ, written, and mixed practice generation
    topic_miner.py        Exam-paper analysis
    workspace_catalog.py  Workspace inventory used by retrieval planning
  services/
    jobs.py               Background upload and topic-mining job management
    ownership.py          User-scoped access validation helpers
    storage.py            User file storage paths and persistence helpers
  db/
    repository.py         Database interface used across the app
    mongo.py              MongoDB integration
    metadata.py           Metadata persistence helpers
    vector_store.py       ChromaDB indexing and search
frontend/
  src/
    app/                  Modular React app
      components/         Chat, layout, modal, and accessibility UI pieces
      hooks/              Shared React hooks
      screens/            Standalone screens
      sections/           Workspace and studio section composition
  e2e/                    Playwright end-to-end suite with mocked API flows
requirements/
  base.txt                Shared Python dependencies
  cpu.txt                 CPU runtime dependencies
  gpu.txt                 Optional GPU runtime dependencies
assets/
  studyspace_banner.png   README hero banner
  preview.png             Workspace preview image
tests/                    Backend pytest suite
  conftest.py             Shared fixtures and test helpers
compose.yaml              Local Mongo + app compose stack
Dockerfile                Multi-stage frontend/backend image build

API Overview

Auth

POST /auth/signup
POST /auth/signin
POST /auth/logout
GET /auth/me

Study workspace

POST /upload
GET /upload-jobs
POST /chat
GET /documents
GET /folders
GET /tags
GET /notes
POST /study-sets/generate
GET /study-sets
GET /study-sets/{study_set_id}
DELETE /study-sets/{study_set_id}
POST /quiz/generate
POST /flashcards/generate
GET /metadata

The /study-sets/* endpoints are the current frontend path for generated revision material. The older /quiz/generate and /flashcards/generate endpoints remain available for compatibility.

Topic Miner

GET /exam-folders
POST /exam-folders
POST /exam-folders/{folder_id}/analyze
GET /exam-folders/{folder_id}/analysis
GET /exam-papers
POST /exam-papers/upload

Legacy Migration

Legacy db.json is not used at runtime, but you can still import old data into MongoDB:

python scripts/migrate_json_to_mongo.py \
  --json-path db.json \
  --mongo-uri "$MONGODB_URI" \
  --db-name "$MONGODB_DB_NAME"

Preview counts without writing:

python scripts/migrate_json_to_mongo.py \
  --json-path db.json \
  --mongo-uri "$MONGODB_URI" \
  --db-name "$MONGODB_DB_NAME" \
  --dry-run
