- Converts uploaded documents into clean, LLM-optimized plain text with semantic labels such as [Page 4 - Heading], [Slide 3 - Code Block: Python], and [Page 12 - Table].
- Reduces token usage by extracting structured content, removing document-layout noise, and returning cached results by SHA-256 hash when available.
- Supports six processor categories: PDF, PPTX, DOCX, PNG/JPG images, CSV, and XLSX.
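As a rough illustration of how the six processor categories might be selected, the sketch below routes a file to a processor by its extension. Routing by extension is an assumption (the actual backend may inspect MIME types or file signatures instead), and `PROCESSOR_BY_EXTENSION` and `select_processor` are hypothetical names used only for this example.

```python
from pathlib import PurePath

# Hypothetical lookup table: file extension -> processor category.
# Only the formats listed above are included.
PROCESSOR_BY_EXTENSION = {
    ".pdf": "PDFProcessor",
    ".pptx": "PPTXProcessor",
    ".docx": "DOCXProcessor",
    ".png": "ImageProcessor",
    ".jpg": "ImageProcessor",
    ".csv": "CSVProcessor",
    ".xlsx": "XLSXProcessor",
}


def select_processor(filename: str) -> str:
    """Return the processor category name for a filename, ignoring case."""
    suffix = PurePath(filename).suffix.lower()
    try:
        return PROCESSOR_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Rejecting unknown extensions up front keeps unsupported files from reaching a worker at all.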
Chrome Extension
-- file bytes --> FastAPI /upload
-- SHA-256 hash --> Redis cache lookup
<-- cached ProcessingResult, when present -- Redis cache
FastAPI /upload
-- file bytes + SHA-256 hash --> Dramatiq Worker
Dramatiq Worker
-- file bytes --> PDFProcessor
-- file bytes --> PPTXProcessor
-- file bytes --> DOCXProcessor
-- file bytes --> ImageProcessor
-- file bytes --> CSVProcessor
-- file bytes --> XLSXProcessor
Six processors
-- ProcessingResult --> Dramatiq Worker
Dramatiq Worker
-- ProcessingResult keyed by SHA-256 hash --> Redis cache
-- SSE events --> FastAPI event stream
FastAPI event stream
-- SSE events --> Chrome Extension
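The cache path in the flow above can be sketched as follows: hash the file bytes with SHA-256, return the cached result when the key exists, and otherwise process the file and store the result under that key. This is a simplified, synchronous sketch. A plain dictionary stands in for Redis, the `process` callable stands in for the Dramatiq worker and its processor, and `process_with_cache` is a hypothetical name.

```python
import hashlib
from typing import Callable, MutableMapping, Tuple


def process_with_cache(
    file_bytes: bytes,
    cache: MutableMapping[str, str],
    process: Callable[[bytes], str],
) -> Tuple[str, bool]:
    """Return (result, cache_hit) for the given file bytes."""
    # Identical bytes always produce the same key, so re-uploads hit the cache.
    key = hashlib.sha256(file_bytes).hexdigest()

    cached = cache.get(key)
    if cached is not None:
        return cached, True

    # Cache miss: run the processor and store the result under the hash.
    result = process(file_bytes)
    cache[key] = result
    return result, False
```

Keying on the content hash rather than the filename means a renamed copy of the same document still resolves to the cached result.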
Install these tools before running DocStream locally:
docker --version
node --version
The Node.js version must be 20.x or newer.
python --version
The Python version must be 3.11.x or newer.
chrome --version
If chrome --version is not available on Windows, verify Chrome from the browser at chrome://settings/help.
Clone the repository and create the .env file:
git clone <repository-url>
cd Docs-Stream
cp .env.example .env
Fill in REDIS_URL and the Supabase credentials in .env before starting services. The local Docker default for Redis is redis://redis:6379/0.
Start the services:
docker compose up --build
Wait for the FastAPI and Redis health checks to pass:
docker compose ps
Confirm the API health endpoint responds:
curl http://localhost:8000/health
Build the extension:
cd extension
npm ci
npm run build
- Open Chrome.
- Go to chrome://extensions.
- Enable Developer mode.
- Click Load unpacked.
- Select the extension/dist folder from this repository.
- Open https://claude.ai.
- Verify the DocStream sidebar appears on the page.
Run backend tests with coverage:
python -m pytest --cov=backend --cov-report=term-missing
Expected output line:
27 passed
Run TypeScript checks:
cd extension
npm run lint
Expected output line:
tsc -p tsconfig.json --noEmit && tsc -p tsconfig.node.json --noEmit
Build the Chrome Extension:
cd extension
npm run build
Expected output line:
✓ built
Run extension Playwright tests:
cd extension
npm run test:extension
Expected output line:
passed
The setup flow expects .env.example to define the following variables:
| Variable name | What it controls | Example value |
|---|---|---|
| REDIS_URL | Redis connection string used by FastAPI cache utilities and Dramatiq workers. | redis://redis:6379/0 |
| SUPABASE_URL | Supabase project API URL for application data stored in PostgreSQL. | https://example-project.supabase.co |
| SUPABASE_ANON_KEY | Supabase anonymous client key for browser-safe access patterns when enabled. | eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.example |
| SUPABASE_SERVICE_ROLE_KEY | Supabase service-role key for trusted backend operations only. | eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.service-role-example |
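
A quick way to catch an incomplete .env before starting services is to check that each variable in the table is set and non-empty. The sketch below is one possible approach, not part of the repository. `REQUIRED_VARIABLES` and `missing_variables` are hypothetical names, and the check reads the process environment rather than parsing the .env file itself.

```python
import os
from typing import Mapping

# Variable names taken from the table above.
REQUIRED_VARIABLES = (
    "REDIS_URL",
    "SUPABASE_URL",
    "SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE_KEY",
)


def missing_variables(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary in tests.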
.benchmarks/ Pytest benchmark output and local benchmark artifacts.
.github/ GitHub Actions workflow definitions.
.pytest_cache/ Local pytest cache generated by test runs.
backend/ FastAPI backend package.
backend/api/ FastAPI route handlers only.
backend/cache/ Redis cache client and dependency utilities.
backend/models/ Pydantic request, response, and processing models.
backend/pipeline/ Document processors for PDF, PPTX, DOCX, images, CSV, and XLSX.
backend/workers/ Dramatiq task worker code for asynchronous processing.
extension/ Chrome Extension source, Vite config, package lock, and build output.
extension/src/background/ Manifest V3 service worker entry point.
extension/src/components/ React UI components, including the sidebar.
extension/src/content/ Content script source.
extension/src/types/ Browser-only TypeScript module declarations.
extension/src/utils/ Client-side document extraction and backend upload utilities.
extension/src/workers/ Web Worker entry points.
infrastructure/ Docker, Nginx, and deployment configuration.
tests/ Pytest and Playwright tests.
tests/api/ FastAPI endpoint and upload-flow tests.
tests/extension/ Playwright tests for extension UI behavior.
tests/pipeline/ Processor tests for each supported document type.
.gitignore Git ignore rules for generated and local files.
AGENTS.md Repository instructions and implementation constraints for agents.
docker-compose.yml Local multi-service Compose stack for API, worker, Redis, Nginx, and Prometheus.
pytest.ini Pytest configuration and test discovery settings.
README.md Developer setup and architecture reference.
setup.sh Shell setup helper for project bootstrap tasks.