Al1Abdullah/Docs-Stream

DocStream

Section 1 — What It Does

  • Converts uploaded documents into clean, LLM-optimized plain text with semantic labels such as [Page 4 - Heading], [Slide 3 - Code Block: Python], and [Page 12 - Table].
  • Reduces token usage by extracting structured content and removing document-layout noise, and avoids reprocessing by returning cached results keyed by the file's SHA-256 hash when available.
  • Supports six processor categories: PDF, PPTX, DOCX, PNG/JPG images, CSV, and XLSX.
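
The labeling and hash-keyed caching described above can be sketched roughly as follows. This is an illustration only, not DocStream's actual code: format_label, content_key, and get_or_process are hypothetical names, and a plain dict stands in for the real cache.

```python
import hashlib
from typing import Callable, Dict, Optional


def format_label(unit: str, number: int, kind: str, detail: Optional[str] = None) -> str:
    """Build a semantic label such as "[Page 4 - Heading]" (hypothetical helper)."""
    suffix = f": {detail}" if detail else ""
    return f"[{unit} {number} - {kind}{suffix}]"


def content_key(file_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the file bytes, used here as the cache key."""
    return hashlib.sha256(file_bytes).hexdigest()


def get_or_process(
    file_bytes: bytes,
    process: Callable[[bytes], str],
    cache: Dict[str, str],
) -> str:
    """Return the cached text for identical bytes, processing only on a miss."""
    key = content_key(file_bytes)
    if key not in cache:
        cache[key] = process(file_bytes)
    return cache[key]
```

Because the key depends only on the file contents, re-uploading the same file under a different name would still hit the cache in this sketch.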

Section 2 — Architecture Overview

Chrome Extension
  -- file bytes --> FastAPI /upload
  -- SHA-256 hash --> Redis cache lookup
  <-- cached ProcessingResult, when present -- Redis cache

FastAPI /upload
  -- file bytes + SHA-256 hash --> Dramatiq Worker

Dramatiq Worker
  -- file bytes --> PDFProcessor
  -- file bytes --> PPTXProcessor
  -- file bytes --> DOCXProcessor
  -- file bytes --> ImageProcessor
  -- file bytes --> CSVProcessor
  -- file bytes --> XLSXProcessor

Six processors
  -- ProcessingResult --> Dramatiq Worker

Dramatiq Worker
  -- ProcessingResult keyed by SHA-256 hash --> Redis cache
  -- SSE events --> FastAPI event stream

FastAPI event stream
  -- SSE events --> Chrome Extension
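
The flow in the diagram can be approximated in a single function for illustration. The sketch below collapses the FastAPI endpoint, the Dramatiq worker, and the Redis cache into in-process stand-ins; handle_upload, sse_event, the PROCESSORS table, and the event names (cached, started, completed, error) are all assumptions made for this example and are not taken from the repository.

```python
import hashlib
import json
import os
from typing import Callable, Dict, Iterator

# Hypothetical stand-ins: simple functions play the role of processors.
# Only two of the six categories are shown to keep the sketch short.
PROCESSORS: Dict[str, Callable[[bytes], str]] = {
    ".pdf": lambda data: f"pdf document, {len(data)} bytes",
    ".csv": lambda data: data.decode("utf-8"),
}


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events message: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def handle_upload(filename: str, file_bytes: bytes, cache: Dict[str, str]) -> Iterator[str]:
    """Yield SSE messages for one upload, consulting the cache before processing."""
    digest = hashlib.sha256(file_bytes).hexdigest()

    # Cache hit: return the stored result without running a processor.
    if digest in cache:
        yield sse_event("cached", {"hash": digest, "text": cache[digest]})
        return

    # Pick a processor by file extension (an assumed routing rule).
    suffix = os.path.splitext(filename)[1].lower()
    processor = PROCESSORS.get(suffix)
    if processor is None:
        yield sse_event("error", {"hash": digest, "reason": "unsupported type"})
        return

    yield sse_event("started", {"hash": digest})
    text = processor(file_bytes)
    cache[digest] = text  # Store the result keyed by SHA-256 hash.
    yield sse_event("completed", {"hash": digest, "text": text})
```

In the real stack the processing step runs in a separate worker process and the events travel over an HTTP event stream; the generator here only shows the ordering of lookup, dispatch, store, and notify.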

Section 3 — Prerequisites

Install Docker, Node.js, Python, and Chrome before running DocStream locally, then confirm each one with its version command:

docker --version
node --version

The Node.js version must be 20.x or newer.

python --version

The Python version must be 3.11.x or newer.

chrome --version

If chrome --version is not available (for example on Windows, or on Linux where the binary is often named google-chrome), check the version in the browser at chrome://settings/help.
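
The minimum-version checks above can also be scripted. The following is a small sketch, not part of the repository: parse_version, meets_minimum, and check_tool are hypothetical helpers that pull the first dotted version number out of a tool's --version output and compare it against a minimum.

```python
import re
import subprocess
from typing import List, Optional, Tuple


def parse_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number, e.g. "v20.11.0" -> (20, 11, 0)."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(output: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if the version found in the output is at least the minimum."""
    version = parse_version(output)
    return version is not None and version >= minimum


def check_tool(command: List[str], minimum: Tuple[int, ...]) -> bool:
    """Run a version command; return False if the tool is missing or too old."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return meets_minimum(result.stdout or result.stderr, minimum)
```

For example, check_tool(["node", "--version"], (20,)) and check_tool(["python", "--version"], (3, 11)) would cover the two version requirements stated above.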

Section 4 — Local Setup

git clone <repository-url>
cd Docs-Stream
cp .env.example .env

Fill in REDIS_URL and the Supabase credentials in .env before starting services. The local Docker default for Redis is redis://redis:6379/0.

docker compose up --build

Wait for the FastAPI and Redis health checks to pass:

docker compose ps

Confirm the API health endpoint responds:

curl http://localhost:8000/health
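
Because the containers may take a moment to start, the health check can be retried instead of run once. This is a minimal sketch under stated assumptions: http_probe and wait_until_healthy are hypothetical helpers, and treating HTTP 200 as "healthy" is an assumption, since the README only says that the endpoint responds.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8000/health") -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are connection errors.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call the probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until_healthy() with the defaults would poll the local endpoint for roughly 30 seconds before giving up.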

Build the extension:

cd extension
npm ci
npm run build

Section 5 — Loading the Extension

  1. Open Chrome.
  2. Go to chrome://extensions.
  3. Enable Developer mode.
  4. Click Load unpacked.
  5. Select the extension/dist folder from this repository.
  6. Open https://claude.ai.
  7. Verify the DocStream sidebar appears on the page.

Section 6 — Running Tests

Run backend tests with coverage:

python -m pytest --cov=backend --cov-report=term-missing

Expected output line:

27 passed

Run TypeScript checks from the repository root:

cd extension
npm run lint

Expected output line (npm echoes the lint script it runs; a clean run reports no errors after it):

tsc -p tsconfig.json --noEmit && tsc -p tsconfig.node.json --noEmit

Build the Chrome Extension from the repository root:

cd extension
npm run build

Expected output line:

✓ built

Run extension Playwright tests from the repository root:

cd extension
npm run test:extension

Expected output line:

passed

Section 7 — Environment Variables

The setup flow expects .env.example to define the following variables:

| Variable name | What it controls | Example value |
| --- | --- | --- |
| REDIS_URL | Redis connection string used by FastAPI cache utilities and Dramatiq workers. | redis://redis:6379/0 |
| SUPABASE_URL | Supabase project API URL for application data stored in PostgreSQL. | https://example-project.supabase.co |
| SUPABASE_ANON_KEY | Supabase anonymous client key for browser-safe access patterns when enabled. | eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.example |
| SUPABASE_SERVICE_ROLE_KEY | Supabase service-role key for trusted backend operations only. | eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.service-role-example |
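
A quick way to catch an incomplete .env before starting services is to check for the four variables listed above. The sketch below is illustrative only: parse_env and missing_vars are hypothetical helpers, and the parser handles just simple KEY=VALUE lines rather than full dotenv syntax.

```python
from typing import Dict, List, Mapping

# The four variables named in the table above.
REQUIRED_VARS = (
    "REDIS_URL",
    "SUPABASE_URL",
    "SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE_KEY",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

For instance, missing_vars(parse_env(open(".env").read())) returns an empty list once every variable has a value.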

Section 8 — Project Structure

.benchmarks/                 Pytest benchmark output and local benchmark artifacts.
.github/                     GitHub Actions workflow definitions.
.pytest_cache/               Local pytest cache generated by test runs.
backend/                     FastAPI backend package.
backend/api/                 FastAPI route handlers only.
backend/cache/               Redis cache client and dependency utilities.
backend/models/              Pydantic request, response, and processing models.
backend/pipeline/            Document processors for PDF, PPTX, DOCX, images, CSV, and XLSX.
backend/workers/             Dramatiq task worker code for asynchronous processing.
extension/                   Chrome Extension source, Vite config, package lock, and build output.
extension/src/background/    Manifest V3 service worker entry point.
extension/src/components/    React UI components, including the sidebar.
extension/src/content/       Content script source.
extension/src/types/         Browser-only TypeScript module declarations.
extension/src/utils/         Client-side document extraction and backend upload utilities.
extension/src/workers/       Web Worker entry points.
infrastructure/              Docker, Nginx, and deployment configuration.
tests/                       Pytest and Playwright tests.
tests/api/                   FastAPI endpoint and upload-flow tests.
tests/extension/             Playwright tests for extension UI behavior.
tests/pipeline/              Processor tests for each supported document type.
.gitignore                   Git ignore rules for generated and local files.
AGENTS.md                    Repository instructions and implementation constraints for agents.
docker-compose.yml           Local multi-service Compose stack for API, worker, Redis, Nginx, and Prometheus.
pytest.ini                   Pytest configuration and test discovery settings.
README.md                    Developer setup and architecture reference.
setup.sh                     Shell setup helper for project bootstrap tasks.
