AI-powered document scanning and processing system with real-time desktop-mobile synchronization. Built with Flask (Python) backend, React + TypeScript frontend, OpenCV image enhancement, Tesseract OCR, Socket.IO WebSockets, and PowerShell automation for seamless printing and workflow management.

chaman2003/Printchakra-AI

PrintChakra

🚀 AI-Powered Document Processing & Intelligent Print Automation

Enterprise-grade document handling with voice control, OCR, and AI-assisted workflows

Transform how you handle documents with voice-controlled, AI-assisted printing and scanning workflows. PrintChakra combines computer vision, OCR, and LLM-powered intelligence for seamless document management.



🎯 Overview

PrintChakra is a comprehensive, full-stack document processing platform that revolutionizes printing and scanning workflows. It seamlessly integrates advanced OCR technology, AI-assisted document understanding, voice-enabled interaction, and intelligent printer management into a unified system.

Why PrintChakra?

  • 🎤 Hands-Free Voice Control – Speak commands to configure print/scan jobs, manage queues, and control devices
  • 🧠 AI-Powered Intent Detection – Automatically configures workflows from natural language commands
  • 📸 Advanced OCR Pipeline – 12-stage image enhancement and text extraction for maximum accuracy
  • ⚡ Real-Time Synchronization – WebSocket-powered instant updates across all interfaces
  • 🔧 Modular Architecture – Easy to extend with custom integrations and workflows

📊 Project Status

Current Version: 2.2.0

Release Date: December 2025 | Status: ✅ Active Development

Project Stages & Completion

| Stage | Component | Status | Details |
|---|---|---|---|
| Backend Foundation | Flask API Framework | ✅ Complete | REST API with Socket.IO, error handling, logging |
| Frontend UI | React + TypeScript Interface | ✅ Complete | Responsive dashboard, modals, real-time updates |
| AI Integration | State Machine + Voice Processing | ✅ Complete | Strict workflow state validation, command parsing, voice bridge |
| Orchestration | Print/Scan Workflow Engine | ✅ Complete | Stateful workflow management, intent detection, configuration |
| Document Processing | OCR Pipeline | ✅ Complete | 12-stage enhancement, PaddleOCR integration, format conversion |
| Voice Interface | Whisper + TTS + Ollama | ✅ Complete | STT, LLM intent detection, Coqui TTS responses |
| Printing System | Hardware Integration | ✅ Complete | pywin32 drivers, multi-printer support, queue management |
| Real-Time Communication | WebSocket Sync | ✅ Complete | Socket.IO integration, live status updates |
| AI Workflow Refinements | Response Optimization | ✅ Complete | Concise responses (15-word limit), human-like interactions |
| Comprehensive Documentation | README + Print Commands | ✅ Complete | Full AI workflow docs, command tables, implementation guide |

Key Implementations Completed

✅ Backend (Flask + Python)

  • REST API endpoints for document management (/api/documents, /api/print, /api/scan)
  • Socket.IO event handlers for real-time communication
  • Orchestration service with state machine (WorkflowState, IntentType)
  • Voice processing pipeline (voice_prompt.py, voice_bridge.py)
  • OCR module with image enhancement (12-stage pipeline)
  • Print/Scan configuration management
  • Error handling & comprehensive logging

✅ Frontend (React + TypeScript)

  • AI Assist hook system (useAIAssist, useVoiceCommandBridge)
  • State manager for strict workflow control (stateManager.ts)
  • Command parser with confidence scoring (commandParser.ts)
  • Action handler with callback integration (actionHandler.ts)
  • Real-time settings synchronization
  • Document selection with multi-select support
  • Voice command bridge for backend/frontend integration

✅ AI Workflow System

  • 3-State Architecture: Dashboard → Print/Scan Mode → Step Progression
  • 4-Step Print: Select → Configure → Review → Execute
  • 5-Step Scan: Source → Select → Configure → Review → Execute
  • "Sorry" Protocol: Safety mechanism for workflow switching
  • Command Parsing: Regex-based pattern matching with 50+ command keywords
  • State Validation: Contextual command validation per workflow step
  • Voice/Text Parity: Identical behavior for voice and text inputs
  • Response Optimization: Concise responses (max 15 words, 1 sentence)
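
The response limit above (one sentence, at most 15 words) can be illustrated with a small helper. This is only a sketch under assumptions: the function name `concise` and the choice to truncate rather than rephrase are hypothetical, and the project may enforce the limit differently (for example through the LLM system prompt).

```python
import re

MAX_WORDS = 15  # word limit stated in this README


def concise(response: str, max_words: int = MAX_WORDS) -> str:
    """Keep the first sentence of a reply and cap it at max_words words."""
    text = " ".join(response.split())  # collapse runs of whitespace
    if not text:
        return ""
    # First sentence: shortest prefix ending in . ! or ? followed by space or end.
    match = re.match(r"(.+?[.!?])(?:\s|$)", text)
    sentence = match.group(1) if match else text
    words = sentence.split()
    if len(words) <= max_words:
        return sentence
    # Too long: cut at the word limit and close the sentence.
    return " ".join(words[:max_words]).rstrip(",;:") + "."
```

For example, `concise("Printing now! Your job has 3 pages.")` keeps only the first sentence, which matches the short replies shown in the command tables later in this document.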

✅ Features Implemented

| Feature | Frontend | Backend | Status |
|---|---|---|---|
| Document Upload | ✅ Modal UI | ✅ File handling | ✅ Complete |
| Document Selection | ✅ Multi-select | ✅ Indexing | ✅ Complete |
| Print Settings | ✅ All controls | ✅ Config storage | ✅ Complete |
| Scan Settings | ✅ All controls | ✅ Config storage | ✅ Complete |
| Voice Commands | ✅ Whisper STT | ✅ Intent detection | ✅ Complete |
| AI Responses | ✅ TTS playback | ✅ Response generation | ✅ Complete |
| Workflow State | ✅ Validation | ✅ Orchestration | ✅ Complete |
| Real-Time Sync | ✅ Socket.IO | ✅ Event broadcast | ✅ Complete |
| Settings Review | ✅ Display panel | ✅ Summary generation | ✅ Complete |
| Error Handling | ✅ Toast messages | ✅ Error responses | ✅ Complete |

✨ Key Features

Document Management

  • Multi-Format Support – Process PDFs, images, Word documents, and scanned files
  • Intelligent OCR Pipeline – Extract text with 12-stage image enhancement and quality scoring
  • Batch Processing – Handle dozens or hundreds of documents with single commands
  • Format Conversion – Automatic conversion between PDF, images, and text formats
  • Real-Time Processing Status – Monitor document pipeline stages with visual indicators

Printing & Scanning

  • Smart Print Configuration – Paper size, orientation, color mode, quality, copy count, duplex
  • Advanced Scan Configuration – DPI, color mode, file format, batch scanning, OCR toggle
  • Multi-Printer Support – Manage multiple printers simultaneously from a unified interface
  • Print Queue Management – Real-time monitoring and control of active print jobs
  • Printer Feed Tray Support – Direct document feeding from printer hardware

Voice & AI

  • Continuous Voice Listening – 10-15x faster Whisper transcription with local processing
  • Natural Language Commands – Control all functions with voice or text input
  • Contextual AI Analysis – Intelligent document understanding and metadata extraction
  • Customizable Prompts – Configure AI behavior through simple config files
  • Concise Spoken Responses – Max 15 words, human-like interactions with immediate feedback

Real-Time Monitoring

  • Live Dashboard – Real-time document upload and processing status
  • Device Status – Printer connectivity, driver availability, system resources
  • Connectivity Verification – Backend API health, device connectivity, link establishment
  • Process Tracking – Pipeline visualization showing document processing stages
  • Workflow Progress – Step-by-step indication of print/scan progress

🚀 Implementation Highlights

Architecture Innovations

| Innovation | Benefit | Implementation |
|---|---|---|
| Strict State Machine | Prevents workflow confusion | AppState + WorkflowStep with validated transitions |
| "Sorry" Protocol | Safety for mode switching | Requires keyword before switching print ↔ scan |
| Voice/Text Parity | Unified experience | Identical command parsing + responses for both inputs |
| Real-Time Sync | Live updates across devices | Socket.IO with event broadcasting |
| Intent Detection | Natural language understanding | Ollama LLM with fallback keyword matching |
| Response Optimization | Natural speech | Max 15 words, 1 sentence, context-aware |
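
The "LLM with fallback keyword matching" idea from the table can be sketched as below. Everything here is an assumption made for illustration: the keyword table, the function names, and the convention that the LLM is passed in as a plain callable returning an intent name. The real backend talks to Ollama and uses its own `IntentType` values, which are not reproduced here.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword table; the project's actual keyword list is not shown.
FALLBACK_KEYWORDS = {
    "print": ("print", "printout"),
    "scan": ("scan", "digitize"),
    "status": ("status", "progress"),
    "help": ("help", "commands"),
}


def keyword_intent(text: str) -> str:
    """Return the first intent whose keyword appears as a whole word."""
    lowered = text.lower()
    for intent, keywords in FALLBACK_KEYWORDS.items():
        for keyword in keywords:
            if re.search(rf"\b{re.escape(keyword)}\b", lowered):
                return intent
    return "unknown"


def detect_intent(text: str, llm: Optional[Callable[[str], str]] = None) -> str:
    """Ask the LLM first; fall back to keywords if it fails or answers oddly."""
    if llm is not None:
        try:
            answer = llm(text).strip().lower()
        except Exception:
            answer = ""  # e.g. the LLM server is not running or timed out
        if answer in FALLBACK_KEYWORDS:
            return answer
    return keyword_intent(text)
```

Passing the LLM as a callable keeps the fallback path testable without a running model: `detect_intent("scan the page")` works with no LLM at all, and an LLM that raises or returns an unrecognized label simply drops through to the keyword matcher.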

Core Systems

Frontend State Management

  • stateManager.ts: Enforces workflow progression with state validation
  • commandParser.ts: Parses 50+ command patterns with confidence scoring
  • actionHandler.ts: Routes commands to appropriate handlers
  • useAIAssist.ts: Main AI interaction hook with callbacks
  • useVoiceCommandBridge.ts: Bridges backend voice intents to frontend actions

Backend Orchestration

  • PrintScanOrchestrator: Manages workflow state and transitions
  • IntentType Detection: Print, Scan, Status, Configure, Help, etc.
  • VoicePromptManager: Handles LLM queries and response formatting
  • OCR Pipeline: 12-stage image enhancement with quality scoring
  • Configuration Manager: Persists user settings across sessions

🛠 Tech Stack

Backend

| Component | Technology | Purpose |
|---|---|---|
| Framework | Flask 3.0 | REST API & real-time coordination |
| Real-Time | Socket.IO 5.3 | WebSocket synchronization |
| OCR | PaddleOCR 2.7 | Advanced text extraction |
| Voice | OpenAI Whisper | Speech-to-text transcription |
| PDF | PyMuPDF, Poppler | Document processing |
| Image | OpenCV, Pillow | Image enhancement |
| Printing | pywin32 | Windows printer communication |
| AI | Ollama Integration | Local LLM for intent detection |

Frontend

| Component | Technology | Purpose |
|---|---|---|
| Framework | React 19 | UI framework |
| Language | TypeScript 4.9 | Type-safe development |
| UI Library | Chakra UI 2.10 | Accessible components |
| Styling | Emotion | CSS-in-JS styling |
| Communication | Socket.IO Client | Real-time updates |
| HTTP | Axios | API requests |
| Routing | React Router 7 | Page navigation |
| Icons | Iconify, React Icons | Icon system |
| Animations | Framer Motion | Smooth animations |

DevOps & Deployment

  • Containerization – Docker support for consistent deployments
  • Frontend Deployment – Vercel configuration included
  • Environment Management – Python dotenv for configuration
  • Automation Scripts – PowerShell scripts for setup and management
  • Git Workflow – Full version control with documented refactoring history

🚀 Quick Start

Prerequisites

  • Windows 10/11 (due to printer integration)
  • Python 3.8+
  • Node.js 18+
  • npm or yarn
  • Git (for version control)
  • Ollama (optional, for enhanced AI features)

Installation

1. Clone the Repository

git clone https://github.com/chaman2003/printchakra.git
cd printchakra

2. Backend Setup

cd backend
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt

3. Frontend Setup

cd ../frontend
npm install
# or
yarn install

4. Environment Configuration

Create .env file in backend/ directory:

FRONTEND_URL=http://localhost:3000
BACKEND_PUBLIC_URL=http://localhost:5000
API_CORS_ORIGINS=http://localhost:3000

# Ollama Configuration (optional)
OLLAMA_BASE_URL=http://localhost:11434
VOICE_AI_MODEL=smollm2:135m

# Voice Settings
VOICE_SYSTEM_PROMPT_FILE=backend/config/prompts/system_prompt.txt
VOICE_COMMAND_MAPPINGS_FILE=backend/config/prompts/command_mappings.json

Running the Application

Option 1: Using PowerShell Scripts (Recommended)

# Start all services
.\scripts\run-all.ps1

# Or start individually
.\scripts\backend.ps1
.\scripts\frontend.ps1

Option 2: Manual Start

# Terminal 1 - Backend
cd backend
.\venv\Scripts\activate
python app.py

# Terminal 2 - Frontend
cd frontend
npm start

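Access the Application:

  • Frontend: http://localhost:3000 (the FRONTEND_URL value from the .env above)
  • Backend API: http://localhost:5000 (the BACKEND_PUBLIC_URL value from the .env above)
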
📁 Project Structure

printchakra/
│
├── backend/                       # Flask backend application
│   ├── app.py                     # Main application entry point
│   ├── requirements.txt           # Python dependencies
│   ├── REFACTORING_PLAN.md        # Refactoring documentation
│   ├── app/                       # Core application module
│   │   ├── __init__.py
│   │   ├── api/                   # REST API endpoints
│   │   │   ├── __init__.py
│   │   │   └── document.py        # Document management endpoints
│   │   ├── config/                # Configuration module
│   │   │   ├── __init__.py
│   │   │   ├── settings.py        # Configuration management
│   │   │   └── prompts/           # AI system prompts
│   │   ├── core/                  # Core utilities
│   │   │   ├── __init__.py
│   │   │   ├── config.py
│   │   │   ├── extensions.py      # Flask extensions
│   │   │   ├── logging_config.py  # Logging configuration
│   │   │   └── middleware/        # Middleware modules
│   │   ├── models/                # Data models
│   │   │   ├── __init__.py
│   │   │   ├── document.py        # Document model
│   │   │   ├── file_info.py       # File information model
│   │   │   ├── print_config.py    # Print configuration model
│   │   │   └── scan_config.py     # Scan configuration model
│   │   ├── middleware/            # Middleware handlers
│   │   │   ├── __init__.py
│   │   │   ├── cors_config.py     # CORS configuration
│   │   │   ├── error_handler.py   # Error handling
│   │   │   └── request_logger.py  # Request logging
│   │   ├── features/              # Feature modules
│   │   │   ├── __init__.py
│   │   │   ├── connection/        # Connection management
│   │   │   ├── dashboard/         # Dashboard services
│   │   │   ├── document/          # Document features
│   │   │   ├── orchestration/     # Workflow orchestration
│   │   │   ├── phone/             # Phone integration
│   │   │   ├── print/             # Printing features
│   │   │   └── voice/             # Voice features
│   │   ├── modules/               # Processing modules
│   │   │   ├── __init__.py
│   │   │   ├── api_endpoints.py   # API endpoint definitions
│   │   │   ├── utility.py         # Utility functions
│   │   │   ├── document/          # Document processing
│   │   │   ├── image/             # Image enhancement
│   │   │   ├── ocr/               # OCR pipeline
│   │   │   ├── orchestration/     # Orchestration logic
│   │   │   ├── pipeline/          # Processing pipeline
│   │   │   └── voice/             # Voice processing
│   │   ├── sockets/               # WebSocket handlers
│   │   │   ├── __init__.py
│   │   │   └── handlers.py        # Socket.IO event handlers
│   │   ├── utils/                 # Utility functions
│   │   │   ├── __init__.py
│   │   │   ├── file_utils.py      # File operations
│   │   │   ├── image_utils.py     # Image utilities
│   │   │   └── logger.py          # Logging utilities
│   │   └── print_scripts/         # Printing utility scripts
│   │       ├── print-file.py      # File printing script
│   │       ├── printer_test.py    # Printer testing utility
│   │       └── README.md          # Printing scripts documentation
│   ├── data/                      # Data storage directories
│   │   ├── uploads/               # User uploaded files
│   │   ├── processed/             # Processed files
│   │   ├── converted/             # Format-converted files
│   │   ├── pdfs/                  # Generated PDFs
│   │   ├── processed_text/        # Extracted text files
│   │   ├── models/                # Model files
│   │   └── ocr_results/           # OCR output
│   ├── public/                    # Static files and resources
│   │   ├── blank.pcl              # Printer control language file
│   │   ├── test_print.txt         # Test print file
│   │   ├── data/                  # Data subdirectories
│   │   │   ├── converted/
│   │   │   ├── models/
│   │   │   ├── ocr_results/
│   │   │   ├── pdfs/
│   │   │   ├── processed/
│   │   │   ├── processed_text/
│   │   │   └── uploads/
│   │   └── poppler/               # Poppler binary for PDF processing
│   │       └── poppler-24.08.0/   # Poppler version
│   ├── logs/                      # Application logs
│   └── __pycache__/               # Python cache files
│
├── frontend/                      # React + TypeScript frontend
│   ├── package.json               # Node.js dependencies
│   ├── tsconfig.json              # TypeScript configuration
│   ├── craco.config.js            # Create React App config
│   ├── vercel.json                # Vercel deployment config
│   ├── src/                       # Source code
│   │   ├── App.tsx                # Main app component
│   │   ├── App.css                # App styles
│   │   ├── index.tsx              # React entry point
│   │   ├── index.css              # Global styles
│   │   ├── config.ts              # Frontend configuration
│   │   ├── types.ts               # TypeScript types
│   │   ├── theme.ts               # Chakra theme configuration
│   │   ├── apiClient.ts           # HTTP API client
│   │   ├── ocrApi.ts              # OCR API interface
│   │   ├── react-app-env.d.ts     # React environment types
│   │   ├── reportWebVitals.ts     # Performance metrics
│   │   ├── setupWarnings.js       # Console warnings setup
│   │   ├── aiassist/              # AI assistance features
│   │   │   ├── actionHandler.ts   # Action handling
│   │   │   ├── commandParser.ts   # Command parsing
│   │   │   └── ...                # Other AI features
│   │   ├── components/            # React components
│   │   │   ├── dashboard/         # Dashboard components
│   │   │   ├── document/          # Document management UI
│   │   │   ├── layout/            # Layout components
│   │   │   ├── orchestration/     # Workflow UI
│   │   │   ├── voice/             # Voice control UI
│   │   │   └── common/            # Shared components
│   │   ├── pages/                 # Page components
│   │   ├── context/               # React context (Socket.IO, etc)
│   │   ├── hooks/                 # Custom React hooks
│   │   ├── utils/                 # Frontend utilities
│   │   ├── styles/                # Global styles
│   │   └── ui/                    # UI utilities
│   ├── public/                    # Static assets
│   │   ├── index.html             # HTML entry point
│   │   ├── manifest.json          # PWA manifest
│   │   └── robots.txt             # SEO robots file
│   ├── build/                     # Production build output
│   │   ├── index.html
│   │   ├── asset-manifest.json
│   │   ├── manifest.json
│   │   ├── robots.txt
│   │   └── static/                # Built assets
│   │       ├── css/
│   │       ├── js/
│   │       └── media/
│   └── node_modules/              # Node dependencies (git-ignored)
│
├── scripts/                       # Automation scripts
│   ├── backend.ps1                # Backend startup script
│   ├── frontend.ps1               # Frontend startup script
│   ├── run-all.ps1                # Run all services script
│   ├── cleanup.ps1                # Cleanup script
│   ├── ngrok.ps1                  # Ngrok tunneling script
│   └── install_cuda_pytorch.ps1   # CUDA/PyTorch installation
│
├── docs/                          # Documentation
│   ├── outcome.txt                # Outcome documentation
│   ├── ENHANCEMENTS/              # Enhancement proposals
│   └── pics/                      # Documentation images
│       └── TECHNOLOGY_STACK.txt   # Technology stack details
│
├── README.md                      # This file
├── prompt.txt                     # Project prompt
└── error.txt                      # Error log

🏗 System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    CLIENT LAYER                             │
├──────────────────┬──────────────────┬───────────────────────┤
│  Web Dashboard   │  Mobile Capture  │  Voice Control Panel  │
│  (React + TS)    │  (Responsive)    │  (Real-time)          │
└────────────┬─────┴────────┬─────────┴──────────┬────────────┘
             │              │                    │
             └──────────────┼────────────────────┘
                   Socket.IO / WebSocket
                            │
        ┌───────────────────▼───────────────────┐
        │    COMMUNICATION LAYER                │
        │  - Real-time Updates                  │
        │  - Event Broadcasting                 │
        │  - Connection Management              │
        └───────────────────┬───────────────────┘
                            │
        ┌───────────────────▼───────────────────┐
        │    API LAYER (Flask + REST)           │
        │  - Document endpoints                 │
        │  - Print/Scan configuration           │
        │  - File conversion                    │
        │  - Device management                  │
        └───────────────────┬───────────────────┘
                            │
        ┌───────────────────▼───────────────────┐
        │    BUSINESS LOGIC LAYER               │
        ├───────────────────────────────────────┤
        │ ┌──────────┐ ┌──────────┐             │
        │ │ Document │ │  Voice   │             │
        │ │Processing│ │AI/Whisper│             │
        │ └──────────┘ └──────────┘             │
        │ ┌──────────┐ ┌──────────┐             │
        │ │   OCR    │ │ Printing │             │
        │ │ Pipeline │ │ Scanning │             │
        │ └──────────┘ └──────────┘             │
        │ ┌──────────┐ ┌──────────┐             │
        │ │  Image   │ │Orchestr. │             │
        │ │ Enhance. │ │          │             │
        │ └──────────┘ └──────────┘             │
        └───────────────────┬───────────────────┘
                            │
        ┌───────────────────▼───────────────────┐
        │    DATA LAYER                         │
        ├───────────────────────────────────────┤
        │ ┌──────────┐ ┌──────────┐             │
        │ │   File   │ │  Model   │             │
        │ │ Storage  │ │Management│             │
        │ └──────────┘ └──────────┘             │
        │ ┌──────────────────────┐              │
        │ │   Logging & Metrics  │              │
        │ └──────────────────────┘              │
        └───────────────────┬───────────────────┘
                            │
        ┌───────────────────▼───────────────────┐
        │    EXTERNAL INTEGRATIONS              │
        ├───────────────────────────────────────┤
        │ ┌──────────┐ ┌──────────┐             │
        │ │ Printers │ │ Scanners │             │
        │ │ (Windows)│ │(pywin32) │             │
        │ └──────────┘ └──────────┘             │
        │ ┌──────────┐ ┌──────────┐             │
        │ │  Ollama  │ │ Poppler  │             │
        │ │  (LLM)   │ │(PDF Util)│             │
        │ └──────────┘ └──────────┘             │
        └───────────────────────────────────────┘

⚙️ Configuration

Environment Variables (backend/.env)

# Application
DEBUG=false
ENV=production

# Frontend & CORS
FRONTEND_URL=http://localhost:3000
BACKEND_PUBLIC_URL=http://localhost:5000
API_CORS_ORIGINS=http://localhost:3000,https://yourapp.com

# Ollama Configuration (Local LLM)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_CHAT_ENDPOINT=/api/chat
OLLAMA_TAGS_ENDPOINT=/api/tags
OLLAMA_TIMEOUT=60
OLLAMA_VERIFY_SSL=true

# Voice AI Model
VOICE_AI_MODEL=smollm2:135m
VOICE_SYSTEM_PROMPT_FILE=backend/config/prompts/system_prompt.txt
VOICE_COMMAND_MAPPINGS_FILE=backend/config/prompts/command_mappings.json

# Logging
LOG_LEVEL=INFO
LOGS_DIR=backend/logs

Prompt Configuration (backend/config/prompts/)

system_prompt.txt

  • Core behavior definition for AI assistant
  • Configured with command patterns and response templates
  • Plain text format for easy editing

command_mappings.json

{
  "wake_words": [...],
  "command_patterns": {...},
  "responses": {...},
  "ollama_sampling": {...}
}
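
A loader for this file could check the four top-level keys before the voice pipeline starts. The sketch below is an assumption about how such a check might look; only the key names and their list/object shapes come from the outline above, and the function name `load_command_mappings` is hypothetical. The contents of each section are elided in the README and are not validated here.

```python
import json

# Top-level keys taken from the outline above.
REQUIRED_KEYS = ("wake_words", "command_patterns", "responses", "ollama_sampling")


def load_command_mappings(raw: str) -> dict:
    """Parse the JSON text and verify the expected top-level structure."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("command mappings must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    if not isinstance(data["wake_words"], list):
        raise ValueError("wake_words must be a list")
    for key in REQUIRED_KEYS[1:]:
        if not isinstance(data[key], dict):
            raise ValueError(f"{key} must be an object")
    return data
```

Failing fast with a `ValueError` that names the missing key tends to be easier to debug than a `KeyError` raised later in the middle of a voice session.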

📖 Usage Guide

Dashboard Features

  1. Document Management

    • Upload and monitor document processing
    • View OCR results in real-time
    • Browse converted and processed files
    • Select and batch process multiple documents
  2. Print Configuration

    • Choose printer from available devices
    • Set paper size, orientation, color mode
    • Configure quality, copies, collation
    • Preview print layout before sending
  3. Scan Configuration

    • Customize scan resolution and quality
    • Select file format (image/PDF)
    • Enable automatic document detection
    • Batch scan multiple pages
  4. Device Management

    • View all connected printers
    • Monitor printer status and health
    • Access driver downloads
    • View system resources and performance
  5. Voice Control

    • Activate continuous listening
    • Issue commands in natural language
    • Configure jobs via voice
    • Receive voice feedback and confirmations

PrintChakra AI Workflow Documentation

This document outlines the AI-driven workflow and command structure for PrintChakra. It serves as a reference for both developers and users to understand how the AI assistant interacts with the system across different states and workflows.


🧠 AI Workflow Architecture

PrintChakra uses a strict state-machine-based AI assistant that ensures users follow a logical progression for printing and scanning tasks. The assistant supports both voice and text inputs with identical behavior.

Workflow States

| State | Description | Valid Entry Commands |
|---|---|---|
| DASHBOARD | The default state. AI is ready to start a new workflow. | print, scan, help, status |
| PRINT_WORKFLOW | Active when a user is preparing a print job. | sorry, print (if in Scan mode) |
| SCAN_WORKFLOW | Active when a user is preparing a scan job. | sorry, scan (if in Print mode) |

🔄 Mode Switching (The "Sorry" Protocol)

To prevent accidental workflow interruptions, switching between Print and Scan modes while one is active requires the "sorry" keyword.

| Action | Command Example | AI Response |
|---|---|---|
| Switch to Scan from Print | sorry, scan | Scan mode. |
| Switch to Print from Scan | sorry, print | Print mode. |
| Attempt switch without "sorry" | scan (while in Print) | Say "sorry" first to switch to scan. |
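
The switching rule can be expressed as a small guard function. This is a minimal sketch, not the project's code: the state names follow the Workflow States table, while `handle_mode_command` and its return convention (new state plus reply, with an empty reply meaning "not a mode command") are assumptions. The actual frontend logic lives in stateManager.ts.

```python
from enum import Enum
from typing import Tuple


class AppState(Enum):
    DASHBOARD = "DASHBOARD"
    PRINT_WORKFLOW = "PRINT_WORKFLOW"
    SCAN_WORKFLOW = "SCAN_WORKFLOW"


_TARGETS = {"print": AppState.PRINT_WORKFLOW, "scan": AppState.SCAN_WORKFLOW}
_REPLIES = {AppState.PRINT_WORKFLOW: "Print mode.", AppState.SCAN_WORKFLOW: "Scan mode."}


def handle_mode_command(state: AppState, text: str) -> Tuple[AppState, str]:
    """Return (new_state, reply); an empty reply means 'not a mode command here'."""
    words = text.lower().replace(",", " ").split()
    requested = next((w for w in words if w in _TARGETS), None)
    if requested is None:
        return state, ""
    target = _TARGETS[requested]
    if state is AppState.DASHBOARD:
        return target, _REPLIES[target]  # free entry from the dashboard
    if state is target:
        return state, ""  # already in this mode; let the step handlers decide
    if "sorry" in words:
        return target, _REPLIES[target]  # explicit, guarded switch
    return state, f'Say "sorry" first to switch to {requested}.'
```

Returning the unchanged state on a refused switch keeps the caller simple: it can always assign the returned state and speak the reply if one is present.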

🖨️ Print Workflow Commands

The print workflow follows a 4-step progression: Select -> Configure -> Review -> Execute.

Step 1: Document Selection

State: PRINT_WORKFLOW | Step: SELECT_DOCUMENT

| Command Type | Patterns | Example | AI Response |
|---|---|---|---|
| Select | select, choose, pick | select document 1 | Got it, document 1. |
| Section | converted, uploaded, originals | switch to converted | Converted. |
| Navigation | next, previous, back | next document | Next. |
| Continue | confirm, proceed, next step | confirm selection | Ready. Confirm? |

Step 2: Configuration

State: PRINT_WORKFLOW | Step: CONFIGURATION

| Setting | Patterns | Example | AI Response |
|---|---|---|---|
| Layout | portrait, landscape | set landscape | Landscape. |
| Color | color, grayscale, bw | color mode | Color. |
| Copies | copies, copy | 3 copies | 3 copies. |
| Paper Size | A4, Letter, Legal | A4 size | A4. |
| Quality | draft, normal, high | high quality | High quality. |
| Duplex | duplex, double sided | double sided | Double-sided. |
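
The patterns in this table lend themselves to regex matching. The sketch below shows one way a single utterance such as "landscape, 2 copies" could be turned into several settings at once. It is written in Python for illustration only; the real parser is commandParser.ts on the frontend, and the function name, setting keys, and exact regular expressions here are assumptions.

```python
import re
from typing import Dict, Union

SettingValue = Union[str, int, bool]


def parse_print_settings(text: str) -> Dict[str, SettingValue]:
    """Extract every print setting mentioned in one utterance."""
    lowered = text.lower()
    found: Dict[str, SettingValue] = {}

    match = re.search(r"\b(portrait|landscape)\b", lowered)
    if match:
        found["orientation"] = match.group(1)

    # Grayscale keywords win if both kinds appear in the same utterance.
    if re.search(r"\b(grayscale|bw)\b", lowered):
        found["color_mode"] = "grayscale"
    elif re.search(r"\bcolor\b", lowered):
        found["color_mode"] = "color"

    match = re.search(r"\b(\d+)\s+cop(?:y|ies)\b", lowered)
    if match:
        found["copies"] = int(match.group(1))

    match = re.search(r"\b(a4|letter|legal)\b", lowered)
    if match:
        size = match.group(1)
        found["paper_size"] = "A4" if size == "a4" else size.capitalize()

    match = re.search(r"\b(draft|normal|high)\b", lowered)
    if match:
        found["quality"] = match.group(1)

    if re.search(r"\b(duplex|double[- ]sided)\b", lowered):
        found["duplex"] = True

    return found
```

Collecting all matches into one dictionary, rather than returning on the first hit, is what allows compound commands to set several options in a single turn.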

Step 3: Review & Step 4: Execution

State: PRINT_WORKFLOW | Step: REVIEW / EXECUTING

| Action | Patterns | Example | AI Response |
|---|---|---|---|
| Execute | confirm, start, print | confirm print | Printing now! |
| Cancel | cancel, stop, abort | cancel print | Cancelled. |
| Status | status, progress | what's the status? | Printing... |

📸 Scan Workflow Commands

The scan workflow follows a 5-step progression: Source -> Select -> Configure -> Review -> Execute.

Step 1: Source Selection

State: SCAN_WORKFLOW | Step: SOURCE_SELECTION

| Action | Patterns | Example | AI Response |
|---|---|---|---|
| Feed Tray | feed, tray, insert | use feed tray | Feeding documents. |
| Manual Select | select, manual | select from files | Opening selection. |

Step 2 & 3: Selection & Configuration

State: SCAN_WORKFLOW | Step: SELECT_DOCUMENT / CONFIGURATION

| Setting | Patterns | Example | AI Response |
|---|---|---|---|
| OCR | ocr, text mode, recognize | enable ocr | OCR on. |
| Format | pdf, jpeg, png | save as pdf | PDF. |
| Resolution | dpi, resolution | 300 dpi | 300 DPI. |
| Mode | single, multi, batch | multi page scan | Multi page. |

🌐 Global & UI Commands

These commands are available across most states to control the interface and get information.

| Category | Command | Example | Action |
|---|---|---|---|
| Help | help, commands | what can you do? | Shows help dialog |
| Status | status, where are we | current status | Reports current mode/step |
| Navigation | scroll up, scroll down | scroll down | Scrolls the active panel |
| UI Control | close, exit, back | close panel | Closes modals or goes back |
| System | connectivity, device info | check printer | Shows device status toast |

🔄 Command Flow Example: Full Print Job

  1. User: "print" -> AI: "Print mode." (Enters PRINT_WORKFLOW)
  2. User: "select document 3" -> AI: "Got it, document 3."
  3. User: "landscape, 2 copies" -> AI: "Landscape. 2 copies."
  4. User: "confirm" -> AI: "Ready. Confirm?" (Moves to REVIEW)
  5. User: "yes" -> AI: "Printing now!" (Moves to EXECUTING)
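
The step progression in this dialogue can be modeled as a small transition table. The sketch below makes several assumptions that the README does not confirm: that selecting a document moves straight to CONFIGURATION, that "yes" is accepted alongside the documented execute patterns, and that setting commands such as "landscape, 2 copies" are handled by a separate parser and leave the step unchanged. The names `TRANSITIONS` and `step_command` are hypothetical.

```python
import re
from enum import Enum
from typing import Optional, Tuple


class WorkflowStep(Enum):
    SELECT_DOCUMENT = "SELECT_DOCUMENT"
    CONFIGURATION = "CONFIGURATION"
    REVIEW = "REVIEW"
    EXECUTING = "EXECUTING"


# (current step, command pattern, next step, reply template)
TRANSITIONS = [
    (WorkflowStep.SELECT_DOCUMENT, r"^(?:select|choose|pick) document (\d+)$",
     WorkflowStep.CONFIGURATION, "Got it, document {0}."),
    (WorkflowStep.CONFIGURATION, r"^(?:confirm|proceed|next step)$",
     WorkflowStep.REVIEW, "Ready. Confirm?"),
    (WorkflowStep.REVIEW, r"^(?:yes|confirm|start|print)$",
     WorkflowStep.EXECUTING, "Printing now!"),
]


def step_command(step: WorkflowStep, text: str) -> Tuple[WorkflowStep, Optional[str]]:
    """Apply one command; commands that are not valid in this step change nothing."""
    cleaned = text.strip().lower()
    for current, pattern, next_step, reply in TRANSITIONS:
        if current is step:
            match = re.match(pattern, cleaned)
            if match:
                return next_step, reply.format(*match.groups())
    return step, None  # not a step-advancing command in this context
```

Replaying the example: starting from SELECT_DOCUMENT, "select document 3" yields CONFIGURATION, "confirm" yields REVIEW, and "yes" yields EXECUTING, while "yes" issued during document selection is rejected and the step stays put.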

🛠 Technical Implementation Details

  • Command Parsing: Handled by commandParser.ts using regex and keyword matching.
  • State Validation: Enforced by stateManager.ts to ensure commands are contextually valid.
  • Action Execution: Dispatched via actionHandler.ts to the UI and backend.
  • Voice Bridge: useVoiceCommandBridge.ts synchronizes backend voice intents with frontend state.

👨‍💻 Development

Setting Up Development Environment

# Clone and setup
git clone https://github.com/chaman2003/printchakra.git
cd printchakra

# Backend development
cd backend
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
pip install -e .  # For development mode

# Frontend development
cd ../frontend
npm install
npm run dev  # Start with hot reload

Running Tests

# Backend tests
cd backend
python -m pytest tests/

# Frontend tests
cd ../frontend
npm test

# Conversion validation
python backend/app/print_scripts/print-file.py <file_path>

Code Structure Guidelines

  • Modular Design – Each feature in its own module
  • Separation of Concerns – Routes → Services → Utilities
  • Error Handling – Comprehensive logging and user feedback
  • Type Safety – Full TypeScript coverage in frontend

🚢 Deployment

Docker Deployment

# Build containers
docker build -t printchakra-backend ./backend
docker build -t printchakra-frontend ./frontend

# Run services
docker-compose up -d

Vercel Deployment (Frontend)

# Install Vercel CLI
npm i -g vercel

# Deploy
cd frontend
vercel deploy --prod

Environment-Specific Configuration

  • Development – Local services, verbose logging
  • Staging – Pre-production environment
  • Production – Hardened security, performance optimized

🤝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit with clear messages (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request with detailed description

Code Standards

  • Follow PEP 8 (Python)
  • Use ESLint + Prettier (TypeScript/React)
  • Include tests with 80%+ coverage
  • Update documentation for new features

📄 License & Author

License: MIT License

Author: Chaman S (GitHub: @chaman2003)

This project is open source and available under the MIT License. See LICENSE file for details.


📞 Support & Feedback

  • Issues – Report bugs on GitHub Issues
  • Discussions – Join conversations on GitHub Discussions
  • Documentation – Read detailed docs in docs/ folder


Made with ❤️ by Chaman S

If you find this project helpful, please consider giving it a ⭐ on GitHub!
