AryanKansagara/blindly


Blindly 🦯

Workflow-Aware Screen Reader Companion for Predictive UI Intent Analysis

🎯 Core Concept

We focus on awareness before action. Instead of just reading a label like 'Submit', our tool explains what the action really does, such as subscribing to emails or sending bank info, before the user clicks.

Blindly is a Chrome extension that predicts the consequences of UI actions before users interact with them. It combines local DOM analysis, workflow context detection, and Gemini AI reasoning to provide clear explanations via accessible visual overlays and natural ElevenLabs voice output.

The Problem

Screen reader users face uncertainty navigating web forms and interactive elements. They often don't know if clicking a button will:

  • Charge their credit card
  • Delete their account permanently
  • Submit official government forms
  • Initiate irreversible financial transactions (like Interac e-Transfers)

Our Solution

Blindly analyzes UI elements in real time using a multi-layered approach:

  1. Local Signal Detection (Fast, deterministic)

    • DOM patterns and form structure analysis
    • Keyword matching for payments, risks, and workflows
    • Canadian-specific patterns (Interac, government domains)
  2. Workflow Context Analysis

    • Stepper UI detection (Step X of Y)
    • Progress indicators and breadcrumbs
    • Multi-page flow understanding
  3. Gemini AI Reasoning (Complex cases)

    • Converts signals into human explanations
    • Structured JSON responses with risk levels
    • Handles ambiguous or novel UI patterns
  4. Accessible Output

    • High-contrast visual overlays (WCAG AAA)
    • ElevenLabs natural voice synthesis
    • Keyboard navigation and screen reader compatibility
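To make the "structured JSON responses with risk levels" idea concrete, here is a minimal sketch of validating a model reply before it reaches the overlay. The field names (`risk`, `explanation`) and the `parseIntentResponse` helper are assumptions for illustration, not the extension's actual schema.

```javascript
// Hypothetical reply shape: { "risk": "LOW" | "MEDIUM" | "HIGH", "explanation": "..." }
const RISK_LEVELS = ["LOW", "MEDIUM", "HIGH"];

// Returns a normalized { risk, explanation } object, or null if the reply is unusable.
function parseIntentResponse(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch (err) {
    return null; // not valid JSON: the caller can fall back to local analysis
  }
  if (data === null || typeof data !== "object") return null;

  const risk = String(data.risk || "").toUpperCase();
  if (!RISK_LEVELS.includes(risk)) return null;

  if (typeof data.explanation !== "string" || data.explanation.trim() === "") {
    return null;
  }
  return { risk, explanation: data.explanation.trim() };
}
```

Returning `null` instead of throwing keeps the decision about what to do next (retry, fall back, stay silent) with the caller.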

Analysis Pipeline

  1. Event Trigger: Focus/hover on interactive element (300ms delay)
  2. Context Extraction: DOM analysis, form detection, signal gathering
  3. Local Analysis: Fast deterministic checks for high-confidence cases
  4. AI Analysis: Gemini API (direct) → Backend (fallback) → Local (final fallback)
  5. Response Processing: JSON validation, risk level assignment
  6. Output: Visual overlay + TTS (ElevenLabs → Chrome TTS fallback)
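The fallback order in step 4 can be sketched as a simple loop over analyzers. This is an illustration only: `analyzeWithFallback` and the `{ name, run }` analyzer shape are hypothetical names, and the real extension may structure this differently.

```javascript
// Try each analyzer in order and return the first usable result.
// Each analyzer is { name, run }, where run(context) returns a promise.
async function analyzeWithFallback(context, analyzers) {
  for (const { name, run } of analyzers) {
    try {
      const result = await run(context);
      if (result) return { source: name, result };
    } catch (err) {
      // Network or API failure: move on to the next layer.
    }
  }
  // Every layer failed or returned nothing.
  return { source: "none", result: null };
}
```

A caller would pass the layers in priority order, for example a direct Gemini call, then the backend, then the local heuristics, so that a network failure degrades to a local answer rather than to silence.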

🚀 Quick Start

Extension Setup

  1. Load Extension in Chrome

    # Navigate to extension folder
    cd intent-reader/extension
    
    # Open Chrome and go to chrome://extensions/
    # Enable "Developer mode"
    # Click "Load unpacked" and select the extension folder
  2. Configure API Key (Required for AI features)

    • Create extension/env.json with your OpenRouter API key:
    {
      "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
    }
  3. Configure Settings

    • Click the Blindly icon in Chrome toolbar
    • Toggle Canada Mode (recommended for Canadian sites)
    • Enable Privacy Mode (redacts sensitive data - default: ON)
    • Enable Auto-Read Intent (automatic voice feedback - default: ON)
  4. Test with Demo Pages

    • Open demo-pages/checkout.html for payment detection
    • Open demo-pages/delete-account.html for irreversibility warnings
    • Open demo-pages/gov-form.html for Canada Mode (ensure it's enabled)

Keyboard Shortcuts

  • Ctrl+Shift+I: Read intent of currently focused element
  • Ctrl+Shift+S: Summarize form before submission
  • Ctrl+Shift+V: Toggle voice companion (speech-to-text form filling)
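As a rough sketch, the shortcuts above could be matched against `keydown` events like this. The action names and the `matchShortcut` helper are assumptions for illustration; the function takes any object shaped like a `KeyboardEvent`.

```javascript
// Shortcut -> action name. Key combinations come from the list above;
// the action names are hypothetical.
const SHORTCUTS = {
  "ctrl+shift+i": "readIntent",
  "ctrl+shift+s": "summarizeForm",
  "ctrl+shift+v": "toggleVoiceCompanion",
};

// Returns the action name for a KeyboardEvent-like object, or null.
function matchShortcut(event) {
  if (!event.ctrlKey || !event.shiftKey || event.altKey || event.metaKey) {
    return null;
  }
  const combo = "ctrl+shift+" + String(event.key).toLowerCase();
  return SHORTCUTS[combo] || null;
}
```

In a content script this might be wired up with `document.addEventListener("keydown", (e) => { const action = matchShortcut(e); /* dispatch */ })`.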

📊 Signal Detection

Payment Detection

  • Keywords: pay, checkout, total, tax, billing, charge, purchase
  • Providers: Stripe, PayPal, Interac, Shop Pay, Apple Pay
  • Field Patterns: 16-digit card numbers, CVV fields, expiry dates
  • Result: HIGH risk + clear warning
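A minimal sketch of how these payment signals might be combined is shown below. The keyword and provider lists come from the bullets above; whole-word keyword matching, the card-number regex, and the `detectPaymentSignals` name are assumptions made for this example.

```javascript
const PAYMENT_KEYWORDS = ["pay", "checkout", "total", "tax", "billing", "charge", "purchase"];
const PAYMENT_PROVIDERS = ["stripe", "paypal", "interac", "shop pay", "apple pay"];
// 16 digits, optionally grouped in fours by spaces or hyphens.
const CARD_NUMBER = /\b(?:\d{4}[ -]?){3}\d{4}\b/;

function detectPaymentSignals(text) {
  const lower = text.toLowerCase();
  // Whole-word match so "pay" does not fire on "paypal" or "repayment".
  const keywords = PAYMENT_KEYWORDS.filter((k) => new RegExp(`\\b${k}\\b`).test(lower));
  const providers = PAYMENT_PROVIDERS.filter((p) => lower.includes(p));
  const hasCardNumber = CARD_NUMBER.test(text);
  const isPayment = keywords.length > 0 || providers.length > 0 || hasCardNumber;
  return { isPayment, keywords, providers, hasCardNumber, risk: isPayment ? "HIGH" : null };
}
```

In practice the input would be the element's visible text plus nearby labels, which is how a bare "Continue" button inside a checkout form could still be flagged.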

Irreversibility Detection

  • Keywords: delete, permanent, cannot be undone, irreversible
  • Context: account settings, danger zones
  • Result: HIGH risk + consequence explanation

Form Submission

  • Signals: <form> with action, submit buttons
  • Analysis: Field count, required fields, consent checkboxes
  • Result: MEDIUM risk + field summary
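The field summary could be produced along these lines. This sketch works on a plain array describing the form (for instance, built from `form.elements`) so it is easy to test; the `{ label, type, required }` shape, the consent keyword list, and the `summarizeForm` name are all assumptions.

```javascript
// `fields` is an array of { label, type, required } descriptors (hypothetical shape).
function summarizeForm(fields) {
  // Buttons are not data fields, so leave them out of the count.
  const inputs = fields.filter((f) => f.type !== "submit" && f.type !== "button");
  const required = inputs.filter((f) => f.required);
  // Assumed heuristic: a checkbox whose label mentions consent-like wording.
  const consent = inputs.filter(
    (f) => f.type === "checkbox" && /agree|consent|terms|subscribe/i.test(f.label)
  );
  return {
    fieldCount: inputs.length,
    requiredLabels: required.map((f) => f.label),
    consentLabels: consent.map((f) => f.label),
    risk: "MEDIUM",
  };
}
```

Only labels are collected here, never values, which matches the privacy rules described later in this README.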

Workflow Navigation

  • Signals: "Step X of Y", progress bars, breadcrumbs
  • Prediction: Next step in flow (payment → review → confirmation)
  • Result: LOW-MEDIUM risk + next page type
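For the "Step X of Y" signal, a small parser like the following would be enough to tell where the user is in a flow. The regex and the `parseStepIndicator` name are illustrative assumptions, not the extension's actual code.

```javascript
const STEP_PATTERN = /\bstep\s+(\d+)\s+of\s+(\d+)\b/i;

// Returns { current, total, isLastStep } or null if no valid indicator is found.
function parseStepIndicator(text) {
  const match = STEP_PATTERN.exec(text);
  if (!match) return null;
  const current = Number(match[1]);
  const total = Number(match[2]);
  // Reject nonsensical indicators such as "Step 5 of 3" or "Step 0 of 3".
  if (current < 1 || current > total) return null;
  return { current, total, isLastStep: current === total };
}
```

Knowing `isLastStep` matters because a "Next" button on the final step often submits the whole flow rather than just advancing a page.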

Canadian Context (Canada Mode)

  • Interac Keywords: e-transfer, autodeposit, Interac Online
  • Gov Domains: canada.ca, ontario.ca, cra-arc.gc.ca, serviceontario
  • Postal Codes: A1A 1A1 pattern validation
  • Result: Enhanced warnings for Canadian services
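The postal code and government-domain checks might look like the sketch below. The domain list is taken from the bullets above (the `serviceontario` entry is treated as a keyword rather than a hostname and is left out); the function names are hypothetical.

```javascript
// Canadian postal codes alternate letter-digit-letter digit-letter-digit.
// D, F, I, O, Q and U are never used, and W and Z never appear in first position.
const POSTAL_CODE = /^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$/i;

const GOV_DOMAINS = ["canada.ca", "ontario.ca", "cra-arc.gc.ca"];

function isValidPostalCode(value) {
  return POSTAL_CODE.test(value.trim());
}

// Matches the domain itself or any subdomain, but not look-alikes
// such as "notcanada.ca".
function isCanadianGovHost(hostname) {
  const host = hostname.toLowerCase();
  return GOV_DOMAINS.some((d) => host === d || host.endsWith("." + d));
}
```

Matching on a dot-prefixed suffix rather than a plain substring is a deliberate choice: it avoids treating a phishing domain that merely contains "canada.ca" as a government site.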

🔒 Privacy & Responsible AI

Local Redaction (Privacy Mode)

Before sending data to the backend, we redact:

  • Credit card numbers → [CARD]
  • SSN/SIN → [SSN]
  • Email addresses → [EMAIL]
  • Phone numbers → [PHONE]
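A simplified sketch of this redaction step is shown below. The regular expressions are assumptions chosen for readability (North American phone formats, 9-digit SIN, US-style SSN) and would need hardening for real use; only the placeholder tokens come from the list above.

```javascript
// Order matters: longer digit patterns are replaced first so that a card
// number is not partially consumed by the phone pattern.
const REDACTIONS = [
  [/\b(?:\d{4}[ -]?){3}\d{4}\b/g, "[CARD]"],
  [/\b\d{3}[ -]\d{3}[ -]\d{3}\b/g, "[SSN]"], // Canadian SIN, e.g. 046 454 286
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],       // US SSN, e.g. 078-05-1120
  [/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[EMAIL]"],
  [/(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g, "[PHONE]"],
];

function redact(text) {
  return REDACTIONS.reduce((out, [pattern, token]) => out.replace(pattern, token), text);
}
```

Because this runs locally in the extension, the raw values never leave the page; only the placeholder tokens are included in anything sent onward.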

Transparency

  • Users can toggle Privacy Mode in settings
  • Clear indication when data is sent to AI service
  • Falls back to local analysis if the backend is unavailable

What We Send

We send only the following to Gemini:

  • Element tag, type, and redacted text
  • Detected signals (boolean flags, not raw data)
  • Form field labels (not values)
  • Domain and URL (not credentials)
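Put together, a request payload that respects this list might be assembled as follows. The payload shape, the `buildPayload` name, and the decision to drop the query string and fragment from the URL are assumptions for illustration; the `redact` parameter stands in for whatever redaction function Privacy Mode applies.

```javascript
// element: { tag, type, text }; signals: object of flags; fieldLabels: string[].
function buildPayload(element, signals, fieldLabels, pageUrl, redact = (s) => s) {
  const url = new URL(pageUrl);
  return {
    element: {
      tag: element.tag,
      type: element.type || null,
      text: redact(element.text || ""),
    },
    // Coerce every signal to a boolean so no raw data slips through.
    signals: Object.fromEntries(
      Object.entries(signals).map(([key, value]) => [key, Boolean(value)])
    ),
    fieldLabels: [...fieldLabels], // labels only, never field values
    domain: url.hostname,
    // origin + pathname omits embedded credentials, query string and fragment.
    url: url.origin + url.pathname,
  };
}
```

Building the payload from an explicit allow-list of properties, rather than serializing the element and deleting sensitive parts, means a newly added field stays private by default.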

What We DON'T Send

  • Form input values
  • Passwords
  • Credit card details
  • Personal information

πŸ… The Blindly Difference

Accessibility

  • All features accessible via keyboard (no mouse required)
  • Tab through the overlay or use keyboard shortcuts; press Escape to close
  • Concise spoken summaries (10-15 words) in a natural, human-sounding voice
  • Enhanced detection for Canadian services
  • Automatic voice feedback on element focus/hover

Proud to be Canadian

  • Canadian Payment and Interac e-Transfer detection
  • Recognition of Canadian government domains, health keywords, and more
  • Postal code validation
  • Support for Canadian accessibility requirements
  • Tested and specialized for Canadian government websites and forms

Innovation

  • Novel approach: predictive intent analysis vs. reactive screen reading
  • Combines deterministic heuristics with AI reasoning
  • Workflow-aware: understands multi-step processes

Technical Excellence

  • Clean architecture: extension ↔ backend separation
  • Efficient: local analysis for high-confidence cases
  • Robust: fallbacks at every layer (ElevenLabs → Chrome TTS)

User Experience

  • Accessible by design (WCAG AAA compliance)
  • Non-intrusive overlay
  • Clear, actionable information
  • Works without keyboard/mouse

Responsible AI

  • Privacy Mode with local redaction
  • User control over all features

πŸ› οΈ Technologies Used

  • Frontend: JavaScript (ES6+), Chrome Extension API (Manifest V3)
  • AI Integration:
    • Google Gemini 1.5 Flash (via OpenRouter API - primary)
    • Google Gemini API (direct - backend fallback)
  • Voice Synthesis: ElevenLabs TTS API + Chrome TTS (fallback)
  • Backend: Python 3.11, FastAPI, Uvicorn (optional)
  • Cloud: Google Cloud Run, Secret Manager
  • Deployment: Docker, gcloud CLI
  • Architecture: Multi-layer fallback system for reliability

πŸ“ Future Enhancements

  • OCR for image-based buttons
  • Multi-language support (French, Spanish)
  • Browser extension for Firefox, Edge
  • Mobile app (React Native)
  • Improve localAnalysis and GeminiAnalysis for faster responses
  • Custom companion training and personalization for user preferences
  • Machine learning model for workflow prediction
  • Integration with PDF files, Excel, etc.

Blindly - Making the web more predictable and accessible, one interaction at a time. ⚑
