Glimmer

An AI-Powered Visual Assistant for Visually Impaired Users


Features • Installation • Usage • Architecture • Development


📖 Overview

Glimmer is a native iOS accessibility application prototype designed specifically for visually impaired users. It combines real-time computer vision with voice interaction to provide an intelligent visual assistant that runs entirely on-device using Apple's MLX framework and Qwen 3.5 Vision Language Model.

🖼 Example Screenshots

Glimmer app example screenshot 1 Glimmer app example screenshot 2

🎯 Core Capabilities

  • Real-time Visual Description: Continuous scene understanding using on-device AI
  • Voice-First Interaction: Press-and-hold voice input with automatic TTS management
  • On-Device Processing: Privacy-focused local inference using MLX
  • Adaptive Captioning: Smart deduplication and throttling to prevent audio queue overflow
  • Bilingual Support: Native Chinese language support with English capabilities
  • Extensible Architecture: Ready for cloud-backend integration when needed

✨ Features

Visual Intelligence

  • Live Camera Feed: Upper screen displays real-time camera preview
  • AI-Powered Descriptions: Near real-time scene analysis using Qwen 3.5 VLM (0.8B quantized)
  • Smart Captioning: Automatic filtering of redundant descriptions
  • Visual Context Awareness: Maintains short-term visual memory for Q&A
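
The smart-captioning behavior above can be pictured as a small gate: a caption is dropped if it is too similar to the last one spoken, or if it arrives before the throttle interval has elapsed. The sketch below is illustrative only; the actual CaptionSpeechPolicy in GlimmerCore may use a different similarity heuristic and interval.

```swift
import Foundation

/// Illustrative caption gate: suppresses near-duplicate or too-frequent captions.
/// (Sketch only; the real CaptionSpeechPolicy may differ.)
struct CaptionGate {
    var minimumInterval: TimeInterval = 1.4   // assumed throttle interval
    var similarityThreshold: Double = 0.8     // assumed dedup cutoff

    private var lastCaption: String = ""
    private var lastSpokenAt: Date = .distantPast

    /// Returns true if the caption should be spoken, and records it if so.
    mutating func shouldSpeak(_ caption: String, at now: Date = Date()) -> Bool {
        guard now.timeIntervalSince(lastSpokenAt) >= minimumInterval else { return false }
        guard jaccardSimilarity(caption, lastCaption) < similarityThreshold else { return false }
        lastCaption = caption
        lastSpokenAt = now
        return true
    }

    /// Word-level Jaccard similarity, a simple stand-in for real dedup logic.
    private func jaccardSimilarity(_ a: String, _ b: String) -> Double {
        let wa = Set(a.lowercased().split(separator: " "))
        let wb = Set(b.lowercased().split(separator: " "))
        guard !wa.isEmpty || !wb.isEmpty else { return 1.0 }
        return Double(wa.intersection(wb).count) / Double(wa.union(wb).count)
    }
}
```

With this shape, a repeated "a red door on the left" is silently dropped, while a genuinely new scene description passes through once the interval allows.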

Voice Interaction

  • Priority-Based Audio: Voice input automatically pauses visual descriptions
  • System TTS Integration: Native iOS text-to-speech with on-screen captions
  • Multimodal Q&A: Combines user questions with visual context for intelligent responses
  • Hands-Free Operation: Large touch-and-hold button for easy voice activation

Privacy & Performance

  • 100% On-Device: No data leaves your iPhone
  • Optimized for iPhone: Tested on iPhone 17 Pro with iOS 26
  • Battery Efficient: Adaptive throttling balances performance and power consumption
  • Offline Capable: Works without internet connection after initial model download

🚀 Installation

Prerequisites

  • Hardware: iPhone 15 or later (iPhone 17 Pro recommended)
  • OS: iOS 17.0+
  • Tools: Xcode 15.0+, XcodeGen

Build from Source

  1. Clone the repository

    git clone https://github.com/Chamstin/Glimmer.git
    cd Glimmer
  2. Generate Xcode project

    xcodegen generate
  3. Open in Xcode

    open Glimmer.xcodeproj
  4. Configure signing

    • Select your development team in project settings
    • Ensure proper code signing certificates
  5. Build and run

    • Connect your iPhone
    • Select device as run destination
    • Press Cmd+R to build and run

First Launch

On first run, the app will:

  1. Request camera and microphone permissions
  2. Download the Qwen 3.5 model (~500MB) - requires stable internet
  3. Cache the model locally for offline use
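
The cache-then-offline behavior amounts to checking for a local copy before touching the network. A minimal sketch, assuming a flat cache directory keyed by model ID (the app's actual paths and layout may differ):

```swift
import Foundation

/// Illustrative first-launch check: download the model only when no cached
/// copy exists. (Directory layout and naming are assumptions, not the app's
/// actual cache paths.)
func localModelURL(modelID: String, cacheDirectory: URL) -> URL {
    cacheDirectory.appendingPathComponent(modelID, isDirectory: true)
}

func modelNeedsDownload(modelID: String, cacheDirectory: URL) -> Bool {
    let url = localModelURL(modelID: modelID, cacheDirectory: cacheDirectory)
    return !FileManager.default.fileExists(atPath: url.path)
}
```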

📱 Usage

Basic Operation

  1. Launch the app - Grant camera and microphone permissions when prompted
  2. Point the camera at a scene - the AI will automatically start describing what it sees
  3. Listen to descriptions - Captions appear on screen while being spoken
  4. Ask questions:
    • Press and hold the large button (bottom half of screen)
    • Speak your question in Chinese or English
    • Release to get an AI-powered answer based on the current visual context

Best Practices

  • Stable Internet: Ensure good connectivity for first-time model download
  • Portrait Mode: Keep phone vertical for optimal camera framing
  • Lighting: Works best in well-lit environments
  • Distance: Hold phone at comfortable viewing distance (30-100cm from objects)

Performance Tuning

If you experience lag:

  • Switch to the 5-bit model for faster inference
  • Increase captionProcessingInterval to 2.2s in settings

For better quality:

  • Use the 8-bit model for improved accuracy
  • Consider backend integration for complex queries
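
One way to picture this trade-off is as a mapping from device load to caption cadence: widen the interval as the device gets slower or hotter. This is a hypothetical helper, not the app's actual throttling code; only the 2.2s value comes from the tuning advice above.

```swift
import Foundation

/// Hypothetical load levels; on iOS these could be driven by
/// ProcessInfo.processInfo.thermalState.
enum DeviceLoad { case nominal, fair, serious, critical }

/// Maps load to a caption-processing interval, mirroring the tuning advice:
/// the busier the device, the less often we caption.
func captionProcessingInterval(for load: DeviceLoad) -> TimeInterval {
    switch load {
    case .nominal:  return 1.4   // assumed default cadence
    case .fair:     return 1.8
    case .serious:  return 2.2   // the value suggested above for laggy devices
    case .critical: return 3.0   // back off hard under thermal pressure
    }
}
```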

πŸ— Architecture

Project Structure

Glimmer/
├── App/                          # iOS application layer
│   ├── Views/                   # SwiftUI views
│   ├── ViewModels/              # Application state management
│   └── Services/                # Camera, Speech, TTS services
├── Sources/GlimmerCore/         # Core reusable framework
│   ├── Configuration/           # Model & prompt configs
│   ├── Domain/                  # Protocol definitions
│   ├── Inference/               # MLX engine & backends
│   └── Speech/                  # Caption & TTS policies
├── Tests/GlimmerCoreTests/      # Unit tests
└── docs/                        # Documentation

Key Components

1. Real-Time Visual Pipeline

CameraService → AppViewModel → FrameSnapshotWriter → LocalMLXAssistantEngine
                                                    ↓
                                            Qwen 3.5 VLM (MLX)
                                                    ↓
                                        CaptionSpeechPolicy → TTS

2. Voice Interaction Flow

User Press → Speech Recognition → Question + Visual Summary + Frame
                                                    ↓
                                        LocalMLXAssistantEngine
                                                    ↓
                                            Answer → TTS → Resume Visual Loop
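
The flow above ultimately hands the engine one multimodal request. A minimal sketch of how the question and short-term visual memory might be combined into a text prompt (the wording and function name are assumptions, not the project's actual template; the camera frame would accompany this text in the real engine call):

```swift
import Foundation

/// Illustrative prompt assembly for multimodal Q&A: the user's question plus
/// the most recent scene descriptions as context.
func makeQAPrompt(question: String, recentCaptions: [String], maxCaptions: Int = 3) -> String {
    let context = recentCaptions.suffix(maxCaptions)
        .map { "- \($0)" }
        .joined(separator: "\n")
    return """
    Recent scene descriptions:
    \(context)

    User question: \(question)
    Answer briefly, for a visually impaired user.
    """
}
```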

Default Model

Current: Qwen3.5-0.8B

Why this model?

  • True vision-language model (image-text-to-text)
  • Optimized for MLX framework
  • 0.8B parameters balance quality and speed on iPhone

Change model in: Sources/GlimmerCore/Configuration/AssistantModelConfiguration.swift
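
As a rough illustration of what such a configuration might hold (the type name, fields, and values below are assumptions, not the actual contents of AssistantModelConfiguration.swift):

```swift
/// Illustrative shape only; the real AssistantModelConfiguration may differ.
struct ModelChoice {
    var modelID: String
    var quantizationBits: Int   // e.g. 5 for speed, 8 for quality (per the tuning notes)
}

/// Assumed default mirroring the README's stated model.
let defaultModel = ModelChoice(modelID: "Qwen3.5-0.8B", quantizationBits: 8)
```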


🛠 Development

Running Tests

# Core logic tests
swift test

# Full build verification
xcodebuild \
  -project Glimmer.xcodeproj \
  -scheme Glimmer \
  -destination 'generic/platform=iOS' \
  build

Local Release (No GitHub Actions)

Build and publish directly from your local machine:

# 1) Build unsigned archive
mkdir -p build
xcodebuild -project Glimmer.xcodeproj \
   -scheme Glimmer \
   -configuration Release \
   -destination 'generic/platform=iOS' \
   -archivePath build/Glimmer.xcarchive \
   archive \
   CODE_SIGNING_ALLOWED=NO \
   CODE_SIGNING_REQUIRED=NO \
   CODE_SIGN_IDENTITY=""

# 2) Package IPA
APP_PATH="build/Glimmer.xcarchive/Products/Applications/Glimmer.app"
mkdir -p build/Payload
cp -R "$APP_PATH" build/Payload/
(cd build && /usr/bin/zip -qry Glimmer-unsigned.ipa Payload)

# 3) Create release from local artifact (replace vX.Y.Z)
gh release create vX.Y.Z build/Glimmer-unsigned.ipa \
   --title "vX.Y.Z" \
   --generate-notes

Requirements:

  • GitHub CLI installed and authenticated (gh auth login)
  • Tag vX.Y.Z should match your release version

Backend Integration

The architecture supports seamless backend switching:

  1. Protocol: AssistantEngine (in GlimmerCore/Domain/)
  2. Local Implementation: LocalMLXAssistantEngine
  3. Remote Stub: RemoteAssistantEngine (ready for implementation)

To integrate a cloud backend:

// In AppViewModel initialization
let engine = RemoteAssistantEngine(apiEndpoint: "https://your-api.com")

No changes needed in UI, camera, or speech layers.
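
For illustration, here is one way the pieces could fit together, with the protocol signature simplified to a synchronous call (the actual AssistantEngine in GlimmerCore/Domain/ is presumably async and may declare different requirements):

```swift
import Foundation

/// Simplified stand-in for the real AssistantEngine protocol.
protocol VisualAssistantEngine {
    func answer(question: String, visualSummary: String) throws -> String
}

enum EngineError: Error { case remoteNotImplemented }

/// Sketch of a remote stub: a real version would POST the question, visual
/// summary, and current frame to the endpoint and decode the server's reply.
struct RemoteEngineStub: VisualAssistantEngine {
    let apiEndpoint: URL

    func answer(question: String, visualSummary: String) throws -> String {
        throw EngineError.remoteNotImplemented
    }
}
```

Because the view model holds only the protocol type, swapping the local engine for a remote one stays a one-line change.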

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

⚠️ Known Limitations

  • Audio Session Conflicts: Simultaneous TTS and speech recognition may interfere on some iOS versions
  • Metal Performance: Inference speed varies by device thermal state
  • Model Download: First launch requires ~500MB download over stable network
  • Language Support: Optimized for Chinese; English support is experimental
  • Camera Orientation: Currently designed for portrait mode only

📋 Device Testing Checklist

Before release, verify on physical iPhone:

  • Camera permission granted and live preview working
  • Microphone permission granted
  • Model downloaded successfully
  • Visual descriptions play through TTS
  • Captions display correctly on screen
  • Voice input activates and transcribes accurately
  • Q&A responses are contextually relevant
  • No audio conflicts between TTS and speech recognition
  • App remains responsive under continuous use (10+ minutes)
  • Memory usage stable without leaks

Full checklist: docs/DEVICE_TEST_CHECKLIST.md


🗺 Roadmap

  • Multi-language support (English, Spanish, etc.)
  • Landscape mode optimization
  • Object detection with haptic feedback
  • Customizable voice profiles
  • Cloud sync for conversation history
  • watchOS companion app
  • Accessibility shortcuts integration

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • MLX: Apple's ML framework for efficient on-device inference
  • Qwen Team: For open-sourcing the Qwen 3.5 vision-language models
  • Hugging Face: Model hosting and community support
  • Accessibility Community: For invaluable feedback and testing

📧 Contact


Made with ❤️ for the visually impaired community
