# Glimmer

An AI-Powered Visual Assistant for Visually Impaired Users
Features • Installation • Usage • Architecture • Development
Glimmer is a native iOS accessibility application prototype designed specifically for visually impaired users. It combines real-time computer vision with voice interaction to provide an intelligent visual assistant that runs entirely on-device using Apple's MLX framework and Qwen 3.5 Vision Language Model.
- Real-time Visual Description: Continuous scene understanding using on-device AI
- Voice-First Interaction: Press-and-hold voice input with automatic TTS management
- On-Device Processing: Privacy-focused local inference using MLX
- Adaptive Captioning: Smart deduplication and throttling to prevent audio queue overflow
- Bilingual Support: Native Chinese language support with English capabilities
- Extensible Architecture: Ready for cloud-backend integration when needed
- Live Camera Feed: Upper screen displays real-time camera preview
- AI-Powered Descriptions: Near real-time scene analysis using Qwen 3.5 VLM (0.8B quantized)
- Smart Captioning: Automatic filtering of redundant descriptions
- Visual Context Awareness: Maintains short-term visual memory for Q&A
- Priority-Based Audio: Voice input automatically pauses visual descriptions
- System TTS Integration: Native iOS text-to-speech with on-screen captions
- Multimodal Q&A: Combines user questions with visual context for intelligent responses
- Hands-Free Operation: Large touch-and-hold button for easy voice activation
- 100% On-Device: No data leaves your iPhone
- Optimized for iPhone: Tested on iPhone 17 Pro with iOS 26
- Battery Efficient: Adaptive throttling balances performance and power consumption
- Offline Capable: Works without internet connection after initial model download
- Hardware: iPhone 15 or later (iPhone 17 Pro recommended)
- OS: iOS 17.0+
- Tools: Xcode 15.0+, XcodeGen
1. **Clone the repository**

   ```bash
   git clone https://github.com/yourusername/glimmer.git
   cd glimmer
   ```

2. **Generate Xcode project**

   ```bash
   xcodegen generate
   ```

3. **Open in Xcode**

   ```bash
   open Glimmer.xcodeproj
   ```

4. **Configure signing**
   - Select your development team in project settings
   - Ensure proper code signing certificates

5. **Build and run**
   - Connect your iPhone
   - Select the device as the run destination
   - Press `Cmd+R` to build and run
On first run, the app will:
- Request camera and microphone permissions
- Download the Qwen 3.5 model (~500MB) - requires stable internet
- Cache the model locally for offline use
- Launch the app - Grant camera and microphone permissions when prompted
- Point camera at scene - AI will automatically start describing what it sees
- Listen to descriptions - Captions appear on screen while being spoken
- Ask questions:
- Press and hold the large button (bottom half of screen)
- Speak your question in Chinese or English
- Release to get AI-powered answer based on current visual context
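The press-and-hold interaction above can be sketched in SwiftUI. This is illustrative only: the `onStart`/`onRelease` callbacks stand in for whatever the app's real speech service exposes.

```swift
import SwiftUI

// Sketch of a press-and-hold voice button (not the app's actual view).
// A zero-distance DragGesture fires on touch-down and again on release,
// which is the standard SwiftUI pattern for hold-to-talk buttons.
struct VoiceButton: View {
    @State private var isRecording = false
    let onStart: () -> Void   // begin speech recognition, pause visual TTS
    let onRelease: () -> Void // stop listening, query the engine, speak answer

    var body: some View {
        Circle()
            .fill(isRecording ? Color.red : Color.blue)
            .frame(maxWidth: .infinity, maxHeight: .infinity)
            .gesture(
                DragGesture(minimumDistance: 0)
                    .onChanged { _ in
                        if !isRecording {
                            isRecording = true
                            onStart()
                        }
                    }
                    .onEnded { _ in
                        isRecording = false
                        onRelease()
                    }
            )
            .accessibilityLabel("Hold to ask a question")
    }
}
```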
- Stable Internet: Ensure good connectivity for first-time model download
- Portrait Mode: Keep phone vertical for optimal camera framing
- Lighting: Works best in well-lit environments
- Distance: Hold phone at comfortable viewing distance (30-100cm from objects)
If you experience lag:
- Switch to 5-bit model for faster inference
- Increase `captionProcessingInterval` to 2.2s in settings
For better quality:
- Use 8-bit model for improved accuracy
- Consider backend integration for complex queries
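As a sketch, the tuning knobs above might live in a single configuration value. Only `captionProcessingInterval` is named in this README; the type and the quantization property are illustrative assumptions.

```swift
import Foundation

// Hypothetical tuning configuration — only `captionProcessingInterval`
// is documented; the rest illustrates the trade-off described above.
struct CaptioningTuning {
    /// Minimum seconds between caption generations (raise to ~2.2 s on lag).
    var captionProcessingInterval: TimeInterval = 1.5
    /// Model quantization: 5-bit is faster, 8-bit is more accurate.
    var modelQuantizationBits: Int = 8
}

// Lower-latency preset for devices that lag.
let lowLatency = CaptioningTuning(captionProcessingInterval: 2.2,
                                  modelQuantizationBits: 5)
```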
```
Glimmer/
├── App/                     # iOS application layer
│   ├── Views/               # SwiftUI views
│   ├── ViewModels/          # Application state management
│   └── Services/            # Camera, Speech, TTS services
├── Sources/GlimmerCore/     # Core reusable framework
│   ├── Configuration/       # Model & prompt configs
│   ├── Domain/              # Protocol definitions
│   ├── Inference/           # MLX engine & backends
│   └── Speech/              # Caption & TTS policies
├── Tests/GlimmerCoreTests/  # Unit tests
└── docs/                    # Documentation
```
Visual description loop:

```
CameraService → AppViewModel → FrameSnapshotWriter → LocalMLXAssistantEngine
                                                              ↓
                                                     Qwen 3.5 VLM (MLX)
                                                              ↓
                                                CaptionSpeechPolicy → TTS
```

Voice Q&A flow:

```
User Press → Speech Recognition → Question + Visual Summary + Frame
                                              ↓
                                   LocalMLXAssistantEngine
                                              ↓
                             Answer → TTS → Resume Visual Loop
```
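The `CaptionSpeechPolicy` stage deduplicates and throttles captions before they reach TTS, preventing the audio queue overflow mentioned under Features. A minimal sketch of that idea (the real policy's API is not shown here):

```swift
import Foundation

// Minimal dedup + throttle gate for captions. The actual
// CaptionSpeechPolicy may differ; this only illustrates the idea.
struct CaptionGate {
    private var lastCaption = ""
    private var lastSpokenAt = Date.distantPast
    let minimumInterval: TimeInterval  // e.g. 1.5–2.2 s

    /// Returns true only if the caption is new AND enough time has passed.
    mutating func shouldSpeak(_ caption: String, at now: Date = Date()) -> Bool {
        let trimmed = caption.trimmingCharacters(in: .whitespacesAndNewlines)
        guard trimmed != lastCaption,                                 // drop duplicates
              now.timeIntervalSince(lastSpokenAt) >= minimumInterval  // throttle
        else { return false }
        lastCaption = trimmed
        lastSpokenAt = now
        return true
    }
}
```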
Current: Qwen3.5-0.8B
Why this model?
- True vision-language model (image-text-to-text)
- Optimized for MLX framework
- 0.8B parameters balance quality and speed on iPhone
Change model in: `Sources/GlimmerCore/Configuration/AssistantModelConfiguration.swift`
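Swapping models could look roughly like the following; the type shape and property names are assumptions about that file, not its actual contents.

```swift
// Illustrative only — check AssistantModelConfiguration.swift for the real shape.
struct AssistantModelConfiguration {
    var modelID: String        // Hugging Face-style repo identifier
    var quantizationBits: Int  // 5 for speed, 8 for accuracy
}

// Default per this README; the identifier string is a placeholder.
let current = AssistantModelConfiguration(modelID: "Qwen3.5-0.8B",
                                          quantizationBits: 8)
```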
```bash
# Core logic tests
swift test

# Full build verification
xcodebuild \
  -project Glimmer.xcodeproj \
  -scheme Glimmer \
  -destination 'generic/platform=iOS' \
  build
```

Build and publish directly from your local machine:
```bash
# 1) Build unsigned archive
mkdir -p build
xcodebuild -project Glimmer.xcodeproj \
  -scheme Glimmer \
  -configuration Release \
  -destination 'generic/platform=iOS' \
  -archivePath build/Glimmer.xcarchive \
  archive \
  CODE_SIGNING_ALLOWED=NO \
  CODE_SIGNING_REQUIRED=NO \
  CODE_SIGN_IDENTITY=""

# 2) Package IPA
APP_PATH="build/Glimmer.xcarchive/Products/Applications/Glimmer.app"
mkdir -p build/Payload
cp -R "$APP_PATH" build/Payload/
(cd build && /usr/bin/zip -qry Glimmer-unsigned.ipa Payload)

# 3) Create release from local artifact (replace vX.Y.Z)
gh release create vX.Y.Z build/Glimmer-unsigned.ipa \
  --title "vX.Y.Z" \
  --generate-notes
```

Requirements:
- GitHub CLI installed and authenticated (`gh auth login`)
- Tag `vX.Y.Z` should match your release version
The architecture supports seamless backend switching:
- Protocol: `AssistantEngine` (in `GlimmerCore/Domain/`)
- Local Implementation: `LocalMLXAssistantEngine`
- Remote Stub: `RemoteAssistantEngine` (ready for implementation)

To integrate a cloud backend:

```swift
// In AppViewModel initialization
let engine = RemoteAssistantEngine(apiEndpoint: "https://your-api.com")
```

No changes needed in UI, camera, or speech layers.
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Audio Session Conflicts: Simultaneous TTS and speech recognition may interfere on some iOS versions
- Metal Performance: Inference speed varies by device thermal state
- Model Download: First launch requires ~500MB download over stable network
- Language Support: Optimized for Chinese; English support is experimental
- Camera Orientation: Currently designed for portrait mode only
Before release, verify on physical iPhone:
- Camera permission granted and live preview working
- Microphone permission granted
- Model downloaded successfully
- Visual descriptions play through TTS
- Captions display correctly on screen
- Voice input activates and transcribes accurately
- Q&A responses are contextually relevant
- No audio conflicts between TTS and speech recognition
- App remains responsive under continuous use (10+ minutes)
- Memory usage stable without leaks
Full checklist: docs/DEVICE_TEST_CHECKLIST.md
- Multi-language support (English, Spanish, etc.)
- Landscape mode optimization
- Object detection with haptic feedback
- Customizable voice profiles
- Cloud sync for conversation history
- watchOS companion app
- Accessibility shortcuts integration
This project is licensed under the MIT License - see the LICENSE file for details.
- MLX: Apple's ML framework for efficient on-device inference
- Qwen Team: For open-sourcing the Qwen 3.5 vision-language models
- Hugging Face: Model hosting and community support
- Accessibility Community: For invaluable feedback and testing
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for the visually impaired community