Vision is a browser-based AI assistant designed to support accessibility, daily organization, and voice-first interaction, with a focus on assisting visually impaired users. It combines voice recognition, text-to-speech, local data storage, and Google Gemini AI to provide an all-in-one personal assistant experience—entirely in the browser.
This project is currently a front-end prototype intended for learning, experimentation, and future expansion.
- One-tap emergency activation
- Custom emergency contact and message
- Audio confirmation and visual alert feedback
Note: This is a simulation. No real calls or messages are sent.
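Because the emergency flow is simulated, it boils down to composing an alert string from the saved contact and message template. A minimal sketch (function and field names are illustrative, not the project's actual API):

```javascript
// Hypothetical sketch of the simulated emergency flow: it only composes
// the alert text from the saved contact and message template.
// No real call or SMS is ever placed.
function buildEmergencyAlert(contact, template) {
  // Fall back to defaults when settings are missing.
  const name = (contact && contact.name) || 'your emergency contact';
  const message = template || 'I need help. This is an automated alert.';
  return `Simulated alert to ${name}: ${message}`;
}
```

The returned string is what the app would show visually and read aloud as confirmation.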
- Speech-to-text via browser APIs
- AI-generated responses using Google Gemini
- Spoken responses with adjustable voice speed
- Context-aware assistance (schedule, health, items, etc.)
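Speech-to-text in the browser hinges on the `SpeechRecognition` interface (prefixed as `webkitSpeechRecognition` in Chromium). A hedged sketch of how listening could be started; the helper name and callback shape are assumptions:

```javascript
// Hypothetical sketch: start speech recognition when the browser
// supports it; returns null in environments without the Web Speech API.
function startListening(onResult) {
  const SR = (typeof window !== 'undefined') &&
             (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!SR) return null; // unsupported browser (or non-browser runtime)

  const recognition = new SR();
  recognition.lang = 'en-US';
  recognition.interimResults = false;
  recognition.onresult = (event) => {
    // Hand the transcript of the latest final result to the caller.
    onResult(event.results[event.results.length - 1][0].transcript);
  };
  recognition.start();
  return recognition;
}
```

The transcript would then be forwarded to Gemini and the reply spoken back via `SpeechSynthesis`.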
- Add, view, and delete events
- Date, time, and notes support
- Persistent storage using `localStorage`
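Persistence can be as simple as serializing the event list to JSON under one `localStorage` key. A sketch under that assumption (the key name `vision.events` is hypothetical); an in-memory fallback keeps the logic runnable outside the browser:

```javascript
// Sketch of the persistence layer: events are kept as a JSON array
// under a single storage key. Outside the browser, an in-memory map
// stands in for localStorage so the logic itself still works.
const memory = {};
const store = (typeof localStorage !== 'undefined') ? localStorage : {
  getItem: (k) => (k in memory ? memory[k] : null),
  setItem: (k, v) => { memory[k] = String(v); },
};

function loadEvents() {
  return JSON.parse(store.getItem('vision.events') || '[]');
}

function saveEvent(event) {
  const events = loadEvents();
  events.push(event); // e.g. { title, date, time, notes }
  store.setItem('vision.events', JSON.stringify(events));
  return events;
}
```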
- Log health metrics (e.g. blood pressure, medication, heart rate)
- Timestamped entries
- Quick review of recent records
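Timestamping each record makes the "recent records" review a simple sort. A sketch of one possible entry shape (field names are assumptions, not the project's actual schema):

```javascript
// Hypothetical shape for a health-log entry: each record carries an
// ISO-8601 timestamp so recent entries can be listed in order.
function makeHealthEntry(metric, value, now = new Date()) {
  return { metric, value, timestamp: now.toISOString() };
}

// Most recent first, limited to the last `count` records.
function recentEntries(entries, count = 5) {
  return entries
    .slice() // avoid mutating the caller's array
    .sort((a, b) => b.timestamp.localeCompare(a.timestamp))
    .slice(0, count);
}
```

ISO timestamps sort correctly as plain strings, which keeps the comparison trivial.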
- Track income and expenses
- Categories and notes
- Automatic balance summary
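The automatic balance summary reduces to one pass over the transaction list. A sketch assuming each transaction is `{ type: 'income' | 'expense', amount, category, note }` (a hypothetical shape, not the project's actual schema):

```javascript
// Sum income and expenses in a single pass and derive the balance.
function balanceSummary(transactions) {
  let income = 0, expense = 0;
  for (const t of transactions) {
    if (t.type === 'income') income += t.amount;
    else if (t.type === 'expense') expense += t.amount;
  }
  return { income, expense, balance: income - expense };
}
```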
- Save item locations (e.g. keys, wallet)
- Search items by name
- Spoken location feedback when an item is found
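Searching by name can be a case-insensitive substring match, with the returned sentence handed to speech synthesis. A sketch (names and phrasing are illustrative assumptions):

```javascript
// Hypothetical item lookup: case-insensitive substring match over saved
// items, returning the sentence to be spoken, or null when not found.
function findItem(items, query) {
  const q = query.trim().toLowerCase();
  const hit = items.find((item) => item.name.toLowerCase().includes(q));
  return hit ? `Found ${hit.name}: ${hit.location}` : null;
}
```

In the app, a non-null result would be passed to the speech-synthesis helper; a null result could prompt a spoken "item not found".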
- Emergency contact configuration
- Emergency message customization
- Voice speed control for speech synthesis
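Voice speed maps onto the `rate` property of `SpeechSynthesisUtterance`, which accepts values from 0.1 to 10. A hedged sketch that clamps the user setting and only touches `speechSynthesis` when it exists:

```javascript
// Speak text at a user-adjustable rate. The clamping logic runs in any
// runtime; the actual speech call is guarded so this also works where
// the Web Speech API is unavailable.
function speak(text, rate = 1.0) {
  // Clamp to the range SpeechSynthesisUtterance accepts (0.1 to 10).
  const clamped = Math.min(10, Math.max(0.1, rate));
  if (typeof speechSynthesis !== 'undefined') {
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.rate = clamped;
    speechSynthesis.speak(utterance);
  }
  return { text, rate: clamped }; // returned for inspection/testing
}
```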
- HTML5 / CSS3 / Vanilla JavaScript
- Web Speech API (`SpeechRecognition`, `SpeechSynthesis`)
- Google Gemini API
- Browser `localStorage`
- Responsive UI with accessibility considerations
git clone https://github.com/your-username/vision-ai-assistant.git
cd vision-ai-assistant

Open the HTML file and replace `const GEMINI_API_KEY = 'YOUR_GEMINI_API_KEY_HERE';` with your actual API key from Google AI Studio.
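For reference, the key is typically used against Gemini's REST `generateContent` endpoint. The sketch below only builds the request; the model name and URL follow Google's public API at the time of writing, but treat the exact values as assumptions to verify against the current documentation:

```javascript
// Hypothetical request builder for Gemini's REST generateContent call.
// Verify model name and endpoint against Google's current API docs.
const GEMINI_API_KEY = 'YOUR_GEMINI_API_KEY_HERE';

function buildGeminiRequest(prompt, apiKey = GEMINI_API_KEY) {
  const model = 'gemini-1.5-flash'; // assumed model; substitute as needed
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${apiKey}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// In the app: const req = buildGeminiRequest(transcript);
// fetch(req.url, req.options).then((r) => r.json()) ...
```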
Simply open the HTML file in a modern browser:
- Chrome (recommended)
- Edge
- Brave
⚠️ Voice recognition requires a Chromium-based browser.
- This project runs entirely client-side
- No backend or authentication
- Emergency functionality is simulated
- Voice recognition is browser-dependent
- Data is stored locally per browser/device
- Not intended for medical or emergency-critical use
- No user data is sent anywhere except:
  - Voice prompts sent to Google Gemini (when enabled)
- All personal data is stored locally in the browser
Prototype / Proof of Concept
Planned future improvements:
- Backend integration
- Real emergency service hooks
- Camera-based visual assistance
- Offline mode
- User accounts and cloud sync
This project is released under the MIT License. You are free to use, modify, and distribute it with attribution.
- Google Gemini API
- Web Speech API
- Accessibility-first design principles