AI for Accessibility: Virtual Assistant for Hearing Impaired

We set out to create an impactful solution for anyone who can benefit from improved accessibility to everyday sound events. Our mobile application uses artificial intelligence to recognize key sound events of interest to the community, such as emergency vehicle sirens and door knocks, where immediate alerts and continuous logging are critical for the user. While there are many audio accessibility innovations in the app space, at the time of writing most have focused on sound amplification and text-to-speech/speech-to-text. This app is optimized for Android with low latency so that it works in real time for the user.

The app converts a sound wave (from the mic) into a mel-spectrogram image that serves as the main feature fed into a Convolutional Neural Network, which then classifies the sound into one of eight classes. Average inference time is about 15 ms, so the user never has to worry about missing a beat, and the app can also be synced with a wearable device.
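As a rough illustration of that flow, the sketch below turns a waveform into a log-scaled mel-spectrogram and runs it through a TensorFlow Lite classifier. This is a minimal sketch under stated assumptions, not this repo's actual code: the librosa parameters, model file name, dynamic input shape, and class labels are all illustrative.

```python
# Hypothetical sketch of the audio -> mel-spectrogram -> CNN flow.
# Assumes librosa for feature extraction and a TFLite model; names
# and parameters are illustrative, not taken from this repository.
import numpy as np
import librosa
import tensorflow as tf

# Eight illustrative event labels; the app's real class list may differ.
CLASSES = ["siren", "door_knock", "alarm", "doorbell",
           "dog_bark", "baby_cry", "car_horn", "speech"]

def mel_spectrogram(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Convert a mono waveform into a log-scaled mel-spectrogram 'image'."""
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)

def classify(audio: np.ndarray, interpreter: tf.lite.Interpreter) -> str:
    """Run one CNN inference pass and return the predicted class label."""
    spec = mel_spectrogram(audio).astype(np.float32)
    spec = spec[np.newaxis, ..., np.newaxis]  # add batch and channel dims
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Assumes a model exported with a dynamic input shape; a fixed-shape
    # model would need the spectrogram padded or cropped instead.
    interpreter.resize_tensor_input(inp["index"], spec.shape)
    interpreter.allocate_tensors()
    interpreter.set_tensor(inp["index"], spec)
    interpreter.invoke()
    probs = interpreter.get_tensor(out["index"])[0]
    return CLASSES[int(np.argmax(probs))]

# Usage (model path is hypothetical):
# interpreter = tf.lite.Interpreter(model_path="sound_classifier.tflite")
# audio, _ = librosa.load("clip.wav", sr=16000, mono=True)
# print(classify(audio, interpreter))
```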

UI Screenshots

[Screenshots: activated prediction; settings event selector]

Pipeline Overview

[Figure: pipeline diagram]

Performance Overview

  • 110 MB Peak Memory Usage
  • 5% Average, 10% Peak CPU Usage
  • 10-15% Battery Life Penalty

Algorithmic performance:

[Figure: algorithmic performance metrics]

Suggested Contributions

  1. Enable "wake word" detection based on user's name
  2. Cross-platform support
  3. Sensitivity (threshold tuning; see the sketch after this list)
  4. General accuracy improvements with minimal power usage penalty
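For contribution 3, one possible starting point is a per-class confidence threshold that gates alerts. This is a hypothetical illustration continuing the Python sketch above; the default value, dictionary shape, and helper name are assumptions, not part of this codebase.

```python
# Hypothetical per-class sensitivity gate (suggested contribution 3).
# The default threshold and per-class overrides are illustrative values.
import numpy as np

DEFAULT_THRESHOLD = 0.7

def should_alert(probs: np.ndarray, class_idx: int,
                 thresholds: dict | None = None) -> bool:
    """Fire an alert only when the class probability clears its threshold."""
    thresholds = thresholds or {}
    return float(probs[class_idx]) >= thresholds.get(class_idx, DEFAULT_THRESHOLD)

# Example: make sirens easier to trigger than the default, so users can
# trade false alarms for recall per event type (indices follow CLASSES above):
# should_alert(probs, CLASSES.index("siren"), {CLASSES.index("siren"): 0.5})
```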
