🤖 DAIN Voice Assistant Feature is an extension for DAIN’s AI platform that provides hands-free, voice-based control for navigating and interacting with a computer. The project aims to enhance accessibility and multitasking by letting users execute commands through voice, which is especially valuable for people with disabilities and for anyone who needs to keep their hands free while performing other tasks.
DAIN is a leading AI company dedicated to creating a Large Language Model (LLM) with a diverse range of features to cater to various user needs. DAIN’s platform enables both clients and developers to utilize and extend its functionalities:
- Platform Link: DAIN Platform
- Documentation: DAIN Software Documentation
Clients can interact with pre-built features, while software engineers can log in to develop new features, extending DAIN’s capabilities.
The initial concept for this project was to develop a voice-activated AI assistant that could handle various tasks on behalf of the user, accommodating different types of users, especially those with disabilities or those who need to keep their hands free for other tasks. Key features include:
- Voice-Based Navigation: Allow users to navigate the internet using voice commands.
- Voice Reply: DAIN responds with spoken feedback to enhance interactivity.
- Accessibility Focus: Built with accessibility in mind to assist users with disabilities, providing an alternative to traditional mouse and keyboard navigation.
For more details, check the original project proposal document: DAIN Voice Assistant Ideas.
Thanks to our sponsor, DAIN, we have successfully integrated a voice-interaction feature, enabling users to control various actions on their computers using DAIN’s platform.
We explored three different approaches to create a seamless, voice-activated experience:
- Chrome Extension or OS Application: Developed to interact with DAIN, allowing users to control their computers with voice commands via the Chrome browser.
- API Integration: An API was deployed to communicate user actions to DAIN, which, in turn, controlled the operating system.
- Direct System Control with AST NPL Library: This approach utilized the ast library to execute OS commands directly, such as mouse movements and scrolling.
Each method was tested to determine which approach provided the most efficient user experience.
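The API-integration approach above boils down to a small OS-side agent that polls the backend for a pending action and dispatches it to a handler. The sketch below illustrates that pattern only; the endpoint URL, JSON shape, and handler names are assumptions for illustration, not the actual implementation.

```python
import json
import time
import urllib.request

CHECK_URL = "http://127.0.0.1:5000/check_action"  # assumed endpoint

def fetch_pending_action(url=CHECK_URL):
    """Fetch the pending action as a dict, e.g. {"action": "scroll", "direction": "up"}."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)

def dispatch(pending, handlers):
    """Route a pending-action dict to the matching handler; return False on idle polls."""
    action = pending.get("action")
    if action is None:
        return False           # nothing requested since the last poll
    handlers[action](pending)  # e.g. handlers["scroll"] performs the scroll
    return True

def poll_forever(handlers, interval=0.5):
    """Poll the API in a loop, dispatching any action it reports."""
    while True:
        try:
            dispatch(fetch_pending_action(), handlers)
        except OSError:
            pass               # API not reachable yet; retry on the next tick
        time.sleep(interval)
```

With this split, the actual OS calls (mouse movement, scrolling) live entirely inside the `handlers` callbacks, so the same loop works for any of the three approaches.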
Our tech stack included a combination of web and backend technologies, as well as DAIN’s platform-specific tools:
- Chrome Extension Development: Built with JavaScript, HTML, and SCSS to enable a user-friendly, voice-activated web browsing experience.
- Backend (API Integration): Developed using pyshell in JavaScript to enable direct communication between DAIN’s feature and the operating system for executing commands.
- System Control with AST Library: Used Python and the AST NPL library to control OS-level actions directly, including mouse movements and scroll functions.
- DAIN Platform Integration: DAIN’s platform primarily runs on TypeScript. Our team had limited access to the core codebase and was able to add features but not alter DAIN’s main code.
- JavaScript: For frontend Chrome extension development.
- HTML & SCSS: To design and style the Chrome extension interface.
- pyshell: Used in JavaScript to interface with Python commands for controlling the OS.
- AST NPL Library: A Python library that enables system-level control for executing commands like scrolling and clicking.
- TypeScript: Used in DAIN’s software ecosystem, which we worked with to extend, though with limited modification permissions.
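Before any OS command can run, a recognized phrase has to be mapped to a structured action. The parser below is a minimal sketch of that step; the phrase grammar and the action-dict shape are illustrative assumptions, since the real feature received structured intents from DAIN’s LLM rather than raw text.

```python
def parse_command(text):
    """Turn a spoken phrase into an action dict, or None if unrecognized.

    Illustrative grammar: "scroll up" / "scroll down" / "click <x> <y>".
    """
    words = text.lower().split()
    if not words:
        return None
    if words[0] == "scroll" and len(words) > 1 and words[1] in ("up", "down"):
        return {"action": "scroll", "direction": words[1]}
    if words[0] == "click" and len(words) == 3:
        try:
            x, y = int(words[1]), int(words[2])
        except ValueError:
            return None            # coordinates were not numbers
        return {"action": "click", "x": x, "y": y}
    return None                    # phrase outside the supported grammar
```

For example, `parse_command("click 128 100")` yields `{"action": "click", "x": 128, "y": 100}`, which maps directly onto the click endpoint described below.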
We developed an API that received requests from DAIN’s feature and translated them into actionable commands for the OS. Below are some of the initial endpoints used:
- Scroll Up: `http://localhost/action/scroll/up`
- Scroll Down: `http://localhost/action/scroll/down`
- Click at Specific Coordinates: `http://localhost/action/click/{x-coordinate}/{y-coordinate}` (for example, `http://localhost/action/click/128/100`)
- Check Action: `http://127.0.0.1:5000/check_action`, which returns `{ "action": null, "direction": null }` when no action is pending, or a pending action such as `{ "action": "scroll", "direction": "up" }` or `{ "action": "scroll", "direction": "down" }`

These endpoints and responses were essential for testing and ensuring effective communication between DAIN’s AI and the user’s operating system.
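The source doesn’t say which framework served these endpoints (the port 5000 default suggests Flask or similar). As a dependency-free sketch of the same contract, the handler below implements the scroll, click, and check-action routes using only Python’s standard library; the `{"status": "queued"}` acknowledgement body is an assumption.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Last action requested by the voice feature; /check_action reports and clears it.
_pending = {"action": None, "direction": None}

class ActionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global _pending
        scroll = re.fullmatch(r"/action/scroll/(up|down)", self.path)
        click = re.fullmatch(r"/action/click/(\d+)/(\d+)", self.path)
        if scroll:
            _pending = {"action": "scroll", "direction": scroll.group(1)}
            body = {"status": "queued"}
        elif click:
            _pending = {"action": "click",
                        "x": int(click.group(1)), "y": int(click.group(2))}
            body = {"status": "queued"}
        elif self.path == "/check_action":
            # Report the pending action, then reset to the idle response.
            body, _pending = _pending, {"action": None, "direction": None}
        else:
            self.send_response(404)
            self.end_headers()
            return
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep request logging quiet
        pass

# To serve: HTTPServer(("127.0.0.1", 5000), ActionHandler).serve_forever()
```

Note that `/check_action` clears the pending action after reporting it, so a polling client sees each command exactly once, matching the idle `{ "action": null, "direction": null }` response above.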
For inspiration and insights into enabling browser interaction through voice, we referred to this repository:
This project served as a guide for creating our initial API requests and understanding browser control possibilities.
Check out the project details on Devpost:
This link provides an overview of the project, including features, development insights, and team contributions.