
DAIN Voice Assistant Feature

🤖 DAIN Voice Assistant Feature is an extension for DAIN’s AI platform that provides hands-free, voice-based control for navigating and interacting with a computer. The project aims to improve accessibility and multitasking: commands are executed by voice, which is especially useful for people with disabilities and for anyone who needs hands-free control while working on other tasks.


1st Place Winners of the DAIN Hackathon at USC 🎉

(Photo: hackathon winners)

📖 About DAIN's Software

DAIN is a leading AI company dedicated to creating a Large Language Model (LLM) with a diverse range of features to cater to various user needs. DAIN’s platform enables both clients and developers to utilize and extend its functionalities:

Clients can interact with pre-built features, while software engineers can log in to develop new features, extending DAIN’s capabilities.


💡 Project Concept

The initial concept for this project was a voice-activated AI assistant that could handle a variety of tasks on behalf of the user, particularly people with disabilities and users whose hands are occupied with other work. Key features include:

  1. Voice-Based Navigation: Allow users to navigate the internet using voice commands.
  2. Voice Reply: DAIN responds with spoken feedback to enhance interactivity.
  3. Accessibility Focus: Built with accessibility in mind to assist users with disabilities, providing an alternative to traditional mouse and keyboard navigation.

For more details, check the original project proposal document: DAIN Voice Assistant Ideas.

Thanks to our sponsor, DAIN, we have successfully integrated a voice-interaction feature, enabling users to control various actions on their computers using DAIN’s platform.


🛠️ Development Process

We explored three different approaches to create a seamless, voice-activated experience:

  1. Chrome Extension or OS Application: Developed to interact with DAIN, allowing users to control their computers with voice commands via the Chrome browser.
  2. API Integration: An API was deployed to communicate user actions to DAIN, which, in turn, controlled the operating system.
  3. Direct System Control with the AST NPL Library: This approach used the AST NPL library to execute OS commands directly, such as mouse movements and scrolling.

Each method was tested to determine which approach provided the most efficient user experience; a sketch of the direct-control approach appears below.
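As a concrete illustration of the third approach, here is a minimal sketch of direct OS control in Python. We could not verify the “AST NPL” library named above, so pyautogui stands in for it here; the function names and scroll amount are our own assumptions, not the project’s actual code:

```python
# Minimal sketch of the direct-control approach (approach 3).
# pyautogui is a stand-in assumption for the AST NPL library named above.
import pyautogui

def scroll(direction: str, amount: int = 300) -> None:
    # Positive values scroll up, negative values scroll down.
    pyautogui.scroll(amount if direction == "up" else -amount)

def click_at(x: int, y: int) -> None:
    # Move the cursor to (x, y) and perform a left click.
    pyautogui.click(x=x, y=y)

if __name__ == "__main__":
    scroll("down")       # e.g. the voice command "scroll down"
    click_at(128, 100)   # the example coordinates used in the API section
```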


💻 Technology Stack

Our tech stack included a combination of web and backend technologies, as well as DAIN’s platform-specific tools:

  • Chrome Extension Development: Built with JavaScript, HTML, and SCSS to enable a user-friendly, voice-activated web browsing experience.
  • Backend (API Integration): Developed in JavaScript, using pyshell to bridge between DAIN’s feature and the operating system for executing commands.
  • System Control with AST Library: Used Python and the AST NPL library to control OS-level actions directly, including mouse movements and scroll functions.
  • DAIN Platform Integration: DAIN’s platform primarily runs on TypeScript. Our team had limited access to the core codebase and was able to add features but not alter DAIN’s main code.

Key Technologies

  • JavaScript: For frontend Chrome extension development.
  • HTML & SCSS: To design and style the Chrome extension interface.
  • pyshell: Used from JavaScript to invoke Python commands for controlling the OS (see the script sketch after this list).
  • AST NPL Library: A Python library that enables system-level control for executing commands like scrolling and clicking.
  • TypeScript: Used in DAIN’s software ecosystem, which we worked with to extend, though with limited modification permissions.
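To illustrate the pyshell bridge, the sketch below shows the kind of Python script the JavaScript side might invoke, with the action passed as command-line arguments. The file name, argument format, and the pyautogui stand-in are all assumptions:

```python
# control.py -- hypothetical script that the JavaScript side could invoke
# via pyshell; the file name and argument format are assumptions.
# Assumed usage:  python control.py scroll up
#                 python control.py click 128 100
import sys
import pyautogui  # same stand-in library as in the sketch above

def main() -> None:
    action = sys.argv[1]
    if action == "scroll":
        pyautogui.scroll(300 if sys.argv[2] == "up" else -300)
    elif action == "click":
        pyautogui.click(x=int(sys.argv[2]), y=int(sys.argv[3]))
    else:
        sys.exit(f"unknown action: {action}")

if __name__ == "__main__":
    main()
```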

🌐 API Endpoints

We developed an API that received requests from DAIN’s feature and translated them into actionable commands for the OS. Below are some of the initial endpoints used:

Example API URLs

  • Scroll Up: http://localhost/action/scroll/up
  • Scroll Down: http://localhost/action/scroll/down
  • Click at Specific Coordinates:
    • Template: http://localhost/action/click/{x-coordinate}/{y-coordinate}
    • Example: http://localhost/action/click/128/100
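The server behind these URLs is not included in the README. The following sketch assumes a Flask app (matching the 127.0.0.1:5000 address used in the next subsection) that keeps the last requested command in memory so that /check_action can report it. The route shapes come from the URLs above; the handler names and shared-state design are our assumptions:

```python
# Sketch of the action API, assuming a Flask server on 127.0.0.1:5000.
# Route shapes are taken from the example URLs above; the in-memory
# "pending" command and handler names are our assumptions.
from flask import Flask, jsonify

app = Flask(__name__)
pending = {"action": None, "direction": None}  # last command awaiting pickup

@app.route("/action/scroll/<direction>")
def scroll(direction):
    pending.update(action="scroll", direction=direction)
    return jsonify(pending)

@app.route("/action/click/<int:x>/<int:y>")
def click(x, y):
    pending.update(action="click", direction=None)
    return jsonify({"action": "click", "x": x, "y": y})

@app.route("/check_action")
def check_action():
    # Report the most recent command so a local agent can execute it.
    return jsonify(pending)

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

Keeping the pending command in memory makes the API a simple mailbox: DAIN writes a command, and a local agent polls /check_action to pick it up.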

API Responses

  • Check Action Endpoint: http://127.0.0.1:5000/check_action

```json
{
  "action": null,
  "direction": null
}
```

or

```json
{
  "action": "scroll",
  "direction": "up"
}
```

or

```json
{
  "action": "scroll",
  "direction": "down"
}
```

These endpoints and responses were essential for testing and for ensuring effective communication between DAIN’s AI and the user’s operating system.
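For completeness, a local agent could poll /check_action and act on whatever command is pending. The loop below is a minimal sketch; the polling interval and the print placeholder (standing in for a real OS call) are assumptions:

```python
# Sketch of a local agent that polls /check_action and executes whatever
# command is pending; the polling interval and the print placeholder
# (standing in for a real OS call) are assumptions.
import time
import requests

CHECK_URL = "http://127.0.0.1:5000/check_action"

while True:
    data = requests.get(CHECK_URL, timeout=5).json()
    if data.get("action") == "scroll":
        print(f"scroll {data['direction']}")  # swap in the OS scroll call here
    time.sleep(0.5)  # poll twice per second
```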

🔗 Resources and References

For inspiration and insights into enabling browser interaction through voice, we referred to this repository:

This project served as a guide for creating our initial API requests and understanding browser control possibilities.

🔗 Project on Devpost

Check out the project details on Devpost:

This link provides an overview of the project, including features, development insights, and team contributions.

📸 Demo Screenshots

(Screenshot: demo 1)

(Screenshot: demo 2)

👥 Team Photo

(Photo: team)
