artiso-solutions/CoVoX

CoVoX

net5.0 CI License MIT

Cloud-enabled library providing a customizable voice interface for your application or device

CoVoXSimpleGraph

Covox lets users interact with an application or device through voice.
You provide a list of commands, i.e. operations that can be invoked via the voice interface. Covox then listens to the audio and, when the spoken words match a command, that command is invoked. It also has multi-language support!

CoVoXMultiLanguageGraph

With some imagination you could speak to a calculator, a virtual assistant, or a CRM application!

How it works

CoVoXHowItWorks

  1. Define commands and provide them to a CovoxEngine instance.
  2. Start the audio capture by calling the method covox.StartAsync.
  3. Covox translates and recognizes the input, then raises the Recognized event.
  4. Execute the logic connected to the detected command.
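
To make step 3 more concrete, here is a minimal sketch of how a recognized transcript could be compared against a command's voice triggers. It is an illustration only: `VoiceCommand`, `TriggerMatcher`, `Normalize`, and `FindMatch` are hypothetical names that are not part of Covox, the translation step is skipped, and the README does not document how Covox actually matches speech to commands (the real engine may well be fuzzier than this exact comparison).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a command: an ID plus the phrases that trigger it.
public sealed record VoiceCommand(string Id, IReadOnlyList<string> VoiceTriggers);

// Hypothetical helper; NOT the matching logic used by Covox itself.
public static class TriggerMatcher
{
    // Lowercases the text, replaces punctuation with spaces, and collapses
    // repeated whitespace, so "Turn on  the LIGHT!" becomes "turn on the light".
    public static string Normalize(string text)
    {
        var kept = text
            .ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray();

        var words = new string(kept).Split(' ', StringSplitOptions.RemoveEmptyEntries);
        return string.Join(' ', words);
    }

    // Returns the first command that has a trigger equal to the normalized
    // transcript, or null when no command matches.
    public static VoiceCommand FindMatch(string transcript, IEnumerable<VoiceCommand> commands)
    {
        var spoken = Normalize(transcript);
        return commands.FirstOrDefault(
            cmd => cmd.VoiceTriggers.Any(trigger => Normalize(trigger) == spoken));
    }
}
```

With commands like the light-switch pair used later in this README, `TriggerMatcher.FindMatch("Light on.", commands)` would return the command whose triggers include "light on", and an unrelated sentence would return null.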

Getting started

Covox is offered as a .NET library and relies on Azure Cognitive Services, so to use it you will need:

  • an Azure Cognitive Services subscription key (follow this guideline)
  • a .NET project or application
  • a device connected to the internet
  • a device with a working microphone

In order to get started, take a look at the samples.

How to use

Consider a simple use case: a voice-controlled light-switching application.

  • Define the available commands, each with a unique ID and one or more voice triggers (in English):
var turnOnLightCmd = new Command
{
    Id = "TurnOnLight",
    VoiceTriggers = new[] { "turn on the light", "light on", "on" }
};

var turnOffLightCmd = new Command
{
    Id = "TurnOffLight",
    VoiceTriggers = new[] { "turn off the light", "light off", "off" }
};
  • Create an instance of CovoxEngine:
var covox = new CovoxEngine(new Configuration
{
    AzureConfiguration = AzureConfiguration.FromSubscription(
        subscriptionKey: YOUR_SUBSCRIPTION_KEY,
        region: YOUR_REGION),

    // Define all the languages that can be recognized
    InputLanguages = new[] { "en-US", "de-DE", "it-IT", "es-ES" },
});

covox.RegisterCommands(turnOnLightCmd, turnOffLightCmd);
  • Define a delegate for when a command is recognized:
covox.Recognized += (cmd, ctx) =>
{
    if (cmd == turnOnLightCmd) { /* ... */ }
    else if (cmd == turnOffLightCmd) { /* ... */ }
};

await covox.StartAsync();
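
As the number of commands grows, the if/else chain in the handler can become hard to maintain. One possible alternative is a small dispatch table keyed by command ID. The sketch below is self-contained and does not depend on Covox; `CommandDispatcher`, `On`, and `Dispatch` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps command IDs to the actions they should run.
public sealed class CommandDispatcher
{
    private readonly Dictionary<string, Action> _handlers =
        new Dictionary<string, Action>(StringComparer.Ordinal);

    // Registers (or replaces) the action for a command ID.
    public void On(string commandId, Action handler)
    {
        _handlers[commandId] = handler;
    }

    // Runs the action registered for the given ID.
    // Returns false when no action is registered for it.
    public bool Dispatch(string commandId)
    {
        if (!_handlers.TryGetValue(commandId, out var handler))
        {
            return false;
        }

        handler();
        return true;
    }
}
```

Assuming the `cmd` argument passed to the Recognized handler exposes the same `Id` that was set on the `Command`, the handler body could then shrink to a single call such as `dispatcher.Dispatch(cmd.Id)`, with the actions registered up front via `dispatcher.On("TurnOnLight", ...)`.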

Use case scenarios

Basic

LightSwitch

(source) Basic showcase of the engine and commands invocation.

Commands

  • turn on the lights
    output: "Light on"
  • turn off the lights
    output: "Light off"

Web application

Pac-Scream

Pac-Scream is a variant of the popular game Pac-Man in which movement is controlled by voice commands instead of key presses.

Commands

  • left / move left
  • right / move right
  • up / move up
  • down / move down
  • stop / cancel / no
    to cancel the previous command

Technologies

  • CoVoX engine
  • ASP.NET Core 5
  • SignalR
  • WebGL

Mobile application

Find-it

Find-it is a mobile app that recognizes objects in an image or a video based on a spoken request. Given an image or a video, if the user asks to see a particular object, the application draws a box around the object that matches the description.

Technologies


AI/Machine Learning

Guess-Who

Guess Who is a game for two players. Each player has a "playing field" showing different people, plus one fixed person that the opponent must guess by asking questions that rule out candidates. Using voice commands, a player should be able to ask a question such as "Does the woman have red hair?", and image recognition should then return the answer (yes or no).

Procedure

  1. Ask a question via voice command
  2. Recognize and process the question
  3. Inspect the image and detect the answer
  4. Return the answer (Yes / No)

Technologies

  • CoVoX engine
  • Python / TensorFlow
  • Face

Security

Voice-Unlock

Voice-Unlock showcases the voice recognition service from Azure. The application displays a closed lock. If the authorized user says "Unlock", the lock opens. If an unauthorized user says "Unlock", the background flashes red for a few seconds.

Technologies


External device

Robobutler

Robobutler is a robot capable of executing voice-triggered actions based on its perception of the current environment. The idea is that an operator can tell the robot to "Bring me the yellow box", and the robot will then do the following:

  1. Confirm/Repeat the task the robot was told to do
  2. Go to the yellow box
  3. Pick it up
  4. Bring it to the operator

Other possible scenarios

  • Placing a box on top of another
  • Basic movements (stop, rotate, etc.)
  • Spatial awareness (e.g. go to the nearest corner)

Benefit to the real world

In the real world you could have a warehouse with many heavy packages. In a human-robot collaboration environment, the human could control the robot either with a controller or by voice. Adding intelligence to the robot simplifies the interaction, which increases the overall productivity and performance of both the human and the facility. It also lets the human multitask.

Robot to use

https://www.dji.com/de/robomaster-s1

The desired configuration would be an industrial arm mounted on a wheeled body, to represent a realistic industrial scenario.

Technologies


Technologies

The library is developed in .NET 5 and uses Azure Cognitive Services.