idiolect
A general-purpose voice user interface for the IntelliJ Platform, inspired by Tavis Rudd. Possible use cases: visually impaired and RSI users. Originally developed as part of a JetBrains hackathon, it is now a community-supported project. For background information, check out this presentation.
Usage
To get started, press the button in the toolbar, then speak a command, e.g. "Hi, IDEA!" Idiolect supports a simple grammar. For a complete list of commands, please refer to the wiki. Click the button once more to deactivate.
Building
For Linux or macOS users:
git clone https://github.com/OpenASR/idiolect && cd idiolect && ./gradlew runIde
For Windows users:
git clone https://github.com/OpenASR/idiolect & cd idiolect & gradlew.bat runIde
Recognition works with most popular microphones (preferably 16kHz, 16-bit). For best results, minimize background noise.
Contributing
Contributors who have IntelliJ IDEA installed can simply open the project. Otherwise, run the following command from the project's root directory:
./gradlew runIde -PluginDev
Architecture
Idiolect is implemented using the IntelliJ Platform SDK. For more information about the plugin architecture, please refer to the wiki page.
Integration with Idiolect
plugin.xml defines a number of <extensionPoint>s which allow other plugins to integrate with, extend, or customise the capabilities of Idiolect.
AsrProvider
Listens for audio input, recognises speech to text and returns an NlpRequest with possible utterances.
Does not resolve the intent.
Possible alternative implementations could:
- integrate with Windows SAPI 5 Speech API
- integrate with Dragon/Nuance API
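As a rough illustration, an implementation might look something like the sketch below. The types here are simplified stand-ins; the real AsrProvider and NlpRequest signatures are defined in the Idiolect sources and may differ.

```kotlin
// Minimal sketch: simplified stand-ins for Idiolect's own types.
data class NlpRequest(val alternatives: List<String>)

interface AsrProvider {
    fun startRecognition()
    fun stopRecognition()
    /** Blocks until the next utterance has been recognised. */
    fun waitForUtterance(): NlpRequest
}

// Hypothetical stub provider that always "hears" the same phrase,
// useful for testing the downstream NLP pipeline without a microphone.
class CannedAsrProvider(private val phrase: String) : AsrProvider {
    override fun startRecognition() { /* nothing to start */ }
    override fun stopRecognition() { /* nothing to stop */ }
    override fun waitForUtterance() = NlpRequest(listOf(phrase))
}
```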
NlpProvider
Processes an NlpRequest.
The default implementation invokes IdeService.invokeAction(ExecuteVoiceCommandAction, nlpRequest), and the action is handled by ExecuteVoiceCommandAction and ActionRecognizerManager.handleNlpRequest().
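As a sketch, an alternative NlpProvider could do something much simpler than the default, e.g. just echo what was heard. Again, the types below are simplified stand-ins for Idiolect's own.

```kotlin
// Illustrative only: NlpRequest and NlpProvider are simplified stand-ins.
data class NlpRequest(val alternatives: List<String>)

fun interface NlpProvider {
    fun processNlpRequest(request: NlpRequest)
}

// Hypothetical provider that logs what was heard rather than invoking an action.
val echoNlpProvider = NlpProvider { request ->
    println("Utterance candidates: ${request.alternatives}")
}
```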
AsrSystem
Processes audio input, recognises speech to text and executes actions.
The default implementation, AsrControlLoop, uses the AsrProvider and NlpProvider.
Some APIs, such as AWS Lex, implement the functionality of AsrProvider and NlpProvider in a single call.
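A rough sketch of how such a control loop could glue the two together (simplified interfaces, not the actual AsrControlLoop code):

```kotlin
// Simplified stand-ins for Idiolect's interfaces; real signatures may differ.
data class NlpRequest(val alternatives: List<String>)

interface AsrProvider {
    fun waitForUtterance(): NlpRequest   // blocks until speech is recognised
}

interface NlpProvider {
    fun processNlpRequest(request: NlpRequest)
}

class SimpleAsrSystem(
    private val asr: AsrProvider,
    private val nlp: NlpProvider,
) {
    @Volatile private var running = false

    // Repeatedly pull recognised utterances from the AsrProvider and hand
    // them to the NlpProvider, mirroring what a control loop would do.
    fun run() {
        running = true
        while (running) {
            nlp.processNlpRequest(asr.waitForUtterance())
        }
    }

    fun stop() { running = false }
}
```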
IntentResolver
Processes an NlpRequest (utterance/alternatives) and resolves an NlpResponse with intentName and slots.
ActionRecognizerManager.handleNlpRequest() iterates through the IntentResolvers until it finds a match.
The Idiolect implementations use either exact matching or regular expressions on the recognised text. Alternative implementations may use AI to resolve the intent.
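For illustration, a regex-based resolver might look roughly like this. NlpRequest and NlpResponse are simplified stand-ins, and the "GoToLine" intent is hypothetical.

```kotlin
data class NlpRequest(val alternatives: List<String>)
data class NlpResponse(val intentName: String, val slots: Map<String, String> = emptyMap())

interface IntentResolver {
    fun tryResolveIntent(request: NlpRequest): NlpResponse?   // null if no match
}

// Resolves "go to line 42" style utterances into a hypothetical "GoToLine" intent.
class GoToLineResolver : IntentResolver {
    private val pattern = Regex("go to line (\\d+)")

    override fun tryResolveIntent(request: NlpRequest): NlpResponse? =
        request.alternatives.firstNotNullOfOrNull { utterance ->
            pattern.matchEntire(utterance.lowercase())?.let { match ->
                NlpResponse("GoToLine", mapOf("line" to match.groupValues[1]))
            }
        }
}
```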
CustomPhraseRecognizer
Many of the auto-generated trigger phrases are not suitable for voice activation. You can add your own, easier-to-say-and-remember phrases in ~/.idea/phrases.properties.
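Assuming the file uses a simple phrase = action/intent mapping (an assumption; check the project sources for the exact format), an entry might look like:

```properties
# Assumed format: spoken phrase = intent/action to trigger
open settings=ShowSettings
```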
IntentHandler
Fulfills an NlpResponse (intent + slots), performing desired actions.
ActionRecognizerManager.handleNlpRequest() iterates through the IntentHandlers until the intent is actioned.
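A sketch of a handler for the hypothetical "GoToLine" intent from the resolver example above, again with simplified stand-in types:

```kotlin
data class NlpResponse(val intentName: String, val slots: Map<String, String> = emptyMap())

interface IntentHandler {
    /** Returns true if this handler fulfilled the intent. */
    fun fulfil(response: NlpResponse): Boolean
}

// Hypothetical handler for the "GoToLine" intent.
class GoToLineHandler(private val navigate: (Int) -> Unit) : IntentHandler {
    override fun fulfil(response: NlpResponse): Boolean {
        if (response.intentName != "GoToLine") return false
        val line = response.slots["line"]?.toIntOrNull() ?: return false
        navigate(line)   // e.g. move the caret in the active editor
        return true
    }
}
```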
TemplateIntentHandler
Handles two flavours of intent prefix:
- Template.id.${template.id}, e.g. Template.id.maven-dependency
- Template.${template.groupName}.${template.key}, e.g. Template.Maven.dep
template.id is often null. template.key is the "Abbreviation" that you would normally type before pressing TAB.
The default trigger phrases are generated from the template description or key and are often not suitable for voice activation.
You can add your own trigger phrase -> live template mapping in ~/.idea/phrases.properties and it will be resolved by CustomPhraseRecognizer.
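Assuming the same simple properties format as above, a hypothetical entry mapping a phrase to the Template.Maven.dep intent could be:

```properties
# Hypothetical entry: spoken phrase = live template intent
add maven dependency=Template.Maven.dep
```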
ttsProvider
Reads prompts/feedback aloud to the user.
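A minimal sketch, assuming the extension point boils down to something like a single say() method (the real interface may differ):

```kotlin
// Illustrative only: the real ttsProvider extension point may expose a different API.
interface TtsProvider {
    fun say(text: String)
}

// Hypothetical provider that prints instead of synthesising speech,
// useful when no TTS engine is available.
class ConsoleTtsProvider : TtsProvider {
    override fun say(text: String) = println("TTS: $text")
}
```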
org.openasr.idear.nlp.NlpResultListener
Any listeners registered to the topic in plugin.xml under <applicationListeners> will be notified when:
- the listening state changes
- recognition is returned by the AsrProvider
- a request is fulfilled by an IntentHandler
- there is a failure
- a prompt/message is provided for the user
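For illustration, a listener could look roughly like the sketch below; the method names here are assumptions loosely matching the events above, not the actual NlpResultListener API.

```kotlin
// Illustrative listener: method names are assumptions; check
// org.openasr.idear.nlp.NlpResultListener for the real interface.
interface VoiceEventListener {
    fun onListeningStateChanged(listening: Boolean)
    fun onRecognition(alternatives: List<String>)
    fun onFulfilled(intentName: String)
    fun onFailure(message: String)
    fun onMessage(prompt: String)
}

// Simple implementation that logs every event to stdout.
class LoggingVoiceEventListener : VoiceEventListener {
    override fun onListeningStateChanged(listening: Boolean) = println("Listening: $listening")
    override fun onRecognition(alternatives: List<String>) = println("Heard: $alternatives")
    override fun onFulfilled(intentName: String) = println("Fulfilled: $intentName")
    override fun onFailure(message: String) = println("Failure: $message")
    override fun onMessage(prompt: String) = println("Prompt: $prompt")
}
```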
Plugin Actions
plugin.xml defines <action>s:
VoiceRecordControllerAction
This action is invoked when the user clicks on the button in the toolbar.
This simply tells the AsrService to activate or stand by.
When the AsrService is active, the AsrSystem, by default AsrControlLoop (see below), processes audio input and executes the recognised commands.
ExecuteActionFromPredefinedText
A debugging aid which uses one of the ActionRecognizer extension classes configured in plugin.xml to generate an ActionCallInfo, which is then runInEditor().
ExecuteVoiceCommandAction
Similar to ExecuteActionFromPredefinedText, but uses the Idiolect.VoiceCommand.Text data attached to the invoking AnActionEvent.
IDEA Actions
There are many Actions (classes which extend AnAction) provided by IDEA:
AsrControlLoop
When AsrControlLoop detects an utterance, it invokes PatternBasedNlpProvider.processUtterance(), which typically calls invokeAction() and/or one or more of the methods of IdeService.
Programming By Voice
- Interactive IDE Voice Control
- Using Python to Code by Voice
- How a Blind Developer uses Visual Studio