Input Handling


The input component is based on the Mixed Reality Toolkit and compiles input data for further processing. The InputHandler performs basic input validation and resolves conflicting inputs, enabling users to combine gesture and speech input seamlessly. For example, speech commands can assist users while they align objects with gestures.

Each component that is to receive input events must implement the appropriate interface: objects of type IGestureInputReceiver receive gesture input events, and objects of type ISpeechInputReceiver receive speech input events.
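
As a rough illustration, a component that receives both input types might look like the sketch below. The callback names and parameter types (OnGestureInput, OnSpeechInput, GestureType) are assumptions for illustration only; the actual members are defined by the IGestureInputReceiver and ISpeechInputReceiver interfaces in the project source.

```csharp
using UnityEngine;

// Hypothetical receiver component. The callback names and parameter types
// are placeholders; implement the members actually declared by
// IGestureInputReceiver and ISpeechInputReceiver in the project.
public class ExampleInputReceiver : MonoBehaviour, IGestureInputReceiver, ISpeechInputReceiver
{
    // Invoked when the InputHandler dispatches a gesture input event.
    public void OnGestureInput(GestureType gesture)
    {
        Debug.Log($"Gesture received: {gesture}");
    }

    // Invoked when the InputHandler dispatches a speech input event.
    public void OnSpeechInput(KeywordType keyword)
    {
        Debug.Log($"Keyword received: {keyword}");
    }
}
```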

Gesture Input

The GestureInputListener is responsible for managing active gesture sources, evaluating gesture inputs and delegating corresponding gesture input events to the InputHandler. It collects press and release events (see image) from Mixed Reality Toolkit's InputManager and maps them to specific gesture types according to the following table.

| Gesture | GS1 Presses | GS1 Releases | GS2 Presses | GS2 Releases |
| --- | --- | --- | --- | --- |
| One Hand Tap | 1 | 1 | - | - |
| Two Hand Tap | 1 | 1 | 1 | 1 |
| One Hand Double Tap | 2 | 2 | - | - |
| Two Hand Double Tap | 2 | 2 | 2 | 2 |
| One Hand Manipulation Start | 1 | - | - | - |
| Two Hand Manipulation Start | 1 | - | 1 | - |
| One Hand Manipulation End | - | 1 | - | - |
| Two Hand Manipulation End | - | 1 | - | 1 |
GS1 = Gesture Source One

GS2 = Gesture Source Two
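
To make the mapping concrete, the table can be read as a simple lookup on press and release counts per gesture source. The sketch below is only an illustration under that reading; the actual GestureInputListener additionally handles timing windows for double taps and tracks which input sources are active, and the GestureType names are assumptions.

```csharp
// Illustrative mapping of press/release counts to gesture types, following
// the table above ("-" treated as 0). GestureType and this lookup are
// simplified assumptions; the real GestureInputListener also considers
// timing and source identity.
public enum GestureType
{
    None,
    OneHandTap, TwoHandTap,
    OneHandDoubleTap, TwoHandDoubleTap,
    OneHandManipulationStart, TwoHandManipulationStart,
    OneHandManipulationEnd, TwoHandManipulationEnd
}

public static class GestureTable
{
    public static GestureType Map(int gs1Press, int gs1Release, int gs2Press, int gs2Release)
    {
        if (gs1Press == 2 && gs1Release == 2 && gs2Press == 2 && gs2Release == 2)
            return GestureType.TwoHandDoubleTap;
        if (gs1Press == 2 && gs1Release == 2 && gs2Press == 0 && gs2Release == 0)
            return GestureType.OneHandDoubleTap;
        if (gs1Press == 1 && gs1Release == 1 && gs2Press == 1 && gs2Release == 1)
            return GestureType.TwoHandTap;
        if (gs1Press == 1 && gs1Release == 1 && gs2Press == 0 && gs2Release == 0)
            return GestureType.OneHandTap;
        if (gs1Press == 1 && gs1Release == 0 && gs2Press == 1 && gs2Release == 0)
            return GestureType.TwoHandManipulationStart;
        if (gs1Press == 1 && gs1Release == 0 && gs2Press == 0 && gs2Release == 0)
            return GestureType.OneHandManipulationStart;
        if (gs1Press == 0 && gs1Release == 1 && gs2Press == 0 && gs2Release == 1)
            return GestureType.TwoHandManipulationEnd;
        if (gs1Press == 0 && gs1Release == 1 && gs2Press == 0 && gs2Release == 0)
            return GestureType.OneHandManipulationEnd;
        return GestureType.None;
    }
}
```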


You can choose to process gesture input data in a custom way by checking the ‘External Responder’ toggle, and you can switch between processing modes at runtime. Local input processing is then bypassed and redirected to a custom component that must be provided to the GestureInputListener. Custom responders must extend the abstract class BaseExternalResponder and are responsible for mapping the raw input data to a specific gesture type.
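
A custom responder could be structured roughly like the following sketch. The abstract members of BaseExternalResponder are not documented on this page, so the overridden method and its signature are assumptions; only the responsibility (mapping raw press/release data to a gesture type) is taken from the description above.

```csharp
// Hypothetical custom responder. The abstract member of BaseExternalResponder
// is assumed; consult the class in the project source for the real signature.
// GestureType refers to the assumed enum from the earlier sketch.
public class CustomGestureResponder : BaseExternalResponder
{
    public override GestureType ResolveGesture(int gs1Press, int gs1Release,
                                               int gs2Press, int gs2Release)
    {
        // Example policy: only recognize one-hand taps, ignore everything else.
        if (gs1Press == 1 && gs1Release == 1 && gs2Press == 0 && gs2Release == 0)
            return GestureType.OneHandTap;
        return GestureType.None;
    }
}
```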


Hand gestures corresponding to released (left) and pressed (right) states.

Speech Input

The SpeechInputListener relies on Unity’s DictationRecognizer to convert an audio clip into a text string. An active internet connection is necessary for dictation services. Speech input processing can also be customized by checking the ‘External Responder’ toggle. As with the GestureInputListener, a custom component that extends BaseExternalResponder then has to be provided to the SpeechInputListener. The custom responder component is responsible for evaluating the speech input and determining the appropriate keyword from the set of available keywords. If no custom responder is provided, the SpeechInputListener falls back to default input processing by matching input strings against all predefined keywords.
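
For reference, the default fallback behavior (matching the recognized text against the predefined keywords) can be approximated with Unity’s DictationRecognizer API as sketched below. The KeywordType values and the naive containment matching are assumptions, not the SpeechInputListener’s exact logic.

```csharp
using System;
using UnityEngine;
using UnityEngine.Windows.Speech;

// Sketch of dictation-based keyword matching. KeywordType comes from the
// project; the containment check is a simplified stand-in for the matching
// actually performed by SpeechInputListener.
public class DictationKeywordMatcher : MonoBehaviour
{
    private DictationRecognizer recognizer;

    private void Start()
    {
        recognizer = new DictationRecognizer();

        // Fired when a complete utterance has been converted to text.
        recognizer.DictationResult += (text, confidence) =>
        {
            foreach (KeywordType keyword in Enum.GetValues(typeof(KeywordType)))
            {
                if (text.IndexOf(keyword.ToString(), StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    Debug.Log($"Matched keyword: {keyword}");
                }
            }
        };

        recognizer.DictationError += (error, hresult) => Debug.LogError(error);
        recognizer.Start();
    }

    private void OnDestroy()
    {
        recognizer?.Dispose();
    }
}
```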

A client for a Natural Language Understanding (NLU) service, RiQue, is included in the project and set as the external responder by default. If you wish to use the NLU service, refer to its documentation for setting it up. To connect to the service on the HoloLens, set the IP address of the NLU service in the initial configuration panel shown on startup of the IslandViz application, or edit the settings configuration file prior to deployment. If the NLU client has been deactivated or the NLU service is not reachable, the SpeechInputListener will fall back to default input processing.

Only input strings preceded by the predefined activation keyword are processed. The activation keyword defaults to ‘Assistant’. For example, saying …

Assistant, please find the bundle containing the most compilation units.

… will trigger the SpeechInputListener to process the input string ‘please find the bundle containing the most compilation units’. To add new keywords, simply add a keyword to the KeywordType enum. Refer to the State Management section for mapping newly added keywords to state transitions or interactions.
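
As a small illustration of this behavior, stripping the activation keyword before keyword matching could look like the following; the helper name is hypothetical.

```csharp
// Hypothetical helper that returns the command following the activation
// keyword, or null if the utterance does not start with it.
public static class ActivationKeywordFilter
{
    private const string ActivationKeyword = "Assistant";

    public static string ExtractCommand(string utterance)
    {
        if (string.IsNullOrWhiteSpace(utterance)) return null;

        string trimmed = utterance.TrimStart();
        if (!trimmed.StartsWith(ActivationKeyword, System.StringComparison.OrdinalIgnoreCase))
            return null;

        // Drop the activation keyword plus any trailing punctuation and spaces.
        return trimmed.Substring(ActivationKeyword.Length).TrimStart(',', ' ', ':');
    }
}
```

With this helper, ‘Assistant, please find the bundle containing the most compilation units’ yields ‘please find the bundle containing the most compilation units’, which is then handed on to keyword matching.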