# Supporting Rich Interactions in Mixed Reality

<a href="http://jjdudley.com">John Dudley</a>, Department of Engineering, University of Cambridge, UK.

Note that parts of the <em>Statistical Decoding for Text Entry</em> section of this notebook have been adapted from a tutorial originally created by <a href="http://pokristensson.com">Per Ola Kristensson</a> (see the material from the 2019 Computational Interaction Summer School <a href="https://github.com/yujko/5thSummerSchoolCourseMaterials/tree/master/Day2-Jacques">here</a>).

In this tutorial we will explore <em>statistical decoding</em> as a technique for supporting rich interactions in Mixed Reality.


| Outline                                               | Exercises |
|:------------------------------------------------------|:-|
| Motivation                                            |  |
| Introduction to Statistical Decoding                  |  |
| Sequence Decoding for Text Entry                      |  |
| Statistical Decoding using Token Passing              | Group Discussion, Ex1. Test your understanding |
| Decoding with substitutions, deletions and insertions | Group Discussion, Ex2. Use log probabilities |
| Sequence Decoding for Hand Gesture Recognition        |  |
| Decoding with Simple Thresholds on Finger Bend Angles | Group Discussion, Ex3. Test your understanding |
| Statistical Decoding using Finger Bend Angles         | Group Discussion, Ex4. Probabilistic recognition |

## Introduction

An intelligent text entry method is a text entry method that _infers_ users' intended text. We have known how to design such methods for quite some time. One of the more interesting examples is a 12th-century shorthand system called _Nova ars notaria_ ("the new note art") by the monk John of Tilbury. What is interesting about this system is that the _design principles_ are known:

* Simplify letters to line marks.
* Compress common word stems into sequences of simple line marks and dots.
* Identify common word stems by frequency analysis.

In other words, two fundamental principles underpin the design of any efficient text entry method:

* Minimise users' effort in articulating their input.
* Exploit the redundancies in natural languages using statistical language modelling.

However, _how_ can we both minimise users' articulation effort and leverage the redundancies in natural languages? The solution is to perform _statistical decoding_. A statistical decoder is a generative probabilistic model that is capable of searching a vast hypothesis space in order to identify users' intended text given noisy observations.

## Statistical Decoding

The problem we are addressing is that of a user attempting to communicate information over some form of channel. In Human-Computer Interaction (HCI) we typically model a user transmitting _information_ to a computer system.

The term _information_ in HCI usually refers to characters (for example, typing on a keyboard), words (using speech recognition), or commands (for instance by using keyboard shortcuts or touchscreen gestures).

More formally, we intend to transmit a message $y$ via some form of signal $x$. In a perfect world this would be trivially achieved via a lookup table. Unfortunately, we live in an imperfect world, and as a consequence our signal will be perturbed by noise in our neuromuscular system, device sensor imprecision, cognitive errors by the user, etc. Due to this inherent uncertainty it makes sense to model the problem _probabilistically_.

We then wish to compute the probability of the message $y$ _given_ the signal $x$. This can be written mathematically as $P(y | x)$.

<p>
The probability $P(y|x)$ is known as a <em>conditional probability</em>. By the definition of conditional probability (or, in some treatments, as an axiom of probability) it is:
</p>

\begin{equation}
P(y|x) = \frac{P(x \cap y)}{P(x)}
\end{equation}

(We assume $P(x)\neq 0$ and $P(y)\neq 0$.)

The above equation states that the conditional probability of $y$ given $x$ is equal to the ratio of the _joint probability_ of $x$ and $y$ to the probability of $x$. The joint probability of $x$ and $y$ can also be written as $P(x,y)$.
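As a small worked example of this definition, the sketch below computes a conditional probability from a joint distribution. The signal names (`tap`, `swipe`), message names (`select`, `cancel`) and all probabilities are hypothetical values chosen purely for illustration.

```python
# Hypothetical joint distribution P(x, y) over two signals and two messages.
# Keys are (signal x, message y); the four values sum to 1.
joint = {
    ("tap", "select"): 0.40,
    ("tap", "cancel"): 0.10,
    ("swipe", "select"): 0.05,
    ("swipe", "cancel"): 0.45,
}

def marginal_x(joint, x):
    """P(x): sum the joint probability over every message y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional(joint, y, x):
    """P(y | x) = P(x, y) / P(x), assuming P(x) is non-zero."""
    return joint[(x, y)] / marginal_x(joint, x)

p_select_given_tap = conditional(joint, "select", "tap")
print(p_select_given_tap)
```

Here $P(\text{tap}) = 0.40 + 0.10 = 0.50$, so $P(\text{select}\,|\,\text{tap}) = 0.40 / 0.50 = 0.8$ (up to floating-point rounding).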

We can rewrite the conditional probability $P(y|x)$ as follows:

\begin{align}
P(y|x) &= \frac{P(x \cap y)}{P(x)}\\
P(x|y) &= \frac{P(y \cap x)}{P(y)} = \frac{P(x \cap y)}{P(y)}\\
\Rightarrow P(x \cap y) &= P(x|y)P(y) = P(y|x)P(x)\\
\Rightarrow P(y|x) &= \frac{P(x|y)P(y)}{P(x)}
\end{align}

This last expression is known as _Bayes' rule_ (or theorem). Usually we have many possible messages that we wish to decode and $P(y|x)$ will then become the _posterior_ probability distribution, assigning a probability to every possible message. Our objective is to compute this posterior probability distribution and select the most probable message.
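To make the posterior distribution concrete, here is a minimal sketch that applies Bayes' rule to a discrete set of messages. The message names and all numbers are illustrative assumptions. The evidence $P(x)$ is obtained by summing $P(x|y)P(y)$ over all messages (the law of total probability).

```python
# Hypothetical two-message example; every number here is an assumption.
prior = {"yes": 0.7, "no": 0.3}        # P(y)
likelihood = {"yes": 0.2, "no": 0.9}   # P(x | y) for one observed signal x

def posterior(prior, likelihood):
    """Return P(y | x) for every message y using Bayes' rule."""
    # Joint probability P(x, y) = P(x | y) P(y) for each message.
    joint = {y: likelihood[y] * prior[y] for y in prior}
    # Evidence P(x) = sum over all messages of P(x | y) P(y).
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}

post = posterior(prior, likelihood)
print(post)
```

Although "yes" has the higher prior in this made-up example, the observed signal is much more likely under "no", so the posterior favours "no".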

Now, since we are usually only interested in the most probable message $\hat{y}$ we can write:

\begin{equation}
\hat{y}=\underset{y}{\arg\max}\left[P(y|x)\right]
\end{equation}

We have already seen that the conditional probability $P(y|x)$ can be written using Bayes' rule:

\begin{equation}
\hat{y}=\underset{y}{\arg\max}\left[\frac{P(x|y)P(y)}{P(x)}\right]
\end{equation}

However, we are only interested in the message that maximises the conditional probability of the message given the signal. The denominator $P(x)$ does not depend on $y$, so it is the same for every candidate message and can therefore be dropped:

\begin{equation}
\hat{y}=\underset{y}{\arg\max}\left[P(x|y)P(y)\right]
\end{equation}

$P(x|y)$ is the likelihood of the signal $x$ given a particular hypothesis for what the message $y$ could be. $P(y)$ is the _prior_ probability of the message, that is, without taking any signal into account. For instance, if a system can only recognise two messages, $y_1$ and $y_2$, and both are equally likely in the absence of any additional information, then the prior probability of either $y_1$ or $y_2$ is $0.5$.
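The interplay between likelihood and prior can be sketched in a few lines. The function name `map_decode`, the message labels and the probabilities below are all hypothetical. With a uniform prior the likelihood alone decides, whereas a strongly skewed prior can override it.

```python
def map_decode(prior, likelihood):
    """Return the message y that maximises P(x | y) P(y).

    P(x) is omitted because it is the same for every candidate message.
    """
    return max(prior, key=lambda y: likelihood[y] * prior[y])

# Hypothetical likelihoods P(x | y) for one observed signal x.
likelihood = {"a": 0.3, "b": 0.6}

uniform_prior = {"a": 0.5, "b": 0.5}   # both messages equally likely a priori
skewed_prior = {"a": 0.9, "b": 0.1}    # message "a" is far more common

print(map_decode(uniform_prior, likelihood))  # the likelihood decides
print(map_decode(skewed_prior, likelihood))   # the prior outweighs the likelihood
```

With the uniform prior the scores are $0.15$ and $0.30$, so "b" is selected. With the skewed prior they are $0.27$ and $0.06$, so "a" is selected.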

Identifying the most probable message $y$ is a _search problem_. We search by consulting a model of the likelihood of a signal $x$ given a message $y$ under consideration and by consulting a model of the prior probability of a message $y$ without considering any signal. This search will generate _hypotheses_ and these hypotheses will have probabilities assigned to them. Usually, the hypothesis with the highest probability assigned to it is our preferred hypothesis:

\begin{equation}
\widehat{\text{hypothesis}}=\underset{\text{hypotheses}}{\arg\max}\left(\text{likelihood model}\cdot\text{prior model}\right)
\end{equation}
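The search described above can be sketched as follows. This is a toy illustration under stated assumptions, not the decoder used in the later notebooks: the one-dimensional key layout `KEY_CENTRES`, the vocabulary and priors in `WORD_PRIOR`, and the Gaussian touch-noise model with standard deviation `SIGMA` are all hypothetical. Scores are summed in log space, which leaves the $\arg\max$ unchanged because the logarithm is monotonic.

```python
import math

# Hypothetical 1D key centres (in key widths) and word priors.
KEY_CENTRES = {"c": 0.0, "a": 1.0, "t": 2.0, "r": 3.0}
WORD_PRIOR = {"cat": 0.6, "car": 0.3, "tar": 0.1}
SIGMA = 0.5  # assumed standard deviation of the touch noise

def log_likelihood(touches, word):
    """log P(x | y): independent Gaussian noise around each key centre."""
    if len(touches) != len(word):
        return -math.inf  # this sketch ignores insertions and deletions
    total = 0.0
    for x, ch in zip(touches, word):
        mu = KEY_CENTRES[ch]
        # Log of the Gaussian density N(x; mu, SIGMA^2).
        total += -0.5 * ((x - mu) / SIGMA) ** 2
        total -= math.log(SIGMA * math.sqrt(2.0 * math.pi))
    return total

def decode(touches):
    """Score every hypothesis and return the best one with all scores."""
    scores = {
        word: log_likelihood(touches, word) + math.log(p)
        for word, p in WORD_PRIOR.items()
    }
    return max(scores, key=scores.get), scores

# The third touch (2.6) lands slightly closer to "r" (3.0) than to "t" (2.0).
best, scores = decode([0.2, 1.1, 2.6])
print(best)
```

In this made-up example the likelihood model slightly prefers "car", but the higher prior of "cat" tips the combined score in its favour. An exhaustive loop over the vocabulary is only feasible for tiny hypothesis spaces.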

## Sequence Decoding for Text Entry

Let's now switch to the [Sequence Decoding for Text Entry notebook](TextEntry.ipynb).

## Sequence Decoding for Hand Gesture Recognition

Let's now switch to the [Sequence Decoding for Hand Gesture Recognition notebook](HandGestures.ipynb).