# III: Bayesian HCI: A review and a reflection


**sli.do** https://app.sli.do/event/pqHMPkfsh41jv8s5yamNk4

## Outcomes

You will understand:

* The loops and stakeholders in interaction design: modelling, design, interaction and analysis
* Key themes of Bayesian approaches in HCI
* 1. How to apply Bayesian models at interaction time: **inference of intent**
* 2. Design time optimisation with uncertainty: **Bayesian optimisation**
* 3. How to analyse experimental evaluations: **Bayesian statistics for empirical analyses**
* 4. How to model cognitive processes using Bayesian models: **Bayesian cognitive modelling**
* 5. How to apply interaction to problems of Bayesian modelling: **Visualisation, interaction and workflows for Bayesian models**

### Survey results

<img src="imgs/survey_bayes.png">

<img src="imgs/survey_bayes_2.png">

---


# The times, the players and the loops
![The times, the players and the loops](imgs/bayesian_loops.png)

*A somewhat dodgy cartoon of the HCI "process"*

## An HCI process
What goes on in HCI? An old-school traditional HCI process might unfold like this:

* A **psychologist** builds models of how people perceive and act in the world, and how they understand and might operate interfaces.
* A **designer** thinks about a problem and uses tools like use cases, personas and scenarios to conceptualise how an interface might be used; and prototyping tools (like paper prototypes) to decide how that interface should be used.
* An interface is implemented. When it is in operation, a **user** is engaged in a control loop where the interface reacts to inputs from sensors, changes internal states, and updates outputs on displays.
* An **analyst** conducts an evaluation to analyse the way the interface is used, how successfully it is operated, how users feel about different aspects, etc.


## The times and the loops
This gives rise to four distinct **times**:

* **Modelling time** where we create general models of human behaviour in the presence of interactive systems;
* **Design time** where a specific instantiation of an interface is arrived at by some design process, informed by modelling;
* **Interaction time** where specific user goals at an instant of time are serviced via an interface;
* **Analysis time** where the efficacy of the interface is reviewed and analysed.

None of these are simple feed-forward processes; all would typically involve *feedback*; we might test prototypes to refine designs or to build better models of interface usage.

## Themes

When we talk through these, there are some key themes of "Bayesianism" to look for:

* **Models**: what form do the data generating processes take? What are the parameters that can vary?
* **Inference**: how is it achieved? What computations give us posteriors, and how is evidence brought into play?
* **Uncertainty**: how is it represented and what is it used for? Why is uncertainty valuable in this context?
* **Priors**: where do they come from and how are they specified?

## Commonalities
A Bayesian approach:
* *always* involves manipulating probability distributions over unobserved parameters;
* *always* involves a movement from prior to posterior guided by evidence;
* *always* requires defining a data generating process;
* *always* preserves uncertainty (to some degree, usually limited by computation);
* *always* has some computational struggles!


## Structure

In each sub-section, we'll:
* outline the concepts and how Bayesian ideas apply;
* run through a self-contained live code example;
* briefly discuss 2-3 papers from CHI or UIST that used these ideas in real research applications.

# In: In-the-loop: Bayesian interaction

## Summary
We infer distributions over intentions during an interaction, fusing together evidence from multiple sources, and over time. We need models of behaviour/sensing/display, representations of probability distributions that are compatible with our UI software, and techniques for preserving and reflecting probabilistic state.

## Explanation
### Motivation
Adding probabilistic Bayesian inference in the interaction loop can make interaction more robust, rational and efficient.

* **Robust** means that the interaction should reliably coincide with intention, even in the presence of disturbance;
* **Rational** means that the actions taken should accurately reflect both certainty and utility;
* **Efficient** means that action should coincide with intention with a minimum of time, mental or physical effort expended.

This is particularly salient when there is a large control:action disparity.
* For example: the interaction technique is marginally effective (e.g. in assistive technology for text entry) 
* For example: the space of possible actions is huge (e.g. a search engine)

### Model

A human interacting with a computer. We can see this several ways:

<img src="imgs/control.png">

* Communication: users encode selections to be packaged over a motor channel and then a sensor channel. These are decoded into state changes.
    * Essentially feed-forward; low-latency, high mental cost, high-bandwidth. Domain of pattern recognition approaches.
* Control: users drive a system into a desired equilibrium by feedback control, through a motor/sensor channel, via a mediating mechanism (like a cursor), and back through a perceptual channel.
    * Essentially closed-loop. Low mental cost, low bandwidth. Domain of traditional UI components.
* Inference: A system infers what a user might want to happen, based on observed evidence, and designs experiments via the feedback channel to optimally acquire more.
    * Closed-loop or feed-forward. Domain of probabilistic interfaces.


#### Interaction as inference
If we view interaction as inference of intention, there are three elements:
* **Interaction is inference**; it is the process of inferring a distribution over a hidden variable: what the user wants a system to do. 
* **Observations are indirect, noisy and incomplete** What a system sees is a distorted and incomplete representation of user actions in the world, which are in turn a noisy representation of internal intentions (your hand does not always go where you want it...)
* **Interaction occurs over time** Interaction is a *process* that evolves over time. Information flow is not instantaneous. Observations must be fused together to update beliefs.

### Approach

* Inference approach: Model the system's uncertainty over user intentions as a probability distribution, update via Bayes' rule. $P(\text{intention}|\text{input})$
    * What inputs would we *expect* to see, given hypothesised intentions and their prior likelihood?
        * For example, what mouse trajectories would we expect to see, given that a user was pointing at a specific icon?
    * If still unresolved, provoke a response that will maximise the information gain (change in entropy).
        * For example, move one of the two most likely icons and see if the user compensates.
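
A minimal sketch of this update, assuming a hypothetical one-dimensional row of three icons and a Gaussian model of where the cursor lands given the intended icon. The icon positions, the noise level `sigma` and the function names are illustrative assumptions, not part of any particular toolkit.

```python
import math

def gaussian_likelihood(observed, target, sigma):
    """Unnormalised likelihood of a cursor observation given an intended target."""
    return math.exp(-0.5 * ((observed - target) / sigma) ** 2)

def bayes_update(prior, observed, targets, sigma=40.0):
    """One application of Bayes' rule: posterior is proportional to likelihood x prior."""
    unnormalised = {
        name: prior[name] * gaussian_likelihood(observed, x, sigma)
        for name, x in targets.items()
    }
    evidence = sum(unnormalised.values())
    return {name: p / evidence for name, p in unnormalised.items()}

# Hypothetical icon centres (in pixels) and a uniform prior over intentions.
targets = {"open": 100.0, "save": 200.0, "close": 300.0}
belief = {name: 1.0 / len(targets) for name in targets}

# Noisy cursor samples arriving over time; each posterior becomes the next prior.
for cursor_x in [170.0, 185.0, 210.0, 195.0]:
    belief = bayes_update(belief, cursor_x, targets)

most_likely = max(belief, key=belief.get)
print(most_likely, {k: round(v, 3) for k, v in belief.items()})
```

Each cursor sample on its own is ambiguous; it is the accumulation of evidence across samples that concentrates belief on one icon.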

We'll look at a **Bayesian** approach to modelling human computer interaction, where we explicitly model what might be going on inside a user's mind and use Bayesian methods to try and perform "optimal mindreading". 
<img src="imgs/brain_inference.png" width="70%">

### Models
This view on interaction sees user intentions as **unknown values** which are partially observed through inputs. The time series of inputs from the user give a partial, noisy, incomplete view of intention inside the user's head, along with a great deal of superfluous information. 

We try to infer intention using a *generative model*, which is a simplified representation of intention and how it is mediated and transformed by the world. The stronger the model we have available, the more effectively we can infer intention.

> In this view, improving interaction (or at least *input*) comes down to more efficiently concentrating probability density where a user wants it. A better pointing device reduces uncertainty faster; a better display helps a user understand how best to target future actions to concentrate belief as desired; a better model of the user intentions concentrates belief with less explicit effort on the part of a user.

<img src="imgs/contraction_probability.png" width="70%">

#### Partitioning the inferred variables

We can further partition the problem. The causes of observed evidence can be factored, for example, into:

<img src="imgs/brainspace.png">

* **Mind state** The parameters of the intentions that generate the behaviour: what menu option does the user want?
* **World state** The parameters of the motor system that generate movement: where is the user's hand?
* **Sensor state** The parameters of the sensing system that generates signals: what is the camera matrix?

$$P(X_{\text{intention}}, X_{\text{motor}}, X_{\text{sensing}}|Y)$$
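
One possible factorisation of this posterior, written under the assumptions that intention drives the motor system, that the sensor parameters are independent of the user a priori, and that the observations $Y$ depend on intention only through the motor state:

$$P(X_{\text{intention}}, X_{\text{motor}}, X_{\text{sensing}}|Y) \propto P(Y|X_{\text{motor}}, X_{\text{sensing}})\, P(X_{\text{motor}}|X_{\text{intention}})\, P(X_{\text{intention}})\, P(X_{\text{sensing}})$$

Other dependency structures are equally plausible; the point is only that each factor can be modelled, and given a prior, separately.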

[Betancourt's article on probabilistic modeling](https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html) expresses these ideas in terms of the "phenomenon" (intention), "environment" (motor/world system) and "probe" (sensing/interface context).

#### Purity
It's rare to have an interaction loop that is *pure* Bayesian, where every step of the interaction is modelled with probability distributions and updated via Bayesian inference. Often we restrict the Bayesian slice to the realm where uncertainty is most relevant.

<img src="imgs/fwd_inv_bottleneck.png" width="50%">

For example, we might have a system where:

* A standard ML algorithm processes high-dimensional sensor data to a simpler form (e.g. a touch sensor to a cursor)
* Bayesian inference over possible gestures that could be being performed by the cursor (not the sensor)
* A standard state machine which actuates when probabilities cross thresholds.

This gives the power of inverse models (e.g. deep nets) in efficiently performing *representation learning*, and the standard and familiar operation of systems with discrete states, but with some of the robustness and flexibility of a Bayesian model. It's not *necessary* to do this; we could have a fully Bayesian interactive system, but it's rarely practical to do so, and it raises questions about how a user might understand and use such a system.

#### Forward-inverse
<img src="imgs/fwd_inv.png" width="50%">

In particular, this is a really powerful way of plugging together advanced ML systems (e.g. for computer vision); these are extremely powerful, but typically don't have any representation of uncertainty, or at best a weak one. Instead, most deep learning based systems map directly from one space to another; in interaction this is often a mapping from an observed sensor state (like an image) to an inferred hidden state (like the pose of the person in the image). This is an **inverse model** (because it maps from observations to hidden states); we can also use ML-learned **forward models** (i.e. generative models from hidden states to observations) as part of a Bayesian model, learning some elements of the data generating process that we cannot easily write down analytically.

### Uncertainty
* Input from a human user will often be ambiguous, at least at some point in time. This might be because of:
    * Genuine noise in the sensing that disturbs the intended control;
    * User confusion, error or changing intentions;
    * Partial and incomplete evidence of intention from sensing;
    * Inaccurate models of behaviour given intention.
* Failure to represent uncertainty means actions might be taken without sufficient evidence, or require unreasonable quantities of evidence to actuate.
* Balancing the information flow requires a proper accounting of uncertainty.

#### Feedback of uncertainty
Uncertainty is useful to perform robust inference. It can also assist users in understanding the belief state of the system they are controlling. Reflecting the probability distribution (or summaries of it) can be a powerful way of building interfaces that help a user understand how their action is being interpreted, and when ambiguity remains.

<img src="imgs/uncertain_ui.png" width="50%">

#### Active inference
A system which models uncertainty can take actions to minimise it; it's not possible to do this if you don't capture the uncertainty! This means we can build interactive systems that generate stimuli to optimally acquire information (reduce uncertainty). This is the "pull" to the "push" of uncertainty feedback. A Bayesian interactive system can reveal its probability distribution to help the user feed it the right information, and stimulate the user to feed it the most helpful information. 
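
As a rough sketch of how a system might choose what to ask, the following computes the expected reduction in entropy for a few candidate stimuli. It assumes a hypothetical setup in which the system highlights a set of targets and the user signals (with a 10% error rate) whether their target is in that set; the belief values, candidate sets and function names are all illustrative.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(belief, highlighted, response, error_rate):
    """Belief after the user answers whether their target is in the highlighted set."""
    unnormalised = {}
    for target, p in belief.items():
        truthful = (target in highlighted) == response
        unnormalised[target] = p * ((1 - error_rate) if truthful else error_rate)
    total = sum(unnormalised.values())
    return {t: p / total for t, p in unnormalised.items()}

def expected_information_gain(belief, highlighted, error_rate=0.1):
    """Expected drop in entropy from asking about one highlighted set."""
    gain = entropy(belief)
    for response in (True, False):
        # Probability of seeing this response under the current belief.
        p_response = sum(
            p * ((1 - error_rate) if (t in highlighted) == response else error_rate)
            for t, p in belief.items()
        )
        gain -= p_response * entropy(posterior(belief, highlighted, response, error_rate))
    return gain

# Hypothetical belief over four targets and three candidate stimuli.
belief = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
candidates = {"A only": {"A"}, "A or D": {"A", "D"}, "D only": {"D"}}
gains = {name: expected_information_gain(belief, s) for name, s in candidates.items()}
best = max(gains, key=gains.get)
print(best, {k: round(v, 3) for k, v in gains.items()})
```

The stimulus that splits the current belief most evenly is the most informative one, which is why asking about the single most likely target is not always the best question.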


### Priors
Typically we have priors that are derived from either:

* observations from a population; for example, we might use historical frequencies of selection of menu items as a prior for the probability of selecting those items
* the immediate past; we use a sequential filtering process where the posterior from one timestep becomes the prior for the next; for example, a language model might predict the next character to enter *given* the previous characters entered.

We may also have priors that arise from psychological or physiological models. For example, if we were tracking someone's hand in space, we could use a prior that implemented the reach volume of the hand -- the probability of the hand being more than 2m from the torso is probably small, for example.


## Example
We'll look at building a very simple gesture recogniser using a *particle filter* (a sample-based, sequential Monte Carlo probabilistic filter).

**Link to the notebook: [examples/1_in_the_loop.ipynb](ex_1_in_the_loop.ipynb)**
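
To give a flavour of the technique before opening the notebook, here is a minimal bootstrap particle filter. It is not the notebook's gesture recogniser: it tracks a single hypothetical 1D position, and the random-walk dynamics, Gaussian sensor noise and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def particle_filter_step(particles, observation, process_sd=0.5, obs_sd=1.0):
    """One predict-weight-resample step of a bootstrap particle filter."""
    # Predict: push each particle through (assumed) random-walk dynamics.
    predicted = [p + random.gauss(0.0, process_sd) for p in particles]
    # Weight: how well does each particle explain the new observation?
    weights = [math.exp(-0.5 * ((observation - p) / obs_sd) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # degenerate case: fall back to uniform
    # Resample: draw a new particle set in proportion to the weights.
    return random.choices(predicted, weights=weights, k=len(predicted))

random.seed(1)
true_position = 0.0
# Prior: particles spread broadly over the plausible range of positions.
particles = [random.uniform(-10.0, 10.0) for _ in range(500)]

for _ in range(30):
    true_position += 0.2                                   # hidden state drifts
    observation = true_position + random.gauss(0.0, 1.0)   # noisy sensor reading
    particles = particle_filter_step(particles, observation)

estimate = sum(particles) / len(particles)
spread = (sum((p - estimate) ** 2 for p in particles) / len(particles)) ** 0.5
print(f"true={true_position:.2f} estimate={estimate:.2f} +/- {spread:.2f}")
```

The particle set is the belief state: its mean gives a point estimate and its spread gives the remaining uncertainty, which could be used to decide whether to actuate.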

## Papers 

### Paper a: AnglePose
1. Rogers, Simon, John Williamson, Craig Stewart, and Roderick Murray-Smith. 2011. [**“AnglePose: Robust, Precise Capacitive Touch Tracking via 3d Orientation Estimation.”**](https://doi.org/10.1145/1978942.1979318) In *Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems - CHI ’11*, 2575. Vancouver, BC, Canada: ACM Press. 

![AnglePose](imgs/paper_1_anglepose.png)

* **Distribution**
    * Over finger poses, i.e. over the 4D vector space $[x,y, \phi, \theta]$. Roll is ignored.
* **Data Generating Process**
    * Simple model of finger as a hinged flap
    * Observed via basic capacitive electrical model 
    * First-order dynamics of motion (e.g. velocity tends to be constant in the short term)
* **Priors**
    * The previous time step (i.e. the last pose known)
    * And the bounds of the device
* **Observation**
    * Raw sensor images (160 element vectors representing capacitances)
* **Inference**
    * Sequential Monte Carlo (particle filter) -- propagating samples forward in time.
* **Interaction benefit**
    * Increased precision and robustness of pointing
    * Uncertainty available to decide on whether to actuate

### Paper b: Dasher
1. Ward, David J., Alan F. Blackwell, and David JC MacKay. 2000. [**“Dasher-a Data Entry Interface Using Continuous Gestures and Language Models.”**](https://dl.acm.org/doi/pdf/10.1145/354401.354427) In *UIST*, 129–37.

![Dasher](imgs/paper_2_dasher.png)

* **Distribution**
    * Over (sequences of) characters of an alphabet.
* **Data Generating Process**
    * Characters are generated by a Markov model (PPM-based) conditioned on prior characters.
    * These are indexed based on a response to a stimulus which allocates screen real-estate according to the prior.
* **Priors**
    * Language model given previous characters    
* **Observation**
    * Cursor position (note: not stochastic!).
* **Inference**
    * Exact.
* **Interaction benefit**
    * Increased efficiency of interaction 
    * Easy to fuse with other input sources (e.g. SpeechDasher)

### Paper c: BIGNav
1. Liu, Wanyu, Rafael Lucas d’Oliveira, Michel Beaudouin-Lafon, and Olivier Rioul. 2017. [**“Bignav: Bayesian Information Gain for Guiding Multiscale Navigation.”**](https://dl.acm.org/doi/pdf/10.1145/3025453.3025524) In *Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems*, 5869–80. ACM.

<img src="imgs/paper_3_bignav.png">

* **Distribution**
    * Over spatial locations (e.g. on a map)
* **Data Generating Process**
    * Users generate input based on the target "zone" closest to their intention    
* **Priors**
    * Initially uniform; but incorporating the distribution at the previous timestep.
* **Observation**
    * A zone which is activated (e.g. clicked with a mouse).
* **Inference**
    * Exact (discretised).
* **Interaction benefit**
    * Actively elicits the stimulus that optimises the *information gain*
    * Can incorporate priors (e.g. importance of landmarks in a map navigation task)
    * Faster than naive zooming based interfaces    

# Before: Bayesian optimisation at design time

## Discussion
### Motivation
Design by hand-engineering relies on idiom and experience, which may not be optimal or agile enough to adapt to new contexts, such as new devices or specific user groups. Automatic optimisation of the specific parameterisation of an interface can address this *but* acquiring data from users to fine-tune is very expensive and very noisy. Bayesian optimisation is a sample-efficient way to tune designs by constructing a distribution over **proxy objective functions**.

The typical application of Bayesian optimisation is when we have some (relatively few) parameters in an interactive system we need to tweak; perhaps scrolling speed, or colour contrast ratios on a UI component, or cooldown time after a gesture. We want to optimise these parameters to make the interaction work well, but we lack an understanding of how they affect the interface. To resolve this, we need to conduct empirical work. 

Doing basic A/B testing or systematic experiments is extremely wasteful and works poorly if the responses are very noisy (as they often are in interaction). Bayesian optimisation simultaneously learns a model and optimises to find the best predicted parameters given that model. It preserves uncertainty over possible models as it progressively narrows down the options.

### Approach
We conduct optimisation -- that is, adjusting some parameters and measuring the effect on an objective function. 

> Example: we might adjust the font size of a document reader, and measure the reading speed to select the most "efficient" font. 
> We assume we *don't* have a model of how fast someone can read for a given font size -- there's some unknown function that maps from font sizes to wpm.
> So we can't use standard optimisation approaches which rely on knowledge of the objective function (and probably its gradient). Instead, we form a distribution *over possible objective functions* (what possible response curves could users have to font sizes?); and we update this based on point samples. 

#### Bayesian optimisation loop

* Initially we have some broad distribution over possible objective functions, informed by our prior knowledge;
* Then, we choose a point to evaluate the objective function that will maximise some criterion (e.g. the largest expected improvement)
* We then acquire data, typically by running an experiment on a user at that point, and measure the result.
* We use this to update the distribution over functions, and repeat the sampling process.

<img src="imgs/cartoon_bayes_opt.png">

This is an *active inference* approach -- we actively design and adapt "experiments" to acquire information about the most useful parameters to test. This can *automatically* balance exploration (maximising information about the objective function) and exploitation (finding the optimal value of the objective function). This gives us an *algorithm* to do online adaptation in a way that respects the uncertainty about user behaviour/preferences/values.
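
The loop can be sketched in a few lines of NumPy. This is only an illustration under strong assumptions: the "user" is a simulated function with a known peak, the Gaussian process hyperparameters (length scale, variance, noise level, prior mean) are fixed by hand rather than learned, and an upper confidence bound is used as the acquisition criterion instead of expected improvement, purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_user(font_size):
    """Stand-in for a real trial: noisy reading speed (wpm), peaking at 14pt."""
    return 250.0 - 0.8 * (font_size - 14.0) ** 2 + rng.normal(0.0, 5.0)

def rbf_kernel(a, b, length_scale=3.0, variance=900.0):
    """Squared-exponential covariance: encodes the assumption of a smooth response."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var=25.0, prior_mean=200.0):
    """Exact GP posterior mean and standard deviation on a grid of candidates."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_go = rbf_kernel(x_grid, x_obs)
    mean = prior_mean + k_go @ np.linalg.solve(k_oo, y_obs - prior_mean)
    cov_reduction = np.sum(k_go * np.linalg.solve(k_oo, k_go.T).T, axis=1)
    var = np.diag(rbf_kernel(x_grid, x_grid)) - cov_reduction
    return mean, np.sqrt(np.maximum(var, 0.0))

candidates = np.linspace(8.0, 24.0, 65)      # design space: font sizes in points
x_obs = np.array([8.0, 24.0])                # two initial trials at the extremes
y_obs = np.array([simulated_user(x) for x in x_obs])

for _ in range(15):
    mean, sd = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + 2.0 * sd                    # acquisition: optimism under uncertainty
    next_x = candidates[np.argmax(ucb)]      # choose the next design to test
    x_obs = np.append(x_obs, next_x)
    y_obs = np.append(y_obs, simulated_user(next_x))   # run the "experiment"

mean, sd = gp_posterior(x_obs, y_obs, candidates)
best_size = candidates[np.argmax(mean)]
print(f"estimated best font size: {best_size:.2f}pt")
```

Early iterations tend to sample where the model is most uncertain; later ones cluster around the region the model believes is best, which is the exploration/exploitation balance described above.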

#### Functions, functions, functions

* **Objective function** $f(x;\theta)$: The function that tells us, for some input $x$ and parameters $\theta$, how "good" the result is. We obviously have to define precisely what "good" means. This is our model, but it is inaccessible. For example, $x$ might be a text, and $\theta$ might be a font size; $f(x;\theta)$ might then be the number of seconds to read $x$ with font size $\theta$.
* **Acquisition function** $g(f)$: A function of the distribution over objective functions, that tells us what to prioritise when choosing a new $\theta$ to test, given a distribution over objective functions. This drives the interactive experimental process. This might be:
    * The **expected improvement** (EI), how much of an improvement in $f(x;\theta)$ is expected, averaging over the current distribution of $f(x;\theta)$ at that point
    * The **probability of improvement** (PI), how likely there is to be an improvement in $f(x;\theta)$ for a given $\theta$
    * and various other choices as well.
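
For a concrete sense of how these two criteria differ, the sketch below evaluates both at a single candidate point, assuming the current belief about the objective there is Gaussian (as it would be under a Gaussian process) and that larger values are better (e.g. reading speed in wpm rather than seconds). The two candidates and the value of `best_so_far` are made up for illustration.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mean, sd, best_so_far):
    """P(f > best_so_far) when belief about f at this point is Normal(mean, sd)."""
    return normal_cdf((mean - best_so_far) / sd)

def expected_improvement(mean, sd, best_so_far):
    """E[max(f - best_so_far, 0)] under the same Normal belief (maximisation)."""
    z = (mean - best_so_far) / sd
    return (mean - best_so_far) * normal_cdf(z) + sd * normal_pdf(z)

# Two hypothetical candidates: a safe bet and an uncertain long shot.
best_so_far = 240.0
safe = {"mean": 241.0, "sd": 1.0}
risky = {"mean": 235.0, "sd": 15.0}

for name, c in (("safe", safe), ("risky", risky)):
    pi = probability_of_improvement(c["mean"], c["sd"], best_so_far)
    ei = expected_improvement(c["mean"], c["sd"], best_so_far)
    print(f"{name}: PI={pi:.2f} EI={ei:.2f}")
```

With these numbers PI prefers the safe candidate while EI prefers the long shot, because EI also accounts for *how large* an improvement might be, not just how probable it is.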


### Uncertainty
Our uncertainty here is about the hypothetical model of user responses the objective function embodies. We also have uncertainty about the reliability of any observation we make by running an experiment -- most interaction problems will have random variation just due to different subjects or environmental factors. If I measure someone's reading speed five times on the same document, it won't be exactly the same; it certainly won't if I test five different individuals. 

### Priors
We can choose to remain relatively neutral if we know virtually nothing about possible objective functions. We usually have to make *some* (weak) assumption about how smooth the objective function is likely to be, but beyond that we can remain flexible. If we have stronger prior knowledge, such as from a first-principles model, or from a previous round of experiments, we can easily incorporate it as a starting point to accelerate the optimisation process.

## Example

We'll look at optimising a... using a Gaussian process as a proxy for our objective function with exact inference. This is a simple-to-apply and widely applicable way of modelling preference or performance functions.

**Link to the notebook: [examples/2_bayesian_optimisation.ipynb](examples/2_bayesian_optimisation.ipynb)**

## Papers

### Paper a: Engaging games
Khajah, Mohammad M, Brett D Roads, Robert V Lindsey, Yun-En Liu, and Michael C Mozer. 2016. [**Designing Engaging Games Using Bayesian Optimization.**](https://dl.acm.org/doi/pdf/10.1145/2858036.2858253) In *Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems*, 5571–82.

<img src="imgs/paper_4_engaging.png">

#### Problem
* Optimise engagement (measured via retention time) in game playing. 
* Induce players to play for longer by adjusting parameters of gameplay.
* Two games are optimised: Flappy Bird, and Spring Ninja.

#### Design space
* **Flappy Bird**: 
    * pipe spacing
    * pipe gap
    * covert assistance
* **Spring Ninja**: 
    * spacing between pillars
    * visible trajectory extent
    * covert assistance


#### Objective function
* Retention time, as a function of the three parameters in each game's design space.

    
#### Observations
* How long the players played the game (from a very large number of subjects; N>900 for Flappy Bird; N>300 for Spring Ninja)

#### Bayesian optimisation approach
* Gaussian process based proxy function, using Thompson sampling as the acquisition function. 
* Thompson sampling is stated as being more likely to explore the design space compared to EI/PI.

#### Outcomes
* Retention improved with adapted gameplay parameters
* This was also correlated with positive responses in a gameplay questionnaire administered to players.

----

### Paper b: Better fonts
Kadner, Florian, Yannik Keller, and Constantin Rothkopf. 2021. [**Adaptifont: Increasing Individuals’ Reading Speed with a Generative Font Model and Bayesian Optimization.**](https://dl.acm.org/doi/pdf/10.1145/3411764.3445140) In *Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems*, 1–11.

<img src="imgs/paper_5_adaptifont.png">

#### Problem
* Optimise reading speed by changing the typeface.
* Typefaces here are represented in a machine-learned latent space where interpolation among standard fonts is possible.

#### Design space
* Typefaces are represented in a 3D space
* Machine learning used to reduce fonts onto this space, preserving some notion of shape similarity
* The design space is a location in these three typeface dimensions.

#### Objective function
* Time taken to read a short question as a function of the typeface shape 

#### Observations
* Reading time, in a question answering task.

#### Bayesian optimisation approach
* Gaussian Process, with upper confidence bound (UCB) acquisition function. Optimisation was performed sequentially.

#### Outcomes
* Significant improvements in reading speeds over baseline with adapted fonts.
<img src="imgs/adaptifont_results.png" width="50%">

---

### Paper c: Crowdsourced design
Dudley, John J, Jason T Jacques, and Per Ola Kristensson. 2019. [**Crowdsourcing Interface Feature Design with Bayesian Optimization.**](https://dl.acm.org/doi/pdf/10.1145/3290605.3300482) In *Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems*, 1–12.

<img src="imgs/paper_6_bayes_ui.png">

#### Problem
* Optimise an interface for a task where users had to answer queries about hotels on a map
* The optimisation targeted the tooltip display shown in the app when hovering over icons
* Tested in both a standard mobile app setting, and in a (quasi)-VR application (i.e. a 3D view on a mobile device)

#### Design space
* The design space for each interface had five dimensions:
    * Distance to an object before tooltip appeared
    * Delay time before tooltip popped up
    * Decay time before tooltip is hidden again
    * Size of the tooltip
    * Opacity of the tooltip

#### Objective function
* Time to complete a query given the tooltip configuration specified above.

#### Observations
* Task time
* (users were also asked to rate relative improvement between tasks, but this was not used for optimisation).

#### Bayesian optimisation approach
* Gaussian process, using the expected improvement (EI) acquisition function
* Design space is quantised to simplify optimisation
* Batched experiments, where a number of users (20) receive a number of tasks at once, then an update is performed before the next task.

#### Outcomes
* Task completion time improved over baseline (~10% in the mobile task, ~15-20% in the VR task)
* Users' subjective reports of quality of experience improved when using Bayesian optimisation

# After: Bayesian analysis of empirical work

## Discussion
### Survey results
<img src="imgs/survey_freq.png">

### Introduction
When we evaluate how an interface works, we typically rely on statistical tools to analyse quantitative results, in particular to turn quantitative results into statements. Even better, we'd hope to use statistical tools to *design* the analyses that we intend to perform: statistics starts before the experiment, not afterwards!

Statistical analyses in HCI are common -- just look at any CHI paper -- but they are often inappropriate or flawed. Almost all HCI uses classical frequentist statistics adopted from psychology. Frequentist statistics include things like **t-tests, ANOVAs, Mann-Whitney tests**, and so on. Frequentist statistics *are not wrong*; they are perfectly mathematically valid and sometimes useful techniques. But they are most certainly not the only way to do statistics, and they are questionably appropriate methods for HCI.

> Some authors (e.g. E. T. Jaynes or Aubrey Clayton) would argue that all of frequentist statistics is useless rubbish -- even if mathematically correct -- and that Bayesian methods are the only valid approach to analysing data. 
> They might be right. 

### What are Bayesian statistical models?
A Bayesian statistical model is just a data generating process + priors + data. The output is a posterior distribution over the unknown parameters, which we can use to answer questions about our interface.

> You are testing keyboard layouts for PIN entry. A Bayesian analysis could:

> * Define a model of the PIN error rate, as a function of the layout: `error = f(layout, params)`
>    * For example, it might assign different mean error rates to each layout
> * These means are unknown -- the parameters we want to infer.
> * We'd set some priors (e.g. all mean error rates are somewhere between 0.5% and 50% of PINs entered)
> * Then we'd run an experiment, and get a bunch of error rates for two layouts, A and B.
> * We'd apply an inference algorithm to learn the posteriors.
> * We'd get posterior means-per-layout
> * We could then make statements like: "there's a 90% chance that layout A is better than layout B in terms of error rates, under our priors"
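
A minimal sketch of such an analysis, using made-up error counts and assuming the simplest possible model: each PIN entry is an independent success or failure, with a flat Beta(1, 1) prior on each layout's error rate. With this model the posterior is available in closed form, so no MCMC is needed; the data and variable names are hypothetical.

```python
import random

random.seed(0)

# Hypothetical data: number of erroneous PIN entries out of attempts, per layout.
data = {"A": {"errors": 12, "attempts": 200}, "B": {"errors": 25, "attempts": 200}}

# Prior: Beta(1, 1), i.e. every error rate between 0 and 1 is equally plausible.
prior_alpha, prior_beta = 1.0, 1.0

def posterior_samples(errors, attempts, n=20000):
    """Draw from the Beta posterior over a layout's error rate (conjugate update)."""
    return [
        random.betavariate(prior_alpha + errors, prior_beta + attempts - errors)
        for _ in range(n)
    ]

rate_a = posterior_samples(**data["A"])
rate_b = posterior_samples(**data["B"])

mean_a = sum(rate_a) / len(rate_a)
mean_b = sum(rate_b) / len(rate_b)
# Posterior probability that layout A has the lower error rate.
p_a_better = sum(a < b for a, b in zip(rate_a, rate_b)) / len(rate_a)
print(f"mean error rate A={mean_a:.3f}, B={mean_b:.3f}, P(A better than B)={p_a_better:.3f}")
```

Because we hold samples from the full posterior, other questions (for example, "how likely is it that A has less than half the error rate of B?") can be answered by changing the comparison in the final line.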


### Motivation
Why do frequentist statistics fail to do what we want to do? How do Bayesian methods help? And what are the trade-offs?

#### What do we want to know?

Frequentist statistics answer questions of the form: "if I ran this experiment (identically) a very large number of times (tending to infinity), how often would I encounter observations like those I actually observed, given that some hypothesis does not hold?". It is not possible to answer questions about how likely different hypotheses are! It is only possible to make statements about how likely observations are under fixed hypotheses.

> Do we want to compare two blocks of observations, apply a "black box" algorithm, and make statements about how likely the variation is to be "random" as opposed to "true"?

This is pretty unnatural for most problems.

#### False dichotomy and backwards questions
* Frequentist statistics often force us into dichotomous decisions: is A (statistically significantly) better than B? Let's do a test and find out... 

* But *constructing* a B to compare with may not make much sense. At the very least, it leaves us to make pointwise comparisons against baselines.
* This mode of thought pervades how we think about experimental work in HCI **but it is not necessary**.
* Dichotomous analyses *are* appropriate if you want to make comparisons to a baseline, in controlled studies, where effects are large and false positives are important.
    * For example, in randomised controlled trials for efficacy of drugs.
* It further often forces us to ask questions of a null hypothesis structure: would I expect random variation this large if my hypothesised effect *didn't exist*?

---

* Frequentist statistics can't answer many questions we'd like to know the answer to; "how likely is it that A is twice as efficient as B"? 
    * **Our analysis procedures should answer the questions we want to know, not define the questions we are allowed to ask!**
* Frequentist statistics typically give us a single estimate but not uncertainty about inference.
    * Frequentist statistics can only quantify uncertainty about observations, not parameters.
    * But our questions are usually about parameters!

#### Application and interpretation
* Frequentist statistics produce quantities like p-values and confidence intervals. **These are valid but very hard to interpret!**
    * Q: define one of these terms *correctly*    

* Frequentist statistics are easy to misuse; great care is needed to make sure you account for multiple testing, researcher degrees of freedom and stopping rules, so as to preserve the false positive rate of testing procedures.
* Frequentist statistics come in pre-packaged forms (e.g. ANOVA). These have to be slotted in. This leads to several problems:
    * You must make the choice of approach intelligently among a zoo of tests and procedures. But you cannot adjust any of the details to fit your problem.
    * No modelling is required and so you can avoid defining a good data generating process -- which means you are likely to do silly things.
    * On the other hand, standardised procedures mean that there is efficient software, and readers and reviewers have a common baseline to work from.
* No prior information is explicitly defined. That seems "objective", but it hides that every frequentist approach makes *hidden* assumptions equivalent to priors.
    * In other words, you get a one-size-fits-all prior that is unlikely to match your true priors, and there is nothing you can do about it.

#### Other issues
* Frequentist methods are typically used because of tradition. This is not a good reason.
    * Bayesian statistics are not widely understood in HCI.
    * Researchers are reluctant to admit that much of published statistics might be (very) dubious.
    * Software and tooling don't make it easy to do Bayesian analysis.
    * Philosophical debates still rage about Bayesian vs. frequentist models. Some are still cautious about Bayesian methods (or aligned with the dark side).
    
* Computational issues used to make Bayesian methods impractical. This isn't such a good excuse these days.
* Priors can be controversial. You can obviously change what you assume to change what you end up with as posteriors.
    * But the counter-argument is that *explicitly stating* your assumptions (as priors) is the right way to be honest; a procedure can't guarantee honesty but it can hide deception.
    * In practice, prior choice is often relatively unimportant -- even a small amount of evidence washes away the difference between prior choices.
    
* Adaptive experiments can be hard to construct with frequentist methods: typically, the sample size must be computed in advance (via a power analysis) and the study then carried out without changes.
    * Bayesian statistics can be much more flexible, because our uncertainty is captured in the posterior; we simply get more precise posteriors if we capture more information, and *we can use our posterior to decide the next trial/query to run!*
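
As a minimal sketch of the adaptive idea, the loop below keeps running trials only while the posterior is still too vague, using a conjugate Beta-Binomial model so that no sampler is needed. Everything here is an assumption for illustration: the simulated participant with `true_rate=0.7`, the uniform Beta(1, 1) prior, the stopping threshold `target_sd`, and the function names. Deciding *which* trial to run next would follow the same pattern, with the posterior feeding a choice of condition rather than a stop/continue decision.

```python
import random


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5


def run_adaptive_study(true_rate=0.7, target_sd=0.05, max_trials=500, seed=0):
    """Run simulated success/failure trials until the posterior is precise enough."""
    rng = random.Random(seed)
    a, b = 1.0, 1.0  # uniform Beta(1, 1) prior on the success rate
    trials = 0
    while beta_sd(a, b) > target_sd and trials < max_trials:
        success = rng.random() < true_rate  # stand-in for a real trial
        a += success        # conjugate update: successes add to a
        b += not success    # failures add to b
        trials += 1
    return trials, a, b


trials, a, b = run_adaptive_study()
print(f"stopped after {trials} trials")
print(f"posterior mean {a / (a + b):.2f}, posterior sd {beta_sd(a, b):.3f}")
```

No sample size was fixed in advance; the posterior simply becomes more precise as data arrives, and the stopping rule reads that precision directly.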

## Example

We'll look at analysing a Fitts' law task using a very simple Bayesian regression model, using `pymc3` to perform MCMC-based inference. We'll look at how we can interpret and report the results of the inference process.

**Link to the notebook: [ex_3_regression_analysis.ipynb](ex_3_regression_analysis.ipynb)**
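
The notebook uses `pymc3` and MCMC. As a dependency-free sketch of the same idea (not the notebook's code), the block below fits Fitts' law, MT = a + b · ID with ID = log2(D/W + 1), using a conjugate Gaussian model. It assumes the noise standard deviation is known and puts independent N(0, `prior_sd`²) priors on `a` and `b`, which gives a closed-form Gaussian posterior. The data are simulated and all names and parameter values are hypothetical.

```python
import math
import random


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)


def fitts_posterior(ids, times, noise_sd=0.1, prior_sd=10.0):
    """Gaussian posterior over (a, b) in MT = a + b * ID, with known noise_sd."""
    n = len(ids)
    sum_x = sum(ids)
    sum_xx = sum(x * x for x in ids)
    sum_y = sum(times)
    sum_xy = sum(x * y for x, y in zip(ids, times))
    w = 1.0 / noise_sd ** 2  # data precision per observation
    p = 1.0 / prior_sd ** 2  # prior precision on each coefficient
    # Posterior precision matrix [[l_aa, l_ab], [l_ab, l_bb]], inverted by hand.
    l_aa, l_ab, l_bb = p + w * n, w * sum_x, p + w * sum_xx
    det = l_aa * l_bb - l_ab ** 2
    cov = [[l_bb / det, -l_ab / det], [-l_ab / det, l_aa / det]]
    mean_a = cov[0][0] * w * sum_y + cov[0][1] * w * sum_xy
    mean_b = cov[1][0] * w * sum_y + cov[1][1] * w * sum_xy
    return (mean_a, mean_b), cov


# Simulated pointing data (hypothetical ground truth, in seconds).
rng = random.Random(0)
TRUE_A, TRUE_B, NOISE_SD = 0.2, 0.15, 0.1
ids, times = [], []
for distance in (64, 128, 256, 512):
    for width in (8, 16, 32):
        for _ in range(20):
            difficulty = index_of_difficulty(distance, width)
            ids.append(difficulty)
            times.append(TRUE_A + TRUE_B * difficulty + rng.gauss(0.0, NOISE_SD))

(mean_a, mean_b), cov = fitts_posterior(ids, times, noise_sd=NOISE_SD)
sd_b = math.sqrt(cov[1][1])
print(f"intercept a: posterior mean {mean_a:.3f} s")
print(f"slope b: {mean_b:.3f} s/bit, 95% credible interval "
      f"[{mean_b - 1.96 * sd_b:.3f}, {mean_b + 1.96 * sd_b:.3f}]")
```

Reporting the slope as an interval over plausible values, rather than a point estimate with a p-value, is the style of reporting the example is meant to illustrate. An MCMC-based model would additionally let the noise level be inferred rather than assumed.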

## Papers

### Paper a: Why Bayesian Statistics Better Fit the Culture and Incentives of HCI
Kay, Matthew, Gregory L Nelson, and Eric B Hekler. 2016.
[**Researcher-Centered Design of Statistics: Why Bayesian Statistics
Better Fit the Culture and Incentives of HCI.**](https://dl.acm.org/doi/pdf/10.1145/2858036.2858465) In *Proceedings of the
2016 CHI Conference on Human Factors in Computing Systems*, 4521–32.

#### The problem identified
* Studies in HCI are (often necessarily) small, and the power of each individual study is therefore usually small.
* Frequentist statistics make it very hard to combine evidence from multiple papers
* Typically, a meta-analysis is required (but this has problems)
* This leads to weak "accrual of knowledge" -- papers tend to be point samples that are unrelated to each other.

#### The solution suggested
* Bayesian statistics are *composable*
* They are also more sensible with small-n studies -- as they preserve uncertainty and can be informed by expert priors
    * We don't just throw away Bayesian statistical results if they are "rejected"
* Posterior results from studies can be used as priors in new studies
* This works even if the studies don't exactly match, if we're careful
    * We can always make priors broader
* They show a very interesting *simulation* of what would happen if HCI papers were Bayesian: importantly, results would be found more reliably and in fewer publications with a Bayesian approach (under certain assumptions).
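
A toy illustration of composability, separate from the paper's own simulation: with a conjugate normal model, using the posterior from one study as the prior for the next gives the same answer as analysing the pooled data. The effect measurements, the known `noise_sd`, and the function name `update_normal` are all assumptions made for this sketch.

```python
def update_normal(prior_mean, prior_sd, data, noise_sd):
    """Posterior (mean, sd) for a normal mean with known observation noise."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    data_mean = sum(data) / len(data)
    post_mean = (prior_precision * prior_mean
                 + data_precision * data_mean) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical speed-ups (ms) measured in two small studies.
study_1 = [42.0, 55.0, 38.0, 61.0, 47.0, 50.0]
study_2 = [35.0, 49.0, 52.0, 40.0, 44.0]
NOISE_SD = 20.0

# Study 1 starts from a vague prior; study 2 starts from study 1's posterior.
mean_1, sd_1 = update_normal(0.0, 100.0, study_1, NOISE_SD)
chained = update_normal(mean_1, sd_1, study_2, NOISE_SD)

# Analysing all the data at once gives the same posterior.
pooled = update_normal(0.0, 100.0, study_1 + study_2, NOISE_SD)

# If the studies don't quite match, broaden the prior before reusing it.
cautious = update_normal(mean_1, 2.0 * sd_1, study_2, NOISE_SD)

print(f"chained:  mean {chained[0]:.2f}, sd {chained[1]:.2f}")
print(f"pooled:   mean {pooled[0]:.2f}, sd {pooled[1]:.2f}")
print(f"cautious: mean {cautious[0]:.2f}, sd {cautious[1]:.2f}")
```

Broadening the reused prior trades some precision for robustness when the earlier study differs in population or task.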


### Paper b: Dichotomous-ness
Besançon, Lonni, and Pierre Dragicevic. 2019. [**The Continued Prevalence of Dichotomous Inferences at CHI.**](https://dl.acm.org/doi/pdf/10.1145/3290607.3310432) In *Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems*, 1–11.

#### The problem identified
* Papers in the HCI community are often of the form: "is A statistically significantly better than B"?
* This leads to:
    * excessive weight on p-values and similar metrics, which can be very misleading
    * bizarre traditions around 5% significance levels and the like, which make conclusions unscientific
    * fallacious thinking around the absence of evidence and the evidence of absence
    * A and B are often straw men anyway -- the requirement to choose two points to compare is not often useful
    
* All nuance is lost: everything is reduced to simple superiority
* Dichotomous statistics in HCI are often reported and interpreted incorrectly

#### The solution suggested
* Better reporting of frequentist statistics (e.g. using confidence intervals as opposed to p-values)
    * But this has many problems as well...
* Use Bayesian methods instead, which have no particular bias towards dichotomous analyses
    * They can be used for this if we wish, however
* Bayesian methods make it easy to report our certainty in conclusions, not a binary decision.

### Paper c: Prior selection
Phelan, Chanda, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. [**Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis.**](https://dl.acm.org/doi/pdf/10.1145/3290605.3300709) In *Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems*, 1–12.

#### The problem identified
* Bayesian analysis requires priors to be defined.
* In many HCI problems, the priors to use are not obvious.
* We also need to define data generating processes; these may not be obvious.
* Bayesian analyses need to represent posterior distributions in an interpretable manner.
* This is not simply the case of stating a p-value!
* Many researchers shy away from Bayesian statistics as a result

#### The solution suggested
* The paper gives templates for standard HCI tasks, including:
    * A workflow for Bayesian analyses
    * Template models, which include DGPs and standard priors
        * These are linear Gaussian models that correspond to common frequentist models like ANOVA
    * Visualisation via *hypothetical outcome plots* to make it easy to communicate posteriors



# About: Bayesian models of cognition and behaviour
## Discussion

### Introduction
How do we start to build interfaces? In the old days, hardware and computational power dictated many of the constraints. Even so, ancient mainframes still had to group their switches logically, label them, and lay them out for ergonomic reachability. To build interfaces we need to understand how humans behave. Human behaviour is the key constraint that defines interaction. We need models of how humans behave. We can divide that roughly into *physical* (biomechanics, ergonomics, etc.) and *mental* (perceptual, cognitive) domains.

Some HCI relies on human experience of behaviour ("designer's intuition"), heuristics that formalise these experiences, and repeated evaluation as a crutch to stitch up the mismatches. Computational interaction rejects this approach, and puts computational models first. This means we have to have actionable simulators of aspects of human behaviour that can be applied to interface design problems.

Some human simulation problems are tricky but largely in the realm of the observable and relatively certain, like simulating reach volumes from biomechanics. But others, especially those that involve cognition, are very hard to construct. Cognitive processes are:

* weakly observable -- we have essentially no instruments to act upon them directly and limited modes of observation and control.
* potentially highly variable among the population and even within an individual
* stochastic in their production of evidence (the "same" mental state => different observed outcomes).
* very complex in nature. A linear system probably won't suffice!

### Bayesian models for cognitive models
Bayesian models are well-suited to this type of computational modelling -- it is straightforward to cope with the uncertainty that arises. Bayesian inference can establish posterior distributions over parameters of cognitive simulators.

### Bayesian models *as* cognitive models
The *other* side of this is modelling humans *as if* they are applying Bayesian inference when they interact. There is a long history of work in this vein, and it remains controversial. But it allows defining computational models that can be insightful in approaching HCI problems.

### Example
No example for this section; it's just too much to cover in a short example -- but see Andrew's talk on Wednesday instead.

### Papers
#### Paper a: Bayesian cognitive models of visualisation
Kim, Yea-Seul, Logan A Walls, Peter Krafft, and Jessica Hullman. 2019. **“A Bayesian Cognition Approach to Improve Data Visualization.”** In *Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems*, 1–14.

##### Problem
* We lack good models of how visualisations are interpreted
* In particular, how understanding is updated from prior knowledge after viewing a visualisation
##### Solution
* The paper proposes a Bayesian cognitive framework for understanding how prior models are updated by users
    * i.e. it assumes that humans mentally perform approximate Bayesian updates
* Experimental work with a range of visualisations supports a Bayesian cognitive model
* The paper evaluates how different approaches to uncertainty visualisation might impact a Bayesian cognitive model

#### Paper b: Bayesian inference applied to cognitive models

Kangasrääsiö, Antti, Jussi PP Jokinen, Antti Oulasvirta, Andrew Howes, and Samuel Kaski. 2019. **“Parameter Inference for Computational Cognitive Models with Approximate Bayesian Computation.”** *Cognitive Science* 43 (6): e12738.

##### Problem
* Cognitive models that explain user behaviour are well-established (like ACT-R)
* These typically have some parameters that need to be set to align them with human performance
* There is no principled way to estimate these parameters

##### Solution
* Apply Bayesian inference to estimate parameters
* In particular, use existing (non-Bayesian) simulators that **don't** have likelihood functions
* And use *approximate Bayesian computation* to learn parameter distributions instead
* This is inefficient, but can re-use existing cognitive models
* This is applied to a skill acquisition model in ACT-R, and a computational rationality model
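
To make the idea of likelihood-free inference concrete, here is a sketch of plain rejection ABC, the simplest variant, which may well differ from the method used in the paper. The "cognitive simulator" is a toy stand-in (a noisy power law of practice), not ACT-R; the uniform prior, the distance function, the tolerance `epsilon` and all names are assumptions chosen for illustration.

```python
import math
import random


def simulate_practice(alpha, n_trials, rng, first_time=10.0, noise_sd=0.1):
    """Black-box simulator: completion times T_n = T_1 * n**(-alpha), with noise."""
    return [first_time * n ** (-alpha) * math.exp(rng.gauss(0.0, noise_sd))
            for n in range(1, n_trials + 1)]


def distance(simulated, observed):
    """Root-mean-square difference between log completion times."""
    squared = [(math.log(s) - math.log(o)) ** 2
               for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def abc_rejection(observed, n_draws=5000, epsilon=0.2, seed=0):
    """Keep prior draws whose simulated data land close to the observations."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        alpha = rng.uniform(0.0, 1.0)  # draw a learning rate from the prior
        simulated = simulate_practice(alpha, len(observed), rng)
        if distance(simulated, observed) < epsilon:
            accepted.append(alpha)  # approximate posterior sample
    return accepted


# "Observed" data from a hypothetical participant with learning rate 0.4.
TRUE_ALPHA = 0.4
observed = simulate_practice(TRUE_ALPHA, 20, random.Random(42))

posterior = abc_rejection(observed)
post_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} of 5000 prior draws")
print(f"approximate posterior mean of alpha: {post_mean:.2f}")
```

Most prior draws are thrown away, which is the inefficiency noted above; the benefit is that the simulator is only ever run forwards and never needs a likelihood function.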

# With: Interaction with Bayesian models

## Discussion

### Introduction
Let's turn the tables. What can HCI offer Bayesian modelling? How could HCI help:

* create Bayesian models?
* validate and verify Bayesian models?
* visualise, explore and explain their implications and consequences?

> Bayesian modelling is often seen as hard, and the models as complex and inscrutable. This isn't "really" true, but
> there is lots to be done to make the "real" complexity manageable. These are problems of interaction between users and models.

#### Visualisation and exploration
Bayesian analyses produce distributions. These are "correct", in the sense they preserve uncertainty, but they are not easy for humans to understand and interpret. Humans struggle to reason under uncertainty, and they struggle to perceive displays of uncertainty. Designing effective visualisations, particularly when many variables are involved, is a major challenge. 

Display of uncertainty might be relevant in reporting the results of a Bayesian statistical analysis; or it might be used "in-the-loop" to display uncertainty during interaction. In any case, we need primitives to encode probability distributions for human perceptual channels.

#### Workflow and modelling process
The creation of Bayesian models is a complex practice, and it involves domain experts, statistical modellers, computational engineers and end-users. There is much recent research interest in the design of **workflows** for Bayesian analysis, and the analytical tools to support the creation of robust, efficient Bayesian models that can be validated and reported to end users.

## Example

**Link to the notebook: [examples/5_bayesian_visualisation.ipynb](examples/5_bayesian_visualisation.ipynb)**

## Papers
### Paper a: Animation to reveal uncertainty
Kale, Alex, Francis Nguyen, Matthew Kay, and Jessica Hullman. 2018. [**“Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data.”**](https://ieeexplore.ieee.org/iel7/2945/4359476/08440816.pdf) *IEEE Transactions on Visualization and Computer Graphics* 25 (1): 892–902.
<img src="imgs/hops.png">

#### The problem
* Uncertainty is hard to communicate visually
* This is partly due to the inherent complexity of an uncertain display
* and partly due to limited visualisation tools available

#### How Bayesian models are involved
* Bayesian models produce posterior distributions
* These have inherent uncertainty that must be handled

#### The proposal
* Show animated samples drawn from a distribution instead of visual summaries
* These spread out *ensemble* visualisations over time

#### The effect
* Users are better able to make judgements with animated HOPs than with comparable static plots
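
A rough sketch of the mechanism, using text bars instead of real graphics: each animation frame shows a single draw from the posterior rather than a summary such as a mean with error bars. The Gaussian posteriors over two conditions' mean completion times, their parameters, and the helper names `hop_frames` and `render_frame` are all hypothetical; a real HOP would render each frame with a plotting library.

```python
import random


def hop_frames(mean_a, sd_a, mean_b, sd_b, n_frames=40, seed=0):
    """One (time_a, time_b) posterior draw per animation frame."""
    rng = random.Random(seed)
    return [(rng.gauss(mean_a, sd_a), rng.gauss(mean_b, sd_b))
            for _ in range(n_frames)]


def render_frame(time_a, time_b, seconds_per_char=0.05):
    """Draw one frame as two text bars whose lengths encode the sampled times."""
    bar_a = "#" * max(0, round(time_a / seconds_per_char))
    bar_b = "#" * max(0, round(time_b / seconds_per_char))
    return f"A {bar_a}\nB {bar_b}"


# Assumed posteriors over mean completion time (seconds) for two designs.
frames = hop_frames(mean_a=1.2, sd_a=0.1, mean_b=1.0, sd_b=0.1)

# Show the first few frames; an animation would cycle through all of them.
for time_a, time_b in frames[:3]:
    print(render_frame(time_a, time_b), end="\n\n")

# The share of frames in which A is slower is what a viewer implicitly counts.
share_a_slower = sum(a > b for a, b in frames) / len(frames)
print(f"A is slower than B in {share_a_slower:.0%} of frames")
```

Watching how often the ordering flips between frames gives a direct, frequency-based sense of how uncertain the comparison is.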

### Paper b: A Bayesian workflow
Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. **“Bayesian Workflow.”** *arXiv Preprint arXiv:2011.01808*. https://arxiv.org/abs/2011.01808.

<img src="imgs/bayesian_workflow.png">

#### The problem
* Bayesian models can be complex to construct
* Validation and visualisation are essential to build valid models and choose among alternative formulations
* Computational resources often restrict possible model choices and approximations that might be required

#### The proposal
* A formal procedure for modelling, evaluating, computing and comparing models is outlined
* It emphasises techniques such as posterior predictive checks, fake data simulation, simulation-based calibration and multiverse analyses to ensure the validity and coherence of Bayesian models
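
One of these techniques, the posterior predictive check, can be sketched in a few lines. The example below is an illustration under assumptions, not code from the paper: the per-participant success counts are made up, the model deliberately (and wrongly) assumes a single success rate shared by everyone with a Beta(1, 1) prior, and the check statistic is the between-participant variance.

```python
import random
import statistics


def posterior_predictive_variances(successes, n_trials, n_reps=2000, seed=0):
    """Variance of replicated per-participant counts under a shared-rate model."""
    rng = random.Random(seed)
    total_success = sum(successes)
    total_failure = n_trials * len(successes) - total_success
    variances = []
    for _ in range(n_reps):
        # Draw a success rate from the Beta posterior ...
        theta = rng.betavariate(1 + total_success, 1 + total_failure)
        # ... then simulate a replicated dataset of the same shape.
        replicated = [sum(rng.random() < theta for _ in range(n_trials))
                      for _ in successes]
        variances.append(statistics.variance(replicated))
    return variances


# Hypothetical data: successes out of 20 trials for 10 participants.
observed = [19, 18, 20, 7, 19, 6, 20, 18, 5, 19]
observed_variance = statistics.variance(observed)

replicated_variances = posterior_predictive_variances(observed, n_trials=20)
ppc_p = (sum(v >= observed_variance for v in replicated_variances)
         / len(replicated_variances))

print(f"observed variance across participants: {observed_variance:.1f}")
print(f"share of replicated datasets at least as variable: {ppc_p:.3f}")
```

A share close to zero says that data simulated from the fitted model almost never look as variable as the real data, which points towards revising the model, for example by giving each participant their own rate in a hierarchical model.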


### Paper c: Uncertainty visualisation
Padilla, Lace, Matthew Kay, and Jessica Hullman. ["Uncertainty visualization."](https://psyarxiv.com/ebd6r/download?format=pdf) (2020). PsyArxiv pre-print

<img src="imgs/uncertainty_visualisation.png">

#### The problem
* Uncertainty occurs in virtually every modelling problem
* It can be hard to communicate visually and is generally hard for humans to understand and reason about
* Many existing tools and approaches elide uncertainty -- this is inappropriate and risky

#### The proposal
* Good visualisation can reduce the cognitive issues with representing uncertainty
* The paper reviews a number of techniques for uncertainty visualisation
* And describes cognitive theories that help explain how uncertainty visualisations are perceived and understood, including:
    * Frequency framing: representing frequencies of outcomes as opposed to probabilities
    * Attribute substitution: adjusting displays to avoid mental simplifications, e.g. by animating
    * Visual boundaries: visualising without hard boundaries that can lead to categorical thinking
    * Visual semiotics: applying "natural" visual analogues of uncertainty, like blurring.
    
