diff --git a/content/.jupyter_cache/global.db b/content/.jupyter_cache/global.db index 6774f3c..2db3845 100644 Binary files a/content/.jupyter_cache/global.db and b/content/.jupyter_cache/global.db differ diff --git a/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json b/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json index 90bcaf2..7cc5099 100644 --- a/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json +++ b/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "d0a0beea1617b504412af88306c239d7", + "hash": "3a4bfe7751430b4f6a5e2d3c8dd27aa8", "result": { "engine": "jupyter", - "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-WIP-orange)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n::: {#def3e6bc .cell execution_count=2}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#3b99ba16 .cell execution_count=3}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#60c565e5 .cell execution_count=4}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n- The variance $\\sigma$ (correspondong to the \"spread\" of RTs)\n- The mean $\\mu$ for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope).\n\n### Model Specification\n\n::: {#fdfd5559 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Set priors on variance, intercept and effect of condition\n σ ~ truncated(Normal(0, 0.5); lower=0)\n\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#1e1a3766 .cell execution_count=6}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#ba2b1593 .cell execution_count=7}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#f02ce7d4 .cell fig-height='7' fig-width='10' execution_count=8}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the distribution is also *wider* (response time is more variable). \nTypical linear model estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n::: {#62ef3b30 .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#4b488c39 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n::: {#7b17f69f .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fa8a4426 .cell execution_count=12}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#ebf2b747 .cell execution_count=13}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** and better capturing the data.\nDespite that, the Gaussian model stil seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed.\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#07b852a9 .cell execution_count=14}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#dbebf70c .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#76460a1b .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1718    0.1792\n  μ_intercept   -1.1590   -1.1327\n  μ_condition    0.3157    0.3430\n  σ_intercept    0.3082    0.3228\n  σ_condition    0.0327    0.0508\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#6a414e1f .cell execution_count=17}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$: The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$: The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$: Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n::: {#a7089bbe .cell execution_count=18}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#bf20b174 .cell execution_count=19}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.3999    0.4062\n  μ_condition    0.0618    0.0721\n  σ_intercept    0.0381    0.0432\n  σ_condition    0.0104    0.0185\n  τ_intercept    0.1052    0.1130\n  τ_condition    0.0641    0.0795\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#d4d95c07 .cell execution_count=20}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next, are different from the previous distributions in that they are not characterized by \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$: A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$: A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$: A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n### Model Specification\n\n::: {#4df349b0 .cell execution_count=21}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#b9814b26 .cell execution_count=22}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0986    5.3197\n  ν_condition   -1.3387   -1.0493\n  α_intercept    1.6605    1.7456\n  α_condition    0.2060    0.3437\n  τ_intercept    0.1808    0.1870\n  τ_condition   -0.0371   -0.0231\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#cf9d1165 .cell execution_count=23}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=23}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#398a7d32 .cell execution_count=24}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -322.27 │ -0.03 │ 0.00 │\n│ Wald │ -379.85 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2465.97 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.49 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n", + "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-WIP-orange)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n::: {#def3e6bc .cell execution_count=2}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#3b99ba16 .cell execution_count=3}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#60c565e5 .cell execution_count=4}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n- The variance $\\sigma$ (correspondong to the \"spread\" of RTs)\n- The mean $\\mu$ for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope).\n\n### Model Specification\n\n::: {#fdfd5559 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Set priors on variance, intercept and effect of condition\n σ ~ truncated(Normal(0, 0.5); lower=0)\n\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#1e1a3766 .cell execution_count=6}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#ba2b1593 .cell execution_count=7}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#f02ce7d4 .cell fig-height='7' fig-width='10' execution_count=8}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the distribution is also *wider* (response time is more variable). \nTypical linear model estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n::: {#62ef3b30 .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#4b488c39 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n::: {#7b17f69f .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fa8a4426 .cell execution_count=12}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#ebf2b747 .cell execution_count=13}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** and better capturing the data.\nDespite that, the Gaussian model stil seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed.\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#07b852a9 .cell execution_count=14}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#dbebf70c .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#76460a1b .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1718    0.1792\n  μ_intercept   -1.1590   -1.1327\n  μ_condition    0.3157    0.3430\n  σ_intercept    0.3082    0.3228\n  σ_condition    0.0327    0.0508\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#6a414e1f .cell execution_count=17}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n::: {#a7089bbe .cell execution_count=18}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#bf20b174 .cell execution_count=19}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.3999    0.4062\n  μ_condition    0.0618    0.0721\n  σ_intercept    0.0381    0.0432\n  σ_condition    0.0104    0.0185\n  τ_intercept    0.1052    0.1130\n  τ_condition    0.0641    0.0795\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#d4d95c07 .cell execution_count=20}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by a \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n::: {#4df349b0 .cell execution_count=21}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#b9814b26 .cell execution_count=22}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0986    5.3197\n  ν_condition   -1.3387   -1.0493\n  α_intercept    1.6605    1.7456\n  α_condition    0.2060    0.3437\n  τ_intercept    0.1808    0.1870\n  τ_condition   -0.0371   -0.0231\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#cf9d1165 .cell execution_count=23}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=23}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#398a7d32 .cell execution_count=24}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -322.27 │ -0.03 │ 0.00 │\n│ Wald │ -379.85 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2465.97 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.49 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n", "supporting": [ "4a_rt_descriptive_files\\figure-html" ], diff --git a/content/.quarto/cites/index.json b/content/.quarto/cites/index.json index 68e2ead..8113712 100644 --- a/content/.quarto/cites/index.json +++ b/content/.quarto/cites/index.json @@ -1 +1 @@ -{"references.qmd":[],"4_1_Normal.qmd":["wagenmakers2008diffusion","theriault2024check","lo2015transform","schramm2019reaction"],"3_scales.qmd":[],"5_individual.qmd":[],"1_introduction.qmd":[],"4_rt.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","lo2015transform","schramm2019reaction","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"2_predictors.qmd":[],"4b_rt_generative.qmd":[],"4a_rt_descriptive.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","lo2015transform","schramm2019reaction","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"index.qmd":[]} +{"4_rt.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","lo2015transform","schramm2019reaction","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"2_predictors.qmd":[],"index.qmd":[],"references.qmd":[],"5_individual.qmd":[],"4a_rt_descriptive.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","lo2015transform","schramm2019reaction","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"3_scales.qmd":[],"1_introduction.qmd":[],"4b_rt_generative.qmd":[],"4_1_Normal.qmd":["wagenmakers2008diffusion","theriault2024check","lo2015transform","schramm2019reaction"]} diff --git a/content/.quarto/idx/4a_rt_descriptive.qmd.json b/content/.quarto/idx/4a_rt_descriptive.qmd.json index e6b4312..d431bc1 100644 --- a/content/.quarto/idx/4a_rt_descriptive.qmd.json +++ b/content/.quarto/idx/4a_rt_descriptive.qmd.json @@ -1 +1 @@ -{"title":"Descriptive Models","markdown":{"headingText":"Descriptive Models","containsRefs":false,"markdown":"\n![](https://img.shields.io/badge/status-WIP-orange)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n```{julia}\n#| code-fold: false\n\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n```{julia}\n#| output: false\n\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n```{julia}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n- The variance $\\sigma$ (correspondong to the \"spread\" of RTs)\n- The mean $\\mu$ for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope).\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Gaussian(rt; condition=nothing)\n\n # Set priors on variance, intercept and effect of condition\n σ ~ truncated(Normal(0, 0.5); lower=0)\n\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n```{julia}\n#| output: false\n\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n\n```{julia}\n#| fig-width: 10\n#| fig-height: 7\n\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the distribution is also *wider* (response time is more variable). \nTypical linear model estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n\n\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** and better capturing the data.\nDespite that, the Gaussian model stil seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed.\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n```{julia}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n\n```{julia}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$: The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$: The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$: Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next, are different from the previous distributions in that they are not characterized by \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$: A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$: A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$: A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_Wald; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n```{julia}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n","srcMarkdownNoYaml":""},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":true,"freeze":"auto","echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"execute":true,"engine":"jupyter"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":true,"code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":false,"self-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","output-file":"4a_rt_descriptive.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.4.549","bibliography":["references.bib"],"theme":"pulse","number-depth":3},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file +{"title":"Descriptive Models","markdown":{"headingText":"Descriptive Models","containsRefs":false,"markdown":"\n![](https://img.shields.io/badge/status-WIP-orange)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n```{julia}\n#| code-fold: false\n\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n```{julia}\n#| output: false\n\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n```{julia}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n- The variance $\\sigma$ (correspondong to the \"spread\" of RTs)\n- The mean $\\mu$ for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope).\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Gaussian(rt; condition=nothing)\n\n # Set priors on variance, intercept and effect of condition\n σ ~ truncated(Normal(0, 0.5); lower=0)\n\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n```{julia}\n#| output: false\n\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n\n```{julia}\n#| fig-width: 10\n#| fig-height: 7\n\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the distribution is also *wider* (response time is more variable). \nTypical linear model estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n\n\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** and better capturing the data.\nDespite that, the Gaussian model stil seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed.\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n```{julia}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n\n```{julia}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by a \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_Wald; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n```{julia}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n","srcMarkdownNoYaml":""},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":true,"freeze":"auto","echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"execute":true,"engine":"jupyter"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":true,"code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":false,"self-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","output-file":"4a_rt_descriptive.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.4.549","bibliography":["references.bib"],"theme":"pulse","number-depth":3},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file diff --git a/content/.quarto/xref/15f266d2 b/content/.quarto/xref/15f266d2 index d0da1ed..1d6dc7b 100644 --- a/content/.quarto/xref/15f266d2 +++ b/content/.quarto/xref/15f266d2 @@ -1 +1 @@ -{"entries":[],"headings":["very-quick-intro-to-julia-and-turing","generate-data-from-normal-distribution","recover-distribution-parameters-with-turing","linear-models","boostrapping","hierarchical-models","bayesian-estimation","bayesian-mixed-linear-regression"],"options":{"chapters":true}} \ No newline at end of file +{"entries":[],"options":{"chapters":true},"headings":["very-quick-intro-to-julia-and-turing","generate-data-from-normal-distribution","recover-distribution-parameters-with-turing","linear-models","boostrapping","hierarchical-models","bayesian-estimation","bayesian-mixed-linear-regression"]} \ No newline at end of file diff --git a/content/.quarto/xref/1a47137c b/content/.quarto/xref/1a47137c index b466b0b..20fe9e1 100644 --- a/content/.quarto/xref/1a47137c +++ b/content/.quarto/xref/1a47137c @@ -1 +1 @@ -{"options":{"chapters":true},"headings":[],"entries":[]} \ No newline at end of file +{"entries":[],"options":{"chapters":true},"headings":[]} \ No newline at end of file diff --git a/content/.quarto/xref/26afb962 b/content/.quarto/xref/26afb962 index 9b64c5e..02ac486 100644 --- a/content/.quarto/xref/26afb962 +++ b/content/.quarto/xref/26afb962 @@ -1 +1 @@ -{"headings":["categorical-predictors-condition-group","interactions","ordered-predictors-likert-scales","non-linear-relationships-polynomial-gams"],"options":{"chapters":true},"entries":[]} \ No newline at end of file +{"entries":[],"options":{"chapters":true},"headings":["categorical-predictors-condition-group","interactions","ordered-predictors-likert-scales","non-linear-relationships-polynomial-gams"]} \ No newline at end of file diff --git a/content/.quarto/xref/26e6880e b/content/.quarto/xref/26e6880e index 472aafc..f06678d 100644 --- a/content/.quarto/xref/26e6880e +++ b/content/.quarto/xref/26e6880e @@ -1 +1 @@ -{"headings":["the-data","gaussian-aka-linear-model","model-specification","posterior-predictive-check","scaled-gaussian-model","solution-1-directional-effect-of-condition","solution-2-avoid-exploring-negative-variance-values","the-problem-with-linear-models","shifted-lognormal-model","prior-on-minimum-rt","model-specification-1","interpretation","exgaussian-model","conditional-tau-tau-parameter","interpretation-1","shifted-wald-model","model-specification-2","model-comparison"],"options":{"chapters":true},"entries":[]} \ No newline at end of file +{"entries":[],"options":{"chapters":true},"headings":["the-data","gaussian-aka-linear-model","model-specification","posterior-predictive-check","scaled-gaussian-model","solution-1-directional-effect-of-condition","solution-2-avoid-exploring-negative-variance-values","the-problem-with-linear-models","shifted-lognormal-model","prior-on-minimum-rt","model-specification-1","interpretation","exgaussian-model","conditional-tau-tau-parameter","interpretation-1","shifted-wald-model","model-specification-2","model-comparison"]} \ No newline at end of file diff --git a/content/.quarto/xref/a408ff3e b/content/.quarto/xref/a408ff3e index 0aa5303..69d4f75 100644 --- a/content/.quarto/xref/a408ff3e +++ b/content/.quarto/xref/a408ff3e @@ -1 +1 @@ -{"options":{"chapters":true},"entries":[],"headings":["evidence-accumulation","drift-diffusion-model-ddm","other-models-lba-lnr","including-random-effects","additional-resources"]} \ No newline at end of file +{"headings":["evidence-accumulation","drift-diffusion-model-ddm","other-models-lba-lnr","including-random-effects","additional-resources"],"options":{"chapters":true},"entries":[]} \ No newline at end of file diff --git a/content/4a_rt_descriptive.qmd b/content/4a_rt_descriptive.qmd index 4831d24..be38dd9 100644 --- a/content/4a_rt_descriptive.qmd +++ b/content/4a_rt_descriptive.qmd @@ -362,9 +362,9 @@ However, @matzke2009psychological demonstrate that there is likely no direct cor Descriptively, the three parameters can be interpreted as: -- **Mu** $\mu$: The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution. -- **Sigma** $\sigma$: The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions. -- **Tau** $\tau$: Tail weight / skewness of the distribution. +- **Mu** $\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution. +- **Sigma** $\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions. +- **Tau** $\tau$ : Tail weight / skewness of the distribution. ::: {.callout-important} Note that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. @@ -445,18 +445,22 @@ The **Wald** distribution, also known as the **Inverse Gaussian** distribution, While we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution). However, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted]. -Note that the Wald distribution, similarly to the models that we will be covering next, are different from the previous distributions in that they are not characterized by "location" and "scale" parameters (*mu* $\mu$ and *sigma* $\sigma$). +Note that the Wald distribution, similarly to the models that we will be covering next (the "generative" models), is different from the previous distributions in that it is not characterized by a "location" and "scale" parameters (*mu* $\mu$ and *sigma* $\sigma$). Instead, the parameters of the Shifted Wald distribution are: -- **Nu** $\nu$: A **drift** parameter, corresponding to the strength of the evidence accumulation process. -- **Alpha** $\alpha$: A **threshold** parameter, corresponding to the amount of evidence required to make a decision. -- **Tau** $\tau$: A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model. +- **Nu** $\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process. +- **Alpha** $\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision. +- **Tau** $\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model. ![](media/rt_wald.gif) As we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution. Their interpretation is more complex but, as we will see below, offers a window to a new level of interpretation. +::: {.callout-note} +Explanations regarding these new parameters will be provided in the next chapter. +::: + ### Model Specification ```{julia} diff --git a/content/_freeze/4a_rt_descriptive/execute-results/html.json b/content/_freeze/4a_rt_descriptive/execute-results/html.json index 90bcaf2..7cc5099 100644 --- a/content/_freeze/4a_rt_descriptive/execute-results/html.json +++ b/content/_freeze/4a_rt_descriptive/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "d0a0beea1617b504412af88306c239d7", + "hash": "3a4bfe7751430b4f6a5e2d3c8dd27aa8", "result": { "engine": "jupyter", - "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-WIP-orange)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n::: {#def3e6bc .cell execution_count=2}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#3b99ba16 .cell execution_count=3}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#60c565e5 .cell execution_count=4}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n- The variance $\\sigma$ (correspondong to the \"spread\" of RTs)\n- The mean $\\mu$ for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope).\n\n### Model Specification\n\n::: {#fdfd5559 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Set priors on variance, intercept and effect of condition\n σ ~ truncated(Normal(0, 0.5); lower=0)\n\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#1e1a3766 .cell execution_count=6}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#ba2b1593 .cell execution_count=7}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#f02ce7d4 .cell fig-height='7' fig-width='10' execution_count=8}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the distribution is also *wider* (response time is more variable). \nTypical linear model estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n::: {#62ef3b30 .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#4b488c39 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n::: {#7b17f69f .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fa8a4426 .cell execution_count=12}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#ebf2b747 .cell execution_count=13}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** and better capturing the data.\nDespite that, the Gaussian model stil seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed.\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#07b852a9 .cell execution_count=14}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#dbebf70c .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#76460a1b .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1718    0.1792\n  μ_intercept   -1.1590   -1.1327\n  μ_condition    0.3157    0.3430\n  σ_intercept    0.3082    0.3228\n  σ_condition    0.0327    0.0508\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#6a414e1f .cell execution_count=17}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$: The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$: The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$: Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n::: {#a7089bbe .cell execution_count=18}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#bf20b174 .cell execution_count=19}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.3999    0.4062\n  μ_condition    0.0618    0.0721\n  σ_intercept    0.0381    0.0432\n  σ_condition    0.0104    0.0185\n  τ_intercept    0.1052    0.1130\n  τ_condition    0.0641    0.0795\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#d4d95c07 .cell execution_count=20}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next, are different from the previous distributions in that they are not characterized by \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$: A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$: A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$: A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n### Model Specification\n\n::: {#4df349b0 .cell execution_count=21}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#b9814b26 .cell execution_count=22}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0986    5.3197\n  ν_condition   -1.3387   -1.0493\n  α_intercept    1.6605    1.7456\n  α_condition    0.2060    0.3437\n  τ_intercept    0.1808    0.1870\n  τ_condition   -0.0371   -0.0231\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#cf9d1165 .cell execution_count=23}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=23}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#398a7d32 .cell execution_count=24}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -322.27 │ -0.03 │ 0.00 │\n│ Wald │ -379.85 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2465.97 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.49 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n", + "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-WIP-orange)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n::: {#def3e6bc .cell execution_count=2}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#3b99ba16 .cell execution_count=3}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#60c565e5 .cell execution_count=4}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n- The variance $\\sigma$ (correspondong to the \"spread\" of RTs)\n- The mean $\\mu$ for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope).\n\n### Model Specification\n\n::: {#fdfd5559 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Set priors on variance, intercept and effect of condition\n σ ~ truncated(Normal(0, 0.5); lower=0)\n\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#1e1a3766 .cell execution_count=6}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#ba2b1593 .cell execution_count=7}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#f02ce7d4 .cell fig-height='7' fig-width='10' execution_count=8}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the distribution is also *wider* (response time is more variable). \nTypical linear model estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n::: {#62ef3b30 .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#4b488c39 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n::: {#7b17f69f .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fa8a4426 .cell execution_count=12}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#ebf2b747 .cell execution_count=13}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** and better capturing the data.\nDespite that, the Gaussian model stil seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed.\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#07b852a9 .cell execution_count=14}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#dbebf70c .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#76460a1b .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1718    0.1792\n  μ_intercept   -1.1590   -1.1327\n  μ_condition    0.3157    0.3430\n  σ_intercept    0.3082    0.3228\n  σ_condition    0.0327    0.0508\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#6a414e1f .cell execution_count=17}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n::: {#a7089bbe .cell execution_count=18}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#bf20b174 .cell execution_count=19}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.3999    0.4062\n  μ_condition    0.0618    0.0721\n  σ_intercept    0.0381    0.0432\n  σ_condition    0.0104    0.0185\n  τ_intercept    0.1052    0.1130\n  τ_condition    0.0641    0.0795\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#d4d95c07 .cell execution_count=20}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by a \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n::: {#4df349b0 .cell execution_count=21}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#b9814b26 .cell execution_count=22}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0986    5.3197\n  ν_condition   -1.3387   -1.0493\n  α_intercept    1.6605    1.7456\n  α_condition    0.2060    0.3437\n  τ_intercept    0.1808    0.1870\n  τ_condition   -0.0371   -0.0231\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#cf9d1165 .cell execution_count=23}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=23}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#398a7d32 .cell execution_count=24}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -322.27 │ -0.03 │ 0.00 │\n│ Wald │ -379.85 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2465.97 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.49 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n", "supporting": [ "4a_rt_descriptive_files\\figure-html" ],