From 06cedb385e57c10c058500cd3df0c44edcbc7b3b Mon Sep 17 00:00:00 2001 From: Dominique Makowski Date: Sun, 14 Jul 2024 11:01:59 +0100 Subject: [PATCH] correct name of function --- content/.jupyter_cache/global.db | Bin 28672 -> 28672 bytes .../execute-results/html.json | 4 ++-- content/.quarto/cites/index.json | 2 +- .../.quarto/idx/4a_rt_descriptive.qmd.json | 2 +- content/.quarto/xref/04307669 | 2 +- content/.quarto/xref/1a47137c | 2 +- content/.quarto/xref/26afb962 | 2 +- content/.quarto/xref/26e6880e | 2 +- content/.quarto/xref/a408ff3e | 2 +- content/4a_rt_descriptive.qmd | 6 +++--- .../execute-results/html.json | 4 ++-- 11 files changed, 14 insertions(+), 14 deletions(-) diff --git a/content/.jupyter_cache/global.db b/content/.jupyter_cache/global.db index 1c212dbaf4b1bc84cd6aeedbf20031d40b66414b..def5b85b494f85fc415bd951b8e13c975e57e9c0 100644 GIT binary patch delta 40 wcmZp8z}WDBae_3X%S0JxMwg8Vt3&t=4Xg|dt&C0eOpQ#;%uF^*g+7x4017V*FaQ7m delta 40 wcmZp8z}WDBae_3X^F$eEM(2$Qt3&t=EUk>ptxQbxOw0@n4UINSg+7x401C1UF#rGn diff --git a/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json b/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json index 6d2e3ff..8a6977a 100644 --- a/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json +++ b/content/.quarto/_freeze/4a_rt_descriptive/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "4d15f6b80ee77fea4abab523e7f77583", + "hash": "36c8dcab189bdaad35482fdb168f699c", "result": { "engine": "jupyter", - "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-up_to_date-brightgreen)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n::: {#89f33007 .cell execution_count=1}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, StatsFuns, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We will first focus on the RTs of **correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#6ecfbcc8 .cell execution_count=2}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#3f4733f9 .cell execution_count=3}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2008)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2008)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which requires the inclusion of random effects in the model).\nWe will treat the data as if it came from a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. 
\nIt aims at predicting the **mean** *mu* $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean *mu* $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance *sigma* $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n\n- **Sigma** $\\sigma$ : The variance (corresponding to the \"spread\" of RTs)\n- **Mu** $\\mu$ : The mean for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope) on the **mean** ($\\mu$) RT.\n\n### Model Specification\n\n::: {#42b71893 .cell execution_count=4}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Prior on variance \n σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution\n\n # Priors on intercept and effect of condition\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n # Iterate through every observation\n for i in 1:length(rt)\n # Apply formula\n μ = μ_intercept + μ_condition * condition[i]\n # Likelihood family\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n# Fit the model with the data\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\n# Sample results using MCMC\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#f9c2bb90 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant: people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#3101f89d .cell execution_count=6}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#84537d0f .cell fig-height='7' fig-width='10' execution_count=7}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\nAs you can see, the linear models are good at predicting the **mean RT** (the center of the distribution), but they are not good at capturing the **spread** and the **shape** of the data.\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the green distribution is also *wider* (i.e., the response time is more variable), which is not captured by our model (the predicted distributions have the same widths). \nThis is expected, as typical linear models estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances across different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a bigger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possibly negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such an effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would throw an error (and this will often happen, as the sampling process does explore unlikely regions of the parameter space).
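\n\nWe can see this failure mode directly (a minimal sketch, assuming the default argument checks of Distributions.jl; the exact error type may vary across versions):\n\n```julia\nusing Distributions\n\nNormal(0.14, 0.01)  # Fine: a valid (if very narrow) Normal distribution\nNormal(0.14, -0.01) # Throws a DomainError: the scale must be non-negative\n```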
\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if that were not the case, we could have changed the reference level.\nHowever, this is not a good practice, as we are enforcing a very strong a priori assumption about the direction of the effect, which is not always justified.\n\n::: {#4e9da49b .cell execution_count=8}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#0b8e6492 .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done with a conditional statement: when sigma $\\sigma$ is negative, instead of evaluating the likelihood (which would error), we return an infinitely low model probability (`-Inf`) that pushes the sampler away from this impossible region.\n\n::: {#767151f6 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#7c5ba912 .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n### Solution 3: Use a \"Softplus\" Link Function\n\n#### Exponential Transformation\n\nUsing the previous solution feels like a \"hack\" and a workaround for a misspecified model.\nOne alternative approach is to apply a function to *sigma* $\\sigma$ to \"transform\" it into a positive value.\nWe have seen examples of applying \"non-identity\" **link functions** in the previous chapters with the **logistic** function that transforms any value between $-\\infty$ and $+\\infty$ into a value between 0 and 1.\n\nWhat function could we use to transform any values of *sigma* $\\sigma$ into strictly positive values?\nOne option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an \"exponential\" transformation to the parameter at the end.\n\nThe issue with the **exponential link** (**TODO**: is it actually known as an \"exponential\" link or a \"log\" link?) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) the interpretation of the parameters and effects is not linear, which adds to the complexity, and 3) normal priors on the log scale lead to a sharp peak in \"real\" values that can be problematic.\n\n::: {#059cb42b .cell execution_count=12}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ on the log scale\", ylabel=\"Actual value of σ\", title=\"Exponential function\")\nlines!(ax1, xaxis, exp.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ on the log scale\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after exponential transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, exp.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nxlims!(ax3, (-1, 40))\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n#### Softplus Function\n\nPopularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using].\nIt is defined as $softplus(x) = \\log(1 + \\exp(x))$ and its main benefit is to approximate an \"identity\" link (i.e., a linear relationship), only impacting the values close to 0 (where it is not linear) and negative values.
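\n\nA quick numeric comparison gives a feel for the difference between the two transformations (an illustrative sketch; `softplus` is provided by the `StatsFuns` package loaded above):\n\n```julia\nusing StatsFuns\n\n# The exponential transformation explodes for large values...\nexp(3.0)       # ≈ 20.09\n# ...whereas softplus stays close to the identity there\nsoftplus(3.0)  # ≈ 3.05\n\n# Both map negative values to small positive ones\nexp(-2.0)      # ≈ 0.14\nsoftplus(-2.0) # ≈ 0.13\n```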
\n\n::: {#2f7b619d .cell execution_count=13}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ before transformation\", ylabel=\"Actual value of σ\", title=\"Softplus function\")\nablines!(ax1, [0], [1], color=:black, linestyle=:dash)\nlines!(ax1, xaxis, softplus.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ before transformation\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after softplus transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, softplus.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n#### The Model\n\nLet us apply the Softplus transformation (available from the `StatsFuns` package) to the sigma $\\sigma$ parameter.\n\n::: {#5d9f0735 .cell execution_count=14}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ Normal(0, 1)\n σ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, softplus(σ))\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fc5d4ae2 .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5082    0.5144\n  μ_condition    0.1317    0.1447\n  σ_intercept   -2.0420   -1.9979\n  σ_condition    0.4804    0.5405\n
\n```\n:::\n\n:::\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote that one can call `softplus()` to transform the parameter back to the original scale, which is mostly needed for negative or small values (as for larger values, it becomes a 1-to-1 relationship).\n\n::: {#c61fb9f3 .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nσ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])\nσ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])\n\nprintln(\n \"σ for 'Speed' condition: \", round(softplus(σ_condition0); digits=4),\n \"; σ for 'Accuracy' condition: \", round(softplus(σ_condition1); digits=4)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nσ for 'Speed' condition: 0.1245; σ for 'Accuracy' condition: 0.1999\n```\n:::\n:::\n\n\n:::\n\n#### Conclusion\n\n::: {#50ba1813 .cell execution_count=17}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n\n```\n:::\n:::\n\n\nRelaxing the homoscedasticity assumption is a good step forward, allowing us to draw **richer conclusions** (e.g., the Accuracy condition leads to slower and **more variable** reaction times) and to better capture the data.\nDespite that, the Gaussian model still seems to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction times (RTs) have traditionally been modeled using linear models and their derived statistical tests, such as *t*-tests and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of RTs, this is not ideal, as RTs typically exhibit a non-normal distribution, with most of the mass concentrated on the left and a long tail stretching towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*.
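\nIn practice, this amounts to analyzing log-transformed RTs with the same linear machinery, along these lines (shown for illustration only):\n\n```julia\n# The log-transform compresses the long right tail of the RTs\nlogRT = log.(df.RT)\n\nextrema(df.RT)  # e.g., (0.18, 2.0)\nextrema(logRT)  # e.g., (-1.71, 0.69)\n```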
\nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformations, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed. In this model, the *mean* $\\mu$ is defined on the log-scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of $\\exp(\\mu_{condition})$; for instance, an effect of $\\mu_{condition} = 0.3$ would multiply the mean RT by $\\exp(0.3) \\approx 1.35$). \n\nNote that for LogNormal distributions (as is the case for many of the models introduced in the rest of the chapter), the distribution parameters ($\\mu$ and $\\sigma$) are not independent with respect to the mean and the standard deviation (SD).\nThe empirical SD increases when the *mean* $\\mu$ increases [which is seen as a feature rather than a bug, as it is consistent with typical reaction time data, @wagenmakers2005relation].\n\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the fastest response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that the minimum response time in humans is typically between 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#a06ec458 .cell execution_count=18}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. 
The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=19}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#547bfe72 .cell execution_count=19}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#618db966 .cell execution_count=20}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1721    0.1794\n  μ_intercept   -1.1584   -1.1274\n  μ_condition    0.3123    0.3407\n  σ_intercept    0.3073    0.3232\n  σ_condition    0.0326    0.0523\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0b3676c3 .cell execution_count=21}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=22}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n\n::: {.callout-note}\n\n### LogNormal distributions in nature\n\nThe reason why the Normal distribution is so ubiquitous in nature (and hence used as a good default) is the **Central Limit Theorem**, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature are the result of the *addition* of many random processes, the Normal distribution is very common in real life.\n\nHowever, it turns out that the multiplication of random variables results in a **LogNormal** distribution, and multiplicative (rather than additive) cascades of processes are also very common in nature, from the lengths of latent periods of infectious diseases to the distribution of mineral resources in the Earth's crust, and the elementary mechanisms at stake in physics and cell biology [@limpert2001log].\n\nThus, using LogNormal distributions for RTs can be justified with the assumption that response times are the result of multiplicative stochastic processes happening in the brain.\n\n:::\n\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.
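\n\nTo build some intuition, we can simulate draws from two ExGaussian distributions that differ only in *tau* $\\tau$ (a small sketch with hypothetical parameter values; `ExGaussian` comes from the `SequentialSamplingModels` package loaded above):\n\n```julia\nusing SequentialSamplingModels, Statistics\n\nrts_small_tau = rand(ExGaussian(0.4, 0.05, 0.1), 10_000)\nrts_large_tau = rand(ExGaussian(0.4, 0.05, 0.3), 10_000)\n\n# Same μ and σ, but the heavier tail drags the empirical mean upwards\nmean(rts_small_tau)  # ≈ 0.5 (i.e., μ + τ)\nmean(rts_large_tau)  # ≈ 0.7\n```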
\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift its centre of mass. \nHence, one should be careful not to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll we need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n::: {#a56080b3 .cell execution_count=22}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#8fa3f407 .cell execution_count=23}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.4002    0.4059\n  μ_condition    0.0627    0.0733\n  σ_intercept    0.0379    0.0424\n  σ_condition    0.0101    0.0186\n  τ_intercept    0.1047    0.1126\n  τ_condition    0.0631    0.0788\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0074e53b .cell execution_count=24}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by ExGaussian Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=25}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convolved with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variants, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.
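\n\nA quick simulation gives a feel for these parameters (a sketch with hypothetical values, assuming the mean of a Shifted Wald is $\\alpha/\\nu + \\tau$; `Wald` comes from the `SequentialSamplingModels` package):\n\n```julia\nusing SequentialSamplingModels, Statistics\n\n# A stronger drift ν leads to faster first passage times\nmean(rand(Wald(5.0, 1.7, 0.18), 10_000))  # ≈ 0.52 (1.7/5.0 + 0.18)\nmean(rand(Wald(4.0, 1.7, 0.18), 10_000))  # ≈ 0.61 (1.7/4.0 + 0.18)\n```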
\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window onto a new level of analysis.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n::: {#58b03fce .cell execution_count=25}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift values\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative threshold values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#9210576b .cell execution_count=26}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=27}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0817    5.3248\n  ν_condition   -1.3539   -1.0550\n  α_intercept    1.6585    1.7402\n  α_condition    0.2191    0.3506\n  τ_intercept    0.1818    0.1870\n  τ_condition   -0.0363   -0.0233\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#f1f511cf .cell execution_count=27}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=28}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options available to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, a Bayesian approach to estimating the out-of-sample predictive accuracy of a model.\n\n::: {#b9a9f998 .cell execution_count=28}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=29}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -323.67 │ -0.03 │ 0.00 │\n│ Wald │ -381.68 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2466.17 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.91 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worst based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n\n\n\n## Shifted LogNormal Mixed Model\n\n![](https://img.shields.io/badge/status-good_for_contributing-blue)\n\nComplex example walkthrough.\nAdd Random effects.\n\n",\n+    "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-up_to_date-brightgreen)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, 
see @theriault2024check for a discussion on outlier removal].\n\n::: {#89f33007 .cell execution_count=2}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, StatsFuns, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We will first focus on the RTs of **correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#6ecfbcc8 .cell execution_count=3}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#3f4733f9 .cell execution_count=4}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2008)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2008)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which requires the inclusion of random effects in the model).\nWe will treat the data as if it came from a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. 
\nIt aims at predicting the **mean** *mu* $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean *mu* $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance *sigma* $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n\n- **Sigma** $\\sigma$ : The variance (corresponding to the \"spread\" of RTs)\n- **Mu** $\\mu$ : The mean for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope) on the **mean** ($\\mu$) RT.\n\n### Model Specification\n\n::: {#42b71893 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Prior on variance \n σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution\n\n # Priors on intercept and effect of condition\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n # Iterate through every observation\n for i in 1:length(rt)\n # Apply formula\n μ = μ_intercept + μ_condition * condition[i]\n # Likelihood family\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n# Fit the model with the data\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\n# Sample results using MCMC\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#f9c2bb90 .cell execution_count=6}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant: people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#3101f89d .cell execution_count=7}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#84537d0f .cell fig-height='7' fig-width='10' execution_count=8}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\nAs you can see, the linear models are good at predicting the **mean RT** (the center of the distribution), but they are not good at capturing the **spread** and the **shape** of the data.\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the green distribution is also *wider* (i.e., the response time is more variable), which is not captured by our model (the predicted distributions have the same widths). \nThis is expected, as typical linear models estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances across different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a bigger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possibly negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such an effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would throw an error (and this will often happen, as the sampling process does explore unlikely regions of the parameter space).
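\n\nWe can see this failure mode directly (a minimal sketch, assuming the default argument checks of Distributions.jl; the exact error type may vary across versions):\n\n```julia\nusing Distributions\n\nNormal(0.14, 0.01)  # Fine: a valid (if very narrow) Normal distribution\nNormal(0.14, -0.01) # Throws a DomainError: the scale must be non-negative\n```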
\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if that were not the case, we could have changed the reference level.\nHowever, this is not a good practice, as we are enforcing a very strong a priori assumption about the direction of the effect, which is not always justified.\n\n::: {#4e9da49b .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#0b8e6492 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done with a conditional statement: when sigma $\\sigma$ is negative, instead of evaluating the likelihood (which would error), we return an infinitely low model probability (`-Inf`) that pushes the sampler away from this impossible region.\n\n::: {#767151f6 .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#7c5ba912 .cell execution_count=12}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n### Solution 3: Use a "Softplus" Function\n\n#### Exponential Transformation\n\nUsing the previous solution feels like a "hack" and a workaround for a misspecified model.\nOne alternative approach is to apply a function to *sigma* $\sigma$ to "transform" it into a positive value.\nWe have seen examples of applying "non-identity" **link functions** in the previous chapters with the **logistic** function that transforms any value between $-\infty$ and $+\infty$ into a value between 0 and 1.\n\nWhat function could we use to transform any values of *sigma* $\sigma$ into strictly positive values?\nOne option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an "exponential" transformation to the parameter at the end.\n\nThe issue with the **log link** (i.e., expressing parameters on the log-scale and then transforming them using the exponential function) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) the interpretation of the parameters and effects is no longer linear (which adds complexity), and 3) normal priors on the log scale lead to a sharp peak in "real" values that can be problematic.\n\n::: {#059cb42b .cell execution_count=13}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ on the log scale\", ylabel=\"Actual value of σ\", title=\"Exponential function\")\nlines!(ax1, xaxis, exp.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ on the log scale\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after exponential transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, exp.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nxlims!(ax3, (-1, 40))\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. 
The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n#### Softplus Function\n\nPopularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using].\nIt is defined as $softplus(x) = \\log(1 + \\exp(x))$ and its main benefit is to approximate an \"identity\" link for larger values (i.e., a linear relationship), only impacting negative values and values close to 0 (where the link is not linear).\n\n::: {#2f7b619d .cell execution_count=14}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ before transformation\", ylabel=\"Actual value of σ\", title=\"Softplus function\")\nablines!(ax1, [0], [1], color=:black, linestyle=:dash)\nlines!(ax1, xaxis, softplus.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ before transformation\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after softplus transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, softplus.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n#### The Model\n\nLet us apply the Softplus transformation (available from the `StatsFuns` package) to the sigma $\\sigma$ parameter.\n\n::: {#5d9f0735 .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ Normal(0, 1)\n σ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, softplus(σ))\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fc5d4ae2 .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5082    0.5144\n  μ_condition    0.1317    0.1447\n  σ_intercept   -2.0420   -1.9979\n  σ_condition    0.4804    0.5405\n
\n```\n:::\n\n:::\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote that one can call `softplus()` to transform the parameter back to the original scale, which can be useful for negative or small values (as for larger values, it becomes a 1-to-1 relationship).\n\n::: {#c61fb9f3 .cell execution_count=17}\n``` {.julia .cell-code code-fold=\"false\"}\nσ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])\nσ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])\n\nprintln(\n \"σ for 'Speed' condition: \", round(softplus(σ_condition0); digits=4),\n \"; σ for 'Accuracy' condition: \", round(softplus(σ_condition1); digits=4)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nσ for 'Speed' condition: 0.1245; σ for 'Accuracy' condition: 0.1999\n```\n:::\n:::\n\n\n:::\n\n#### Conclusion\n\n::: {#50ba1813 .cell execution_count=18}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n\n```\n:::\n:::\n\n\nRelaxing the homoscedasticity assumption is a good step forward, allowing us to draw **richer conclusions** (e.g., the Accuracy condition leads to slower and **more variable** reaction times) and to better capture the data.\nDespite that, the Gaussian model still seems to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction times (RTs) have traditionally been modeled using linear models and their derived statistical tests such as *t*-tests and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, with most of the mass on the left and a long tail towards the right (i.e., positively skewed). This means that the parameters of a Normal distribution (mean $\mu$ and standard deviation $\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. 
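\nFor illustration, this discouraged workflow would amount to something like the following sketch (a hypothetical example shown for context only, not executed here):\n\n``` {.julia}\n# Hypothetical sketch of the log-transform workflow (not recommended):\n# transform the outcome, then fit the same kind of Gaussian model on the log scale\ndf.logRT = log.(df.RT)  # log-transformed reaction times\n# ... a Gaussian model would then be fitted on df.logRT instead of df.RT\n```\n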
\nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted interpretation of results [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformations, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of an RT distribution.\n\n\n## Shifted LogNormal Model\n\nAn obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed. In this model, the *mean* $\mu$ is defined on the log-scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of $\exp(\mu_{condition})$). \n\nNote that for LogNormal distributions (as is the case for many of the models introduced in the rest of the chapter), the distribution parameters ($\mu$ and $\sigma$) are not independent with respect to the mean and the standard deviation (SD).\nThe empirical SD increases when the *mean* $\mu$ increases [which is seen as a feature rather than a bug, as it is consistent with typical reaction time data, @wagenmakers2005relation].\n\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the fastest response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that this minimum response time for humans is often between 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#a06ec458 .cell execution_count=19}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. 
The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=19}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#547bfe72 .cell execution_count=20}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#618db966 .cell execution_count=21}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1721    0.1794\n  μ_intercept   -1.1584   -1.1274\n  μ_condition    0.3123    0.3407\n  σ_intercept    0.3073    0.3232\n  σ_condition    0.0326    0.0523\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0b3676c3 .cell execution_count=22}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=22}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n\n::: {.callout-note}\n\n### LogNormal distributions in nature\n\nThe reason why the Normal distribution is so ubiquitous in nature (and hence used as a good default) is due to the **Central Limit Theorem**, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature are the result of the *addition* of many random processes, the Normal distribution is very common in real life.\n\nHowever, it turns out that the multiplication of random variables results in a **LogNormal** distribution, and multiplicative (rather than additive) cascades of processes are also very common in nature, from lengths of latent periods of infectious diseases to the distribution of mineral resources in the Earth's crust, and the elemental mechanisms at stake in physics and cell biology [@limpert2001log].\n\nThus, using LogNormal distributions for RTs can be justified with the assumption that response times are the result of multiplicative stochastic processes happening in the brain.\n\n:::\n\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\mu$ and *sigma* $\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). 
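\nOne way to see this is by simulation: for an ExGaussian, the empirical mean comes out near $\mu + \tau$ and the SD near $\sqrt{\sigma^2 + \tau^2}$, so *tau* $\tau$ adds mass to the tail rather than shifting the whole distribution. Below is a quick hypothetical sketch (assuming the `ExGaussian` distribution from `SequentialSamplingModels` supports `rand`, as used implicitly by `predict` later on):\n\n``` {.julia}\n# Hypothetical sketch: simulate draws to see how τ loads on the mean and the tail\nx = rand(ExGaussian(0.4, 0.04, 0.1), 100_000)\nprintln(mean(x))  # ≈ μ + τ = 0.5\nprintln(std(x))   # ≈ sqrt(σ² + τ²) ≈ 0.108\n```\n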
\nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\mu$ and $\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\mu$) and dispersion (*sigma* $\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\tau$) is changed, the whole distribution appears to shift its centre of mass. \nHence, one should be careful not to interpret the values of *mu* $\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\tau$.\nAll we need is to set a prior on the intercept and the condition effect, and make sure that $\tau > 0$. \n\n::: {#a56080b3 .cell execution_count=23}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#8fa3f407 .cell execution_count=24}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.4002    0.4059\n  μ_condition    0.0627    0.0733\n  σ_intercept    0.0379    0.0424\n  σ_condition    0.0101    0.0186\n  τ_intercept    0.1047    0.1126\n  τ_condition    0.0631    0.0788\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0074e53b .cell execution_count=25}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by ExGaussian Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=25}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\mu$ and a diffusion rate $\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convolved with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variants, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by \"location\" and \"scale\" parameters (*mu* $\mu$ and *sigma* $\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). 
A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window onto a new level of explanation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n::: {#58b03fce .cell execution_count=26}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative threshold values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#9210576b .cell execution_count=27}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=27}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0817    5.3248\n  ν_condition   -1.3539   -1.0550\n  α_intercept    1.6585    1.7402\n  α_condition    0.2191    0.3506\n  τ_intercept    0.1818    0.1870\n  τ_condition   -0.0363   -0.0233\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#f1f511cf .cell execution_count=28}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=28}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options available to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#b9a9f998 .cell execution_count=29}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=29}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -323.67 │ -0.03 │ 0.00 │\n│ Wald │ -381.68 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2466.17 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.91 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worst based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n\n\n\n## Shifted LogNormal Mixed Model\n\n![](https://img.shields.io/badge/status-good_for_contributing-blue)\n\nComplex example walkthrough.\nAdd Random effects.\n\n", "supporting": [ "4a_rt_descriptive_files\figure-html" ], diff --git a/content/.quarto/cites/index.json b/content/.quarto/cites/index.json index a74b502..a6d1e97 100644 --- a/content/.quarto/cites/index.json +++ b/content/.quarto/cites/index.json @@ -1 +1 @@ 
-{"3_scales.qmd":["makowski2023novel"],"4b_rt_generative.qmd":[],"3b_choices.qmd":[],"3a_scales.qmd":["makowski2023novel"],"4_rt.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","lo2015transform","schramm2019reaction","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"4_1_Normal.qmd":["wagenmakers2008diffusion","theriault2024check","lo2015transform","schramm2019reaction"],"4a_rt_descriptive.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","wiemann2023using","lo2015transform","schramm2019reaction","wagenmakers2005relation","limpert2001log","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"5_individual.qmd":[],"1_introduction.qmd":[],"2_predictors.qmd":[],"index.qmd":[],"references.qmd":[]} +{"1_introduction.qmd":[],"references.qmd":[],"2_predictors.qmd":[],"4b_rt_generative.qmd":[],"3b_choices.qmd":[],"3_scales.qmd":["makowski2023novel"],"5_individual.qmd":[],"4a_rt_descriptive.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","wiemann2023using","lo2015transform","schramm2019reaction","wagenmakers2005relation","limpert2001log","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"],"index.qmd":[],"4_1_Normal.qmd":["wagenmakers2008diffusion","theriault2024check","lo2015transform","schramm2019reaction"],"3a_scales.qmd":["makowski2023novel"],"4_rt.qmd":["wagenmakers2008diffusion","heathcote2012linear","theriault2024check","lo2015transform","schramm2019reaction","balota2011moving","matzke2009psychological","hohle1965inferred","kieffaber2006switch","matzke2009psychological","schwarz2001ex","heathcote2004fitting","anders2016shifted"]} diff --git a/content/.quarto/idx/4a_rt_descriptive.qmd.json b/content/.quarto/idx/4a_rt_descriptive.qmd.json index 95816b1..05bf748 100644 --- a/content/.quarto/idx/4a_rt_descriptive.qmd.json +++ b/content/.quarto/idx/4a_rt_descriptive.qmd.json @@ -1 +1 @@ -{"title":"Descriptive Models","markdown":{"headingText":"Descriptive Models","containsRefs":false,"markdown":"\n![](https://img.shields.io/badge/status-up_to_date-brightgreen)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n```{julia}\n#| code-fold: false\n\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, StatsFuns, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` 
condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n```{julia}\n#| output: false\n\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n```{julia}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. 
\nIt aims at predicting the **mean** *mu* $\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean *mu* $\mu$ that is itself the result of a linear function of the predictors $X$ and a variance *sigma* $\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\mu, \sigma)$, where $\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n\n- **Sigma** $\sigma$ : The variance (corresponding to the \"spread\" of RTs)\n- **Mu** $\mu$ : The mean for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope) on the **mean** ($\mu$) RT.\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Gaussian(rt; condition=nothing)\n\n # Prior on variance \n σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution\n\n # Priors on intercept and effect of condition\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n # Iterate through every observation\n for i in 1:length(rt)\n # Apply formula\n μ = μ_intercept + μ_condition * condition[i]\n # Likelihood family\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n# Fit the model with the data\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\n# Sample results using MCMC\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n\nThe effect of Condition is significant: people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n```{julia}\n#| output: false\n\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n\n```{julia}\n#| fig-width: 10\n#| fig-height: 7\n\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nAs you can see, the linear models are good at predicting the **mean RT** (the center of the distribution), but they are not good at capturing the **spread** and the **shape** of the data.\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the green distribution is also *wider* (i.e., the response time is more variable), which is not captured by our model (the predicted distributions have the same widths). \nThis is expected, as typical linear models estimate only one value for sigma $\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances across different values of predictors. \nIt is important in linear models as only one value for sigma $\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\sigma$ as a parameter that would depend on the condition, in the same way as mu $\mu$? 
In Julia, this is very simple.\n\nAll we need is to set sigma $\sigma$ as the result of a linear function, such as $\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a bigger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possibly negative) slope for sigma $\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\sigma$ to fix it at, for instance, **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such an effect would lead to a sigma $\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen, as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\nWe can see that the effect of condition on sigma $\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. 
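\nAs a sanity check, the condition-specific sigmas can be reconstructed from the posterior means (a minimal hypothetical sketch, assuming the chain fitted above; not evaluated):\n\n```{julia}\n#| code-fold: false\n#| eval: false\n\n# Hypothetical sketch: reconstruct condition-specific sigma from posterior means\nσ_speed = mean(chain_ScaledGaussian[:σ_intercept])\nσ_accuracy = σ_speed + mean(chain_ScaledGaussian[:σ_condition])\nprintln(\"σ (Speed): \", round(σ_speed; digits=4), \"; σ (Accuracy): \", round(σ_accuracy; digits=4))\n```\n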
\n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nAnother trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\sigma$ < 0).\nThis can be done by adding a conditional statement that, when sigma $\sigma$ is negative, returns an infinitely low log-probability (`-Inf`) instead of erroring, pushing the sampler away from this impossible region.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n\n### Solution 3: Use a \"Softplus\" Link Function\n\n#### Exponential Transformation\n\nUsing the previous solution feels like a \"hack\" and a workaround for a misspecified model.\nOne alternative approach is to apply a function to *sigma* $\sigma$ to \"transform\" it into a positive value.\nWe have seen examples of applying \"non-identity\" **link functions** in the previous chapters with the **logistic** function that transforms any value between $-\infty$ and $+\infty$ into a value between 0 and 1.\n\nWhat function could we use to transform any values of *sigma* $\sigma$ into strictly positive values?\nOne option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an \"exponential\" transformation to the parameter at the end.\n\nThe issue with the **log link** (i.e., expressing parameters on the log-scale and then transforming them using the exponential function) 
is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) the interpretation of the parameters and effects is no longer linear (which adds complexity), and 3) normal priors on the log scale lead to a sharp peak in \"real\" values that can be problematic.\n\n\n```{julia}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ on the log scale\", ylabel=\"Actual value of σ\", title=\"Exponential function\")\nlines!(ax1, xaxis, exp.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ on the log scale\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after exponential transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, exp.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nxlims!(ax3, (-1, 40))\nfig\n```\n\n#### Softplus Function\n\nPopularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using].\nIt is defined as $softplus(x) = \log(1 + \exp(x))$ and its main benefit is to approximate an \"identity\" link (i.e., a linear relationship), only impacting the values close to 0 (where it is not linear) and negative values.\n\n```{julia}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ before transformation\", ylabel=\"Actual value of σ\", title=\"Softplus function\")\nablines!(ax1, [0], [1], color=:black, linestyle=:dash)\nlines!(ax1, xaxis, softplus.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ before transformation\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after softplus transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, softplus.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nfig\n```\n\n#### The Model\n\nLet us apply the Softplus transformation (available from the `StatsFuns` package) to the sigma $\sigma$ parameter.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ Normal(0, 1)\n σ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, softplus(σ))\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote that one can call `softplus()` to transform the parameter back to the original scale, which can be useful for negative or small values (as for larger values, it becomes a 1-to-1 relationship).\n\n```{julia}\n#| code-fold: false\n\nσ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])\nσ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])\n\nprintln(\n \"σ for 'Speed' condition: \", round(softplus(σ_condition0); 
digits=4),\n \"; σ for 'Accuracy' condition: \", round(softplus(σ_condition1); digits=4)\n)\n```\n:::\n\n#### Conclusion\n\n```{julia}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n\nRelaxing the homoscedasticity assumption is a good step forward, allowing us to draw **richer conclusions** (e.g., the Accuracy condition leads to slower and **more variable** reaction times) and to better capture the data.\nDespite that, the Gaussian model still seems to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction times (RTs) have traditionally been modeled using linear models and their derived statistical tests such as *t*-tests and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, with most of the mass on the left and a long tail towards the right (i.e., positively skewed). This means that the parameters of a Normal distribution (mean $\mu$ and standard deviation $\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted interpretation of results [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformations, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of an RT distribution.\n\n\n## Shifted LogNormal Model\n\nAn obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed. In this model, the *mean* $\mu$ is defined on the log-scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of $\exp(\mu_{condition})$). 
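\nFor instance, once the model below has been fitted, this multiplicative effect could be recovered along these lines (a hypothetical sketch, assuming the `chain_LogNormal` object defined later in this section; not evaluated):\n\n```{julia}\n#| code-fold: false\n#| eval: false\n\n# Hypothetical sketch: multiplicative effect of condition (ignoring the shift τ)\nexp(mean(chain_LogNormal[:μ_condition]))  # e.g., exp(0.33) ≈ 1.39, i.e., RTs ~39% longer\n```\n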
\n\nNote that for LogNormal distributions (as is the case for many of the models introduced in the rest of the chapter), the distribution parameters ($\mu$ and $\sigma$) are not independent with respect to the mean and the standard deviation (SD).\nThe empirical SD increases when the *mean* $\mu$ increases [which is seen as a feature rather than a bug, as it is consistent with typical reaction time data, @wagenmakers2005relation].\n\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the fastest response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that this minimum response time for humans is often between 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n```{julia}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! 
-Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n\n```{julia}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n\n::: {.callout-note}\n\n### LogNormal distributions in nature\n\nThe reason why the Normal distribution is so ubiquitous in nature (and hence used as a good default) is due to the **Central Limit Theorem**, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature are the result of the *addition* of many random processes, the Normal distribution is very common in real life.\n\nHowever, it turns out that the multiplication of random variables results in a **LogNormal** distribution, and multiplicative (rather than additive) cascades of processes are also very common in nature, from lengths of latent periods of infectious diseases to the distribution of mineral resources in the Earth's crust, and the elemental mechanisms at stake in physics and cell biology [@limpert2001log].\n\nThus, using LogNormal distributions for RTs can be justified with the assumption that response times are the result of multiplicative stochastic processes happening in the brain.\n\n:::\n\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\mu$ and *sigma* $\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). \nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\mu$ and $\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. 
\nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\mu$) and dispersion (*sigma* $\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\tau$) is changed, the whole distribution appears to shift its centre of mass. \nHence, one should be careful not to interpret the values of *mu* $\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\tau$.\nAll we need is to set a prior on the intercept and the condition effect, and make sure that $\tau > 0$. \n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by ExGaussian Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThe ExGaussian model also provides an excellent fit to the data. 
\nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convolved with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variants, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! 
-Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative threshold values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_Wald; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n### Model Comparison\n\nAt this stage, given the multiple options available to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n```{julia}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\nThe `loo_compare()` function orders models from best to worst based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n\n\n\n## Shifted LogNormal Mixed Model\n\n![](https://img.shields.io/badge/status-good_for_contributing-blue)\n\nComplex example walkthrough.\nAdd Random effects.\n","srcMarkdownNoYaml":""},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":true,"freeze":"auto","echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"execute":true,"engine":"jupyter"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":true,"code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":false,"self-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","output-file":"4a_rt_descriptive.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other 
Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to 
High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.4.549","bibliography":["references.bib"],"julia":{"exeflags":["--project=@."]},"theme":"pulse","number-depth":3},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file +{"title":"Descriptive Models","markdown":{"headingText":"Descriptive Models","containsRefs":false,"markdown":"\n![](https://img.shields.io/badge/status-up_to_date-brightgreen)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n```{julia}\n#| code-fold: false\n\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, StatsFuns, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n```{julia}\n#| output: false\n\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n```{julia}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. 
(2018)\")\n```\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which requires the inclusion of random effects in the model).\nWe will treat the data as if it came from a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. \nIt aims at predicting the **mean** *mu* $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean *mu* $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance *sigma* $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n\n- **Sigma** $\\sigma$ : The variance (corresponding to the \"spread\" of RTs)\n- **Mu** $\\mu$ : The mean for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope) on the **mean** ($\\mu$) RT.\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Gaussian(rt; condition=nothing)\n\n # Prior on variance \n σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution\n\n # Priors on intercept and effect of condition\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n # Iterate through every observation\n for i in 1:length(rt)\n # Apply formula\n μ = μ_intercept + μ_condition * condition[i]\n # Likelihood family\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n# Fit the model with the data\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\n# Sample results using MCMC\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n\nThe effect of Condition is significant: people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n```{julia}\n#| output: false\n\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n\n```{julia}\n#| fig-width: 10\n#| fig-height: 7\n\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nAs you can see, the linear models are good at predicting the **mean RT** (the center of the distribution), but they are not good at capturing the **spread** and the **shape** of the data.\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the green distribution is also *wider* (i.e., the response time is more variable), which is not captured by our model (the predicted distributions have the same widths). 
\nThis is expected, as typical linear models estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances across different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a bigger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possibly negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at, for instance, **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nSuch an effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen, as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice, as we are enforcing a very strong a priori assumption about the direction of the effect, which is not always justified.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledGaussian = model_ScaledGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. 
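\n\nTo see why this truncated specification cannot produce an invalid sigma $\\sigma$, one can simulate draws from the priors and compute the implied $\\sigma$ in each condition: every draw is positive by construction. Below is a minimal prior-simulation sketch (illustrative only, reusing the two priors from the model above).\n\n```{julia}\n#| code-fold: false\n\n# Minimal sketch: prior draws of the implied σ in each condition\nσ_intercepts = rand(truncated(Normal(0, 0.5); lower=0), 10_000)\nσ_conditions = rand(truncated(Normal(0, 0.1); lower=0), 10_000)\nprintln(\n \"Smallest σ ('Speed'): \", round(minimum(σ_intercepts); digits=4),\n \"; smallest σ ('Accuracy'): \", round(minimum(σ_intercepts .+ σ_conditions); digits=4)\n)\n```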
\n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledGaussian = model_ScaledGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n\n### Solution 3: Use a \"Softplus\" Function\n\n#### Exponential Transformation\n\nUsing the previous solution feels like a \"hack\" and a workaround for a misspecified model.\nOne alternative approach is to apply a function to *sigma* $\\sigma$ to \"transform\" it into a positive value.\nWe have seen examples of applying \"non-identity\" **link functions** in the previous chapters with the **logistic** function that transforms any value between $-\\infty$ and $+\\infty$ into a value between 0 and 1.\n\nWhat function could we use to transform any values of *sigma* $\\sigma$ into strictly positive values?\nOne option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an \"exponential\" transformation to the parameter at the end.\n\nThe issue with the **log link** (i.e., expressing parameters on the log-scale and then transforming them using the exponential function) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) the interpretation of the parameters and effects is no longer linear, which adds to the complexity, and 3) normal priors on the log scale lead to a sharp peak in \"real\" values that can be problematic.\n\n\n```{julia}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ on the log scale\", ylabel=\"Actual value of σ\", title=\"Exponential function\")\nlines!(ax1, xaxis, exp.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ on the log scale\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after exponential transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, exp.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nxlims!(ax3, (-1, 40))\nfig\n```\n\n#### Softplus Function\n\nPopularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using].\nIt is defined as $softplus(x) = \\log(1 + \\exp(x))$ and its main benefit is to approximate an \"identity\" link for larger values (i.e., a linear relationship), only 
impacting negative values and values close to 0 (where the link is not linear).\n\n```{julia}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ before transformation\", ylabel=\"Actual value of σ\", title=\"Softplus function\")\nablines!(ax1, [0], [1], color=:black, linestyle=:dash)\nlines!(ax1, xaxis, softplus.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ before transformation\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after softplus transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, softplus.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nfig\n```\n\n#### The Model\n\nLet us apply the Softplus transformation (available from the `StatsFuns` package) to the sigma $\\sigma$ parameter.\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ScaledGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ Normal(0, 1)\n σ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, softplus(σ))\n end\nend\n\nfit_ScaledGaussian = model_ScaledGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledGaussian, NUTS(), 400)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote that one can call `softplus()` to transform the parameter back to the original scale, which can be useful for negative or small values (as for larger values, it becomes a 1-to-1 relationship).\n\n```{julia}\n#| code-fold: false\n\nσ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])\nσ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])\n\nprintln(\n \"σ for 'Speed' condition: \", round(softplus(σ_condition0); digits=4),\n \"; σ for 'Accuracy' condition: \", round(softplus(σ_condition1); digits=4)\n)\n```\n:::\n\n#### Conclusion\n\n```{julia}\npred = predict(model_ScaledGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n\nRelaxing the homoscedasticity assumption is a good step forward, allowing us to draw **richer conclusions** (e.g., the Accuracy condition leads to slower and **more variable** reaction times) and to better capture the data.\nDespite that, the Gaussian model still seems to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction times (RTs) have traditionally been modeled using linear models and their derived statistical tests, such as *t*-tests and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, with most responses clustered on the left and a long tail towards the right. 
This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. \nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformations, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of an RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternatives to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed. In this model, the *mean* $\\mu$ is defined on the log-scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of $\\exp(\\mu_{condition})$). \n\nNote that for LogNormal distributions (as is the case for many of the models introduced in the rest of the chapter), the distribution parameters ($\\mu$ and $\\sigma$) are not independent with respect to the mean and the standard deviation (SD).\nThe empirical SD increases when the *mean* $\\mu$ increases [which is seen as a feature rather than a bug, as it is consistent with typical reaction time data, @wagenmakers2005relation].\n\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the fastest response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal, as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response times for humans are often between 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times.\n\n```{julia}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. 
RT = 0.18 s\")\naxislegend()\nfig\n```\n\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n\n```{julia}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n\n::: {.callout-note}\n\n### LogNormal distributions in nature\n\nThe reason why the Normal distribution is so ubiquitous in nature (and hence used as a good default) is the **Central Limit Theorem**, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature are the result of the *addition* of many random processes, the Normal distribution is very common in real life.\n\nHowever, it turns out that the multiplication of random variables results in a **LogNormal** distribution, and multiplicative (rather than additive) cascades of processes are also very common in nature, from the lengths of latent periods of infectious diseases to the distribution of mineral resources in the Earth's crust, and the elemental mechanisms at stake in physics and cell biology [@limpert2001log].\n\nThus, using LogNormal distributions for RTs can be justified with the assumption that response times are the result of multiplicative stochastic processes happening in the brain.\n\n:::\n\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). 
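\n\nBecause the ExGaussian is literally a sum of independent Normal and Exponential components, it is easy to illustrate by simulation. Below is a minimal sketch with arbitrary parameter values (chosen for illustration only): adding the two kinds of draws reproduces the distribution, and the empirical mean lands close to the theoretical $\\mu + \\tau$.\n\n```{julia}\n# Minimal sketch: an ExGaussian variable as Normal + Exponential draws\nμ, σ, τ = 0.4, 0.05, 0.2\nx = rand(Normal(μ, σ), 10_000) .+ rand(Exponential(τ), 10_000)\nprintln(\"Empirical mean: \", round(mean(x); digits=3), \" (theoretical μ + τ = \", round(μ + τ; digits=3), \")\")\n```\n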
\nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift its centre of mass. \nHence, one should be careful not to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll we need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! 
-Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n\n### Interpretation\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by ExGaussian Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convolved with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variants, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n```{julia}\n#| code-fold: false\n#| output: false\n\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! 
-Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative threshold values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n\n```{julia}\n#| code-fold: false\n\nhpd(chain_Wald; alpha=0.05)\n```\n\n```{julia}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n### Model Comparison\n\nAt this stage, given the multiple options available to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n```{julia}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\nThe `loo_compare()` function orders models from best to worst based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n\n\n\n## Shifted LogNormal Mixed Model\n\n![](https://img.shields.io/badge/status-good_for_contributing-blue)\n\nComplex example walkthrough.\nAdd Random 
effects.\n","srcMarkdownNoYaml":""},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":true,"freeze":"auto","echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"execute":true,"engine":"jupyter"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":true,"code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":false,"self-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","output-file":"4a_rt_descriptive.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to 
search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.4.549","bibliography":["references.bib"],"julia":{"exeflags":["--project=@."]},"theme":"pulse","number-depth":3},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file diff --git a/content/.quarto/xref/04307669 b/content/.quarto/xref/04307669 index b466b0b..20fe9e1 100644 --- a/content/.quarto/xref/04307669 +++ b/content/.quarto/xref/04307669 @@ -1 +1 @@ -{"options":{"chapters":true},"headings":[],"entries":[]} \ No newline at end of file +{"entries":[],"options":{"chapters":true},"headings":[]} \ No newline at end of file diff --git a/content/.quarto/xref/1a47137c b/content/.quarto/xref/1a47137c index 1e1a761..bbf8a95 100644 --- a/content/.quarto/xref/1a47137c +++ b/content/.quarto/xref/1a47137c @@ -1 +1 @@ -{"options":{"chapters":true},"headings":["extracting-individual-parameters-from-mixed-models","population-informed-individual-bayesian-models","task-reliability"],"entries":[]} \ No newline at end of file +{"entries":[],"options":{"chapters":true},"headings":["extracting-individual-parameters-from-mixed-models","population-informed-individual-bayesian-models","task-reliability"]} \ No newline at end of file diff --git a/content/.quarto/xref/26afb962 b/content/.quarto/xref/26afb962 index c011504..daa2cb3 100644 --- a/content/.quarto/xref/26afb962 +++ b/content/.quarto/xref/26afb962 @@ -1 +1 @@ 
-{"entries":[],"headings":["categorical-predictors-condition-group","interactions","ordered-predictors-likert-scales","non-linear-relationships","polynomials","generalized-additive-models-gams"],"options":{"chapters":true}} \ No newline at end of file +{"headings":["categorical-predictors-condition-group","interactions","ordered-predictors-likert-scales","non-linear-relationships","polynomials","generalized-additive-models-gams"],"entries":[],"options":{"chapters":true}} \ No newline at end of file diff --git a/content/.quarto/xref/26e6880e b/content/.quarto/xref/26e6880e index c54c3ee..15e3d79 100644 --- a/content/.quarto/xref/26e6880e +++ b/content/.quarto/xref/26e6880e @@ -1 +1 @@ -{"options":{"chapters":true},"entries":[],"headings":["the-data","gaussian-aka-linear-model","model-specification","posterior-predictive-check","scaled-gaussian-model","solution-1-directional-effect-of-condition","solution-2-avoid-exploring-negative-variance-values","solution-3-use-a-softplus-link-function","exponential-transformation","softplus-function","the-model","conclusion","the-problem-with-linear-models","shifted-lognormal-model","prior-on-minimum-rt","model-specification-1","interpretation","exgaussian-model","conditional-tau-tau-parameter","interpretation-1","shifted-wald-model","model-specification-2","model-comparison","shifted-lognormal-mixed-model"]} \ No newline at end of file +{"entries":[],"headings":["the-data","gaussian-aka-linear-model","model-specification","posterior-predictive-check","scaled-gaussian-model","solution-1-directional-effect-of-condition","solution-2-avoid-exploring-negative-variance-values","solution-3-use-a-softplus-function","exponential-transformation","softplus-function","the-model","conclusion","the-problem-with-linear-models","shifted-lognormal-model","prior-on-minimum-rt","model-specification-1","interpretation","exgaussian-model","conditional-tau-tau-parameter","interpretation-1","shifted-wald-model","model-specification-2","model-comparison","shifted-lognormal-mixed-model"],"options":{"chapters":true}} \ No newline at end of file diff --git a/content/.quarto/xref/a408ff3e b/content/.quarto/xref/a408ff3e index e047f4e..4cca568 100644 --- a/content/.quarto/xref/a408ff3e +++ b/content/.quarto/xref/a408ff3e @@ -1 +1 @@ -{"headings":["evidence-accumulation","wald-distribution-revisited","drift-diffusion-model-ddm","linear-ballistic-accumulator-lba","other-models-lnr-rdm","including-random-effects","random-intercept","random-slopes","performance-tips","additional-resources"],"entries":[],"options":{"chapters":true}} \ No newline at end of file +{"headings":["evidence-accumulation","wald-distribution-revisited","drift-diffusion-model-ddm","linear-ballistic-accumulator-lba","other-models-lnr-rdm","including-random-effects","random-intercept","random-slopes","performance-tips","additional-resources"],"options":{"chapters":true},"entries":[]} \ No newline at end of file diff --git a/content/4a_rt_descriptive.qmd b/content/4a_rt_descriptive.qmd index 4e3b311..f20657d 100644 --- a/content/4a_rt_descriptive.qmd +++ b/content/4a_rt_descriptive.qmd @@ -245,7 +245,7 @@ hpd(chain_ScaledGaussian; alpha=0.05) ``` -### Solution 3: Use a "Softplus" Link Function +### Solution 3: Use a "Softplus" Function #### Exponential Transformation @@ -256,7 +256,7 @@ We have seen example of applying "non-identity" **link functions** in the previo What function could we use to transform any values of *sigma* $\sigma$ into stricly positive values? 
One option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an "exponential" transformation to the parameter at the end. -The issue with the **exponential link** (**TODO**: is it actually known as an "exponential" link or a "log" link?) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) The interpretation of the parameters and effects are not linear, which can add up to the complexity, and 3) normal priors on the log scale lead to a sharp peak in "real" values that can be problematic. +The issue with the **log link** (i.e., expressing parameters on the log-scale and then transforming them using the exponential function) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) The interpretation of the parameters and effects are not linear, which can add up to the complexity, and 3) normal priors on the log scale lead to a sharp peak in "real" values that can be problematic. ```{julia} @@ -276,7 +276,7 @@ fig #### Softplus Function Popularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using]. -It is defined as $softplus(x) = \log(1 + \exp(x))$ and its main benefit is to approximate an "identity" link (i.e., a linear relationship), only impacting the values close to 0 (where it is not linear) and negative values. +It is defined as $softplus(x) = \log(1 + \exp(x))$ and its main benefit is to approximate an "identity" link for larger values (i.e., a linear relationship), only impacting negative values and values close to 0 (where the link is not linear). ```{julia} xaxis = range(-6, 6, length=1000) diff --git a/content/_freeze/4a_rt_descriptive/execute-results/html.json b/content/_freeze/4a_rt_descriptive/execute-results/html.json index 6d2e3ff..8a6977a 100644 --- a/content/_freeze/4a_rt_descriptive/execute-results/html.json +++ b/content/_freeze/4a_rt_descriptive/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "4d15f6b80ee77fea4abab523e7f77583", + "hash": "36c8dcab189bdaad35482fdb168f699c", "result": { "engine": "jupyter", - "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-up_to_date-brightgreen)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, see @theriault2024check for a discussion on outlier removal].\n\n::: {#89f33007 .cell execution_count=1}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, StatsFuns, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#6ecfbcc8 .cell execution_count=2}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#3f4733f9 .cell execution_count=3}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. 
\nIt aims at predicting the **mean** *mu* $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean *mu* $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance *sigma* $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n\n- **Sigma** $\\sigma$ : The variance (corresponding to the \"spread\" of RTs)\n- **Mu** $\\mu$ : The mean for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope) on the **mean** ($\\mu$) RT.\n\n### Model Specification\n\n::: {#42b71893 .cell execution_count=4}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Prior on variance \n σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution\n\n # Priors on intercept and effect of condition\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n # Iterate through every observation\n for i in 1:length(rt)\n # Apply formula\n μ = μ_intercept + μ_condition * condition[i]\n # Likelihood family\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n# Fit the model with the data\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\n# Sample results using MCMC\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#f9c2bb90 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant, people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#3101f89d .cell execution_count=6}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#84537d0f .cell fig-height='7' fig-width='10' execution_count=7}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\nAs you can see, the linear models are good at predicting the **mean RT** (the center of the distribution), but they are not good at capturing the **spread** and the **shape** of the data.\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the green distribution is also *wider* (i.e., the response time is more variable), which is not captured by the our model (the predicted distributions have the same widths). \nThis is expected, as typical linear models estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances accross different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? 
In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a biggger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possible negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at for instance **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *Truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.\n\n::: {#4e9da49b .cell execution_count=8}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#0b8e6492 .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement when sigma $\\sigma$ is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (`-Inf`) to push away the exploration of this impossible region.\n\n::: {#767151f6 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Normal(μ, σ)\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#7c5ba912 .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n### Solution 3: Use a \"Softplus\" Link Function\n\n#### Exponential Transformation\n\nUsing the previous solution feels like a \"hack\" and a workaround a misspecified model.\nOne alternative approach is to apply a function to *sigma* $\\sigma$ to \"transform\" it into a positive value.\nWe have seen example of applying \"non-identity\" **link functions** in the previous chapters with the **logistic** function that transforms any value between $-\\infty$ and $+\\infty$ into a value between 0 and 1.\n\nWhat function could we use to transform any values of *sigma* $\\sigma$ into stricly positive values?\nOne option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an \"exponential\" transformation to the parameter at the end.\n\nThe issue with the **exponential link** (**TODO**: is it actually known as an \"exponential\" link or a \"log\" link?) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) The interpretation of the parameters and effects are not linear, which can add up to the complexity, and 3) normal priors on the log scale lead to a sharp peak in \"real\" values that can be problematic.\n\n::: {#059cb42b .cell execution_count=12}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ on the log scale\", ylabel=\"Actual value of σ\", title=\"Exponential function\")\nlines!(ax1, xaxis, exp.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ on the log scale\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after exponential transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, exp.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nxlims!(ax3, (-1, 40))\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. 
The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n#### Softplus Function\n\nPopularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using].\nIt is defined as $softplus(x) = \\log(1 + \\exp(x))$ and its main benefit is to approximate an \"identity\" link (i.e., a linear relationship), only impacting the values close to 0 (where it is not linear) and negative values.\n\n::: {#2f7b619d .cell execution_count=13}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ before transformation\", ylabel=\"Actual value of σ\", title=\"Softplus function\")\nablines!(ax1, [0], [1], color=:black, linestyle=:dash)\nlines!(ax1, xaxis, softplus.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ before transformation\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after softplus transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, softplus.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n#### The Model\n\nLet us apply the Softplus transformation (available from the `StatsFuns` package) to the sigma $\\sigma$ parameter.\n\n::: {#5d9f0735 .cell execution_count=14}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ Normal(0, 1)\n σ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, softplus(σ))\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fc5d4ae2 .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5082    0.5144\n  μ_condition    0.1317    0.1447\n  σ_intercept   -2.0420   -1.9979\n  σ_condition    0.4804    0.5405\n
\n```\n:::\n\n:::\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote that one can use call the `softplus()` to transform the parameter back to the original scale, which can be useful for negative or small values (as for larger values, it becomes a 1-to-1 relationship).\n\n::: {#c61fb9f3 .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nσ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])\nσ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])\n\nprintln(\n \"σ for 'Speed' condition: \", round(softplus(σ_condition0); digits=4),\n \"; σ for 'Accuracy' condition: \", round(softplus(σ_condition1); digits=4)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nσ for 'Speed' condition: 0.1245; σ for 'Accuracy' condition: 0.1999\n```\n:::\n:::\n\n\n:::\n\n#### Conclusion\n\n::: {#50ba1813 .cell execution_count=17}\n``` {.julia .cell-code}\npred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n\n```\n:::\n:::\n\n\nAlthough relaxing the homoscedasticity assumption is a good step forward, allowing us to make **richer conclusions** (e.g., the Accuracy condition leads to slower and **more variable** reaction times) and better capturing the data.\nDespite that, the Gaussian model still seem to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction time (RTs) have been traditionally modeled using traditional linear models and their derived statistical tests such as *t*-test and ANOVAs. Importantly, linear models - by definition - will try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal distribution, skewed towards the left with a long tail towards the right. This means that the parameters of a Normal distribution (mean $\\mu$ and standard deviation $\\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*. 
\nHowever, this practice should be avoided as it leads to various issues, including loss of power and distorted results interpretation [@lo2015transform; @schramm2019reaction].\nInstead, rather than applying arbitrary data transformation, it would be better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of a RT distribution.\n\n\n## Shifted LogNormal Model\n\nOne of the obvious candidate alternative to the log-transformation would be to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed. In this model, the *mean* $\\mu$ and is defined on the log-scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of $\\exp(\\mu_{condition})$). \n\nNote that for LogNormal distributions (as it is the case for many of the models introduced in the rest of the capter), the distribution parameters ($\\mu$ and $\\sigma$) are not independent with respect to the mean and the standard deviation (SD).\nThe empirical SD increases when the *mean* $\\mu$ increases [which is seen as a feature rather than a bug, as it is consistent with typical reaction time data, @wagenmakers2005relation].\n\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for response must be lower than the faster response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that such minimum response time for Humans is often betwen 100 and 250 ms. \nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times. \n\n::: {#a06ec458 .cell execution_count=18}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. 
The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=19}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#547bfe72 .cell execution_count=19}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n μ_condition ~ Normal(0, exp(0.3))\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#618db966 .cell execution_count=20}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1721    0.1794\n  μ_intercept   -1.1584   -1.1274\n  μ_condition    0.3123    0.3407\n  σ_intercept    0.3073    0.3232\n  σ_condition    0.0326    0.0523\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0b3676c3 .cell execution_count=21}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=22}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n\n::: {.callout-note}\n\n### LogNormal distributions in nature\n\nThe reason why the Normal distribution is so ubiquituous in nature (and hence used as a good default) is due to the **Central Limit Theorem**, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature are the result of the *addition* of many random processes, the Normal distribution is very common in real life.\n\nHowever, it turns out that the multiplication of random variables result in a **LogNormal** distribution, and multiplicating (rather than additive) cascades of processes are also very common in nature, from lengths of latent periods of infectious diseases to distribution of mineral resources in the Earth's crust, and the elemental mechanisms at stakes in physics and cell biolody [@limpert2001log].\n\nThus, using LogNormal distributions for RTs can be justified with the assumption that response times are the result of multiplicative stochastic processes happening in the brain.\n\n:::\n\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\\mu$ and *sigma* $\\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). 
\nIntuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\\mu$ and $\\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. \nBelow is an example of different distributions with the same location (*mu* $\\mu$) and dispersion (*sigma* $\\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\\tau$) is changed, the whole distribution appears to shift is centre of mass. \nHence, one should be careful note to interpret the values of *mu* $\\mu$ directly as the \"mean\" or the distribution peak and *sigma* $\\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n\n### Conditional Tau $\\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\\sigma$, we can do the same for any other parameters, including the exponential component *tau* $\\tau$.\nAll wee need is to set a prior on the intercept and the condition effect, and make sure that $\\tau > 0$. \n\n::: {#a56080b3 .cell execution_count=22}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n # Priors \n μ_intercept ~ Normal(0, 1) \n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n σ_condition ~ Normal(0, 0.1)\n\n τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n τ_condition ~ Normal(0, 0.1)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n if σ < 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ <= 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ ExGaussian(μ, σ, τ)\n end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#8fa3f407 .cell execution_count=23}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.4002    0.4059\n  μ_condition    0.0627    0.0733\n  σ_intercept    0.0379    0.0424\n  σ_condition    0.0101    0.0186\n  τ_intercept    0.1047    0.1126\n  τ_condition    0.0631    0.0788\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0074e53b .cell execution_count=24}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_ExGaussian)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=25}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\\mu$ and a diffusion rate $\\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convoluted with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variant, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by a \"location\" and \"scale\" parameters (*mu* $\\mu$ and *sigma* $\\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). 
A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n::: {#58b03fce .cell execution_count=25}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n # Priors \n ν_intercept ~ truncated(Normal(1, 3); lower=0)\n ν_condition ~ Normal(0, 1)\n\n α_intercept ~ truncated(Normal(0, 1); lower=0)\n α_condition ~ Normal(0, 0.5)\n\n τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n τ_condition ~ Normal(0, 0.01)\n\n for i in 1:length(rt)\n ν = ν_intercept + ν_condition * condition[i]\n if ν <= 0 # Avoid negative drift\n Turing.@addlogprob! -Inf\n return nothing\n end\n α = α_intercept + α_condition * condition[i]\n if α <= 0 # Avoid negative variance values\n Turing.@addlogprob! -Inf\n return nothing\n end\n τ = τ_intercept + τ_condition * condition[i]\n if τ < 0 # Avoid negative tau values\n Turing.@addlogprob! -Inf\n return nothing\n end\n rt[i] ~ Wald(ν, α, τ)\n end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#9210576b .cell execution_count=26}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=27}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0817    5.3248\n  ν_condition   -1.3539   -1.0550\n  α_intercept    1.6585    1.7402\n  α_condition    0.2191    0.3506\n  τ_intercept    0.1818    0.1870\n  τ_condition   -0.0363   -0.0233\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#f1f511cf .cell execution_count=27}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=28}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#b9a9f998 .cell execution_count=28}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledlGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n Gaussian = loo_Gaussian, \n ScaledGaussian = loo_ScaledGaussian, \n LogNormal = loo_LogNormal, \n ExGaussian = loo_ExGaussian, \n Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=29}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│ │ cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│ ExGaussian │ 0.00 │ 0.00 │ 1.00 │\n│ LogNormal │ -323.67 │ -0.03 │ 0.00 │\n│ Wald │ -381.68 │ -0.04 │ 0.00 │\n│ ScaledGaussian │ -2466.17 │ -0.26 │ 0.00 │\n│ Gaussian │ -2974.91 │ -0.31 │ 0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n\n\n\n## Shifted LogNormal Mixed Model\n\n![](https://img.shields.io/badge/status-good_for_contributing-blue)\n\nComplex example walkthrough.\nAdd Random effects.\n\n", + "markdown": "# Descriptive Models\n\n![](https://img.shields.io/badge/status-up_to_date-brightgreen)\n\n## The Data\n\nFor this chapter, we will be using the data from @wagenmakers2008diffusion - Experiment 1 [also reanalyzed by @heathcote2012linear], that contains responses and response times for several participants in two conditions (where instructions emphasized either **speed** or **accuracy**).\nUsing the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow [>2 sec instead of >3 sec, 
see @theriault2024check for a discussion on outlier removal].\n\n::: {#89f33007 .cell execution_count=2}\n``` {.julia .cell-code code-fold=\"false\"}\nusing Downloads, CSV, DataFrames, Random\nusing Turing, Distributions, StatsFuns, SequentialSamplingModels\nusing GLMakie\n\nRandom.seed!(123) # For reproducibility\n\ndf = CSV.read(Downloads.download(\"https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv\"), DataFrame)\n\n# Show 10 first rows\nfirst(df, 10)\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
10×5 DataFrame
RowParticipantConditionRTErrorFrequency
Int64String15Float64BoolString15
11Speed0.7falseLow
21Speed0.392trueVery Low
31Speed0.46falseVery Low
41Speed0.455falseVery Low
51Speed0.505trueLow
61Speed0.773falseHigh
71Speed0.39falseHigh
81Speed0.587trueLow
91Speed0.603falseLow
101Speed0.435falseHigh
\n```\n:::\n:::\n\n\nIn the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the `\"Speed\"` condition. \nBut how about speed? We are going to first take interest in the RT of **Correct** answers only (as we can assume that errors are underpinned by a different *generative process*). \n\nAfter filtering out the errors, we create a new column, `Accuracy`, which is the \"binarization\" of the `Condition` column, and is equal to 1 when the condition is `\"Accuracy\"` and 0 when it is `\"Speed\"`.\n\n::: {#6ecfbcc8 .cell execution_count=3}\n``` {.julia .cell-code}\ndf = df[df.Error .== 0, :]\ndf.Accuracy = df.Condition .== \"Accuracy\"\n```\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote the usage of *vectorization* `.==` as we want to compare each element of the `Condition` vector to the target `\"Accuracy\"`.\n:::\n\n::: {#3f4733f9 .cell execution_count=4}\n``` {.julia .cell-code}\nfunction plot_distribution(df, title=\"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n fig = Figure()\n ax = Axis(fig[1, 1], title=title,\n xlabel=\"RT (s)\",\n ylabel=\"Distribution\",\n yticksvisible=false,\n xticksvisible=false,\n yticklabelsvisible=false)\n Makie.density!(df[df.Condition .== \"Speed\", :RT], color=(\"#EF5350\", 0.7), label = \"Speed\")\n Makie.density!(df[df.Condition .== \"Accuracy\", :RT], color=(\"#66BB6A\", 0.7), label = \"Accuracy\")\n Makie.axislegend(\"Condition\"; position=:rt)\n Makie.ylims!(ax, (0, nothing))\n return fig\nend\n\nplot_distribution(df, \"Empirical Distribution of Data from Wagenmakers et al. (2018)\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n\n```\n:::\n:::\n\n\n## Gaussian (aka *Linear*) Model\n\n::: {.callout-note}\nNote that until the last section of this chapter, we will disregard the existence of multiple participants (which require the inclusion of random effects in the model).\nWe will treat the data as if it was a single participant at first to better understand the parameters, but will show how to add random effects at the end.\n:::\n\nA linear model is the most common type of model. 
\nIt aims at predicting the **mean** *mu* $\\mu$ of the outcome variable using a **Normal** (aka *Gaussian*) distribution for the residuals.\nIn other words, it models the outcome $y$ as a Normal distribution with a mean *mu* $\\mu$ that is itself the result of a linear function of the predictors $X$ and a variance *sigma* $\\sigma$ that is constant across all values of the predictors.\nIt can be written as $y = Normal(\\mu, \\sigma)$, where $\\mu = intercept + slope * X$.\n\nIn order to fit a Linear Model for RTs, we need to set a prior on all these parameters, namely:\n\n- **Sigma** $\\sigma$ : The variance (corresponding to the \"spread\" of RTs)\n- **Mu** $\\mu$ : The mean for the intercept (i.e., at the reference condition which is in our case `\"Speed\"`)\n- The effect of the condition (the slope) on the **mean** ($\\mu$) RT.\n\n### Model Specification\n\n::: {#42b71893 .cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Gaussian(rt; condition=nothing)\n\n # Prior on variance \n σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution\n\n # Priors on intercept and effect of condition\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n # Iterate through every observation\n for i in 1:length(rt)\n # Apply formula\n μ = μ_intercept + μ_condition * condition[i]\n # Likelihood family\n rt[i] ~ Normal(μ, σ)\n end\nend\n\n# Fit the model with the data\nfit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)\n# Sample results using MCMC\nchain_Gaussian = sample(fit_Gaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#f9c2bb90 .cell execution_count=6}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_Gaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            σ    0.1652    0.1701\n  μ_intercept    0.5071    0.5168\n  μ_condition    0.1319    0.1457\n
\n```\n:::\n\n:::\n:::\n\n\nThe effect of Condition is significant: people are on average slower (higher RT) when condition is `\"Accuracy\"`.\nBut is our model good?\n\n### Posterior Predictive Check\n\n::: {#3101f89d .cell execution_count=7}\n``` {.julia .cell-code}\npred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)\npred = Array(pred)\n```\n:::\n\n\n::: {#84537d0f .cell fig-height='7' fig-width='10' execution_count=8}\n``` {.julia .cell-code}\nfig = plot_distribution(df, \"Predictions made by Gaussian (aka Linear) Model\")\nfor i in 1:length(chain_Gaussian)\n    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n\n```\n:::\n:::\n\n\nAs you can see, the linear models are good at predicting the **mean RT** (the center of the distribution), but they are not good at capturing the **spread** and the **shape** of the data.\n\n## Scaled Gaussian Model\n\nThe previous model, despite its poor fit to the data, suggests that the mean RT is higher for the `Accuracy` condition. But it seems like the green distribution is also *wider* (i.e., the response time is more variable), which is not captured by our model (the predicted distributions have the same widths). \nThis is expected, as typical linear models estimate only one value for sigma $\\sigma$ for the whole model, hence the requirement for **homoscedasticity**.\n\n::: {.callout-note}\n**Homoscedasticity**, or homogeneity of variances, is the assumption of similar variances across different values of predictors. \nIt is important in linear models as only one value for sigma $\\sigma$ is estimated.\n:::\n\nIs it possible to set sigma $\\sigma$ as a parameter that would depend on the condition, in the same way as mu $\\mu$? 
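\nTo make this concrete before writing the model, here is a minimal arithmetic sketch of what \"sigma $\\sigma$ depends on the condition\" means (the values below are made up for illustration and are not estimates from the data):\n\n``` {.julia}\n# Hypothetical values, for illustration only (not estimated from the data)\nσ_intercept = 0.12  # σ in the reference condition (\"Speed\")\nσ_condition = 0.07  # change in σ for the \"Accuracy\" condition\n\n# Just like μ, σ is computed for each observation from its condition (0 or 1)\nσ_speed = σ_intercept + σ_condition * 0  # = 0.12\nσ_accuracy = σ_intercept + σ_condition * 1  # = 0.19\nprintln(\"σ (Speed): \", σ_speed, \"; σ (Accuracy): \", σ_accuracy)\n```\n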
In Julia, this is very simple.\n\nAll we need is to set sigma $\\sigma$ as the result of a linear function, such as $\\sigma = intercept + slope * condition$.\nThis means setting a prior on the intercept of sigma $\\sigma$ (in our case, the variance in the reference condition) and a prior on how much this variance changes for the other condition.\nThis change can, by definition, be positive or negative (i.e., the other condition can have either a bigger or a smaller variance), so the prior over the effect of condition should ideally allow for positive and negative values (e.g., `σ_condition ~ Normal(0, 0.1)`).\n\nBut this leads to an **important problem**.\n\n::: {.callout-important}\nThe combination of an intercept and a (possibly negative) slope for sigma $\\sigma$ technically allows for negative variance values, which is impossible (distributions cannot have a negative variance).\nThis issue is one of the most important to address when setting up complex models for RTs.\n:::\n\nIndeed, even if we set a very narrow prior on the intercept of sigma $\\sigma$ to fix it at, for instance, **0.14**, and a narrow prior on the effect of condition, say $Normal(0, 0.001)$, an effect of condition of **-0.15** is still possible (albeit with very low probability). \nAnd such an effect would lead to a sigma $\\sigma$ of **0.14 - 0.15 = -0.01**, which would lead to an error (and this will often happen, as the sampling process does explore unlikely regions of the parameter space).\n\n\n### Solution 1: Directional Effect of Condition\n\nOne possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by *truncating* the prior to a lower bound of 0. \nThis can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn't the case, we could have changed the reference factor.\nHowever, this is not a good practice as we are enforcing a very strong a priori constraint on the direction of the effect, which is not always justified.\n\n::: {#4e9da49b .cell execution_count=9}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n    # Priors\n    μ_intercept ~ truncated(Normal(0, 1); lower=0)\n    μ_condition ~ Normal(0, 0.3)\n\n    σ_intercept ~ truncated(Normal(0, 0.5); lower=0) # Same prior as previously\n    σ_condition ~ truncated(Normal(0, 0.1); lower=0) # Enforce positivity\n\n    for i in 1:length(rt)\n        μ = μ_intercept + μ_condition * condition[i]\n        σ = σ_intercept + σ_condition * condition[i]\n        rt[i] ~ Normal(μ, σ)\n    end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#0b8e6492 .cell execution_count=10}\n``` {.julia .cell-code code-fold=\"false\"}\n# Summary (95% CI)\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5081    0.5148\n  μ_condition    0.1330    0.1446\n  σ_intercept    0.1219    0.1271\n  σ_condition    0.0714    0.0810\n
\n```\n:::\n\n:::\n:::\n\n\nWe can see that the effect of condition on sigma $\\sigma$ is significantly positive: the variance is higher in the `Accuracy` condition as compared to the `Speed` condition. \n\n### Solution 2: Avoid Exploring Negative Variance Values\n\nThe other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma $\\sigma$ < 0).\nThis can be done by adding a conditional statement that, whenever sigma $\\sigma$ is negative, returns an infinitely low model probability (`-Inf`) instead of attempting to evaluate the likelihood (which would error), pushing the sampler away from this impossible region.\n\n::: {#767151f6 .cell execution_count=11}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n    # Priors\n    μ_intercept ~ truncated(Normal(0, 1); lower=0)\n    μ_condition ~ Normal(0, 0.3)\n\n    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n    σ_condition ~ Normal(0, 0.1)\n\n    for i in 1:length(rt)\n        μ = μ_intercept + μ_condition * condition[i]\n        σ = σ_intercept + σ_condition * condition[i]\n        if σ < 0 # Avoid negative variance values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        rt[i] ~ Normal(μ, σ)\n    end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#7c5ba912 .cell execution_count=12}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5076    0.5148\n  μ_condition    0.1316    0.1444\n  σ_intercept    0.1223    0.1273\n  σ_condition    0.0709    0.0803\n
\n```\n:::\n\n:::\n:::\n\n\n### Solution 3: Use a \"Softplus\" Function\n\n#### Exponential Transformation\n\nUsing the previous solution feels like a \"hack\" and a workaround for a misspecified model.\nOne alternative approach is to apply a function to *sigma* $\\sigma$ to \"transform\" it into a positive value.\nWe have seen examples of applying \"non-identity\" **link functions** in the previous chapters with the **logistic** function that transforms any value between $-\\infty$ and $+\\infty$ into a value between 0 and 1.\n\nWhat function could we use to transform any values of *sigma* $\\sigma$ into strictly positive values?\nOne option that has been used is to express the parameter on the log-scale (which can include negative values) for priors and effects, and apply an \"exponential\" transformation to the parameter at the end.\n\nThe issue with the **log link** (i.e., expressing parameters on the log-scale and then transforming them using the exponential function) is that 1) it quickly generates very big numbers (which can slow down sampling), 2) the interpretation of the parameters and effects is no longer linear, which adds to the complexity, and 3) normal priors on the log scale lead to a sharp peak in \"real\" values that can be problematic.\n\n::: {#059cb42b .cell execution_count=13}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ on the log scale\", ylabel=\"Actual value of σ\", title=\"Exponential function\")\nlines!(ax1, xaxis, exp.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ on the log scale\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after exponential transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, exp.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nxlims!(ax3, (-1, 40))\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. 
The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n\n```\n:::\n:::\n\n\n#### Softplus Function\n\nPopularized by the machine learning field, the **Softplus** function is an interesting alternative [see @wiemann2023using].\nIt is defined as $softplus(x) = \\log(1 + \\exp(x))$ and its main benefit is to approximate an \"identity\" link for larger values (i.e., a linear relationship), only impacting negative values and values close to 0 (where the link is not linear).\n\n::: {#2f7b619d .cell execution_count=14}\n``` {.julia .cell-code}\nxaxis = range(-6, 6, length=1000)\n\nfig = Figure()\nax1 = Axis(fig[1:2, 1], xlabel=\"Value of σ before transformation\", ylabel=\"Actual value of σ\", title=\"Softplus function\")\nablines!(ax1, [0], [1], color=:black, linestyle=:dash)\nlines!(ax1, xaxis, softplus.(xaxis), color=:red, linewidth=2)\nax2 = Axis(fig[1, 2], xlabel=\"Value of σ before transformation\", ylabel=\"Plausibility\", title=\"Prior for σ ~ Normal(0, 1)\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax2, xaxis, pdf.(Normal(0, 1), xaxis), color=:blue, linewidth=2)\nax3 = Axis(fig[2, 2], xlabel=\"Value of σ after softplus transformation\", ylabel=\"Plausibility\", yticksvisible=false, yticklabelsvisible=false,)\nlines!(ax3, softplus.(xaxis), pdf.(Normal(0, 1), xaxis), color=:green, linewidth=2)\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n:::\n:::\n\n\n#### The Model\n\nLet us apply the Softplus transformation (available from the `StatsFuns` package) to the sigma $\\sigma$ parameter.\n\n::: {#5d9f0735 .cell execution_count=15}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ScaledlGaussian(rt; condition=nothing)\n\n # Priors\n μ_intercept ~ truncated(Normal(0, 1); lower=0)\n μ_condition ~ Normal(0, 0.3)\n\n σ_intercept ~ Normal(0, 1)\n σ_condition ~ Normal(0, 0.3)\n\n for i in 1:length(rt)\n μ = μ_intercept + μ_condition * condition[i]\n σ = σ_intercept + σ_condition * condition[i]\n rt[i] ~ Normal(μ, softplus(σ))\n end\nend\n\nfit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)\nchain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)\n```\n:::\n\n\n::: {#fc5d4ae2 .cell execution_count=16}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ScaledGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.5082    0.5144\n  μ_condition    0.1317    0.1447\n  σ_intercept   -2.0420   -1.9979\n  σ_condition    0.4804    0.5405\n
\n```\n:::\n\n:::\n:::\n\n\n::: {.callout-tip title=\"Code Tip\"}\nNote that one can call the `softplus()` function to transform the parameter back to the original scale, which matters mostly for negative or small values (for larger values, the transformation is essentially 1-to-1).\n\n::: {#c61fb9f3 .cell execution_count=17}\n``` {.julia .cell-code code-fold=\"false\"}\nσ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])\nσ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])\n\nprintln(\n    \"σ for 'Speed' condition: \", round(softplus(σ_condition0); digits=4),\n    \"; σ for 'Accuracy' condition: \", round(softplus(σ_condition1); digits=4)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nσ for 'Speed' condition: 0.1245; σ for 'Accuracy' condition: 0.1999\n```\n:::\n:::\n\n\n:::\n\n#### Conclusion\n\n::: {#50ba1813 .cell execution_count=18}\n``` {.julia .cell-code}\npred = predict(model_ScaledGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ScaledGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Scaled Gaussian Model\")\nfor i in 1:length(chain_ScaledGaussian)\n    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n\n```\n:::\n:::\n\n\nRelaxing the homoscedasticity assumption is a good step forward, allowing us to draw **richer conclusions** (e.g., that the Accuracy condition leads to slower and **more variable** reaction times) and to better capture the data.\nDespite this, the Gaussian model still seems to be a poor fit to the data.\n\n## The Problem with Linear Models\n\nReaction times (RTs) have traditionally been modeled using linear models and their derived statistical tests, such as *t*-tests and ANOVAs. Importantly, linear models - by definition - try to predict the *mean* of the outcome variable by estimating the \"best fitting\" *Normal* distribution. In the context of RTs, this is not ideal, as RTs typically exhibit a non-normal, right-skewed distribution, with most values concentrated on the left and a long tail towards the right. This means that the parameters of a Normal distribution (mean $\mu$ and standard deviation $\sigma$) are not good descriptors of the data.\n\n![](media/rt_normal.gif)\n\n> Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).\n
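\nA quick way to see this asymmetry in our own data is to compare the mean and the median: in a right-skewed distribution, the long tail pulls the mean above the median (a small check, assuming the `mean` and `median` functions exported by the statistics packages loaded above):\n\n```julia\n# In right-skewed data, the mean exceeds the median:\nmean(df.RT), median(df.RT)\n```\n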
\nA popular mitigation method to account for the non-normality of RTs is to transform the data, using for instance the popular *log-transform*.\nHowever, this practice should be avoided, as it leads to various issues, including a loss of power and a distorted interpretation of the results [@lo2015transform; @schramm2019reaction].\nInstead of applying an arbitrary transformation to the data, it is better to swap the Normal distribution used by the model for a more appropriate one that can better capture the characteristics of RT distributions.\n\n\n## Shifted LogNormal Model\n\nOne obvious candidate alternative to the log-transformation is to use a model with a Log-transformed Normal distribution.\nA LogNormal distribution is a distribution of a random variable whose logarithm is normally distributed. In this model, the *mean* $\mu$ is defined on the log-scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of $\exp(\mu_{condition})$). \n\nNote that for LogNormal distributions (as is the case for many of the models introduced in the rest of the chapter), the distribution parameters ($\mu$ and $\sigma$) are not independent with respect to the mean and the standard deviation (SD).\nThe empirical SD increases when the *mean* $\mu$ increases [which is seen as a feature rather than a bug, as it is consistent with typical reaction time data, @wagenmakers2005relation].\n\nA **Shifted** LogNormal model introduces a shift (a delay) parameter *tau* $\tau$ that corresponds to the minimum \"starting time\" of the response process.\n\nWe need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum delay for a response must be lower than the fastest response actually observed).\n\nWhile $Uniform(0, min(RT))$ is a common choice of prior, it is not ideal, as it implies that all values between 0 and the minimum RT are equally likely, which is not the case.\nIndeed, psychology research has shown that the minimum response time in humans typically lies between 100 and 250 ms.\nMoreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.\n\n### Prior on Minimum RT\n\nInstead of a $Uniform$ prior, we will use a $Gamma(1.1, 11)$ distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times.\n
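\nAs a quick sanity check of what this prior implies (a sketch; the values follow from the definitions above), one can inspect a few quantiles of the truncated distribution:\n\n```julia\n# Once truncated at the minimum observed RT, the Gamma(1.1, 11) prior\n# gives low plausibility to values near 0 and steadily more toward the bound:\nprior = truncated(Gamma(1.1, 11); upper=minimum(df.RT))\n[quantile(prior, p) for p in (0.05, 0.50, 0.95)]\n```\n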
\n::: {#a06ec458 .cell execution_count=19}\n``` {.julia .cell-code}\nxaxis = range(0, 0.3, 1000)\nfig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label=\"Gamma(1.1, 11)\")\nvlines!([minimum(df.RT)]; color=\"red\", linestyle=:dash, label=\"Min. RT = 0.18 s\")\naxislegend()\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=19}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Specification\n\n::: {#547bfe72 .cell execution_count=20}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)\n\n    # Priors\n    τ ~ truncated(Gamma(1.1, 11); upper=min_rt)\n\n    μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds\n    μ_condition ~ Normal(0, exp(0.3))\n\n    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n    σ_condition ~ Normal(0, 0.1)\n\n    for i in 1:length(rt)\n        μ = μ_intercept + μ_condition * condition[i]\n        σ = σ_intercept + σ_condition * condition[i]\n        if σ < 0 # Avoid negative SD values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        rt[i] ~ ShiftedLogNormal(μ, σ, τ)\n    end\nend\n\nfit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)\nchain_LogNormal = sample(fit_LogNormal, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#618db966 .cell execution_count=21}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_LogNormal; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n            τ    0.1721    0.1794\n  μ_intercept   -1.1584   -1.1274\n  μ_condition    0.3123    0.3407\n  σ_intercept    0.3073    0.3232\n  σ_condition    0.0326    0.0523\n
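\n```\n:::\n\n:::\n:::\n\n\nSince $\mu$ is expressed on the log-scale, the condition effect is easiest to read after exponentiating it (a quick computation using the HPD bounds from the table above):\n\n```julia\n# μ_condition ≈ [0.31, 0.34] on the log-scale corresponds to a\n# multiplicative effect on the part of the RT above the shift τ:\nexp(0.3123), exp(0.3407)  # ≈ (1.37, 1.41), i.e., roughly 37-41% slower\n```\n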
\n::: {#0b3676c3 .cell execution_count=22}\n``` {.julia .cell-code}\npred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted LogNormal Model\")\nfor i in 1:length(chain_LogNormal)\n    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=22}\n```{=html}\n\n```\n:::\n:::\n\n\nThis model provides a much better fit to the data, and confirms that the `Accuracy` condition is associated with higher RTs and higher variability (i.e., a larger distribution width).\n\n\n::: {.callout-note}\n\n### LogNormal distributions in nature\n\nThe reason why the Normal distribution is so ubiquitous in nature (and hence used as a good default) is the **Central Limit Theorem**, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature are the result of the *addition* of many random processes, the Normal distribution is very common in real life.\n\nHowever, it turns out that the multiplication of random variables results in a **LogNormal** distribution, and multiplicative (rather than additive) cascades of processes are also very common in nature, from the lengths of latent periods of infectious diseases to the distribution of mineral resources in the Earth's crust, and the elementary mechanisms at stake in physics and cell biology [@limpert2001log].\n\nThus, using LogNormal distributions for RTs can be justified by the assumption that response times are the result of multiplicative stochastic processes happening in the brain.\n\n:::\n\n\n## ExGaussian Model\n\nAnother popular model to describe RTs uses the **ExGaussian** distribution, i.e., the *Exponentially-modified Gaussian* distribution [@balota2011moving; @matzke2009psychological].\n\nThis distribution is a convolution of normal and exponential distributions and has three parameters, namely *mu* $\mu$ and *sigma* $\sigma$ - the mean and standard deviation of the Gaussian distribution - and *tau* $\tau$ - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution).\nIntuitively, these parameters reflect the centrality, the width, and the tail dominance, respectively.\n\n![](media/rt_exgaussian.gif)\n\n\nBeyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of **cognitive mechanisms**, arguing for instance that changes in the Gaussian components ($\mu$ and $\sigma$) reflect changes in attentional processes [e.g., \"the time required for organization and execution of the motor response\"; @hohle1965inferred], whereas changes in the exponential component ($\tau$) reflect changes in intentional (i.e., decision-related) processes [@kieffaber2006switch]. \nHowever, @matzke2009psychological demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as **descriptive tools**, rather than models of cognition *per se*.\n\nDescriptively, the three parameters can be interpreted as:\n\n- **Mu** $\mu$ : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.\n- **Sigma** $\sigma$ : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.\n- **Tau** $\tau$ : Tail weight / skewness of the distribution.\n\n::: {.callout-important}\nNote that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD (see the numeric check below). \nBelow is an example of different distributions with the same location (*mu* $\mu$) and dispersion (*sigma* $\sigma$) parameters.\nAlthough only the tail weight parameter (*tau* $\tau$) is changed, the whole distribution appears to shift its centre of mass. \nHence, one should be careful not to interpret the values of *mu* $\mu$ directly as the \"mean\" or the distribution peak, and *sigma* $\sigma$ as the SD or the \"width\".\n:::\n\n![](media/rt_exgaussian2.gif)\n
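\nThis non-independence is easy to check numerically: assuming the standard parameterization in which $\tau$ is the mean of the exponential component, the distribution's mean is $\mu + \tau$ and its SD is $\sqrt{\sigma^2 + \tau^2}$, so changing $\tau$ alone moves both (the values below are illustrative, not fitted estimates):\n\n```julia\n# Illustrative (non-fitted) parameter values:\nμ, σ, τ = 0.4, 0.05, 0.2\nμ + τ, sqrt(σ^2 + τ^2)  # mean ≈ 0.6, SD ≈ 0.21: both driven up by τ\n```\n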
\n### Conditional Tau $\tau$ Parameter\n\nIn the same way as we modeled the effect of the condition on the variance component *sigma* $\sigma$, we can do the same for any other parameter, including the exponential component *tau* $\tau$.\nAll we need is to set a prior on the intercept and the condition effect, and make sure that $\tau > 0$. \n\n::: {#a56080b3 .cell execution_count=23}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_ExGaussian(rt; condition=nothing)\n\n    # Priors\n    μ_intercept ~ Normal(0, 1)\n    μ_condition ~ Normal(0, 0.3)\n\n    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n    σ_condition ~ Normal(0, 0.1)\n\n    τ_intercept ~ truncated(Normal(0, 0.5); lower=0)\n    τ_condition ~ Normal(0, 0.1)\n\n    for i in 1:length(rt)\n        μ = μ_intercept + μ_condition * condition[i]\n        σ = σ_intercept + σ_condition * condition[i]\n        if σ < 0 # Avoid negative SD values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        τ = τ_intercept + τ_condition * condition[i]\n        if τ <= 0 # Avoid non-positive tau values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        rt[i] ~ ExGaussian(μ, σ, τ)\n    end\nend\n\nfit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)\nchain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)\n```\n:::\n\n\n### Interpretation\n\n::: {#8fa3f407 .cell execution_count=24}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_ExGaussian; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  μ_intercept    0.4002    0.4059\n  μ_condition    0.0627    0.0733\n  σ_intercept    0.0379    0.0424\n  σ_condition    0.0101    0.0186\n  τ_intercept    0.1047    0.1126\n  τ_condition    0.0631    0.0788\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#0074e53b .cell execution_count=25}\n``` {.julia .cell-code}\npred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by ExGaussian Model\")\nfor i in 1:length(chain_ExGaussian)\n    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=25}\n```{=html}\n\n```\n:::\n:::\n\n\nThe ExGaussian model also provides an excellent fit to the data. \nMoreover, by modeling more parameters (including *tau* $\tau$), we can draw more nuanced conclusions.\nIn this case, the `Accuracy` condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).\n\n## Shifted Wald Model\n\nThe **Wald** distribution, also known as the **Inverse Gaussian** distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate $\mu$ and a diffusion rate $\sigma$.\nWhile we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convolved with an *exponential* distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution).\nHowever, this **Ex-Wald** model [@schwarz2001ex] was shown to be less appropriate than one of its variants, the **Shifted Wald** distribution [@heathcote2004fitting; @anders2016shifted].\n\nNote that the Wald distribution, similarly to the models that we will be covering next (the \"generative\" models), is different from the previous distributions in that it is not characterized by \"location\" and \"scale\" parameters (*mu* $\mu$ and *sigma* $\sigma$).\nInstead, the parameters of the Shifted Wald distribution are:\n\n- **Nu** $\nu$ : A **drift** parameter, corresponding to the strength of the evidence accumulation process.\n- **Alpha** $\alpha$ : A **threshold** parameter, corresponding to the amount of evidence required to make a decision.\n- **Tau** $\tau$ : A **delay** parameter, corresponding to the non-response time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.\n\n![](media/rt_wald.gif)\n\nAs we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution.\nTheir interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.\n
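\nThat said, some summary quantities remain easy to recover: assuming the standard parameterization (with unit diffusion), the mean of a Shifted Wald is $\alpha/\nu + \tau$. A quick illustration with made-up (non-fitted) values:\n\n```julia\n# With drift ν, threshold α, and shift τ, the average RT is α/ν + τ:\nν, α, τ = 5.0, 1.7, 0.18  # illustrative values only\nα / ν + τ                 # ≈ 0.52 s\n```\n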
\n::: {.callout-note}\nExplanations regarding these new parameters will be provided in the next chapter.\n:::\n\n### Model Specification\n\n::: {#58b03fce .cell execution_count=26}\n``` {.julia .cell-code code-fold=\"false\"}\n@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)\n\n    # Priors\n    ν_intercept ~ truncated(Normal(1, 3); lower=0)\n    ν_condition ~ Normal(0, 1)\n\n    α_intercept ~ truncated(Normal(0, 1); lower=0)\n    α_condition ~ Normal(0, 0.5)\n\n    τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)\n    τ_condition ~ Normal(0, 0.01)\n\n    for i in 1:length(rt)\n        ν = ν_intercept + ν_condition * condition[i]\n        if ν <= 0 # Avoid non-positive drift values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        α = α_intercept + α_condition * condition[i]\n        if α <= 0 # Avoid non-positive threshold values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        τ = τ_intercept + τ_condition * condition[i]\n        if τ < 0 # Avoid negative tau values\n            Turing.@addlogprob! -Inf\n            return nothing\n        end\n        rt[i] ~ Wald(ν, α, τ)\n    end\nend\n\nfit_Wald = model_Wald(df.RT; condition=df.Accuracy)\nchain_Wald = sample(fit_Wald, NUTS(), 600)\n```\n:::\n\n\n::: {#9210576b .cell execution_count=27}\n``` {.julia .cell-code code-fold=\"false\"}\nhpd(chain_Wald; alpha=0.05)\n```\n\n::: {.cell-output .cell-output-display execution_count=27}\n\n::: {.ansi-escaped-output}\n```{=html}\n
HPD\n   parameters     lower     upper \n       Symbol   Float64   Float64 \n  ν_intercept    5.0817    5.3248\n  ν_condition   -1.3539   -1.0550\n  α_intercept    1.6585    1.7402\n  α_condition    0.2191    0.3506\n  τ_intercept    0.1818    0.1870\n  τ_condition   -0.0363   -0.0233\n
\n```\n:::\n\n:::\n:::\n\n\n::: {#f1f511cf .cell execution_count=28}\n``` {.julia .cell-code}\npred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)\npred = Array(pred)\n\nfig = plot_distribution(df, \"Predictions made by Shifted Wald Model\")\nfor i in 1:length(chain_Wald)\n    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, \"#388E3C\", \"#D32F2F\"), alpha=0.1)\nend\nfig\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.\n└ @ Makie C:\\Users\\domma\\.julia\\packages\\Makie\\VRavR\\src\\scenes.jl:220\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=28}\n```{=html}\n\n```\n:::\n:::\n\n\n### Model Comparison\n\nAt this stage, given the multiple options available to model RTs, you might be wondering which model is best.\nOne can compare the models using the **Leave-One-Out Cross-Validation (LOO-CV)** method, a Bayesian method to estimate the out-of-sample predictive accuracy of a model.\n\n::: {#b9a9f998 .cell execution_count=29}\n``` {.julia .cell-code}\nusing ParetoSmooth\n\nloo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source=\"mcmc\")\nloo_ScaledGaussian = psis_loo(fit_ScaledGaussian, chain_ScaledGaussian, source=\"mcmc\")\nloo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source=\"mcmc\")\nloo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source=\"mcmc\")\nloo_Wald = psis_loo(fit_Wald, chain_Wald, source=\"mcmc\")\n\nloo_compare((\n    Gaussian = loo_Gaussian,\n    ScaledGaussian = loo_ScaledGaussian,\n    LogNormal = loo_LogNormal,\n    ExGaussian = loo_ExGaussian,\n    Wald = loo_Wald))\n```\n\n::: {.cell-output .cell-output-display execution_count=29}\n```\n\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n┌────────────────┬──────────┬────────┬────────┐\n│                │  cv_elpd │ cv_avg │ weight │\n├────────────────┼──────────┼────────┼────────┤\n│     ExGaussian │     0.00 │   0.00 │   1.00 │\n│      LogNormal │  -323.67 │  -0.03 │   0.00 │\n│           Wald │  -381.68 │  -0.04 │   0.00 │\n│ ScaledGaussian │ -2466.17 │  -0.26 │   0.00 │\n│       Gaussian │ -2974.91 │  -0.31 │   0.00 │\n└────────────────┴──────────┴────────┴────────┘\n```\n:::\n:::\n\n\nThe `loo_compare()` function orders models from best to worst based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models.\nAs one can see, traditional linear models perform terribly.\n\n\n\n\n## Shifted LogNormal Mixed Model\n\n![](https://img.shields.io/badge/status-good_for_contributing-blue)\n\nComplex example walkthrough.\nAdd Random effects.\n\n", "supporting": [ "4a_rt_descriptive_files\\figure-html" ],