From 6634975f57749cae1d826fb071a1d2d610ed2d35 Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Sun, 14 Jul 2024 10:03:32 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll              |  2 +-
 4a_rt_descriptive.html | 66 +++++++++++++++++++++---------------------
 search.json            |  2 +-
 3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index 972a844..86be48e 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-88c902ed
\ No newline at end of file
+7524331c
\ No newline at end of file
diff --git a/4a_rt_descriptive.html b/4a_rt_descriptive.html
index 0a82c1f..f7dbf21 100644
--- a/4a_rt_descriptive.html
+++ b/4a_rt_descriptive.html
@@ -315,7 +315,7 @@

Table of contents

  • 5.4 The Problem with Linear Models
  • 5.5 Shifted LogNormal Model

@@ -364,7 +364,7 @@

5 Descriptive Models

    5.1 The Data

For this chapter, we will be using the data from Wagenmakers et al. (2008) - Experiment 1 (also reanalyzed by Heathcote and Love 2012), which contains responses and response times for several participants in two conditions (where instructions emphasized either speed or accuracy). Using the same procedure as the authors, we excluded all trials with uninterpretable response times, i.e., responses that are too fast (< 180 ms) or too slow (> 2 s instead of > 3 s; see Thériault et al. 2024 for a discussion of outlier removal).

    using Downloads, CSV, DataFrames, Random
     using Turing, Distributions, StatsFuns, SequentialSamplingModels
     using GLMakie
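
The cell that downloads and filters the data falls largely outside the diff context. A minimal sketch of the exclusion step described above (the file path is a hypothetical placeholder, and RT is assumed to be coded in seconds):

df = CSV.read("data/wagenmakers2008.csv", DataFrame)  # hypothetical path; the actual source is elided by the diff

# Exclude uninterpretable RTs: too fast (< 180 ms) or too slow (> 2 s)
df = df[(df.RT .> 0.180) .& (df.RT .< 2.0), :]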
    @@ -484,7 +484,7 @@ 

condition. But how about speed? We will first focus on the RTs of correct answers only (as we can assume that errors are underpinned by a different generative process).

    After filtering out the errors, we create a new column, Accuracy, which is the “binarization” of the Condition column, and is equal to 1 when the condition is "Accuracy" and 0 when it is "Speed".

    Code
    df = df[df.Error .== 0, :]
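
The hunk cuts this cell after the error filter; the binarization described above presumably amounts to one more line (assuming the Condition column stores the strings "Speed"/"Accuracy"):

df.Accuracy = ifelse.(df.Condition .== "Accuracy", 1, 0)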
    @@ -504,7 +504,7 @@ 


    Code
function plot_distribution(df, title="Empirical Distribution of Data from Wagenmakers et al. (2008)")
    @@ -557,7 +557,7 @@ 

    5.2.1 Model Specification

    @model function model_Gaussian(rt; condition=nothing)
     
         # Prior on variance 
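
The rest of the specification lies outside the diff context. A plausible completion, mirroring the scaled-Gaussian models below (the exact priors here are assumptions):

@model function model_Gaussian(rt; condition=nothing)

    # Prior on variance (must be positive)
    σ ~ truncated(Normal(0, 0.5); lower=0)

    # Priors on the mean: intercept and effect of condition
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    # Likelihood: condition shifts the mean; σ is shared across conditions
    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        rt[i] ~ Normal(μ, σ)
    end
end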
    @@ -581,7 +581,7 @@ 

# Sample results using MCMC
fit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)
chain_Gaussian = sample(fit_Gaussian, NUTS(), 400)

    # Summary (95% CI)
     hpd(chain_Gaussian; alpha=0.05)
    @@ -600,14 +600,14 @@

    5.2.2 Posterior Predictive Check

    Code
    pred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)
     pred = Array(pred)
    Code
    fig = plot_distribution(df, "Predictions made by Gaussian (aka Linear) Model")
    @@ -663,7 +663,7 @@ 

    5.3.1 Solution 1: Directional Effect of Condition

One possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by truncating its prior at a lower bound of 0. This can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept); if that were not the case, we could have changed the reference level. However, this is not good practice, as we are enforcing a very strong a priori assumption about the direction of the effect, which is not always justified.

    @model function model_ScaledlGaussian(rt; condition=nothing)
     
         # Priors
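
In full, the model reads:

@model function model_ScaledlGaussian(rt; condition=nothing)

    # Priors
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)  # Same prior as previously
    σ_condition ~ truncated(Normal(0, 0.1); lower=0)  # Enforce positivity

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        rt[i] ~ Normal(μ, σ)
    end
end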
    @@ -683,7 +683,7 @@ 

fit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)

    # Summary (95% CI)
     hpd(chain_ScaledGaussian; alpha=0.05)
    @@ -704,7 +704,7 @@

    5.3.2 Solution 2: Avoid Exploring Negative Variance Values

The other trick is to force the sampling algorithm to avoid exploring negative variance values (sigma \(\sigma\) < 0). This can be done by adding a conditional statement: when sigma \(\sigma\) is negative, instead of evaluating the model and erroring, we return an infinitely low log probability (-Inf) to push the sampler away from this impossible region.

    @model function model_ScaledlGaussian(rt; condition=nothing)
     
         # Priors
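
In full, with the conditional guard inside the likelihood loop:

@model function model_ScaledlGaussian(rt; condition=nothing)

    # Priors
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)
    σ_condition ~ Normal(0, 0.1)  # No positivity constraint this time

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        if σ < 0  # Avoid negative variance values
            Turing.@addlogprob! -Inf
            return nothing
        end
        rt[i] ~ Normal(μ, σ)
    end
end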
    @@ -728,7 +728,7 @@ 

fit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)

    hpd(chain_ScaledGaussian; alpha=0.05)
    @@ -744,14 +744,14 @@

-5.3.3 Solution 3: Use a “Softplus” Link Function
+5.3.3 Solution 3: Use a “Softplus” Function

    Exponential Transformation

Using the previous solution feels like a “hack” and a workaround for a misspecified model. One alternative approach is to apply a function to sigma \(\sigma\) that “transforms” it into a positive value. We have seen examples of applying “non-identity” link functions in the previous chapters, such as the logistic function, which transforms any value between \(-\infty\) and \(+\infty\) into a value between 0 and 1.

What function could we use to transform any value of sigma \(\sigma\) into a strictly positive one? One option that has been used is to express the parameter on the log scale (which can include negative values) for priors and effects, and apply an “exponential” transformation to the parameter at the end.

-The issue with the exponential link (TODO: is it actually known as an “exponential” link or a “log” link?) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) The interpretation of the parameters and effects are not linear, which can add up to the complexity, and 3) normal priors on the log scale lead to a sharp peak in “real” values that can be problematic.
+The issue with the log link (i.e., expressing parameters on the log scale and then transforming them using the exponential function) is that 1) it quickly generates very large numbers, which can slow down sampling; 2) the interpretation of the parameters and effects is no longer linear, which adds complexity; and 3) normal priors on the log scale lead to a sharp peak in “real” values that can be problematic.
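
In model code, the log link amounts to placing unconstrained priors on the σ terms and exponentiating inside the likelihood. A sketch (the model name and priors are illustrative, not the page's exact code):

@model function model_LogLinkGaussian(rt; condition=nothing)

    # Priors on the mean, as before
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    # Priors on σ on the log scale: negative values are allowed...
    σ_intercept ~ Normal(0, 1)
    σ_condition ~ Normal(0, 0.3)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        # ...because exp() maps them back to strictly positive values
        σ = exp(σ_intercept + σ_condition * condition[i])
        rt[i] ~ Normal(μ, σ)
    end
end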

    Code
    xaxis = range(-6, 6, length=1000)
    @@ -777,8 +777,8 @@ 


    Softplus Function

-Popularized by the machine learning field, the Softplus function is an interesting alternative (see Wiemann, Kneib, and Hambuckers 2023). It is defined as \(softplus(x) = \log(1 + \exp(x))\) and its main benefit is to approximate an “identity” link (i.e., a linear relationship), only impacting the values close to 0 (where it is not linear) and negative values.
+Popularized by the machine learning field, the Softplus function is an interesting alternative (see Wiemann, Kneib, and Hambuckers 2023). It is defined as \(softplus(x) = \log(1 + \exp(x))\) and its main benefit is to approximate an “identity” link for larger values (i.e., a linear relationship), only impacting negative values and values close to 0 (where the link is not linear).
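
A quick numerical check of this near-identity behaviour (using softplus from StatsFuns, loaded above):

softplus(-4.0)  # ≈ 0.0182: negative inputs are mapped to small positive values
softplus(0.0)   # = log(2) ≈ 0.6931
softplus(5.0)   # ≈ 5.0067: approximately the identity for larger values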
    Code
    xaxis = range(-6, 6, length=1000)
    @@ -805,7 +805,7 @@ 


    The Model

    Let us apply the Softplus transformation (available from the StatsFuns package) to the sigma \(\sigma\) parameter.

    @model function model_ScaledlGaussian(rt; condition=nothing)
     
         # Priors
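
In full, the only changes from the earlier models are the unconstrained priors on the σ terms and the softplus() call in the likelihood:

@model function model_ScaledlGaussian(rt; condition=nothing)

    # Priors
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ Normal(0, 1)   # Unconstrained: can be negative
    σ_condition ~ Normal(0, 0.3)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        rt[i] ~ Normal(μ, softplus(σ))  # softplus guarantees a positive σ
    end
end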
    @@ -825,7 +825,7 @@ 


fit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)
    hpd(chain_ScaledGaussian; alpha=0.05)
    @@ -851,7 +851,7 @@


Note that one can call softplus() to transform the parameter back to the original scale, which is necessary for negative or small values (for larger values, the mapping becomes essentially 1-to-1).

    σ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])
     σ_condition1 = σ_condition0 +  mean(chain_ScaledGaussian[:σ_condition])
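
The cell concludes by printing the back-transformed values:

println(
    "σ for 'Speed' condition: ", round(softplus(σ_condition0); digits=4),
    "; σ for 'Accuracy' condition: ", round(softplus(σ_condition1); digits=4)
)

σ for 'Speed' condition: 0.1245; σ for 'Accuracy' condition: 0.1999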
     
    @@ -868,7 +868,7 @@ 


    Conclusion

    Code
    pred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)
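
The cell continues by overlaying the posterior predictive densities on the empirical distribution:

pred = Array(pred)

fig = plot_distribution(df, "Predictions made by Scaled Gaussian Model")
for i in 1:length(chain_ScaledGaussian)
    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
end
fig

Although relaxing the homoscedasticity assumption is a good step forward, allowing us to make richer conclusions (e.g., the Accuracy condition leads to slower and more variable reaction times) and to better capture the data, the Gaussian model still seems to be a poor fit.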
    @@ -911,7 +911,7 @@ 

    5.5.1 Prior on Minimum RT

Instead of a \(Uniform\) prior, we will use a \(Gamma(1.1, 11)\) distribution (truncated at the minimum observed RT), as this particular parameterization reflects the low probability of very short minimum RTs (near 0) and a steadily increasing probability for longer times.
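
In Distributions.jl, such a prior can be written as a truncated Gamma; the shape–scale parameterization below is an assumption consistent with the description above:

min_rt = minimum(df.RT)

# Shape 1.1 gives near-zero mass at 0; over the truncated support [0, min_rt]
# the density is steadily increasing
prior_tau = truncated(Gamma(1.1, 11); upper=min_rt)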

    Code
    xaxis = range(0, 0.3, 1000)
    @@ -931,7 +931,7 @@ 

    5.5.2 Model Specification

    @model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)
     
         # Priors 
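
The rest of this cell is elided by the diff. A sketch of a complete shifted-lognormal specification, reusing the softplus trick for σ (the priors and the shifted-distribution syntax are assumptions, not the page's exact code):

@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)

    # Priors
    τ ~ truncated(Gamma(1.1, 11); upper=min_rt)  # Shift: must stay below the fastest RT

    μ_intercept ~ Normal(0, 2)
    μ_condition ~ Normal(0, 0.5)

    σ_intercept ~ Normal(0, 1)
    σ_condition ~ Normal(0, 0.3)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        rt[i] ~ LogNormal(μ, softplus(σ)) + τ  # Location-shifted lognormal
    end
end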
    @@ -960,7 +960,7 @@ 

    5.5.3 Interpretation

    hpd(chain_LogNormal; alpha=0.05)
    @@ -976,7 +976,7 @@

    Code
    pred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)
    @@ -1043,7 +1043,7 @@ 

    5.6.1 Conditional Tau \(\tau\) Parameter

In the same way as we modeled the effect of the condition on the variance component sigma \(\sigma\), we can do the same for any other parameter, including the exponential component tau \(\tau\). All we need is to set a prior on the intercept and the condition effect, and to make sure that \(\tau > 0\).

    @model function model_ExGaussian(rt; condition=nothing)
     
         # Priors 
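
A sketch of the full specification, keeping \(\tau > 0\) via the softplus trick introduced earlier (the priors and the use of softplus here are assumptions; ExGaussian comes from SequentialSamplingModels):

@model function model_ExGaussian(rt; condition=nothing)

    # Priors
    μ_intercept ~ Normal(0, 1)
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ Normal(0, 1)
    σ_condition ~ Normal(0, 0.3)

    τ_intercept ~ Normal(0, 1)
    τ_condition ~ Normal(0, 0.3)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = softplus(σ_intercept + σ_condition * condition[i])  # σ > 0
        τ = softplus(τ_intercept + τ_condition * condition[i])  # τ > 0
        rt[i] ~ ExGaussian(μ, σ, τ)
    end
end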
    @@ -1078,7 +1078,7 @@ 

    5.6.2 Interpretation

    hpd(chain_ExGaussian; alpha=0.05)
    @@ -1095,7 +1095,7 @@

    Code
    pred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)
    @@ -1144,7 +1144,7 @@ 

    5.7.1 Model Specification

    @model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)
     
         # Priors 
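
A sketch of the full specification (the priors are illustrative; Wald(ν, α, τ) from SequentialSamplingModels takes a drift rate, a decision threshold, and a non-decision time):

@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)

    # Priors
    ν_intercept ~ truncated(Normal(1, 3); lower=0)  # Drift rate
    ν_condition ~ Normal(0, 1)

    α ~ truncated(Normal(0, 1); lower=0)         # Threshold, shared across conditions
    τ ~ truncated(Gamma(1.1, 11); upper=min_rt)  # Non-decision time

    for i in 1:length(rt)
        ν = ν_intercept + ν_condition * condition[i]
        rt[i] ~ Wald(ν, α, τ)
    end
end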
    @@ -1180,7 +1180,7 @@ 

fit_Wald = model_Wald(df.RT; condition=df.Accuracy)
chain_Wald = sample(fit_Wald, NUTS(), 600)

    hpd(chain_Wald; alpha=0.05)
    @@ -1197,7 +1197,7 @@

    Code
    pred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)
    @@ -1221,7 +1221,7 @@ 

    5.7.2 Model Comparison

At this stage, given the multiple options available to model RTs, you might be wondering which model is best. One can compare the models using the Leave-One-Out Cross-Validation (LOO-CV) method, a Bayesian approach to estimating the out-of-sample predictive accuracy of a model.

    Code
    using ParetoSmooth
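
A sketch of the comparison, assuming fit_* model objects as in the earlier cells; psis_loo() and loo_compare() are ParetoSmooth functions, though the exact call signatures used here should be treated as assumptions:

# Pointwise log-likelihoods via Pareto-smoothed importance sampling
loo_LogNormal  = psis_loo(fit_LogNormal, chain_LogNormal)
loo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian)
loo_Wald       = psis_loo(fit_Wald, chain_Wald)

# Higher ELPD = better expected out-of-sample predictive accuracy
loo_compare((LogNormal = loo_LogNormal, ExGaussian = loo_ExGaussian, Wald = loo_Wald))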