using Downloads, CSV, DataFrames, Random
using Turing, Distributions, StatsFuns, SequentialSamplingModels
using GLMakie
@@ -484,7 +484,7 @@ condition. But how about speed? We will first focus on the RTs of correct answers only (as we can assume that errors are underpinned by a different generative process).
After filtering out the errors, we create a new column, Accuracy, which is the “binarization” of the Condition column, and is equal to 1 when the condition is "Accuracy" and 0 when it is "Speed".
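The binarization step itself is not shown in this excerpt; a minimal self-contained sketch (with made-up toy data, not the chapter's actual code) could look like:

```julia
# Made-up toy data; this only illustrates the binarization step
condition = ["Speed", "Accuracy", "Accuracy", "Speed"]

# Accuracy = 1 when Condition is "Accuracy", 0 when it is "Speed"
accuracy = ifelse.(condition .== "Accuracy", 1, 0)
println(accuracy)  # [0, 1, 1, 0]
```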
-
+
Code
df = df[df.Error .== 0, :]
@@ -504,7 +504,7 @@ df .
-
+
Code
function plot_distribution(df, title="Empirical Distribution of Data from Wagenmakers et al. (2018)")
@@ -557,7 +557,7 @@
5.2.1 Model Specification
-
+
@model function model_Gaussian(rt; condition=nothing)
# Prior on variance
@@ -581,7 +581,7 @@ # Sample results using MCMC
chain_Gaussian = sample(fit_Gaussian, NUTS(), 400)
-
+
# Summary (95% CI)
hpd(chain_Gaussian; alpha=0.05)
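As a rough illustration of what hpd() reports, here is a naive base-Julia sketch (the function name and implementation are ours, not the package internals) that computes the narrowest interval containing 95% of a set of samples:

```julia
using Random
Random.seed!(1)

# Naive HPD: among all intervals covering (1 - alpha) of the sorted
# samples, return the narrowest one
function hpd_interval(samples; alpha=0.05)
    s = sort(samples)
    n = length(s)
    k = ceil(Int, (1 - alpha) * n)       # points the interval must contain
    widths = s[k:n] .- s[1:(n - k + 1)]  # width of each candidate interval
    i = argmin(widths)
    return (s[i], s[i + k - 1])
end

lo, hi = hpd_interval(randn(100_000))
println((lo, hi))  # close to (-1.96, 1.96) for a standard normal
```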
@@ -600,14 +600,14 @@
5.2.2 Posterior Predictive Check
-
+
Code
pred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)
pred = Array(pred)
-
+
Code
fig = plot_distribution(df, "Predictions made by Gaussian (aka Linear) Model")
@@ -663,7 +663,7 @@ fig
5.3.1 Solution 1: Directional Effect of Condition
One possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by truncating the prior to a lower bound of 0. This can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if that were not the case, we could have changed the reference factor. However, this is not good practice, as we are enforcing a strong a priori assumption about the direction of the effect, which is not always justified.
-
+
@model function model_ScaledlGaussian(rt; condition=nothing)
# Priors
@@ -683,7 +683,7 @@
fit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)
-
+
# Summary (95% CI)
hpd(chain_ScaledGaussian; alpha=0.05)
@@ -704,7 +704,7 @@
5.3.2 Solution 2: Avoid Exploring Negative Variance Values
The other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma \(\sigma\) < 0). This can be done by adding a conditional statement that, when sigma \(\sigma\) is negative, returns an infinitely low model log-probability (-Inf) instead of erroring, pushing the sampler away from this impossible region.
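Outside of Turing, the same guard can be sketched with a plain hand-written Gaussian log-density (the actual model injects -Inf with Turing's @addlogprob! macro; this toy function is ours):

```julia
# If σ is negative, return -Inf (an "infinitely low" log-probability) so a
# sampler would immediately reject this region instead of erroring
function guarded_normal_logpdf(x, μ, σ)
    σ < 0 && return -Inf
    return -0.5 * ((x - μ) / σ)^2 - log(σ) - 0.5 * log(2π)
end

println(guarded_normal_logpdf(0.5, 0.5, 0.13))   # finite: valid σ
println(guarded_normal_logpdf(0.5, 0.5, -0.01))  # -Inf: impossible σ
```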
-
+
@model function model_ScaledlGaussian(rt; condition=nothing)
# Priors
@@ -728,7 +728,7 @@
fit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)
-
+
hpd(chain_ScaledGaussian; alpha=0.05)
@@ -744,14 +744,14 @@
-5.3.3 Solution 3: Use a “Softplus” Link Function
+
+5.3.3 Solution 3: Use a “Softplus” Function
Exponential Transformation
Using the previous solution feels like a “hack” and a workaround for a misspecified model. One alternative approach is to apply a function to sigma \(\sigma\) to “transform” it into a positive value. We have seen examples of applying “non-identity” link functions in the previous chapters, such as the logistic function that transforms any value between \(-\infty\) and \(+\infty\) into a value between 0 and 1.
What function could we use to transform any value of sigma \(\sigma\) into a strictly positive one? One option that has been used is to express the parameter on the log scale (which can include negative values) for priors and effects, and apply an “exponential” transformation to the parameter at the end.
-The issue with the exponential link (TODO: is it actually known as an “exponential” link or a “log” link?) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) The interpretation of the parameters and effects are not linear, which can add up to the complexity, and 3) normal priors on the log scale lead to a sharp peak in “real” values that can be problematic.
-
+The issue with the log link (i.e., expressing parameters on the log scale and then transforming them with the exponential function) is that 1) it quickly generates very big numbers (which can slow down sampling efficiency), 2) the interpretation of parameters and effects is no longer linear (effects become multiplicative on the real scale), which adds complexity, and 3) normal priors on the log scale lead to a sharp peak in “real” values that can be problematic.
+
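Points 1 and 2 can be checked numerically; a minimal base-Julia sketch with made-up parameter values:

```julia
# Point 1: exponentiation blows up quickly
println(exp(10.0))  # over 22000 - huge for a σ parameter

# Point 2: on the log scale, additive effects become multiplicative on the
# real scale, since exp(intercept + effect) = exp(intercept) * exp(effect)
intercept, effect = -2.0, 0.5   # made-up values, for illustration only
σ_reference = exp(intercept)
σ_other = exp(intercept + effect)
println(σ_other / σ_reference)  # equals exp(effect): a multiplicative change
```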
Code
xaxis = range(-6, 6, length=1000)
@@ -777,8 +777,8 @@ xaxis Exponential Tra
Softplus Function
-Popularized by the machine learning field, the Softplus function is an interesting alternative (see Wiemann, Kneib, and Hambuckers 2023). It is defined as \(softplus(x) = \log(1 + \exp(x))\) and its main benefit is to approximate an “identity” link (i.e., a linear relationship), only impacting the values close to 0 (where it is not linear) and negative values.
-
+Popularized by the machine learning field, the Softplus function is an interesting alternative (see Wiemann, Kneib, and Hambuckers 2023). It is defined as \(softplus(x) = \log(1 + \exp(x))\) and its main benefit is to approximate an “identity” link for larger values (i.e., a linear relationship), only impacting negative values and values close to 0 (where the link is not linear).
+
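Both properties are easy to verify numerically (defining softplus directly from the formula above rather than importing it from StatsFuns, so the snippet is self-contained):

```julia
# softplus(x) = log(1 + exp(x)); log1p is the numerically safer spelling
softplus(x) = log1p(exp(x))

println(softplus(-3.0))  # strictly positive even for negative inputs
println(softplus(5.0))   # ≈ 5.007: close to the identity for larger values
```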
Code
xaxis = range(-6, 6, length=1000)
@@ -805,7 +805,7 @@ xaxis Softplus Function
The Model
Let us apply the Softplus transformation (available from the StatsFuns
package) to the sigma \(\sigma\) parameter.
-
+
@model function model_ScaledlGaussian(rt; condition=nothing)
# Priors
@@ -825,7 +825,7 @@ The Model
fit_ScaledlGaussian = model_ScaledlGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledlGaussian, NUTS(), 400)
-
+
hpd(chain_ScaledGaussian; alpha=0.05)
@@ -851,7 +851,7 @@ The Model
Note that one can call the softplus() function to transform the parameter back to the original scale, which is useful for negative or small values (for larger values, the relationship becomes approximately 1-to-1).
-
+
σ_condition0 = mean(chain_ScaledGaussian[:σ_intercept])
σ_condition1 = σ_condition0 + mean(chain_ScaledGaussian[:σ_condition])
@@ -868,7 +868,7 @@ The Model
Conclusion
-
+
Code
pred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)
@@ -911,7 +911,7 @@ pred
5.5.1 Prior on Minimum RT
Instead of a \(Uniform\) prior, we will use a \(Gamma(1.1, 11)\) distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times.
-
+
Code
xaxis = range(0, 0.3, 1000)
@@ -931,7 +931,7 @@ xaxis
5.5.2 Model Specification
-
+
@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)
# Priors
@@ -960,7 +960,7 @@
5.5.3 Interpretation
-
+
hpd(chain_LogNormal; alpha=0.05)
@@ -976,7 +976,7 @@
-
+
Code
pred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)
@@ -1043,7 +1043,7 @@ pred
5.6.1 Conditional Tau \(\tau\) Parameter
In the same way as we modeled the effect of condition on the variance component sigma \(\sigma\), we can do the same for any other parameter, including the exponential component tau \(\tau\). All we need is to set a prior on the intercept and the condition effect, and make sure that \(\tau > 0\).
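To see why a conditional tau \(\tau\) is useful, recall that an ExGaussian variable is the sum of a Normal(\(\mu\), \(\sigma\)) draw and an independent Exponential component with mean \(\tau\); the sketch below (with made-up parameter values) simulates two conditions differing only in \(\tau\):

```julia
using Random, Statistics
Random.seed!(123)

# ExGaussian draw = Normal(μ, σ) + Exponential with mean τ;
# τ governs the heaviness of the right tail
exgaussian(n; μ=0.4, σ=0.05, τ=0.2) = μ .+ σ .* randn(n) .+ τ .* randexp(n)

speed = exgaussian(100_000; τ=0.1)     # hypothetical "Speed"-like tail
accuracy = exgaussian(100_000; τ=0.3)  # hypothetical "Accuracy"-like tail
println(mean(speed), " ", mean(accuracy))  # means differ by ≈ Δτ = 0.2
```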
-
+
@model function model_ExGaussian(rt; condition=nothing)
# Priors
@@ -1078,7 +1078,7 @@
5.6.2 Interpretation
-
+
hpd(chain_ExGaussian; alpha=0.05)
@@ -1095,7 +1095,7 @@
-
+
Code
pred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)
@@ -1144,7 +1144,7 @@ pred
5.7.1 Model Specification
-
+
@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)
# Priors
@@ -1180,7 +1180,7 @@
fit_Wald = model_Wald(df.RT; condition=df.Accuracy)
chain_Wald = sample(fit_Wald, NUTS(), 600)
-
+
hpd(chain_Wald; alpha=0.05)
@@ -1197,7 +1197,7 @@
-
+
Code
pred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)
@@ -1221,7 +1221,7 @@ pred
5.7.2 Model Comparison
At this stage, given the multiple options available to model RTs, you might be wondering which model is the best. One can compare the models using the Leave-One-Out Cross-Validation (LOO-CV) method, a Bayesian approach to estimating a model's out-of-sample predictive accuracy.
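As a conceptual sketch of what LOO-CV estimates (naive exact leave-one-out for a toy Gaussian "model" written by hand here, not the Pareto-smoothed importance-sampling approximation that the ParetoSmooth package implements):

```julia
using Statistics

# Sum of log predictive densities of each point under a "model" refit
# without it; higher (less negative) means better out-of-sample fit
function loo_elpd(y)
    total = 0.0
    for i in eachindex(y)
        rest = [y[j] for j in eachindex(y) if j != i]  # hold out point i
        μ, σ = mean(rest), std(rest)
        total += -0.5 * ((y[i] - μ) / σ)^2 - log(σ) - 0.5 * log(2π)
    end
    return total
end

y = [0.4, 0.5, 0.45, 0.55, 0.6, 0.5, 0.48]  # toy "RT" data
println(loo_elpd(y))
```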
-
+
Code
using ParetoSmooth