From b84ae09b7e21392fade0505e735edc3d51fe4ddd Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Fri, 12 Jul 2024 21:11:43 +0000
Subject: [PATCH] Built site for gh-pages
---
.nojekyll | 2 +-
3_scales.html | 102 ++++-
4a_rt_descriptive.html | 887 ++++++++++++++++++++++++++++-------------
references.html | 6 +
search.json | 20 +-
5 files changed, 709 insertions(+), 308 deletions(-)
diff --git a/.nojekyll b/.nojekyll
index 0d1d344..0e92562 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-78d717c0
\ No newline at end of file
+2a942af7
\ No newline at end of file
diff --git a/3_scales.html b/3_scales.html
index f3c261d..4d9f39a 100644
--- a/3_scales.html
+++ b/3_scales.html
@@ -21,7 +21,27 @@
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
-
+/* CSS for citations */
+div.csl-bib-body { }
+div.csl-entry {
+ clear: both;
+ margin-bottom: 0em;
+}
+.hanging-indent div.csl-entry {
+ margin-left:2em;
+ text-indent:-2em;
+}
+div.csl-left-margin {
+ min-width:2em;
+ float:left;
+}
+div.csl-right-inline {
+ margin-left:2em;
+ padding-left:1em;
+}
+div.csl-indent {
+ margin-left: 2em;
+}
@@ -244,12 +264,63 @@
Despite this fact, we still most often use linear models to analyze these data, which is not ideal as it assumes that the dependent variable is continuous and normally distributed.
3.1.1 The Problem with Linear Models
+
Let’s take the data from Makowski et al. (2023) that contains data from participants that underwent the Mini-IPIP6 personality test and the PID-5 BF questionnaire for “maladaptive” personality. We will focus on the “Disinhibition” trait from the PID-5 BF questionnaire. Note that altough it is usually computed as the average of items a 4-point Likert scales (0-3), this study used analog slides to obtain more finer-grained scores.
+
#| code-fold: false
+
+using Downloads, CSV, DataFrames, Random
+using Turing, Distributions, SequentialSamplingModels
+using GLMakie
+
+Random.seed!(123) # For reproducibility
+
+df = CSV.read(Downloads.download("https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/makowski2023.csv"), DataFrame)
+
+# Show 10 first rows
+first(df, 10)
+
+# Plot the distribution of "Disinhibition"
+hist(df.Disinhibition, normalization = :pdf, color=:darkred)
+
We will then fit a simple Gaussian model (an “intercept-only” linear model) that estimates the mean and the standard deviation of our variable of interest.
+
#| code-fold: false
+#| output: false
+
+@model function model_Gaussian(x)
+
+ # Priors
+ σ ~ truncated(Normal(0, 1); lower=0) # Strictly positive half normal distribution
+ μ ~ Normal(0, 3)
+
+ # Iterate through every observation
+ for i in 1:length(x)
+ # Likelihood family
+ x[i] ~ Normal(μ, σ)
+ end
+end
+
+# Fit the model with the data
+fit_Gaussian = model_Gaussian(df.Disinhibition)
+# Sample results using MCMC
+chain_Gaussian = sample(fit_Gaussian, NUTS(), 400)
+
Let see if the model managed to recover the mean and standard deviation of the data:
+
println("Mean of the data: $(round(mean(df.Disinhibition); digits=3)) vs. mean from the model: $(round(mean(chain_Gaussian[:μ]); digits=3))")
+println("SD of the data: $(round(std(df.Disinhibition); digits=3)) vs. SD from the model: $(round(mean(chain_Gaussian[:σ]); digits=3))")
+
Impressive! The model managed to almost perfectly recover the mean and standard deviation of the data. That means we must have a good model, right? Not so fast!
+
Linear models are by definition designed at recovering the mean of the outcome variables (and its SD, assuming it is inavriant across groups). That does not mean that they can capture the full complexity of the data.
+
Let us then jump straight into generating predictions from the model and plotting the results against the actual data to see how well the model fits the data (a procedure called the posterior predictive check).
+
#| output: false
+
+pred = predict(model_Gaussian([(missing) for i in 1:length(df.Disinhibition)]), chain_Gaussian)
+pred = Array(pred)
+
fig = hist(df.Disinhibition, normalization = :pdf, color=:darkred)
+for i in 1:length(chain_Gaussian)
+ lines!(Makie.KernelDensity.kde(pred[i, :]), alpha=0.1, color=:black)
+end
+fig
+
As we can see, the model assumes that the data is normally distributed, in a way that allows for negative values and values above 3, which are not possible. In other words, a linear model might be the best choice for our data.
3.1.2 Rescaling
-
#| code-fold: false
-
-
+
Continuous variables can be trivially rescaled, which is often done to improve the interpretability of the results. For instance, a z-score is a rescaled variable with a mean of 0 and a standard deviation of 1.
3.1.3 Beta Models
@@ -285,15 +356,15 @@
#| eval: false
-@model function model_Beta(x)
- μ ~ Beta(1, 1)
- σ ~ Uniform(eps(typeof(μ)), μ * (1 - μ) - eps(typeof(μ)))
- for i in 1:length(x)
- x[i] ~ MeanVarBeta(μ, σ)
- end
-end
-chains = sample(model_Beta(rand(MeanVarBeta(0.5, 0.2), 200)), NUTS(), 500;
- initial_params=[0.5, 0.1])
+# @model function model_Beta(x)
+# μ ~ Beta(1, 1)
+# σ ~ Uniform(eps(typeof(μ)), μ * (1 - μ) - eps(typeof(μ)))
+# for i in 1:length(x)
+# x[i] ~ MeanVarBeta(μ, σ)
+# end
+# end
+# chains = sample(model_Beta(rand(MeanVarBeta(0.5, 0.2), 200)), NUTS(), 500;
+# initial_params=[0.5, 0.1])
3.1.4 OrdBeta Models
@@ -311,6 +382,11 @@
+
+Makowski, Dominique, An Shu Te, Stephanie Kirk, Ngoi Zi Liang, and SH Annabel Chen. 2023. “A Novel Visual Illusion Paradigm Provides Evidence for a General Factor of Illusion Sensitivity and Personality Correlates.”Scientific Reports 13 (1): 6594.
+
For this chapter, we will be using the data from Wagenmakers et al. (2008) - Experiment 1 (also reanalyzed by Heathcote and Love 2012), that contains responses and response times for several participants in two conditions (where instructions emphasized either speed or accuracy). Using the same procedure as the authors, we excluded all trials with uninterpretable response time, i.e., responses that are too fast (<180 ms) or too slow (>2 sec instead of >3 sec, see Thériault et al. 2024 for a discussion on outlier removal).
-
#| code-fold: false
-
-using Downloads, CSV, DataFrames, Random
-using Turing, Distributions, SequentialSamplingModels
-using GLMakie
-
-Random.seed!(123) # For reproducibility
-
-df = CSV.read(Downloads.download("https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv"), DataFrame)
-
-# Show 10 first rows
-first(df, 10)
+
+
usingDownloads, CSV, DataFrames, Random
+usingTuring, Distributions, SequentialSamplingModels
+usingGLMakie
+
+Random.seed!(123) # For reproducibility
+
+df = CSV.read(Downloads.download("https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv"), DataFrame)
+
+# Show 10 first rows
+first(df, 10)
+
+
10×5 DataFrame
+
+
+
+
Row
+
Participant
+
Condition
+
RT
+
Error
+
Frequency
+
+
+
+
Int64
+
String15
+
Float64
+
Bool
+
String15
+
+
+
+
+
1
+
1
+
Speed
+
0.7
+
false
+
Low
+
+
+
2
+
1
+
Speed
+
0.392
+
true
+
Very Low
+
+
+
3
+
1
+
Speed
+
0.46
+
false
+
Very Low
+
+
+
4
+
1
+
Speed
+
0.455
+
false
+
Very Low
+
+
+
5
+
1
+
Speed
+
0.505
+
true
+
Low
+
+
+
6
+
1
+
Speed
+
0.773
+
false
+
High
+
+
+
7
+
1
+
Speed
+
0.39
+
false
+
High
+
+
+
8
+
1
+
Speed
+
0.587
+
true
+
Low
+
+
+
9
+
1
+
Speed
+
0.603
+
false
+
Low
+
+
+
10
+
1
+
Speed
+
0.435
+
false
+
High
+
+
+
+
+
+
In the previous chapter, we modelled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the "Speed" condition. But how about speed? We are going to first take interest in the RT of Correct answers only (as we can assume that errors are underpinned by a different generative process).
After filtering out the errors, we create a new column, Accuracy, which is the “binarization” of the Condition column, and is equal to 1 when the condition is "Accuracy" and 0 when it is "Speed".
function plot_distribution(df, title="Empirical Distribution of Data from Wagenmakers et al. (2018)")
- fig = Figure()
- ax = Axis(fig[1, 1], title=title,
- xlabel="RT (s)",
- ylabel="Distribution",
- yticksvisible=false,
- xticksvisible=false,
- yticklabelsvisible=false)
- Makie.density!(df[df.Condition .== "Speed", :RT], color=("#EF5350", 0.7), label = "Speed")
- Makie.density!(df[df.Condition .== "Accuracy", :RT], color=("#66BB6A", 0.7), label = "Accuracy")
- Makie.axislegend("Condition"; position=:rt)
- Makie.ylims!(ax, (0, nothing))
- return fig
-end
-
-plot_distribution(df, "Empirical Distribution of Data from Wagenmakers et al. (2018)")
+
+
+Code
+
functionplot_distribution(df, title="Empirical Distribution of Data from Wagenmakers et al. (2018)")
+ fig =Figure()
+ ax =Axis(fig[1, 1], title=title,
+ xlabel="RT (s)",
+ ylabel="Distribution",
+ yticksvisible=false,
+ xticksvisible=false,
+ yticklabelsvisible=false)
+ Makie.density!(df[df.Condition.=="Speed", :RT], color=("#EF5350", 0.7), label ="Speed")
+ Makie.density!(df[df.Condition.=="Accuracy", :RT], color=("#66BB6A", 0.7), label ="Accuracy")
+ Makie.axislegend("Condition"; position=:rt)
+ Makie.ylims!(ax, (0, nothing))
+return fig
+end
+
+plot_distribution(df, "Empirical Distribution of Data from Wagenmakers et al. (2018)")
+
+
+
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
4.2 Gaussian (aka Linear) Model
@@ -380,51 +538,73 @@
4.2.1 Model Specification
-
#| code-fold: false
-#| output: false
-
-@model function model_Gaussian(rt; condition=nothing)
-
- # Prior on variance
- σ ~ truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution
-
- # Priors on intercept and effect of condition
- μ_intercept ~ truncated(Normal(0, 1); lower=0)
- μ_condition ~ Normal(0, 0.3)
-
- # Iterate through every observation
- for i in 1:length(rt)
- # Apply formula
- μ = μ_intercept + μ_condition * condition[i]
- # Likelihood family
- rt[i] ~ Normal(μ, σ)
- end
-end
-
-# Fit the model with the data
-fit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)
-# Sample results using MCMC
-chain_Gaussian = sample(fit_Gaussian, NUTS(), 400)
@modelfunctionmodel_Gaussian(rt; condition=nothing)
+
+# Prior on variance
+ σ ~truncated(Normal(0, 0.5); lower=0) # Strictly positive half normal distribution
+
+# Priors on intercept and effect of condition
+ μ_intercept ~truncated(Normal(0, 1); lower=0)
+ μ_condition ~Normal(0, 0.3)
+
+# Iterate through every observation
+for i in1:length(rt)
+# Apply formula
+ μ = μ_intercept + μ_condition * condition[i]
+# Likelihood family
+ rt[i] ~Normal(μ, σ)
+end
+end
+
+# Fit the model with the data
+fit_Gaussian =model_Gaussian(df.RT; condition=df.Accuracy)
+# Sample results using MCMC
+chain_Gaussian =sample(fit_Gaussian, NUTS(), 400)
The effect of Condition is significant, people are on average slower (higher RT) when condition is "Accuracy". But is our model good?
4.2.2 Posterior Predictive Check
-
#| output: false
-
-pred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)
-pred = Array(pred)
-
#| fig-width: 10
-#| fig-height: 7
-
-fig = plot_distribution(df, "Predictions made by Gaussian (aka Linear) Model")
-for i in 1:length(chain_Gaussian)
- lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
-end
-fig
+
+
+Code
+
pred =predict(model_Gaussian([(missing) for i in1:length(df.RT)], condition=df.Accuracy), chain_Gaussian)
+pred =Array(pred)
+
+
+
+
+Code
+
fig =plot_distribution(df, "Predictions made by Gaussian (aka Linear) Model")
+for i in1:length(chain_Gaussian)
+lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] ==1, "#388E3C", "#D32F2F"), alpha=0.1)
+end
+fig
+
+
+
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
As you can see, the linear models are good at predicting the mean RT (the center of the distribution), but they are not good at capturing the spread and the shape of the data.
@@ -464,72 +644,106 @@
4.3.1 Solution 1: Directional Effect of Condition
One possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by Truncating the prior to a lower bound of 0. This can work in our case, because we know that the comparison condition is likely to have a higher variance than the reference condition (the intercept) - and if it wasn’t the case, we could have changed the reference factor. However, this is not a good practice as we are enforcing a very strong a priori specific direction of the effect, which is not always justified.
We can see that the effect of condition on sigma \(\sigma\) is significantly positive: the variance is higher in the Accuracy condition as compared to the Speed condition.
The other trick is to force the sampling algorithm to avoid exploring negative variance values (when sigma \(\sigma\) < 0). This can be done by adding a conditional statement when sigma \(\sigma\) is negative to avoid trying this value and erroring, and instead returning an infinitely low model probability (-Inf) to push away the exploration of this impossible region.
pred = predict(model_ScaledlGaussian([(missing) for i in 1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)
-pred = Array(pred)
-
-fig = plot_distribution(df, "Predictions made by Scaled Gaussian Model")
-for i in 1:length(chain_ScaledGaussian)
- lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
-end
-fig
pred =predict(model_ScaledlGaussian([(missing) for i in1:length(df.RT)], condition=df.Accuracy), chain_ScaledGaussian)
+pred =Array(pred)
+
+fig =plot_distribution(df, "Predictions made by Scaled Gaussian Model")
+for i in1:length(chain_ScaledGaussian)
+lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] ==1, "#388E3C", "#D32F2F"), alpha=0.1)
+end
+fig
+
+
+
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
@@ -555,55 +769,91 @@
4.5.1 Prior on Minimum RT
Instead of a \(Uniform\) prior, we will use a \(Gamma(1.1, 11)\) distribution (truncated at min. RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times.
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
4.5.2 Model Specification
-
#| code-fold: false
-#| output: false
-
-@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)
-
- # Priors
- τ ~ truncated(Gamma(1.1, 11); upper=min_rt)
-
- μ_intercept ~ Normal(0, exp(1)) # On the log-scale: exp(μ) to get value in seconds
- μ_condition ~ Normal(0, exp(0.3))
-
- σ_intercept ~ truncated(Normal(0, 0.5); lower=0)
- σ_condition ~ Normal(0, 0.1)
-
- for i in 1:length(rt)
- μ = μ_intercept + μ_condition * condition[i]
- σ = σ_intercept + σ_condition * condition[i]
- if σ < 0 # Avoid negative variance values
- Turing.@addlogprob! -Inf
- return nothing
- end
- rt[i] ~ ShiftedLogNormal(μ, σ, τ)
- end
-end
-
-fit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)
-chain_LogNormal = sample(fit_LogNormal, NUTS(), 400)
pred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)
-pred = Array(pred)
-
-fig = plot_distribution(df, "Predictions made by Shifted LogNormal Model")
-for i in 1:length(chain_LogNormal)
- lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
-end
-fig
pred =predict(model_LogNormal([(missing) for i in1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)
+pred =Array(pred)
+
+fig =plot_distribution(df, "Predictions made by Shifted LogNormal Model")
+for i in1:length(chain_LogNormal)
+lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] ==1, "#388E3C", "#D32F2F"), alpha=0.1)
+end
+fig
+
+
+
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
This model provides a much better fit to the data, and confirms that the Accuracy condition is associated with higher RTs and higher variability (i.e., a larger distribution width).
@@ -651,53 +901,78 @@
4.6.1 Conditional Tau \(\tau\) Parameter
In the same way as we modeled the effect of the condition on the variance component sigma\(\sigma\), we can do the same for any other parameters, including the exponential component tau\(\tau\). All wee need is to set a prior on the intercept and the condition effect, and make sure that \(\tau > 0\).
pred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)
-pred = Array(pred)
-
-fig = plot_distribution(df, "Predictions made by Shifted LogNormal Model")
-for i in 1:length(chain_ExGaussian)
- lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
-end
-fig
pred =predict(model_ExGaussian([(missing) for i in1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)
+pred =Array(pred)
+
+fig =plot_distribution(df, "Predictions made by Shifted LogNormal Model")
+for i in1:length(chain_ExGaussian)
+lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] ==1, "#388E3C", "#D32F2F"), alpha=0.1)
+end
+fig
+
+
+
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
The ExGaussian model also provides an excellent fit to the data. Moreover, by modeling more parameters (including tau\(\tau\)), we can draw more nuanced conclusions. In this case, the Accuracy condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).
pred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)
-pred = Array(pred)
-
-fig = plot_distribution(df, "Predictions made by Shifted Wald Model")
-for i in 1:length(chain_Wald)
- lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
-end
-fig
pred =predict(model_Wald([(missing) for i in1:length(df.RT)]; condition=df.Accuracy), chain_Wald)
+pred =Array(pred)
+
+fig =plot_distribution(df, "Predictions made by Shifted Wald Model")
+for i in1:length(chain_Wald)
+lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] ==1, "#388E3C", "#D32F2F"), alpha=0.1)
+end
+fig
+
+
+
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
+└ @ Makie C:\Users\domma\.julia\packages\Makie\VRavR\src\scenes.jl:220
+
+
+
+
+
4.7.2 Model Comparison
At this stage, given the multiple options avaiable to model RTs, you might be wondering which model is the best. One can compare the models using the Leave-One-Out Cross-Validation (LOO-CV) method, which is a Bayesian method to estimate the out-of-sample predictive accuracy of a model.
The loo_compare() function orders models from best to worse based on their ELPD (Expected Log Pointwise Predictive Density) and provides the difference in ELPD between the best model and the other models. As one can see, traditional linear models perform terribly.