- Overview
- Package installation and update
- Attaching packages
- Steps of the analysis
- Test data
- Priors for test characteristics
- Priors for the model parameters related to status dynamics
- Running the STOC free model in Stan
- Running the STOC free model in JAGS
- Model results
- Inclusion of risk factors
The aim of the STOCfree
package is to predict herd level probabilities
of freedom from infection from longitudinal data collected as part of
surveillance programmes. A full description of the model principles is
available as an
article.
This article has been peer-reviewed as part of a publishing initiative
called Peer Community In Animal
Science. The reviews can be
accessed by clicking this
link. Some
aspects of the model were evaluated on simulated data by Mercat et
al. (2022).
The model has been developed as part of the EFSA funded project STOC free. An overall description of the project can be found in a 2019 article by van Roon et al. published in frontiers in Veterinary Science.
The problem addressed is the following. For some major infectious diseases of cattle, there exist local surveillance programmes aimed at identifying infected herds or animals with the ultimate goal of controlling or eradicating the disease. Although such programmes usually lead to disease control or eradication, they can create some difficulty when trading animals. Herds considered as free from infection within one programme may not be considered free under another programme. This lack of comparability can be problematic. This is due to the differences in the programmes in terms of the frequency of testing, the categories of the animals tested, the tests used… as well as in the way freedom from infection is defined from the data collected. The aim of the STOC free project is to provide a framework allowing the comparison of the outputs generated by different surveillance programmes. This is known as output-based surveillance.
In the STOC free project, infection by the Bovine Viral Diarrhoea Virus (BVDV) in cattle is taken as an example. A description of BVDV surveillance programmes in the countries involved in the project was published by van Roon et al. in the Journal of Dairy Science in 2020 and shows the variety in existing programmes.
The statistical framework described on this page is meant to model surveillance programmes in which all herds in the programme are tested at regular time intervals. The model can be described as a Hidden Markov Model, running at the month level. The variable of interest is a latent status regarding infection that is imperfectly measured with tests and that can be predicted by risk factors. The model returns a herd level posterior distribution for the probability of being status positive given a sequence of test results and risk factors. This probability of being positive to the status (usually infection) is predicted for the last month in the dataset. Data collected before are used as historical data to train the model. Risk factors of new infection (more broadly of becoming status positive) are considered. The model is run in a Bayesian framework with estimation and prediction performed in either Stan or JAGS. The model runs faster and better in Stan.
The next sections describe how to install the package, set up and run the model.
Before installing the STOCfree package, you need to install either Stan or JAGS.
-
The Stan implementation relies on an interface called CmdStan which communicates with R through the CmdStanR R package. The instructions to install both CmdStan and CmdStanR can be found on the following page: https://mc-stan.org/cmdstanr/articles/cmdstanr.html. Running the Stan model also requires installing the posterior R package.
-
JAGS can be installed from the following website: https://sourceforge.net/projects/mcmc-jags/files/. If R cannot find JAGS upon calling
STOCfree_JAGS()
try running the following in the consoleSys.setenv(JAGS_HOME=“PATH/JAGS installation”)
.
The STOCfree
package needs to be installed from Github. This requires
installing the remotes
(or devtools
) package first. You will need R
version 4.0 or later to install the package. You may be asked to install
or update several packages during the installation.
install.packages("remotes")
In order to install (or update) the STOCfree package, run the following line:
remotes::install_github("AurMad/STOCfree")
The STOCfree
package needs to be attached.
library(STOCfree)
The list of available functions and datasets can be accessed by typing
help(package="STOCfree")
We also attach the following packages that will be used later:
library(ggplot2)
Modelling will usually consist in the following steps:
- Set up the test data
- Define the prior distributions for test characteristics
- Define the prior distributions for the model parameters related to status
- Set up the risk factor data
- Define the prior distributions for the association between risk factors and probability of becoming infected (status positive)
- Run the STOC free model
- Analyse the model outputs
We demonstrate the use of the package by using a toy dataset called
herdBTM
which is included in the package. These are not real data and
should not be used to draw any conclusion regarding BVD. This dataset
contains the results of tests performed in 100 herds. Two different
tests are used. The first test is an antibody ELISA performed on bulk
tank milk. When this test is positive, a confirmatory test with a high
specificity is performed one month later. The result of the ELISA test
is an optical density ratio (continuous) which is converted into a
negative or positive test result based on a threshold of 35. The dataset
looks as follows:
head(herdBTM)
## # A tibble: 6 × 6
## Farm DateOfTest ODR Test TestResult LocalSeroPrev
## <chr> <date> <dbl> <chr> <dbl> <dbl>
## 1 FR001 2014-02-04 57.6 BTM_ODR 1 0.12
## 2 FR001 2014-03-01 NA confirm 1 0.12
## 3 FR001 2014-09-10 66.5 BTM_ODR 1 0.1
## 4 FR001 2014-10-01 NA confirm 1 0.1
## 5 FR001 2015-02-01 52.5 BTM_ODR 1 0.08
## 6 FR001 2015-03-01 NA confirm 1 0.08
The columns of the dataset correspond to:
Farm
: farm IDDateOfTest
: date when the test was performedODR
: optical density ratio associated with ELISA testTest
: type of test used, either routine ELISA on bulk tank milk (“BTM_ODR”) or confirmatory test (“confirm”)TestResult
: test result. 0 for negative; 1 for positiveLocalSeroPrev
: local seroprevalence, a risk factor of new infection
All the later analyses rely on the construction of a STOCfree_data
object. As an example, a STOCfree_data
object is constructed from the
herdBTM
dataset. To create this object, we use the STOCfree_data()
function. Type ?STOCfree_data
in the console to see the list of
arguments that can be used.
sfd0 <- STOCfree_data(test_data = herdBTM,
test_herd_col = "Farm",
test_date_col = "DateOfTest",
test_res_col = "TestResult",
test_name_col = "Test",
status_dynamics_scale = "proba")
In this example, the function will gather the test data from the
herdBTM
dataset, which must be a data.frame or a tibble. Herd
identifiers are looked up in the Farm
column of the herdBTM
dataset,
dates of test in the DateOfTest
column…
One argument worth mentioning is the status_dynamics_scale
argument.
By default, infection dynamics parameters, i.e. probabilities of
infection on the first time step, acquiring and eliminating the
infection between time steps, are modelled on the probability scale
using Beta distributions (status_dynamics_scale = "proba"
). Changing
to status_dynamics_scale = "logit"
, these probabilities are modelled
on the logit scale and normal prior distributions are used. In order to
illustrate this point, we create a second STOCfree_data object with
dynamics parameters on the logit scale.
sfd <- STOCfree_data(test_data = herdBTM,
test_herd_col = "Farm",
test_date_col = "DateOfTest",
test_res_col = "TestResult",
test_name_col = "Test",
status_dynamics_scale = "logit")
The STOCfree_data()
function returns an object of class
STOCfree_data
.
class(sfd)
## [1] "herd_dynLogit" "STOCfree_data"
A STOCfree_data
object is in fact a list of data.frames
. Below we
provide a brief explanation on the content of this list.
str(sfd)
## List of 8
## $ var_names : Named chr [1:4] "Farm" "DateOfTest" "TestResult" "Test"
## ..- attr(*, "names")= chr [1:4] "test_herd_col" "test_date_col" "test_res_col" "test_name_col"
## $ herd_id_corresp :'data.frame': 100 obs. of 2 variables:
## ..$ Farm : chr [1:100] "FR001" "FR002" "FR003" "FR004" ...
## ..$ herd_id: int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
## $ test_data :'data.frame': 924 obs. of 6 variables:
## ..$ status_id : int [1:924] 1 2 8 9 13 14 20 21 25 26 ...
## ..$ herd_id : int [1:924] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ month_id : int [1:924] 1 2 8 9 13 14 20 21 25 26 ...
## ..$ status_type: num [1:924] 1 2 2 2 2 2 2 2 2 2 ...
## ..$ test_id : int [1:924] 1 2 1 2 1 2 1 2 1 2 ...
## ..$ test_res : num [1:924] 1 1 1 1 1 1 1 0 1 0 ...
## $ herd_test_data :'data.frame': 100 obs. of 3 variables:
## ..$ herd_id: int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ ind_i : int [1:100] 1 34 67 100 133 166 199 232 265 298 ...
## ..$ ind_p : int [1:100] 33 66 99 132 165 198 231 264 297 330 ...
## $ test_perf_prior :'data.frame': 2 obs. of 6 variables:
## ..$ test : chr [1:2] "BTM_ODR" "confirm"
## ..$ test_id: int [1:2] 1 2
## ..$ Se_a : logi [1:2] NA NA
## ..$ Se_b : logi [1:2] NA NA
## ..$ Sp_a : logi [1:2] NA NA
## ..$ Sp_b : logi [1:2] NA NA
## $ risk_factors :'data.frame': 1 obs. of 3 variables:
## ..$ risk_factor: chr "Intercept"
## ..$ type : chr "intercept"
## ..$ modality : logi NA
## $ risk_factor_data:'data.frame': 3300 obs. of 4 variables:
## ..$ status_id: int [1:3300] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ herd_id : int [1:3300] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ month_id : int [1:3300] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ intercept: num [1:3300] 1 1 1 1 1 1 1 1 1 1 ...
## $ inf_dyn_priors : Named logi [1:6] NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:6] "logit_pi1_mean" "logit_pi1_sd" "logit_tau1_mean" "logit_tau1_sd" ...
## - attr(*, "level")= chr "herd"
## - attr(*, "status dynamics scale")= chr "logit"
## - attr(*, "number of herds")= int 100
## - attr(*, "number of tests")= int 2
## - attr(*, "month first test")= chr "2014-02"
## - attr(*, "month last test")= chr "2016-10"
## - attr(*, "number of risk factors")= num 0
## - attr(*, "class")= chr [1:2] "herd_dynLogit" "STOCfree_data"
The list components are:
var_names
: correspondence between column names in the original datasets and relevant data used in the modelsherd_id_corresp
: for the analysis herds are numbered from 1 to the number of herds. This table contains the correspondence between these IDs and the original IDstest_data
: data with test resultsherd_test_data
: herd level indices in the sequence of all months for all herds used in the model. For each herd, the model needs the indices associated with the first, second, penultimate and last tests in the sequence.test_perf_prior
: table with parameters for the prior distributions for test characteristics.risk_factors
: list and type of risk factors that will be included in the modelrisk_factor_data
: dataset containing risk factor values for all herds and all months in the analysisinf_dyn_priors
: table with priors for infection dynamics: probability of being status positive on the first month, probability of becoming status positive between consecutive months when applicable, probability of not eliminating the infection between consecutive months
The model accounts for the fact that the sensitivity and specificity of
the different tests used can be below 1. Hypotheses about these
parameters are included through the use of Beta distributions as priors
for the different sensitivities and specificities. We can check the
current values for the parameters associated with these distributions
using the show_tests()
function.
show_tests(sfd)
## test test_id Se_a Se_b Sp_a Sp_b
## 1 BTM_ODR 1 NA NA NA NA
## 2 confirm 2 NA NA NA NA
Column names ending with _a
and _b
contain the alpha and beta values
for Beta distributions. For example Se_a = 20
and Se_b = 2
define
the prior distribution for sensitivity as Se ~ Beta(20, 2).
As can be seen, no value has been defined yet. This can be done as follows:
sfd <- set_priors_tests(sfd,
test = "BTM_ODR",
Se_a = 5000,
Se_b = 260,
Sp_a = 20,
Sp_b = 2)
sfd <- set_priors_tests(sfd,
test = "confirm",
Se_a = 20,
Se_b = 2,
Sp_a = 10000,
Sp_b = 1)
We can check that the priors have changed:
show_tests(sfd)
## test test_id Se_a Se_b Sp_a Sp_b
## 1 BTM_ODR 1 5000 260 20 2
## 2 confirm 2 20 2 10000 1
The prior distributions can be plotted with the plot_priors_tests()
function.
plot_priors_tests(sfd)
In order to help selecting appropriate parameter values for the Beta distributions, we have designed a Shiny app which is available from Github. See https://github.com/AurMad/betadistapp
These parameters are the probability of being status positive on the first time step (pi1), the probability of becoming status positive between consecutive time steps (tau1) and the probability of remaining status positive between consecutive time steps (tau2). When risk factors are included in the model, tau1 is modelled as a function of these risk factors and the parameters for the prior distribution of tau1 will not be taken into account.
For STOCfree_data
objects constructed with the argument
status_dynamics_scale = "proba"
, the prior distributions for status
dynamics are provided as Beta distributions on the probability scale.
The show_priors_status_dyn()
shows the current parameter values for
these Beta distributions and gives the syntax the use to change them.
show_priors_status_dyn(sfd0)
## pi1_a pi1_b tau1_a tau1_b tau2_a tau2_b
## NA NA NA NA NA NA
## Set prior distributions for status dynamics using:
## sfd0 <- set_priors_status_dyn(sfd0, pi1_a = , pi1_b = , tau1_a = , tau1_b = , tau2_a = , tau2_b = )
Parameter values are provided for these prior distributions as follows:
sfd0 <- set_priors_status_dyn(sfd0,
pi1_a = 1,
pi1_b = 1,
tau1_a = 1.5,
tau1_b = 10,
tau2_a = 10,
tau2_b = 1.5)
These distributions can be plotted.
plot_priors_status_dyn(sfd0)
To demonstrate how to set dynamics parameters on the logit scale, we use
the sfd1
object created above. As can be seen, the parameter values
requested consist of means and standard deviations. In order to help
selecting appropriate parameter values for the normal distributions on
the logit scale, we have designed a Shiny app which is available from
Github. See https://github.com/AurMad/logitnormdistapp
show_priors_status_dyn(sfd)
## logit_pi1_mean logit_pi1_sd logit_tau1_mean logit_tau1_sd logit_tau2_mean
## NA NA NA NA NA
## logit_tau2_sd
## NA
## Set prior distributions for status dynamics using:
## sfd <- set_priors_status_dyn(sfd, logit_pi1_mean = , logit_pi1_sd = , logit_tau1_mean = , logit_tau1_sd = , logit_tau2_mean = , logit_tau2_sd = )
Values are provided.
sfd <- set_priors_status_dyn(sfd,
logit_pi1_mean = 0,
logit_pi1_sd = 1.5,
logit_tau1_mean = -3,
logit_tau1_sd = 1,
logit_tau2_mean = 3,
logit_tau2_sd = 1)
These distributions can be plotted.
plot_priors_status_dyn(sfd)
The Stan implementation uses the forward algorithm for the estimation.
The code was adapted from a tutorial on Hidden Markov models in Stan by
Damiano et al. (2017). The
model is run using the STOCfree_Stan()
function. The main argument is
a STOCfree_data
object as created above. By default, the model outputs
are stored in a folder called ‘STOCfree_files’. This output folder can
be changed using through out_path
argument.
sfm_stan <- STOCfree_Stan(sfd,
n_chains = 3,
n_iter = 1000,
n_thin = 1,
out_path = "STOCfree_Stan_1")
## Running MCMC with 3 sequential chains...
##
## Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup)
## Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup)
## Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup)
## Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup)
## Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1 finished in 46.6 seconds.
## Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup)
## Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup)
## Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup)
## Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup)
## Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2 finished in 48.7 seconds.
## Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3 Iteration: 100 / 2000 [ 5%] (Warmup)
## Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3 Iteration: 300 / 2000 [ 15%] (Warmup)
## Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3 Iteration: 700 / 2000 [ 35%] (Warmup)
## Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3 Iteration: 900 / 2000 [ 45%] (Warmup)
## Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3 Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3 Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3 Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3 Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3 finished in 46.6 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 47.3 seconds.
## Total execution time: 142.6 seconds.
The STOC free model is run using the STOCfree_JAGS()
function. This
function is a wrapper for the run.jags()
function from the runjags
package. The main argument is a STOCfree_data
object as created above.
By default, the models outputs are stored in a folder called
‘STOCfree_files’. This output folder can be changed using through
out_path
argument.
sfm_jags <- STOCfree_JAGS(sfd,
n_chains = 3,
n_burnin = 1000,
n_iter = 1000,
n_thin = 1,
out_path = "STOCfree_JAGS_1")
## Warning: You attempted to start parallel chains without setting different PRNG
## for each chain, which is not recommended. Different .RNG.name values have been
## added to each set of initial values.
## Calling 3 simulations using the parallel method...
## Following the progress of chain 1 (the program will wait for all chains
## to finish before continuing):
## Welcome to JAGS 4.3.1 on Mon Sep 12 11:16:13 2022
## JAGS is free software and comes with ABSOLUTELY NO WARRANTY
## Loading module: basemod: ok
## Loading module: bugs: ok
## . . Reading data file data.txt
## . Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 824
## Unobserved stochastic nodes: 3406
## Total graph size: 27912
## . Reading parameter file inits1.txt
## . Initializing model
## . Adapting 1000
## -------------------------------------------------| 1000
## ++++++++++++++++++++++++++++++++++++++++++++++++++ 100%
## Adaptation successful
## . Updating 1000
## -------------------------------------------------| 1000
## ************************************************** 100%
## . . . . . . . Updating 1000
## -------------------------------------------------| 1000
## ************************************************** 100%
## . . . . Updating 0
## . Deleting model
## .
## All chains have finished
## Simulation complete. Reading coda files...
## Coda files loaded successfully
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## FALSEFinished running the simulation
## Warning: `gather_()` was deprecated in tidyr 1.2.0.
## Please use `gather()` instead.
## Warning: `spread_()` was deprecated in tidyr 1.2.0.
## Please use `spread()` instead.
The Stan model returns an object of classes CmdStanMCMC
, CmdStanFit
and R6
. Therefore, all methods and functions working with objects of
these classes will work with outputs created by STOCfree_Stan()
.
The JAGS model returns an object of class runjags
. Therefore, all
methods and functions working with runjags objects will work with
outputs created by STOCfree_JAGS()
.
The STOCfree
package contains functions to explore the model outputs
that work in the same way, regardless of whether the output was
generated with Stan or JAGS.
Two specific types of model outputs are of interest and can be
extracted, either directly from the model output or loaded from the disk
when STOCfree_model()
was run using save_output = TRUE
. These model
outputs are the model parameters (Se, Sp,
By model parameters, we mean test characteristics, parameters related to infection dynamics and association with risk factors. These can be extracted from the model output:
param <- extract_STOCfree_param(sfm_stan)
or loaded from the disk
param <- read_STOCfree_param(out_path = "STOCfree_Stan_1")
These 2 functions create objects of class STOCfree_param
for which
print and plot methods are defined in the package.
print(param)
## MCMC samples from STOC free model parameters
##
## Parameters: Se1, Se2, Sp1, Sp2, tau1, tau2
## Number of chains: 3
## Number of iterations per chain: 1000
## Number of draws: 3000
Traceplots can be generated to check convergence visually.
plot(param, parameter = "Se1", type = "traceplot")
Density plots can be generated.
plot(param, parameter = "Sp1", type = "density")
Summary values for each parameter can be obtained as follows:
summary(param)
## Warning: Dropping 'draws_df' class as required metadata was removed.
## Warning: Dropping 'draws_df' class as required metadata was removed.
## Warning: Dropping 'draws_df' class as required metadata was removed.
## Warning: Dropping 'draws_df' class as required metadata was removed.
## mean sd median 2.5% 97.5% ess
## Se1 0.95021170 2.944234e-03 0.95028050 0.9443617 0.95558250 4531.024
## Se2 0.88369697 2.002153e-02 0.88420750 0.8427434 0.92128647 4754.958
## Sp1 0.98564383 9.752482e-03 0.98770150 0.9609376 0.99824453 3694.541
## Sp2 0.99990148 9.649939e-05 0.99992900 0.9996370 0.99999700 5730.500
## pi1 0.52999931 5.037150e-02 0.52984800 0.4306047 0.62888923 4787.779
## tau1 0.04605133 6.832599e-03 0.04581995 0.0339436 0.06070055 4186.314
## tau2 0.96378436 6.547002e-03 0.96410600 0.9499575 0.97548610 3980.593
MCMC samples for the herd level predicted probabilities of infection can also be extracted or loaded.
pred <- extract_STOCfree_pred(sfm_stan, sfd)
or
pred <- read_STOCfree_pred("STOCfree_Stan_1")
Calling print on these objects will display the list of herds.
print(pred)
## MCMC samples from STOC free model herd level predicted probabilities of infection
##
## Number of herds: 100
##
## Herds: FR001 FR002 FR003 FR004 FR005 FR006 FR007 FR008 FR009 FR010 FR011 FR012 FR013 FR014 FR015 FR016 FR017 FR018 FR019 FR020 FR021 FR022 FR023 FR024 FR025 FR026 FR027 FR028 FR029 FR030 FR031 FR032 FR033 FR034 FR035 FR036 FR037 FR038 FR039 FR040 FR041 FR042 FR043 FR044 FR045 FR046 FR047 FR048 FR049 FR050 FR051 FR052 FR053 FR054 FR055 FR056 FR057 FR058 FR059 FR060 FR061 FR062 FR063 FR064 FR065 FR066 FR067 FR068 FR069 FR070 FR071 FR072 FR073 FR074 FR075 FR076 FR077 FR078 FR079 FR080 FR081 FR082 FR083 FR084 FR085 FR086 FR087 FR088 FR089 FR090 FR091 FR092 FR093 FR094 FR095 FR096 FR097 FR098 FR099 FR100
Calling plot on these predicted probabilities will plot the density of predicted probabilities for all herds across all iterations.
plot(pred)
It is also possible to plot the predicted probabilities of infection for specific herds:
plot(pred, herd = c("FR001", "FR002"), type = "individual", legend = TRUE)
A summary method is implemented for these predicted probabilities. It is possible to specify for which herds a summary is required.
summary(pred, herd = c("FR001", "FR003"))
## herd mean sd median 2.5% 97.5%
## 1 FR001 0.129 0.019 0.128 0.097 0.172
## 2 FR003 0.062 0.009 0.062 0.045 0.082
This is only available for the JAGS model. The predicted monthly prevalences of infection can be extracted or loaded using:
prev <- extract_STOCfree_month_prev(sfm_jags, sfd)
or
prev <- read_STOCfree_month_prev("STOCfree_JAGS_1")
Calling plot on this object will create boxplots of estimated monthly infection prevalences.
plot(prev)
Risk factors are included with the objective of detecting infection earlier when the interval between tests is long, as is usually the case. They can also improve the performance of the prediction of the probability of being status positive when test performance is not good.
The association between the probability of becoming status positive (tau1) and risk factors is modelled with logistic regression. The uncertainty in status as measured by test result is accounted for in the regression through test sensitivity and specificity.
Bayesian inference in JAGS can take a lot of time. This time increases with the size of the dataset modelled as well as with the number of MCMC iterations. One way to overcome this problem is to force covariates in the model based on what is known about the disease. In this case, but is also possible to include informative prior distributions for the coefficients of the logistic regression between tau1 and risk factors.
The STOCfree package contains a set of functions designed to help in the selection of risk factors from the data. For this purpose, it is considered that all herds with a negative test result are eligible for becoming test positive on the next test. The probability of becoming test positive (versus remaining test negative) is modelled with logistic regression. A further thing that is considered is the fact that risk factors occurring at one point in time may be associated with becoming test positive later.
To show how to use the available functions, we use the intro
dataset
which is included in the package.
intro
## # A tibble: 1,324 × 3
## Farm dateIntr nAnim
## <chr> <fct> <int>
## 1 FR001 2010-01-22 1
## 2 FR001 2015-05-04 14
## 3 FR002 2011-02-11 2
## 4 FR002 2010-12-23 5
## 5 FR003 2012-11-07 3
## 6 FR003 2016-05-25 5
## 7 FR005 2016-03-31 13
## 8 FR005 2016-04-20 17
## 9 FR008 2012-06-04 2
## 10 FR008 2010-04-30 1
## # … with 1,314 more rows
This dataset contains the number of animals purchased as well as the
dates of introduction for the 100 herds in the herdBTM
data. Using
these datasets, the probability of seroconversion (becoming test
positive between 2 tests in a given herd) is modelled as a function of
the number of animals purchased between the month of seroconversion
(month 0) and 24 months earlier. The time of seroconversion is taken as
the midpoint between the 2 test dates considered.
nAnim_lagged <- logit_nwinf_lagged(
sf_data = sfd,
rf_data = intro,
rf_date_col = "dateIntr",
rf_col = "nAnim",
time_of_inf = "mid",
lag1 = 0,
lag2 = 24)
The function returns a table with the AICs of all the models evaluated.
str(nAnim_lagged)
## 'data.frame': 325 obs. of 4 variables:
## $ lag1: int 0 0 1 0 1 2 0 1 2 3 ...
## $ lag2: int 0 1 1 2 2 2 3 3 3 3 ...
## $ l : int 0 1 0 2 1 0 3 2 1 0 ...
## $ AIC : num 366 365 365 365 365 ...
## - attr(*, "out.attrs")=List of 2
## ..$ dim : Named int [1:4] 25 25 1 1
## .. ..- attr(*, "names")= chr [1:4] "lag1" "lag2" "l" "AIC"
## ..$ dimnames:List of 4
## .. ..$ lag1: chr [1:25] "lag1= 0" "lag1= 1" "lag1= 2" "lag1= 3" ...
## .. ..$ lag2: chr [1:25] "lag2= 0" "lag2= 1" "lag2= 2" "lag2= 3" ...
## .. ..$ l : chr "l=NA"
## .. ..$ AIC : chr "AIC=NA"
This can be plotted using the code below. From the Figure below it can be seen that between the worst (AIC > 350) and the best (AIC < 342) model, the difference in AIC is rather limited. There is a drop in AIC for intervals starting from 8 months before seroconversion.
ggplot(data = nAnim_lagged, aes(x = lag2, y = lag1, fill = AIC)) +
geom_tile() +
xlab("Time Lag 2 (months)") +
ylab("Time Lag 1 (months)") +
scale_fill_gradient(low = "red", high = "yellow", aesthetics = "fill") +
ggtitle("Number of animals purchased")
When the function is used to evaluate several candidate variables, all these variables can be included in a multivariate logistic model. For this, we create a dataset with the different variables of interest.
- Model outcome
nwinf <- make_nwinf_data(sfd,
time_of_inf = "mid")
- Covariates
Here, we include the number of animals purchased 8 months before the
test of interest. This is done by calling the add_risk_factor()
.
nwinf <- add_risk_factor(nwinf,
intro,
rf_col = "nAnim",
rf_date_col = "dateIntr",
lag1 = 8,
lag2 = 8)
- Model
The multivariate logistic model is estimated with R glm()
function
called by logit_nwinf()
function from the STOCfree
package:
modl <- logit_nwinf(nwinf,
risk_factors = "nAnim_8_8")
The model results are:
summary(modl)
##
## Call:
## glm(formula = formula, family = binomial(link = "logit"), data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.631 -0.849 -0.849 1.546 1.546
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.83502 0.13206 -6.323 2.56e-10 ***
## nAnim_8_8 0.12384 0.05958 2.078 0.0377 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 361.60 on 288 degrees of freedom
## Residual deviance: 356.94 on 287 degrees of freedom
## AIC: 360.94
##
## Number of Fisher Scoring iterations: 4
The STOCfree_data()
object is updated to include the risk factors
selected for the analysis. This is done with the sf_add_risk_factor()
function.
sfd <- sf_add_risk_factor(
sfd = sfd,
risk_factor_data = intro,
risk_herd_col = "Farm",
risk_date_col = "dateIntr",
risk_factor_col = "nAnim",
risk_factor_type = "continuous",
lag1 = 8,
lag2 = 8,
FUN = sum)
## Joining, by = "Farm"
## Joining, by = "date__1"
## Joining, by = c("herd_id", "lagged_month")
Doing this will result in the priors required for infection related parameters to be different from above. The priors for tau1 are not needed any more. They are replaced by prior distributions for the logistic regression.
show_priors_status_dyn(sfd)
## logit_pi1_mean logit_pi1_sd logit_tau2_mean logit_tau2_sd
## 0.0 1.5 3.0 1.0
The Bayesian model models the probability of becoming status positive
(usually getting infected) as a function of one or several risk factors,
with logistic regression. Priors for the coefficients in the logistic
regression need to be provided. The prior distributions used are normal
distributions on the logit scale. In our case, we need to provide prior
distributions for the intercept as well as for coefficient associated
with the number of animals purchased 8 months before the month of
surveillance. Below is the list of covariates included in the
STOCfree_data
so far.
show_rf(sfd)
## risk_factor type modality
## 1 Intercept intercept NA
## 2 nAnim_8_8 continuous NA
sfd <- set_priors_rf(sfd,
risk_factor = "Intercept",
mean = -3, sd = 1)
sfd <- set_priors_rf(sfd,
risk_factor = "nAnim_8_8",
mean = 0, sd = 2)
This has updated the list of covariates with the normal distribution parameters.
show_rf(sfd)
## risk_factor type modality mean_prior sd_prior
## 1 Intercept intercept NA -3 1
## 2 nAnim_8_8 continuous NA 0 2
These priors can be plotted on the probability scale. The prior distribution for the intercept implies that in herds buying no cows, the probability of becoming positive is most likely below 0.4, with most of the density below 0.2.
plot_priors_rf(sfd)
This is done as explained above by calling the STOCfree_Stan()
or
STOCfree_JAGS()
functions. In this case, estimates for the logistic
regression parameters will appear as theta1 (model intercept) and theta2
(coefficient for the nAnim_8_8 variable).
Below is a complete example.
## Creation of the STOCfree_data object
sfd <- STOCfree_data(test_data = herdBTM,
test_herd_col = "Farm",
test_date_col = "DateOfTest",
test_res_col = "TestResult",
test_name_col = "Test",
status_dynamics_scale = "logit")
## Priors for test characteristics
sfd <- set_priors_tests(sfd,
test = "BTM_ODR",
Se_a = 5000,
Se_b = 260,
Sp_a = 20,
Sp_b = 2)
sfd <- set_priors_tests(sfd,
test = "confirm",
Se_a = 20,
Se_b = 2,
Sp_a = 10000,
Sp_b = 1)
## Prior for infection dynamics
sfd <- set_priors_status_dyn(sfd,
logit_pi1_mean = 0,
logit_pi1_sd = 4,
logit_tau2_mean = 3,
logit_tau2_sd = 1)
## Set prior distributions for status dynamics using:
## sfd <- set_priors_status_dyn(sfd, logit_tau1_mean = , logit_tau1_sd = )
## Adding a risk factor to the TSOCfree_data object
sfd <- sf_add_risk_factor(
sfd = sfd,
risk_factor_data = intro,
risk_herd_col = "Farm",
risk_date_col = "dateIntr",
risk_factor_col = "nAnim",
risk_factor_type = "continuous",
lag1 = 8,
lag2 = 8,
FUN = sum)
## Joining, by = "Farm"
## Joining, by = "date__1"
## Joining, by = c("herd_id", "lagged_month")
## Priors for risk factors association with log-odds of becoming positive
sfd <- set_priors_rf(sfd,
risk_factor = "Intercept",
mean = -3, sd = 1)
sfd <- set_priors_rf(sfd,
risk_factor = "nAnim_8_8",
mean = 0, sd = 2)
Stan version:
set.seed(124)
sfm_stan_rf <- STOCfree_Stan(sfd,
n_chains = 3,
n_warmup = 10,
n_iter = 100,
n_thin = 1,
out_path = "STOCfree_Stan_2")
## Running MCMC with 3 sequential chains...
##
## Chain 1 WARNING: No variance estimation is
## Chain 1 performed for num_warmup < 20
## Chain 1 Iteration: 1 / 110 [ 0%] (Warmup)
## Chain 1 Iteration: 11 / 110 [ 10%] (Sampling)
## Chain 1 Iteration: 110 / 110 [100%] (Sampling)
## Chain 1 finished in 11.9 seconds.
## Chain 2 WARNING: No variance estimation is
## Chain 2 performed for num_warmup < 20
## Chain 2 Iteration: 1 / 110 [ 0%] (Warmup)
## Chain 2 Iteration: 11 / 110 [ 10%] (Sampling)
## Chain 2 Iteration: 110 / 110 [100%] (Sampling)
## Chain 2 finished in 20.1 seconds.
## Chain 3 WARNING: No variance estimation is
## Chain 3 performed for num_warmup < 20
## Chain 3 Iteration: 1 / 110 [ 0%] (Warmup)
## Chain 3 Iteration: 11 / 110 [ 10%] (Sampling)
## Chain 3 Iteration: 110 / 110 [100%] (Sampling)
## Chain 3 finished in 9.2 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 13.7 seconds.
## Total execution time: 41.6 seconds.
## Warning: 9 of 300 (3.0%) transitions ended with a divergence.
## See https://mc-stan.org/misc/warnings for details.
JAGS version
sfm_jags_rf <- STOCfree_JAGS(sfd,
n_chains = 3,
n_burnin = 1000,
n_iter = 1000,
n_thin = 1,
out_path = "STOCfree_JAGS_2")
## Warning: You attempted to start parallel chains without setting different PRNG
## for each chain, which is not recommended. Different .RNG.name values have been
## added to each set of initial values.
## Calling 3 simulations using the parallel method...
## Following the progress of chain 1 (the program will wait for all chains
## to finish before continuing):
## Welcome to JAGS 4.3.1 on Mon Sep 12 11:18:53 2022
## JAGS is free software and comes with ABSOLUTELY NO WARRANTY
## Loading module: basemod: ok
## Loading module: bugs: ok
## . . Reading data file data.txt
## . Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 824
## Unobserved stochastic nodes: 3407
## Total graph size: 37768
## . Reading parameter file inits1.txt
## . Initializing model
## . Adapting 1000
## -------------------------------------------------| 1000
## ++++++++++++++++++++++++++++++++++++++++++++++++++ 100%
## Adaptation successful
## . Updating 1000
## -------------------------------------------------| 1000
## ************************************************** 100%
## . . . . . . . Updating 1000
## -------------------------------------------------| 1000
## ************************************************** 100%
## . . . . Updating 0
## . Deleting model
## .
## All chains have finished
## Simulation complete. Reading coda files...
## Coda files loaded successfully
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## FALSEFinished running the simulation
Summary of parameters estimated with the Stan model.
param <- extract_STOCfree_param(sfm_stan_rf)
summary(param)
## Warning: Dropping 'draws_df' class as required metadata was removed.
## Warning: Dropping 'draws_df' class as required metadata was removed.
## Warning: Dropping 'draws_df' class as required metadata was removed.
## Warning: Dropping 'draws_df' class as required metadata was removed.
## mean sd median 2.5% 97.5% ess
## Se1 0.9499266 3.138467e-03 0.95014050 0.9437878 0.95555128 534.92502
## Se2 0.8822056 2.033061e-02 0.88216400 0.8436933 0.91806948 234.79601
## Sp1 0.9852370 9.566092e-03 0.98677450 0.9643835 0.99876630 35.12823
## Sp2 0.9998894 9.548877e-05 0.99992050 0.9996453 0.99999500 48.62949
## pi1 0.5348903 5.309744e-02 0.53516300 0.4268829 0.63639030 311.47774
## tau2 0.9645711 6.932421e-03 0.96434450 0.9511997 0.97729753 246.65370
## theta1 -3.0481417 1.714629e-01 -3.04740000 -3.3983385 -2.71569500 281.12591
## theta2 -0.5137474 9.223785e-01 -0.09879875 -3.3216173 0.02813575 53.13678