Problem with bootstrapping: Observations within subjects ... are not ordered by time #22

Closed
ewinter64 opened this issue Feb 5, 2020 · 11 comments

Comments

@ewinter64

data_test.txt

Using the attached data, I can successfully fit an msm model:

Q <- rbind(c(0, 0, 0.25, 0, 0.25), c(0, 0, 0.25, 0, 0.25), c(0.125, 0.125, 0, 0.125, 0.125), c(0, 0, 0.25, 0, 0.25), c(0, 0, 0, 0, 0))

model1 <- msm(state ~ time, subject = id, data = data, qmatrix = Q, gen.inits = TRUE)

But when I try to perform any function involving bootstrapping, I get the following error message:

Error in msm.check.times(mf$"(time)", mf$"(subject)", mf$"(state)") :
Observations within subjects 49, 111, 240 and others are not ordered by time

I've tried

msm.check.times(data$time, data$id, data$state)

using the function found on GitHub, and there doesn't appear to be a problem with the data. I've also sorted the data frame by subject and time:

data <- data[order(data$id, data$time),]

This has really got me stuck. Any help greatly appreciated.
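The check and the sort described above can be mimicked in base R. This is a minimal sketch, not msm's own `msm.check.times()`: `find_unordered_subjects` is a hypothetical helper that assumes "ordered" means times never decrease from one row to the next within a subject (msm's exact rules, for example on tied times, may differ). The example data are made up.

```r
# Hypothetical helper (not part of msm): return the IDs of subjects whose
# observation times decrease at any point, taking rows in their current order.
find_unordered_subjects <- function(time, subject) {
  # split() keeps the original row order within each subject
  times_by_subject <- split(time, subject)
  # A subject is out of order if any successive time difference is negative
  unordered <- vapply(times_by_subject,
                      function(t) any(diff(t) < 0),
                      logical(1))
  names(unordered)[unordered]
}

# Made-up example: subject "b" has its last two times swapped
example <- data.frame(
  id    = c("a", "a", "a", "b", "b", "b"),
  time  = c(0, 1, 2, 0, 2, 1),
  state = c(1, 1, 2, 1, 2, 3)
)
find_unordered_subjects(example$time, example$id)  # returns "b"

# Sorting by subject, then by time within subject, removes the problem
sorted <- example[order(example$id, example$time), ]
find_unordered_subjects(sorted$time, sorted$id)    # returns character(0)
```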

@chjackson
Owner

Strange; are you sure you're using the current version? This works fine for me with either the current CRAN version or the current GitHub version:

Q <- rbind(c(0, 0, 0.25, 0, 0.25), c(0, 0, 0.25, 0, 0.25), c(0.125, 0.125, 0, 0.125, 0.125), c(0, 0, 0.25, 0, 0.25), c(0, 0, 0, 0, 0))
data_test <- read.csv("data_test.txt")
library(msm)
model1 <- msm(state ~ time, subject = id, data = data_test, qmatrix = Q, gen.inits = TRUE)
set.seed(1)
pmatrix.msm(model1, ci="boot", B=3)

Are you using more bootstrap iterations? If it still breaks, can you set the seed and post a reproducible example?

I notice that in your fitted model the 1-5 transition rate has an implausibly wide confidence interval. There's essentially no information about this parameter, so I'd expect problems working with this model.

@ewinter64
Author

Thank you for your prompt reply.

I have now discovered that the function works with some data frames but not others. I saved my original data frame 'data' using write.csv and read it back into R as a new data frame, 'data_test'. Both are of class data.frame, but bootstrapping worked for the model built from data_test and not for the model built from data.


The original data frame has POSIXct and Date variables, which seem to affect whether the blue filter arrow appears in the environment pane, and which may also be affecting something else behind the scenes.

I also appreciate you pointing out the implausible CIs for some transition rates. Now that the code is working, I will add more data and hopefully resolve that issue too. Thanks again!
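To test the idea that the POSIXct and Date columns matter, one option is to locate those columns and give msm only the variables the model actually uses. This is a minimal sketch under that assumption: `date_time_columns` is a hypothetical helper, the example data are made up, and the thread never established that these columns caused the error (the later replies attribute it to a bug fixed in msm 1.7).

```r
# Hypothetical helper (not part of msm): list the columns of a data frame
# whose class includes Date or POSIXct.
date_time_columns <- function(df) {
  is_dt <- vapply(df, function(x) inherits(x, c("Date", "POSIXct")), logical(1))
  names(df)[is_dt]
}

# Made-up example mixing the model variables with date-time columns
data <- data.frame(
  id         = c(1, 1, 2, 2),
  time       = c(0, 5, 0, 3),
  state      = c(1, 3, 1, 5),
  visit_date = as.Date("2020-01-01") + c(0, 5, 0, 3),
  stamp      = as.POSIXct("2020-01-01 09:00", tz = "UTC") + c(0, 5, 0, 3) * 86400
)
date_time_columns(data)  # returns c("visit_date", "stamp")

# Keep only the columns named in the formula and the subject argument,
# so no date-time columns are carried along into the model data
model_data <- data[, c("id", "time", "state")]
```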

@StefanoMasier

Hello,
I am having the same problem with the current version of msm. I am trying to fit a model with one covariate.
The model fits fine (fine-ish: there is one CI that goes from 0 to Inf that I have to look into), but when I run:

pmatrix.msm(msm.mod.cov, t=1200, covariates=list(cluster.grouped="2"), ci="boot", B=100)

I get the same error:

Error in msm.check.times(time, subject, state) :
Observations within subjects 1324, 1452, 1576 and others are not ordered by time

with the subject numbers changing at each bootstrap iteration.

The data frame contains no Date or POSIXct columns, only numeric (the state code, from 1 to 6), integer (time), character (subject ID) and factor (covariate) columns, and it is sorted by subject and time.

What could be causing this? Could it be due to the (0, Inf) CI?
Let me know if you need the data frame to reproduce the error (it's a pretty big one, ~270k rows).

Thank you!
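The subject numbers in the error change with every bootstrap iteration, which suggests the problem lies in the resampled data rather than in the original data frame. As a general illustration only (this is not msm's internal bootstrap code, and not a diagnosis of this bug), the sketch below resamples whole subjects with replacement and gives each draw a fresh ID. If a subject drawn twice kept its original ID, its time sequence would appear twice under that ID, and a time-ordering check would reject it. `resample_subjects` and the example data are hypothetical.

```r
# Illustrative sketch (not msm's internal code): resample whole subjects with
# replacement and give each draw a fresh ID, so a subject drawn twice becomes
# two separate subjects, each with its own increasing time sequence.
resample_subjects <- function(df, id_col = "id", time_col = "time") {
  ids <- unique(df[[id_col]])
  # Index-based draw avoids sample()'s surprise when length(ids) == 1
  drawn <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  pieces <- lapply(seq_along(drawn), function(k) {
    rows <- df[df[[id_col]] == drawn[k], , drop = FALSE]
    rows <- rows[order(rows[[time_col]]), , drop = FALSE]
    rows[[id_col]] <- k  # new ID for the k-th draw
    rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

example <- data.frame(
  id    = rep(c("a", "b", "c"), each = 3),
  time  = rep(c(0, 1, 2), times = 3),
  state = c(1, 2, 3, 1, 1, 3, 1, 2, 2)
)

set.seed(1)
boot <- resample_subjects(example)
# Every resampled subject's times are strictly increasing
all(tapply(boot$time, boot$id, function(t) all(diff(t) > 0)))  # TRUE

# Contrast: stacking subject "a" twice under its original ID gives times
# 0, 1, 2, 0, 1, 2 for that ID, which is not ordered by time
naive <- rbind(example[example$id == "a", ], example[example$id == "a", ])
any(diff(naive$time) < 0)  # TRUE
```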

@chjackson
Owner

Hi Stefano, any data and code that reproduce the error (with the seed set) would be helpful if possible. Perhaps a subset of the data would be sufficient?
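A reproducible subset can be made by sampling whole subjects rather than individual rows, so each retained subject keeps its complete, time-ordered history. A minimal sketch: `subset_subjects` is a hypothetical helper, and the example borrows the column names `id_trial`, `second` and `behaviorN` used later in this thread, with made-up values.

```r
# Hypothetical helper (not part of msm): keep every row for a random subset
# of subjects, so each retained subject's full history stays intact and in
# its original row order.
subset_subjects <- function(df, id_col, n_subjects) {
  ids <- unique(df[[id_col]])
  keep <- ids[sample.int(length(ids), min(n_subjects, length(ids)))]
  df[df[[id_col]] %in% keep, , drop = FALSE]
}

# Made-up data: 10 trials with 4 observations each
full <- data.frame(
  id_trial  = rep(sprintf("trial_%02d", 1:10), each = 4),
  second    = rep(1:4, times = 10),
  behaviorN = rep(c(1, 2, 2, 6), times = 10)
)
set.seed(1)  # makes the chosen subset reproducible
small <- subset_subjects(full, "id_trial", 3)
nrow(small)  # 12: all 4 rows for each of 3 trials
```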

@StefanoMasier

StefanoMasier commented Nov 26, 2022

Hello, and thanks for your quick answer.
I tried to make both the dataset and the script as lightweight as possible: you'll find them both in a zip file at the end.

The data are a series of trials on animal behaviour, with 6 possible behaviours: 1 is the starting state (in the cage at the beginning of the trial), and 6 is the absorbing state (whenever it was observed, the trial was immediately concluded).
Each test lasted ~20 minutes (or less, if the target behaviour was spotted earlier), and the behaviour of the subject was recorded every second, for a total of ~1200 entries per trial.

The 4 variables are behaviorN (numeric; code of the behaviour observed at that time), second (numeric), id_trial (character, unique for each trial) and cluster.grouped (factor with 3 levels; the site of origin of each individual, which I want to use as a covariate in the analysis).

Attached are the dataset and a script, including my sessionInfo() output for reference. The model converges after 58 iterations; there are two CIs spanning (0, Inf) which I have to look into. Finally, the script contains the pmatrix.msm() call that produces the error.

Thank you for your help!

msm_test.zip

@chjackson
Copy link
Owner

That looks like the wrong data: it is called "df.leptidea.RDS" in the zip file, while the code refers to "df.run.rds", and the variable names are different.

@StefanoMasier

Apologies. It should be correct now.

@chjackson
Owner

OK, I think this was already fixed in the development version (b5681a3). For the moment, see the instructions at https://github.com/chjackson/msm to install it. I'll also put a CRAN release together shortly, because it has been a while.

@StefanoMasier

With the development version it seems to work.
Thank you very much for your help!

@deepchocolate

I had this problem too, but it disappeared after I sorted the data by subject and time.
(msm version 1.7)

@chjackson
Owner

Bugfix now included in CRAN version 1.7.
