Add progress bar to futuremice()
#516
Comments
Thanks for the suggestion. This would unfortunately not work in …
#devtools::install_github("gerkovink/mice@progressbar")
library(mice, warn.conflicts = FALSE)
mice:::match.cluster(n.core = 7, m = 357)
#> cores imps
#> 357 7 51
imp <- progressr::with_progress(futuremice(nhanes, m = 357, n.core = 7))
imp$m
#> [1] 357
Created on 2022-11-10 with reprex v2.0.2
This … Including … This has no urgency, as …
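For m = 357 and 7 cores, the match.cluster() output above appears to give each core 51 imputations. Suppose futuremice() runs a single mice() call per core. That is an assumption about its internals, not something stated in this thread. Under that assumption, a progress bar that ticks once per finished core could advance only seven times. Because every core carries the same load, those ticks would likely arrive close together near the end. The helper split_imputations() below is hypothetical and only reproduces the arithmetic. It is not mice's match.cluster().

```r
# Hypothetical helper (not part of mice): spread m imputations over n_core
# workers, assuming each worker runs one mice() call with its share.
split_imputations <- function(m, n_core) {
  base  <- m %/% n_core            # imputations every core receives
  extra <- m %% n_core             # leftovers, one each to the first cores
  base + (seq_len(n_core) <= extra)
}

sizes <- split_imputations(m = 357, n_core = 7)
sizes
#> [1] 51 51 51 51 51 51 51
length(sizes)  # progress updates if the bar ticks once per finished core
#> [1] 7
```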
Closing, as adding a progress bar to …
I understand if a progress bar isn't a high priority for development. However, such a feature would be very helpful to users with large data sets or complicated models. A key reason one would use parallel processing is that using …
It's not that I don't want to; it's that a progress bar is not informative in the implementation of … To demonstrate why I would gladly forgo the informative progress bar implementation in favour of the speedy implementation:
library(future)
library(futuremice)
library(mice, warn.conflicts = FALSE)
# how many cores
future::availableCores()
#> system
#> 10
st <- Sys.time()
set.seed(123)
future::plan("multisession", workers = pmin(2L, future::availableCores()))
A <- future_mice(nhanes, m = 150, seed = 123, maxit = 50)
#> Converged in 33 iterations
#> R-hat: 1.025/1.016/1.027/1.02
B <- future_mice(nhanes, m = 150, seed = 123, maxit = 50)
#> Converged in 33 iterations
#> R-hat: 1.025/1.016/1.027/1.02
identical(A$imp, B$imp)
#> [1] TRUE
identical(complete(A, 5), complete(B, 5))
#> [1] TRUE
!identical(complete(A, 1), complete(A, 2))
#> [1] FALSE
future::plan("sequential")
Sys.time() - st
#> Time difference of 10.98159 mins
st <- Sys.time()
set.seed(123)
A <- futuremice(nhanes, m = 150, n.core = 10, maxit = 50, parallelseed = 123)
B <- futuremice(nhanes, m = 150, n.core = 10, maxit = 50, parallelseed = 123)
identical(A$imp, B$imp)
#> [1] TRUE
identical(complete(A, 5), complete(B, 5))
#> [1] TRUE
!identical(complete(A, 1), complete(A, 2))
#> [1] TRUE
Sys.time() - st
#> Time difference of 11.47086 secs
Created on 2022-11-12 with reprex v2.0.2
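The reprex above reports two elapsed times, and the gap between them can be put as a single ratio. Both timings are converted to seconds before dividing. The comparison is not strictly like for like: the future_mice() run was capped at 2 workers by pmin(2L, ...), while futuremice() used n.core = 10. So the ratio measures the whole setup, not just the cost of per-imputation bookkeeping.

```r
# Elapsed times reported by the two reprex runs above
future_mice_time <- as.difftime(10.98159, units = "mins")
futuremice_time  <- as.difftime(11.47086, units = "secs")

# Convert both to seconds before taking the ratio
speedup <- as.numeric(future_mice_time, units = "secs") /
  as.numeric(futuremice_time, units = "secs")
round(speedup, 1)
#> [1] 57.4
```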
The problem here is that … In my opinion, the ideal solution would be to have an option to parallelize the sampler in … I agree that it would be very useful to have a progress bar, but the current implementation, which was chosen for its efficiency, does not allow a straightforward progress bar, as @gerkovink detailed above. If efficiency matters less to you than knowing how far the imputations have progressed, you can use the following code to implement a progress bar yourself. In the meantime, I will try to think of a way to implement a progress bar in …
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(furrr)
#> Warning: package 'furrr' was built under R version 4.2.2
#> Loading required package: future
library(purrr)
set.seed(123)
future::availableCores()
#> system
#> 8
m <- 80
plan(multisession)
progressr::with_progress({
p <- progressr::progressor(along = 1:m)
imps <- future_map(1:m, function(x) {
p(sprintf("x=%g", x))
imp <- mice(boys,
m=1,
maxit=50,
printFlag = FALSE)
imp
}, .options = furrr_options(seed = TRUE))
})
obj <- imps[[1]]
for(i in 2:length(imps)) {
obj <- ibind(obj, imps[[i]])
}
obj$imp <- map(obj$imp, function(x) {
colnames(x) <- 1:ncol(x)
x
})
complete(obj, 1:2, mild = TRUE) |>
map(head)
#> $`1`
#> age hgt wgt bmi hc gen phb tv reg
#> 3 0.035 50.1 3.650 14.54 33.7 G2 P2 2 south
#> 4 0.038 53.5 3.370 11.77 35.0 G3 P4 1 south
#> 18 0.057 50.0 3.140 12.56 35.2 G3 P4 1 south
#> 23 0.060 54.5 4.270 14.37 36.7 G1 P1 3 south
#> 28 0.062 57.5 5.030 15.21 37.3 G1 P1 3 south
#> 36 0.068 55.5 4.655 15.11 37.0 G1 P1 1 south
#>
#> $`2`
#> age hgt wgt bmi hc gen phb tv reg
#> 3 0.035 50.1 3.650 14.54 33.7 G1 P1 3 south
#> 4 0.038 53.5 3.370 11.77 35.0 G3 P3 2 south
#> 18 0.057 50.0 3.140 12.56 35.2 G4 P3 2 south
#> 23 0.060 54.5 4.270 14.37 36.7 G1 P2 2 south
#> 28 0.062 57.5 5.030 15.21 37.3 G1 P1 1 south
#> 36 0.068 55.5 4.655 15.11 37.0 G1 P1 1 south
Created on 2022-11-15 with reprex v2.0.2
Thanks @thomvolker.
Thanks very much for considering this! Would it be possible to re-open the issue?
It is not an issue with …
@gerkovink We could implement a progress bar by just adding the code in my … Otherwise, I don't see any merit in reopening this issue, because the only useful way of doing this is by revising the sampler, which is not really a priority.
@thomvolker Wouldn't that impact speed?
It would, especially for large data sets, I suppose. That's why I would generally not use a progress bar. But if users want to sacrifice speed for a progress bar, who am I to stop them? By default, I would set such an argument to …
Call me old-fashioned, but I don't like advocating a suboptimal implementation. @stefvanbuuren, what do you think?
I can see the value of a progress bar. In single-core … Implementing interprocess communication inevitably slows things down. We could bypass … Adding a progress bar should increase execution time by no more than 10 percent. Are we able to achieve that?
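One possible middle ground, sketched here under assumptions, is to split the m imputations into a chosen number of chunks. Each chunk is imputed by one parallel mice() call, and the bar ticks once per finished chunk. chunked_mice() is a hypothetical helper, not part of mice. The combining step follows @thomvolker's code: ibind() stacks the chunks, and the imputation columns are then renumbered. The n_chunks argument trades update frequency for overhead, because every chunk pays mice()'s setup cost plus one ibind() merge. Whether any setting stays within a 10 percent budget would need benchmarking. The sketch requires the furrr and progressr packages.

```r
library(mice, warn.conflicts = FALSE)
library(furrr)      # also attaches future, which provides plan()
library(progressr)

# Hypothetical helper (not part of mice): run m imputations as n_chunks
# parallel mice() calls and advance the progress bar once per finished chunk.
chunked_mice <- function(data, m, n_chunks, maxit = 5) {
  # Spread m imputations as evenly as possible over the chunks
  sizes <- m %/% n_chunks + (seq_len(n_chunks) <= m %% n_chunks)
  sizes <- sizes[sizes > 0]
  p <- progressor(steps = length(sizes))
  fits <- future_map(sizes, function(k) {
    fit <- mice(data, m = k, maxit = maxit, printFlag = FALSE)
    p()  # signal one completed chunk back to the main R session
    fit
  }, .options = furrr_options(seed = TRUE))
  # Stack the chunks into a single mids object
  imp <- Reduce(ibind, fits)
  # Renumber the imputation columns, as in @thomvolker's code
  imp$imp <- lapply(imp$imp, function(x) {
    if (!is.null(x)) colnames(x) <- seq_len(ncol(x))
    x
  })
  imp
}

plan(multisession, workers = 2)
with_progress(imp <- chunked_mice(nhanes, m = 20, n_chunks = 5))
plan(sequential)
imp$m
```

With n_chunks equal to the number of workers, this reduces to one tick per worker. With n_chunks = m, it matches the one-imputation-per-call approach above.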
Imputations using large data can take a long time, and it can be helpful to have a sense of how long an imputation will take to complete. It would be nice to have a progress bar for imputations when using futuremice() for parallel processing. For instance, the futuremice package implements a progress bar with the following code: