Parallel wrapper function for mice #104

gerkovink · 2018-06-14T12:14:31Z

One addition:

parlmice(): runs mice in parallel

One fix:

ibind(): An empty list was created based on numeric input and then extended based on character input. The result had twice the intended length and was half empty. Hence, the resulting mids object could not be passed to mice::complete().
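The failure mode is easy to reproduce in base R. The sketch below is not the actual ibind() code; the variable names are made up for illustration, and it assumes the list was preallocated by length and then filled by name:

```r
vars <- c("bmi", "hyp", "chl")

# Buggy pattern: preallocate by length (numeric input), then fill by name.
# The preallocated slots are unnamed, so every named assignment appends.
imp_buggy <- vector("list", length(vars))
for (v in vars) imp_buggy[[v]] <- paste("imputations for", v)
length(imp_buggy)  # 6: three empty slots followed by three named ones

# Fixed pattern: name the slots up front so assignment by name fills them.
imp_fixed <- setNames(vector("list", length(vars)), vars)
for (v in vars) imp_fixed[[v]] <- paste("imputations for", v)
length(imp_fixed)  # 3
```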

Version bumped to 3.0.11


stefvanbuuren · 2018-06-15T09:16:13Z

parlmice seems a useful addition. Few remarks/questions:

If I do parlmice(nhanes) I get m = 6, where I would expect m = 5. Could you add a test on m?
I think we should export only the name parlmice. The parlMICE function does not seem to offer anything extra, so I'd prefer to stick to just one name in lowercase.
Will the parallel imputes be identical to non-parallel imputes?
What would be needed to integrate parallel functionality fully into mice(.., parallel = TRUE), or mice(..., parallel = list(...)), so that we stick to one function for imputation? Would that introduce new complications?

gerkovink · 2018-06-15T10:20:37Z

Rationale for 6 instead of 5: The default is n.cores - 1 with 2 imputations per core. You have 2 cores with hyperthreading, that is 4 logical cores, and hence you get 6 imputations. Generating those 6 imputations is as fast as generating 5 imputations, whereas with 5 you would leave 1 core half unused during a parallel stream.
- We can do 5 per thread by default (as in mice()) if you'd prefer. The following behaviour is then expected when logical cores (threads) are used:

CPU cores	logical cores	m (2 per core)	m (5 per core)
2	4	6	15
4	8	14	35
6	12	22	55
8	16	30	75

As m is system dependent, I am not sure whether adding a test on m would be useful.
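The table follows directly from the default rule described above. A minimal sketch, where default_m() is a hypothetical helper and not part of mice:

```r
# m under the original default: one logical core is left free and each
# remaining core produces n_imp_core imputations.
default_m <- function(logical_cores, n_imp_core = 2) {
  (logical_cores - 1) * n_imp_core
}

logical_cores <- c(4, 8, 12, 16)
data.frame(
  logical_cores = logical_cores,
  m_2_per_core  = default_m(logical_cores, 2),
  m_5_per_core  = default_m(logical_cores, 5)
)
```

On a given machine, parallel::detectCores() reports the number of logical cores that would enter this calculation.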

I have exported parlMICE because the current implementation is unfortunately called parlMICE. This may lead to users not being able to run the code they have already created. We can deprecate parlMICE with a message, but this seemed more elegant to me.
No, setting 1 core and 5 imputations does not yield the same result as mice:

A <- parlmice(nhanes, n.core = 1, n.imp.core = 5, seed = 123)
B <- mice(nhanes, m = 5, print = FALSE, seed = 123)
all.equal(complete(A), complete(B))
[1] "Component “bmi”: Mean relative difference: 0.08665105"
[2] "Component “hyp”: Mean relative difference: 0.7142857" 
[3] "Component “chl”: Mean relative difference: 0.1851648"

However, I am not sure if getting the same output is something we would desire to achieve. As opposed to mice, with parlmice the seed does not pertain to the sampler, but rather governs the randomness on the level of the parallel streams. Getting the stream random, but the sampler equal to a single run in mice might violate randomness. Naturally, the process is exactly reproducible for every parlmice() instance:

C <- parlmice(nhanes, seed = 123)
D <- parlmice(nhanes, seed = 123)
all.equal(complete(C, "long"), complete(D, "long"))
[1] TRUE
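The reproducibility of seeded parallel streams can be illustrated without mice. The sketch below uses only the base parallel package; draw_in_parallel() is a hypothetical helper, and seeding the workers with clusterSetRNGStream() is an assumption about how such a wrapper could be set up, not a description of the parlmice internals:

```r
library(parallel)

# Hypothetical helper: draw three normal deviates on each worker after
# seeding the workers' random number streams.
draw_in_parallel <- function(seed, n_core = 2) {
  cl <- makeCluster(n_core)               # PSOCK cluster by default
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)   # one independent stream per worker
  parLapply(cl, seq_len(n_core), function(i) rnorm(3))
}

a <- draw_in_parallel(123)
b <- draw_in_parallel(123)
identical(a, b)  # same seed, same cluster layout, same draws

# The same seed in a single sequential session uses a different generator
# state, so these draws need not match those of any worker.
set.seed(123)
rnorm(3)
```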

@RianneSchouten Can you give your 2 cents on whether it is possible/desirable to have parlmice and mice return equal imputations when n.core =1 and n.imp.core = m?

This is ultimately the goal, but it requires a parallel version of sampler, or a wrapper of some sort on a deeper level where there is less overhead. I am working on this. However, I believe that the current implementation is useful for the time being. At this point it seems redundant to keep maintaining parlmice outside of mice, and joining them may give parlmice and the development of parallel computation in missing data more exposure. When mice(..., parallel = list(...)) arrives, we can deprecate parlmice. Let me know what you think.

stefvanbuuren · 2018-06-15T11:20:53Z

Thanks Gerko.

I would not tamper with m, and just generate the number that is requested. By default that would be m = 5, even if that means that some cores are not used. Perhaps you can add a fullcore (volkoren) switch that changes m to use all hardware, but that is FALSE by default.
As parlmice is a new feature in mice it will not be breaking any existing mice code. I suppose the old parlMICE will still be online for some time, and existing users will be able to replicate their work in that way. To me, this seems like a good time to straighten the names.
It's just fine to document that switching between non-parallel and parallel calculations does not reproduce imputations exactly, to set expectations.
Sounds fine to go ahead with parlmice for now, and keep working on integration.

gerkovink · 2018-06-15T12:25:08Z

I'll make the commits.

RianneSchouten · 2018-06-15T12:47:40Z

For 2, a solution like this seems nice. We add an argument mice.seed = NA.
We call mice.seed when we call for mice().
It works when the parallelization seed is not NA (don't know why only in those cases).

A <- parlMICE(nhanes, n.core = 1, n.imp.core = 5, seed = 1, mice.seed = 123)
B <- mice(nhanes, m = 5, print = FALSE, seed = 123)
all.equal(complete(A), complete(B))
[1] TRUE

I have made a pull request to Gerko's github.

gerkovink · 2018-06-15T12:50:45Z

Thanks. That's nifty. Perhaps it's nicer to keep seed consistent with mice and use parl.seed to govern the seed for the parallel streams.

I'll update the code.

RianneSchouten · 2018-06-15T13:02:53Z

Regarding 1)
Gerko is correct that the 6 follows from the default settings: n.core = detectCores() - 1 = 3 and n.imp.core = 2, so m = 3 * 2 = 6.
What we can do is set these defaults:
n.core = 2, n.imp.core = 2.5
That will give m = 5, most likely because the system itself gives 3 to one core and 2 to the other core. Let me know if I should add this in the code.

stefvanbuuren · 2018-06-18T20:36:40Z

mice running now in parallel..

gerkovink · 2018-06-18T20:39:50Z

New function. Highlight of new/improved functionality:

NEW: Now defaults to m=5 on every machine.
NEW: match.cluster function that calculates optimal equal division of n.imp.core over n.core to match m.
NEW: automatically calculates the cluster if some, none, or all arguments pertaining to the cluster are provided. With warnings to avoid accidental mistakes.
NEW: cluster.seed argument for setting the parallel stream seed.
CHANGED: seed now conforms to mice.
CHANGED: arguments parsed to mice via do.call(), thereby conforming to mice 3.0.x functionality.
ADDED: testthat functionality
ADDED: argument cl.type to switch between the default PSOCK and FORK.
FIXED: proper column naming for $imp
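For readers unfamiliar with the two cluster types, the sketch below shows them with the base parallel package only. It does not call parlmice; the platform check is an illustrative choice, made because FORK clusters are not available on Windows:

```r
library(parallel)

# Pick FORK where the platform supports it, otherwise fall back to PSOCK.
cl_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

cl <- makeCluster(2, type = cl_type)
res <- parLapply(cl, 1:2, function(i) i^2)  # trivial work on each worker
stopCluster(cl)

unlist(res)
```

FORK workers are copies of the current R session, so they already see the loaded packages and data. PSOCK workers start as fresh R sessions and need everything sent to them, which adds overhead but works on every platform.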

The new function yields the following results on a machine with 8 logical cores [max 7 used by parlmice()]:

`n.core = NULL`	`n.imp.core = NULL`	`m = 5`	resulting m	Why and how?
`NULL`	`NULL`	5	5	`n.core = 5` and `n.imp.core = 1` set by `mice:::match.cluster()`
`NULL`	5	5	35	`n.core` defaults to `detectCores() - 1`, i.e. `n.core = 7`
6	`NULL`	5	30	`n.imp.core` defaults to `m`
6	4	5	24	`m` overruled by `n.core * n.imp.core`
`NULL`	`NULL`	93	93	`n.core = 3` and `n.imp.core = 31` set by `mice:::match.cluster()`
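The rows of this table can be reproduced with a small sketch. Both functions below are hypothetical and only mimic the behaviour shown in the table; the actual mice:::match.cluster() may use a different rule. The assumption here is that the number of cores is the largest divisor of m that does not exceed the number of available cores:

```r
# Hypothetical: largest core count (<= max_cores) that divides m evenly.
match_cluster_sketch <- function(m, max_cores) {
  candidates <- seq_len(min(m, max_cores))
  n_core <- max(candidates[m %% candidates == 0])
  list(n.core = n_core, n.imp.core = m / n_core)
}

# Hypothetical: fill in whichever cluster arguments were left NULL.
resolve_cluster_sketch <- function(n.core = NULL, n.imp.core = NULL,
                                   m = 5, max_cores = 7) {
  if (is.null(n.core) && is.null(n.imp.core)) {
    return(match_cluster_sketch(m, max_cores))
  }
  if (is.null(n.core)) n.core <- max_cores   # detectCores() - 1 in the table
  if (is.null(n.imp.core)) n.imp.core <- m
  list(n.core = n.core, n.imp.core = n.imp.core)
}

resulting_m <- function(x) x$n.core * x$n.imp.core

resulting_m(resolve_cluster_sketch())                           # row 1
resulting_m(resolve_cluster_sketch(n.imp.core = 5))             # row 2
resulting_m(resolve_cluster_sketch(n.core = 6))                 # row 3
resulting_m(resolve_cluster_sketch(n.core = 6, n.imp.core = 4)) # row 4
resulting_m(resolve_cluster_sketch(m = 93))                     # row 5
```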

gerkovink · 2018-06-18T21:23:11Z

over 3 times faster:

> system.time(mice(nhanes, m = 500, print = FALSE))
   user  system elapsed 
 13.336   0.040  13.393
 
> system.time(parlmice(nhanes, m = 500, cl.type = "FORK"))
   user  system elapsed 
 15.488   0.892   4.194

over 4 times faster:

> system.time(mice(boys, m = 70, print = FALSE))
   user  system elapsed 
 64.443   0.892  65.543
 
> system.time(parlmice(boys, m = 70, cl.type = "FORK"))
   user  system elapsed 
 28.854   1.078  15.505
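The speed-up factors are the ratios of the elapsed times above. A small sketch of the arithmetic, with the timings copied from the output:

```r
# Elapsed times (seconds) taken from the system.time() output above.
elapsed <- data.frame(
  data     = c("nhanes (m = 500)", "boys (m = 70)"),
  mice     = c(13.393, 65.543),
  parlmice = c(4.194, 15.505)
)

# Speed-up = sequential elapsed time / parallel elapsed time.
elapsed$speedup <- round(elapsed$mice / elapsed$parlmice, 2)
elapsed
```

Note that the user time is higher for parlmice in the nhanes run: the total work does not shrink, it is spread over more cores.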

RianneSchouten · 2018-06-19T06:00:36Z

Nice with the match.cluster!

gerkovink added 10 commits June 14, 2018 12:28

wrapper function to run mice in parallel

211176b

bugfix: empty list created based on dimensions but later extended based on names

6b33eaa

defined imports, exluded require

4d18192

typo

0df9e4b

manage imports

17948fe

make naming consistent lowercase

8836970

updated documentation

f6fe7f6

code imports defined

eee9e7e

update documentation

9d48469

update version

f9d47dd

gerkovink mentioned this pull request Jun 14, 2018

ibind returns only the incomplete data #103

Closed

gerkovink added 3 commits June 14, 2018 14:23

date adjust

9c01972

example updated

065628c

correct name order

510e3d3

gerkovink added 5 commits June 18, 2018 22:11

remove parlMICE export

4d2c5e4

parlmice redesigned

cbaa8ce

updated documentation

55e0a95

added testthat for parlmice

0bb8b3d

conflict resolved

d7c163c

stefvanbuuren merged commit a150a3e into amices:master Jun 18, 2018

gerkovink mentioned this pull request Jun 18, 2018

Made it into a package gerkovink/parlMICE#5

Closed
