parallel random number generation #99

piccolbo opened this Issue May 16, 2012 · 1 comment



piccolbo commented May 16, 2012

One thing that matters for making parallelization easier in a language like R is parallel random number generation. If we do

sapply(1:100, function(i) rnorm(1))

we have certain guarantees about the distribution of the resulting vector. But if we do

mapreduce(to.dfs(1:100), map = function(k, v) keyval(NULL, rnorm(1)))

we need to switch to parallel random number generation, either transparently to the user or via an easy switch. Would unique seeding per task attempt do the trick?
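
To make the seeding question concrete, here is a minimal sketch of one possible approach, not something rmr does today: give each task its own L'Ecuyer-CMRG stream using `nextRNGStream` from base R's `parallel` package. The names `n.tasks`, `task.seeds` and `draw.in.task` are made up for illustration, the job-level seed is arbitrary, and how a seed would actually reach a map task is left open. Tasks are simulated with `lapply` in one session.

```r
# Sketch only: one L'Ecuyer-CMRG stream per task (hypothetical helper names).
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # generator designed to support multiple streams
set.seed(42)               # arbitrary job-level seed, chosen for illustration

n.tasks <- 4               # assumed number of tasks

# Derive one stream seed per task by stepping from stream to stream.
task.seeds <- vector("list", n.tasks)
s <- .Random.seed
for (i in seq_len(n.tasks)) {
  task.seeds[[i]] <- s
  s <- nextRNGStream(s)
}

# What a task would do: install its own stream, then draw as usual.
draw.in.task <- function(seed, n) {
  assign(".Random.seed", seed, envir = globalenv())
  rnorm(n)
}

# Simulate the tasks sequentially; each one draws from a different stream.
draws <- lapply(task.seeds, draw.in.task, n = 3)
```

In this sketch the stream is keyed by task rather than by task attempt, so a retried attempt would reproduce the same numbers. Whether that is preferable to a fresh seed per attempt is a design choice this issue would need to settle.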
