Rerun repeated uniqueN test #3438
Up-to-date timings are below. There is still a problem in `uniqueN` by group: in both runs the multi-threaded timings are slower than the single-threaded one. On the idle 32-core machine, using all 32 threads takes 35.1 s elapsed versus 8.5 s with one thread.

Each block shows `getDTthreads()` followed by `system.time()` for the default, single-thread, and all-thread runs of the code below.

Busy machine (20 cores):

```
#[1] 10
#    user  system elapsed
# 172.084   1.640  18.179
#[1] 1
#    user  system elapsed
#  10.608   0.596  11.206
#[1] 20
# still computing
```

Idle machine (32 cores):

```
#[1] 16
#    user  system elapsed
# 258.708   0.876  16.638
#[1] 1
#    user  system elapsed
#   8.405   0.132   8.539
#[1] 32
#    user  system elapsed
#1066.565   1.180  35.085
```

Code:

```r
library(data.table)

# Simulated activity log: 1e6 rows over 60 days, 1e5 clients, 7 platforms
N_X = 1e6
n_day = 60
n_clientid = 1e5
n_Platform = 7
X = data.table(
  day = sample(1:n_day, N_X, TRUE),
  clientid = as.character(sample(1:n_clientid, N_X, TRUE)),
  Platform = as.character(sample(1:n_Platform, N_X, TRUE))
)

# 1) Default thread count (here 10 of 20 and 16 of 32 logical CPUs)
setDTthreads(NULL)
getDTthreads()
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)

# 2) Single thread
setDTthreads(1)
getDTthreads()
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)

# 3) All logical CPUs
setDTthreads(0)
getDTthreads()
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)
```
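The three timing runs above differ only in the thread setting. A minimal sketch that loops over thread settings instead of repeating the query, and also checks that the grouped result does not depend on the thread count. The reduced table sizes, `set.seed(1)`, and the helper names `run_query` and `timings` are assumptions for illustration, not part of the original report.

```r
library(data.table)

# Smaller table than in the report (sizes are assumptions) so the sketch runs quickly
set.seed(1)
N_X = 1e5
X = data.table(
  day      = sample(1:60, N_X, TRUE),
  clientid = as.character(sample(1:1e4, N_X, TRUE)),
  Platform = as.character(sample(1:7, N_X, TRUE))
)

# Hypothetical helper wrapping the grouped uniqueN query from the report
run_query = function(X) {
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
}

# Compare one thread (1) against all logical CPUs (0)
settings = c(1L, 0L)
results = lapply(settings, function(th) {
  setDTthreads(th)
  elapsed = system.time(res <- run_query(X))[["elapsed"]]
  list(threads = as.integer(getDTthreads()), elapsed = elapsed, res = res)
})
setDTthreads(NULL)  # reread the environment and restore the default

# One row per thread setting: threads actually used and elapsed seconds
timings = data.table(
  threads = vapply(results, `[[`, integer(1), "threads"),
  elapsed = vapply(results, `[[`, numeric(1), "elapsed")
)
print(timings)

# The grouped result should be the same regardless of the thread count
stopifnot(isTRUE(all.equal(results[[1]]$res, results[[2]]$res)))
```

Timings from a table this small are not comparable to the numbers above; raise `N_X` back to `1e6` to reproduce the reported scale.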
Closed by #4484. See the benchmark near the top of the opening comment in #4484 (comment).
This was reported here: #3395 (comment)
I said I'd follow up here: #3435 (comment)
Double-check all of those results under the new default.