-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
forder should detect and switch to integer type on columns that aren't really double #1738
Comments
This looks OK for Date where we don't have to verify that real is actually integer. For ordinary numeric fields we would need to call |
I asked R-core about this once. The reason they made Date
Agree. How about just emitting a message for now. Here are current timings: > set.seed(1L)
> dt = data.table(d=sample(seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by="days"),
1e7, TRUE))
> typeof(dt$d)
[1] "double"
> dt[, i:=as.integer(d)]
> dt
d i
<Date> <int>
1: 2015-04-07 16532
2: 2015-05-16 16571
3: 2015-07-29 16645
4: 2015-11-28 16767
5: 2015-03-15 16509
---
9999996: 2015-05-04 16559
9999997: 2015-11-19 16758
9999998: 2015-12-31 16800
9999999: 2015-01-24 16459
10000000: 2015-08-08 16655
> system.time(dt[, .N, by=d])
user system elapsed
0.953 0.072 0.377
> system.time(dt[, .N, by=i])
user system elapsed
0.425 0.072 0.172
> system.time(dt[, .N, by=as.integer(d)])
user system elapsed
0.385 0.061 0.183
> isReallyReal(dt$d) # full scan needed here to confirm it is not really real
[1] 0
> system.time(isReallyReal(dt$d))
user system elapsed
0.036 0.000 0.037
> system.time(as.integer(dt$d))
user system elapsed
0.044 0.000 0.043 with 10x the rows (1e8) [ one timing shown here to keep it simple; I confirmed timing was stable locally ] > dt = data.table(d=sample(seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by="days"),
1e8, TRUE))
> dt[, i:=as.integer(d)]
> system.time(dt[, .N, by=d])
user system elapsed
7.929 0.778 2.801
> system.time(dt[, .N, by=i])
user system elapsed
3.415 0.647 1.206
> system.time(dt[, .N, by=as.integer(d)])
user system elapsed
3.745 0.715 1.630
> system.time(isReallyReal(dt$d))
user system elapsed
0.364 0.000 0.364
> system.time(as.integer(dt$d))
user system elapsed
0.348 0.076 0.424 |
The text was updated successfully, but these errors were encountered: