Question on the performance of anydate()
#132
Comments
(Aside: That's godawfully formatted code. But that's just me and 25+ years of ESS use.) I have the feeling that this has come up before. Did you check old issues? Could you also please measure the overhead of computing unique values at those sizes for vectors that are in fact unique, without replicates?
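The overhead asked about here can be estimated with a rough sketch like the one below. It is only an illustration under stated assumptions: base R's `as.Date()` stands in for `anydate()` so the snippet runs without extra packages, the vector size of 70k mirrors the upper end of the range mentioned in the issue, and `system.time()` is a coarse timer (a dedicated benchmarking package would give more stable numbers).

```r
# Worst case for the unique-then-match approach: every value is distinct,
# so unique() and match() are pure overhead.
n <- 7e4
x <- format(as.Date("1900-01-01") + seq_len(n) - 1)  # n distinct date strings
stopifnot(anyDuplicated(x) == 0)                      # confirm no replicates

# Direct parse (as.Date() is a stand-in for anydate(); an assumption).
t_direct <- system.time(d1 <- as.Date(x))

# Parse the distinct values only, then expand back with match().
t_unique <- system.time({
  ux <- unique(x)
  d2 <- as.Date(ux)[match(x, ux)]
})

# Both routes must agree; the difference in elapsed time is the overhead.
stopifnot(identical(d1, d2))
print(rbind(direct = t_direct, via_unique = t_unique)[, "elapsed"])
```

Timings will vary by machine and parser, so no expected numbers are given.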
Maybe add a third column using this value:
Only saw #109 and Results with
@schochastics Please see above -- @etiennebacher did some digging and touches upon an issue that may matter for your benchmarks too. I have the default for unique on 'off' because where I came from (in my former field of high-ish frequency finance) our timestamps tend to indeed be unique (and by now the field is of course more occupied with nanosecond resolution, so POSIXct is of limited usefulness; that was different when I wrote @etiennebacher We could think about some
Could be, but I'm not an active user of
Upfront I need to clarify that my benchmark is not that fair yet, because so far I just return a character vector, not a POSIXct object. calcUnique doesn't make a difference in my current benchmark since I used a vector of unique dates. I'll try to remember to report back here once I have done some more rigorous testing with chronos and a better interface.
No need to report back here then if you also use unique values.
Hello Dirk, I saw the announcement of `chronos` on Mastodon, and when I tried to reproduce the benchmarks I was a bit surprised by the performance of `anytime` (without comparing it to `chronos`). It seems to me that applying `anydate()`/`anytime()` on the unique values and then matching them back onto the original vector could lead to time gains (at the expense of memory usage) in some contexts, mostly for dates, since duplicated values are more likely there than for datetimes.

Here's a small benchmark to compare the current `anydate()` with the alternative of applying it on unique values only. I ran this with 10k-70k values (with steps of 10k):

Of course I'm aware of the difficulties of producing benchmarks that accurately reproduce real-world situations, and I'm not arguing that my alternative is better. I'm sure I also left aside many important details, such as time zones, missing values, etc. I was just curious, and since I found those results I'm wondering if this is something that you already considered.
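For readers following along, the unique-then-match idea from this post can be sketched roughly as follows. This is not the benchmark code from the issue (which is not reproduced here). `parse_unique()` is a hypothetical helper name, and base R's `as.Date()` is used as the default parser so the snippet runs on its own; with `anytime` installed, one would presumably pass `anytime::anydate` as `parser` instead.

```r
# Hypothetical helper: parse each distinct input once, then map the
# results back to the original positions.
parse_unique <- function(x, parser = as.Date) {
  ux <- unique(x)          # distinct input values
  parsed <- parser(ux)     # one parse per distinct value
  parsed[match(x, ux)]     # expand to the original length and order
}

# Example input with many duplicates: 10k strings drawn from 365 dates.
set.seed(42)
dates <- format(as.Date("2020-01-01") + sample(0:364, 1e4, replace = TRUE))

res <- parse_unique(dates)

# The shortcut should give the same result as parsing every element.
stopifnot(identical(res, as.Date(dates)))
```

The gain depends entirely on the duplication rate: with few distinct values the parser runs far fewer times, while on fully unique input `unique()` and `match()` only add work and memory.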