Fix Performance of 'by' Operations when verbose=TRUE #6296

Open
wants to merge 1 commit into
base: master

joshhwuu
Member

@joshhwuu joshhwuu commented Jul 17, 2024

Closes #6286

This PR replaces calls to C's clock() with the internal wallclock() in dogroups.c. Calling clock() repeatedly on non-Windows platforms apparently incurs significant overhead, which led to much longer processing times. I found this SO thread that discusses some of the differences in how C's clock() behaves across platforms.

  • Add a performance test
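
To make the cost difference concrete, here is a minimal sketch that times many back-to-back calls to clock() against a wall-clock timer. The helpers wall_seconds(), cpu_seconds(), time_calls() and compare_timers() are hypothetical names for illustration, not data.table's actual wallclock() or anything in dogroups.c. Using clock_gettime(CLOCK_MONOTONIC) as the wall-clock source is an assumption that holds on POSIX systems.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime on strict compilers */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not data.table's wallclock()): real elapsed time in seconds. */
static double wall_seconds(void) {
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return 0.0;
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Processor time consumed by the process so far, in seconds, via standard clock(). */
static double cpu_seconds(void) {
  return (double)clock() / CLOCKS_PER_SEC;
}

/* Wall-clock cost of calling `timer` n times in a tight loop. */
static double time_calls(double (*timer)(void), long n) {
  volatile double sink = 0.0;  /* keeps the compiler from dropping the calls */
  double start = wall_seconds();
  for (long i = 0; i < n; i++) sink += timer();
  (void)sink;
  return wall_seconds() - start;
}

/* Print the cost of n calls to each timer, mimicking repeated timing calls. */
static void compare_timers(long n) {
  printf("clock():        %.6f s for %ld calls\n", time_calls(cpu_seconds, n), n);
  printf("wall_seconds(): %.6f s for %ld calls\n", time_calls(wall_seconds, n), n);
}
```

Calling compare_timers(1000000) from a main() prints the two totals. The size of the gap depends on the platform and C library, so treat any numbers as local measurements rather than a general result.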


Comparison Plot

Generated via commit fc0c1e7

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 12 minutes and 1 seconds

Time taken to run atime::atime_pkg on the tests: 3 minutes and 51 seconds

@joshhwuu
Member Author

@tdhock @Anirban166 I'm trying to run the suite of atime tests locally to make sure everything works before I add a new test, but I'm having trouble running it by following the instructions at https://github.com/Rdatatable/data.table/wiki/Performance-testing

I tried atime::atime_pkg(getwd(),".ci"), which gives me a host of C-level errors associated with the first commit listed in the test list:

ERROR: compilation failed for package ‘data.table.b1b1832b0d2d4032b46477d9fe6efb15006664f4’
* removing ‘/home/joshhwuu/R/x86_64-pc-linux-gnu-library/4.4/data.table.b1b1832b0d2d4032b46477d9fe6efb15006664f4’
Error in atime_versions_install(Package, normalizePath(pkg.path), new.Package.vec,  : 
  '/usr/lib/R/bin/R' CMD INSTALL -l '/home/joshhwuu/R/x86_64-pc-linux-gnu-library/4.4' /tmp/RtmpyGd6uW/file1d8c26f36015d/data.table.b1b1832b0d2d4032b46477d9fe6efb15006664f4 returned error status code 1

Not sure why this is happening, can anyone chime in?

Anirban166 added a commit to Anirban166/data.table that referenced this pull request Jul 18, 2024
@Anirban166
Member

I created a performance test for this - Result

The title could probably do with a better, more succinct name (open to suggestions).

I'm not using a 'Fast' label (as displayed in the latest plot), since that version would only exist once this PR is merged into master here and can be installed via its commit SHA (after the merge, I can create a follow-up PR for the test). This doesn't seem to be a case where the operation was previously fast, so 'Before' and related labels won't fit. To check, I initially tested 1.15.4 vs 1.15.99: no change (I assume you switched operating systems in between the process of testing other versions yesterday?) so likely not a regression.

@tdhock
Member

tdhock commented Jul 18, 2024

Not sure why this is happening, can anyone chime in?

Please file an issue with additional details (OS, full input/output, traceback) at https://github.com/tdhock/atime/issues

@joshhwuu
Member Author

no change (I assume you switched operating systems in between the process of testing other versions yesterday?) so likely not a regression.

Yup, I confirmed this here

Development

Successfully merging this pull request may close these issues.

'by' operations much slower when verbose=TRUE
3 participants