ARROW-15332: [C++] Add new cases and fix issues in IPC read/write benchmark #12150
Conversation
Sample Results:
From these benchmarks I was a little surprised by how much impact reading a small number of columns has on overall performance, although for all benchmarks except UncachedFile more columns also means a larger metadata-to-data ratio. This is the motivation for ARROW-14577:
This is concerning since the two benchmarks should be doing the exact same task (a partial read with 1 column should be the same as a full read with 1 column):
Filed ARROW-15333 for the concerning benchmark.
LGTM, thanks.
// This will not be correct if your system mounts /tmp to RAM (using tmpfs
// or ramfs).
How are our benchmark machines set up? (Trying to find out now)
Seems like the benchmark machine for C++ mounts /tmp to disk, so we should be all set.
Ok, maybe this warning is superfluous. For some reason I always thought /tmp was mounted to tmpfs but now, reading up on it, it seems that is quite rare.
I had the opposite impression, so that's good to know! But I think we can keep it to be safe. I just wanted to double-check what Conbench was doing.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for baseline = c52f9cc and contender = 9d456d8. Results will be available as each benchmark for each run completes.
return IOErrorFromErrno(errno, "open on ", path,
                        " to clear from cache did not succeed.");
}
int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
Ah, it also seems this isn't supported on macOS: https://github.com/apache/arrow/runs/4811739968?check_suite_focus=true#step:7:779
Good catch. I looked at file.cc and realized we are using:
#if defined(POSIX_FADV_WILLNEED)
which seems much better. I can't use that #if in the benchmark itself, though (unless I want to include fcntl.h, which wouldn't be the worst thing), so I changed it to SkipWithError. This has the added advantage of making it clear to the user that we aren't running all the benchmarks.
… to 'Only on Linux' as it turns out Mac also does not support the call
…e posix_fadvise call. Changed the benchmark behavior to skip with error instead of using an ifdef
Force-pushed from 9d456d8 to 2927e92.
Benchmark runs are scheduled for baseline = 093fdad and contender = f585a47. f585a47 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
NOTE: This PR will invalidate some previous results from arrow-ipc-read-write-benchmark, disrupting conbench & other monitoring efforts. This is because those previous results were wrong.
It also likely invalidates even more arrow-ipc-read-write-benchmark results because we added a new parameter and renamed some of the benchmarks.