ARROW-2577: [Plasma] Add asv benchmarks for plasma #2038
Conversation
Looks nice :-) Some comments below.
python/benchmarks/plasma.py
Outdated
plasma_store_name, p = self.plasma_store_ctx.__enter__()
self.plasma_client = plasma.connect(plasma_store_name, "", 64)

self.data_1kb = np.random.randn(1000 // 8)
You should use parametrization for the various sizes (you can look at the conversion benchmarks to see how that is done, or see docs). Also I don't think we should test so many sizes. Testing a very small size (1kb) and a large-ish size (10mb) sounds sufficient.
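The parametrization suggested above can be sketched as follows; `params`/`param_names` are asv's parametrization mechanism, and the class and method names here are illustrative, not the PR's actual code:

```python
import numpy as np


class PlasmaPutSketch:
    """Illustrative asv benchmark parametrized over payload size."""

    # asv runs each time_* method once per entry in `params`;
    # the sizes follow the review suggestion: 1 KB and 10 MB.
    params = [1_000, 10_000_000]
    param_names = ['size']

    def setup(self, size):
        # float64 is 8 bytes, so size // 8 elements gives `size` bytes
        self.data = np.random.randn(size // 8)

    def time_put(self, size):
        # Stand-in body; the real benchmark would put `self.data`
        # into the plasma store via the client.
        self.data.copy()
```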
done
python/benchmarks/plasma.py
Outdated
from . import common


class PlasmaBenchmarks(object):
From a high-level point of view, what do we want to benchmark here? The plasma client, or client + store? You may want to choose an explicit timer that reflects that decision (see the timer attribute here). Also add a docstring :-)
I added a docstring; not sure what you mean by the timer attribute, the default seems fine to me :)
I mean that if you want to time the client overhead, you need a timer that measures the elapsed CPU time of the current process (but that is asv's default, AFAICT). If OTOH you want to measure the whole client + server roundtrip, you need a timer that measures the elapsed wall clock time.
Oh I see, thanks for pointing that out. I do want to measure the roundtrip wall-clock time, and I think timeit (which is the default for asv) is doing that (according to https://docs.python.org/3.0/library/timeit.html).
Actually, according to https://asv.readthedocs.io/en/latest/writing_benchmarks.html#timing the default is process CPU time (by way of time.process_time). Which makes sense in most cases but not here :-)
Thanks, I didn't realize they were overriding the timeit default, should be fixed now :)
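The fix described above (switching from asv's default process-CPU timer to a wall-clock timer, so time spent blocked on the store process is counted) can be sketched via asv's `timer` attribute; the class and method names here are illustrative:

```python
import time
import timeit


class RoundtripSketch:
    """Illustrative asv benchmark measuring wall-clock time."""

    # asv's default timer is process CPU time (time.process_time),
    # which would not count time the client spends blocked on the
    # store process; timeit.default_timer is a wall-clock timer.
    timer = timeit.default_timer

    def time_roundtrip(self):
        # Stand-in for a client/store roundtrip that blocks on IPC.
        time.sleep(0.01)
```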
Codecov Report
@@            Coverage Diff            @@
##           master   #2038    +/-   ##
==========================================
+ Coverage   87.44%   87.47%   +0.02%
==========================================
  Files         189      178      -11
  Lines       29368    28595     -773
==========================================
- Hits        25682    25014     -668
+ Misses       3686     3581     -105

Continue to review full report at Codecov.
@pitrou Is there a way to get all the numbers for each parameter from a command line run? For me it only shows the first one and the rest is omitted with ... (at the very right):
@pcmoritz The
This adds some initial ASV benchmarks for plasma.
It also includes some minor code restructuring to expose the start_plasma_store method.