
ARROW-2577: [Plasma] Add asv benchmarks for plasma #2038

Closed
wants to merge 7 commits

Conversation

pcmoritz (Contributor):

This adds some initial ASV benchmarks for plasma:

  • Put latency
  • Get latency
  • Put throughput for 1KB, 10KB, 100KB, 1MB, 10MB, 100MB

It also includes some minor code restructuring to expose the start_plasma_store method.
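For context, a put/get latency benchmark along these lines is what the diff adds; this is a sketch reconstructed from the snippets quoted in the review below (the start_plasma_store helper and the three-argument plasma.connect call are taken from those snippets), not the exact merged code:

import numpy as np
from pyarrow import plasma

class SimplePlasmaLatency(object):
    """Time a small put and a put/get roundtrip against a local plasma store."""

    def setup(self):
        # start_plasma_store is a context manager yielding (socket_name, process)
        self.plasma_store_ctx = plasma.start_plasma_store(10**9)
        plasma_store_name, p = self.plasma_store_ctx.__enter__()
        self.plasma_client = plasma.connect(plasma_store_name, "", 64)

    def teardown(self):
        self.plasma_store_ctx.__exit__(None, None, None)

    def time_plasma_put(self):
        self.plasma_client.put(1)

    def time_plasma_putget(self):
        oid = self.plasma_client.put(1)
        self.plasma_client.get(oid)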

pitrou (Member) left a comment:

Looks nice :-) Some comments below.

plasma_store_name, p = self.plasma_store_ctx.__enter__()
self.plasma_client = plasma.connect(plasma_store_name, "", 64)

self.data_1kb = np.random.randn(1000 // 8)  # 125 float64 values ~= 1 KB
pitrou (Member):

You should use parametrization for the various sizes (you can look at the conversion benchmarks to see how that is done, or see docs). Also I don't think we should test so many sizes. Testing a very small size (1kb) and a large-ish size (10mb) sounds sufficient.
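For reference, asv parametrization is done through class-level params / param_names attributes, and the parameter value is passed to setup and to each benchmark method. A minimal sketch of how that could look here, reusing the illustrative setup from above with the two suggested sizes (not the exact merged code):

import numpy as np
from pyarrow import plasma

class SimplePlasmaThroughput(object):
    """Put throughput for a small and a large-ish buffer."""

    params = [1000, 10000000]      # payload sizes in bytes
    param_names = ['size']

    def setup(self, size):
        self.plasma_store_ctx = plasma.start_plasma_store(10**9)
        plasma_store_name, p = self.plasma_store_ctx.__enter__()
        self.plasma_client = plasma.connect(plasma_store_name, "", 64)
        self.data = np.random.randn(size // 8)   # size // 8 float64 values ~= size bytes

    def teardown(self, size):
        self.plasma_store_ctx.__exit__(None, None, None)

    def time_plasma_put_data(self, size):
        self.plasma_client.put(self.data)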

pcmoritz (Contributor, Author):

done

from . import common


class PlasmaBenchmarks(object):
pitrou (Member):

From a high-level point of view, what do we want to benchmark here? The plasma client, or client + store? You may want to choose an explicit timer that reflects that decision (see the timer attribute here). Also add a docstring :-)

pcmoritz (Contributor, Author):

I added a docstring; not sure what you mean by the timer attribute, the default seems fine to me :)

pitrou (Member):

I mean that if you want to time the client overhead, you need a timer that measures the elapsed CPU time of the current process (but that is asv's default, AFAICT). If OTOH you want to measure the whole client + server roundtrip, you need a timer that measures elapsed wall clock time.

pcmoritz (Contributor, Author):

Oh I see, thanks for pointing that out. I do want to measure the roundtrip wall-clock time, and I think timeit, which is the default for asv, does that (according to https://docs.python.org/3.0/library/timeit.html).

pitrou (Member):

Actually, according to https://asv.readthedocs.io/en/latest/writing_benchmarks.html#timing the default is process CPU time (by way of time.process_time). Which makes sense in most cases but not here :-)

pcmoritz (Contributor, Author):

Thanks, I didn't realize they were overriding the timeit default; should be fixed now :)
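Presumably the fix amounts to overriding asv's timer attribute on the benchmark classes so that elapsed wall-clock time is measured instead of process CPU time, e.g.:

import timeit

class PlasmaBenchmarks(object):
    """Benchmarks of the plasma client + store roundtrip."""

    # asv's default timer is process CPU time (time.process_time), which would
    # miss time spent in the separate plasma store process; measure wall-clock
    # time instead.
    timer = timeit.default_timer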

@codecov-io commented:

Codecov Report

Merging #2038 into master will increase coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #2038      +/-   ##
==========================================
+ Coverage   87.44%   87.47%   +0.02%     
==========================================
  Files         189      178      -11     
  Lines       29368    28595     -773     
==========================================
- Hits        25682    25014     -668     
+ Misses       3686     3581     -105
Impacted Files Coverage Δ
cpp/src/arrow/util/thread-pool-test.cc 98.87% <0%> (-0.57%) ⬇️
rust/src/bitmap.rs
rust/src/builder.rs
rust/src/memory.rs
rust/src/error.rs
rust/src/buffer.rs
rust/src/memory_pool.rs
rust/src/datatypes.rs
rust/src/list.rs
rust/src/array.rs
... and 2 more


@pcmoritz closed this in 75acaba May 14, 2018
@pcmoritz deleted the plasma-asv branch May 14, 2018 23:58
pcmoritz (Contributor, Author):

@pitrou Is there a way to get all the numbers for each parameter from a command line run? For me it only shows the first one and the rest is omitted with ... (at the very right):

ubuntu@ip-172-31-49-70:~/arrow/python$ asv run --python=same
· Discovering benchmarks
· Running 14 total benchmarks (1 commits * 1 environments * 14 benchmarks)
[  0.00%] ·· Building for existing-py_home_ubuntu_anaconda3_bin_python
[  0.00%] ·· Benchmarking existing-py_home_ubuntu_anaconda3_bin_python
[  7.14%] ··· Running array_ops.ScalarAccess.time_as_py                                                                                                                        9.41ms
[ 14.29%] ··· Running array_ops.ScalarAccess.time_getitem                                                                                                                     41.09ms
[ 21.43%] ··· Running convert_builtins.ConvertArrayToPyList.time_convert                                                                                                  41.60ms;...
[ 28.57%] ··· Running convert_builtins.ConvertPyListToArray.time_convert                                                                                                  15.41ms;...
[ 35.71%] ··· Running convert_builtins.InferPyListToArray.time_infer                                                                                                      19.75ms;...
[ 42.86%] ··· Running convert_pandas.PandasConversionsFromArrow.time_to_series                                                                                             1.28ms;...
[ 50.00%] ··· Running convert_pandas.PandasConversionsToArrow.time_from_series                                                                                           594.60μs;...
[ 57.14%] ··· Running convert_pandas.ZeroCopyPandasRead.time_deserialize_from_buffer                                                                                         609.45μs
[ 64.29%] ··· Running convert_pandas.ZeroCopyPandasRead.time_deserialize_from_components                                                                                     597.48μs
[ 71.43%] ··· Running microbenchmarks.PandasObjectIsNull.time_PandasObjectIsNull                                                                                           2.76ms;...
[ 78.57%] ··· Running plasma.SimplePlasmaLatency.time_plasma_put                                                                                                             500.94ms
[ 85.71%] ··· Running plasma.SimplePlasmaLatency.time_plasma_putget                                                                                                          691.88ms
[ 92.86%] ··· Running plasma.SimplePlasmaThroughput.time_plasma_put_data                                                                                                 562.55μs;...
[100.00%] ··· Running streaming.StreamReader.time_read_to_dataframe                                                                                                      232.14ms;...

pitrou (Member) commented May 15, 2018:

@pcmoritz The -e flag should do it, IIRC.
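i.e., presumably something along the lines of (per the suggestion above; untried here):

asv run --python=same -e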
