Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StreamEx.ofTree: rewrite with custom spliterator #17

Closed
amaembo opened this issue Oct 30, 2015 · 2 comments
Closed

StreamEx.ofTree: rewrite with custom spliterator #17

amaembo opened this issue Oct 30, 2015 · 2 comments
Milestone

Comments

@amaembo
Copy link
Owner

amaembo commented Oct 30, 2015

Currently StreamEx.ofTree implementation is flatMap-based and inherits all the flatMap issues: poor short-circuiting and buffering when external iteration used. It's possible to write custom spliterator which would be free of these disadvantages and probably could split better.

@amaembo
Copy link
Owner Author

amaembo commented Aug 19, 2016

Benchmarks show that new implementation is indeed better

Stream source in all tests is:

StreamEx.ofTree("", s -> s.length() == depth ? null : Stream.of("a", "b", "c").map(s::concat));

count = .count(), has = .has("a") (find the second element in the stream), hasNone = .has("foo") (find non-existing element). Quasi means that .pairMap(String::concat) quasi-intermediate op is added.

Benchmark (depth) 0.6.1, us 0.6.2, us 0.6.1, bytes 0.6.2, bytes
streamExCount 1 0.363 0.261 1064 680
streamExCount 3 4.364 2.997 10224 5984
streamExCount 5 44.677 28.528 94914 54976
streamExHas 1 0.271 0.254 856 792
streamExHas 3 1.604 0.307 3920 1104
streamExHas 5 13.745 0.310 32768 1080
streamExHasNone 1 0.440 0.371 1136 888
streamExHasNone 3 5.306 4.619 10232 8184
streamExHasNone 5 46.949 39.143 96840 75810
streamExQuasiCount 1 0.509 0.410 1208 856
streamExQuasiCount 3 6.104 4.501 11488 7024
streamExQuasiCount 5 55.119 43.443 110783 66262
streamExQuasiHas 1 0.307 0.290 936 944
streamExQuasiHas 3 1.639 0.353 4024 1208
streamExQuasiHas 5 13.941 0.349 32848 1160
streamExQuasiHasNone 1 0.569 0.572 1240 1160
streamExQuasiHasNone 3 6.620 5.925 11704 10448
streamExQuasiHasNone 5 60.728 52.358 110928 89887

Short-circuiting ops indeed become short-circuiting. All times are better except streamExQuasiHasNone (difference 0.569/0.572 is statistically insignificant here). Allocated bytes are always less except streamExQuasiHas (936/944).

Also stack traces for non-short-circuit operations in 0.6.2 contain 2 frames less per tree depth. Short-circuiting operations have constant stack depth.

Current implementation refuses to parallelize at all. 0.6.1 implementation could parallelize the top level of the tree. It would be nice to preserve this level of parallelization in 0.6.2 as well.

amaembo added a commit that referenced this issue Aug 19, 2016
amaembo added a commit that referenced this issue Aug 19, 2016
@amaembo amaembo added this to the 0.6.2 milestone Aug 23, 2016
@amaembo
Copy link
Owner Author

amaembo commented Aug 23, 2016

It was discovered that if mapper function contains an intermediate operation and does not have explicit .parallel() it was not parallelized (only root node was handled in parallel with other nodes). So it was necessary to specify .parallel() twice like StreamEx.ofTree(root, x -> x.stream().map(...).parallel()).parallel(). Now it's fixed: only the resulting stream should be paralleled.

Here's benchmark results for depth 5,7,9:

Parallel:

test (depth) 0.6.1 bytes 0.6.2 bytes 0.6.1 us 0.6.2 us
streamExCount 5 96551 56305 31.2 22.4
streamExCount 7 883352 503408 234.9 133.6
streamExCount 9 7932148 4684978 2113.4 1245.3
streamExHas 5 97970 6832 34.2 7.9
streamExHas 7 902202 40231 270.1 26.4
streamExHas 9 8071357 108467 2547.6 51.7
streamExHasNone 5 98719 77180 34.4 29.5
streamExHasNone 7 902192 691459 270.7 190.2
streamExHasNone 9 8071358 6377684 2690.6 1742.1
streamExQuasiCount 5 110627 70785 59.3 28.8
streamExQuasiCount 7 1035622 669264 415.0 180.4
streamExQuasiCount 9 9479318 6415613 3452.0 1824.7
streamExQuasiHas 5 107859 31855 65.6 21.0
streamExQuasiHas 7 1010192 289295 481.0 174.3
streamExQuasiHas 9 9142122 2764214 4417.1 1488.2
streamExQuasiHasNone 5 112680 91537 68.4 34.1
streamExQuasiHasNone 7 1054365 844144 478.8 231.7
streamExQuasiHasNone 9 9678526 7984309 4404.6 2168.2

Sequential:

test (depth) 0.6.1 bytes 0.6.2 bytes 0.6.1 us 0.6.2 us
streamExCount 5 94914 53056 44.7 28.5
streamExCount 7 882456 475888 433.5 258.2
streamExCount 9 7996216 4447533 3744.5 2272.8
streamExHas 5 32776 1080 14.2 0.3
streamExHas 7 296464 1080 130.5 0.3
streamExHas 9 2769330 1080 1221.2 0.3
streamExHasNone 5 96845 75809 48.9 38.7
streamExHasNone 7 907648 690096 470.4 354.7
streamExHasNone 9 8129346 6376351 4412.0 3209.5
streamExQuasiCount 5 108990 66262 56.2 41.4
streamExQuasiCount 7 1034775 649621 539.1 386.3
streamExQuasiCount 9 9694931 6506464 4822.2 3527.2
streamExQuasiHas 5 32856 1216 14.2 0.3
streamExQuasiHas 7 296528 1224 129.9 0.3
streamExQuasiHas 9 2763426 1208 1231.9 0.3
streamExQuasiHasNone 5 112552 89885 64.4 53.8
streamExQuasiHasNone 7 1059959 842384 628.2 475.6
streamExQuasiHasNone 9 9735493 7982569 5874.8 4280.5

Here new implementation is better for every test, both for time and memory consumption, both for parallel and sequential. Also stream closure issues seem to be solved. Thus the feature is considered to be complete.

@amaembo amaembo closed this as completed Aug 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant