IntSeries / IntMutableList for joins and filters #26

andrus · 2019-04-07T17:51:20Z

Let's create IntMutableList (an appendable collection of primitive "int" values) that can be converted to IntSeries, which is immutable.
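To make the idea concrete, here is a minimal sketch of what such a pair of structures could look like. The class names `IntMutableListSketch` and `IntSeriesSketch`, their methods, and the doubling growth policy are all assumptions made for illustration; they are not the actual implementation from the commits referenced in this issue.

```java
import java.util.Arrays;

// Hypothetical immutable holder of primitive ints (illustration only).
final class IntSeriesSketch {

    private final int[] data;

    // The caller must hand over an array that nobody else will modify.
    IntSeriesSketch(int[] data) {
        this.data = data;
    }

    int getInt(int index) {
        return data[index];
    }

    int size() {
        return data.length;
    }
}

// Hypothetical appendable list of primitive ints (illustration only).
final class IntMutableListSketch {

    private int[] data;
    private int size;

    IntMutableListSketch(int capacity) {
        // Guard against a zero capacity so that doubling always grows the array.
        this.data = new int[Math.max(capacity, 1)];
    }

    void add(int value) {
        if (size == data.length) {
            // Assumed growth policy: double the backing array when it is full.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int size() {
        return size;
    }

    // Copies the filled part of the array, so later add() calls
    // cannot change the returned series.
    IntSeriesSketch toIntSeries() {
        return new IntSeriesSketch(Arrays.copyOf(data, size));
    }
}
```

No `Integer` boxing happens on either the append or the read path, which is where the performance gain would presumably come from.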

While working with collections of primitives in Java is painful, there can be real performance gains. My prototype of the data structures above speeds up joins by ~25-30% when used for indexing the joined DataFrames.

This task will switch joins and filters to an int-based implementation. Sorters and groupers will be switched separately, as they require our own custom sorter.

andrus · 2019-04-08T06:51:54Z

Note that an implementation for joins is fairly straightforward. However, an implementation for the "sort" operation is quirkier, as the JDK libraries do not support sorting an int[] with a custom Comparator. We will need to write our own sort algorithm.
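For illustration, a custom sort could look roughly like the sketch below. It is a plain top-down merge sort over an `int[]` driven by a hypothetical `IntComparator` interface. Both the interface and the choice of merge sort are assumptions made for this example, not the algorithm that was eventually written for #27.

```java
// Hypothetical comparator of primitive ints (not a JDK type).
@FunctionalInterface
interface IntComparator {
    int compare(int a, int b);
}

final class IntSortSketch {

    // Sorts the array in place using the supplied comparator. The sort is stable.
    static void sort(int[] values, IntComparator comparator) {
        if (values.length < 2) {
            return;
        }
        int[] buffer = new int[values.length];
        mergeSort(values, buffer, 0, values.length, comparator);
    }

    // Sorts the half-open range [from, to).
    private static void mergeSort(int[] a, int[] buf, int from, int to, IntComparator c) {
        if (to - from < 2) {
            return;
        }
        int mid = (from + to) >>> 1;
        mergeSort(a, buf, from, mid, c);
        mergeSort(a, buf, mid, to, c);

        int i = from, j = mid, k = from;
        while (i < mid && j < to) {
            // Take from the right half only when it is strictly smaller,
            // which keeps equal elements in their original order.
            buf[k++] = c.compare(a[j], a[i]) < 0 ? a[j++] : a[i++];
        }
        while (i < mid) {
            buf[k++] = a[i++];
        }
        while (j < to) {
            buf[k++] = a[j++];
        }
        System.arraycopy(buf, from, a, from, to - from);
    }
}

final class IntSortSketchDemo {
    public static void main(String[] args) {
        // Sort row indices by the values they point to, without boxing the indices.
        String[] names = {"c", "a", "b"};
        int[] rows = {0, 1, 2};
        IntSortSketch.sort(rows, (x, y) -> names[x].compareTo(names[y]));
        System.out.println(java.util.Arrays.toString(rows)); // prints [1, 2, 0]
    }
}
```

The demo shows the typical use case: the `int[]` holds row positions, and the comparator looks up the actual values to compare, which is exactly what `Arrays.sort(int[])` cannot do.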

* switching filtering to IntSeries

andrus · 2019-04-10T10:11:16Z

Latest performance measurements:

Hash joins: 21-23% faster
Nested loop joins: 2-14% slower (why ?!!)
Filter: 37% faster

andrus · 2019-04-10T15:51:59Z

After implementing the related #27, the numbers improved:

Latest performance measurements:

Hash joins: 33-35% faster
Nested loop joins: 3-13% slower
Filter: 37% faster

andrus added a commit that referenced this issue Apr 10, 2019

IntSeries / IntMutableList - let's try using primitives #26

999e6f8

andrus added a commit that referenced this issue Apr 10, 2019

IntSeries / IntMutableList - let's try using primitives #26

3a681bc

* switching filtering to IntSeries

andrus added a commit that referenced this issue Apr 10, 2019

IntSeries / IntMutableList - let's try using primitives #26

42f418e

* switching filtering to IntSeries

andrus changed the title ~~IntSeries / IntMutableList - let's try using primitives~~ IntSeries / IntMutableList for joins and filters Apr 10, 2019

andrus added this to the 0.6 milestone Apr 10, 2019

andrus closed this as completed Apr 10, 2019

andrus added a commit that referenced this issue Apr 10, 2019

IntSeries / IntMutableList for joins and filters #26

4c4d398

This was referenced Apr 10, 2019

IntSeries / IntMutableList for sort and group #27

Closed

Make IntSeries compatible with Series<Integer> #28

Closed
