Primitive Series (IntSeries, LongSeries, etc.) are much more efficient than their object counterparts (Series<Integer>, Series<Long>). They take up to 5x less memory per cell and provide opportunities to implement faster numeric operations, as they don't require (un)boxing. However, they cannot store null values.
Implementation
The idea here is to create null-aware numeric Series objects with performance closer to that of primitive Series. A prototype of IntegerSeries, implemented as two arrays (int[] for values and boolean[] for null tracking), has the following performance characteristics:
vs. ObjectSeries<Integer>
4x less memory used
3x faster to create
"get" is 6 orders of magnitude slower due to boxing of ints. Since "get" accounts for only a small share of any real operation, overall operations are roughly 30% slower.
vs. IntSeries
25% more memory used
same creation speed
same "get" speed for boxed Numbers (we can't use fast "getInt" because nulls may be present)
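The two-array layout described above can be sketched roughly as follows. This is a minimal illustration, not the actual prototype: the class name NullableIntSketch and its methods are hypothetical. It shows where boxing happens ("get") and how an operation that works directly on the arrays can avoid it.

```java
// Hypothetical illustration of the two-array layout; not DFLib's actual IntegerSeries.
final class NullableIntSketch {

    private final int[] values;     // primitive storage, no boxing
    private final boolean[] nulls;  // nulls[i] == true means cell i holds null

    NullableIntSketch(Integer... source) {
        values = new int[source.length];
        nulls = new boolean[source.length];
        for (int i = 0; i < source.length; i++) {
            if (source[i] == null) {
                nulls[i] = true;        // values[i] stays 0 and is never read
            } else {
                values[i] = source[i];  // unboxed once, at creation time
            }
        }
    }

    int size() {
        return values.length;
    }

    boolean isNull(int i) {
        return nulls[i];
    }

    // Null-aware access has to return an object, so every call boxes.
    Integer get(int i) {
        return nulls[i] ? null : Integer.valueOf(values[i]);
    }

    // An operation written against the arrays directly skips nulls without boxing.
    long sum() {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            if (!nulls[i]) {
                total += values[i];
            }
        }
        return total;
    }
}
```

For example, `new NullableIntSketch(1, null, 3)` stores three ints plus three booleans; `get(1)` returns null, while `sum()` never allocates an Integer.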
Conclusions
The new type of Series and accumulators save a lot of memory, are much faster to create, and provide opportunities for creation-time optimization (if no nulls are found, IntSeries is created).
The downside is slower "get" due to boxing, though of course the current IntSeries.get() is just as slow.
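The creation-time optimization mentioned above could look something like the sketch below: an accumulator records whether it has seen a null and picks the cheaper representation when the column is built. All names here (IntColumn, PrimitiveIntColumn, NullableIntColumn, IntAccumulatorSketch) are hypothetical stand-ins, not DFLib types.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from DFLib.
interface IntColumn {
    int size();
    Integer get(int i);
}

// Stand-in for a primitive series: values only, no null tracking.
final class PrimitiveIntColumn implements IntColumn {
    private final int[] values;

    PrimitiveIntColumn(int[] values) {
        this.values = values;
    }

    public int size() {
        return values.length;
    }

    public Integer get(int i) {
        return values[i];
    }

    // Fast unboxed access is safe because no nulls can be present.
    int getInt(int i) {
        return values[i];
    }
}

// Stand-in for the null-aware series: values plus a null mask.
final class NullableIntColumn implements IntColumn {
    private final int[] values;
    private final boolean[] nulls;

    NullableIntColumn(int[] values, boolean[] nulls) {
        this.values = values;
        this.nulls = nulls;
    }

    public int size() {
        return values.length;
    }

    public Integer get(int i) {
        return nulls[i] ? null : Integer.valueOf(values[i]);
    }
}

final class IntAccumulatorSketch {
    private int[] values = new int[8];
    private boolean[] nulls = new boolean[8];
    private int size;
    private boolean sawNull;

    void add(Integer v) {
        if (size == values.length) {
            // Grow both arrays in step.
            values = Arrays.copyOf(values, size * 2);
            nulls = Arrays.copyOf(nulls, size * 2);
        }
        if (v == null) {
            nulls[size] = true;
            sawNull = true;
        } else {
            values[size] = v;
        }
        size++;
    }

    // If no nulls were seen, drop the mask and return the primitive form.
    IntColumn build() {
        int[] compactValues = Arrays.copyOf(values, size);
        return sawNull
                ? new NullableIntColumn(compactValues, Arrays.copyOf(nulls, size))
                : new PrimitiveIntColumn(compactValues);
    }
}
```

With this design, a data adapter can always append through the same accumulator, and callers only pay for the null mask when the data actually contains nulls.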
TODO
Implement numeric Series for Integer, Double, Long
Integrate them into various data adapters (CSV, DB, Avro)
(Integrate in the Expressions API, so that exps could take advantage of the faster primitive access)