Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Much faster deserialization via new ImmutableRoaringBitmap(ByteBuffer.wrap(data)).toRoaringBitmap() #319

Closed
PedroAlvarado opened this Issue Mar 15, 2019 · 29 comments

Comments

Projects
None yet
5 participants
@PedroAlvarado
Copy link

PedroAlvarado commented Mar 15, 2019

The two seemingly equivalent operations below yield deserialization performance results. "Example 1" below seems to perform 5x to 10x faster than the typical bitmap deserialization as shown in "Example 2".

//Example 1
//byte[] -> ImmutableRoaringBitmap -> RoaringBitmap
byte[] = data
RoaringBitmap bitmap = new ImmutableRoaringBitmap(ByteBuffer.wrap(data)).toRoaringBitmap();

vs

//Example 2
//byte[] -> RoaringBitmap
byte[] = data
RoaringBitmap bitmap = new RoaringBitmap();
ByteArrayInputStream bais = new ByteArrayInputStream(data);

try (DataInputStream dis = new DataInputStream(bais)) {
            bitmap.deserialize(dis);
 }
...
@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

Is the second example valid code?

@PedroAlvarado

This comment has been minimized.

Copy link
Author

PedroAlvarado commented Mar 15, 2019

Apologies, I pasted the wrong code.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

I am intrigued. The new ImmutableRoaringBitmap(ByteBuffer.wrap(data)).toRoaringBitmap() approach appears to be much faster.

Here is my gist...
https://gist.github.com/lemire/9e83b22d19b19a687ff18573311538da

Here are my results:

SerializeToByteArrayExample
recommended:
142 ms 204 ms 213 ms 119 ms 174 ms 125 ms 127 ms 119 ms 129 ms 119 ms 190 ms 185 ms 183 ms 184 ms 151 ms 124 ms 132 ms 120 ms 126 ms 128 ms 134 ms 127 ms 130 ms 122 ms 121 ms 128 ms 123 ms 121 ms 133 ms 121 ms
average: 141
via ByteArrayInputStream:
233 ms 144 ms 141 ms 149 ms 156 ms 144 ms 148 ms 153 ms 161 ms 171 ms 171 ms 156 ms 159 ms 166 ms 161 ms 155 ms 170 ms 171 ms 160 ms 157 ms 162 ms 157 ms 171 ms 158 ms 158 ms 156 ms 157 ms 159 ms 170 ms 157 ms
average: 161
via Immutable:
54 ms 29 ms 28 ms 28 ms 35 ms 30 ms 27 ms 26 ms 27 ms 29 ms 37 ms 30 ms 26 ms 27 ms 27 ms 26 ms 32 ms 31 ms 28 ms 25 ms 28 ms 26 ms 37 ms 31 ms 27 ms 26 ms 26 ms 25 ms 39 ms 25 ms
average: 29

@lemire lemire changed the title RoaringBitmap Deserialization Tradeoff Much faster deserialization via new ImmutableRoaringBitmap(ByteBuffer.wrap(data)).toRoaringBitmap() Mar 15, 2019

@PedroAlvarado

This comment has been minimized.

Copy link
Author

PedroAlvarado commented Mar 15, 2019

Yes, I'm noticing the exact same results. A quick inspection shows that bitmap.deserialize spends most of its compute time at the the line below.

int nbrruns = Util.toIntUnsigned(Short.reverseBytes(in.readShort()));

I'm unable to find the analogous code in ImmutableRoaringBitmap.toRoaringBitmap().

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

These results need investigation but they point at a serious performance 'bug' as to how I have been recommending that people deserialize data from a byte array. It is not entirely surprising given that DataInputStream is not super fast as an interface... but the gap is huge here (between 5x and 10x difference).

cc @richardstartin @blacelle @ppiotrow @incaseoftrouble @owenkaser

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

@PedroAlvarado

Well. Util.toIntUnsigned(Short.reverseBytes(...)) should be something like two machine instructions. A much more likely cause of performance problems is Java's failure to inline the calls to readShort... but that's an API issue... I am not sure we can fix it per se.

This does not mean that this issue cannot be fixed. I think we should seriously consider a fast deserialization routine...

@incaseoftrouble

This comment has been minimized.

Copy link
Contributor

incaseoftrouble commented Mar 15, 2019

Side comment, not sure if relevant: DataInput offers a method to write to a byte[] buffer. As far as I understand the docs, the read bytes already are machine independent, in particular (short)((b[2 * i] << 8) | (b[2 * i + 1] & 0xff)) (where b is the buffer we wrote data into) should yield the shorts on the stream.

This should be rather easy to implement I guess.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

@incaseoftrouble

Yes. This may be relevant in the sense that a single call to read that retrieves a large block of bytes could be a lot better than many small calls to something like readShort.

@PedroAlvarado

This comment has been minimized.

Copy link
Author

PedroAlvarado commented Mar 15, 2019

@lemire Acknowledged. The point I was making is that such operation does not take place in a tight loop under the new ImmutableRoaringBitmap(ByteBuffer.wrap(data)).toRoaringBitmap() code path. This may help explain the performance difference, that is, somehow in the faster case we happen to be making bigger reads than a short at a time.

@richardstartin

This comment has been minimized.

Copy link
Contributor

richardstartin commented Mar 15, 2019

Which JVM version?

@PedroAlvarado

This comment has been minimized.

Copy link
Author

PedroAlvarado commented Mar 15, 2019

This can be reproduced with the JVM below.

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
@blacelle

This comment has been minimized.

Copy link
Member

blacelle commented Mar 15, 2019

IO effect is quite big: #320

I'm left with a 10%-30% penalty with recommended over Immutable

@incaseoftrouble

This comment has been minimized.

Copy link
Contributor

incaseoftrouble commented Mar 15, 2019

I'd be curious how large the difference is in a larger benchmark. There are some real-world benchmarks in the repo, right?

Direct reading should definitely be as least as fast as data -> immutable -> mutable and probably as fast as immutable reading up to a small factor?

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

Direct reading should definitely be as least as fast

This is not obvious to me. Count the minimal number of copies.

  • The data -> immutable -> mutable does zero copy between data -> immutable, and one copy between immutable -> mutable.
  • The current @blacelle approach does one copy from the input data to temporary byte arrays, then another final copy.

So you have two copies versus just one, assuming that you are doing a minimal number of copies.

Now what @blacelle did not do, but we could do, is provide a new entry point where we take as an input a byte array. Then we could definitively beat or match the data -> immutable -> mutable path.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

The real benefit of @blacelle 's current work is that it could help the more general case (e.g., where we are reading data from disk).

@incaseoftrouble

This comment has been minimized.

Copy link
Contributor

incaseoftrouble commented Mar 15, 2019

"should be" in the sense that one should aim for that. I don't see any reason why it should be fundamentally slower, if you see what I mean.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

I don't see any reason why it should be fundamentally slower

Well... how do you go from the byte array to DataInput API and then back to the data we need without doing two copies of the data?

That is, the reason is the DataInput API.

It is the price of abstraction.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

People tend to assume that abstract has no price... and sometimes that's true, but you can't assume that it generally holds true. More general code, more often than not, also runs slower. There is a trade-off. Easier software engineering vs. lower performance.

@incaseoftrouble

This comment has been minimized.

Copy link
Contributor

incaseoftrouble commented Mar 15, 2019

We speak of the same thing :) As someone who overeagerly abstracts things I learned how costly abstraction can be :D

In any case I'd be interested in the numbers.

@richardstartin

This comment has been minimized.

Copy link
Contributor

richardstartin commented Mar 15, 2019

Actually it has nothing to do with abstraction. The cost differential isn't even constant as a function of data set. More to follow.

(taskset -c 0 java -jar benchmarks.jar -wi 10 -f 1 -i 5 -r 1 -w 2 RealDataSerializationBenchmark)
Benchmark                                            (dataset)   Mode  Cnt     Score      Error  Units
RealDataSerializationBenchmark.buffer            census-income  thrpt    5  1965.244 ±   37.796  ops/s

RealDataSerializationBenchmark.buffer               census1881  thrpt    5  1621.815 ±  107.444  ops/s

RealDataSerializationBenchmark.buffer            dimension_008  thrpt    5   395.556 ±   25.508  ops/s
RealDataSerializationBenchmark.buffer            dimension_003  thrpt    5   323.339 ±   16.314  ops/s
RealDataSerializationBenchmark.buffer            dimension_033  thrpt    5  1015.797 ±   31.888  ops/s

RealDataSerializationBenchmark.buffer             uscensus2000  thrpt    5  8341.783 ±  655.165  ops/s

RealDataSerializationBenchmark.buffer          weather_sept_85  thrpt    5   489.562 ±   22.612  ops/s
RealDataSerializationBenchmark.buffer       wikileaks-noquotes  thrpt    5  4105.618 ±   56.753  ops/s
RealDataSerializationBenchmark.buffer        census-income_srt  thrpt    5  2082.076 ±   35.345  ops/s
RealDataSerializationBenchmark.buffer           census1881_srt  thrpt    5  1594.191 ±   88.228  ops/s
RealDataSerializationBenchmark.buffer      weather_sept_85_srt  thrpt    5   711.106 ±   16.112  ops/s
RealDataSerializationBenchmark.buffer   wikileaks-noquotes_srt  thrpt    5  2310.206 ±  384.540  ops/s
RealDataSerializationBenchmark.roaring           census-income  thrpt    5   187.191 ±   18.361  ops/s

RealDataSerializationBenchmark.roaring              census1881  thrpt    5   135.563 ±    7.407  ops/s

RealDataSerializationBenchmark.roaring           dimension_008  thrpt    5   111.083 ±    6.518  ops/s
RealDataSerializationBenchmark.roaring           dimension_003  thrpt    5    73.517 ±    1.896  ops/s
RealDataSerializationBenchmark.roaring           dimension_033  thrpt    5   102.152 ±    3.588  ops/s

RealDataSerializationBenchmark.roaring            uscensus2000  thrpt    5  7343.555 ± 1631.849  ops/s

RealDataSerializationBenchmark.roaring         weather_sept_85  thrpt    5    47.184 ±    3.112  ops/s
RealDataSerializationBenchmark.roaring      wikileaks-noquotes  thrpt    5   350.872 ±   11.326  ops/s
RealDataSerializationBenchmark.roaring       census-income_srt  thrpt    5   204.230 ±   18.267  ops/s
RealDataSerializationBenchmark.roaring          census1881_srt  thrpt    5   454.772 ±   48.922  ops/s
RealDataSerializationBenchmark.roaring     weather_sept_85_srt  thrpt    5    81.597 ±    3.115  ops/s
RealDataSerializationBenchmark.roaring  wikileaks-noquotes_srt  thrpt    5   854.884 ±  153.669  ops/s
(taskset -c 0 java -jar benchmarks.jar -wi 10 -f 1 -i 5 -r 1 -w 2 -prof perfnorm -p dataset=census1881 RealDataSerializationBenchmark)
Benchmark                                                      (dataset)   Mode  Cnt          Score     Error  Units
RealDataSerializationBenchmark.buffer                         census1881  thrpt    5       1558.545 ± 196.937  ops/s
RealDataSerializationBenchmark.buffer:CPI                     census1881  thrpt               0.614             #/op
RealDataSerializationBenchmark.buffer:L1-dcache-load-misses   census1881  thrpt           71126.488             #/op
RealDataSerializationBenchmark.buffer:L1-dcache-loads         census1881  thrpt         1201799.098             #/op
RealDataSerializationBenchmark.buffer:L1-dcache-stores        census1881  thrpt         1189704.902             #/op
RealDataSerializationBenchmark.buffer:L1-icache-load-misses   census1881  thrpt             874.612             #/op
RealDataSerializationBenchmark.buffer:LLC-load-misses         census1881  thrpt             591.424             #/op
RealDataSerializationBenchmark.buffer:LLC-loads               census1881  thrpt            2035.927             #/op
RealDataSerializationBenchmark.buffer:LLC-store-misses        census1881  thrpt           28960.104             #/op
RealDataSerializationBenchmark.buffer:LLC-stores              census1881  thrpt           30023.560             #/op
RealDataSerializationBenchmark.buffer:branch-misses           census1881  thrpt            2885.000             #/op
RealDataSerializationBenchmark.buffer:branches                census1881  thrpt          256547.289             #/op
RealDataSerializationBenchmark.buffer:cycles                  census1881  thrpt         2041543.656             #/op
RealDataSerializationBenchmark.buffer:dTLB-load-misses        census1881  thrpt             526.276             #/op
RealDataSerializationBenchmark.buffer:dTLB-loads              census1881  thrpt         1209089.372             #/op
RealDataSerializationBenchmark.buffer:dTLB-store-misses       census1881  thrpt              11.130             #/op
RealDataSerializationBenchmark.buffer:dTLB-stores             census1881  thrpt         1188860.332             #/op
RealDataSerializationBenchmark.buffer:iTLB-load-misses        census1881  thrpt              20.266             #/op
RealDataSerializationBenchmark.buffer:iTLB-loads              census1881  thrpt              18.230             #/op
RealDataSerializationBenchmark.buffer:instructions            census1881  thrpt         3327291.856             #/op
RealDataSerializationBenchmark.roaring                        census1881  thrpt    5        100.749 ±   5.857  ops/s
RealDataSerializationBenchmark.roaring:CPI                    census1881  thrpt               0.254             #/op
RealDataSerializationBenchmark.roaring:L1-dcache-load-misses  census1881  thrpt           70394.828             #/op
RealDataSerializationBenchmark.roaring:L1-dcache-loads        census1881  thrpt        37380403.631             #/op
RealDataSerializationBenchmark.roaring:L1-dcache-stores       census1881  thrpt        11837498.822             #/op
RealDataSerializationBenchmark.roaring:L1-icache-load-misses  census1881  thrpt            5865.272             #/op
RealDataSerializationBenchmark.roaring:LLC-load-misses        census1881  thrpt             519.400             #/op
RealDataSerializationBenchmark.roaring:LLC-loads              census1881  thrpt            2461.178             #/op
RealDataSerializationBenchmark.roaring:LLC-store-misses       census1881  thrpt           27675.073             #/op
RealDataSerializationBenchmark.roaring:LLC-stores             census1881  thrpt           28956.043             #/op
RealDataSerializationBenchmark.roaring:branch-misses          census1881  thrpt            5717.834             #/op
RealDataSerializationBenchmark.roaring:branches               census1881  thrpt        17734151.039             #/op
RealDataSerializationBenchmark.roaring:cycles                 census1881  thrpt        33454843.412             #/op
RealDataSerializationBenchmark.roaring:dTLB-load-misses       census1881  thrpt             568.225             #/op
RealDataSerializationBenchmark.roaring:dTLB-loads             census1881  thrpt        37599995.434             #/op
RealDataSerializationBenchmark.roaring:dTLB-store-misses      census1881  thrpt              22.576             #/op
RealDataSerializationBenchmark.roaring:dTLB-stores            census1881  thrpt        11904857.586             #/op
RealDataSerializationBenchmark.roaring:iTLB-load-misses       census1881  thrpt              91.858             #/op
RealDataSerializationBenchmark.roaring:iTLB-loads             census1881  thrpt             328.489             #/op
RealDataSerializationBenchmark.roaring:instructions           census1881  thrpt       131519842.178             #/op
(taskset -c 0 java -jar benchmarks.jar -wi 10 -f 1 -i 5 -r 1 -w 2 -prof perfnorm -p dataset=uscensus2000 RealDataSerializationBenchmark)
Benchmark                                                        (dataset)   Mode  Cnt        Score     Error  Units
RealDataSerializationBenchmark.buffer                         uscensus2000  thrpt    5     7972.276 ± 583.118  ops/s
RealDataSerializationBenchmark.buffer:CPI                     uscensus2000  thrpt             0.299             #/op
RealDataSerializationBenchmark.buffer:L1-dcache-load-misses   uscensus2000  thrpt         12734.092             #/op
RealDataSerializationBenchmark.buffer:L1-dcache-loads         uscensus2000  thrpt        389515.867             #/op
RealDataSerializationBenchmark.buffer:L1-dcache-stores        uscensus2000  thrpt        308430.020             #/op
RealDataSerializationBenchmark.buffer:L1-icache-load-misses   uscensus2000  thrpt           182.069             #/op
RealDataSerializationBenchmark.buffer:LLC-load-misses         uscensus2000  thrpt            17.825             #/op
RealDataSerializationBenchmark.buffer:LLC-loads               uscensus2000  thrpt           158.584             #/op
RealDataSerializationBenchmark.buffer:LLC-store-misses        uscensus2000  thrpt           621.620             #/op
RealDataSerializationBenchmark.buffer:LLC-stores              uscensus2000  thrpt          3038.805             #/op
RealDataSerializationBenchmark.buffer:branch-misses           uscensus2000  thrpt           219.277             #/op
RealDataSerializationBenchmark.buffer:branches                uscensus2000  thrpt        206084.201             #/op
RealDataSerializationBenchmark.buffer:cycles                  uscensus2000  thrpt        420864.195             #/op
RealDataSerializationBenchmark.buffer:dTLB-load-misses        uscensus2000  thrpt            16.855             #/op
RealDataSerializationBenchmark.buffer:dTLB-loads              uscensus2000  thrpt        391997.975             #/op
RealDataSerializationBenchmark.buffer:dTLB-store-misses       uscensus2000  thrpt             0.947             #/op
RealDataSerializationBenchmark.buffer:dTLB-stores             uscensus2000  thrpt        306969.713             #/op
RealDataSerializationBenchmark.buffer:iTLB-load-misses        uscensus2000  thrpt             4.964             #/op
RealDataSerializationBenchmark.buffer:iTLB-loads              uscensus2000  thrpt             5.886             #/op
RealDataSerializationBenchmark.buffer:instructions            uscensus2000  thrpt       1407305.160             #/op
RealDataSerializationBenchmark.roaring                        uscensus2000  thrpt    5     8296.650 ± 588.376  ops/s
RealDataSerializationBenchmark.roaring:CPI                    uscensus2000  thrpt             0.277             #/op
RealDataSerializationBenchmark.roaring:L1-dcache-load-misses  uscensus2000  thrpt          4534.410             #/op
RealDataSerializationBenchmark.roaring:L1-dcache-loads        uscensus2000  thrpt        453572.466             #/op
RealDataSerializationBenchmark.roaring:L1-dcache-stores       uscensus2000  thrpt        141228.923             #/op
RealDataSerializationBenchmark.roaring:L1-icache-load-misses  uscensus2000  thrpt           169.028             #/op
RealDataSerializationBenchmark.roaring:LLC-load-misses        uscensus2000  thrpt            12.908             #/op
RealDataSerializationBenchmark.roaring:LLC-loads              uscensus2000  thrpt            73.774             #/op
RealDataSerializationBenchmark.roaring:LLC-store-misses       uscensus2000  thrpt           209.356             #/op
RealDataSerializationBenchmark.roaring:LLC-stores             uscensus2000  thrpt           710.555             #/op
RealDataSerializationBenchmark.roaring:branch-misses          uscensus2000  thrpt          1043.492             #/op
RealDataSerializationBenchmark.roaring:branches               uscensus2000  thrpt        215319.795             #/op
RealDataSerializationBenchmark.roaring:cycles                 uscensus2000  thrpt        406584.658             #/op
RealDataSerializationBenchmark.roaring:dTLB-load-misses       uscensus2000  thrpt            11.308             #/op
RealDataSerializationBenchmark.roaring:dTLB-loads             uscensus2000  thrpt        453625.811             #/op
RealDataSerializationBenchmark.roaring:dTLB-store-misses      uscensus2000  thrpt             0.374             #/op
RealDataSerializationBenchmark.roaring:dTLB-stores            uscensus2000  thrpt        141501.241             #/op
RealDataSerializationBenchmark.roaring:iTLB-load-misses       uscensus2000  thrpt             2.774             #/op
RealDataSerializationBenchmark.roaring:iTLB-loads             uscensus2000  thrpt             2.165             #/op
RealDataSerializationBenchmark.roaring:instructions           uscensus2000  thrpt       1469445.984             #/op
@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

Actually it has nothing to do with abstraction

I await the rest...

@PedroAlvarado

This comment has been minimized.

Copy link
Author

PedroAlvarado commented Mar 15, 2019

In our use case, we happen to have thousands of bitmaps cached in-memory as byte arrays. Allow me to second the idea of having an interface that takes a byte[] directly for deserialization.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

@PedroAlvarado Let us see what @richardstartin has to reveal first. :-)

@richardstartin

This comment has been minimized.

Copy link
Contributor

richardstartin commented Mar 15, 2019

@lemire don't wait for me because I'm quite sure deserialising from a byte[] would be beneficial. Just wanted to blurt out in response to you blaming failure to inline that, if anything, there's too much inlining - look at the way the instruction and iTLB counters blow up and take a look at -XX:+PrintInlining. It's also extremely sensitive to run optimisation. Let me synthesise my results this weekend and get back to you.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 15, 2019

Good.

@blacelle

This comment has been minimized.

Copy link
Member

blacelle commented Mar 15, 2019

I implemented deserialization from ByteBuffer directly in a RoaringBitmap.

recommended:
256 ms 82 ms 75 ms 143 ms 74 ms 73 ms 97 ms 74 ms 76 ms 99 ms 136 ms 80 ms 74 ms 76 ms 76 ms 137 ms 74 ms 74 ms 75 ms 75 ms 74 ms 129 ms 192 ms 74 ms 75 ms 77 ms 81 ms 83 ms 83 ms 108 ms
average: 96
via ByteArrayInputStream:
102 ms 101 ms 91 ms 76 ms 77 ms 77 ms 78 ms 79 ms 76 ms 75 ms 76 ms 80 ms 78 ms 75 ms 80 ms 76 ms 75 ms 75 ms 75 ms 76 ms 75 ms 84 ms 75 ms 74 ms 76 ms 76 ms 77 ms 75 ms 89 ms 75 ms
average: 79
via Immutable:
94 ms 67 ms 66 ms 65 ms 63 ms 67 ms 66 ms 62 ms 62 ms 65 ms 63 ms 64 ms 65 ms 71 ms 68 ms 67 ms 63 ms 66 ms 61 ms 63 ms 88 ms 65 ms 66 ms 65 ms 63 ms 66 ms 63 ms 64 ms 67 ms 71 ms
average: 66
via Deserialize ByteBuffer:
83 ms 71 ms 62 ms 59 ms 62 ms 93 ms 61 ms 60 ms 59 ms 60 ms 59 ms 60 ms 60 ms 73 ms 60 ms 60 ms 60 ms 60 ms 61 ms 60 ms 59 ms 65 ms 61 ms 61 ms 67 ms 62 ms 60 ms 59 ms 64 ms 62 ms
average: 63

@richardstartin

This comment has been minimized.

Copy link
Contributor

richardstartin commented Mar 16, 2019

What we're seeing here is that DataInputStream is quite bad, despite everything inlining. DataInputStream isn't really representative of a RandomAccessFile on a system with an SSD. In fact, we can sometimes do really well without copying any data or writing any new code by providing a ByteBuffer backed DataInput, which is also quite friendly to the garbage collector.

public static class BufferDataInput implements DataInput {

    private final ByteBuffer data;

    public BufferDataInput(ByteBuffer data) {
      this.data = data;
    }

    @Override
    public void readFully(byte[] bytes) throws IOException {
      data.get(bytes);
    }

    @Override
    public void readFully(byte[] bytes, int i, int i1) throws IOException {
      data.get(bytes, i, i1);
    }

    @Override
    public int skipBytes(int i) throws IOException {
      data.position(data.position() + i);
      return data.position();
    }

    @Override
    public boolean readBoolean() throws IOException {
      return data.get() != 0;
    }

    @Override
    public byte readByte() throws IOException {
      return data.get();
    }

    @Override
    public int readUnsignedByte() throws IOException {
      return data.get() & 0xFF;
    }

    @Override
    public short readShort() throws IOException {
      return data.getShort();
    }

    @Override
    public int readUnsignedShort() throws IOException {
      return data.getShort() & 0xffff;
    }

    @Override
    public int readInt() throws IOException {
      return data.getInt();
    }

    @Override
    public long readLong() throws IOException {
      return data.getLong();
    }
...
  }

full results

Benchmark                                                          (dataset)  (runOptimise)   Mode  Cnt      Score      Error  Units
RealDataSerializationBenchmark.bufferBackedDataInput              census1881           true  thrpt    5    754.988 ±   41.436  ops/s
RealDataSerializationBenchmark.bufferBackedDataInput              census1881          false  thrpt    5    259.896 ±   13.848  ops/s
RealDataSerializationBenchmark.bufferBackedDataInput            uscensus2000           true  thrpt    5  19097.221 ± 1378.555  ops/s
RealDataSerializationBenchmark.bufferBackedDataInput            uscensus2000          false  thrpt    5  21454.035 ±  447.602  ops/s
RealDataSerializationBenchmark.streamBackedDataInput              census1881           true  thrpt    5     93.026 ±   18.178  ops/s
RealDataSerializationBenchmark.streamBackedDataInput              census1881          false  thrpt    5     56.886 ±   38.233  ops/s
RealDataSerializationBenchmark.streamBackedDataInput            uscensus2000           true  thrpt    5   7342.377 ± 2062.902  ops/s
RealDataSerializationBenchmark.streamBackedDataInput            uscensus2000          false  thrpt    5   8140.991 ± 1940.242  ops/s
RealDataSerializationBenchmark.viaImmutable                       census1881           true  thrpt    5   1453.401 ± 1117.323  ops/s
RealDataSerializationBenchmark.viaImmutable                       census1881          false  thrpt    5   1482.254 ±  125.103  ops/s
RealDataSerializationBenchmark.viaImmutable                     uscensus2000           true  thrpt    5   6614.448 ±  733.655  ops/s
RealDataSerializationBenchmark.viaImmutable                     uscensus2000          false  thrpt    5   3036.661 ± 1342.902  ops/s

So I'm not sure this can be blamed on the abstraction, but on a bad implementation of the abstraction. In fact, binding to the abstraction allows the user to provide a better implementation, or a file, or a Kryo object... Despite that, I think providing a way to deserialise from a ByteBuffer (as opposed to byte[] as it can take fast paths to convert bytes to wider types) is a very good idea, so long as it doesn't assume the ByteBuffer is array backed.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 20, 2019

Thanks to @blacelle with contributions from @richardstartin, this is now resolved, complete with a new release where one has new functions to deserialize directly from a ByteBuffer, much faster than you could from a stream. (Richard has a blog post coming up which will bring some nuances.)

Now we can move to...
#322

I suspect that the same performance issues will arise again during serialization.

@lemire

This comment has been minimized.

Copy link
Member

lemire commented Mar 20, 2019

Closing.

@lemire lemire closed this Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.