OAK-10182: use streams to avoid buffer positioning issues leading to corrupted files #893
```java
try (InputStream inputStream = sourceUrl.getInputStream();
     FileOutputStream outputStream = new FileOutputStream(destinationPath.toFile())) {
```
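For context, here is a minimal, self-contained sketch of the stream-based copy that this excerpt opens. It assumes that `sourceUrl` is a `java.net.URLConnection` (the excerpt only shows the `getInputStream()` call) and uses a hypothetical 8 KiB buffer. The class and method names are illustrative and do not come from Oak.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.file.Path;
import java.security.MessageDigest;

final class StreamDownloader {
    // Hypothetical buffer size; the PR's actual value is not shown in the excerpt.
    private static final int BUFFER_SIZE = 8192;

    /** Copies the connection's content to destinationPath, feeding md when it is non-null. */
    static long download(URLConnection sourceUrl, Path destinationPath, MessageDigest md)
            throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try (InputStream inputStream = sourceUrl.getInputStream();
             FileOutputStream outputStream = new FileOutputStream(destinationPath.toFile())) {
            int bytesRead;
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                // Only the first bytesRead bytes of the array hold data from this read.
                outputStream.write(buffer, 0, bytesRead);
                if (md != null) {
                    md.update(buffer, 0, bytesRead);
                }
                total += bytesRead;
            }
        }
        return total;
    }
}
```

The file and the digest both receive exactly the range `buffer[0, bytesRead)`, and no shared cursor can drift between the write and the checksum.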
I wonder whether using buffered input/output streams here would speed up the download. Probably not. What was the rationale for not using buffered streams?
I have not tested it, but I compared this solution with the previous channel-based one and saw no performance degradation when checksum validation is disabled.
```diff
 if (md != null) {
-    md.update(buffer);
+    md.update(buffer, 0, bytesRead);
```
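The change from `md.update(buffer)` to `md.update(buffer, 0, bytesRead)` matters because `InputStream.read` may fill only part of the array. This can happen on the last read and, with network streams, on any read. The unused tail then still holds bytes from an earlier read. The hypothetical helper below (names are illustrative) hashes a 10-byte stream through a 4-byte buffer in both ways. Only the bounded form reproduces the digest of the real content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartialReadDigest {
    /** Hashes the stream; when respectBytesRead is false it reproduces the stale-bytes bug. */
    static byte[] digest(InputStream in, int bufferSize, boolean respectBytesRead)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[bufferSize];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            if (respectBytesRead) {
                md.update(buffer, 0, bytesRead); // only the bytes from this read
            } else {
                md.update(buffer); // all buffer.length bytes, including leftovers
            }
        }
        return md.digest();
    }
}
```

With data `{0..9}` and a 4-byte buffer, the reads return 4, 4 and 2 bytes. The unbounded variant therefore hashes 12 bytes, ending with the stale `6, 7` from the previous read.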
The checksum could, in principle, be computed on another thread, but we have to be careful about managing the buffer. One idea is to use two buffers and swap them between the download thread and the checksum thread, so that while one buffer is being used for the checksum calculation, the other is used for downloading. Just an idea for future improvement.
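A rough sketch of the two-buffer idea follows. It assumes plain `java.util.concurrent` primitives rather than anything in Oak. Two `byte[]` buffers circulate between a `free` queue and a `filled` queue, so the download thread can fill and write one buffer while a single checksum thread hashes the other. The class name, method name and buffer size parameter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DoubleBufferedChecksumCopy {
    /** A filled buffer plus how many of its bytes are valid; length < 0 marks end of stream. */
    private static final class Chunk {
        final byte[] data;
        final int length;
        Chunk(byte[] data, int length) { this.data = data; this.length = length; }
    }

    static byte[] copy(InputStream in, OutputStream out, String algorithm, int bufferSize)
            throws IOException, InterruptedException, ExecutionException, NoSuchAlgorithmException {
        BlockingQueue<byte[]> free = new ArrayBlockingQueue<>(2);
        BlockingQueue<Chunk> filled = new ArrayBlockingQueue<>(2);
        free.put(new byte[bufferSize]);
        free.put(new byte[bufferSize]);
        MessageDigest md = MessageDigest.getInstance(algorithm);
        ExecutorService checksumThread = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> digest = checksumThread.submit(() -> {
                while (true) {
                    Chunk chunk = filled.take();
                    if (chunk.length < 0) {
                        return md.digest(); // end-of-stream marker reached
                    }
                    md.update(chunk.data, 0, chunk.length);
                    free.put(chunk.data); // hand the buffer back to the download thread
                }
            });
            byte[] buffer = free.take();
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
                filled.put(new Chunk(buffer, bytesRead));
                buffer = free.take(); // blocks until the checksum thread releases a buffer
            }
            filled.put(new Chunk(null, -1));
            return digest.get();
        } finally {
            checksumThread.shutdownNow();
        }
    }
}
```

A buffer returns to `free` only after the checksum thread has hashed it, so the download thread can never overwrite bytes that are still being digested. The cost is that the download thread blocks whenever both buffers are in flight.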
I agree, but this would complicate the logic. An even better option could be to use multiple blocking queues that receive the buffer, with separate downstream consumer threads (e.g. one for writing the file and another for computing the checksum). This way we decouple reads from writes, further boosting performance. Complexity will obviously increase; a reactive library might help here.
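The blocking-queue fan-out could look roughly like the following sketch. It assumes one bounded `ArrayBlockingQueue` per consumer and two consumer threads, one writing the file and one computing the checksum. To avoid the buffer-ownership problem raised above, the reader gives both consumers the same read-only copy of each chunk, which trades one allocation per read for simpler lifetimes. The names and the 100 ms polling interval are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class FanOutCopy {
    private static final byte[] EOF = new byte[0]; // end-of-stream marker, compared by identity

    static byte[] copy(InputStream in, OutputStream out, String algorithm, int bufferSize, int capacity)
            throws IOException, InterruptedException, ExecutionException, NoSuchAlgorithmException {
        BlockingQueue<byte[]> toWriter = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<byte[]> toChecksum = new ArrayBlockingQueue<>(capacity);
        MessageDigest md = MessageDigest.getInstance(algorithm);
        ExecutorService consumers = Executors.newFixedThreadPool(2);
        try {
            Future<?> writer = consumers.submit(() -> {
                for (byte[] c = toWriter.take(); c != EOF; c = toWriter.take()) {
                    out.write(c);
                }
                return null;
            });
            Future<byte[]> checksum = consumers.submit(() -> {
                for (byte[] c = toChecksum.take(); c != EOF; c = toChecksum.take()) {
                    md.update(c);
                }
                return md.digest();
            });
            byte[] buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                // Both consumers share one immutable copy, so the reader can reuse its buffer.
                byte[] chunk = Arrays.copyOf(buffer, bytesRead);
                put(toWriter, chunk, writer);
                put(toChecksum, chunk, checksum);
            }
            put(toWriter, EOF, writer);
            put(toChecksum, EOF, checksum);
            writer.get();
            return checksum.get();
        } finally {
            consumers.shutdownNow();
        }
    }

    /** Enqueues with backpressure, failing fast if the consumer has already stopped. */
    private static void put(BlockingQueue<byte[]> queue, byte[] chunk, Future<?> consumer)
            throws InterruptedException, ExecutionException {
        while (!queue.offer(chunk, 100, TimeUnit.MILLISECONDS)) {
            if (consumer.isDone()) {
                consumer.get(); // rethrows the consumer's failure as ExecutionException
                throw new IllegalStateException("consumer stopped before end of stream");
            }
        }
    }
}
```

The bounded queues provide backpressure. Polling with `offer` lets the reader detect a consumer that has died instead of blocking forever. Much of this plumbing is what a reactive library would hide.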
#886 introduced a regression when checksum validation is enabled: the buffer position is wrong when the destination file gets written, which leads to corrupted files. This also explains why the performance results with and without checksum validation were similar.
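The exact calls made in #886 are not shown here. However, `MessageDigest.update(ByteBuffer)` is one way a positioning bug of this kind can arise, because it advances the buffer's position to its limit. If the same buffer is then written to a channel, nothing (or the wrong range) is left to write. The illustrative helper below compares hashing the buffer directly with hashing a `duplicate()`, which shares the content but keeps its own position.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ByteBufferPositionPitfall {
    /** Hashes the buffer, then writes it; returns how many bytes actually reached the channel. */
    static int digestThenWrite(ByteBuffer buffer, boolean useDuplicate)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            if (useDuplicate) {
                md.update(buffer.duplicate()); // advances only the duplicate's position
            } else {
                md.update(buffer); // moves buffer.position() to buffer.limit()
            }
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
        return sink.size();
    }
}
```

Hashing a 10-byte buffer directly leaves 0 bytes for the write, while hashing a duplicate leaves all 10. Switching to byte arrays with explicit `(offset, length)` arguments, as this PR does, avoids relying on a shared cursor altogether.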
This PR uses streams and byte arrays instead of channels and ByteBuffers to write the file and compute the checksum. With checksum validation enabled, the performance degradation is around 15%.