
Conversation

pnowojski
Contributor

@pnowojski pnowojski commented Feb 7, 2018

This big PR depends on #4552 and #5314. The main purpose of this change is to increase network throughput/performance in low latency cases. On their own, #4552 and #5314 cause a huge performance degradation at ~1ms flushing intervals (on top of Flink's already very poor performance in that case). This PR makes throughput at a ~1ms flushing interval more or less similar to that at a ~100ms flushing interval.

Quick (noisy) benchmark results:

master branch:

Benchmark    (number of output channels, flush interval)   Mode  Cnt      Score       Error   Units
networkThroughput                 1,100ms  thrpt    5  53776.816 ± 8566.861  ops/ms
networkThroughput                 100,1ms  thrpt    5    536.800 ±  821.872  ops/ms
networkThroughput              1000,100ms  thrpt    5  30679.754 ± 3737.085  ops/ms

master + credit based flow control:

Benchmark    (number of output channels, flush interval)   Mode  Cnt      Score       Error   Units
networkThroughput                 1,100ms  thrpt    5  49768.778 ± 13329.952  ops/ms
networkThroughput                 100,1ms  thrpt    5  BENCHMARK TIMEOUT! below ~150 ops/ms
networkThroughput              1000,100ms  thrpt    5  27793.594 ±  3428.951  ops/ms

credit based + low latency fixes (this PR):

Benchmark    (number of output channels, flush interval)   Mode  Cnt      Score       Error   Units
networkThroughput                 1,100ms  thrpt    5  47576.352 ± 12641.958  ops/ms
networkThroughput                 100,1ms  thrpt    5  41898.764 ±  4450.404  ops/ms
networkThroughput              1000,100ms  thrpt    5  27642.259 ±  9086.744  ops/ms

Brief change log

This last one ([FLINK-8591]) is the one commit that actually improves the performance, by allowing the sender to append records to a memory segment while the PartitionRequestQueue in Netty is busy handling/processing/flushing the previous memory segment, or while it is blocked waiting for a new credit to arrive.
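The mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Flink classes: the producer/consumer split over one shared segment mirrors the BufferBuilder/BufferConsumer idea from the PR, but the implementation details here are simplified.

```java
import java.nio.ByteBuffer;

// Simplified sketch: the producing task thread appends records through a
// builder while the Netty thread can independently read the bytes that have
// been published so far, instead of waiting for a whole finished segment.
class SharedBuffer {
    private final ByteBuffer segment = ByteBuffer.allocate(1024);
    // volatile high-water mark published by the writer, read by the consumer
    private volatile int writerPosition = 0;

    /** Producer side (task thread): append without waiting for the consumer. */
    void append(byte[] record) {
        ByteBuffer writer = segment.duplicate();
        writer.position(writerPosition);
        writer.put(record);
        writerPosition = writer.position(); // publish the new readable range
    }

    /** Consumer side (Netty thread): slice out only what was published so far. */
    ByteBuffer sliceFrom(int readerPosition) {
        ByteBuffer reader = segment.duplicate();
        reader.position(readerPosition);
        reader.limit(writerPosition);
        return reader.slice();
    }
}
```

With this split, a flush does not require requesting a fresh memory segment: the consumer simply reads up to the current high-water mark and the producer keeps appending behind it.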

Verifying this change

This change is a trivial rework ;)

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)

@pnowojski pnowojski changed the title Low latency network changes [FLINK-8581] Improve performance for low latency network Feb 8, 2018
@pnowojski pnowojski force-pushed the buffer-consumer branch 3 times, most recently from 3938b02 to 0896f88 Compare February 8, 2018 13:27
Contributor

@StefanRRichter StefanRRichter left a comment

Overall, the changes look good and I had some comments (inlined), but nothing blocking.

* {@link BufferBuilder} and there can be a different thread reading from it using {@link BufferConsumer}.
*/
@NotThreadSafe
public class BufferConsumer implements Closeable {
Contributor

Just a thought about names: this is called BufferConsumer, but it does not "consume" buffers. It coordinates the production of read slices from a shared buffer, so BufferBuilder would make more sense than this name. Even worse, this class has a build() : Buffer method :-(.

Contributor Author

Yes, I know. Can you propose some different naming scheme? BufferWriter and BufferBuilder?

* @return how much information was written to the target buffer and
* whether this buffer is full
*/
SerializationResult setNextBufferBuilder(BufferBuilder bufferBuilder) throws IOException;
Contributor

One remark from reading the code: I found it a bit surprising that a method that looks like a setter will cause the write to continue. Maybe this is better called something like continueWritingWithNextBufferBuilder, or the setter could be split from a continueWrite method?
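For illustration, the suggested split could look like the toy serializer below. All names here (setBufferBuilder, continueWriting) follow the reviewer's proposal and are assumptions, not Flink's actual RecordSerializer interface; the record is a plain byte array copied into fixed-size buffers.

```java
import java.nio.ByteBuffer;

// Toy serializer illustrating the suggested API split: setBufferBuilder()
// only assigns the target buffer, continueWriting() explicitly resumes the
// copy of the current record into it.
class ToySerializer {
    enum Result { FULL_RECORD, PARTIAL_RECORD_BUFFER_FULL }

    private byte[] record;
    private int recordOffset;
    private ByteBuffer target;

    Result addRecord(byte[] record) {
        this.record = record;
        this.recordOffset = 0;
        return continueWriting();
    }

    void setBufferBuilder(ByteBuffer target) { // just a setter, no side effects
        this.target = target;
    }

    Result continueWriting() {                 // explicit "resume the copy" step
        int n = Math.min(target.remaining(), record.length - recordOffset);
        target.put(record, recordOffset, n);
        recordOffset += n;
        return recordOffset == record.length
            ? Result.FULL_RECORD
            : Result.PARTIAL_RECORD_BUFFER_FULL;
    }
}
```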

bufferBuilders[targetChannel] = Optional.empty();

numBytesOut.inc(bufferBuilder.getWrittenBytes());
bufferBuilder.finish();
Contributor

You could combine this into numBytesOut.inc(bufferBuilder.finish()) or maybe finish() should not need to have a return value?
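The first half of the suggestion amounts to something like the following hypothetical sketch; the real BufferBuilder differs, and this only shows the shape of a finish() that returns the byte count.

```java
// Hypothetical sketch of finish() returning the written byte count, so the
// caller can write numBytesOut.inc(bufferBuilder.finish()) in one step.
class BufferBuilder {
    private int writtenBytes = 0;
    private boolean finished = false;

    void append(byte[] data) {
        if (finished) throw new IllegalStateException("already finished");
        writtenBytes += data.length;
    }

    /** Marks the builder as finished and returns the total bytes written. */
    int finish() {
        finished = true;
        return writtenBytes;
    }
}
```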

result = serializer.setNextBufferBuilder(bufferBuilder);
SerializationResult result = serializer.addRecord(record);

while (result.isFullBuffer()) {
Contributor

I wonder if this loop could not be simplified to

		while (!result.isFullRecord()) {
			tryFinishCurrentBufferBuilder(targetChannel, serializer);
			BufferBuilder bufferBuilder = requestNewBufferBuilder(targetChannel);
			result = serializer.setNextBufferBuilder(bufferBuilder);
		}

This would introduce a minor change in behaviour in cases where the end of a record falls exactly at the end of a buffer: with the change, the buffer is only finished by the next record rather than on the spot. However, this should not be a problem, because that outcome is what already happens for almost every record outside those corner cases, so the code should handle it well.
With this change, tryFinishCurrentBufferBuilder also no longer requires a return value.

Contributor Author

As we discussed, I'm not entirely sure. This "minor change" can be a significant overhead in case of many channels and large records. I don't want to risk increasing the scope of potential problems with this PR :(

Contributor

👍 Can introduce this change later after some more extensive tests.

public static void waitForAll(long timeoutMillis, Collection<Future<?>> futures) throws Exception {
long startMillis = System.currentTimeMillis();
Set<Future<?>> futuresSet = new HashSet<>();
for (Future<?> future : futures) {
Contributor

Could be replaced with addAll() or even the constructor taking collection.
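For illustration, the copy-constructor form the reviewer means uses the standard java.util.HashSet API; the helper name below is just for the example.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Future;

class FuturesExample {
    static Set<Future<?>> copyToSet(Collection<Future<?>> futures) {
        // The HashSet copy constructor replaces the explicit for-loop
        return new HashSet<>(futures);
    }
}
```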

Buffer buffer,
private boolean tryFinishCurrentBufferBuilder(
int targetChannel,
RecordSerializer<T> serializer) throws IOException {
Contributor

This code no longer throws IOException.

reader.setRegisteredAsAvailable(true);
}

private NetworkSequenceViewReader poolAvailableReader() {
Contributor

This should probably be pollAvailableReader()

} else {
// This channel was now removed from the available reader queue.
// We re-add it into the queue if it is still available
if (next.moreAvailable()) {
Contributor

This looks like the most common case, and I wonder why we cannot just peek the queue and only remove reader in the other cases?

Contributor Author

This is not the most common case: except in super-low-latency scenarios, the network is much faster than our ability to produce data.

Secondly, there are three branches that we need to cover here. As it is now, we poll the reader once and re-enqueue it only once (in the case that you commented on). With peek we would have to pop it in two places.
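The poll-once / re-enqueue-once pattern being defended here can be sketched as below. This is a simplified model under stated assumptions; the class and method names are illustrative, not the actual PartitionRequestQueue code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the available-reader queue: each reader is removed
// from the queue exactly once, and re-enqueued in exactly one place.
class ReaderQueue {
    static class Reader {
        private final Queue<String> data = new ArrayDeque<>();
        Reader(String... items) { for (String s : items) data.add(s); }
        String getNext() { return data.poll(); }
        boolean moreAvailable() { return !data.isEmpty(); }
    }

    private final ArrayDeque<Reader> available = new ArrayDeque<>();

    void enqueue(Reader r) { available.add(r); }

    /** Returns the next element to send, or null if no reader is available. */
    String writeNext() {
        Reader reader = available.poll();   // single poll point ...
        if (reader == null) return null;
        String next = reader.getNext();
        if (next != null && reader.moreAvailable()) {
            available.add(reader);          // ... single re-enqueue point
        }
        return next;
    }
}
```

With peek instead of poll, the reader would have to be removed in each of the exhausted-reader branches, duplicating the removal logic.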

Contributor

👍


@Override
public synchronized ResultPartitionID getPartitionId() {
return new ResultPartitionID();
Contributor

What is the intended effect of having this synchronized? It looks like it does nothing.


@Override
public synchronized BufferProvider getBufferProvider() {
return bufferProvider;
Contributor

How does this synchronized help? The field is final, so I would assume this change is not required.

…dSerializationTest

Deduplicated code was effectively identical, but implemented in a slightly different way.
BufferConsumer will be used in the future for reading partially written
MemorySegments. On flushes, instead of requesting a new MemorySegment, the BufferConsumer
code will allow writing to continue into the partially filled MemorySegment.
notifyBuffersAvailable is a quick call that doesn't need to be executed outside of the lock
SpilledSubpartitionViewTest duplicates a lot of production logic (TestSubpartitionConsumer is
duplicated logic of LocalInputChannel and a mix of CreditBasedSequenceNumberingViewReader with PartitionRequestQueue).
Also, it seems like most of the logic is covered by SpillableSubpartitionTest.
…nputGate and handle redundant data notifications
@pnowojski
Contributor Author

I have rebased the PR and squashed the fixup commits.

@StefanRRichter
Contributor

Thanks for those very good improvements, I will merge this.
