[FLINK-8547][network] Implement CheckpointBarrierHandler not to spill data for exactly-once #5400

zhijiangW · 2018-02-02T07:54:44Z

What is the purpose of the change

Currently in exactly-once mode, the BarrierBuffer blocks inputs with barriers until all inputs have received the barrier for a given checkpoint. To avoid back-pressuring the input streams, which may cause distributed deadlocks, the BarrierBuffer has to spill the data to disk files in order to recycle the buffers of blocked channels.

With credit-based flow control, every channel has exclusive buffers, so there is no need to spill data to avoid deadlocks. We therefore implement a new CheckpointBarrierHandler that only buffers the data of blocked channels, for better performance.

The new CheckpointBarrierHandler can be enabled or disabled via configuration, so that we can roll back to the original mode if unexpected problems arise.

Brief change log

Implement the new CreditBasedBarrierBuffer and CreditBasedBufferBlocker for buffering data in blocked channels in exactly-once mode.
Define the parameter taskmanager.exactly-once.blocking.data.enabled to enable or disable the new handler.

Verifying this change

This change added tests and can be verified as follows:

Added tests for the logic of CreditBasedBarrierBuffer
Added tests for the logic of CreditBasedBufferBlocker

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (yes)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (not applicable)

pnowojski

Thanks for the changes :)

Could you maybe deduplicate the code for BarrierBuffer and CreditBasedBarrierBuffer (modulo the changes that I requested in a comment)?

Could you at least deduplicate the tests for them? They are identical, and keeping both will cost us additional maintenance (it can take a long time to completely get rid of one of them, especially since we might need to maintain the 1.5 branch after the 1.6 release, and who knows when we will drop BarrierBuffer).

Also, please mark BarrierBuffer as deprecated.

pnowojski · 2018-02-05T10:20:46Z

flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java

+	@Deprecated
+	public static final ConfigOption<Boolean> EXACTLY_ONCE_BLOCKING_DATA_ENABLED =
+			key("taskmanager.exactly-once.blocking.data.enabled")
+			.defaultValue(false);


I think we would like to enable it by default and leave this config option just as a safety net in case of bugs/problems.

btw, shouldn't this be tightly coupled with a credit based flow switch?

Yes, the default value should be true, but I think it should be changed after FLINK-7456 is merged, so that credit-based flow control works.
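
To make the "safety net" idea concrete, here is a minimal sketch of a boolean switch that is on by default and can be turned off explicitly. It uses plain java.util.Properties rather than Flink's ConfigOption; the class BlockingDataOptionSketch and its isEnabled helper are hypothetical, and only the key string comes from the diff above.

```java
import java.util.Properties;

// Hypothetical helper, not part of the PR: reads the switch from plain Properties.
final class BlockingDataOptionSketch {
    // Key taken from the diff above.
    static final String KEY = "taskmanager.exactly-once.blocking.data.enabled";

    // Returns the configured value, or the given default when the key is absent.
    static boolean isEnabled(Properties conf, boolean defaultValue) {
        return Boolean.parseBoolean(conf.getProperty(KEY, String.valueOf(defaultValue)));
    }
}
```

With a default of true, a user only has to set the key to false to fall back to the old spilling behaviour.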

pnowojski · 2018-02-05T10:21:05Z

...aming-java/src/main/java/org/apache/flink/streaming/runtime/io/CreditBasedBarrierBuffer.java

+
+package org.apache.flink.streaming.runtime.io;
+
+import org.apache.flink.annotation.Internal;


nit: There were some checkstyle failures

the checkstyle failures are fixed

pnowojski · 2018-02-05T10:34:44Z

...aming-java/src/main/java/org/apache/flink/streaming/runtime/io/CreditBasedBarrierBuffer.java

+ * all inputs have received the barrier for a given checkpoint.
+ *
+ * <p>The BarrierBuffer continues receiving buffers from the blocked channels and buffered them
+ * internally until the blocks are released. It will not cause deadlocks based on credit-based


Please explain the "It will not cause deadlocks based on credit-based flow control" part a little bit more in the comment.

pnowojski · 2018-02-05T11:13:06Z

...aming-java/src/main/java/org/apache/flink/streaming/runtime/io/CreditBasedBarrierBuffer.java

+	 * The pending blocked buffer/event sequences. Must be consumed before requesting further data
+	 * from the input gate.
+	 */
+	private final ArrayDeque<BufferOrEventSequence> queuedBuffered;


Do we need the queuedBuffered and currentBuffered fields with CreditBasedBufferBlocker? Why can't we just use the ArrayDeque<BufferOrEvent> currentBuffers field from CreditBasedBufferBlocker for this? Why do we need this triple-level buffering here? In the original code it made sense, since instead of CreditBasedBufferBlocker there was a BufferSpiller.

Getting rid of those three fields would vastly simplify this class.

The current implementation keeps the same logic as BarrierBuffer. I am wondering whether it makes sense to keep only one ArrayDeque<BufferOrEvent> for holding all blocked buffers for different checkpoint ids, especially for the uncommon case mentioned on line 496 in BarrierBuffer. I will double check that logic and reply to you later.

I think we cannot directly mix all the blocked buffers for different checkpoint ids into one ArrayDeque. We also need the BufferOrEventSequence, which identifies the blocked buffers for a specific checkpoint id; otherwise we cannot know when the blocked buffers are exhausted after resetting a specific checkpoint id.

If we want to use only one ArrayDeque for blocking all buffers, we may need to insert extra checkpoint-id markers into this queue to tell us when to stop reading blocked buffers from it.

For example:
channel1: [cp1,cp2,b1,cp3,b2,b3]
channel2: [cp2]

When reading cp1 first from channel1, [cp2,b1,cp3,b2,b3] are blocked as a separate sequence, seq1.

When reading cp2 from channel2, cp1 is released and we begin to read seq1.

When reading cp2 from seq1, the following buffers are blocked in a new seq2.

When reading cp3 from seq1, cp2 is released and seq2 contains only [b1].

The buffers following cp3 are blocked in a new seq3, which contains [b2,b3].

So every sequence holds the blocked buffers belonging to a different checkpoint id, and they are read first after that checkpoint id is released.
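
The boundary argument above can be sketched in a few lines. This is not a simulation of barrier alignment and not the BarrierBuffer code; SequenceQueueSketch and its methods are hypothetical names, and buffers are represented as plain strings. It only shows that sealing the blocked data into one queue per roll-over keeps the per-checkpoint boundaries that a single flat queue would lose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names do not come from the Flink code base.
final class SequenceQueueSketch {
    // One sealed queue per roll-over, i.e. per released checkpoint.
    private final ArrayDeque<ArrayDeque<String>> queuedSequences = new ArrayDeque<>();
    // Data blocked since the last roll-over.
    private ArrayDeque<String> currentlyBlocked = new ArrayDeque<>();

    // Records one buffer that arrived on a blocked channel.
    void block(String buffer) {
        currentlyBlocked.add(buffer);
    }

    // Seals the currently blocked data as its own sequence (skipped when empty).
    void rollOver() {
        if (!currentlyBlocked.isEmpty()) {
            queuedSequences.add(currentlyBlocked);
            currentlyBlocked = new ArrayDeque<>();
        }
    }

    // Returns the next sealed sequence in order; an empty list means nothing is pending.
    List<String> drainNextSequence() {
        ArrayDeque<String> next = queuedSequences.poll();
        return next == null ? new ArrayList<>() : new ArrayList<>(next);
    }
}
```

Blocking b1, rolling over, then blocking b2 and b3 and rolling over again yields two separate sequences, matching seq2 and seq3 in the example; the reader knows a sequence is exhausted without any extra markers in the queue.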

pnowojski · 2018-02-05T11:15:09Z

...streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamInputProcessor.java

 				throw new IllegalConfigurationException(
-						TaskManagerOptions.TASK_CHECKPOINT_ALIGNMENT_BYTES_LIMIT.key()
-						+ " must be positive or -1 (infinite)");
+					TaskManagerOptions.TASK_CHECKPOINT_ALIGNMENT_BYTES_LIMIT.key()


Please extract this and the same code from StreamTwoInputProcessor.java into a common method. I think all of the lines up to this.lock = checkNotNull(lock); could be unified, maybe in some base class.

Yes, I will consider a proper way.

I think we can change the current CheckpointBarrierHandler interface into an abstract class and then add a createBarrierHandler method to extract the common parts of StreamInputProcessor and StreamTwoInputProcessor. Or we could define a new class for the common method. I prefer the first way.
What do you think?
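
As a rough sketch of the second option (a separate class holding the common method), the shared logic might look like the following. Everything here is hypothetical: the class name, the Mode enum, and the string return values are placeholders, and the assumption that the alignment limit is validated only in exactly-once mode is mine, not something stated in this thread.

```java
// Hypothetical sketch; class, enum and method names are illustrative, not Flink's.
final class BarrierHandlerFactorySketch {
    enum Mode { EXACTLY_ONCE, AT_LEAST_ONCE }

    // Called by both processors, so the validation and the choice live in one place.
    // The returned string is a placeholder for a real handler instance.
    static String createBarrierHandler(Mode mode, long maxAlign) {
        if (mode == Mode.EXACTLY_ONCE) {
            // Same rule as the error message in the diff: positive or -1 (infinite).
            if (!(maxAlign == -1 || maxAlign > 0)) {
                throw new IllegalArgumentException(
                    "alignment bytes limit must be positive or -1 (infinite)");
            }
            return "aligning handler (maxAlign=" + maxAlign + ")";
        }
        return "non-aligning handler";
    }
}
```

Both the one-input and the two-input processor would then call this single method instead of repeating the check.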

pnowojski · 2018-02-05T11:18:57Z

flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BufferBlockerTest.java

+/**
+ * Tests for {@link CreditBasedBufferBlocker}.
+ */
+public class BufferBlockerTest {


Rename class to CreditBasedBufferBlockerTest

pnowojski · 2018-02-05T11:24:12Z

flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BufferBlockerTest.java

+		final int maxNumChannels = 1656;
+
+		// do multiple blocking / rolling over rounds
+		for (int round = 0; round < 5; round++) {


Can you deduplicate the code of those two unit tests (testSpillAndRollOverSimple and testSpillWhileReading)? It seems like this one is just a single sequence of the next one.

zhijiangW · 2018-02-05T15:15:07Z

@pnowojski, thanks for the review!

I understand your concerns and I should deduplicate some common utilities in these tests. I will do that tomorrow together with the other comments!

zhijiangW · 2018-02-07T09:20:05Z

@pnowojski, I have submitted a separate commit to address the above comments.

pnowojski

Besides deduplicating tests, please deduplicate BarrierBuffer and CreditBasedBarrierBuffer.
They are also almost 1-to-1 identical classes. The only difference is the bufferBlocker field, where BarrierBuffer uses BufferSpiller and CreditBasedBarrierBuffer uses CreditBasedBufferBlocker. However, this can also be easily fixed by extracting a common interface from CreditBasedBufferBlocker and BufferSpiller.

So please:

Deduplicate tests as I suggested in a comment.
Extract a common interface from CreditBasedBufferBlocker and BufferSpiller into, let's say, a BufferBlocker interface.
Completely remove the current CreditBasedBarrierBuffer class (but keep CreditBasedBufferBlocker!) and change the BarrierBuffer class to use the BufferBlocker interface.
Replace the BarrierBuffer constructors with the following ones:

public BarrierBuffer(InputGate inputGate, BufferBlocker bufferBlocker) throws IOException {
	this(inputGate, bufferBlocker, -1);
}

public BarrierBuffer(InputGate inputGate, BufferBlocker bufferBlocker, long maxBufferedBytes) throws IOException {
	checkArgument(maxBufferedBytes == -1 || maxBufferedBytes > 0);

	this.inputGate = inputGate;
	this.maxBufferedBytes = maxBufferedBytes;
	this.totalNumberOfInputChannels = inputGate.getNumberOfInputChannels();
	this.blockedChannels = new boolean[this.totalNumberOfInputChannels];

	this.bufferBlocker = checkNotNull(bufferBlocker);
	this.queuedBuffered = new ArrayDeque<BufferOrEventSequence>();
}

In that case, depending on how you want to block input channels, you can inject either BufferSpiller into the BarrierBuffer (the old way) or CreditBasedBufferBlocker for the new non-spilling code.

Inject appropriate BufferBlocker implementations in InputProcessorUtil#createCheckpointBarrierHandler:

			if (taskManagerConfig.getBoolean(TaskManagerOptions.EXACTLY_ONCE_BLOCKING_DATA_ENABLED)) {
				barrierHandler = new BarrierBuffer(inputGate, new CreditBasedBufferBlocker(), maxAlign);
			} else {
				barrierHandler = new BarrierBuffer(inputGate, new BufferSpiller(ioManager), maxAlign);
			}

pnowojski · 2018-02-09T08:17:47Z

...treaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BarrierBufferTestBase.java

+ * Utility class containing common methods for testing
+ * {@link BufferSpillerTest} and {@link CreditBasedBufferBlockerTest}.
+ */
+public class BarrierBufferTestBase {


This is not exactly what I had in mind by deduplicating BarrierBufferTest and CreditBasedBarrierBufferTest. Both of those tests are still pretty much copies of one another, and those static methods are only a fraction of the duplication.

Look for example at testSingleChannelNoBarriers(): the two versions are 99% identical. All of its code could be moved to BarrierBufferTestBase. BarrierBufferTestBase would only need to define an abstract method CheckpointBarrierHandler createBarrierHandler(), which would be defined differently in BarrierBufferTest and CreditBasedBarrierBufferTest. One minor thing is that BarrierBufferTest would need checkNoTempFilesRemain() added as an @After test hook. The same applies to all of the other tests.
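
A minimal sketch of the suggested shape, written without JUnit so that it stays self-contained. All names ending in Sketch are hypothetical, the handler is reduced to a trivial add/drain interface, and afterTest() merely stands in for an @After hook such as checkNoTempFilesRemain().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified stand-in for the handler under test.
interface HandlerSketch {
    void add(String element);
    List<String> drain();
}

// Trivial handler used by both test variants in this sketch.
final class ListHandlerSketch implements HandlerSketch {
    private final List<String> elements = new ArrayList<>();

    @Override
    public void add(String element) {
        elements.add(element);
    }

    @Override
    public List<String> drain() {
        List<String> out = new ArrayList<>(elements);
        elements.clear();
        return out;
    }
}

// Shared test logic is written once; subclasses only choose the handler.
abstract class HandlerTestBaseSketch {
    abstract HandlerSketch createBarrierHandler();

    // Optional cleanup hook, analogous to an @After method.
    void afterTest() { }

    // Same shape as a shared test: build the handler, exercise it, run the hook.
    boolean testSingleChannelNoBarriers() {
        HandlerSketch handler = createBarrierHandler();
        handler.add("b1");
        handler.add("b2");
        boolean passed = handler.drain().equals(Arrays.asList("b1", "b2"));
        afterTest();
        return passed;
    }
}

// Variant that needs an extra cleanup check after each test.
final class SpillingTestSketch extends HandlerTestBaseSketch {
    boolean cleanupChecked = false;

    @Override
    HandlerSketch createBarrierHandler() {
        return new ListHandlerSketch();
    }

    @Override
    void afterTest() {
        cleanupChecked = true;
    }
}

// Variant without any extra cleanup.
final class CreditBasedTestSketch extends HandlerTestBaseSketch {
    @Override
    HandlerSketch createBarrierHandler() {
        return new ListHandlerSketch();
    }
}
```

Each concrete test class shrinks to a factory method plus, where needed, a cleanup hook.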

pnowojski · 2018-02-09T08:23:48Z

...treaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BarrierBufferTestBase.java

+			checkpointId, System.currentTimeMillis(), CheckpointOptions.forCheckpointWithDefaultLocation()), channel);
+	}
+
+	public static BufferOrEvent createCancellationBarrier(long checkpointId, int channel) {


Instead of using static methods, please use inheritance: make BarrierBufferTest and CreditBasedBarrierBufferTest extend BarrierBufferTestBase, especially since the name *Base already suggests that.

zhijiangW · 2018-02-09T09:25:49Z

@pnowojski, thanks for the suggestions; I totally agree with them.
That abstraction indeed makes the code simpler. I will update the code ASAP.

pnowojski

Thanks again for the contribution. This one looks almost good to me. Only a couple of nits left.

pnowojski · 2018-02-12T12:03:02Z

flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BarrierBufferTest.java


 /**
- * Tests for the behavior of the {@link BarrierBuffer}.
+ * Tests for the behavior of the {@link BarrierBuffer} with {@link BufferSpiller}


nit: Missing period in java doc (build failure).

pnowojski · 2018-02-12T12:03:37Z

...treaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BufferBlockerTestBase.java

+			this.numChannels = numChannels;
+		}
+	}
+}


nit: build failure, missing EOL

pnowojski · 2018-02-12T12:09:51Z

flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/BufferBlocker.java

+	 * @param newBuffer only works for {@link BufferSpiller} implements currently.
+	 * @return The readable sequence of buffers and events, or 'null', if nothing was added.
+	 */
+	BufferOrEventSequence rollOver(boolean newBuffer) throws IOException;


Could we stick with two methods in the interface? I think more descriptive names would be better than a parameter here: rollOverWithoutReusingResources() and rollOverReusingResources(), where rollOverWithoutReusingResources() == rollOver(true).

Especially if one implementation doesn't support one of those calls.

pnowojski · 2018-02-12T12:11:49Z

flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/BufferBlocker.java

+	 * Starts a new sequence of buffers and event and returns the current sequence of buffers for reading.
+	 * This method returns {@code null}, if nothing was added since the creation, or the last call to this method.
+	 *
+	 * @param newBuffer only works for {@link BufferSpiller} implements currently.


The Javadoc in this interface shouldn't mention implementation-specific details. On the other hand, this Javadoc doesn't explain what newBuffer does, and for this information one must check BufferSpiller's Javadoc itself.

Can you add appropriate Javadoc here or, better, add Javadoc to the two methods proposed in the comment below: rollOverWithoutReusingResources() and rollOverReusingResources()? The comment in CachedBufferBlocker.java#rollOverReusingResources should state that it never reuses resources and defaults to CachedBufferBlocker.java#rollOverWithoutReusingResources.
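
A sketch of what the two-method variant could look like, assuming string payloads and hypothetical type names (RollOverSketch, CachedRollOverSketch). The method names are the ones proposed in the review; the bodies are illustrative and not taken from the PR.

```java
import java.util.ArrayDeque;

// Hypothetical interface using the two method names proposed in the review.
interface RollOverSketch {
    void add(String bufferOrEvent);

    /** Starts a new sequence and returns the current one; may reuse internal resources. */
    ArrayDeque<String> rollOverReusingResources();

    /** Starts a new sequence and returns the current one; allocates fresh resources. */
    ArrayDeque<String> rollOverWithoutReusingResources();
}

// In-memory implementation: it has no resources worth reusing.
final class CachedRollOverSketch implements RollOverSketch {
    private ArrayDeque<String> current = new ArrayDeque<>();

    @Override
    public void add(String bufferOrEvent) {
        current.add(bufferOrEvent);
    }

    /** Never reuses resources; defaults to rollOverWithoutReusingResources(). */
    @Override
    public ArrayDeque<String> rollOverReusingResources() {
        return rollOverWithoutReusingResources();
    }

    /** Returns null if nothing was added since the last roll-over. */
    @Override
    public ArrayDeque<String> rollOverWithoutReusingResources() {
        if (current.isEmpty()) {
            return null;
        }
        ArrayDeque<String> sequence = current;
        current = new ArrayDeque<>();
        return sequence;
    }
}
```

The interface-level Javadoc stays implementation-neutral, while the implementation documents that both calls behave identically.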

pnowojski · 2018-02-12T13:39:41Z

flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/io/BarrierBufferTest.java

-	private static final int PAGE_SIZE = 512;
-
-	private static int sizeCounter = 0;
+public class BarrierBufferTest extends BarrierBufferTestBase {


Rename the test class name to SpillingBarrierBufferTest?

zhijiangW · 2018-02-12T15:20:07Z

@pnowojski, I have submitted the updates for the above comments.

pnowojski

This one LGTM.

There is one catch - it will conflict with #5423. Let's wait for the credit-based PR to be merged, then the low-latency improvements, then I can rebase this one on top of my changes.

I assume that after rebasing this on top of the credit-based changes, the default value for EXACTLY_ONCE_BLOCKING_DATA_ENABLED should be changed to true, right @zhijiangW ?

zhijiangW · 2018-02-13T13:42:41Z

Thanks for rebasing and resolving the conflicts.

Yes, the default value can be changed to true after the credit-based work is fully merged. If any changes are needed on my side after all, please let me know. :)

zhijiangW · 2018-02-19T09:30:28Z

@pnowojski, I have changed EXACTLY_ONCE_BLOCKING_DATA_ENABLED to true and squashed the commits.

zhijiangW force-pushed the FLINK-8547 branch 5 times, most recently from 86559e9 to 25396e6 Compare February 5, 2018 10:09

pnowojski requested changes Feb 5, 2018

zhijiangW force-pushed the FLINK-8547 branch 3 times, most recently from a78dfc9 to 25d6eb1 Compare February 7, 2018 09:08

zhijiangW force-pushed the FLINK-8547 branch from 25d6eb1 to a117961 Compare February 8, 2018 06:00

pnowojski requested changes Feb 9, 2018

zhijiangW force-pushed the FLINK-8547 branch from 020d183 to 8ed3f21 Compare February 9, 2018 16:07

pnowojski requested changes Feb 12, 2018

pnowojski approved these changes Feb 13, 2018

[FLINK-8547][network] Implement CheckpointBarrierHandler not to spill data for exactly-once

f5fafce

zhijiangW force-pushed the FLINK-8547 branch from 0dc80f7 to f5fafce Compare February 19, 2018 05:30

asfgit closed this in 3126bf5 Feb 22, 2018

zhijiangW deleted the FLINK-8547 branch February 22, 2018 14:09

rmetzger added the component=Runtime/Network label Mar 18, 2019

