[FLINK-4218] [checkpoints] Do not rely on FileSystem to determine state sizes #2544

StephanEwen · 2016-09-23T13:24:12Z

This prevents failures on eventually consistent S3, where the operations for keys (=entries in the parent directory/bucket) are not guaranteed to be immediately consistent (visible) after a blob was written.

Not relying on any operation on keys (= requesting FileStatus) should mitigate the problem.

This also changes the exception signature of getStateSize() from Exception to IOException, which fits more naturally with the exception signatures of some other I/O methods.

Related issue: We may still want to have retries on FileStatus operations on S3, for other parts of the system (like FileOutputFormats).
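
To illustrate the approach described above, here is a minimal sketch of how a state size can be obtained from the write position of the stream rather than from a FileStatus lookup. The class name `PositionTrackingStream` and its `getPos()` method are hypothetical and only serve the illustration; this is not the code from the PR.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper (not a Flink class): counts the bytes handed to the
// underlying stream, so the final size is known without any key/metadata
// operation on the file system.
final class PositionTrackingStream extends OutputStream {
    private final OutputStream delegate;
    private long position;

    PositionTrackingStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        position++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        delegate.write(buf, off, len);
        position += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    // Number of bytes successfully passed to the delegate so far.
    long getPos() {
        return position;
    }
}
```

Reading `getPos()` just before closing the stream yields the number of bytes written, under the assumption that the stream position equals the final file size.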

StephanEwen · 2016-09-23T13:25:16Z

@StefanRRichter Maybe interesting for you to review.

aljoscha

Looks good 👍


StefanRRichter

Except for my two minor comments, everything looks good to me.

StefanRRichter · 2016-09-23T20:14:32Z

...ntime/src/main/java/org/apache/flink/runtime/state/filesystem/FsCheckpointStreamFactory.java

@@ -301,9 +301,16 @@ public StreamStateHandle closeAndGetHandle() throws IOException {
 					}
 					else {
 						flush();
+
+						long size = -1;


I am not sure if returning -1 as the size on exception is ideal. Currently, this value should only be used in the calculation of metadata, but one might be tempted to use it, e.g., to preallocate a byte[] to read the file into, so this should at least be documented in StateObject. Furthermore, we make the assumption that the stream position is also equal to the final file size. I am not entirely sure if this holds for all streams and file systems, but I guess this is the best we can do without asking the file system for metadata.

Stream position should be okay to determine the state size. All instances I checked were accurate there.
Also, given that the size is more informational and should not be relied upon, it should be all the less critical.
As
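
The concern about -1 can be made concrete with a small sketch of a defensive reader. It assumes that a negative size means "unknown" and treats any reported size as a hint only. The names `StateReader`, `readFully`, and the two constants are hypothetical and do not appear in the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a Flink class): never allocates new byte[(int) size]
// directly, because the reported size may be -1 or inaccurate.
final class StateReader {
    private static final int DEFAULT_CAPACITY = 4096;
    private static final int MAX_INITIAL_CAPACITY = 1 << 20; // cap the hint at 1 MiB

    static byte[] readFully(InputStream in, long sizeHint) throws IOException {
        // Use the hint only to pick an initial capacity, and only if it is positive.
        int capacity = DEFAULT_CAPACITY;
        if (sizeHint > 0) {
            capacity = (int) Math.min(sizeHint, MAX_INITIAL_CAPACITY);
        }

        ByteArrayOutputStream buffer = new ByteArrayOutputStream(capacity);
        byte[] chunk = new byte[4096];
        int n;
        // Read until end of stream; correctness does not depend on the hint.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

With this shape, a wrong or missing size only affects the initial buffer capacity, not the bytes that are returned.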

StefanRRichter · 2016-09-23T20:21:13Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/StateObject.java

 	 */
-	long getStateSize() throws Exception;
+	long getStateSize() throws IOException;


I think with the change in StreamStateHandle, even throwing IOException becomes obsolete now for all existing implementations. We might remove it.

I think we should do that.
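
As a rough illustration of why the checked exception could become unnecessary: once the size is recorded when the handle is created, reading it back involves no I/O. The names `SizedState`, `RecordedSizeHandle`, and `TotalSize` are hypothetical stand-ins, not the actual Flink types.

```java
import java.util.List;

// Hypothetical stand-in for the interface under discussion: no throws clause.
interface SizedState {
    long getStateSize();
}

// Hypothetical handle: the size is captured once, at construction time.
final class RecordedSizeHandle implements SizedState {
    private final String path;
    private final long stateSize;

    RecordedSizeHandle(String path, long stateSize) {
        this.path = path;
        this.stateSize = stateSize;
    }

    String getPath() {
        return path;
    }

    @Override
    public long getStateSize() {
        // Plain field read: nothing here can fail with an IOException.
        return stateSize;
    }
}

// Callers can aggregate sizes without any try/catch.
final class TotalSize {
    static long of(List<? extends SizedState> handles) {
        long total = 0;
        for (SizedState handle : handles) {
            total += handle.getStateSize();
        }
        return total;
    }
}
```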

StephanEwen · 2016-09-27T14:06:48Z

Manually merged in 95e9004

aljoscha approved these changes Sep 23, 2016


StephanEwen force-pushed the state_size_fix branch from 7ce2de7 to dc12d0b Compare September 23, 2016 18:20

StefanRRichter approved these changes Sep 23, 2016


StephanEwen closed this Sep 27, 2016

rmetzger added the component=<none> label Mar 14, 2019

StephanEwen commented Sep 23, 2016

StephanEwen commented Sep 23, 2016

aljoscha left a comment

StefanRRichter left a comment

StefanRRichter Sep 23, 2016

StephanEwen Sep 26, 2016

StefanRRichter Sep 23, 2016

StephanEwen Sep 26, 2016

StephanEwen commented Sep 27, 2016
