[FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase #4876

pnowojski · 2017-10-20T15:50:06Z

What is the purpose of the change

It depends whether to call hsync or hflush on the underlying file system and user preferences. Normally hflush is enough to protect against single machine HDFS failures and against TaskManagers failures. However if user is using S3 like file system, or wants to protect against whole HDFS rack power loss hsync must be used instead.

This is a stop gap solution until proper fix waiting for https://issues.apache.org/jira/browse/FLINK-5789

Verifying this change

Testing syncOnFlush is hard :( One could think about writing a unit test using mocks, but that would only copy the implementation.

I have added tests for duplicate() (and fixed StringWriter::duplicate which was not preserving the charset)

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

It depends whether to call hsync or hflush on the underlying file system and user preferences. Normally hflush is enough to protect against single machine HDFS failures and against TaskManagers failures. However if user is using S3 like file system, or wants to protect againt whole HDFS rack power loss hsync must be used instead.

aljoscha · 2017-10-24T13:13:25Z

Excellent changes, I'll merge along with a bunch of other stuff. 👍

aljoscha · 2017-10-24T14:14:56Z

I merged this, could you please close the PR?

pnowojski · 2017-10-24T14:27:12Z

Thanks!

StephanEwen · 2017-10-26T15:33:15Z

A quick post-mortem comment here:

This adds a lot of equals() and hashCode() on classes where these are ill-defined.

For example: StreamWriterBase defines equals() and hashCode() just on a subset of configuration fields (here the sync field) and ignore the associated stream, because equals and hash is ill-defined on the stream. To me, the correct conclusion is that equals() and hashCode() are ill-defined on StreamWriterBase and should not be there!

Adding such methods just to make assertion statement in tests more compact wrongly pushes some specific test logic in to the main classes. The correct way is to adjust the assertions in the test, or, if there is a lot of repetitive checking, create a Matcher that matches "equality based on some fields" and replace assertEquals(X, Y) with assertThat(matcher, X, Y).

I think in this case, it is actually a reason for a follow-up patch that changes this.

pnowojski · 2017-10-27T07:15:04Z

I assumed that it is up to concrete implementations of StreamWriterBase to implement equals completely, for example based on configuration. Alternatively including outStream in equality/hash implementation is also a solution, with slightly different semantic.

StephanEwen · 2017-10-27T11:48:51Z

But do you understand what I mean? Semantics of code in the main scope should not be quirked to make assertions in tests shorter to write.

Equals/hashCode is usually not implemented on I/O classes, like the output stream, because it is not well defined.

pnowojski · 2017-10-27T14:07:14Z

Yes I know what you mean. However including outStream to hashCode and equals wouldn't add any quirks.

pnowojski force-pushed the f7737 branch from 8d1c341 to 1bfb776 Compare October 20, 2017 16:31

pnowojski closed this Oct 24, 2017

rmetzger added the component=Connectors/Common label Mar 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase #4876

[FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase #4876

Uh oh!

pnowojski commented Oct 20, 2017 •

edited

Loading

Uh oh!

aljoscha commented Oct 24, 2017

Uh oh!

aljoscha commented Oct 24, 2017

Uh oh!

pnowojski commented Oct 24, 2017

Uh oh!

StephanEwen commented Oct 26, 2017

Uh oh!

pnowojski commented Oct 27, 2017

Uh oh!

StephanEwen commented Oct 27, 2017

Uh oh!

pnowojski commented Oct 27, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase #4876

[FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase #4876

Uh oh!

Conversation

pnowojski commented Oct 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What is the purpose of the change

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

Uh oh!

aljoscha commented Oct 24, 2017

Uh oh!

aljoscha commented Oct 24, 2017

Uh oh!

pnowojski commented Oct 24, 2017

Uh oh!

StephanEwen commented Oct 26, 2017

Uh oh!

pnowojski commented Oct 27, 2017

Uh oh!

StephanEwen commented Oct 27, 2017

Uh oh!

pnowojski commented Oct 27, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

pnowojski commented Oct 20, 2017 •

edited

Loading