Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS: fix target file overwrite #1834

Merged
merged 3 commits into from
Aug 5, 2019

Conversation

markarasev
Copy link
Contributor

Fixes #1761

Purpose

Overwrite HDFS target file when the overwrite setting is set to true.

Changes

  • propagated overwrite information to HdfsWriter base trait
  • deleted existing file if overwrite is enabled in HdfsWriter#moveToTarget function

Background Context

This is the most simple fix I could come up with but there are still some issues with the code imho:

Marc Karassev added 2 commits July 22, 2019 15:08
* propagated overwrite information to HdfsWriter base trait
* made child classes override this property
* SequenceWriter factories now take an overwrite parameter
* deleted existing file if overwrite is enabled in HdfsWriter#moveToTarget function
… in child classes for better code structure understanding.
@lightbend-cla-validator

At least one pull request committer is not linked to a user. See https://help.github.com/en/articles/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user

@markarasev
Copy link
Contributor Author

I ignored binary compatibility issues since it affects only internal APIs.

@2m
Copy link
Member

2m commented Aug 1, 2019

Thanks for the PR. It looks good to me, and I really appreciate you looking into the broken atomicity and suggesting a fix for that as well.

We are planning to release Alpakka 2.0 in a month or so. Some connectors have accumulated API breaking changes and thus the time for a major release. If you would be willing to upgrade this PR to use FileContext then we could target this PR to Alpakka 2.0.

@markarasev
Copy link
Contributor Author

Should the migration to FileContext be in this PR or in another one with an associated issue? I can't guarantee I'll find time to work on this in the next month and succeed.

Also, according to https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/fs/FileContext.html#rename-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Options.Rename...-

atomicity of rename is dependent on the file system implementation

So even if we migrate to FileContext the atomicity will not be guaranteed (but it is very likely to happen).

@2m
Copy link
Member

2m commented Aug 2, 2019

A new ticket for migration to FileContext would be good. Maybe that will attract the attention of other HDFS connector users and spark a discussion.

A new PR that does the migration then will close that new ticket and #1761 as well.

Copy link
Member

@ennru ennru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ennru ennru merged commit bfdf315 into akka:master Aug 5, 2019
@ennru ennru added this to the 1.1.1 milestone Aug 5, 2019
@ennru
Copy link
Member

ennru commented Aug 5, 2019

Thank you for fixing this! Keep the PRs coming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HDFS: Overwrite target file
4 participants