

HDFS data below 64KB not flushed to HDFS #107

Closed
muralirvce opened this issue Apr 4, 2018 · 6 comments

Comments

@muralirvce

Hi,
I see that data is sent to the HDFS datanode in 64KB packets. But if less than a full packet is pending (say 63KB) and it takes more than a minute for more data to arrive, the connection is closed and the buffered data is never written. Would it be worth adding a flush routine to the writer to help in such scenarios?

Thanks,
Murali
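The buffering behaviour described above can be modelled with a small sketch. It assumes a fixed 64KB packet size and uses a hypothetical `packetWriter` type (not part of any HDFS client, and it never talks to a datanode) that only hands data off once a full packet has accumulated, plus the proposed `Flush` that sends a partial packet.

```go
package main

import "fmt"

// packetSize mirrors the 64KB packet size mentioned in the issue.
const packetSize = 64 * 1024

// packetWriter is a hypothetical model of a buffered writer: bytes are held
// until a full packet accumulates. "sent" stands in for the network.
type packetWriter struct {
	buf  []byte
	sent [][]byte // packets that would have gone to the datanode
}

// Write appends p to the buffer and emits every complete packet.
func (w *packetWriter) Write(p []byte) (int, error) {
	w.buf = append(w.buf, p...)
	for len(w.buf) >= packetSize {
		pkt := make([]byte, packetSize)
		copy(pkt, w.buf[:packetSize])
		w.sent = append(w.sent, pkt)
		w.buf = w.buf[packetSize:]
	}
	return len(p), nil
}

// Flush emits whatever is buffered as a short packet, even though it is
// smaller than packetSize.
func (w *packetWriter) Flush() error {
	if len(w.buf) > 0 {
		pkt := make([]byte, len(w.buf))
		copy(pkt, w.buf)
		w.sent = append(w.sent, pkt)
		w.buf = w.buf[:0]
	}
	return nil
}

// Pending reports how many bytes are buffered but not yet sent.
func (w *packetWriter) Pending() int { return len(w.buf) }

func main() {
	w := &packetWriter{}
	w.Write(make([]byte, 63*1024))
	fmt.Println(len(w.sent), w.Pending()) // 0 64512
	w.Flush()
	fmt.Println(len(w.sent), w.Pending()) // 1 0
}
```

Without `Flush`, the 63KB in this model sits in memory until either another 1KB arrives or the writer is closed, which is exactly the window in which the connection can time out.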

@colinmarc
Owner

Hi @muralirvce,

Are you making sure to call Close? The documentation calls this out:

Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.

Or do you mean when keeping a file open for more than an hour?

@muralirvce
Author

I do call Close, but the timeout expires before that. I write roughly 5KB once every 30 seconds, so there is never 64KB ready to flush, and nothing is sent until a full 64KB packet has accumulated. By the time we call Close, the connection may already have timed out. IMO we could add a flush routine to the FileWriter class. I have made some changes and can submit a patch if needed.

@colinmarc
Owner

There might be a keepalive we can send, rather than flushing. I'd have to research.

@muralirvce
Author

Keepalive sounds good. IMO adding a flush might also be worthwhile, in case the data needs to be visible to readers. The only caveat is that the block length will not be updated, although the data itself is present; block reporting will happen when we close. Please let me know your take on it. It would behave like the flush on any other file system writer.

@colinmarc
Owner

colinmarc commented Apr 5, 2018 via email

@colinmarc
Owner

I exposed FileWriter.Flush and added some tests around it.
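A minimal sketch of how a caller with slow, small writes might use a flush: write each record, flush immediately, and still call Close at the end as the documentation requires. To stay self-contained it is written against a hypothetical `writeFlushCloser` interface rather than the real client; a writer with `Write([]byte) (int, error)`, `Flush() error` and `Close() error` methods (the FileWriter above, assuming those signatures) would satisfy it. The `recorder` type is a fake used only to show the call order.

```go
package main

import (
	"fmt"
	"io"
)

// writeFlushCloser is a hypothetical interface covering the methods this
// pattern needs.
type writeFlushCloser interface {
	io.Writer
	Flush() error
	io.Closer
}

// writeRecords writes each record and flushes right away, so small writes
// leave the client without waiting for a full packet. Close is always called.
func writeRecords(w writeFlushCloser, records [][]byte) error {
	for _, r := range records {
		if _, err := w.Write(r); err != nil {
			w.Close()
			return err
		}
		if err := w.Flush(); err != nil {
			w.Close()
			return err
		}
	}
	return w.Close()
}

// recorder is a fake writer that only logs the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) Write(p []byte) (int, error) {
	r.calls = append(r.calls, fmt.Sprintf("write %d", len(p)))
	return len(p), nil
}
func (r *recorder) Flush() error { r.calls = append(r.calls, "flush"); return nil }
func (r *recorder) Close() error { r.calls = append(r.calls, "close"); return nil }

func main() {
	rec := &recorder{}
	_ = writeRecords(rec, [][]byte{make([]byte, 5*1024), make([]byte, 5*1024)})
	fmt.Println(rec.calls) // [write 5120 flush write 5120 flush close]
}
```

Flushing after every write trades throughput (many short packets) for latency; a caller could instead flush on a timer shorter than the connection timeout.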

2 participants