FLUME-3217. Creates empty files when quota #225
base: trunk
Conversation
As can be seen in FLUME-3217, Flume creates empty files when the HDFS quota has been reached. This is a first approach to the solution; it works, although a better approach could be implemented. The idea is to capture the quota exception in order to delete these annoying empty files generated by Flume.
@ffernandez92 thanks for the contribution.
Are you sure that the DSQuotaExceededException can only happen with empty files and that it is safe to delete them? You can configure Flume to write files gigabytes in size, and the quota may be exceeded in the middle of that.
Can one of the admins verify this patch?
Hi @szaboferee, I have reproduced the environment and you are right. It is necessary to check that the file size is zero before doing any delete operation. Moreover, it will be necessary to implement some sort of “stopper” in the “append” function of the BucketWriter class in order to achieve the expected behavior. I will try to make the modifications ASAP. Thank you, and sorry for the mistake.
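A minimal sketch of that safer cleanup, assuming a hypothetical helper (`cleanupIfEmpty` is illustrative, not the PR's actual code): the file is removed only when its length is zero, so partially written data is never lost.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: after a DSQuotaExceededException, remove the
// file only if it is still empty, so partially written data survives.
static void cleanupIfEmpty(FileSystem fs, Path path) throws IOException {
  if (fs.exists(path) && fs.getFileStatus(path).getLen() == 0) {
    fs.delete(path, false);
  }
}
```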
Avoid deleting empty files when the quota is reached, as it could delete a file that should not be deleted.
Adding the checkQuota function:
- When the quota is reached, it avoids creating new files by interrupting the open flow.
- If the quota is reached in the middle of a transaction, the file is closed with the data that could be included, and no further files are created.
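A sketch of what such a check might look like, assuming it is built on HDFS's ContentSummary (the method and parameter names here are illustrative, not necessarily the PR's actual identifiers):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical: returns true when writing eventSize more bytes would
// push the directory past its HDFS space quota. getSpaceConsumed()
// counts all replicas, so the event size is scaled by the
// replication factor.
static boolean quotaWouldBeExceeded(Path quotaPath, Configuration conf,
    long eventSize) throws IOException {
  FileSystem fs = quotaPath.getFileSystem(conf);
  ContentSummary summary = fs.getContentSummary(quotaPath);
  long spaceQuota = summary.getSpaceQuota();
  if (spaceQuota < 0) {
    return false; // -1 means no space quota is set on this directory
  }
  long replication = fs.getDefaultReplication(quotaPath);
  return summary.getSpaceConsumed() + eventSize * replication > spaceQuota;
}
```

Scaling by the replication factor matters because HDFS charges the space quota for every replica, which is exactly the pitfall mentioned below for replication factors greater than 1.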
@ffernandez92 can you add a test with a minicluster? It would make the review much easier.
Check FS type before doing any action.
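Presumably this means skipping the quota logic unless the sink is actually writing to HDFS; a hedged sketch of such a guard:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Space quotas only exist on HDFS, so skip the check entirely for
// local or other file systems (file://, s3a://, ...).
static boolean supportsQuota(FileSystem fs) {
  return fs instanceof DistributedFileSystem;
}
```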
Hi @szaboferee, sure, I will add a test with a minicluster. I am uploading a new version now, as I am having some trouble with the test part locally.
There was a mistake when the replication factor was greater than 1. I expect to commit a test shortly.
FIX DEPENDENCIES.
@ffernandez92 Thanks, it looks promising. I have one question inline.
config.setBoolean("fs.automatic.close", false);
ContentSummary cSumm;
try {
  cSumm = fsCheckQuota(quotaPath.getFileSystem(config), quotaPath);
I am wondering if checking the quota for every single write would cause any performance degradation.
Would it make sense not to compute the size of each event, but to use a configurable threshold instead?
For example, Flume would stop writing once only 100 MB is left of the quota. This way we could avoid race conditions over the last event-sized chunk of free space.
@ffernandez92 thanks for the contribution. I guess this problem can be quite painful. Please let me know what you think.
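A sketch of that threshold idea (the method name and the notion of a configurable headroom are illustrative assumptions, not code from this PR):

```java
import java.io.IOException;

import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical headroom check: refuse new writes once less than a
// configured amount (e.g. 100 MB) remains under the space quota,
// instead of sizing the check to each individual event.
static void assertQuotaHeadroom(FileSystem fs, Path quotaPath,
    long headroomBytes) throws IOException {
  ContentSummary summary = fs.getContentSummary(quotaPath);
  long spaceQuota = summary.getSpaceQuota(); // -1 when no quota is set
  if (spaceQuota >= 0
      && spaceQuota - summary.getSpaceConsumed() < headroomBytes) {
    throw new IOException("Less than " + headroomBytes
        + " bytes left under the quota for " + quotaPath);
  }
}
```

A check like this would only need to run periodically rather than on every append, which also addresses the performance concern raised above.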
Hi! Sorry for the late answer, I have been a bit rushed lately. I think you are right, @szaboferee, about the performance degradation. I will try to reformulate the solution, trying both approaches to see which one would be better.