Skip to content

HDDS-8748. Reduce DN IO times when deleteChunk (FilePerBlockStrategy)#4821

Closed
xichen01 wants to merge 1 commit intoapache:masterfrom
xichen01:HDDS-8748
Closed

HDDS-8748. Reduce DN IO times when deleteChunk (FilePerBlockStrategy)#4821
xichen01 wants to merge 1 commit intoapache:masterfrom
xichen01:HDDS-8748

Conversation

@xichen01
Copy link
Contributor

@xichen01 xichen01 commented Jun 2, 2023

What changes were proposed in this pull request?

When deleting a file, DN will check whether the Block file to be deleted exists or not in the FilePerBlockStrategy#deleteChunk.
This step of checking for the existence of a Block file can be omitted, and basically does not change the behavior of FilePerBlockStrategy#deleteChunk.
This can reduce the IO-times of FilePerBlockStrategy#deleteChunk
the FilePerBlockStrategy#deleteChunk

FilePerBlockStrategy#deleteChunk Change

  • For an existing file FilePerBlockStrategy#deleteChunk is no change in the result, just one less once IO with System.
  • For non-existent files, the `FilePerBlockStrategy#deleteChunk' function behaves the same as before the change, the only difference is that it prints a log of successful deletions.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8748

How was this patch tested?

existing test

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xichen01 for working on this. There are two subtle issues with the patch:

  1. checkFullDelete is skipped when this method is being called from deleteChunks (verifyLength=false). In that case Block file to be deleted does not exist is not logged. Furthermore, this seems to be the common case (KeyValueHandler.deleteBlock).
  2. As you mention in the description:

For non-existent files ... it prints a log of successful deletions

I'm afraid performance improvement is possible only due to (1).

@xichen01
Copy link
Contributor Author

xichen01 commented Jun 5, 2023

Thanks @xichen01 for working on this. There are two subtle issues with the patch:

  1. checkFullDelete is skipped when this method is being called from deleteChunks (verifyLength=false). In that case Block file to be deleted does not exist is not logged. Furthermore, this seems to be the common case (KeyValueHandler.deleteBlock).
  2. As you mention in the description:

For non-existent files ... it prints a log of successful deletions

I'm afraid performance improvement is possible only due to (1).

Thanks for you response.
So it is possible that the "Block does not exist on the DN", which is a more common case. But from the "flame diagram" of our business, checking for the existence of files consumes a lot of time (our keys are small), and the LOG consumes a lot of time too (Print log to indicate that the deletion may not have been skipped due to non-existent Block)
image

image

@adoroszlai
Copy link
Contributor

@xichen01 What I meant is that this statement:

For non-existent files, the `FilePerBlockStrategy#deleteChunk' function behaves the same as before the change, the only difference is that it prints a log of successful deletions.

is true only for the case when verifyLength=true, i.e. the delete single chunk code path, which is not the one exercised by block deleting service. For the latter, it is equivalent to simply removing the check.

I'm OK with performing these checks and logging the messages only at debug level, if we can confirm that block deleting service provides other ways to understand what it's doing, or that the messages are basically useless.

In other words, we need to balance performance and observability.

@xichen01
Copy link
Contributor Author

xichen01 commented Jun 6, 2023

@xichen01 What I meant is that this statement:

For non-existent files, the `FilePerBlockStrategy#deleteChunk' function behaves the same as before the change, the only difference is that it prints a log of successful deletions.

is true only for the case when verifyLength=true, i.e. the delete single chunk code path, which is not the one exercised by block deleting service. For the latter, it is equivalent to simply removing the check.

I'm OK with performing these checks and logging the messages only at debug level, if we can confirm that block deleting service provides other ways to understand what it's doing, or that the messages are basically useless.

In other words, we need to balance performance and observability.

OK, understand. I will close this PR.

@xichen01
Copy link
Contributor Author

xichen01 commented Jun 6, 2023

Another Maybe Optimization for the deleteChunk
@adoroszlai
Do you think the Log of deleteChunk the can be adjust? There are too many "deleting LOG" in the DN log file.


In our environment the DN generate 1GB log file just need serval hours. the most of the contents is "Deleted block file: xxx" this can cause logs to be rolled back quickly, and make it difficult to find some others specific log.

Do you have any suggestion?

@adoroszlai
Copy link
Contributor

BlockDeletingService deletes blocks in a loop. We could collect a list of files deleted / failed to delete, then log them in a single message after the loop (but currently files are internal to the chunk manager).

I think a discussion about the possible improvements would be great.

@Xushaohong
Copy link
Contributor

In our environment the DN generate 1GB log file just need serval hours. the most of the contents is "Deleted block file: xxx" this can cause logs to be rolled back quickly, and make it difficult to find some others specific log.
Do you have any suggestion?

Hi @xichen01, you can adjust the log4j.properties and add a specific log appender for such classes.
E.g https://stackoverflow.com/questions/2763740/log4j-log-output-of-a-specific-class-to-a-specific-appender

@xichen01
Copy link
Contributor Author

xichen01 commented Jun 7, 2023

Hi @xichen01, you can adjust the log4j.properties and add a specific log appender for such classes.

Thanks for your suggestion. This seems a good idea.

@xichen01 xichen01 closed this Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments