HDFS-16245.Record the number of INodeDirectory when loading the FSImage file. #3512
💔 -1 overall
This message was automatically generated.

Jenkins reported some exceptions. @sodonnel, I remember you have some experience with loading the FsImage. Could you help review this PR?

INodeDirectory p = dir.getInode(e.getParent()).asDirectory();
for (long id : e.getChildrenList()) {
  INode child = dir.getInode(id);
  if (child.isDirectory()) {

We only increment here if the child is a directory. The inode table / section contains an entry for every file and directory in the system.
The directory section is what links them all together into parent-child relationships, so it should contain about the same number of entries as there are inodes.
I am not sure it makes sense to count just the directories here, as we have already counted them in the inode section.
Why do you want to count just directories? Would it make more sense to count each entry and child entry, to give an idea of the number of entries processed by each parallel section?
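
To make the suggestion concrete, here is a minimal sketch of counting every directory entry and every child link while walking a directory section, alongside the directory-only count from the patch. The classes `DirEntrySketch`, `DirectorySectionCounts`, and `DirectorySectionCounter` are hypothetical stand-ins written for illustration; they are not the Hadoop classes, and the real loader resolves children through the inode map rather than a boolean lookup.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one directory-section entry: a parent inode id
// plus the ids of its children. Not the Hadoop protobuf class.
final class DirEntrySketch {
    final long parent;
    final List<Long> children;

    DirEntrySketch(long parent, List<Long> children) {
        this.parent = parent;
        this.children = children;
    }
}

// Hypothetical holder for the three counters discussed in the review.
final class DirectorySectionCounts {
    long entries;    // directory entries processed
    long children;   // child links processed across all entries
    long childDirs;  // child links that point at a directory
}

final class DirectorySectionCounter {
    // isDirectory maps inode id -> true when that inode is a directory.
    static DirectorySectionCounts count(List<DirEntrySketch> section,
                                        Map<Long, Boolean> isDirectory) {
        DirectorySectionCounts c = new DirectorySectionCounts();
        for (DirEntrySketch e : section) {
            c.entries++;
            for (long id : e.children) {
                c.children++;
                if (isDirectory.getOrDefault(id, false)) {
                    c.childDirs++;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Inode 1 (dir) holds 2 (dir) and 3 (file); inode 2 holds 4 (file).
        List<DirEntrySketch> section = List.of(
            new DirEntrySketch(1L, List.of(2L, 3L)),
            new DirEntrySketch(2L, List.of(4L)));
        Map<Long, Boolean> isDir = Map.of(1L, true, 2L, true, 3L, false, 4L, false);
        DirectorySectionCounts c = count(section, isDir);
        System.out.println(c.entries + " entries, " + c.children
            + " children, " + c.childDirs + " child directories");
    }
}
```

Counting `entries` and `children` per section would show how much work each parallel section did, while `childDirs` alone only repeats information that is already available from the inode section.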

Thanks @sodonnel for the comment and review.
When loading the FsImage, we already log the total number of inodes (including INodeFile and INodeDirectory).
For example, here is the specific log record:
2021-09-30 19:12:55,034 [15609]-INFO [main:FSImageFormatPBINode$Loader@409]-Loading xxxx INodes.
Yes, this is good: we know the total number of loaded inodes. But it is only a sum, so we cannot tell how many INodeFiles or how many INodeDirectory entries were loaded. If we know how many INodeDirectory entries were loaded, we can derive how many INodeFiles were loaded, and vice versa. This will help us find the cause of the problem when there is an exception.
As for why I count INodeDirectory here: in many cases, more files are created than directories, so counting INodeDirectory may take less time.
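
The arithmetic behind that argument can be sketched as follows: if the loader tracks the total and the directory count, the remainder follows by subtraction. The class `InodeLoadStats` and its summary format are hypothetical and are not part of Hadoop or of this patch. Note that the remainder is "non-directories": in HDFS the inode section can also hold symlinks, so it equals the file count only when no symlinks are present.

```java
// Hypothetical counter, for illustration only; not a Hadoop class.
final class InodeLoadStats {
    private long total;
    private long directories;

    // Call once per loaded inode.
    void record(boolean isDirectory) {
        total++;
        if (isDirectory) {
            directories++;
        }
    }

    long total() {
        return total;
    }

    long directories() {
        return directories;
    }

    // Everything that is not a directory (files, and symlinks if any).
    long nonDirectories() {
        return total - directories;
    }

    // Hypothetical message format; the real loader's log line differs.
    String summary() {
        return String.format("Loaded %d inodes (%d directories, %d non-directories)",
            total, directories, nonDirectories());
    }

    public static void main(String[] args) {
        InodeLoadStats stats = new InodeLoadStats();
        stats.record(true);   // a directory
        stats.record(false);  // a file
        stats.record(false);  // another file
        System.out.println(stats.summary());
    }
}
```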

🎊 +1 overall
This message was automatically generated.

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Description of PR
When loading the FSImage file, record the number of loaded INodeDirectory.
Details: HDFS-16245
How was this patch tested?
When the FSImage file is loaded, compare the number of INodeDirectory entries that have been loaded.