Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
staging/lustre/llite: fix mkdir endless loop
Running on 3.11-rc4 kernel, I got below endless loop. It turns to be that Lustre always saves the first page of a dir inode mapping at index ~0UL. And after commit 5a72039 (mm: teach truncate_inode_pages_range() to handle non page aligned ranges), truncate_inode_pages_range() _NO LONGER_ truncates the page that is sitting at index ~0UL. [16768.998006] mkdir R running task 0 2717 2716 0x00000080 [16768.998073] 000000000000000e 0000000000000000 0000000000000000 ffff88000be00460 [16768.998157] ffff88000ea65908 ffffffff810fec3e ffff88000ea65968 ffff8800229e7750 [16768.998241] ffff88000ea658b8 0000000000000000 0000000000000000 ffff88000ea65958 [16768.998326] Call Trace: [16768.998401] [<ffffffff810fc6ed>] ? rcu_read_unlock+0x1c/0x2d [16768.998473] [<ffffffff810fec3e>] ? find_get_pages+0xf5/0x11b [16768.998530] [<ffffffff811078f0>] ? pagevec_lookup+0x20/0x2a [16768.998586] [<ffffffff8110920e>] ? truncate_inode_pages_range.part.2+0x161/0x39a [16768.998680] [<ffffffffa02ad5dc>] ? ll_md_blocking_ast+0x338/0x62f [lustre] [16768.998744] [<ffffffff8110947f>] ? truncate_inode_pages_range+0x38/0x3f [16768.998805] [<ffffffff811094f8>] ? truncate_inode_pages+0x12/0x14 [16768.998871] [<ffffffffa02ad6e8>] ? ll_md_blocking_ast+0x444/0x62f [lustre] [16768.998948] [<ffffffff810981b5>] ? arch_local_irq_save+0x9/0xc [16768.999022] [<ffffffffa07ee0e8>] ? ldlm_cancel_callback+0x67/0x12a [ptlrpc] [16768.999100] [<ffffffffa07f85b2>] ? ldlm_cli_cancel_local+0xf3/0x2bc [ptlrpc] [16768.999176] [<ffffffffa07f9163>] ? ldlm_cli_cancel_list_local+0x7e/0x1e4 [ptlrpc] [16768.999268] [<ffffffffa07f9473>] ? ldlm_cancel_resource_local+0x1aa/0x1b9 [ptlrpc] [16768.999385] [<ffffffffa0657bad>] ? mdc_resource_get_unused+0xf8/0x115 [mdc] [16768.999472] [<ffffffff8109c887>] ? trace_hardirqs_on+0xd/0xf [16768.999533] [<ffffffffa06583d8>] ? mdc_create+0x11e/0x4db [mdc] [16768.999597] [<ffffffff8152ed84>] ? mutex_unlock+0xe/0x10 [16768.999654] [<ffffffffa0350e99>] ? lmv_create+0x355/0x3e9 [lmv] [16768.999712] [<ffffffff811553b7>] ? final_putname+0x35/0x39 [16768.999775] [<ffffffffa02ae167>] ? ll_new_node+0x33b/0x3ff [lustre] [16768.999841] [<ffffffffa02ae62c>] ? ll_mkdir+0xf2/0x127 [lustre] [16768.999897] [<ffffffff81156996>] ? vfs_mkdir+0x84/0xc9 [16768.999961] [<ffffffff81158cf8>] ? SyS_mkdirat+0x77/0xad [16769.000014] [<ffffffff81158d47>] ? SyS_mkdir+0x19/0x1b [16769.000066] [<ffffffff81538652>] ? system_call_fastpath+0x16/0x1b Signed-off-by: Peng Tao <bergwolf@gmail.com>
- Loading branch information
28fde12There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be considered a bug in 5a72039? Otherwise, it seems that this would cause a conflict with hash = ~0UL being mapped onto the same index.
28fde12There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For files, 5a7239 is OK because no file can ever get that large (offset beyond 64bit). For directories, I am not sure. Will ext4 dentry page hash get that large? We already hard-coded MDS_DIR_END_OFF as 0xfffffffffffffffeULL. It makes me think that hash = ~0ULL is not possible because if ext4 does hash dentry page to any value in the 64bit space, MDS_DIR_END_OFF can also conflict.
28fde12There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is why the Lustre hash can ever get that small (hash == 0) to hit offset ~0ULL? I guess that does happen, or you wouldn't be seeing this problem. I thought we didn't use hash = 0 (maybe confusion based on comment in mdt_lock_pdo_init()), but it seems that this is an in-memory problem only.
Rather than the current solution, what about:
which is equivalent, but more consistent and avoids a conditional branch.
28fde12There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.