
Some bugs in metadata_csum mode #100

@hayley-leblanc

Hi,

I think I've found a couple of bugs in NOVA's metadata_csum mode.

The first bug occurs when mounting a cleanly unmounted instance of NOVA. I can trigger it by initializing a new instance of NOVA, creating a directory, and then unmounting it. On remount, one of my collaborators gets a GPF crash. I don't get a crash on my machine, but on a kernel compiled with KASAN I get the following KASAN report and partial call trace (which matches the trace my collaborator sees on the GPF):

[   70.122633] ==================================================================
[   70.123476] BUG: KASAN: wild-memory-access in memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.124244] Write of size 120 at addr ffff1101b88f78a8 by task mount/1350

[   70.125064] CPU: 0 PID: 1350 Comm: mount Tainted: G           OE     5.1.0+ #415
[   70.125066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   70.125067] Call Trace:
[   70.125081]  dump_stack+0x94/0xd8
[   70.125093]  ? memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125103]  ? memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125111]  kasan_report+0x171/0x18c
[   70.125122]  ? memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125126]  check_memory_region+0x137/0x190
[   70.125129]  kasan_check_write+0x14/0x20
[   70.125139]  memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125150]  nova_free_inode_log+0x2b7/0x506 [nova]
[   70.125162]  ? nova_free_contiguous_log_blocks+0x213/0x213 [nova]
[   70.125171]  ? nova_insert_range_node+0x187/0x198 [nova]
[   70.125182]  nova_init_blockmap_from_inode+0x1ad/0x5ce [nova]

I did a little digging, and it appears to be triggered by nova_free_inode_log accessing a strange value in sih->alter_pi_addr (0xffff1101b88f78a8 in this particular instance, the same address that shows up in the KASAN report).
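One possible reason the same bad pointer shows up as a GPF on one machine and as a KASAN wild-memory-access on another is that the address is non-canonical. The sketch below checks this under the assumption of x86-64 with 4-level paging (48-bit virtual addresses), where bits 63..47 of a valid address must all be equal. The helper name `is_canonical` and the `VA_BITS` constant are made up for illustration and are not part of NOVA or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4-level paging, i.e. 48-bit virtual addresses. */
#define VA_BITS 48

/*
 * Hypothetical helper: an address is canonical when bits 63..(VA_BITS - 1)
 * are either all zeros or all ones.
 */
bool is_canonical(uint64_t addr)
{
	/* Keep only bits 63..47 (17 bits when VA_BITS is 48). */
	uint64_t upper = addr >> (VA_BITS - 1);
	/* Mask with those 17 bits all set. */
	uint64_t all_ones = (UINT64_C(1) << (64 - VA_BITS + 1)) - 1;

	return upper == 0 || upper == all_ones;
}
```

Under that assumption, `is_canonical(0xffff1101b88f78a8)` is false because bit 47 is clear while bits 63..48 are set. A write through such a pointer would fault with a general protection fault rather than a page fault, which would be consistent with the GPF my collaborator sees. On a kernel using 5-level paging the check would need `VA_BITS` set to 57.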

The second bug is a crash consistency bug. It occurs when a crash happens while a file's size is being increased by a write or truncate operation, or while a file is being renamed, and nova_extend_inode_log calls nova_initialize_inode_log. There seems to be a window between the initialization of the main log pointers and the initialization of the alter log pointers in which a crash can cause problems. Specifically, if the system crashes after updating the main log pointers and checksum for the file's inode, but before it finishes updating the alter log pointers and calculating the new checksum, the file in question can't be deleted once the crashed file system is mounted again. Attempting to delete it gives an ENOSPC error.

I don't know exactly what goes wrong during the unlink call that causes it to fail. I have observed that in crash states where the unlink fails, the recovery procedure prints "nova: nova_check_inode_integrity: inode replica 33 is stale, trying to repair using the primary" in dmesg. In crash states where the unlink succeeds, I still see a "nova_repair_inode: inode 33 error repaired" message, but it seems to come from a checksum error rather than a stale replica.

I don't currently have a fix ready for either bug, although I have observed that the crash consistency bug seems to go away if the call to nova_update_inode_checksum is moved out of nova_initialize_inode_log and only called after we initialize both the primary and alter logs.
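To make the ordering concrete, here is a toy model of the two orderings. It is only a sketch and does not use NOVA's real structures or checksum: `struct toy_inode`, `toy_csum`, the `crash_point` enum, and both `init_logs_*` functions are hypothetical names, and a crash is simulated by returning early between the two pointer updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the persistent inode; not NOVA's real layout. */
struct toy_inode {
	uint64_t log_head;       /* main log pointer */
	uint64_t alter_log_head; /* alter (replica) log pointer */
	uint32_t csum;           /* checksum over the two pointers */
};

/* Toy checksum, only for illustration (NOVA does not use this formula). */
uint32_t toy_csum(const struct toy_inode *pi)
{
	uint64_t x = pi->log_head * 31 + pi->alter_log_head;

	return (uint32_t)(x ^ (x >> 32)) + 1;
}

enum crash_point { NO_CRASH, CRASH_BETWEEN_LOGS };

/* Ordering described in the issue: checksum is refreshed after each log. */
void init_logs_early_csum(struct toy_inode *pi, uint64_t main_blk,
			  uint64_t alter_blk, enum crash_point cp)
{
	pi->log_head = main_blk;
	pi->csum = toy_csum(pi);
	if (cp == CRASH_BETWEEN_LOGS)
		return; /* simulated crash: alter log never initialized */
	pi->alter_log_head = alter_blk;
	pi->csum = toy_csum(pi);
}

/* Workaround ordering: checksum is computed once, after both logs are set. */
void init_logs_late_csum(struct toy_inode *pi, uint64_t main_blk,
			 uint64_t alter_blk, enum crash_point cp)
{
	pi->log_head = main_blk;
	if (cp == CRASH_BETWEEN_LOGS)
		return; /* simulated crash: checksum is still the old one */
	pi->alter_log_head = alter_blk;
	pi->csum = toy_csum(pi);
}

/* True when the stored checksum matches the current pointer values. */
bool csum_ok(const struct toy_inode *pi)
{
	return pi->csum == toy_csum(pi);
}

/* True when the main log is set but the alter log is not. */
bool half_initialized(const struct toy_inode *pi)
{
	return pi->log_head != 0 && pi->alter_log_head == 0;
}
```

In this model, a crash between the two updates under the early-checksum ordering leaves an inode that is half initialized but still passes its checksum, so nothing flags it as damaged. Under the late-checksum ordering the same crash leaves a checksum mismatch, which matches my observation that the successful cases report a checksum error during recovery. Whether this is the actual mechanism inside NOVA is an assumption on my part.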

Let me know what you think! Thanks.
