Johannes-Thums…
Commits on Apr 28, 2020
-
btrfs: rename btrfs_parse_device_options back to btrfs_parse_early_op…
…tions As btrfs_parse_device_options() now doesn't only parse the -o device mount option but -o auth_key as well, it makes sense to rename it back to btrfs_parse_early_options(). This reverts commit fa59f27. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Johannes Thumshirn authored and 0day robot committedApr 28, 2020 -
btrfs: add authentication support
Add authentication support for a BTRFS file-system. This works, because in BTRFS every meta-data block as well as every data-block has a own checksum. For meta-data the checksum is in the meta-data node itself. For data blocks, the checksums are stored in the checksum tree. When replacing the checksum algorithm with a keyed hash, like HMAC(SHA256), a key is needed to mount a verified file-system. This key also needs to be used at file-system creation time. We have to used a keyed hash scheme, in contrast to doing a normal cryptographic hash, to guarantee integrity of the file system, as a potential attacker could just replay file-system operations and the changes would go unnoticed. Having a keyed hash only on the topmost Node of a tree or even just in the super-block and using cryptographic hashes on the normal meta-data nodes and checksum tree entries doesn't work either, as the BTRFS B-Tree's Nodes do not include the checksums of their respective child nodes, but only the block pointers and offsets where to find them on disk. Also note, we do not need a incompat R/O flag for this, because if an old kernel tries to mount an authenticated file-system it will fail the initial checksum type verification and thus refuses to mount. The key has to be supplied by the kernel's keyring and the method of getting the key securely into the kernel is not subject of this patch. Example usage: Create a file-system with authentication key 0123456 mkfs.btrfs --csum hmac-sha256 --auth-key 0123456 /dev/disk Add the key to the kernel's keyring as keyid 'btrfs:foo' keyctl add logon btrfs:foo 0123456 @U Mount the fs using the 'btrfs:foo' key mount -t btrfs -o auth_key=btrfs:foo /dev/disk /mnt/point Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Johannes Thumshirn authored and 0day robot committedApr 28, 2020
Commits on Apr 23, 2020
-
Merge branch 'for-next-current-v5.6-20200423' into for-next-20200423
kdave committedApr 23, 2020 -
Merge branch 'misc-next' into for-next-current-v5.6-20200423
kdave committedApr 23, 2020 -
Merge branch 'misc-5.7' into for-next-current-v5.6-20200423
kdave committedApr 23, 2020 -
btrfs: remove useless check for copy_items() return value
At btrfs_log_prealloc_extents() we are checking if copy_items() returns a value greater than 0. That used to happen in the past to signal the caller that the path given to it was released and reused for other searches, but as of commit 0e56315 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES"), the copy_items() function does not have that behaviour anymore and always returns 0 or a negative value. So just remove that check at btrfs_log_prealloc_extents(), which the previously mentioned commit forgot to remove. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: fix partial loss of prealloc extent past i_size after fsync
When we have an inode with a prealloc extent that starts at an offset lower than the i_size and there is another prealloc extent that starts at an offset beyond i_size, we can end up losing part of the first prealloc extent (the part that starts at i_size) and have an implicit hole if we fsync the file and then have a power failure. Consider the following example with comments explaining how and why it happens. $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt # Create our test file with 2 consecutive prealloc extents, each with a # size of 128Kb, and covering the range from 0 to 256Kb, with a file # size of 0. $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo $ xfs_io -c "falloc -k 128K 128K" /mnt/foo # Fsync the file to record both extents in the log tree. $ xfs_io -c "fsync" /mnt/foo # Now do a redudant extent allocation for the range from 0 to 64Kb. # This will merely increase the file size from 0 to 64Kb. Instead we # could also do a truncate to set the file size to 64Kb. $ xfs_io -c "falloc 0 64K" /mnt/foo # Fsync the file, so we update the inode item in the log tree with the # new file size (64Kb). This also ends up setting the number of bytes # for the first prealloc extent to 64Kb. This is done by the truncation # at btrfs_log_prealloc_extents(). # This means that if a power failure happens after this, a write into # the file range 64Kb to 128Kb will not use the prealloc extent and # will result in allocation of a new extent. $ xfs_io -c "fsync" /mnt/foo # Now set the file size to 256K with a truncate and then fsync the file. # Since no changes happened to the extents, the fsync only updates the # i_size in the inode item at the log tree. This results in an implicit # hole for the file range from 64Kb to 128Kb, something which fsck will # complain when not using the NO_HOLES feature if we replay the log # after a power failure. $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo So instead of always truncating the log to the inode's current i_size at btrfs_log_prealloc_extents(), check first if there's a prealloc extent that starts at an offset lower than the i_size and with a length that crosses the i_size - if there is one, just make sure we truncate to a size that corresponds to the end offset of that prealloc extent, so that we don't lose the part of that extent that starts at i_size if a power failure happens. A test case for fstests follows soon. Fixes: 31d11b8 ("Btrfs: fix duplicate extents after fsync of file with prealloc extents") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: fix transaction leak in btrfs_recover_relocation
btrfs_recover_relocation() invokes btrfs_join_transaction(), which joins a btrfs_trans_handle object into transactions and returns a reference of it with increased refcount to "trans". When btrfs_recover_relocation() returns, "trans" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of btrfs_recover_relocation(). When read_fs_root() failed, the refcnt increased by btrfs_join_transaction() is not decreased, causing a refcnt leak. Fix this issue by calling btrfs_end_transaction() on this error path when read_fs_root() failed. Fixes: 79787ea ("btrfs: replace many BUG_ONs with proper error handling") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
Commits on Apr 22, 2020
-
btrfs: fix block group leak when removing fails
btrfs_remove_block_group() invokes btrfs_lookup_block_group(), which returns a local reference of the block group that contains the given bytenr to "block_group" with increased refcount. When btrfs_remove_block_group() returns, "block_group" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in several exception handling paths of btrfs_remove_block_group(). When those error scenarios occur such as btrfs_alloc_path() returns NULL, the function forgets to decrease its refcnt increased by btrfs_lookup_block_group() and will cause a refcnt leak. Fix this issue by jumping to "out_put_group" label and calling btrfs_put_block_group() when those error scenarios occur. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: simplify direct I/O read repair
Direct I/O read repair was originally implemented in commit 8b110e3 ("Btrfs: implement repair function when direct read fails"). This implementation is unnecessarily complicated. There is major code duplication between __btrfs_subio_endio_read() (checks checksums and handles I/O errors for files with checksums), __btrfs_correct_data_nocsum() (handles I/O errors for files without checksums), btrfs_retry_endio() (checks checksums and handles I/O errors for retries of files with checksums), and btrfs_retry_endio_nocsum() (handles I/O errors for retries of files without checksum). If it sounds like these should be one function, that's because they should. Additionally, these functions are very hard to follow due to their excessive use of goto. This commit replaces the original implementation. After the previous commit getting rid of orig_bio, we can reuse the same endio callback for repair I/O and the original I/O, we just need to track the file offset and original iterator in the repair bio. We can also unify the handling of files with and without checksums and simplify the control flow. We also no longer have to wait for each repair I/O to complete one by one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: get rid of one layer of bios in direct I/O
In the worst case, there are _4_ layers of bios in the Btrfs direct I/O path: 1. The bio created by the generic direct I/O code (dio_bio). 2. A clone of dio_bio we create in btrfs_submit_direct() to represent the entire direct I/O range (orig_bio). 3. A partial clone of orig_bio limited to the size of a RAID stripe that we create in btrfs_submit_direct_hook(). 4. Clones of each of those split bios for each RAID stripe that we create in btrfs_map_bio(). As of the previous commit, the second layer (orig_bio) is no longer needed for anything: we can split dio_bio instead, and complete dio_bio directly when all of the cloned bios complete. This lets us clean up a bunch of cruft, including dip->subio_endio and dip->errors (we can use dio_bio->bi_status instead). It also enables the next big cleanup of direct I/O read repair. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
The next commit will get rid of btrfs_dio_private->orig_bio. The only thing we really need it for is containing all of the checksums, but we can easily put the checksum array in btrfs_dio_private and have the submitted bios reference the array. We can also look the checksums up while we're setting up instead of the current awkward logic that looks them up for orig_bio when the first split bio is submitted. (Interestingly, btrfs_dio_private did contain the checksums before commit 23ea8e5 ("Btrfs: load checksum data once when submitting a direct read io"), but it didn't look them up up front.) Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: convert btrfs_dio_private->pending_bios to refcount_t
This is really a reference count now, so convert it to refcount_t and rename it to refs. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: remove unused btrfs_dio_private::private
We haven't used this since commit 9be3395 ("Btrfs: use a btrfs bioset instead of abusing bio internals"). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: make btrfs_check_repairable() static
Since its introduction in commit 2fe6303 ("Btrfs: split bio_readpage_error into several functions"), btrfs_check_repairable() has only been used from extent_io.c where it is defined. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: rename __readpage_endio_check to check_data_csum
__readpage_endio_check() is also used from the direct I/O read code, so give it a more descriptive name. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: clarify btrfs_lookup_bio_sums documentation
Fix a couple of issues in the btrfs_lookup_bio_sums documentation: * The bio doesn't need to be a btrfs_io_bio if dst was provided. Move the declaration in the code to make that clear, too. * dst must be large enough to hold nblocks * csum_size, not just csum_size. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: don't do repair validation for checksum errors
The purpose of the validation step is to distinguish between good and bad sectors in a failed multi-sector read. If a multi-sector read succeeded but some of those sectors had checksum errors, we don't need to validate anything; we know the sectors with bad checksums need to be repaired. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: look at full bi_io_vec for repair decision
Read repair does two things: it finds a good copy of data to return to the reader, and it corrects the bad copy on disk. If a read of multiple sectors has an I/O error, repair does an extra "validation" step that issues a separate read for each sector. This allows us to find the exact failing sectors and only rewrite those. This heuristic is implemented in bio_readpage_error()/btrfs_check_repairable() as: failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT; if (failed_bio_pages > 1) do validation However, at this point, bi_iter may have already been advanced. This means that we'll skip the validation step and rewrite the entire failed read. Fix it by getting the actual size from the biovec (which we can do because this is only called for non-cloned bios, although that will change in a later commit). Fixes: 8a2ee44 ("btrfs: look at bi_size for repair decisions") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: fix double __endio_write_update_ordered in direct I/O
In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private, we complete the ordered extent range. However, we don't mark that the range doesn't need to be cleaned up from btrfs_direct_IO() until later. Therefore, if we fail to allocate the btrfs_dio_private, we complete the ordered extent range twice. We could fix this by updating unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so that creating the btrfs_dio_private and submitting the bios are separate, and once the btrfs_dio_private is created, cleanup always happens through the btrfs_dio_private. The logic around unsubmitted_oe_range_end and unsubmitted_oe_range_start is really subtle. We have the following: 1. btrfs_direct_IO sets those two to the same value. 2. When we call __blockdev_direct_IO unless btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to modify unsubmitted_oe_range_start so that start < end. Cleanup won't happen. 3. We come into btrfs_submit_direct - if it dip allocation fails we'd return with oe_range_end now modified so cleanup will happen. 4. If we manage to allocate the dip we reset the unsubmitted range members to be equal so that cleanup happens from btrfs_endio_direct_write. This 4-step logic is not really obvious, especially given it's scattered across 3 functions. Fixes: f28a492 ("Btrfs: fix leaking of ordered extents after direct IO write error") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> [ add range start/end logic explanation from Nikolay ] Signed-off-by: David Sterba <dsterba@suse.com> -
btrfs: fix error handling when submitting direct I/O bio
In btrfs_submit_direct_hook(), if a direct I/O write doesn't span a RAID stripe or chunk, we submit orig_bio without cloning it. In this case, we don't increment pending_bios. Then, if btrfs_submit_dio_bio() fails, we decrement pending_bios to -1, and we never complete orig_bio. Fix it by initializing pending_bios to 1 instead of incrementing later. Fixing this exposes another bug: we put orig_bio prematurely and then put it again from end_io. Fix it by not putting orig_bio. After this change, pending_bios is really more of a reference count, but I'll leave that cleanup separate to keep the fix small. Fixes: e65e153 ("btrfs: fix panic caused by direct IO") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
block: add bio_for_each_bvec_all()
An upcoming Btrfs fix needs to know the original size of a non-cloned bios. Rather than accessing the bvec table directly, let's add a bio_for_each_bvec_all() accessor. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: drop logs when we've aborted a transaction
Dave reported a problem where we were panicing with generic/475 with misc-5.7. This is because we were doing IO after we had stopped all of the worker threads, because we do the log tree cleanup on roots at drop time. Cleaning up the log tree will always need to do reads if we happened to have evicted the blocks from memory. Because of this simply add a helper to btrfs_cleanup_transaction() that will go through and drop all of the log roots. This gets run before we do the close_ctree() work, and thus we are allowed to do any reads that we would need. I ran this through many iterations of generic/475 with constrained memory and I did not see the issue. general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 12359 Comm: umount Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_queue_work+0x33/0x1c0 [btrfs] RSP: 0018:ffff9cfb015937d8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8eb5e339ed80 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8eb5eb33b770 RDI: ffff8eb5e37a0460 RBP: ffff8eb5eb33b770 R08: 000000000000020c R09: ffffffff9fc09ac0 R10: 0000000000000007 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b R13: ffff9cfb00229040 R14: 0000000000000008 R15: ffff8eb5d3868000 FS: 00007f167ea022c0(0000) GS:ffff8eb5fae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f167e5e0cb1 CR3: 0000000138c18004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_end_bio+0x81/0x130 [btrfs] __split_and_process_bio+0xaf/0x4e0 [dm_mod] ? percpu_counter_add_batch+0xa3/0x120 dm_process_bio+0x98/0x290 [dm_mod] ? generic_make_request+0xfb/0x410 dm_make_request+0x4d/0x120 [dm_mod] ? generic_make_request+0xfb/0x410 generic_make_request+0x12a/0x410 ? submit_bio+0x38/0x160 submit_bio+0x38/0x160 ? percpu_counter_add_batch+0xa3/0x120 btrfs_map_bio+0x289/0x570 [btrfs] ? kmem_cache_alloc+0x24d/0x300 btree_submit_bio_hook+0x79/0xc0 [btrfs] submit_one_bio+0x31/0x50 [btrfs] read_extent_buffer_pages+0x2fe/0x450 [btrfs] btree_read_extent_buffer_pages+0x7e/0x170 [btrfs] walk_down_log_tree+0x343/0x690 [btrfs] ? walk_log_tree+0x3d/0x380 [btrfs] walk_log_tree+0xf7/0x380 [btrfs] ? plist_requeue+0xf0/0xf0 ? delete_node+0x4b/0x230 free_log_tree+0x4c/0x130 [btrfs] ? wait_log_commit+0x140/0x140 [btrfs] btrfs_free_log+0x17/0x30 [btrfs] btrfs_drop_and_free_fs_root+0xb0/0xd0 [btrfs] btrfs_free_fs_roots+0x10c/0x190 [btrfs] ? do_raw_spin_unlock+0x49/0xc0 ? _raw_spin_unlock+0x29/0x40 ? release_extent_buffer+0x121/0x170 [btrfs] close_ctree+0x289/0x2e6 [btrfs] generic_shutdown_super+0x6c/0x110 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x3a/0x70 Reported-by: David Sterba <dsterba@suse.com> Fixes: 8c38938 ("btrfs: move the root freeing stuff into btrfs_put_root") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: discard: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in header file related to Btrfs File System support. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: simplify error handling of clean_pinned_extents()
At clean_pinned_extents(), whether we end up returning success or failure, we pretty much have to do the same things: 1) unlock unused_bg_unpin_mutex 2) decrement reference count on the previous transaction We also call btrfs_dec_block_group_ro() in case of failure, but that is better done in its caller, btrfs_delete_unused_bgs(), since its the caller that calls inc_block_group_ro(), so it should be responsible for the decrement operation, as it is in case any of the other functions it calls fail. So move the call to btrfs_dec_block_group_ro() from clean_pinned_extents() into btrfs_delete_unused_bgs() and unify the error and success return paths for clean_pinned_extents(), reducing duplicated code and making it simpler. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: fix memory leak of transaction when deleting unused block group
When cleaning pinned extents right before deleting an unused block group, we check if there's still a previous transaction running and if so we increment its reference count before using it for cleaning pinned ranges in its pinned extents iotree. However we ended up never decrementing the reference count after using the transaction, resulting in a memory leak. Fix it by decrementing the reference count. Fixes: fe119a6 ("btrfs: switch to per-transaction pinned extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: remove the redundant parameter level in btrfs_bin_search()
All callers pass the eb::level so we can get read it directly inside the btrfs_bin_search and key_search. This is inspired by the work of Marek in U-boot. CC: Marek Behun <marek.behun@nic.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Commits on Apr 20, 2020
-
btrfs: discard: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in header file related to Btrfs File System support. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: make btrfs_read_disk_super return struct btrfs_disk_super
Instead of returning both the page and the super block structure, make btrfs_read_disk_super just return a pointer to struct btrfs_disk_super. As a result the function signature is simplified. Also, read_cache_page_gfp can never return NULL so check its return value only for IS_ERR. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: use list_for_each_entry_safe in free_reloc_roots
The function always works on a local copy of the reloc root list, which cannot be modified outside of it so using list_for_each_entry is fine. Additionally the macro handles empty lists so drop list_empty checks of callers. No semantic changes. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: don't force read-only after error in drop snapshot
Deleting a subvolume on a full filesystem leads to ENOSPC followed by a forced read-only. This is not a transaction abort and the filesystem is otherwise ok, so the error should be just propagated to the callers. This is caused by unnecessary call to btrfs_handle_fs_error for all errors, except EAGAIN. This does not make sense as the standard transaction abort mechanism is in btrfs_drop_snapshot so all relevant failures are handled. Originally in commit cb1b69f ("Btrfs: forced readonly when btrfs_drop_snapshot() fails") there was no return value at all, so the btrfs_std_error made some sense but once the error handling and propagation has been implemented we don't need it anymore. Signed-off-by: David Sterba <dsterba@suse.com>
kdave committedApr 20, 2020 -
btrfs: remove pointless assertion on reclaim_size counter
The reclaim_size counter of a space_info object is unsigned. So its value can never be negative, it's pointless to have an assertion that checks its value is >= 0, therefore remove it. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: tree-checker: remove duplicate definition of 'inode_item_err'
Remove the duplicate definition of 'inode_item_err' in the file tree-checker.c that got there by accident in c23c77b ("btrfs: tree-checker: Refactor inode key check into seperate function"). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Zheng Wei <wei.zheng@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: force chunk allocation if our global rsv is larger than metadata
Nikolay noticed a bunch of test failures with my global rsv steal patches. At first he thought they were introduced by them, but they've been failing for a while with 64k nodes. The problem is with 64k nodes we have a global reserve that calculates out to 13MiB on a freshly made file system, which only has 8MiB of metadata space. Because of changes I previously made we no longer account for the global reserve in the overcommit logic, which means we correctly allow overcommit to happen even though we are already overcommitted. However in some corner cases, for example btrfs/170, we will allocate the entire file system up with data chunks before we have enough space pressure to allocate a metadata chunk. Then once the fs is full we ENOSPC out because we cannot overcommit and the global reserve is taking up all of the available space. The most ideal way to deal with this is to change our space reservation stuff to take into account the height of the tree's that we're modifying, so that our global reserve calculation does not end up so obscenely large. However that is a huge undertaking. Instead fix this by forcing a chunk allocation if the global reserve is larger than the total metadata space. This gives us essentially the same behavior that happened before, we get a chunk allocated and these tests can pass. This is meant to be a stop-gap measure until we can tackle the "tree height only" project. Fixes: 0096420 ("btrfs: do not account global reserve in can_overcommit") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: run btrfs_try_granting_tickets if a priority ticket fails
With normal tickets we could have a large reservation at the front of the list that is unable to be satisfied, but a smaller ticket later on that can be satisfied. The way we handle this is to run btrfs_try_granting_tickets() in maybe_fail_all_tickets(). However no such protection exists for priority tickets. Fix this by handling it in handle_reserve_ticket(). If we've returned after attempting to flush space in a priority related way, we'll still be on the priority list and need to be removed. We rely on the flushing to free up space and wake the ticket, but if there is not enough space to reclaim _but_ there's enough space in the space_info to handle subsequent reservations then we would have gotten an ENOSPC erroneously. Address this by catching where we are still on the list, meaning we were a priority ticket, and removing ourselves and then running btrfs_try_granting_tickets(). This will handle this particular corner case. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>