Defrag related backports for v5.16 #423

adam900710 · 2022-02-16T07:20:59Z

This definitely breaks the record I hold for the most fixes needed for a single commit.

Upstreamed:

b767c2fc787e992daeadfff40d61c05f66c82da0 btrfs: allow defrag to be interruptible
6b34cd8e175bfbf4f3f01b6d19eae18245e1a8cc btrfs: fix too long loop when defragging a 1 byte file
c080b4144b9dd3b7af838a194ffad3204ca15166 btrfs: defrag: properly update range->start for autodefrag
484167da77739a8d0e225008c48e697fd3f781ae btrfs: defrag: fix wrong number of defragged sectors

27cdfde181bcacd226c230b2fd831f6f5b8c215f btrfs: update writeback index when starting defrag
3c9d31c715948aaff0ee6d322a91a2dec07770bf btrfs: add back missing dirty page rate limiting to defrag
0cb5950f3f3b51a4e8657d106f897f2b913e0586 btrfs: fix deadlock when reserving space during defrag

0d1ffa2228cb34f485f8fe927f134b82a0ea62ae btrfs: defrag: don't try to defrag extents which are under writeback
ea0eba69a2a8125229b1b6011644598039bc53aa btrfs: don't hold CPU for too long when defragging a file

In misc-next:

268effdffdbcbbcf7c50554f6dc0fa50de12bc1a btrfs: defrag: use control structure in btrfs_defrag_file()
3eed203daf30ebc480d45d2fdc432e6ac93feeb4 btrfs: defrag: introduce control structure for later use
5e6560cc7d65bb8212e80cd15dbe858e619bd443 btrfs: defrag: remove an ambiguous condition for rejection
9f6af04b22fd70d2c1303b978ce6f60e339d5469 btrfs: defrag: don't defrag extents which are already at max capacity
5d80ef438ec4411738f713332933eba6c8029e74 btrfs: defrag: don't try to merge regular extents with preallocated extents
237524eaad3281dfc3bbd690115e411026351ff2 btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target

For the btrfs_defrag_ctrl related patches, manual backport may be needed.

Not yet merged and in need of review:

btrfs: defrag: bring back the old file extent search behavior
btrfs: defrag: don't use merged extent map for their generation check
btrfs: autodefrag: only scan one inode once
btrfs: close the gap between inode_should_defrag() and autodefrag extent size threshold

Not yet merged, but useful debug trace events:

btrfs: add trace events for defrag

kdave · 2022-02-16T22:45:15Z

For stable we really need something that's minimal in terms of code size and dependencies, so cleanups like removing parameters or adding tracepoints are not suitable for stable. Right now the remaining issue is broken autodefrag, so that's what I care about for now.

cc @cmurf - you mentioned that autodefrag is still broken on 5.16.8, do you have links?

cmurf · 2022-02-17T00:40:26Z

It's here but it's not a proper bug report, which I'm trying to extract out of the autodefrag case in that thread.

adam900710 · 2022-02-17T04:32:51Z

@cmurf Is it possible to build a testing kernel for those affected based on my branch?

https://github.com/adam900710/linux/tree/autodefrag_fixes

That branch contains all submitted autodefrag related fixes.

tootea · 2022-03-03T08:40:24Z

Hi, I'm still hitting excessive autodefrag I/O on three separate machines, all currently on 5.16.11-200.fc35.x86_64. How can I best help test these fixes? The autodefrag_fixes branch seems to be based on some 5.17-rc, which doesn't sound like what I would want to be running on my two work machines. (I can give it a try on my home machine if desired.)

In all three cases I have btrfs mounted with noatime,autodefrag,compress=zstd:1. Incidentally, I just migrated one of the three machines from ext4 to Btrfs yesterday (mkfs and rsync the data over, not in-place conversion), so that filesystem was nearly pristine to begin with. Right after logging into KDE I saw a massive stream of writes due to btrfs-cleaner, which seemed to be triggered by the Baloo indexing service. This went on and on at roughly 10 GiB of writes per minute until I gave up and did mount -o remount,noautodefrag /, at which point the writes practically stopped.

My kernel (understandably) doesn't have the defrag tracepoints, so I enabled all btrfs tracepoints and got a log for about 3 seconds of operation (5 MiB gzipped, can share it if needed).

Summary:

# grep -oE "root=.* ino=[0-9]+" /tmp/btrfs-trace.log | sort | uniq -c
  10496 root=256(-) ino=1009441
    742 root=256(-) ino=1009491
     56 root=257(-) ino=1164418
      4 root=257(-) ino=250105
     14 root=257(-) ino=250114
  20983 root=257(-) ino=329832
  29525 root=257(-) ino=329891
     28 root=257(-) ino=407088
      2 root=257(-) ino=407089
     33 root=257(-) ino=407090
     19 root=257(-) ino=407091

The most active inodes are the Baloo index and system journal:

# btrfs inspect inode-resolve 1009441 /
//var/log/journal/df38462702849b30219ad17170aa3c06/system.journal
# btrfs inspect inode-resolve 1009491 /
//var/log/journal/df38462702849b30219ad17170aa3c06/user-395845.journal
# btrfs inspect inode-resolve 329832 /home/
/home//tootea/.local/share/baloo/index
# btrfs inspect inode-resolve 329891 /home/
/home//tootea/.local/share/baloo/email/postlist.glass

The Baloo index is a not-too-big non-sparse database which is written to very randomly:

# du -s -m /home//tootea/.local/share/baloo/index
934     /home//tootea/.local/share/baloo/index
# ll -h /home//tootea/.local/share/baloo/index
-rw-r--r--. 1 tootea lcc 934M Mar  3 08:52 /home//tootea/.local/share/baloo/index

However, it seems to be quite excessively fragmented (due to Baloo and/or autodefrag; as mentioned above this file was freshly written shortly before):

# filefrag /home//tootea/.local/share/baloo/index
/home//tootea/.local/share/baloo/index: 78914 extents found
# filefrag //var/log/journal/df38462702849b30219ad17170aa3c06/system.journal
//var/log/journal/df38462702849b30219ad17170aa3c06/system.journal: 4547 extents found
# compsize /home//tootea/.local/share/baloo/index
Processed 1 file, 72722 regular extents (78936 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       95%      1.5G         1.6G         933M       
none       100%      1.4G         1.4G         895M       
zstd        46%       59M         127M          37M       
# compsize //var/log/journal/df38462702849b30219ad17170aa3c06/system.journal
Processed 1 file, 3302 regular extents (4593 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       51%       41M          79M          46M       
none       100%       21M          21M          12M       
zstd        23%       11M          49M          28M       
prealloc   100%      8.0M         8.0M         5.9M

Shall I try patching the defrag tracepoints into my kernel and send the output? I don't have the time right now to try manually backporting all the fixes to 5.16.

adam900710 · 2022-03-03T09:04:26Z

@tootea

which doesn't sound like what I would want to be running on my two work machines. (I can give it a try on my home machine if desired.)

Then please run that branch on your home machine, and enable the following trace events:

btrfs:defrag_one_locked_range
btrfs:defrag_add_target
btrfs:defrag_file_start
btrfs:defrag_file_end

Although I got the backport-merged message from Greg, I'm not sure whether the fixes are already in any stable release.

Anyway, with the above events enabled, it would be much easier to be sure what's going wrong.

tootea · 2022-03-03T10:14:39Z

@adam900710 Oh, I noticed too late that the autodefrag fixes just came out in 5.16.12. I'm running that now and it seems the write amplification is gone. Sorry for the noise.

adam900710 · 2022-03-03T10:20:54Z

What a relief.

Feel free to report back any problem observed.

I'll stay alert just in case something doesn't go as planned.

kdave · 2022-03-04T18:29:12Z

All fixes have been sent to stable and should be released in 5.16.13.

tootea · 2022-03-07T14:02:19Z

@adam900710 Looks like I spoke too soon again. I still see quite some write amplification due to autodefrag on 5.16.12 with all the latest commits from your autodefrag_fixes branch on top ( https://github.com/tootea/linux/commits/btrfs_autodefrag-5.16 ).

I have a log of the defrag trace events (6.3 MiB compressed). There are over a million defrag_add_target calls on one file over the course of a few minutes, which seems to correspond to autodefrag rewriting this entire 500 MiB file more than 100 times over.

Shall I open a new issue here or take it to the mailing list?

adam900710 · 2022-03-07T22:21:56Z

Would you mind uploading the trace events?

Feel free to use whatever method you like (GitHub issues or the mailing list).

tootea · 2022-03-08T14:48:34Z

@adam900710 Sure, here you go: btrfs-autodefrag-trace.log.gz

The most active inode (root=259 ino=410492) with 1.1M events is ~/.local/share/baloo/email/postlist.glass (this is from a different machine than the one in my first message here). It's a 500-MB database which filefrag reports to have over 40k extents (that went up by 5k in one day, with autodefrag enabled). Grepping all the target_start values from the log and plotting them it looks like the file is rewritten from beginning to end more than 100 times over, which I consider definitely suspect.

Let me know if you need more logs or anything else.

adam900710 · 2022-03-08T23:06:54Z

I think I've got a clue.

If we look only at the defrag_file_start events for inode 410492 with start=0, each of which marks a new defrag pass over that inode, the behavior is very unexpected:

btrfs-cleaner-822     [000] .....  5093.162849: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421337 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [001] .....  5096.558521: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421338 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [002] .....  5098.222259: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421338 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [003] .....  5098.544354: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421339 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [003] .....  5099.551326: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421339 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [003] .....  5101.066516: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421340 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [001] .....  5101.410834: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421340 flags=0x0 compress=0 max_sectors_to_defrag=1024
   btrfs-cleaner-822     [000] .....  5102.177464: defrag_file_start: 6a206506-2479-4dc9-ab37-0bac2287d6bc: root=259 ino=410492 start=0 len=599138304 extent_thresh=65536 newer_than=421340 flags=0x0 compress=0 max_sectors_to_defrag=1024

In theory, the cleaner should only be triggered for one inode once per commit interval (30s by default).

But here it gets triggered again and again for the same inode in very quick succession, within seconds of the previous pass, definitely not every 30s.

So it means that if some writes happen during autodefrag, it will choose the same inode again and again.
This behavior itself was not touched in the refactor, so it looks like a long-standing bug that the defrag refactor exposed.

For now, my plan to fix it is to move the whole rb tree to a local tree instead, so we won't trigger autodefrag for an inode again during the same cleaner session.
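
A rough toy model of that plan, for illustration only. It is not the kernel code: the kernel keeps the pending inodes in an rb tree, while this sketch uses a plain `deque`, and the function names, the `max_steps` cap, and the second inode number `7` are all made up for the example. It only shows why draining a private copy stops an inode from being picked again within the same cleaner session.

```python
from collections import deque


def run_cleaner_shared(pending, written_during_defrag, max_steps=10):
    """Toy version of the problematic behaviour: the cleaner pops from the
    same queue that concurrent writers keep adding to."""
    defragged = []
    steps = 0
    while pending and steps < max_steps:  # max_steps only bounds the demo
        ino = pending.popleft()
        defragged.append(ino)
        # A write landing while we defrag re-queues the inode right away,
        # so the same session picks it up again.
        if ino in written_during_defrag:
            pending.append(ino)
        steps += 1
    return defragged


def run_cleaner_local(pending, written_during_defrag):
    """Toy version of the plan: move everything to a local queue first, so
    inodes re-queued by new writes wait for the next cleaner session."""
    local = deque(pending)
    pending.clear()
    defragged = []
    while local:
        ino = local.popleft()
        defragged.append(ino)
        if ino in written_during_defrag:
            pending.append(ino)  # handled by the next session, not this one
    return defragged


if __name__ == "__main__":
    busy = {410492}  # inode that keeps receiving writes
    print(run_cleaner_shared(deque([410492, 7]), busy, max_steps=6))
    queue = deque([410492, 7])
    print(run_cleaner_local(queue, busy), "left for next session:", list(queue))
```

In the shared variant the busy inode is defragged on almost every step, while in the local variant each inode is visited once per session and the busy one is simply left queued for the next session.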

Will provide the fix to test very soon.

Thank you very much for such a detailed report! It really showed me a completely different pattern that can cause problems.

adam900710 · 2022-03-08T23:32:14Z

@tootea Would you mind testing this small patch?

It passes my local tests using fstests' defrag group.

But it's a little tricky as we're swapping pointers directly, so not that confident.

0001-btrfs-don-t-let-new-writes-to-trigger-autodefrag-on-.patch.txt

tootea · 2022-03-10T09:40:18Z

@adam900710 Running it now, it seems to help quite a bit, btrfs-cleaner writes are down from gigabytes per minute to 20 GiB in an hour on one of my machines. That might still be a little bit excessive, or perhaps it's entirely normal, I don't know.

A log from that hour of active system usage with your patch is here (github refuses to accept it, perhaps it's too big): https://is.muni.cz/de/ttrnka/btrfs-autodefrag-2.log.xz

The most active file (root=257 ino=329891) is again ~/.local/share/baloo/email/postlist.glass with a ton of extents:

# filefrag .local/share/baloo/email/postlist.glass
.local/share/baloo/email/postlist.glass: 51012 extents found
# compsize .local/share/baloo/email/postlist.glass
Processed 1 file, 28634 regular extents (51044 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL      100%      1.1G         1.1G         719M       
none       100%      1.1G         1.1G         719M       
# ll -h .local/share/baloo/email/postlist.glass
-rw-r--r--. 1 tootea lcc 720M Mar 10 10:19 .local/share/baloo/email/postlist.glass

No idea how come it has that many extents. I don't think it is that sparse actually given that the size from ls pretty much exactly matches the "Referenced" in compsize (and also what du says). I guess it's just heavily fragmented and autodefrag can't seem to get that fixed.

adam900710 · 2022-03-10T10:10:52Z

@tootea Thank you very much for verifying the behavior!

Now it's mostly expected behavior for autodefrag: (UUID/PID/CPU omitted)

 486.810041: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3855 flags=0x0 compress=0 max_sectors_to_defrag=1024
 506.407089: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3855 flags=0x0 compress=0 max_sectors_to_defrag=1024
 536.463320: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3856 flags=0x0 compress=0 max_sectors_to_defrag=1024
 539.721309: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3857 flags=0x0 compress=0 max_sectors_to_defrag=1024
 569.741417: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3858 flags=0x0 compress=0 max_sectors_to_defrag=1024
 594.832649: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3859 flags=0x0 compress=0 max_sectors_to_defrag=1024
 624.258214: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3860 flags=0x0 compress=0 max_sectors_to_defrag=1024
 654.856264: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3861 flags=0x0 compress=0 max_sectors_to_defrag=1024
 684.943029: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3862 flags=0x0 compress=0 max_sectors_to_defrag=1024
 715.288662: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3865 flags=0x0 compress=0 max_sectors_to_defrag=1024
 745.530203: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3866 flags=0x0 compress=0 max_sectors_to_defrag=1024
 775.562500: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3867 flags=0x0 compress=0 max_sectors_to_defrag=1024
 779.697262: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3868 flags=0x0 compress=0 max_sectors_to_defrag=1024
 809.869418: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3869 flags=0x0 compress=0 max_sectors_to_defrag=1024
 827.273520: defrag_file_start: root=257 ino=329891 start=0 len=754728960 extent_thresh=65536 newer_than=3870 flags=0x0 compress=0 max_sectors_to_defrag=1024
...

Note the timestamps: for the most part the passes are 30s apart.

But there are still some outliers, like 536.463320 -> 539.721309, which is only 3s, not the expected 30s.

Thus I guess that although most of the behavior is fixed, there are still some corner cases not addressed, namely why autodefrag is sometimes triggered more frequently than expected.
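
A small sketch for spotting such outliers automatically. It assumes the `defrag_file_start` line layout shown in the excerpts above; the function names and the "shorter than half the commit interval" cutoff are arbitrary choices for the example, not anything defined by btrfs.

```python
import re

# Timestamp, then the event name, then (optionally a UUID and) the fields.
START_EVENT = re.compile(
    r"(\d+\.\d+): defrag_file_start: .*?root=(\d+) ino=(\d+) start=(\d+) "
)


def pass_gaps(lines, root, ino):
    """Seconds between consecutive start=0 defrag passes on one inode."""
    stamps = []
    for line in lines:
        m = START_EVENT.search(line)
        if not m:
            continue
        key = (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        if key == (root, ino, 0):
            stamps.append(float(m.group(1)))
    return [round(b - a, 6) for a, b in zip(stamps, stamps[1:])]


def short_gaps(gaps, expected=30.0, fraction=0.5):
    """Gaps well below the expected interval (the cutoff is an assumption)."""
    return [g for g in gaps if g < expected * fraction]


# First five lines of the excerpt above.
TAIL = " len=754728960 extent_thresh=65536 flags=0x0 compress=0 max_sectors_to_defrag=1024"
SAMPLE = [
    " 486.810041: defrag_file_start: root=257 ino=329891 start=0 newer_than=3855" + TAIL,
    " 506.407089: defrag_file_start: root=257 ino=329891 start=0 newer_than=3855" + TAIL,
    " 536.463320: defrag_file_start: root=257 ino=329891 start=0 newer_than=3856" + TAIL,
    " 539.721309: defrag_file_start: root=257 ino=329891 start=0 newer_than=3857" + TAIL,
    " 569.741417: defrag_file_start: root=257 ino=329891 start=0 newer_than=3858" + TAIL,
]

if __name__ == "__main__":
    gaps = pass_gaps(SAMPLE, root=257, ino=329891)
    print("gaps:", gaps)
    print("suspiciously short:", short_gaps(gaps))
```

Note that the sample lines reorder the fields slightly (newer_than before len) to keep them short; the regular expression only depends on the timestamp, event name, root, ino and start.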

As for the fragmentation problem, autodefrag now only uses 64K as its extent threshold, so it's not as aggressive as the manual defrag ioctl (target 256M).

We will add support for users to specify the autodefrag target extent size in the future.

But please allow me to address the remaining problem first.

adam900710 · 2022-03-12T00:46:00Z

@tootea Would you mind applying this patch on top of the existing patches?
https://patchwork.kernel.org/project/linux-btrfs/patch/d1ce90f37777987732b8ccf0edbfc961cd5c8873.1646912061.git.wqu@suse.com/

This would further reduce the IO/CPU usage.

tootea · 2022-03-14T15:51:37Z

@adam900710 Thanks a lot. Indeed, with this patch on top the IO is yet lower. There don't seem to be any new defrag starts in less than 30s since the previous pass.

Defrag trace with this patch is here: https://is.muni.cz/de/ttrnka/btrfs-autodefrag-5.16.13-200.2-1.log.xz
(Yet another machine, the most interesting file is root=257 ino=262402, again the Baloo email index database with 71k extents.)

I'm still somewhat concerned by autodefrag rewriting this file over and over again, generating nearly 100 GiB of writes in a day. Perhaps it's always been like that and I simply never paid attention. I've only skimmed the defrag code so I can't really claim I understand the logic, but it seems to me that most of the time defrag is really only rewriting things exactly as fragmented as they were before, and then does it again the next time around, and so on. Shouldn't it be able to avoid this somehow?

Opening the log via grep 'root=257 ino=262402' btrfs-autodefrag-5.16.13-200.2-1.log | less and then searching for defrag_file_start.*root=257 ino=262402 start=0, it looks like the pattern of the starts and offsets is exactly the same every time defrag runs. I can just keep hitting n and almost nothing seems to change, apart from the timestamps.

Is this really the way it's meant to work?

adam900710 · 2022-03-15T01:06:36Z

@tootea There are indeed some weird behaviors.

One of the most obvious ones is the defragging of single sectors.

defrag_add_target() only gets called when both of these hold:

1. The range meets the basic conditions
2. The range can be merged with the next range

But there is a pitfall: passing the mergeable check doesn't mean the next range is also a defrag target, especially for autodefrag, which has a generation check.

So here we end up with weird single-sector defrag targets being added.

Mind to apply this patch?
0001-btrfs-avoid-defragging-extents-whose-next-extents-ar.patch.txt

This should remove all the pure rewrites of single ranges.

[BUG]
There is a report that autodefrag is defragging single sectors, which is a complete waste of IO and no help for defragging:

  btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096

[CAUSE]
In defrag_collect_targets(), we check if the current range (A) can be merged with the next one (B). If mergeable, we will add range A into the targets for defrag.

However there is a catch for autodefrag: when checking mergeability against range B, we intentionally pass 0 as @newer_than, hoping to get a higher chance to merge with the next extent.

But in the next iteration, range B will be looked up by defrag_lookup_extent() with a non-zero @newer_than. And if range B is not really newer, it will be rejected directly, causing only range A to be defragged, while we expected to defrag both range A and B.

[FIX]
Since the root cause is the difference in check conditions between defrag_check_next_extent() and defrag_collect_targets(), we fix it by:

1. Pass @newer_than to defrag_check_next_extent()
2. Pass @extent_thresh to defrag_check_next_extent()

This makes the checks in defrag_collect_targets() and defrag_check_next_extent() more consistent.

While there are still some minor differences, the remaining checks focus on runtime flags like writeback/delalloc, which are mostly transient and safe to be checked only in defrag_collect_targets().

Link: btrfs#423 (comment)
Cc: stable@vger.kernel.org # 5.16+
Signed-off-by: Qu Wenruo <wqu@suse.com>
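
The mismatch this commit message describes can be reproduced with a small toy model. This is only a sketch and not the kernel logic: the `Extent` class, `is_candidate()`, and the rule that an extent is also accepted when it directly follows the previous target are simplifying assumptions made for the example. What it does show is the core point, namely that checking the next extent with looser conditions than the ones applied to it later leaves a lone range A in the target list.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int
    length: int
    generation: int


def is_candidate(ext, extent_thresh, newer_than):
    """Toy stand-in for the per-extent conditions (size and generation)."""
    return ext.length < extent_thresh and ext.generation >= newer_than


def collect_targets(extents, extent_thresh, newer_than, consistent):
    """Pick defrag targets; `consistent` toggles the fixed next-extent check."""
    targets = []
    for i, cur in enumerate(extents):
        if not is_candidate(cur, extent_thresh, newer_than):
            continue
        nxt = extents[i + 1] if i + 1 < len(extents) else None
        if consistent:
            # Fixed: the next extent must pass the same conditions.
            next_ok = nxt is not None and is_candidate(nxt, extent_thresh, newer_than)
        else:
            # Buggy: generation and size of the next extent are ignored.
            next_ok = nxt is not None
        # Assumption of this toy: an extent adjacent to the previous target
        # is accepted as well, since it merges with that target.
        follows_target = bool(targets) and (
            targets[-1].start + targets[-1].length == cur.start
        )
        if next_ok or follows_target:
            targets.append(cur)
    return targets


def contiguous_runs(targets):
    """Merge adjacent targets into (start, length) runs."""
    runs = []
    for ext in targets:
        if runs and runs[-1][0] + runs[-1][1] == ext.start:
            runs[-1] = (runs[-1][0], runs[-1][1] + ext.length)
        else:
            runs.append((ext.start, ext.length))
    return runs


# A is new, B is old (fails the generation check), C and D are new.
EXTENTS = [
    Extent(0, 4096, 10),
    Extent(4096, 4096, 5),
    Extent(8192, 4096, 10),
    Extent(12288, 4096, 10),
]

if __name__ == "__main__":
    for consistent in (False, True):
        picked = collect_targets(EXTENTS, 65536, 8, consistent)
        print("consistent" if consistent else "buggy", contiguous_runs(picked))
```

With the loose check the model rewrites a lone 4096-byte range at offset 0 that cannot merge with anything; with the consistent check only the run formed by C and D remains.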

tootea · 2022-03-16T07:51:12Z

@adam900710 Wow, now that's the kind of improvement I was looking for. Yeah, this was the missing piece. With this patch, autodefrag I/O is down to under a megabyte per minute, 2-3 orders of magnitude less than without the patch. The trace log now shows roughly one defrag_one_locked_range per defrag_file_start.*start=0. That ratio used to be around 100:1.
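
For anyone wanting to check the same ratio on their own trace, a minimal sketch follows. It assumes the event layouts quoted elsewhere in this thread (`defrag_file_start: ... root= ino= start=` and `defrag_one_locked_range: root= ino= start= len=`); the function name and the sample lines are invented for the example and are not taken from a real log.

```python
import re

EVENT = re.compile(
    r"(defrag_file_start|defrag_one_locked_range): .*?root=(\d+) ino=(\d+) start=(\d+)"
)


def ranges_per_pass(lines, root, ino):
    """Average number of defrag_one_locked_range events per start=0 pass."""
    passes = 0
    ranges = 0
    for line in lines:
        m = EVENT.search(line)
        if not m or (int(m.group(2)), int(m.group(3))) != (root, ino):
            continue
        if m.group(1) == "defrag_file_start":
            if int(m.group(4)) == 0:  # start=0 marks a new pass over the file
                passes += 1
        else:
            ranges += 1
    return ranges / passes if passes else None


# Synthetic lines in the assumed layout, for demonstration only.
SAMPLE = [
    "100.000000: defrag_file_start: root=257 ino=262402 start=0 len=1048576",
    "100.000100: defrag_one_locked_range: root=257 ino=262402 start=0 len=8192",
    "100.000200: defrag_one_locked_range: root=257 ino=262402 start=65536 len=8192",
    "130.000000: defrag_file_start: root=257 ino=262402 start=0 len=1048576",
    "130.000100: defrag_one_locked_range: root=257 ino=262402 start=0 len=8192",
    "130.000200: defrag_one_locked_range: root=256 ino=651122 start=0 len=4096",
]

if __name__ == "__main__":
    print(ranges_per_pass(SAMPLE, root=257, ino=262402))
```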

Thanks a whole lot for fixing this!

adam900710 · 2022-03-16T09:26:29Z

@tootea Great to hear that!
Although it's me screwing up in the first place.

Your detailed debugging really helped a lot!

Let's wait for the proper fixes to get upstreamed.

ghost · 2022-03-17T19:55:35Z

Mind to apply this patch? 0001-btrfs-avoid-defragging-extents-whose-next-extents-ar.patch.txt

I would like to confirm that the patch significantly reduces I/O and CPU usage of btrfs-cleaner.

I would like to report that the patch doesn't help to reduce fragmentation of high-fragmentation files (such as: a 40GB sqlite file).

adam900710 · 2022-03-17T22:56:28Z

@atomsymbol As mentioned on the mailing list, the current design of autodefrag (from the very beginning) is not that good for DB workloads.

As Zygo also mentioned, I believe a userspace-based defrag solution would provide a better way to handle it.
(Instead of defragging only newer writes, we could defrag a range of transactions, or even learn where the new writes are likely to be.)

commit 75a36a7 upstream.

[BUG]
There is a report that autodefrag is defragging single sector, which is completely waste of IO, and no help for defragging:

  btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096

[CAUSE]
In defrag_collect_targets(), we check if the current range (A) can be merged with next one (B). If mergeable, we will add range A into target for defrag.

However there is a catch for autodefrag, when checking mergeability against range B, we intentionally pass 0 as @newer_than, hoping to get a higher chance to merge with the next extent.

But in the next iteration, range B will looked up by defrag_lookup_extent(), with non-zero @newer_than. And if range B is not really newer, it will rejected directly, causing only range A being defragged, while we expect to defrag both range A and B.

[FIX]
Since the root cause is the difference in check condition of defrag_check_next_extent() and defrag_collect_targets(), we fix it by:

1. Pass @newer_than to defrag_check_next_extent()
2. Pass @extent_thresh to defrag_check_next_extent()

This makes the check between defrag_collect_targets() and defrag_check_next_extent() more consistent.

While there is still some minor difference, the remaining checks are focus on runtime flags like writeback/delalloc, which are mostly transient and safe to be checked only in defrag_collect_targets().

Link: btrfs/linux#423 (comment)
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 75a36a7 upstream. [BUG] There is a report that autodefrag is defragging single sector, which is completely waste of IO, and no help for defragging: btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096 [CAUSE] In defrag_collect_targets(), we check if the current range (A) can be merged with next one (B). If mergeable, we will add range A into target for defrag. However there is a catch for autodefrag, when checking mergeability against range B, we intentionally pass 0 as @newer_than, hoping to get a higher chance to merge with the next extent. But in the next iteration, range B will looked up by defrag_lookup_extent(), with non-zero @newer_than. And if range B is not really newer, it will rejected directly, causing only range A being defragged, while we expect to defrag both range A and B. [FIX] Since the root cause is the difference in check condition of defrag_check_next_extent() and defrag_collect_targets(), we fix it by: 1. Pass @newer_than to defrag_check_next_extent() 2. Pass @extent_thresh to defrag_check_next_extent() This makes the check between defrag_collect_targets() and defrag_check_next_extent() more consistent. While there is still some minor difference, the remaining checks are focus on runtime flags like writeback/delalloc, which are mostly transient and safe to be checked only in defrag_collect_targets(). Link: btrfs#423 (comment) CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 75a36a7 upstream. [BUG] There is a report that autodefrag is defragging single sector, which is completely waste of IO, and no help for defragging: btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096 [CAUSE] In defrag_collect_targets(), we check if the current range (A) can be merged with next one (B). If mergeable, we will add range A into target for defrag. However there is a catch for autodefrag, when checking mergeability against range B, we intentionally pass 0 as @newer_than, hoping to get a higher chance to merge with the next extent. But in the next iteration, range B will looked up by defrag_lookup_extent(), with non-zero @newer_than. And if range B is not really newer, it will rejected directly, causing only range A being defragged, while we expect to defrag both range A and B. [FIX] Since the root cause is the difference in check condition of defrag_check_next_extent() and defrag_collect_targets(), we fix it by: 1. Pass @newer_than to defrag_check_next_extent() 2. Pass @extent_thresh to defrag_check_next_extent() This makes the check between defrag_collect_targets() and defrag_check_next_extent() more consistent. While there is still some minor difference, the remaining checks are focus on runtime flags like writeback/delalloc, which are mostly transient and safe to be checked only in defrag_collect_targets(). Link: btrfs/linux#423 (comment) CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 75a36a7 upstream. [BUG] There is a report that autodefrag is defragging single sector, which is completely waste of IO, and no help for defragging: btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096 [CAUSE] In defrag_collect_targets(), we check if the current range (A) can be merged with next one (B). If mergeable, we will add range A into target for defrag. However there is a catch for autodefrag, when checking mergeability against range B, we intentionally pass 0 as @newer_than, hoping to get a higher chance to merge with the next extent. But in the next iteration, range B will looked up by defrag_lookup_extent(), with non-zero @newer_than. And if range B is not really newer, it will rejected directly, causing only range A being defragged, while we expect to defrag both range A and B. [FIX] Since the root cause is the difference in check condition of defrag_check_next_extent() and defrag_collect_targets(), we fix it by: 1. Pass @newer_than to defrag_check_next_extent() 2. Pass @extent_thresh to defrag_check_next_extent() This makes the check between defrag_collect_targets() and defrag_check_next_extent() more consistent. While there is still some minor difference, the remaining checks are focus on runtime flags like writeback/delalloc, which are mostly transient and safe to be checked only in defrag_collect_targets(). Link: btrfs#423 (comment) CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


BugLink: https://bugs.launchpad.net/bugs/1968986

commit 75a36a7 upstream.

(Same commit message as above, referenced from a backport that carries one additional trailer.)

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
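The mismatch described in [CAUSE] can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_extent`, `toy_check_next()` and `toy_collect_targets()` are hypothetical names, the real functions operate on extent maps and take more state into account, and the `consistent` flag exists only to switch between the behaviour before and after the fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent map entry. */
struct toy_extent {
    uint64_t start;
    uint64_t len;
    uint64_t generation;
};

/*
 * Toy version of the idea behind defrag_check_next_extent(): the next
 * extent is a merge candidate only if it is small enough and new enough.
 */
static bool toy_check_next(const struct toy_extent *next,
                           uint64_t extent_thresh, uint64_t newer_than)
{
    if (next == NULL)
        return false;
    if (next->len >= extent_thresh)
        return false;
    if (next->generation < newer_than)
        return false;
    return true;
}

/*
 * Toy version of the target collection loop. Returns the number of bytes
 * that would be defragged. With consistent == false, the next-extent check
 * ignores the threshold and uses newer_than = 0, as described in [CAUSE].
 */
static uint64_t toy_collect_targets(const struct toy_extent *ext, size_t n,
                                    uint64_t extent_thresh,
                                    uint64_t newer_than, bool consistent)
{
    uint64_t total = 0;
    bool prev_selected = false;

    for (size_t i = 0; i < n; i++) {
        const struct toy_extent *cur = &ext[i];
        const struct toy_extent *next = (i + 1 < n) ? &ext[i + 1] : NULL;
        bool mergeable;

        /* The current extent is always looked up with the real limits. */
        if (cur->generation < newer_than || cur->len >= extent_thresh) {
            prev_selected = false;
            continue;
        }

        mergeable = toy_check_next(next,
                                   consistent ? extent_thresh : UINT64_MAX,
                                   consistent ? newer_than : 0);

        /* Nothing to merge with on either side: skip this extent. */
        if (!mergeable && !prev_selected)
            continue;

        total += cur->len;
        prev_selected = true;
    }
    return total;
}
```

With a new 4096-byte extent A (generation 10) followed by an old 4096-byte extent B (generation 5), a 256K threshold and `newer_than` = 8, the inconsistent check selects only A (4096 bytes, the single-sector defrag from the report), while the consistent check selects nothing. When both extents are new, both variants select the full 8192 bytes.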

adam900710 added the backlog label Feb 16, 2022

kdave added the stable label and removed the backlog label Feb 16, 2022

kdave closed this as completed Mar 4, 2022

adam900710 reopened this Mar 7, 2022

kdave closed this as completed Jun 20, 2022


