
Commit 496b5ef

jankara authored and gregkh committed
writeback: Avoid excessively long inode switching times
[ Upstream commit 9a6ebbd ]

With the lazytime mount option enabled, we can be switching many dirty
inodes to the parent cgroup on cgroup exit. The numbers observed in
practice when a systemd slice of a large cron job exits can easily reach
hundreds of thousands or millions. However, the logic in
inode_do_switch_wbs() which sorts the inode into the appropriate place
in the b_dirty list of the target wb has linear complexity in the number
of dirty inodes, so the overall time complexity of switching all the
inodes is quadratic, leading to workers being pegged for hours,
consuming 100% of the CPU while switching inodes to the parent wb.

Simple reproducer of the issue:

	FILES=10000
	# Filesystem mounted with lazytime mount option
	MNT=/mnt/
	echo "Creating files and switching timestamps"
	for (( j = 0; j < 50; j ++ )); do
		mkdir $MNT/dir$j
		for (( i = 0; i < $FILES; i++ )); do
			echo "foo" >$MNT/dir$j/file$i
		done
		touch -a -t 202501010000 $MNT/dir$j/file*
	done
	wait
	echo "Syncing and flushing"
	sync
	echo 3 >/proc/sys/vm/drop_caches
	echo "Reading all files from a cgroup"
	mkdir /sys/fs/cgroup/unified/mycg1 || exit
	echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
	for (( j = 0; j < 50; j ++ )); do
		cat /mnt/dir$j/file* >/dev/null &
	done
	wait
	echo "Switching wbs"
	# Now rmdir the cgroup after the script exits

We need to maintain b_dirty list ordering to keep writeback happy, so
instead of sorting the inode into the appropriate place, just append it
at the end of the list and clobber dirtied_time_when. This may result in
inode writeback starting later after a cgroup switch; however, cgroup
switches are rare, so it shouldn't matter much. Since the cgroup had
write access to the inode, there are no practical concerns about
possible DoS issues.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent bd408c3 commit 496b5ef

File tree

1 file changed

+11
-10
lines changed


fs/fs-writeback.c

Lines changed: 11 additions & 10 deletions
@@ -446,22 +446,23 @@ static bool inode_do_switch_wbs(struct inode *inode,
 	 * Transfer to @new_wb's IO list if necessary. If the @inode is dirty,
 	 * the specific list @inode was on is ignored and the @inode is put on
 	 * ->b_dirty which is always correct including from ->b_dirty_time.
-	 * The transfer preserves @inode->dirtied_when ordering. If the @inode
-	 * was clean, it means it was on the b_attached list, so move it onto
-	 * the b_attached list of @new_wb.
+	 * If the @inode was clean, it means it was on the b_attached list, so
+	 * move it onto the b_attached list of @new_wb.
 	 */
 	if (!list_empty(&inode->i_io_list)) {
 		inode->i_wb = new_wb;
 
 		if (inode->i_state & I_DIRTY_ALL) {
-			struct inode *pos;
-
-			list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
-				if (time_after_eq(inode->dirtied_when,
-						  pos->dirtied_when))
-					break;
+			/*
+			 * We need to keep b_dirty list sorted by
+			 * dirtied_time_when. However properly sorting the
+			 * inode in the list gets too expensive when switching
+			 * many inodes. So just attach inode at the end of the
+			 * dirty list and clobber the dirtied_time_when.
+			 */
+			inode->dirtied_time_when = jiffies;
 			inode_io_list_move_locked(inode, new_wb,
-						  pos->i_io_list.prev);
+						  &new_wb->b_dirty);
 		} else {
 			inode_cgwb_move_to_attached(inode, new_wb);
 		}

0 commit comments
