New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: Enable throttled background buffered writeback (CONFIG_BLK_WBT) #2970

Merged
merged 2 commits into from Feb 9, 2018

Conversation

Projects
None yet
4 participants
@ezequielgarcia
Contributor

ezequielgarcia commented Dec 27, 2017

From the commit log:

This feature prevents the background writeback thread from stalling I/O,
e.g. reads.

For instance, without this option enabled, it's possible to stall reads
setting a writeback dirty threshold (/proc/sys/vm/dirty_writeback_centisecs)
sufficiently long.

Quoting the author of the feature:

[..] If you are sick of Linux bogging down when buffered
writes are happening, then this is for you, laptop or server. The
patchset is fully stable, I have not observed problems. It passes full
xfstest runs, and a variety of benchmarks as well. It works equally well
on blk-mq/scsi-mq, and "classic" setups.

This feature adds a simple blk-wb code that keeps limits how much buffered
writeback we keep in flight on the device end. [..] it should be pretty
much auto-tuning.

See the patchset for the gory details [1]. Given this feature is not turned
on by default, distributions will start doing so [2].

While here, I bite the bullet and tried to consolidate some kernel options (as per coreos/bugs#2137).

[1] https://lwn.net/Articles/704739/
[2] https://forum.manjaro.org/t/please-enable-writeback-throttling-by-default-linux-4-10/18135/3

@dm0-

Can you update the ebuild revisions as in 0034f61?

@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Dec 27, 2017

@dm0- Thanks for the review. How about that? (Wasn't sure which commit to put the -r1 rename).

@dm0-

dm0- approved these changes Dec 27, 2017

It looks okay. With holiday time off, it may take another week before this gets better testing, though. @glevand and @bgilbert might want to verify this change.

@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Dec 28, 2017

@dm0- That is fine. No rush at all.

@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Jan 4, 2018

@bgilbert Gentle ping :-). We need to rebase this on top of master, but it would be good to know if there are more requested changes.

@bgilbert

Apologies for the delay. There have been one or two things going on. 😄

Thanks for the PR and cleanup. When adding config options, please add them in the order make *config would write them out. We're not perfect about that now, but it'd be great not to make things worse.

SCHEDSTATS and DEBUG_FS are both implicitly enabled on amd64, so this doesn't change anything there (but is harmless).

CONFIG_PCI_MSI=y
CONFIG_PCI_IOV=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KEXEC=y

This comment has been minimized.

@bgilbert

bgilbert Jan 11, 2018

Member

This enables kexec on ARM. @glevand, do we want to do that?

This comment has been minimized.

@ezequielgarcia

ezequielgarcia Jan 11, 2018

Contributor

I don't run CoreOS on ARM boxes, but again: it seems to me that if it's wanted on x86, it's wanted on any platforms. Maybe I'm missing anything? Or in other words, what is the current use of kexec on x86?

This comment has been minimized.

@glevand

glevand Jan 11, 2018

Contributor

Sure, KEXEC for arm64 is OK.

@@ -931,3 +931,11 @@ CONFIG_MODULE_SIG_KEY="certs/modules.pem"
# CONFIG_XZ_DEC_ARMTHUMB is not set
# CONFIG_XZ_DEC_SPARC is not set
CONFIG_NR_CPUS=512
CONFIG_ACPI=y
CONFIG_HW_RANDOM=y

This comment has been minimized.

@bgilbert

bgilbert Jan 11, 2018

Member

This unmodularizes HW_RANDOM on amd64. Is there a reason to do that?

This comment has been minimized.

@ezequielgarcia

ezequielgarcia Jan 11, 2018

Contributor

Was there any reason to build it as a module on amd64? It seems to me that this gives us consistent behavior across platforms, and saw no harm in making this option builtin.

@@ -46,6 +46,7 @@ CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_SHA256=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y

This comment has been minimized.

@bgilbert

bgilbert Jan 11, 2018

Member

Should we be enabling BLK_WBT_SQ as well?

This comment has been minimized.

@ezequielgarcia

ezequielgarcia Jan 11, 2018

Contributor

Well, I do not know. A few notes about BLK_WBT_SQ:

  • It is supposed to enable writeback throttle by default on single-queue devices. However, it did not work for me.
  • Since writeback throttle can be enabled for any device (regardless of BLK_WBT_SQ) via a simple udev rule, instead of debugging why BLK_WBT_SQ did not work, I simply used a udev rule to enable writeback throttle, using ignition like this:
    - path: /etc/udev/rules.d/60-wbt-enable.rules
      filesystem: root
      mode: 0644
      contents:
        inline: |
          ACTION=="add|change", KERNEL=="sd[b-z]*", ATTR{queue/wbt_lat_usec}="75000"
@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Jan 11, 2018

Hi @bgilbert

Apologies for the delay. There have been one or two things going on. 😄

No problem. It's been a rough couple weeks for everyone :-)

Thanks for the PR and cleanup. When adding config options, please add them in the order make *config would write them out. We're not perfect about that now, but it'd be great not to make things worse.

OK got it. Being a "common" config, was not sure how to do that. Let me give a try.

SCHEDSTATS and DEBUG_FS are both implicitly enabled on amd64, so this doesn't change anything there (but is harmless).

Indeed. It still seems a good idea to have them in the common file, no? IMO, the slimer the per-platform configs, the cleaner and better the configs will be.

CONFIG_PCI_MSI=y
CONFIG_PCI_IOV=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KEXEC=y

This comment has been minimized.

@glevand

glevand Jan 11, 2018

Contributor

@ezequielgarcia If possible, could you post a diff of the generated kernel config files with and without this change? That would make it a lot easier to review.

This comment has been minimized.

@ezequielgarcia

ezequielgarcia Jan 11, 2018

Contributor

Good idea.

@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Jan 11, 2018

@glevand @bgilbert diff for x86_64:

--- amd64-defconfig-old	2018-01-11 17:58:24.784415388 -0300
+++ amd64-defconfig-new	2018-01-11 17:57:01.467750924 -0300
@@ -46,6 +46,7 @@
 CONFIG_MODULE_SIG=y
 CONFIG_MODULE_SIG_SHA256=y
 CONFIG_BLK_DEV_THROTTLING=y
+CONFIG_BLK_WBT=y
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_BSD_DISKLABEL=y
 CONFIG_MINIX_SUBPARTITION=y
@@ -768,6 +769,7 @@
 CONFIG_IPMI_SSIF=m
 CONFIG_IPMI_WATCHDOG=m
 CONFIG_IPMI_POWEROFF=m
+CONFIG_HW_RANDOM=y
 CONFIG_HW_RANDOM_TIMERIOMEM=m
 CONFIG_HW_RANDOM_VIRTIO=m
 CONFIG_NVRAM=m

@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Jan 11, 2018

diff for ARM64:

--- arm64-defconfig-old	2018-01-11 18:03:59.811073197 -0300
+++ arm64-defconfig-new	2018-01-11 18:04:55.171071732 -0300
@@ -49,6 +49,7 @@
 CONFIG_MODULE_SIG=y
 CONFIG_MODULE_SIG_SHA256=y
 CONFIG_BLK_DEV_THROTTLING=y
+CONFIG_BLK_WBT=y
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_BSD_DISKLABEL=y
 CONFIG_MINIX_SUBPARTITION=y
@@ -94,6 +95,7 @@
 CONFIG_ZSMALLOC=m
 CONFIG_SECCOMP=y
 CONFIG_PARAVIRT_TIME_ACCOUNTING=y
+CONFIG_KEXEC=y
 CONFIG_XEN=y
 CONFIG_RANDOMIZE_BASE=y
 CONFIG_ARM64_ACPI_PARKING_PROTOCOL=y
@glevand

This comment has been minimized.

Contributor

glevand commented Jan 12, 2018

Seems OK for arm64.
@ezequielgarcia If you want to test, something like this should work:

./build_packages --board=arm64-usr --nousepkg
./build_image --board=arm64-usr prod
./image_to_vm.sh --from=../build/images/arm64-usr/latest --board=arm64-usr --prod_image --format=qemu_uefi
../build/images/arm64-usr/latest/coreos_production_qemu_uefi.sh
@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Jan 15, 2018

@glevand @bgilbert @dm0- Updated the branch. How does it look?

@dm0-

This comment has been minimized.

Member

dm0- commented Jan 16, 2018

Looks okay to me. New feature merges are currently frozen to prepare for a release this week, so it might not be merged until next week.

@glevand glevand requested review from glevand and removed request for glevand Jan 16, 2018

@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Jan 29, 2018

@glevand @bgilbert Gentle ping.

@glevand

This comment has been minimized.

Contributor

glevand commented Jan 29, 2018

As I said, it seems OK for arm64. Nothing more to add from me.
@bgilbert has a pending change request.

@bgilbert

Looks good. Thanks for your patience.

We're now carrying configs and ebuilds for both 4.14 and 4.15. When rebasing, please remember to bump both.

ezequielgarcia added some commits Dec 16, 2017

sys-kernel: cleanup options
Move options explicitly enabled in arm64 and that x86 enable
by default to the common config file.

Changes on x86:

 * CONFIG_HW_RANDOM was compiled as a module,
   and now it is built-in.

Changes on ARM64:

 * CONFIG_KEXEC is enabled.

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
sys-kernel: enable throttled background buffered writeback
This feature prevents the background writeback thread from stalling I/O,
e.g. reads.

For instance, without this option enabled, it's possible to stall reads
setting a writeback dirty threshold (/proc/sys/vm/dirty_writeback_centisecs)
sufficiently long.

Quoting the author of the feature:

"""
[..] If you are sick of Linux bogging down when buffered
writes are happening, then this is for you, laptop or server. The
patchset is fully stable, I have not observed problems. It passes full
xfstest runs, and a variety of benchmarks as well. It works equally well
on blk-mq/scsi-mq, and "classic" setups.
"""

"""
This feature adds a simple blk-wb code that keeps limits how much buffered
writeback we keep in flight on the device end. [..] it should be pretty
much auto-tuning.
"""

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
@ezequielgarcia

This comment has been minimized.

Contributor

ezequielgarcia commented Feb 9, 2018

@bgilbert Done. How do we launch a ci job?

@dm0-

This comment has been minimized.

Member

dm0- commented Feb 9, 2018

It's a manual process that involves writing manifest modifications at the moment. I'll run the tests on it.

@dm0-

This comment has been minimized.

Member

dm0- commented Feb 9, 2018

Tests looked okay. Merging, thanks!

@dm0- dm0- merged commit 1ebfd14 into coreos:master Feb 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment