Add BFS to RPi kernel. #444

merged 1 commit into archlinuxarm:master on Apr 7, 2013



8 participants

graysky2 commented Apr 5, 2013


Con Kolivas' BFS improves desktop interactivity (latency) and feels snappier on my RPi. Feelings are nice, but here are some hard data I acquired on my RPi using the stock linux-raspberrypi package and the one patched with BFS v0.425.


The test was compiling a preconfigured htop from source on my RPi. See the following little script that automates the testing:
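The original script was not preserved in this thread; a minimal Python sketch of such a timing harness might look like the following. The `htop` directory, run count, and `make` invocations are illustrative assumptions, not the exact commands graysky used:

```python
#!/usr/bin/env python3
# Hypothetical sketch of a compile-timing harness; the actual script
# used for the tests above was not kept with this thread.
import shutil
import subprocess
import time

def time_command(cmd, runs=5):
    """Run `cmd` `runs` times and return wall-clock durations in seconds."""
    times = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(cmd, capture_output=True)
        times.append(time.monotonic() - start)
    return times

if __name__ == "__main__" and shutil.which("make"):
    # Assumed workflow: clean between runs, then time a -j2 build of a
    # preconfigured htop tree, five runs per kernel.
    for i in range(1, 6):
        subprocess.run(["make", "-C", "htop", "clean"], capture_output=True)
        t = time_command(["make", "-C", "htop", "-j2"], runs=1)[0]
        print(f"run {i}: {t:.2f}s")
```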


Here are the data using a one-way ANOVA. As you can see, I ran it with -j2 under each kernel. In each case, the BFS-patched kernel shows statistical significance at the p = 0.05 level.
Link to raw data
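For reference, the F statistic behind a one-way ANOVA like this can be computed by hand. The sketch below uses made-up compile times for illustration, not the raw data linked above:

```python
# Minimal one-way ANOVA F statistic in pure Python. An F value can then
# be compared against an F-distribution table at the p = 0.05 level.
def anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k = len(groups)            # number of groups
    n = len(all_vals)          # total number of observations
    # Between-group sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares (n - k degrees of freedom)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Illustrative compile times in seconds -- NOT graysky's actual data.
stock = [92.1, 91.8, 92.4, 92.0, 91.9]
bfs   = [91.2, 91.0, 91.5, 91.1, 91.3]
print(f"F = {anova_f(stock, bfs):.1f}")
```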


Although the difference is small, the BFS-patched kernel is statistically significantly faster than the current kernel on this compilation endpoint. BFS was never designed for superior performance, just superior latency. I cannot quantify the latency, but I can quantify the differences in compile times :p

More Reading

Here is a paper I wrote comparing the BFS on desktop and server hardware if you are interested.

xenoxaos commented Apr 5, 2013

This doesn't change the default scheduler, which is still set to cfq...correct? I don't want to build this in and have it cause a problem for people that don't know what it is.

As long as you need to manually specify the scheduler on the cmdline/bootargs, I would be ok with this.

graysky2 commented Apr 5, 2013

Don't think of it as an I/O scheduler that can be changed like BFQ, noop, deadline, etc. BFS is a CPU scheduler, and its patch is hardcoded into the kernel, as is CFS in mainline. There is no need for any special cmdline/bootargs; I built the kernel from the pull request I sent you and booted into it with no modification to anything. I am happy to share the build if you want to try it out:

Using CFS on a multicore desktop is silly if you ask me, since it scales to thousands of CPUs... using CFS on a single-core ARM chip is beyond silly :p
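To make the distinction above concrete, here's a small sketch (pure illustration, not part of the pull request) that reads the runtime-selectable I/O scheduler per block device from sysfs. There is no CPU-scheduler equivalent of this: CFS vs. BFS is fixed when the kernel is built.

```python
# The active I/O scheduler is the bracketed entry in
# /sys/block/<dev>/queue/scheduler, e.g. "noop deadline [cfq]".
import glob

def active_io_scheduler(line):
    """Extract the bracketed (active) scheduler from a sysfs line."""
    for word in line.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None

for path in glob.glob("/sys/block/*/queue/scheduler"):
    with open(path) as f:
        dev = path.split("/")[3]
        print(dev, "->", active_io_scheduler(f.read()))
```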

@kmihelich kmihelich merged commit d8023ed into archlinuxarm:master Apr 7, 2013
graysky2 commented Apr 7, 2013

Nice... I didn't bump the pkgver in my pull request, so you might wanna do that. I actually don't know how you guys handle the formal releases vs. what is in git.


why isn't it in the actual kernel source code then?

He really should try to merge it again instead of maintaining a 'better' out-of-tree patch.

I'd like to see this pull request reconsidered. While "statistical significance" is a nice phrase to throw around, the test used to show this is utter hogwash (who the hell compiles anything of significance on an RPi??). If you want a low-latency desktop, use the already-mainline functionality of CONFIG_SCHED_AUTOGROUP.


@dave - No, it is not utter hogwash, it is a mathematical fact. In answer to your rhetorical question: people who own the hardware with no other ARM PC are those of us who do compile on it.

Yes, and it's a mathematical fact that 0.0001 is greater than 0.00009.

If you want to build for ARM, you use a cross toolchain on a machine that consumes more than 2 watts. Please show a real world example showing noticeable gains on generic workloads -- something actually representative of what you would do with a tiny little ARM board.

And if you can, get ck to resubmit his work to the kernel if it's so much better.

Kurlon commented Apr 10, 2013

I use a GoFlex Net as a desktop and kernel development workstation. Less RAM, USB connected graphics, no floating point, I don't even have a functional Distcc setup right now. I can even stream MST3K while surfing and chatting on #archlinux-arm... you'd be amazed what people do on tiny little ARM boards on a regular basis.


@Dave - Thanks for the tip; I am using a cross toolchain.

Please re-read the first few lines of my pull request. Perhaps CK's interbench might be a better means to assess the differentiation that the BFS gives users on the Pi. I am interested to hear your suggestions.

@KaiSforza - CK did a pretty effective job communicating his views about your suggestion. Source: BFS-faq.

So when you gang together ARM boards and have 4096 CPUs, then there's no reason to use bfs. Why bother using it for 1, or 2 cores? If he is unwilling to rewrite it to support the range of hardware that Linux supports, why use this as a distribution's default kernel package? That seems to make very little sense to me.


@KaiSforza - I would challenge your assertion that RPi users are likely to reach into the full hardware range that Linux supports; likely >99% of Archlinux ARM users have a single board. I think the data I presented shows BFS offers tangibles to the target users of this distro.


I can't say I agree.
Kurlon commented Apr 10, 2013

KaiSforza: Your test case would only be valid if there were a single-image Linux kernel available that supported discrete ARM boards/systems and used the kernel scheduler at the macro level. So far, the only SSI kernel variant I know of with any active devel is Kerrighed, but it's strictly amd64 ISA only at this time. It's too bad, as I'd love to play with it on ARM; an SSI cluster of Kirkwoods could be amusing given the decent interconnect options present (dual GigE / dual SATA) compared to the CPU's relative power.

Also note, Kerrighed is a useful kernel feature being developed outside of mainline. Such things do happen.


I'll admit I barely looked at this before merging. So here's the question: is there a significant (preferably, order of magnitude) improvement by using this patch?

If yes, then we can stop here.

If no, then what, exactly, is the benefit, in nontechnical terms if at all possible? If you can't get outside of a 10% margin of error, this isn't worth discussing and should be pulled back out. If we're talking in differences of a second on esoteric tasks, same deal.

The graphs graysky posted show that you're saving fractions of a second over a 90s compile. Even being generous, this is only ~1% faster.

Consider that you immediately lose out on cgroup integration with the scheduler (meaning your cpu cgroup does nothing) by using BFS.


That would be along the lines of what I concluded as well. The raw data shows variance of at least +/- 0.5s across the samples, so it's really not showing anything. Using a better disk over USB will get you better compile times.

I'll take this patch out.
