Drop IO priority to idle while reading blocks for peer requests and startup verification #9245

Open: wants to merge 2 commits into base: master

Member

luke-jr commented Nov 30, 2016

No description provided.

Member

laanwj commented Nov 30, 2016

Concept ACK. Though not very happy to introduce platform-specific voodoo - we only just got rid of thread priority manipulation. But it may be worth the hassle, I don't know.

Can we quantify whether this works or not somehow?

Member

gmaxwell commented Nov 30, 2016

This will also delay other processing, in particular block relay-- at least until the handling is made more concurrent-- no? Not a reason to not do it, but maybe a reason to not do it by default for everyone.

I second the need to quantify this-- I could imagine it making for a big usability improvement. ... or not mattering at all. If the former, I want it... if the latter...

Member

luke-jr commented Nov 30, 2016

Whenever I restart my node lately, I find myself eventually manually ioniceing the entire process as it slows down other things monitoring it in iotop. I can't be sure it's sending out old blocks, but I can't imagine what else it'd be spending so much time reading... :/

Added Mac and Windows support for completeness.

@ryanofsky

ryanofsky approved these changes Nov 30, 2016 edited

ACK 6430b92 (after adding missing #includes)

// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#ifndef BITCOIN_UTIL_IOPRIO_H

ryanofsky Nov 30, 2016

Contributor

Should add #include "config/bitcoin-config.h"

Member

fanquake commented Dec 1, 2016

Travis failure:

'../../src/'`utilioprio.cpp
In file included from ../../src/utilioprio.cpp:9:0:
../../src/utilioprio.h: In destructor ‘ioprio_idler::~ioprio_idler()’:
../../src/utilioprio.h:42:51: error: ‘LogPrintf’ was not declared in this scope
             LogPrintf("failed to restore ioprio\n");
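
For readers unfamiliar with the pattern behind the `ioprio_idler` destructor in the log above, here is a minimal Linux-only sketch of a scoped I/O-priority guard. It is not the PR's code: the class name `IoPrioIdleGuard` and the helper names are hypothetical, the numeric constants are copied from the ioprio_set(2) man page because glibc ships no wrapper for the call, and `std::fprintf` stands in for `LogPrintf`. On non-Linux builds the helpers simply report failure and the guard does nothing.

```cpp
// Sketch only: scoped "idle" I/O priority for the calling thread on Linux.
// IoPrioIdleGuard and the helpers below are hypothetical, not the PR's code.
#include <cassert>
#include <cstdio>

#ifdef __linux__
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Values from the ioprio_set(2) man page; glibc does not define them.
constexpr int kIoprioWhoProcess = 1;  // IOPRIO_WHO_PROCESS
constexpr int kIoprioClassIdle = 3;   // IOPRIO_CLASS_IDLE
constexpr int kIoprioClassShift = 13; // IOPRIO_CLASS_SHIFT

// Pack a scheduling class and a level into one ioprio value.
constexpr int MakeIoprio(int io_class, int level)
{
    return (io_class << kIoprioClassShift) | level;
}

// ioprio_set accepts class "none" (0) only with level 0, while ioprio_get
// may report a non-zero level for it, so normalize before restoring.
constexpr int NormalizeForRestore(int saved)
{
    return (saved >> kIoprioClassShift) == 0 ? 0 : saved;
}

// Current ioprio of the calling thread, or -1 if unavailable.
int GetIoprio()
{
#ifdef __linux__
    return static_cast<int>(syscall(SYS_ioprio_get, kIoprioWhoProcess, 0));
#else
    return -1;
#endif
}

// Set the calling thread's ioprio; returns true on success.
bool SetIoprio(int ioprio)
{
#ifdef __linux__
    return syscall(SYS_ioprio_set, kIoprioWhoProcess, 0, ioprio) == 0;
#else
    (void)ioprio;
    return false;
#endif
}

class IoPrioIdleGuard
{
public:
    // Remember the current priority, then try to drop to the idle class.
    IoPrioIdleGuard() : m_saved(GetIoprio()), m_lowered(false)
    {
        if (m_saved != -1) {
            m_lowered = SetIoprio(MakeIoprio(kIoprioClassIdle, 0));
        }
    }

    // Restore the saved priority only if the constructor changed it.
    ~IoPrioIdleGuard()
    {
        if (m_lowered && !SetIoprio(NormalizeForRestore(m_saved))) {
            std::fprintf(stderr, "failed to restore ioprio\n");
        }
    }

    IoPrioIdleGuard(const IoPrioIdleGuard&) = delete;
    IoPrioIdleGuard& operator=(const IoPrioIdleGuard&) = delete;

    bool lowered() const { return m_lowered; }

private:
    int m_saved;
    bool m_lowered;
};
```

A caller would declare `IoPrioIdleGuard guard;` at the top of the scope that reads a block for a peer, so the previous priority comes back automatically when the scope ends, including on early return.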
Member

luke-jr commented Dec 1, 2016

Looks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer. AFAIK this is okay(?), but I'm going to leave it for a separate PR...

Contributor

rebroad commented Dec 19, 2016

I like this (concept ACK) although I wonder what the impact is on the p2p network as a whole if everyone ran this.

martinschwarz commented Mar 11, 2017

> Looks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer.

There are win32 and win64 builds. Can't this just be enabled on the win64 build only?

Member

laanwj commented Mar 13, 2017

> Looks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer. AFAIK this is okay(?), but I'm going to leave it for a separate PR...

Isn't Vista the version after Windows XP? As we dropped support for Windows XP in 0.13, it seems that requiring Vista for 0.15 is fine.

> There are win32 and win64 builds. Can't this just be enabled on the win64 build only?

Could be done, but it'd be confusing to couple those. The low-end systems running 32-bit versions would probably need this more.

Member

luke-jr commented Aug 21, 2017

Rebased...

Contributor

TheBlueMatt commented Aug 21, 2017

Hmm, I don't think this is really the best idea as long as our message processing is still single-threaded. Really we need to refactor stuff so that block reading is async and the network processing can continue for other peers while we're serving blocks for peers in IBD; otherwise we may block receiving a new block longer than required.

Member

luke-jr commented Aug 21, 2017

That's somewhat independent from this issue. If users need to shut off their node to use their computer, the delay for processing a new block will be even longer.

Contributor

ryanofsky commented Oct 12, 2017

@TheBlueMatt @luke-jr, maybe a compromise would be to make this behavior configurable, and perhaps to default to dropping priority if user is running bitcoin-qt on a desktop.

Contributor

TheBlueMatt commented Nov 10, 2017

Another approach which might be simpler would be to have the validation.h-exposed versions of ReadBlockFromDisk drop io priority so that net_processing will use low priority when answering remote-node queries but connecting blocks will not. With 0.15 I/O when doing initial sync is somewhat better, so this may also be less of an issue now unless the user is running with -peerbloomfilters.

Member

luke-jr commented Nov 11, 2017

@TheBlueMatt That's exactly what this already does... priority is only dropped when serving peers, not when connecting blocks.

Contributor

TheBlueMatt commented Nov 11, 2017

@luke-jr I was referring to the possibility of not exposing a priority flag in validation.h's API - that seems a bit overkill IMO, as evidenced by the fact that there are now two ReadBlockFromDisk calls in net_processing which don't get the low-priority flag :p. Though that would also result in RPC ReadBlockFromDisk calls getting de-prioritized.

More importantly, I'm curious how much we need this anymore - it seems most of the complaints about I/O usage were primarily due to 0.13.1 preferential peering... On systems where your I/O is severely limited, I don't know how much this will help (in my experience Linux's ionice is mostly worthless when it comes to desktop latency), and I don't know whether it would be better to direct people towards maxuploadtarget or peerbloomfilters so as to avoid simply slowing down your peers because your I/O is too slow.

Member

luke-jr commented Nov 11, 2017

Before writing this, I generally ionice'd the entire bitcoind process to maintain system usability.

Contributor

TheBlueMatt commented Nov 16, 2017

Concept ACK. You need to mark the other ReadBlockFromDisks in net_processing low-priority as well.

Member

sipa commented Mar 6, 2018

Concept ACK, but needs rebase.

@luke-jr luke-jr changed the title from Drop IO priority to idle while reading blocks for getblock requests to Drop IO priority to idle while reading blocks for peer requests and startup verification Mar 6, 2018

Member

luke-jr commented Mar 6, 2018

Rebased and added the additional deprioritisations requested by @TheBlueMatt

Member

laanwj commented Mar 6, 2018

utACK 91ccbbb

Member

MarcoFalke commented Mar 18, 2018

Needs rebase after LookupBlockIndex-"rename"

Member

eklitzke commented Mar 20, 2018

I considered making this change in #12618 and there's a comment about it there. This is much more dangerous than a CPU scheduler change (like the one I just linked), for two reasons.

On Linux the I/O scheduling stuff is very primitive, and IOPRIO_CLASS_IDLE is quite a strong policy. From the man page:

       IOPRIO_CLASS_IDLE (3)
              This is the idle scheduling class.  Processes running at this
              level get I/O time only when no one else needs the disk.  The
              idle class has no class data.  Attention is required when
              assigning this priority class to a process, since it may
              become starved if higher priority processes are constantly
              accessing the disk.

This means that at the idle I/O processing level you can get starved forever if anything else at all is using the disk. This is quite different from something like SCHED_BATCH, which just deprioritizes you a little bit. The CPU scheduler actually has a SCHED_IDLE that works like IOPRIO_CLASS_IDLE, but it's dangerous for the same reason that IOPRIO_CLASS_IDLE is dangerous, so I didn't use it.

The other thing that I'm fairly certain of (but could be wrong about) is that I believe the kernel doesn't really take into account multi-queue devices when considering idleness of a block device. There's a lot of discussion about this online if you look at people talking about the %util field in iostat output, e.g. here. I believe for this reason IOPRIO_CLASS_IDLE could starve you out from accessing a disk when it actually does have idle capacity remaining.

I think we should test this change more (or get a better understanding of the Linux I/O scheduler) before proceeding with it. The crude and easy-to-get-wrong policy is, I think, part of the reason that glibc doesn't expose this system call in the first place. I can't think of an actual attack off-hand, but this is a DoS vector if you create N connections to a host, ask for blocks from all N connections, and then have some other mechanism to cause them to use up their disk I/O.
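
To make the contrast between the idle class and a milder setting concrete, here is a small sketch of how an ioprio value is packed and read back, following the layout in the ioprio_set(2) man page (class in the bits above a 13-bit data field; best-effort levels run from 0, highest, to 7, lowest). The helper names are hypothetical, and using the lowest best-effort level instead of idle is only an illustration, not something proposed in this thread.

```cpp
// Sketch only: packing and describing Linux ioprio values.
// All names here are hypothetical helpers, not kernel or Bitcoin Core APIs.
#include <cassert>
#include <string>

constexpr int kClassShift = 13; // IOPRIO_CLASS_SHIFT
constexpr int kClassNone = 0;       // IOPRIO_CLASS_NONE
constexpr int kClassRealtime = 1;   // IOPRIO_CLASS_RT
constexpr int kClassBestEffort = 2; // IOPRIO_CLASS_BE
constexpr int kClassIdle = 3;       // IOPRIO_CLASS_IDLE

// Combine a class and its class data (the level) into one value.
constexpr int IoprioValue(int io_class, int data)
{
    return (io_class << kClassShift) | data;
}

// Extract the scheduling class from a packed value.
constexpr int IoprioClass(int value) { return value >> kClassShift; }

// Extract the class data (the level for realtime and best-effort).
constexpr int IoprioData(int value) { return value & ((1 << kClassShift) - 1); }

// Human-readable form, similar in spirit to what ionice prints.
std::string DescribeIoprio(int value)
{
    switch (IoprioClass(value)) {
    case kClassNone:
        return "none";
    case kClassRealtime:
        return "realtime: prio " + std::to_string(IoprioData(value));
    case kClassBestEffort:
        return "best-effort: prio " + std::to_string(IoprioData(value));
    case kClassIdle:
        return "idle"; // the idle class carries no class data
    default:
        return "invalid";
    }
}

// Idle: served only when nothing else wants the disk, so it can starve.
constexpr int kIdle = IoprioValue(kClassIdle, 0);
// Lowest best-effort level: still scheduled, just behind other best-effort I/O.
constexpr int kLowestBestEffort = IoprioValue(kClassBestEffort, 7);
```

Either value could be passed as the third argument of the raw `ioprio_set` system call. Whether the lowest best-effort level gives a useful reduction in practice depends on the I/O scheduler in use, which is exactly the kind of thing that would need measuring.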

@eklitzke

Concept ACK if this is enabled via a flag, but this seems dangerous to enable by default. See the other comment I just left.

Member

laanwj commented Sep 10, 2018

This has been open since 2016 and it's still uncertain whether it's a good idea to merge. Closing for now.

@laanwj laanwj closed this Sep 10, 2018

Member

luke-jr commented Sep 10, 2018

There are plenty of ACKs here, plenty of testing, and only speculation on why there could (in very improbable circumstances) be a problem.

@laanwj laanwj reopened this Sep 11, 2018

Member

laanwj commented Sep 11, 2018

reopened on request
