Feature request: heap-allocated alerts #5154

Closed
AllSeeingEyeTolledEweSew opened this issue Sep 15, 2020 · 11 comments
Labels
stale todo long-term todo item

Comments

@AllSeeingEyeTolledEweSew
Contributor

TL;DR

I'd like to request that alerts be allocated on the heap, or have the option to be. This is to support concurrency, simplicity and safety of client code.

I assume the main advantage of the specialized memory allocation is speed of the main libtorrent thread. I'd like to hear if there are any others.

I have a lot of thoughts on why the specialized alert memory management is a disadvantage. Apologies for the wall of text; it's a significant change so I thought it important to be thorough.

Argument summary

  • Currently, responsive apps shouldn't make blocking calls while handling alerts (so the next pop_alerts() can be called ASAP). This can lead to awkward workarounds in code trying to avoid calls to torrent_handle::status() etc.
  • One natural pattern is to have multiple tasks (threads, coroutines, etc) each processing the alert stream. With current memory management, this pattern is difficult to build, and has caveats.
  • The main alternative to the task pattern is an app with limited or no concurrency. This makes simple use cases complex, and limits scalability and flexibility.
  • Apps are already incentivized to restrict the alert_mask as much as possible, which should minimize the allocations done in the libtorrent thread anyway.
  • It would be nice to eliminate hard crashes from invalid memory access. It's possible to hold references to alert-managed memory in really surprising ways.

I'll try to motivate my request by explaining my use case, exploring the limits of how I can implement it with libtorrent, and showing how those limits relate to alert memory management.

My use case

The goal of my project tvaf is to provide access to a library of torrents as HTTP resources, downloading data on the fly. It must be highly responsive and concurrent.

This leads to a pretty complex workflow:

  • When a request comes in, look up the torrent by infohash
  • If it's not found, add it to the session
  • Prioritize different pieces of the torrent, according to all active requests
  • As pieces are downloaded, vend them via HTTP
  • Ensure errors are passed through
  • When all requests are finished, optionally clean up the torrent
  • Meanwhile, still function as a normal torrent client, allowing the user to add/remove torrents at any point

Keep in mind that new requests may come in or be canceled at any point in the workflow.

Event loop vs iterators

Every libtorrent-based app I can find uses a single-threaded event loop which fires a set of callbacks on user actions or alerts from libtorrent.

In my experience, single-threaded event loop apps either

  • are tightly constrained in scope and complexity,
  • have some blocking calls or long-running handlers which limit performance, or
  • try to avoid the above by turning all blocking calls into events or callbacks, becoming effectively concurrent but written in a confusing way

Code-wise, the disadvantage of event loops is that they turn control flow into lots of state variables and callbacks. This isn't appropriate for all use cases.

For example, consider that in tvaf I want to gracefully pause a torrent and delete it if it has no data.

With a single-threaded event loop, that looks something like this:

def on_unexpected_remove(alert):
    if alert.handle == handle:
        report_error(handle)

def on_paused(alert):
    if alert.handle == handle:
        remove_callback(torrent_removed_alert, on_unexpected_remove)
        if handle.status().total_done == 0:
            def on_removed(alert):
                mark_done_processing(handle)
            register_callback(torrent_removed_alert, on_removed)
            session.remove_torrent(handle)

register_callback(torrent_removed_alert, on_unexpected_remove)
register_callback(torrent_paused_alert, on_paused)

handle.pause(graceful_pause)

It's pathological in Python, due to the lack of anonymous inline functions. Callbacks mean the second half of our code reads first.

Note also the blocking status() call in an alert handler. How do we deal with that?

Consider an iterator (or "channel") pattern:

iterator = iter_alerts()  # receives any alerts posted after it's created
handle.pause(graceful_pause)
for alert in iterator:
    if isinstance(alert, torrent_paused_alert):
        break
    elif isinstance(alert, torrent_removed_alert):
        raise TorrentRemoved()
if handle.status().total_done == 0:
    session.remove_torrent(handle)

This is far easier to understand. In practice, I am finding that the complexity of iterator code grows much more slowly than that of event loop code.

The iterator code is also very friendly to concurrency. The torrent may be "unexpectedly" removed at any point; the code will simply raise an exception which can be handled or ignored. The blocking status() call occurs outside of the alert-handling loop, so it need not block other alert-handling code if we play our cards right.

Implementing iterators

I think the reason most app authors don't use the iterator pattern above is that it's not straightforward to implement iter_alerts(). Normally, it would just return a queue of alerts, and a feeder thread would call pop_alerts() and feed all the queues. But pop_alerts() invalidates existing alerts, so iterator consumers quickly touch invalid memory.

The trick is that the feeder thread must wait until all consumers are blocked waiting for new alerts; then it's safe to call pop_alerts() and feed them all.

To avoid thundering herds, I have iter_alerts() take some arguments to filter the alert stream by type and handle.

I implemented these iterators in a recent commit to tvaf.
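
Here is a minimal sketch of the core coordination, assuming the standard python-libtorrent bindings (session.pop_alerts() and session.wait_for_alert()). The names and the fixed consumer count are illustrative only, not tvaf's actual code; consumers are assumed to iterate forever, and filtering, consumers joining or leaving, and teardown are what make the real implementation more involved.

import threading

import libtorrent as lt


class AlertFanout:
    """Feed each pop_alerts() batch to a fixed set of consumer threads.

    The feeder only calls pop_alerts() again once every consumer is done with
    the current batch, because the next pop_alerts() invalidates it.
    """

    def __init__(self, session: lt.session, num_consumers: int) -> None:
        self._session = session
        # The feeder and all consumers rendezvous twice per batch.
        self._ready = threading.Barrier(num_consumers + 1)
        self._done = threading.Barrier(num_consumers + 1)
        self._batch: list = []

    def feed_forever(self) -> None:
        while True:
            self._session.wait_for_alert(1000)  # block inside libtorrent, not in Python
            self._batch = self._session.pop_alerts()
            self._ready.wait()  # publish the batch; release all consumers
            self._done.wait()   # wait until every consumer is done with it

    def iter_alerts(self):
        """Generator run in a consumer thread; yields every alert of every batch."""
        while True:
            self._ready.wait()      # wait for the feeder to publish a batch
            try:
                for alert in self._batch:
                    yield alert     # only valid until the next pop_alerts()
            finally:
                self._done.wait()   # tell the feeder this consumer is done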

Iterator quirks

Due to alert memory management, we still want to avoid blocking calls while handling alerts to maximize responsiveness and concurrency.

This can lead to some weird code. Instead of this:

iterator = iter_alerts()
got_piece(*handle.status().pieces)
for alert in iterator:
    if isinstance(alert, piece_finished_alert):
        got_piece(alert.piece)

I might try to get status data from the alert stream, like this:

iterator = iter_alerts()
handle.save_resume_data()  # marks torrent dirty for state_update_alert
session.post_torrent_updates()  # handle will appear in this update
for alert in iterator:
    if isinstance(alert, piece_finished_alert):
        got_piece(alert.piece)
    elif isinstance(alert, state_update_alert):
        for status in alert.status:
            if status.handle == handle:
                got_piece(*status.pieces)

In Python, there's an additional wrinkle: iterators aren't explicitly signaled when consumer code abandons them due to an exception or a break. We can detect when they're garbage collected, but that may not happen immediately. I thought it better to use a context manager, like so:

iterator = iter_alerts()
with iterator:
    for alert in iterator:
        ...
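
If iter_alerts() returned a plain generator, the standard library's contextlib.closing would give the same guarantee: exiting the with block calls close() on it, so any cleanup (such as telling the feeder this consumer is gone) runs immediately rather than whenever garbage collection gets around to it.

from contextlib import closing

with closing(iter_alerts()) as iterator:
    for alert in iterator:
        ...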

Alert references

Consider this:

try:
    for alert in iter_alerts():
        if condition():
            raise MyError()
except Exception as exc:
    last_exception = exc

In the above code, exc holds a traceback, which refers to the stack frame in which the exception occurred, which in turn refers to the local variable alert. So the alert is kept alive for as long as we hold the exception.
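
To make the chain concrete, here is a rough illustration (it assumes the raise happened in the same frame as the loop, as above):

# The chain: last_exception -> __traceback__ -> tb_frame -> f_locals -> alert
assert last_exception.__traceback__.tb_frame.f_locals["alert"] is alert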

Or this:

logging.debug("saw alert %s", alert)

In some conditions, I have seen the logger hold a reference to the alert argument after the call completes (potentially for the lifetime of the app).

I think Python is just generally not designed for "specialized" memory management, where it's important to positively release references.

My specific fear is either that some code will end up calling str(alert), or that the Python bindings may change such that repr(alert) touches alert memory where it didn't before. I dislike the thought of my app crashing due to conditions that can't be guaranteed against.

@arvidn
Owner

arvidn commented Sep 16, 2020

thank you for a well-made argument. I felt uneasy introducing this memory allocation restriction as well. Previously, alerts were held by std::auto_ptr (which doesn't work correctly in all cases and has been removed from newer versions of C++); that didn't allow shared (i.e. reference-counted) ownership of alerts either. It did, however, support cloning alerts: each alert had a virtual function to heap-allocate a copy.

I'm a bit hesitant to go all the way to individually heap-allocating alerts; that can actually be quite inefficient. The argument that clients are already encouraged to limit alerts via the alert_mask is true, but I also think it's important to be able to enable debug logging in production without bogging everything down. Handling such alerts would be cheap anyway, since it's just streaming them to a file.

I imagine you've seen this blog post detailing how the alert queue works now. It uses two separate allocations: one for the alert objects themselves and one for the data they carry. This makes it especially tricky to allow cloning them. Sure, the alert object itself is easy to make a copy of, but its data is still owned by another stack allocator.

In order to support your use case, and more broadly to allow deferring alert handling until after another call to pop_alerts() has been made, I can think of a few approaches:

  1. Currently the alert queue is double buffered. There's the alert queue you (the client) are currently looking at, and there's the alert queue libtorrent is printing new alerts into. When you call pop_alerts(), the pointers are swapped (so to speak) and you (the client) will see the other buffer. The one you were just looking at is cleared and libtorrent will start printing alerts into it. The number of buffers could be made configurable. For example, if it was triple buffered, you could always keep iterating over the alerts returned one pop_alerts() ago. This would still require some jumping through hoops to ensure you never call pop_alerts() once too many times while still looking at some old alerts. So it might not reduce your complexity that much, but it would enable more concurrency.

  2. There could be a way to "detach" the alert queue (and the associated payload allocator for the data the alerts hold) that was just returned by pop_alerts(), and tie its lifetime to some other reference-counted object (like all the iterators iterating over it, in your case). This solution would be similar to (1) except that the number of buffers would be arbitrary, determined by the lifetime of the objects holding the queue alive.

  3. Perhaps the alert queue could be altered to allocate both the alerts and their payload in the same buffer. This might make it easier, or at least feasible, to copy alerts. The resulting object would still have to be something that encompassed the alert itself plus the data it points to, but the references to its data could then be relative to its `this` pointer.

@AllSeeingEyeTolledEweSew
Contributor Author

Thanks for the well-thought-out response!

I also think it's important to be able to enable debug logging in production without bogging everything down. Handling such alerts would be cheap anyway, since it's just streaming them to a file.

I agree, this is a crucial case. It's worth noting, though, that a Python app likely does dozens of allocations when processing each alert, even if it's just logging most of them. So optimizing alert allocations helps throughput in the libtorrent thread, but the app may get bogged down regardless.

  1. Currently the alert queue is double buffered. There's the alert queue you (the client) are currently looking at, and there's the alert queue libtorrent is printing new alerts into. When you call pop_alerts(), the pointers are swapped (so to speak) and you (the client) will see the other buffer. The one you were just looking at is cleared and libtorrent will start printing alerts into it. The number of buffers could be made configurable. For example, if it was triple buffered, you could always keep iterating over the alerts returned one pop_alerts() ago. This would still require some jumping through hoops to ensure you never call pop_alerts() once too many times while still looking at some old alerts. So it might not reduce your complexity that much, but it would enable more concurrency.

Nice thought. I think it would actually reduce complexity, since alert handlers could get away with making blocking calls, leaving it to user configuration (or automagic app code) to increase the number of buffers if alert handling gets too backed up.

  2. There could be a way to "detach" the alert queue (and the associated payload allocator for the data the alerts hold) that was just returned by pop_alerts(), and tie its lifetime to some other reference-counted object (like all the iterators iterating over it, in your case). This solution would be similar to (1) except that the number of buffers would be arbitrary, determined by the lifetime of the objects holding the queue alive.

This sounds like the best of both worlds. Fast allocation for libtorrent, but we still have (basically) refcounted alerts, so no buffer configuration to manage and "unexpected" references don't crash the app. Seems like an improvement over (1).

Is this just called slab allocation?

  3. Perhaps the alert queue could be altered to allocate both the alerts and their payload in the same buffer. This might make it easier, or at least feasible, to copy alerts. The resulting object would still have to be something that encompassed the alert itself plus the data it points to, but the references to its data could then be relative to its `this` pointer.

This also has the upsides of (2), but I guess it's more code maintenance since every alert type needs a copy method.

@stale

stale bot commented Dec 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 16, 2020
@AllSeeingEyeTolledEweSew
Contributor Author

I'm still very interested in this. Do you foresee having time to prototype your suggestion about "detaching" the alerts allocation slab?

@stale stale bot removed the stale label Dec 16, 2020
@arvidn
Owner

arvidn commented Jan 20, 2021

this is still on my todo list, it's not particularly urgent compared to some other things on there though

@stale

stale bot commented Apr 22, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 22, 2021
@AllSeeingEyeTolledEweSew
Contributor Author

Go away @Stale :)

@stale stale bot closed this as completed May 15, 2021
@AllSeeingEyeTolledEweSew
Contributor Author

@arvidn is there any way I can keep this open despite the stale bot?

@stale

stale bot commented Aug 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 12, 2022
@AllSeeingEyeTolledEweSew
Contributor Author

@arvidn perhaps the stale bot config should ignore the todo label.

Otherwise, if you don't think you can attempt the "detachable queue" solution in the foreseeable future, perhaps this should just be closed. I care about this issue a lot, but perhaps I'm the only one who does. The work is definitely beyond my skill to do myself.

@stale stale bot removed the stale label Aug 12, 2022
@stale

stale bot commented Nov 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 12, 2022
@stale stale bot closed this as completed Jan 7, 2023