read_piece_alert and streaming #7337

Open
SemiAccurate opened this issue Mar 3, 2023 · 36 comments
@SemiAccurate

From the docs, read_piece() will fail (its read_piece_alert carries an error) when the piece hasn't finished downloading yet. For the streaming use case, where you care about the next sequential piece, calling read_piece() repeatedly until it succeeds is inefficient. (You can't tell from have_piece() if the piece has just finished downloading as it also requires it to have been written to disk, which is not good due to the time-sensitive nature of streaming.)

Can there be another call, downloaded_piece_alert (or some name like that) that only triggers when the specified piece has finished downloading and you can then get the piece data from its buffer?

Or alternatively there could be a new alert that triggers when the specified piece finishes downloading, and then we'd be able to call read_piece() just once, knowing that the piece is downloaded, and get the piece data from its read_piece_alert.

@arvidn
Owner

arvidn commented Mar 5, 2023

you could use set_piece_deadline() and pass in alert_when_available.

@SemiAccurate
Author

From the discussions (#6272 #6259 #6273) it seems like set_piece_deadline() fundamentally uses a different mechanism than regular torrenting and has issues with complexity and inefficiency. I'd prefer to avoid it if possible. If there could be an alert that works with regular torrenting, without invoking the set_piece_deadline machinery, that is issued when the specified piece finishes downloading (or is already downloaded) then that would be advantageous.

@SemiAccurate
Author

@arvidn do you figure such an alert as described can be added?

@SemiAccurate
Author

@AllSeeingEyeTolledEweSew from your discussions you have a lot of ideas on streaming. Though for now, do you figure libtorrent should support this simple addition of an alert (which doesn't use set_piece_deadline) to notify when a specified piece has completed downloading?

@master255

master255 commented Mar 17, 2023

@SemiAccurate

From the discussions (#6272 #6259 #6273) it seems like set_piece_deadline() fundamentally uses a different mechanism than regular torrenting and has issues with complexity and inefficiency. I'd prefer to avoid it if possible. If there could be an alert that works with regular torrenting, without invoking the set_piece_deadline machinery, that is issued when the specified piece finishes downloading (or is already downloaded) then that would be advantageous.

Maybe this method has some disadvantages, but in practice it does not matter. I listen to music from torrents and watch movies and TV shows every day.
There are no problems with streaming. Even streaming (small files) music works!

@AllSeeingEyeTolledEweSew from your discussions you have a lot of ideas on streaming. Though for now, do you figure libtorrent should support this simple addition of an alert (which doesn't use set_piece_deadline) to notify when a specified piece has completed downloading?

In order to be sure that the piece is loaded I use the following (Java) code:

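            case PIECE_FINISHED:
                final int pieceIndex = ((PieceFinishedAlert) alert).pieceIndex();
                // Ignore pieces outside the range being streamed.
                if (pieceIndex < startPiece || pieceIndex > endPiece)
                    return;
                // Re-check every 10 ms (or sooner if notified) until havePiece() is true.
                while (!torrentHandle.havePiece(pieceIndex)) {
                    synchronized (this) {
                        try {
                            wait(10);
                        } catch (InterruptedException e) {
                            // Give up on this piece if the thread is interrupted.
                            return;
                        }
                    }
                }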

I try to warn other programmers against making players. Because the best player in the world has already been created. It's called the Media Library. No one will ever be able to make something like it or better. Not even me. Maybe you just need to use this solution and that's it.

@SemiAccurate
Author

@master255 your code busy waits to get the torrent piece. If you're not running it in a background thread but in the user interface thread then that is bad practice. Instead of polling, my proposal is a simple alert that notifies you when the piece is downloaded, so you can get its data from a buffer.

@master255

@SemiAccurate If you don't even understand my code, I don't know how to help you

@arvidn
Owner

arvidn commented Mar 26, 2023

You can't tell from have_piece() if the piece has just finished downloading as it also requires it to have been written to disk

Can you share what your experience is that gives you that impression?
read_piece() reads the data via libtorrent's disk I/O subsystem, so it shouldn't matter whether the data is in a buffer on its way to the disk or has actually been flushed to disk.

The documentation for piece_finished_alert says:

this alert is posted every time a piece completes downloading and passes the hash check.

What else do you need before you can call read_piece()?

@SemiAccurate
Author

You can't tell from have_piece() if the piece has just finished downloading as it also requires it to have been written to disk

Can you share what your experience is that gives you that impression? read_piece() reads the data via libtorrent's disk I/O subsystem, so it shouldn't matter whether the data is in a buffer on its way to the disk or has actually been flushed to disk.

I had not been relying on have_piece(); I am calling read_piece() on the next piece in an (async) loop until it succeeds. This is a problem for performance, as it pauses the torrent on failure and goes through multiple async iterations.

The documentation for piece_finished_alert says:

this alert is posted every time a piece completes downloading and passes the hash check.

What else do you need before you can call read_piece()?

Before calling read_piece() I'd need to ensure the piece is on disk, which this alert does not ensure. More fundamentally you cannot specify the piece you want to be alerted on when it's downloaded (which for this use case is always going to be the next piece in line, instead of some random piece downloaded.)

If there can be an analogue to read_piece(), like say get_piece(), that only triggers an alert (say piece_completed_alert) when the piece is downloaded (or has already been d/l'd) then it streamlines things as the torrent doesn't have to be put into a paused state nor does the invocation need to be called repeatedly.

@arvidn
Owner

arvidn commented Mar 28, 2023

I had not been relying on have_piece(); I am calling read_piece() on the next piece in an (async) loop until it succeeds. This is a problem for performance, as it pauses the torrent on failure and goes through multiple async iterations.

The torrent isn't supposed to stop when asking for a piece that hasn't been downloaded. That's an oversight. Asking for a piece repeatedly will still be a performance issue though, as you will have several round-trips to the disk thread(s).
I would highly recommend only calling it on pieces that have been downloaded.

Before calling read_piece() I'd need to ensure the piece is on disk, which this alert does not ensure.

You don't. If you intend to read the data back directly via operating system calls, like open() and read(), you need to make sure that libtorrent has written the data to the OS. Whether the data has hit the physical disk or not doesn't matter. However, there's no notification of when libtorrent passes the data to the OS. If you read the data via libtorrent, with read_piece(), it will be read through the same path as data requested by peers, i.e. it can be read from libtorrent's disk cache.

More fundamentally you cannot specify the piece you want to be alerted on when it's downloaded (which for this use case is always going to be the next piece in line, instead of some random piece downloaded.)

My understanding is that alerts are very efficient, and being notified of every piece that completes is unlikely to be a performance issue.

If there can be an analogue to read_piece(), like say get_piece(), that only triggers an alert (say piece_completed_alert) when the piece is downloaded (or has already been d/l'd) then it streamlines things as the torrent doesn't have to be put into a paused state nor does the invocation need to be called repeatedly.

You definitely want to avoid pausing the torrent (it's a bug that I will fix, but you should avoid calling read_piece() optimistically anyway).

The solution I proposed was to record which pieces have completed via the piece_finished_alert and only call read_piece() once the piece you're interested in has completed. Is there a reason this approach doesn't work or is inferior to the new feature you're proposing?
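
A minimal sketch of the bookkeeping described above, written in Java to match the snippet earlier in the thread. It uses only the standard library: `PieceTracker`, `onPieceFinished` and the `IntConsumer` callback are hypothetical names, and the callback stands in for wherever the application would call read_piece(). None of this is libtorrent or jlibtorrent API, and it assumes all calls come from a single alert-handling thread.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

/** Hypothetical application-side record of finished pieces (not a libtorrent type). */
final class PieceTracker {
    private final BitSet finished = new BitSet();
    private final IntConsumer readPiece; // stand-in for the call to read_piece()
    private int next;                    // next piece the player needs, in order

    PieceTracker(int firstPiece, IntConsumer readPiece) {
        this.next = firstPiece;
        this.readPiece = readPiece;
    }

    /** Call from the alert loop for every piece_finished_alert. */
    void onPieceFinished(int piece) {
        finished.set(piece);
        // Request each piece exactly once, and only in sequential order.
        while (finished.get(next)) {
            readPiece.accept(next);
            next++;
        }
    }

    /** Index of the first piece that has not been requested yet. */
    int nextNeeded() {
        return next;
    }
}
```

If pieces 2, 0 and 1 finish in that order, the callback is invoked for 0 only after piece 0 arrives, and then for 1 and 2 as soon as piece 1 arrives, so read_piece() is never called optimistically.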

@arvidn
Owner

arvidn commented Mar 28, 2023

#7359

@SemiAccurate
Author

I had not been relying on have_piece(); I am calling read_piece() on the next piece in an (async) loop until it succeeds. This is a problem for performance, as it pauses the torrent on failure and goes through multiple async iterations.

The torrent isn't supposed to stop when asking for a piece that hasn't been downloaded. That's an oversight. Asking for a piece repeatedly will still be a performance issue though, as you will have several round-trips to the disk thread(s). I would highly recommend only calling it on pieces that have been downloaded.

Ok.

Before calling read_piece() I'd need to ensure the piece is on disk, which this alert does not ensure.

You don't. If you intend to read the data back directly via operating system calls, like open() and read(), you need to make sure that libtorrent has written the data to the OS. Whether the data has hit the physical disk or not doesn't matter. However, there's no notification of when libtorrent passes the data to the OS. If you read the data via libtorrent, with read_piece(), it will be read through the same path as data requested by peers, i.e. it can be read from libtorrent's disk cache.

That's good to know!

More fundamentally you cannot specify the piece you want to be alerted on when it's downloaded (which for this use case is always going to be the next piece in line, instead of some random piece downloaded.)

My understanding is that alerts are very efficient, and being notified of every piece that completes is unlikely to be a performance issue.

I'm not contesting the efficiency of this alert, or the alert mechanism in general. But as a matter of expressive power it would be helpful to be alerted when a specific piece you're interested in has been downloaded (or is already d/l'ed).

For example down below you suggest I build a user application data structure to note where the pieces in a torrent have been downloaded but not saved to disk (I imagine libtorrent currently has such a data structure to note where the pieces in a torrent have both been downloaded and saved to disk as part of the have_piece() call.)

Wouldn't it be nice to encompass multiple application use cases to have such an expressive alert? It wouldn't be charting new territory of work for you as you'd be reusing most of the code of piece_finished_alert(). The parameters could be the piece in question, and whether to trigger an alert if the piece has already been downloaded.

If there can be an analogue to read_piece(), like say get_piece(), that only triggers an alert (say piece_completed_alert) when the piece is downloaded (or has already been d/l'd) then it streamlines things as the torrent doesn't have to be put into a paused state nor does the invocation need to be called repeatedly.

You definitely want to avoid pausing the torrent (it's a bug that I will fix, but you should avoid calling read_piece() optimistically anyway).

The solution I proposed was to record which pieces have completed via the piece_finished_alert and only call read_piece() once the piece you're interested in has completed. Is there a reason this approach doesn't work or is inferior to the new feature you're proposing?

Now that we've gone back and forth some more, we've reached a better understanding of the use case I have as well as what libtorrent can do, and I think we've arrived at a more general truth and perhaps a synthesis of this understanding.

The API call and alert I was proposing, as well as the user application specific data structure you're suggesting I maintain would be obviated with an expressive mechanism of alert, say piece_completed_alert(), which as described above allows for separation of mechanism between library and application(s).

This way for my use case such an alert can be composed with the existing read_piece() call to accomplish my aims. And because of its generality it could be composed with other libtorrent calls to accomplish other application use cases.

(And perhaps the existing data structure libtorrent maintains as part of the have_piece() call might not be needed .. But that could be needed for other things so maybe not.)

@arvidn
Owner

arvidn commented Mar 29, 2023

For example down below you suggest I build a user application data structure to note where the pieces in a torrent have been downloaded but not saved to disk (I imagine libtorrent currently has such a data structure to note where the pieces in a torrent have both been downloaded and saved to disk as part of the have_piece() call.)

yes, libtorrent tracks this. However, libtorrent really appreciates not sharing this data structure with any other threads. There's no locking. For threading performance it makes a lot of sense to duplicate some information across threads, to have efficient access to it (free from mutexes).

Libtorrent does not have a data structure where it tracks "subscriptions" to individual pieces. Which it sounds like you're suggesting would be added.

This way for my use case such an alert can be composed with the existing read_piece() call to accomplish my aims. And because of its generality it could be composed with other libtorrent calls to accomplish other application use cases.

I don't see how it would be any more "general" than the existing alert. The only thing it would do would be to shift some responsibility from your application into libtorrent. Receiving an alert for every piece that finishes is just as general. But perhaps I'm misunderstanding your proposal.

It really sounds like you would like to be able to call read_piece() and if the piece isn't downloaded yet, it waits for it to download instead of failing. That would not require any new alerts.

@SemiAccurate
Author

For example down below you suggest I build a user application data structure to note where the pieces in a torrent have been downloaded but not saved to disk (I imagine libtorrent currently has such a data structure to note where the pieces in a torrent have both been downloaded and saved to disk as part of the have_piece() call.)

yes, libtorrent tracks this. However, libtorrent really appreciates not sharing this data structure with any other threads. There's no locking. For threading performance it makes a lot of sense to duplicate some information across threads, to have efficient access to it (free from mutexes).

Libtorrent does not have a data structure where it tracks "subscriptions" to individual pieces. Which it sounds like you're suggesting would be added.

This way for my use case such an alert can be composed with the existing read_piece() call to accomplish my aims. And because of its generality it could be composed with other libtorrent calls to accomplish other application use cases.

I don't see how it would be any more "general" than the existing alert. The only thing it would do would be to shift some responsibility from your application into libtorrent. Receiving an alert for every piece that finishes is just as general. But perhaps I'm misunderstanding your proposal.

It really sounds like you would like to be able to call read_piece() and if the piece isn't downloaded yet, it waits for it to download instead of failing. That would not require any new alerts.

Well my current proposal is in the spirit of the suggestion in the other thread for libtorrent to provide a more flexible API.

For example, from @master255's application code above, he uses a busy wait loop in order to not have to maintain his own application data structure to note which pieces have been downloaded which btw is what you suggested I do ... (he's piggybacking off of libtorrent's internal data structure as part of its have_piece() call). Besides being seemingly poor practice for Java code, the busy waiting is also not necessary if he used read_piece() to get the piece data directly from libtorrent, which as you recently noted does not require it to have been saved to disk.

@arvidn
Owner

arvidn commented Mar 29, 2023

Well my current proposal is in the spirit of the suggestion in the other thread for libtorrent to provide a more flexible API.

I don't know what you mean by "flexible". I've mostly heard a request for a specific feature, no description of how it would be more flexible.

It's unclear to me, exactly, what your proposal is actually. Again, it sounds to me like you would be able to call read_piece() on pieces that haven't finished downloading yet, and have the piece data be posted once it completes. But you've been talking about some new alert, which presumably would have to be paired with some new call to subscribe to that new alert. Please feel free to clarify.

he uses a busy wait loop in order to not have to maintain his own application data structure to note which pieces have been downloaded which btw is what you suggested I do

I don't recall having seen that code, but that sounds like poor practice by your description. Not a strong argument in favor of being able to subscribe to individual pieces finishing.

@SemiAccurate
Author

Well my current proposal is in the spirit of the suggestion in the other thread for libtorrent to provide a more flexible API.

I don't know what you mean by "flexible". I've mostly heard a request for a specific feature, no description of how it would be more flexible.

The original request was for a self-contained feature that does one thing well for the streaming use case: an API call (e.g. get_piece) and an accompanying alert. After having discussed it back and forth, my current request is for a flexible alert that overall does less but is composable with the existing read_piece API call in order to address a broader array of possible use cases.

It's unclear to me, exactly, what your proposal is actually. Again, it sounds to me like you would be able to call read_piece() on pieces that haven't finished downloading yet, and have the piece data be posted once it completes. But you've been talking about some new alert, which presumably would have to be paired with some new call to subscribe to that new alert. Please feel free to clarify.

I'm not sure if you misspoke but read_piece() as it exists cannot be successfully called on pieces that haven't finished downloading yet, only on those that have. (My original request was for a variant of read_piece() that can be called on pieces that haven't downloaded yet. To wait on them and then trigger an alert.)

Whereas my current request for a new alert, say piece_completed_alert, would I admit require as you mention some tracking of subscriptions to specific pieces. Perhaps this is too much logic and complexity to embed in libtorrent. Hmm thinking about it, doesn't read_piece() as it exists use this exact sort of subscription logic?

he uses a busy wait loop in order to not have to maintain his own application data structure to note which pieces have been downloaded which btw is what you suggested I do

I don't recall having seen that code,

Here is his post from up above:
#7337 (comment)

but that sounds like poor practice by your description. Not a strong argument in favor of being able to subscribe to individual pieces finishing.

Well in order not to have to maintain his own application data structure to note which pieces are downloaded, he "piggybacks" off of libtorrent's internal data structure queryable from have_piece(). But because this structure has stronger guarantees (that the downloaded pieces are further saved to disk) he also ensures this by busy waiting, admittedly not a good way to ensure this.

The point I was making is that users of libtorrent shouldn't have to maintain their own bookkeeping logic for this (or devise poorly coded workarounds) especially when it seems libtorrent already has this built in. In the latest commit for #7359 you seem to reference an internal data structure through user_have_piece() which seems to track downloaded (but not yet saved to disk) pieces. Now if this structure could be made queryable by user applications, similar to how the data structure for downloaded and saved-to-disk pieces is queryable by have_piece(), then it would be very helpful to us users. It would for example solve his busy wait Java code above; he would just query this structure and then call read_piece().

@arvidn
Owner

arvidn commented Mar 30, 2023

The point I was making is that users of libtorrent shouldn't have to maintain their own bookkeeping logic for this (or devise poorly coded workarounds) especially when it seems libtorrent already has this built in.

You don't have to, you can ask libtorrent for whether a piece has been downloaded or not. You might not like the (poor) performance of doing so though.

@arvidn
Owner

arvidn commented Mar 30, 2023

I'm not sure if you misspoke but read_piece() as it exists cannot be successfully called on pieces that haven't finished downloading yet, only on those that have.

I did not misspeak. I was describing a potential future, in an attempt to understand what you're asking for.

(My original request was for a variant of read_piece() that can be called on pieces that haven't downloaded yet. To wait on them and then trigger an alert.)

It sounds like the same then. The alert already exists (read_piece_alert), the only user facing change would be to add an overload to read_piece() that takes flags.
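
To make the idea concrete, here is a rough, standard-library-only Java model of a read that waits for the piece instead of failing. Everything in it is hypothetical: `DeferredPieceReader`, `readPiece` and `onPieceFinished` are illustrative names, the `CompletableFuture` plays the role that read_piece_alert plays in libtorrent, and keeping piece data in an in-memory map is a simplification. It is not how libtorrent implements read_piece(), and the `waiting` map is exactly the kind of per-piece bookkeeping that such an overload would need to add.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of a read that waits for the piece instead of failing. */
final class DeferredPieceReader {
    private final Map<Integer, byte[]> downloaded = new HashMap<>();
    private final Map<Integer, List<CompletableFuture<byte[]>>> waiting = new HashMap<>();

    /** Returns a future that completes once the piece is available. */
    synchronized CompletableFuture<byte[]> readPiece(int piece) {
        byte[] data = downloaded.get(piece);
        if (data != null) {
            return CompletableFuture.completedFuture(data); // already downloaded
        }
        // Park the request until onPieceFinished() is called for this piece.
        CompletableFuture<byte[]> pending = new CompletableFuture<>();
        waiting.computeIfAbsent(piece, k -> new ArrayList<>()).add(pending);
        return pending;
    }

    /** Call when a piece has finished downloading and passed its hash check. */
    void onPieceFinished(int piece, byte[] data) {
        List<CompletableFuture<byte[]>> toWake;
        synchronized (this) {
            downloaded.put(piece, data);
            toWake = waiting.remove(piece);
        }
        if (toWake != null) {
            // Complete outside the lock so callbacks cannot deadlock against us.
            toWake.forEach(f -> f.complete(data));
        }
    }
}
```

A caller would issue `readPiece(n)` once for the next piece and attach a continuation, rather than retrying until the read succeeds.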

@SemiAccurate
Author

The point I was making is that users of libtorrent shouldn't have to maintain their own bookkeeping logic for this (or devise poorly coded workarounds) especially when it seems libtorrent already has this built in.

You don't have to, you can ask libtorrent for whether a piece has been downloaded or not. You might not like the (poor) performance of doing so though.

As you said in the related PR it's a blocking call so it's not a good idea. (Though that Java code snippet above calls it so I don't know if he knows it blocks, in addition to his code busy waiting. So it seems to have multiple issues.)

@SemiAccurate
Author

I'm not sure if you misspoke but read_piece() as it exists cannot be successfully called on pieces that haven't finished downloading yet, only on those that have.

I did not misspeak. I was describing a potential future, in an attempt to understand what you're asking for.

Ah ok. I misunderstood.

(My original request was for a variant of read_piece() that can be called on pieces that haven't downloaded yet. To wait on them and then trigger an alert.)

It sounds like the same then. The alert already exists (read_piece_alert), the only user facing change would be to add an overload to read_piece() that takes flags.

Yeah that's right. And this call already seems to use a data structure which tracks subscriptions to individual pieces. So it already has the subscription logic for this, just add a flag as you said to issue the read_piece_alert for pieces not yet downloaded. Yes this would work alright!

@arvidn
Owner

arvidn commented Mar 30, 2023

And this call already seems to use a data structure which tracks subscriptions to individual pieces.

No, that does not exist

@SemiAccurate
Author

And this call already seems to use a data structure which tracks subscriptions to individual pieces.

No, that does not exist

Then what mechanism does read_piece() use to keep track of multiple in-flight read requests to individual pieces?

And could you use this mechanism as is - or maybe improve it some more - to support the addition of an overloaded call with a flag to keep track of read requests to yet to be downloaded pieces?

@SemiAccurate
Author

SemiAccurate commented Apr 2, 2023

@arvidn can you please take the time to explain this for me?

How does read_piece() currently keep track of multiple in-flight read requests to individual pieces?

How it works now, can you use this same method as is - or maybe improve it some more - to support as you mentioned the addition of an overloaded call with a flag to keep track of read requests to yet to be downloaded pieces?

Because there must be existing subscription tracking code in libtorrent somewhere to handle things like this. You don't have to write all new code for this. It must be all in there already.

@arvidn
Owner

arvidn commented Apr 2, 2023

Because there must be existing subscription tracking code in libtorrent somewhere to handle things like this.

No, there doesn't. You can see here:
https://github.com/arvidn/libtorrent/blob/RC_2_0/src/torrent_handle.cpp#L745
https://github.com/arvidn/libtorrent/blob/RC_2_0/src/torrent.cpp#L739-L805
https://github.com/arvidn/libtorrent/blob/RC_2_0/src/torrent.cpp#L1171-L1205

@SemiAccurate
Author

Because there must be existing subscription tracking code in libtorrent somewhere to handle things like this.

No, there doesn't.

Didn't you suggest at one point there is subscription tracking code already there in the set_piece_deadline() implementation? Can't you reuse that in read_piece() for keeping track of read requests to yet to be downloaded pieces?

You can see here:

I don't know C++ :(

https://github.com/arvidn/libtorrent/blob/RC_2_0/src/torrent_handle.cpp#L745 https://github.com/arvidn/libtorrent/blob/RC_2_0/src/torrent.cpp#L739-L805 https://github.com/arvidn/libtorrent/blob/RC_2_0/src/torrent.cpp#L1171-L1205

@SemiAccurate
Author

@arvidn correct me if I'm wrong but you suggested in the related flexible API thread that the set_piece_deadline() implementation has subscription tracking code.

So can't you reuse that for supporting what you mentioned earlier: an overload to read_piece() with a flag for keeping track of read requests to yet to be downloaded pieces? You wouldn't have to write all new code, because this code is already in libtorrent.

@SemiAccurate
Author

@arvidn can you please take the time to explain this for me? .. And if I'm wrong then please correct me.

If I recall what was discussed earlier, you suggested in the related flexible API thread that the set_piece_deadline() implementation has subscription tracking code, did you not?

So then if I understand things correctly, can't you reuse that for supporting what you mentioned earlier: an overload to read_piece() with a flag for keeping track of read requests to yet to be downloaded pieces? You wouldn't have to write all new code, because this code is already in libtorrent right?

@SemiAccurate
Author

@arvidn I'm trying to help in any way I can but you make it really hard to work with you to reach a mutual understanding and make progress here. Can you please take the time to explain this for me?

Correct me if I'm wrong but you suggested in the related flexible API thread that the set_piece_deadline() implementation has subscription tracking code.

So can't you reuse that for supporting what you mentioned earlier: an overload to read_piece() with a flag for keeping track of read requests to yet to be downloaded pieces? You wouldn't have to write all new code, because this code is already in libtorrent.

@arvidn
Owner

arvidn commented Apr 16, 2023

Correct me if I'm wrong but you suggested in the related flexible API thread that the set_piece_deadline() implementation has subscription tracking code.

That's correct. read_piece() does not.

So can't you reuse that for supporting what you mentioned earlier: an overload to read_piece() with a flag for keeping track of read requests to yet to be downloaded pieces?

Perhaps one could do that. It's not obvious (to me) that there would be any benefits of reusing that existing infrastructure, it might lead to more complexity than necessary.

@SemiAccurate
Author

Correct me if I'm wrong but you suggested in the related flexible API thread that the set_piece_deadline() implementation has subscription tracking code.

That's correct. read_piece() does not.

So can't you reuse that for supporting what you mentioned earlier: an overload to read_piece() with a flag for keeping track of read requests to yet to be downloaded pieces?

Perhaps one could do that. It's not obvious (to me) that there would be any benefits of reusing that existing infrastructure, it might lead to more complexity than necessary.

It is odd that you consider the subscription tracking code to be infrastructure. Why so?

It doesn't have to be refactored to be reusable, though that code can still be copied and molded to work as part of read_piece()

Is it that you think it too much work? Or perhaps you don't figure it would provide material benefit to track pieces as part of read_piece() for streaming through regular torrenting? Why is that?

@arvidn
Owner

arvidn commented Apr 16, 2023

It is odd that you consider the subscription tracking code to be infrastructure. Why so?

Because that's what infrastructure means.

It doesn't have to be refactored to be reusable, though that code can still be copied and molded to work as part of read_piece()

Patches are welcome!

Is it that you think it too much work? Or perhaps you don't figure it would provide material benefit to track pieces as part of read_piece() for streaming through regular torrenting? Why is that?

It's not high on my personal priority list, and there's quite a lot of work above it. So, in that sense it's too much work per reward. But that doesn't mean it's not at the top of somebody else's list.

I've already told you what is at the top of my list.

@SemiAccurate
Author

Hi @anacrolix I'd like your input if you may on this modest proposal I'm advocating for in libtorrent.

I'm proposing the addition of a simple read piece call to asynchronously (through an alert) get the piece data of a yet to be downloaded torrent piece (or if it is already downloaded then to just return it.) This is to facilitate streaming.

Note I'm aware of your Go torrent library with its robust streaming abilities. I'm a Python guy so I don't know Go (or C++) but from your library's API I don't think you make torrent files as first class objects, but rather you make a Reader interface that implements the Go I/O library's Reader interface for torrents, delivering a byte oriented stream to callers starting from anywhere in the torrent file. Having just looked at it I have two questions: 1) What is the purpose of having the implementation call a user specified readahead function when you already provide for the user to set the number of readahead bytes? 2) Why is your Reader implementation blocking when the programming world is moving to asynchronous interfaces that return results in an efficient non-blocking manner?

My simple proposal here Matt is nowhere near as full featured as your streaming interface for your library but it is an incremental addition to the libtorrent API that would be useful for streaming. Besides I think it is likely you implement your Reader interface using something akin to this simple proposal, except you request multiple pieces' data (instead of one) and then stitch them together into a buffer that you then return to the caller. This proposal instead is simple and a small incremental addition to libtorrent. Do you think it makes sense what I'm proposing here?

@anacrolix

I skimmed the above, but I suspect the issue lies in not providing or wanting to add a synchronous wake up/alert mechanism when a piece is available, whether it was already available, or just became available. You are correct in that anacrolix/torrent does make that possible and guarantee that the read can occur.

I'm not sure it's appropriate to discuss anacrolix/torrent-specific stuff here, but assuming it's relevant to your feature request here:

> 1. What is the purpose of having the implementation call a user specified readahead function when you already provide for the user to set the number of readahead bytes?

As the Reader read position moves through a stream as reads occur, it might be desirable to dynamically calculate readahead, rather than have to call back in and adjust it and incur overhead in correcting the client's readahead-based calculations after it had already made them with a previous value. For example if you are reading slowly you might reduce readahead, or you might balance it dynamically across the available cache space.
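As a rough illustration of the two examples in that paragraph (shrinking readahead for slow readers, and sharing cache space between readers), a readahead callback might look like the sketch below. Everything in it is an assumption made for illustration: `make_readahead`, its parameters and the five-second window are invented here and do not reflect the actual anacrolix/torrent callback signature.

```python
def make_readahead(cache_bytes, active_readers, min_bytes=64 * 1024):
    """Build a hypothetical readahead callback.

    cache_bytes:    total cache space to share between readers
    active_readers: zero-argument callable returning the current reader count
    min_bytes:      floor so that a slow reader still prefetches something
    """

    def readahead(bytes_per_second):
        # Each reader gets an equal share of the cache as its upper bound.
        share = cache_bytes // max(active_readers(), 1)
        # Aim for roughly five seconds of data at the observed read rate.
        wanted = int(bytes_per_second * 5)
        return max(min_bytes, min(wanted, share))

    return readahead
```

Because the library would call the function each time it needs a value, the result tracks the current read rate and reader count without the client having to push a new fixed number after every change.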

> 2. Why is your Reader implementation blocking when the programming world is moving to asynchronous interfaces that return results in an efficient non-blocking manner?

Because people are confused about function colouring and concurrency and think that async/await syntax and callbacks somehow make "asynchronous" code possible when it's an entirely different axis. This is opinionated, but see how Haskell and Go do concurrency vs. other languages. I'm not convinced all languages have the tradeoffs figured out but they're getting there. In short, my implementation in Go does use asynchronous I/O, and it's programmed against using a blocking API. Those are the separate axes.
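A small Python sketch of the "separate axes" point: the I/O below is asynchronous (it runs on an asyncio event loop), yet callers see an ordinary blocking `read()`. `BlockingReader` and `_fetch` are hypothetical names, and `asyncio.sleep` merely stands in for real non-blocking network or disk I/O. The analogy to Go is loose, since a blocked goroutine is far cheaper than the blocked OS thread used here.

```python
import asyncio
import threading


class BlockingReader:
    """Blocking facade over asynchronous I/O (illustrative only)."""

    def __init__(self):
        # The event loop runs in its own background thread.
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True)
        self._thread.start()

    async def _fetch(self, n):
        # Stand-in for non-blocking I/O that eventually yields n bytes.
        await asyncio.sleep(0.01)
        return b"x" * n

    def read(self, n):
        # Schedule the coroutine on the loop, then block the calling
        # thread until its result is ready.
        future = asyncio.run_coroutine_threadsafe(self._fetch(n), self._loop)
        return future.result()

    def close(self):
        # Stop the loop from inside its own thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

Usage is plain sequential code, for example `reader = BlockingReader(); data = reader.read(4); reader.close()`, with no async/await at the call site.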

anacrolix/torrent will wake up blocked Readers when their data is available (which could be immediately), and as many blocks/pieces as are available are stitched together as necessary to satisfy the Reader interface constraints.
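The wake-up-and-stitch behaviour can be sketched with a condition variable. This is not anacrolix/torrent's implementation: `PieceReader`, `piece_completed` and the fixed `piece_length` are assumptions for illustration, and unlike a Go `io.Reader` (which may return fewer bytes than requested) this simplified version waits until every piece covering the range is present.

```python
import threading


class PieceReader:
    """Readers block until the pieces covering their byte range arrive."""

    def __init__(self, piece_length):
        self._piece_length = piece_length
        self._pieces = {}  # piece index -> bytes
        self._cond = threading.Condition()

    def piece_completed(self, index, data):
        # Called by the (imaginary) downloader; wakes all blocked readers.
        with self._cond:
            self._pieces[index] = data
            self._cond.notify_all()

    def read(self, offset, length, timeout=None):
        # Assumes length > 0. Work out which pieces cover the byte range.
        first = offset // self._piece_length
        last = (offset + length - 1) // self._piece_length
        needed = range(first, last + 1)
        with self._cond:
            # Returns immediately if the pieces are already present,
            # otherwise sleeps until piece_completed() makes them so.
            ready = self._cond.wait_for(
                lambda: all(i in self._pieces for i in needed), timeout)
            if not ready:
                raise TimeoutError("pieces not available in time")
            joined = b"".join(self._pieces[i] for i in needed)
        # Trim the stitched pieces down to the requested byte range.
        start = offset - first * self._piece_length
        return joined[start:start + length]
```

A read that spans two pieces blocks until both have been reported through `piece_completed`, then returns the bytes stitched across the piece boundary.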

I don't know if libtorrent supports evicting pieces like anacrolix/torrent, but if it does, a synchronous API, or an asynchronous one that ensures that data is delivered before it is evicted would be required to match the behaviour.

@SemiAccurate
Author

> I skimmed the above, but I suspect the issue lies in not providing or wanting to add a synchronous wake up/alert mechanism when a piece is available, whether it was already available, or just became available. You are correct in that anacrolix/torrent does make that possible and guarantee that the read can occur.

All the alerts in libtorrent are asynchronous (not synchronous). The proposal is a read-piece call that notifies the caller (with an alert) when the specified piece is downloaded and provides the piece data in a buffer. That's all.

> I'm not sure it's appropriate to discuss anacrolix/torrent-specific stuff here, but assuming it's relevant to your feature request here:

> 1. What is the purpose of having the implementation call a user specified readahead function when you already provide for the user to set the number of readahead bytes?

> As the Reader read position moves through a stream as reads occur, it might be desirable to dynamically calculate readahead, rather than have to call back in and adjust it and incur overhead in correcting the client's readahead-based calculations after it had already made them with a previous value. For example if you are reading slowly you might reduce readahead, or you might balance it dynamically across the available cache space.

You mention cache space. Is this something your torrent library maintains and that the client can configure? Or are you talking about a generic client-side caching mechanism that is unrelated to your library?

> 2. Why is your Reader implementation blocking when the programming world is moving to asynchronous interfaces that return results in an efficient non-blocking manner?

> Because people are confused about function colouring and concurrency and think that async/await syntax and callbacks somehow make "asynchronous" code possible when it's an entirely different axis. This is opinionated, but see how Haskell and Go do concurrency vs. other languages. I'm not convinced all languages have the tradeoffs figured out but they're getting there. In short, my implementation in Go does use asynchronous I/O, and it's programmed against using a blocking API. Those are the separate axes.

I'm not sure what function colouring is (attaching an "async" keyword?), though I thought Go concurrency and the async/await mechanism pioneered by C# fundamentally use the same thing under the hood: coroutines (one stackful, the other stackless).

Your view is, as you admit, opinionated, and it holds that these are different axes of consideration. Do you have a post or article in mind that would help explain this view, so that I may learn more about it?

> anacrolix/torrent will wake up blocked Readers when their data is available (which could be immediately), and as many blocks/pieces as are available are stitched together as necessary to satisfy the Reader interface constraints.

I don't doubt your implementation is efficient, but why is it programmed against using a blocking API, and not an async/nonblocking API?

> I don't know if libtorrent supports evicting pieces like anacrolix/torrent, but if it does, a synchronous API, or an asynchronous one that ensures that data is delivered before it is evicted would be required to match the behaviour.

You talk about evicting pieces, so I guess your library maintains a cache of some sort that is user-visible/configurable? Why?

Why do pieces necessarily need to be in a cache or main memory to be delivered to callers as part of a stream oriented Reader API? If they happen to be already downloaded and on disk or in an OS buffer (and not in the library maintained cache) can't they still be read and then delivered / stitched together with relevant pieces to be delivered to the caller?

@anacrolix

This has become offtopic to the libtorrent repo, so I decline to answer your questions here.

@stale

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


4 participants