
Next Announce after error is too long #4873

Closed · Jerrk opened this issue Jul 11, 2020 · 21 comments

Jerrk commented Jul 11, 2020

libtorrent version (or branch): 1.2.7.0

platform/architecture: Unraid 6.8.3, Docker 19.03.5, Deluge 2.0.4.dev38

compiler and compiler version:

Not sure if this is a libtorrent or a Deluge-specific issue; if it's the latter, sorry for wasting your time.

I'm running ~5000 torrents in Deluge, and a lot of them are getting the "Error: Connection timed out" tracker status while seeding after completion, with the "next announce" time then set to something like 20 hours (I haven't caught this happening live, so I'm not sure of the cause or the actual announce time that gets set).
This is giving me a lot of HnRs on private trackers, so I was wondering whether it's possible to change this maximum re-announce time after an error to something more conservative, like 4-5 hours. Or maybe there is some other setting I can change to fix this issue.

Maybe changing max_concurrent_http_announce would fix my issue? Not sure what to set it to, though.

I've been scrolling through the ltConfig plugin within Deluge, but I haven't seen any obvious setting that would affect the announce time.

ghost commented Jul 12, 2020

It's due to the tracker_backoff timer.
You can read more here:
https://www.libtorrent.org/reference-Settings.html

There should be a maximum limit on this, I guess, like 30 minutes or an hour. It doesn't make sense to wait 20 hours to contact a failed tracker... @arvidn
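
For reference, a minimal sketch of how tracker_backoff could be lowered through libtorrent's settings_pack (the same key name ltConfig exposes); the value 100 below is just an illustration, not a recommendation:

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/settings_pack.hpp>

int main()
{
	lt::settings_pack pack;
	// tracker_backoff is a percentage factor applied to the retry delay.
	// the default is 250; lower values make the back-off grow more slowly
	pack.set_int(lt::settings_pack::tracker_backoff, 100);

	lt::session ses(pack);
	// ... add torrents and run the session as usual ...
}
```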

arvidn (Owner) commented Jul 12, 2020

Yes, there should be an upper limit on how long to wait. Exponential back-off is a really important feature when trackers go down because they are overwhelmed by traffic; everyone needs to slow down in such a situation. If a tracker has a normal announce interval of an hour, the max back-off had better be much longer than an hour. Perhaps 20 hours is a bit much, though probably not by that much.

However, if trackers go down because they are buggy or run on unreliable machines or connections, the back-off just makes it take longer to announce.

From the client's point of view, there may not be a way to tell the difference.

I think it would make sense to add a setting max_tracker_backoff and default it to 5 hours.
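
A rough sketch of what that cap might look like; max_tracker_backoff is only the setting proposed here, it does not exist in libtorrent, and the delay expression is the retry formula quoted further down this thread:

```cpp
#include <algorithm>
#include <chrono>

// hypothetical: clamp the exponential back-off at the proposed max_tracker_backoff
std::chrono::seconds next_announce_delay(int fails, int tracker_backoff
	, std::chrono::seconds max_tracker_backoff = std::chrono::hours(5))
{
	// retry formula as quoted later in this thread, in seconds
	std::chrono::seconds const delay(5 + 5 * tracker_backoff / 100 * fails * fails);
	return std::min(delay, max_tracker_backoff);
}
```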

@Jerrk, when this happens, do you have access to the fail counter of the tracker, i.e. the number of times it has failed in a row?

ghost commented Jul 12, 2020

In my opinion this doesn’t help.

If a tracker is not down but your connection is, does it make sense to delay the announce even more?

How do you determine that it’s the tracker which is down and not your connection?

In this era of Gbit-connected servers, can a tracker really be overwhelmed by legitimate traffic up to a Gbit?
If it's hit by a DDoS, the operator should be hosting somewhere else, with protection.

All this would have made sense 10 years ago, when servers merely had a few Mbps of capacity and weak hardware.

Also, other clients like uTorrent don't do backoffs; they keep retrying at regular intervals. So just one type of client backing off doesn't help the tracker either; it'd require the whole swarm to be using libtorrent-based clients.

Jerrk (Author) commented Jul 12, 2020

> It's due to the tracker_backoff timer.
> You can read more here:
> https://www.libtorrent.org/reference-Settings.html
>
> There should be a maximum limit on this, I guess, like 30 minutes or an hour. It doesn't make sense to wait 20 hours to contact a failed tracker... @arvidn

I see, I will try lowering it to 200 to start off with and monitor its effects.

> Yes, there should be an upper limit on how long to wait. Exponential back-off is a really important feature when trackers go down because they are overwhelmed by traffic; everyone needs to slow down in such a situation. If a tracker has a normal announce interval of an hour, the max back-off had better be much longer than an hour. Perhaps 20 hours is a bit much, though probably not by that much.
>
> However, if trackers go down because they are buggy or run on unreliable machines or connections, the back-off just makes it take longer to announce.
>
> From the client's point of view, there may not be a way to tell the difference.
>
> I think it would make sense to add a setting max_tracker_backoff and default it to 5 hours.
>
> @Jerrk, when this happens, do you have access to the fail counter of the tracker, i.e. the number of times it has failed in a row?

Not explicitly, no. The Deluge thin client shows this much info, and the webUI doesn't show anything more:
[screenshot: tracker timeout]

For this specific tracker I have around 500 torrents running; some of them have the error, while most of them do not and announce properly.

> In my opinion this doesn't help.
>
> If a tracker is not down but your connection is, does it make sense to delay the announce even more?
>
> How do you determine that it's the tracker which is down and not your connection?
>
> In this era of Gbit-connected servers, can a tracker really be overwhelmed by legitimate traffic up to a Gbit?
> If it's hit by a DDoS, the operator should be hosting somewhere else, with protection.
>
> All this would have made sense 10 years ago, when servers merely had a few Mbps of capacity and weak hardware.
>
> Also, other clients like uTorrent don't do backoffs; they keep retrying at regular intervals. So just one type of client backing off doesn't help the tracker either; it'd require the whole swarm to be using libtorrent-based clients.

This would be more in line with my thoughts as well: with so many torrents running in my client, too many of them trying to announce at once might be what's causing the timeouts. The fact that most torrents from the same tracker announce properly points me to the same conclusion.

arvidn (Owner) commented Jul 12, 2020

> In this era of Gbit-connected servers, can a tracker really be overwhelmed by legitimate traffic up to a Gbit?

I wouldn't expect the bandwidth of the connection to be the limiting factor, but rather shitty PHP code, slow databases and slow drives that take forever to handle a request.

arvidn (Owner) commented Jul 12, 2020

@An0n666

> Also, other clients like uTorrent don't do backoffs; they keep retrying at regular intervals.

Do you know what that interval is?

The formula for retrying is:

delay = 5 + 5 * x / 100 * fails^2

where x is the tracker_backoff setting (which defaults to 250). With a setting of 100, the retry times would be:

1: 5 + 5 seconds
2: 5 + 10 seconds
3: 5 + 20 seconds
4: 5 + 40 seconds

With the default setting of 250, it would grow more than twice as fast.

arvidn (Owner) commented Jul 12, 2020

> In my opinion this doesn't help.

It would help in the sense that it would be possible to disable the exponential backoff.

arvidn (Owner) commented Jul 12, 2020

Looking a bit closer at this: as far as I can tell, there's supposed to be an upper limit on retrying failed trackers already, hard-coded to 60 minutes.

ghost commented Jul 12, 2020

@Jerrk, try lowering max_concurrent_http_announce from 50 to 20, then start/stop or force-announce all torrents. Check whether any of them time out... and if they do, what's their retry time?
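
A sketch of that experiment against the libtorrent 1.2 API (note the setting is spelled max_concurrent_http_announces in settings_pack; the value 20 is the one suggested above):

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/settings_pack.hpp>
#include <libtorrent/torrent_handle.hpp>

// lower the number of simultaneous HTTP announces, then re-announce everything
void reannounce_all(lt::session& ses)
{
	lt::settings_pack pack;
	pack.set_int(lt::settings_pack::max_concurrent_http_announces, 20);
	ses.apply_settings(pack);

	for (lt::torrent_handle const& h : ses.get_torrents())
		h.force_reannounce();
}
```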

ghost commented Jul 13, 2020

> The formula for retrying is:
>
> delay = 5 + 5 * x / 100 * fails^2
>
> where x is the tracker_backoff setting (which defaults to 250). With a setting of 100, the retry times would be:
>
> 1: 5 + 5 seconds
> 2: 5 + 10 seconds
> 3: 5 + 20 seconds
> 4: 5 + 40 seconds
>
> With the default setting of 250, it would grow more than twice as fast.

At 250, it'd require 72 failures to reach an 18-hour retry time.
And to reach 72 failures, the tracker has to be down for 441 hours. I highly doubt that's the case here. Probably something else is broken.
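
A quick back-of-the-envelope check of that arithmetic, simply applying the formula quoted above with delays in seconds and the default tracker_backoff of 250 (this is not libtorrent's actual code path):

```cpp
#include <cstdio>

int main()
{
	int const tracker_backoff = 250; // default value of the setting
	double total = 0;                // cumulative time the tracker must stay down
	for (int fails = 1; fails <= 72; ++fails)
	{
		double const delay = 5 + 5.0 * tracker_backoff / 100 * fails * fails;
		total += delay;
	}
	double const last = 5 + 5.0 * tracker_backoff / 100 * 72 * 72;
	// prints roughly: delay after 72 fails: 18.0 h, cumulative: 441.1 h
	std::printf("delay after 72 fails: %.1f h, cumulative: %.1f h\n"
		, last / 3600, total / 3600);
}
```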

ghost commented Jul 13, 2020

I couldn't find a fail count in announce_entry. So I'm assuming this count is an aggregate of all fail counts from all endpoints for that particular tracker?

If it's an aggregate, then if one of the endpoints always fails but another one works, won't that keep increasing the count exponentially for the failed endpoint even if the tracker is working? Or is the fail count reset for all endpoints if at least one of the endpoints works?

If it's not reset for all endpoints after a successful announce, then whenever all endpoints fail in some future announce, the fail count will already be much higher, and the retry time correspondingly longer.
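
For what it's worth, a sketch of how the per-endpoint counters look from the libtorrent 1.2 API, where announce_entry carries a list of announce_endpoint entries, each with its own fails count (assuming I'm reading the 1.2 headers right):

```cpp
#include <libtorrent/torrent_handle.hpp>
#include <libtorrent/announce_entry.hpp>
#include <cstdio>

// print the consecutive-failure count for every endpoint of every tracker
void print_tracker_fails(lt::torrent_handle const& h)
{
	for (lt::announce_entry const& tr : h.trackers())
		for (lt::announce_endpoint const& ep : tr.endpoints)
			std::printf("%s fails: %d\n", tr.url.c_str(), int(ep.fails));
}
```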

arvidn (Owner) commented Jul 21, 2020

I think this may be related to this ticket, which has a few PRs associated with it: #4851

Jerrk (Author) commented Jul 26, 2020

I changed max_concurrent_http_announce to 20 and restarted all torrents; this did not produce any timeouts.

A couple of days later, a bunch of my trackers got the same issue, as well as a "the outgoing socket was closed" error.

This happens on multiple different private trackers, so the common point between them all would be my client. Not sure why the retry time would be so long, though.

Restarting the client fixes the issue and sends a re-announce; the next announce timer is then at around 30 minutes to 1 hour, which is normal.

skuizy commented Aug 27, 2020

Subscribing to this ticket, as I have the same issue (#4479). @Jerrk simply does a better job than me at describing it :)

arvidn (Owner) commented Aug 27, 2020

The retry timeout increases exponentially; that's why it can get long. There is an upper limit, and I believe it was lowered recently, but it's still a few hours IIRC.

skuizy commented Sep 20, 2020

> > The formula for retrying is:
> >
> > delay = 5 + 5 * x / 100 * fails^2
> >
> > where x is the tracker_backoff setting (which defaults to 250). With a setting of 100, the retry times would be:
> >
> > 1: 5 + 5 seconds
> > 2: 5 + 10 seconds
> > 3: 5 + 20 seconds
> > 4: 5 + 40 seconds
> >
> > With the default setting of 250, it would grow more than twice as fast.
>
> At 250, it'd require 72 failures to reach an 18-hour retry time.
> And to reach 72 failures, the tracker has to be down for 441 hours. I highly doubt that's the case here. Probably something else is broken.

Well, 18 hours definitely qualifies as "a few hours", but it seems (to me, don't get me wrong) absurdly long :(
@An0n666 did the math, and it should only happen after almost 19 days, if announcing fails 250 consecutive times, which is not the case here.

arvidn (Owner) commented Sep 20, 2020

It's quite possible it happens after 250 failed announces in a row, even if those announces happen close together. Is there any logic in the client that triggers more announces than libtorrent's built-in timers would?

If a tracker is down for 19 days in a row, 18 hours to retry does not seem unreasonable to me. What are the chances that the tracker will be back up on day 20?

arvidn (Owner) commented Sep 20, 2020

The announce entry has a fail counter, btw. Is that exposed in the UI?

skuizy commented Sep 20, 2020

It would make sense if we waited for 19 days indeed, but we only wait a few hours...

I don't see anything in the Deluge code that would override libtorrent's timeouts, but I might have missed something... I'm no expert.

It doesn't look like the fail counter is exposed by Deluge. Exposing it would help, as would a way to filter torrents stuck in the 'Announce sent' status so they can be tracked down.

skuizy commented Oct 1, 2020

I don't think the announce timer is related to the tracker status anymore, as I stumbled upon a new case this morning where the tracker status is OK but the announce time is infinite:
[screenshot: tracker status OK but infinite next announce]
(spoiler: this torrent wasn't seen by the tracker!)

I was then able to re-announce it, and it went into 'Announce sent' status for the next 19 hours :(

stale bot commented Dec 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Dec 30, 2020
stale bot closed this as completed on Jan 19, 2021