Conversation
|
Tested the branch against the same setup from #507 (no local patches beyond building from this branch, stock 5s

Result

The multi-minute cascade is gone.

Every one of the 9 timeout events observed followed the same shape:

Secondary observation

Within the same run that triggered the timeout, the first

Representative snippet:

My best guess is that the bouncer in front of the FTP server needs a moment to release state after the poisoned connection is torn down, and it's not fully ready to negotiate a fresh

Not a blocker from my side: total wasted time went from hours/day to tens of seconds/day. I wanted to flag it in case it's worth letting the initial reconnect attempt happen after a small delay, or not attempting reconnect again within the same run after it fails once. Happy to test any variant you'd like. Thanks for the quick turnaround on this one. |
|
@com6056 Thanks a lot for this report! I pushed a follow-up for the second issue you found.

After a timeout poisons the session and ftpgrab closes it, the client now allows one reconnect attempt in the same run. If that reconnect fails, for example with

I chose that over adding an arbitrary sleep before reconnect. The expensive part here was the repeated reconnect thrash, so this change removes that behavior without guessing at a delay that may or may not help on other servers.

If you get a chance to retest, the useful things to confirm are that the poisoned session still gets discarded after the timeout, and that the rest of the same cron tick no longer repeatedly attempts the broken TLS reconnect. |
|
Tested the follow-up commit (

Results

863 runs, 45 timeout events, 0 desync cascades. The two runs over 2 minutes were legitimate large-file downloads (3.97 GB), not hangs.

The

Only one

Next scheduled tick reconnects cleanly and completes in the usual ~1 minute.

Looks good to merge from my end. Thanks for the quick fix; this has been a significant quality-of-life improvement. |
|
Awesome, thanks a lot for your feedback on this ❤️ |
fixes #507
The FTP client now keeps the state needed to reconnect lazily after timeout-shaped transfer failures.
Retrieve also stops ignoring resp.Close() failures, because that close path is where the trailing FTP control response timeout is surfaced by the jlaffaye/ftp module.

#507 describes a real failure mode where one stalled transfer poisons the control connection and turns a short run into minutes of repeated failures. FTP has no reliable in-band resync after that state is corrupted, so the correct recovery path is to discard the connection and reconnect before the next operation.