doctor: run DC connectivity probes in parallel #485
Merged
Conversation
Each DC dial uses a 10s timeout, and `checkNetwork` iterates over 6 DCs sequentially, so the worst case is ~60s when egress is broken. Probing in parallel collapses the worst case to a single timeout window while preserving the existing DC-ordered output. Refs 9seconds#482
0a42a75 to
a9011c0
Compare
9seconds
reviewed
May 4, 2026
    var wg sync.WaitGroup
    for i, dc := range dcs {
        wg.Add(1)
Owner
Just a nitpick: recent Go versions offer a slightly more idiomatic way of doing this: https://pkg.go.dev/sync#WaitGroup.Go
Collaborator
Author
Switched to wg.Go, thanks.
    err := d.checkNetworkAddresses(ntw, essentials.TelegramCoreAddresses[dc])
    if err == nil {
    for i, dc := range dcs {
        if errs[i] == nil {
Owner
There is a flaw in the logic: if checkNetworkAddresses panics (for any reason), nothing will be written to the array.
Collaborator
Author
Added a recover inside the goroutine that stores the panic as that DC's error, so one bad probe no longer kills the whole doctor run.
Address review feedback on 9seconds#485:
- switch to sync.WaitGroup.Go (Go 1.25+) for the per-DC goroutine
- recover panics inside the goroutine and record them as that DC's error, so a single panicking probe no longer crashes the whole doctor run and the remaining DCs still report their results
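A minimal sketch of the recover-per-goroutine pattern described in this commit, not the actual mtg code. `probeFunc` and `probeAll` are hypothetical names, and the probe is a stub rather than a real dial. The sketch uses the `wg.Add`/`wg.Done` pair so it builds on older toolchains; on Go 1.25+ the same body can be passed to `wg.Go` instead.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// probeFunc is a hypothetical stand-in for the per-DC connectivity check.
type probeFunc func(dc int) error

// probeAll runs probe for every DC concurrently and returns the errors
// in the same order as dcs. A panicking probe is converted into an error
// for its own slot instead of crashing the process.
func probeAll(dcs []int, probe probeFunc) []error {
	errs := make([]error, len(dcs))

	var wg sync.WaitGroup
	for i, dc := range dcs {
		wg.Add(1)
		// On Go 1.25+, wg.Go(func() { ... }) can replace Add/Done here.
		go func(i, dc int) {
			defer wg.Done()
			// Deferred calls run last-in-first-out, so the recover below
			// stores the error before wg.Done releases the waiter.
			defer func() {
				if r := recover(); r != nil {
					errs[i] = fmt.Errorf("probe for DC %d panicked: %v", dc, r)
				}
			}()
			// Each goroutine writes only its own slot, so no mutex is needed.
			errs[i] = probe(dc)
		}(i, dc)
	}
	wg.Wait()

	return errs
}

func main() {
	dcs := []int{1, 2, 3}
	errs := probeAll(dcs, func(dc int) error {
		switch dc {
		case 2:
			panic("boom") // simulated bug in one probe
		case 3:
			return errors.New("dial timeout")
		}
		return nil
	})

	// Results are reported in DC order regardless of completion order.
	for i, dc := range dcs {
		if errs[i] == nil {
			fmt.Printf("DC %d: ok\n", dc)
		} else {
			fmt.Printf("DC %d: %v\n", dc, errs[i])
		}
	}
}
```

Because the recovered panic lands in the same slot an ordinary error would, the reporting loop needs no special case for it.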
Summary
The "Validate native network connectivity" section of `mtg doctor` dials each Telegram DC sequentially, with a 10s timeout per DC. When egress to Telegram is broken (e.g. the host has no native route, only a proxy chain), this means waiting ~60s (6 DCs × 10s) for the section to finish.
This PR runs the per-DC dials concurrently using `sync.WaitGroup`, collects results, then prints them in DC order so the output is unchanged. Worst case becomes a single ~10s window.
No timeout is changed; this is purely a concurrency change. The same applies regardless of whether `network.proxies` is configured — it speeds up the broken-egress case for everyone.
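To illustrate the timing argument, here is a small self-contained sketch under stated assumptions: `dialDC` and `checkAll` are hypothetical names, the dial is simulated by blocking until the context deadline (mimicking broken egress), and the timeout is shortened to 100ms so the example finishes quickly. It is not the code from this PR.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// dialDC is a hypothetical stand-in for a real dial. It blocks until the
// context expires, which simulates a host with broken egress.
func dialDC(ctx context.Context, dc int) error {
	<-ctx.Done()
	return fmt.Errorf("DC %d: %w", dc, ctx.Err())
}

// checkAll dials every DC concurrently, each with its own timeout, and
// returns the errors in the same order as dcs plus the total elapsed time.
func checkAll(dcs []int, timeout time.Duration) ([]error, time.Duration) {
	start := time.Now()
	errs := make([]error, len(dcs))

	var wg sync.WaitGroup
	for i, dc := range dcs {
		wg.Add(1)
		go func(i, dc int) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), timeout)
			defer cancel()
			errs[i] = dialDC(ctx, dc) // each goroutine owns slot i
		}(i, dc)
	}
	wg.Wait()

	return errs, time.Since(start)
}

func main() {
	dcs := []int{1, 2, 3, 4, 5, 6}
	timeout := 100 * time.Millisecond // assumed value; the PR keeps 10s

	errs, elapsed := checkAll(dcs, timeout)

	// Print in DC order, independent of which goroutine finished first.
	for i := range dcs {
		fmt.Println(errs[i])
	}
	fmt.Printf("elapsed: %v (sequential worst case: %v)\n",
		elapsed.Round(10*time.Millisecond), time.Duration(len(dcs))*timeout)
}
```

Since all six timeouts run at once, the elapsed time should stay close to one timeout window rather than six, while the output order still follows `dcs`.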
Refs #482. Pairs nicely with #484 (which makes the section skippable entirely) but is independent and useful on its own.
Test plan