Skip to content

doctor: run DC connectivity probes in parallel#485

Merged
9seconds merged 2 commits into
9seconds:masterfrom
dolonet:doctor/parallel-dc
May 4, 2026
Merged

doctor: run DC connectivity probes in parallel#485
9seconds merged 2 commits into
9seconds:masterfrom
dolonet:doctor/parallel-dc

Conversation

@dolonet
Copy link
Copy Markdown
Collaborator

@dolonet dolonet commented Apr 29, 2026

Summary

mtg doctor's Validate native network connectivity section dials each Telegram DC sequentially, with a 10s timeout per DC. When egress to Telegram is broken (e.g. host has no native route, only proxy chain) this means waiting ~60s (6 DCs × 10s) for the section to finish.

This PR runs the per-DC dials concurrently using `sync.WaitGroup`, collects results, then prints them in DC order so the output is unchanged. Worst case becomes a single ~10s window.

No timeout is changed; this is purely a concurrency change. Same applies regardless of whether `network.proxies` is configured — it speeds up the broken-egress case for everyone.

Refs #482. Pairs nicely with #484 (which makes the section skippable entirely) but is independent and useful on its own.

Test plan

  • go vet ./internal/cli/ clean
  • go build ./... clean
  • Manual: run `mtg doctor` against a config where Telegram DCs are unreachable; confirm the section completes in ~10s instead of ~60s, and DC results print in numeric order (1, 2, 3, 4, 5, 203)

Each DC dial uses a 10s timeout, and "checkNetwork" iterates 6 DCs
sequentially, so worst case is ~60s when egress is broken. Probing in
parallel collapses the worst case to a single timeout window while
preserving the existing DC-ordered output.

Refs 9seconds#482
Comment thread internal/cli/doctor.go Outdated

var wg sync.WaitGroup
for i, dc := range dcs {
wg.Add(1)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick: with latest Golang there is a bit more idiomatic way of doing that: https://pkg.go.dev/sync#WaitGroup.Go

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to wg.Go, thanks.

Comment thread internal/cli/doctor.go
err := d.checkNetworkAddresses(ntw, essentials.TelegramCoreAddresses[dc])
if err == nil {
for i, dc := range dcs {
if errs[i] == nil {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a flaw in the logic: if checkNetworkAdresses panics (for any reason), then nothing will be written in the array.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a recover inside the goroutine that stores the panic as that DC's error, so one bad probe no longer kills the whole doctor run.

Address review feedback on 9seconds#485:
- switch to sync.WaitGroup.Go (Go 1.25+) for the per-DC goroutine
- recover panics inside the goroutine and record them as that DC's
  error, so a single panicking probe no longer crashes the whole
  doctor run and the remaining DCs still report their results
@9seconds 9seconds merged commit 827bbd6 into 9seconds:master May 4, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants