Monitoring and Diagnostics
A blocked proxy is silent. mtg keeps serving Telegram traffic on the loopback while every probe from the outside hits a RST or sees a mismatched cert. This page is a field guide: how to tell whether mtg is healthy, whether the host looks like the website it claims to be, and what to look at first when "the proxy stopped working."
Everything below has been tested on Ubuntu 24.04 with mtg v2 (PR #461,
PR #462 branches at 9seconds/mtg). Commands are copy-paste ready.
mtg doctor /path/to/config.toml runs five validation passes and exits
non-zero (1) if any of them fails. It is the single most useful
command to run after editing config or moving the proxy to a new host.
| Section | What it does | What "OK" looks like | What failure means |
|---|---|---|---|
| Deprecated options | Warns on `domain-fronting-ip`, `domain-fronting-port`, `domain-fronting-proxy-protocol`, `network.doh-ip` | `All good` | Migrate to the new `[domain-fronting]` / `dns` keys before v2.3.0 |
| Time skewness | Hits `0.pool.ntp.org` once, compares drift against `tolerate-time-skewness` (default 5s) | `Time drift is X, but tolerate-time-skewness is 5s` (green check) | At >70% of the tolerance, FakeTLS will reject many real client connections. Fix NTP. |
| Native network connectivity | TCP-dials each Telegram DC (1..5) on its public IPs, honours `prefer-ip` filter | `DC 1` ... `DC 5` all green | Outbound to Telegram is blocked or filtered. Check egress firewall, ASN reputation. |
| Fronting domain | TCP-dials the host from the secret (or `domain-fronting.ip`) on port 443 | `host:443 is reachable` | The fronting domain itself is unreachable. mtg will fail every cover-traffic relay. |
| SNI-DNS match | Resolves the hostname encoded in the secret, checks at least one A/AAAA record matches the host's public IP | `IP address X matches secret hostname Y` | The hostname doesn't point to your VPS. Censors block on this alone. See Surviving Active Probing. |
- Time check is single-shot: one NTP query, one server. If that query times out you get `cannot access ntp pool` and the whole invocation returns failure even when the local clock is fine. Re-run.
- `prefer-ip = "only-ipv4"` filters DC addresses before dialing. If you set IPv4-only on a host with broken IPv4 routing, the DC checks will fail loudly. That is the intended behaviour.
- Public IP autodetect uses ifconfig.co. On a host that can't reach ifconfig.co (firewalled egress) doctor falls through to "cannot detect public IP address" and the SNI-DNS check fails. Set `public-ipv4`/`public-ipv6` in the config to bypass.
- The fronting check is TCP-only. It opens a socket and closes it. It does not validate that the remote actually serves TLS, so a hijacked port-443-anything will pass.
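
Because the time check is single-shot, a gating script may want to retry before declaring failure. The helper below is a minimal sketch of that idea; `retry` is a hypothetical name and is not part of mtg.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used.
# Hypothetical helper for illustration; not shipped with mtg.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0                      # success: stop retrying
        [ "$attempt" -ge "$max" ] && return 1 # out of attempts
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (assumes the config path used elsewhere on this page):
#   retry 3 10 mtg doctor /etc/mtg/config.toml
```

A retry only hides the transient NTP timeout. A genuine failure (blocked DCs, SNI-DNS mismatch) still fails on every attempt and the helper returns non-zero.
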
mtg doctor runs every section before exiting: the exit code is 1 if any
section failed and 0 otherwise. That makes it suitable for
cron / pre-deploy gating:

mtg doctor /etc/mtg/config.toml || { echo "doctor failed"; exit 1; }

Starting with PR #461, mtg run performs the SNI-DNS check at startup
and emits a warning to the log if the secret's hostname does not resolve
to the host's public IP. It does not abort — the proxy starts
anyway. The intent is to catch operators who deployed without running
mtg doctor.
mtg logs JSON via zerolog (logger/zerolog.go). On mismatch you'll see
something like:
{
"level": "warning",
"hostname": "www.vk.com",
"resolved": "87.240.190.78, 87.240.190.67",
"public_ip": "65.108.5.233",
"ipv4_match": "false",
"message": "SNI-DNS mismatch: secret hostname does not resolve to this server's public IP. DPI may detect and block the proxy. See 'mtg doctor' for details"
}

Other variants from the same code path:

- `"SNI-DNS check: cannot resolve secret hostname"`: DNS failure on the proxy host. Often means egress DNS is broken or the secret encodes a typo.
- `"SNI-DNS check: cannot detect public IP address; set public-ipv4/public-ipv6 in config or run 'mtg doctor'"`: same ifconfig.co reachability issue as in doctor.

journalctl -u mtg --since "10 min ago" | grep -i "SNI-DNS"
# Or, if you're running mtg under docker-compose:
docker logs mtg 2>&1 | grep -i "SNI-DNS"

If you see the warning, fix DNS (point your domain at the VPS) or
regenerate the secret with a domain that already points there
(mtg generate-secret --hex your.domain). Then restart mtg.
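
The comparison behind this warning can be reproduced by hand before restarting. The sketch below assumes `dig` is installed and that ifconfig.co is reachable. `matches_public_ip` is a hypothetical helper, not an mtg command.

```shell
# matches_public_ip PUBLIC_IP: read resolved addresses (one per line) on stdin
# and succeed if one of them is exactly PUBLIC_IP.
# Hypothetical helper for illustration; not part of mtg.
matches_public_ip() {
    [ -n "$1" ] || return 2   # no public IP detected: treat as an error
    grep -qxF -- "$1"         # whole-line, fixed-string match
}

# Usage on the proxy host (DOMAIN is a placeholder for your hostname):
#   dig +short A "$DOMAIN" | matches_public_ip "$(curl -4 -s https://ifconfig.co)" \
#       || echo "SNI-DNS mismatch"
```

The whole-line match matters: a substring match would accept `1.2.3.40` for `1.2.3.4`.
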
The single most useful black-box test is to be the censor and see what your proxy returns to a probe. mtg's domain-fronting fallback fires when the FakeTLS handshake doesn't authenticate; a probe from a censor doesn't hold the secret, so it always falls through to fronting.
PROXY_IP=65.108.5.233 # your VPS public IP
PROXY_PORT=443 # whatever mtg is bound to (or its TLS frontend)
DOMAIN=your.domain.example # the hostname encoded in the secret

This is what a passive probe does first.
openssl s_client -connect "$PROXY_IP:$PROXY_PORT" \
    -servername "$DOMAIN" -brief </dev/null

Expected:
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Peer certificate: CN = your.domain.example
Verification: OK
The CN/SAN must match `$DOMAIN` and the chain must verify. If you see
`Verification error: self-signed certificate` or `unable to get local issuer certificate`,
your fronting domain is not serving a publicly trusted certificate (for example from Let's
Encrypt), and that is a giveaway.
A probing crawler rotates SNI to detect MTProto proxies that break on non-MTProto handshakes.
openssl s_client -connect "$PROXY_IP:$PROXY_PORT" \
    -servername "random$(date +%s).example.org" -brief </dev/null

Expected on a correctly deployed proxy with a real SNI router (see Surviving Active Probing):

- Either a valid certificate from a default backend (Caddy / nginx).
- Or an `unrecognized_name` TLS alert (alert 112, which aborts the handshake), which is also normal HTTPS behaviour.

Bad signs:

- `Connection reset by peer`: kernel RST. No real TLS service behind port 443. The host looks naked. Censor flags it.
- `read:errno=0` after `Server certificate` from `$DOMAIN` for a totally unrelated SNI: mtg is always serving the fronting cert regardless of SNI, which is itself a fingerprint.

openssl s_client -connect "$PROXY_IP:$PROXY_PORT" -brief </dev/null

Expected: a valid default cert from your web stack. A real HTTPS server almost always answers without SNI (browsers don't always send it on retries). Anything that hangs or RSTs is a problem.
Run the same three commands against a known-good website on similar hosting and diff. Any structural difference (cipher list ordering, ALPN advertisement, supported_versions extension) leaks "this is not nginx".
openssl s_client -connect www.cloudflare.com:443 \
    -servername www.cloudflare.com -brief </dev/null

`-brief` cuts the noise to ~10 lines. Drop it if you need to inspect
extensions; add `-msg` to dump every TLS record.
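
When running the probes from a script, the outcomes described above can be bucketed automatically. The sketch below is an assumption-laden illustration: `classify_probe` is a hypothetical helper, and the strings it matches are taken from the outputs quoted on this page plus OpenSSL's usual `unrecognized name` alert text, which may differ between OpenSSL versions.

```shell
# classify_probe: read captured `openssl s_client` output on stdin and print
# one of: ok, bad-cert, reset, alert, unknown.
# Hypothetical helper; the matched strings are assumptions about OpenSSL output.
classify_probe() {
    out=$(cat)
    case $out in
        *"Verification: OK"*)         echo ok ;;
        *"Verification error"*)       echo bad-cert ;;
        *"Connection reset by peer"*) echo reset ;;
        *"unrecognized name"*)        echo alert ;;
        *)                            echo unknown ;;
    esac
}

# Usage (s_client writes its summary to stderr, hence 2>&1):
#   openssl s_client -connect "$PROXY_IP:$PROXY_PORT" -servername "$DOMAIN" \
#       -brief </dev/null 2>&1 | classify_probe
```

Treat `unknown` as a prompt to read the raw output, not as a pass.
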
These are kernel-level views of the proxy. mtg has no direct hook into them, but the kernel sees every probe regardless of how mtg responds.
# Established connections to mtg's port:
ss -tnp state established '( sport = :443 )'
# Anything in SYN-RECV (incoming half-open, possible scan):
ss -tn state syn-recv
# Aggregate counts by state:
ss -tan '( sport = :443 )' | awk 'NR>1 {print $1}' | sort | uniq -c

nstat reads /proc/net/netstat and /proc/net/snmp and shows deltas
since the last invocation per user. Verified counter names on Ubuntu
24.04:
# Deltas since the previous invocation, refreshed every 5 s
# (-z also prints counters whose delta is zero; -t only affects nstat's daemon mode):
watch -n 5 nstat -z TcpActiveOpens TcpPassiveOpens TcpCurrEstab \
      TcpExtTCPSynRetrans TcpExtTCPAbortOnData \
      TcpExtListenDrops TcpExtListenOverflows

What to watch:

- `TcpPassiveOpens`: incoming TCP handshakes completed. Spikes with no matching `TcpCurrEstab` increase = scan / probe burst.
- `TcpExtListenDrops` / `TcpExtListenOverflows`: accept queue full. Means mtg's `concurrency` or kernel `somaxconn` is too low under load.
- `TcpExtTCPAbortOnData`: RSTs sent because data arrived on a closed socket. A sudden spike often correlates with active probing plus mtg killing connections post-handshake.
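
The accept-queue counters lend themselves to a mechanical check. The sketch below reads `nstat`-style output (counter name in the first column, value in the second) on stdin. `accept_queue_drops` is a hypothetical helper name.

```shell
# accept_queue_drops: sum TcpExtListenDrops and TcpExtListenOverflows from
# nstat-style output on stdin and print the total (0 if neither is present).
# Hypothetical helper for illustration.
accept_queue_drops() {
    awk '$1 == "TcpExtListenDrops" || $1 == "TcpExtListenOverflows" { sum += $2 }
         END { print sum + 0 }'
}

# Usage:
#   nstat | accept_queue_drops
```

Any non-zero total over a short window is worth correlating with `mtg_concurrency_limited` before raising kernel limits.
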
For a one-shot delta snapshot (`-n` records a baseline without printing anything):

nstat -n; sleep 60; nstat

# All traffic to/from port 443, with TCP flags, no name resolution:
sudo tcpdump -i any -nn -tttt 'tcp port 443' -c 200
# RSTs only — useful for spotting probes mtg or the kernel reset:
sudo tcpdump -i any -nn 'tcp port 443 and tcp[tcpflags] & tcp-rst != 0'
# ClientHello SNI extraction (works on most distros' tcpdump):
sudo tcpdump -i any -nn -A -s 0 'tcp port 443 and (tcp[((tcp[12:1] & 0xf0) >> 2)+5:1] = 0x01)' \
  | grep -aoE '[a-z0-9.-]+\.[a-z]{2,}' | sort -u

The last recipe is approximate: it greps printable text out of TLS records that look like ClientHellos. For real SNI logging on the host, use an SNI-routing frontend (HAProxy / sslh) and read its logs instead; those are reliable.
# Find mtg's listening sockets and PID:
ss -tlnp | grep mtg
# Follow established sessions over time:
watch -n2 "ss -tnp '( sport = :443 )' | wc -l"

mtg ships a Prometheus exporter. Enable in config:
[stats.prometheus]
enabled = true
bind-to = "127.0.0.1:3129"
http-path = "/"
metric-prefix = "mtg"

Scrape it with `curl http://127.0.0.1:3129/`. Bind to loopback unless
you have a real reason: the endpoint exposes per-IP DC connection
counts.
All names are prefixed with metric-prefix (default mtg). Definitions
are in stats/init.go:
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| `mtg_client_connections` | gauge | `ip_family` (`ipv4` or `ipv6`) | Active client sessions |
| `mtg_telegram_connections` | gauge | `telegram_ip`, `dc` | Active upstream sessions to Telegram DCs |
| `mtg_domain_fronting_connections` | gauge | `ip_family` | Active sessions routed to the fronting domain (= probes + non-MTProto traffic) |
| `mtg_telegram_traffic` | counter | `telegram_ip`, `dc`, `direction` (`to_client` or `from_client`) | Bytes to/from Telegram |
| `mtg_domain_fronting_traffic` | counter | `direction` | Bytes to/from the fronting domain |
| `mtg_domain_fronting` | counter | none | Total times mtg fell through to fronting |
| `mtg_concurrency_limited` | counter | none | Sessions rejected due to concurrency cap |
| `mtg_ip_blocklisted` | counter | `ip_list` (`blocklist` or `allowlist`) | Sessions rejected by IP list |
| `mtg_replay_attacks` | counter | none | Detected SessionID replays (mtg routes them to fronting) |
| `mtg_iplist_size` | gauge | `ip_list` | Loaded entries in block/allow list |

- `rate(mtg_domain_fronting[5m])` spiking while `mtg_telegram_connections` (a gauge) stays flat → active probing or a buggy client. Cross-reference with logs (next section).
- `mtg_replay_attacks` non-zero → either active probing replaying captured ClientHellos, or your anti-replay cache is too small for legitimate session churn.
- `rate(mtg_concurrency_limited[5m]) > 0` consistently → raise `concurrency` in config; legitimate clients are being dropped.
- `mtg_client_connections{ip_family="ipv6"}` = 0 when you expect IPv6 traffic → check `prefer-ip` and IPv6 routing.
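
Without a Prometheus server, the same signals can be read straight from the exporter's text output. The sketch below sums all label combinations of one metric. `metric_sum` is a hypothetical helper, and it assumes each sample line ends with the value (no trailing timestamp).

```shell
# metric_sum NAME: read Prometheus text exposition on stdin and print the sum
# of all samples whose metric name is exactly NAME (labels are ignored).
# Hypothetical helper; assumes the value is the last field on each line.
metric_sum() {
    awk -v name="$1" '
        /^#/ { next }                       # skip HELP/TYPE comment lines
        NF >= 2 {
            i = index($1, "{")              # strip the label set, if any
            metric = (i > 0) ? substr($1, 1, i - 1) : $1
            if (metric == name) sum += $NF
        }
        END { print sum + 0 }
    '
}

# Usage (endpoint as configured above):
#   curl -s http://127.0.0.1:3129/ | metric_sum mtg_client_connections
```

Matching on the exact name keeps `mtg_domain_fronting` from also counting `mtg_domain_fronting_traffic` samples.
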
scrape_configs:
  - job_name: mtg
    static_configs:
      - targets: ['127.0.0.1:3129']
    scrape_interval: 30s

If mtg is on a remote host, run an SSH tunnel rather than exposing the endpoint publicly, then scrape `127.0.0.1:9129` on the Prometheus side:

ssh -L 9129:127.0.0.1:3129 alexey@your.proxy.host

The same events feed [stats.statsd], which speaks UDP statsd with
datadog/influxdb/graphite tag formats. It is configured the same way.
mtg logs every failed FakeTLS handshake at info level with the
message cannot read client hello, then transparently fronts. A burst
of these from many distinct source IPs is the classic active-probing
signature.
Other relevant log lines:
| Message | Level | Source line | Meaning |
|---|---|---|---|
| `cannot read client hello` | info | proxy.go:198 | Handshake failed → falling through to fronting. One per non-MTProto connection. |
| `replay attack has been detected!` | warning | proxy.go:204 | SessionID seen before. Probe or buggy client. |
| `cannot send welcome packet` | info | proxy.go:214 | Client hung up mid-handshake. Probe scanner. |
| `cannot dial to telegram` | warning | proxy.go:112 | Egress to a Telegram DC failed. |
| `cannot dial to the fronting domain` | warning | proxy.go:303 | Fronting upstream is down → probe sees a RST/reset, very bad signal. |
| `unknown DC, fallbacks` | warning | proxy.go:242 | Client requested a DC mtg doesn't know. |
| `ip was rejected by allowlist` / `ip was blacklisted` | info | proxy.go:147,155 | IP-list rejection. mtg routes to fronting anyway. |

# Count "cannot read client hello" per minute over the last hour.
# __REALTIME_TIMESTAMP is in microseconds; [:10] keeps the epoch seconds.
journalctl -u mtg --since "1 hour ago" -o json \
  | jq -r 'select(.MESSAGE | contains("cannot read client hello")) | .__REALTIME_TIMESTAMP[:10]' \
  | awk '{print strftime("%Y-%m-%d %H:%M", $1)}' \
  | sort | uniq -c | sort -rn | head

Adjust for your log format if mtg is logging to a file or to docker. A sustained rate well above baseline is suspicious; compare to a quiet hour on a working day.
sum by (host) (rate({job="mtg"} |= "cannot read client hello" [5m]))
Threshold this against the expected rate of organic non-MTProto traffic on your fronting domain. For most operator-run proxies that's near zero; a 30/min sustained rate is already a probe campaign.
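
The same threshold can be applied from cron without a log pipeline. The sketch below counts matching lines on stdin and compares them to a per-minute limit. `probe_rate_exceeded` is a hypothetical helper, and the 30/min figure is only the example rate mentioned above.

```shell
# probe_rate_exceeded WINDOW_MINUTES LIMIT_PER_MIN: read log lines on stdin and
# succeed if "cannot read client hello" occurs more than WINDOW * LIMIT times.
# Hypothetical helper for illustration.
probe_rate_exceeded() {
    window_min=$1
    limit_per_min=$2
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true.
    count=$(grep -c 'cannot read client hello' || true)
    [ "$count" -gt $((window_min * limit_per_min)) ]
}

# Usage:
#   journalctl -u mtg --since "10 min ago" \
#       | probe_rate_exceeded 10 30 && echo "possible probe campaign"
```

Pick the limit from your own quiet-hour baseline; a fixed number copied from elsewhere will either page constantly or never fire.
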
Run these in order. Stop at the first one that yields a smoking gun.
# 0. Are the basics up?
systemctl status mtg # service running?
ss -tlnp | grep mtg # listening on the right port?
ping -c2 1.1.1.1 # egress works?
# 1. Is mtg internally healthy?
mtg doctor /etc/mtg/config.toml
# Look at: SNI-DNS match, fronting reachable, all 5 DCs green.
# 2. Was the secret/domain changed recently?
sudo journalctl -u mtg --since today | grep -i "SNI-DNS\|configuration"
# 3. Does a probe from outside see a real TLS service?
# (Run from a different host; from the proxy itself it's loopback.)
openssl s_client -connect YOUR_IP:443 -servername YOUR_DOMAIN -brief </dev/null
openssl s_client -connect YOUR_IP:443 -servername bing.com -brief </dev/null
# 4. Is the fronting domain itself reachable from the VPS?
curl -vIk --resolve YOUR_DOMAIN:443:$(dig +short A YOUR_DOMAIN | head -1) \
https://YOUR_DOMAIN/
# 5. Is something filtering on the path?
# Look at TCP-level errors and RSTs over the last minute:
nstat -n                                 # record a baseline, print nothing
sleep 60
nstat | grep -E "Tcp(ActiveOpens|PassiveOpens|RetransSegs|OutRsts)|TcpExt(TCPSynRetrans|TCPAbortOn|Listen)"
# 6. Are clients reaching us?
ss -tn '( sport = :443 )' | head -20
sudo tcpdump -i any -nn -c 50 'tcp port 443 and tcp[tcpflags] & tcp-syn != 0'
# 7. Bursts of failed handshakes (probe campaign)?
journalctl -u mtg --since "1 hour ago" | grep -c "cannot read client hello"
journalctl -u mtg --since "1 hour ago" | grep -c "replay attack"
# 8. Has the host's public IP changed?
curl -4 https://ifconfig.co
curl -6 https://ifconfig.co
# Compare against `dig +short A YOUR_DOMAIN` and `dig +short AAAA YOUR_DOMAIN`.
# 9. ASN-level reputation?
whois -h whois.cymru.com " -v $(curl -4 -s https://ifconfig.co)"
# 10. From inside a censored network: does TLS even reach you?
# (Ask a user to run `openssl s_client` from the censored side.)
# If their TCP SYN never arrives at step 6, it's an L3/L4 block, not
# something mtg can fix.

If steps 1–4 pass but step 6 shows zero incoming SYNs from a censored network, you are IP-blocked at the network edge. Only an IP/ASN change or a CDN front (see Surviving Active Probing § B) helps.

If step 7 shows a sustained burst right before the block hit, you were actively probed and your active-probing surface was insufficient (see Surviving Active Probing § 1–3). Move to a real SNI router and a real co-located web service before bringing the proxy back up.

- Surviving Active Probing: deployment patterns that make mtg invisible to active DPI.
- `example.config.toml`: every option documented inline.
- stats/init.go: authoritative metric names and tags.