
Monitoring and Diagnostics

Alexey Dolotov edited this page Apr 25, 2026 · 2 revisions

A blocked proxy is silent. mtg keeps serving Telegram traffic on the loopback while every probe from the outside hits a RST or sees a mismatched cert. This page is a field guide: how to tell whether mtg is healthy, whether the host looks like the website it claims to be, and what to look at first when "the proxy stopped working."

Everything below has been tested on Ubuntu 24.04 with mtg v2 (PR #461, PR #462 branches at 9seconds/mtg). Commands are copy-paste ready.


1. mtg doctor — the pre-flight check

mtg doctor /path/to/config.toml runs five validation passes and exits non-zero (1) if any of them fails. It is the single most useful command to run after editing config or moving the proxy to a new host.

What each section means

| Section | What it does | What "OK" looks like | What failure means |
|---|---|---|---|
| Deprecated options | Warns on domain-fronting-ip, domain-fronting-port, domain-fronting-proxy-protocol, network.doh-ip | All good | Migrate to the new [domain-fronting] / dns keys before v2.3.0 |
| Time skewness | Hits 0.pool.ntp.org once, compares drift against tolerate-time-skewness (default 5s) | Time drift is X, but tolerate-time-skewness is 5s (green check) | At >70% of the tolerance, FakeTLS will reject many real client connections. Fix NTP. |
| Native network connectivity | TCP-dials each Telegram DC (1..5) on its public IPs, honours prefer-ip filter | DC 1 ... DC 5 all green | Outbound to Telegram is blocked or filtered. Check egress firewall, ASN reputation. |
| Fronting domain | TCP-dials the host from the secret (or domain-fronting.ip) on port 443 | host:443 is reachable | The fronting domain itself is unreachable. mtg will fail every cover-traffic relay. |
| SNI-DNS match | Resolves the hostname encoded in the secret, checks at least one A/AAAA record matches the host's public IP | IP address X matches secret hostname Y | The hostname doesn't point to your VPS. Censors block on this alone; see Surviving Active Probing. |

Caveats

  • Time check is single-shot — one NTP query, one server. If that query times out you get cannot access ntp pool and the whole invocation returns failure even when the local clock is fine. Re-run.
  • prefer-ip = "only-ipv4" filters DC addresses before dialing. If you set IPv4-only on a host with broken IPv4 routing, the DC checks will fail loudly — that's the intended behaviour.
  • Public IP autodetect uses ifconfig.co. On a host that can't reach ifconfig.co (firewalled egress) doctor falls through to "cannot detect public IP address" and the SNI-DNS check fails. Set public-ipv4 / public-ipv6 in the config to bypass.
  • The fronting check is TCP-only. It opens a socket and closes it. It does not validate that the remote actually serves TLS — a hijacked port-443-anything will pass.

Exit code

mtg doctor runs every section, then exits 1 if any of them failed and 0 otherwise. Suitable for cron / pre-deploy gating:

mtg doctor /etc/mtg/config.toml || { echo "doctor failed"; exit 1; }
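
Because the time check is single-shot, one NTP timeout can fail an otherwise healthy gate. A minimal sketch of a retrying gate follows; retry is a hypothetical helper written for this page (not part of mtg), and the attempt count and pause are arbitrary assumptions.

```shell
# Hypothetical helper, not part of mtg: run a command up to N times.
# Usage: retry <attempts> <pause_seconds> <command> [args...]
retry() {
    retry_max=$1
    retry_pause=$2
    shift 2
    retry_n=1
    while [ "$retry_n" -le "$retry_max" ]; do
        # Stop at the first successful run.
        "$@" && return 0
        # Pause between attempts, but not after the last one.
        [ "$retry_n" -lt "$retry_max" ] && sleep "$retry_pause"
        retry_n=$((retry_n + 1))
    done
    return 1
}

# Example gate (commented out so that sourcing this file has no side effects):
#   retry 3 10 mtg doctor /etc/mtg/config.toml || { echo "doctor failed"; exit 1; }
```

If doctor still fails after the retries, treat it as a real failure: a persistent NTP or DC error is not a flake.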

2. Startup warning: SNI/IP mismatch (PR #461)

Starting with PR #461, mtg run performs the SNI-DNS check at startup and emits a warning to the log if the secret's hostname does not resolve to the host's public IP. It does not abort — the proxy starts anyway. The intent is to catch operators who deployed without running mtg doctor.

What the warning looks like

mtg logs JSON via zerolog (logger/zerolog.go). On mismatch you'll see something like:

{
  "level": "warning",
  "hostname": "www.vk.com",
  "resolved": "87.240.190.78, 87.240.190.67",
  "public_ip": "65.108.5.233",
  "ipv4_match": "false",
  "message": "SNI-DNS mismatch: secret hostname does not resolve to this server's public IP. DPI may detect and block the proxy. See 'mtg doctor' for details"
}

Other variants from the same code path:

  • "SNI-DNS check: cannot resolve secret hostname" — DNS failure on the proxy host. Often means egress DNS is broken or the secret encodes a typo.
  • "SNI-DNS check: cannot detect public IP address; set public-ipv4/public-ipv6 in config or run 'mtg doctor'" — same ifconfig.co reachability issue as in doctor.

Grepping for it

journalctl -u mtg --since "10 min ago" | grep -i "SNI-DNS"

# Or, if you're running mtg under docker-compose:
docker logs mtg 2>&1 | grep -i "SNI-DNS"

If you see the warning, fix DNS (point your domain at the VPS) or regenerate the secret with a domain that already points there (mtg generate-secret --hex your.domain). Then restart mtg.
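
To reproduce the comparison by hand, the sketch below checks whether the public IP appears among the addresses the hostname resolves to. sni_dns_match is a hypothetical helper name; it only illustrates the idea of the check and is not mtg's implementation.

```shell
# Hypothetical helper: succeed if PUBLIC_IP is one of the resolved addresses.
# Usage: sni_dns_match PUBLIC_IP RESOLVED_ADDR...
sni_dns_match() {
    public_ip=$1
    shift
    for addr in "$@"; do
        [ "$addr" = "$public_ip" ] && return 0
    done
    return 1
}

# Example (needs network access, so it is left commented out):
#   sni_dns_match "$(curl -4 -s https://ifconfig.co)" \
#       $(dig +short A your.domain.example) && echo match || echo MISMATCH
```

With the values from the sample warning above, sni_dns_match 65.108.5.233 87.240.190.78 87.240.190.67 returns non-zero, which is the mismatch case.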


3. Active-probe simulation with openssl s_client

The single most useful black-box test is to be the censor and see what your proxy returns to a probe. mtg's domain-fronting fallback fires when the FakeTLS handshake doesn't authenticate; a probe from a censor doesn't hold the secret, so it always falls through to fronting.

Setup

PROXY_IP=65.108.5.233       # your VPS public IP
PROXY_PORT=443              # whatever mtg is bound to (or its TLS frontend)
DOMAIN=your.domain.example  # the hostname encoded in the secret

A. Correct SNI, no MTProto bytes

This is what a probe does first.

openssl s_client -connect "$PROXY_IP:$PROXY_PORT" \
    -servername "$DOMAIN" -brief </dev/null

Expected:

CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Peer certificate: CN = your.domain.example
Verification: OK

The CN/SAN must match $DOMAIN and the chain must verify. If you see Verification error: self-signed certificate or unable to get local issuer certificate, your fronting domain is not serving a publicly trusted certificate (from Let's Encrypt or any other public CA), and that is a giveaway.

B. Wrong / random SNI

A probing crawler rotates SNI to detect MTProto proxies that break on non-MTProto handshakes.

openssl s_client -connect "$PROXY_IP:$PROXY_PORT" \
    -servername "random$(date +%s).example.org" -brief </dev/null

Expected on a correctly deployed proxy with a real SNI router (see Surviving Active Probing):

  • Either a valid certificate from a default backend (Caddy / nginx).
  • Or an unrecognized_name TLS alert (openssl reports a failed handshake), which is also normal HTTPS behaviour.

Bad signs:

  • Connection reset by peer — kernel RST. No real TLS service behind port 443. The host looks naked. Censor flags it.
  • read:errno=0 after Server certificate from $DOMAIN for a totally unrelated SNI — means mtg is always serving the fronting cert regardless of SNI, which is itself a fingerprint.

C. No SNI at all

openssl s_client -connect "$PROXY_IP:$PROXY_PORT" -brief </dev/null

Expected: a valid default cert from your web stack. A real HTTPS server almost always answers without SNI (some legacy clients and scanners never send it). Anything that hangs or RSTs is a problem.

D. Compare to a baseline

Run the same three commands against a known-good website on similar hosting and diff. Any structural difference (cipher list ordering, ALPN advertisement, supported_versions extension) leaks "this is not nginx".

openssl s_client -connect www.cloudflare.com:443 \
    -servername www.cloudflare.com -brief </dev/null

-brief cuts the noise to ~10 lines. Drop it if you need to inspect extensions; add -msg to dump every TLS record.
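
The three probes above can be run in one pass and reduced to a one-word verdict each. This is a rough sketch: classify_probe and probe are hypothetical helper names, the matched strings are assumptions about typical OpenSSL output, and -noservername needs OpenSSL 1.1.1 or newer.

```shell
# Hypothetical helper: classify `openssl s_client -brief` output read from stdin.
# The patterns are assumptions about typical OpenSSL messages; adjust as needed.
classify_probe() {
    probe_out=$(cat)
    case $probe_out in
        *"Verification: OK"*)         echo "valid-cert" ;;
        *"Verification error"*)       echo "untrusted-cert" ;;
        *"Connection reset by peer"*) echo "reset" ;;
        *alert*)                      echo "tls-alert" ;;
        *)                            echo "unknown" ;;
    esac
}

# Hypothetical helper: run one probe against $PROXY_IP:$PROXY_PORT.
# An empty argument means "send no SNI at all".
probe() {
    if [ -n "$1" ]; then
        openssl s_client -connect "$PROXY_IP:$PROXY_PORT" \
            -servername "$1" -brief </dev/null 2>&1
    else
        openssl s_client -connect "$PROXY_IP:$PROXY_PORT" \
            -noservername -brief </dev/null 2>&1
    fi | classify_probe
}

# Example (needs network access, so it is left commented out):
#   for sni in "$DOMAIN" "random$(date +%s).example.org" ""; do
#       printf '%s -> %s\n' "${sni:-<no SNI>}" "$(probe "$sni")"
#   done
```

A "reset" or "unknown" verdict on any of the three is worth a manual look with -msg; the one-word summary is only a first filter.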


4. tcpdump / ss / nstat recipes

These are kernel-level views of the proxy. mtg has no direct hook into them, but the kernel sees every probe regardless of how mtg responds.

Live connection table (ss)

# Established connections to mtg's port:
ss -tnp state established '( sport = :443 )'

# Anything in SYN-RECV (incoming half-open, possible scan):
ss -tn state syn-recv

# Aggregate counts by state:
ss -tan '( sport = :443 )' | awk 'NR>1 {print $1}' | sort | uniq -c

Connection rate and TCP errors (nstat)

nstat reads /proc/net/netstat and /proc/net/snmp and shows deltas since the last invocation per user. Verified counter names on Ubuntu 24.04:

# Refresh every 5 s; each nstat call reports the change since the previous call:
watch -n 5 nstat TcpActiveOpens TcpPassiveOpens TcpCurrEstab \
            TcpExtTCPSynRetrans TcpExtTCPAbortOnData \
            TcpExtListenDrops TcpExtListenOverflows

What to watch:

  • TcpPassiveOpens — incoming TCP handshakes completed. Spikes with no matching TcpCurrEstab increase = scan / probe burst.
  • TcpExtListenDrops / ListenOverflows — accept queue full. Means mtg's concurrency or kernel somaxconn is too low under load.
  • TcpExtTCPAbortOnData — RSTs sent because data arrived on a closed socket. A sudden spike often correlates with active probing plus mtg killing connections post-handshake.

For a one-shot delta snapshot:

nstat -n; sleep 60; nstat
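
To turn the accept-queue counters into a yes/no signal for a cron job, something like the sketch below can filter nstat output. listen_pressure is a hypothetical helper name, and the sketch assumes nstat's usual three-column layout (counter name, value, rate).

```shell
# Hypothetical helper: read nstat output on stdin and print the accept-queue
# counters that moved. Exits 0 if any did, 1 otherwise.
listen_pressure() {
    awk '
        $1 ~ /^TcpExtListen(Drops|Overflows)$/ && $2 > 0 {
            print $1 "=" $2
            found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example (commented out because it touches the per-user nstat history):
#   nstat | listen_pressure && echo "accept queue overflowed during the interval"
```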

Packet capture (tcpdump)

# All traffic to/from port 443, with TCP flags, no name resolution:
sudo tcpdump -i any -nn -tttt 'tcp port 443' -c 200

# RSTs only — useful for spotting probes mtg or the kernel reset:
sudo tcpdump -i any -nn 'tcp port 443 and tcp[tcpflags] & tcp-rst != 0'

# ClientHello SNI extraction (works on most distros' tcpdump):
sudo tcpdump -i any -nn -A -s 0 'tcp port 443 and (tcp[((tcp[12:1] & 0xf0) >> 2)+5:1] = 0x01)' \
    | grep -aoE '[a-z0-9.-]+\.[a-z]{2,}' | sort -u

The last recipe is approximate: it greps printable text out of TLS records that look like ClientHellos. For real SNI logging on the host, use an SNI-routing frontend (HAProxy / sslh) and read its logs instead; those are reliable.

Watching mtg's own listener

# Find mtg's listening sockets and PID:
ss -tlnp | grep mtg

# Follow established sessions over time:
watch -n2 "ss -tnp '( sport = :443 )' | wc -l"

5. Prometheus stats endpoint

mtg ships a Prometheus exporter. Enable in config:

[stats.prometheus]
enabled = true
bind-to = "127.0.0.1:3129"
http-path = "/"
metric-prefix = "mtg"

Scrape it with curl http://127.0.0.1:3129/. Bind to loopback unless you have a real reason — the endpoint exposes per-IP DC connection counts.

Exposed metrics (verified in source)

All names are prefixed with metric-prefix (default mtg). Definitions are in stats/init.go:

| Metric | Type | Labels | Meaning |
|---|---|---|---|
| mtg_client_connections | gauge | ip_family (ipv4 or ipv6) | Active client sessions |
| mtg_telegram_connections | gauge | telegram_ip, dc | Active upstream sessions to Telegram DCs |
| mtg_domain_fronting_connections | gauge | ip_family | Active sessions routed to the fronting domain (= probes + non-MTProto traffic) |
| mtg_telegram_traffic | counter | telegram_ip, dc, direction (to_client or from_client) | Bytes to/from Telegram |
| mtg_domain_fronting_traffic | counter | direction | Bytes to/from the fronting domain |
| mtg_domain_fronting | counter | none | Total times mtg fell through to fronting |
| mtg_concurrency_limited | counter | none | Sessions rejected due to concurrency cap |
| mtg_ip_blocklisted | counter | ip_list (blocklist or allowlist) | Sessions rejected by IP list |
| mtg_replay_attacks | counter | none | Detected SessionID replays (mtg routes them to fronting) |
| mtg_iplist_size | gauge | ip_list | Loaded entries in block/allow list |
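
For a quick look without a Prometheus server, the text exposition can be summed per metric on the command line. metric_sum is a hypothetical helper name; the sketch assumes the standard exposition format with the value as the last field and no timestamps.

```shell
# Hypothetical helper: sum one metric across all its label sets.
# Reads Prometheus text exposition on stdin. Usage: metric_sum METRIC_NAME
metric_sum() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP / TYPE comment lines
        { split($1, parts, "{") }          # strip the {label="..."} part
        parts[1] == name { sum += $NF; seen = 1 }
        END { if (seen) print sum; else exit 1 }'
}

# Example (needs the exporter enabled, so it is left commented out):
#   curl -s http://127.0.0.1:3129/ | metric_sum mtg_client_connections
```

The helper exits non-zero when the metric is absent, which distinguishes "zero sessions" from "exporter not reporting this metric".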

What to alert on

  • rate(mtg_domain_fronting[5m]) spiking while mtg_telegram_connections stays flat → active probing or a buggy client. mtg_telegram_connections is a gauge, so compare its value or use delta(); rate() only makes sense on counters. Cross-reference with logs (next section).
  • mtg_replay_attacks increasing (it is a counter, so alert on increase() over your window rather than on the raw value) → either active probing replaying captured ClientHellos, or your anti-replay cache is too small for legitimate session churn.
  • mtg_concurrency_limited increasing consistently → raise concurrency in config; legitimate clients are being dropped.
  • mtg_client_connections{ip_family="ipv6"} = 0 when you expect IPv6 traffic → check prefer-ip and IPv6 routing.

Minimal Prometheus scrape config

scrape_configs:
  - job_name: mtg
    static_configs:
      - targets: ['127.0.0.1:3129']
    scrape_interval: 30s

If mtg is on a remote host, run an SSH tunnel rather than exposing the endpoint publicly:

ssh -L 9129:127.0.0.1:3129 alexey@your.proxy.host

statsd alternative

The same events feed [stats.statsd], which speaks UDP statsd with datadog/influxdb/graphite tag formats. Configured the same way.


6. Log-based detection of active probing

mtg logs every failed FakeTLS handshake at info level with the message cannot read client hello, then transparently fronts. A burst of these from many distinct source IPs is the classic active-probing signature.

Other relevant log lines:

| Message | Level | Source line | Meaning |
|---|---|---|---|
| cannot read client hello | info | proxy.go:198 | Handshake failed → falling through to fronting. One per non-MTProto connection. |
| replay attack has been detected! | warning | proxy.go:204 | SessionID seen before. Probe or buggy client. |
| cannot send welcome packet | info | proxy.go:214 | Client hung up mid-handshake. Probe scanner. |
| cannot dial to telegram | warning | proxy.go:112 | Egress to a Telegram DC failed. |
| cannot dial to the fronting domain | warning | proxy.go:303 | Fronting upstream is down → probe sees a RST/reset, very bad signal. |
| unknown DC, fallbacks | warning | proxy.go:242 | Client requested a DC mtg doesn't know. |
| ip was rejected by allowlist / ip was blacklisted | info | proxy.go:147,155 | IP-list rejection. mtg routes to fronting anyway. |
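
A quick way to see which of these messages dominates a log window is to count each one. message_counts below is a hypothetical helper; it does plain substring matching on the message texts from the table and makes no assumption about the rest of the log line.

```shell
# Hypothetical helper: count occurrences of the known mtg messages on stdin.
# Prints "<count><TAB><message>" for every message, including zero counts.
message_counts() {
    awk '
        BEGIN {
            msgs[1] = "cannot read client hello"
            msgs[2] = "replay attack has been detected"
            msgs[3] = "cannot send welcome packet"
            msgs[4] = "cannot dial to telegram"
            msgs[5] = "cannot dial to the fronting domain"
            total = 5
        }
        { for (i = 1; i <= total; i++) if (index($0, msgs[i])) hits[i]++ }
        END { for (i = 1; i <= total; i++) printf "%d\t%s\n", hits[i], msgs[i] }'
}

# Example (commented out; needs a running mtg unit):
#   journalctl -u mtg --since "1 hour ago" | message_counts
```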

Quick burst detector

# Count "cannot read client hello" per minute over the last hour:
journalctl -u mtg --since "1 hour ago" -o json \
  | jq -r 'select(.MESSAGE | contains("cannot read client hello")) | .__REALTIME_TIMESTAMP[:10]' \
  | awk '{print strftime("%Y-%m-%d %H:%M", $1)}' \
  | sort | uniq -c | sort -rn | head

Adjust for your log format if mtg is logging to a file or to docker. A sustained rate of >> baseline is suspicious — compare to a quiet hour on a working day.

Sample alerting query (Loki / Grafana)

sum by (host) (rate({job="mtg"} |= "cannot read client hello" [5m]))

Threshold this against the expected rate of organic non-MTProto traffic on your fronting domain. For most operator-run proxies that's near zero; a 30/min sustained rate is already a probe campaign.
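
Without Loki, the same threshold can be applied to a stream of epoch-second timestamps, one per failed handshake. burst_minutes is a hypothetical helper; the limit of 30 used in the example is the per-minute figure mentioned above, not a value taken from mtg.

```shell
# Hypothetical helper: bucket epoch-second timestamps (one per line on stdin)
# into minutes and print "<minute_start_epoch> <count>" for every minute whose
# count exceeds the limit. Usage: burst_minutes LIMIT
burst_minutes() {
    awk -v limit="$1" '
        { bucket = int($1 / 60) * 60; count[bucket]++ }
        END { for (b in count) if (count[b] > limit) print b, count[b] }' \
    | sort -n
}

# Example (commented out; needs journald and jq):
#   journalctl -u mtg --since "1 hour ago" -o json \
#     | jq -r 'select(.MESSAGE | contains("cannot read client hello")) | .__REALTIME_TIMESTAMP[:10]' \
#     | burst_minutes 30
```

Empty output means no minute in the window crossed the limit.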


7. Forensic checklist: "my proxy suddenly got blocked"

Run these in order. Stop at the first one that yields a smoking gun.

# 0. Are the basics up?
systemctl status mtg                  # service running?
ss -tlnp | grep mtg                   # listening on the right port?
ping -c2 1.1.1.1                      # egress works?

# 1. Is mtg internally healthy?
mtg doctor /etc/mtg/config.toml
# Look at: SNI-DNS match, fronting reachable, all 5 DCs green.

# 2. Was the secret/domain changed recently?
sudo journalctl -u mtg --since today | grep -i "SNI-DNS\|configuration"

# 3. Does a probe from outside see a real TLS service?
# (Run from a different host; from the proxy itself it's loopback.)
openssl s_client -connect YOUR_IP:443 -servername YOUR_DOMAIN -brief </dev/null
openssl s_client -connect YOUR_IP:443 -servername bing.com -brief </dev/null

# 4. Is the fronting domain itself reachable from the VPS?
curl -vIk --resolve YOUR_DOMAIN:443:$(dig +short A YOUR_DOMAIN | head -1) \
    https://YOUR_DOMAIN/

# 5. Is something filtering on the path?
# Look at TCP-level errors and RSTs over the last minute:
nstat -n
sleep 60
nstat | grep -E "Tcp(ActiveOpens|PassiveOpens|RetransSegs|OutRsts)|TcpExt(TCPSynRetrans|TCPAbortOn|Listen)"

# 6. Are clients reaching us?
ss -tn '( sport = :443 )' | head -20
sudo tcpdump -i any -nn -c 50 'tcp port 443 and tcp[tcpflags] & tcp-syn != 0'

# 7. Bursts of failed handshakes (probe campaign)?
journalctl -u mtg --since "1 hour ago" | grep -c "cannot read client hello"
journalctl -u mtg --since "1 hour ago" | grep -c "replay attack"

# 8. Has the host's public IP changed?
curl -4 https://ifconfig.co
curl -6 https://ifconfig.co
# Compare against `dig +short A YOUR_DOMAIN` and `dig +short AAAA YOUR_DOMAIN`.

# 9. ASN-level reputation?
whois -h whois.cymru.com " -v $(curl -4 -s https://ifconfig.co)"

# 10. From inside a censored network: does TLS even reach you?
# (Ask a user to run `openssl s_client` from the censored side.)
# If their TCP SYN never arrives at step 6, it's an L3/L4 block, not
# something mtg can fix.

If steps 1–4 pass but step 6 shows zero incoming SYNs from a censored network, you are IP-blocked at the network edge — only an IP/ASN change or a CDN front (see Surviving Active Probing § B) helps.

If step 7 shows a sustained burst right before the block hit, you were actively probed and your active-probing surface was insufficient — see Surviving Active Probing § 1–3. Move to a real SNI router and a real co-located web service before bringing the proxy back up.

