Summary
All three file-access functions in totp.py use synchronous subprocess.run() to call sudo cat and sudo tee. These are called from async handlers in bot.py, blocking the asyncio event loop for the duration of each subprocess. A single TOTP verification can spawn up to three sequential sudo calls, each with a 5-second timeout, meaning the event loop can be blocked for up to 15 seconds during a single verify_code() invocation.
Blocking calls
Three private functions perform synchronous subprocess calls:
_read_secret() (line 49): subprocess.run(["sudo", "-n", "cat", TOTP_SECRET_PATH], timeout=5) - reads the TOTP secret
_read_attempts() (line 73): subprocess.run(["sudo", "-n", "cat", TOTP_ATTEMPTS_PATH], timeout=5) - reads rate-limiting state
_write_attempts() (line 105): subprocess.run(["sudo", "-n", "tee", TOTP_ATTEMPTS_PATH], timeout=5) - persists rate-limiting state
These are wrapped by the public API that bot.py calls:
| Public function | Subprocess calls | Max block time |
| --- | --- | --- |
| is_totp_configured() | _read_secret() x1 | 5s (cached after first True) |
| get_lockout_remaining() | _read_attempts() x1 | 5s |
| get_failure_count() | _read_attempts() x1 | 5s |
| verify_code() | _read_attempts() + _read_secret() + _write_attempts() | 15s |
Call sites in bot.py
During a TOTP verification flow in handle_message(), the following synchronous calls execute on the event loop:
- Line 1627: is_totp_configured() - usually cached, but blocks on first call
- Line 1658: get_lockout_remaining() - one sudo call
- Line 1669: verify_code() - three sequential sudo calls
- Line 1676: get_lockout_remaining() - one sudo call (on failure path)
- Line 1683: get_failure_count() - one sudo call (on failure path)
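A failed verification attempt hits all five call sites. With is_totp_configured() not yet cached, that is up to 7 sudo calls (1 + 1 + 3 + 1 + 1) and a combined max block time of 35 seconds; with it cached, 6 calls and 30 seconds. Even on the success path (the first three call sites), that is up to 5 calls and 25 seconds, or 4 calls and 20 seconds once is_totp_configured() is cached.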
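The worst-case figures follow mechanically from the per-function counts in the table above. The following is a small tally sketch, assuming one sudo call for each of the three read-only functions, three for verify_code(), and the 5-second timeout per call. The names `SUDO_CALLS`, `SUCCESS_PATH`, `FAILURE_PATH`, and `worst_case` are illustrative and do not exist in totp.py or bot.py.

```python
# Per-call timeout, taken from the subprocess.run(..., timeout=5) calls.
TIMEOUT_S = 5

# sudo invocations per public function, as listed in the table.
SUDO_CALLS = {
    "is_totp_configured": 1,  # contributes 0 once the True result is cached
    "get_lockout_remaining": 1,
    "get_failure_count": 1,
    "verify_code": 3,
}

# Order of calls in handle_message(), as listed in the call-site list.
SUCCESS_PATH = ["is_totp_configured", "get_lockout_remaining", "verify_code"]
FAILURE_PATH = SUCCESS_PATH + ["get_lockout_remaining", "get_failure_count"]


def worst_case(path, configured_cached=False):
    """Return (sudo_calls, max_block_seconds) for a sequence of public calls."""
    calls = 0
    for name in path:
        if name == "is_totp_configured" and configured_cached:
            continue  # cached result: no subprocess is spawned
        calls += SUDO_CALLS[name]
    return calls, calls * TIMEOUT_S


for label, path in (("success", SUCCESS_PATH), ("failure", FAILURE_PATH)):
    for cached in (False, True):
        calls, seconds = worst_case(path, configured_cached=cached)
        print(f"{label:7s} cached={cached!s:5s} calls={calls} max_block={seconds}s")
```

These are upper bounds that only apply if every call runs into its timeout; they say nothing about typical latency.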
Impact
While subprocess.run() is executing, the entire asyncio event loop is frozen. No other coroutines can make progress:
- All users are affected, not just the one authenticating. Other users' messages, webhook deliveries, health checks, and cron job dispatches all stall.
- Under normal conditions, sudo -n completes in milliseconds and the blocking is negligible. But if sudo hangs (misconfigured sudoers, PAM issue, NFS-mounted /etc, disk I/O pressure), the event loop freezes for up to the 5-second timeout per call.
- The 5-second timeout per call is a hard cap, but even sub-second blocking is problematic in principle. Synchronous I/O in an async event loop violates the cooperative scheduling contract and can cause cascading latency spikes for all concurrent operations.
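The effect described above can be reproduced in isolation. The sketch below is a hypothetical demo, not code from bot.py: it runs a "heartbeat" coroutine that should wake every 10 ms and measures the largest gap between wake-ups while a slow child process runs. The child is a Python process that sleeps for 0.5 seconds, used as a stand-in for a slow `sudo -n cat`. It also shows the same call moved off the loop with `asyncio.to_thread` (Python 3.9+) for comparison.

```python
import asyncio
import subprocess
import sys
import time

# Stand-in for a slow `sudo -n cat ...`: a child process that sleeps briefly.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]


async def heartbeat(gaps, stop):
    """Record the time between consecutive wake-ups of this coroutine."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def max_gap(run_blocking_inline):
    """Return the largest heartbeat gap observed while SLOW_CMD runs."""
    gaps = []
    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(gaps, stop))
    await asyncio.sleep(0.05)  # let the heartbeat start ticking
    if run_blocking_inline:
        # Synchronous call on the loop thread: no other coroutine can run.
        subprocess.run(SLOW_CMD, timeout=5)
    else:
        # Same call in a worker thread: the loop keeps servicing other tasks.
        await asyncio.to_thread(subprocess.run, SLOW_CMD, timeout=5)
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return max(gaps)


async def main():
    inline = await max_gap(run_blocking_inline=True)
    threaded = await max_gap(run_blocking_inline=False)
    print(f"largest heartbeat gap, inline subprocess.run: {inline:.2f}s")
    print(f"largest heartbeat gap, asyncio.to_thread:     {threaded:.2f}s")
    return inline, threaded


if __name__ == "__main__":
    asyncio.run(main())
```

In the inline case the largest gap should be at least as long as the child's runtime, because the heartbeat cannot be scheduled until `subprocess.run()` returns. In the threaded case it should stay close to the 10 ms tick, though exact numbers depend on the machine.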
Why sudo is involved
The TOTP secret and rate-limiting state live in root-owned files under /etc/kai/ (mode 0600). The bot runs as user kai and accesses them via sudoers-authorized sudo -n cat and sudo -n tee commands. This is a deliberate security boundary: the kai user (and inner Claude) cannot directly read the secret or tamper with the lockout state. The subprocess-based access is correct from a security standpoint; the issue is that it uses the synchronous subprocess API instead of the async one.
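One way to keep the sudo boundary while removing the blocking is to spawn the same commands through asyncio's subprocess API. The following is a minimal sketch under stated assumptions, not the project's implementation: `run_with_timeout` is a hypothetical helper, and the demo uses a Python child process as a stand-in so it runs without sudo. Real callers would presumably pass `["sudo", "-n", "cat", TOTP_SECRET_PATH]` or `["sudo", "-n", "tee", TOTP_ATTEMPTS_PATH]`.

```python
import asyncio
import sys


async def run_with_timeout(argv, stdin_data=None, timeout=5.0):
    """Run argv without blocking the event loop; return (returncode, stdout).

    Hypothetical helper, not part of totp.py. Raises asyncio.TimeoutError if
    the child does not finish within `timeout` seconds (the child is killed).
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=(asyncio.subprocess.PIPE if stdin_data is not None
               else asyncio.subprocess.DEVNULL),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(
            proc.communicate(stdin_data), timeout
        )
    except asyncio.TimeoutError:
        try:
            proc.kill()  # do not leave a hung child behind
        except ProcessLookupError:
            pass  # child exited between the timeout and the kill
        await proc.wait()  # reap the child
        raise
    return proc.returncode, stdout


async def demo():
    # Stand-in for `sudo -n tee <path>`: echo stdin back to stdout.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    code, out = await run_with_timeout(echo, stdin_data=b"state")
    print(code, out)

    # Stand-in for a hung sudo: the timeout fires while the loop stays responsive.
    hang = [sys.executable, "-c", "import time; time.sleep(30)"]
    try:
        await run_with_timeout(hang, timeout=0.5)
    except asyncio.TimeoutError:
        print("timed out; other coroutines could run in the meantime")


if __name__ == "__main__":
    asyncio.run(demo())
```

Adopting this shape would mean the private helpers and the public functions that wrap them become coroutines, so the call sites in handle_message() would need to await them. Wrapping the existing synchronous functions in `asyncio.to_thread` is a smaller change that leaves their signatures alone.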
Severity
MEDIUM - only affects deployments with TOTP enabled (opt-in). Under normal conditions each sudo call completes in milliseconds and the blocking goes unnoticed. But it degrades poorly: any sudo slowdown (disk, PAM, sudoers misconfiguration) blocks all bot operations for all users, with no timeout visibility or graceful degradation.