traffic_server deadlocked after config reload #1298

@randall

A configuration change (an update to parent.config) was pushed to a set of servers. Shortly after the config reload (via traffic_ctl), one host's traffic_cop started failing heartbeats, and the ATS process stopped serving traffic. Attaching gdb, I see a number of threads attempting to do a hostdb lookup. The filesystem has a host.db.syncing file dated soon after the reload.

At the time of the reload, there were approximately 800 active server connections. The same configuration was applied to 23 other hosts at the same time, all of which reloaded without issue.

syslog:

Jan  3 16:29:37 s_sys@host traffic_manager[7156]: {0x7f6ec8ffe700} NOTE: User has changed config file parent.config
Jan  3 16:29:45 s_sys@host traffic_server[7169]: {0x2aaab470c700} NOTE: loading SSL certificate configuration from /opt/user/etc/trafficserver/ssl_multicert.config
Jan  3 16:34:50 s_sys@host traffic_cop[7154]: (test) read timeout [180000 ]
Jan  3 16:34:50 s_sys@host traffic_cop[7154]: server heartbeat failed [1]
Jan  3 16:38:00 s_sys@host traffic_cop[7154]: (test) read timeout [180000 ]
Jan  3 16:38:00 s_sys@host traffic_cop[7154]: server heartbeat failed [2]

/var/cache/trafficserver:

[user@host trafficserver]$ ls -altr
total 28
drwxr-xr-x. 10 root    root     4096 Oct 17 09:00 ..
-rw-r--r--   1 user user 12029 Jan  3 16:31 host.db
drwxr-xr-x   2 user user  4096 Jan  3 16:31 .
-rw-r--r--   1 user user  4109 Jan  3 16:32 host.db.syncing

stack.txt
