Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATS 10.0.0 crashes with segmentation fault SIG 11 #8952

Closed
NischalBhandari opened this issue Jul 11, 2022 · 6 comments
Closed

ATS 10.0.0 crashes with segmentation fault SIG 11 #8952

NischalBhandari opened this issue Jul 11, 2022 · 6 comments
Assignees

Comments

@NischalBhandari
Copy link

NischalBhandari commented Jul 11, 2022

I am trying to rewrite a header for a video streaming website where i am using a header_rewrite plugin to just convert a origin server response of 4xx code to 200 and also add certain headers. This essentially is a preflight request. This is not important but after the request is accepted and after that the video starts streaming but after some time the traffic server restarts .

The crashlog is a SIG 11 : Segmentation Fault but after that I do not know how to resolve the issue.
Just Before the crash in traffic.out file shows the following and a crash report is also generated but i don't know how to read the crashlog.

[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpTunnel.cc:1178 (producer_handler)> (http_redirect) [54] enable_redirection: [0 0 0] event: 104, state: 10
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:3062 (tunnel_handler_server)> (http) [54] [&HttpSM::tunnel_handler_server, VC_EVENT_EOS/TS_EVENT_VCONN_EOS]
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:3135 (tunnel_handler_server)> (http) [54] finishing HTTP tunnel
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <Http1ServerSession.cc:121 (do_io_close)> (http_ss) [93] session close: nevtc 0x2ad9801befa0
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpTunnel.cc:1344 (consumer_handler)> (http_tunnel) [54] consumer_handler [user agent VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:3341 (tunnel_handler_ua)> (http) [54] [&HttpSM::tunnel_handler_ua, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:2716 (main_handler)> (http) [54] HTTP_TUNNEL_EVENT_DONE, 2301
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:3011 (tunnel_handler)> (http) [54] [&HttpSM::tunnel_handler, HTTP_TUNNEL_EVENT_DONE]
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:8486 (clear)> (http_redirect) [PostDataBuffers::clear]
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:470 (state_remove_from_list)> (http) [54] [&HttpSM::state_remove_from_list, EVENT_NONE/VC_EVENT_NONE]
[Jul 11 08:31:07.210] [ET_NET 4] DEBUG: <HttpSM.cc:7239 (kill_this)> (http_seq) [54] Logging transaction
[Jul 11 08:31:07.212] traffic_crashlo NOTE: crashlog started, target=26750, debug=false syslog=true, uid=175 euid=0
[Jul 11 08:31:07.213] traffic_crashlo NOTE: logging to 0x9fcb30
[Jul 11 08:31:07.368] traffic_crashlo ERROR: wrote crash log to /opt/trafficserver/var/log/trafficserver/crash-2022-07-11-083107.log
traffic_server: received signal 11 (Segmentation fault)
traffic_server - STACK TRACE:
/opt/trafficserver/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x8e)[0x4ea38e]
/lib64/libpthread.so.0(+0xf630)[0x2ad9668b0630]
/lib64/libc.so.6(_IO_vfprintf+0x4a79)[0x2ad96752e079]
/opt/trafficserver/lib/libtscore.so.10(_ZNK5Diags8print_vaEPKc10DiagsLevelPK14SourceLocationS1_P13__va_list_tag+0xe88)[0x2ad965436ef8]
/opt/trafficserver/lib/libtscore.so.10(_ZNK5Diags8error_vaE10DiagsLevelPK14SourceLocationPKcP13__va_list_tag+0x27)[0x2ad965437977]
/opt/trafficserver/lib/libtscore.so.10(_ZN10LogMessage14message_helperENSt6chrono8durationIlSt5ratioILl1ELl1000000EEEERKSt8functionIFvPKcP13__va_list_tagEES7_S9_+0x42)[0x2ad96546a4d2]
/opt/trafficserver/lib/libtscore.so.10(_ZN10LogMessage23standard_message_helperE10DiagsLevelRK14SourceLocationPKcP13__va_list_tag+0x55)[0x2ad96546ada5]
/opt/trafficserver/lib/libtscore.so.10(_ZN10LogMessage7messageE10DiagsLevelRK14SourceLocationPKcz+0x7d)[0x2ad96546b2cd]
/opt/trafficserver/bin/traffic_server(_ZN12LogFieldList11marshal_aggEPc+0x107)[0x69a437]
/opt/trafficserver/bin/traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x738)[0x6b3b98]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM9kill_thisEv+0x45c)[0x59133c]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x20f)[0x5919ff]
/opt/trafficserver/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x331)[0x601de1]
/opt/trafficserver/bin/traffic_server[0x84a49e]
/opt/trafficserver/bin/traffic_server[0x84a5ce]
/opt/trafficserver/bin/traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x82a)[0x850aaa]
/opt/trafficserver/bin/traffic_server(_ZN10NetHandler18process_ready_listEv+0x2d8)[0x839118]
/opt/trafficserver/bin/traffic_server(_ZN10NetHandler15waitForActivityEl+0x24a)[0x8393ba]
/opt/trafficserver/bin/traffic_server(_ZN7EThread15execute_regularEv+0x902)[0x8a3092]
/opt/trafficserver/bin/traffic_server(_ZN7EThread7executeEv+0x1c2)[0x8a3732]
/opt/trafficserver/bin/traffic_server[0x8a0e32]
/lib64/libpthread.so.0(+0x7ea5)[0x2ad9668a8ea5]
/lib64/libc.so.6(clone+0x6d)[0x2ad9675dfb0d]
@randall
Copy link
Contributor

randall commented Jul 11, 2022

demangled stack:

/opt/trafficserver/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, void*)+0x8e)[0x4ea38e]
/lib64/libpthread.so.0(+0xf630)[0x2ad9668b0630]
/lib64/libc.so.6(_IO_vfprintf+0x4a79)[0x2ad96752e079]
/opt/trafficserver/lib/libtscore.so.10(Diags::print_va(char const*, DiagsLevel, SourceLocation const*, char const*, __va_list_tag*) const+0xe88)[0x2ad965436ef8]
/opt/trafficserver/lib/libtscore.so.10(Diags::error_va(DiagsLevel, SourceLocation const*, char const*, __va_list_tag*) const+0x27)[0x2ad965437977]
/opt/trafficserver/lib/libtscore.so.10(LogMessage::message_helper(std::chrono::duration<long, std::ratio<1l, 1000000l> >, std::function<void (char const*, __va_list_tag*)> const&, char const*, __va_list_tag*)+0x42)[0x2ad96546a4d2]
/opt/trafficserver/lib/libtscore.so.10(LogMessage::standard_message_helper(DiagsLevel, SourceLocation const&, char const*, __va_list_tag*)+0x55)[0x2ad96546ada5]
/opt/trafficserver/lib/libtscore.so.10(LogMessage::message(DiagsLevel, SourceLocation const&, char const*, ...)+0x7d)[0x2ad96546b2cd]
/opt/trafficserver/bin/traffic_server(LogFieldList::marshal_agg(char*)+0x107)[0x69a437]
/opt/trafficserver/bin/traffic_server(LogObjectManager::log(LogAccess*)+0x738)[0x6b3b98]
/opt/trafficserver/bin/traffic_server(HttpSM::kill_this()+0x45c)[0x59133c]
/opt/trafficserver/bin/traffic_server(HttpSM::main_handler(int, void*)+0x20f)[0x5919ff]
/opt/trafficserver/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0x331)[0x601de1]
/opt/trafficserver/bin/traffic_server[0x84a49e]
/opt/trafficserver/bin/traffic_server[0x84a5ce]
/opt/trafficserver/bin/traffic_server(write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)+0x82a)[0x850aaa]
/opt/trafficserver/bin/traffic_server(NetHandler::process_ready_list()+0x2d8)[0x839118]
/opt/trafficserver/bin/traffic_server(NetHandler::waitForActivity(long)+0x24a)[0x8393ba]
/opt/trafficserver/bin/traffic_server(EThread::execute_regular()+0x902)[0x8a3092]
/opt/trafficserver/bin/traffic_server(EThread::execute()+0x1c2)[0x8a3732]
/opt/trafficserver/bin/traffic_server[0x8a0e32]
/lib64/libpthread.so.0(+0x7ea5)[0x2ad9668a8ea5]
/lib64/libc.so.6(clone+0x6d)[0x2ad9675dfb0d]

@NischalBhandari
Copy link
Author

NischalBhandari commented Jul 11, 2022

As per my conversation in slack . In the diags.log This Note is also being logged

[Jul 11 22:18:11.768] [ET_NET 6] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:18:11.768] [ET_NET 6] NOTE: Cannot marshal aggregate field cquc; invalid aggregate operator: 0
[Jul 11 22:18:12.444] [ET_NET 7] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:18:13.021] [ET_NET 0] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:18:15.353] [ET_NET 7] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:18:18.263] [ET_NET 7] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:18:21.173] [ET_NET 7] NOTE: Cannot update aggregate field; invalid operator 0
.....
[Jul 11 22:21:13.924] [ET_NET 1] NOTE: Cannot update aggregate field; invalid operator 925197683
[Jul 11 22:21:13.924] [ET_NET 1] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:21:14.248] [ET_NET 1] NOTE: Cannot update aggregate field; invalid operator 925197683

@NischalBhandari
Copy link
Author

NischalBhandari commented Jul 11, 2022

The issue was in my configuration in the logging.yaml It was a bad configuration as i was trying to create a summary log. This caused the above notes as

[Jul 11 22:21:13.924] [ET_NET 1] NOTE: Cannot update aggregate field; invalid operator 925197683
[Jul 11 22:21:13.924] [ET_NET 1] NOTE: Cannot update aggregate field; invalid operator 0
[Jul 11 22:21:14.248] [ET_NET 1] NOTE: Cannot update aggregate field; invalid operator 925197683

SO these have been mitigated now. But i am again seeing some crashes with I think are related to configuration as well

[Jul 11 22:39:28.361] [ET_NET 3] DEBUG: <HttpTransact.cc:2015 (OSDNSLookup)> (http_trans) [20] DNS lookup for O.S. successful IP: 45.115.218.207
[Jul 11 22:39:28.361] [ET_NET 3] DEBUG: <HttpTransact.cc:2087 (OSDNSLookup)> (http_trans) Next action SM_ACTION_API_OS_DNS; HandleCacheOpenReadHit
[Jul 11 22:39:28.361] [ET_NET 3] DEBUG: <HttpSM.cc:7540 (call_transact_and_set_next_state)> (http) [20] State Transition: SM_ACTION_DNS_LOOKUP -> SM_ACTION_API_OS_DNS
[Jul 11 22:39:28.363] traffic_crashlo NOTE: crashlog started, target=5423, debug=false syslog=true, uid=175 euid=0
[Jul 11 22:39:28.364] traffic_crashlo NOTE: logging to 0x98ab30
[Jul 11 22:39:28.504] traffic_crashlo ERROR: wrote crash log to /opt/trafficserver/var/log/trafficserver/crash-2022-07-11-223928.log
traffic_server: received signal 11 (Segmentation fault)
traffic_server - STACK TRACE:
/opt/trafficserver/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x8e)[0x4ea38e]
/lib64/libpthread.so.0(+0xf630)[0x2b47b3ca9630]
/opt/trafficserver/bin/traffic_server(_ZN12HttpTransact22HandleCacheOpenReadHitEPNS_5StateE+0x22)[0x5e54b2]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM32call_transact_and_set_next_stateEPFvPN12HttpTransact5StateEE+0x1e1)[0x57a911]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM17state_api_calloutEiPv+0x811)[0x58db41]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x5eb)[0x599f8b]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM16do_hostdb_lookupEv+0x255)[0x57d6d5]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x7d2)[0x59a172]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM17state_api_calloutEiPv+0x811)[0x58db41]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x5eb)[0x599f8b]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM17state_api_calloutEiPv+0x811)[0x58db41]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x5eb)[0x599f8b]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM21state_cache_open_readEiPv+0x3b3)[0x5902b3]
/opt/trafficserver/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x198)[0x591988]
/opt/trafficserver/bin/traffic_server(_ZN11HttpCacheSM21state_cache_open_readEiPv+0x343)[0x6135e3]
/opt/trafficserver/bin/traffic_server(_ZN7CacheVC8callcontEi+0xaf)[0x7746cf]
/opt/trafficserver/bin/traffic_server(_ZN7CacheVC17openReadStartHeadEiP5Event+0x605)[0x7700c5]
/opt/trafficserver/bin/traffic_server(_ZN7CacheVC14handleReadDoneEiP5Event+0x26d)[0x74a54d]
/opt/trafficserver/bin/traffic_server(_ZN19AIOCallbackInternal11io_completeEiPv+0x39)[0x74df89]
/opt/trafficserver/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x30f)[0x8a195f]
/opt/trafficserver/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x186)[0x8a1f06]
/opt/trafficserver/bin/traffic_server(_ZN7EThread15execute_regularEv+0x18d)[0x8a291d]
/opt/trafficserver/bin/traffic_server(_ZN7EThread7executeEv+0x1c2)[0x8a3732]
/opt/trafficserver/bin/traffic_server[0x8a0e32]
/lib64/libpthread.so.0(+0x7ea5)[0x2b47b3ca1ea5]
/lib64/libc.so.6(clone+0x6d)[0x2b47b49d8b0d]

I am monitoring the situation and will update .

@NischalBhandari
Copy link
Author

NischalBhandari commented Jul 12, 2022

so i did a gdb debugging for the crash and using bt in gdb i got

#0  response_get (this=<optimized out>) at /home/nischal/trafficserver/proxy/hdrs/HTTP.h:1388
#1  HttpTransact::HandleCacheOpenReadHit (s=0x2b5da8e4fc20) at HttpTransact.cc:2803
#2  0x000000000057a911 in HttpSM::call_transact_and_set_next_state (this=0x2b5da8e4fb20, f=<optimized out>) at HttpSM.cc:7534
#3  0x000000000058db41 in HttpSM::state_api_callout (this=0x2b5da8e4fb20, event=<optimized out>, data=<optimized out>) at HttpSM.cc:1620
#4  0x0000000000599f8b in HttpSM::set_next_state (this=0x2b5da8e4fb20) at HttpSM.cc:7576
#5  0x000000000057d6d5 in HttpSM::do_hostdb_lookup (this=this@entry=0x2b5da8e4fb20) at HttpSM.cc:4315
#6  0x000000000059a172 in HttpSM::set_next_state (this=0x2b5da8e4fb20) at HttpSM.cc:7877
#7  0x000000000058db41 in HttpSM::state_api_callout (this=0x2b5da8e4fb20, event=<optimized out>, data=<optimized out>) at HttpSM.cc:1620
#8  0x0000000000599f8b in HttpSM::set_next_state (this=0x2b5da8e4fb20) at HttpSM.cc:7576
#9  0x000000000058db41 in HttpSM::state_api_callout (this=0x2b5da8e4fb20, event=<optimized out>, data=<optimized out>) at HttpSM.cc:1620
#10 0x0000000000599f8b in HttpSM::set_next_state (this=0x2b5da8e4fb20) at HttpSM.cc:7576
#11 0x00000000005902b3 in HttpSM::state_cache_open_read (this=0x2b5da8e4fb20, event=1102, data=<optimized out>) at HttpSM.cc:2672
#12 0x0000000000591988 in HttpSM::main_handler (this=0x2b5da8e4fb20, event=1102, data=0x2b5da8e519f0) at HttpSM.cc:2736
#13 0x00000000006135e3 in handleEvent (data=0x2b5da8e519f0, event=1102, this=0x2b5da8e4fb20) at /home/nischal/trafficserver/iocore/eventsystem/I_Continuation.h:219
#14 HttpCacheSM::state_cache_open_read (this=0x2b5da8e51988, event=<optimized out>, data=<optimized out>) at HttpCacheSM.cc:118
#15 0x00000000007746cf in handleEvent (data=0x2b5d7c065880, event=1102, this=0x2b5da8e51988) at /home/nischal/trafficserver/iocore/eventsystem/I_Continuation.h:219
#16 CacheVC::callcont (this=this@entry=0x2b5d7c065880, event=event@entry=1102) at P_CacheInternal.h:650
#17 0x000000000076f45f in CacheVC::openReadStartEarliest (this=0x2b5d7c065880) at CacheRead.cc:996
#18 0x000000000074a54d in CacheVC::handleReadDone (this=0x2b5d7c065880, event=<optimized out>, e=<optimized out>) at Cache.cc:2289
#19 0x000000000074df89 in AIOCallbackInternal::io_complete (this=0x2b5d7c065a10, event=<optimized out>, data=<optimized out>) at /home/nischal/trafficserver/iocore/aio/P_AIO.h:122
#20 0x00000000008a195f in handleEvent (data=0x1d61b00, event=1, this=<optimized out>) at I_Continuation.h:219
#21 EThread::process_event (this=0x2b5d5c21a010, e=0x1d61b00, calling_code=1) at UnixEThread.cc:153
#22 0x00000000008a1f06 in EThread::process_queue (this=this@entry=0x2b5d5c21a010, NegativeQueue=NegativeQueue@entry=0x2b5d5cb60330, ev_count=ev_count@entry=0x2b5d5cb6032c,
    nq_count=nq_count@entry=0x2b5d5cb60328) at UnixEThread.cc:188
#23 0x00000000008a291d in EThread::execute_regular (this=this@entry=0x2b5d5c21a010) at UnixEThread.cc:244
#24 0x00000000008a3732 in EThread::execute (this=0x2b5d5c21a010) at UnixEThread.cc:349
#25 0x00000000008a0e32 in spawn_thread_internal (a=0x1d5cba0) at Thread.cc:79
#26 0x00002b5d54e77ea5 in start_thread () from /lib64/libpthread.so.0

@bneradt
Copy link
Contributor

bneradt commented Jul 12, 2022

Hi @NischalBhandari . Thank you for the update! This latter crash is different than the first. Can you please file a separate issue for that newer crash?

For the original logging crash, can you provide the logging.yaml configuration that triggered the crash? We should address things so a misconfiguration in logging does not trigger a crash.

@NischalBhandari
Copy link
Author

The error configuration that caused the issue in the logging.yaml was as follows So i think we can close the issue for this error.
I was trying to create the Summarizing Number of Requests and Total Bytes Sent Every 10 Seconds but with the client request URL which is a bad configuration as i think it could not summarize all the client request in the single location ( i Guess). Hence the error in diags and a subsequent crash.

#originwisesummary
    - name: originsummary
      format: '%<LAST(cqtq)> %<COUNT(*)> %<SUM(psql)> %<cquc>'
      interval: 10

logs 
   - filename: originsum
      format: originsummary
      mode: ascii

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants