WRITE_COMPLETE event in http server open causes traffic server to abort #1930

Closed
vmamidi opened this issue May 17, 2017 · 45 comments

Comments

@vmamidi
Contributor

vmamidi commented May 17, 2017

Stack trace:
ink_abort(char const*, ...)
HttpSM::state_http_server_open(int, void*)
HttpSM::main_handler(int, void*)
write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)
NetHandler::mainNetEvent(int, Event*)
EThread::execute()

and corresponding error messages:
traffic_server[33593]: {0x2aaab6d0c700} ERROR: [HttpSM::state_http_server_open] Unknown event: 103
traffic_server[33593]: FATAL: HttpSM.cc:1798: failed assertion 0

We are seeing this in 7.0.0

@scw00
Member

scw00 commented May 17, 2017

Can you print the history in HttpSM?

@vmamidi
Contributor Author

vmamidi commented May 18, 2017

I don't have any more information.

@scw00
Member

scw00 commented May 18, 2017

:(.....sad!!

@bryancall bryancall added the HTTP label May 23, 2017
@bryancall
Contributor

@vmamidi What version is this for?

@vmamidi
Contributor Author

vmamidi commented May 23, 2017

We are seeing this on 7.0.0.

@bryancall
Contributor

@vmamidi Can you post a better backtrace with line numbers in the code? bt full would be great!

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@scw00 I was able to get the history. Here it is for ATS on the 7.1.x branch.
[0] =
fileline = 0x55a3621f7912
*0x55a3621f7912 = string(0x55a3621f7912, [HttpSM.cc:630])
event = 100
reentrancy = 2
[1] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 3
[2] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 3
[3] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 4
[4] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 4
[5] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 5
[6] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 5
[7] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 6
[8] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 6
[9] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 7
[10] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 7
[11] =
fileline = 0x55a3621fa4bc
*0x55a3621fa4bc = string(0x55a3621fa4bc, [HttpSM.cc:7373])
event = 65535
reentrancy = 7
[12] =
fileline = 0x55a3621f2ebd
*0x55a3621f2ebd = string(0x55a3621f2ebd, [HttpCacheSM.cc:118])
event = 1103
reentrancy = -1
[13] =
fileline = 0x55a3621f84d6
*0x55a3621f84d6 = string(0x55a3621f84d6, [HttpSM.cc:2577])
event = 1103
reentrancy = 8
[14] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 9
[15] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 9
[16] =
fileline = 0x55a3621f7d6d
*0x55a3621f7d6d = string(0x55a3621f7d6d, [HttpSM.cc:1339])
event = 60000
reentrancy = 10
[17] =
fileline = 0x55a3621f7d98
*0x55a3621f7d98 = string(0x55a3621f7d98, [HttpSM.cc:1379])
event = 60000
reentrancy = 10
[18] =
fileline = 0x55a3621fa49e
*0x55a3621fa49e = string(0x55a3621fa49e, [HttpSM.cc:7359])
event = 65535
reentrancy = 10
[19] =
fileline = 0x55a3621fa4e9
*0x55a3621fa4e9 = string(0x55a3621fa4e9, [HttpSM.cc:7488])
event = 65535
reentrancy = 10
[20] =
fileline = 0x55a3621f2f73
*0x55a3621f2f73 = string(0x55a3621f2f73, [HttpCacheSM.cc:177])
event = 1108
reentrancy = -1
[21] =
fileline = 0x55a3621f8429
*0x55a3621f8429 = string(0x55a3621f8429, [HttpSM.cc:2458])
event = 1108
reentrancy = 11
[22] =
fileline = 0x55a3621fa4da
*0x55a3621fa4da = string(0x55a3621fa4da, [HttpSM.cc:7386])
event = 65535
reentrancy = 11
[23] =
fileline = 0x55a3621f9263
*0x55a3621f9263 = string(0x55a3621f9263, [HttpSM.cc:4578])
event = 0
reentrancy = 7
[24] =
fileline = 0x55a3621f8055
*0x55a3621f8055 = string(0x55a3621f8055, [HttpSM.cc:1980])
event = 103
reentrancy = 1
[25] =
fileline = 0x55a3621f7fd2
*0x55a3621f7fd2 = string(0x55a3621f7fd2, [HttpSM.cc:1810])
event = 104
reentrancy = 1
[26] =
fileline = 0x55a3621f96cc
*0x55a3621f96cc = string(0x55a3621f96cc, [HttpSM.cc:5485])
event = 104
reentrancy = 1
[27] =
fileline = 0x55a3621fa4da
*0x55a3621fa4da = string(0x55a3621fa4da, [HttpSM.cc:7386])
event = 65535
reentrancy = 1
[28] =
fileline = 0x55a3621f8055
*0x55a3621f8055 = string(0x55a3621f8055, [HttpSM.cc:1980])
event = 103
reentrancy = 1
[29] =
fileline = 0x55a3621f7fd2
*0x55a3621f7fd2 = string(0x55a3621f7fd2, [HttpSM.cc:1810])
event = 104
reentrancy = 1
[30] =
fileline = 0x55a3621f96cc
*0x55a3621f96cc = string(0x55a3621f96cc, [HttpSM.cc:5485])
event = 104
reentrancy = 1
[31] =
fileline = 0x55a3621fa49e
*0x55a3621fa49e = string(0x55a3621fa49e, [HttpSM.cc:7359])
event = 65535
reentrancy = 1
[32] =
fileline = 0x55a3621f7fd2
*0x55a3621f7fd2 = string(0x55a3621f7fd2, [HttpSM.cc:1810])
event = 3
reentrancy = 1
[33] =
fileline = 0x55a3621f96cc
*0x55a3621f96cc = string(0x55a3621f96cc, [HttpSM.cc:5485])
event = 3
reentrancy = 1
[34] =
fileline = 0x55a3621fa4da
*0x55a3621fa4da = string(0x55a3621fa4da, [HttpSM.cc:7386])
event = 65535
reentrancy = 1
[35] =
fileline = 0x55a3621f7ef5
*0x55a3621f7ef5 = string(0x55a3621f7ef5, [HttpSM.cc:1716])
event = 200
reentrancy = 2
[36] =
fileline = 0x55a3621f8055
*0x55a3621f8055 = string(0x55a3621f8055, [HttpSM.cc:1980])
event = 103
reentrancy = 1
[37] =
fileline = 0x55a3621f7ef5
*0x55a3621f7ef5 = string(0x55a3621f7ef5, [HttpSM.cc:1716])
event = 500
reentrancy = 1

@scw00
Member

scw00 commented Jun 1, 2017

great!

@scw00
Member

scw00 commented Jun 1, 2017

Ok I got it.

The reason is that HttpSM::state_send_server_request_header may receive the ERROR event when the read side goes down. ATS then retries another server without cleaning up the write side.

It may be related to #1629 and #1559.

@scw00
Member

scw00 commented Jun 1, 2017

@oknet

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

Looks like HttpSM is receiving VC_EVENT_ERROR and also a WRITE_COMPLETE event after EOS.

@oknet
Member

oknet commented Jun 1, 2017

ATS does a HostDB lookup to retry the next IP by round robin.

[31] =
fileline = 0x55a3621fa49e
*0x55a3621fa49e = string(0x55a3621fa49e, [HttpSM.cc:7359])
event = 65535
reentrancy = 1

The HostDB callback is at [37]:

[37] =
fileline = 0x55a3621f7ef5
*0x55a3621f7ef5 = string(0x55a3621f7ef5, [HttpSM.cc:1716])
event = 500
reentrancy = 1

I don't know why [32] shows up in the history; it should be blocked by pending_action.

Also, I noticed:

traffic_server[33593]: {0x2aaab6d0c700} ERROR: [HttpSM::state_http_server_open] Unknown event: 103

The event changed from 500 to 103. What does that mean?
Maybe a callback into HttpSM happened without taking the mutex lock first? A HostDB bug?

@oknet
Member

oknet commented Jun 1, 2017

@scw00 I think we should clean up server_entry at HttpSM.cc:7359, just as is done at HttpSM.cc:7386.

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi Did you see "Unknown event: 500"?

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@oknet Yes. It is different from the stack trace that I originally posted, but it still falls under the same category of receiving the wrong event.
The log is:
traffic_server[3732]: {0x2b23e4f0c700} ERROR: [HttpSM::state_http_server_open] Unknown event: 500

and the Stack trace is:

abort
ink_abort
HttpSM::state_http_server_open(int, void*)
HttpSM::main_handler(int, void*)
handleEvent
reply_to_cont
HostDBContinuation::dnsEvent(int, HostEnt*)
DNSEntry::postEvent(int, Event*)
EThread::process_event(Event*, int)
EThread::execute()
spawn_thread_internal

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi I think I found the reason; a patch will be pushed later. Can you reproduce it and give the patch a try?

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

What do you think is the problem?

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi The HttpSM retries the connection against the other server IPs via DNS round robin (shown at [31]).
The HttpSM should then wait for the HostDB callback before going on.
But the server_vc is still not closed, so the NetHandler gets an error from the socket fd and calls back into the HttpSM (shown at [32]).

From the HttpSM::history, [32] changes the state of the HttpSM and resets its default handler.
That puts the HttpSM into the wrong state, and the HostDB callback is delivered to the wrong default handler.
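
To make the sequence above easier to follow, here is a toy model of the race. It is only a sketch and not ATS code: `ToySM`, `Handler`, `run_retry` and `demo` are made-up names, and the only things taken from the history are the event numbers 3 and 500. It assumes, as described above, that a late event from the still-open server VC resets the default handler before the HostDB callback arrives.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Event numbers as they appear in the history in this thread.
constexpr int EVENT_ERROR  = 3;   // socket error reported by the NetHandler
constexpr int EVENT_HOSTDB = 500; // HostDB lookup callback

// Toy stand-in for HttpSM; only the default handler is modelled.
enum class Handler { HostDBLookup, ServerOpen };

struct ToySM {
  Handler default_handler = Handler::ServerOpen;
  bool server_vc_open     = true; // old origin connection, left open on retry
  std::string last_error;

  // Retry via DNS round robin: wait for HostDB, but do not close the old VC.
  void start_retry() { default_handler = Handler::HostDBLookup; }

  // Late event from the old server VC. The error path re-enters the
  // transaction logic, which resets the default handler ([32] to [34]).
  void on_server_vc_event(int /*event*/) {
    if (server_vc_open) {
      default_handler = Handler::ServerOpen;
    }
  }

  // The HostDB callback goes to whatever the default handler is right now.
  bool on_hostdb_callback() {
    if (default_handler != Handler::HostDBLookup) {
      last_error = "Unknown event: " + std::to_string(EVENT_HOSTDB);
      return false; // the real handler asserts at this point
    }
    default_handler = Handler::ServerOpen;
    return true;
  }
};

// Returns true when the HostDB callback reaches the handler that expects it.
bool run_retry(bool stale_error_arrives) {
  ToySM sm;
  sm.start_retry();
  if (stale_error_arrives) {
    sm.on_server_vc_event(EVENT_ERROR);
  }
  bool ok = sm.on_hostdb_callback();
  if (!ok) {
    std::cout << sm.last_error << '\n';
  }
  return ok;
}

// Usage: the callback is only handled when no stale event sneaks in first.
void demo() {
  assert(run_retry(false));
  assert(!run_retry(true));
}
```

In this model the failure depends purely on ordering: the same retry succeeds when the old connection stays quiet and fails when it reports an error before HostDB answers.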

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@oknet HttpSM::handle_server_setup_error at [30] should clear all of that

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi No, [30] does not clean up the server entry.

@scw00
Member

scw00 commented Jun 1, 2017

Can we do the cleanup in HttpSM::handle_server_setup_error?

@oknet
Member

oknet commented Jun 1, 2017

@scw00 I don't know. :-(

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@oknet @scw00
I don't think this is related to just DNS.

Look at this stack trace:

abort
ink_abort(char const*, ...)
HttpSM::state_http_server_open(int, void*)
HttpSM::main_handler(int, void*)
write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)
NetHandler::mainNetEvent(int, Event*)
EThread::execute()

and the corresponding log message:
traffic_server[29074]: {0x2aaab730c700} ERROR: [HttpSM::state_http_server_open] Unknown event: 103

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

Also, I am not entirely sure, but I don't think this was an issue in earlier versions of ATS.

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi
At [31], the default handler is set to HttpSM::state_hostdb_lookup.

The NetHandler calls back into the HttpSM with ERROR at [32], which then runs into [33] and [34].
At [34], the default handler is changed by:

    case HttpTransact::SM_ACTION_ORIGIN_SERVER_OPEN: {
      if (congestionControlEnabled && (t_state.congest_saved_next_action == HttpTransact::SM_ACTION_UNDEFINED)) {
        t_state.congest_saved_next_action = HttpTransact::SM_ACTION_ORIGIN_SERVER_OPEN;
        HTTP_SM_SET_DEFAULT_HANDLER(&HttpSM::state_congestion_control_lookup);
        if (!do_congestion_control_lookup()) {
          break;
        }
      }
      HTTP_SM_SET_DEFAULT_HANDLER(&HttpSM::state_http_server_open);

At [37], HostDB calls back into the HttpSM. The default handler should be HttpSM::state_hostdb_lookup, but it has been changed to HttpSM::state_http_server_open.

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi Yes, only after #947 was merged.

@scw00
Member

scw00 commented Jun 1, 2017

Hmm! In my opinion, we should clean up the server entry when we decide to try again, regardless of whether the server crashes.
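
A rough way to picture "clean up the server entry when we decide to retry" is a toy VC table where the old entry is removed before the HostDB lookup is armed. This is a sketch under assumptions, not the actual ATS data structures: `ToySM`, `attach_server`, `retry_with_cleanup`, `on_vc_event` and `demo` are hypothetical names, and erasing the map entry merely stands in for closing the VC and clearing its entry.

```cpp
#include <cassert>
#include <unordered_map>

enum class Handler { SendRequestHeader, HostDBLookup, ServerOpen };

// Toy VC table: maps a connection id to the handler that receives its events.
struct ToySM {
  Handler default_handler = Handler::SendRequestHeader;
  std::unordered_map<int, Handler> vc_table;

  void attach_server(int vc_id) { vc_table[vc_id] = Handler::SendRequestHeader; }

  // Retry: drop the old server entry first, then wait for HostDB.
  void retry_with_cleanup(int old_vc_id) {
    vc_table.erase(old_vc_id); // stands in for closing the VC and clearing the entry
    default_handler = Handler::HostDBLookup;
  }

  // Returns true if the event was delivered, false if it was dropped.
  bool on_vc_event(int vc_id) {
    if (vc_table.find(vc_id) == vc_table.end()) {
      return false; // no entry: this connection is no longer ours
    }
    default_handler = Handler::ServerOpen; // error path re-enters transaction logic
    return true;
  }
};

// Usage: a late event for the cleaned-up connection leaves the handler alone.
void demo() {
  ToySM sm;
  sm.attach_server(7);
  sm.retry_with_cleanup(7);
  bool delivered = sm.on_vc_event(7);
  assert(!delivered);
  assert(sm.default_handler == Handler::HostDBLookup);
}
```

With the entry gone, a late ERROR or WRITE_COMPLETE for the old connection has nowhere to go, so the default handler is still the HostDB one when the lookup completes.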

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@oknet Yes, I agree as far as the history I pasted here is concerned, but I don't think that is the only scenario.

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@scw00 Yes, I agree.

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@oknet Here is another history.

        [0] = 
           fileline = 0x2aaaaaf2caef
              *0x2aaaaaf2caef = string(0x2aaaaaf2caef, [HttpSM.cc:628])
           event = 100
           reentrancy = 2
        [1] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 3
        [2] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 3
        [3] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 4
        [4] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 4
        [5] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 5
        [6] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 5
        [7] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 6
        [8] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 6
        [9] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 7
        [10] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 7
        [11] = 
           fileline = 0x2aaaaaf2d046
              *0x2aaaaaf2d046 = string(0x2aaaaaf2d046, [HttpSM.cc:7260])
           event = 65535
           reentrancy = 7
        [12] = 
           fileline = 0x2aaaaaf257e1
              *0x2aaaaaf257e1 = string(0x2aaaaaf257e1, [HttpCacheSM.cc:118])
           event = 1103
           reentrancy = -1
        [13] = 
           fileline = 0x2aaaaaf2cfdf
              *0x2aaaaaf2cfdf = string(0x2aaaaaf2cfdf, [HttpSM.cc:2573])
           event = 1103
           reentrancy = 8
        [14] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 9
        [15] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 9
        [16] = 
           fileline = 0x2aaaaaf2d147
              *0x2aaaaaf2d147 = string(0x2aaaaaf2d147, [HttpSM.cc:1337])
           event = 60000
           reentrancy = 10
        [17] = 
           fileline = 0x2aaaaaf2ce6a
              *0x2aaaaaf2ce6a = string(0x2aaaaaf2ce6a, [HttpSM.cc:1377])
           event = 60000
           reentrancy = 10
        [18] = 
           fileline = 0x2aaaaaf2d028
              *0x2aaaaaf2d028 = string(0x2aaaaaf2d028, [HttpSM.cc:7246])
           event = 65535
           reentrancy = 10
        [19] = 
           fileline = 0x2aaaaaf2d028
              *0x2aaaaaf2d028 = string(0x2aaaaaf2d028, [HttpSM.cc:7246])
           event = 65535
           reentrancy = 10
        [20] = 
           fileline = 0x2aaaaaf2d073
              *0x2aaaaaf2d073 = string(0x2aaaaaf2d073, [HttpSM.cc:7375])
           event = 65535
           reentrancy = 10
        [21] = 
           fileline = 0x2aaaaaf2581d
              *0x2aaaaaf2581d = string(0x2aaaaaf2581d, [HttpCacheSM.cc:177])
           event = 1108
           reentrancy = -1
        [22] = 
           fileline = 0x2aaaaaf2cac3
              *0x2aaaaaf2cac3 = string(0x2aaaaaf2cac3, [HttpSM.cc:2454])
           event = 1108
           reentrancy = 11
        [23] = 
           fileline = 0x2aaaaaf2d064
              *0x2aaaaaf2d064 = string(0x2aaaaaf2d064, [HttpSM.cc:7273])
           event = 65535
           reentrancy = 11
        [24] = 
           fileline = 0x2aaaaaf2c6e4
              *0x2aaaaaf2c6e4 = string(0x2aaaaaf2c6e4, [HttpSM.cc:4545])
           event = 0
           reentrancy = 7
        [25] = 
           fileline = 0x2aaaaaf2ceb2
              *0x2aaaaaf2ceb2 = string(0x2aaaaaf2ceb2, [HttpSM.cc:1978])
           event = 103
           reentrancy = 1
        [26] = 
           fileline = 0x2aaaaaf2c7c6
              *0x2aaaaaf2c7c6 = string(0x2aaaaaf2c7c6, [HttpSM.cc:5621])
           event = 65535
           reentrancy = 1
        [27] = 
           fileline = 0x2aaaaaf2c90c
              *0x2aaaaaf2c90c = string(0x2aaaaaf2c90c, [HttpSM.cc:3465])
           event = 105
           reentrancy = 0
        [28] = 
           fileline = 0x2aaaaaf2ceb2
              *0x2aaaaaf2ceb2 = string(0x2aaaaaf2ceb2, [HttpSM.cc:1978])
           event = 105
           reentrancy = 1
        [29] = 
           fileline = 0x2aaaaaf2cae0
              *0x2aaaaaf2cae0 = string(0x2aaaaaf2cae0, [HttpSM.cc:5384])
           event = 105
           reentrancy = 1
        [30] = 
           fileline = 0x2aaaaaf2c588
              *0x2aaaaaf2c588 = string(0x2aaaaaf2c588, [HttpSM.cc:3585])
           event = 105
           reentrancy = 1
        [31] = 
           fileline = 0x2aaaaaf2cf0b
              *0x2aaaaaf2cf0b = string(0x2aaaaaf2cf0b, [HttpSM.cc:2743])
           event = 2301
           reentrancy = 2
        [32] = 
           fileline = 0x2aaaaaf2ced0
              *0x2aaaaaf2ced0 = string(0x2aaaaaf2ced0, [HttpSM.cc:5272])
           event = 0
           reentrancy = 2
        [33] = 
           fileline = 0x2aaaaaf2d064
              *0x2aaaaaf2d064 = string(0x2aaaaaf2d064, [HttpSM.cc:7273])
           event = 65535
           reentrancy = 2
        [34] = 
           fileline = 0x2aaaaaf2cf8c
              *0x2aaaaaf2cf8c = string(0x2aaaaaf2cf8c, [HttpSM.cc:1714])
           event = 103

Stack trace:

abort
ink_abort
HttpSM::state_http_server_open(int, void*)
HttpSM::main_handler(int, void*)
handleEvent
write_signal_and_update
write_signal_done(int, NetHandler*, UnixNetVConnection*)
write_to_net_io
NetHandler::mainNetEvent(int, Event*)

and corresponding log message:

traffic_server[33674]: {0x2aaab470c700} ERROR: [HttpSM::state_http_server_open] Unknown event: 103

@oknet
Member

oknet commented Jun 1, 2017

@vmamidi What version is the last HttpSM::history for?

@oknet
Member

oknet commented Jun 1, 2017

[25] WRITE_COMPLETE
[27] TIMEOUT
[31] HTTP_TUNNEL_EVENT_DONE

Is this a POST request ?

If it is a POST request with a post transform enabled, this is a retry after the transform finished.
It is a different issue.

PR #2034 only covers the first HttpSM::history.
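
When reading these dumps it can help to decode the event numbers. Below is a small lookup limited to the numbers that have been identified in this thread; it is a sketch, and `describe_event` and `demo` are made-up helpers rather than anything in ATS. The names for 103, 105, 2301 and 500 are the ones used in the comments here, while 3 as ERROR and 104 as EOS are read off the first history together with the earlier remarks about ERROR and EOS. Everything else is reported as unknown rather than guessed.

```cpp
#include <cassert>
#include <string>

// Maps an event number from HttpSM::history to the name used in this thread.
std::string describe_event(int event) {
  switch (event) {
  case 3:
    return "ERROR";
  case 103:
    return "WRITE_COMPLETE";
  case 104:
    return "EOS";
  case 105:
    return "TIMEOUT";
  case 500:
    return "HostDB callback";
  case 2301:
    return "HTTP_TUNNEL_EVENT_DONE";
  default:
    return "unknown (" + std::to_string(event) + ")";
  }
}

// Usage: decode the three entries called out for the second history.
void demo() {
  assert(describe_event(103) == "WRITE_COMPLETE");        // [25]
  assert(describe_event(105) == "TIMEOUT");               // [27]
  assert(describe_event(2301) == "HTTP_TUNNEL_EVENT_DONE"); // [31]
}
```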

@scw00
Member

scw00 commented Jun 1, 2017

POST retry is currently broken because the tunnel has already consumed all of the POST data by the time the origin server (OS) goes down. That is why we see [27] TIMEOUT.

@oknet
Member

oknet commented Jun 1, 2017

The transformed data is dropped by the HttpSM if the server fails. The HttpSM cannot rebuild the tunnel chain when it retries the server connection.

The server connection retry mechanism has some bugs; my suggestion is to set the retries to 0 to disable the mechanism.

CONFIG proxy.config.http.connect_attempts_max_retries INT 0
CONFIG proxy.config.http.connect_attempts_max_retries_dead_server INT 0

If congestion.config is used, then also set live_os_conn_retries and dead_os_conn_retries to 0.

@vmamidi
Contributor Author

vmamidi commented Jun 1, 2017

@oknet It is 7.0.0.

Yes, ATS connection retries are broken and can cause a crash. That is why I was saying that a fix in the DNS path only solves one scenario.

@scw00
Member

scw00 commented Jun 5, 2017

In the second history, we can see that the client POST times out ([27]) and we send a 408 to the client. Then the server side receives a timeout and retries [28].

The problem is that we did not close the VC directly.

That was fixed in 8.0.x.

@oknet
Member

oknet commented Jun 5, 2017

@vmamidi Please try to fix the second scenario by backporting #1583.

@vmamidi
Contributor Author

vmamidi commented Jun 5, 2017

@oknet @scw00 @zwoop I am okay with having patches for each of the histories that we have on this issue, but I am worried that the state machine can now receive events for past states, and that this is because of something we did in #947, as pointed out by @oknet. As of now, I don't think the state machine is robust enough to handle events for past states; if that is expected behavior, we should make sure that HttpSM is robust enough to handle it.

@vmamidi
Contributor Author

vmamidi commented Jun 5, 2017

@oknet @scw00 @zwoop @bryancall @SolidWallOfCode I understand the scenario mentioned in #947, but the state machine can now receive unexpected events. We have to either fix the state machine to meet the new expectation, or revert #947 and come up with a different solution that fixes the mentioned scenario without breaking the old expectation. I personally do not think the current approach of leaving the state machine in a not-so-robust state is a good idea.
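
For illustration only, one generic way a state machine can tolerate events that belong to a past state is to tag every armed callback with a generation number and drop anything stale. Nobody in this thread proposed this and it is not how HttpSM works; `TaggedEvent`, `GuardedSM`, `arm` and `deliver` are hypothetical names, and the sketch only shows what "robust against events for past states" could mean in practice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical event envelope: the event plus the generation it was armed in.
struct TaggedEvent {
  int event;
  std::uint32_t generation;
};

struct GuardedSM {
  std::uint32_t generation = 0;
  int handled              = 0;
  int dropped              = 0;

  // Every state transition invalidates callbacks armed in earlier states.
  void enter_new_state() { ++generation; }

  // Arm a callback in the current state.
  TaggedEvent arm(int event) const { return {event, generation}; }

  // Deliver only events that belong to the current state.
  bool deliver(const TaggedEvent &ev) {
    if (ev.generation != generation) {
      ++dropped; // event for a past state: ignore it instead of asserting
      return false;
    }
    ++handled;
    return true;
  }
};

// Usage: a WRITE_COMPLETE (103) armed before the retry is dropped, while the
// HostDB callback (500) armed after the transition is handled.
void demo() {
  GuardedSM sm;
  TaggedEvent write_done = sm.arm(103);
  sm.enter_new_state();
  TaggedEvent hostdb = sm.arm(500);
  assert(!sm.deliver(write_done));
  assert(sm.deliver(hostdb));
}
```

The trade-off is that every event source has to carry the tag, which is a much larger change than closing the old VC on retry or reverting #947.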

@zwoop
Contributor

zwoop commented Jun 5, 2017

This might be worthwhile to take to the mailing list (dev@trafficserver). I'm OK either way, but I think it would be better discussed and decided on the mailing list (I'm not sure who, if anyone, actually reads these Issue comments :-).

@zwoop zwoop added this to the 7.1.0 milestone Jun 6, 2017
@zwoop zwoop added this to Critical issues in 7.x releases Jun 6, 2017
@zwoop
Contributor

zwoop commented Jun 7, 2017

Everyone: I think we're at the point now of considering backing out #947. Is this agreeable to everyone?

@scw00
Member

scw00 commented Jun 8, 2017

Agree!

@vmamidi vmamidi mentioned this issue Jun 8, 2017
@oknet
Member

oknet commented Jun 8, 2017

+1

@zwoop zwoop removed this from Critical issues in 7.x releases Jun 8, 2017
vmamidi added a commit to vmamidi/trafficserver that referenced this issue Jun 8, 2017
This reverts PRs apache#1559, apache#1522 and apache#947

PR apache#947 made the HTTP state machine unstable and lead to crashes in production like apache#1930 apache#1559 apache#1522 apache#1531 apache#1629

This reverts commit c1ac5f8.
zwoop pushed a commit that referenced this issue Jun 8, 2017
This reverts PRs #1559, #1522 and #947

PR #947 made the HTTP state machine unstable and lead to crashes in production like #1930 #1559 #1522 #1531 #1629

This reverts commit c1ac5f8.
@zwoop zwoop removed this from the 7.1.0 milestone Jun 23, 2017
@zwoop
Contributor

zwoop commented Jun 23, 2017

I'm removing the 7.1.0 Milestone from this Issue, but keeping it open for now as there are valuable discussions here.

@bryancall
Contributor

If this is still an issue please reopen.
