Ignore read and write errors if vio has been cleared #1522
Here is a workaround for issue #1401. I also ran into cores on the read path. |
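To make the idea in the PR title concrete, here is a minimal sketch of "ignore the error if the VIO has been cleared". It is not the patch itself: `Continuation`, `Vio`, `IoState` and `deliver_error` are hypothetical stand-ins, and it assumes that a cleared VIO can be recognized by a null continuation pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the Traffic Server types.
struct Continuation {
  int error_events = 0; // how many error callbacks this handler received
};

struct Vio {
  Continuation *cont = nullptr; // assumption: null once the VIO has been cleared
};

struct IoState {
  Vio vio;
  bool error = false; // set when the poller reports an error for this side
};

// Delivers a pending error to the side's continuation.
// Returns true if a callback was made, false if the error was ignored.
bool deliver_error(IoState &side)
{
  if (!side.error) {
    return false; // nothing pending
  }
  side.error = false; // consume the flag either way
  if (side.vio.cont == nullptr) {
    return false; // VIO was cleared: ignore instead of calling a null handler
  }
  ++side.vio.cont->error_events;
  return true;
}

// Usage: a cleared VIO swallows the error, a live one receives it.
void example_cleared_vio()
{
  IoState read_side;
  read_side.error = true;
  assert(!deliver_error(read_side)); // cleared VIO, error ignored

  Continuation sm;
  read_side.vio.cont = &sm;
  read_side.error    = true;
  assert(deliver_error(read_side));
  assert(sm.error_events == 1);
}
```

The same check would apply to the write side; only the read side is shown to keep the sketch short.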
FreeBSD build successful! See https://ci.trafficserver.apache.org/job/freebsd-github/1660/ for details. |
Linux build successful! See https://ci.trafficserver.apache.org/job/linux-github/1556/ for details. |
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/92/ for details. |
clang-analyzer build successful! See https://ci.trafficserver.apache.org/job/clang-analyzer-github/224/ for details. |
Looks reasonable to me. |
Testing this on docs. It's similar to #1444, which did not completely solve the problems for us there; hopefully the additions to the read case help. |
Still looking good on Docs, but would like to give it at least another 24 hours before we declare victory. |
To review #947 again, my suggestion is:
Before PR#947:
The goal of PR#947 is to make iocore notify the SM of EPOLLERR even when read.enabled is set to 0. vc->net_read_io is designed to call back the SM, and the Net sub-system must ensure that the vio can be called back. |
or we can use:
keep the old condition, then add the new condition. |
I think it never calls the handler with read.enabled = 0 (or write.enabled = 0). That means the handler does not want this type of event. It may never handle this event and just assert! |
@oknet Thanks for the suggestion. I am running 7.1.0 in production with the change you mention above instead of this PR.
|
FreeBSD build failed! See https://ci.trafficserver.apache.org/job/freebsd-github/1680/ for details. |
I updated the PR to use oknet's recommendation. I am going to test another version of this fix:
|
Linux build failed! See https://ci.trafficserver.apache.org/job/linux-github/1576/ for details. |
Given the mixing of || and &&, I'd highly suggest superfluous parentheses for clarity. nm... looks like this was done. |
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/112/ for details. |
Linux build successful! See https://ci.trafficserver.apache.org/job/linux-github/1577/ for details. |
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/113/ for details. |
FreeBSD build successful! See https://ci.trafficserver.apache.org/job/freebsd-github/1681/ for details. |
Actually, according to jtest, there is still a problem in my test after this patch.
We get the error event and call back the SM, but the SM does not want to handle this write event because write.enabled is 0, so the SM asserts! The backtrace is the same as in #1531. Here it is on 6.x.x:
|
recommendation from Oknet
@scw00 What are |
FreeBSD build successful! See https://ci.trafficserver.apache.org/job/freebsd-github/1684/ for details. |
|
Linux build successful! See https://ci.trafficserver.apache.org/job/linux-github/1580/ for details. |
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/116/ for details. |
@bryancall @scw00 After #947, we must change the SM to handle the EVENT_ERROR event just like EVENT_TIMEOUT. Before #947, an ERROR indicated a VIO error; after #947, an ERROR indicates a VC error. We should also combine vc->read.error and vc->write.error into one event if vc->read._cont and vc->write._cont point to the same SM, just like the EVENT_TIMEOUT callback that we do in UnixNetVC::mainEvent(). |
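The suggestion to fold the read and write errors into one event could look roughly like the sketch below. All names (`StateMachine`, `Side`, `Conn`, `signal_errors`) are hypothetical and only model the idea; this is not how iocore is actually structured.

```cpp
#include <cassert>

// Hypothetical model: one state machine may own both sides of a connection.
struct StateMachine {
  int errors_seen = 0;
};

struct Side {
  StateMachine *cont = nullptr; // handler registered for this side, if any
  bool error         = false;   // pending error flag for this side
};

struct Conn {
  Side read;
  Side write;
};

// Signals pending errors and returns the number of callbacks made.
// If both sides point at the same handler, it is notified only once.
int signal_errors(Conn &vc)
{
  int calls              = 0;
  StateMachine *notified = nullptr;

  if (vc.read.error && vc.read.cont != nullptr) {
    ++vc.read.cont->errors_seen;
    notified = vc.read.cont;
    ++calls;
  }
  if (vc.write.error && vc.write.cont != nullptr && vc.write.cont != notified) {
    ++vc.write.cont->errors_seen;
    ++calls;
  }
  vc.read.error  = false;
  vc.write.error = false;
  return calls;
}

// Usage: one SM on both sides gets a single combined error event.
void example_combined_error()
{
  StateMachine sm;
  Conn vc;
  vc.read  = {&sm, true};
  vc.write = {&sm, true};
  assert(signal_errors(vc) == 1);
  assert(sm.errors_seen == 1);
}
```

With two distinct handlers, each one would still receive its own event.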
clang-analyzer build successful! See https://ci.trafficserver.apache.org/job/clang-analyzer-github/248/ for details. |
I think the issue raised by @scw00 is different. It looks like what @zwoop identified in issue #1531. In that case the write vio errored, so the write vio is being sent up as data to a handler expecting only a read vio. In the error case it shouldn't matter whether it is a read or a write vio and the error clean up should occur regardless. |
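As a rough illustration of that last point, a handler can run its error cleanup without looking at which VIO arrived as event data. The `Event`, `Vio` and `ReadOnlyHandler` names below are made up for this sketch and do not correspond to the real state machine code.

```cpp
#include <cassert>

// Hypothetical event and VIO types for illustration only.
enum class Event { ReadReady, Error };

struct Vio {
  int id;
};

// A handler that only expects data events for its read VIO.
struct ReadOnlyHandler {
  Vio read_vio{1};
  Vio write_vio{2};
  bool cleaned_up = false;
  int reads       = 0;

  // Returns true if the event was handled.
  bool handle(Event ev, const Vio *data)
  {
    if (ev == Event::Error) {
      // Same cleanup whichever VIO carried the error; `data` is not inspected.
      cleaned_up = true;
      return true;
    }
    if (cleaned_up) {
      return false; // nothing to do after cleanup
    }
    if (data != &read_vio) {
      return false; // data events are only expected for the read VIO
    }
    ++reads;
    return true;
  }
};

// Usage: an error that arrives with the write VIO still triggers cleanup.
void example_error_any_vio()
{
  ReadOnlyHandler h;
  assert(h.handle(Event::Error, &h.write_vio));
  assert(h.cleaned_up);
}
```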
I ran my production box overnight with the latest workaround. I think this change is a step in the right direction and we should merge it and bring it back to 7.1. |
Linux build successful! See https://ci.trafficserver.apache.org/job/RAT-github/2/ for details. |
Agree!! |
After apache#947 (c1ac5f) and apache#1522 (a128d5), the EVENT_ERROR caused by EPOLLERR is sent to read.vio._cont first and then to write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check for these operations before we call back write.vio._cont. The SM would receive EVENT_ERROR twice if write.vio._cont == read.vio._cont.
After #947 (c1ac5f) and #1522 (a128d5), the EVENT_ERROR caused by EPOLLERR is sent to read.vio._cont first and then to write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check for these operations before we call back write.vio._cont. The SM would receive EVENT_ERROR twice if write.vio._cont == read.vio._cont. (cherry picked from commit aee3f3b)
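The missing check described in this commit message can be sketched as follows: after the read-side callback returns, look at the connection state again before calling the write side. `Handler`, `Conn` and `dispatch_error` are hypothetical names, and the `closed` and `write_shutdown` flags are an assumed way of recording what the reader did.

```cpp
#include <cassert>
#include <functional>

struct Conn;

// Hypothetical handler: counts errors and may react by changing the connection.
struct Handler {
  std::function<void(Conn &)> on_error; // optional reaction to the error
  int errors = 0;
};

struct Conn {
  Handler *read_cont  = nullptr;
  Handler *write_cont = nullptr;
  bool closed         = false; // assumption: set when a handler closes the VC
  bool write_shutdown = false; // assumption: set on shutdown(WRITE)
};

static void call_back(Handler &h, Conn &vc)
{
  ++h.errors;
  if (h.on_error) {
    h.on_error(vc);
  }
}

// Dispatches one error to the read side, then to the write side.
// Returns the number of callbacks made.
int dispatch_error(Conn &vc)
{
  int calls       = 0;
  Handler *reader = vc.read_cont;
  if (reader != nullptr) {
    call_back(*reader, vc);
    ++calls;
  }
  // Re-check: the reader may have closed the VC or shut down the write side.
  if (vc.closed || vc.write_shutdown) {
    return calls;
  }
  Handler *writer = vc.write_cont;
  if (writer != nullptr && writer != reader) { // avoid a second event for the same handler
    call_back(*writer, vc);
    ++calls;
  }
  return calls;
}
```

In this model, a reader whose `on_error` sets `closed` causes `dispatch_error` to return after a single callback, so the writer is never called on a connection that is already gone.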
…s to the VConnection" this reverts PRs apache#1559, apache#1522 and apache#947 This reverts commit c1ac5f8.
This reverts PRs apache#1559, apache#1522 and apache#947. PR apache#947 made the HTTP state machine unstable and led to crashes in production like apache#1930, apache#1559, apache#1522, apache#1531 and apache#1629. This reverts commit c1ac5f8.