Nicer error handling after eof, warning for second response being created #2323
Conversation
1-2. Trying to make another response should probably just fail, especially if the headers of the previous one have already been sent. Perhaps this can be handled if the original one was never actually sent anywhere. 3 - I believe you are right. IDLE means that no request is being served but that the pipeline is still alive. I am not exactly sure why that also needs to test for
Thank you! Currently, the exception raised for sending some data after
Like raising an exception when the user is trying to create a second response and the first response has sent some data?
Found a tricky problem: in some circumstances, if an exception is raised while making the response, the exception handler will try to generate another response, which can lead to the exception again. I have no idea how to make progress so far. Probably need to look into the exception handling mechanism, which is complicated.
I would like to propose the change included in this PR for issues: #2219 and #2197. Basically, it prevents a second response object from being created from
Agreed. Those can be tackled in this PR.
I am not sure what you mean by fail. I agree that we should not allow multiple response instances to be generated. As the original use case from the issue shows, weird behaviors arise. However, I am not sure that raising an exception is appropriate either. It sounds like it might be an area ripe for race conditions, as the sending of the response might be in different states. The exception might be thrown inconsistently during the response. Therefore, I am more in favor of the idea that it either fails silently to the logs (probably also difficult to debug), or just returns the same response (as mentioned, what would the memory impact look like?). Clearly any attempt to change the headers after the first byte was sent should fail. Again, should it scrap the whole response or just report it to the logs? I am more inclined to allow the response to complete. Another thought popped into my head. I am not sure this is a road we want to go down, but at startup we already examine route handlers to see what kind of responses they might send. Perhaps we piggyback off that to inspect whether there might be the case for multiple responses. If we throw an exception, I would rather do it at startup.
Where could we end up in this situation? If a handler attempts to start another response or returns something after using
Now, in either of these situations, what to do with the response in progress? I believe that in any case, if the handler has not signaled EOF yet, it is wrong for Sanic to complete the response (even if it can, depending on the content-length header). It is better to disconnect the TCP connection, which will cause an error in the browser that is not pretty, but it is better than giving "successful" 200 responses with incomplete data. On the other hand, if the handler has signaled EOF, from the client's perspective there is nothing wrong, and Sanic can just silently log the error and even keep the keep-alive alive. A bit of risky business there in case we have any logic errors in Sanic, so it might still be safer to disable keep-alive at this point. No error on the client, but it will have to make another connection.
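The decision rule described here (disconnect when EOF was not signaled; otherwise log server-side and drop keep-alive) could be sketched roughly as follows. All names below are made up for illustration; this is not Sanic API:

```python
# Hypothetical sketch of the policy discussed above: what a server
# could do when a handler misbehaves after a response has started.
# Action and on_extra_response are illustrative names only.
from enum import Enum, auto


class Action(Enum):
    DISCONNECT = auto()      # close TCP; client sees an error
    LOG_AND_CLOSE = auto()   # response was complete; log, drop keep-alive


def on_extra_response(eof_signaled: bool) -> Action:
    if not eof_signaled:
        # Handler never finished its response: completing it would
        # deliver a "successful" 200 with incomplete data, so it is
        # safer to disconnect outright.
        return Action.DISCONNECT
    # Client already received a full response; log the error on the
    # server and play it safe by not reusing the connection.
    return Action.LOG_AND_CLOSE
```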
Something like this is what I had in mind:

```diff
diff --git a/sanic/mixins/routes.py b/sanic/mixins/routes.py
index a543a05..667ad64 100644
--- a/sanic/mixins/routes.py
+++ b/sanic/mixins/routes.py
@@ -20,6 +20,7 @@ from sanic.exceptions import (
     FileNotFound,
     HeaderNotFound,
     InvalidUsage,
+    SanicException,
 )
 from sanic.handlers import ContentRangeHandler
 from sanic.log import error_logger
@@ -934,6 +935,7 @@ class RouteMixin:
     def _get_response_types(self, node):
         types = set()
+        responds = 0
 
         class HttpResponseVisitor(NodeVisitor):
             def visit_Return(self, node: Return) -> Any:
@@ -954,6 +956,14 @@ class RouteMixin:
                 except AttributeError:
                     ...
 
-        HttpResponseVisitor().visit(node)
+            def visit_Call(self, node) -> Any:
+                nonlocal responds
+
+                responds += node.func.attr == "respond"
+
+        HttpResponseVisitor().visit(node)
+        if responds > 1:
+            raise SanicException(
+                "Multiple streaming responses identified in handler"
+            )
         return types
```

Obviously we would need to work on this to look at scoping, but this is the idea.
@ahopkins Is that the Visitor pattern? I am too tired to understand that code now.
Yeah. It's a super simple method to look for callables named "respond". Just meant as an illustration, not a complete solution.
@ahopkins Do you think we should still handle the second-response situation besides the code inspection of the user-specified request handler? One reason I can think of is that a second respond not only happens in the request handler, but also in Sanic's internal code, for example the exception handler. Consider the code below:

```python
@app.get("/")
async def handler(request: Request):
    resp1: HTTPResponse = await request.respond()
    await resp1.send(b"Now I'm free, free-falling")
    raise SanicException("Some text here.")
```
Currently, I plan to make the following changes as we discussed:
Does that sound good to you guys? @Tronic @ahopkins Thanks for the feedback!
I believe there should be no effort to do any of this. 2a may lead to problematic responses that accidentally combine unrelated headers, status or content types when the latter response assumes to have the default values. 2b should In 2b, as a practical example:
A very similar but maybe more common flow follows if step 3 is an exception (e.g. cannot read more data from database) and the exception is handled by Sanic, and then in step 4 the client silently appends the error message to the file it downloads, but there is no step 5 because the exception caused the handler to terminate.

The only case of another response that could be accepted is when no response packets have been sent to the client yet, and in that case this might be best implemented in http.py:

```python
def respond(self, response: BaseHTTPResponse) -> BaseHTTPResponse:
    """
    Initiate new streaming response.

    Nothing is sent until the first send() call on the returned object, and
    calling this function multiple times will just alter the response to be
    given.
    """
    if self.stage is not Stage.HANDLER:
        self.stage = Stage.FAILED
        raise RuntimeError("Response already started")
    # Disconnect any earlier but unused response object (this part is new code)
    if self.response is not None:
        self.response.stream = None
    # Connect and return the response
    self.response, response.stream = response, self
    return response
```

The ASGI mode does not use http.py and that backend might be sending the headers earlier anyway (I don't recall, @ahopkins might), but in any case it would then need separate handling of this to safely support replacing unused response objects with new ones.

```python
# This logic of determining which response to use is subject to change
if response is None:
    response = (self.stream and self.stream.response) or HTTPResponse(
```

The above part in request.py is written by me and seems to be a part of the problem. I now believe it is wrong to use an existing response object (this would silently ignore any status, headers and content type parameters).
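To make the failure mode concrete, here is a minimal sketch of why reusing an existing response object silently drops the later call's parameters. The classes and functions below are stand-ins for illustration only, not Sanic's actual implementation:

```python
# Stand-in response type; not Sanic's HTTPResponse.
class FakeResponse:
    def __init__(self, status=200, headers=None):
        self.status = status
        self.headers = headers or {}


def respond_reusing(existing, status=200, headers=None):
    # Problematic variant: hands back the earlier object, silently
    # ignoring the status/headers passed to this call.
    return existing if existing is not None else FakeResponse(status, headers)


def respond_fresh(existing, status=200, headers=None):
    # Proposed behavior: always honor the caller's parameters by
    # building a new response object.
    return FakeResponse(status, headers)


first = FakeResponse(200)
# respond_reusing(first, status=404) still carries status 200,
# while respond_fresh(first, status=404) carries status 404.
```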
This is a good thing, and is what I proposed in the previous message.
I would very much like to see this PR included in 21.12 LTS. Looks like it fixes some rather nasty and dangerous bugs.
I am for getting this one in. We have time, but it is December now, so that means we need to finish this in the next week or so. I agree that the checks on the subsequent respond calls are unnecessary and too complicated. I am okay with raising an error there, but it does bring into question the existing response.

```python
resp1 = await request.respond(...)
await resp1.send(...)
resp2 = await request.respond(...)
```

Raising an exception here would be weird. Headers have been sent. The client is already expecting a complete response. This is why I was suggesting to return the same instance and have some sort of logging-only failure. I suppose it is really the same as this:

```python
resp = await request.respond()
await resp.send(...)
raise Exception(...)
```

So, ultimately I agree that is probably the correct approach. Plus it should be simpler to implement.

@ChihweiLHBird As for Item 1 (code inspection), if we can sneak that in, great. But do not focus on that. It is much more important to get the rest of this right and in 21.12. If the inspection needs to happen in 22.3, then that is fine with me.
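The "return the same instance with a logging-only failure" idea could look roughly like this. This is a hypothetical sketch with made-up names, not Sanic's implementation:

```python
# Sketch: a second respond() after headers have gone out returns the
# original response and logs a warning, instead of raising. The
# Request class here is a stand-in for illustration only.
import logging

logger = logging.getLogger("sketch")


class Request:
    def __init__(self):
        self._response = None
        self._headers_sent = False

    def respond(self):
        if self._response is not None and self._headers_sent:
            # The client is already mid-response; raising now would be
            # weird, so log it and hand back the same instance.
            logger.warning("respond() called again; reusing response")
            return self._response
        self._response = object()  # stand-in for an HTTPResponse
        return self._response
```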
Correction, now that I am looking at this and playing again, I remember that this fails anyway:

```python
resp1 = await request.respond(...)
await resp1.send(...)
resp2 = await request.respond(...)
```

The use case that is the problem is instantiating the response twice before anything is sent, which seems to work okay client side but fails server side. In that case, it really should be fine to throw the exception the second time
Exactly, that's what I was concerned about; some exception handlers do not necessarily have to return a response but might have some business logic for the exception, like sending a report to an email address or logging it somewhere else. And I am wondering whether it is worth it to still execute the handler without sending its response.
I am not sure that this has ever been an expressly supported use case for exception handlers. This is precisely the sort of thing that
I spent a little time hacking on the code inspection. Again, this is not necessarily meant to be included here, but it seems as good a place as any to keep it for now.

Source code for respond inspector:

```python
import ast
from inspect import getsource
from textwrap import dedent
from typing import Any


class Visitor(ast.NodeVisitor):
    def __init__(self, tree, assignments) -> None:
        super().__init__()
        self.respond_refs = 0
        self.tree = tree
        self.assignments = assignments

    @classmethod
    def load(cls, handler):
        src = dedent(getsource(handler))
        tree = ast.parse(src)
        node = tree.body[0]
        idx = 1 if node.args.args[0].arg == "self" else 0
        assignments = [(node.args.args[idx].arg, "respond")]
        return cls(tree, assignments)

    def count_refs(self):
        self.visit(self.tree)
        return self.respond_refs

    def visit_Assign(self, node: ast.Assign) -> Any:
        for name, attr in self.assignments:
            if (
                isinstance(node.value, ast.Attribute)
                and node.value.value.id == name
                and node.value.attr == attr
            ) or (
                isinstance(node.value, ast.Name)
                and not attr
                and node.value.id == name
            ):
                for target in node.targets:
                    self.assignments.append((target.id, None))

    def visit_Call(self, node: ast.Call) -> Any:
        for name, attr in self.assignments:
            if (
                attr
                and isinstance(node.func, ast.Attribute)
                and node.func.value.id == name
                and node.func.attr == attr
            ):
                self.respond_refs += 1
            elif (
                not attr
                and isinstance(node.func, ast.Name)
                and node.func.id == name
            ):
                self.respond_refs += 1
```

Current limitations:
What it does handle:

```python
@app.get("/")
async def handler(request):
    await request.respond()
    foo = request.respond
    bar = foo
    await bar()
    await foo()


class HandlerView(HTTPMethodView):
    async def get(self, request: Request):
        await request.respond()
        await request.respond()

    @staticmethod
    async def post(req: Request):
        await req.respond()
        await req.respond()
```

One potential option might just be to use some iteration of this and, instead of throwing an exception, just raising a warning:
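As a stripped-down, self-contained illustration of the inspection idea: counting calls whose attribute name is `respond`, with no alias tracking. This is purely a sketch for reference, not the inspector above or the PR's code:

```python
# Minimal sketch: count `x.respond()`-style calls in handler source.
# Unlike the full inspector, this does not follow aliases such as
# `foo = request.respond`.
import ast
from textwrap import dedent


def count_respond_calls(src: str) -> int:
    """Count calls of the form <anything>.respond(...) in the source."""
    tree = ast.parse(dedent(src))
    count = 0
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "respond"
        ):
            count += 1
    return count


handler_src = """
async def handler(request):
    await request.respond()
    await request.respond()
"""
# count_respond_calls(handler_src) evaluates to 2 for this source
```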
```python
error_logger.exception(
    "Exception occurred in one of response middleware handlers"
)
if not re_use_res:
```

I like this better. Let's just use a better variable name: `should_reuse_response`
@ChihweiLHBird Please check for Still, I suppose that ASGI might need separate handling (does it use Stages at all?).
One thing I mentioned earlier, that is not implemented, is that responses should never be reused or combined. Could you fix that as well? And possibly we could delay the running of response middlewares to the first send? This would entirely avoid the problem of them being executed twice, or being executed for responses that never get sent.
asgi.py lines 166-190 show no Stages and no handling of multiple responses. Namely, calling respond() again does not check for a response already started and does not disconnect any previous response; it just connects with the new one blindly. The send() function then uses the most recent response and clears it. So, one can then initiate another response and corrupt the stream (which hopefully then raises an exception in the ASGI server). This needs to be fixed. I suppose that adding a stage variable on ASGI would be a rather simple way to do it.
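The stage-variable idea for the ASGI path could be sketched like this. The class shape and Stage values below are hypothetical stand-ins, not Sanic's asgi.py:

```python
# Illustrative sketch: guard respond() on an ASGI-style transport with
# a stage variable, mirroring the http.py behavior quoted earlier.
from enum import Enum, auto
from types import SimpleNamespace


class Stage(Enum):
    HANDLER = auto()   # handler running, nothing sent yet
    RESPONSE = auto()  # headers (and possibly body) already sent
    FAILED = auto()


class ASGITransportSketch:
    def __init__(self):
        self.stage = Stage.HANDLER
        self.response = None

    def respond(self, response):
        if self.stage is not Stage.HANDLER:
            self.stage = Stage.FAILED
            raise RuntimeError("Response already started")
        # Disconnect any earlier but unused response object instead of
        # blindly connecting the new one.
        if self.response is not None:
            self.response.stream = None
        self.response, response.stream = response, self
        return response
```

Before the first send the transport stays in `Stage.HANDLER`, so an unused response can still be replaced; once sending has started, a further respond() fails loudly instead of corrupting the stream.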
I am assuming you mean right before it? As in the first
Yes, but I am not very convinced that this is a good idea. I gather that it is very rare for multiple responses to be created, and that an extra execution of the middleware then (for the response never sent) is not a big deal. Also, the way that it is done now potentially allows middlewares to pass information back to the handler (not that I could immediately think of any use for that either). But maybe most importantly, it might be a bit confusing to developers if these middlewares are triggered on
Agreed.
Do you expect that to be done in this PR? I think this PR has become complicated and we might want a separate PR for it, lol.
Yeah, let's open another PR for it later.
Too much stuff going on; let's continue here to summarize: #2327
Trying to solve issue: #2219

Some doubts I still have:

- The `response` reference in the `request` instance: I recently learned from the memory leak issue that we should not create more references than necessary. I am considering replacing it with a bool flag.
- The `request.respond` method is called more than 1 time.
- `self.response_func is None and self.stage == Stage.IDLE` implies the request has been responded to and the response stream is ended. Am I correct on this?

Please let me know your suggestions, thanks! @sanic-org/core-devs