fetch_url doesn't return RawResponse and doesn't provide access to response code #322
Comments
Hi @edkrueger, thanks for your feedback. I agree there is something missing here, but in my opinion it's more of a documentation issue. Rather than risk breaking existing code, I'd prefer to explain how to use
We could break things in expert mode (i.e. change the class and its attributes) and extend the docs. Does that work for you, and do you see a way to move on?
Hey @adbar, I think the mixed returns add some complexity to using the tool. For example, consider someone who has written code in "expert mode" but then decides to use Also, getting the status code along with the HTML seems essential to most analyses. For example, you may want to filter out cases where you get the HTML for a 404 page. To avoid breaking anything in the external interface, here is an idea:
The problem is that I believe it's easier to make construction work on parts for which developers are likely to read the docs or are ready for a bit of tinkering:
What do you think?
@edkrueger What do you think and do you have time to work on the PR?
My grain of sand. I would suggest the names
To sum up, here is what I'd suggest in order to implement useful changes step-by-step:
If we agree on most of the sequence we could start working on it. Pull requests are welcome.
Sounds good, I am only missing the two versions of fetch. I like the idea of separating the simple fetch (just content) from the advanced fetch (full response). Many users would prefer to use the simple fetch. (I'm not sure about RawResponse and Response, so I am not commenting on that one.)
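The split between a simple and an advanced fetch mentioned here could look roughly like the following sketch. All names (`Response`, `fetch_full`, `fetch_simple`, the `_fake_download` stub) are hypothetical placeholders for illustration and are not the library's actual API; the stub stands in for a real HTTP request so the example runs offline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical full-response container (not the library's class)."""
    data: Optional[bytes] = None
    status: Optional[int] = None
    url: Optional[str] = None


def _fake_download(url: str) -> Response:
    # Stand-in for a real HTTP request, so the sketch runs offline.
    if url.endswith("/missing"):
        return Response(data=b"<html>Not found</html>", status=404, url=url)
    return Response(data=b"<html>Hello</html>", status=200, url=url)


def fetch_full(url: str) -> Response:
    """Advanced fetch: always returns the full response object."""
    return _fake_download(url)


def fetch_simple(url: str) -> Optional[str]:
    """Simple fetch: returns decoded content, or None on a non-200 status."""
    response = fetch_full(url)
    if response.status != 200 or response.data is None:
        return None
    return response.data.decode("utf-8", errors="replace")
```

Under this assumed design, callers who only need the content use the simple function, while callers who want to filter out 404 pages inspect `status` on the full response.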
In the end In a second step we could also change that but users proficient enough to be interested in status codes would probably just use
Ongoing work is in #479.
Steps 1 to 3 are now implemented. Feel free to provide feedback or additional functionality with a PR. I'll leave this thread open at least until the next release (and docs update).
The documentation says that `fetch_url` returns a `RawResponse` with headers, body and status code. However, when passed `decode=True` (the default), it actually returns either the HTML as a string or `None`. What appears to be happening is that both `_send_request` and `_send_pycurl_request` return a `RawResponse`, but `_handle_response` converts the return to `str` or `None` when `decode=True`.

The reason I think this is important is that the automatic decoding is a nice feature, but I'd also like to be able to see the status code saved.
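The behaviour described above can be reproduced with a small stand-in. This is a simplified sketch, not the library's code: the `RawResponse` dataclass and the `handle_response` function below are illustrative placeholders that only mimic the reported return types.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class RawResponse:
    """Illustrative stand-in with body, status code and URL."""
    data: bytes
    status: int
    url: str


def handle_response(
    raw: Optional[RawResponse], decode: bool = True
) -> Union[RawResponse, str, None]:
    """Mimics the reported behaviour: the return type depends on `decode`."""
    if raw is None:
        return None
    if decode:
        # The status code is dropped here: only the decoded body survives.
        return raw.data.decode("utf-8", errors="replace")
    return raw


raw = RawResponse(
    data=b"<html>Not found</html>",
    status=404,
    url="https://example.org/missing",
)
decoded = handle_response(raw, decode=True)     # plain str, no status
undecoded = handle_response(raw, decode=False)  # RawResponse, status kept
```

With `decode=True` the caller gets a plain string and can no longer tell that the page was a 404; with `decode=False` the status is available, but the body has to be decoded by hand.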
Proposed solution:

I think it would be nice to guarantee that the return type of this function is consistent but also accommodate the `decode=True` option. So, I propose that:

- `RawResponse` be replaced with a class `Response` with the fields `data`, `status`, `url` and `html`, all defaulting to `None`.
- If `decode=False`, `Response` should have the field `html` be `None`, and if `decode=True` the `html` field should hold the HTML as a string.
- `_handle_response` should both take and return a `Response`, returning a copy with `Response.html` set to the HTML as a string when `decode=True`.
- […] `self.data == None` to preserve truthy/falsy checks.
- `Response` should have an `as_dict` method (like in `Document`).
- Update the documentation of `fetch_url` to say that it returns a `Response`.

I realize this change could break some downstream code, but I think it would also make `fetch_url` more useful.

Let me know what you think!
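A minimal sketch of the proposal above, to make the suggested shape concrete. It is one possible reading of the list, not an agreed design: the `__bool__` method is an assumption about how truthiness checks would be preserved, and `handle_response` is a simplified stand-in for `_handle_response`.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Response:
    """Proposed container: every field defaults to None."""
    data: Optional[bytes] = None
    status: Optional[int] = None
    url: Optional[str] = None
    html: Optional[str] = None

    def __bool__(self) -> bool:
        # Assumption: a response without data is falsy,
        # so existing `if response:` checks keep working.
        return self.data is not None

    def as_dict(self) -> Dict[str, Any]:
        """Return the fields as a plain dictionary."""
        return asdict(self)


def handle_response(response: Response, decode: bool = True) -> Response:
    """Take and return a Response; fill `html` only when decoding."""
    if decode and response.data is not None:
        text = response.data.decode("utf-8", errors="replace")
        return replace(response, html=text)  # copy, original untouched
    return response


resp = handle_response(
    Response(data=b"<html></html>", status=200, url="https://example.org/")
)
```

In this sketch the return type is always `Response`, so the status code stays available whether or not the body was decoded.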