added response body to response object in onResourceReceived event #11484
This is useful, thanks! I think having a full proxy interface will be useful for many things. For comparison, here is a small patch I was previously using to collect resource content.
diff --git c/src/networkaccessmanager.cpp w/src/networkaccessmanager.cpp
index 17112b3..a73218d 100644
--- c/src/networkaccessmanager.cpp
+++ w/src/networkaccessmanager.cpp
@@ -303,11 +307,20 @@ void NetworkAccessManager::handleStarted()
     QNetworkReply *reply = qobject_cast<QNetworkReply*>(sender());
     if (!reply)
         return;
+
+    QByteArray chunk( reply->peek(reply->size()) );
+    if (m_content.contains(reply)) {
+        m_content[reply].append(chunk);
+    }
+    else {
+        m_content[reply] = chunk;
+    }
+
     if (m_started.contains(reply))
         return;
     m_started += reply;
@@ -367,6 +379,8 @@ void NetworkAccessManager::handleFinished(QNetworkReply *reply, const QVariant &
         headers += header;
     }
+    QByteArray content = m_content.value(reply, QByteArray());
+
     QVariantMap data;
     data["stage"] = "end";
     data["id"] = m_ids.value(reply);
@@ -377,9 +391,12 @@ void NetworkAccessManager::handleFinished(QNetworkReply *reply, const QVariant &
     data["redirectURL"] = reply->header(QNetworkRequest::LocationHeader);
     data["headers"] = headers;
     data["time"] = QDateTime::currentDateTime();
+    data["body"] = content.toBase64().data();
+    data["bodySize"] = content.size();
     m_ids.remove(reply);
     m_started.remove(reply);
+    m_content.remove(reply);
     emit resourceReceived(data);
 }
diff --git c/src/networkaccessmanager.h w/src/networkaccessmanager.h
index 1b1a8af..e49351a 100644
--- c/src/networkaccessmanager.h
+++ w/src/networkaccessmanager.h
@@ -108,6 +109,7 @@ private slots:
 private:
     QHash<QNetworkReply*, int> m_ids;
     QSet<QNetworkReply*> m_started;
+    QHash<QNetworkReply*, QByteArray> m_content;
     int m_idCounter;
     QNetworkDiskCache* m_networkDiskCache;
     QVariantMap m_customHeaders;
Hi, these are great patches adding a much-needed feature. Is there a reason you guys are base64-encoding the body? Couldn't we call btoa() on the value ourselves if we want that?
@richardjharris: I thought of using 'reply->peek', but QWebPage (or any other consumer) may read only part of the reply and leave the rest of the data in the reply buffer. In that case, when the next chunk arrives, you will get overlapping data in m_content.
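To make the overlap concern concrete, here is a minimal sketch in plain JavaScript. It is not Qt code: `ReplyBuffer` and `collectWithPeek` are hypothetical names that only model a buffer whose consumer may drain part of the pending data between chunks, under the assumption that the peek-based patch appends everything currently buffered each time a chunk arrives.

```javascript
// Hypothetical model of a reply buffer (illustration only, not the Qt API).
function ReplyBuffer() {
    this.pending = '';
}

// New data arriving from the network is appended to the pending buffer.
ReplyBuffer.prototype.receive = function (chunk) {
    this.pending += chunk;
};

// peek() returns everything currently buffered without consuming it.
ReplyBuffer.prototype.peek = function () {
    return this.pending;
};

// read(n) consumes up to n characters, as a consumer of the reply might.
ReplyBuffer.prototype.read = function (n) {
    var out = this.pending.slice(0, n);
    this.pending = this.pending.slice(n);
    return out;
};

// Collect content the way a peek-on-every-chunk approach would.
function collectWithPeek(chunks, consumerReadSize) {
    var reply = new ReplyBuffer();
    var collected = '';
    chunks.forEach(function (chunk) {
        reply.receive(chunk);
        collected += reply.peek();     // append everything currently buffered
        reply.read(consumerReadSize);  // consumer drains only part of it
    });
    return collected;
}

// Consumer reads 2 characters per chunk: "CD" is collected twice.
var partial = collectWithPeek(['ABCD', 'EF'], 2);

// Consumer always drains the whole buffer: no overlap.
var full = collectWithPeek(['ABCD', 'EF'], Infinity);
```

With a partial reader the collected content contains the leftover bytes twice, while a consumer that always drains the buffer produces the expected content.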
@dustypebbles: to be honest, I couldn't find a good way to pass binary data to JavaScript. At first I just used QString without base64 encoding, but it turned out that QString cuts the string at the first 0 byte.
Yeah, if you pass binary data you get truncated content. Typed arrays are the correct way to do it, but I am unsure whether PhantomJS's version of JavaScriptCore supports them.
Passing a length to the QString constructor seems to work fine for the filesystem module: https://github.com/ariya/phantomjs/blob/master/src/filesystem.cpp#L109
All constructors are listed in the QString documentation: https://qt-project.org/doc/qt-4.8/qstring.html
This needs a lot of tests to make sure everything works, especially AJAX requests.
@dustypebbles removed base64 encoding; the body is now converted in the same way as in the filesystem module.
Coming from here, I am a bit lost: would this allow for storing loaded resources? (E.g. if a page in Phantom loads an image, could we store that image on disk without having to do something like a wget?)
@pongells
@vitallium @dustypebbles thanks for your comments. Is there anything else I can do to accelerate the review of this patch?
Also fixed same code in filesystem module ariya#10158
Hi, this is a useful feature. I implemented it months ago in SlimerJS. However, this feature can consume a lot of memory (images, videos) needlessly when we don't need the body! This is why in SlimerJS you have to indicate the content types of the resources for which you want the body, in an array, webpage.captureContent. This is an array of regular expressions that should match the MIME type of the content you want to have in the body property.
If the content type of the resource does not match one of these regular expressions, or if it is not the main page, the body property is empty. What do you think about it?
Just cherry-picked these commits into the latest 1.9 branch and got this to build and work on Windows. There was one issue in the code though,
Hi, we are using this for our cases, and
Hi, sorry, I posted accidentally. We are using this code, and a MIME type selector would help in our case. We use PhantomJS with CasperJS, and the problem is that using onResourceReceived makes the CasperJS download() function useless; we need to change download() so that it uses onResourceReceived(). We are running this code on Windows and Mac OS X.
+1 This would be immensely useful for my use case as well. I understand the hesitation, but could there be a flag that must be passed to PhantomJS to enable this feature? The option to parse the body of a request/response is pretty useful.
if url matches one of the patterns in 'page.captureContent' property ariya#10158
@laurentj @rquinlivan Thanks for the suggestion; the last commit contains my implementation of the 'page.captureContent' property. From now on, the response body will be captured only for URLs that match one of the patterns in the 'page.captureContent' property. By default the property is empty, so nothing will be captured. Some examples of what it may contain:
Also, patterns are case-insensitive.
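As a rough illustration of the matching rule described here, the sketch below models it in plain JavaScript. `shouldCapture` is a hypothetical helper, not the PhantomJS implementation, and the patterns are illustrative rather than the ones from the original comment. It assumes that each entry is treated as a regular expression and searched for anywhere in the URL.

```javascript
// Hypothetical helper modelling the rule described above (not PhantomJS source).
// Assumption: each pattern is a regular expression searched anywhere in the URL.
function shouldCapture(url, patterns) {
    return patterns.some(function (pattern) {
        // The 'i' flag models the case-insensitive matching mentioned above.
        return new RegExp(pattern, 'i').test(url);
    });
}

// Illustrative patterns only.
var patterns = ['/api/', '\\.json$'];

var results = [
    shouldCapture('http://example.com/API/users', patterns), // matches '/api/' ignoring case
    shouldCapture('http://example.com/data.JSON', patterns), // matches '\.json$' ignoring case
    shouldCapture('http://example.com/logo.png', patterns),  // matches nothing
    shouldCapture('http://example.com/api/users', [])        // empty list captures nothing
];
```

The last case mirrors the stated default: with an empty pattern list, no response body is captured.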
One more for "would like to see #11484 in master" here
it("should not contain resource body if not in the captureContent list", function() {
    var page = require('webpage').create();
    page.captureContent = ['/foo'];
I wonder if there could be an option here for capturing content by MIME type. I.e., I would prefer to load the body only for "application/json" responses. My use case for this feature was being able to snoop on server responses, not so much being able to read individual files.
You can still capture all the content (using the ".*" mask) and use only the responses you actually need, based on the Content-Type header. Will this work?
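The suggestion above could look roughly like the following sketch. The helpers `contentTypeOf` and `collectJsonBodies` are hypothetical, and the sample objects only imitate the shape of the response passed to onResourceReceived, assuming `headers` is an array of `{name, value}` entries and that `body` is present because this patch is applied.

```javascript
// Hypothetical helper: returns the lowercased Content-Type of a response
// without parameters such as "; charset=utf-8", or '' if the header is absent.
function contentTypeOf(response) {
    var headers = response.headers || [];
    for (var i = 0; i < headers.length; i++) {
        if (String(headers[i].name).toLowerCase() === 'content-type') {
            return String(headers[i].value).split(';')[0].trim().toLowerCase();
        }
    }
    return '';
}

// Keep only the bodies of finished responses whose type is application/json.
function collectJsonBodies(responses) {
    return responses.filter(function (r) {
        return r.stage === 'end' && contentTypeOf(r) === 'application/json';
    }).map(function (r) {
        return r.body;
    });
}

// Sample data imitating what a script might see with captureContent = ['.*'].
var sample = [
    { stage: 'end', headers: [{ name: 'Content-Type', value: 'application/json; charset=utf-8' }], body: '{"ok":true}' },
    { stage: 'end', headers: [{ name: 'content-type', value: 'image/png' }], body: 'binary' },
    { stage: 'start', headers: [{ name: 'Content-Type', value: 'application/json' }], body: '' }
];

var bodies = collectJsonBodies(sample);
```

In an actual script, the same check would presumably run inside page.onResourceReceived after setting page.captureContent to ['.*']. Note that every body is still captured in memory before being filtered.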
It is really risky to implement this as a proxy between PhantomJS and WebKit. I remember we had a lot of issues with a similar implementation.
@vitallium @ariya we've been using this patch for a couple of years, issuing about 100k requests daily. I'm pretty sure it's not that risky. If you could point me to the issues you encountered with this approach, I'd gladly verify them and make fixes if necessary.
@vitallium, done
@dparshin landed! Thank you! You did an awesome job!
Actually, wait. I can't merge it normally :X
@vitallium, is there anything wrong with it?
@dparshin no, I just need to merge it correctly
Phew. Now it's landed! Thanks!
@vitallium, great! Thanks.
Sorry for the stupid question: the PR was closed, but was it also merged into master or another branch? I cannot find the changes when browsing the repo.
@kriegaex yes, it was merged into master. But this PR caused weird errors for sync AJAX requests, and we reverted it. We landed a fix for sync AJAX requests some time ago. I think we can bring this feature back into the master branch.
@vitallium any ETA? Do you plan to use the same type of proxy class, or will that not work anymore because of the AJAX issue?
@erikdubbelboer I think we can just pull the original code back. I'll try to land it this weekend.
What is the situation with response.body in onResourceReceived on 2.1.1?
Same question. Is it done yet? I need that feature. Is there anything I can do right now, like using some old version?
I have some PhantomJS scripts that depend on this functionality (written when it was working with an earlier version of 2.1.1), and I really needed the functionality back, so I have temporarily built from commit bced581. It works for my use case. (I hope this helps someone else, and that this functionality is back in an official version soon!)
Using this commit causes PhantomJS to crash when navigating away from a page that has a window.onbeforeunload function that fires an async AJAX POST request. Please refer to the HTML below:
added response body to response object in onResourceReceived event
#10158