Skip to content
This repository has been archived by the owner on May 30, 2023. It is now read-only.

added response body to response object in onResourceReceived event #11484

Closed
wants to merge 9 commits into from

Conversation

dparshin
Copy link
Contributor

added response body to response object in onResourceReceived event

#10158

@richardjharris
Copy link

This is useful, thanks! I think having a full proxy interface will be useful for many things.

For comparison, here is a small patch I was previously using to collect resource content.

diff --git c/src/networkaccessmanager.cpp w/src/networkaccessmanager.cpp
index 17112b3..a73218d 100644
--- c/src/networkaccessmanager.cpp
+++ w/src/networkaccessmanager.cpp
@@ -303,11 +307,20 @@ void NetworkAccessManager::handleStarted()
     QNetworkReply *reply = qobject_cast<QNetworkReply*>(sender());
     if (!reply)
         return;
+
+    QByteArray chunk( reply->peek(reply->size()) );
+    if (m_content.contains(reply)) {
+        m_content[reply].append(chunk);
+    }
+    else {
+        m_content[reply] = chunk;
+    }
+
     if (m_started.contains(reply))
         return;

     m_started += reply;
@@ -367,6 +379,8 @@ void NetworkAccessManager::handleFinished(QNetworkReply *reply, const QVariant &
         headers += header;
     }

+    QByteArray content = m_content.value(reply, QByteArray());
+
     QVariantMap data;
     data["stage"] = "end";
     data["id"] = m_ids.value(reply);
@@ -377,9 +391,12 @@ void NetworkAccessManager::handleFinished(QNetworkReply *reply, const QVariant &
     data["redirectURL"] = reply->header(QNetworkRequest::LocationHeader);
     data["headers"] = headers;
     data["time"] = QDateTime::currentDateTime();
+    data["body"] = content.toBase64().data();
+    data["bodySize"] = content.size();

     m_ids.remove(reply);
     m_started.remove(reply);
+    m_content.remove(reply);

     emit resourceReceived(data);
 }
diff --git c/src/networkaccessmanager.h w/src/networkaccessmanager.h
index 1b1a8af..e49351a 100644
--- c/src/networkaccessmanager.h
+++ w/src/networkaccessmanager.h
@@ -108,6 +109,7 @@ private slots:
 private:
     QHash<QNetworkReply*, int> m_ids;
     QSet<QNetworkReply*> m_started;
+    QHash<QNetworkReply*, QByteArray> m_content;
     int m_idCounter;
     QNetworkDiskCache* m_networkDiskCache;
     QVariantMap m_customHeaders;

@dustypebbles
Copy link

Hi,

These are great patches adding a much needed feature.

Is there a reason you guys are base64 encoding the body? Couldn't we call btoa() on the value ourselves if we want that?

@dparshin
Copy link
Contributor Author

@richardjharris: I thought of using 'reply->peek', but it is possible that QWebPage(or any other consumer) can read only a part of the reply, leaving rest of data in reply buffer, in this case on next chunk you will get overlapped data in m_content.

@dparshin
Copy link
Contributor Author

@dustypebbles : to be honest, I couldn't find a good way to pass binary data to javascript, at first I just used QString without base64 encoding, but it turned out QString cuts the string at first 0 byte.

@richardjharris
Copy link

Yeah, if you pass binary data you get truncated content. Typed arrays are the correct way to do it but I am unsure if phantomjs's version of JavascriptCore supports them.

@dustypebbles
Copy link

Passing a length to the QString constructor seems to work fine for the filesystem module: https://github.com/ariya/phantomjs/blob/master/src/filesystem.cpp#L109

@dustypebbles
Copy link

All constructors are listed in QString documentation: https://qt-project.org/doc/qt-4.8/qstring.html
A QByteArray can be passed straight!

@vitallium
Copy link
Collaborator

This needs a lot of tests. To make sure everything works, especially AJAX requests.

@dparshin
Copy link
Contributor Author

@dustypebbles removed bas64 encoding, now body is converted in the same way as in filesystem module.

@pongells
Copy link

coming from here

I am a bit lost, would this allow for storing loaded resources? (e.g. if a page in Phantom loads an image, we can store that image on disk without having to do something like a wget)

@osthafen
Copy link

@pongells
Yes that would allow for storing loaded resources.
I am desperately looking for this feature as well.

@dparshin
Copy link
Contributor Author

@vitallium @dustypebbles thanks for your comments, anything else I can do to accelerate review of this patch?

@masahirominami masahirominami mentioned this pull request Sep 8, 2013
@laurentj
Copy link
Contributor

laurentj commented Nov 7, 2013

Hi, this is a useful feature. I implemented it months ago into SlimerJS. However, this is a feature that could take many memory resources (images, videos) ... uselessly, if we don't need the body! This is why in SlimerJS you have to indicate content types of resources for which you want to have the body, in an array webpage.captureContent. this is an array of regular expressions that should match the mime type of content you want to have in the body property.

webpage.captureContent = [ /css/, /image\/.*/ ]

If the content type of the resource does not match one of this regular expression or if it is not the main page, the body property is empty.

What do you think about it?

@chrisparnin
Copy link

Just cherry-picked these commits into latest 1.9 branch and got this to build and work on windows. There was one issue in the code though, setHeader() isn't declared in the networkaccessmanger.h, which raises compile errors. Just adding the one line as follows resolves the issue.

class JsNetworkRequest : public QObject
{
     Q_OBJECT
     public:
           JsNetworkRequest(QNetworkRequest* request, QObject* parent = 0);
           Q_INVOKABLE void abort();
           Q_INVOKABLE void changeUrl(const QString& url);
       bool setHeader(const QString& name, const QVariant& value);
     ....
};

@masahirominami
Copy link

hi, we are using this for our cases, and

@masahirominami
Copy link

hi, sorry accidentally posted, we are using this code and mime type selector will help in our case.

we are using phantomjs with casperjs, so problem is using onResourceReceived make casperjs download() function useless. we need to change download() so that it uses onResourceReceived().

we are running this code on windows and macosx.

@ghost
Copy link

ghost commented Dec 5, 2013

+1 This would be immensely useful for my use case as well. I understand the hesitation but can there be a flag that must be run for PhantomJS to enable this feature? The option to parse the body of a request/response is pretty useful.

Dmitry Parshin added 2 commits April 23, 2014 22:38
@dparshin
Copy link
Contributor Author

@laurentj @rquinlivan Thanks for suggestion, last commit contains my implementation of 'page.captureContent' property. From now response body will be captured only for urls matched to one of patterns in the 'page.captureContent' property. By default the property is empty, so nothing will be captured. Some examples of what it may contain:

page.captureContent = ['.*']; // capture everything
page.captureContent = ['/foo', '\.jpg']; // only urls containing '/foo' or '.jpg' in them

Also, patterns are case insensitive.

@pbkhrv
Copy link

pbkhrv commented Oct 23, 2014

One more for "would like to see #11484 in master" here


it("should not contain resource body if not in the captureContent list", function() {
var page = require('webpage').create();
page.captureContent = ['/foo'];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if here could be an option here for capturing content by MIME type. I.e., I would prefer to specify only loading body for "application/json" requests. My use case for this feature was being able to snoop on server responses, not so much being able to read individual files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still can capture all the content(using ".*" mask) and use only those responses you actually need based on content-type header, will this work?

@vitallium
Copy link
Collaborator

This is really risky to implement this as a proxy between PhantomJS and Webkit. I remember we had a lot of issues with similar implementation.

@vitallium vitallium closed this May 9, 2015
@dparshin
Copy link
Contributor Author

dparshin commented May 9, 2015

@vitallium @ariya we've been using this patch for couple of years, issuing about 100k requests daily. I'm pretty sure its not that risky, if you could point me to the issues you have encountered with this approach I'd gladly verify them and make fixes if necessary.

@vitallium
Copy link
Collaborator

@dparshin in this case, I think we can merge it.
/cc @ariya

@vitallium vitallium reopened this May 10, 2015
@vitallium
Copy link
Collaborator

Ok, I'm going to merge this! I had been playing with it some time. Looks good and stable. If something will not work, blame it on me.
/cc @ariya @dparshin

@dparshin could you please add your name and email to networkreplyproxy.[cpp|h] and networkreplytracker.[cpp|h]?

@dparshin
Copy link
Contributor Author

@vitallium, done

@vitallium
Copy link
Collaborator

@dparshin landed! Thank you! You did awesome job!

@vitallium vitallium closed this May 21, 2015
@vitallium vitallium reopened this May 21, 2015
@vitallium
Copy link
Collaborator

Actually, wait. I can't merge it normally :X

@dparshin
Copy link
Contributor Author

@vitallium, is there anything wrong with it?

@vitallium
Copy link
Collaborator

@dparshin no, I just need to merge it correctly

@vitallium
Copy link
Collaborator

Phew. Now it's landed! Thanks!

@vitallium vitallium closed this May 31, 2015
@dparshin
Copy link
Contributor Author

@vitallium, great!Thanks.

@kriegaex
Copy link

kriegaex commented Jun 7, 2016

Sorry for the stupid question: The PR was closed, but was it also merged into master or another branch? I cannot find the changes when browsing the repo.

@vitallium
Copy link
Collaborator

@kriegaex yes, it was merged into master. But this PR caused weird errors for sync AJAX requests, and we reverted it. We landed a fix for sync AJAX requests some time ago. I think we can bring this feature back into the master branch.

@erikdubbelboer
Copy link
Contributor

@vitallium any ETA? Do you plan to use the same type of proxy class or will that not work anymore because of the AJAX issue?

@vitallium
Copy link
Collaborator

@erikdubbelboer I think we can just pull the original code back. I'll try to land it this weekend.

@dpanic
Copy link

dpanic commented Jun 20, 2016

What is the situation with response.body in onResourceReceived on 2.1.1?

@yeyu456
Copy link

yeyu456 commented Jun 20, 2016

Same question.Is it done yet? I need that feature.Anything i can do right now? like using some old version.

@dmncls
Copy link

dmncls commented Jun 28, 2016

I have some phantom scripts dependent on this functionality (written when it was working with an earlier version of 2.1.1)....and I really needed the functionality back so I have temporarily built using the commit bced581 - it works for my use case. (I hope this helps someone else, and that this functionality is back in an official version soon!)

@redzedi
Copy link

redzedi commented Aug 18, 2016

Using this commit causes Phantomjs to crash when navigating out of a page that has a window.onbeforeunload function that fires a async ajax POST request . Please refer to the html below :-

<!DOCTYPE html>
<html>
<head>
    <script type="text/javascript">
    function testCallbackComplete(){
        console.log("Done onbeforeunload !!");
    }
        window.onbeforeunload = function(){
            var xmlRequest = new XMLHttpRequest();
            xmlRequest.onreadystatechange = testCallbackComplete;
            xmlRequest.open("POST", encodeURI("http://localhost:5000/testEcho"), true);
       xmlRequest.setRequestHeader("Content-Type", "application/json");
        xmlRequest.send("{\"test\":true}");
        return;
    } 

    </script>
    <title>testing On Before Unload</title>
</head>
<body>
   <p>
    <a href="index.html" id="goToIndex">Go to Index</a>
   </p>

</body>
</html>

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet