Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added response body to response object in onResourceReceived event #11484

Closed
wants to merge 9 commits into from

Conversation

@dparshin
Copy link
Contributor

@dparshin dparshin commented Jul 16, 2013

added response body to response object in onResourceReceived event

#10158

@richardjharris
Copy link

@richardjharris richardjharris commented Jul 24, 2013

This is useful, thanks! I think having a full proxy interface will be useful for many things.

For comparison, here is a small patch I was previously using to collect resource content.

diff --git c/src/networkaccessmanager.cpp w/src/networkaccessmanager.cpp
index 17112b3..a73218d 100644
--- c/src/networkaccessmanager.cpp
+++ w/src/networkaccessmanager.cpp
@@ -303,11 +307,20 @@ void NetworkAccessManager::handleStarted()
     QNetworkReply *reply = qobject_cast<QNetworkReply*>(sender());
     if (!reply)
         return;
+
+    QByteArray chunk( reply->peek(reply->size()) );
+    if (m_content.contains(reply)) {
+        m_content[reply].append(chunk);
+    }
+    else {
+        m_content[reply] = chunk;
+    }
+
     if (m_started.contains(reply))
         return;

     m_started += reply;
@@ -367,6 +379,8 @@ void NetworkAccessManager::handleFinished(QNetworkReply *reply, const QVariant &
         headers += header;
     }

+    QByteArray content = m_content.value(reply, QByteArray());
+
     QVariantMap data;
     data["stage"] = "end";
     data["id"] = m_ids.value(reply);
@@ -377,9 +391,12 @@ void NetworkAccessManager::handleFinished(QNetworkReply *reply, const QVariant &
     data["redirectURL"] = reply->header(QNetworkRequest::LocationHeader);
     data["headers"] = headers;
     data["time"] = QDateTime::currentDateTime();
+    data["body"] = content.toBase64().data();
+    data["bodySize"] = content.size();

     m_ids.remove(reply);
     m_started.remove(reply);
+    m_content.remove(reply);

     emit resourceReceived(data);
 }
diff --git c/src/networkaccessmanager.h w/src/networkaccessmanager.h
index 1b1a8af..e49351a 100644
--- c/src/networkaccessmanager.h
+++ w/src/networkaccessmanager.h
@@ -108,6 +109,7 @@ private slots:
 private:
     QHash<QNetworkReply*, int> m_ids;
     QSet<QNetworkReply*> m_started;
+    QHash<QNetworkReply*, QByteArray> m_content;
     int m_idCounter;
     QNetworkDiskCache* m_networkDiskCache;
     QVariantMap m_customHeaders;
@dustypebbles
Copy link

@dustypebbles dustypebbles commented Jul 24, 2013

Hi,

These are great patches adding a much needed feature.

Is there a reason you guys are base64 encoding the body? Couldn't we call btoa() on the value ourselves if we want that?

@dparshin
Copy link
Contributor Author

@dparshin dparshin commented Jul 24, 2013

@richardjharris: I thought of using 'reply->peek', but it is possible that QWebPage(or any other consumer) can read only a part of the reply, leaving rest of data in reply buffer, in this case on next chunk you will get overlapped data in m_content.

@dparshin
Copy link
Contributor Author

@dparshin dparshin commented Jul 24, 2013

@dustypebbles : to be honest, I couldn't find a good way to pass binary data to javascript, at first I just used QString without base64 encoding, but it turned out QString cuts the string at first 0 byte.

@richardjharris
Copy link

@richardjharris richardjharris commented Jul 24, 2013

Yeah, if you pass binary data you get truncated content. Typed arrays are the correct way to do it but I am unsure if phantomjs's version of JavascriptCore supports them.

@dustypebbles
Copy link

@dustypebbles dustypebbles commented Jul 28, 2013

Passing a length to the QString constructor seems to work fine for the filesystem module: https://github.com/ariya/phantomjs/blob/master/src/filesystem.cpp#L109

@dustypebbles
Copy link

@dustypebbles dustypebbles commented Jul 28, 2013

All constructors are listed in QString documentation: https://qt-project.org/doc/qt-4.8/qstring.html
A QByteArray can be passed straight!

@vitallium
Copy link
Collaborator

@vitallium vitallium commented Jul 28, 2013

This needs a lot of tests. To make sure everything works, especially AJAX requests.

@dparshin
Copy link
Contributor Author

@dparshin dparshin commented Jul 29, 2013

@dustypebbles removed bas64 encoding, now body is converted in the same way as in filesystem module.

@pongells
Copy link

@pongells pongells commented Aug 14, 2013

coming from here

I am a bit lost, would this allow for storing loaded resources? (e.g. if a page in Phantom loads an image, we can store that image on disk without having to do something like a wget)

@osthafen
Copy link

@osthafen osthafen commented Aug 14, 2013

@pongells
Yes that would allow for storing loaded resources.
I am desperately looking for this feature as well.

@dparshin
Copy link
Contributor Author

@dparshin dparshin commented Aug 14, 2013

@vitallium @dustypebbles thanks for your comments, anything else I can do to accelerate review of this patch?

@tomvangoethem

This comment has been minimized.

Copy link

@tomvangoethem tomvangoethem commented on 7b0e2ab Aug 29, 2013

I built phantomjs with this commit included (on MacOSX and Ubuntu), but onResourceReceived events are still fired twice: once with an empty body and once with the full body.
This is the script I ran to test:

var system = require('system'),
    fs = require("fs"),
    page = require('webpage').create();

page.onResourceReceived = function(r) {
    console.log('received: "' + r.url + '" with ' + ((r.body === '') ? 'no body' : ('"' + r.body + '" as body')));
};

page.open('http://tomvglabs.be/phantomjs/resource-test.html', function (status) {
    if (status === "success") {
        console.log('opened page!');
    }
    phantom.exit();
});

And this is the result:

$ ./phantomjs resource_test.js
received: "http://tomvglabs.be/phantomjs/resource-test.html" with no body
received: "http://tomvglabs.be/phantomjs/resource-test.html" with "<html><head><link rel="stylesheet" type="text/css" href="style.css"></head></html>" as body
received: "http://tomvglabs.be/phantomjs/style.css" with no body
received: "http://tomvglabs.be/phantomjs/style.css" with "body { background-color: red; }" as body
opened page!

The resources with empty body are unexpected here...

Also: bodySize is undefined when the body is set (not so when no body is set)
Also: when a resource is requested which returns status 204 (No response), onResourceReceived is called only once (with a body). In this case the body is the null byte ('\u0000')

This comment has been minimized.

Copy link
Owner

@dparshin dparshin replied Aug 29, 2013

Please take into account 'stage' attribute of the response object, it takes one of two values: 'start', 'end'. As far as I understand 'start' event is emitted when HTTP headers received, thus it may contain proper 'bodySize' (content-length header), but the body itself can only be obtained on 'end' stage, when all body chunks are received. This is how phantomjs have been working even in absence of my patch. Though I'm going to fix undefined 'bodySize' at the end stage.

This comment has been minimized.

Copy link

@tomvangoethem tomvangoethem replied Aug 30, 2013

Oh sorry, I was unaware that onResourceReceived was fired twice. The API Reference does not mention this.

Anyways: great stuff you're doing 👍

I evaluated the stability of this patch by visiting the homepage of Alexa's Top 100 websites, and encountered no crashes nor additional errors.

This comment has been minimized.

Copy link
Owner

@dparshin dparshin replied Sep 3, 2013

@tomvangoethem I tried to reproduce the problem, but could not get results you've described. I'm using test/webpage-spec.js test to reproduce:

    it("should return properly from a 401 status", function() {
        var page = require('webpage').create();
        var server = require('webserver').create();
        server.listen(12345, function(request, response) {
            response.statusCode = 200;
//            modifing this line
//            response.write('df');
            response.close();
        });

        var url = "http://localhost:12345/foo";
        var handled = 0;
        runs(function() {
            page.onResourceReceived = function(resource) {
                handled++;
            };
            page.open(url, function(status) {
            });
        });

        waits(1000);

        runs(function() {
            expect(handled).toEqual(1);
            page.onResourceReceived = null;
            server.close();
        });

    });

The behavior I get:

  1. If response doesn't contain the body(empty body, or no 'response.write' at all), on onResourceReceived event is fired
  2. If response contains any amount of characters, two onResourceReceived events fired.

This comment has been minimized.

Copy link

@tomvangoethem tomvangoethem replied Sep 3, 2013

@dparshin This is the PhantomJS script I ran.

And this is the result I get back:

onResourceReceived called: (http://tomvglabs.be/phantomjs/resource-test.html) stage=start
onResourceReceived called: (http://tomvglabs.be/phantomjs/resource-test.html) stage=end
=== START BODY ===
"<html><head><link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\"><img src=\"204.php\"></head></html>"
=== END BODY ===
onResourceReceived called: (http://tomvglabs.be/phantomjs/style.css) stage=start
onResourceReceived called: (http://tomvglabs.be/phantomjs/style.css) stage=end
=== START BODY ===
"body { background-color: red; }"
=== END BODY ===
onResourceReceived called: (http://tomvglabs.be/phantomjs/204.php) stage=end
=== START BODY ===
"\u0000"
=== END BODY ===
opened http://tomvglabs.be/phantomjs/resource-test.html

As you can see: there is no entry of 204.php with stage=start. And the body (wrongfully) contains the NULL-byte in stage=end.

This is the PHP code I have running on my server:

<?php
header("status: 204"); 
header("HTTP/1.0 204 No Response");
?>

These are the headers sent by the server when requesting the 204.php file:

  HTTP/1.1 204 No Response
  Date: Tue, 03 Sep 2013 16:38:01 GMT
  Server: Apache
  Content-Length: 0
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html

This is the result of running the same file (with the body-part commented out ofcourse) with PhantomJS 1.9.1:

onResourceReceived called: (http://tomvglabs.be/phantomjs/resource-test.html) stage=start
onResourceReceived called: (http://tomvglabs.be/phantomjs/resource-test.html) stage=end
onResourceReceived called: (http://tomvglabs.be/phantomjs/style.css) stage=start
onResourceReceived called: (http://tomvglabs.be/phantomjs/style.css) stage=end
onResourceReceived called: (http://tomvglabs.be/phantomjs/204.php) stage=end
opened http://tomvglabs.be/phantomjs/resource-test.html

So the issue is that the body (r.body) is not empty, but contains the NULL byte ('\u0000').

@masahirominami masahirominami mentioned this pull request Sep 8, 2013
@laurentj
Copy link
Contributor

@laurentj laurentj commented Nov 7, 2013

Hi, this is a useful feature. I implemented it months ago into SlimerJS. However, this is a feature that could take many memory resources (images, videos) ... uselessly, if we don't need the body! This is why in SlimerJS you have to indicate content types of resources for which you want to have the body, in an array webpage.captureContent. this is an array of regular expressions that should match the mime type of content you want to have in the body property.

webpage.captureContent = [ /css/, /image\/.*/ ]

If the content type of the resource does not match one of this regular expression or if it is not the main page, the body property is empty.

What do you think about it?

@chrisparnin
Copy link

@chrisparnin chrisparnin commented Nov 23, 2013

Just cherry-picked these commits into latest 1.9 branch and got this to build and work on windows. There was one issue in the code though, setHeader() isn't declared in the networkaccessmanger.h, which raises compile errors. Just adding the one line as follows resolves the issue.

class JsNetworkRequest : public QObject
{
     Q_OBJECT
     public:
           JsNetworkRequest(QNetworkRequest* request, QObject* parent = 0);
           Q_INVOKABLE void abort();
           Q_INVOKABLE void changeUrl(const QString& url);
       bool setHeader(const QString& name, const QVariant& value);
     ....
};
@masahirominami
Copy link

@masahirominami masahirominami commented Nov 26, 2013

hi, we are using this for our cases, and

@masahirominami
Copy link

@masahirominami masahirominami commented Nov 26, 2013

hi, sorry accidentally posted, we are using this code and mime type selector will help in our case.

we are using phantomjs with casperjs, so problem is using onResourceReceived make casperjs download() function useless. we need to change download() so that it uses onResourceReceived().

we are running this code on windows and macosx.

@ghost
Copy link

@ghost ghost commented Dec 5, 2013

+1 This would be immensely useful for my use case as well. I understand the hesitation but can there be a flag that must be run for PhantomJS to enable this feature? The option to parse the body of a request/response is pretty useful.

Dmitry Parshin added 2 commits Apr 24, 2014
@ariya ariya force-pushed the ariya:master branch from 46b0a8a to bad1fee Oct 2, 2014
@pbkhrv
Copy link

@pbkhrv pbkhrv commented Oct 23, 2014

One more for "would like to see #11484 in master" here


it("should not contain resource body if not in the captureContent list", function() {
var page = require('webpage').create();
page.captureContent = ['/foo'];

This comment has been minimized.

@ghost

ghost Oct 23, 2014

I wonder if here could be an option here for capturing content by MIME type. I.e., I would prefer to specify only loading body for "application/json" requests. My use case for this feature was being able to snoop on server responses, not so much being able to read individual files.

This comment has been minimized.

@dparshin

dparshin Oct 24, 2014
Author Contributor

You still can capture all the content(using ".*" mask) and use only those responses you actually need based on content-type header, will this work?

@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 9, 2015

This is really risky to implement this as a proxy between PhantomJS and Webkit. I remember we had a lot of issues with similar implementation.

@vitallium vitallium closed this May 9, 2015
@dparshin
Copy link
Contributor Author

@dparshin dparshin commented May 9, 2015

@vitallium @ariya we've been using this patch for couple of years, issuing about 100k requests daily. I'm pretty sure its not that risky, if you could point me to the issues you have encountered with this approach I'd gladly verify them and make fixes if necessary.

@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 10, 2015

@dparshin in this case, I think we can merge it.
/cc @ariya

@vitallium vitallium reopened this May 10, 2015
@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 18, 2015

Ok, I'm going to merge this! I had been playing with it some time. Looks good and stable. If something will not work, blame it on me.
/cc @ariya @dparshin

@dparshin could you please add your name and email to networkreplyproxy.[cpp|h] and networkreplytracker.[cpp|h]?

@dparshin
Copy link
Contributor Author

@dparshin dparshin commented May 18, 2015

@vitallium, done

@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 21, 2015

@dparshin landed! Thank you! You did awesome job!

@vitallium vitallium closed this May 21, 2015
@vitallium vitallium reopened this May 21, 2015
@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 21, 2015

Actually, wait. I can't merge it normally :X

@dparshin
Copy link
Contributor Author

@dparshin dparshin commented May 22, 2015

@vitallium, is there anything wrong with it?

@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 22, 2015

@dparshin no, I just need to merge it correctly

@vitallium
Copy link
Collaborator

@vitallium vitallium commented May 31, 2015

Phew. Now it's landed! Thanks!

@vitallium vitallium closed this May 31, 2015
@dparshin
Copy link
Contributor Author

@dparshin dparshin commented May 31, 2015

@vitallium, great!Thanks.

@kriegaex
Copy link

@kriegaex kriegaex commented Jun 7, 2016

Sorry for the stupid question: The PR was closed, but was it also merged into master or another branch? I cannot find the changes when browsing the repo.

@vitallium
Copy link
Collaborator

@vitallium vitallium commented Jun 7, 2016

@kriegaex yes, it was merged into master. But this PR caused weird errors for sync AJAX requests, and we reverted it. We landed a fix for sync AJAX requests some time ago. I think we can bring this feature back into the master branch.

@erikdubbelboer
Copy link
Contributor

@erikdubbelboer erikdubbelboer commented Jun 8, 2016

@vitallium any ETA? Do you plan to use the same type of proxy class or will that not work anymore because of the AJAX issue?

@vitallium
Copy link
Collaborator

@vitallium vitallium commented Jun 8, 2016

@erikdubbelboer I think we can just pull the original code back. I'll try to land it this weekend.

@dpanic
Copy link

@dpanic dpanic commented Jun 20, 2016

What is the situation with response.body in onResourceReceived on 2.1.1?

@yeyu456
Copy link

@yeyu456 yeyu456 commented Jun 20, 2016

Same question.Is it done yet? I need that feature.Anything i can do right now? like using some old version.

@dmncls
Copy link

@dmncls dmncls commented Jun 28, 2016

I have some phantom scripts dependent on this functionality (written when it was working with an earlier version of 2.1.1)....and I really needed the functionality back so I have temporarily built using the commit bced581 - it works for my use case. (I hope this helps someone else, and that this functionality is back in an official version soon!)

@redzedi
Copy link

@redzedi redzedi commented Aug 18, 2016

Using this commit causes Phantomjs to crash when navigating out of a page that has a window.onbeforeunload function that fires a async ajax POST request . Please refer to the html below :-

<!DOCTYPE html>
<html>
<head>
    <script type="text/javascript">
    function testCallbackComplete(){
        console.log("Done onbeforeunload !!");
    }
        window.onbeforeunload = function(){
            var xmlRequest = new XMLHttpRequest();
            xmlRequest.onreadystatechange = testCallbackComplete;
            xmlRequest.open("POST", encodeURI("http://localhost:5000/testEcho"), true);
       xmlRequest.setRequestHeader("Content-Type", "application/json");
        xmlRequest.send("{\"test\":true}");
        return;
    } 

    </script>
    <title>testing On Before Unload</title>
</head>
<body>
   <p>
    <a href="index.html" id="goToIndex">Go to Index</a>
   </p>

</body>
</html>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet