Error: No data found for resource with given identifier #260

Closed
bookin opened this issue Sep 8, 2017 · 25 comments

Comments

@bookin

bookin commented Sep 8, 2017

Perhaps you can help me with this. I'm trying to get the body of an AJAX request made by the page, but I keep getting Error: No data found for resource with given identifier. The problem seems to occur only with this request; maybe I'm missing something.

const CDP = require('chrome-remote-interface');

setTimeout(() => {
    CDP(async (client) => {
        const {Network, Page, Runtime} = client;
        Network.requestWillBeSent(({requestId, request}) => {
            if(request.url.indexOf("ct2/results/rpc") != -1){
                console.log(`REQ [${requestId}] ${request.method} ${request.url} \n`);
            }
        });
        Network.responseReceived(async ({requestId, response}) => {
            if(response.url.indexOf("ct2/results/rpc") != -1){
                const {body, base64Encoded} = await Network.getResponseBody({requestId});
                console.log(`RES [${requestId}] body: ${body} \n`);
            }
        });
        try {
            await Promise.all([Network.enable(), Page.enable()]);
            await Page.navigate({url: 'https://clinicaltrials.gov/ct2/results?cond=Parents&term=&cntry1=&state1=&Search=Search&recrs=a#wrapper'});
            await Page.loadEventFired();
            await Runtime.evaluate({
                expression: `document.querySelector('.paginate_button.next').click()`
            });
        } catch (err) {
            console.error(err);
        }
    }).on('error', (err) => {
        console.error(err);
    });
}, 1000);

Thanks.

@cyrus-and
Owner

This happens because AFAIK you're only allowed to call Network.getResponseBody when the Network.loadingFinished event has fired. Unfortunately this event doesn't contain the associated request object so you have to keep track of the requestId for which you want to fetch the response body.

I implemented this in the following using a Set:

const CDP = require('chrome-remote-interface');

setTimeout(() => {
    CDP(async (client) => {
        const {Network, Page, Runtime} = client;

        const requests = new Set(); // <---------- HERE

        Network.requestWillBeSent(({requestId, request}) => {
            if(request.url.indexOf("ct2/results/rpc") != -1){
                console.log(`REQ [${requestId}] ${request.method} ${request.url} \n`);

                requests.add(requestId); // <---------- HERE

            }
        });
        Network.loadingFinished(async ({requestId}) => {

            if (requests.has(requestId)) { // <---------- HERE

                const {body, base64Encoded} = await Network.getResponseBody({requestId});
                console.log(`RES [${requestId}] body: ${body} \n`);
            }
        });
        try {
            await Promise.all([Network.enable(), Page.enable()]);
            await Page.navigate({url: 'https://clinicaltrials.gov/ct2/results?cond=Parents&term=&cntry1=&state1=&Search=Search&recrs=a#wrapper'});
            await Page.loadEventFired();
            await Runtime.evaluate({
                expression: `document.querySelector('.paginate_button.next').click()`
            });
        } catch (err) {
            console.error(err);
        }
    }).on('error', (err) => {
        console.error(err);
    });
}, 1000);

@bookin
Author

bookin commented Sep 9, 2017

Thank you very much for your help, only you are helping people)

@ilanc

ilanc commented Jan 25, 2018

Wish I'd found this issue earlier - I actually reported this problem as a bug over on the chromium bug tracker:
https://bugs.chromium.org/p/chromium/issues/detail?id=805887

It still seems to fail occasionally but is much more reliable when called from Network.loadingFinished.

My test code is here:
https://github.com/ilanc/devtools-bugs/blob/master/getResponseBody.js

@pmurley

pmurley commented Sep 18, 2018

Got a follow up on this. I'm seeing Network.getResponseBody fail with the error above occasionally (Error: No data found for resource with given identifier) even when waiting for the Network.loadingFinished event and using the requestId from that event as the argument to Network.getResponseBody. It's not common, but it seems to happen consistently for some resources.

Again, the methodology I'm using:

  1. Save request IDs from Network.requestWillBeSent event
  2. When Network.loadingFinished event fires, verify we have seen the requestId before in (1), and then call Network.getResponseBody on that requestId. This (rarely) results in the error.

I'm attempting to save ALL resources loaded by a particular site. I see this problem consistently when visiting cnn.com, for example. It seems like it might have something to do with proxy-related URLs? Here's an example of one resource (which is loaded when visiting CNN) having this problem:

{
 'request': {'documentURL': 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html',
               'frameId': '70E834A1728B9944F7E49654C2892D5E',
               'hasUserGesture': False,
               'initiator': {'lineNumber': 0,
                             'type': 'parser',
                             'url': 'https://www.cnn.com/'},
               'loaderId': 'ED6F00D73ABC7A3C7C2A13AA544542A2',
               'request': {'headers': {'Referer': 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html',
                                       'User-Agent': 'Mozilla/5.0 (X11; Linux '
                                                     'x86_64) '
                                                     'AppleWebKit/537.36 '
                                                     '(KHTML, like Gecko) '
                                                     'Chrome/67.0.3396.0 '
                                                     'Safari/537.36'},
                           'initialPriority': 'Low',
                           'method': 'GET',
                           'mixedContentType': 'none',
                           'referrerPolicy': 'no-referrer-when-downgrade',
                           'url': 'https://bea4.v.fwmrm.net/ad/u?mode=echo&cr=https%3A%2F%2Fbeacon.krxd.net%2Fusermatch.gif%3Fpartner%3Dfreewheel%26partner_uid%3D%23%7Buser.id%7D'},
               'requestId': '1000024975.318',
               'timestamp': 133066.557731,
               'type': 'Image',
               'wallTime': 1537295717.29641},
 'response': {'frameId': '70E834A1728B9944F7E49654C2892D5E',
              'loaderId': 'ED6F00D73ABC7A3C7C2A13AA544542A2',
              'requestId': '1000024975.318',
              'response': {'connectionId': 1120,
                           'connectionReused': False,
                           'encodedDataLength': 353,
                           'fromDiskCache': False,
                           'fromServiceWorker': False,
                           'headers': {'Cache-Control': 'no-store',
                                       'Content-Length': '0',
                                       'Content-Type': 'text/html',
                                       'Date': 'Tue, 18 Sep 2018 18:35:17 GMT',
                                       'Expires': '0',
                                       'P3P': 'policyref="https://www.freewheel.tv/w3c/p3p.xml",CP="ALL '
                                              'DSP COR NID"',
                                       'Pragma': 'no-cache',
                                       'Server': 'FWS',
                                       'Set-Cookie': '_uid="f106_6602634828796344746";expires=Wed, '
                                                     '18 Sep 2019 18:35:17 '
                                                     'GMT;domain=.fwmrm.net;path=/;'},
                           'headersText': 'HTTP/1.1 200 OK\r\n'
                                          'Set-Cookie: '
                                          '_uid="f106_6602634828796344746";expires=Wed, '
                                          '18 Sep 2019 18:35:17 '
                                          'GMT;domain=.fwmrm.net;path=/;\r\n'
                                          'Content-Type: text/html\r\n'
                                          'Content-Length: 0\r\n'
                                          'Expires: 0\r\n'
                                          'Pragma: no-cache\r\n'
                                          'Cache-Control: no-store\r\n'
                                          'Date: Tue, 18 Sep 2018 18:35:17 '
                                          'GMT\r\n'
                                          'Server: FWS\r\n'
                                          'P3P: '
                                          'policyref="https://www.freewheel.tv/w3c/p3p.xml",CP="ALL '
                                          'DSP COR NID"\r\n'
                                          '\r\n',
                           'mimeType': 'text/html',
                           'protocol': 'http/1.1',
                           'remoteIPAddress': '38.71.2.160',
                           'remotePort': 443,
                           'requestHeaders': {'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
                                              'Accept-Encoding': 'gzip, '
                                                                 'deflate, br',
                                              'Accept-Language': 'en-US,en;q=0.9',
                                              'Connection': 'keep-alive',
                                              'Host': 'bea4.v.fwmrm.net',
                                              'Referer': 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html',
                                              'User-Agent': 'Mozilla/5.0 (X11; '
                                                            'Linux x86_64) '
                                                            'AppleWebKit/537.36 '
                                                            '(KHTML, like '
                                                            'Gecko) '
                                                            'Chrome/67.0.3396.0 '
                                                            'Safari/537.36'},
                           'requestHeadersText': 'GET '
                                                 '/ad/u?mode=echo&cr=https%3A%2F%2Fbeacon.krxd.net%2Fusermatch.gif%3Fpartner%3Dfreewheel%26partner_uid%3D%23%7Buser.id%7D '
                                                 'HTTP/1.1\r\n'
                                                 'Host: bea4.v.fwmrm.net\r\n'
                                                 'Connection: keep-alive\r\n'
                                                 'User-Agent: Mozilla/5.0 '
                                                 '(X11; Linux x86_64) '
                                                 'AppleWebKit/537.36 (KHTML, '
                                                 'like Gecko) '
                                                 'Chrome/67.0.3396.0 '
                                                 'Safari/537.36\r\n'
                                                 'Accept: '
                                                 'image/webp,image/apng,image/*,*/*;q=0.8\r\n'
                                                 'Referer: '
                                                 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html\r\n'
                                                 'Accept-Encoding: gzip, '
                                                 'deflate, br\r\n'
                                                 'Accept-Language: '
                                                 'en-US,en;q=0.9\r\n',
                           'securityDetails': {'certificateId': 0,
                                               'certificateTransparencyCompliance': 'not-compliant',
                                               'cipher': 'AES_256_GCM',
                                               'issuer': 'DigiCert SHA2 High '
                                                         'Assurance Server CA',
                                               'keyExchange': 'ECDHE_RSA',
                                               'keyExchangeGroup': 'P-256',
                                               'protocol': 'TLS 1.2',
                                               'sanList': ['*.v.fwmrm.net',
                                                           'v.fwmrm.net'],
                                               'signedCertificateTimestampList': [],
                                               'subjectName': '*.v.fwmrm.net',
                                               'validFrom': 1509494400,
                                               'validTo': 1610539200},
                           'securityState': 'secure',
                           'status': 200,
                           'statusText': 'OK',
                           'timing': {'connectEnd': 144.491,
                                      'connectStart': 7.448,
                                      'dnsEnd': 7.448,
                                      'dnsStart': 0.401,
                                      'proxyEnd': -1,
                                      'proxyStart': -1,
                                      'pushEnd': 0,
                                      'pushStart': 0,
                                      'receiveHeadersEnd': 211.143,
                                      'requestTime': 133066.560247,
                                      'sendEnd': 144.745,
                                      'sendStart': 144.716,
                                      'sslEnd': 144.486,
                                      'sslStart': 76.332,
                                      'workerReady': -1,
                                      'workerStart': -1},
                           'url': 'https://bea4.v.fwmrm.net/ad/u?mode=echo&cr=https%3A%2F%2Fbeacon.krxd.net%2Fusermatch.gif%3Fpartner%3Dfreewheel%26partner_uid%3D%23%7Buser.id%7D'},
              'timestamp': 133066.772863,
              'type': 'Image'}
}

@cyrus-and
Owner

@pmurley that may also happen (if I recall correctly) when you navigate away from the URL and then call Network.getResponseBody on a stale object; possibly not your case.

@pmurley

pmurley commented Sep 18, 2018

Wow, you're quick! Yeah, that makes sense, and it's something I should look into a bit more, but I don't think that's what is going on here - at least not in a straightforward way. I'm certainly only calling Page.navigate once to visit the target site.

I guess it could be something caused by some sort of auto-navigation/redirection within a particular frame(?), but if anyone has any other ideas, I'd be grateful!

@cyrus-and
Owner

@pmurley if you can come up with a minimal program that reproduces this issue I could take a look at it.

@pmurley

pmurley commented Sep 18, 2018

Here's an example. This usually (but not every single time) prints "Why does this happen?" at least once.

const CDP = require('chrome-remote-interface');

setTimeout(() => {
    CDP(async (client) => {
        const {Network, Page, Runtime} = client;
        var req_ids = new Set();
        Network.requestWillBeSent(({requestId, request}) => {
            req_ids.add(requestId);
        });
        Network.responseReceived(async ({requestId, response}) => {
        });
        Network.loadingFinished(async ({requestId, response}) => {
            if (req_ids.has(requestId)) {
                try {
                    var response_body = await Network.getResponseBody({requestId});
                } catch (err) {
                    console.log(err);
                    console.log('Why does this happen?');
                }
            } else {
                // I am also confused as to why we sometimes get here,
                // but this is not my main concern.
                console.log('requestId not seen before');
            }

        });
        try {
            await Promise.all([Network.enable(), Page.enable()]);
            await Page.navigate({url: 'http://cnn.com'});
            await Page.loadEventFired();
        } catch (err) {
            console.error(err);
        }
    }).on('error', (err) => {
        console.error(err);
    });
}, 10000);

@cyrus-and
Owner

cyrus-and commented Sep 19, 2018

@pmurley thanks, why the 10s delay though?

So I think the problem here is that you're reusing the same tab for multiple page loads, so you end up with unprocessed events coming from the previous instance that reference stale objects.

In fact, I consistently get that error if I run the script against a tab that has been used for a previous page load, and never with a blank new tab.

As for "I am also confused as to why we sometimes get here, but this is not my main concern": that happens because those requests are served from the cache; you can catch them with Network.requestServedFromCache.


Here's what I mean:

const CDP = require('chrome-remote-interface');

async function test() {
    try {
        // this is basically the new part //////////////////////////
        const target = await CDP.New();
        const client = await CDP({target});
        ////////////////////////////////////////////////////////////

        const {Network, Page, Runtime} = client;
        const req_ids = new Set();

        Network.requestWillBeSent(({requestId}) => {
            console.log(`${requestId} Network.requestWillBeSent`);
            req_ids.add(requestId);
        });

        Network.requestServedFromCache(({requestId}) => {
            console.log(`${requestId} Network.requestServedFromCache`);
        });

        Network.responseReceived(async ({requestId}) => {
            console.log(`${requestId} Network.responseReceived`);
        });

        Network.loadingFinished(async ({requestId}) => {
            console.log(`${requestId} Network.loadingFinished`);
            if (req_ids.has(requestId)) {
                try {
                    var response_body = await Network.getResponseBody({requestId});
                } catch (err) {
                    console.log(`${requestId} Network.getResponseBody: FAILED`);
                }
            } else {
                console.log(`${requestId} UNKNOWN`);
            }
        });

        Network.loadingFailed(async ({requestId}) => {
            console.log(`${requestId} Network.loadingFailed`);
        });

        await Network.enable();
        await Page.enable();
        await Page.navigate({url: 'http://cnn.com'});
        await Page.loadEventFired();
        console.log('Page.loadEventFired');
    } catch (err) {
        console.error(err);
    }
}

test();

You should handle the errors better and possibly close client and the newly created tab to avoid creating a bunch of tabs.
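A minimal sketch of that cleanup, assuming the chrome-remote-interface module is passed in as `CDP`. The helper name `withFreshTab` is hypothetical; `CDP.New`, `CDP.Close` and `client.close` are the library calls it relies on. It opens a blank tab, runs a task against it, and closes both the connection and the tab even when the task throws:

```javascript
// Sketch: run `task` against a brand-new tab and always clean up afterwards.
// `CDP` is the chrome-remote-interface module, passed in by the caller;
// `withFreshTab` itself is a hypothetical helper, not part of the library.
async function withFreshTab(CDP, task) {
    const target = await CDP.New();          // open a blank tab
    let client;
    try {
        client = await CDP({target});        // connect to that tab only
        return await task(client);
    } finally {
        if (client) {
            await client.close();            // drop the WebSocket connection
        }
        await CDP.Close({id: target.id});    // close the tab itself
    }
}

// Usage (needs Chrome started with --remote-debugging-port=9222):
//
// const CDP = require('chrome-remote-interface');
// withFreshTab(CDP, async ({Network, Page}) => {
//     await Network.enable();
//     await Page.enable();
//     await Page.navigate({url: 'http://cnn.com'});
//     await Page.loadEventFired();
// }).catch(console.error);
```

Passing the module in as a parameter keeps the helper easy to exercise with a stub when Chrome is not available.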

Bonus: it might be wise to load each page in a new browser context (incognito-like), if that's the case take a look here.

@xgj1988

xgj1988 commented Mar 20, 2020

How can I filter by URL when I use loadingFinished?

@cyrus-and
Owner

@xgj1988 you need to keep track of the actual request URL, e.g., using a Map. In a nutshell:

  • in Network.requestWillBeSent associate request.url to requestId;
  • in Network.loadingFinished fetch the URL using requestId as key.

This is off topic though, file a new issue if needed.
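The two steps above can be sketched roughly as follows. This is only an illustration: `collectBodies`, `wanted` and `onBody` are hypothetical names, and `Network` is assumed to be the domain object of a chrome-remote-interface client.

```javascript
// Sketch: remember each request's URL so it is available in loadingFinished.
// `Network` is the domain object from a chrome-remote-interface client;
// `wanted(url)` and `onBody(url, body)` are hypothetical caller callbacks.
function collectBodies(Network, wanted, onBody) {
    const urls = new Map(); // requestId -> request URL

    Network.requestWillBeSent(({requestId, request}) => {
        if (wanted(request.url)) {
            urls.set(requestId, request.url);
        }
    });

    Network.loadingFinished(async ({requestId}) => {
        const url = urls.get(requestId);
        if (url === undefined) {
            return; // not a request we are interested in
        }
        urls.delete(requestId); // avoid growing the Map forever
        try {
            const {body, base64Encoded} = await Network.getResponseBody({requestId});
            onBody(url, base64Encoded ? Buffer.from(body, 'base64') : body);
        } catch (err) {
            console.error(`${requestId} ${url}: ${err.message}`);
        }
    });

    Network.loadingFailed(({requestId}) => {
        urls.delete(requestId); // the body will never arrive
    });
}

// Usage, with the filter from the original snippet:
//
// collectBodies(Network,
//     (url) => url.includes('ct2/results/rpc'),
//     (url, body) => console.log(`RES ${url} body: ${body}`));
```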

@xgj1988

xgj1988 commented Mar 22, 2020

@xgj1988 Actually, I use Electron's webContents.debugger, and I couldn't find out how to get the response body. I found this issue while searching, so I thought you might know Electron too. Could you tell me how to get the response body in Electron?

@cyrus-and
Owner

@xgj1988 in the same way AFAIK, the API should be the same. Anyway I don't know electron at all. :)
Feel free to file a new issue with some minimal working example.
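For reference, a rough and untested sketch of how the same Set-based approach might look with Electron's webContents.debugger. It assumes `dbg` is `win.webContents.debugger` and relies only on its `attach`, `on('message', ...)` and `sendCommand` methods; `watchBodies`, `wanted` and `onBody` are hypothetical names.

```javascript
// Sketch: the Set-based approach from this thread, expressed with Electron's
// webContents.debugger. `dbg` is assumed to be `win.webContents.debugger`;
// `wanted(url)` and `onBody(requestId, body)` are hypothetical callbacks.
async function watchBodies(dbg, wanted, onBody) {
    const pending = new Set(); // requestIds we want the body of

    dbg.on('message', async (event, method, params) => {
        if (method === 'Network.requestWillBeSent') {
            if (wanted(params.request.url)) {
                pending.add(params.requestId);
            }
        } else if (method === 'Network.loadingFinished') {
            if (!pending.delete(params.requestId)) {
                return; // not one of ours
            }
            try {
                const {body, base64Encoded} = await dbg.sendCommand(
                    'Network.getResponseBody', {requestId: params.requestId});
                onBody(params.requestId,
                    base64Encoded ? Buffer.from(body, 'base64').toString() : body);
            } catch (err) {
                console.error(`${params.requestId}: ${err.message}`);
            }
        }
    });

    dbg.attach('1.3');                        // throws if already attached
    await dbg.sendCommand('Network.enable');  // start receiving Network events
}
```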

@xgj1988

xgj1988 commented Mar 22, 2020

@cyrus-and how do I get the requestId for Network.getResponseBody?

@cyrus-and
Owner

@xgj1988 as I told you, it's the one that you get in the Network.requestWillBeSent, it's all in the original snippet really.

@xgj1988

xgj1988 commented Mar 22, 2020

@cyrus-and OK, I got it. Thanks!

@maklimcz

maklimcz commented Feb 8, 2021

@pmurley @xgj1988 @cyrus-and did you figure out what causes the problem with getting the response for some requests? I am also facing this in Electron. Could you have a look at https://stackoverflow.com/questions/66101799/electron-browserwindow-cannot-get-response-when-debugger-is-attached

@cyrus-and
Owner

@maklimcz could it be about caching as mentioned above?

@maklimcz

maklimcz commented Feb 8, 2021

@cyrus-and no, I don't think it is loaded from the cache, because I don't get a Network.requestServedFromCache event.
I have traced the sequence of events for this particular request and got:

Network.requestWillBeSentExtraInfo 13548.212
Network.requestWillBeSent 13548.212
Network.responseReceivedExtraInfo 13548.212
Network.responseReceived 13548.212
Network.dataReceived 13548.212 [repeated 135 times]
...
Network.loadingFinished 13548.212

13548.212 is a requestId
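A trace like the one above could be collected with something along these lines. This is a sketch that assumes a chrome-remote-interface client, which emits a generic 'event' for every protocol notification; under Electron the debugger's 'message' event would play the same role. `traceNetworkEvents` is a hypothetical helper name.

```javascript
// Sketch: record the order of Network.* events per requestId.
// `client` is assumed to be a chrome-remote-interface client, which emits a
// generic 'event' carrying {method, params} for every protocol notification.
function traceNetworkEvents(client) {
    const traces = new Map(); // requestId -> array of event names, in order

    client.on('event', ({method, params}) => {
        if (!method.startsWith('Network.') || !params || !params.requestId) {
            return; // not a per-request Network event
        }
        if (!traces.has(params.requestId)) {
            traces.set(params.requestId, []);
        }
        traces.get(params.requestId).push(method);
    });

    return traces;
}

// Usage:
//
// const traces = traceNetworkEvents(client);
// ... later, after the page has loaded ...
// console.log(traces.get('13548.212'));
```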

@cyrus-and
Owner

@maklimcz see if you can reproduce this without Electron.

@ilanc

ilanc commented Feb 8, 2021

The conclusion that I came to was that any attempt to "sniff"[^1] network packets using the devtools protocol is not reliable. The getResponse* functions may work, but when they fail you can't pin it down to anything that you can trace or correct: it's buried somewhere within chromium (more below).

My solution has been to use workaround code - e.g if I need data in the response body and fail to get it then either I find the data on the rendered html page, or I send a fetch request from the app code directly (i.e. rather than asking devtools to ask chrome to fetch it ... and then trying to sniff the response).
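A minimal sketch of the second workaround, under stated assumptions: `bodyOrRefetch` is a hypothetical helper, `Network` is a chrome-remote-interface domain object, and `fetchImpl` defaults to the global fetch available in Node.js 18+. The direct request carries none of the page's cookies or headers, so it may not return the same data the page received.

```javascript
// Sketch of the fallback: ask DevTools first, re-request the URL on failure.
// `Network` is a chrome-remote-interface domain object; `fetchImpl` defaults
// to the global fetch (Node.js 18+). The second request is made by the app
// itself, without the page's cookies or headers, so results may differ.
async function bodyOrRefetch(Network, requestId, url, fetchImpl = globalThis.fetch) {
    try {
        const {body, base64Encoded} = await Network.getResponseBody({requestId});
        return base64Encoded ? Buffer.from(body, 'base64').toString() : body;
    } catch (err) {
        console.warn(`${requestId}: ${err.message}; fetching ${url} directly`);
        const response = await fetchImpl(url);
        return response.text();
    }
}
```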

I expect that getResponse* are just poorly tested, little-used parts of devtools, and over time bugs are introduced or get resolved, which leads to erratic behavior, i.e. there isn't a large enough community of people relying on them to moan about it. I came to this conclusion based on this stackoverflow post[^2], which talks about various changes to chromium that have caused requestWillBeSent to omit some headers, as well as a lot of my own failed experimentation[^3]. It's been years since I did any c++ dev though, so I can't say for sure (I haven't tried to track it down in the chromium source).

Hope it helps.

[^1] like Network.getResponseBody, Network.getResponseBodyForInterception on the receiving side, or Network.requestWillBeSent etc on the sending side
[^2] albeit about requestWillBeSent - not getResponse*
[^3] not much of which is public other than intercept image data and my original chromium bug

@maklimcz

maklimcz commented Feb 8, 2021

@cyrus-and @ilanc it seems I managed to do what I wanted. However, I didn't use Network.getResponseBody; I used the Fetch domain: https://chromedevtools.github.io/devtools-protocol/tot/Fetch/
To use it, you need to subscribe to responses matching a pattern. Then you can react to Fetch.requestPaused events. While the request is paused you have direct access to the request and indirect access to the response. To get the response, call Fetch.getResponseBody with the proper requestId. Below I've pasted a snippet.

dbg.sendCommand('Fetch.enable', {
    patterns: [
        // interestingURLpattern is a URL pattern string defined elsewhere
        {urlPattern: interestingURLpattern, requestStage: 'Response'}
    ]
});

const getResponseJson = async (requestId) => {
    const res = await dbg.sendCommand('Fetch.getResponseBody', {requestId});
    return JSON.parse(res.base64Encoded ? Buffer.from(res.body, 'base64').toString() : res.body);
};

// the handler must be async because it awaits the debugger commands
dbg.on('message', async (e, m, p) => {
    if (m === 'Fetch.requestPaused') {
        const reqJson = JSON.parse(p.request.postData);
        const resJson = await getResponseJson(p.requestId);
        // ...

        await dbg.sendCommand('Fetch.continueRequest', {requestId: p.requestId});
    }
});

Also remember to send Fetch.continueRequest, because, as the documentation says:

The request is paused until the client responds with one of continueRequest, failRequest or fulfillRequest

https://chromedevtools.github.io/devtools-protocol/tot/Fetch/#event-requestPaused

@iddoeldor

iddoeldor commented Mar 26, 2021

dbg.sendCommand('Fetch.enable', {

@maklimcz can you please elaborate on how to run your example?
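The snippet above targets Electron's webContents.debugger. For comparison, here is a rough sketch of the same Fetch-based idea using chrome-remote-interface, which is an assumption on my part and not what the snippet's author ran. `interceptBodies`, `urlPattern` and `onBody` are hypothetical names; `Fetch.enable`, `Fetch.requestPaused`, `Fetch.getResponseBody` and `Fetch.continueRequest` come from the DevTools protocol.

```javascript
// Sketch: pause matching responses via the Fetch domain and read their bodies.
// `client` is assumed to be a chrome-remote-interface client; `urlPattern` and
// `onBody(url, body)` are hypothetical, supplied by the caller.
async function interceptBodies(client, urlPattern, onBody) {
    const {Fetch} = client;

    Fetch.requestPaused(async ({requestId, request}) => {
        try {
            const {body, base64Encoded} = await Fetch.getResponseBody({requestId});
            onBody(request.url,
                base64Encoded ? Buffer.from(body, 'base64').toString() : body);
        } catch (err) {
            console.error(`${requestId}: ${err.message}`);
        } finally {
            // A paused request stays paused until it is continued.
            await Fetch.continueRequest({requestId}).catch(console.error);
        }
    });

    // Only responses whose URL matches the pattern are paused.
    await Fetch.enable({
        patterns: [{urlPattern, requestStage: 'Response'}]
    });
}

// Usage (needs Chrome started with --remote-debugging-port=9222):
//
// const CDP = require('chrome-remote-interface');
// const client = await CDP();
// await interceptBodies(client, '*ct2/results/rpc*', (url, body) => {
//     console.log(url, body);
// });
// await client.Page.enable();
// await client.Page.navigate({url: 'https://clinicaltrials.gov/'});
```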

@robd

robd commented Jun 28, 2022

One thing I found was that it was necessary to ignore 'preflight' requests when calling getResponseBody

Network.on("responseReceived",(async (params) => {
  if (params.type == 'Preflight') return
  const {body} = await Network.getResponseBody(params);
  // Do something with the body
}));
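One way to combine this with the earlier advice in the thread (fetch the body only after Network.loadingFinished) is sketched below. It assumes `Network` is a chrome-remote-interface domain object; `collectNonPreflightBodies` and `onBody` are hypothetical names.

```javascript
// Sketch: skip CORS preflight requests, and fetch bodies only once
// Network.loadingFinished has fired for the request.
// `onBody(requestId, body)` is a hypothetical caller callback.
function collectNonPreflightBodies(Network, onBody) {
    const skipped = new Set(); // requestIds whose resource type is Preflight

    Network.responseReceived(({requestId, type}) => {
        if (type === 'Preflight') {
            skipped.add(requestId);
        }
    });

    Network.loadingFinished(async ({requestId}) => {
        if (skipped.delete(requestId)) {
            return; // preflight: reported above to fail in getResponseBody
        }
        try {
            const {body} = await Network.getResponseBody({requestId});
            onBody(requestId, body);
        } catch (err) {
            console.error(`${requestId}: ${err.message}`);
        }
    });
}
```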


9 participants