readData takes request response #2

Closed
andrew-s opened this issue Dec 6, 2017 · 3 comments

Comments

@andrew-s
Contributor

andrew-s commented Dec 6, 2017

I'm not sure where the best place to discuss this is, so I've opened another ticket. I've been working on implementing DOM.getFlattenedDocument so that I can get the page contents. However, the line below and the way socket responses are returned are a bit awkward.

https://github.com/gsouf/headless-chromium-php/blob/f584792aced5033fda1968d8c666b5b91af57835/src/Communication/Connection.php#L259

An example exchange looks like this:

[2017-12-06 23:10:43] DEBUG socket: |=> sending data:{"id":6,"method":"Target.sendMessageToTarget","params":{"message":"{\"id\":5,\"method\":\"DOM.enable\",\"params\":[]}","sessionId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1"}}
[2017-12-06 23:10:43] DEBUG socket: <=| receiving data:{"method":"Target.receivedMessageFromTarget","params":{"sessionId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1","message":"{\"id\":3,\"result\":{\"frameId\":\"26824.1\"}}","targetId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b"}}
[2017-12-06 23:10:43] DEBUG socket: <=| receiving data:{"id":6,"result":{}}
[2017-12-06 23:10:43] DEBUG socket: |=> sending data:{"id":8,"method":"Target.sendMessageToTarget","params":{"message":"{\"id\":7,\"method\":\"DOM.getFlattenedDocument\",\"params\":{\"depth\":-1}}","sessionId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1"}}
[2017-12-06 23:10:44] DEBUG socket: <=| receiving data:{"method":"Target.receivedMessageFromTarget","params":{"sessionId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1","message":"{\"id\":5,\"result\":{}}","targetId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b"}}
[2017-12-06 23:10:44] DEBUG socket: <=| receiving data:{"id":8,"result":{}}
[2017-12-06 23:10:44] DEBUG socket: <=| receiving data:{"method":"Target.receivedMessageFromTarget","params":{"sessionId":"21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1","message":"{\"id\":7,\"result\":{\"nodes\":[{\"nodeId\":2,\"parentId\":1,\"backendNodeId\":16,\"nodeType\":10,\"nodeName\":\"html\",\"localName\":\"\",\"nodeValue\":\"\",\"publicId\":\"\",\"systemId\":\"\"},{\"nodeId\":5,\"parentId\":4,\"backendNodeId\":19,\"nodeType\":1,\"nodeName\":\"META\",\"localName\":\"meta\",\"nodeValue\":\"\",\"childNodeCount\":0,\"children\":[],\"attributes\":[\"content\",\"/images/branding/googleg/1x/googleg_standard_color_128dp.png\",\"itemp .... truncated ....

The problem is the response I get back for the request. I guess the empty acknowledgement (id 8) fulfilled the requirements of checkForResponse, so this is what was returned:

object(HeadlessChromium\Communication\Response)#103 (2) {
  ["message":protected]=>
  object(HeadlessChromium\Communication\Message)#98 (3) {
    ["id":protected]=>
    int(8)
    ["method":protected]=>
    string(26) "Target.sendMessageToTarget"
    ["params":protected]=>
    array(2) {
      ["message"]=>
      string(66) "{"id":7,"method":"DOM.getFlattenedDocument","params":{"depth":-1}}"
      ["sessionId"]=>
      string(38) "21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1"
    }
  }
  ["data":protected]=>
  array(2) {
    ["id"]=>
    int(8)
    ["result"]=>
    array(0) {
    }
  }
}
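
To illustrate the mismatch in the log above: the acknowledgement carries the outer id (8), while the actual result is JSON-encoded inside a Target.receivedMessageFromTarget frame and carries the inner id (7). Below is a minimal sketch of matching on the inner id instead. `unwrapTargetMessage` is a hypothetical helper, not part of the library, and the sample frames are shortened stand-ins for the logged ones.

```php
<?php

declare(strict_types=1);

/**
 * Extracts the inner message from a Target.receivedMessageFromTarget frame.
 * Returns null for any other frame, such as the outer acknowledgement.
 * Hypothetical helper for illustration only.
 */
function unwrapTargetMessage(string $rawFrame): ?array
{
    $frame = json_decode($rawFrame, true);
    if (!is_array($frame) || ($frame['method'] ?? null) !== 'Target.receivedMessageFromTarget') {
        return null;
    }

    // The inner message is itself a JSON-encoded string.
    $inner = json_decode($frame['params']['message'] ?? '', true);

    return is_array($inner) ? $inner : null;
}

// Frames shaped like the ones in the log (session id and node list shortened).
$frames = [
    '{"id":8,"result":{}}',
    '{"method":"Target.receivedMessageFromTarget","params":{"sessionId":"s:1","message":"{\"id\":7,\"result\":{\"nodes\":[]}}","targetId":"s"}}',
];

$innerId = 7; // id of the wrapped DOM.getFlattenedDocument request
$match = null;
foreach ($frames as $raw) {
    $inner = unwrapTargetMessage($raw);
    if ($inner !== null && ($inner['id'] ?? null) === $innerId) {
        $match = $inner;
        break;
    }
}

var_dump($match !== null && array_key_exists('nodes', $match['result']));
```

With this kind of matching, the `{"id":8,"result":{}}` acknowledgement is skipped and the frame containing `nodes` is the one returned.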

I'm not sure whether there are cases where you'd want to keep multiple responses for the same request (or whether any requests even do that), or whether it would be better to filter out the responses to requests that don't get an immediate result.

More than happy to take direction on this and implement some methods that manipulate the DOM, maybe a separate class that can be called with a Page object.

Thanks!

@gsouf
Member

gsouf commented Dec 7, 2017

Chrome won't send multiple responses for a single request. It uses events to communicate about things that happen.
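
As a rough sketch of that distinction: in the DevTools protocol a response echoes the `id` of the command it answers, while an event has a `method` and no `id`. The function below is a hypothetical classifier, not library code.

```php
<?php

declare(strict_types=1);

/**
 * Classifies a decoded DevTools protocol message.
 * Hypothetical helper for illustration only.
 */
function classifyMessage(array $message): string
{
    // Responses echo the id of the command they answer.
    if (array_key_exists('id', $message)) {
        return 'response';
    }

    // Events carry a method name and no id.
    if (isset($message['method'])) {
        return 'event';
    }

    return 'unknown';
}

$ack = ['id' => 6, 'result' => []];
$event = [
    'method' => 'Target.receivedMessageFromTarget',
    'params' => ['sessionId' => 's:1', 'message' => '{"id":5,"result":{}}'],
];

echo classifyMessage($ack), PHP_EOL;
echo classifyMessage($event), PHP_EOL;
```

Note that with Target.sendMessageToTarget, the reply from the target reaches the socket wrapped in an event, which is why the log in this issue shows the real result arriving as Target.receivedMessageFromTarget.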

I have some pending work concerning communication with the DOM and page loading. I'm afraid I don't have a clear answer on this topic yet.

@andrew-s
Contributor Author

andrew-s commented Dec 7, 2017

I might not have phrased that well, but I think there are two points here:

  1. Waiting for the relevant event. Currently, if you return sendMessage() from an event, you get the result of the request rather than the event response that arrives later.

  2. Perhaps just creating a set of classes that abstract all of the DevTools domains; https://chromedevtools.github.io/devtools-protocol/

I can work on abstracting them, but the missing piece is defining that communication layer, especially if there's a dependency on knowing that an event has been triggered or, in some ways, on chaining commands. What are your thoughts on this?
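
For illustration, one possible shape for such an abstraction is sketched below. The `DomCommands` class and `wrapForTarget` function are hypothetical names, not existing library code. They only build the payloads seen in the log and do not address the communication layer itself.

```php
<?php

declare(strict_types=1);

/** Hypothetical wrapper for a couple of DOM domain commands. */
final class DomCommands
{
    public static function enable(): array
    {
        // Empty params, mirroring the "params":[] seen in the log.
        return ['method' => 'DOM.enable', 'params' => []];
    }

    public static function getFlattenedDocument(int $depth = -1): array
    {
        return ['method' => 'DOM.getFlattenedDocument', 'params' => ['depth' => $depth]];
    }
}

/**
 * Wraps an inner command in Target.sendMessageToTarget, as in the log.
 * The caller keeps track of both ids: the outer one is acknowledged
 * immediately, the inner one identifies the actual result.
 */
function wrapForTarget(int $outerId, int $innerId, array $command, string $sessionId): array
{
    $inner = ['id' => $innerId] + $command;

    return [
        'id' => $outerId,
        'method' => 'Target.sendMessageToTarget',
        'params' => [
            'message' => json_encode($inner),
            'sessionId' => $sessionId,
        ],
    ];
}

$outer = wrapForTarget(
    8,
    7,
    DomCommands::getFlattenedDocument(),
    '21dee360-7e84-45dd-b3ba-1573ab9f4e1b:1'
);

echo json_encode($outer), PHP_EOL;
```

Keeping the inner id alongside the outer one is what would let a communication layer wait for the wrapped result rather than the acknowledgement.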

@gsouf
Member

gsouf commented Apr 6, 2018

Hi @andrew-s, I made some progress in this area: sending a message to a target now returns the message from the target.

You can see an example of getting data from the page here: https://github.com/gsouf/headless-chromium-php#evaluate-script-on-the-page

gsouf closed this as completed Apr 6, 2018
divinity76 added a commit to divinity76/chrome that referenced this issue May 3, 2023
For a split second, documentElement might be missing, causing getHtml() to crash.
I had a program that was interacting with the page and calling getHtml() roughly every 10 milliseconds (100 times per second), and it crashed unexpectedly. I was able to create a small reproducible sample:
```php
<?php

declare(strict_types=1);
require_once('vendor/autoload.php');
$chromeBinary = "/snap/bin/chromium";
$browser_factory = new \HeadlessChromium\BrowserFactory($chromeBinary);
$browser_factory->setOptions([
    "headless" => true,
    "noSandbox" => true,
    'windowSize'   => [1000, 1000]
]);
$browser = $browser_factory->createBrowser();
$page = $browser->createPage();
for ($i = 0; $i < 100; ++$i) {
    $page->navigate("http://example.com");
    $html = $page->getHtml();
    $page->navigate("http://example.org");
    $html = $page->getHtml();
}
```
This consistently crashes with:
```
PHP Fatal error:  Uncaught HeadlessChromium\Exception\JavascriptException: Error during javascript evaluation: TypeError: Cannot read properties of null (reading 'outerHTML')
    at <anonymous>:1:26 in /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php:89
Stack trace:
#0 /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php(108): HeadlessChromium\PageUtils\PageEvaluation->waitForResponse()
#1 /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/Page.php(894): HeadlessChromium\PageUtils\PageEvaluation->getReturnValue()
#2 /home/hans/projects/ibkr/test_crash.php(16): HeadlessChromium\Page->getHtml()
#3 {main}
  thrown in /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php on line 89
```
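Until such a race is handled inside the library, a caller could in principle retry the failing call. The sketch below is not the fix applied in the referenced commit. `retryOnFailure` is a hypothetical helper, and the closure is a stand-in for `$page->getHtml()` that fails twice before succeeding. In real use, the catch would presumably be narrowed to the JavascriptException seen in the trace.

```php
<?php

declare(strict_types=1);

/**
 * Calls $operation until it succeeds or $maxAttempts is reached,
 * sleeping between attempts. Hypothetical helper, not part of chrome-php.
 */
function retryOnFailure(callable $operation, int $maxAttempts = 5, int $delayMicroseconds = 10000)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; ++$attempt) {
        try {
            return $operation();
        } catch (Exception $e) {
            // Remember the failure and give the page a moment before retrying.
            $lastError = $e;
            usleep($delayMicroseconds);
        }
    }

    throw $lastError;
}

// Stand-in for $page->getHtml(): fails twice, then returns markup.
$calls = 0;
$fakeGetHtml = function () use (&$calls): string {
    if (++$calls < 3) {
        throw new RuntimeException("Cannot read properties of null (reading 'outerHTML')");
    }

    return '<html></html>';
};

$html = retryOnFailure($fakeGetHtml, 5, 1000);
echo $html, PHP_EOL;
```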