Goutte can't find elements that are out of view or still haven't loaded #423

Open
matveynikon opened this issue Aug 31, 2020 · 1 comment

matveynikon commented Aug 31, 2020

I am trying to build a simple YouTube SEO tool with Goutte. It should search for a keyword, find a specific video, and print that video's position in the results for that keyword. My problem is that my Goutte bot can't find videos below the top 10 results. I suspect this is either because those videos haven't loaded yet (a person has to actually scroll down for them to load, which I can't do with Goutte) or because the video is simply outside the viewport.

Does anyone know a solution? If there is a way to scroll in Goutte, please tell me.

My code:

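<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://www.youtube.com/results?search_query=php+web+scraping');
sleep(5);
$crawler->selectLink('php web scraping tutorial(simple)')->link(); // this video is in the top 30
?>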

jeromegamez commented Sep 20, 2020

I had the same issue with another site and, while debugging, stumbled upon a reference to an HTML5 class in the Crawler class of the DomCrawler component:

use Masterminds\HTML5;
// ...
$this->html5Parser = class_exists(HTML5::class) ? new HTML5(['disable_html_ns' => true]) : null;

A follow-up Google search then led me to https://github.com/Masterminds/html5-php and https://symfony.com/blog/new-in-symfony-4-3-better-html5-parser-for-domcrawler

Long story short: a composer require masterminds/html5 solved the issue for me 🥳
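
To check whether this applies to a given setup, here is a minimal sketch built on the same `class_exists` test as the line quoted above. The helper name `html5ParserAvailable` is made up for illustration and is not part of Goutte or DomCrawler. It only reports whether the `Masterminds\HTML5` class can be loaded in the current process, which, judging by the quoted line, is what decides whether the Crawler gets an HTML5 parser or `null`.

```php
<?php
// Run this after Composer's autoloader has been loaded
// (e.g. require 'vendor/autoload.php';), otherwise the class cannot be found.

// Hypothetical helper: true if the masterminds/html5 package is loadable.
function html5ParserAvailable(): bool
{
    return class_exists('Masterminds\\HTML5');
}

echo html5ParserAvailable()
    ? "masterminds/html5 is installed\n"
    : "masterminds/html5 is missing; try: composer require masterminds/html5\n";
```

If the second message is printed, installing the package as described above may be worth trying before assuming the missing elements are a scrolling problem.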
