Scraping problem with indexer #8

Open
aw-gerrit opened this issue Jun 16, 2021 · 3 comments

@aw-gerrit
Member

Use case

https://sag-sh.de

Issue: no hit although term exists on page

The search term „Albert-Schweitzer-Schule“ should return a hit for the following page, but doesn't:
https://sag-sh.de/referenzschulnetzwerk/archiv

On the page, the term appears as follows (a span inside an a element):
<a data-v-0c0bb767="" href="http://www.ass-wedel.de/" target="_blank" rel="noopener noreferrer" class="gtl-link"> <span data-v-0c0bb767="" class="font-semibold">Albert-Schweitzer-Schule</span></a>

@jannescb
Member

jannescb commented Jun 16, 2021

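This is because the page content is fetched with:

$content = file_get_contents($url);

file_get_contents() won't execute JavaScript; it only returns the raw HTML that the server sends. The table in your example is a Vue app that is rendered in the browser, so the term is missing from the fetched content.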

@cbl

How much effort do you think it would take to add an optional Chromium feature that could render each URL?

Alternative

We might consider using Browsershot, which can return the body of an HTML page after JavaScript has been executed:

Browsershot::url('https://example.com')->bodyHtml()
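To make the idea of an optional rendering step concrete, here is a minimal sketch, not the indexer's actual code. It assumes spatie/browsershot is installed together with its Node/Puppeteer requirements; the function name fetchPageHtml and the $renderJavaScript flag are hypothetical.

```php
<?php

use Spatie\Browsershot\Browsershot;

/**
 * Hypothetical helper: return the HTML for a URL, optionally rendered
 * by headless Chromium so that client-side (e.g. Vue) content is included.
 */
function fetchPageHtml(string $url, bool $renderJavaScript = false): string
{
    if (! $renderJavaScript) {
        // Current behaviour: raw server response, no JavaScript executed.
        $content = file_get_contents($url);

        if ($content === false) {
            throw new RuntimeException("Could not fetch {$url}");
        }

        return $content;
    }

    // Opt-in path: wait until network activity settles, then read the
    // body HTML of the rendered page.
    return Browsershot::url($url)
        ->waitUntilNetworkIdle()
        ->bodyHtml();
}
```

Keeping the rendered path opt-in seems sensible, because starting Chromium for every URL is much slower than a plain HTTP request. For the page above, the call would be fetchPageHtml('https://sag-sh.de/referenzschulnetzwerk/archiv', true).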

@aw-gerrit
Member Author

Could the @bot Blade directive also be a solution for this? It would also ensure that the page is indexed properly by Google.

@jannescb
Member

@aw-gerrit

Yes, if you build a server-rendered bot version and the @bot directive triggers on PHP's user agent, it would work.
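For the @bot approach, the request made by the indexer needs a user agent that the directive can match. As far as I know, file_get_contents() sends no User-Agent header unless the user_agent ini setting or a stream context provides one. A rough sketch of setting one explicitly follows; the function name buildIndexerContext and the string ExampleIndexerBot/1.0 are made up, and whether @bot matches that string depends on how the directive detects bots.

```php
<?php

/**
 * Hypothetical helper: build a stream context that identifies the
 * indexer with an explicit User-Agent header.
 */
function buildIndexerContext(string $userAgent = 'ExampleIndexerBot/1.0')
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent header
            'timeout'    => 10,         // seconds; arbitrary example value
        ],
    ]);
}

// Usage: pass the context as the third argument.
// $content = file_get_contents($url, false, buildIndexerContext());
```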
