hquery works on localhost but not on server for certain websites #30

Closed
hassanzohdy opened this issue Mar 21, 2018 · 1 comment
@hassanzohdy

When I use the package on localhost, it works fine,

but when I use it on my VPS, it doesn't work for some websites.

The website that I'm trying to reach is
https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O

Any idea why that happens?

@duzun
Owner

duzun commented Mar 22, 2018

I guess they have some algorithms to detect and block robots.
Same happens on my side: works from localhost, doesn't work from VPS.

If I run a simple curl 'https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O', it hangs indefinitely (both locally and on the VPS).

Locally, curl -H 'User-Agent: Mozilla/5.0 (NT; Windows)' 'https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O' works just fine, but the same command does not work on the VPS.

On the VPS, curl -H 'User-Agent: bot' 'https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O' works, but I get redirected to https://www.michaelkors.fr/.

And so on and so forth...

So the conclusion is that they dynamically block some clients and redirect others, based on IP address, User-Agent, and whatever other information they can get from the client.

With hQuery::fromUrl($url, $headers, $req_body, $options) you can try to mimic a browser.
I recommend using some other method for fetching the HTML from the server (there are a lot of options out there) and then passing it to hQuery::fromHTML().
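
To illustrate the suggestion above, here is a minimal sketch of fetching the page yourself and handing the HTML to hQuery::fromHTML(). It assumes hQuery was installed with Composer (package duzun/hquery) and that PHP's cURL extension is enabled. The helpers fetch_html() and page_title() are hypothetical names, not part of hQuery, and the Accept and Accept-Language headers and the timeouts are guesses at what a browser-like request might look like. None of this is guaranteed to get past the site's bot detection, especially from a VPS address.

```php
<?php
use duzun\hQuery;

// Load Composer's autoloader when present (assumes `composer require duzun/hquery`).
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
}

/**
 * Hypothetical helper (not part of hQuery): fetch a URL with cURL,
 * sending the given "Name: value" header lines and giving up after
 * $timeout seconds instead of hanging indefinitely.
 */
function fetch_html(string $url, array $headers, int $timeout = 15): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects such as the .fr one
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_ENCODING       => '',       // accept any encoding cURL can decode
        CURLOPT_HTTPHEADER     => $headers,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    return $html;
}

/**
 * Hypothetical helper: fetch the page, parse it with hQuery and
 * return the text of its <title> element (empty string if none).
 */
function page_title(string $url, array $headers): string
{
    $html  = fetch_html($url, $headers);
    $doc   = hQuery::fromHTML($html, $url); // parsing only, no network access
    $title = $doc->find('title');
    return $title ? trim($title->text()) : '';
}

// Run as: php fetch.php <url>
if (PHP_SAPI === 'cli' && !empty($argv[1])) {
    $headers = [
        'User-Agent: Mozilla/5.0 (NT; Windows)',    // the value tried in this thread
        'Accept: text/html,application/xhtml+xml',  // assumption: browser-like
        'Accept-Language: en-US,en;q=0.9',          // assumption: browser-like
    ];
    echo page_title($argv[1], $headers), PHP_EOL;
}
```

Keeping the fetch separate from the parsing makes it easy to swap in a different HTTP client, a proxy, or a headless browser later without touching the hQuery code.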

This library focuses primarily on PARSING HTML, not on fetching, but fetching is there for convenience.

Good luck!
