I guess they have some algorithms to detect and block robots.
Same happens on my side: works from localhost, doesn't work from VPS.
If I run a plain curl 'https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O', it hangs indefinitely (both locally and on the VPS).
Locally, curl -H 'User-Agent: Mozilla/5.0 (NT; Windows)' 'https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O' works just fine, but the same command does not work on the VPS.
On the VPS, curl -H 'User-Agent: bot' 'https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O' works, but I get redirected to https://www.michaelkors.fr/.
And so on and so forth...
So the conclusion is that they block some clients dynamically and redirect others, based on IP address, User-Agent, and whatever other information they can get from the client.
With hQuery::fromUrl($url, $headers, $req_body, $options) you can try to mimic a browser by passing browser-like request headers.
I recommend fetching the HTML from the server with a separate tool (there are a lot of options out there) and then passing it to hQuery::fromHTML().
This library focuses primarily on PARSING HTML, not on fetching, but fetching is there for convenience.
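As a rough sketch of the "fetch separately, then parse" approach, assuming PHP's cURL extension is installed: browser_headers() and fetch_html() below are hypothetical helper names, not part of hQuery, and the header values are only an example. Nothing here guarantees the site will accept the request from a given IP; it just makes the request look more like a browser and adds a timeout so it fails instead of hanging.

```php
<?php
// Hypothetical helper (not part of hQuery): build browser-like request headers.
function browser_headers(string $userAgent): array
{
    return [
        'User-Agent: ' . $userAgent,
        'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
    ];
}

// Hypothetical helper (not part of hQuery): fetch a page with PHP's cURL
// extension. Returns the body and the final URL after redirects, so a
// redirect to another regional site is visible to the caller.
function fetch_html(string $url, array $headers, int $timeout = 15): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 5,        // seconds to wait for the connection
        CURLOPT_TIMEOUT        => $timeout, // overall limit, so a blocked request cannot hang forever
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_ENCODING       => '',       // accept any encoding cURL supports
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }

    $status   = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    if ($status >= 400) {
        throw new RuntimeException("Server answered with HTTP $status for $finalUrl");
    }

    return ['html' => $body, 'url' => $finalUrl];
}

// Usage (assumes hQuery is installed and autoloaded, e.g. via Composer):
//   $page = fetch_html($url, browser_headers('Mozilla/5.0 (NT; Windows)'));
//   $doc  = hQuery::fromHTML($page['html']);
```

Comparing $page['url'] with the requested URL shows whether the server redirected the request, for example to a regional domain.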
When I use the package on localhost, it works fine,
but when I use it on my VPS, it doesn't work for some websites.
The website that I'm trying to reach is:
https://www.michaelkors.com/jet-set-travel-grommeted-saffiano-leather-tote/_/R-US_30F7GTVT6O
Any idea why that happens?