Google sometimes serves a legacy page meant for bots and/or old browsers - this breaks the search controller and results in a TimeoutException. #11
Comments
I figured I should attach my chrome options.
chrome_options = ChromeOptions()
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--ignore-ssl-errors")
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument(f"--user-agent={user_agent_str}")
chrome_options.add_argument("--user-data-dir=/tmp/chrome")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-application-cache")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--enable-javascript")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
#if headless
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1920,1080")
Update on this: as far as I'm aware, the legacy page does not serve ads any more. This means we need to figure out what triggers it, which I found out is some obscured fingerprinting code. Unless we can bypass it, mass deployment is likely toast after a while on instances dedicated to it. I'm waiting for a friend to spin up some VMs for me on her home lab so I can test this using non-Oracle IP ranges.
It is related to the selected user agent string. I updated it to take the user agent only from a predefined list.
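For anyone reading along, the idea of drawing the user agent only from a fixed list can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the names PREDEFINED_USER_AGENTS and pick_user_agent are hypothetical, and the strings in the list are example desktop user agents rather than the ones the project ships.

```python
import random

# Hypothetical list for illustration; the project's real list may differ.
PREDEFINED_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def pick_user_agent(candidates=None):
    """Return one user agent string chosen at random from a fixed list."""
    if candidates is None:
        candidates = PREDEFINED_USER_AGENTS
    if not candidates:
        raise ValueError("the user agent list must not be empty")
    return random.choice(candidates)


# The chosen string would then be passed to Chrome the same way as in the
# options posted earlier in this thread:
user_agent_str = pick_user_agent()
print(f"--user-agent={user_agent_str}")
```

Restricting the choice to a known list means an arbitrary or outdated user agent string never reaches Google, which appears to be what was triggering the legacy page here.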
Got it. Pulled the latest commit, everything seems to work now. Thanks a lot!
In certain circumstances, Google serves an older search page without the appbar element. I have no clue what triggers it, but I've managed to get it to appear consistently on Oracle Cloud Infrastructure IPs, even with desktop user agents.
The issue with this page is that, even though ads might still appear and may be treated as legitimate (this page is served to older browsers like IE11), utils.py is instructed to look for an element that is not present. This results in a TimeoutException, even if the page is fully loaded and ads are present.
Attached is a stack trace that I spawned manually after the timeout error, and what the legacy search page looks like.
The two pages are certainly structured differently, with different class names, but I don't doubt that it should be trivial to figure out where the ads are on the legacy page, if Google does decide to serve me some, since my adblocker is off.
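As a rough sketch of how the legacy page could be told apart before waiting on a missing element: the snippet below checks the page source for the appbar element using only the standard library. It assumes appbar is an id attribute, which I have not confirmed against utils.py, and the names _IdFinder and is_legacy_page are made up for this example.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the target id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def is_legacy_page(page_source, marker_id="appbar"):
    """Return True when the page has no element with the marker id.

    Assumption: the modern results page contains id="appbar" and the
    legacy page does not, as described in this issue.
    """
    finder = _IdFinder(marker_id)
    finder.feed(page_source)
    return not finder.found


modern = '<html><body><div id="appbar"></div></body></html>'
legacy = '<html><body><div id="main"></div></body></html>'
print(is_legacy_page(modern), is_legacy_page(legacy))
```

With Selenium, the string passed in would be driver.page_source once the page has loaded, so the search controller could branch to legacy handling instead of running into the TimeoutException.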