This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

[fix] Force Google old UI #1597

Merged
merged 2 commits into from May 29, 2019

Conversation

unixfox
Member

@unixfox unixfox commented May 22, 2019

More details about this PR here: #1596.

In summary: Google sometimes serves its new UI, which Searx can't parse. By sending the user agent of Internet Explorer 12, we make Google respond with the old UI by default, because it knows that IE doesn't support the new one.

@immanuelfodor immanuelfodor left a comment

I tested your PR with the commands below, and it did not solve the original problem. I'm still getting the message "Sorry! we didn't find any results. Please use another query or search in more categories." :S

git clone https://github.com/asciimoo/searx.git ~/docker/searx

# enable checkout to "hidden" PR branch
cat ~/docker/searx/.git/config
grep "origin/pr" ~/docker/searx/.git/config
sed -i -E 's#(asciimoo\/searx\.git)#\1\n        fetch = +refs/pull/*/head:refs/remotes/origin/pr/*#' ~/docker/searx/.git/config
cat ~/docker/searx/.git/config

# checkout to https://github.com/asciimoo/searx/pull/1597
cd ~/docker/searx
git fetch origin
git checkout pr/1597
git config user.email randomuser@gmail.com
git config user.name randomuser
git merge master
git status

# start testing
docker build -t searx -f ~/docker/searx/Dockerfile ~/docker/searx/
docker run -d --name searx -p 8888:8888 -e IMAGE_PROXY=False -e BASE_URL=https://domain.tld -e TINI_SUBREAPER=True searx

The docker logs contain only messages from werkzeug. If you have any idea how to get more verbose debugging output, I can try it.

@immanuelfodor

I tried to debug with these steps:

docker exec -it searx sh
~ $ vi searx/engines/google.py

Then add to line ~213:

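    # Dump the raw Google response for inspection (binary mode works on Python 2 and 3)
    with open('google.html', 'wb') as dump:
        dump.write(resp.text.encode('utf-8'))
    print(resp.text.encode('utf-8'))

Then restart the container and follow the logs:

docker restart searx
docker logs -f searx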

I ran a search, and the class names are still obfuscated:

<div class="ZINbbc xpd O9g5cc uUPGi"><div><div class="jfp3ef"><a href="/url?q=https://www.linkedin.com/company/asdasdasdasds&amp;sa=U&amp;ved=2ahUKEwjq0frGzLbiAhXHLlAKHSTGBBsQFjAEegQIBxAB&amp;usg=AOvVaw3q4RreQiKUrUXDYy9QFB4G"><div class="BNeawe vvjwJb AP7Wnd">asdasd | LinkedIn</div><div class="BNeawe UPmit AP7Wnd">https://www.linkedin.com › company › asdasdasdasds</div></a></div><div class="NJM3tb"></div><div class="jfp3ef"><div><div class="BNeawe s3v9rd AP7Wnd"><div><div><div class="BNeawe s3v9rd AP7Wnd">Learn about working at asdasd. Join LinkedIn today for free. See who you know at asdasd, leverage your professional network, and get hired.</div></div></div></div></div></div></div></div>
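One way to tell which UI a dumped response uses is to look at the class names on its result containers. The Python 3 sketch below uses only the standard library. It assumes the old UI wraps each result in `<div class="g">`, which is an assumption about what the Searx parser expects and is not confirmed in this thread; `ZINbbc` is taken from the dump above. The names `DivClassCollector` and `detect_ui` are made up for illustration.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collect the class attribute tokens of every <div> in a page."""

    def __init__(self):
        super().__init__()
        self.div_classes = []  # one list of class tokens per <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = dict(attrs).get("class") or ""
            self.div_classes.append(classes.split())


def detect_ui(html):
    """Return 'old', 'new', or 'unknown' for a Google result page dump."""
    collector = DivClassCollector()
    collector.feed(html)
    # Assumption: the old UI marks each result with exactly class="g".
    if any(tokens == ["g"] for tokens in collector.div_classes):
        return "old"
    # 'ZINbbc' is the obfuscated container class seen in the dump above.
    if any("ZINbbc" in tokens for tokens in collector.div_classes):
        return "new"
    return "unknown"
```

Passing the contents of the dumped `google.html` to `detect_ui` gives a quick check after each user agent change, without reading the markup by hand.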

@unixfox
Member Author

unixfox commented May 25, 2019

Can you check whether line 203 of your searx/engines/google.py is exactly the same as the line in my PR: https://github.com/asciimoo/searx/pull/1597/files#diff-ed0043204dab3ab8a98a1e916146b068R203?

@immanuelfodor

Yes, it's the same on the pr/1597 branch.

I also tried many other user agent strings there, starting from IE9, but the responses were always the same.

@unixfox
Member Author

unixfox commented May 25, 2019

That's strange... Do you get the error message "Sorry! we didn't find any results. Please use another query or search in more categories." on every search? Before my patch I sometimes got that error, but not on every search.
Where is your instance located (country)?

@immanuelfodor

Yes, on each and every search. I noticed this behavior this morning, which is why I came here right away. Previously, a few taps on the search button solved it, but as of today I can't get a response at all. The instance is in Budapest, Hungary.

@@ -199,6 +199,9 @@ def request(query, params):
params['headers']['Accept-Language'] = language + ',' + language + '-' + country
params['headers']['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'

# Force Internet Explorer 12 user agent to avoid loading the new UI that Searx can't parse
params['headers']['user-agent'] = "Mozilla / 5.0(MSIE 12.0; Trident / 7.0; rv: 11.0) like Gecko"

I think I've found the problem:

  • Your user agent string is malformed: it has extra and missing spaces.
  • The header name should be written as "User-Agent".
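A possible explanation for the header-name point (an assumption, not verified against the Searx source here): HTTP header names are case-insensitive on the wire, but if `params['headers']` is a plain dict that already holds a default `User-Agent` entry, assigning to `'user-agent'` adds a second key instead of replacing the first. Which value is actually sent then depends on the HTTP layer. The sketch below shows that behaviour along with a hypothetical helper, `set_header`, that replaces an existing entry regardless of case; the agent strings are placeholders.

```python
def set_header(headers, name, value):
    """Set a header, dropping any existing key that differs only by case."""
    for existing in list(headers):
        if existing.lower() == name.lower():
            del headers[existing]
    headers[name] = value


# Hypothetical default, standing in for whatever was set earlier.
headers = {"User-Agent": "default-agent/1.0"}

# A plain dict is case-sensitive, so this creates a second entry.
naive = dict(headers)
naive["user-agent"] = "example-agent/2.0"
print(sorted(naive))  # both spellings are present as separate keys

# The helper leaves exactly one entry.
set_header(headers, "User-Agent", "example-agent/2.0")
print(headers)
```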

I tried with this line and now it works! (IE9 on Win7: https://en.wikipedia.org/wiki/Internet_Explorer_9#User_agent_string)

params['headers']['User-Agent'] = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
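To see how the two strings differ, a rough format check can help. The regular expression below is only a sketch of the common `Mozilla/5.0 (...)` shape, not a full user agent grammar, and `looks_well_formed` is a hypothetical helper name.

```python
import re

# Rough shape: "Mozilla/<version> (<details>)" with optional trailing tokens.
UA_SHAPE = re.compile(r"^Mozilla/\d+\.\d+ \([^()]+\)( \S.*)?$")


def looks_well_formed(user_agent):
    """Return True if the string matches the rough shape above."""
    return UA_SHAPE.match(user_agent) is not None


original = "Mozilla / 5.0(MSIE 12.0; Trident / 7.0; rv: 11.0) like Gecko"
corrected = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"

print(looks_well_formed(original))   # False: spaces around "/" and none before "("
print(looks_well_formed(corrected))  # True
```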

Please update the PR and I'll mark it as done :)

Member Author

Oh! Nice find! I used a user agent string from a website in the first Google results, and it worked for me, which is why I didn't worry about the formatting.

@immanuelfodor immanuelfodor left a comment

Thanks for the edit. I approve the changes, although I don't have write access to the repo, so it just means it finally works for me :)

@kvch Would you be so kind as to have a look at this PR and maybe release a minor version? It fixes the broken Google search engine, which is a huge win.

Collaborator

@Pofilo Pofilo left a comment

We don't know how long it will work, but let's enjoy it while it lasts!
@kvch, I'll let you merge if you agree.

@dalf dalf merged commit cbd1ebd into searx:master May 29, 2019

4 participants