Content-Type application/xhtml+xml not filtered #66

Open
WRFan opened this issue Dec 1, 2019 · 2 comments
WRFan commented Dec 1, 2019

If you remove the User-Agent from the request headers, or use a User-Agent that Google doesn't recognize, Google sends a stripped-down mobile page with Content-Type application/xhtml+xml, which Proxydomo fails to filter. If I enable Web Filter Debug in the Log window and load the page, Proxydomo just displays binary (gzipped?) output. Could you please look into this issue?

https://github.com/amate/Proxydomo/issues/new

Request:

Request sent to website
GET /search?hl=en&nfpr=1&prmd=u&q=a HTTP/1.1
Accept: text/html, application/xhtml+xml, image/jxr, */*
Accept-Language: en-GB,en-US,en,de-DE,ru-RU
Accept-Encoding: gzip, deflate
User-Agent: AdsBot-Google
Host: www.google.de
DNT: 1
Connection: Keep-Alive

Response:

Response sent to browser
HTTP/1.1 200 OK
Content-Type: application/xhtml+xml; charset=ISO-8859-1
Date: Sat, 30 Nov 2019 23:41:58 GMT
Content-Encoding: gzip
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
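
For reference, here is a minimal Python sketch of the handling these response headers would seem to call for: treat application/xhtml+xml as filterable markup, undo the gzip Content-Encoding, and decode with the declared charset. This is not Proxydomo's actual code; `FILTERABLE_TYPES`, `media_type`, `charset_of`, and `decode_body` are hypothetical names, and the body is assumed to have been de-chunked already.

```python
import gzip

# Assumption: the media types a filtering proxy would treat as markup.
FILTERABLE_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, or a default."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"')
    return default


def decode_body(headers: dict, body: bytes):
    """Return the decoded page text if it is filterable, otherwise None.

    `body` is assumed to be already reassembled from its chunks.
    """
    content_type = headers.get("Content-Type", "")
    if media_type(content_type) not in FILTERABLE_TYPES:
        return None
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode(charset_of(content_type), errors="replace")
```

With the headers from the response above, `decode_body` would hand back readable markup instead of the compressed bytes that show up in the debug log.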


nhantrn commented Dec 1, 2019

What are you trying to do after removing the UA? Leaving the UA blank probably tripped some Google rule, and they switched to their minimal mobile page.

It still got filtered fine on my end with this:

[Patterns]
Name = "google test"
Version = ""
Author = ""
Comment = ""
Active = TRUE
Multi = FALSE
URL = "www.google.com/search*"
Bounds = ""
Limit = 2048
Match = "<header*</header>"
Replace = "<h1>TEST</h1>"
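
As a rough illustration of what this test filter does, here is a Python regex analog. It is only a sketch under assumptions: that `*` takes the shortest run of characters that lets the rest of the pattern match, that matching is case-insensitive, and that Limit caps the total length of a match. `PATTERN` and `apply_filter` are hypothetical names, not part of Proxydomo.

```python
import re

LIMIT = 2048  # mirrors "Limit = 2048" in the filter
OPEN, CLOSE = "<header", "</header>"

# Assumption: the limit covers the whole match, so the wildcard part may
# span at most LIMIT minus the lengths of the two literal parts.
_middle = LIMIT - len(OPEN) - len(CLOSE)

PATTERN = re.compile(
    re.escape(OPEN) + r".{0,%d}?" % _middle + re.escape(CLOSE),
    re.IGNORECASE | re.DOTALL,
)


def apply_filter(page: str) -> str:
    """Replace each <header ... </header> span with the test heading."""
    return PATTERN.sub("<h1>TEST</h1>", page)
```

A header block longer than the limit is left untouched in this sketch, which is how I would expect an over-long match to behave.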


WRFan commented Dec 7, 2019

The filter you tested with is a web page filter. I'm talking about the User-Agent request header, which is an "outgoing header" filter in Proxydomo:

[HTTP headers]
Key = "User-Agent: User-Agent Debug (Out)"
In = FALSE
Out = TRUE
Version = ""
Author = ""
Comment = ""
Active = TRUE
Multi = FALSE
URL = ""
Bounds = ""
Limit = 256
Match = "$URL(http(s|)://(.|)google./search?)"
Replace = "\0"

It's not about Google; the question is whether it's the only site on the internet that causes this problem. If it is, I can live with it, but there may be more pages like this.

Btw, it's interesting which User-Agent strings Google requires before it sends the standard non-mobile page. I tested it a little and found that the Google servers expect one of the following User-Agents:

(MSIE 6; trident/6)

Trident/7
"7" matters

Firefox/7
"gecko/" before the string is ok

(windows) applewebkit/ Edge
(windows) applewebkit/ Chrome/5 Safari

Applewebkit/537 Version/09 Safari/
Applewebkit/600

Google Images:
(MSIE 1
Applewebkit/1

With anything else, or if the User-Agent is missing entirely (as with the filter above), Google sends the mobile page, which Proxydomo ignores.
