Add http: to the generic regex? #31
Comments
Good thing! Some examples I found:
I have a few like this in my logs
Should Also, what about I also have a lot like this in my logs
Technically, AppEngine-Google is a hosting service. But that means it is a programmatic connection, hence a crawler. I don't know about Bsalsa. The webpage is in Chinese, so maybe it's one of those lesser-known proprietary web browsers. I get hits from time to time from some Wiko web browser.
Agreed, I think I will add Also, if you don't mind me asking @romaricdrigon, how much data do you gather regarding user-agents on your project?
I had a few thousand log entries to dig into, which is not that much, but with a very high ratio of crawlers (~50-70%). We logged UAs for some time because we saw huge differences between bit.ly or goo.gl hit counts and ours. Because we post content to social networks, any link posted on a public account is visited within seconds by at least a dozen crawlers, including Facebook checking the OG tags, some social search tools...
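To illustrate the idea discussed in this thread, here is a minimal Python sketch of a generic pattern that also matches `http:`, plus a helper that estimates the crawler share of a list of logged user agents. The pattern, the function names (`is_crawler`, `crawler_ratio`) and the sample strings are assumptions made for illustration; they are not the project's actual regex or API.

```python
import re

# Hypothetical generic pattern, NOT the project's real regex.
# "http:" catches agents that advertise an info URL, e.g. "(+http://...)".
GENERIC_CRAWLER_RE = re.compile(r"bot|crawl|spider|http:", re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if the user agent matches the generic pattern."""
    return GENERIC_CRAWLER_RE.search(user_agent) is not None


def crawler_ratio(user_agents) -> float:
    """Return the fraction of user agents flagged as crawlers."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_crawler(ua) for ua in agents) / len(agents)


# Illustrative sample, not taken from the logs mentioned in the thread.
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "AppEngine-Google; (+http://code.google.com/appengine; appid: s~example)",
]

print(crawler_ratio(sample))
```

In this sample the AppEngine-Google string contains none of `bot`, `crawl` or `spider`, so it is only flagged because of the `http:` alternative.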
See #30