[ ]+bot
^2ip\.ru
^[a-z.0-9/ \-_]*bot
^ad muncher
^avsdevicesdk/
^bidtellect/
^blackboard
^blogtrottr
^boardreader
^castro
^collectd
^comodo
^cortex
^curl
^ddg_android
^duckduckgo
^email
^expanse
^ez publish
^fdm[\s/]\d
^feedbin
^fever
^holmes
^java/
^javascript
^lcc
^lua-resty-http
^navermailapp
^netlyzer fastprobe
^netsurf
^newsgator
^ning/
^octopus
^pagepeeker
^pagething
^php
^php-curl-class
^postmanruntime
^prittorrent
^rainmeter
^ramblermail
^sentry/
^server density
^sitesucker
^snapchat
^spotify/
^sprinklr
^the knowledge ai
^unityplayer
^viber$
^websitepulse
^whatsapp\+?/[0-9\.]+ [a-z]$
^windows-rss
^wsr-agent
^yahoo:linkexpander
^yahoocachesystem
^zooshot
a6-indexer
aboundex
adbeat
addthis
admantx
adscanner
ahc/
aiohttp
amazon cloudfront
analyzer
anyevent
apachebench/
apercite
apis-google
appengine-google
appinsights
arabot
arachni
archiver
axios
baidu-yunguance
banca caboto
barkrowler
bazqux
biglotron
bingpreview/
binlar
bit\.ly/
bot($|[/\);-]+)
brandverity
browsershots
btwebclient
bubing
buck/
catchpoint
cc metadata scaper
centuryb
changedetection
check_http
checker
checkmarknetwork/
chrome-lighthouse
cincraw
clickagy
cloudflare
coccoc
collection@infegy\.com
contextad bot
convera
crawler
curious george
cyberpatrol
dareboost
datadog agent
datafeedwatch
datanyze
dataprovider\.com
daum(oa)?[ /][0-9]
daum/
dcrawl
deusu/
digg deeper
disqus
dmbrowser
domainreanimator
domains project/
drupact
duplexweb-google
ec2linkfinder
electricmonk
embedly
eright
europarchive\.org
evc-batch
extractor
ezid
ezooms
facebookexternalhit
fedoraplanet
feedly
feedspot
feedvalidator
fetcher
findlink
findthatfile
flamingo_searchengine
flipboardproxy
fluffy
freshrss
friendica
g00g1e\.net
g2 web services
genieo
gigablast
go-http-client
gobuster
gomezagent
google favicon
google page speed insight
google search
google web preview
google-
googleimageproxy
goose/
grouphigh/
grub\.org
guzzlehttp
gwene
hatena
headlesschrome
help@dataminr\.com
heritrix
http[s]?://
http_get
httpclient
httpunit
httpurlconnection
httpx
httrack
hubspot
ichiro
indeedbot
inoreader\.com
integromedb
internetarchive
ips-agent
iskanie
jetslide
jetty
kaspersky
kouio\.com
larbin
libwww-perl
linkdex
lipperhey
livelapbot
ltx71
m_bot_tab
mappydata
mastodon
mediapartners-google
megaindex
meltwaternews
metauri
miniflux/
mixnodecache/
mnogosearch
moatbot
monitoring
moreover
muckrack
netcraft
netresearchserver
netsystemsresearch
netvibes
newsblur
newsharecounts
newspaper/
nextcloud
nmap scripting engine
node-fetch/
nutch
nuzzel
okhttp
omgili
optimizer
outbrain
page2rss
pagepeeker/
pandalytics
panscient
pcore-http
phantomjs
phpcrawl
pingdom
pocketparser
postrank
pr-cy\.ru
proximic
prtg network monitor
ptst[\s/]
pulsepoint
pycurl
python-requests
python-urllib
qihoobot
qqdownload
qwantify
rivva
robot
robozilla
scoutjet
scraper
scrapy
searchatlas
seewithkids
seobility
seokicks
seolizer
seoscanners
seznam
simplepie
site24x7
siteexplorer\.info
siteimprove\.com
sixy\.ch
skypeuripreview
slack-imgproxy
slurp
snacktory
sogou
sparkler/
spider
sputnik
statically-
staticlogin:productcbox
statuscake
summify
superfeedr
supybot
swimgbot
sysomos
teoma
theoldreader\.com
thinkchaos
thinklab
tineye
tiny tiny rss
traackr\.com
tracemyfile
transcoder
trendsmapresolver
trove
turbotabbee
tweetedtimes
twingly
twurly
um-ln
unshortenit
upflow
uptime
validator\.nu
vigil/
virustotal
vkshare
voilabot
w3c-checklink
w3c-mobileok
w3c_css_validator
w3c_unicorn
w3c_validator
webdatastats
webmon
webreaper
webthumbnail
wesee:search
wget
whatcms/
wordupinfosearch
wotbox
xenu link sleuth
y!j
yahoo link preview
yak/
yandexadnet
yandexblogs
yandexcalendar
yandexdirect
yandexfavicons
yandexfordomain
yandeximageresizer
yandeximages
yandexmarket
yandexmedia
yandexmetrika
yandexnews
yandexontodb
yandexpartner
yandexrca
yandexsearchshop
yandexsitelinks
yandextracker
yandexturbo
yandexverticals
yandexvertis
yandexvideo
yandexwebmaster
yanga
yeti
zabbix
zgrab