Limit ability to create spam in general #3301
Comments
FYI, the spam is currently going through DMails, not the forum. The spammer created literally hundreds of accounts in one day, all named with the same pattern: one female name followed by a string of digits, possibly with one or two letters in the middle. They start at http://danbooru.donmai.us/users?page=2 with "heatherq8611" and run all the way to http://danbooru.donmai.us/users?page=10. The feedback page http://danbooru.donmai.us/user_feedbacks shows that some (a tiny fraction of them) have already been banned. |
Some facts:
User.where("created_at > ? and name ~ '^[a-z]+[0-9]{3,}$'", 3.months.ago).count
=> 1384
SELECT network(set_masklen(last_ip_addr, 24)) as ip, count(*) FROM users WHERE (created_at > '2017-07-14 19:19:07.588835' and name ~ '^[a-z]+[0-9]{3,}$') group by ip order by count(*) desc;
ip | count
------------------+-------
172.68.11.0/24 | 121
172.68.245.0/24 | 118
172.68.246.0/24 | 104
172.68.244.0/24 | 87
172.68.10.0/24 | 84
172.68.46.0/24 | 22
162.158.59.0/24 | 22
162.158.58.0/24 | 21
108.162.215.0/24 | 21
172.68.47.0/24 | 18
select count(*) from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$';
count
-------
523
select count(*) from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$');
count
-------
51608
select body from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$') order by random() limit 10;
body
------------------------------------------------------------------------------------------------------------
hey Veoh89. My private webcam see here http://bit.ly/2vTv9Ki
hey BestTeitoku. My new hot video here http://bit.ly/2vTv9Ki
hi Tunec. My hot videos look here http://bit.ly/2vTv9Ki
hey Korom1004. New sex site! my profile here http://bit.ly/2vTv9Ki
hi ShinyaKiritou. I'm bored! want sex. All my hot photos and videos look my profile http://bit.ly/2vTv9Ki
hey DaaNIK. My private photos look here http://v.ht/hboxxxx
hey darkrai389. My hot webcam see here http://bit.ly/2vTv9Ki
hi Feli223. My new private video here http://v.ht/hboxxxx
hi overdraiv. I'm rarely here, my complete profile here http://v.ht/hboxxxx
hi Sadloserkunt. My collection of hot photos here http://bit.ly/2vTv9Ki
|
I would argue a link outside the site is unusual and definitely a signal that a message is spam. But these messages look pretty predictable. Maybe just using a standard spam filter would do a lot to eliminate these. Akismet? |
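A rough sketch of the "external link as a spam signal" idea, in plain Ruby. Everything here is hypothetical and not taken from Danbooru's codebase: `SITE_HOSTS`, `external_hosts` and `suspicious_dmail?` are illustrative names, and the rule (flag only senders with no prior activity) is an assumption meant to spare established users who legitimately share links.

```ruby
# Hypothetical helper names; none of this is Danbooru's actual code.
SITE_HOSTS = ["danbooru.donmai.us"].freeze # assumed list of hosts treated as internal
URL_PATTERN = %r{https?://([^\s/]+)}i      # captures the host part of each link

# Returns the hosts of all links in the body that point away from the site.
def external_hosts(body)
  body.scan(URL_PATTERN).flatten.map(&:downcase).reject { |host| SITE_HOSTS.include?(host) }
end

# Treats a message as suspicious only when it has an external link AND the
# sender has no track record (assumption: the caller supplies that flag).
def suspicious_dmail?(body, sender_has_history:)
  !sender_has_history && external_hosts(body).any?
end
```

A check like this would only be one signal among several; on its own it would also catch new users who send a harmless screenshot link.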
Another idea besides those I mentioned above would be to limit the creation of one account per IP address per day. It wouldn't make things impossible for spammers, but it would make it harder, enough to perhaps discourage them, while at the same time not affecting normal users. |
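For illustration, the one-account-per-IP-per-day rule could look roughly like the sketch below. It is an in-memory toy, not the site's implementation: `SignupLimiter` and `allow?` are made-up names, a real version would presumably query the users table, and it assumes `ip` is the real client address rather than a proxy's.

```ruby
# In-memory sketch of a per-IP signup limit (hypothetical class, not Danbooru code).
class SignupLimiter
  WINDOW = 24 * 60 * 60 # length of the window in seconds (one day)

  def initialize(max_per_window: 1)
    @max = max_per_window
    @signups = Hash.new { |hash, ip| hash[ip] = [] } # ip => signup times
  end

  # Returns true and records the signup if the IP is under its limit,
  # otherwise returns false and records nothing.
  def allow?(ip, now: Time.now)
    recent = @signups[ip].select { |time| now - time < WINDOW }
    @signups[ip] = recent
    return false if recent.size >= @max

    recent << now
    true
  end
end
```

With the default of one signup per window, a second attempt from the same address within 24 hours is refused, while other addresses are unaffected.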
Honey potting is another idea, maybe not just dmail creation but account creation as well. |
Some ideas after a brief discord discussion:
I'm against a captcha (in my experience, they don't really help while greatly annoying users), but the rest might be a good idea. Also, 51608 dmails? Holy crap. Can you select message count grouping by user, where username matches the pattern, having |
select from_id, count(*) from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$') and body like '%http%' group by from_id order by count(*) desc limit 10;
from_id | count
---------+-------
529839 | 101
529247 | 100
529539 | 100
529358 | 100
529723 | 100
529726 | 100
529758 | 100
529550 | 100
529390 | 100
529790 | 100
This isn't complete. It's probably easier just to do a firewall ban. |
Are there metrics for external links...? I know that I occasionally send DMails with external links when I'm trying to help a user out, such as sending screenshots or directing them to other helpful sites. |
Banning the entire subnets is pretty harsh, it'll net us a lot of false positives. If you give me a full list of IDs I can just run them through a script and ban everyone. (Of course you can do the same in 3 lines in rails console, but still :3 I can get to it tomorrow, if need be.)
Yeah, it's important that the algorithm should only limit and monitor users who have had little other activity before. If user has posted in the forum, or commented, or successfully uploaded something, the limits should be lifted. |
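One way to express "lift the limits once a user has any real activity" is sketched below. The struct, the method names and the cap of 5 are all assumptions made for the example; the thread does not specify actual numbers or field names.

```ruby
# Hypothetical container for a user's activity counts.
ActivityCounts = Struct.new(:forum_posts, :comments, :uploads, keyword_init: true)

NEW_USER_DMAIL_LIMIT = 5 # assumed hourly cap for users with no activity

# A user is "unproven" until they have posted, commented, or uploaded something.
def unproven?(counts)
  counts.forum_posts + counts.comments + counts.uploads == 0
end

# Returns the hourly dmail cap, or nil when no cap applies.
def dmail_limit_for(counts)
  unproven?(counts) ? NEW_USER_DMAIL_LIMIT : nil
end
```

For example, `dmail_limit_for(ActivityCounts.new(forum_posts: 0, comments: 1, uploads: 0))` returns `nil`, so a single comment is enough to lift the cap in this sketch.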
Those are all Cloudflare IP ranges. Nginx needs to be configured to respect the https://www.cloudflare.com/ips/ |
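To make the point concrete: when a request arrives from a Cloudflare address, the visitor's real IP is carried in the `CF-Connecting-IP` header, and it should only be trusted when the connecting peer really is Cloudflare. On nginx this is normally handled by the realip module; the Ruby sketch below only illustrates the logic. `client_ip` is a hypothetical helper, and the range list is a deliberately partial sample covering the subnets seen in this thread (the authoritative list is at https://www.cloudflare.com/ips/).

```ruby
require "ipaddr"

# Partial, illustrative list of Cloudflare ranges; see https://www.cloudflare.com/ips/
# for the full, current list.
CLOUDFLARE_RANGES = %w[172.64.0.0/13 162.158.0.0/15 108.162.192.0/18]
                    .map { |cidr| IPAddr.new(cidr) }
                    .freeze

# Returns the forwarded client address only if the connecting peer is inside
# a trusted Cloudflare range; otherwise falls back to the peer address.
def client_ip(remote_addr, headers)
  peer = IPAddr.new(remote_addr)
  forwarded = headers["CF-Connecting-IP"]
  if forwarded && CLOUDFLARE_RANGES.any? { |range| range.include?(peer) }
    forwarded
  else
    remote_addr
  end
end
```

Without a step like this, every user behind Cloudflare appears to come from the same handful of subnets, which is why a subnet ban or a per-IP limit on those addresses would hit unrelated users.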
With the Akismet integration I'll propose the following changes:
|
I think doing the one account per IP per day might also be a good measure to put in place. The spammer is still creating accounts, and has created hundreds over just the last hour. http://danbooru.donmai.us/users?limit=1000&page=b530087 It would do no harm to regular users, but might be enough of a hindrance to push malicious users elsewhere. |
Some ideas on running a script to ban suspected spam accounts:
I can generate a list of users. Ideas on what analytics to run would be helpful. |
Legitimate bot users will have to be whitelisted. |
MIN_USER_ID = 528958
MIN_DATE = "2017-09-01"
NAME_REGEXP = /^[a-z0-9]+\d{3,}$/
BAD_TITLES = ["My collection", "hi", "My private videos", "My video", "hey", "My webcam", "My dirty fantasies", "My new video", "My hot photos", "My hot webcam", "All your desires", "My hot videos", "my profile", "record from my webcam", "my hot webcam"]
spammers = Set.new(Dmail.where("dmails.from_id >= ? and dmails.created_at >= ? and is_spam = ?", MIN_USER_ID, MIN_DATE, true).joins("join users on users.id = dmails.from_id").where("users.name ~ '^[a-z0-9]+[0-9]{3,}$'").pluck("users.id").map(&:to_i).uniq)
spammers.size
=> 1159

# Senders that match the name pattern but whose dmails were not flagged as spam.
new_spammers = Set.new
User.without_timeout do
  Dmail.where("created_at >= ? and is_spam = ?", MIN_DATE, false).find_each do |dmail|
    from_name = dmail.from_name
    if dmail.from_id >= MIN_USER_ID && from_name =~ NAME_REGEXP
      # dmail.update_column(:is_spam, true)
      # dmail.spam!
      if !spammers.include?(dmail.from_id)
        new_spammers.add(dmail.from_id)
      end
    end
  end
end
new_spammers.size
=> 757

# Remaining senders caught only by their dmail titles.
new_new_spammers = Set.new(Dmail.where("created_at >= ? and from_id >= ? and title in (?) and from_id not in (?)", MIN_DATE, MIN_USER_ID, BAD_TITLES, (spammers + new_spammers).to_a).pluck(:from_id))
new_new_spammers.size
=> 6

combined_spammers = spammers + new_spammers + new_new_spammers
# Print any suspected account that has non-dmail activity (possible false positive).
User.without_timeout do
  combined_spammers.each do |uid|
    user = User.find(uid)
    tag_change_count = PostArchive.where(updater_id: uid).count
    vote_count = PostVote.where(user_id: uid).count
    comment_count = Comment.where(creator_id: uid).count
    dmail_count = Dmail.where(from_id: uid).count
    if tag_change_count + vote_count + comment_count > 0
      puts "#{user.name},#{uid},#{tag_change_count},#{vote_count},#{comment_count},#{dmail_count}"
    end
  end
end

Unsurprisingly nothing matches. Will probably just run this soon:

combined_spammers.each do |uid|
  unless Ban.where(user_id: uid).exists?
    Ban.create(duration: 10000, reason: "Spam (automated ref f6147ace)", user_id: uid)
    puts "banned #{uid}"
    sleep 1
  end
end |
With the Akismet integration, will all dmails be subjected to it or only dmails sent by regular members and below? I’m not too keen on sharing my dmails with some third party and potentially turning them into false positives, especially as I regularly share links to external sites with some users via dmail. As BrokenEagle said in the OP, Gold+ users are unlikely to actually send spam. My guess would be that most dmails are sent by Gold+ users, especially builders, so this could be a cost issue too, as Automattic charges per Akismet API call. |
There's some conversation going on in topic #14440 about the spam bot. The Admins say that each account has a unique IP address which I am unable to verify for myself. They and other users are also calling for a Captcha, although some users have argued that they're not that effective, and I'll admit that my knowledge is relatively limited on the topic. |
Out of a sample of 60 of the bot accounts, 14 used repeated IPs; the others did not share an IP with any other account. So only roughly a quarter of the accounts made by the ongoing spam bot currently share an IP with another account. |
It's worth pointing out that targeted attacks against the software would render this moot. That said, honeypot users that are automatically injected so they always appear somewhere near the top of user lists could still help: if a direct message is sent to one, some, or all of them, you'd know it was spam. If this is a targeted attack against danbooru itself, though, it'll be a game of cat and mouse until one side can't play anymore or gets bored and leaves. |
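A minimal sketch of the honeypot check, assuming dmails are available as simple records with a sender and a recipient id. The ids in `HONEYPOT_IDS` and the method name `honeypot_senders` are invented for the example.

```ruby
require "set"

# Hypothetical ids of decoy accounts that no real user has a reason to message.
HONEYPOT_IDS = Set[900_001, 900_002].freeze

# Given dmails as hashes with :from_id and :to_id, returns the unique senders
# that messaged at least one honeypot account.
def honeypot_senders(dmails)
  dmails.select { |dmail| HONEYPOT_IDS.include?(dmail[:to_id]) }
        .map { |dmail| dmail[:from_id] }
        .uniq
end
```

Any sender returned here could then be reviewed or rate limited; whether to ban automatically on a single hit is a separate policy question.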
Just a note, this regex does not include accounts such as As I said in my previous post, the spammer sometimes adds a letter in the middle of the numbers. |
It does, all those usernames have 3 digits at the end. From what I observed so far, the common pattern is female name in lowercase + Also I hate to bring the bad news, but IP filter doesn't really help - https://danbooru.donmai.us/users I'm afraid we have to add the signup captcha after all, though I honestly don't know how effective it would be, given that solving it costs only a few cents on certain web platforms. The worst thing is we can't rely on name pattern to be the same in the future. It'll help us track down already-present bots, but I could probably write a generator with much less suspicious pattern in 10 minutes if I actually analyzed the target site, so... |
Ah, seems like you're right; I was testing with an incorrectly compiled regex. |
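The two patterns used earlier in the thread can be compared directly in Ruby. In the sketch below, "heatherq8611" comes from the thread, while "anna12b345" is a made-up name with a letter in the middle of the digits; `STRICT`, `LOOSE` and `pattern_matches` are illustrative names only.

```ruby
STRICT = /^[a-z]+[0-9]{3,}$/  # pattern from the SQL queries: letters, then 3+ digits
LOOSE  = /^[a-z0-9]+\d{3,}$/  # NAME_REGEXP from the ban script: letters/digits, then 3+ digits

# Reports which of the two patterns match a given name.
def pattern_matches(name)
  { strict: STRICT.match?(name), loose: LOOSE.match?(name) }
end

pattern_matches("heatherq8611") # => {strict: true, loose: true}
pattern_matches("anna12b345")   # => {strict: false, loose: true}
pattern_matches("Veoh89")       # => {strict: false, loose: false}
```

The looser pattern accepts a letter between digit groups as long as the name still ends in at least three digits. Note that in Ruby `^` and `$` anchor at line boundaries, so `\A` and `\z` would be the stricter choice if names could ever contain newlines.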
@r888888888 is there any pattern in spam message recipients? Also, here's what I propose.
This should both stop the spam influx and account for false positives. Also, for automatic permabans, I would really rather opt for silent hard-removal of such users, including connected dmails and records. They'll be cluttering mod logs and feedbacks otherwise. |
Ok, here's two lists. First, the users who match the pattern I described precisely, with girl name + appendix in the |
I've added an invisible recaptcha to the signup process. I'm in process of banning the accounts now. Assuming there are no false positives they can be deleted in bulk later. |
Some users in topic #14440 have been complaining that they are still receiving DMail notices when they get a spam email. I know that yesterday when I received a "spam email" from DanbooruBot, I did receive a delivery notification. Should users still receive delivery notifications then...? On one hand, it leads to users being annoyed when they think they're getting a DMail only to find out it's spam. On the other hand, if it's a legitimate DMail that was accidentally marked as spam, they may never discover it. One idea may be to change the DMail notification if it's only spam messages that are unread. Another idea would be to nix the delivery notification, and just let the hasmail counter next to My Account be the only notification a user will receive. |
I've banned the rest of new users who matched Captcha seems to be working for now, which is good. Note that it's now impossible to register using an old browser. |
https://danbooru.donmai.us/forum_topics/14440?page=3#forum_post_139904 Reopening because there was another small batch of spam. The spam filter did its job in marking these messages as spam, but the user was still able to keep spamming until they were banned. I propose that if a user sends a lot of spam within a short timeframe they are automatically banned. Say, more than 20 spam dmails within 24 hours should trigger a 3 day ban. EDIT: closing, was raised in its own issue. |
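The proposed rule (more than 20 spam dmails within 24 hours triggers a 3 day ban) could be sketched as follows. The constants mirror the numbers suggested above; `ban_duration_days` is a hypothetical helper that takes the timestamps of a user's spam-flagged dmails, and how those timestamps are fetched is left out.

```ruby
SPAM_THRESHOLD = 20            # more than this many spam dmails triggers a ban
SPAM_WINDOW    = 24 * 60 * 60  # look-back window in seconds
BAN_DAYS       = 3             # length of the automatic ban

# Returns the ban length in days if the user crossed the threshold inside the
# window, or nil if no ban is warranted.
def ban_duration_days(spam_timestamps, now: Time.now)
  recent = spam_timestamps.count { |time| now - time <= SPAM_WINDOW }
  recent > SPAM_THRESHOLD ? BAN_DAYS : nil
end
```

Because the check depends on the spam flag, it is only as accurate as the filter that sets it, so a short ban length leaves room to recover from false positives.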
Brought up on the Danbooru forum. Basically, there has been an uptick in spambot activity lately. It's given the moderators a lot of additional work, while it takes relatively little effort for a user to sign up and create multiple new accounts.
Therefore, it might be beneficial to add some additional restrictions for Member-level users for those particular functions. Gold+ users are less of a concern, since they have something invested in the site.
Some ideas include...