Limit ability to create spam in general #3301

BrokenEagle · 2017-09-14T18:51:21Z

Brought up on the Danbooru forum. Basically, there has been an uptick in spambot activity lately. It has given the moderators a lot of additional work, especially compared to how easily a user can sign up and create multiple new accounts.

Therefore, it might be beneficial to add some restrictions for Member-level users on those particular functions. There is less to fear from Gold+ users, since they have something invested in the site.

Some ideas include...

Forbid write API access to comments, forum_topics, and forum_posts
Add Recaptcha verification before allowing a user to post
Better verification in the signup process

nonamethanks · 2017-09-14T19:04:05Z

FYI, the spam is currently going through DMails, not the forum. And the guy created hundreds (literally) of accounts in one day to use, all named with the same pattern: one female name followed by a string of digits, possibly with one or two letters in the middle.

http://danbooru.donmai.us/users?page=2

They start here with "heatherq8611" all the way to http://danbooru.donmai.us/users?page=10
And then they resume here http://danbooru.donmai.us/users?page=16 with "sarahls350" to http://danbooru.donmai.us/users?page=48 ("kimp4979"). Note that the second batch is 600 accounts, created at about 7-8 per minute towards the end.
There are obviously innocent users in between, but the damage is already done: all the guy has to do is activate some of those accounts to spam everyone with DMails.

The feedback page http://danbooru.donmai.us/user_feedbacks already shows that some (a vanishingly small fraction of them) have already been banned.

r888888888 · 2017-09-14T19:29:07Z

Some facts:

User.where("created_at > ? and name ~ '^[a-z]+[0-9]{3,}$'", 3.months.ago).count
=> 1384

SELECT network(set_masklen(last_ip_addr, 24)) as ip, count(*) FROM users WHERE (created_at > '2017-07-14 19:19:07.588835' and name ~ '^[a-z]+[0-9]{3,}$') group by ip order by count(*) desc;

        ip        | count 
------------------+-------
 172.68.11.0/24   |   121
 172.68.245.0/24  |   118
 172.68.246.0/24  |   104
 172.68.244.0/24  |    87
 172.68.10.0/24   |    84
 172.68.46.0/24   |    22
 162.158.59.0/24  |    22
 162.158.58.0/24  |    21
 108.162.215.0/24 |    21
 172.68.47.0/24   |    18

select count(*) from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$';

 count 
-------
   523

select count(*) from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$');

 count 
-------
 51608

select body from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$') order by random() limit 10;

                                                    body                                                    
------------------------------------------------------------------------------------------------------------
 hey Veoh89.  My private webcam see here http://bit.ly/2vTv9Ki
 hey BestTeitoku.  My new hot video here http://bit.ly/2vTv9Ki
 hi Tunec.  My hot videos look here http://bit.ly/2vTv9Ki
 hey Korom1004.  New sex site! my profile here http://bit.ly/2vTv9Ki
 hi ShinyaKiritou.  I'm bored! want sex. All my hot photos and videos look my profile http://bit.ly/2vTv9Ki
 hey DaaNIK.  My private photos look here http://v.ht/hboxxxx
 hey darkrai389.  My hot webcam see here http://bit.ly/2vTv9Ki
 hi Feli223.  My new private video here http://v.ht/hboxxxx
 hi overdraiv.  I'm rarely here, my complete profile here http://v.ht/hboxxxx
 hi Sadloserkunt.  My collection of hot photos here http://bit.ly/2vTv9Ki

r888888888 · 2017-09-14T19:33:01Z

I would argue a link outside the site is unusual and definitely a signal that a message is spam. But these messages look pretty predictable. Maybe just using a standard spam filter would do a lot to eliminate these. Akismet?

BrokenEagle · 2017-09-14T19:34:00Z

Another idea besides those I mentioned above would be to limit account creation to one account per IP address per day. It wouldn't make things impossible for spammers, but it would make it harder, perhaps enough to discourage them, while at the same time not affecting normal users.
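A minimal sketch of the one-account-per-IP-per-day idea, in plain Ruby. The `SignupLimiter` class and its in-memory hash are hypothetical stand-ins for illustration; a real implementation would query the users table instead.

```ruby
# Hypothetical in-memory limiter; not Danbooru's actual code.
class SignupLimiter
  WINDOW_SECONDS = 24 * 60 * 60

  def initialize
    @last_signup_at = {} # ip string => Time of the last accepted signup
  end

  # Returns true and records the signup if this IP has not created an
  # account within the window; returns false otherwise.
  def allow?(ip, now = Time.now)
    last = @last_signup_at[ip]
    return false if last && now - last < WINDOW_SECONDS
    @last_signup_at[ip] = now
    true
  end
end

limiter = SignupLimiter.new
t0 = Time.utc(2017, 9, 14, 19, 0, 0)
limiter.allow?("203.0.113.7", t0)             # first signup from this IP
limiter.allow?("203.0.113.7", t0 + 3600)      # same IP one hour later
limiter.allow?("203.0.113.7", t0 + 25 * 3600) # same IP after the window
```

A check like this only helps if the recorded address is the visitor's real IP rather than a proxy's.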

r888888888 · 2017-09-14T19:34:47Z

Honey potting is another idea, maybe not just dmail creation but account creation as well.

Type-kun · 2017-09-14T19:35:47Z

Some ideas after a brief discord discussion:

Show a captcha for new users
Limit creation of user accounts from a single IP
Don't allow external http links in messages unless the user has shown some activity on the site, like uploading or commenting
Somehow report to mods/admins if a user tries to send a message containing an http link.

I'm against a captcha (in my experience, they don't really help while greatly annoying users), but the rest might be a good idea.

Also, 51608 dmails? Holy crap. Can you select message count grouping by user, where username matches the pattern, having http:// in the body of their messages? We'll get a quick list of who to ban this way.

r888888888 · 2017-09-14T19:39:15Z

select from_id, count(*) from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$') and body like '%http%' group by from_id order by count(*) desc limit 10;

 from_id | count 
---------+-------
  529839 |   101
  529247 |   100
  529539 |   100
  529358 |   100
  529723 |   100
  529726 |   100
  529758 |   100
  529550 |   100
  529390 |   100
  529790 |   100

This isn't complete. It's probably easier just to do a firewall ban.

BrokenEagle · 2017-09-14T19:39:15Z

> I would argue a link outside the site is unusual and definitely a signal that a message is spam.

> Somehow report to mods/admins if a user tries to send a message containing http link.

Are there metrics for external links...? I know that I occasionally send DMails with external links when I'm trying to help a user out, such as sending screenshots or directing them to other helpful sites.

Type-kun · 2017-09-14T19:43:59Z

Banning entire subnets is pretty harsh; it'll net us a lot of false positives. If you give me a full list of IDs I can just run them through a script and ban everyone. (Of course you can do the same in 3 lines in rails console, but still :3 I can get to it tomorrow, if need be.)

> Are there metrics for external links...? I know that I occasionally send DMails with external links when I'm trying to help a user out, such as sending screenshots or directing them to other helpful sites.

Yeah, it's important that the algorithm only limits and monitors users who have had little other activity before. If a user has posted in the forum, or commented, or successfully uploaded something, the limits should be lifted.
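A rough sketch of that rule in plain Ruby: hold a message for review only when it contains an external link and the sender has no prior activity. `SITE_HOSTS`, the `Activity` struct, and the `:hold_for_review` outcome are assumptions made for illustration, not existing Danbooru code.

```ruby
# Hosts treated as internal (assumption: only the main site host).
SITE_HOSTS = %w[danbooru.donmai.us].freeze

# Hypothetical activity counters for a sender.
Activity = Struct.new(:forum_post_count, :comment_count, :upload_count)

# True if the body contains an http(s) link to a host outside the site.
def external_link?(body)
  hosts = body.scan(%r{https?://([^/\s:]+)}i).flatten.map(&:downcase)
  hosts.any? { |host| !SITE_HOSTS.include?(host) }
end

# A sender with any forum post, comment, or upload is exempt from the limit.
def established?(activity)
  activity.forum_post_count + activity.comment_count + activity.upload_count > 0
end

def dmail_decision(body, activity)
  return :allow if established?(activity)
  external_link?(body) ? :hold_for_review : :allow
end

spam = "hey Veoh89.  My private webcam see here http://bit.ly/2vTv9Ki"
dmail_decision(spam, Activity.new(0, 0, 0)) # new account, external link
dmail_decision(spam, Activity.new(0, 3, 0)) # sender has commented before
```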

evazion · 2017-09-14T19:49:07Z

        ip        | count 
------------------+-------
 172.68.11.0/24   |   121
 172.68.245.0/24  |   118
 172.68.246.0/24  |   104
 172.68.244.0/24  |    87
 172.68.10.0/24   |    84
 172.68.46.0/24   |    22
 162.158.59.0/24  |    22
 162.158.58.0/24  |    21
 108.162.215.0/24 |    21
 172.68.47.0/24   |    18

Those are all Cloudflare IP ranges. Nginx needs to be configured to respect the X-Forwarded-For header passed by Cloudflare.

https://www.cloudflare.com/ips/
https://support.cloudflare.com/hc/en-us/articles/200170706-How-do-I-restore-original-visitor-IP-with-Nginx-
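As a quick check of that observation, the sketch below tests addresses from the subnets in the table against three ranges from Cloudflare's published list, using Ruby's standard `ipaddr` library. The three CIDR blocks are only a subset and are assumed to still be current; the live list at the URL above is authoritative.

```ruby
require "ipaddr"

# Subset of Cloudflare's published IPv4 ranges (assumption: still current).
CLOUDFLARE_RANGES = %w[
  172.64.0.0/13
  162.158.0.0/15
  108.162.192.0/18
].map { |cidr| IPAddr.new(cidr) }.freeze

# True if the address falls inside one of the listed proxy ranges.
def cloudflare_ip?(ip)
  addr = IPAddr.new(ip)
  CLOUDFLARE_RANGES.any? { |range| range.include?(addr) }
end

# One sample address from several of the /24 subnets in the table.
%w[172.68.11.1 172.68.245.1 162.158.59.1 108.162.215.1].each do |ip|
  puts "#{ip}: #{cloudflare_ip?(ip)}"
end
```

If every logged address is a proxy address, per-IP counts and per-IP limits say nothing about the real clients until the original visitor IP is restored.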

r888888888 · 2017-09-14T20:37:24Z

With the Akismet integration I'll propose the following changes:

A new is_spam boolean on dmails
This boolean is initialized on create
A suspected spam dmail will never send an email to the recipient
Spam dmails are filtered out by default (treated the same way as deleted dmails)
There is a new folder for suspected spam that you can review
If the recipient receives spam they think is ham they can mark it as such
If the recipient receives a message that they think is spam they can mark it as such
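The proposal above could look roughly like this in plain Ruby. `DmailRecord`, `send_email?`, and `folder` are hypothetical names used only for this sketch; they do not describe the actual model.

```ruby
# Hypothetical stand-in for a dmail row, with only the fields discussed.
DmailRecord = Struct.new(:title, :is_spam, :is_deleted, keyword_init: true) do
  # Suspected spam never triggers an email to the recipient.
  def send_email?
    !is_spam
  end
end

# Spam is hidden from the inbox the same way deleted messages are,
# and is shown only in the dedicated spam folder.
def folder(dmails, name)
  case name
  when :inbox then dmails.reject { |d| d.is_spam || d.is_deleted }
  when :spam  then dmails.select { |d| d.is_spam && !d.is_deleted }
  else raise ArgumentError, "unknown folder: #{name}"
  end
end

mail = [
  DmailRecord.new(title: "My hot webcam", is_spam: true, is_deleted: false),
  DmailRecord.new(title: "Tagging question", is_spam: false, is_deleted: false),
]
folder(mail, :inbox).map(&:title) # spam is filtered out by default
mail.first.is_spam = false        # recipient marks the message as ham
folder(mail, :inbox).size         # the message is back in the inbox
```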

BrokenEagle · 2017-09-15T05:52:27Z

I think doing the one account per IP per day might also be a good measure to put in place.

The spammer is still creating accounts, and has created hundreds in just the last hour.

http://danbooru.donmai.us/users?limit=1000&page=b530087

It would do no harm to regular users, but might be enough of a hindrance to malicious users to push them elsewhere.

r888888888 · 2017-09-15T22:25:31Z

Some ideas on running a script to ban suspected spam accounts:

The user name pattern seems to be [a-z]+\d{3,}
Some common titles that should be autospammed:
- My collection
- hi
- My private videos
- My video
- hey
- My webcam
- My dirty fantasies
- My new video
- My hot photos
- My hot webcam
- All your desires
- My hot videos
- my profile
- record from my webcam
- my hot webcam

I can generate a list of users. Ideas on what analytics to run would be helpful.

r888888888 · 2017-09-15T22:27:10Z

Legitimate bot users will have to be whitelisted.

r888888888 · 2017-09-15T22:49:21Z

MIN_USER_ID = 528958
MIN_DATE = "2017-09-01"
NAME_REGEXP = /^[a-z0-9]+\d{3,}$/
BAD_TITLES = ["My collection", "hi", "My private videos", "My video", "hey", "My webcam", "My dirty fantasies", "My new video", "My hot photos", "My hot webcam", "All your desires", "My hot videos", "my profile", "record from my webcam", "my hot webcam"]

spammers = Set.new(Dmail.where("dmails.from_id >= ? and dmails.created_at >= ? and is_spam = ?", MIN_USER_ID, MIN_DATE, true).joins("join users on users.id = dmails.from_id").where("users.name ~ '^[a-z0-9]+[0-9]{3,}$'").pluck("users.id").map(&:to_i).uniq)
spammers.size
=> 1159

# new_spammers must exist before the loop below adds to it
new_spammers = Set.new

User.without_timeout do
  Dmail.where("created_at >= ? and is_spam = ?", MIN_DATE, false).find_each do |dmail|
    from_name = dmail.from_name
    if dmail.from_id >= MIN_USER_ID && from_name =~ NAME_REGEXP
      # dmail.update_column(:is_spam, true)
      # dmail.spam!

      if !spammers.include?(dmail.from_id)
        new_spammers.add(dmail.from_id)
      end
    end
  end
end

new_spammers.size
=> 757

new_new_spammers = Set.new(Dmail.where("created_at >= ? and from_id >= ? and title in (?) and from_id not in (?)", MIN_DATE, MIN_USER_ID, BAD_TITLES, (spammers + new_spammers).to_a).pluck(:from_id))
new_new_spammers.size
=> 6

combined_spammers = spammers + new_spammers + new_new_spammers
User.without_timeout do
  combined_spammers.each do |uid|
    user = User.find(uid)
    tag_change_count = PostArchive.where(updater_id: uid).count
    vote_count = PostVote.where(user_id: uid).count
    comment_count = Comment.where(creator_id: uid).count
    dmail_count = Dmail.where(from_id: uid).count

    if tag_change_count + vote_count + comment_count > 0
      puts "#{user.name},#{uid},#{tag_change_count},#{vote_count},#{comment_count},#{dmail_count}"
    end
  end
end

Unsurprisingly nothing matches. Will probably just run this soon:

combined_spammers.each do |uid|
  unless Ban.where(user_id: uid).exists?
    Ban.create(duration: 10000, reason: "Spam (automated ref f6147ace)", user_id: uid)
    puts "banned #{uid}"
    sleep 1
  end
end

kittey · 2017-09-15T23:09:09Z

With the Akismet integration, will all dmails be subjected to it or only dmails sent by regular members and below?

I’m not too keen on sharing my dmails with some third party and potentially turning them into false positives, especially as I regularly share links to external sites with some users via dmail.

As BrokenEagle said in the OP, Gold+ users are unlikely to actually send spam. My guess would be that most dmails are sent by Gold+ users, especially builders, so this could be a cost issue too, as Automattic charges per Akismet API call.

BrokenEagle · 2017-09-16T04:05:49Z

There's some conversation going on in topic #14440 about the spam bot. The Admins say that each account has a unique IP address which I am unable to verify for myself. They and other users are also calling for a Captcha, although some users have argued that they're not that effective, and I'll admit that my knowledge is relatively limited on the topic.

RenimLS · 2017-09-16T04:25:34Z

Out of a sample of 60 of the bot accounts, 14 used repeated IPs; the others did not share an IP with any other account. So only roughly a quarter of the accounts made by the ongoing spam bot currently share an IP with another account.

hakusaro · 2017-09-16T04:57:40Z

> Honey potting is another idea, maybe not just dmail creation but account creation as well.

It's worth pointing out that targeted attacks against the software would render this moot. That said, decoy users that are automatically injected so that they always appear somewhere near the top of user lists could act as honeypots. If a direct message is sent to one, some, or all of the honeypot users, you'd know it was spam.

If this is a targeted attack against danbooru itself, though, it'll be a game of cat and mouse until one side can't play anymore or gets bored and leaves.
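A small sketch of the decoy-recipient idea in plain Ruby. The decoy user IDs and the `[from_id, to_id]` pair representation are made up for illustration.

```ruby
require "set"

# Hypothetical IDs of decoy accounts that no real user has a reason to message.
HONEYPOT_USER_IDS = Set[900_001, 900_002].freeze

# Given [from_id, to_id] pairs, return the senders who messaged any decoy.
def honeypot_senders(dmails)
  dmails
    .select { |_from_id, to_id| HONEYPOT_USER_IDS.include?(to_id) }
    .map(&:first)
    .uniq
end

sent = [
  [529839, 900_001], # message to a decoy
  [529839, 12_345],
  [529247, 900_002], # message to a decoy
  [1_000, 12_345],   # ordinary message between regular users
]
honeypot_senders(sent) # senders who wrote to at least one decoy
```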

nonamethanks · 2017-09-16T10:39:46Z

> NAME_REGEXP = /^[a-z0-9]+\d{3,}$/

Just a note, this regex does not include accounts such as
http://danbooru.donmai.us/users/531722
http://danbooru.donmai.us/users/531740
http://danbooru.donmai.us/users/531712
http://danbooru.donmai.us/users/531703
http://danbooru.donmai.us/users/531660
http://danbooru.donmai.us/users/531656

As I said in my previous post, the spammer sometimes adds a letter in the middle of the numbers.

Type-kun · 2017-09-16T13:20:46Z

> Just a note, this regex does not include accounts such as

It does; all those usernames have 3 digits at the end. From what I've observed so far, the common pattern is a female name in lowercase + [a-z0-9]{2} + \d{3}, and I'd say it's safe to ban or delete every user matching this pattern since September. We can apologize to false positives if need be :3
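To make the comparison concrete, here is a small Ruby sketch that runs the three usernames quoted earlier in the thread through the loose `NAME_REGEXP` and through the stricter name + `[a-z0-9]{2}` + `\d{3}` pattern. `KNOWN_NAMES` is a tiny illustrative list built from those usernames, not the real name list, and `anna12b345` is an invented example.

```ruby
# Illustrative name list (assumption); the real list is much longer.
KNOWN_NAMES = %w[heather sarah kim].freeze

EARLY_PATTERN = /^[a-z]+[0-9]{3,}$/     # first query in the thread
LOOSE_PATTERN = /^[a-z0-9]+\d{3,}$/     # NAME_REGEXP from the ban script
STRICT_PATTERN = /\A(?:#{Regexp.union(KNOWN_NAMES).source})[a-z0-9]{2}\d{3}\z/

%w[heatherq8611 sarahls350 kimp4979 anna12b345].each do |name|
  puts [
    name,
    "early=#{EARLY_PATTERN.match?(name)}",
    "loose=#{LOOSE_PATTERN.match?(name)}",
    "strict=#{STRICT_PATTERN.match?(name)}",
  ].join(" ")
end
```

The loose pattern accepts digits before the final run, so a letter in the middle of the numbers does not prevent a match; the early letters-then-digits pattern does miss such names.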

Also I hate to bring the bad news, but IP filter doesn't really help - https://danbooru.donmai.us/users

I'm afraid we have to add the signup captcha after all, though I honestly don't know how effective it would be, given that solving it costs only a few cents on certain web platforms.

The worst thing is we can't rely on name pattern to be the same in the future. It'll help us track down already-present bots, but I could probably write a generator with much less suspicious pattern in 10 minutes if I actually analyzed the target site, so...

nonamethanks · 2017-09-16T13:23:39Z

Ah, seems like you're right. I was testing with a wrongly compiled regex.

Type-kun · 2017-09-16T13:47:35Z

@r888888888 is there any pattern in spam message recipients?

Also, here's what I propose.

If a user has no votes/comments/tag changes/dmails, sends a dmail, and Akismet decides it's spam, then don't register the dmail and immediately permaban the user.
If a user has no votes/comments/tag changes but has other dmails, then they are permabanned once 2 spam dmails in a row are detected
If a user is member-level, has votes/comments/tag changes, and 2 spam dmails are detected in a row, then they should be automatically banned for a short period
If a user is gold+, they are excluded from control and can be dealt with manually.

This should both stop the spam influx and account for false positives.
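The four rules above can be sketched as a single decision function. This is plain Ruby under stated assumptions: `UserSnapshot` and its fields are hypothetical, `consecutive_spam` counts spam dmails in a row including the one just sent, and `:manual_review` stands in for "dealt with manually".

```ruby
# Hypothetical summary of a sender at the moment a dmail is flagged as spam.
UserSnapshot = Struct.new(
  :level,             # :member or :gold_plus
  :activity_count,    # votes + comments + tag changes
  :prior_dmail_count, # dmails sent before this one
  :consecutive_spam,  # spam dmails in a row, including this one
  keyword_init: true
)

def spam_action(user)
  # Gold+ users are excluded from automatic control.
  return :manual_review if user.level == :gold_plus
  return :none if user.consecutive_spam.zero?

  if user.activity_count.zero?
    # No activity and no earlier dmails: first spam dmail is enough.
    return :permaban if user.prior_dmail_count.zero?
    # No activity but earlier dmails: wait for two spam dmails in a row.
    return :permaban if user.consecutive_spam >= 2
  elsif user.consecutive_spam >= 2
    # Member with real activity: short automatic ban only.
    return :short_ban
  end
  :none
end

fresh = UserSnapshot.new(level: :member, activity_count: 0,
                         prior_dmail_count: 0, consecutive_spam: 1)
spam_action(fresh) # falls under the first rule
```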

Also, for automatic permabans, I would really rather opt for silent hard-removal of such users, including connected dmails and records. They'll be cluttering mod logs and feedbacks otherwise.

Type-kun · 2017-09-16T15:02:05Z

Ok, here are two lists. First, the users who precisely match the pattern I described, with girl name + suffix:
http://puu.sh/xB9DM/e77e67f86e.txt
and the partial match list, which matches the /^[a-z0-9]+\d{3,}$/ pattern but whose first part wasn't in the girl name list:
http://puu.sh/xB9KY/c8fa277658.txt

Both are in the id:name format. I can mass-ban the first list, but I need an ok from @r888888888 first, because it'll clutter the feedbacks and mod actions incredibly.

r888888888 · 2017-09-16T19:55:23Z

I've added an invisible recaptcha to the signup process.

I'm in the process of banning the accounts now. Assuming there are no false positives they can be deleted in bulk later.

BrokenEagle · 2017-09-16T22:30:53Z

Some users in topic #14440 have been complaining that they are still receiving DMail notices when they get a spam email. I know that yesterday when I received a "spam email" from DanbooruBot, I did receive a delivery notification.

Should users still receive delivery notifications then...? On one hand, it leads to users being annoyed when they think they're getting a DMail only to find out it's spam. On the other hand, if it's a legitimate DMail that was accidentally marked as spam, they may never discover it.

One idea may be to change the DMail notification if it's only spam messages that are unread. Another idea would be to nix the delivery notification, and just let the hasmail counter next to My Account be the only notification a user will receive.

Type-kun · 2017-09-17T17:17:14Z

I've banned the rest of the new users who matched the /^(name)[a-z0-9]{2}\d{3,}$/ pattern, since the unbanned ones kept spamming users.

Captcha seems to be working for now, which is good. Note that it's now impossible to register using an old browser.

evazion · 2017-11-28T21:01:44Z

https://danbooru.donmai.us/forum_topics/14440?page=3#forum_post_139904

Reopening because there was another small batch of spam. The spam filter did its job in marking these messages as spam, but the user was still able to keep spamming until they were banned.

I propose that if a user sends a lot of spam within a short timeframe they are automatically banned. Say, more than 20 spam dmails within 24 hours should trigger a 3 day ban.
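A minimal sketch of that threshold in plain Ruby, using the numbers from the proposal (more than 20 spam dmails within 24 hours, 3 day ban). The constant and method names are hypothetical.

```ruby
SPAM_LIMIT = 20                        # more than this many triggers a ban
SPAM_WINDOW_SECONDS = 24 * 60 * 60
AUTO_BAN_DAYS = 3

# spam_times: Times at which the user's dmails were marked as spam.
# Returns the ban length in days, or 0 if the threshold is not exceeded.
def auto_ban_days(spam_times, now = Time.now)
  recent = spam_times.count { |t| t <= now && now - t <= SPAM_WINDOW_SECONDS }
  recent > SPAM_LIMIT ? AUTO_BAN_DAYS : 0
end

now = Time.utc(2017, 11, 28, 21, 0, 0)
burst = Array.new(21) { |i| now - i * 60 } # 21 spam dmails in 21 minutes
auto_ban_days(burst, now)                  # exceeds the limit
auto_ban_days(burst.first(20), now)        # exactly 20 does not
```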

EDIT: closing, was raised in its own issue.

BrokenEagle changed the title ~~Limit ability to create spam in forum posts/comments~~ Limit ability to create spam in general Sep 14, 2017

r888888888 added a commit that referenced this issue Sep 15, 2017

add 24 hour window for repeating an ip addr for account creation (ref #3301)

4c565b4

r888888888 added a commit that referenced this issue Sep 15, 2017

don't run spam checks on gold account users (ref #3301)

1e41336

r888888888 closed this as completed Oct 12, 2017

evazion reopened this Nov 28, 2017

evazion closed this as completed Nov 30, 2017
