Limit ability to create spam in general #3301

Closed
BrokenEagle opened this issue Sep 14, 2017 · 28 comments

@BrokenEagle
Collaborator

Brought up on the Danbooru forum. Basically, there has been an uptick in spambot activity lately. It has created a lot of extra work for the moderators, while it takes very little effort for a user to sign up and create multiple new accounts.

Therefore, it might be beneficial to add some restrictions on those particular functions for Member-level users. There is less to fear from Gold+ users, since they have something invested in the site.

Some ideas include...

  1. Forbid write API access to comments, forum_topics, and forum_posts
  2. Add Recaptcha verification before allowing a user to post
  3. Better verifications present in the signup process
@nonamethanks
Member

nonamethanks commented Sep 14, 2017

FYI, the spam is currently going through DMails, not forum. And the guy created hundreds (literally) of accounts in one day to use, all named with the same pattern of one female name followed by a string of digits and possibly one or two letters in the middle.

http://danbooru.donmai.us/users?page=2

They start here with "heatherq8611" all the way to http://danbooru.donmai.us/users?page=10
And then they resume here http://danbooru.donmai.us/users?page=16 with "sarahls350" to http://danbooru.donmai.us/users?page=48 ("kimp4979"). You can notice that the second batch is 600 accounts, about 7-8 per minute towards the end.
There are obviously innocent users in the middle, but the damage is already done: all the guy has to do is fire up a random number of those accounts to spam everyone with DMails.

The feedback page http://danbooru.donmai.us/user_feedbacks already shows that some (an infinitesimally small fraction of them) have already been banned.

@r888888888
Collaborator

r888888888 commented Sep 14, 2017

Some facts:

User.where("created_at > ? and name ~ '^[a-z]+[0-9]{3,}$'", 3.months.ago).count
=> 1384
SELECT network(set_masklen(last_ip_addr, 24)) as ip, count(*) FROM users WHERE (created_at > '2017-07-14 19:19:07.588835' and name ~ '^[a-z]+[0-9]{3,}$') group by ip order by count(*) desc;

        ip        | count 
------------------+-------
 172.68.11.0/24   |   121
 172.68.245.0/24  |   118
 172.68.246.0/24  |   104
 172.68.244.0/24  |    87
 172.68.10.0/24   |    84
 172.68.46.0/24   |    22
 162.158.59.0/24  |    22
 162.158.58.0/24  |    21
 108.162.215.0/24 |    21
 172.68.47.0/24   |    18
select count(*) from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$';

 count 
-------
   523
select count(*) from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$');

 count 
-------
 51608
select body from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$') order by random() limit 10;

                                                    body                                                    
------------------------------------------------------------------------------------------------------------
 hey Veoh89.  My private webcam see here http://bit.ly/2vTv9Ki
 hey BestTeitoku.  My new hot video here http://bit.ly/2vTv9Ki
 hi Tunec.  My hot videos look here http://bit.ly/2vTv9Ki
 hey Korom1004.  New sex site! my profile here http://bit.ly/2vTv9Ki
 hi ShinyaKiritou.  I'm bored! want sex. All my hot photos and videos look my profile http://bit.ly/2vTv9Ki
 hey DaaNIK.  My private photos look here http://v.ht/hboxxxx
 hey darkrai389.  My hot webcam see here http://bit.ly/2vTv9Ki
 hi Feli223.  My new private video here http://v.ht/hboxxxx
 hi overdraiv.  I'm rarely here, my complete profile here http://v.ht/hboxxxx
 hi Sadloserkunt.  My collection of hot photos here http://bit.ly/2vTv9Ki

@r888888888
Collaborator

I would argue a link outside the site is unusual and definitely a signal that a message is spam. But these messages look pretty predictable. Maybe just using a standard spam filter would do a lot to eliminate these. Akismet?

@BrokenEagle
Collaborator Author

Another idea, besides those I mentioned above, would be to limit account creation to one account per IP address per day. It wouldn't make things impossible for spammers, but it would make them harder, perhaps enough to discourage them, while not affecting normal users.
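
To make the one-account-per-IP-per-day idea concrete, here is a minimal sketch in plain Ruby. The `SignupLimiter` class, its in-memory hash, and the 24-hour window constant are all hypothetical; nothing like this exists in the codebase as far as this thread shows, and a real version would need persistent storage.

```ruby
require "ipaddr"

# Hypothetical in-memory limiter: at most one accepted signup per IP
# within a rolling 24-hour window.
class SignupLimiter
  WINDOW_SECONDS = 24 * 60 * 60

  def initialize
    @last_signup_at = {} # normalized IP string => Time of last accepted signup
  end

  # Returns true (and records the signup) if this IP has not signed up
  # within the window; returns false otherwise.
  def allow?(ip, now = Time.now)
    key = IPAddr.new(ip).to_s # normalizes the address, raises on garbage input
    last = @last_signup_at[key]
    return false if last && (now - last) < WINDOW_SECONDS

    @last_signup_at[key] = now
    true
  end
end
```

Note that this only helps if the address being checked is the visitor's real one rather than a proxy's.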

@r888888888
Collaborator

Honey potting is another idea, maybe not just dmail creation but account creation as well.

@Type-kun
Collaborator

Some ideas after a brief discord discussion:

  • Show a captcha for new users
  • Limit creation of user accounts from a single IP
  • Don't allow external http links in messages unless user has shown some activity on the site, like uploading or commenting
  • Somehow report to mods/admins if a user tries to send a message containing http link.

I'm against a captcha (in my experience, they don't really help while greatly annoying users), but the rest might be a good idea.
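
The external-link rule from the list above could look roughly like this. It is only a sketch: the `Activity` struct, the `SITE_HOSTS` list, and both helper methods are hypothetical names made up for illustration, and the URL regex is deliberately crude.

```ruby
require "uri"

# Hosts treated as internal links (assumption: only the main site host).
SITE_HOSTS = ["danbooru.donmai.us"].freeze

# Hypothetical activity summary for the sender.
Activity = Struct.new(:upload_count, :comment_count, :forum_post_count) do
  def established?
    upload_count + comment_count + forum_post_count > 0
  end
end

# Returns the http(s) links in the body that point outside the site.
# Links whose host cannot be parsed are treated as external.
def external_links(body)
  body.scan(%r{https?://[^\s<>"']+}i).select do |url|
    host = begin
      URI.parse(url).host
    rescue URI::InvalidURIError
      nil
    end
    !SITE_HOSTS.include?(host)
  end
end

# Users with some activity may link anywhere; brand-new users may not
# send external links.
def dmail_allowed?(body, activity)
  activity.established? || external_links(body).empty?
end
```

A rejected message could be reported to mods instead of silently dropped, which would cover the last bullet as well.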

Also, 51608 dmails? Holy crap. Can you select message count grouping by user, where username matches the pattern, having http:// in the body of their messages? We'll get a quick list of who to ban this way.

@r888888888
Collaborator

select from_id, count(*) from dmails where from_id in (select id from users where created_at > '2017-07-14 19:19:07.588835' and (last_ip_addr <<= '172.68.11.0/24' or last_ip_addr <<= '172.68.245.0/24' or last_ip_addr <<= '172.68.246.0/24' or last_ip_addr <<= '172.68.244.0/24') and name ~ '[0-9]{3,}$') and body like '%http%' group by from_id order by count(*) desc limit 10;

 from_id | count 
---------+-------
  529839 |   101
  529247 |   100
  529539 |   100
  529358 |   100
  529723 |   100
  529726 |   100
  529758 |   100
  529550 |   100
  529390 |   100
  529790 |   100

This isn't complete. It's probably easier just to do a firewall ban.

@BrokenEagle
Collaborator Author

> I would argue a link outside the site is unusual and definitely a signal that a message is spam.

> Somehow report to mods/admins if a user tries to send a message containing http link.

Are there metrics for external links...? I know that I occasionally send DMails with external links when I'm trying to help a user out, such as sending screenshots or directing them to other helpful sites.

@BrokenEagle changed the title from "Limit ability to create spam in forum posts/comments" to "Limit ability to create spam in general" on Sep 14, 2017
@Type-kun
Collaborator

Type-kun commented Sep 14, 2017

Banning entire subnets is pretty harsh; it'll net us a lot of false positives. If you give me a full list of IDs I can just run them through a script and ban everyone. (Of course you can do the same in 3 lines in rails console, but still :3 I can get to it tomorrow, if need be.)

> Are there metrics for external links...? I know that I occasionally send DMails with external links when I'm trying to help a user out, such as sending screenshots or directing them to other helpful sites.

Yeah, it's important that the algorithm only limits and monitors users who have had little other activity. If a user has posted in the forum, commented, or successfully uploaded something, the limits should be lifted.

@evazion
Member

evazion commented Sep 14, 2017

        ip        | count 
------------------+-------
 172.68.11.0/24   |   121
 172.68.245.0/24  |   118
 172.68.246.0/24  |   104
 172.68.244.0/24  |    87
 172.68.10.0/24   |    84
 172.68.46.0/24   |    22
 162.158.59.0/24  |    22
 162.158.58.0/24  |    21
 108.162.215.0/24 |    21
 172.68.47.0/24   |    18

Those are all Cloudflare IP ranges. Nginx needs to be configured to respect the X-Forwarded-For header passed by Cloudflare.

https://www.cloudflare.com/ips/
https://support.cloudflare.com/hc/en-us/articles/200170706-How-do-I-restore-original-visitor-IP-with-Nginx-
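
As a quick sanity check, the subnets in the table can be tested against Cloudflare's ranges with Ruby's `IPAddr`. The three CIDR blocks below are an assumption: they are believed to be part of the published list linked above, but that list is the authority and can change. The `client_ip` helper is likewise hypothetical; it takes the last `X-Forwarded-For` entry, on the assumption that the proxy appends the address it saw to any header the client supplied.

```ruby
require "ipaddr"

# Partial, assumed subset of Cloudflare's published IPv4 ranges.
CLOUDFLARE_RANGES = %w[
  172.64.0.0/13
  162.158.0.0/15
  108.162.192.0/18
].map { |cidr| IPAddr.new(cidr) }.freeze

# True if the address falls inside one of the ranges above.
def cloudflare_ip?(ip)
  addr = IPAddr.new(ip)
  CLOUDFLARE_RANGES.any? { |range| range.include?(addr) }
end

# Picks the visitor address: trust X-Forwarded-For only when the direct
# peer is a known proxy, and use the entry the proxy itself appended.
def client_ip(remote_addr, forwarded_for)
  return remote_addr unless forwarded_for && cloudflare_ip?(remote_addr)

  last = forwarded_for.split(",").last.to_s.strip
  last.empty? ? remote_addr : last
end
```

With something like this in place (or the equivalent Nginx real-IP configuration), `last_ip_addr` would record the visitor rather than the proxy, which is what any per-IP limit depends on.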

@r888888888
Collaborator

With the Akismet integration I'll propose the following changes:

  • A new is_spam boolean on dmails
  • This boolean is initialized on create
  • A suspected spam dmail will never send an email to the recipient
  • Spam emails are filtered out by default (treated the same way as deleted emails)
  • There is a new folder for suspected spam that you can review
  • If the recipient receives spam they think is ham they can mark it as such
  • If the recipient receives a message that they think is spam they can mark it as such
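
A rough model of how those rules fit together, in plain Ruby rather than ActiveRecord. The `Message` struct and every method name here are hypothetical stand-ins; only the `is_spam` flag comes from the proposal.

```ruby
# Hypothetical stand-in for a dmail row; only the fields the proposal needs.
Message = Struct.new(:title, :is_spam, :is_deleted, keyword_init: true)

# Default inbox: suspected spam is hidden the same way deleted messages are.
def inbox(messages)
  messages.reject { |m| m.is_spam || m.is_deleted }
end

# Separate folder where suspected spam can be reviewed.
def spam_folder(messages)
  messages.select { |m| m.is_spam && !m.is_deleted }
end

# A suspected spam dmail never triggers the notification email.
def send_notification_email?(message)
  !message.is_spam
end

# The recipient can override the classifier in either direction.
def mark_as_ham!(message)
  message.is_spam = false
end

def mark_as_spam!(message)
  message.is_spam = true
end
```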

@BrokenEagle
Collaborator Author

I think doing the one account per IP per day might also be a good measure to put in place.

The spammer is still creating accounts, and has created hundreds over just the last hour.

http://danbooru.donmai.us/users?limit=1000&page=b530087

No harm to regular users, but perhaps enough of a hindrance to malicious users to make them go elsewhere.

@r888888888
Collaborator

r888888888 commented Sep 15, 2017

Some ideas on running a script to ban suspected spam accounts:

  • The user name pattern seems to be [a-z]+\d{3,}
  • Some common titles that should be autospammed:
    • My collection
    • hi
    • My private videos
    • My video
    • hey
    • My webcam
    • My dirty fantasies
    • My new video
    • My hot photos
    • My hot webcam
    • All your desires
    • My hot videos
    • my profile
    • record from my webcam
    • my hot webcam

I can generate a list of users. Ideas on what analytics to run would be helpful.

@r888888888
Collaborator

Legitimate bot users will have to be whitelisted.

@r888888888
Collaborator

r888888888 commented Sep 15, 2017

MIN_USER_ID = 528958
MIN_DATE = "2017-09-01"
NAME_REGEXP = /^[a-z0-9]+\d{3,}$/
BAD_TITLES = ["My collection", "hi", "My private videos", "My video", "hey", "My webcam", "My dirty fantasies", "My new video", "My hot photos", "My hot webcam", "All your desires", "My hot videos", "my profile", "record from my webcam", "my hot webcam"]

spammers = Set.new(Dmail.where("dmails.from_id >= ? and dmails.created_at >= ? and is_spam = ?", MIN_USER_ID, MIN_DATE, true).joins("join users on users.id = dmails.from_id").where("users.name ~ '^[a-z0-9]+[0-9]{3,}$'").pluck("users.id").map(&:to_i).uniq)
spammers.size
=> 1159
new_spammers = Set.new # must exist before the loop below adds to it
User.without_timeout do
  Dmail.where("created_at >= ? and is_spam = ?", MIN_DATE, false).find_each do |dmail|
    from_name = dmail.from_name
    if dmail.from_id >= MIN_USER_ID && from_name =~ NAME_REGEXP
      # dmail.update_column(:is_spam, true)
      # dmail.spam!

      if !spammers.include?(dmail.from_id)
        new_spammers.add(dmail.from_id)
      end
    end
  end
end

new_spammers.size
=> 757
new_new_spammers = Set.new(Dmail.where("created_at >= ? and from_id >= ? and title in (?) and from_id not in (?)", MIN_DATE, MIN_USER_ID, BAD_TITLES, (spammers + new_spammers).to_a).pluck(:from_id))
new_new_spammers.size
=> 6
combined_spammers = spammers + new_spammers + new_new_spammers
User.without_timeout do
  combined_spammers.each do |uid|
    user = User.find(uid)
    tag_change_count = PostArchive.where(updater_id: uid).count
    vote_count = PostVote.where(user_id: uid).count
    comment_count = Comment.where(creator_id: uid).count
    dmail_count = Dmail.where(from_id: uid).count

    if tag_change_count + vote_count + comment_count > 0
      puts "#{user.name},#{uid},#{tag_change_count},#{vote_count},#{comment_count},#{dmail_count}"
    end
  end
end

Unsurprisingly nothing matches. Will probably just run this soon:

combined_spammers.each do |uid|
  unless Ban.where(user_id: uid).exists?
    Ban.create(duration: 10000, reason: "Spam (automated ref f6147ace)", user_id: uid)
    puts "banned #{uid}"
    sleep 1
  end
end

@kittey
Contributor

kittey commented Sep 15, 2017

With the Akismet integration, will all dmails be subjected to it or only dmails sent by regular members and below?

I’m not too keen on sharing my dmails with some third party and potentially turning them into false positives, especially as I regularly share links to external sites with some users via dmail.

As BrokenEagle said in the OP, Gold+ users are unlikely to actually send spam. My guess would be that most dmails are sent by Gold+ users, especially builders, so this could be a cost issue too, as Automattic charges per Akismet API call.

@BrokenEagle
Collaborator Author

There's some conversation going on in topic #14440 about the spam bot. The Admins say that each account has a unique IP address which I am unable to verify for myself. They and other users are also calling for a Captcha, although some users have argued that they're not that effective, and I'll admit that my knowledge is relatively limited on the topic.

@RenimLS

RenimLS commented Sep 16, 2017

Out of a sample of 60 bot accounts, 14 used repeated IPs; the others did not share an IP with any other account. So only roughly a quarter of the accounts made by the ongoing spam bot currently share an IP with another account.

@hakusaro
Contributor

> Honey potting is another idea, maybe not just dmail creation but account creation as well.

It's worth pointing out that targeted attacks against the software would render this moot. Coincidentally, though, auto-injected decoy users that always appear somewhere near the top of user lists could act as honeypots. If a direct message is sent to one, some, or all of the decoy users, you'd know it was spam.
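
In code, the decoy check is about as simple as it sounds. The user IDs below are made up, and `sent_to_honeypot?` is a hypothetical helper, not anything in the codebase.

```ruby
require "set"

# Hypothetical IDs of decoy accounts injected near the top of user lists.
HONEYPOT_USER_IDS = Set[900_001, 900_002].freeze

# Any dmail addressed to a decoy is treated as spam, since no real user
# has a reason to write to one.
def sent_to_honeypot?(recipient_ids)
  recipient_ids.any? { |id| HONEYPOT_USER_IDS.include?(id) }
end
```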

If this is a targeted attack against danbooru itself, though, it'll be a game of cat and mouse until one side can't play anymore or gets bored and leaves.

@nonamethanks
Member

> NAME_REGEXP = /^[a-z0-9]+\d{3,}$/

Just a note, this regex does not include accounts such as
http://danbooru.donmai.us/users/531722
http://danbooru.donmai.us/users/531740
http://danbooru.donmai.us/users/531712
http://danbooru.donmai.us/users/531703
http://danbooru.donmai.us/users/531660
http://danbooru.donmai.us/users/531656

As I said in my previous post, the spammer sometimes adds a letter in the middle of the numbers.

@Type-kun
Collaborator

Type-kun commented Sep 16, 2017

> Just a note, this regex does not include accounts such as

It does; all those usernames have 3 digits at the end. From what I've observed so far, the common pattern is a female name in lowercase + [a-z0-9]{2} + \d{3}, and I'd say it's safe to ban or delete every user matching this pattern since September. We can apologize to false positives if need be :3
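
For anyone who wants to check names by hand, here is a small sketch comparing the broad regex with the narrower name-based pattern. The `FIRST_NAMES` list is a tiny hypothetical stand-in (the real name list isn't in this thread), `classify` is a made-up helper, and the narrow pattern lets the trailing digits run to 3 or more, which is an assumption.

```ruby
# Tiny hypothetical stand-in for the list of first names.
FIRST_NAMES = %w[heather sarah kim].freeze

# Broad pattern from the ban script: lowercase letters/digits ending in
# three or more digits. \A and \z anchor the whole string.
BROAD = /\A[a-z0-9]+\d{3,}\z/

# Narrower pattern: first name, two letters-or-digits, then 3+ digits.
NARROW = /\A(?:#{FIRST_NAMES.join("|")})[a-z0-9]{2}\d{3,}\z/

def classify(name)
  return :narrow_match if name.match?(NARROW)
  return :broad_match_only if name.match?(BROAD)

  :no_match
end

%w[heatherq8611 sarahls350 kimp4979 pixelfan2017 BrokenEagle].each do |name|
  puts "#{name}: #{classify(name)}"
end
```

The three bot names quoted earlier in the thread match both patterns, because `[a-z0-9]+` happily absorbs a letter sitting between the digits. A made-up name like `pixelfan2017` only matches the broad one, which is where false positives would come from.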

Also I hate to bring the bad news, but IP filter doesn't really help - https://danbooru.donmai.us/users

I'm afraid we have to add the signup captcha after all, though I honestly don't know how effective it would be, given that solving it costs only a few cents on certain web platforms.

The worst thing is we can't rely on name pattern to be the same in the future. It'll help us track down already-present bots, but I could probably write a generator with much less suspicious pattern in 10 minutes if I actually analyzed the target site, so...

@nonamethanks
Member

Ah, seems like you're right; I was testing with a wrongly compiled regex.

@Type-kun
Collaborator

@r888888888 is there any pattern in spam message recipients?

Also, here's what I propose.

  • If a user has no votes/comments/tag changes/dmails, sends a dmail, and akismet decides it's spam, then don't register the dmail and immediately permaban the user.
  • If a user has no votes/comments/tag changes but has other dmails, then they are permabanned once 2 spam dmails in a row are detected
  • If a user is member-level, has votes/comments/tag changes, and 2 spam dmails are detected in a row, then they should be automatically banned for a short period
  • If a user is gold+, they are excluded from control and can be dealt with manually.

This should both stop the spam influx and account for false positives.
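
Spelled out as a decision function, the four rules might look like the sketch below. Everything here is hypothetical: the `Sender` struct, its field names, and the returned symbols are made up for illustration, and `spam_streak` is assumed to count consecutive spam verdicts before the current dmail.

```ruby
# Hypothetical summary of the sender at the moment a dmail is submitted.
Sender = Struct.new(:gold_plus, :activity_count, :dmail_count, :spam_streak,
                    keyword_init: true)

# Returns the automatic action for one dmail, given the filter's verdict.
def spam_action(sender, flagged_as_spam)
  return :none if sender.gold_plus      # Gold+ is handled manually
  return :none unless flagged_as_spam

  streak = sender.spam_streak + 1       # includes the dmail being sent now
  if sender.activity_count.zero?
    # No votes/comments/tag changes at all.
    return :reject_and_permaban if sender.dmail_count.zero?
    return :permaban if streak >= 2
  elsif streak >= 2
    # Member with some real activity: short automatic ban only.
    return :short_ban
  end
  :none
end
```

The caller would be responsible for resetting the streak whenever a dmail passes the filter.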

Also, for automatic permabans, I would really rather opt for silent hard-removal of such users, including connected dmails and records. They'll be cluttering mod logs and feedbacks otherwise.

@Type-kun
Collaborator

Ok, here are two lists. First, the users who match the pattern I described precisely, with girl name + suffix:
http://puu.sh/xB9DM/e77e67f86e.txt
and the partial match list, which matches the /^[a-z0-9]+\d{3,}$/ pattern but whose first part wasn't in the girl name list:
http://puu.sh/xB9KY/c8fa277658.txt

in the id:name format. I can mass-ban the first list, but I need an ok from @r888888888 first, because it'll clutter the feedbacks and mod actions incredibly.

@r888888888
Collaborator

I've added an invisible recaptcha to the signup process.

I'm in the process of banning the accounts now. Assuming there are no false positives, they can be deleted in bulk later.

@BrokenEagle
Collaborator Author

Some users in topic #14440 have been complaining that they are still receiving DMail notices when they get a spam email. I know that yesterday when I received a "spam email" from DanbooruBot, I did receive a delivery notification.

Should users still receive delivery notifications then...? On one hand, it leads to users being annoyed when they think they're getting a DMail only to find out it's spam. On the other hand, if it's a legitimate DMail that was accidentally marked as spam, they may never discover it.

One idea may be to change the DMail notification if it's only spam messages that are unread. Another idea would be to nix the delivery notification, and just let the hasmail counter next to My Account be the only notification a user will receive.

@Type-kun
Collaborator

I've banned the rest of new users who matched /^(name)[a-z0-9]{2}\d{3,}$/ pattern, since unbanned ones kept spamming users.

Captcha seems to be working for now, which is good. Note that it's now impossible to register using an old browser.

@evazion
Member

evazion commented Nov 28, 2017

https://danbooru.donmai.us/forum_topics/14440?page=3#forum_post_139904

Reopening because there was another small batch of spam. The spam filter did its job in marking these messages as spam, but the user was still able to keep spamming until they were banned.

I propose that if a user sends a lot of spam within a short timeframe they are automatically banned. Say, more than 20 spam dmails within 24 hours should trigger a 3 day ban.
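
As a sketch of that threshold, assuming the caller can supply the timestamps of a user's spam-flagged dmails (the method name and constants below are hypothetical):

```ruby
SPAM_LIMIT = 20                   # more than this many spam dmails...
WINDOW_SECONDS = 24 * 60 * 60     # ...within this window...
BAN_DAYS = 3                      # ...triggers a ban of this length

# Returns the ban length in days (0 means no ban) given the times at
# which the user's dmails were flagged as spam.
def auto_ban_days(spam_times, now = Time.now)
  recent = spam_times.count { |t| t <= now && (now - t) <= WINDOW_SECONDS }
  recent > SPAM_LIMIT ? BAN_DAYS : 0
end
```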

EDIT: closing, was raised in its own issue.

@evazion reopened this Nov 28, 2017
@evazion closed this as completed Nov 30, 2017