
Add command to show flagged posts that have not been deleted yet (or post automatically) #226

Closed
ByteCommander opened this issue Sep 30, 2016 · 25 comments
Labels
status: completed That probably took longer than we said it would. type: feature request Shinies.

Comments

@ByteCommander
Member

ByteCommander commented Sep 30, 2016

Sometimes spam reports about smaller sites get less attention than they need, either because they're followed by many other reports or because only a few people are online at the time.

I would suggest that Smokey should keep a list of all reported posts that got positive feedback or no feedback at all yet and are not yet removed from the site. A command like !!/pending would then show a list of all those reports that still need more flags or feedback. Example:

"Skin care tips" by "SpamUser" on webmasters.stackexchange.com [MS] (reported 12 minutes ago, 1 tp, 0 naa, 0 fp, post score -3)
"Best essay writing service" by "Writer" on graphicdesign.stackexchange.com [MS] (reported 6 minutes ago, no feedback yet, post score -1)

This would be very helpful to make sure no reports slip through and to verify if anything needs more flags after a bunch of reports appeared without having to walk through the links manually.

Additionally, it might be useful to not only post this report on demand but also automatically for posts in the list that were reported more than e.g. 10 minutes ago.
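A minimal sketch of how the proposed `!!/pending` output could be assembled. The field names (`deleted`, `tp`, `naa`, `fp`, `reported_at`, etc.) are illustrative stand-ins, not metasmoke's actual schema:

```python
from datetime import datetime, timedelta

def format_pending(reports, now, min_age=timedelta(0)):
    """Build the '!!/pending' listing: reports that are still undeleted
    and have only true-positive feedback (or none at all).
    `reports` is a list of dicts with hypothetical field names."""
    lines = []
    for r in reports:
        if r["deleted"]:
            continue
        if r["fp"] > 0 or r["naa"] > 0:
            continue  # skip anything with false-positive / NAA feedback
        age = now - r["reported_at"]
        if age < min_age:
            continue  # for the automatic variant: only show older reports
        if r["tp"] == 0 and r["naa"] == 0 and r["fp"] == 0:
            feedback = "no feedback yet"
        else:
            feedback = f'{r["tp"]} tp, {r["naa"]} naa, {r["fp"]} fp'
        lines.append(
            f'"{r["title"]}" by "{r["author"]}" on {r["site"]} '
            f'(reported {int(age.total_seconds() // 60)} minutes ago, '
            f'{feedback}, post score {r["score"]})'
        )
    return lines
```

Passing `min_age=timedelta(minutes=10)` would cover the automatic variant suggested above, where only reports older than e.g. 10 minutes are posted.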

@ArtOfCode-
Member

Might make sense to do this as an MS API route, with Smokey just requesting that when the command gets hit.

@csnardi
Contributor

csnardi commented Sep 30, 2016

I don't really like this idea. Anything we make would be highly inaccurate -- deletion data is recorded a lot of the time, but not if a restart occurs, not if the post was deleted too quickly, and not if the post was deleted 20+ minutes after being reported. Also, if a post is not deleted after 10-15 minutes, it's probably fairly borderline. Most blatant spam will be deleted within 10-15 minutes; I'm not sure we really want to encourage spam flags on non-blatant spam.

@AWegnerGitHub
Member

I agree. I think this is going to generate a lot of noise and false positives.


@ByteCommander
Member Author

@hichris1234 I can't really follow your technical argument; what is the problem with keeping a list of reports and their feedback, and polling the deletion status of spam candidates?

And again, it's not only old reports that get overlooked; there's also rush hour, when Smokey reports a dozen posts in a short time.

I see your doubts, since it might encourage robo-flagging of false positives; maybe the query should then only be exposed on Metasmoke and not as a Smokey command. That would limit the number of people who see the results to a smaller circle of mostly experienced and responsible spam flaggers.

@angussidney
Member

This is a great idea; I've been thinking about something similar myself for a couple of weeks, but wasn't sure how it would be implemented. Though the command idea makes a lot of sense.

AFAIK, MS already keeps a record of post deletion data, so it shouldn't be too hard to implement

@csnardi
Contributor

csnardi commented Oct 1, 2016

@ByteCommander We do have deletion data to some extent. It's just not completely accurate -- it was never designed to be. I'm not too convinced that this is a problem we need to solve. I think having a command like this would be noisy -- and do note that only about 30 meaningful reports/day haven't been deleted after 5 minutes.

@Undo1
Member

Undo1 commented Oct 1, 2016

As hard data, we have metasmoke deletion log records for 910 of the last 1000 posts. It'd be fairly easy to make that >99%; we could duplicate the deletion websocket on metasmoke among other possibilities. I wouldn't let technical considerations kill this, we can address those.

As for social issues, I'm not yet qualified to comment. Need to look at it more.

@csnardi
Contributor

csnardi commented Oct 1, 2016

Deletion log records? What exactly does that mean? Because we just throw up our arms after 20 minutes and say "this post wasn't deleted": https://github.com/Charcoal-SE/SmokeDetector/blob/master/deletionwatcher.py#L62. Even though probably some of those posts were deleted.
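The 20-minute cutoff referenced above boils down to logic like the following sketch (the constant and function name are illustrative, not the actual `deletionwatcher.py` code):

```python
import time

GIVE_UP_AFTER = 20 * 60  # seconds; mirrors the 20-minute cutoff discussed above

def should_stop_watching(started_at, now=None):
    """After 20 minutes with no deletion event, the watcher records the
    post as 'not deleted' and moves on -- even though the post may well
    be deleted later, which is why the recorded data undercounts."""
    now = time.time() if now is None else now
    return now - started_at >= GIVE_UP_AFTER
```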

@Undo1
Member

Undo1 commented Oct 1, 2016

@hichris1234 Means 'there's a DeletionLog attached to the Post'. Of the last 1k true positives, 768 have a DeletionLog indicating it was deleted.

We could overcome that by running websockets on metasmoke, of course, and it wouldn't be too hard if we wanted to do it.

@ArtOfCode-
Member

Seems to me that we're slowly growing into a pattern of rejecting otherwise-good ideas because our current technical status doesn't support them. Come on, we're programmers, technical limitations really don't matter. If someone's had a good idea, let's evaluate it based on its potential benefit/risk to us, rather than "oh but we can't do that because X" - we can do it, someone just needs to write a bit of code.

And now down off my soapbox...

I think this idea is a good one. I went ahead and implemented the API route on metasmoke to get the data; that's deployed and (I think) working. I also had a go at the Smokey command, but it looks like I've misunderstood how commands work so that just fails. If we're doing it, someone else is going to need to do it.

I see the potential benefit as greater coverage for making sure spam gets deleted. I see the potential risk as there being a slightly increased potential for robo-flagging; I don't think that's a big problem because Smokey in itself is highly at risk of robo-flagging, but that's never been a problem for us.

@csnardi
Contributor

csnardi commented Oct 1, 2016

Technical limitations don't matter, sure, but we have to evaluate the cost of implementing an accurate system. Personally, I think that cost is more to us than the benefit of this command would be.

Currently, any way this is implemented will be inaccurate. We'd have to check every single post that we didn't record as deleted to see if it was deleted -- probably on the metasmoke side. Is it worth that cost?

And is this a problem we really need to be solving? I don't think so. Honestly, the easiest way to implement this is just to click the last 10 reports and see if they need more flags. That's what I do -- it's quick, easy, and doesn't generate much noise. And if something is long-lasting and obvious spam, well, you can post a message saying that. And, as I mentioned above, only about 30 reports aren't deleted within the first 5-7 minutes. This command would only be useful for a very limited number of reports -- and I can't see the usefulness of running it every so often rather than just spending 10 seconds clicking each report.

This isn't an issue of technical limitations. This is an issue of is it worth it and should we do it -- which for me, is personally a no.

@Undo1
Member

Undo1 commented Oct 1, 2016

@hichris1234 I'd disagree - it'd be cheap (from an AWS-resources point of view) on the metasmoke side, and I've been looking for something to work on in metasmoke anyway so my time is free.

I see this being useful during the peak spam time, which happens to be the time when I'm sleepy. Sleepy me doesn't like clicking on the last 10 reports just to get 404s on most of them. I also don't like doing that while I'm on a crazy-restrictive datacap like I am now.

Anyway, for me, it seems worth it. Laziness is good.

@ByteCommander
Member Author

Laziness is the programmer's greatest motivation. 😉

I agree with @Undo1 here, if a few dozen people have to click through the last 10 reports to get a 404 most of the time just to catch posts that potentially need more attention, I see lots of potential for laziness there.

@ArtOfCode- You said you already implemented something in Metasmoke - can we see something on the site already or is it just an API function without UI?

@ArtOfCode-
Member

@ByteCommander It was an API route; it's also rather broken at the moment (read: queries were taking 3 minutes to execute).

@Wrzlprmft

Today, a blatant spam post on Graphic Design survived for two hours due to being buried under reports from bigger sites, and this is not the first time I observed something like this. I usually note these cases when reviewing on the respective sites. This may happen more often than we are aware of.

@hichris1234: 30 reports per day is a lot given that we have about a hundred reports per day (at least according to Metasmoke).

To avoid nuking false positives, we can exclude everything that got any feedback other than spam/abusive, i.e., FP, NAA, vandalism, …

@NobodyNada
Member

To get around deletion data on Metasmoke being incomplete, we can always do another API request to check if the post has been deleted immediately before posting it to chat.
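A sketch of such a last-moment check, assuming the public Stack Exchange API: the unauthenticated `/posts/{ids}` route omits deleted posts from its `items` array, so an empty result can be read as "deleted" (or never existed). Error handling and API backoff are omitted:

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.stackexchange.com/2.3"

def deleted_from_response(api_json):
    """An empty 'items' list from /posts/{ids} means the post is not
    visible to anonymous API requests, i.e. deleted (or nonexistent)."""
    return len(api_json.get("items", [])) == 0

def is_post_deleted(post_id, site):
    """Live check, suitable for running right before posting to chat."""
    url = f"{API_BASE}/posts/{post_id}?{urlencode({'site': site})}"
    with urlopen(url, timeout=10) as resp:
        data = resp.read()
        if data[:2] == b"\x1f\x8b":  # SE API responses are gzip-compressed
            data = gzip.decompress(data)
        return deleted_from_response(json.loads(data))
```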

@ghost

ghost commented Jan 4, 2017

@Wrzlprmft my idea is a separate smoke room per site

@Undo1
Member

Undo1 commented Jan 4, 2017

@markyi370 We can already do that, and we do for some sites that request it.

@Wrzlprmft

@markyi370 How would this solve this issue?

@teward teward closed this as completed Jan 4, 2017
@teward teward reopened this Jan 4, 2017
@teward
Member

teward commented Jan 4, 2017

@markyi370 That doesn't solve the core problem. We already implement per-site notices, e.g. SOCVR and the Ask Ubuntu reports being CC'd to the Ask Ubuntu General Room, among other cases.

@ghost

ghost commented Jan 27, 2017

Which is better, using MS or having Smokey manually keep the database itself? I say that keeping it on Smokey is better because it is then local.

The only issue I see with keeping it on Smokey is that if we push something to the list, it will be lost when Smokey restarts.

Solution: Keep the list in a file, manually adding and removing entries as needed.

I'm no Python expert, but maybe something like this?

http://stackoverflow.com/q/1989251/6754053

Just my thoughts on the matter.

@NobodyNada
Member

@markyi370 The blacklist is already stored on Smokey and on GitHub. Are you suggesting also storing the reported posts on Smokey? I don't think that would be a good idea, because:

  • Metasmoke needs to access the data, and it needs to be able to do so quickly. If the database is moved to Smokey, MS will have to query SD every time a page is accessed or an API request is made. That will make page loading times much longer.

  • Smokey runs on lots of different machines. Right now, it's running on Aurora, but sometimes it runs on ArtOfCode's EC2, sometimes on Undo's EC2, sometimes on Undo's Pi, and probably several others that I'm missing. If we were to store the database on SD, we'd have to copy the database over to each of the machines, while with the current system we can just give each machine a MS token. Also, if one Smokey went down, we couldn't launch a backup without coordinating with the owner of the downed Smokey, since we'd need to copy the database.

  • Smokey rarely needs database access. When it does, for example to find non-deleted true positives, it can just query MS -- it doesn't have much of a time constraint (chat has a latency of several seconds already).

@ghost

ghost commented Feb 1, 2017

So we keep the database on MS, and fetch data on request.

@teward
Member

teward commented Feb 1, 2017

@markyi370 We already do this. I'm confused about what you're suggesting here -- the deletion watcher and such all sit on SmokeDetector...

@angussidney
Member

The 'Autoflagging Information and More' userscript takes care of this in a less noisy fashion, so this request is now redundant.

@angussidney angussidney added status: completed That probably took longer than we said it would. and removed status: planned 6-8 weeks. labels Feb 26, 2017