Recommend term deletion/modification once a specific TP/FP ratio has been reached #31

ArcticEcho · 2014-11-04T22:00:24Z

For example, whenever someone marks a report as FP or TP, Pham could quickly check the TP/FP ratio of each blacklist term found in it to determine whether the term is returning an unusually high number of FPs. If, say, the term's TP:FP ratio falls to 1:5, Pham would suggest that the term needs editing (to try to improve its "accuracy"). And if the ratio falls to 1:10 or worse, Pham would suggest term deletion.
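
A minimal sketch of that check, assuming each term carries simple TP and FP counters. The function name, the constants, and the `min_reports` guard are all hypothetical and not taken from Pham's code; the 1:5 and 1:10 figures are just the ones floated above.

```python
# Hypothetical thresholds: FPs per TP at which a recommendation fires.
REVIEW_FP_PER_TP = 5    # TP:FP of 1:5  -> recommend review
DELETE_FP_PER_TP = 10   # TP:FP of 1:10 -> recommend deletion


def recommend(tp: int, fp: int, min_reports: int = 10) -> str:
    """Return 'delete', 'review', or 'keep' for a term's TP/FP counts.

    min_reports is an assumed guard so that a term with only a handful
    of reports is not flagged on thin evidence.
    """
    if tp + fp < min_reports:
        return "keep"
    # Cross-multiply instead of dividing so that tp == 0 is handled.
    if fp >= DELETE_FP_PER_TP * tp:
        return "delete"
    if fp >= REVIEW_FP_PER_TP * tp:
        return "review"
    return "keep"
```

For instance, `recommend(2, 10)` gives `"review"` and `recommend(1, 10)` gives `"delete"`, while a term with a single FP and nothing else stays at `"keep"` until more reports come in.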

Any thoughts?

Unihedro · 2014-11-05T03:58:48Z

So we'll have a "Recommend Review: term (term here)" message?

ArcticEcho · 2014-11-05T10:19:28Z

I was kinda thinking of "The following term(s): (term(s) here), attract a high number of FPs. Term review is recommended." and "The following term(s): (term(s) here), attract a *very* high number of FPs. Term deletion is recommended." Come to think of it, ratios of 1 to 5 and 1 to 10 are probably a little too high. Perhaps 1 to 3 and 1 to 6 instead?

honnza · 2014-11-05T10:36:20Z

The correct ratios depend on the amount of spam during the measurement
period. A better solution would be based on sensitivity and specificity
(say, sens + spec < 100%?)
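
A rough sketch of the sens + spec test, using the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)). It assumes per-term counts of missed spam (FN) and correctly ignored posts (TN) are available, which this thread does not confirm; the function names are made up for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of spam posts that the term actually caught."""
    total = tp + fn
    return tp / total if total else 0.0


def specificity(tn: int, fp: int) -> float:
    """Share of non-spam posts that the term correctly left alone."""
    total = tn + fp
    return tn / total if total else 0.0


def worse_than_chance(tp: int, fp: int, tn: int, fn: int) -> bool:
    """True when sens + spec < 100%, i.e. the term does worse than
    flagging posts at random."""
    return sensitivity(tp, fn) + specificity(tn, fp) < 1.0
```

Neither rate depends on how much spam arrived during the measurement period, which is the advantage over a raw TP/FP ratio.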

ArcticEcho · 2014-11-05T11:04:46Z

Yes, using the sens/spec metrics would seem to be a much better solution. The exact ratios may need some adjusting, but shall we start with <25% = recommend deletion, <50% = recommend review?

Unihedro · 2014-11-05T11:09:03Z

<offtopic>

How come what you wrote in your comment was different from what I received through mail?

</offtopic>

honnza · 2014-11-05T11:13:03Z

spec + sens is always <= 200%

.. or should be, if the estimates were at least somewhat sensible, which
I'm not quite sure of, yet. Let's fix the stats first before implementing a
stat-based term review system

On Wed, Nov 5, 2014 at 12:04 PM, Sam notifications@github.com wrote:

Yes, using the sens/spec metrics would seem to be a much better solution.
The exact ratios may need some adjusting, but shall we start with <100% =
recommend deletion, <200% = recommend review?

ArcticEcho · 2014-11-05T11:16:17Z

@Vincentyification I edited my comment after discovering a bug with the sens/spec calculations.

ArcticEcho · 2014-11-05T11:16:49Z

@honnza Agreed.

ghost · 2014-12-31T12:06:46Z

Would it be a good idea to keep a list of removed terms? This way, if a user attempts to re-add a term on said list, Pham would reply with how it was removed and the term's stats (preserved from the time of deletion), then listen for a y/n command on whether to add it back.
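
Something along these lines, perhaps. This is only an in-memory sketch with made-up names (`RemovedTerm`, `RemovedTermLog`); how Pham actually stores terms or handles chat commands is not assumed here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class RemovedTerm:
    """Snapshot of a term's stats at the moment it was deleted."""
    term: str
    tp: int
    fp: int
    reason: str
    removed_at: datetime


class RemovedTermLog:
    """Hypothetical registry of deleted blacklist terms."""

    def __init__(self) -> None:
        self._removed: Dict[str, RemovedTerm] = {}

    def record_removal(self, term: str, tp: int, fp: int, reason: str) -> None:
        self._removed[term] = RemovedTerm(
            term, tp, fp, reason, datetime.now(timezone.utc)
        )

    def check_readd(self, term: str) -> Optional[str]:
        """Return a y/n prompt if the term was removed before, else None."""
        entry = self._removed.get(term)
        if entry is None:
            return None
        return (
            f"'{entry.term}' was removed ({entry.reason}); stats at deletion: "
            f"{entry.tp} TP / {entry.fp} FP. Add it back? (y/n)"
        )

    def confirm(self, term: str, answer: str) -> bool:
        """Return True if the user answered 'y'; the entry is then forgotten."""
        if answer.strip().lower() == "y":
            self._removed.pop(term, None)
            return True
        return False
```

A caller would invoke `check_readd` before adding any term, post the returned prompt if there is one, and pass the user's reply to `confirm`.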

ArcticEcho · 2014-12-31T12:14:31Z

Good idea. I'll add that too.

ArcticEcho added Feature Request Low Priority labels Nov 4, 2014

ArcticEcho self-assigned this Nov 4, 2014

ArcticEcho added the Deferred label Dec 6, 2014

ArcticEcho added the Pham label Jan 23, 2015
