Introducing project Ghamhilator! #57

ArcticEcho · 2015-01-22T22:51:16Z

Today I started work on an NLP-based version of Pham called Gham (as you can probably guess, the bot will run under the account Gham). Gham's ultimate goal is to use NLP (primarily a POS tagger) to build "models" (linguistic patterns) of spam, offensive, and low-quality posts, which can later be used to identify such posts. I aim for the entire process to be automated, but he will also accept false-positive/true-positive (FP/TP) feedback.

The exact inner workings of Gham have not yet been "set in stone", so feel free to put forward any ideas/suggestions.
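
As a rough illustration of the "models" idea above (a minimal sketch, not Gham's actual code): the snippet below tags each word, collects the POS-tag trigrams seen in known spam, and scores a new post by how many of its trigrams match. The toy lexicon, the tag names, and the function names `tag`, `build_model`, and `match_score` are all assumptions made for this example; a real POS tagger would replace the lookup table.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {
    "buy": "VB", "get": "VB", "call": "VB", "do": "VB",
    "cheap": "JJ", "best": "JJ", "free": "JJ",
    "pills": "NN", "watches": "NN", "loans": "NN", "question": "NN",
    "now": "RB", "today": "RB",
    "how": "WRB", "i": "PRP", "the": "DT", "a": "DT",
}

def tag(text):
    """Return one POS tag per word; unknown words are tagged 'UNK'."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [TOY_LEXICON.get(w, "UNK") for w in words if w]

def tag_ngrams(tags, n=3):
    """Return all consecutive runs of n tags as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def build_model(posts, n=3):
    """Count the tag n-grams that occur in a set of known-bad posts."""
    model = Counter()
    for post in posts:
        model.update(tag_ngrams(tag(post), n))
    return model

def match_score(model, text, n=3):
    """Fraction of the text's tag n-grams that also occur in the model."""
    grams = tag_ngrams(tag(text), n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in model) / len(grams)

spam_model = build_model(["Buy cheap pills now!", "Get free loans today!"])
spam_like = match_score(spam_model, "Call best watches today")
benign = match_score(spam_model, "How do I call the question")
```

Because the model stores tag sequences rather than words, a post that swaps in different vocabulary but keeps the same verb-adjective-noun-adverb shape still matches.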

Unihedro · 2015-01-23T09:59:37Z

Wow, this is a spectacular announcement! Since spam is frequently "decorated" to make it harder to catch with specialized terms, this will be a massive breakthrough, as it should catch low-quality posts more effectively. One thing, though: how are you going to strip out MathJax code and other code blocks, like the chess widgets on Chess.SE? And (probably) more importantly, how will you handle foreign-language sites?

ArcticEcho · 2015-01-23T10:19:37Z

Thanks! Well, this is where I'll be adding another project, called Yham, which focuses only on fetching/parsing posts in real time and broadcasting the data locally to Gham and Pham (effectively splitting what Pham already does into a separate project).

As for MathJax/chess widgets: by default, Yham fetches the post's HTML, which is useful for Pham but not so much for Gham (he can only analyse English words, so foreign-language sites also fall outside Gham's scope). Having said that, I may be able to get my hands on a few foreign-language POS tagger models, although I doubt the extra effort of adding even more models for sites that don't actually attract many "bad" posts will pay off.

honnza · 2015-01-23T10:21:26Z

Yham sounds terrible. I'd suggest "Yam".

ArcticEcho · 2015-01-23T10:24:45Z

[status-accepted]

thomas-daniels · 2015-01-23T15:31:28Z

That sounds cool!

ArcticEcho · 2015-07-12T22:36:10Z

In light of further discussion and testing, POS tagging doesn't currently appear to be the most effective way to classify low-quality (LQ) posts. As such, all POS tagging functionality will now be replaced with a weighted cue-based classification algorithm.
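
For readers unfamiliar with the term, here is a minimal sketch of what a weighted cue-based classifier can look like. It is not the algorithm used in this project: the cues, weights, threshold, and the names `CUES`, `score`, and `is_low_quality` are all hypothetical, chosen only to show the general shape (each matching cue contributes its weight, and the total is compared against a threshold).

```python
import re

# Hypothetical cues and weights, for illustration only (assumption).
# Positive weights push towards "low quality", negative weights away from it.
CUES = [
    (re.compile(r"\bfree\b", re.IGNORECASE), 0.4),
    (re.compile(r"\bclick here\b", re.IGNORECASE), 0.6),
    (re.compile(r"https?://", re.IGNORECASE), 0.3),
    (re.compile(r"\bthanks in advance\b", re.IGNORECASE), -0.2),
]

def score(text, cues=CUES):
    """Sum the weights of every cue whose pattern occurs in the text."""
    return sum(weight for pattern, weight in cues if pattern.search(text))

def is_low_quality(text, threshold=0.8, cues=CUES):
    """Flag the text when its total cue score reaches the threshold."""
    return score(text, cues) >= threshold

flagged = is_low_quality("Click here for FREE stuff")
not_flagged = is_low_quality("How do I sort a list? Thanks in advance")
```

In a setup like this, FP/TP feedback could be used to nudge individual weights up or down, though how that would work here is not specified in this thread.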

ArcticEcho · 2015-11-27T21:43:22Z

All NLP-based classification is now being moved to Pham. For now, we'll leave Gham to rest.

ArcticEcho added the Low Priority and Discussion labels on Jan 22, 2015

ArcticEcho self-assigned this on Jan 22, 2015

ArcticEcho added the Yam and Gham labels on Jan 23, 2015

ArcticEcho added the in progress label on May 3, 2015

ArcticEcho removed the in progress and Low Priority labels on Jul 12, 2015

ArcticEcho closed this as completed Nov 27, 2015
