Introducing project Ghamhilator! #57
Comments
Wow, this is a spectacular announcement! Since spam is frequently "decorated" to make it harder to catch with specialized terms, this should be a massive breakthrough, as it will capture low-quality posts more effectively. One thing, though: how are you going to scrape MathJax code and other code blocks, like the chess widgets on Chess.SE? And (probably) more importantly, how will you handle foreign-language sites? |
Thanks! Well, this is where I'll be adding another project which focuses only on fetching/parsing posts (in real time) and locally broadcasting the data to Gham and Pham, effectively splitting what Pham already does into a separate project called Yham. As for MathJax/chess widgets, by default Yham fetches the post's HTML, which is useful for Pham but not so much for Gham, as he can only analyse English words (so foreign-language sites will also fall outside Gham's scope). Having said that, I may be able to get my hands on a few foreign-language POS tagger models, although I doubt the extra effort of adding even more models for sites that don't actually attract many "bad" posts will pay off. |
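To make the HTML point concrete, here is a rough sketch of how fetched post HTML could be reduced to plain English text before any language analysis. It is not Yham's actual code: the class `PlainTextExtractor`, the function `post_html_to_text`, and the assumption that MathJax is delimited by `$...$` or `$$...$$` are all hypothetical, and chess widgets are not handled because their markup isn't described here.

```python
import re
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects text from post HTML, skipping <pre> and <code> content."""

    SKIPPED_TAGS = {"pre", "code"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


# Assumption: display maths uses $$...$$ and inline maths uses $...$.
MATHJAX_PATTERN = re.compile(r"\$\$.+?\$\$|\$.+?\$", re.DOTALL)


def post_html_to_text(html):
    """Return the post's prose with code blocks and MathJax removed."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    without_math = MATHJAX_PATTERN.sub(" ", parser.text())
    # Collapse the whitespace left behind by the removed spans.
    return " ".join(without_math.split())
```

Something like `post_html_to_text(post_body)` would then hand Gham only the prose, while Pham could keep using the raw HTML.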
Yham sounds terrible. I'd suggest "Yam". On Fri, Jan 23, 2015 at 11:19 AM, Sam notifications@github.com wrote:
|
|
That sounds cool! |
In light of further discussion and testing, POS tagging doesn't currently appear to be the most effective way to classify low-quality (LQ) posts. As such, all POS tagging functionality will now be replaced with a weighted cue-based classification algorithm. |
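For readers unfamiliar with the idea, a weighted cue-based classifier can be sketched roughly as follows. This is only an illustration of the general technique, not the algorithm actually used here: the cues, weights, threshold, and the names `score_post` and `is_low_quality` are all made up for the example.

```python
import re

# Hypothetical cues: each pattern carries a weight reflecting how strongly
# it suggests a low-quality post. Real cues and weights would differ.
CUES = [
    (re.compile(r"\bbuy now\b", re.IGNORECASE), 0.6),
    (re.compile(r"\bfree\b", re.IGNORECASE), 0.3),
    (re.compile(r"https?://", re.IGNORECASE), 0.2),
]

# Hypothetical cut-off: posts scoring at or above this are flagged.
THRESHOLD = 0.5


def score_post(text, cues=CUES):
    """Sum the weights of every cue that appears at least once in the text."""
    return sum(weight for pattern, weight in cues if pattern.search(text))


def is_low_quality(text, threshold=THRESHOLD):
    """Flag the post when its total cue score reaches the threshold."""
    return score_post(text) >= threshold
```

With these example values, a post containing both "buy now" and "free" would be flagged, whereas a lone link would not. In such a scheme the weights could presumably be adjusted from TP/FP feedback, though that part is not shown.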
All NLP-based classification is now being moved to Pham. For now, we'll leave Gham to rest. |
Today I started work on an NLP-based version of Pham, called Gham (as you can probably guess, this bot will run under the account Gham). Gham's ultimate goal is to use NLP (primarily a POS tagger) to build "models" (linguistic patterns) of spam, offensive, and low-quality posts, which can later be used to identify such posts. I aim for this entire process to be automated, but he will accept FP/TP feedback.
The exact inner workings of Gham have not yet been "set in stone", so feel free to put forward any ideas/suggestions.