A scraper and classifier for snarky SO comments.
Snark by the hour

Inspired by Davy M.'s post on Stack Overflow Meta, written in reaction to SO's June Update, I've decided to make the Snark by the hour feed happen. It seems like a fun thing to build, and I really couldn't agree more with Davy's words:

My idea is a Snark by the Hour live feed so I can enjoy the sarcasm and snark of my favorite Stack Overflow users in real time, but that's just me.

If you happen to have a snarkiness classifier (with a training set) lying around, feel free to let me know in an issue, or, better yet, throw me a pull request. (Yes, kind people of SO, I am hinting at you guys here.)


  • Gather some training data
    • Code data collector
      • Hopefully improve this if this API issue ever gets fixed, or find a workaround, which is not too hard for comments that have been added since the collector's last run.
    • Extract possible features
      • Determine sarcasm. (Waiting for my pull request to AniSkywalker/SarcasmDetection to be merged.)
    • Clean up code and make separate modules for collection and feature extraction
    • Grab data from API
    • Manually rate a metric shit ton of comments.
  • Come up with a snarkiness classifier
    • Visualize data.
    • Select relevant features
    • Determine appropriate classifier
  • Make a web interface to show this potentially hilarious data
  • Write a classifier for questions that are likely to yield snarky comments
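The roadmap notes that grabbing only the comments added since the collector's last run is the easy case. One way to do that, sketched here under a few assumptions, is to keep a checkpoint timestamp, pass it as the Stack Exchange API's `fromdate` parameter on the `/comments` route, and deduplicate by `comment_id`. The helper names (`build_comments_url`, `fetch_page`, `merge_new_comments`), the use of API v2.3, and the built-in `withbody` filter are choices made for this sketch, not the collector's actual code.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Stack Exchange API v2.3 comments route (version chosen for this sketch).
API_ROOT = "https://api.stackexchange.com/2.3/comments"


def build_comments_url(fromdate: int, page: int = 1) -> str:
    """Return a URL listing Stack Overflow comments created from `fromdate` on."""
    params = {
        "site": "stackoverflow",
        "fromdate": fromdate,   # Unix timestamp in seconds
        "sort": "creation",
        "order": "asc",         # oldest first, so pages follow creation order
        "page": page,
        "pagesize": 100,        # largest page size the API accepts
        "filter": "withbody",   # built-in filter that adds the comment body
    }
    return f"{API_ROOT}?{urlencode(params)}"


def fetch_page(url: str) -> dict:
    """Download one page of results and decode the JSON wrapper object."""
    with urlopen(url) as resp:
        raw = resp.read()
    # The API compresses its responses; gzip data starts with bytes 1f 8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw)


def merge_new_comments(items: list, seen_ids: set, last_run: int):
    """Drop comments seen on an earlier run and advance the checkpoint.

    Reusing the newest creation_date as the next checkpoint may re-fetch
    comments from that same second; `seen_ids` filters those duplicates.
    """
    fresh = [c for c in items if c["comment_id"] not in seen_ids]
    seen_ids.update(c["comment_id"] for c in fresh)
    newest = max((c["creation_date"] for c in items), default=last_run)
    return fresh, max(newest, last_run)
```

A run would request increasing `page` values while the response's `has_more` field is true, feed each page's `items` to `merge_new_comments`, and persist the checkpoint and seen IDs between runs. The API may also return a `backoff` field asking clients to wait before the next request, which a real collector should honor.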
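For the "Determine appropriate classifier" step, a bag-of-words multinomial naive Bayes is a common first baseline to measure other models against. The sketch below is exactly that swapped-in baseline, not a classifier this project has chosen. The `snark`/`neutral` labels, the `NaiveBayes` and `tokenize` names, and the four training strings are made-up placeholders standing in for the manually rated comments.

```python
import math
import re
from collections import Counter, defaultdict

# Words (apostrophes allowed) plus ? and ! as separate tokens, on the guess
# that punctuation carries some of the tone.
TOKEN_RE = re.compile(r"[a-z']+|[?!]")


def tokenize(text: str) -> list:
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_score(self, text: str, label: str) -> float:
        """Log prior plus summed log likelihoods of the known tokens."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # skip words never seen in training
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


# Hypothetical toy data, only to show the call pattern.
texts = [
    "did you even try googling this?",
    "have you tried reading the docs?",
    "thanks, this fixed my issue",
    "great answer, works for me",
]
labels = ["snark", "snark", "neutral", "neutral"]
model = NaiveBayes().fit(texts, labels)
```

With real rated comments, this baseline and any other candidate would be compared on held-out comments rather than on the training set.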
