webmining
We use NLTK and scikit-learn.
The project uses a few basic routines:
Regex tokenizer: cuts the tweets into words.
TF-IDF feature extractor: turns the tokenized tweets into basic TF-IDF features.
Additional features: we added some words that we thought might be useful, like "rainy", "windy"...
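The three routines above can be sketched roughly as follows. This is a minimal illustration, not the code in submission/: the example tweets, the token pattern, and the helper names `tokenize` and `keyword_flags` are assumptions, and the standard library `re` module stands in for NLTK's `RegexpTokenizer` to keep the sketch short. Only "rainy" and "windy" come from this README; the rest of the keyword list is not reproduced here.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example tweets; the real data set is not shown in this README.
tweets = [
    "Rainy and windy in Seattle today",
    "Sunny skies, no wind at all",
    "So windy my umbrella broke",
]

# Assumed pattern: a run of letters, digits, or apostrophes is one word.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    """Cut a tweet into lowercase word tokens with a regular expression."""
    return TOKEN_PATTERN.findall(text.lower())


# Hand-picked keywords; only these two are named in the README.
KEYWORDS = ["rainy", "windy"]


def keyword_flags(text):
    """Return one 0/1 flag per keyword, set when the keyword occurs in the tweet."""
    tokens = set(tokenize(text))
    return [1.0 if word in tokens else 0.0 for word in KEYWORDS]


# TF-IDF over the regex tokens (token_pattern=None because a tokenizer is supplied).
vectorizer = TfidfVectorizer(tokenizer=tokenize, token_pattern=None)
tfidf = vectorizer.fit_transform(tweets)

# Append the keyword flags as extra columns next to the TF-IDF columns.
extra = csr_matrix(np.array([keyword_flags(t) for t in tweets]))
features = hstack([tfidf, extra]).tocsr()

print(features.shape)
```

Stacking the flags as separate columns keeps them visible to the regressor even if the same words get a low TF-IDF weight.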
It's a regression problem, so we used several regression methods, including Ridge regression, SVM regression (RBF and linear kernels), and random forest.
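A rough sketch of how these regressors could be compared with scikit-learn is shown here. The data is synthetic and all hyperparameters (`alpha`, `n_estimators`, the train/test split) are placeholder assumptions, not the settings used for the submission.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption): 200 samples, 10 features,
# and a noisy linear target. Replace with the real feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = X @ rng.random(10) + 0.05 * rng.standard_normal(200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The regression methods named in the README, with placeholder settings.
models = {
    "ridge": Ridge(alpha=1.0),
    "svr_rbf": SVR(kernel="rbf"),
    "svr_linear": SVR(kernel="linear"),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its root mean squared error on the held-out split.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))

for name, score in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

All four estimators share the same `fit`/`predict` interface, so swapping in the TF-IDF feature matrix only changes how `X` and `y` are built.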
See submission/