Slightly better than random #4

mdagost · 2013-12-13T16:06:35Z

It doesn't look like there's a ton of information in the features in this dataset. I'd be interested to know how Grouper's best algorithm does with these features. I tried lots of different things (feature engineering, various linear models, SVMs, random forests, etc.), and most were around 56% accurate on the cross-validation sets. The best random forest model was about 57% accurate and finds, probably unsurprisingly, that the Facebook activity variables are the most predictive of whether users become Facebook friends.
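To make "slightly better than random" concrete, here is a minimal sketch (not the code from this PR) of k-fold cross-validated accuracy compared against a majority-class baseline. It uses only the standard library and synthetic data in which one feature carries a weak signal; the shift size, the helper names (`kfold_indices`, `cross_val_accuracy`, `fit_majority`, `fit_threshold`, `make_weak_signal_data`), and the one-feature threshold model, which stands in for the linear models, SVMs, and random forests mentioned above, are all assumptions for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = List[float]
# A "fit" function takes training rows and labels and returns a predict(row) -> label function.
Fit = Callable[[Sequence[Row], Sequence[int]], Callable[[Row], int]]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit: Fit, X: Sequence[Row], y: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """Mean accuracy on each held-out fold, training on the other k-1 folds."""
    scores = []
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(X: Sequence[Row], y: Sequence[int]) -> Callable[[Row], int]:
    """Baseline: always predict the most common label seen in training."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def fit_threshold(X: Sequence[Row], y: Sequence[int]) -> Callable[[Row], int]:
    """Toy model (hypothetical): split feature 0 at the midpoint between the two class means."""
    ones = [row[0] for row, label in zip(X, y) if label == 1]
    zeros = [row[0] for row, label in zip(X, y) if label == 0]
    if not ones or not zeros:
        return fit_majority(X, y)
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    cut = (m1 + m0) / 2
    if m1 >= m0:
        return lambda row: int(row[0] > cut)
    return lambda row: int(row[0] <= cut)


def make_weak_signal_data(n: int = 4000, shift: float = 0.3, seed: int = 1):
    """Synthetic data: feature 0 is shifted slightly for label 1, feature 1 is pure noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    X = [[label * shift + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_weak_signal_data()
    print("majority baseline:", round(cross_val_accuracy(fit_majority, X, y), 3))
    print("threshold model:  ", round(cross_val_accuracy(fit_threshold, X, y), 3))
```

Comparing against the majority-class baseline, rather than against 50%, is what tells you whether a few points of accuracy reflect real signal or just class imbalance.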

My code includes an IPython notebook for exploratory plotting, model building, and variable importance. It uses grid search and cross-validation to find the best type of model. You can run the regular Python code to train that model:

/bin/bash cleanDataSets.sh
virtualenv .
source bin/activate
pip install -r requirements.txt
python grouper_model.py

The final predictions are written to test_data_withpreds.csv.
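As a hedged illustration of the grid search plus cross-validation idea described above (not the notebook's actual code or libraries), the sketch below scores every combination in a parameter grid with k-fold cross-validation, using the same folds for each candidate, and keeps the best. The model is a hand-written k-nearest-neighbors classifier on hypothetical toy data; `make_data`, `knn_predict`, `cv_accuracy`, `grid_search`, and the `n_neighbors` grid are illustrative names and values. It needs Python 3.8+ for `math.dist`.

```python
import itertools
import math
import random
from collections import Counter


def make_data(n=400, shift=0.6, seed=7):
    """Hypothetical toy data: feature 0 depends weakly on the label, feature 1 is noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    X = [(label * shift + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for label in y]
    return X, y


def knn_predict(train_X, train_y, row, n_neighbors):
    """Majority vote among the n_neighbors training rows closest to `row` (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], row))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(params, X, y, k=5, seed=0):
    """Mean held-out accuracy of k-NN with `params`; the fixed seed gives every candidate the same folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], **params) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def grid_search(param_grid, X, y, k=5):
    """Try every combination in param_grid; return (best_params, best_score, all_results)."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, cv_accuracy(params, X, y, k)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


if __name__ == "__main__":
    X, y = make_data()
    best, score, results = grid_search({"n_neighbors": [1, 5, 25, 75]}, X, y)
    for params, acc in results:
        print(params, round(acc, 3))
    print("best:", best, round(score, 3))
```

Picking the maximum of several cross-validated scores is itself slightly optimistic, which is one reason to check the chosen model on data that played no part in the search.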

Michelangelo D'Agostino added 2 commits December 9, 2013 17:04

An initial ipython notebook.

9ec78e7

Committing all the final files.

fab3713
