Description: A simple machine learned tagger trained on BOSS web / delicious data
Homepage: http://zooie.wordpress.com
Clone URL: git://github.com/zooie/tagger.git
zooie (author)
Fri Oct 09 12:13:52 -0700 2009
commit  1a476c505933b53728387d17c63a251dd42393c4
tree    1b669b2a3fd9da059096cb9cd784ef86bdc59496
parent  519026a29582a1c8f2154b685b57c32d7c86cc56
tagger/

name                      age                             message
README                    Fri Oct 09 12:13:52 -0700 2009  crawl is optional disclaimer [zooie]
autosvm.py                Fri Oct 09 09:05:25 -0700 2009  All code now references featurize for feature g... [zooie]
classify.py               Fri Oct 09 11:51:32 -0700 2009  Caveat doc included [zooie]
conf.py                   Fri Oct 09 09:05:25 -0700 2009  All code now references featurize for feature g... [zooie]
crawl_delicious.py        Fri Oct 09 11:32:25 -0700 2009  Fresh commit [zooie]
featurize.py              Fri Oct 09 09:05:25 -0700 2009  All code now references featurize for feature g... [zooie]
gen_training_test_set.py  Fri Oct 09 09:05:25 -0700 2009  All code now references featurize for feature g... [zooie]
libsvm-2.89.tar.gz        Wed Oct 07 20:00:46 -0700 2009  Initial commit [zooie]
tags.txt                  Wed Oct 07 20:00:46 -0700 2009  Initial commit [zooie]
test_data.txt             Fri Oct 09 11:32:25 -0700 2009  Fresh commit [zooie]
training_data.txt         Fri Oct 09 11:32:25 -0700 2009  Fresh commit [zooie]
vector_data.cpickle       Fri Oct 09 11:32:25 -0700 2009  Fresh commit [zooie]

README
@author: Vik Singh (viksi@yahoo-inc.com)

A simple BOSS example for Yahoo! Hack Day NYC
A machine learned tagger trained on BOSS web / delicious data

Read the post below to learn more. In particular, check out its caveats section if
you plan to use this code for more practical purposes.

http://zooie.wordpress.com/2009/10/09/build-an-automatic-tagger-in-200-lines-with-boss/


# Install libsvm

tar -xzvf libsvm-2.89.tar.gz
cd libsvm-2.89
make

cd ..
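
The svm-train and svm-predict tools built above read examples in libsvm's sparse text
format, one example per line: a label followed by index:value pairs in ascending index
order. Assuming the training and test files fed to the learner use this format (check
autosvm.py and featurize.py to confirm), a minimal Python sketch of writing it could look
like this. to_libsvm_line and write_libsvm_file are hypothetical helpers, not part of
this repo.

```python
# Hypothetical helpers (not from this repo) that emit libsvm's sparse text format.


def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    `features` maps positive integer feature indices to numeric values.
    Zero-valued features are omitted (sparse format) and indices are
    written in ascending order, as libsvm expects.
    """
    parts = [str(label)]
    for index in sorted(features):
        if index < 1:
            raise ValueError("libsvm feature indices start at 1")
        value = features[index]
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


def write_libsvm_file(path, examples):
    """Write (label, features) pairs to `path`, one example per line."""
    with open(path, "w") as out:
        for label, features in examples:
            out.write(to_libsvm_line(label, features) + "\n")


if __name__ == "__main__":
    # Made-up bag-of-words counts for two documents.
    examples = [(+1, {3: 2, 1: 1}), (-1, {2: 1, 5: 0.5})]
    for label, feats in examples:
        print(to_libsvm_line(label, feats))  # "1 1:1 3:2", then "-1 2:1 5:0.5"
```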

# Optional: Crawl fresh delicious data via BOSS (a previous crawl is already included)

python crawl_delicious.py

# Generate binary training and test sets from two tags (pick them from tags.txt)

python gen_training_test_set.py microsoft google
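
Conceptually, a binary set pairs each document with +1 or -1 depending on which of the two
tags it carries. The sketch below is a hypothetical illustration, not the logic of
gen_training_test_set.py: it assumes documents are already grouped by tag (docs_by_tag,
an assumed input), drops documents tagged with both, and holds out an assumed fraction
(test_fraction, default 0.2) as the test set.

```python
import random

# Hypothetical sketch (not this repo's code) of building a two-tag binary task.


def make_binary_sets(docs_by_tag, positive_tag, negative_tag,
                     test_fraction=0.2, seed=0):
    """Return (train, test) lists of (label, document) pairs.

    Documents of `positive_tag` get label +1 and those of `negative_tag`
    get label -1. Documents carrying both tags are dropped, since they
    would otherwise appear with conflicting labels.
    """
    pos = set(docs_by_tag.get(positive_tag, []))
    neg = set(docs_by_tag.get(negative_tag, []))
    overlap = pos & neg
    examples = [(+1, d) for d in sorted(pos - overlap)]
    examples += [(-1, d) for d in sorted(neg - overlap)]

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(examples)
    n_test = int(round(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]


if __name__ == "__main__":
    # Made-up URLs grouped by tag; "c" carries both tags and is dropped.
    docs = {"microsoft": ["a", "b", "c"], "google": ["c", "d", "e", "f", "g"]}
    train, test = make_binary_sets(docs, "microsoft", "google")
    print(len(train), len(test))  # 5 1
```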

# Learn from the resulting training_data.txt and predict on test_data.txt

python autosvm.py training_data.txt test_data.txt

# Prints the learner's accuracy and saves the model and prediction files in a timestamped folder
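
Accuracy here is the fraction of test examples whose predicted label equals the true one.
A minimal sketch, assuming the test file is in libsvm format and the prediction file holds
one label per line (svm-predict's plain output, without probability estimates);
read_labels and accuracy are hypothetical helpers, not functions from autosvm.py.

```python
# Hypothetical helpers (not from autosvm.py) for scoring a prediction file.


def read_labels(lines):
    """Extract the leading label from each non-empty line.

    Works for libsvm-format data lines ('<label> <index>:<value> ...') and
    for prediction files that hold one predicted label per line.
    """
    labels = []
    for line in lines:
        fields = line.split()
        if fields:
            labels.append(float(fields[0]))
    return labels


def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    if not true_labels:
        raise ValueError("no labels to compare")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / float(len(true_labels))


if __name__ == "__main__":
    # In practice these lines would come from test_data.txt and a prediction file.
    test_lines = ["1 1:1 3:2", "-1 2:1", "1 4:0.5", "-1 1:3"]
    prediction_lines = ["1", "-1", "-1", "-1"]
    acc = accuracy(read_labels(test_lines), read_labels(prediction_lines))
    print("Accuracy = %.1f%%" % (100 * acc))  # Accuracy = 75.0%
```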