Fetching contributors…
Cannot retrieve contributors at this time
100 lines (74 sloc) 4.37 KB
Copyright (c) 2007-2010 The Kaphan Foundation
This file contains instructions for running and configuring the Winnow content recommendation engine.
For information on building and installing the engine see the INSTALL file.
Please see the pages of the wiki our github repository http://github.com/WinnowTag/winnow/wiki
for detailed explanations of Winnow's architecture and operation.
Creating an item cache
Before running Winnow, you'll need to create the persistent item cache. This is a directory
"item_cache" containing the three sqlite3 databases atom.db, tokens.db and catalog.db. See the
"Item Cache" document page in the github repository wiki for more information.
To initialize the database schemas, use sqlite3 to execute the SQL shown below in each
corresponding newly created sqlite3 database file.
After you've used the classifier on a sufficient number of items, you'll need to set up the random
background. The classifier accuracy will be poor until you do this. See the section "Setting up
random background" at the end of this file.
TODO: Create a script that does this.
TODO: Provide a reasonably small item_cache already populated and with random background
initialized that can be used for testing and initial development.
CREATE TABLE entry_atom (id integer not null primary key, atom BLOB);
CREATE TABLE entry_tokens (id integer not null primary key, tokens BLOB);
CREATE TABLE entries (id integer NOT NULL PRIMARY KEY, full_id text, updated real, created_at real, last_used_at real);
CREATE TABLE "random_backgrounds" (
"entry_id" integer NOT NULL PRIMARY KEY,
constraint "random_backgrounds_entry_id" foreign key ("entry_id")
references "entries" ("id")
CREATE TABLE "tokens" (
"id" integer NOT NULL PRIMARY KEY,
"token" text NOT NULL UNIQUE
CREATE UNIQUE INDEX entries_full_id on entries(full_id);
CREATE INDEX entries_last_used_at on entries(last_used_at);
CREATE INDEX entries_updated on entries(updated);
CREATE TRIGGER random_backgrounds_entry_id
SELECT RAISE(ROLLBACK, 'delete on table "entries" violates foreign key constraint "random_backgrounds_entry_id"')
WHERE (SELECT entry_id FROM random_backgrounds WHERE entry_id = OLD.id) IS NOT NULL;
Running the Classifier
The classifier can be run using the "classifier" executable. If you performed the "make install"
of the build instructions this is likely in your PATH.
Run "classifier -h" to see a list of supported arguments.
A typical usage will be to start the classifier on a specified port and item cache directory
and a positive cutoff threshold of 0.9. You probably also want a log file to be generated. This
will be enough for most development cases.
This can be done using the following command:
classifier --db <item_cache_dir> --port 8008 -t 0.9 --log-file classifier.log
You can then tail -f classifier.log to see the log messages the classifier produces. Once a message
appears in the log saying the classifier has completed initialization you can then access it's http
server on port 8008. Initialization can sometimes take a few minutes depending on the size of the
item cache and whether the file system has cached item cache file access.
Once things are up and running Winnow will be able to talk to the classifier using the standard
classifier UI elements. You can also do a simple test by entering
http://localhost:8009/classifier.xml into a web browser to make sure everything is working.
Shutting it down
The classifier is shutdown by sending it a SIGTERM signal. This can be done using CTRL-C or kill <pid>.
Setting up the random background
After you have classified more than 1000-5000 items, set up the random background. This is done by
executing the following SQL in the item.db sqlite3 database:
...where N is the size that you want. We recommend between 1000 and 5000 items depending upon the
size of the corpus that you classify. For winnowtag.org, which works with over 3/4 million items,
we use a random background of 5000 items.
Until you set up the random background, Winnow's classification accuracy will be poor.