Description: Graph-theoretic approach to the Github Contest
Homepage: http://healthyalgorithms.wordpress.com
Clone URL: git://github.com/aflaxman/ppr-github-contest.git
============
Introduction
============

I always wanted to enter the Netflix challenge, but I never got
serious about it, and now it's done.  Too bad.  But the GitHub Contest
is not done, and it has appealing open-source moral underpinnings.

Contest Description:
    http://contest.github.com/

How this entry is doing:
    http://contest.github.com/p/aflaxman/ppr-github-contest

Leader board:
    http://contest.github.com/leaderboard

========
Approach
========

I don't have much time to devote to this, but I do have some, and
I have already started a project that can quickly be repurposed: a
personalized PageRank (PPR) approach.  Originally, it was an attempt
to predict cause of death from verbal autopsy data.  In that
setting, I found that PPR predictions did slightly worse than random
forests, but definitely better than they had any right to.  In the
code repository recommendation setting, we shall see.
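
For readers unfamiliar with PPR, here is a minimal sketch of the idea,
not the code in ppr.py.  It assumes the data is a list of (user, repo)
pairs treated as an undirected graph, and it uses plain power
iteration.  The function name, the "u:"/"r:" node labels, and the
alpha and iterations parameters are all illustrative assumptions.

```python
from collections import defaultdict


def personalized_pagerank(edges, seed, alpha=0.85, iterations=50):
    """Approximate personalized PageRank scores for `seed`.

    edges: iterable of (u, v) pairs, treated as undirected.
    alpha: probability of following an edge; with probability
           1 - alpha the random walk restarts at `seed`.
    """
    # Build an adjacency map for the undirected graph.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Start with all probability mass on the seed node.
    scores = {seed: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        nxt[seed] += 1.0 - alpha  # restart mass
        for node, mass in scores.items():
            nbrs = neighbors[node]
            if not nbrs:
                # Isolated node: send its mass back to the seed.
                nxt[seed] += alpha * mass
                continue
            share = alpha * mass / len(nbrs)
            for n in nbrs:
                nxt[n] += share
        scores = nxt
    return dict(scores)
```

Calling personalized_pagerank(edges, "u:1") returns a dict that maps
each reachable node to its score; the scores sum to 1, and nodes close
to the seed in the graph receive more mass than distant ones.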


=======
Roadmap
=======

Since I don't have too much time to devote to this, I'm going to start
with a small experiment to see if the calculation is feasible with the
computer resources I have available right now.  If it's not, I can
give up.

* Step 1:  Can I calculate PPR on this 0.5M-edge graph?  Yes, it takes 3 minutes.
* Step 2:  Is this approach any good?  Make a prediction of new repos from PPR.
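
Step 2 could be sketched roughly as follows: given PPR scores for the
repo nodes, suggest the highest-scoring repos the user does not
already watch.  This is an illustration under assumptions, not the
logic in contest.py; the names recommend_repos, repo_scores, and
watched are hypothetical, and the default of k=10 guesses is an
assumed limit.

```python
import heapq


def recommend_repos(repo_scores, watched, k=10):
    """Return up to k repo IDs with the highest scores,
    skipping repos the user already watches.

    repo_scores: dict mapping repo ID -> PPR score for one user.
    watched: set of repo IDs the user already watches.
    """
    unseen = [(repo, score) for repo, score in repo_scores.items()
              if repo not in watched]
    # nlargest sorts by score, highest first.
    best = heapq.nlargest(k, unseen, key=lambda item: item[1])
    return [repo for repo, _ in best]
```

Filtering out watched repos matters because the user's own repos sit
right next to the user in the graph and would otherwise dominate the
ranking.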

=======
License
=======

The GitHub Contest requires the source code for all entries to be
released under an OSI-compliant license.  I think the Affero GPL fits
best with the stated goals of the contest, so that is what I plan to use:
http://www.opensource.org/licenses/agpl-v3.html

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Affero General Public License for more details.

    You should have received a copy of the GNU Affero General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    Copyright 2009 Abraham Flaxman