============
Introduction
============
I always wanted to enter the Netflix challenge, but I never got
serious about it, and now it's done. Too bad. But the GitHub Contest
is not done, and it has appealing open-source moral underpinnings.
Contest Description:
http://contest.github.com/
How this entry is doing:
http://contest.github.com/p/aflaxman/ppr-github-contest
Leader board:
http://contest.github.com/leaderboard
========
Approach
========
I don't have too much time to devote to this, but I do have some, and
I already started a project that can quickly be repurposed, a
personalized pagerank (PPR) approach. Originally, it was an attempt
to predict the cause-of-death from verbal autopsy data. In that
setting, I found that PPR predictions did slightly worse than random
forests, but definitely better than they have any right to do. In the
code repository recommendation setting, we shall see.
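For readers unfamiliar with PPR, here is a minimal sketch of the idea, not
the code in ppr.py. It assumes the data can be viewed as an undirected graph
of user and repo nodes, and it uses plain power iteration with a restart
probability ``alpha``; the function name, the node labels, and the value
0.15 are all illustrative assumptions.

```python
from collections import defaultdict


def personalized_pagerank(edges, seed, alpha=0.15, iterations=50):
    """Approximate personalized PageRank scores for `seed`.

    edges: iterable of (u, v) pairs, treated as undirected.
    alpha: restart probability (an assumed value, not taken from ppr.py).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Start with all probability mass on the seed node.
    scores = {seed: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        # Total mass is 1, so a fraction alpha always restarts at the seed.
        nxt[seed] += alpha
        for node, mass in scores.items():
            nbrs = neighbors.get(node)
            if not nbrs:
                # Isolated node: send its remaining mass back to the seed.
                nxt[seed] += (1.0 - alpha) * mass
                continue
            share = (1.0 - alpha) * mass / len(nbrs)
            for nb in nbrs:
                nxt[nb] += share
        scores = dict(nxt)
    return scores


# Toy graph (hypothetical): u1 watches rA and rB, u2 watches rB and rC.
toy_edges = [("u1", "rA"), ("u1", "rB"), ("u2", "rB"), ("u2", "rC")]
toy_scores = personalized_pagerank(toy_edges, seed="u1")
```

On this toy graph, rC scores lower than rA and rB because it is only
reachable from u1 through u2. Nodes close to the seed in the watch graph
accumulate more of the restarting random walk's mass.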
=======
Roadmap
=======
Since I don't have too much time to devote to this, I'm going to start
with a small experiment to see if the calculation is feasible with the
computer resources I have available right now. If it's not, I can
give up.
* Step 1: Can I calculate PPR on this 0.5M-edge graph? Yes, it takes 3 minutes.
* Step 2: Is this approach any good? Make a prediction of new repos from PPR.
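
Step 2 could be sketched roughly as follows, assuming PPR scores for one user
are already available as a dict from node name to score. The ``repo:`` prefix,
the function name, the cutoff ``k``, and the example numbers are hypothetical
and are not taken from contest.py or results.txt.

```python
def recommend_repos(scores, watched, k=10, repo_prefix="repo:"):
    """Return up to k repo nodes with the highest PPR score.

    scores: dict mapping node name -> PPR score for one user.
    watched: set of repo nodes the user already watches (excluded).
    repo_prefix: assumed naming convention that marks repo nodes.
    """
    candidates = [
        (node, score)
        for node, score in scores.items()
        if node.startswith(repo_prefix) and node not in watched
    ]
    # Highest score first; sorted() is stable, so ties keep insertion order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [node for node, _ in candidates[:k]]


# Illustrative scores only, not real contest output.
example_scores = {
    "user:1": 0.39,
    "repo:A": 0.16,
    "repo:B": 0.23,
    "repo:C": 0.07,
    "repo:D": 0.02,
    "user:2": 0.13,
}
guesses = recommend_repos(example_scores, watched={"repo:A", "repo:B"}, k=10)
```

Here ``guesses`` contains only repos the user does not already watch, ordered
from most to least likely under the PPR scores.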
=======
License
=======
The GitHub Contest requires the source code for all entries to be
released under an OSI-compliant license. I think that Affero GPL fits
best with the stated goals of the contest, so it is what I plan to use:
http://www.opensource.org/licenses/agpl-v3.html
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Copyright 2009 Abraham Flaxman








