Today we’re announcing our 2009 GitHub Contest. Since the Netflix prize is now over, we figured you guys needed something to do. Here is your chance to contribute to the open source canon, make GitHub better, and possibly win two of the best prizes probably ever offered by a contest: a bottle of Pappy Van Winkle and a large GitHub account for life! We would estimate the value here, but, honestly, they’re priceless. Also, hopefully have some fun.
So, the problem is that we want to recommend repositories to you when you log into GitHub that you’ll love. How do we find the perfect projects for you? I wanted to just look at networks of what people were watching and figure out what you might like by what your friends liked. In researching collaborative filtering and recommendation systems papers I found little that is really helpful for this sort of problem, oddly, and very little open source code. Most papers I found online (for free, because I’m cheap – why aren’t all academic papers free and open, btw?) are explicit rating system based (like the Netflix prize – figuring out what you would rate something on a 1-X scale based on previous ratings) not item-based collaborative filters for binary implicit voting (like recommending new items based on past purchasing history) which seems way more useful to most websites to me.
Anyhow, so we figured perhaps you can do this better than we can. I extracted a dataset of all the repository watches in our database – close to half a million – and withheld a sample of them. I then created a test file listing the users I held watches back from. If you can write a program to analyze our dataset and best guess the watches we held back, you win our amazing prizes.
To enter the contest, check out our contest website. Basically you just put your guesses into a file named ‘results.txt’ and push it to a public GitHub project that has “http://contest.github.com” as a post-receive hook. On each push, our site will see if you’ve changed your ‘results.txt’ file then download and score it if you have. At the end of the contest, your source code has to be released under an OSI compatible license so nobody ever has to worry about this problem again. Whoever has the highest score at noon PST on Aug 30, 2009 wins. Good luck!