gazer

This project aims to analyze followers base of a GitHub repository and suggest related projects. It is using information about shared number of stars to calculate similarity index between two projects.

Try it yourself

Hosted version of the app is available here: http://www.yasiv.com/github/#/ It already knows about 15,162 popular projects. If your project had more than 200 stars on Nov 30th, 2014 most likely you will get suggestions immediately. Otherwise the site will build similarities in the real time. Make sure to sort by "Similarity coefficient" when application completes gathering information.

Offline index is produced by ghindex

Hows and Whys

It's kind of experiment of my own curiosity. I wanted to find a mobile UI library for the web. After googling around I found a library, but I wanted to see more related projects. GitHub did not provide this feature, so I developed a simple metric to calculate similarity of two projects.

// Metric 1: Similarity measure of two projects A and B.
similarity = 2 * sharedStarsCount(A, B)/(numberOfStars(A) + numberOfStars(B));

While this is very naive formula, in practice it gives interesting results. For example, among top related projects for my graph drawing library vivagraph.js (650 stars):

strathausen/dracula - JavaScript browser based layout and representation of connected graphs. (274 stars)
jacomyal/sigma.js - an open-source lightweight JavaScript graph drawing library (1,395 stars)
samizdatco/arbor - a graph visualization library using web workers and jQuery (1,221 stars)
dhotson/springy - A force directed graph layout algorithm in JavaScript (639 stars)
uskudnik/GraphGL - A network visualization library (70 stars)

For popular projects, with more than 2-3k stars, the metric [1] can be polluted by other popular projects (like Backbone, or Bootstrap): We, developers, all tend to like beautiful code. Surprisingly, the amount of "popular noise" can be significantly reduced by analyzing limited subset of random stargazers. Metric [1] can be rewritten as

// Metric 2: Similarity measure of two popular projects A and B
weight = randomSubsetSize/numberOfStars(A);
similarity 2 * sharedStarsCount(A, B)/(weight * (randomSubsetSize + numberOfStars(B)));

Here is an example of angular.js (11K followers) analysis on random subset of 500 followers:

angular-ui/angular-ui - AngularUI - The companion suite for AngularJS (2,119 stars)
angular/angular-seed - Seed project for angular apps. (1,998 stars)
jmcunningham/AngularJS-Learning - A bunch of links to blog posts, articles, videos, etc for learning AngularJS (2,465 stars)
angular-ui/bootstrap - Native AngularJS (Angular) directives for Twitter's Bootstrap. Small footprint (5kB gzipped!), no 3rd party JS dependencies (jQuery, bootstrap JS) required! (1,320 stars)
mgcrea/angular-strap - Bootstrap directives for Angular (1,177 stars)

Caveats

Github does not have a bulk API, which makes processing of popular projects extremely time consuming. It could be mitigated by serving precomputed suggestions. If suggestions are not available, the time can further be reduced by limiting amount of stars to analyze (see metric [2]).
Number of requests to GitHub API is rate limited (60 per hour). Sign in to the application with OAuth to increase rate limit up to 5,000 requests per hour.
Analyzing randomized subset may produce different ranking and pick different projects as the best match. But you will notice the same projects are being picked between multiple runs of the algorithm. Pay attention to those.
The algorithm will not work for projects with small amount of stars. I'm still not sure what is the lower bound here (100 stars?). For projects with 500+ stars quite often results are interesting.

Local build

git clone https://github.com/anvaka/gazer.git
cd gazer
npm install
bower install
grunt server

What do you think?

I would love to hear your feedback!

If you know how to make distance calculation better - I'm very open to incorporate your metrics.

This is my first angular app, so I'm still learning.

If you work at GitHub - I would love to see this feature implemented by you, guys :)

Do not hesitate to open an issue or submit a pull request :).

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
app		app
test		test
.bowerrc		.bowerrc
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.jshintrc		.jshintrc
Gruntfile.js		Gruntfile.js
LICENSE		LICENSE
README.md		README.md
bower.json		bower.json
karma-e2e.conf.js		karma-e2e.conf.js
karma.conf.js		karma.conf.js
package-lock.json		package-lock.json
package.json		package.json

License

anvaka/gazer

Folders and files

Latest commit

History

Repository files navigation

gazer

Try it yourself

Hows and Whys

Caveats

Local build

What do you think?

About

Resources

License

Stars

Watchers

Forks

Languages