Limit follower gathering option #4

Open
addyosmani opened this issue Jun 20, 2013 · 1 comment
Comments

@addyosmani

In the screenshot below, a lot of time is spent in the "gathering followers" step. What is actually happening in this process? Are we making bulk requests for just a list of followers from the API, or are we fetching individual records for each of those followers?

In either case, I would probably also add an option to limit this, again maybe down to 300-500 followers. It will affect the results, but it should speed up overall analysis time.

[Screenshot taken 2013-06-20 at 18:53:01]

@anvaka
Owner

anvaka commented Jun 21, 2013

In this step it gathers the login names of all users who starred the project. Later, the algorithm shuffles the array of logins to get a random distribution and takes only 500 of them. These 500 users are then used for the analysis of their starred projects.
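The shuffle-then-truncate step can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the function name `sample_logins` and its `rng` parameter are hypothetical, and only the limit of 500 comes from the description above.

```python
import random


def sample_logins(logins, limit=500, rng=None):
    """Return up to `limit` logins chosen uniformly at random.

    Hypothetical helper: shuffles a copy of the full list and keeps
    the first `limit` entries, as described in the comment above.
    """
    rng = rng or random.Random()
    shuffled = list(logins)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)     # uniform random permutation
    return shuffled[:limit]   # keep only the first `limit` logins
```

Because the whole list is shuffled before truncating, every stargazer has the same chance of being picked, regardless of when they starred the project.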

While it's tempting to fetch only 5 random pages of the project's followers, I think that will not work. I noticed that a project can get lots of stars during some sort of event (e.g. when it was featured in the news, on a podcast, etc.), and this gives an uneven distribution of followers across pages.

However, my old implementation was sequential: it processed only one page of login names at a time. I have now converted it to run in parallel, and the overall process no longer takes an insane amount of time.
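The change from sequential to parallel page fetching could look something like the sketch below. It is written in Python for illustration only and is not the project's implementation: `gather_logins`, `fetch_page`, `page_count`, and `max_workers` are all assumed names, and `fetch_page` stands in for whatever call returns one page of login names from the API.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_logins(fetch_page, page_count, max_workers=8):
    """Fetch pages 1..page_count concurrently and flatten the results.

    `fetch_page` is a hypothetical callable taking a page number and
    returning a list of login names for that page.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the requests concurrently but yields results in page order
        pages = pool.map(fetch_page, range(1, page_count + 1))
        return [login for page in pages for login in page]
```

With a thread pool, the total time is closer to the slowest batch of requests than to the sum of all requests, which is where the speedup over the one-page-at-a-time version would come from.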

It is still not perfect and could be done better. For example, a server-side component with a cached list of followers for popular repositories would make this step very fast. Something to think about :)
