In the screenshot below, a lot of time is spent in the "gathering followers" step. What is actually happening in this process? Are we making bulk requests for just a list of followers from the API, or are we fetching individual records for each of those followers?
In either case, I would probably also add an option to limit this, again maybe down to 300-500. It will affect the results, but it should speed up the overall analysis time.
In this step it gathers the login names of all users who gave a star to the project. Later, the algorithm shuffles the array of found logins to get a random distribution and takes only 500 of them. These 500 are then used for the analysis of their starred projects.
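The gather-then-sample flow described above could be sketched roughly like this (a minimal sketch; `fetch_page` stands in for one call to GitHub's paginated stargazers endpoint, and the function names are hypothetical, not the tool's actual API):

```python
import random

def gather_stargazer_logins(fetch_page):
    """Walk the paginated stargazer list and collect every login name.

    fetch_page(n) is assumed to return the list of logins on page n
    (e.g. one GET /repos/{owner}/{repo}/stargazers?page=n call),
    and an empty list once we run past the last page.
    """
    logins = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        logins.extend(batch)
        page += 1
    return logins

def sample_logins(logins, limit=500):
    """Shuffle for an even distribution, then keep at most `limit` logins."""
    shuffled = logins[:]          # don't mutate the caller's list
    random.shuffle(shuffled)
    return shuffled[:limit]
```

The key point is that the expensive part is the full paginated walk; the shuffle and the 500-login cut afterwards are essentially free.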
While it's tempting to fetch only 5 random pages of the project's followers, I don't think it would work. I've noticed that a project can get lots of stars during some sort of event (e.g. when the project was featured in the news, a podcast, etc.), and that gives an uneven distribution of followers.
However, my old implementation was sequential: it processed only one page of login names at a time. I've now converted it to run in parallel, and the overall process no longer takes an insane amount of time.
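The sequential-to-parallel change could look something like the sketch below (an assumption about the approach, not the actual implementation; it presumes the page count is known up front, e.g. from the repository's stargazer count divided by the page size):

```python
from concurrent.futures import ThreadPoolExecutor

def gather_logins_parallel(fetch_page, num_pages, workers=8):
    """Fetch all stargazer pages concurrently instead of one at a time.

    fetch_page(n) returns the logins on page n; pool.map preserves page
    order, so the reassembled list matches the sequential result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch_page, range(1, num_pages + 1))
    return [login for page in pages for login in page]
```

Since each page fetch is dominated by network latency, running even 8 requests in flight at once cuts the wall-clock time roughly 8x, while GitHub's rate limit still caps the total number of calls.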
It's still not perfect and could be done better. For example, a server side with a cached list of followers for popular repositories would make this step very fast. Something to think about :)