In the screenshot below, a lot of time is spent in the "gathering followers" step. What is actually happening in this process? Are we making bulk requests for just a list of followers from the API, or are we fetching individual records for each of those followers?
In either case, I would probably also add an option to limit this, again maybe down to 300-500. It will affect the results, but it should speed up the overall analysis time.
In this step it gathers the login names of all users who gave a star to the project. Later, the algorithm shuffles the array of found logins to get a random distribution and takes only 500 of them. These 500 are then used for the analysis of their starred projects.
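The gather-then-sample flow described above could be sketched roughly like this (a minimal sketch; `fetch_page` stands in for one call to GitHub's paginated stargazers endpoint, and the function names are hypothetical, not the tool's actual API):

```python
import random

def gather_stargazer_logins(fetch_page):
    """Walk the paginated stargazer list and collect every login name.

    fetch_page(n) is assumed to return the list of logins on page n
    (e.g. one GET /repos/{owner}/{repo}/stargazers?page=n call),
    and an empty list once we run past the last page.
    """
    logins = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        logins.extend(batch)
        page += 1
    return logins

def sample_logins(logins, limit=500):
    """Shuffle for an even distribution, then keep at most `limit` logins."""
    shuffled = logins[:]          # don't mutate the caller's list
    random.shuffle(shuffled)
    return shuffled[:limit]
```

The key point is that the expensive part is the full paginated walk; the shuffle and the 500-login cut afterwards are essentially free.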
While it's tempting to fetch only 5 random pages of the project's followers, I don't think it would work. I've noticed that a project can get lots of stars during some sort of event (e.g. when the project was featured in the news, a podcast, etc.), and that gives an uneven distribution of followers.
However, my old implementation was sequential: it processed only one page of login names at a time. I've now converted it to run in parallel, and the overall process no longer takes an insane amount of time.
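The sequential-to-parallel change could look something like the sketch below (an assumption about the approach, not the actual implementation; it presumes the page count is known up front, e.g. from the repository's stargazer count divided by the page size):

```python
from concurrent.futures import ThreadPoolExecutor

def gather_logins_parallel(fetch_page, num_pages, workers=8):
    """Fetch all stargazer pages concurrently instead of one at a time.

    fetch_page(n) returns the logins on page n; pool.map preserves page
    order, so the reassembled list matches the sequential result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch_page, range(1, num_pages + 1))
    return [login for page in pages for login in page]
```

Since each page fetch is dominated by network latency, running even 8 requests in flight at once cuts the wall-clock time roughly 8x, while GitHub's rate limit still caps the total number of calls.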
It's still not perfect and could be done better. For example, a server side with a cached list of followers for popular repositories would make this step very fast. Something to think about :)