Extract email addresses from commit log #6
Comments
I extracted 226 names and emails from the commit history. Many of them belong to the same person; my eyeballed estimate is closer to 150 individuals. I was able to link 79 of the top contributors to those results. Of the ones I wasn't able to link, many had commit activity before the event time frame I analyzed (pre-2016). I saw at least one case where the match failed because I had done a case-sensitive check of name or login. The other non-matches may just be folks who didn't make the top contributor list based on my initial exploration of that metric. So further analysis is needed, but it's a start!
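To illustrate the case-sensitivity problem mentioned above, here is a minimal sketch of case-insensitive linking between commit authors and contributors. The function names, the `(name, email)` tuple shape, and the `login`/`name` dictionary keys are assumptions made for the example, not the shuffleboard script's actual data model.

```python
def normalize(value):
    """Collapse whitespace and case-fold so 'Jane  Doe' matches 'jane doe'."""
    return " ".join(value.split()).casefold()


def link_contributors(commit_authors, contributors):
    """Link commit (name, email) pairs to contributors by name or login, ignoring case.

    commit_authors: iterable of (name, email) tuples (hypothetical shape).
    contributors: iterable of dicts with 'login' and an optional 'name'.
    Returns a dict mapping login -> sorted list of lowercased emails.
    """
    index = {}
    for contributor in contributors:
        for key in (contributor.get("login"), contributor.get("name")):
            if key:
                index[normalize(key)] = contributor["login"]

    linked = {}
    for name, email in commit_authors:
        login = index.get(normalize(name))
        if login is not None:
            # A set per login collapses repeated emails from the same person.
            linked.setdefault(login, set()).add(email.lower())
    return {login: sorted(emails) for login, emails in linked.items()}
```

Exact matching on normalized names will still miss nicknames and reordered names, so the unmatched remainder would need a closer look by hand.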
Here's a better way using the user's profile: get a list of repos for each user (I'm not able to filter out repos where fork=true; we should see whether that is something we need to do): https://api.github.com/users/0rchard/repos
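If forks do turn out to need dropping, one option is to filter on the client side, since each repo object in the response carries a boolean `fork` field. The sketch below assumes unauthenticated access and a single page of results; `fetch_user_repos` and `non_fork_repos` are names made up for this example.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_user_repos(login, per_page=100):
    """Fetch one page of public repos for a user.

    Unauthenticated requests are heavily rate limited, and users with more
    than `per_page` repos would need pagination, which this sketch omits.
    """
    url = f"{API_ROOT}/users/{login}/repos?per_page={per_page}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def non_fork_repos(repos):
    """Return the full names of repos that are not forks."""
    return [repo["full_name"] for repo in repos if not repo.get("fork", False)]
```

Usage would look like `non_fork_repos(fetch_user_repos("0rchard"))`, at the cost of one API call per user.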
This method seems to return a decent amount of identifying data, but there is some overhead because it requires several API calls. I will need to implement something to check for that and to be able to pick up where it left off later. In the meantime I'm just dumping all the responses I get so we don't lose any data.
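One way the "dump everything and pick up where it left off" idea could look is a JSON Lines file that doubles as a checkpoint. This is only a sketch: the file layout, the `budget` parameter, and the `fetch` callable (standing in for whatever makes the real API calls) are assumptions, not what the script currently does.

```python
import json
import os


def load_done(path):
    """Return the set of logins already recorded in the dump file."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    done.add(json.loads(line)["login"])
    return done


def run_with_checkpoint(logins, fetch, path, budget):
    """Fetch up to `budget` logins not yet in `path`, appending each raw response.

    `fetch` is any callable taking a login and returning JSON-serializable data.
    Returns the number of fetches made during this run.
    """
    done = load_done(path)
    used = 0
    with open(path, "a", encoding="utf-8") as handle:
        for login in logins:
            if login in done:
                continue  # already dumped by an earlier run
            if used >= budget:
                break  # stop before exceeding this run's request allowance
            record = {"login": login, "response": fetch(login)}
            handle.write(json.dumps(record) + "\n")
            handle.flush()  # keep the dump current if the run is interrupted
            used += 1
    return used
```

Re-running with the same list and file skips anything already dumped, so a run that stops early loses no data.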
I found this for SystemML when I was tracking down a GitHub user that had zero information associated with them: https://systemml.apache.org/community-members. I think for each organization of interest, some time may need to be spent manually compiling info from any contributor lists like this that show a company affiliation. I know other communities have similar lists in various formats.
This is starting to become an issue now and should definitely find its way into the pipeline at some point (making API calls async): countering-bean-counting/bonnyci_shuffleboard#11
Before this issue can be closed, these things need to be addressed:
I previously pulled a list of "top contributors" for the mxnet project based on overall event frequency and event type diversity, and I need to upload that R notebook to this repo. I ran the updated shuffleboard script in chunks (due to the GitHub API limit of 5000 requests per hour) for this list of contributors (about 900 or so) to pull names and emails from commit history. I need to combine these into a single CSV and then see how well we did with identifying these contributors. The R notebook should show a) the proportion of contributors with name info in their profile, b) the proportion with company info, c) the proportion with a name/email pulled from commits, and d) as a stretch goal, the proportion with a work email pulled from commits. This should also consider whether they already had company info in their profile.
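The analysis itself is planned as an R notebook; purely as an illustration of the combine-then-measure step, here is a Python sketch. The column names (`login`, `name`, `email`, `company`) are assumed, and metric c) is interpreted here as "at least one non-empty email found in commits", which may differ from the final definition.

```python
import csv


def combine_chunks(chunk_files):
    """Merge rows from several open CSV chunk files, dropping exact duplicate rows."""
    seen, rows = set(), []
    for handle in chunk_files:
        for row in csv.DictReader(handle):
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                rows.append(row)
    return rows


def proportions(profiles, commit_rows):
    """Compute metrics a) to c) over a list of contributor profile dicts."""
    total = len(profiles)
    with_commit_email = {row["login"] for row in commit_rows if row.get("email")}

    def share(count):
        return count / total if total else 0.0

    return {
        "profile_name": share(sum(1 for p in profiles if p.get("name"))),
        "profile_company": share(sum(1 for p in profiles if p.get("company"))),
        "commit_name_email": share(
            sum(1 for p in profiles if p["login"] in with_commit_email)
        ),
    }
```

The stretch goal d) would additionally need a rule for what counts as a work email, which is left out because it is not defined yet.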
Also moved to milestone 2 folder where it belongs Part of Issue #6 Signed-off-by: Augustina Ragwitz <augustina.ragwitz@ibm.com>
Just need to finish up the section on emails analysis, write up final conclusions, and put together a slide deck summary of my findings.
Finished email analysis! Need to write up final conclusions and put together a slide deck with my findings.
Also moved to milestone 2 folder where it belongs Closes Issue #6 Signed-off-by: Augustina Ragwitz <augustina.ragwitz@ibm.com>
I will mark this as complete and open a new issue for the additional discoveries.
How many contributors with minimal GitHub info can be identified this way? Do the email addresses improve other identification results?
See: countering-bean-counting/bonnyci_shuffleboard#85