Are developers with high participation/degree more likely to have missed vulnerabilities? #245

Closed
kbaumzie opened this issue Mar 4, 2016 · 5 comments

kbaumzie commented Mar 4, 2016

Used vuln_misses for this. Take a look at using Spearman's rank correlation coefficient. Figure out what the resulting coefficients mean, and then report them here. Search for "spearman" in our code base to see how we use it.
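
For reference, a minimal sketch of what Spearman's rank correlation computes: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks. This is plain Python for illustration only, not how our code base computes it; the `degree` and `vuln_misses` values below are made up.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: degree and vuln_misses for five developers.
degree = [2, 5, 9, 14, 40]
vuln_misses = [0, 1, 1, 3, 4]
rho = spearman(degree, vuln_misses)
print(round(rho, 3))
```

A value near +1 means developers with higher degree tend to have more misses, near -1 means the opposite, and near 0 means no monotonic relationship. Because only ranks are used, one extreme value (like the degree of 40 above) does not dominate the result.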

@kbaumzie kbaumzie self-assigned this Mar 6, 2016

kbaumzie commented Mar 6, 2016

During a Google Tech talk that I attended this past week, a Google developer talked about the procedure for committing, owning, and participating in code and code reviews. One interesting thing he noted was that when a developer leaves Google (quits, etc.), someone takes over ownership of their files. I am not sure how this could affect our data, or even how to measure who is eligible to take over ownership of a file. If this is the case, would taking ownership of a new file increase the new owner's degree?

This also raises the question of who carries the blame for the vulnerabilities missed on each of these files if they were once owned by a different developer.

@kbaumzie
Collaborator Author

Take a look at Pearson too (note that it is more sensitive to outliers than Spearman)
High degree -> high betweenness

Do a rake run with R on the console


kbaumzie commented Apr 6, 2016


vuln_misses correlates strongly with betweenness, degree, and closeness. This challenges what we have been assuming in our research: we have now found that more central developers tend to have a higher count of vulnerability misses. My next step will be to look at perc_vuln_misses (percentage of vulnerabilities missed), i.e. vuln_misses/participation, to see missed vulnerabilities per developer, per period, relative to how much each developer participated.
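
A small sketch of the ratio described above, in Python. The snapshot rows and the `dev` key are hypothetical; the one real design question is what to do when participation is 0. Here the ratio is left undefined (None) instead of being computed as 0/0.

```python
def perc_vuln_misses(vuln_misses, participation):
    """Missed vulnerabilities per unit of participation, or None if undefined."""
    if participation == 0:
        # 0/0 has no meaningful value; leave it missing instead of NaN or 0.
        return None
    return vuln_misses / participation

# Hypothetical developer snapshots for one period.
snapshots = [
    {"dev": "a", "vuln_misses": 3, "participation": 60},
    {"dev": "b", "vuln_misses": 0, "participation": 12},
    {"dev": "c", "vuln_misses": 0, "participation": 0},
]

for row in snapshots:
    row["perc_vuln_misses"] = perc_vuln_misses(
        row["vuln_misses"], row["participation"]
    )
    print(row["dev"], row["perc_vuln_misses"])
```

Rows where the ratio is None would need to be excluded before correlating, otherwise they propagate into the result.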

After this, we should include vuln_misses in our code reviews table, both as a count and as a boolean. Be careful not to count the same vulnerability twice (use distinct). This allows us to look at other metrics in the given code review.
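
To illustrate the double-counting concern, here is a sketch in Python that assumes the raw data arrives as (review_id, vulnerability_id) pairs, possibly with duplicates from a join. The row shape and the names `vuln_misses` / `vuln_missed` for the count and boolean are placeholders, not our actual schema.

```python
from collections import defaultdict

def summarize_vuln_misses(rows):
    """Map each review to its distinct missed-vulnerability count and a flag."""
    distinct = defaultdict(set)
    for review_id, vuln_id in rows:
        vulns = distinct[review_id]  # creates an empty set for new reviews
        if vuln_id is not None:
            vulns.add(vuln_id)  # a set ignores repeated vulnerability ids
    return {
        review_id: {"vuln_misses": len(vulns), "vuln_missed": bool(vulns)}
        for review_id, vulns in distinct.items()
    }

# Hypothetical rows: review 101 lists vulnerability "v1" twice.
rows = [(101, "v1"), (101, "v1"), (101, "v2"), (102, "v3"), (103, None)]
summary = summarize_vuln_misses(rows)
print(summary[101])
```

In SQL the same effect would come from COUNT(DISTINCT ...) on the vulnerability id, grouped by code review.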

@kbaumzie
Collaborator Author

dev_analysis.rb currently references an incorrect column name from our developer_snapshots table: perc_missed_vuln. It should be changed to perc_vuln_misses after @sso7159 makes the same rename in devCollaboration.py.

Added these correlations against perc_missed_vuln:

spearman_percVM_deg <- cor(dev_snap$perc_missed_vuln, dev_snap$degree, method="spearman")
spearman_percVM_sher <- cor(dev_snap$perc_missed_vuln, dev_snap$sheriff_hrs, method="spearman")
spearman_percVM_close <- cor(dev_snap$perc_missed_vuln, dev_snap$closeness, method="spearman")
spearman_percVM_bet <- cor(dev_snap$perc_missed_vuln, dev_snap$betweenness, method="spearman")

I am getting an error when running the file via rake run:dev: R shows that the correlation is not returning a number (NaN). Any thoughts as to why this is happening? My understanding of Spearman correlation is that it can correlate two variables on different scales, hence percentage vs. float value.
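
Two usual suspects when a correlation comes back as NA/NaN are missing values in either column (R's cor propagates them with the default use = "everything") and a column with zero variance. If perc_missed_vuln is computed as vuln_misses/participation, rows with 0 participation would be one plausible source of missing values, but that is a guess and not confirmed here. A quick Python sketch for checking a column before correlating; the sample values are made up.

```python
import math

def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def diagnose_column(values):
    """Count missing entries and flag a column whose present values are all equal."""
    present = [v for v in values if not is_missing(v)]
    return {
        "missing": len(values) - len(present),
        "constant": len(set(present)) <= 1,  # zero variance: correlation undefined
    }

# Hypothetical perc_missed_vuln column with one NULL and one 0/0 result.
perc_missed_vuln = [0.10, None, float("nan"), 0.30]
print(diagnose_column(perc_missed_vuln))
```

If the check reports missing values, dropping those rows (or passing use = "complete.obs" to cor in R) should give a number again; if it reports a constant column, the correlation is undefined for that data.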

@andymeneely
Owner

@kbaumzie kbaumzie closed this as completed Sep 7, 2016