Skip to content
sso7159 edited this page Feb 8, 2016 · 1 revision

During the break and first week of school I made corrections to the devCollaborations.py file that allowed us to properly query how many files a developer committed to during a time period (correlating to one release cycle) that became vulnerable in the next year ahead. I also added correlations in our devAnalysis.rb file that looked for correlations between the number of vulnerabilities a developer missed and their sheriff hours, degree, closeness, and betweenness values.

The final query changed a bit as well:

SELECT DISTINCT ON(code_reviews_cvenums.cvenum_id, code_reviews.issue, missed_code_reviews.issue, participants.dev_id)
      code_reviews_cvenums.cvenum_id,
      code_reviews.issue,
      fix_commits.created_at,
      missed_code_reviews.issue,
      missed_commits.created_at,
      participants.dev_id,
      missed_commit_filepaths.filepath
FROM code_reviews
     INNER JOIN code_reviews_cvenums ON code_reviews.issue = code_reviews_cvenums.code_review_id
     INNER JOIN commits AS fix_commits ON code_reviews.commit_hash = fix_commits.commit_hash
     INNER JOIN commit_filepaths AS fix_commit_filepaths ON fix_commits.commit_hash = fix_commit_filepaths.commit_hash
     INNER JOIN commit_filepaths AS missed_commit_filepaths ON missed_commit_filepaths.filepath = fix_commit_filepaths.filepath
     INNER JOIN commits AS missed_commits ON missed_commits.commit_hash = missed_commit_filepaths.commit_hash AND missed_commits.created_at > (fix_commits.created_at - interval '1 year') AND missed_commits.created_at < fix_commits.created_at
     INNER JOIN code_reviews AS missed_code_reviews ON missed_code_reviews.commit_hash = missed_commits.commit_hash
     INNER JOIN participants ON participants.issue = missed_code_reviews.issue
WHERE missed_code_review.created >= '" + early boundary + "' AND missed_code_review.created < '" + late boundary + "' +
ORDER BY code_reviews_cvenums.cvenum_id, code_reviews.issue, missed_code_reviews.issue, participants.dev_id;

The correlations we got from our analysis were:

--Spearman on missed vulnerabilities VS shrf_hrs/deg/close/bet--

missed vuln vs degree: 0.6765216208336005
missed vuln vs sheriff hours: 0.413822605601193
missed vuln vs closeness: 0.706683945951501
missed vuln vs betweenness: 0.6846610002162218