Release of 133 statements (strict subset) #371

Merged
merged 1 commit into from
Jul 24, 2023

Conversation

matteogreek
Collaborator

@matteogreek matteogreek commented Jul 4, 2023

Background

While operating Eclipse Steady internally at SAP, the SAP Security Research team collected a dataset of approximately 1400 vulnerability statements, a first subset of which was published in 2019 as part of project KB (and described in MSR 2019).

Our goal is to disclose an additional batch of vulnerability statements from the SAP-internal dataset and make them available to the community. To do so, we used Prospector to search for fix commits for the vulnerabilities in that internal dataset, and we compared the tool's findings with the fix commits we had identified through manual search.

Objective

With this PR we release 133 new statements for which the results found by Prospector (according to the criteria detailed below) matched the fix commits recorded in the statements of our private dataset. Unlike in #369 and #370, the commits of these new statements are a strict subset of Prospector's findings.

Analysis Process

The process begins by running Prospector to automatically identify fix commits for every vulnerability listed in our private dataset. Prospector takes two input parameters, the vulnerability identifier and the URL of the vulnerable project's GitHub repository, and both were taken from the internal dataset.

Once Prospector had finished, we examined all of its results and extracted candidate fix commits based on the rules they matched.
Prospector's ranking system evaluates each candidate fix commit against predefined rules, each of which carries a relevance value. To ensure the highest level of confidence that a commit is an effective patch, the statements released with this PR only contain fix commits that matched at least one high-relevance rule.
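The extraction step described above can be sketched as follows. This is only an illustration: the `Candidate` class, the rule names, and the `HIGH_RELEVANCE` threshold are hypothetical and do not reflect Prospector's actual report format or the relevance values it assigns.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

# Hypothetical threshold: the PR does not state the numeric value that
# separates high-relevance rules from the others.
HIGH_RELEVANCE = 32


@dataclass
class Candidate:
    """Hypothetical view of one candidate fix commit in a Prospector result."""
    commit_id: str
    # Maps the name of each matched rule to that rule's relevance value.
    matched_rules: Dict[str, int] = field(default_factory=dict)


def high_confidence(candidates: Iterable[Candidate],
                    threshold: int = HIGH_RELEVANCE) -> Set[str]:
    """Return ids of candidates that matched at least one high-relevance rule."""
    return {
        c.commit_id
        for c in candidates
        if any(relevance >= threshold for relevance in c.matched_rules.values())
    }
```

A candidate that matched only low-relevance rules, or no rule at all, is left out of the returned set.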

It is important to note that Prospector introduces the concept of a twin commit. A twin commit is an equivalent fix commit on a different, parallel branch: a change made on one branch and then applied to other branches that support different versions of the project.
To better understand how twin commits affect the comparison between Prospector's results and the internal dataset, we carried out two distinct evaluations.

  1. In the first evaluation, the list of fix commits found by Prospector consists of all high-confidence commits and their corresponding twins. These commits were then compared with the internal dataset without distinguishing between twins and candidate fix commits.
  2. In the second evaluation, we repeated the same extraction, again taking the commits that matched at least one high-relevance Prospector rule, but this time excluding all twin commits.
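The two evaluations differ only in how twins are treated, which a short sketch can make concrete. It assumes a hypothetical `twins` mapping from a candidate commit id to the ids of its twins on other branches; how Prospector actually records twins may differ.

```python
from typing import Dict, Set


def with_twins(high_conf: Set[str], twins: Dict[str, Set[str]]) -> Set[str]:
    """Evaluation 1: high-confidence commits plus all of their twins."""
    result = set(high_conf)
    for commit in high_conf:
        result |= twins.get(commit, set())
    return result


def without_twins(high_conf: Set[str], twins: Dict[str, Set[str]]) -> Set[str]:
    """Evaluation 2: drop every commit recorded as a twin of another commit."""
    all_twins = set().union(*twins.values())
    return set(high_conf) - all_twins
```

Both functions return plain sets of commit ids, so their results can be compared directly with the set of fix commits in the internal dataset.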

After extracting all high-confidence commits from Prospector's findings and grouping them by vulnerability, we compared the results with our internal dataset. We decided to release as valid those statements that met at least one of the following three validation criteria.

  1. Exact match: Prospector's results exactly matched the fix commits recorded in the internal dataset.
  2. Exact match - twins excluded: As in the previous case, but with twin commits excluded from the set of fix commits found by Prospector.
  3. Strict subset: The commits of the internal dataset were a strict subset of Prospector's results.
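The three criteria map onto plain set comparisons, as in the sketch below. The function name and arguments are hypothetical, and the description does not say whether the strict-subset check uses the result set with or without twins; the sketch assumes the set that includes twins.

```python
from typing import Iterable, Optional


def classify(internal: Iterable[str],
             prospector_with_twins: Iterable[str],
             prospector_without_twins: Iterable[str]) -> Optional[str]:
    """Return the first validation criterion a statement meets, or None."""
    internal_set = set(internal)
    with_twins_set = set(prospector_with_twins)
    without_twins_set = set(prospector_without_twins)

    if internal_set == with_twins_set:
        return "exact match"
    if internal_set == without_twins_set:
        return "exact match - twins excluded"
    # `<` on sets is the proper (strict) subset test.
    # Assumption: compared against the result set that includes twins.
    if internal_set < with_twins_set:
        return "strict subset"
    return None
```

Under this reading, the statements in this PR are the ones for which `classify` would return `"strict subset"`.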

Results

In addition to the cases where Prospector's results corresponded exactly to the commits in the dataset, we also want to publish new statements in which the dataset's commits form a strict subset of the high-confidence fix commits identified by Prospector. This allows us to disclose 133 additional statements validated through Prospector. For this initial publication, the statements contain only the commits present in the dataset, without any of the additional commits found by Prospector.

@copernico copernico merged commit 3cba542 into SAP:vulnerability-data Jul 24, 2023