Add support for collecting fix commits and (PRs and issues) #1
ziadhany wants to merge 8 commits into aboutcode-data:main
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Make sure the pipeline throws an error if no token is provided. Update the pipeline to use repo secrets and avoid env secrets for GitHub Actions. Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@keshav-space, please have a look when you have time. I've run the pipelines and generated the data, see:
Add more target repos for fix commits collection. Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@TG1999 I just added a test, please have a look once you have some time.
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
pombredanne left a comment
Can you elaborate on your approach and design?
Why would this code not be part of VulnerableCode? I am not sure I understand the working logic here? My understanding was that we have potential improvers:
- input with advisories in VCIO and then we need to extract commits and patches from the reference URLs
- input with fixed PURLs for an advisory in VCIO and then we need to determine if we can get a commit out of it
- input with fixed PURLs for an advisory in VCIO and then we need to find in the commit logs (or PRs, or issues) if we have a good fix commit
... and some improvers/importers that scout the VCS commit logs and PRs of a specific package, possibly between two versions, to find whether one is a fix commit or patch, and then either improve an existing advisory (where we are focused on a specific advisory) or create a new advisory from the commit data.
In that sense, the list of targets is NOT fixed, but instead something that is dynamically computed from the actual data in VCIO?
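To make the first case above concrete (extracting commits from an advisory's reference URLs), here is a minimal sketch. It is not code from this PR or from VulnerableCode: the names `CommitRef`, `COMMIT_URL`, `extract_commit`, and `extract_commits` are hypothetical, and it assumes only the common GitHub and GitLab commit URL shapes.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class CommitRef(NamedTuple):
    """Hypothetical record pairing a repository URL with a commit id."""
    vcs_url: str
    commit_id: str


# Assumed URL shapes (illustrative, not exhaustive):
#   https://github.com/<owner>/<repo>/commit/<sha>
#   https://gitlab.com/<group>/<project>/-/commit/<sha>
COMMIT_URL = re.compile(
    r"^(?P<repo>https?://(?:github\.com|gitlab\.com)/\S+?)"
    r"(?:/-)?/commit/(?P<sha>[0-9a-fA-F]{7,40})(?:[/?#.].*)?$"
)


def extract_commit(url: str) -> Optional[CommitRef]:
    """Return the repo URL and commit id if the URL points at a commit."""
    match = COMMIT_URL.match(url.strip())
    if match is None:
        return None
    return CommitRef(match.group("repo"), match.group("sha").lower())


def extract_commits(reference_urls: Iterable[str]) -> List[CommitRef]:
    """Collect unique commit references, preserving first-seen order."""
    seen = set()
    commits = []
    for url in reference_urls:
        ref = extract_commit(url)
        if ref is not None and ref not in seen:
            seen.add(ref)
            commits.append(ref)
    return commits
```

With this shape, the set of target repositories falls out of the advisory data itself: every `vcs_url` returned by `extract_commits` is a candidate target, rather than an entry in a fixed list.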
@pombredanne The main idea behind this design is that cloning repositories, parsing commit messages, and querying the GitHub/GitLab APIs (which involves handling rate limits) all take time. Running these tasks directly in VCIO would overwhelm the pipeline workers. Because of this, I thought it would be better to create a mirror and import this data using a single pipeline in VulnerableCode:

There is also a problem with computing Git repo targets dynamically: some Git repos are just vulnerability data sources or exploit repos, not actual source code. It is not easy to differentiate between a Git repo that contains source code and one that merely contains vulnerability data.

We also have other PRs addressing related parts of this workflow:
- Extracting commits and patches from reference URLs:
- An API to query using commit_id, purl, or vcs_url:

However, we do not currently have a way to scout VCS commit logs and PRs for a specific package (e.g., between two versions) to determine if one is a valid fix commit or patch. I need to think about this more.
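The rate-limit cost mentioned above can be illustrated with a small helper that decides how long a collector should pause between API calls. This is only a sketch and not the code in this PR: the function name `seconds_to_wait` is hypothetical, and it assumes the GitHub REST API response headers `retry-after`, `x-ratelimit-remaining`, and `x-ratelimit-reset` (GitLab uses different header names, which are not handled here).

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to pause before the next API request.

    `headers` are the response headers of the previous request; `now` is the
    current Unix time and can be injected to make the function testable.
    """
    now = time.time() if now is None else now
    # Normalize header names so a plain dict works regardless of casing.
    lowered = {key.lower(): value for key, value in headers.items()}

    # An explicit Retry-After (in seconds) takes priority when present.
    retry_after = lowered.get("retry-after")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)

    # Otherwise wait until the reset time only if the quota is exhausted.
    remaining = lowered.get("x-ratelimit-remaining")
    reset = lowered.get("x-ratelimit-reset")
    if remaining == "0" and reset is not None and reset.isdigit():
        return max(0.0, float(reset) - now)

    return 0.0
```

A collector would call this after each response and sleep for the returned duration. Waits like these add up across many repositories, which is the reason given above for running the collection outside the VCIO pipeline workers.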
Issue:
Related PRs: