Add support for collecting fix commits and (PRs and issues) #1

Open
ziadhany wants to merge 8 commits into aboutcode-data:main from ziadhany:vcs-collector

Conversation

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Make sure the pipeline throws an error if no token is provided

Update the pipeline to use repo secrets instead of env secrets for GitHub Actions

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
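
A minimal sketch of the fail-fast token check this commit describes. The variable name `GITHUB_TOKEN`, the `require_token` helper, and the `MissingTokenError` class are assumptions for illustration, not names taken from the PR:

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when a required API token is not configured (hypothetical class)."""


def require_token(name="GITHUB_TOKEN", env=None):
    # `name` is an assumed variable name; the PR's actual secret name is not shown here.
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        # Stop before any cloning or API calls are attempted.
        raise MissingTokenError(f"{name} is not set; refusing to start the pipeline")
    return token
```

Raising at startup, rather than on the first API call, keeps a misconfigured workflow run from silently producing partial data.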
@ziadhany
Author

@keshav-space, please have a look when you have time. I've run the pipelines and generated the data; see:
https://github.com/ziadhany/vulnerablecode-vcs-collector

@TG1999 TG1999 left a comment

Please add tests

Add more target repos for fix commit collection

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany ziadhany requested a review from TG1999 April 10, 2026 21:59
@ziadhany
Author

@TG1999 I just added a test, please have a look once you have some time.

Signed-off-by: ziad hany <ziadhany2016@gmail.com>

@pombredanne pombredanne left a comment

Can you elaborate on your approach and design?

Why would this code not be part of VulnerableCode? I am not sure I understand the working logic here. My understanding was that we have these potential improvers:

  • input with advisories in VCIO and then we need to extract commits and patches from the reference URLs
  • input with fixed PURLs for an advisory in VCIO and then we need to determine if we can get a commit out of it
  • input with fixed PURLs for an advisory in VCIO and then we need to find in the commit logs (or PRs, or issues) whether we have a good fix commit

... and some improvers/importers that scout the VCS commit logs and PRs of a specific package, possibly between two versions, to find whether one is a fix commit or patch, and then either improve an existing advisory (where we focus on a specific advisory) or create a new advisory from the commit data.

In that sense, the list of targets is NOT fixed, but instead something that is dynamically computed from the actual data in VCIO?
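
For the first case above (extracting commits from advisory reference URLs), a rough sketch could look like the following. The `FixCommit` tuple, `parse_commit_url`, and the URL pattern are illustrative assumptions; they cover only the common `github.com` and `gitlab.com` commit URL shapes and are not VulnerableCode's actual code:

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class FixCommit(NamedTuple):
    vcs_url: str
    commit_id: str


# Matches e.g. https://github.com/<owner>/<repo>/commit/<sha>
# and https://gitlab.com/<group>/<project>/-/commit/<sha>, with optional .patch/.diff.
COMMIT_URL = re.compile(
    r"^https?://(?P<host>github\.com|gitlab\.com)/(?P<repo>[^?#]+?)(?:/-)?"
    r"/commit/(?P<sha>[0-9a-f]{7,40})(?:\.(?:patch|diff))?/?(?:[?#].*)?$",
    re.IGNORECASE,
)


def parse_commit_url(url: str) -> Optional[FixCommit]:
    """Return the repository URL and commit id, or None if this is not a commit URL."""
    match = COMMIT_URL.match(url.strip())
    if not match:
        return None
    repo_url = f"https://{match['host'].lower()}/{match['repo']}"
    return FixCommit(repo_url, match["sha"].lower())


def commits_from_references(urls: Iterable[str]) -> List[FixCommit]:
    """Keep only the reference URLs that point at a commit."""
    return [c for c in map(parse_commit_url, urls) if c is not None]
```

Because the input is the reference URLs already stored with each advisory, the set of target repositories falls out of the data rather than being listed by hand.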

@ziadhany
Author

@pombredanne The main idea behind this design is that cloning repositories, parsing commit messages, and querying the GitHub/GitLab APIs (which involves handling rate limits) take time. Running these tasks directly in VCIO would overwhelm the pipeline workers.
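
To illustrate the rate-limit handling mentioned above, here is a small sketch that computes how long a collector should wait before the next request, based on the `Retry-After`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` response headers that the GitHub REST API documents. The `seconds_until_retry` helper is hypothetical, and it assumes `Retry-After` is given in seconds:

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to sleep before the next API request."""
    now = time.time() if now is None else now
    # Header names are case-insensitive in HTTP, so normalize the keys first.
    h = {key.lower(): value for key, value in headers.items()}

    # Assumption: Retry-After is in delta-seconds form, not an HTTP date.
    if "retry-after" in h:
        return max(0.0, float(h["retry-after"]))

    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")  # UTC epoch seconds
    if remaining is not None and reset is not None and int(remaining) == 0:
        return max(0.0, float(reset) - now)
    return 0.0
```

With a quota that resets hourly, a single exhausted budget can stall a worker for many minutes, which is the cost being weighed against running these tasks inside VCIO.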

Because of this, I thought it would be better to create a mirror and import this data using a single pipeline in VulnerableCode:

There is also a problem with computing Git repo targets dynamically: some Git repos are just vulnerability data sources or exploit repositories, not actual source code. It is not easy to differentiate between a Git repo that contains source code and one that merely contains vulnerability data.

We also have other PRs addressing related parts of this workflow:

Extracting commits and patches from reference URLs:

An API to query using commit_id, purl, or vcs_url:

However, we do not currently have a way to scout VCS commit logs and PRs for a specific package (e.g., between two versions) to determine if one is a valid fix commit or patch.
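
As a starting point for that missing piece, one simple heuristic is to scan the commit messages in a version range for CVE identifiers. The sketch below is only an assumption about how such scouting might begin; `find_cve_commits` is a hypothetical helper, and a real implementation would need stronger signals than a message match to call something a valid fix commit:

```python
import re
from typing import Iterable, Iterator, List, Tuple

CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_commits(commits: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (sha, [CVE ids]) for every commit whose message mentions a CVE."""
    for sha, message in commits:
        # Deduplicate and normalize case so "cve-2023-0001" and "CVE-2023-0001" collapse.
        ids = sorted({found.upper() for found in CVE_ID.findall(message)})
        if ids:
            yield sha, ids
```

The `(sha, message)` pairs could come from `git log <old-version>..<new-version>` on a local clone, which limits the scan to commits made between the two releases.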

I think I should think about this more.

3 participants