Search before asking
What happened
Hello,
We have noticed an issue with the gitextractor plugin: some commits are skipped, and the log says it is because they have no parent commit. However, we can see on GitHub that these commits do have parent commits.
As a result, commits are missing from the repo_commits, commits, and commit_parents tables. refdiff needs these tables to associate all PRs with our deployment events (we're using the webhook method), and the problem is worst when the missing commit is the one our deployment event references.
This leads to mislinked commits and ultimately skews our LTC metrics.
We reproduced this issue on multiple DevLake instances.
Logs:
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit fc5331abf6a85be3812a17843a6a5d95330ca7dc because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit a9c9ad96860358ef6a1f32798d2a8456cbfc854a because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit 1415b0bfa73946aac039282040dfb7c2100d9a8a because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit 961c42266124b29836ebb085b20af1ce2b61f6d3 because it has no parent commit"
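For anyone trying to list the affected commits from a larger log dump, here is a minimal Python sketch. It assumes the message format shown in the lines above; `skipped_commits` is a hypothetical helper written for this issue, not part of DevLake.

```python
import re
from typing import List

# Matches the "skip commit <sha> because it has no parent commit" message
# as it appears in the log lines above (full 40-character SHA).
SKIP_RE = re.compile(r"skip commit ([0-9a-f]{40}) because it has no parent commit")


def skipped_commits(log_text: str) -> List[str]:
    """Return skipped commit SHAs in order of first appearance, without duplicates."""
    seen = {}
    for match in SKIP_RE.finditer(log_text):
        # dict preserves insertion order, so this de-duplicates while keeping order
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Passing the full text of the task log to `skipped_commits` gives the list of SHAs to look up in the commits, repo_commits, and commit_parents tables.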
A similar issue has been reported here.
What do you expect to happen
The gitextractor should extract all commits and ingest them into the necessary tables.
How to reproduce
- Using v1.0.1@e061ef2
- Create a Project
- Add a data source
- Create a webhook
- Collect data from the last 6 months
- Check the logs from the gitextractor plugin
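To double-check outside GitHub that a skipped commit really has parents, the commit can be inspected in a local clone. The following is a rough sketch, assuming `git` is on the PATH; `commit_parents` and `parse_parents` are hypothetical helper names, and the repository path is whatever local clone you use. It relies only on `git rev-list --parents -n 1 <sha>`, which prints the commit SHA followed by its parent SHAs on one line.

```python
import subprocess
from typing import List


def parse_parents(rev_list_line: str) -> List[str]:
    """Parse one line of `git rev-list --parents` output: '<sha> <parent1> <parent2> ...'."""
    fields = rev_list_line.split()
    # The first field is the commit itself; the rest are its parents.
    return fields[1:]


def commit_parents(repo_path: str, sha: str) -> List[str]:
    """Ask git for the parents of `sha` in the clone at `repo_path`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--parents", "-n", "1", sha],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_parents(result.stdout)
```

An empty list for a commit that shows parents on GitHub would be worth noting here. Run this against a full clone: in a shallow clone, git reports the commits at the shallow boundary without parents, so the result there would not tell you much.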
Anything else
No response
Version
v1.0.1@e061ef2
Are you willing to submit PR?
Code of Conduct