Description
Which interface did you use?
Web UI
Repository URL (if public)
https://github.com/coderamp-labs/gitingest
Git host
GitHub (github.com)
Other Git host
No response
Repository visibility
public
Commit, branch, or tag
branch
Did you ingest the full repository or a subdirectory?
full repository
Operating system
Not relevant (Web UI)
Browser (Web UI only)
Not relevant (CLI / PyPI)
Other browser
No response
Gitingest version
No response
Python version
No response
Bug description
The current implementation for processing single files (blobs) in src/gitingest/ingestion.py does not validate whether the requested file actually exists in the Git reference (branch, tag, or commit) given in the query.
As noted in the codebase TODO (# TODO: We do this wrong! We should still check the branch and commit!), the system reads whatever is present on the local filesystem after a shallow clone. This can serve the wrong file version when a specific commit hash is requested but never verified.
Steps to reproduce
- Provide a URL for a specific file blob pointing to a branch or commit where that file does not yet exist.
- Observe that if the file exists on the default branch that was shallow-cloned, the system may ingest it without verifying it belongs to the requested reference.
Expected behavior
The system should use Git to verify that the specific subpath exists within the requested commit or branch before attempting to read it from the disk.
Actual behavior
The code only runs the check if not path.is_file(): against the local filesystem and ignores the query.commit and query.branch values.
Steps to reproduce
1. Open Gitingest Web UI.
2. Input a URL pointing to a single file blob on a specific branch or commit where that file does not exist (e.g., https://github.com/user/repo/blob/feature-branch/new-file.py).
3. If the file exists on the default branch, the system may ingest it without verifying it strictly belongs to the requested 'feature-branch' reference.
Expected behavior
The system should strictly validate that the requested file path exists within the specific Git reference (branch, tag, or commit) provided in the query using Git verification (like git rev-parse) before processing.
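A minimal sketch of such a check, not the project's actual code. The helper name blob_exists_at_ref and its parameters are hypothetical, and it assumes the requested ref has already been fetched into the local clone (a shallow clone of a different branch would not contain it). It uses git cat-file -t rather than git rev-parse so that it can also confirm the path is a file (blob) and not a directory.

```python
import subprocess
from pathlib import Path


def blob_exists_at_ref(repo_dir: Path, ref: str, subpath: str) -> bool:
    """Return True if `subpath` is a file (blob) in the tree of `ref`.

    Hypothetical helper: `repo_dir` is the local clone, `ref` is a branch,
    tag, or commit hash that is resolvable in that clone.
    """
    # Refuse refs that could be mistaken for command-line options.
    if not ref or ref.startswith("-"):
        return False

    # `<ref>:<path>` names the object stored at <path> in the tree of <ref>.
    # Git expects forward slashes and no leading slash in the path part.
    spec = f"{ref}:{subpath.strip('/')}"

    # `git cat-file -t` prints the object type ("blob", "tree", ...) and
    # exits non-zero when the object cannot be resolved.
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "cat-file", "-t", spec],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and result.stdout.strip() == "blob"
```

A caller could run this before the existing path.is_file() check and reject the request when it returns False, so that a file present in the working tree but absent from the requested reference is not ingested.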
Actual behavior
The system currently checks for file existence on the local filesystem of the shallow clone. If the file exists locally (perhaps from a different branch), it is processed without confirming it belongs to the requested Git state.
Additional context, logs, or screenshots
This issue addresses the existing TODO in src/gitingest/ingestion.py: "# TODO: We do this wrong! We should still check the branch and commit!"