Implement iter_gittree() and support it in ls_file_collections #580
This is the first iterator that does not simply report on a file system identifiable via a single location parameter. Instead, it requires a repository location and a second "tree-ish" identifier.

I decided not to include a file pointer in the returned items, for two reasons: (1) doing this efficiently would require lazy execution of `git cat-file`, and (2) I cannot think of a use case for content hashing beyond identification, and that use case is already fully addressed by the reported `gitsha`.

For the integration with `ls-file-collection`, I decided to stay close to the behavior of `git ls-tree` itself. The collection identifier parameter is used for the tree-ish, while the repository is selected via the working directory. This includes constraining reports to subdirectories (in non-bare repos), as `git ls-tree` does too.

Closes: datalad#349
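For orientation, here is a minimal sketch of an iterator built on `git ls-tree`, assuming NUL-terminated (`-z`) output and the working-directory-based repository selection described above. The names `iter_gittree_sketch`, `parse_ls_tree_record`, and `GitTreeItem` are hypothetical and are not the API added in this PR. The sketch lists only the top level of the tree-ish (no `-r`) and, like the PR, reports the `gitsha` instead of opening file content.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class GitTreeItem:
    # Hypothetical result type; field names are illustrative only.
    name: str     # path as reported by git, kept as a plain str
    gitsha: str   # object ID, usable for identification without reading content
    gittype: str  # "blob", "tree", or "commit" (submodule)


def parse_ls_tree_record(record: bytes) -> GitTreeItem:
    """Parse one `git ls-tree -z` record: '<mode> SP <type> SP <object> TAB <path>'."""
    # Split on the FIRST tab only, so a tab inside the path stays part of the name.
    meta, _, path = record.partition(b"\t")
    _mode, otype, sha = meta.split(b" ")
    return GitTreeItem(
        name=path.decode("utf-8", errors="surrogateescape"),
        gitsha=sha.decode("ascii"),
        gittype=otype.decode("ascii"),
    )


def iter_gittree_sketch(path: Path, treeish: str) -> Iterator[GitTreeItem]:
    """Yield the items of `treeish` for the repository containing `path`.

    The repository is selected via the working directory. Without
    --full-tree, `git ls-tree` limits its report to the current
    subdirectory of a non-bare repository.
    """
    out = subprocess.run(
        ["git", "ls-tree", "-z", treeish],
        cwd=path,
        capture_output=True,
        check=True,
    ).stdout
    for record in out.split(b"\0"):
        if record:  # output ends with a trailing NUL, leaving an empty last field
            yield parse_ls_tree_record(record)
```

Usage, inside any Git work tree: `for item in iter_gittree_sketch(Path("."), "HEAD"): print(item.gittype, item.name)`.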
Codecov Report
@@ Coverage Diff @@
## main #580 +/- ##
==========================================
- Coverage 92.71% 92.48% -0.23%
==========================================
Files 147 149 +2
Lines 10922 10775 -147
Branches 1630 1618 -12
==========================================
- Hits 10126 9965 -161
- Misses 619 633 +14
Partials 177 177
Some performance figures (again, a Git repo with 36k tracked files):
At this point, it is ~6x slower than the plain Git command.
Given the underwhelming performance, I did some profiling and found that half of the runtime is spent converting path strings to `PurePosixPath` objects.
turns into
with
- name=PurePosixPath(path),
+ #name=PurePosixPath(path),
+ name=path,
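To get a feel for the cost that the profiling identified, here is a hypothetical micro-benchmark that compares keeping plain strings with creating one `PurePosixPath` object per item. It uses synthetic paths rather than the 36k-file repository from the measurements above, so its absolute numbers are not comparable to those figures.

```python
import timeit
from pathlib import PurePosixPath

# Synthetic relative paths (illustrative only, not the repository from the PR).
paths = [f"dir{i % 100}/file{i}.txt" for i in range(36_000)]


def keep_str(ps):
    # Mirrors the patched code: store the path string as-is.
    return list(ps)


def to_pureposixpath(ps):
    # Mirrors the original code: construct a PurePosixPath for every item.
    return [PurePosixPath(p) for p in ps]


t_str = timeit.timeit(lambda: keep_str(paths), number=5)
t_pp = timeit.timeit(lambda: to_pureposixpath(paths), number=5)
print(f"str: {t_str:.3f}s  PurePosixPath: {t_pp:.3f}s")
```

Keeping strings defers the conversion to consumers that actually need path semantics, and they can still call `PurePosixPath(name)` themselves.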
Handles situations like tabs in path names.
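Tabs and other "unusual" characters matter because, without `-z`, Git quotes such path names C-style, as described for the `core.quotePath` setting. For example, a TAB is emitted as `\t` inside double quotes, and non-ASCII bytes may appear as octal escapes. The sketch below, built around a hypothetical `unquote_git_path` helper, shows what decoding that quoted form involves. Reading NUL-terminated `-z` output avoids this step entirely. This is an illustration, not the code from this PR.

```python
# Single-character escapes used by Git's C-style path quoting.
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A,
    "v": 0x0B, "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C,
}


def unquote_git_path(field: str) -> str:
    """Undo Git's C-style quoting of a path name (non -z output)."""
    if not (len(field) >= 2 and field.startswith('"') and field.endswith('"')):
        return field  # unquoted names are emitted verbatim
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            # Three-digit octal escape for one raw byte, e.g. \302
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    # Reassemble the raw bytes; multi-byte UTF-8 sequences decode correctly here.
    return out.decode("utf-8", errors="surrogateescape")
```

For example, `unquote_git_path('"a\\tb.txt"')` yields a name containing a literal tab, which `-z` output would have reported directly.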
TODO: