Introduce AnnexRepo.get_file_annexinfo(), deprecate AnnexRepo.get_file_key()#6104
bpoldrack merged 3 commits into datalad:master
Codecov Report
@@ Coverage Diff @@
## master #6104 +/- ##
==========================================
+ Coverage 89.80% 89.83% +0.02%
==========================================
Files 319 319
Lines 42451 42477 +26
==========================================
+ Hits 38124 38159 +35
+ Misses 4327 4318 -9
So this works! Will now proceed with removing remaining usage of
Note: we just had a discussion on how deprecations/removals could affect downstream code. There seems to be a good number of uses of this method (I didn't group them to see which repos), e.g. OpenNeuro: https://github.com/OpenNeuroOrg/datalad-service/blob/4f501d3a808cca1ad04fc5d808f49bc0c68614f1/datalad_service/migrate/versions.py . I am not saying that it should not be deprecated if needed, but I also do not see an immediate need.
The need comes from the need to get rid of

Right now, I am exploring which migration path makes sense. The usage in the OpenNeuro code is a common one (i.e., an operation on a single file). Once I have cataloged all our usage, I will know whether my current plan holds. This plan is to implement something like

The OpenNeuro usage transition would then become:

Instead of handling three types of exceptions (not in annex, not in git, not there), the full set of file properties can always be obtained for any known file and acted on, including the key. All that the OpenNeuro loop body is doing is to implement

Alternatively, we could consider just removing the normalize path decorator. But this would require either deprecating all its usage at once (this is already taking years, and will still take more), or silently breaking the behavior of the method. I consider my proposal less problematic right now.
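To illustrate the point about acting on file properties instead of catching exceptions, here is a minimal, self-contained sketch. It does not use DataLad itself: `describe_file()` is a hypothetical helper, and the record layout (an empty dict for a file tracked directly in Git, `key` and `has_content` entries for an annexed file) is an assumption made for this illustration.

```python
def describe_file(info: dict) -> str:
    """Classify a known file from a single property record.

    `info` is a stand-in for the result of a single-file query.
    Assumed layout (hypothetical, for illustration only):
      {}                                  -> file tracked directly in Git
      {"key": ..., "has_content": bool}   -> annexed file
    """
    if "key" not in info:
        # No annex key recorded: the file is not annexed.
        return "in git, not annexed"
    if info.get("has_content"):
        return "annexed, content present"
    return "annexed, content not present"
```

A caller branches on plain dictionary lookups, so no exception types need to be told apart for files the repository knows about.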
This is the companion of get_content_annexinfo() for single-file queries. This is a common use case, not only in the tests. For now, this does not channel all the functionality that get_content_annexinfo() provides (amending Git properties, etc.), but the API could be extended if this turns out to be useful in the future. Importantly, with this method in use, dedicated additional calls to file_has_content() can in many places be reduced to a single call to this new method.
This is furthering the goal of finally getting rid of `normalize_paths()` datalad#4595. This is a second and different attempt at this. The last one was datalad#5069.
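The relationship between a bulk query and its single-file companion can be sketched as follows. This is only an illustration under stated assumptions: `single_file_info()` and `fake_bulk_query()` are hypothetical names, the toy data is made up, and the error types chosen here are not claimed to match what the real method raises.

```python
from typing import Callable, Dict, Iterable

# A bulk query maps an iterable of paths to {matched_path: properties}.
BulkQuery = Callable[[Iterable[str]], Dict[str, dict]]


def single_file_info(bulk_query: BulkQuery, path: str) -> dict:
    """Return the properties of exactly one file, using the bulk query."""
    records = bulk_query([path])
    if not records:
        raise FileNotFoundError(path)
    if len(records) > 1:
        # E.g. a directory path that resolves to several files.
        raise ValueError(f"{path!r} matches {len(records)} files, expected one")
    # Exactly one record: hand back its property dict.
    return next(iter(records.values()))


def fake_bulk_query(paths: Iterable[str]) -> Dict[str, dict]:
    """Toy stand-in for a repository-wide query (made-up data)."""
    known = {
        "data/a.dat": {"key": "KEY-a", "has_content": True},
        "data/b.dat": {"key": "KEY-b", "has_content": False},
        "README": {},  # tracked directly in Git, no annex properties
    }
    out = {}
    for p in paths:
        prefix = p.rstrip("/") + "/"
        for name, props in known.items():
            if name == p or name.startswith(prefix):
                out[name] = props
    return out
```

With such a wrapper, one call yields both the key and the availability flag, e.g. `info = single_file_info(fake_bulk_query, "data/a.dat")` followed by lookups of `info["key"]` and `info["has_content"]`.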
Except for its own tests.
FTR: This would also simplify some code on #6105

This is another step in a slow paradigm change to assemble more information with fewer calls to Git. Here it is `get_file_key()`, which is often used in conjunction with `file_has_content()` (or a local `is_available()` which does the same thing with a different implementation), or `get_contentlocation()`, or `get_key|file_backend()`.

All this information is readily provided via `get_content_annexinfo()`. However, in most places, a query for a single file is made (that this can be wasteful is a problem, but also a reality right now). The new `get_file_annexinfo()` makes the single-file query use case more convenient, but otherwise relies on `get_content_annexinfo()` as the workhorse.

Beyond this addition, the previous `get_file_key()` was reimplemented using `get_content_annexinfo()` alone. Simultaneously it is being deprecated, because better means now exist to perform its tasks. For example, the OpenNeuro usage referred to in #6104 (comment) is now fully replaced by a single `get_file_annexinfo()` call (not just the `get_file_key()` method call, but the entire loop body). Moreover, the behavior of that function is extremely complex (different error behavior, dependent on parameter types). Its exception behavior (3 different exceptions for why a key could not be obtained) also led to abuse of this method for type-checking of repository file types (calls that didn't even bother checking the return value of a `get_*` method). And lastly, it uses `normalize_paths`.

TODO:
`get_file_annexinfo()` pulling together test code for all the related methods listed above.

Developer notes:
This is furthering the goal of finally getting rid of `normalize_paths()` #4595
This is a second and different attempt at this. The last one was #5069
Final progress ping to #3333
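
The combination described in this PR, reimplementing an old accessor on top of a newer query while marking it as deprecated, can be sketched with Python's standard `warnings` module. `ToyRepo`, `get_file_info()`, and the record layout are hypothetical and only serve the illustration. The toy accessor returns `None` for a file without a key, which is a simplification and not the exception behavior of the real `get_file_key()`.

```python
import warnings


class ToyRepo:
    """Toy repository (hypothetical); not the real AnnexRepo."""

    def __init__(self, records):
        # records: {path: property dict}, made-up data for the sketch
        self._records = records

    def get_file_info(self, path):
        """Newer single-file query returning all known properties."""
        return self._records[path]

    def get_file_key(self, path):
        """Deprecated accessor, kept as a thin wrapper for old callers."""
        warnings.warn(
            "get_file_key() is deprecated, use get_file_info() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.get_file_info(path).get("key")
```

Keeping the old method as a wrapper lets downstream code continue to work while the warning points it to the replacement.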