gitrepo: Expose methods for direct git calls #3791

Our approach of providing lots of methods that wrap different Git subcommands leads to us adding one-off methods, many of which do the bare minimum for the use case we have at that time. The minimal implementation is a good thing if the code is meant to be used internally because there's no point in trying to support and test knobs we don't need, but it doesn't make for a nice external interface, and it leads to churn and incompatibilities when we later need to extend the method for another use case. Let's instead expose a set of methods wrap simple calls based on the return value. Introducing these will hopefully have the following benefits: * It is a set of commands that we can offer to third-party callers if they need a command that we don't have but still want to call git through us. * Likewise, it is a set of commands we can use to avoid adding one-off methods. * We already use _git_custom_command() in many spots outside of gitrepo.py to avoid adding more methods to GitRepo. These new methods would simplify these calls. * If we decide to deprecate a method or change it in an incompatible way, we can point to these methods as an alternative. The arguments of these new methods are pretty minimal compared to what _git_custom_command() accepts. The idea is to keep them simple until we need to expose more options, given we can extend the keyword arguments in a backward compatible way. Between the new methods, the arguments are largely consistent, but call_git() has `expect_fail` while call_git_{oneline,items_} do not because we don't have a spot where that'd be used yet in our code base. Note that call_git_items_() is a generator. While this isn't particularly useful at the moment because str.split{,lines} already put the entire list into memory, it keeps open the possibility of changing the internal implementation to an approach the doesn't load the entire output in memory. All the methods call _git_custom_command() with check_fake_dates=True. This will be unnecessary for nearly all repositories, but not doing so risks leaking dates in repositories configured to use fake dates. The other option would be to add a check_fake_dates parameter to these methods, which is ugly because this is an obscure parameter that most callers should not have to worry about. Unconditionally using check_fake_dates=True costs an attribute lookup and a .config.getbool() call on the first use and then a _fake_dates_enabled attribute lookup on all remaining uses for that instance. Here are times of an example command on the datalad repo: % python -m timeit -n100 \ -s "from datalad.support.gitrepo import GitRepo" \ -s "repo = GitRepo('.')" -- "repo.call_git(['ls-files'])" Results of two runs with `check_fake_dates=False`: 1) 100 loops, best of 5: 3.87 msec per loop 2) 100 loops, best of 5: 3.87 msec per loop Results of two runs with `check_fake_dates=True`: 1) 100 loops, best of 5: 3.97 msec per loop 2) 100 loops, best of 5: 3.96 msec per loop Closes datalad#3789.

This catches most _git_custom_command() usage outside of {git,annex}repo.py. utils_testrepos.py still has some.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gitrepo: Expose methods for direct git calls #3791

gitrepo: Expose methods for direct git calls #3791

Commits on Oct 19, 2019