Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type-annotate datalad/support/gitrepo.py #7341

Merged
merged 3 commits into from
Apr 25, 2023

Conversation

jwodder
Copy link
Member

@jwodder jwodder commented Mar 20, 2023

Part of #6884.

@jwodder jwodder added semver-internal Changes only affect the internal API CHANGELOG-missing When a PR's description does not contain a changelog item, yet. labels Mar 20, 2023
@github-actions github-actions bot removed the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Mar 20, 2023
@codecov
Copy link

codecov bot commented Mar 20, 2023

Codecov Report

Patch coverage: 94.62% and project coverage change: +0.08 🎉

Comparison is base (13f5649) 88.30% compared to head (4ef4a33) 88.38%.

Additional details and impacted files
@@            Coverage Diff             @@
##            maint    #7341      +/-   ##
==========================================
+ Coverage   88.30%   88.38%   +0.08%     
==========================================
  Files         327      327              
  Lines       44629    44869     +240     
  Branches        0     5916    +5916     
==========================================
+ Hits        39409    39658     +249     
+ Misses       5220     5196      -24     
- Partials        0       15      +15     
Impacted Files Coverage Δ
datalad/support/gitrepo.py 92.03% <94.31%> (+0.29%) ⬆️
datalad/support/path.py 97.39% <100.00%> (ø)
datalad/typing.py 100.00% <100.00%> (ø)

... and 51 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few possible suggestions, otherwise looks great, thank you @jwodder !

@@ -214,8 +253,12 @@ def _normalize_path(base_dir, path):
return relpath(path, start=base_dir)


class _WithPath(Protocol):
path: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to move it to datalad/typing.py as plain WithPath and may be even WithPathT since it seems you use those whenever construct is not clearly a type, but could be some class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there functions in other files that operate on any type that has a path attribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess so, in particular in annexrepo.py since that one is close companion of this one. But also lots of files listed in hit for git grep -l '\.path\>'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • There are only two top-level functions in annexrepo.py, and they both operate on IO handles rather than WithPath.
  • A large number of hits for that regex doesn't mean that WithPath will be useful; most of those can be expected to operate on specific classes.

@@ -397,21 +440,23 @@ class GitProgress(WitlessProtocol):
CHECKING_OUT: ("Check out", "Things"),
}

__slots__ = ('_seen_ops', '_pbars', '_encoding')
__slots__ = ('_unprocessed', '_seen_ops', '_pbars')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that it didn't error out before somehow since I was expecting that if __slots__ specified then assignment of unknown ones to lead to an error. oh well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitProgress inherits from WitlessProtocol, which does not use __slots__, and so GitProgress will still have a __dict__ attribute and support assignment of arbitrary attributes.

commitishes = commitishes + [self.get_active_branch()]
branch = self.get_active_branch()
if branch is None:
raise ValueError("Single commitish provided and no active branch")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Note: in prior cases we would have got TypeError though but since it is indeed illegit and unlikely invocation , should be ok. Just FTR

❯ python -c 'from datalad.support.gitrepo import GitRepo; print(GitRepo(".").call_git_oneline(["merge-base", "master", None]))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yoh/proj/datalad/datalad-maint/datalad/dataset/gitrepo.py", line 550, in call_git_oneline
    lines = list(self.call_git_items_(args, files=files,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/datalad/datalad-maint/datalad/dataset/gitrepo.py", line 517, in call_git_items_
    for file_no, line in self._generator_call_git(
  File "/home/yoh/proj/datalad/datalad-maint/datalad/dataset/gitrepo.py", line 345, in _generator_call_git
    generator = self._git_runner.run(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/datalad/datalad-maint/datalad/runner/runner.py", line 206, in run
    results_or_iterator = threaded_runner.run()
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/datalad/datalad-maint/datalad/runner/nonasyncrunner.py", line 343, in run
    return self._locked_run()
           ^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/datalad/datalad-maint/datalad/runner/nonasyncrunner.py", line 403, in _locked_run
    self.process = Popen(self.cmd, **kwargs)         # nosec
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.11/subprocess.py", line 1834, in _execute_child
    self.pid = _fork_exec(
               ^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType

if state in ('clean', 'modified', 'deleted', None):
# assign previous gitsha to any record
# state==None can only happen for subdatasets that
# already existed, so also assign a sha for them
assert from_sha is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 100% sure it must be not a None, but ok - let's hope it is so


class GitAddOutput(TypedDict):
file: str
success: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to myself and other @datalad/developers -- TypedDict is what would save our sanity in formalizing our return record dictionaries, which we produce using https://github.com/datalad/datalad/blob/HEAD/datalad/interface/results.py#L51 . TypedDict allows for NotRequired (not sure why not Optional) annotation for keys which do not have to be present.

if 'error-messages' in r else None,
#message='\n'.join(r['error-messages'])
#if 'error-messages' in r else None,
message=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: indeed r cannot have error-messages in its return since it is

    @staticmethod
    def _process_git_get_output(stdout, stderr=None):
        """Given both outputs (stderr is ignored atm) of git add - process it

        Primarily to centralize handling in both indirect annex and direct
        modes when ran through proxy
        """
        return [{u'file': f, u'success': True}
                for f in re.findall("'(.*)'[\n$]", ensure_unicode(stdout))]

so functionally this change would be correct and without side-effect.

@yarikoptic
Copy link
Member

ok, let's proceed! I had another look and although there are some changes to code, I think overall it just would make it more robust and prevent some loose/buggy behavior possible (but unlikely to be triggered) before.

@yarikoptic yarikoptic merged commit 5658f1c into datalad:maint Apr 25, 2023
@jwodder jwodder deleted the type-gitrepo branch May 12, 2023 19:38
@yarikoptic-gitmate
Copy link
Collaborator

PR released in 0.18.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-internal Changes only affect the internal API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants