New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF/ENH: get_size_from_key to match specification #4081
Conversation
Suck in implementation from git-annex-ria-remote. Former implementation couldn't correctly deal with chunks for example. See https://git-annex.branchable.com/internals/key_format/ for specification.
Codecov Report
@@ Coverage Diff @@
## master #4081 +/- ##
==========================================
+ Coverage 89.71% 89.77% +0.05%
==========================================
Files 274 275 +1
Lines 36711 37014 +303
==========================================
+ Hits 32934 33228 +294
- Misses 3777 3786 +9
Continue to review full report at Codecov.
|
When looking at datalad/git-annex-ria-remote#19, I recall working through this pretty closely. I'll lazily transfer my approval rather than closely looking at it again :]
My main comment is that this sort of method is very amenable to comprehensive unit tests. It'd be great to include some.
datalad/support/annexrepo.py
Outdated
# has no 2nd field in the key | ||
|
||
# TODO: this method can be more compact. we don't need particularly | ||
# elaborated error distinction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this is referring to. I spot only one exception that's raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't remember. Might be a leftover from an earlier version. Will have a closer look when I write the tests for it and delete, if I can't decode that comment again (otherwise clarify or resolve it, of course) ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below. @yarikoptic kind of found what this was about. ;-)
datalad/support/annexrepo.py
Outdated
S = int(field[1:]) if field[1:].isdigit() else None | ||
elif field.startswith('C'): | ||
# we have a chunk, this is it's number: | ||
C = int(field[1:]) if field[1:].isdigit() else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am I missing some difference or all 3 lines are identical and just assign to different variables? if there is no difference, then I would have used something like
if field and field[0] in 'sSC':
value = int(field[1:]) if field[1:].isdigit() else None
if field[0] == 's':
s = value
elif ...
to address code duplication. or might have even got brave for something like
if field and field[0] in 'sSC':
locals().update({field[0]: int(field[1:]) if field[1:].isdigit() else None})
;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! And that's exactly what the mysterious comment was about. The case distinction is needed only if we want to be more specific in "parsing" errors. Since we don't (I think) in can be "compacted" this way. Doing it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: locals()
can't be updated. So, changed to a dict comprehension but updating local variables "manually".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR:
locals()
can't be updated.
d'oh. indeed. why did I think they could?
demo
$> cat /tmp/11.py
def func():
c = 1
locals().update({'c': 2})
print("c=%s" % c)
func()
$> python /tmp/11.py
c=1
if C <= int(s / S): | ||
return S | ||
else: | ||
return s % S |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a nice little function to unittest ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Plus: Fix a minor bug, where an invalid key would result in None instead of ValueError
datalad/support/annexrepo.py
Outdated
# don't lookup the dict for the same things several times; | ||
# Is there a faster (and more compact) way of doing this? Note, that | ||
# locals() can't be updated. | ||
s = parsed['s'] if 's' in parsed.keys() else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parsed.get('s')
?
I know this was in response to a review comment from Yarik, but I must admit that I don't find the newer version to be an improvement in terms of readability. Perhaps a compromise between this and the previous form would be an internal function that encapsulates int(field[1:]) if field[1:].isdigit() else None
while keeping the expanded ifelif
branches.
Anyway, it's a minor thing either way, and I'm fine with how it is. (I'm just glad you didn't use locals()
:] )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parsed.get('s')
WTH? Why didn't I think of it? :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to get
, but otherwise will leave as is. Kind of agree with the readability aspect, but it's not too bad and TBH I don't really like additional function calls for that.
Suck in implementation from git-annex-ria-remote.
Former implementation couldn't correctly deal with chunks for example.
See https://git-annex.branchable.com/internals/key_format/ for specification.