Skip to content

push: Assume 0 bytes pushed if git-annex does not provide bytesize#7049

Merged
yarikoptic merged 2 commits intodatalad:maintfrom
yarikoptic:bf-push-bytesize
Sep 21, 2022
Merged

push: Assume 0 bytes pushed if git-annex does not provide bytesize#7049
yarikoptic merged 2 commits intodatalad:maintfrom
yarikoptic:bf-push-bytesize

Conversation

@yarikoptic
Copy link
Copy Markdown
Member

Otherwise we can get KeyError and underlying code can provide records without bytesize. Reported in https://neurostars.org/t/datalad-push-causes-keyerror-bytesize-keyerror/23570 with

datalad==0.15.4
git==2.35.0
git-annex==8.20210903

first I failed to reproduce with the following script following user report with --fast
#!/bin/bash
# https://neurostars.org/t/datalad-push-causes-keyerror-bytesize-keyerror/23570
set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
set -eu

datalad create -f .

git annex addurl --fast --file sample3.csv https://filesamples.com/samples/document/csv/sample3.csv
datalad save sample3.csv

git remote add --fetch gin git@gin.g-node.org:/yarikoptic/test-push.git
datalad -l debug push --to gin
but then reproduced nicely with --relaxed
#!/bin/bash
# https://neurostars.org/t/datalad-push-causes-keyerror-bytesize-keyerror/23570
set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
set -eu

datalad create -f .

git annex addurl --relaxed --file sample3.csv https://filesamples.com/samples/document/csv/sample3.csv
datalad save sample3.csv

ls -l 

git remote add --fetch gin git@gin.g-node.org:/yarikoptic/test-push.git
datalad push --to gin --force=all

and verified that with this PR we do not crash. Given that git-annex unlikely to even provide progress report if not enough time passed, I think I would not bother crafting a unittest which would trigger this condition atm.

Otherwise we can get KeyError and underlying code can
provide records without bytesize
@yarikoptic yarikoptic added cmd-push semver-patch Increment the patch version when merged labels Sep 20, 2022
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Sep 20, 2022
Inspired by a fix we needed for push:
datalad#7049

I looked into status to realize that we do not even consider those files
for which git-annex would not report bytesize.  This commit makes them
considered in count but would assume 0 bytesize for "recorded" etc.

It would not address the issue that we would still not
consider/interrogate sizes for those files even if their content
available:

	$> datalad status --annex=all
	2 annex'd files (0.0 B/0.0 B present/total size)

	$> ls -lL *
	-r-------- 1 yoh yoh   0 Sep 20 15:52 123.dat
	-r-------- 1 yoh yoh 723 Sep 20 15:57 sample3.csv

where sample3.csv is the relaxed one.
@yarikoptic yarikoptic added the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 20, 2022
@github-actions github-actions Bot removed the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 20, 2022
@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 20, 2022

Codecov Report

Base: 45.45% // Head: 88.36% // Increases project coverage by +42.90% 🎉

Coverage data is based on head (69786a6) compared to base (3c220d6).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##            maint    #7049       +/-   ##
===========================================
+ Coverage   45.45%   88.36%   +42.90%     
===========================================
  Files         354      354               
  Lines       58559    46336    -12223     
  Branches     6615        0     -6615     
===========================================
+ Hits        26618    40943    +14325     
+ Misses      31779     5393    -26386     
+ Partials      162        0      -162     
Impacted Files Coverage Δ
datalad/core/distributed/push.py 91.22% <100.00%> (+30.16%) ⬆️
datalad/cmdline/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/interface/run_procedure.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/cmdline/helpers.py 0.00% <0.00%> (-78.27%) ⬇️
datalad/tests/test_tests_utils.py 0.00% <0.00%> (-69.13%) ⬇️
datalad/support/nda_.py 0.00% <0.00%> (-54.55%) ⬇️
datalad/tests/utils.py 20.50% <0.00%> (-36.46%) ⬇️
datalad/distribution/uninstall.py 63.26% <0.00%> (-28.58%) ⬇️
datalad/api.py 75.60% <0.00%> (-17.73%) ⬇️
datalad/__main__.py 39.65% <0.00%> (-13.80%) ⬇️
... and 277 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@yarikoptic
Copy link
Copy Markdown
Member Author

uff all travis fails seems to be due to unrelated

____________________________ test_run_datalad_help _____________________________
[gw0] linux -- Python 3.8.6 /tmp/dl-miniconda-q9fqns6s/bin/python
    @assert_cwd_unchanged
    def test_run_datalad_help():
        out, err = check_run_and_get_output("datalad --help")
        ok_startswith(out, "Usage: ")
        # There could be a warning from coverage that no data was collected, should be benign
        lines = [l for l in err.split(os.linesep) if ('no-data-collected' not in l) and l]
>       eq_(lines, [])
../datalad/tests/test_installed.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
first = ["/tmp/dl-miniconda-q9fqns6s/lib/python3.8/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 ...ported version!", '  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn\'t match a supported "']
second = [], msg = None
    def assert_equal(first, second, msg=None):
        if msg is None:
>           assert first == second
E           assert ['/tmp/dl-min... supported "'] == []
E             Left contains 2 more items, first extra item: "/tmp/dl-miniconda-q9fqns6s/lib/python3.8/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (5.0.0)/charset_normalizer (2.0.4) doesn't match a supported version!"
E             Full diff:
E               [
E             -  ,
E             +  '/tmp/dl-miniconda-q9fqns6s/lib/python3.8/site-packages/requests/__init__.py:102: '
E             +  'RequestsDependencyWarning: urllib3 (1.26.8) or chardet '
E             +  "(5.0.0)/charset_normalizer (2.0.4) doesn't match a supported version!",
E             +  '  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) '
E             +  'doesn\'t match a supported "',
E               ]
../datalad/tests/utils_pytest.py:107: AssertionError

so may be it was too early for 0f27cd5 in #7029 ?

@yarikoptic
Copy link
Copy Markdown
Member Author

no objections, the fix is trivial, let's proceed and retry releasing again!

@yarikoptic yarikoptic added the release Create a release when this pr is merged label Sep 21, 2022
@yarikoptic
Copy link
Copy Markdown
Member Author

@jwodder -- note that here as well we didn't have github actions ran and changelog entry was pushed by github-actions bot although I thought we should have switched to that yarikoptic-gitmate token/account... will need to check/keep an eye...

@yarikoptic yarikoptic merged commit 5128733 into datalad:maint Sep 21, 2022
@yarikoptic-gitmate
Copy link
Copy Markdown
Collaborator

PR released in 0.17.6

@yarikoptic yarikoptic deleted the bf-push-bytesize branch October 14, 2022 12:59
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Jun 9, 2023
Inspired by a fix we needed for push:
datalad#7049

I looked into status to realize that we do not even consider those files
for which git-annex would not report bytesize.  This commit makes them
considered in count but would assume 0 bytesize for "recorded" etc.

It would not address the issue that we would still not
consider/interrogate sizes for those files even if their content
available:

	$> datalad status --annex=all
	2 annex'd files (0.0 B/0.0 B present/total size)

	$> ls -lL *
	-r-------- 1 yoh yoh   0 Sep 20 15:52 123.dat
	-r-------- 1 yoh yoh 723 Sep 20 15:57 sample3.csv

where sample3.csv is the relaxed one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cmd-push release Create a release when this pr is merged semver-patch Increment the patch version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants