apparently publish doesn't provide --jobs to annex copy #40

Open
yarikoptic opened this issue Sep 27, 2019 · 3 comments · Fixed by datalad/datalad#4206

Comments

@yarikoptic
Member

For better (faster) or for worse (it uncovers bugs in datalad/git-annex and might trigger site connection limits), we parallelize get by default. But apparently we do not do that for publish (code: https://github.com/datalad/datalad/blob/master/datalad/support/annexrepo.py#L2929). I think we should be consistent and provide a similar --jobs for annex copy as well.
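
To make the request concrete, here is a minimal sketch of how a jobs setting could be forwarded to `git annex copy`. The helper name `build_annex_copy_cmd` and its parameters are hypothetical and are not part of datalad's API; the only thing taken from git-annex itself is the `--jobs=N` option.

```python
from typing import List, Optional


def build_annex_copy_cmd(remote: str,
                         jobs: Optional[int] = None,
                         extra: Optional[List[str]] = None) -> List[str]:
    """Assemble a `git annex copy` command line (hypothetical helper)."""
    cmd = ["git", "annex", "copy", f"--to={remote}"]
    if jobs is not None:
        if jobs < 1:
            raise ValueError("jobs must be a positive integer")
        # git-annex runs transfers in parallel when given --jobs=N
        cmd.append(f"--jobs={jobs}")
    # Any further options (e.g. --auto, --fast) are appended unchanged
    cmd.extend(extra or [])
    return cmd
```

With `jobs=None` the command is left exactly as it would be without parallelization, so callers that never ask for jobs see no change in behavior.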

@mih
Member

mih commented Oct 1, 2019

Ping datalad/datalad#3415

mih referenced this issue in mih/datalad Mar 2, 2020
- general approach is: push main branch -> copy -> push git-annex branch

  This will expose any history issues (missing pieces, conflicts) that
  could possibly invalidate local decision making. push() will fail early,
  allowing for fixes (e.g. update(merge=True)), and then reattempt.
  The annex branch is pushed last, after file transfer is completed.
  It is the least critical part, because annex will update availability
  info on the remote end on its own, as part of the transfer.

- push != sync: generally, changes will only go from local to remote.
  However, in corner cases it is necessary to use `annex sync`
  internally to consolidate the git-annex branch or corresponding
  branches.

- perform data transfer via async-call to `annex copy`, not via
  AnnexRepo.copy_to() which performs too many inspections and reporting
  decisions.

- current approach can pass many paths to `annex copy`, so I opted for
  a temp file that is used as stdin for a batch-mode process of `annex copy`.
  This saves result merges across the alternative 'file chunk' runs.

- support push to empty repos (fixes dataladgh-4074)

- implement tests largely without `create_sibling`, because it doesn't work
  on Windows

- support for managed branches

- pass --jobs to git-annex copy (fixes gh-3732)
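
The ordering described in the commit message (push main branch, then copy, then push the git-annex branch, failing early) can be sketched roughly as follows. This is an illustration only, not the code from the referenced commit: `plan_push`, `run_plan`, and `subprocess_runner` are hypothetical names, and the plain `git push` / `git annex copy` command lines are assumptions about what each step boils down to.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def plan_push(remote: str, branch: str,
              jobs: Optional[int] = None) -> List[List[str]]:
    """Return the three steps in order: branch, annexed content, git-annex branch."""
    copy_cmd = ["git", "annex", "copy", f"--to={remote}"]
    if jobs:
        # Forward the requested parallelism to git-annex
        copy_cmd.append(f"--jobs={jobs}")
    return [
        ["git", "push", remote, branch],       # exposes history problems first
        copy_cmd,                              # file transfer
        ["git", "push", remote, "git-annex"],  # least critical, pushed last
    ]


def run_plan(plan: Sequence[Sequence[str]], runner: Runner) -> List[List[str]]:
    """Run steps in order, stop at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    done: List[List[str]] = []
    for cmd in plan:
        if runner(cmd) != 0:
            break  # fail early so the caller can fix things and retry
        done.append(list(cmd))
    return done


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Execute a step for real; any callable returning an exit code works."""
    return subprocess.call(list(cmd))
```

Passing the runner in as a callable keeps the ordering logic testable without touching a real repository; `run_plan(plan_push("origin", "master", jobs=4), subprocess_runner)` would execute the steps for real.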
mih referenced this issue in mih/datalad Mar 3, 2020
mih referenced this issue in mih/datalad Mar 6, 2020
mih referenced this issue in mih/datalad Mar 8, 2020
mih referenced this issue in mih/datalad Mar 8, 2020
mih referenced this issue in mih/datalad Mar 9, 2020
mih referenced this issue in mih/datalad Mar 10, 2020
@yarikoptic
Member Author

This issue is a twin brother of datalad/datalad#4704 (push), which presumably (according to the closure above by @mih) was addressed in push. Since publish is still around, I am reopening this issue.

-J simply does not seem to be passed into the git-annex copy invocation:

$ datalad publish --to=datalad-public -r -J4

results in

yoh      2174564 45.6  0.0 1074080556 51508 pts/6 Sl+ 12:29   0:05  |               \_ /usr/lib/git-annex.linux/exe/git-annex --library-path /usr/lib/git-annex.linux//usr/lib/x86_64-linux-gnu/gconv:/usr/lib/git-annex.linux//usr/lib/x86_64-linux-gnu/audit:/usr/lib/git-annex.linux//etc/ld.so.conf.d:/usr/lib/git-annex.linux//lib64:/usr/lib/git-annex.linux//usr/lib/x86_64-linux-gnu:/usr/lib/git-annex.linux//lib/x86_64-linux-gnu: /usr/lib/git-annex.linux/shimmed/git-annex/git-annex copy -c annex.dotfiles=true -c remote.datalad-public.annex-ssh-options=-o ControlMaster=auto -S /home/yoh/.cache/datalad/sockets/b0da704a -c annex.retry=3 --json --json-error-messages --json-progress --to=datalad-public --auto --fast

So no -J option is provided. This is with datalad 0.13.0.dev44.
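
The same check can be done programmatically on a captured argument list, for example in a regression test. The sketch below is an assumption-laden illustration: `find_jobs_option` is a hypothetical helper, and it only looks for the spellings `-JN`, `-J N`, `--jobs=N`, and `--jobs N`.

```python
from typing import Optional, Sequence


def find_jobs_option(argv: Sequence[str]) -> Optional[str]:
    """Return the jobs value found in argv, or None if no jobs option is present."""
    for i, arg in enumerate(argv):
        if arg.startswith("--jobs="):
            return arg.split("=", 1)[1]
        if arg in ("-J", "--jobs"):
            # Value is given as the next argument, if there is one
            return argv[i + 1] if i + 1 < len(argv) else None
        if arg.startswith("-J") and len(arg) > 2:
            return arg[2:]
    return None


# Options taken from the process listing above (library paths and ssh options omitted)
reported = [
    "git-annex", "copy", "-c", "annex.dotfiles=true", "-c", "annex.retry=3",
    "--json", "--json-error-messages", "--json-progress",
    "--to=datalad-public", "--auto", "--fast",
]
```

For the reported invocation, `find_jobs_option(reported)` returns `None`, matching the observation that `-J4` never reached git-annex.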

@yarikoptic yarikoptic reopened this Jul 14, 2020
@mih
Member

mih commented Oct 18, 2021

Moving this to -deprecated

@mih mih transferred this issue from datalad/datalad Oct 18, 2021