
subdataset cannot be retrieved #7274

Closed
anikfal opened this issue Jan 30, 2023 · 3 comments · Fixed by #7280

anikfal commented Jan 30, 2023

On the server:

    datalad siblings
    .: here(+) [git]
    .: gitlab_abc(-) [https://gitlab.jsc.fz-juelich.de/detect/detect_z03_z04/abc.git (git)]

On my local machine, after cloning from GitLab:

    git remote add abc_jureca ssh://nikfal1@jureca.fz-juelich.de:/path/to/ABC

    datalad siblings
    .: here(+) [git]
    .: origin(-) [git@gitlab.jsc.fz-juelich.de:detect/detect_z03_z04/abc.git (git)]
    .: abc_jureca(+) [ssh://nikfal1@jureca.fz-juelich.de:/path/to/ABC (git)]

In the superdataset, `datalad get ...` retrieves the data. However, in the subdataset (mysub), the following error occurs:

    cd subdataset
    datalad get -n .

install(error): /home/anikfal/test_detect/abc/mysub (dataset) [Failed to clone from any candidate source URL. Encountered errors per each url were:

  • git@gitlab.jsc.fz-juelich.de:detect/detect_z03_z04/abc.git/mysub
    CommandError: 'git -c diff.ignoreSubmodules=none clone --progress git@gitlab.jsc.fz-juelich.de:detect/detect_z03_z04/abc.git/mysub /home/anikfal/test_detect/abc/mysub' failed with exitcode 128 [err: 'Cloning into '/home/anikfal/test_detect/abc/mysub'...
    remote:
    remote: ========================================================================
    remote:
    remote: The namespace you were looking for could not be found.
    remote:
    remote: ========================================================================
    remote:
    CommandError: 'ssh -o ControlPath=/home/anikfal/.cache/datalad/sockets/4f241b93 -o SendEnv=GIT_PROTOCOL git@gitlab.jsc.fz-juelich.de 'git-upload-pack '"'"'detect/detect_z03_z04/abc.git/mysub'"'"''' failed with exitcode 1
    fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.']]
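The only candidate listed in the log is the superdataset's `origin` URL with the subdataset path `mysub` appended. The sketch below reproduces that string with a hypothetical `naive_candidate` helper; it only illustrates the pattern visible in the log and is not datalad's actual candidate-URL logic.

```python
def naive_candidate(super_url: str, submodule_path: str) -> str:
    """Hypothetical helper: append a submodule path to a superdataset URL."""
    # Drop a trailing slash so the join never produces '//'.
    return f"{super_url.rstrip('/')}/{submodule_path}"


origin = "git@gitlab.jsc.fz-juelich.de:detect/detect_z03_z04/abc.git"
print(naive_candidate(origin, "mysub"))
# -> git@gitlab.jsc.fz-juelich.de:detect/detect_z03_z04/abc.git/mysub
```

Appending a subdirectory is plausible for a plain SSH location such as the `abc_jureca` sibling, where `mysub` is presumably a directory inside the dataset. GitLab, however, serves each repository at its own namespace path, so `abc.git/mysub` does not name a repository, which is consistent with the "namespace you were looking for could not be found" message.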

adswa (Member) commented Jan 31, 2023

Thanks for the issue, and for presenting it in the office hour!
We narrowed it down to, at the very least, a documentation failure. While `get`'s docstring says

A template string assigned to such a variable can utilize the Python format
mini language and may reference a number of properties that are inferred
from the parent dataset's knowledge about the target subdataset. Properties
include any submodule property specified in the respective `.gitmodules`
record. For convenience, an existing `datalad-id` record is made available
under the shortened name ID.

suggesting that `path` should be available as a template property, adding a `{path}` placeholder to the template causes a KeyError.
The relevant piece of code is

    sm_candidate_props = {
        k[10:].replace('datalad-id', 'id'): v
        for k, v in sm.items()
        if k.startswith('gitmodule_')
    }

In a debugger session, the two values look like this:

(Pdb) p sm_candidate_props
{'url': './HCP1200/180230', 'branch': 'master', 'id': '1ec203bc-27c9-11ea-8fcd-002590496000', 'name': '180230', 'remoteurl-origin': 'git@github.com:datalad-datasets/human-connectome-project-openaccess.git'}
(Pdb) sm.items()
dict_items([('action', 'subdataset'), ('type', 'dataset'), ('status', 'ok'), ('path', '/home/adina/repos/data/human-connectome-project-openaccess/HCP1200/180230'), ('gitshasum', '76fc97851ed6d61f7b149f64a415a4dd6a725270'), ('gitmodule_url', './HCP1200/180230'), ('gitmodule_branch', 'master'), ('gitmodule_datalad-id', '1ec203bc-27c9-11ea-8fcd-002590496000'), ('gitmodule_name', '180230'), ('state', 'absent'), ('parentds', '/home/adina/repos/data/human-connectome-project-openaccess'), ('contains', ['/home/adina/repos/data/human-connectome-project-openaccess/HCP1200/180230']), ('refds', '/home/adina/repos/data/human-connectome-project-openaccess')])

So instead of the `path` key from `.gitmodules`, the template currently has to use `name`. This is at least a documentation issue that should be fixed in `get`'s docstring and in the handbook.
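The KeyError can be reproduced outside of datalad with the comprehension quoted above and a trimmed copy of the record from the debugger session. The `expand` helper and the host `https://example.com/datasets/` are made-up placeholders for illustration; only the `{name}` and `{path}` placeholders matter here.

```python
# Trimmed copy of the submodule record shown in the debugger output.
sm = {
    "path": "/home/adina/repos/data/human-connectome-project-openaccess/HCP1200/180230",
    "gitmodule_url": "./HCP1200/180230",
    "gitmodule_branch": "master",
    "gitmodule_datalad-id": "1ec203bc-27c9-11ea-8fcd-002590496000",
    "gitmodule_name": "180230",
}

# Same comprehension as in get: keep only 'gitmodule_*' keys, strip the
# 10-character prefix, and rename 'datalad-id' to 'id'. The unprefixed
# 'path' key is filtered out here.
sm_candidate_props = {
    k[10:].replace("datalad-id", "id"): v
    for k, v in sm.items()
    if k.startswith("gitmodule_")
}


def expand(template: str) -> str:
    """Hypothetical helper: fill a source-candidate template."""
    return template.format(**sm_candidate_props)


print(expand("https://example.com/datasets/{name}"))
# -> https://example.com/datasets/180230
try:
    expand("https://example.com/datasets/{path}")
except KeyError as e:
    print("KeyError:", e)
# -> KeyError: 'path'
```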

bpoldrack (Member) commented:

To add to this: it's not just a documentation issue. `path` is supposed, and expected, to be in that record. I'm looking into why it's missing; it should be coming from `subdatasets`.

bpoldrack added a commit to bpoldrack/datalad that referenced this issue Feb 7, 2023
template

The `path` property was treated as a special case in
`GitRepo.get_submodules_`, presumably so as not to yield redundant
information: what is returned is a dict where the path is the key and the
other properties from `.gitmodules` make up the value (another dict),
prefixed with `gitmodule_`.

However, actual usage from `get` via `subdatasets` suggests that the
easiest approach is to just yield this "redundant" record: once as a
reported property, exactly like `url`, `datalad-id` and everything else,
and additionally keep the path as the key. Throughout that chain the latter
possibly gets turned into an absolute path anyway (which is not what we
want to report here; it's just for internal use).

Closes datalad#7274
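The commit message describes the fix as reporting `path` as a regular `gitmodule_` property while also keeping it as the key. The following is a hypothetical, heavily simplified sketch of that idea, not the actual `GitRepo.get_submodules_` code: `iter_submodules`, its `keep_path_property` flag, `expand`, and the dict-of-dicts input format are assumptions made for illustration, with sample values derived from the debugger output earlier in this thread.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical stand-in for parsed .gitmodules content: one section per
# submodule name.
GITMODULES: Dict[str, Dict[str, str]] = {
    "180230": {
        "path": "HCP1200/180230",
        "url": "./HCP1200/180230",
        "branch": "master",
        "datalad-id": "1ec203bc-27c9-11ea-8fcd-002590496000",
    },
}


def iter_submodules(
    gitmodules: Dict[str, Dict[str, str]], keep_path_property: bool
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (path, props) pairs with properties prefixed by 'gitmodule_'."""
    for name, section in gitmodules.items():
        props = {"gitmodule_name": name}
        for key, value in section.items():
            if key == "path" and not keep_path_property:
                # Old behavior: the path is only used as the record's key.
                continue
            props[f"gitmodule_{key}"] = value
        yield section["path"], props


def candidate_props(props: Dict[str, str]) -> Dict[str, str]:
    """Mirror of the comprehension quoted from get."""
    return {
        k[10:].replace("datalad-id", "id"): v
        for k, v in props.items()
        if k.startswith("gitmodule_")
    }


def expand(template: str, keep_path_property: bool) -> str:
    """Expand a template for the single example submodule."""
    _, props = next(iter_submodules(GITMODULES, keep_path_property))
    return template.format(**candidate_props(props))


print(expand("{name}: {path}", keep_path_property=True))
# -> 180230: HCP1200/180230
try:
    expand("{path}", keep_path_property=False)
except KeyError as e:
    print("old behavior raises KeyError:", e)
```

With the path reported as a property, a `{path}` placeholder resolves to the path as recorded in `.gitmodules`, relative to the superdataset, rather than to the absolute path that is only used internally.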
asmacdo pushed a commit to asmacdo/datalad that referenced this issue Mar 16, 2023
yarikoptic-gitmate (Collaborator) commented:

Issue fixed in 0.18.3


4 participants