Duplicate get error result (reporting) #5537

Closed
mih opened this issue Mar 30, 2021 · 4 comments

mih commented Mar 30, 2021

Get a file with invalid filename (extension missing):

$ datalad clone git@github.com:datalad-datasets/human-connectome-project-openaccess.git hcp
$ cd hcp
$ datalad -f json_pp get HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii
# and again when the subdatasets are present
$ datalad -f json_pp get HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii
{
  "action": "get",
  "message": "path does not exist",
  "path": "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii",
  "refds": null,
  "status": "impossible"
}
{
  "action": "get",
  "message": "path does not exist",
  "path": "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii",
  "refds": null,
  "status": "impossible"
}

It is unclear why a second attempt is made when the first attempt already reports "path does not exist".

mih added the severity-normal and cmd-get labels Mar 30, 2021

mih commented Mar 30, 2021

The reason is that the main subdatasets call in get (https://github.com/datalad/datalad/blob/maint/datalad/distribution/get.py#L847-L860) yields two results with candidate subdatasets, and an annex-get is attempted in each of them.

iLOOPDS {'action': 'subdataset', 'type': 'dataset', 'status': 'ok', 'gitshasum': 'f3ce9b6abdec317b03e13ef546a92b60ddbe9a1c', 'path': '/tmp/hcp/HCP1200/705341', 'gitmodule_url': './HCP1200/705341', 'gitmodule_branch': 'master', 'gitmodule_datalad-id': '5c5cf266-21d1-11ea-a4b1-002590496000', 'gitmodule_name': '705341', 'parentds': '/tmp/hcp', 'contains': ['/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii'], 'refds': '/tmp/hcp'}
get(impossible): /tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii [path does not exist]
iLOOPDS {'action': 'subdataset', 'type': 'dataset', 'status': 'ok', 'gitshasum': '7da55b1c3eabe7d9ef8ec211629faa0dc0e9712c', 'path': '/tmp/hcp/HCP1200/705341/MNINonLinear', 'gitmodule_url': './MNINonLinear', 'gitmodule_datalad-id': '693838c4-21d1-11ea-871b-002590496000', 'gitmodule_name': 'MNINonLinear', 'parentds': '/tmp/hcp/HCP1200/705341', 'contains': ['/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii'], 'refds': '/tmp/hcp'}
get(impossible): /tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii [path does not exist]

As seen above, the subdatasets are tested in the "wrong" order, seemingly.

The reason given in the code:

                # always come from the top to get sensible generator behavior
                bottomup=False,

does not really explain anything. However, the behavior makes more sense when starting from a situation where no subdatasets are installed, although still not completely.

I guess a unit test for this behavior is in order.
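
A minimal sketch of what such a test could check, assuming the result records are available as plain dicts shaped like the JSON output above. The helper name `find_duplicate_results` and the use of (action, path, status) as the identity of a result are assumptions for illustration only, not part of DataLad.

```python
from collections import Counter


def find_duplicate_results(results):
    """Return the (action, path, status) keys that occur more than once.

    Hypothetical helper: `results` is any iterable of result-record dicts.
    """
    counts = Counter(
        (r.get("action"), r.get("path"), r.get("status")) for r in results
    )
    return [key for key, n in counts.items() if n > 1]


# Records modeled on the output reported in this issue:
# two identical 'impossible' results for the same path.
path = (
    "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/"
    "rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii"
)
reported = [
    {"action": "get", "message": "path does not exist", "path": path,
     "refds": None, "status": "impossible"},
    {"action": "get", "message": "path does not exist", "path": path,
     "refds": None, "status": "impossible"},
]
duplicates = find_duplicate_results(reported)
```

In an actual test the list would be collected from a get call on a non-existing path (with failures not raised as exceptions), and the assertion would be that `find_duplicate_results(...)` comes back empty.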

mih commented Mar 30, 2021

More detail: The error is coming out of _install_targetpath() at https://github.com/datalad/datalad/blob/master/datalad/distribution/get.py#L929, where it is still about obtaining the necessary subdatasets, not about getting file content. So one approach could be to add a check at https://github.com/datalad/datalad/blob/master/datalad/distribution/get.py#L939 for whether we have already reported a particular path as not being present. Or, maybe even better, prevent calling _install_targetpath() a second time when there is no hope for improvement. However, that might be tricky to determine; we might need to track the installation of relevant intermediate subdatasets.
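
The first idea (remembering which paths were already reported as missing) could look roughly like the sketch below. It is only an illustration on plain result dicts, not the code in get.py; the function name `skip_repeated_impossible` and the (action, path) key are assumptions.

```python
def skip_repeated_impossible(results):
    """Yield result records, dropping repeats of an already reported missing path.

    Hypothetical filter: only 'impossible' results are deduplicated,
    everything else passes through untouched.
    """
    reported = set()  # (action, path) pairs already yielded as 'impossible'
    for res in results:
        if res.get("status") == "impossible":
            key = (res.get("action"), res.get("path"))
            if key in reported:
                continue  # same path was already reported, stay silent
            reported.add(key)
        yield res


# One record per candidate subdataset, as in the debug output above
missing = "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii"
raw = [
    {"action": "get", "path": missing, "status": "impossible",
     "message": "path does not exist"},
    {"action": "get", "path": missing, "status": "impossible",
     "message": "path does not exist"},
]
filtered = list(skip_repeated_impossible(raw))
```

This only hides the duplicate report. It does not avoid the second, pointless attempt itself, which is what the second idea is about.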

mih commented Nov 1, 2021

Still happening.

bpoldrack added a commit to jsheunis/datalad that referenced this issue Oct 21, 2022
to make sure `get` does not yield redundant `impossible` results if a
not existing path is given.
adswa pushed a commit to adswa/datalad that referenced this issue Oct 26, 2022
to make sure `get` does not yield redundant `impossible` results if a
not existing path is given.
yarikoptic commented

Seems to be fixed, and just was not closed by #7093 because that PR was against maint, uff

❯ datalad clone git@github.com:datalad-datasets/human-connectome-project-openaccess.git hcp
install(ok): /tmp/hcp (dataset)                                                                                                  
❯ cd hcp
DATA_USE_AGREEMENT.md  HCP1200/  README.md
❯ datalad -f json_pp get HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii
{                                                                                                                                
  "action": "install",
  "message": [
    "Installed subdataset in order to get %s",
    "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii"
  ],
  "path": "/tmp/hcp/HCP1200/705341",
  "source": {
    "default_destpath": "5c5cf266-21d1-11ea-a4b1-002590496000",
    "giturl": "http://store.datalad.org/5c5/cf266-21d1-11ea-a4b1-002590496000",
    "source": "ria+http://store.datalad.org#5c5cf266-21d1-11ea-a4b1-002590496000",
    "type": "ria",
    "version": null
  },
  "status": "ok",
  "type": "dataset"
}
[INFO   ] scanning for annexed files (this may take some time)                                                                   
{
  "action": "install",
  "message": [
    "Installed subdataset in order to get %s",
    "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii"
  ],
  "path": "/tmp/hcp/HCP1200/705341/MNINonLinear",
  "source": {
    "default_destpath": "693838c4-21d1-11ea-871b-002590496000",
    "giturl": "http://store.datalad.org/693/838c4-21d1-11ea-871b-002590496000",
    "source": "ria+http://store.datalad.org#693838c4-21d1-11ea-871b-002590496000",
    "type": "ria",
    "version": null
  },
  "status": "ok",
  "type": "dataset"
}
{
  "action": "get",
  "message": "path does not exist",
  "path": "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii",
  "refds": "/tmp/hcp",
  "status": "impossible"
}
datalad -f json_pp get   5.14s user 1.39s system 55% cpu 11.692 total
❯ datalad -f json_pp get HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii
{
  "action": "get",
  "message": "path does not exist",
  "path": "/tmp/hcp/HCP1200/705341/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii",
  "refds": "/tmp/hcp",
  "status": "impossible"
}
❯ datalad --version
datalad 0.17.8+39.ga8b4ae996
