Error when kachery sharing old analysis nwb files #914

Closed
samuelbray32 opened this issue Apr 3, 2024 · 3 comments · Fixed by #918
Labels
bug Something isn't working infrastructure Unix, MySQL, etc. settings/issues impacting users

Comments

@samuelbray32
Collaborator

samuelbray32 commented Apr 3, 2024

Describe the bug

  • When sharing with kachery in spyglass, the receiving client downloads analysis nwb files to the location returned by AnalysisNwbfile.get_abs_path().

  • Problem: Older analysis files that are not stored in a subdirectory have a datajoint filepath in the database that also lacks the subdirectory.

    • On the remote client, the datajoint filepath used to find the file during fetch_nwb therefore does not include the subdirectory.
    • This differs from the location where kachery downloaded the file, so a FileNotFoundError is raised.
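
The mismatch described above can be reproduced outside spyglass with plain paths. This is a minimal sketch, not spyglass code: `flat_path` and `subdir_path` are hypothetical helpers, and the rule used here to name the subdirectory is an assumption made only for illustration.

```python
from pathlib import Path
import tempfile


def flat_path(analysis_dir: Path, file_name: str) -> Path:
    """Layout of older entries: file directly in the analysis directory."""
    return analysis_dir / file_name


def subdir_path(analysis_dir: Path, file_name: str) -> Path:
    """Layout of newer entries: file inside a subdirectory.

    Assumption: the subdirectory is named after the part of the file name
    before the last underscore. The real naming rule may differ.
    """
    subdir = file_name.rsplit("_", 1)[0]
    return analysis_dir / subdir / file_name


with tempfile.TemporaryDirectory() as tmp:
    analysis_dir = Path(tmp)
    file_name = "j1620210710_FRL083NP3E.nwb"

    # The download step writes the file into the subdirectory layout ...
    downloaded = subdir_path(analysis_dir, file_name)
    downloaded.parent.mkdir(parents=True)
    downloaded.write_bytes(b"placeholder")

    # ... but the database entry for an older file points at the flat layout.
    recorded = flat_path(analysis_dir, file_name)

    download_found = downloaded.exists()
    recorded_found = recorded.exists()

print(download_found, recorded_found)
```

The file exists where it was downloaded but not where the recorded path points, which is the condition that surfaces as the FileNotFoundError in the stack below.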

Solution?
@CBroz1, do you know if the datajoint filepath for an entry can be determined without raising a file not found error? If so, we can use it to define where files are saved when downloading from kachery, which would keep the location consistent with the source database.

To Reproduce
On a remote client connected to the franklab database:

from spyglass.linearization.v0 import IntervalLinearizedPosition
from spyglass.common import PositionIntervalMap
key = {"nwb_file_name": nwb_file_name, "interval_list_name": interval_list_name}
pos_interval = (PositionIntervalMap & key).fetch1("position_interval_name")
lin_pos_key = {
    "nwb_file_name": nwb_file_name,
    "interval_list_name": pos_interval,
    "position_info_param_name": "default_decoding",
}
(IntervalLinearizedPosition & lin_pos_key).fetch_nwb()
Error Stack
{
	"name": "FileNotFoundError",
	"message": "[Errno 2] No such file or directory: '/Users/samuelbray/Documents/analysis/j1620210710_FRL083NP3E.nwb'",
	"stack": "---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/Users/samuelbray/Documents/frank_lab/code/Uri_20240220.ipynb Cell 15 line 1
     10 pos_interval = (PositionIntervalMap & key).fetch1(\"position_interval_name\")
     11 lin_pos_key = {\"nwb_file_name\": nwb_file_name,
     12            \"interval_list_name\": pos_interval,
     13            \"position_info_param_name\":\"default_decoding\"
     14            }
---> 16 (IntervalLinearizedPosition & lin_pos_key).fetch_nwb()

File ~/Documents/frank_lab/code/spyglass/src/spyglass/utils/dj_mixin.py:128, in SpyglassMixin.fetch_nwb(self, *attrs, **kwargs)
    120 def fetch_nwb(self, *attrs, **kwargs):
    121     \"\"\"Fetch NWBFile object from relevant table.
    122 
    123     Implementing class must have a foreign key reference to Nwbfile or
   (...)
    126     precedence.
    127     \"\"\"
--> 128     return fetch_nwb(self, self._nwb_table_tuple, *attrs, **kwargs)

File ~/Documents/frank_lab/code/spyglass/src/spyglass/utils/dj_helper_fn.py:189, in fetch_nwb(query_expression, nwb_master, *attrs, **kwargs)
    185     if not os.path.exists(file_path):
    186         # retrieve the file from kachery. This also opens the file and stores the file object
    187         get_nwb_file(file_path)
--> 189 rec_dicts = (
    190     query_expression * tbl.proj(nwb2load_filepath=attr_name)
    191 ).fetch(*attrs, \"nwb2load_filepath\", **kwargs)
    193 if not rec_dicts or not np.any(
    194     [\"object_id\" in key for key in rec_dicts[0]]
    195 ):
    196     return rec_dicts

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/fetch.py:231, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    229 attributes = [a for a in attrs if not is_key(a)]
    230 ret = self._expression.proj(*attributes)
--> 231 ret = ret.fetch(
    232     offset=offset,
    233     limit=limit,
    234     order_by=order_by,
    235     as_dict=False,
    236     squeeze=squeeze,
    237     download_path=download_path,
    238     format=\"array\",
    239 )
    240 if attrs_as_dict:
    241     ret = [
    242         {k: v for k, v in zip(ret.dtype.names, x) if k in attrs}
    243         for x in ret
    244     ]

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/fetch.py:291, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    288     raise e
    289 for name in heading:
    290     # unpack blobs and externals
--> 291     ret[name] = list(map(partial(get, heading[name]), ret[name]))
    292 if format == \"frame\":
    293     ret = pandas.DataFrame(ret).set_index(heading.primary_key)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/fetch.py:64, in _get(connection, attr, data, squeeze, download_path)
     61 adapt = attr.adapter.get if attr.adapter else lambda x: x
     63 if attr.is_filepath:
---> 64     return adapt(extern.download_filepath(uuid.UUID(bytes=data))[0])
     65 if attr.is_attachment:
     66     # Steps:
     67     # 1. get the attachment filename
     68     # 2. check if the file already exists at download_path, verify checksum
     69     # 3. if exists and checksum passes then return the local filepath
     70     # 4. Otherwise, download the remote file and return the new filepath
     71     _uuid = uuid.UUID(bytes=data) if attr.is_external else None

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/external.py:330, in ExternalTable.download_filepath(self, filepath_hash)
    324 file_exists = Path(local_filepath).is_file() and (
    325     not _need_checksum(local_filepath, size)
    326     or uuid_from_file(local_filepath) == contents_hash
    327 )
    329 if not file_exists:
--> 330     self._download_file(external_path, local_filepath)
    331     if (
    332         _need_checksum(local_filepath, size)
    333         and uuid_from_file(local_filepath) != contents_hash
    334     ):
    335         # this should never happen without outside interference
    336         raise DataJointError(
    337             f\"'{local_filepath}' downloaded but did not pass checksum.\"
    338         )

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/external.py:128, in ExternalTable._download_file(self, external_path, download_path)
    126     self.s3.fget(external_path, download_path)
    127 elif self.spec[\"protocol\"] == \"file\":
--> 128     safe_copy(external_path, download_path)
    129 else:
    130     assert False

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/utils.py:115, in safe_copy(src, dest, overwrite)
    113 dest.parent.mkdir(parents=True, exist_ok=True)
    114 temp_file = dest.with_suffix(dest.suffix + \".copying\")
--> 115 shutil.copyfile(str(src), str(temp_file))
    116 temp_file.rename(dest)

File ~/miniforge3/envs/spyglass/lib/python3.9/shutil.py:264, in copyfile(src, dst, follow_symlinks)
    262     os.symlink(os.readlink(src), dst)
    263 else:
--> 264     with open(src, 'rb') as fsrc:
    265         try:
    266             with open(dst, 'wb') as fdst:
    267                 # macOS

FileNotFoundError: [Errno 2] No such file or directory: '/Users/samuelbray/Documents/analysis/j1620210710_FRL083NP3E.nwb'"
}

Additional context

Note that the file is downloaded into a subdirectory. After running the code above, the following assertion passes:

import os

from spyglass.common import AnalysisNwbfile

analysis_file = (IntervalLinearizedPosition & lin_pos_key).fetch1("analysis_file_name")
path = AnalysisNwbfile().get_abs_path(analysis_file)

# the file exists at the path reported by AnalysisNwbfile (inside the subdirectory)
assert os.path.exists(path)
@samuelbray32 samuelbray32 added bug Something isn't working infrastructure Unix, MySQL, etc. settings/issues impacting users labels Apr 3, 2024
@samuelbray32
Collaborator Author

One solution is to catch this error and, when it occurs, move the downloaded file to the location datajoint expects. Here's an example:

import os
import shutil

from spyglass.common import AnalysisNwbfile

table = IntervalLinearizedPosition & lin_pos_key

try:
    table.fetch_nwb()
except FileNotFoundError as e:
    # get the location stored as a datajoint filepath
    # (e.filename is the path that could not be opened, the same path shown in the error message)
    dj_path = e.filename
    print(dj_path)
    # get the location where the kachery download would have stored the file
    analysis_file = table.fetch1("analysis_file_name")
    current_path = AnalysisNwbfile().get_abs_path(analysis_file)
    assert os.path.exists(current_path)
    # move the file to the datajoint location
    # this will change the output of future calls to AnalysisNwbfile().get_abs_path(analysis_file)
    shutil.move(current_path, dj_path)
    table.fetch_nwb()

After this has run once, future calls to fetch_nwb for the analysis file will work fine. One option would be to put a version of this check in AnalysisNwbfileKachery so the problem is resolved in the background when the file is downloaded.
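
If such a check were factored out, one possible shape is a small retry helper. The following is only a sketch under stated assumptions: `fetch_with_relocation` and both of its callable arguments are hypothetical names, not part of spyglass or datajoint, and the usage section uses ordinary files as stand-ins for the table calls.

```python
import os
import shutil
import tempfile


def fetch_with_relocation(fetch, locate_download):
    """Call fetch(); on FileNotFoundError, move the download and retry once.

    fetch: zero-argument callable that raises FileNotFoundError when the
        file is missing from the path the database expects.
    locate_download: zero-argument callable returning where the file was
        actually downloaded.
    Both callables are hypothetical stand-ins for the table calls above.
    """
    try:
        return fetch()
    except FileNotFoundError as err:
        expected = err.filename  # path that could not be opened
        current = locate_download()
        if expected is None or not os.path.exists(current):
            raise  # nothing to relocate, surface the original error
        parent = os.path.dirname(expected)
        if parent:
            os.makedirs(parent, exist_ok=True)
        shutil.move(current, expected)
        return fetch()


# Usage with stand-ins: the file sits in a subdirectory,
# while the reader expects it directly in the top-level directory.
with tempfile.TemporaryDirectory() as tmp:
    expected_path = os.path.join(tmp, "example.nwb")
    actual_path = os.path.join(tmp, "example", "example.nwb")
    os.makedirs(os.path.dirname(actual_path))
    with open(actual_path, "wb") as f:
        f.write(b"placeholder")

    def read_expected():
        with open(expected_path, "rb") as f:
            return f.read()

    result = fetch_with_relocation(read_expected, lambda: actual_path)
    moved = not os.path.exists(actual_path)
```

Retrying only once and re-raising when there is nothing to move keeps an unrelated missing file from being silently masked.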

@edeno, do you have a sense of whether this will show up often enough that we should put it into spyglass, or should this issue just serve as the solution for people on a case-by-case basis?

@edeno
Collaborator

edeno commented Apr 5, 2024

I think it would be okay to have in AnalysisNwbfileKachery for now. It would be good to have it pretty well documented in the code why the check is happening.

@samuelbray32 samuelbray32 mentioned this issue Apr 5, 2024
@samuelbray32
Collaborator Author

Came up with a cleaner solution in the PR above: fixing the path returned by AnalysisNwbfile.get_abs_path() so that it agrees with the datajoint entries. Since this path is what defines where kachery saves the file in the first place, it fixes the issue before it happens.
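
The idea of resolving one path for both the download and the later fetch can be sketched as follows. This illustrates the approach only and is not the change made in the PR: `resolve_abs_path` and `recorded_paths` are hypothetical names, the mapping stands in for a lookup of the path already stored in the database, and the subdirectory naming rule is assumed.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_abs_path(
    file_name: str,
    analysis_dir: Path,
    recorded_paths: Mapping[str, str],
) -> Path:
    """Return one path for both the download step and the later fetch.

    recorded_paths is a hypothetical stand-in for a lookup of the path the
    database already stores for each file name.
    """
    recorded: Optional[str] = recorded_paths.get(file_name)
    if recorded is not None:
        # Existing entry: the database path wins, even if the file is not local yet.
        return Path(recorded)
    # No entry yet: fall back to the subdirectory layout.
    # Assumption: subdirectory named after the text before the last underscore.
    return analysis_dir / file_name.rsplit("_", 1)[0] / file_name


analysis_dir = Path("/data/analysis")
recorded = {"old_ABC.nwb": "/data/analysis/old_ABC.nwb"}

old_path = resolve_abs_path("old_ABC.nwb", analysis_dir, recorded)
new_path = resolve_abs_path("new_XYZ.nwb", analysis_dir, recorded)
```

Because the download location and the fetch location come from the same function, an older file without a subdirectory lands where its database entry already points.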
