File management #481

Closed
dimitri-yatsenko opened this issue Aug 6, 2018 · 1 comment
dimitri-yatsenko commented Aug 6, 2018

Purpose

Enable the management of data as collections of files that users can browse and access directly, bypassing DataJoint, and that are organized in a way that makes sense to the user. DataJoint must still maintain data integrity.

Operation

File tracking keeps the filenames and organization visible to and controllable by external users. The files are identified by their paths relative to the repository path. The datatype should be either file or file-suffix, where the suffix can be up to eight characters long.
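The datatype rule above can be sketched as a small validator. This is only an illustration of the rule as stated: the function name `parse_file_type` is hypothetical, and restricting the suffix to ASCII letters and digits is an assumption that the text does not make.

```python
import re

# Assumed grammar: "file" alone, or "file-" followed by a suffix of 1 to 8
# alphanumeric characters. The character set is an assumption.
_FILE_TYPE = re.compile(r"file(?:-(?P<suffix>[A-Za-z0-9]{1,8}))?")


def parse_file_type(type_name: str):
    """Return the suffix of a file datatype, or None for plain "file"."""
    match = _FILE_TYPE.fullmatch(type_name)
    if match is None:
        raise ValueError(f"not a valid file datatype: {type_name!r}")
    return match.group("suffix")


print(parse_file_type("file"))      # None
print(parse_file_type("file-dat"))  # dat
```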

Unlike external storage and attachments, file tracking does not rely on separate subfolders for each schema. Files must be accessible at the same locations by multiple schemas. Therefore, for tracked files, DataJoint will create a local subfolder .datajoint in which it keeps log files describing the tracking info. For example, for the file tracked at ephys/day1/file001.dat, DataJoint will create the log file ephys/day1/.datajoint/file001.dat.log.
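The mapping from a tracked file to its log file can be sketched as follows. The helper name `log_path_for` is hypothetical; only the `.datajoint` subfolder and the `.log` extension come from the description above.

```python
from pathlib import PurePosixPath


def log_path_for(tracked_path: str) -> str:
    """Return the log file path for a file path relative to the repository."""
    path = PurePosixPath(tracked_path)
    # The log lives in a .datajoint subfolder next to the tracked file.
    return str(path.parent / ".datajoint" / (path.name + ".log"))


print(log_path_for("ephys/day1/file001.dat"))
# ephys/day1/.datajoint/file001.dat.log
```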

The log file contains the information about the storing schema and the storing configuration.

@schema
class Ephys(dj.Manual):
    definition = """
    -> Session
    ---
    ephys_file  :  filepath@ephys  # in-place path
    """

Insert

Inserting performs these tasks:

  1. If the source file is not in the target repository, copy it there
  2. Once the file is in the repository, update the log file to indicate how it is tracked: server, schema, and checksum (when reasonable).
  3. Insert the relative path into the target table.

Ephys.insert1((1, '/sessions/2018-08-06/rec001.dat'))

The file can also be specified as a 2-tuple of the source and destination locations.

Ephys.insert1((1, ('c:/tmp/rec001.dat', '/sessions/2018-08-06/rec001.dat')))
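The three insert steps could look roughly like the sketch below. It is not DataJoint code: the function `track_file`, the default `server` and `schema` values, the SHA-256 checksum, and the JSON-lines log format are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track_file(repo_root, dest_rel, source=None,
               server="db.example.org", schema="ephys_schema"):
    """Place a file in the repository, log how it is tracked, return its relative path."""
    dest_rel = str(dest_rel).lstrip("/")  # keep the path relative to the repository
    dest = Path(repo_root) / dest_rel
    # Step 1: copy the file into the repository if it is not already there.
    if source is not None and Path(source).resolve() != dest.resolve():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)
    if not dest.is_file():
        raise FileNotFoundError(dest)
    # Step 2: append a tracking record (assumed format: one JSON object per line).
    record = {
        "server": server,
        "schema": schema,
        "checksum": hashlib.sha256(dest.read_bytes()).hexdigest(),
    }
    log = dest.parent / ".datajoint" / (dest.name + ".log")
    log.parent.mkdir(exist_ok=True)
    with log.open("a") as handle:
        handle.write(json.dumps(record) + "\n")
    # Step 3: the caller inserts this relative path into the target table.
    return dest_rel


# Usage: stage a file outside the repository, then track it.
repo = Path(tempfile.mkdtemp())
src = Path(tempfile.mkdtemp()) / "rec001.dat"
src.write_bytes(b"example data")
rel = track_file(repo, "/sessions/2018-08-06/rec001.dat", source=src)
print(rel)
```

Reading the whole file to compute the checksum is only reasonable for small files; the "when reasonable" qualifier in step 2 suggests a size threshold, which this sketch leaves out.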

Fetch

Fetch performs the following operations:

  1. Copy the file from repository to the download folder, if necessary.
  2. Return the full file path as the regular attribute value.
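
The two fetch steps can be sketched in the same spirit. The function `fetch_file` is hypothetical, and treating a copy as "necessary" when the local file is missing or differs in size is an assumption; the proposal does not define the criterion.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_file(repo_root, rel_path, download_dir):
    """Copy a tracked file to the download folder if needed; return its full path."""
    rel_path = str(rel_path).lstrip("/")
    src = Path(repo_root) / rel_path
    dst = Path(download_dir) / rel_path
    # Step 1: copy only when the local copy is missing or differs in size (assumed rule).
    if not dst.exists() or dst.stat().st_size != src.stat().st_size:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    # Step 2: return the full local path as the attribute value.
    return str(dst.resolve())


# Usage: a repository with one file, fetched into a separate download folder.
repo = Path(tempfile.mkdtemp())
(repo / "sessions").mkdir()
(repo / "sessions" / "rec001.dat").write_bytes(b"abc")
downloads = Path(tempfile.mkdtemp())
local = fetch_file(repo, "/sessions/rec001.dat", downloads)
print(local)
```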
dimitri-yatsenko added this to the Release 0.13 milestone Aug 6, 2018
@dimitri-yatsenko (Member, Author)

Deletes

Deleting rows does not delete the files; it only updates the corresponding log files. A separate cleanup utility will enable deleting the files that are not tracked by DataJoint.
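
A rough sketch of this behavior, under the same assumed JSON-lines log format (one record per tracking schema): `untrack` only edits the log, and `untracked_files` lists what a cleanup utility might offer to delete. Both function names and the log format are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def log_path(file_path: Path) -> Path:
    """Log file kept in the .datajoint subfolder next to the tracked file."""
    return file_path.parent / ".datajoint" / (file_path.name + ".log")


def untrack(repo_root, rel_path, schema):
    """Drop the tracking records of `schema`; the data file itself is left in place."""
    log = log_path(Path(repo_root) / rel_path)
    if not log.exists():
        return
    records = [json.loads(line) for line in log.read_text().splitlines() if line.strip()]
    kept = [r for r in records if r.get("schema") != schema]
    log.write_text("".join(json.dumps(r) + "\n" for r in kept))


def untracked_files(repo_root):
    """Return relative paths of files with no remaining tracking records."""
    repo_root = Path(repo_root)
    result = []
    for path in sorted(repo_root.rglob("*")):
        rel = path.relative_to(repo_root)
        if not path.is_file() or ".datajoint" in rel.parts:
            continue  # skip directories and the log files themselves
        log = log_path(path)
        if not log.exists() or not log.read_text().strip():
            result.append(rel.as_posix())
    return result


# Usage: one tracked file; after untracking it becomes a cleanup candidate.
repo = Path(tempfile.mkdtemp())
data = repo / "ephys" / "day1" / "file001.dat"
data.parent.mkdir(parents=True)
data.write_bytes(b"abc")
log_path(data).parent.mkdir()
log_path(data).write_text(json.dumps({"schema": "ephys_schema"}) + "\n")
before = untracked_files(repo)
untrack(repo, "ephys/day1/file001.dat", "ephys_schema")
after = untracked_files(repo)
print(before, after)
```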

dimitri-yatsenko changed the title from "File tracking" to "File management" Dec 6, 2018
eywalker removed this from the Release 0.13 milestone Aug 1, 2019