Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
181 lines (139 sloc) 7.72 KB

Uploading binary data in ITK

Since every local Git repository contains a copy of the entire project history, it is important to avoid adding large binary files directly to the repository. Large binary files added and removed throughout a project's history will cause the repository to become bloated, take up too much disk space, require excessive time and bandwidth to download, etc.

A solution to this problem which has been adopted by ITK is to store binary files, such as images, in a separate location outside the Git repository, then download the files at build time with CMake.

A "content link" file contains an identifying SHA512 hash. The content link is stored in the Git repository at the path where the file would exist, but with a .sha512 extension appended to the file name. CMake will find these content link files at build time, download them from a list of server resources, and create symlinks or copies of the original files at the corresponding location in the build tree.

See also our Data guide for more information.

Note: for historical reasons, before SHA512 hash files were used in ITK, MD5 hash content link files were used.

Adding images as input to ITK sources

ITK examples and ITK class tests (see Section 9.4 of the ITK Software Guide) rely on input and baseline images (or data in general) to demonstrate and check the features of a given class. Hence, when developing an ITK example or test, images will need to be added to the Git repository.

When using images for an ITK example or test images, the following principles need to be followed:

  1. Images should be small.
    • The source tree is not an image database, but a source code repository.
    • Adding an image larger than 50 Kb should be justified by a discussion with the ITK community.
  2. Regression (baseline) images should not use Analyze format unless the test is for the AnalyzeImageIO and related classes.
  3. Images should use non-trivial Metadata.
    • Origin should be different form zeros.
    • Spacing should be different from ones, and it should be anisotropic.
    • Direction should be different from identity.

Using Girder to get the SHA512 hash

Prerequisites

The data.kitware.com server is an ITK community resource where any community member can upload binary data files. There are three methods available to upload data files:

  1. The UploadBinaryData.sh shell script.
  2. The Girder web interface.
  3. The girder-cli command line executable that comes with the girder-client Python package.

Before uploading data, please visit data.kitware.com and register for an account.

Once files have been uploaded to your account, they will be publicly available and accessible since data is content addressed. Specifically, the hashsum_download plugin in Girder looks through all public (or private if authenticated) data for files with the given hash. Thus, so as long as the file is publically available somewhere on data.kitware.com, ITK will be able to retrieve the corresponding file.

At release time, the release manager will upload and archive repository data references in the ITK collection and other redundant storage locations.

Upload via the shell script

The UploadBinaryData.sh script will authenticate to data.kitware.com, upload the file to your user account's Public folder, and create a *.sha512 CMake ExternalData content link file. After the content link has been created, you will need to add the *.sha512 file to your commit.

When ./Utilities/SetupForDevelopment.sh is executed, as described in CONTRIBUTING.md, authentication to Girder is configured in Git. If the Git girder.api-key config or GIRDER_API_KEY environmental variable is not set, a prompt will appear for your username and password. The API key can be created in the data.kitware.com user account web browser interface.

To upload new binary testing data:

  1. Place the binary file at the desired location in the Git repository.
  2. Run the git data-upload alias, and pass in the binary file(s) as arguments. E.g. cd ITK; git data-upload ./Modules/Core/Common/test/Input/cthead1.png.
  3. In the corresponding test/CMakeLists.txt file, use the itk_add_test macro and reference the relative file path with DATA and braces. E.g.: DATA{Input/cthead1.png}.
  4. Re-build ITK, the ITKData target specifically, and the testing data will be downloaded into the build tree. The path in the build tree is used in test execution.

Upload via the web interface

  1. After logging in, you will be presented with the welcome page. Click on the personal data space link.
  2. Next, select the Public folder of your personal data space.
  3. Click the green upload button.
  4. Click the Browse or drop files to select the files to upload.
  5. Click Start Upload to upload the file to the server.
  6. Next, proceed to Download the Content Link.

Upload via Python script

A Python script to upload files from the command line, girder-cli, is available with the girder-client Python package. To install it, type:

   $ python -m pip install girder-client

To upload files with the girder-cli script, we need to obtain an API key and a parent folder id from the web interface.

  1. After logging in, select My account from the user drop down.
  2. Next, select the API keys tab.
  3. Create a new API key if one is not available by clickin on Create new key.
  4. The show link will show the key, which can be copied into the command line.
  5. Next, select My Folders from the user drop down.
  6. Next, select the Public folder of your personal data space.
  7. Click the i button for information about the folder.
  8. The Unique ID can be copied into the command line.

Use both the API key and the folder ID when calling girder-cli. For example,

   $ girder-cli \
       --api-key 12345ALongSetOfCharactersAndNumbers \
       --api-url https://data.kitware.com/api/v1 \
       upload \
       58becaee8d777f0aefede556 \
       /tmp/cthead1.png

Next, proceed to Download the Content Link.

Download the content link

  1. Click on the file that has been uploaded.
  2. Click on the i button for further information.
  3. Finally, click on the Download key file icon to download the key file.

Move the content link file to the source tree at the location where the actual file is desired in the build tree. Stage the new file to your commit:

   $ git add -- path/to/file.sha512