Data Loader Support #5
@tvercaut can you review this please?
@QianyeYang could you please also review this and leave any comments? Thanks!
To improve the user experience, we plan to provide default data loaders for different use scenarios. Currently, the Nifti and H5 formats are supported. Other use cases or image formats require a customised data loader (add a link to the tutorial).
Data Format
There are some prerequisites on the data: images have shape `(width, height, depth)`; labels have shape `(width, height, depth)` or `(width, height, depth, num_labels)`.
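The shape prerequisites above can be checked before data reach a loader. The following is a minimal sketch; `check_shapes` is a hypothetical helper, not part of the project's loaders, and it additionally assumes (this is not stated above) that a label is defined on the same voxel grid as its image, so its first three dimensions must match the image shape.

```python
from typing import Optional

import numpy as np


def check_shapes(image: np.ndarray, label: Optional[np.ndarray] = None) -> None:
    """Raise ValueError if image/label violate the shape prerequisites.

    Hypothetical helper. Assumes a label shares the image's spatial size.
    """
    if image.ndim != 3:
        raise ValueError(f"image must be (width, height, depth), got {image.shape}")
    if label is None:
        return  # images-only case: nothing else to check
    if label.ndim not in (3, 4):
        raise ValueError(
            "label must be (width, height, depth) or "
            f"(width, height, depth, num_labels), got {label.shape}"
        )
    if label.shape[:3] != image.shape:
        # Assumption: the label lives on the same voxel grid as the image.
        raise ValueError(
            f"label spatial shape {label.shape[:3]} != image shape {image.shape}"
        )
```

For example, `check_shapes(np.zeros((4, 5, 6)), np.zeros((4, 5, 6, 3)))` passes silently, while a 2D image or a label with a different spatial size raises `ValueError`.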
Supported scenarios
Unpaired images (e.g. single-modality inter-subject registration)
Grouped unpaired images (e.g. single-modality intra-subject registration)
Paired images (e.g. two-modality intra-subject registration)
Sampling during training
Sampling for multiple labels
Whenever corresponding labels are available and there are multiple types of labels, e.g. segmentations of different organs in a CT image, two options are available:
When using multiple labels, it is the user's responsibility to ensure the labels are ordered consistently, such that the same `label_idx` in `(width, height, depth, label_idx)` corresponds to the same type of landmark or ROI across all labels.
Sampling for multiple subjects each with multiple images
When multiple subjects, each with multiple images, are available, the following sampling methods are supported:
a) moving image always has a smaller index, e.g. at an earlier time;
b) moving image always has a larger index, e.g. at a later time; or
c) no constraint on the order.
For the first two options, the intra-subject images are sorted by name in ascending order to represent ordered sequential images, such as time-series data.
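Options a) to c) can be sketched as follows. `intra_subject_pairs` and the mode names `"forward"`, `"backward"` and `"unordered"` are hypothetical labels for options a), b) and c). The sketch enumerates every admissible (moving, fixed) pair per subject after sorting names in ascending order; an actual loader may instead sample pairs at random at each iteration.

```python
from itertools import permutations
from typing import Dict, List, Tuple

MODES = ("forward", "backward", "unordered")  # hypothetical names for a), b), c)


def intra_subject_pairs(
    subjects: Dict[str, List[str]], mode: str
) -> List[Tuple[str, str, str]]:
    """Return (subject, moving, fixed) triples allowed by the chosen mode.

    forward:   moving has a smaller index than fixed (e.g. earlier time)
    backward:  moving has a larger index than fixed (e.g. later time)
    unordered: any two distinct images of the same subject
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    pairs = []
    for subject, names in subjects.items():
        ordered = sorted(names)  # ascending by name, e.g. time points
        # permutations yields every (i, j) with i != j; i = moving, j = fixed
        for i, j in permutations(range(len(ordered)), 2):
            if (mode == "forward" and i > j) or (mode == "backward" and i < j):
                continue
            pairs.append((subject, ordered[i], ordered[j]))
    return pairs
```

Because the sort is lexicographic, a name such as `t10.nii.gz` sorts before `t2.nii.gz`; zero-padded names (`t02`, `t10`) keep the intended temporal order.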
*Multiple-label sampling is also supported once an image pair has been sampled. If no consistent label types are defined across subjects, an option is available to turn off the label contribution to the loss for inter-subject image pairs.
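The footnote above can be illustrated with a small sketch. It assumes that, for an already-sampled pair, one label index is drawn uniformly at random, and that turning off the label contribution means a label-loss weight of 0. `sample_label_index` and its parameters are hypothetical names, not the project's API.

```python
import random
from typing import Optional, Tuple


def sample_label_index(
    num_labels: int,
    moving_subject: str,
    fixed_subject: str,
    consistent_labels_across_subjects: bool,
    rng: random.Random,
) -> Tuple[Optional[int], float]:
    """Pick a label index for a sampled pair; return (label_idx, loss_weight).

    Assumption: inter-subject pairs without consistent label types get no
    label and a label-loss weight of 0.0, so only image terms drive training.
    """
    inter_subject = moving_subject != fixed_subject
    if inter_subject and not consistent_labels_across_subjects:
        return None, 0.0
    # Uniformly sample one of the consistently ordered label channels.
    return rng.randrange(num_labels), 1.0
```

Passing an explicit `random.Random` instance keeps label sampling reproducible for a given seed.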
Examples (folder structure and filename requirements)
In the following, we use the `train` directory as an example to show how the files should be stored.
Nifti Data Format
Each `.nii.gz` file is assumed to contain only one tensor, which is either an image or a label.
Unpaired images
This is the simplest case. Data are assumed to be stored under the `train/images` and `train/labels` directories.
Nifti Case 1-1 Images only
We only have images without any labels, and all images are considered to be independent samples. So all data should be stored under `train/images`, e.g.:
(It is also fine if the data are further grouped into subdirectories under `images`, as we will directly scan all Nifti files under `train/images`.)
Nifti Case 1-2 Images with labels
In this case, we have both images and labels. So all images should be stored under `train/images` and all labels under `train/labels`. Each image and its corresponding label must have exactly the same file name, e.g.:
Grouped unpaired images
Nifti Case 2-1 Images only
We have images without any labels, but the images are grouped by subject/group, e.g. time-series observations for each subject/group. For instance, the data set can be the CT scans of multiple patients (subjects/groups), where each patient has multiple scans acquired at different time points. So all data should be stored under `train/images`, and the leaf directories (directories that have no subdirectories) must represent different subjects/groups, e.g.:
(It is also fine if the data are further grouped into intermediate directories, but only the leaf directories will be considered as different subjects/groups.)
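The leaf-directory rule can be sketched with `pathlib`. The hypothetical `group_nifti_by_leaf_dir` below assumes Nifti files sit only inside leaf directories and raises otherwise; it only lists file names and does not read any image data. For Case 1-1, where directories carry no meaning, the same `rglob` scan can simply be flattened into one list.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def group_nifti_by_leaf_dir(image_root: str) -> Dict[str, List[str]]:
    """Map each leaf directory (relative to image_root) to its sorted .nii.gz files."""
    root = Path(image_root)
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in root.rglob("*.nii.gz"):
        parent = path.parent
        # A leaf directory has no sub-directories; each one is a subject/group.
        if any(child.is_dir() for child in parent.iterdir()):
            raise ValueError(f"{path} is not inside a leaf directory")
        groups[parent.relative_to(root).as_posix()].append(path.name)
    return {group: sorted(names) for group, names in sorted(groups.items())}
```

For example, files `patient1/t0.nii.gz`, `patient1/t1.nii.gz` and `site/patient2/t0.nii.gz` under `train/images` would yield the groups `patient1` and `site/patient2`.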
Nifti Case 2-2 Images with labels
We have both images and labels. So all images should be stored under `train/images` and all labels under `train/labels`. The leaf directories will be considered as different subjects/groups, and each image and its corresponding label must have exactly the same file name, e.g.:
Paired images
In this case, images are paired, for example, to represent multimodal moving and fixed image pairs to register. Data are assumed to be stored under the `train/moving_images`, `train/fixed_images`, `train/moving_labels`, and `train/fixed_labels` directories.
Nifti Case 3-1 Images only
We only have paired images without any labels. So all data should be stored under `train/moving_images` and `train/fixed_images`, and the images corresponding to the same subject should have exactly the same name, e.g.:
(It is also fine if the data are further grouped into subdirectories under `train/moving_images` and `train/fixed_images`, as we will directly scan all Nifti files under them.)
Nifti Case 3-2 Images with labels
We have both images and labels. So all data should be stored under `train/moving_images`, `train/fixed_images`, `train/moving_labels`, and `train/fixed_labels`. The images and labels corresponding to the same subject/group should have exactly the same names, e.g.:
H5 Data Format
Each `.h5` file is similar to a dictionary with multiple key-value pairs. Hierarchical multi-level h5 indexing is not used. Each value is either an image or a label.
Unpaired images
H5 Case 1-1 Images only
Each key corresponds to one image, e.g. `{"subject1": data1, "subject2": data2, ...}`. All data should be stored under `train/images`, either in a single h5 file or in multiple h5 files, e.g.:
H5 Case 1-2 Images with labels
Each key corresponds to one subject. Data can be stored in two h5 files (one for images and one for labels), and the keys in the two files should be the same.
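The key-matching requirement of Case 1-2 can be sketched as below. `pair_images_with_labels` is a hypothetical helper; it accepts any dict-like objects, so it works with opened `h5py.File` objects (whose top-level keys map to datasets, read fully with `[()]`) as well as plain dicts of NumPy arrays. The file names in the usage comment are illustrative only.

```python
from typing import Dict, Mapping, Tuple

import numpy as np


def pair_images_with_labels(
    images: Mapping, labels: Mapping
) -> Dict[str, Tuple[np.ndarray, np.ndarray]]:
    """Match images and labels stored under identical keys (H5 Case 1-2).

    `images`/`labels` may be opened h5py.File objects (flat, one dataset per
    key) or dicts of arrays; `[()]` reads a whole dataset/array into memory.
    """
    image_keys, label_keys = set(images.keys()), set(labels.keys())
    if image_keys != label_keys:
        mismatched = sorted(image_keys ^ label_keys)  # keys in only one file
        raise ValueError(f"keys present in only one file: {mismatched}")
    return {
        key: (np.asarray(images[key][()]), np.asarray(labels[key][()]))
        for key in sorted(image_keys)
    }


# Illustrative usage with h5py (hypothetical file names):
# import h5py
# with h5py.File("train/images/images.h5", "r") as im, \
#         h5py.File("train/labels/labels.h5", "r") as lb:
#     pairs = pair_images_with_labels(im, lb)
```

Checking the full key sets up front surfaces a missing or misspelled key immediately instead of failing partway through training.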
Grouped unpaired images
H5 Case 2-1 Images only
Similar to Case 1-1 above, but in this case the keys must share a common format, `subject%d-%d`, where each `%d` represents a number. For instance, `subject3-2` corresponds to the second observation of subject 3. Otherwise, the file structure is the same as in Case 1-1, e.g.:
H5 Case 2-2 Images with labels
Similar to Cases 1-2 and 2-1 above, the keys must share the format `subject%d-%d`, and the keys for images and labels should be consistent.
Paired images
In this case, data are paired. Data are assumed to be stored under the `train/moving_images`, `train/fixed_images`, `train/moving_labels`, and `train/fixed_labels` directories.
H5 Case 3-1 Images only
We only have paired images without any labels. So all data should be stored under `train/moving_images` and `train/fixed_images`, and the keys corresponding to the same subject should be the same, e.g.:
H5 Case 3-2 Images with labels
We have both images and labels. So all data should be stored under `train/moving_images`, `train/fixed_images`, `train/moving_labels`, and `train/fixed_labels`. The keys corresponding to the same subject should be the same, e.g.:
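For the paired cases, every source must hold the same keys. The hypothetical `check_paired_keys` below is a minimal sketch of that check; `sources` maps a name such as `"moving_images"` to an opened `h5py.File` or any dict-like object, and the error reports which keys each source is missing.

```python
from typing import Dict, List, Mapping


def check_paired_keys(sources: Dict[str, Mapping]) -> List[str]:
    """Check that all sources hold exactly the same keys; return them sorted.

    Hypothetical helper for H5 Cases 3-1/3-2, e.g. sources = {
    "moving_images": ..., "fixed_images": ..., "moving_labels": ...,
    "fixed_labels": ...}.
    """
    key_sets = {name: set(src.keys()) for name, src in sources.items()}
    all_keys = set().union(*key_sets.values())
    # For every source lacking some keys, list what it is missing.
    missing = {
        name: sorted(all_keys - keys)
        for name, keys in key_sets.items()
        if keys != all_keys
    }
    if missing:
        raise ValueError(f"missing keys per source: {missing}")
    return sorted(all_keys)
```

Reporting the missing keys per source makes it easy to spot, for example, a subject whose fixed label was never written.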