Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get initial traits in place for NeuroCore #1

Merged
merged 15 commits into from
Dec 27, 2019
Merged

Get initial traits in place for NeuroCore #1

merged 15 commits into from
Dec 27, 2019

Conversation

Tokazama
Copy link
Member

@Tokazama Tokazama commented Dec 17, 2019

This is the first set of traits proposed for NeuroCore. This is all based off of the Brain Imaging Data Standard so the exact traits/properties proposed should remain conceptually the same (return the same parameter). At this point the goals should be:

  • Agreeable syntax
  • Good documentation
  • Type stability whenever possible
  • Sensible defaults

Outline

  • "arrays.jl": aliases and properties directly related to arrays
  • "properties.jl": introduce getter and setter to enforce type stability and search for the properties a la ImageMeta.jl.
  • "coordinates.jl": introduces some basic types useful for keeping track of coordinate spaces. This will hopefully be more useful in referencing brain atlases. There's also some basic interface for scanner to anatomical transforms via the sform and qform methods. This is derived from NIfTI standard but has been problematic in other imaging software working together (ITK and HCP project).
  • "bids.jl": directly translated BIDS entities
  • "enums.jl": enumerations to encode some basic categories
  • "getproperty.jl": constructs the getproperty and setproperty for the BIDS entities.

Issues

  • 1. Can we just use a dictionary lookup for most of these (meta info)?
    • Properties are accessed through a getproperty interface with a dictionary backend. These map directly to internal functions. This allows for an internal function for quick access to the property without exported a ridiculous amount of methods. It also allows future packages to overload the specific method if faster access is needed for the property. This will be documented to make more clear in a subsequent PR.
  • 2. How strictly should time type be parametrized?
    • For now this is strictly enforced for all properties that will be stored in a dictionary. This could be relaxed in the future because Unitful measurements allow us to know if there is a difference between seconds, milliseconds, etc. when retrieving a property. However, the BIDS documentation is pretty clear about what is expected for this and the getter/setter methods help convert to the appropriate type automatically. If it ends up being overly restrictive it could be changed in the future.
  • 3. If portions of this belong in other packages, where should they go?
    • The actual properties and types important to the the general interface will be the focal point of this package. There are numerous data set oriented specifications in the BIDS documentation. I've removed those for now but would like to help implement them elsewhere. BIDSTools.jl may be a good place for this. There's also a possibility that a general NeuroDatadeps.jl package could be useful.

TODO

  • Switch to CoordinateTransformations.AffineMap

Copy link

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty epic starting point, particular so much useful documentation for so many attributes (is this taken from elsewhere or written just for this library?)

I do feel like it's a lot of generic functions to support all these attributes and wonder whether you've tried a more dictionary-like interface for metadata. Having a single generic function to extract a "metadata dictionary-like object", followed by use of a type stable getproperty interface on that metadata could provide a convenient syntax for many attributes without exporting such a lot of functions. This would also tie in with a more "table/data driven" approach to metadata, for example, functions to interactively list or search through all the standard metadata documentation would be invaluable.

Also I'm always a bit suspicious of lifting metadata into the type domain unless actually required ;-) In that regard, I'd be interested to know how you want to use the modality singleton types. Dispatching on values can be perfectly fine (very much depending on the context of course) and much kinder to the compiler.

I also saw that a lot of this is not neuroscience specific; rather there's many general MRI concepts here. Calling it NeruoCore is fine, but it already seems like a general tool for medical imaging. (This is true of BIDS in general; while it's ostensibly for the brain and says a lot about neuroscience-specific modalities, the BIDS conventions are relevant for a wide variety of medical imaging datasets.)

In any case, it's great to see this being worked on, cheers!

Returns affine matrix. For an instance that returns `spacedirections` this is
the corresponding tuple converted to an array.
"""
affine_matrix(x::Any) = getter(x, "affine_matrix", MMatrix{4,4,Float64,16}, i -> default_affinemat(i))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider using CoordinateTransformations.AffineMap for affine maps?

Personally I consider the use of 4x4 matrices for affine transformations to be an unfortunate failure of abstraction: applying them to lenght-3 vectors involves lifting those vectors into homogenous coordinates and back again, and this lifting provides extra complication without any real value.

As far as I know, the only time homogenous coordinates are really worthwhile is

  • In computer vision applications where the set of 4x4 matrices also supports projection and there are some neat algebraic identities which can help with things like camera calibration.
  • In languages which have builtin matrices but don't have free abstractions. Like matlab.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was originally using but then there were so many weird transformations that NIfTI required that I just used a 4x4 matrix. That being said, I think you're right about this and it would be better to use CoordinateTransformations.AffineMap.

src/NeuroCore.jl Outdated
effective_echo_spacing,
effective_echo_spacing!,
total_readout_time,
total_readout_time!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this is a large number of names to export, especially given

  • A lot of these seem to be trivial wrappers around an internal dictionary-like interface.
  • There's a large number of other attributes which aren't covered here, and achieving full coverage of the very large number of DICOM attributes seems impractical.

Did you consider a more dictionary-like interface for some of these attributes? For example, perhaps allowing syntax such as

metadata(x).manufacturer

which is possible to make type stable if done carefully

(Achieving x.manufacturer is possible and tantalizing but likely undesirable due to the need to "steal" getproperty on some non-concrete supertype of x.)

I feel there may be two rough categories of metadata:

  1. Metadata which is directly required for working with the pixel data. For example, pixel spacings and coordinate systems.
  2. Metadata which is somewhat irrelevant or only indirectly relevant, such as the institution name.

The first type is a more limited set of metadata and we may plausibly get a minimal required set of functions for accessing this like ImageCore. The latter type of metadata is fairly open ended and I'd suggest it's better handled with some kind of dictionary like lookup. Does the images ecosystem also have a similar distinction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a large number of other attributes which aren't covered here, and achieving full coverage of the very large number of DICOM attributes seems impractical.

I agree. That's why I've focused on what's in BIDS. It's still a lot but theoretically this can be broken up into smaller packages if it's found to be useful (eg., BIDSImage, BIDSEeg, BIDSData, etc).

The latter type of metadata is fairly open ended and I'd suggest it's better handled with some kind of dictionary like lookup.

A large portion of this is in fact a wrapper around a dictionary lookup but makes it type stable. JuliaImages uses properties to retrieve a dictionary with keytype == String and valtype == Any. ImageMeta has a nice some nice syntax that allows img[::String] to search the underlying property structure. However, that would restrict everything to have ImageMeta as the top level wrapper where it might not be appropriate (e.g., non arrays such as graphs, fiber tracks, etc.).

One example where this could be immediately useful is if you look at MRIReco.jl where there are distinct structures for raw acquisitional data, sequence parameters, etc. But very little of that is needed in NIfTI files. There's some overlap but it's not enough to agree upon a single structure for say acquisition parameters. This example doesn't even account for GIFTI, CIFTI, BED , etc. where there the data isn't an image or is a mesh instead.

All that being said, I'm certainly open to a better solution. I just don't think there will ever be a single structure everyone can agree on which as you pointed out means using a dictionary. If there's another way of guaranteeing that I get Float64 everytime I call img["EchoTime"] then we could just require that syntax.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's another way of guaranteeing that I get Float64 everytime I call img["EchoTime"] then we could just require that syntax.

Sure. Very basic sketch:

struct Metadata
     d::Dict{Symbol,Any}
end

function Base.getproperty(m::Metadata, s::Symbol)
    if s === :EchoTime
        getfield(m, :d)[:EchoTime]::Float64 # Inferrable
    else
        getfield(m, :d)[s]  # Fallback (uninferrable)
    end
end
julia> m = Metadata(Dict(:EchoTime=>1.0, :Foo=>"hi"))
Metadata(Dict{Symbol,Any}(:EchoTime => 1.0,:Foo => "hi"))

julia> m.EchoTime
1.0

julia> m.Foo
"hi"

julia> @code_warntype (m->m.EchoTime)(m)
Variables
  #self#::Core.Compiler.Const(var"#15#16"(), false)
  m::Metadata

Body::Float64
1 ─ %1 = Base.getproperty(m, :EchoTime)::Float64
└──      return %1

julia> @code_warntype (m->m.Foo)(m)
Variables
  #self#::Core.Compiler.Const(var"#17#18"(), false)
  m::Metadata

Body::Any
1 ─ %1 = Base.getproperty(m, :Foo)::Any
└──      return %1

Then provide a generic function metadata(x) which for any x should return something with the same property-like interface as Metadata (it could literally be a concrete Metadata, or some other package-specific type).

Copy link

@c42f c42f Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly, that type assertion for ::Float64 is a simplification and you may want to dispatch to extra logic there.

The main point is that getproperty can be used to look up properties in a flexible backing dictionary and certain blessed property names can be treated in a special way which makes them inferrable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that approach a lot. I would need to create a unique array type that reimplements all of the dictionary traits in JuliaImages that using String as the keytype though. That or convince JuliaImages to change to a Symbol keytype.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, it will be interesting to see how it goes.

Note also that Metadata in my sketch could be implemented with actual fields for a subset of "expected" metadata, and a dict as a fallback for the rest. This would avoid the dict lookup for the set of blessed attributes, and would therefore be more efficient provided x keeps the Metadata in this format when it's constructed so that no work is required to compute metadata(x). Of course, you may want to call it MRIMetadata or some such, given most of these fields are very MRI-specific. Other types of x would have their own type with the same API but different blessed attributes.

struct MRIMetadata
     echo_time::Union{Float64,Nothing}
     # ...
     d::Dict{Symbol,Any}
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leads me to a somewhat ignorant question. When I call sizeof on your structure it doesn't seem to matter if I use Nothing or Float64. Would it make sense to do something like this instead:

struct MRIMetadata{ET<:Union{Float64,Nothing}}
    echo_time::ET
   d::Dict{Symbol,Any}
end

This would prevent allocating memory for things that only have a couple fields that aren't nothing. I know this may be trivial right now, but if we want to have this metadata be flexible enough to work with something like individual fibers tracking algorithms, then it could get pretty expensive.

Copy link

@c42f c42f Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that parameterizing MRIMetadata on the presence or absence of metadata fields would be unnecessarily hard on the compiler and confer no practical advantage (it will need to create a new type each time the metadata changes). If you're worried about storage, you could go with a pure-Dict based solution instead. That would be simpler anyway so it might be for the best. Dict lookup is quite fast and is unlikely to be done in any inner analysis loop.

(Besides this, I would have imagined the relative sizeof(MRIMetdata) to sizeof(image_data) could easily be 1:1000 in which case it's not worth worrying about.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that parameterizing MRIMetadata on the presence or absence of metadata fields would be unnecessarily hard on the compiler and confer no practical advantage

I should probably clarify. I imagined that most users wouldn't ever construct these and that it would serve as a common way of organizing and accessing metadata for those actually creating packages.

struct Metadata{ET<:Union{Float64,Nothing}}
    echo_time::ET
   d::Dict{Symbol,Any}
end

const EmptMetadata =Metadata{Nothing}

This would allow the person implementing an EEG file reader to not worry about echo time but still use the same interface. As you said, it may be simpler to just go with a dictionary.

I would have imagined the relative sizeof(MRIMetdata) to sizeof(image_data) could easily be 1:1000 in which case it's not worth worrying about.

I agree with this in so far as it pertains to images, but this should be applicable outside of MRI as well. For example, we could have a small connectome composed of a 13x13 array across 10,000 subjects. This is definitely the extreme but I'd hate to paint myself into a corner early on.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's worrying you, I'd suggest go for the Dict. It will be less code anyway :-)

But I also don't think you're painting yourself into a corner too much by adopting one implementation or another. The public interface to this stuff will be getproperty, which is allowed to present the metadata as if it came from an actual field. But whether there's an actual field present is an implementation detail.

I suspect what's most important / potentially annoying here is to settle on appropriate names for the metadata fields because these are very much the public interface. For example echo_time vs EchoTime. Probably the only viable thing is to stick with the original field names as defined in the BIDS standard, even though they conflict with the Julia naming conventions.

src/time.jl Outdated
This parameter is REQUIRED if corresponding fieldmap data is present or the
data comes from a multi echo sequence.

please note that the DICOM term is in milliseconds not seconds
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just using Unitful types for all physical measurements? It's reasonable easy (and safe!) to strip off unitful types with the newest variant of ustrip, eg, ustrip(u"s", echo_time(x)) if users don't want them.

DICOM is not very consistent about SI prefixes and I feel its fairly easy to make a mistake with these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the Unitful route is probably the best way to go with these. I would like the actual units to be consistent though. Do you think it would make sense for there to be something like x -> uconvert(u"s", x) for the default here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it's fine for the values to be returned in a unit which is any multiple of seconds, provided that's reflected in the type. I think this accurately reflects the way that DSP hardware works, which is to count in integer multiples of some internal clock time step.

To the user there's not really much of a cost vs always returning in seconds. They can do ustrip(u"s", x) or uconvert(u"s", x), and do arithmetic between compatible units as desired.

src/time.jl Outdated Show resolved Hide resolved
src/time.jl Outdated Show resolved Hide resolved
src/modalities.jl Outdated Show resolved Hide resolved
src/modalities.jl Outdated Show resolved Hide resolved

Returns the version of the BIDS standard that was used.
"""
bids_version(x) = getter(x, "BIDSVersion", String, x -> "1.0")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These attributes make sense for a BIDS dataset as a whole. But maybe not much sense for individual images.

I think I'm missing quite some context about what x may be here, but I wonder whether some of these attributes belong in this library given they don't make much sense for individual images.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree. I've included them here for completeness until a more suitable home exists. I'd welcome any ideas on where these should belong.

@c42f
Copy link

c42f commented Dec 17, 2019

Oh, I see you've added some extra description while I was writing the above review. Perhaps some of those links answer some questions above...

src/coordinates.jl Outdated Show resolved Hide resolved
@timholy
Copy link
Member

timholy commented Dec 18, 2019

Sorry for my slow reply here.

Going through a deprecation process in JuliaImages to transfer over to Symbol keys.

Yep, that sounds like the right option.

Units

I don't know enough to comment usefully. My own bias is that it's best to use Unitful. Presumably BIDS is a file format? Thus when doing IO couldn't you convert to seconds?

A possible model is NRRD.jl which returns unitful-axes when the file is so annotated. Your BIDS reader could return all times in seconds but of course people could create their own metadata. Your writer could assume seconds if there are no units and otherwise convert to seconds before writing to disk.

If some time values come in arrays rather than being single scalars, the other issue is whether you'll ever want to use mmap to access very large arrays. In such cases, you're not able to change units to your preferred format because you have to export whatever is used for the on-disk representation. Of course you can use MappedArrays to do lazy transformations.

Other

First, kudos on a huge amount of work! I don't know DICOM and BIDS enough to be very useful here, but I would tend to lean towards using more of a dictionary/property interface than a getter/setter interface. It might save you a lot of code.

I'd also agree with the goal of avoiding type-tags where possible. For the coordinates, I can imagine the type-tags are necessary since I imagine coordinate transformation might be performance-sensitive.

@c42f
Copy link

c42f commented Dec 18, 2019

Presumably BIDS is a file format?

Actually no, it's a standard for curation of whole datasets containing multiple scans of multiple study subjects. It assumes that the unit of organization is the "session", which would typically be a scanning session in an MRI machine, resulting in multiple image types.

The BIDS format does specify that individual images be stored in the NiFTi format which has typically been used in neuroscience as conversion target from DICOM. Hierarchical metadata about individual images is stored in json sidecar files. There are conventions for storing tabular data pertaining to the study.

@Tokazama
Copy link
Member Author

Going through a deprecation process in JuliaImages to transfer over to Symbol keys.

Yep, that sounds like the right option.

I'll see if I can get a basic PR for ImageMetadata going later today.

If some time values come in arrays rather than being single scalars, the other issue is whether you'll ever want to use mmap to access very large arrays. In such cases, you're not able to change units to your preferred format because you have to export whatever is used for the on-disk representation. Of course you can use MappedArrays to do lazy transformations.

Hopefully this doesn't matter for mapping arrays because the time units are for axes or parameters. There may be some derived values in the image array (e.g., decay time in PET scans) but I would rather not open up that can of worms right now.

What I'm proposing would be that we convert the parameters (eg., echo_time, dwell_time, etc.). to seconds (as unitful types). I will probably implement the same thing with the axes in NIfTI.jl's file reading, but there's no way we could/should ever enforce all axes to be in seconds as that would mean imposing the BIDS standard on JuliaImages. I can't image the audio visual people would like that.

@c42f
Copy link

c42f commented Dec 19, 2019

There may be some derived values in the image array (e.g., decay time in PET scans) but I would rather not open up that can of worms right now.

From a pragmatic point of view I do agree with that. I still find that pervasively using Unitful can be frictionful because not all packages work well with unitful quantities. Also it can be hard on the compiler.

But for things like metadata I feel the cost is worth it because there can be such a depth and complex nonuniformity in metadata. So features which help people correctly understand and use metadata can be really worthwhile.

What I'm proposing would be that we convert the parameters (eg., echo_time, dwell_time, etc.). to seconds (as unitful types).

I still feel like this is somewhat unnecessary so I'd like to understand the rationale for the conversion. Unitful makes it very easy to correctly mix compatible units (eg, μs and s) in computations — indeed we have things like 1000_000u"μs" == 1u"s" — so I feel there's nothing preventing you from returning unitful quantities with the SI prefix which was used in the file.

@Tokazama
Copy link
Member Author

I'd like to understand the rationale for the conversion.
It would be nice to have as much consistency as possible, but the biggest thing is that I prefer to defer a lot of these more subtle decisions to those who have already spent a bunch of time deliberating over this (such as the BIDS committee).

That being said, I imagine that decision was largely driven by a need to accept a raw number with no actual units. We have the unique advantage of being able to programmatically determine the units of measurement. Theoretically developers shouldn't have too much trouble getting at the unit of time they want.

I'm starting to agree with your approach more. I'll meditate on it more today.

@c42f
Copy link

c42f commented Dec 19, 2019

I checked the BIDS specification again and saw the following:

Elapsed time SHOULD be expressed in seconds. Please note that some DICOM parameters have been traditionally expressed in milliseconds. Those need to be converted to seconds.

https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#units

So to the extent that this is actually followed by implementations, let's definitely return seconds! (I'm not suggesting we deviate from what's in the files, just that we limit unnecessary conversion when reading those files.)

Unfortunately in practice I've still seen a lot of mixed units in supposedly bids-format metadata files generated by the available tooling (possibly dcm2niix, possibly heudiconv; or possibly some of our own tooling; I'd have to check). I believe the mixed units are generally just inherited from the associated DICOM fields. For example ImagingFrequency expressed in MHz rather than Hz.

@Tokazama
Copy link
Member Author

Unfortunately in practice I've still seen a lot of mixed units in supposedly bids-format metadata files generated by the available tooling (possibly dcm2niix, possibly heudiconv; or possibly some of our own tooling; I'd have to check).

I think this is the motivation for standardizing it in some way. With the current direction doing something like meta.ImagingFrequency = u"MHz" would then return Hz with meta.ImageFrequency. The point of having setters and getters was to control that in some way.

For now I think I'll stick with strictly following the units under the BIDS standard. This is by no means a final statement on how it will be. I think we could make a good argument for relaxing this but I want to see how it does in the wild first. It will be much easier to relax this if it's found to be too burdensome than it will be to require everyone to get use to a more strict standard. So I'll start with the strict units enforced and go from there.

@timholy
Copy link
Member

timholy commented Dec 20, 2019

If you return unitful quantities then you can safely place responsibility for standardization on the consumer, and just report what's in the files. That's how I'd do it.

Zachary Christensen and others added 6 commits December 20, 2019 09:23
* BIDS entities have own dedicated file
* access to docstrings through `neurohelp` for non exported functions
* Use getproperty and setproperty to access metadata
* task_events -> events_table method
* BIDS data entities are now `BIDSMetadata`
* Added ImageMeta interface. This won't work publicly until #58 PR in
ImageMetadata goes through
* Have basics tests for most BIDS entitites now
* Enums mapping between for directions b/c many variations:
- actual numeric values used in calculating direction are
 positive and negative integers.
- strings are commonly used in JSON files
- named directions are derived from this and stored in axes
- fixed some typos
- bidsdata.jl now contains what could be the beginnings of a BIDS based
datadeps
- traits contains non BIDS entities properties
Most BIDS entities are tested. These tests will only pass with the
upcoming PR to ImageMetadata implemented (symbol keys).
@Tokazama
Copy link
Member Author

There's definitely a lot more to do but I think this is a good start for a first PR. I'll create issues for the other things that I think need to be done before a formal release can be made. I'll wait until tomorrow to merge this PR just in case someone else wants to take a look at things.

@Tokazama Tokazama merged commit bc035f3 into JuliaNeuroscience:master Dec 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants