ZSH & amp2response #786

Merged
merged 13 commits into tag_0.3.16 from ZSH Dec 8, 2016

Lestropie commented Sep 26, 2016

Thought I'd propose including this as part of the upcoming tag, since it needs to go with a tag if it does get pushed.

New command amp2response is a more robust mechanism for deriving response function coefficients from single-fibre-voxel DWI data. Unfortunately Dropbox isn't letting me get public links to abstract / poster PDFs in my Public directory :-/ (I'll add the links when I can, or ask me and I can email them). But basically: Instead of averaging single-fibre voxels, it collates all single-fibre-voxel data together and finds the response function coefficients using all data at once. It also explicitly forbids the RF from being negative, and forces the amplitude to strictly increase from the fibre direction out to the orthogonal plane.

So, questions:

  • Do we include it, or are people sceptical?
  • If we include it, does it become the 'recommended default'?
  • If it becomes the 'recommended default' (or not), do we leave an option in the dwi2response script to still use sh2response (or switch to amp2response) for compatibility's sake?

Edit: Could also make it deal with multi-shell data internally rather than having to run it in a loop.
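For illustration only, here is a minimal sketch of the joint fit described above, written in Eigen-flavoured C++ (the function and variable names are made up for this sketch and are not the actual amp2response code); it omits the non-negativity and monotonicity constraints, which would require a constrained solver:

// Sketch: concatenate every single-fibre voxel's ZSH-to-amplitude rows into
// one system and solve a single least-squares problem for the response.
// Requires Eigen and C++17 (std::legendre). Names are illustrative only.
#include <Eigen/Dense>
#include <vector>
#include <cmath>

// zonal SH basis function Y_l^0 at polar angle theta
inline double zsh_basis (int l, double theta) {
  return std::sqrt ((2*l + 1) / (4.0 * M_PI)) * std::legendre (l, std::cos (theta));
}

// angles[v][i]: angle between voxel v's fibre direction and DW direction i
// signals[v][i]: the corresponding DWI amplitude
Eigen::VectorXd fit_response (const std::vector<Eigen::VectorXd>& angles,
                              const std::vector<Eigen::VectorXd>& signals,
                              const int lmax)
{
  const int ncoefs = lmax/2 + 1;               // even zonal terms only
  Eigen::Index nrows = 0;
  for (const auto& s : signals)
    nrows += s.size();

  Eigen::MatrixXd A (nrows, ncoefs);           // concatenated per-voxel matrices
  Eigen::VectorXd b (nrows);                   // concatenated signals
  Eigen::Index row = 0;
  for (size_t v = 0; v != signals.size(); ++v) {
    for (Eigen::Index i = 0; i != signals[v].size(); ++i, ++row) {
      b[row] = signals[v][i];
      for (int j = 0; j != ncoefs; ++j)
        A (row, j) = zsh_basis (2*j, angles[v][i]);
    }
  }
  // single least-squares solve across all voxels (no constraints in this sketch)
  return A.colPivHouseholderQr().solve (b);
}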

Lestropie added some commits May 19, 2016

New command: amp2response
This command estimates a single-fibre response function based on a mask of single-fibre voxels and an estimated fibre direction in each voxel.
Unlike the sh2response command, which simply aligns the per-voxel signals and averages their m=0 SH coefficients, this method performs a least-squares fit to the diffusion data in all single-fibre voxels, while enforcing non-negativity and monotonicity constraints.
This commit also includes a new header file, lib/math/ZSH.h, which provides functions for dealing with Zonal Spherical Harmonics (spherical harmonics where only the m=0 terms are non-zero).
Move cartesian2spherical() and spherical2cartesian() to Math::Sphere
Previously these were provided in lib/math/SH.h, which was an unusual header file to include in cases where only these conversions were required rather than any spherical harmonic functionality.
@@ -125,13 +126,11 @@ namespace MR
if (!init_filter.size())
init_filter = Eigen::VectorXd::Ones(3);
init_filter.conservativeResize (size_t (lmax_response/2)+1);

Lestropie Sep 26, 2016
Note: Possible use of uninitialized data here...
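For context, a small self-contained sketch (with placeholder values, not the code that was eventually committed) of what the concern is: Eigen's conservativeResize() preserves existing coefficients but leaves any newly added entries uninitialized, so the new tail of the filter needs to be zero-filled (or otherwise initialised) explicitly:

// Hypothetical illustration of the uninitialized-data concern noted above.
#include <Eigen/Dense>

int main() {
  Eigen::VectorXd init_filter = Eigen::VectorXd::Ones (3);
  const int lmax_response = 10;                        // placeholder value

  const Eigen::Index old_size = init_filter.size();
  init_filter.conservativeResize (lmax_response/2 + 1);
  // entries [old_size, size()) now hold indeterminate values; one possible
  // fix (not necessarily the one adopted in the PR) is to zero-fill them:
  if (init_filter.size() > old_size)
    init_filter.tail (init_filter.size() - old_size).setZero();
}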

Lestropie added some commits Sep 26, 2016

Further incorporation of ZSH
Function converting 'SH' to rotational harmonics (RH) was actually operating on ZSH coefficients, and so has been explicitly moved.
Generation of response function coefficients based on a tensor model with a given FA was also moved for the same reason.
Lestropie commented Sep 28, 2016

Documents: Abstract Poster.

thijsdhollander commented Oct 6, 2016

Ok, there needs to be some input here, so I'm going to make an effort of sorts... I've had a good, long think about this; these are my ideas on this at the moment, as in: this is where I am with that thinking up to now. There are essentially three main components of the method implemented in amp2response that can (imo) be considered independently, compared to the strategy we have in place with "amp2sh --> sh2response". Here are some bullets on everything that's in my mind at the moment.

  • How the data of different voxels are combined: not separately fit by SH and subsequently averaged, but rather all fit jointly, and after reorientation, so the fit can immediately be done in ZSH. There are no scenarios I can think of where this can introduce confounds that were not inherently already there with the previous strategy. This makes essentially no other assumptions about the size and shape of what we desire from a response: both processes lead to an axially symmetric "something" that is constrained by the lmax, i.e. there's a hard cap on the highest frequencies present. I feel the new strategy is more elegant, because it simply does it in one step. Practically, the result should even be exactly the same, right? This is because it's still a least-squares fit, which is essentially the equivalent of averaging. So way too many words in this bullet to conclude I'm happy with this aspect/component. I happily challenge other people to spot mistakes in my reasoning and various ramblings here (but I'm quite confident about it 😛).
  • The monotonicity constraint: this is the one that I'm most sceptical about. Let me first clarify that I don't question that it does exactly what it says on the box, and does that in the best possible way: there's simply a constraint, and within that constraint it's still a least-squares fit (i.e. no questionable "hacks" here). I also don't question that this can in certain scenarios be a desirable thing when the end product is the response function: let's say I want to study its actual shape; I could be misled into believing it's not monotonic due to the cap in lmax in the (Z)SH basis if I don't use a monotonicity constraint. One could of course reason that the mistake I made in this case is on me (as a "response shape researcher"), and a way to resolve it would be to increase the lmax until the effect of this "confound" is gone or becomes negligible. I could even reason that a monotonicity constraint is itself a confound, since it makes prior assumptions about the shape. But, well, whatever: there's no harm in having this functionality at my disposal, and I'm sure there have got to be sensible uses for it when I'm doing "stuff with responses".
  • The monotonicity constraint continued: scepticism explained. What I'm essentially worried about is the question of what is best in the context of using the obtained response for performing (any of our variants of) spherical deconvolution. One aspect of this is that we haven't validated the effect of the monotonicity constraint on the FODs that subsequently result from SD methods. I do agree it's going to be subtle, and it'll surely be very hard to tell "what is best" just from qualitative results. But what I'm concerned about is, given that we work in a harmonic basis with an lmax cap: what's the best (most correct) representation of the response in that basis? Note that its shape doesn't per se matter here; it just has to be the thing that gives the most accurate FOD after deconvolution. This is the same for other, more trivial, spherical functions. Take the delta function as such a more trivial example, which I think allows me to illustrate (prove?) my point more clearly. Let's say the true response were an actual delta function (not talking about SH here yet, just a real symmetric delta function that's non-zero along a single direction/axis, zero everywhere else, and integrates to 1). This would mean the "FOD" after deconvolution is supposed to be exactly the same as my signal. Let's take a voxel where the actual measured signal can be exactly represented in an SH basis up to lmax 8. To represent my FOD (which equals my signal in this example) in that SH basis, the way to go is a least-squares fit. That's my ground truth in this scenario. Now I'll try to obtain the exact same result with spherical deconvolution, performed in the SH basis, given the delta function (as measured from a bunch of identical exemplary voxels that have an exact delta function as the signal). The only way to obtain the most accurate (in this case exactly correct) result is to fit the delta function least-squares without constraints. This will give me the "SH delta function", which is indeed not monotonic (or non-negative!). Deconvolving that one from the measured signal combined with the SH transformation (which is what we do in our SD variants) gives me the same as performing only the SH transformation, i.e. a least-squares SH fit, which was my ground truth. Anything else is less accurate: so using a non-negative monotonic representation (i.e. our apodised SH delta function) for this purpose is strictly less accurate... even though a non-negative monotonic (apodised) SH delta function shares more "shape/amplitude qualities" with a real delta function, in terms of both indeed being non-negative and monotonic. We do still desire certain qualities of the FOD though, such as non-negativity, but these are applied at the stage where we fit the FOD. These do make sense for other purposes, such as the fact that we hope to be able to segment the FOD into fixels and evaluate their total integral. But this is entirely independent of how the response function was represented.
  • I was initially less sceptical about the non-negativity constraint for the response, but essentially, from a formal point of view, the long bullet point above applies in the same way here...
  • From a practical point of view, I think I was initially less concerned about the non-negativity constraint, because I felt that this was not so much of an issue at the moment anyway. But if we would eventually have something in place that, as a preprocessing step before SD comes into play, corrects for Rician bias, I think this does become "a thing". But again the same reasoning above applies here, and enforcing non-negativity on the response may only reduce the accuracy of the FOD... right?
  • Since this pull request also has the changes that introduced better separated ZSH facilities: that's definitely a good thing. I don't have the time now to look into all the details; and I've not been working in that part of the code itself (recently) enough to check these things efficiently... @jdtournier , @bjeurissen : maybe one of you guys could give this a check? (but I do appreciate we're all very busy)
  • Conclusion: amp2response is a useful thing for fitting the response to all requested voxels at once. The ZSH implementations are super useful as well. I'm sceptical about the monotonicity and non-negativity constraints on the response, for the purposes of using these responses in an SD context. I'm not against them being offered as options in amp2response for people who consciously want to do this, probably outside of a subsequent SD context. But I'm less comfortable with them being on by default. I'm ok with amp2response being the command that's called from the dwi2response script(s) for this step. Being less comfortable with the constraints in an SD context, I don't think they should be offered as options in the dwi2response script, since the algorithms offered in this script are meant to serve the users in an SD context. There's nothing stopping users from running dwi2response with any of the algorithms together with the -voxels option, and subsequently manually feeding those voxels to the amp2response command with the constraint options switched on, though. That requires enough conscious steps for them to be aware that they are doing this with a sensible scenario in mind.

@jdtournier , @bjeurissen : I think we really also need some input from you guys on this one though (even just to check whether I'm making sense with everything I posted here). Ultimately, when people use these things in an SD context, this will affect the quality of the results they get and report, and this may influence the public image of both CSD and MSMT-CSD as a consequence. It's in everyone's interest that the (MSMT-)CSD results are of the best quality possible.

thijsdhollander commented Oct 6, 2016

Just noticed: @Lestropie, your links are both to the abstract (the poster one as well).

Lestropie added some commits Dec 7, 2016

amp2response: -noconstraint option
Disables the non-negativity and monotonicity constraints, instead reverting to an ordinary least-squares solution for the response function ZSH coefficients.
amp2response: Clarify one row per shell in RF output includes b=0
In case somebody uses it without specifying the -shell option, then tries to apply the result to dwi2fod csd, which won't work.
Lestropie commented Dec 8, 2016

Merged to tag_0.3.16 for defragmentation purposes pre-master-merge, but amp2response is still floating around on its own for now, i.e. dwi2response doesn't use it yet. So discussion is ongoing. Command is there to test if anybody wants.


Practically, the result should even be exactly the same right? This because it's still a least squares fit, which is essentially the equivalent of averaging.

I don't think the mean of the results of independent least-squares fits on independent subsets of over-determined data is precisely equivalent to the least-squares fit to all data. As an extreme example: Imagine that you take your data, bisect it straight down the middle of your domain, and do a least-squares fit to each half independently; the mean of those two results will almost certainly not be equivalent to the least-squares solution that takes all data into account. And I would think that the difference would be above and beyond that of errors introduced by matrix solves or the like. If anybody knows the theory behind it though, speak up. As I always say, my math is junk.

... Let's say the true response were an actual delta function ...

I'm honestly having a lot of trouble getting anywhere much with this example. I get the point you're making, but of all the counter-arguments I've come up with, none of them are definitive enough for my liking. I'll give one a try and we'll see how we go... I'm trying my best to limit the topic purely to the application vs. non-application of constraints, but it's easy for the lines to blur with other factors.

Even if the 'true' response function is a true spherical delta function, and you want to extract a truncated SH delta from response function estimation, it still has to be estimated from single-fibre voxels and an A2ZSH transform. The only way you could possibly achieve this result is:

  • If the fibre direction of every single-fibre voxel coincides with a DWI gradient direction. (And the number of sampling directions must be finite, otherwise we wouldn't be talking about a truncated SH delta)

  • If there is no noise. (In addition to just noise propagation and fibre orientation estimation, we deal with magnitude data, and therefore you can't possibly fit an SH delta function to noisy data with i.i.d. residuals)

So the question then becomes: in the presence of such confounds, when combining signals from multiple single-fibre voxels for RF estimation, will what you get out be closer to the desired SH delta (and therefore be better for deconvolution) if you don't apply the constraints than if you do? Personally I think not: I don't think you're going to see any negativity / non-monotonicity arising due to the actual underlying SH delta you're seeking; I think if it's there, it's going to be from the fitting procedure and/or the confounds.

With a realistic diffusion MRI response function, the RF features that the constraints are influencing are physically non-realistic. Moreover, the worse the data, the more prominent these undesirable features are, and applying the constraint brings the result closer to what we get from better-quality data (with or without constraints). If the constraints were degrading response function estimation rather than improving it, I'd expect to see the converse.

So even though omitting the constraints may allow you to get closer to the ideal response function in your extreme example, I'm just not convinced that that result extends to real use cases. But like I said, I can't fully pin it down.

My suspicion is that mathematically it has to do with extending the argument from a real spherical delta to a truncated SH delta, which implies sparse sampling, but then not knowing the single-fibre voxels' fibre orientations a priori, so you can't align one of your image samples with the delta and have every other direction yield a value of zero even in the noiseless case. Which means you'll never be able to construct that delta function in the space of image-intensity-as-a-function-of-angle-from-fibre-direction. It's like an intrinsic limitation imposed by the fact that the RF model is coming from the same image data you're applying SD to, rather than the limitation being caused by whether or not constraints are applied during the RF model fitting.

Plus, the effect on good quality data is very small (see below).

... But if we would eventually have something in place that, as a preprocessing step before SD comes into play, corrects for Rician bias, I think this does become "a thing". But again the same reasoning above applies here, and enforcing non-negativity on the response may only reduce accuracy of the FOD... right?

If, once we have such a mechanism, it turns out that a hard non-negativity constraint is too stringent for such data, we can try regularisation instead. But you then have to justify to users why allowing the response function to be negative is preferable to not doing so, when we have the capability of applying that constraint, and also after we've been selling hard-constraint CSD as preferable to soft-constraint. Yes, it's a burden-of-proof fallacy; but in the absence of a meaningful test, personally I'd trust the constrained result that conforms to realistic physical expectations, rather than deliberately retaining implausible features in case that might be preferable for SD.

Conclusion: amp2response is a useful thing for fitting the response on all requested voxels at once. ... I'm sceptical about the monotonicity and non-negativity constraints on the response, for the purposes of using these responses in an SD context ... I'm less comfortable with them being on by default.

Here are the RFs for the 'typical' dataset I tried, using amp2response with & without the non-negativity / monotonicity constraints:

sh2response:                     269.042 -157.993 84.6160 -32.7326 7.74981
amp2response (no constraints):   269.066 -158.035 84.5761 -32.6549 7.62571
amp2response (with constraints): 269.065 -158.070 84.4290 -33.0143 6.92392

The difference with/without constraints is not much more than the difference between sh2response and amp2response without constraints. So I don't see that the latter change should in practice cause significantly more difference (and therefore raise more concern) than the former.

Where the constraints make a larger difference is in poorer-quality data, where not having such constraints produces results that are inconsistent with what we see in better-quality data; in which case it would seem unusual to me not to fix such issues, because it's highly unlikely that retaining those features would be more 'accurate'.

Lestropie merged commit a6c45d0 into tag_0.3.16 Dec 8, 2016

1 check failed: continuous-integration/travis-ci/pr (The Travis CI build could not complete due to an error)
jdtournier commented Dec 8, 2016

Guess it's about time I chipped into this discussion... So in no particular order:

Practically, the result should even be exactly the same right? This because it's still a least squares fit, which is essentially the equivalent of averaging.

I don't think the mean of the results of independent least-squares fits on independent subsets of over-determined data is precisely equivalent to the least-squares fit to all data.

Depends how it's done... Let's take the voxel-wise least-squares fit:

  • Z2A × z = s

where Z2A is a voxel-wise projection from zonal SH to DW signal accounting for the different main fibre direction in each voxel. Solving per voxel would give:

  • z = A2Z × s

where A2Z = Z2A⁻¹. If we simply extend this to solve over all voxels by forming a much wider A2Z_all by concatenating all the A2Z_i matrices, and a much longer vector s_all by concatenating all the s_i vectors:

  • z_final = 1/N × A2Z_all × s_all

then this really would be the same as averaging the per-voxel results...

However, here we have the opportunity to solve the problem across all voxels in one go:

  • Z2A_all × z = s_all

where Z2A_all is now a much taller matrix formed by concatenation of the individual Z2A_i matrices. We can solve this using standard solvers (I'd typically go for a Moore-Penrose pseudo-inverse with Cholesky decomposition since I'd expect the problem to be well-conditioned, but there are more stable solvers out there).

But the main point here really is that the condition number of the Z2A_all matrix will be lower than that of its constituent Z2A_i matrices, so we should have a much more stable fit. So for example, if you have a poorly-conditioned DW scheme, some of the Z2A_i matrices will be very poorly conditioned, just because the fibre direction lines up with a particularly bad combination of DW gradient directions. In these cases, the individual per-voxel fits will introduce a massive amount of noise. Averaging will help reduce some of that, but it will nonetheless percolate through to the output. On the other hand, if this matrix is simply concatenated with the rest, other matrices will contribute complementary information, so that those directions in the solution space that were ill-conditioned when considering one of these matrices alone are now well-conditioned given the information from the other matrices, and vice versa.

To illustrate, consider matrices A = [ 1 0; 0 1e-3 ] and B = [ 1e-3 0; 0 1 ]. We want to estimate x where:

  • A x = a

  • B x = b

If you solve this as ½ ( A⁻¹ a + B⁻¹ b ), then any noise in a₂ or b₁ will be massively amplified and make the result worthless, since A⁻¹ = [ 1 0; 0 1000 ] and similarly for B⁻¹. On the other hand, solving both equations in one:

  • [ A; B ] x = [ a; b ]

will provide a stable solution. In fact, in this case, we can just add the two equations to see this:

  • (A+B) x = (a+b)

will be stable since (A+B) = [ 1.001 0; 0 1.001 ], which is basically the identity...

All this to say, it's worth doing this in one go if we can, it'll help for poorly-conditioned situations, and furthermore makes it applicable in situations that would otherwise be under-determined - you could for instance use this to get a full lmax = 8 response from 12 direction data, provided you had enough fibre orientations in the single-fibre mask.
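As a concrete restatement of the 2×2 example above, here is a small self-contained sketch (the noise values are made up purely for illustration) comparing the averaged per-system solves against the stacked solve, here via the normal equations and a Cholesky decomposition as suggested:

// Averaging per-system solves amplifies noise through the ill-conditioned
// inverses; stacking [ A; B ] and solving once stays stable.
#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::Matrix2d A, B;
  A << 1.0,  0.0,  0.0, 1e-3;
  B << 1e-3, 0.0,  0.0, 1.0;

  const Eigen::Vector2d x_true (1.0, 1.0);
  const Eigen::Vector2d noise (1e-3, -1e-3);     // small measurement noise
  const Eigen::Vector2d a = A * x_true + noise;
  const Eigen::Vector2d b = B * x_true + noise;

  // per-system solves, then average: noise in a2 / b1 is multiplied by 1000
  const Eigen::Vector2d x_avg = 0.5 * (A.inverse() * a + B.inverse() * b);

  // joint solve: stack the equations, then solve the normal equations
  Eigen::Matrix<double, 4, 2> M;
  M << A, B;
  Eigen::Vector4d rhs;
  rhs << a, b;
  const Eigen::Vector2d x_joint = (M.transpose() * M).llt().solve (M.transpose() * rhs);

  std::cout << "averaged: " << x_avg.transpose()   << "\n";   // far from (1, 1)
  std::cout << "joint:    " << x_joint.transpose() << "\n";   // close to (1, 1)
}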


About the scepticism regarding the monotonicity and non-negativity constraints:

I think I understand the point that's being made, although I have to admit I'm not sure. It sounds like the issue relates to the fact that we have a truncated basis, which would naturally lead to oscillations and negativity even in the pure noiseless case. For instance, the delta function definitely will suffer from both of these due to Gibbs ringing - hence the apodisation. So any attempt at constraining these aspects will necessarily lead to deviations away from the perfect response, if it would otherwise have exhibited either of these traits.
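To make that ringing concrete, here's a small sketch of my own (assuming C++17's std::legendre, even zonal terms only, and the delta aligned with the pole): it reconstructs a spherical delta function from its ZSH coefficients truncated at lmax=8 and prints the profile, which oscillates about zero away from θ = 0, i.e. it is neither non-negative nor monotonic:

// Truncated ZSH reconstruction of a (symmetric) spherical delta function.
#include <cmath>
#include <cstdio>

double truncated_delta (const int lmax, const double theta) {
  double sum = 0.0;
  for (int l = 0; l <= lmax; l += 2)     // even zonal terms only
    sum += (2*l + 1) / (4.0 * M_PI) * std::legendre (l, std::cos (theta));
  return sum;
}

int main() {
  for (int deg = 0; deg <= 90; deg += 10)
    std::printf ("%2d deg: % .4f\n", deg, truncated_delta (8, deg * M_PI / 180.0));
  // the negative lobes in this output are the Gibbs ringing discussed above
}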

So I think that's a fair argument - one that I have used in the past to defend the use of a soft constraint on the FOD: it allows some negativity in the FOD, which you would expect due to Gibbs ringing and the very sharp features in the FOD. And yet you guys have elected to use a hard constraint...

Which is where I agree with Rob: there's little point in having a more accurate reconstruction if its precision is woeful. You're much better off allowing some mild bias in the output if it brings the variance down to a level where the results can actually be used meaningfully. And it seems these constraints only really make a difference when the estimation is poorly conditioned to begin with, which is exactly what we want. So that's point 1.

The next point is that whereas the FOD is expected to have sharp features, the response is expected to be relatively smooth. You'd really have to crank up the b-value to get any reasonable amount of high frequency content, and even then you'll probably hit the limit of the inherent orientation dispersion at some point. So this would argue that we should expect Gibbs ringing to be much less of a problem, and the truncated SH basis to actually provide a really good fit for a reasonable value of lmax. If this is true, we don't need to worry about whether the constraints will introduce a bias in the results due to their interaction with the SH representation (in contrast to the FOD case, where I think we really do need to worry).

Finally, these constraints simply express what we expect to see from the physics of the problem, and are I think fully justified. The response is at heart just the DW signal, so must be positive-definite - unless we're talking about porous media or something... This is true whether or not Rician bias correction has been applied. It's also the DW signal for a single, coherently-oriented (though typically dispersed) fibre bundle, and there are no situations that I can think of where you could reasonably expect non-monotonic behaviour. Every other model in the field will predict a monotonic decrease in the signal from radial to axial, and I think we can safely assume that too.

All of this of course applies to dMRI in tissue (brain or muscle); things might be different in other contexts. But given that this is our target audience, I think we're actually fully justified to turn on these constraints by default. If users do genuinely want to investigate the details of the response function, they would have the option to disable these features - but the vast majority of users will just want the best response they can get from their data without really having to worry about the details of how it was derived. Here, I think the constraints are justifiable from the physics of the problem, shouldn't interfere with the estimation process due to the limitations of the SH basis (or any other such issue), and should allow trouble-free processing of datasets that would currently prove very problematic, namely those at the low end of the quality spectrum, i.e. the clinical realm...

About that last comment, it's precisely because the users that would benefit the most from this option would be the more clinically oriented researchers that I think we should enable the constraints by default - they're the least likely to understand why their results look terrible, or to figure out that there is a solution for their data analysis...

As usual, just my 2 cents, feel free to disagree...

thijsdhollander commented Dec 9, 2016

(much rambling about not so much; good -maybe fun- if you've got lots of free time, otherwise skip to the paragraph that starts in bold text 😄)

Uh-oh, all these efforts to explain... I should have mentioned some things I figured out earlier... 😬

First things first, on the topic of how the data are combined: yep, I figured out, and fully admit, that I made a mistake in my reasoning there (a while ago...) via a similar example to the one @Lestropie mentioned. I fully agree that it's not formally the same. In practice, the difference will only significantly show in cases of poor conditioning, or in under-determined cases (as evidenced by Rob's results for sh2response vs amp2response without constraints, in a well-conditioned case). So as I mentioned before, I'm all for using that in any scenario.

Small (really not that important) comment on this though:

All this to say, it's worth doing this in one go if we can, it'll help for poorly-conditioned situations, and furthermore makes it applicable in situations that would otherwise be under-determined - you could for instance use this to get a full lmax = 8 response from 12 direction data, provided you had enough fibre orientations in the single-fibre mask.

Note that even that is not under-determined (but yes, quite poorly conditioned): we only need to estimate 5 parameters for lmax=8. The contribution of amp2response in making this over-determined in the first place is by using (constraining to) ZSH, rather than fitting the full SH first. But combining the fit using data from more voxels at once will of course improve the conditioning quite a bit. The next question for a user in this case would however be: what next with that shiny lmax=8 response, but still those crappy 12-direction data to deconvolve from? But ok, if the response can be better, why not. 👍

Next, on both of the constraints (non-negativity and monotonicity): this shares, in a certain way, a few aspects with a discussion we had elsewhere at some point, when we reasoned about Rician bias correction and whether we should allow negative intensities in DWI data after such a correction. Note we also expect the signal to be positive, but we wouldn't yet correct for that (by capping data at 0 or something) at that point, because it's not the individual data points that matter at that stage, but the data as a whole (and certain statistical properties of it). I see the response function in a similar way: the way we use it, it needs to best represent the response as a whole, but not per se in certain individual aspects (e.g. non-negativity) of it. The FOD, however, is different: we actually rely on certain properties beyond the point where we estimated it, and we start using it further on: e.g., we're hoping to segment it into fixels (so we're not happy with negative lobes, or other AFD lost in spurious positive lobes), we hope to get the total AFD per fixel, we hope to use individual amplitudes e.g. for probabilistic tractography... all this to say that at that point we're not 100% after the best fit to the data, but also (at least a bit) after certain aspects... constraints.

But good, apart from this: I'm not against non-negativity and monotonicity, and I agree it's what we expect to find. What I guess I tried to illustrate with that initial convoluted example is not an extreme case, but rather an idealised (and mathematically simplified, for insight) case, where conditioning was not the issue. The only remaining "issue" then is that other, often forgotten, constraint: the lmax. My example was then, in the end, just about Gibbs ringing. It's just a consequence of truncation at lmax, and that's ok, as in: I didn't feel that should be attacked with non-negativity and monotonicity (especially the latter). If you do that, its effect is spread elsewhere in a non-trivial manner.

But ok, to the real point I want to make in the end: as I mentioned, I'm not against these new constraints in and of themselves, I just worry about them in conjunction with lmax=8 (for the response function; I'm not talking about FODs here). Look at these amp2response outputs from 157 SF WM voxels in a well conditioned scenario (60 directions, b=3000):

lmax=8 , no-con.: 875.6595195160179 -597.2968728328442 330.3383402205054 -135.8888512503506 39.54075218534354
lmax=8 , constr.: 875.6816274598276 -597.616327209113  328.7675821903119 -139.79105954571   31.88699757320352
lmax=10, no-con.: 875.6714305934237 -597.2851965189321 330.3996770353336 -135.8403907072307 39.62438258337632 -8.130994138392607
lmax=10, constr.: 875.6714305045891 -597.2851964582004 330.3996770015593 -135.8403906932427 39.62438257923262 -8.130994137327583

Note the impact of the constraints (versus no constraints) in the lmax=8 scenario (it's not huge, but it's there; note we invert this thing at the SD stage, and it becomes an aggressive sharpening operation). This is certainly not due to poor conditioning, it's just due to our good friend Gibbs: going to lmax=10 (note: only 1 extra parameter) fixes this almost entirely, as using the constraints or not in this lmax=10 case makes virtually no difference at all. Saying that you highly value the monotonicity property in your response for these data becomes (almost?) equivalent to saying that lmax=8 was the real problem. Introducing the constraints, but sticking with lmax=8, doesn't make it more accurate; probably slightly less so (even if probably a bit more precise), it just biases the response a bit so monotonicity is adhered to. In other words: you've hit a boundary anyway (lmax=8), and introducing other ones (monotonicity) is just making things qualitatively prettier (not more accurate!), but not fixing the real underlying problem, which is the Gibbs effect due to truncation, specifically at lmax=8.
I worry about this, because it constrains the response to be even a bit smoother, when we are already clearly very, very close to the boundary of what smoothness we should impose. An even higher b-value, correction for the Rician bias, etc. will push further here. I'm not sure any extra imposed smoothness on the response even benefits the conditioning of the (truly challenging) CSD problem thereafter, as making the response smoother renders the CSD a more ill-posed sharpening operation. For low-quality data, that may mean more spurious FOD peaks. Note some people out there already criticise us for the response not being "the sharpest one", hence limiting what our model can thereafter fit, and even go as far as to say we may be introducing crossings where there are none (you know, those fans of fanning, ehm... dispersion 😉).

So, what about this attempt at a constructive proposal: let's say we go for switching on the constraints by default, but the default lmax comes along and goes to 10? I'm definitely not asking for soft constraints on non-negativity and monotonicity, it's just about relaxing the right one just enough (so that would be the lmax). It's definitely more accurate, and the constraints (at b=3000) will then just serve to help conditioning. It's only one extra parameter (from 5 to 6 parameters). Let's say you've got those poor 12 direction data, and you've even only got 100 SF WM voxels (low resolution and lots of atrophy): that's still 1200 measurements to estimate 6 parameters. Conditioning should be good I'd say. 🙂 👍

thijsdhollander commented Dec 9, 2016

Small addendum, before people panic: I'm only suggesting a default lmax=10 for amp2response, not for the default lmax cap of dwi2fod; I'd leave the latter at 8 for computation time, memory (potentially) and disk space usage... at least by default. Going to lmax=10 or higher there should be opt-in for those reasons.

bjeurissen commented Dec 9, 2016

I fully agree that estimating only the ZSHs and estimating them from all the response data at the same time is the sane way to go. In fact, I have been using this approach myself for many years in MATLAB and it has never let me down.

About the monotonicity constraint: I am not entirely sure about this. In my experience, fODFs contain far fewer spurious lobes when extracted from noise-biased data if the response contains a similar bias. Otherwise, the biased signal will have to be explained by spurious fibre orientations. In fact, it is astonishing how much further down you can push SNR and still get reasonable results using the biased responses. Not sure what the effect of the monotonicity constraint will be on this.

Lestropie commented Dec 11, 2016

Let's say we go for switching on the constraints by default, but the default lmax comes along and goes to 10?

I suppose more fundamentally the question being asked is whether the results from Donald's NMR Biomed paper carry across here. (I also couldn't help noticing the little p<0.001 asterisk over the b=3000, l=10 term in Figure 3...)

@jdtournier Got any one-button scripts that could be re-run with amp2response?

I'm only suggesting a default lmax=10 for amp2response, not for the default lmax cap of dwi2fod; I'd leave the latter at 8 for computation time, memory (potentially) and disk space usage... at least by default.

Personally I wouldn't be against going to lmax=10 for dwi2fod as well; I tend to prioritise output quality over time, there's plenty of the latter. But we'd need to prove it worthwhile. Which may include a tech note if it's deemed warranted.

Alternatively, have amp2response perform a model selection for lmax when not specified manually (to account for different b-values), and have dwi2fod default to whatever the RF is defined at.

fODFs contain a lot less spurious lobes when extracted from noise-biased data, if the response contains a similar bias. Otherwise, the biased signal will have to be explained by spurious fiber orientations. In fact, it is astonishing how much further down you can push SNR and still get reasonable results using the biased responses. Not sure what the effect of the monotonicity constraint will be on this.

I suspect that although you will be able to see inaccuracies in single-voxel FODs if deconvolved with a noiseless response function, there will also be (less visually obvious) inaccuracies in crossing-fibre FODs if a noise-biased response function is used: the Rician bias combined with variation in crossing-fibre complexity breaks the canonical response function assumption.

I'm not too concerned about this result translating directly to the monotonicity question:

  • It has a much more subtle effect in half-decent data than changing between noise-biased and non-noise-biased RF.

  • The only robust way to test the effect of the Rician bias within DWI data and/or RF would be to simulate DWI data with a simulated RF. If you're generating a manually-defined RF rather than estimating it from the data, there's no reason for it to not have monotonicity. The issue arises purely from the fact that the RF is being estimated from the data.

thijsdhollander commented Dec 12, 2016

Ok ok, let me clarify, because we're running into all sorts of confusion, I'm afraid.

  • We all agree that fitting over all voxels' data is a good thing which has benefits in challenging situations where conditioning may otherwise be problematic. Good, so this is resolved and we're all for it.

  • Sorry for all the confusion caused by bringing the Rician bias into this story: it has nothing to do with it in and of itself. I used it once for an analogy, and once in the hypothetical context of "if we already had Rician bias correction...". At the moment, we don't have it, and that indeed means that single-fibre voxels, where biased response = biased signal, will do well, and crossings won't (but it's hard to notice due to the non-negativity constraints in CSD), because a mixture of biased responses != the (less biased) signal resulting from a mixture of signals. I threw it into the discussion just because: if we were to have Rician bias correction on the raw signal as a preprocessing step, then the non-negativity constraint when estimating responses probably has a higher chance of coming into play thereafter... and the original thing I brought up was me being worried about that, or any, constraint coming into play when we estimate responses at lmax=8. Apart from that, let's not confuse this discussion with a discussion on Rician bias effects... sorry if I partially led it to appear like that... Ok, so back to un-Rician-related business.

  • I agree the constraints make sense, and so they may help condition the problem if need be. That's a positive effect. My worry is that they also attack Gibbs effects in the response, which they "shouldn't", as in: the Gibbs ringing is not an undesirable effect per se if the lmax itself is high enough to represent the response decently (and lmax=8 being ok here stands, as per @jdtournier's results from the 500-direction dataset). If they do attack the Gibbs ringing specifically, they bias the response function, hence making it less accurate. That's a con. So we're facing a pro and a con here in the question of whether they should be on by default. My first response was to avoid the con at all costs, so to argue against the constraints for that reason only. The lmax=10 proposal for the response specifically is an attempt to get rid of the con without getting rid of the pro: the result I presented shows that in that case, conditioning or the data quality itself was not the problem, as at lmax=10 the constraints are adhered to without even enforcing them. Hence, it also shows that at lmax=8, the constraints are attacking the Gibbs effect specifically, so they do bias the response, making it less accurate. Lmax=10 practically resolves this for a b=3000 case, at the cost of only introducing one extra degree of freedom (5 to 6). A challenging case in terms of conditioning still fully benefits from both constraints as well as from the all-voxels-combined fit. More than safe enough to still get an awesome response at lmax=10. So my point is: a model selection is realistically speaking not needed; we could safely go with lmax=10 and benefit without any real risks (I reckon).

  • That was response function estimation; then there's CSD. I'm often a bit surprised how these two get so mixed up/together: they're really very separate things, very separate problems with their own properties... The response estimation is, taking all of those voxels together and only needing ZSH coefficients, a very, very well-conditioned problem. (C)SD is still a tricky beast. Let's say you've got about 45 directions, or just a little bit more. That sounds ok, because you need 45 parameters. But then there are two things that make it not so simple: 1) SD is a highly ill-posed sharpening operation, which suggests you may want some more directions (or a lower lmax); 2) you've got non-negativity to rely upon, and many/most FODs are very sparse, so you may get away with fewer directions (or can go for a higher lmax), essentially super-resolved CSD. It's very hard to see how these two may exactly counter-balance, and it's actually going to be different for different voxels. And then there's uncertainty in directions, and then we even expect some dispersion beyond the response maybe as well. So, who will tell what's "right"? We've ended up being quite happy with lmax=8 so far for the FOD. But this is (or should be) quite unrelated to the lmax findings for the response function. I've been able to super-CSD up to lmax=16 for 60-direction data. Results sure look nice and sharp (and having so many parameters, of course they fit the data better); but who's to say it's not over-sharpening and turning some dispersion into false-positive crossings...? All this just to say that it's not so trivial to say what's better here... lmax=8 does have the advantage of having served us well so far, and the file size and processing time are good. There may be scenarios where we want to go beyond on purpose (and yes, we've got the time, space, knowledge to assess the results, ...), but your average user may not need/want to per se... definitely not if we want to serve the most clinical users in the easiest way, and of course allow more advanced scenarios for users like us, who use it consciously. We could even have a config file option to set the absolute lmax cap. But I'm still all for lmax=8 as the default here. (For the parameter counts in this bullet, see the small sketch after this list.)

  • I know this may seem strange in combination with lmax=10 for the response. It certainly would have been without the response constraints, as due to the orthogonality of the SH basis, that added independent l=10 ZSH coefficient is indeed ignored if you CSD to get an FOD up to lmax=8. I should clarify that, essentially, I'm still perfectly fine with lmax=8 all across the board. The introduction of the extra coefficient when estimating the response is simply there so as not to bias the other 5 coefficients through the interaction of the Gibbs effect and the constraints... because the constraints do render the coefficients' values interdependent. I hope this explanation somehow makes that intention clear... 🙂 Of course, it also contributes to a more accurate description of the response function nonetheless.

  • Side remark: with multi-shell data, it becomes even clearer how the response and FOD lmaxes are not per se to be reasoned about in the same manner: response lmaxes are still per shell, but the FOD(s) estimation can benefit from all directions across all shells.

  • Other side remark and completely independent idea: we could actually also introduce a monotonicity constraint across shells for the response estimation (i.e. assume decay of signal).
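Since a few different parameter counts are being juggled in the bullets above, here's a tiny sketch (my own, purely illustrative) of the numbers involved: even-order SH coefficients for the FOD/CSD side versus zonal (ZSH) coefficients for the response, plus the number of rows the joint response fit would have in the hypothetical 100-voxel, 12-direction scenario mentioned earlier:

// Parameter counts quoted in this thread (illustrative only).
#include <cstdio>

int main() {
  for (int lmax = 8; lmax <= 16; lmax += 2) {
    const int n_sh  = (lmax + 1) * (lmax + 2) / 2;   // even-order SH terms (FOD / CSD)
    const int n_zsh = lmax / 2 + 1;                  // zonal terms only (response)
    std::printf ("lmax = %2d: %3d SH coefficients, %2d ZSH coefficients\n",
                 lmax, n_sh, n_zsh);
  }
  // joint response fit: rows = voxels x directions, e.g. 100 x 12 = 1200 rows
  // against only lmax/2 + 1 unknowns
  std::printf ("100 voxels x 12 directions = %d rows\n", 100 * 12);
}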

jdtournier commented Dec 12, 2016

OK, finished ISMRM abstracts, I'm now allowed to think about this for 5 minutes...

First off: @thijsdhollander's analysis is really compelling. I wouldn't have thought we'd be able to detect the lmax=10 coefficient with a standard 60 DW direction data set, but it certainly looks like we can. Or rather, not including it does seem to affect the results (not sure this would survive an F-test if we were to test for it explicitly). But that's not the point: we can see that it affects things, and there's a simple fix which is to increase the lmax of the response function estimation, and as @thijsdhollander says, we can do that whilst remaining very well-conditioned using @Lestropie's fit-everything-together approach. Seems like a no-brainer, more than happy to switch to lmax=10 by default for the response function estimation.

So I agree with pretty much all of @thijsdhollander's comments in his last post. This includes sticking to lmax=8 for the CSD itself. The analysis in the NMR Biomed paper was all about whether we have any realistic power to detect each harmonic band at the single voxel level, and I don't think anything's changed on that front. Sure, we can estimate the response to lmax=10, but that's because we're aggregating data over hundreds of voxels - this doesn't apply to the CSD analysis itself, and what's more the non-negativity constraint will dwarf any information that might have been contributed by the lmax=10 band.

So on that note, one bit where I mildly disagree is the suggestion that things might be badly affected if we don't get these high harmonic response terms right. That would be true for the linear unconstrained version of SD, since these harmonic terms are small, and once inverted for the deconvolution result in massive amplification of the (noisy) high frequency terms. But once the non-negativity constraint is imposed, I really don't think it makes a great deal of difference how well these high harmonics terms are characterised in the response: the non-negativity constraint totally dominates these terms in practice. You could set all the response function terms above lmax=4 to zero and still get half-decent fODFs....

Also, one short remark about the Rician bias issue: I agree it's not central to the current argument since we don't currently have methods to deal with it. But: even if we did, the response is still supposed to represent the ideal single-fibre DWI signal, and is therefore supposed to be positive-definite. The fact that a decent Rician bias correction scheme would introduce negative values in the measurements shouldn't affect the validity of the non-negativity constraint imposed on the response, and it should only make things better in that the result will at least match expectations. What matters is that when applied in the actual per-voxel deconvolution, the DW signals are still allowed to be negative if that's what the Rician bias correction approach produced, so that if the DW signals happen to average to zero, the fODF will too. Or to put this another way, if the estimated response was negative anywhere, I would pretty much invariably conclude that something has gone wrong, either due to really bad data, or just the particular noise realisation on that day - either way, I think if we can make it positive definite, there is value in doing so (provided that doesn't interfere with the Gibbs ringing issue, in which case there may be an argument for relaxing it...).

And to address @bjeurissen's point about the potential for non-monotonicity in the presence of significant Rician bias: I think even in these cases the signal should be strictly monotonic. Sure, it'll level off as it approaches the axial direction where the Rician bias dominates, but at no point should it actually increase. Or from a different point of view: the relationship between the true and the Rician-biased signal is monotonic (even if it does degenerate near zero), so the Rician bias shouldn't break our monotonicity assumption.

  • Other side remark and completely independent idea: we could actually also introduce a monotonicity constraint across shells for the response estimation (i.e. assume decay of signal).

Good idea, all for it.

Lestropie commented Dec 12, 2016

Reason I suggested a model selection is that while lmax=10 looks good, that specifically applies to b=3000. Hard-coding that as the default, and having 12-direction b=700 data estimating lmax=10 by default, may come off as strange.

thijsdhollander commented Dec 13, 2016

So on that note, one bit where I mildly disagree is the suggestion that things might be badly affected if we don't get these high harmonic response terms right. That would be true for the linear unconstrained version of SD, since these harmonic terms are small, and once inverted for the deconvolution result in massive amplification of the (noisy) high frequency terms. But once the non-negativity constraint is imposed, I really don't think it makes a great deal of difference how well these high harmonics terms are characterised in the response: the non-negativity constraint totally dominates these terms in practice. You could set all the response function terms above lmax=4 to zero and still get half-decent fODFs....

Yep, that's certainly true: the CSD result using either of those responses I obtained above is pretty much the same. The constrained lmax=8 response that I consider "biased" is slightly lower frequency (i.e. less high-frequency content), and directly comparing FODs by flicking back and forth shows that they are ever so slightly sharper when using this response (which I'd also consider a bias). It's more because going to lmax=10 for the response is a safe no-brainer anyway on literally all fronts that I'm proposing it. Towards future scenarios (whatever that means: higher b-values, etc.) it also gives the responses the extra flexibility they might need at some point; even though at this point the effect is subtle to some extent.

Also, one short remark about the Rician bias issue: ............... I think if we can make it positive definite, there is value in doing so (provided that doesn't interfere with the Gibbs ringing issue, in which case there may be an argument for relaxing it...).

Fully agree that non-negativity is again in and of itself the way to go. My only concern here was again just the interaction with the Gibbs effect; and I reason that this interaction is more likely to happen if the Rician bias has been corrected. If the signal becomes essentially zero over part of the orientational domain, then a typical SH overshoot (well, undershoot here) of the signal would probably happen where it becomes zero (this may depend on how smooth the signal is in that area compared to what the ZSH at a given lmax can represent of course). But my preferred solution would again be the same: relax on the front of the lmax first. It's easier to do, and more intuitive than a soft non-negativity constraint (which then comes inherently with a parameter of its own). There's always still a Gibbs effect due to any lmax cut-off, but the amount of it relative to the response's profile may for a high enough lmax just become compatible with both non-negativity and monotonicity constraints. It's essentially about turning the signal in that orientation region of the response into one of these:

[image]

Reason I suggested a model selection is that while lmax=10 looks good, that specifically applies to b=3000. Hard-coding that as the default, and having 12-direction b=700 data estimating lmax=10 by default, may come off as strange.

It's still perfectly safe though, given at least a few voxels of data of course. It's not strange if explained well, and actually highlights a major benefit of the all-at-once estimation strategy. But I think that the kind of users who wouldn't per se understand are probably not likely to even look at the contents of their response function files. And it's good for them that we give them the most accurate response function by default, while it doesn't come at any significant computational cost and sacrifices close to no conditioning.

I've started a branch integrate_amp2response to make this eventually happen: I've currently upped the lmax to 10 in amp2response in that branch. I've also added a simple -isotropic option to estimate isotropic responses, so that doesn't have to be done with lmax 0,0,0,... for multi-shell data. The nice thing is that, apart from very particularly customised scenarios, the average user should only end up having to worry about the anisotropic vs isotropic choice for any given response function (and the relevant dwi2response algorithms already make this choice for them too).

thijsdhollander commented Dec 13, 2016

(continues in #862 to look into these changes)

thijsdhollander deleted the ZSH branch Dec 13, 2016

jdtournier commented Dec 13, 2016

OK, glad to see we're coming to some kind of consensus.

Reason I suggested a model selection is that while lmax=10 looks good, that specifically applies to b=3000. Hard-coding that as the default, and having 12-direction b=700 data estimating lmax=10 by default, may come off as strange.

Like @thijsdhollander says, it looks like it would still be adequately conditioned even with much lower quality data, provided we have a few voxels with a sufficient spread of dominant orientations. If we were doing this without the monotonicity and non-negativity constraints, I'd be very wary of introducing more variability into the response by allowing the poorly-determined high frequency terms to contaminate the results, but with the constraints, I don't see that as a problem. I have to say it makes total sense to estimate the response all the way up to the limit of its potential harmonic content.
