ZSH & amp2response #786
Thought I'd propose to include this as part of the upcoming tag, since it needs to go with a tag if it does get pushed.
Edit: Could also make it deal with multi-shell data internally rather than having to run it in a loop.
Ok, there needs to be some input here, so I'm going to make an effort of sorts... I've had a good, long think about this; these are my ideas on it at the moment, as in: this is where my thinking has got to up to now. There are essentially 3 main components to the method implemented in
@jdtournier , @bjeurissen : I think we really also need some input from you guys on this one though (even just to check whether I'm making sense with everything I posted here). Ultimately, when people use these things in an SD context, this will affect the quality of the results they get and report, and this may influence the public image of both CSD and MSMT-CSD as a consequence. It's in everyone's interest that the (MSMT-)CSD results are of the best quality possible.
I don't think the mean of the results of independent least-squares fits on independent subsets of over-determined data is precisely equivalent to the least-squares fit to all data. As an extreme example: Imagine that you take your data, bisect it straight down the middle of your domain, and do a least-squares fit to each half independently; the mean of those two results will almost certainly not be equivalent to the least-squares solution that takes all data into account. And I would think that the difference would be above and beyond that of errors introduced by matrix solves or the like. If anybody knows the theory behind it though, speak up. As I always say, my math is junk.
I'm honestly having a lot of trouble getting anywhere much with this example. I get the point you're making, but of all the counter-arguments I've come up with, none are definitive enough for my liking. I'll give one a try and we'll see how we go... Trying my best to limit the topic purely to the application vs. non-application of constraints, but it's easy for the lines to blur with other factors.
Even if the 'true' response function is a true spherical delta function, and you want to extract a truncated SH delta from response function estimation, it still has to be estimated from single-fibre voxels and an A2ZSH transform. The only way you could possibly achieve this result is:
So the question then becomes: In the presence of such confounds, combining signals from multiple single-fibre voxels for RF estimation, will what you get out be closer to the desired SH delta (and therefore be better for deconvolution) if you don't apply the constraints than if you do? Personally I think no: I don't think you're going to see any negativity / non-monotonicity arising due to the actual underlying SH delta you're seeking, I think if it's there it's going to be from the fitting procedure and/or the confounds.
With a realistic diffusion MRI response function, the RF features that the constraints are influencing are physically non-realistic. Moreover, the worse the data, the more prominent these undesirable features are, and applying the constraint brings the result closer to what we get from better-quality data (with or without constraints). If the constraints were degrading response function estimation rather than improving it, I'd expect to see the converse.
So even though omitting the constraints may allow you to get closer to the ideal response function in your extreme example, I'm just not convinced that that result extends to real use cases. But like I said, I can't fully pin it down.
My suspicion is that mathematically it has to do with extending the argument from a real spherical delta to a truncated SH delta, which implies sparse sampling, but then not knowing the single fibre voxel fibre orientations a priori, so you can't align one of your image samples with the delta and have every other direction yield a value of zero even in a noiseless case. Which means you'll never be able to construct that delta function in the space of image-intensity-as-a-function-of-angle-from-fibre-direction. It's like an intrinsic limitation being imposed by the fact that the RF model is coming from the same image data you're applying SD to, rather than the limitation being caused by whether or not constraints are applied during the RF model fitting.
Plus, the effect on good quality data is very small (see below).
If, once we have such a mechanism, it turns out that a hard non-negativity constraint is too stringent for such data, we try regularisation instead. But you then have to justify to users why allowing the response function to be negative is preferable to not, when we have the capability of applying that constraint, and also after we've been selling hard-constraint CSD as being preferable to soft-constraint. Yes it's a burden of proof fallacy; but in the absence of a meaningful test, personally I'd trust the constrained result that conforms to realistic physical expectations, rather than deliberately retaining implausible features in case it might be preferable for SD.
Here are the RFs on the 'typical' dataset I tried using
The difference with/without constraint is not much more than the difference between
Where the constraints make a larger difference is in cases where, without them, poorer-quality data produce results that are inconsistent with what we see in better-quality data; in which case it would seem unusual to me not to fix such issues, because it's highly unlikely that retaining those features would be more 'accurate'.
Guess it's about time I chipped into this discussion... So in no particular order:
Depends how it's done... Let's take the voxel-wise least-squares fit:

s_i = Z2A_i z

where Z2A_i is a voxel-wise projection from zonal SH to DW signal, accounting for the different main fibre direction in each voxel. Solving per voxel would give:

z_i = A2Z_i s_i

where A2Z = Z2A^-1 (or its pseudo-inverse). If we simply extend this to solve over all voxels by forming a much wider A2Z_all by concatenating all the A2Z_i matrices, and a much longer vector s_all by concatenating all the s_i vectors:

z = 1/N A2Z_all s_all

then this really would be the same as averaging the per-voxel results...

However, here we have the opportunity to solve the problem across all voxels in one go:

s_all = Z2A_all z

where Z2A_all is now a much taller matrix formed by concatenation of the individual Z2A_i matrices. We can solve this using standard solvers (I'd typically go for a Moore-Penrose pseudo-inverse with Cholesky decomposition, since I'd expect the problem to be well-conditioned, but there are more stable solvers out there).
But the main point here really is that the condition number of the Z2A_all matrix will be lower than that of its constituent Z2A_i, so we should have a much more stable fit. So for example, if you have a poorly-conditioned DW scheme, some of the Z2A_i matrices will be very poorly conditioned, just because the fibre direction lines up with a particularly bad combination of DW gradient directions. In these cases, the individual per-voxel fits will introduce a massive amount of noise. Averaging will help reduce some of that, but it will nonetheless percolate through to the output. On the other hand, if such a matrix is simply concatenated with the rest, other matrices will contribute complementary information, so that those directions in the solution space that were ill-conditioned when considering one of these matrices alone are now well-conditioned given the information from the other matrices, and vice versa.
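The averaging-vs-joint distinction is easy to sketch numerically. Below is a minimal numpy toy, with random matrices standing in for the actual Z2A_i (which, in the real problem, would be built from each voxel's fibre direction and the DW gradient scheme); it's an illustration of the two solving strategies, not of the actual amp2response matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_vox, n_dirs, n_coefs = 50, 60, 5   # e.g. 60 DW directions, lmax=8 (5 ZSH coefficients)

# Hypothetical per-voxel forward matrices Z2A_i; in the real problem these
# depend on each voxel's fibre direction and the DW gradient scheme.
Z2A = [rng.standard_normal((n_dirs, n_coefs)) for _ in range(n_vox)]
z_true = rng.standard_normal(n_coefs)               # ground-truth ZSH coefficients
s = [M @ z_true + 0.01 * rng.standard_normal(n_dirs) for M in Z2A]

# Option 1: per-voxel least-squares fits, then average the results.
z_avg = np.mean([np.linalg.lstsq(M, si, rcond=None)[0]
                 for M, si in zip(Z2A, s)], axis=0)

# Option 2: stack all Z2A_i into one tall matrix and solve a single system.
Z2A_all = np.vstack(Z2A)
s_all = np.concatenate(s)
z_joint = np.linalg.lstsq(Z2A_all, s_all, rcond=None)[0]
```

With well-conditioned matrices the two agree closely; the difference only becomes dramatic once some of the per-voxel systems are nearly singular.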
To illustrate, consider matrices A = [ 1 0; 0 1e-3 ] and B = [ 1e-3 0; 0 1 ]. We want to estimate x where:

A x = a
B x = b

If you solve this as ½ ( A^-1 a + B^-1 b ), then any noise in a_2 or b_1 will be massively amplified and make the result worthless, since A^-1 = [ 1 0; 0 1000 ] and similarly for B^-1. On the other hand, solving both equations in one:

[ A; B ] x = [ a; b ]

will provide a stable solution. In fact, in this case, we can just add the two equations to see this:

( A + B ) x = a + b

will be stable since (A+B) = [ 1.001 0; 0 1.001 ], which is basically the identity...
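For what it's worth, this 2×2 example is easy to verify numerically; a small numpy sketch, with a fixed perturbation of the measurements standing in for noise:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1e-3]])
B = np.array([[1e-3, 0.0], [0.0, 1.0]])
x_true = np.array([1.0, 1.0])

eps = 1e-3 * np.array([1.0, 1.0])   # small perturbation of the measurements
a = A @ x_true + eps
b = B @ x_true + eps

# Averaging the two ill-conditioned per-system solutions amplifies the noise:
x_avg = 0.5 * (np.linalg.solve(A, a) + np.linalg.solve(B, b))

# Solving both equations jointly stays stable:
AB = np.vstack([A, B])
ab = np.concatenate([a, b])
x_joint = np.linalg.lstsq(AB, ab, rcond=None)[0]
```

Here the averaged estimate is off by about 0.5 in each component, while the joint solution stays within about 1e-3 of the truth, exactly as the (A+B) argument predicts.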
All this to say, it's worth doing this in one go if we can, it'll help for poorly-conditioned situations, and furthermore makes it applicable in situations that would otherwise be under-determined - you could for instance use this to get a full lmax = 8 response from 12 direction data, provided you had enough fibre orientations in the single-fibre mask.
About the scepticism regarding the monotonicity and non-negativity constraints:
I think I understand the point that's being made, although I have to admit I'm not sure. It sounds like the issue relates to the fact that we have a truncated basis, which would naturally lead to oscillations and negativity even in the pure noiseless case. For instance, the delta function definitely will suffer from both of these due to Gibbs ringing - hence the apodisation. So any attempt at constraining these aspects will necessarily lead to deviations away from the perfect response, if it would otherwise have exhibited either of these traits.
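The Gibbs ringing of a truncated delta is straightforward to demonstrate: expand a (symmetrised) spherical delta in even zonal harmonics up to lmax and evaluate it as a function of angle. A small numpy sketch (normalisation conventions simplified, even orders only as for an antipodally symmetric response):

```python
import numpy as np

lmax = 8
# Zonal coefficients of a symmetrised spherical delta: c_l = (2l+1)/(4*pi) for
# even l (odd terms vanish by antipodal symmetry); truncation at lmax is the
# only approximation made here.
c = np.zeros(lmax + 1)
c[0::2] = [(2 * l + 1) / (4 * np.pi) for l in range(0, lmax + 1, 2)]

x = np.linspace(-1.0, 1.0, 2001)          # x = cos(theta)
f = np.polynomial.legendre.legval(x, c)   # truncated delta as a function of angle

print(f.min())   # negative: the Gibbs undershoot that the constraints would clamp
```

The minimum is clearly negative (the first sidelobe away from the pole), which is exactly the non-negativity/non-monotonicity that a constrained fit would be forced to trade away.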
So I think that's a fair argument - one that I have used in the past to defend the use of a soft constraint on the FOD: it allows some negativity in the FOD, which you would expect due to Gibbs ringing and the very sharp features in the FOD. And yet you guys have elected to use a hard constraint...
Which is where I agree with Rob: there's little point in having a more accurate reconstruction if its precision is woeful. You're much better off allowing some mild bias in the output if it brings the variance down to a level where the results can actually be used meaningfully. And it seems these constraints only really make a difference when the estimation is poorly conditioned to begin with, which is exactly what we want. So that's point 1.
The next point is that whereas the FOD is expected to have sharp features, the response is expected to be relatively smooth. You'd really have to crank up the b-value to get any reasonable amount of high frequency content, and even then you'll probably hit the limit of the inherent orientation dispersion at some point. So this would argue that we should expect Gibbs ringing to be much less of a problem, and the truncated SH basis to actually provide a really good fit for a reasonable value of lmax. If this is true, we don't need to worry about whether the constraints will introduce a bias in the results due to their interaction with the SH representation (in contrast to the FOD case, where I think we really do need to worry).
Finally, these constraints simply express what we expect to see from the physics of the problem, and are I think fully justified. The response is at heart just the DW signal, so must be positive-definite - unless we're talking about porous media or something... This is true whether or not Rician bias correction has been applied. It's also the DW signal for a single, coherently-oriented (though typically dispersed) fibre bundle, and there are no situations that I can think of where you could reasonably expect non-monotonic behaviour. Every other model in the field will predict a monotonic decrease in the signal from radial to axial, and I think we can safely assume that too.
All of this of course applies to dMRI in tissue (brain or muscle), things might be different in different contexts. But given that this is our target audience, I think we're actually fully justified to turn on these constraints by default. If users do genuinely want to investigate the details of the response function, they would have the option to disable these features - but the vast majority of users will just want the best response they can get from their data without really having to worry about the details of how it was derived. Here, I think the constraints are justifiable from the physics of the problem, shouldn't interfere with the estimation process due to the limitations of the SH basis (or any other such issue), and should allow trouble-free processing of datasets that would currently prove very problematic, namely those at the low end of the quality spectrum, i.e. the clinical realm...
About that last comment, it's precisely because the users that would benefit the most from this option would be the more clinically oriented researchers that I think we should enable the constraints by default - they're the least likely to understand why their results look terrible, or to figure out that there is a solution for their data analysis...
As usual, just my 2 cents, feel free to disagree...
(much rambling about not so much; good -maybe fun- if you've got lots of free time, otherwise skip to the paragraph that starts in bold text)
Uh-oh, all these efforts to explain... I should have mentioned some things I figured out earlier...
First things first, on the topic of how the data are combined: yep, I figured and fully admit I made a mistake in reasoning there (a while ago...) via a similar example as the one @Lestropie mentioned. I fully agree that it's not formally the same. In practice, the difference will only significantly show in case of poor conditioning, or under-determined cases (as evidenced by Rob's results for
Small (really not that important) comment on this though:
Note that even that is not under-determined (but yes, quite poorly conditioned): we only need to estimate 5 parameters for lmax=8. The contribution of
Next, on both of the constraints (non-negativity and monotonicity): this shares, in a certain way, a few aspects with a discussion we had elsewhere at some point, when we reasoned about Rician bias correction and whether we should allow negative intensities in DWI data after such a correction. Note we also expect the signal to be positive, but we wouldn't yet correct for that (by capping data at 0 or something) at that point, because it's not the individual data points that matter, but the data as a whole (and certain statistical properties of it). I see the response function in a similar way: the way we use it, it needs to best represent the response as a whole, but not per se in certain individual aspects (e.g. non-negativity) of it. The FOD, however, is different: we actually rely on certain properties beyond the point where we estimated it, and we start using it further on: e.g., we're hoping to segment it into fixels (so we're not happy with negative lobes, or AFD lost in spurious positive lobes), we hope to get the total AFD per fixel, we hope to use individual amplitudes e.g. for probabilistic tractography... all this to say that, at that point, we're not 100% after the best fit to the data, but also (at least a bit) after certain aspects... constraints.
But good, apart from this: I'm not against non-negativity and monotonicity, and I agree it's what we expect to find. What I guess I tried to illustrate with that initial convoluted example is not an extreme case, but rather an idealised (and mathematically simplified; for insight) case, where conditioning was not the issue. The only remaining "issue" then is that other often forgotten constraint: the lmax. My example was then, in the end, just about Gibbs ringing. It's just a consequence of truncation at lmax, and that's ok, as in: didn't feel that should be attacked with non-negativity and monotonicity (especially the latter). If you do do that, its effect is spread elsewhere in a non-trivial manner.
But ok, to the real point I want to make in the end: as I mentioned, I'm not against these new constraints in and of themselves, I just worry about them in conjunction with lmax=8 (for the response function; I'm not talking about FODs here). Look at these
Note the impact of the constraints (versus no constraints) in the lmax=8 scenario (it's not huge, but it's there; note we invert this thing at the SD stage, and it becomes an aggressive sharpening operation). This is certainly not due to poor conditioning, it's just due to our good friend Gibbs: going to lmax=10 (note: only 1 extra parameter) fixes this almost entirely, as using the constraints or not in this lmax=10 case makes virtually no difference at all. Saying that you highly value the monotonicity property in your response for these data becomes (almost?) equivalent to saying that lmax=8 was the real problem. Introducing the constraints but sticking with lmax=8 doesn't make it more accurate; probably slightly less so (even though probably a bit more precise); it just biases the response a bit so monotonicity is adhered to. In other words: you've hit a boundary anyway (lmax=8), and introducing other ones (monotonicity) just makes things qualitatively prettier (not more accurate!), without fixing the real underlying problem, which is the Gibbs effect due to truncation, specifically at lmax=8.
So, what about this attempt at a constructive proposal: let's say we go for switching on the constraints by default, but the default lmax comes along and goes to 10? I'm definitely not asking for soft constraints on non-negativity and monotonicity, it's just about relaxing the right one just enough (so that would be the lmax). It's definitely more accurate, and the constraints (at b=3000) will then just serve to help conditioning. It's only one extra parameter (from 5 to 6 parameters). Let's say you've got those poor 12 direction data, and you've even only got 100 SF WM voxels (low resolution and lots of atrophy): that's still 1200 measurements to estimate 6 parameters. Conditioning should be good I'd say.
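To get a rough feel for the conditioning in that worst case, here's a small numpy check: 100 random single-fibre orientations, a random (deliberately unoptimised) 12-direction scheme, and a 1200×6 system matrix for lmax=10 built from Legendre polynomials of the fibre/gradient angle cosines. Normalisation constants are omitted and the matrices are simplified stand-ins for the real ones, so treat the numbers as indicative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unit_vectors(n):
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

n_vox, n_dirs, lmax = 100, 12, 10
even_l = range(0, lmax + 1, 2)        # 6 ZSH parameters for lmax=10

fibres = random_unit_vectors(n_vox)   # one dominant fibre direction per voxel
grads = random_unit_vectors(n_dirs)   # a random, unoptimised 12-direction scheme

# Each measurement row maps the 6 zonal coefficients via P_l of the cosine of
# the fibre/gradient angle (normalisation constants omitted for simplicity).
mu = fibres @ grads.T                 # (100, 12) cosines
M = np.stack([np.polynomial.legendre.Legendre.basis(l)(mu).ravel()
              for l in even_l], axis=1)   # 1200 x 6 system matrix

print(np.linalg.cond(M))              # small: the joint fit is well-conditioned
```

Even in this deliberately poor scenario the condition number comes out comfortably small, consistent with the "1200 measurements for 6 parameters" argument.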
Small addendum, before people panic: I'm only suggesting a default lmax=10 for
I fully agree that estimating only the ZSHs and estimating them from all the response data at the same time is the sane way to go. In fact, I have been using this approach myself for many years in MATLAB and it has never let me down.
About the monotonicity constraint: I am not entirely sure about this. In my experience, fODFs contain far fewer spurious lobes when extracted from noise-biased data if the response contains a similar bias. Otherwise, the biased signal will have to be explained by spurious fiber orientations. In fact, it is astonishing how much further down you can push SNR and still get reasonable results using the biased responses. Not sure what the effect of the monotonicity constraint will be on this.
I suppose more fundamentally the question being asked is whether the results from Donald's NMR Biomed paper carry across here. (I also couldn't help noticing the little p<0.001 asterisk over the b=3000, l=10 term in Figure 3...)
@jdtournier Got any one-button scripts that could be re-run with
Personally I wouldn't be against going to lmax=10 for
I suspect that although you will be able to see inaccuracies in single-voxel FODs if deconvolved with a noise-less response function, there will also be (less visually obvious) inaccuracies in crossing-fibre FODs if a noise-biased response function is used: The Rician bias combined with variation in crossing fibre complexity break the canonical response function assumption.
I'm not too concerned about this result translating directly to the monotonicity question:
Ok ok, let me clarify myself, because we're running into all sorts of confusions I'm afraid.
OK, finished ISMRM abstracts, I'm now allowed to think about this for 5 minutes...
First off: @thijsdhollander's analysis is really compelling. I wouldn't have thought we'd be able to detect the lmax=10 coefficient with a standard 60 DW direction data set, but it certainly looks like we can. Or rather, not including it does seem to affect the results (not sure this would survive an F-test if we were to test for it explicitly). But that's not the point: we can see that it affects things, and there's a simple fix which is to increase the lmax of the response function estimation, and as @thijsdhollander says, we can do that whilst remaining very well-conditioned using @Lestropie's fit-everything-together approach. Seems like a no-brainer, more than happy to switch to lmax=10 by default for the response function estimation.
So I agree with pretty much all of @thijsdhollander's comments in his last post. This includes sticking to lmax=8 for the CSD itself. The analysis in the NMR Biomed paper was all about whether we have any realistic power to detect each harmonic band at the single voxel level, and I don't think anything's changed on that front. Sure, we can estimate the response to lmax=10, but that's because we're aggregating data over hundreds of voxels - this doesn't apply to the CSD analysis itself, and what's more the non-negativity constraint will dwarf any information that might have been contributed by the lmax=10 band.
So on that note, one bit where I mildly disagree is the suggestion that things might be badly affected if we don't get these high harmonic response terms right. That would be true for the linear unconstrained version of SD, since these harmonic terms are small, and once inverted for the deconvolution result in massive amplification of the (noisy) high frequency terms. But once the non-negativity constraint is imposed, I really don't think it makes a great deal of difference how well these high harmonics terms are characterised in the response: the non-negativity constraint totally dominates these terms in practice. You could set all the response function terms above lmax=4 to zero and still get half-decent fODFs....
Also, one short remark about the Rician bias issue: I agree it's not central to the current argument since we don't currently have methods to deal with it. But: even if we did, the response is still supposed to represent the ideal single-fibre DWI signal, and is therefore supposed to be positive-definite. The fact that a decent Rician bias correction scheme would introduce negative values in the measurements shouldn't affect the validity of the non-negativity constraint imposed on the response, and it should only make things better in that the result will at least match expectations. What matters is that when applied in the actual per-voxel deconvolution, the DW signals are still allowed to be negative if that's what the Rician bias correction approach produced, so that if the DW signals happen to average to zero, the fODF will too. Or to put this another way, if the estimated response was negative anywhere, I would pretty much invariably conclude that something has gone wrong, either due to really bad data, or just the particular noise realisation on that day - either way, I think if we can make it positive definite, there is value in doing so (provided that doesn't interfere with the Gibbs ringing issue, in which case there may be an argument for relaxing it...).
And to address @bjeurissen's point about the potential for non-monotonicity in the presence of significant Rician bias: I think even in these cases the signal should be strictly monotonic. Sure, it'll level off as it approaches the axial direction where the Rician bias dominates, but at no point should it actually increase. Or, from a different point of view: the relationship between true and Rician-biased signal is monotonic (even if it does degenerate near zero), so the Rician bias shouldn't break our monotonicity assumption.
Good idea, all for it.
Yep, that's certainly true: the CSD result using either of those responses I obtained above is pretty much the same. The constrained lmax=8 response that I consider "biased" has slightly less high-frequency content, and directly comparing FODs by flicking back and forth shows that they are ever so slightly sharper if using this response (which I'd also consider a bias). It's more because going to lmax=10 for the response is a safe no-brainer on literally all fronts that I'm proposing it. Towards future scenarios (whatever that means: higher b-values, etc.) it also gives the responses the extra flexibility they might need at some point, even though at this point the effect is subtle.
Fully agree that non-negativity is again in and of itself the way to go. My only concern here was again just the interaction with the Gibbs effect; and I reason that this interaction is more likely to happen if the Rician bias has been corrected. If the signal becomes essentially zero over part of the orientational domain, then a typical SH overshoot (well, undershoot here) of the signal would probably happen where it becomes zero (this may depend on how smooth the signal is in that area compared to what the ZSH at a given lmax can represent of course). But my preferred solution would again be the same: relax on the front of the lmax first. It's easier to do, and more intuitive than a soft non-negativity constraint (which then comes inherently with a parameter of its own). There's always still a Gibbs effect due to any lmax cut-off, but the amount of it relative to the response's profile may for a high enough lmax just become compatible with both non-negativity and monotonicity constraints. It's essentially about turning the signal in that orientation region of the response into one of these:
It's still perfectly safe though, given at least a few voxels of data of course. It's not strange if explained well, and actually highlights a major benefit of the all-at-once estimation strategy. But I think that the kind of users who wouldn't per se understand, are probably not likely to even look at the contents of their response function files. And it's good for them that we give them the most accurate response function by default, while it doesn't come at any significant computational cost and sacrifices close to no conditioning.
I've started a branch
OK, glad to see we're coming to some kind of consensus.
Like @thijsdhollander says, it looks like it would still be adequately conditioned even with much lower quality data, provided we have a few voxels with a sufficient spread of dominant orientations. If we were doing this without the monotonicity and non-negativity constraints, I'd be very wary of introducing more variability into the response by allowing the poorly-determined high frequency terms to contaminate the results, but with the constraints, I don't see that as a problem. I have to say it makes total sense to estimate the response all the way up to the limit of its potential harmonic content.