MSMT CSD: Flexible weighting #737

Open
Lestropie opened this Issue Jul 25, 2016 · 4 comments


Lestropie commented Jul 25, 2016

Unity weighting of all input volumes may not be ideal in all circumstances. For instance, HCP data has an equal number of volumes in each of the three non-zero shells; so compared to other acquisitions that place more volumes in the higher shells, it puts greater emphasis on the lower b-values. Would be good to have further command-line options for volume weighting, e.g. (a rough sketch of these schemes is included after the list):

  • Weight all volumes equally
  • Weight all shells equally (regardless of number of volumes in each shell)
  • Weight each shell according to a vector of floats provided by the user
  • Additionally scale the contribution of each shell according to median image intensity in that shell (so the lower intensities of high b-value shells don't decrease the influence of those shells in the least-squares fit)
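
To make these options concrete, here's a minimal numpy sketch of how such per-volume weights might be computed (the function name, its arguments and the shell-grouping heuristic are all hypothetical, not existing MRtrix3 code):

```python
import numpy as np

def volume_weights(bvals, scheme="equal_volumes", shell_weights=None, shell_medians=None):
    """Illustrative per-volume weight computation (hypothetical, not MRtrix3 code).

    bvals         : b-value of every input volume, used to group volumes into shells
    scheme        : 'equal_volumes', 'equal_shells' or 'user_shells'
    shell_weights : per-shell weights supplied by the user (for 'user_shells')
    shell_medians : median image intensity per shell (optional extra scaling)
    """
    bvals = np.asarray(bvals, dtype=float)
    # crude shell grouping: round b-values to the nearest 100
    shells, shell_index = np.unique(np.round(bvals, -2), return_inverse=True)
    counts = np.bincount(shell_index)

    if scheme == "equal_volumes":
        w = np.ones(bvals.size)
    elif scheme == "equal_shells":
        # every shell contributes the same total weight, regardless of volume count
        w = 1.0 / counts[shell_index]
    elif scheme == "user_shells":
        w = np.asarray(shell_weights, dtype=float)[shell_index]
    else:
        raise ValueError(scheme)

    if shell_medians is not None:
        # undo the lower intensity of the high b-value shells so it doesn't
        # reduce their influence on the least-squares fit
        w = w / np.asarray(shell_medians, dtype=float)[shell_index]

    return w / w.sum()
```

For HCP-style data, something like volume_weights(bvals, scheme="equal_shells") would give each shell the same total influence on the fit regardless of how many volumes it contains.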

I was also thinking about the FOD integral calculation improvement in #699: technically, if there's any inhomogeneity in the distribution of directions within a shell, that same approach could be used to weight the contribution of each volume to the least-squares fit in MSMT. It might be a minor effect, but it might explain something particular I was seeing in my own shenanigans.


jdtournier commented Jul 27, 2016

I'm not sold on this necessarily. All our models assume i.i.d. Gaussian (or maybe Rician) noise, which strongly suggests that the correct answer is always equal weighting (ignoring outliers). I know that this means an acquisition with more lower b-value measurements will preferentially weight the reconstruction towards these data, but it's still the correct answer in terms of fitting to the measurements. If we start weighting the reconstruction towards the high (noisier) shells, it in effect means that we think the noise level in the higher shell is lower than it actually is - i.e. we will start to fit to the noise in the higher shells even if that means a worse fit for the lower shells. I think if this is a real issue, the answer really should be to use a more balanced acquisition...

I'm also not convinced that #699 should be applied to the CSD (whether single or multi-shell). The FOD integral requires an under-determined minimum-norm least-squares fit to the SH coefficients, which relies on the forward SH transform. CSD on the other hand requires the inverse to be computed, and this process should already weight things appropriately (even if the fit might be unstable with poor direction sets), again on the assumption of i.i.d. noise in the measurements. Maybe if you explain what effect you're referring to...?


Lestropie commented Jul 27, 2016

Well, the Rician noise issue is something we really need to raise and discuss in more detail within the dev group: we need a clear picture of how we're going to move forward on that. I ended up abandoning my second workshop abstract; I won't be able to answer the question I was trying to address until that's dealt with.

I don't have a theoretical argument for it, but I would have thought that even if the noise were i.i.d. Gaussian, there'd be a justification for re-weighting if the contrast-to-noise were different between shells? I'm also not sure that the fact we're using an inappropriate noise model is justification to constrain ourselves to the weighting imposed by that noise model: if some heuristic weighting can slightly offset the problems introduced by use of that noise model, why not? Sure it's ad hoc, but there's little that isn't in our field...

The FOD integral requires an under-determined minimum-norm least-squares fit to the SH coefficients, which relies on the forward SH transform.

I deliberately generate a sufficiently large transform to make deriving the FOD integral adjustment weightings over-determined. Not sure if this is what you were referring to though.

CSD on the other hand requires the inverse to be computed, and this process should already weight things appropriately (even if the fit might be unstable with poor direction sets), again on the assumption of i.i.d. noise in the measurements.

I probably didn't think enough about this one before adding it to the list. Given the use of a hard constraint, even an inhomogeneous distribution of the non-negativity directions can't be 'weighted'; so it's really about the A2SH transform. The fundamental question is: in the case of an inhomogeneous gradient direction distribution (since that's the easiest to conceptualise), should the squared error contributed by a direction with lots of other measurements nearby on the half-sphere count for more or less than the squared error from a direction that's very distant from all other measurements?

Maybe if you explain what effect you're referring to...?

I'm simulating a fanning fibre configuration, noiseless, and getting an FOD that's not symmetric about the mean direction. There's a lot more testing I need to do to figure out precisely what's causing it (e.g. might go away with higher density non-negativity constraint), but that project's gone back on the backburner for now.


jdtournier commented Jul 27, 2016

I don't have a theoretical argument for it, but I would have thought that even if the noise were i.i.d. Gaussian, there'd be a justification for re-weighting if the contrast-to-noise were different between shells?

Well, even if the contrast-to-noise were different, it doesn't change the fact that the model is supposed to fit the data with uniform residuals, if we assume uniform noise in the measurements. Those low contrast-to-noise measurements should simply provide very little additional information for those features that rely on the contrast.

Where this breaks down is when the numbers are so imbalanced that even a minor change in the sum of squares residuals for the low contrast parts dwarfs any meaningful information from the lower SNR, higher contrast measurements. There may be a way to compensate for that, but it would inevitably result in a systematic increase in the residuals for the down-weighted measurements, which strikes me as undesirable. Maybe I need to think about this a bit more...

I'm also not sure that the fact we're using an inappropriate noise model is justification to constrain ourselves to the weighting imposed by that noise model

I'm not sure I'm following the logic here: most of the time, the noise model is appropriate - at least in terms of the parameter estimation. The Rician bias would introduce a bias in the estimated parameters, but wouldn't necessarily influence the validity of the fit itself, in that the residuals from that fit would nonetheless be expected to remain pretty uniform. I don't mind introducing an ad hoc correction for such issues, I'm just not convinced that the issue has anything to do with the noise model as such.

It's like fitting a straight line to three sets of measurements, taken at three distinct positions along the independent variable. If you have 10 measurements at x=0, 10 at x=1 and only one at x=2, the fit is unlikely to go through the unique point at x=2. But there's little doubt that the best fit line should include all of these measurements with equal weighting. Sure, that makes the prediction of the x=2 measurement seem unduly influenced by the 20 other measurements at lower x values, but if that results in a poor estimation of the best fit line, and hence higher residuals at x=2, then the question should really be whether a straight line was the right model...
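
To make that concrete (purely illustrative numbers), here is an ordinary least-squares fit of a straight line to exactly that arrangement of samples:

```python
import numpy as np

# 10 measurements at x=0, 10 at x=1, and a single one at x=2,
# all generated from the same underlying line y = x plus a little noise
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(10), np.ones(10), [2.0]])
y = x + rng.normal(scale=0.1, size=x.size)

# ordinary (equally weighted) least-squares straight-line fit
A = np.vstack([x, np.ones_like(x)]).T
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# the fit is essentially determined by the 20 points at x=0 and x=1;
# it is not pulled through the lone point at x=2
print(slope, intercept, y[-1] - (slope * 2 + intercept))
```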

But like I said, maybe there is something in there that I'm not quite grasping properly; I'll need to think about this some more to form something a little bit more considered than an immediate knee-jerk reaction.

I deliberately generate a sufficiently large transform to make deriving the FOD integral adjustment weightings over-determined. Not sure if this is what you were referring to though.

No, it wasn't quite that. What I'm saying is that your FOD integral involves estimating a in an equation of the form Ma = f, where each column of M corresponds to the SH coefficients of a (potentially apodised) delta function (I assume that's correct?). All I'm saying is that M corresponds to a massively under-determined problem, in that number of parameters in a ≫ number of SH coefficients in f. So you have to resort to the minimum-norm solution as provided by e.g. the rank-deficient Moore-Penrose pseudo-inverse.
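
For illustration, a tiny numpy sketch of that kind of minimum-norm solution (the matrix sizes are arbitrary placeholders, e.g. 45 SH coefficients for lmax = 8 against a much larger set of delta-function directions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sh, n_dirs = 45, 1000
M = rng.normal(size=(n_sh, n_dirs))   # stand-in for the SH coefficients of each delta function
f = rng.normal(size=n_sh)             # stand-in for the FOD's SH coefficients

# massively under-determined: infinitely many a satisfy M a = f;
# the pseudo-inverse returns the minimum-norm one
a = np.linalg.pinv(M) @ f
print(a.shape, np.allclose(M @ a, f))
```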

But thinking about this some more, I'm not sure that's really the problem; the issue arises when you attempt to compute the integral by summing up the weights for the delta functions assuming equal contribution from each, which only works for a perfectly uniform distribution that we don't have. This kind of summation doesn't happen with CSD anyway... So just ignore me.

The fundamental question is: in a case of inhomogeneous gradient direction distribution (since that's the easiest to conceptualise), should the squared error contributed from a direction with lots of other measurements nearby on the half-sphere contribute more or less than the squared error from a direction that's very distant from all other measurements?

The problem with a cluster of closely spaced measurements is that they tend to massively amplify noise for the high frequency terms. It's a bit like trying to fit a quadratic to 3 points. If the measurements are made at x=0, 1 & 2, the fit should be pretty stable. But if the points are at x = 0, 0.1 & 2, then a bit of noise in the measurements will massively influence the higher-order term in the fit. It's hard to see how weighting would help in those cases. In the current version of CSD, we apply a minor bit of minimum norm regularisation, which drastically helps to constrain such degenerate cases. But it's not a per-measurement weighting as such. The harsh reality is that in these situations, the data are inherently not well suited to the problem...
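
A quick numerical illustration of that noise amplification (hypothetical numbers; the underlying curve is y = x² and the same small perturbation is applied in both cases):

```python
import numpy as np

def x2_coefficient(xs, noise):
    # fit a quadratic through three (noisy) points and return the x^2 coefficient
    y = xs**2 + noise
    return np.polyfit(xs, y, 2)[0]

noise = np.array([0.05, -0.05, 0.05])
print(x2_coefficient(np.array([0.0, 1.0, 2.0]), noise))   # well spread: stays close to 1
print(x2_coefficient(np.array([0.0, 0.1, 2.0]), noise))   # two points clustered: several times the error
```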


Lestropie commented Jul 28, 2016

It's like fitting a straight line to three sets of measurements, taken at three distinct positions along the independent variable. If you have 10 measurements at x=0, 10 at x=1 and only one at x=2, the fit is unlikely to go through the unique point at x=2. But there's little doubt that the best fit line should include all of these measurements with equal weighting.

That actually helps, discussing the issue in a linear domain... The question is whether the fit should give equal weighting to each sample point, or equal weighting across the domain (i.e. x from [0,2]). If the latter, you would give 1/30 weight to each of the points at x=0 and x=1, and 1/3 weight to the sample at x=2, so that each of the three locations contributes equally. This is precisely what the numerical FOD integration weighting does: it figures out what fraction of the domain is covered by each sample point.
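
Continuing the straight-line example above, a sketch of what that per-location weighting would look like as a weighted least-squares fit (the weights simply follow the 1/30, 1/30, 1/3 split described above):

```python
import numpy as np

# same arrangement as before: 10 samples at x=0, 10 at x=1, one at x=2
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(10), np.ones(10), [2.0]])
y = x + rng.normal(scale=0.1, size=x.size)

# equal weight per location rather than per sample
w = np.concatenate([np.full(20, 1.0 / 30.0), [1.0 / 3.0]])

# weighted least squares: scale the design matrix and data by sqrt(w)
A = np.vstack([x, np.ones_like(x)]).T
sw = np.sqrt(w)
slope, intercept = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
print(slope, intercept)
```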

What I'm saying is that your FOD integral involves estimating a in an equation of the form Ma = f, where each column of M corresponds to the SH coefficients of a (potentially apodised) delta function (I assume that's correct?).

We might be talking completely at cross purposes then. The numerical FOD integration has nothing to do with SH delta functions - unless you're not talking about the implementation but some philosophical interpretation thereof.

The amplitudes are calculated using the standard SH2A transform, and the integration is done essentially using the rectangle method. The addition of these integration 'weights' equates to taking into account the variable width of the rectangles formed at different sample points across the domain (solid angles for sample directions in S2 in our case), rather than assuming that width to be fixed.
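
A rough sketch of that rectangle-style integration, assuming a per-direction solid-angle weight is already available (how those solid angles are computed, e.g. from the spherical Voronoi cell of each direction, is left out here):

```python
import numpy as np

def fod_integral(sh_coeffs, sh2amp, solid_angles):
    """Numerical FOD integral over the sample directions (illustrative only).

    sh_coeffs    : SH coefficients of the FOD
    sh2amp       : SH2A transform mapping SH coefficients to amplitudes at the sample directions
    solid_angles : solid angle 'owned' by each sample direction - the variable
                   rectangle widths; setting them all equal (to 4*pi / N) recovers
                   the fixed-width assumption
    """
    amplitudes = sh2amp @ sh_coeffs
    return float(np.dot(solid_angles, amplitudes))
```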

The problem with a cluster of closely spaced measurements is that they tend to massively amplify noise for the high frequency terms. It's a bit like trying to fit a quadratic to 3 points. If the measurements are made at x=0, 1 & 2, the fit should be pretty stable. But if the points are at x = 0, 0.1 & 2, then a bit of noise in the measurements will massively influence the higher-order term in the fit.

Not a perfect analogy, since fitting a quadratic to 3 points always gives an exact solution rather than a least-squares one; but I get the point you're making.

Similarly to before: if the fit were weighted equally across the domain, rather than equally for each sample point, would that help to decrease this noise amplification in the higher frequencies? I would have thought so, since it's these 'clustered samples' that you'd be down-weighting.
