New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent qform / sform on image write #1318

Closed
Lestropie opened this Issue May 3, 2018 · 23 comments

Comments

Projects
None yet
2 participants
@Lestropie
Copy link
Member

Lestropie commented May 3, 2018

Since merge of #1212, I seem to be getting a lot of warning messages related to inconsistent qform / sform, despite those NIfTIs having just been generated using MRtrix3. Ideally want to figure out what's going on here before tagging; otherwise there'll be a lot of questions on the forum.

@Lestropie Lestropie added the bug label May 3, 2018

@Lestropie Lestropie added this to the Milestone 3.0_RC3 milestone May 3, 2018

@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented May 3, 2018

Some observations:

  • NIfTIs generated by either mri_convert or mrconvert, which fslhd report as having identical qform and sform matrices (save for a -0.0 v.s. 0.0), yield warnings in MRtrix3.

  • Using mrconvert rather than mri_convert to go from MGZ to .mif results in a different header transform matrix than does mrconvert going from MGZ to .nii. If I set NIfTIUseSform to 1 however, they converge.

Result: Think the issue is in reading of qform...

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented May 3, 2018

I'll refer you to this old discussion...

Are we talking about the same mri_convert there?

@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented May 3, 2018

Yes, but I don't think that thread is wholly relevant here; apart from having no involvement of bvecs / bvals, mrconvert has behaviour consistent with its own use independent of the NIfTI format if reading from sform, but behaves differently if reading from qform.

Maybe expressed another way, in order to remove the confound of mri_convert:

  • mrconvert MGZ to MIF gives transform T.
  • mrconvert MGZ to NIfTI then mrinfo with NIfTIUseSform: 0 gives a transform other than T.
  • mrconvert MGZ to NIfTI then mrinfo with NIfTIUseSform: 1 gives transform T.
  • mrconvert MGZ to NIfTI then fslhd reports that sform and qform matrices are identical.
@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented May 3, 2018

Ok, that sounds ominous. I'll look into that ASAP.

@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented May 3, 2018

Commenting this line solves the issue for some test data, but introduces the issue in other data that behaved appropriately without any code modification. So that's the spot. Not sure if the difference is due to Eigen handling quaternions differently to pre-Eigen code.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented May 3, 2018

OK, the line you refer to:

              Q.w() = std::sqrt (std::max (1.0 - Q.squaredNorm(), 0.0));

is a straight implementation of the NIfTI standard for handling the quaternion parameters, in this section. The only modification I've made there relative to the standard is to make sure w is a real number, by making sure that we don't take the square root of a negative number. This can happen for example due to rounding errors if the images are almost perfectly aligned, so that the w parameter should be zero, but the sum of squares of the other parameters is greater than one. I'm not sure how the other packages handle this situation, but clearly, if the qform is assumed to be a unit quaternion, and it comes out larger than one before we've even added in the final term, it's badly formed.

But I agree there's a problem, the matrices can come out quite a bit more different than they should. Having dug through all this, I'm struggling to see where the issue is. I'm starting to think there might be an instability in Eigen's import of rotation matrices, i.e. this function. I'll dig deeper when I get the chance...

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented May 3, 2018

OK, it looks like the issue is with loss of precision in the cast to float. Quick code sample to illustrate. Starting with Eigen::Matrix3d close to axis-aligned (though not axial in this case...):

M =            -1            0  6.17001e-09
      3.14321e-09 -1.86265e-09           -1
      7.45058e-09           -1 -1.86265e-09

This illustrates the problem:

Eigen::Quaterniond Q (M);
VAR (Q.coeffs());
// gives Q =  [ 1.11129e-09  0.707107  -0.707107  -4.5275e-10 ]
VAR (Q.matrix());
// gives [         -1   9.3132e-10 -2.21189e-09
//        2.21189e-09  2.22045e-16           -1
//        -9.3132e-10           -1 -2.22045e-16 ]
// so pretty close.

// cast to floating-point version (as would happen when stored to NIfTI:
Eigen::Quaternionf Qf (Q.cast<float>());
// and re-generate first component from the others, as per NIfTI standard:
Qf.w() = 0.0;
Qf.w() = std::sqrt (1.0 - (Qf.x()*Qf.x() + Qf.y()*Qf.y() + Qf.z()*Qf.z()));
VAR (Qf);
// gives Qf = [ 1.11129e-09    0.707107   -0.707107   0.000244141 ]
// note final component (w) much larger than original - but probably within machine precision

VAR (Qf.matrix());
// gives [           -1  0.000345269  0.000345265
//         -0.000345265  5.96046e-08           -1
//         -0.000345269           -1  5.96046e-08 ]
// note off-diagonal components quite a bit off - this is what causes mismatch

// cast back to double (as would happen when reading from NIfTI):
Q.x() = Qf.x();
Q.y() = Qf.y();
Q.z() = Qf.z();
// and re-generate first component from the others, as per NIfTI standard:
Q.w() = 0.0;
Q.w() = std::sqrt (1.0 - (Q.x()*Q.x() + Q.y()*Q.y() + Q.z()*Q.z()));
VAR (Q);
// gives Q = [ 1.11129e-09    0.707107   -0.707107   0.00018501 ]
// last component (w) still off

VAR (Q.matrix());
// gives [            -1  0.000261645  0.000261641
            -0.000261641  3.42285e-08           -1
            -0.000261645           -1  3.42285e-08
// so that's still off compared to the original, and still causes mismatch error...

Yet fslhd reports both qform and sform matrices as essentially identical, which I just can't replicate. I've re-implemented the function we originally had to convert quaternion to rotation matrix (before Eigen), and it matches exactly what Eigen:::Quaternion::matrix() produces, and that code matches exactly what the NIfTI standard suggests. Yet fslhd gets the same inputs (what's in the NIfTI header), so what's going on? Next step would be to check what's in the NIfTI code to see what they're doing differently. It doesn't make sense to me, the loss of precision happens mostly because that 4th element is being re-generated from the other 3, as far as I can tell...

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented May 3, 2018

OK, I think I found it... Line 1480 in nifti1_io.c, function nifti_quatern_to_mat44()

   a = 1.0l - (b*b + c*c + d*d) ;
   if( a < 1.e-7l ){                   /* special case */
     a = 1.0l / sqrt(b*b+c*c+d*d) ;
     b *= a ; c *= a ; d *= a ;        /* normalize (b,c,d) vector */
     a = 0.0l ;                        /* a = 0 ==> 180 degree rotation */
   } else{
     a = sqrt(a) ;                     /* angle = 2*arccos(a) */
   }

and sure enough, if I implement the same fudge in MRtrix3, the problem goes away...

Pull request to follow shortly.

jdtournier added a commit that referenced this issue May 3, 2018

Lestropie added a commit that referenced this issue May 4, 2018

Nifti ver 2: Handle near-zero w component in qform
As discussed in #1318.
Echoes change in f28dbb8 for NIfTI version 2 format.
@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented May 4, 2018

🎉 Sorry I ran out of time yesterday to follow this through completely.

Same expression is being used in nifti2_io.c, so that's now echoed in #1324.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented May 9, 2018

closed with #1324

@jdtournier jdtournier closed this May 9, 2018

@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented Jun 4, 2018

Stuffed around with this a little bit more due to forum thread:

  • Extra numerical precision of quaternion / matrix data in NIfTI-2 doesn't appear to have any effect on the issue.
    (I'm nevertheless doing my testing predominantly with NIfTI-2 since it removes the confound of 32-bit floating-point data)

  • Including modification of quaternion x,y,z parameters as in the code example above does have a minor effect on the transform matrix derived from the qform quaternion, but it's far smaller than the difference between qform and sform transform matrices.
    (In #1324 only nulling of the w component was added)

  • For an example problem image I found on my own hard drive:

    • voxel_grids_match_in_scanner_space() throws a warning even with a 0.01 tolerance; warning disappears at 0.011 tolerance.

    • fslhd gives a ~ 1e-3 difference in rotation matrix components between qform and sform; note though that this is for a NIfTI-1 image (my FSL install won't read NIfTI-2), so matrix-to-quaternion conversion and 32-bit storage could be having an effect here.

While we could up the tolerance for the warning to e.g. 0.1 of a voxel, that's beginning to approach the kind of accuracy one might expect from e.g. registration, so it's not an ideal degree of inaccuracy to be accepting. But this solution might have to suffice.

My suspicion is that the code example above is an incomplete fix for some issue in mapping quaternions near the z-axis to transform matrices, which can't be fixed by enhanced floating-point precision alone. However characterising this and deriving a more precise numerical fix might be a fair bit of work.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 4, 2018

Ok, that sounds bad. I'd have expected NIfTI-2 to sort this out. Any chance you could post the mih header for a problematic image? I'll stick on an empty dat file and convert to NIfTI-2 to see if I can reproduce.

@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented Jun 4, 2018

mrtrix image
dim: 48,48,29
vox: 5,5,5
layout: -0,-1,+2
datatype: Float32LE
transform: 0.9999755859375, 6.3412392137252e-08, -0.0069812573492527, -120.01363337115
transform: -0.000328925903886557, 0.998889827728272, -0.0471052795648575, -106.24320230959
transform: 0.00697350800037384, 0.0471064984798431, 0.998865509033203, -68.2622950226068
command_history: mrconvert "test.nii" "test_qform.mif"  (version=3.0_RC3-20-g92e06e0a-dirty)
command_history: mrconvert "test_qform.mif" "test_qform.mih" "-force"  (version=3.0_RC3-20-g92e06e0a-dirty)
comments: FSL5.0
mrtrix_version: 3.0_RC3-20-g92e06e0a-dirty
file: test_qform.dat
@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 4, 2018

OK, confirmed with NIfTI-2... This will need to be sorted out ASAP.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 4, 2018

Quick update: had a play with this, issue is indeed present with NIfTI-1 and NIfTI-2, and fslhd output (NIfTI-1 only) matches mrinfo -raw output perfectly (once voxel sizes are accounted for). So if there is an issue, it's at the NIfTI write-out stage - i.e. the qform genuinely doesn't match the sform in the image header as written.

I had a quick play with fslorient to see how FSL handles this. I tried fslorient -copysform2qform to see what happens when it overwrites the qform from the sform information: this also produces exactly the same mismatch as we get, as reported by fslhd:

directly after mrconvert:

$ fslhd test-1.nii | grep _xyz
qto_xyz:1      -4.999878  0.000822  -0.034887  114.980644
qto_xyz:2      0.000822  -4.994450  -0.235529  128.418610
qto_xyz:3      -0.034887  -0.235529  4.994328  -55.553493
qto_xyz:4      0.000000  0.000000  0.000000  1.000000
sto_xyz:1      -4.999878  -0.000000  -0.034906  114.980644
sto_xyz:2      0.001645  -4.994449  -0.235526  128.418610
sto_xyz:3      -0.034868  -0.235532  4.994328  -55.553493
sto_xyz:4      0.000000  0.000000  0.000000  1.000000

after fslorient -copysform2qform:

$ fslorient -copysform2qform test-1.nii
$ fslhd test-1.nii | grep _xyz
qto_xyz:1      -4.999878  0.000822  -0.034887  114.980644
qto_xyz:2      0.000822  -4.994450  -0.235529  128.418610
qto_xyz:3      -0.034887  -0.235529  4.994328  -55.553493
qto_xyz:4      0.000000  0.000000  0.000000  1.000000
sto_xyz:1      -4.999878  -0.000000  -0.034906  114.980644
sto_xyz:2      0.001645  -4.994449  -0.235526  128.418610
sto_xyz:3      -0.034868  -0.235532  4.994328  -55.553493
sto_xyz:4      0.000000  0.000000  0.000000  1.000000

So clearly, FSL also produces non-matching sform and qform information here. Which makes me think that we should check whether the sform (i.e the rotation part of our transform matrix) is genuinely orthogonal... Clearly, if it's not a perfect rigid-body rotation, there's no way the qform can represent that, which might explain these issues. I'll look into that when I have a minute...

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 4, 2018

OK, the transform you provided is not significantly non-rigid, so that's not it. I've added checks for that now in the Header::sanitise_transform() call, which you can find on the nifti_cleanup branch if you're interested.

Also checked that if you prevent MRtrix from writing out the sform, and use fslorient -copyqform2sform to repopulate it from the qform, it all checks out OK - both in fslhd and mrinfo. But the converse is not true: if you prevent MRtrix from writing out the qform, and use fslorient -copysform2qform to repopulate the qform from the sform, they don't match in either fslhd or mrinfo, in exactly the same way as is currently generated by MRtrix. So FSL suffers from the exact same issue too, but I'm not sure what the issue is exactly...

@jdtournier jdtournier reopened this Jun 4, 2018

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 4, 2018

I think I've got to the bottom of this, but it's not great news. At heart, this is a fundamental limitation of the way quaternions are stored in NIfTI, namely the dropping of the w coefficient on the premise that the quaternion is unit normalised. This means the header read code needs to re-create that particular component using √(1-x²-y²-z²). To investigate, I modified core/file/nifti2_utils.cpp in the write() call, as follows:

371         VAR (R);
372         Eigen::Quaterniond Q (R);
373         VAR (Q.vec());
374         VAR (Q.w());
378         const double w = 1.0 - Q.vec().squaredNorm();
376         VAR (w);
377         Q.w() = std::sqrt (w);
378         VAR ((Q.matrix() - R).array().abs().maxCoeff());

This gives:

R =    -0.999976 -6.35545e-08  -0.00698126
 0.000328926     -0.99889   -0.0471053
 -0.00697351   -0.0471065     0.998866
Q.vec() = -0.00348968
 -0.0235596
   0.999716
Q.w() = 8.22707e-05
w = 6.76847e-09
(Q.matrix() - R).array().abs().maxCoeff() = 1.91774e-12

i.e. the w term regenerated from the other components is a bit mangled when close to zero - and this is using double precision throughout. If you comment out line 377 and leave Q.w() as-is, everything works out as expected, with that last max absolute difference falling to 4.44089e-16.

Now this isn't such a big deal as-is, but when combined with this line in the read() call:

202           Q.w() = w < 1.0e-7 ? 0.0 : std::sqrt (w);

you can see that the w component now gets set to zero - and if you do that, you find that the max absolute difference is now 0.000164495. This is what now triggers the warning.

I don't think there's a great deal we can do about this. The way these quaternions are stored in NIfTI is fundamentally unstable when w is close to zero, which I think represents 180° rotations - and that unfortunately applies to just about any NIfTI converted straight from a DICOM axial scan...

So I don't think we'll have any option but to relax the tolerance in the qform/sform consistency check.

Moving forward, I also think this is reason enough to default to the sform rather than the qform, since it's not affected by this instability. But the qform is likely closer to what the standard intended, so I don't know that we can revert to using the sform preferentially, given the discussion we had around that. But truth be told, I'd be quite happy to switch back to sform preferentially, since I think there's a good argument to be made for it being the intended transform if set, and besides it's (I think) more stable... Up for discussion, but probably not for any immediate action.

jdtournier added a commit that referenced this issue Jun 4, 2018

NIfTI: slight modification to handling of low w in qform
This is to match the FSL code more closely, see this post:
#1318 (comment)

Impact in practice is negligible, as far as I can tell...

jdtournier added a commit that referenced this issue Jun 4, 2018

NIfTI: slight modification to handling of low w in qform
This is to match the FSL code more closely, see this post:
#1318 (comment)

Impact in practice is negligible, as far as I can tell...
@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented Jun 5, 2018

Deleted comment, since I'm pretty sure my understanding was off.

I've mucked around a little myself, and I think I'm coming to the same conclusion as you did. Though I think it's w being negative that indicates a 180 rotation, not w being small. I got slightly confused due to this line:

if( a < 1.e-7l ){   

That's catching negative values, not values small in magnitude. I suspect it's accounting for the possibility where an input quaternion with very small negative w could produce a very small positive w when reconstructed from the other three coefficients due to single-precision data. A little numerical test showed that with w=0.0, random single-precision x,y,z, calculations in double-precision, w^2 > 1e-7 (almost) always. Therefore that's the threshold used to separate "negative" w from "positive" w.

I suspect that the test data that justified the change in #1324 was a case with a small negative w at input leading to a small positive recalculated w due to single-precision data, and hence treating w < 1e-7 as "negative" removed the warning; whereas new test data has a small positive w at input, small positive recalculated w, and clamping that to 0.0 is now leading to the same warning. What I think we need is to shift that "zero" threshold back to 0.0, get example data for both cases described above, and tune the qform / sform mismatch warning threshold accordingly.

Whether or not to revert to using sform as the default may depend on the maximal magnitude of the error between them, which voxel_grids_match_in_scanner_space() quantifies quite nicely.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 5, 2018

I think I'm coming to the same conclusion as you did.

Glad to hear it!

Though I think it's w being negative that indicates a 180 rotation, not w being small.

Well, according to the angle-axis representation: the w term is given as cos(θ/2), so is zero for θ=180°. When w<0, it's a rotation greater than 180°, which is best represented as a 360°-θ rotation around the opposite axis. In fact, that's exactly what is done at this point in the write() call: if w is negative, we invert the whole quaternion wholesale (although I note that only inverts the x,y,z coefficients, but that doesn't matter since they're the only ones to be written out), which has exactly that effect.

That's catching negative values, not values small in magnitude.

Technically, w should never be negative, given the line above, and the fact that it's regenerated via a square root. If it was allowed to be negative, there would be a fundamental ambiguity here. This did cause issues at some point after we switched to the Eigen::Quaternion class (although there's not much information about this in commit 9f97126...), which is why I modified the code to guarantee a positive w.

I suspect that the test data that justified the change in #1324 was a case with a small negative w at input leading to a small positive recalculated w due to single-precision data, and hence treating w < 1e-7 as "negative" removed the warning; whereas new test data has a small positive w at input, small positive recalculated w, and clamping that to 0.0 is now leading to the same warning.

Given the above, the 'input' w can never be negative by construction. But sometimes numerical precision means that when re-computed via w² = 1 - x² - y² - z², its square comes up negative. The safeguard I originally put in there was to catch this so we could safely take the square root - but as it turns out, that's not the only issue here:

What I think we need is to shift that "zero" threshold back to 0.0, get example data for both cases described above, and tune the qform / sform mismatch warning threshold accordingly.

You'll note that our previous implementation of this was clipped at zero, not 1e-7. If you look back up this thread, this caused quite significant instabilities. This is why we looked up what FSL did, and used the same fudge. I don't think we can go back to a zero threshold here.

A little numerical test showed that with w=0.0, random single-precision x,y,z, calculations in double-precision, w^2 > 1e-7 (almost) always. Therefore that's the threshold used to separate "negative" w from "positive" w.

Maybe we should use that idea to determine an appropriate threshold for NIfTI-2 also...?

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 5, 2018

OK, I also wrote a small bit of test code to figure out the appropriate precisions. Here it is for reference:

#include <iostream>
#include <random>


int main ()
{
  std::random_device r;
  std::mt19937 rng (r());
  std::normal_distribution<> normal_dist;

  double min_wd = 0.0, max_wd = 0.0;
  double min_wf = 0.0,  max_wf = 0.0;

  for (int n = 0; n < 1e6; ++n) {
    double x = normal_dist (rng);
    double y = normal_dist (rng);
    double z = normal_dist (rng);
    double d = std::sqrt (x*x + y*y + z*z);
    x /= d;
    y /= d;
    z /= d;

    double wd = 1.0 - (x*x + y*y + z*z);

    x = float (x);
    y = float (y);
    z = float (z);
    double wf = 1.0 - (x*x + y*y + z*z);

    max_wd = std::max (max_wd, wd);
    min_wd = std::min (min_wd, wd);
    max_wf = std::max (max_wf, wf);
    min_wf = std::min (min_wf, wf);
  }



  std::cout << "double [ " << min_wd << " " << max_wd << " ]    float: [ " << min_wf << " " << max_wf << " ]\n";

  return 0;
}

and this compiles to give:

]$ g++ -std=c++11 test_quat.cpp -o test_quat && ./test_quat
double [ -6.66134e-16 5.55112e-16 ]    float: [ -1.00603e-07 9.89116e-08 ]

So the 1e-7 for the float version seems appropriate, but far too conservative for the double version. It looks like we should change that threshold of 1e-15 . What do you think?

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 5, 2018

We should also bear in mind that changing the threshold for NIfTI-2 is unlikely to have any effect on our users, they'll overwhelmingly be using NIfTI-1 images for the foreseeable future. Realistically, the only 'fix' I can see here is to adopt a much more liberal threshold on the consistency check, to basically confirm that the transformations at least look like they correspond, even if not particularly accurately.

And having thought about it a bit more, I also think we should revert back to the sform preferentially, since it's the only one that guarantees no loss of precision when converting to NIfTI and back again... So at least in our pipelines, things will remain perfectly aligned. Since we have the consistency check in place, we also warn our users if they differ, which is the only case where having to choose one over the other might introduces issues in their processing.

Maybe an even better approach would be to pick the sform if the (more relaxed) consistency check passes, but default to the qform (or whatever is specified in the config) otherwise? This would preserve precision in our pipelines, but also remain true to the 'spirit' of the standard and our recent changes on that front, as per previous discussions. What do you reckon?

@Lestropie

This comment has been minimized.

Copy link
Member Author

Lestropie commented Jun 7, 2018

I was thinking that other packages may not guarantee positive w on write; but if that were the case, the ambiguity would extend beyond small w. The threshold is also on w^2, not w 🤦‍♂️

You'll note that our previous implementation of this was clipped at zero, not 1e-7. If you look back up this thread, this caused quite significant instabilities.

Were those instabilities any larger than what we're observing now? Looks like two sides of the same coin to me: Initial problem was not clamping to zero when we should have been, but now we're clamping to zero when ideally we wouldn't.

So the 1e-7 for the float version seems appropriate, but far too conservative for the double version. It looks like we should change that threshold of 1e-15.

👍

Maybe an even better approach would be to pick the sform if the (more relaxed) consistency check passes, but default to the qform (or whatever is specified in the config) otherwise?

👍, as long as the documentation / terminal messaging is clear.

jdtournier added a commit that referenced this issue Jun 8, 2018

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jun 8, 2018

BTW, it turns out we were already defaulting to the sform if the transforms match. So this is a smaller change than expected. PR in #1363 - closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment