Skip to content

Convergence failure in secular equation DLAED4 #1166

@angsch

Description

@angsch

The issue was originally filed as a convergence failure of SYEVD in Jax. It can be traced to a convergence failure of DLAED4. Here is a minimal standalone reproducer.

// main.cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <iostream>
#include <iomanip>

extern "C" {
  void dlaed4_(const int* N, const int* I,
               const double* D, const double* Z,
               double* DELTA, const double* RHO,
               double* DLAM, int* INFO);
}

int main(int argc, char** argv) {
  const int n = 29;
  const int i = 23;
  const double rho = 6.2964808955495258e-05;

  const double d[n] = {
    -8.9907411945275305e-17, 3.1527281176230556e-06, 7.8104209108493992e-06, 1.4528378412105169e-05, 1.6571840586254615e-05, 1.6791959834500457e-05, 1.6951003434697501e-05, 1.6961635049587819e-05, 1.6969218460422806e-05,     1.7028037406879576e-05, 1.9135232081591496e-05, 2.1876238530929982e-05, 2.1920146416725336e-05, 3.9357203097119847e-05, 1.0557423781645928e-04, 1.3686127039388258e-04, 1.3686300570614894e-04, 1.3686313427158908e-04, 1.3686314099616321e-04, 1.3686314127052412e-04, 1.3686314210337733e-04, 1.3686314231692531e-04, 1.3686314257213152e-04, 5.3532577970471340e-02, 9.9996866934433004e-01, 1.0000000849390769e+00, 1.0000000901678945e+00, 1.0000000918404301e+00, 1.0000001095998177e+00
  };
  const double z[n] = {
    1.6700411548440888e-07, 2.2704845744451106e-06, -6.8596999842130412e-06, -1.9813464524908256e-04, 3.4676472662621390e-03, 5.8131816598176476e-02, -2.4389840480820292e-03, 2.2571085689296944e-03, -2.0150012240231657e-03, -9.7432047705138750e-03, 7.3184880793391433e-04, -2.9319792346511997e-02, 8.3364827297377855e-07, -8.9722126354645391e-09, 7.0412738449665979e-01, -1.4042299148200350e-06, 3.8731670039916887e-06, -5.4330381400333468e-07, 4.4026694557354993e-07, 2.2528424563293299e-07, 3.6064772311683760e-06, 2.8472995783701232e-07, 8.3086036055888118e-07, 2.8464273601162125e-02, -7.0642255076744132e-01, -5.2009238883566685e-08, 8.6348330718758654e-08, -2.9664496252475970e-08, 1.4932776629316714e-09
  };

  std::vector<double> delta(n, 0.0);
  double dlam = 0.0;
  int info = 0;

  dlaed4_(&n, &i, d, z, delta.data(), &rho, &dlam, &info);

  std::cout << std::setprecision(17) << std::scientific;
  std::cout << "info = " << info << "\n";
  std::cout << "dlam = " << dlam << "\n";

  return (info == 0) ? 0 : 1;
}

The code returns info = 1, which means not converged. The maximum number of iterations in DLAED4 is 30; the algorithm requires 31 iterations.

The eigenvalue to be found is very close to the left pole. There are several iterations where the Newton-like step overshoots and a fallback is taken. This happens for the first time in iteration 5. The step taken by the fallback turns out to be a bad step. The objective function value (W in the table) grows a lot and the remaining iterations do not suffice to correct this. Recall that this is a root finder, so W should go to zero.

iter W relative position
3 -15.183206754205958 9.3224424664962507E-012
4 -7.2541345644250672 2.8321247865713130E-011
5 15845.035200581398 0.25000000001416062
6 7937.0973190230861 5.8695311144112583E-004
7 5292.0409126050627 2.9347656988118687E-004
8 2646.0775072154233 1.1736977032438319E-004
9 1443.4324592139662 5.8684899322815527E-005
10 721.72516385211111 2.7943564578650639E-005
11 369.27924724566680 1.3971796449949251E-005
12 184.64794882583919 6.9033154364403488E-006
13 92.881305918680368 3.4516718788441071E-006
14 46.448843381611326 1.7204602732982808E-006
15 -2.9323820134484166 8.3977374357369141E-011
16 -1.4556166910784307 1.8246643456796507E-010
17 23.277161269594469 8.6032136986642434E-007
18 11.646269558195501 4.2954626252879772E-007
19 -0.91250965489136449 3.0284357148388565E-010
20 -0.45016815654621373 6.3928690716673472E-010
21 5.8486784388543924 2.1509277471798222E-007
22 2.9300774463199879 1.0726528303886306E-007
23 1.4829047906270059 5.3952284973014894E-008
24 0.73997721204365996 2.6852690757367693E-008
25 -0.35749940737873870 8.0525079151536220E-010
26 -0.15628904239033362 1.6119748312661070E-009
27 7.0768362172978744E-002 4.8489293641088358E-009
28 1.1908566236531963E-002 3.6314105287554539E-009
29 -3.2274931252138529E-004 3.4136194909394072E-009
30 -2.3743868398512502E-007 3.4192031981775864E-009

The root cause of the slow convergence is the fallback to bisection.

lapack/SRC/dlaed4.f

Lines 849 to 856 in 06f5ba3

TEMP = TAU + ETA
IF( TEMP.GT.DLTUB .OR. TEMP.LT.DLTLB ) THEN
IF( W.LT.ZERO ) THEN
ETA = ( DLTUB-TAU ) / TWO
ELSE
ETA = ( DLTLB-TAU ) / TWO
END IF
END IF

As we see by the relative position in iteration 5, bisection is really bad in this case and moves the candidate point far away from the root.

I would like to understand why bisection was chosen as the fallback.

  • LAPACK 2.0 had a different update (I did not try to decipher what exactly it does) and goes into a cyclic infinite loop for this case.
  • LAPACK 3.0 introduced bisection as fallback. This part has not been changed since.

Are there any release notes for LAPACK 3.0? Was there any other kind of dissemination, email lists etc, where I could find a hint?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions