- the updates produced by both SGD-momentum and Adam for the 2D parameters in transformer-based neural networks typically have very high **condition number**. That is, they are almost **low-rank matrices**, with the updates for all neurons being dominated by just a few directions.
    - https://kellerjordan.github.io/posts/muon/
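
To make the "almost low-rank" claim concrete, here is a minimal sketch on a synthetic matrix, not an actual SGD-momentum or Adam update. It builds a rank-2 matrix plus small Gaussian noise and checks how much of the squared Frobenius norm sits in the top two singular directions. The shapes, noise scale, and seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Hypothetical sizes: a 64x32 "update" dominated by k = 2 directions.
m, n, k = 64, 32, 2
low_rank = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
noise = 1e-4 * rng.standard_normal((m, n))  # small full-rank perturbation
G = low_rank + noise

s = np.linalg.svd(G, compute_uv=False)  # singular values, largest first
cond = s[0] / s[-1]                     # 2-norm condition number
energy_top_k = np.sum(s[:k] ** 2) / np.sum(s ** 2)

print(f"numerical rank:         {np.linalg.matrix_rank(G)}")
print(f"condition number:       {cond:.3e}")
print(f"energy in top {k} dirs:  {energy_top_k:.8f}")
```

The matrix is technically full rank because of the noise, yet nearly all of its energy lies in two directions and its condition number is large. That is the sense in which a matrix can be "almost low-rank".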

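- The condition number of a matrix measures how strongly the output changes when the input changes slightly; for a linear system $Ax = b$, it bounds how much a relative perturbation of $b$ can be amplified in the solution $x$. In other words, it is an indicator of numerical stability.
- For a square matrix $A$, the general definition of the condition number $\kappa(A)$ is
    - $\kappa(A) = \|A\| \cdot \|A^{-1}\|$
    - This definition already implies that only invertible (nonsingular) matrices have a finite condition number. If a matrix is singular (non-invertible), $A^{-1}$ does not exist, and the condition number is taken to be infinite.
- In practice, the most common and most geometrically revealing way to compute it is via the singular value decomposition (SVD), which gives the condition number in the spectral (2-)norm.
    - Suppose the singular values of $A$, sorted from largest to smallest, are $\sigma_1, \sigma_2, \cdots, \sigma_n$
    - $\kappa(A) = \frac{\sigma_{\text{max}}}{\sigma_{\text{min}}} = \frac{\sigma_1}{\sigma_n}$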
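
As a rough check of the two definitions, the sketch below computes $\kappa(A)$ both as $\|A\|_2 \cdot \|A^{-1}\|_2$ and as $\sigma_{\text{max}} / \sigma_{\text{min}}$. It then perturbs the right-hand side of $Ax = b$ to show the amplification that the condition number bounds. The matrix, `b`, and the perturbation `db` are illustrative choices, and the spectral norm is assumed throughout.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.000001]])

# Definition 1: kappa(A) = ||A||_2 * ||A^{-1}||_2 (ord=2 is the spectral norm)
kappa_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# Definition 2: kappa(A) = sigma_max / sigma_min
s = np.linalg.svd(A, compute_uv=False)  # singular values, largest first
kappa_svd = s[0] / s[-1]

print(f"kappa via norms: {kappa_norm:.6e}")
print(f"kappa via SVD:   {kappa_svd:.6e}")

# Sensitivity of A x = b: perturb b slightly and compare relative changes.
b = np.array([2.0, 2.000001])   # chosen so that the solution is x = [1, 1]
db = np.array([0.0, 1e-6])      # tiny perturbation of the input
x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + db)

rel_in = np.linalg.norm(db) / np.linalg.norm(b)
rel_out = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
amplification = rel_out / rel_in

print(f"relative input change:  {rel_in:.3e}")
print(f"relative output change: {rel_out:.3e}")
print(f"amplification:          {amplification:.3e} (at most kappa)")
```

A relative change of under one part in a million in `b` moves the solution from roughly `[1, 1]` to roughly `[0, 2]`, a relative change of order one. The amplification stays below $\kappa(A)$, as the bound requires.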

In [1]:
import numpy as np

def analyze_matrix(name, A):
    """
    Analyzes a matrix to show its rank, singular values, and condition number.
    """
    print(f"--- Analyzing Matrix: {name} ---")
    print("Matrix A:\n", A)

    # Calculate rank
    rank = np.linalg.matrix_rank(A)
    print(f"\nRank of A: {rank}")

    # Calculate singular values
    # The svd function returns U, s, Vh. 's' contains the singular values.
    U, s, Vh = np.linalg.svd(A)
    print(f"Singular values (s): {s}")
    
    sigma_max = np.max(s)
    sigma_min = np.min(s)
    print(f"  - Max singular value (σ_max): {sigma_max:.6f}")
    print(f"  - Min singular value (σ_min): {sigma_min:.6f}")

    # Calculate condition number
    # Treat the matrix as singular when σ_min is zero up to floating-point
    # precision, using the same relative tolerance as np.linalg.matrix_rank.
    tol = sigma_max * max(A.shape) * np.finfo(s.dtype).eps
    if sigma_min <= tol:
        cond_num = float('inf')
        cond_num_manual = float('inf')
    else:
        cond_num = np.linalg.cond(A)
        cond_num_manual = sigma_max / sigma_min

    print(f"\nCondition number (from np.linalg.cond): {cond_num:,.2f}")
    print(f"Condition number (from σ_max / σ_min): {cond_num_manual:,.2f}")
    
    # Check for the singular case first: inf > 1000 is also True, so testing
    # the threshold first would mislabel singular matrices as ill-conditioned.
    if np.isinf(cond_num):
        print("\nConclusion: This is a singular matrix (rank-deficient). Its condition number is infinite.")
    elif cond_num > 1000:
        print("\nConclusion: This is an ill-conditioned (sick) matrix.")
        print("It is very close to being a singular (lower rank) matrix because its smallest singular value is tiny compared to its largest.")
    else:
        print("\nConclusion: This is a well-conditioned matrix.")
    print("-" * 40 + "\n")


# 1. A well-conditioned matrix (the identity)
# The two column vectors are orthonormal, so both singular values equal 1.
A_well = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
analyze_matrix("Well-Conditioned Matrix", A_well)


# 2. An ill-conditioned matrix
# The second column is very close to being a multiple of the first.
# It's "almost" a rank-1 matrix.
A_ill = np.array([[1.0, 1.0],
                  [1.0, 1.000001]])
analyze_matrix("Ill-Conditioned Matrix", A_ill)


# 3. A singular matrix (rank-deficient)
# The second column is exactly a multiple of the first.
A_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])
analyze_matrix("Singular Matrix", A_singular)

--- Analyzing Matrix: Well-Conditioned Matrix ---
Matrix A:
 [[1. 0.]
 [0. 1.]]

Rank of A: 2
Singular values (s): [1. 1.]
  - Max singular value (σ_max): 1.000000
  - Min singular value (σ_min): 1.000000

Condition number (from np.linalg.cond): 1.00
Condition number (from σ_max / σ_min): 1.00

Conclusion: This is a well-conditioned matrix.
----------------------------------------

--- Analyzing Matrix: Ill-Conditioned Matrix ---
Matrix A:
 [[1.       1.      ]
 [1.       1.000001]]

Rank of A: 2
Singular values (s): [2.00000050e+00 4.99999875e-07]
  - Max singular value (σ_max): 2.000001
  - Min singular value (σ_min): 0.000000

Condition number (from np.linalg.cond): 4,000,002.00
Condition number (from σ_max / σ_min): 4,000,002.00

Conclusion: This is an ill-conditioned (sick) matrix.
It is very close to being a singular (lower rank) matrix because its smallest singular value is tiny compared to its largest.
----------------------------------------

--- Analyzing Matrix: Singular Mat