We wish to find a non-trivial factor of a composite, odd integer $N$. We will not be concerned with finding the complete factorisation of $N$ into primes.

Suppose that we are given a supply of integers $x$ and $y$ satisfying the congruence $x^2 \equiv y^2 \mod{N}$. Then
\begin{equation}
    x^2 - y^2 = (x - y)(x + y) \equiv 0 \mod{N}.
\end{equation}

This means that the integer $N$ divides the product $(x - y)(x + y)$, so every prime factor of $N$ divides at least one of $(x - y)$ or $(x + y)$. Now we compute $d_1 = \gcd(x - y, N)$ and $d_2 = \gcd(x + y, N)$. Provided that $x \not\equiv \pm y \mod{N}$, neither $(x - y)$ nor $(x + y)$ is divisible by $N$ on its own, so both $d_1$ and $d_2$ are non-trivial factors of $N$. This congruence-of-squares idea underlies Fermat's method, the continued fraction method (CFRAC) and the quadratic sieve.
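
As a small illustration of this step, the following sketch splits $N$ given a congruent pair. The function name `split_with_squares` and the example $N = 8051 = 90^2 - 7^2$ are chosen here for illustration and do not come from the text above.

```python
import math

def split_with_squares(x, y, N):
    """Return (gcd(x - y, N), gcd(x + y, N)) for a pair with x^2 = y^2 (mod N)."""
    if (x * x - y * y) % N != 0:
        raise ValueError("x^2 and y^2 are not congruent modulo N")
    return math.gcd(x - y, N), math.gcd(x + y, N)

# 90^2 - 7^2 = 8100 - 49 = 8051, and 90 is not congruent to +/-7 modulo 8051
print(split_with_squares(90, 7, 8051))   # (83, 97)

# A trivial pair: 86 = -5 (mod 91), so only the factors 1 and N appear
print(split_with_squares(5, 86, 91))     # (1, 91)
```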

The main computational step in this process, once $x$ and $y$ are known, is the calculation of the greatest common divisor. Using the Euclidean algorithm on two integers of at most $d$ digits (here $d = O(\log N)$), the time complexity is $O(\log^2 N) = O(d^2)$, which is polynomial in the size of the input.
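
The following is a minimal sketch of the Euclidean algorithm that also counts division steps, to make the cost concrete. The helper name `gcd_steps` is hypothetical; in practice Python's built-in `math.gcd` would be used instead.

```python
def gcd_steps(a, b):
    """Euclidean algorithm; returns (gcd, number of division steps)."""
    a, b = abs(a), abs(b)
    steps = 0
    while b:
        a, b = b, a % b   # replace (a, b) by (b, a mod b)
        steps += 1
    return a, steps

print(gcd_steps(1649, 34))   # (17, 2)
```

The number of division steps is at most proportional to the number of digits of the smaller input.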

Suppose that $N$ is odd, composite and not a prime power, so that we can write $N = pq$ with $p$ and $q$ coprime and both greater than $2$. The Chinese remainder theorem states that the system of congruences
\begin{equation}
\begin{aligned}
    y &\equiv 1 \mod p, \\
    y &\equiv -1 \mod q,
\end{aligned}
\end{equation}
has a unique solution for $y$ modulo $N = pq$. Then
\begin{equation}
\begin{aligned}
    y^2 &\equiv 1 \mod{p}, \\
    y^2 &\equiv 1 \mod{q},
\end{aligned}
\end{equation}
therefore $y^2 - 1 \equiv 0 \mod{N}$. Then for the trivial choice $x = 1$, we have
\begin{equation}
    x^2 \equiv y^2 \mod{N}.
\end{equation}
Moreover, $y \not\equiv \pm 1 \mod{N}$, as either case would violate one of the congruences in the system above (recall that $1 \not\equiv -1$ modulo an odd number greater than $1$).
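
The construction can be checked numerically. The sketch below assumes that the coprime factors $p$ and $q$ are already known, which is of course not the case when factoring, so it only illustrates why such a $y$ exists. The function name is hypothetical, and `pow(a, -1, m)` for modular inverses requires Python 3.8 or later.

```python
import math

def nontrivial_sqrt_of_one(p, q):
    """Solve y = 1 (mod p), y = -1 (mod q) for coprime p, q via the CRT."""
    N = p * q
    # q * (q^-1 mod p) is 1 mod p and 0 mod q; p * (p^-1 mod q) is 0 mod p and 1 mod q
    return (q * pow(q, -1, p) - p * pow(p, -1, q)) % N

p, q = 7, 13
N = p * q
y = nontrivial_sqrt_of_one(p, q)
print(y, (y * y) % N)                           # 64 1
print(math.gcd(y - 1, N), math.gcd(y + 1, N))   # 7 13
```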


---

In fact, if $N$ has $k$ distinct prime factors, there are exactly $2^k$ square roots of $1$ modulo $N$, of which $2^k - 2$ are non-trivial and can be used to factor $N$. We do not know these roots in advance, however, so we need another source of congruent squares. One such source arises from the convergents $P_n / Q_n$ of the continued fraction expansion of $\sqrt{N}$. The numerators $P_n$ grow exponentially as $n$ increases, so an efficient algorithm should perform all calculations modulo $N$.

The recurrence relation for the numerators of the convergents is
\begin{equation}
    P_n = a_n P_{n-1} + P_{n-2}.
\end{equation}
We can compute its value modulo $N$ at each step,
\begin{equation}
    P_n \bmod N = \bigl( a_n (P_{n-1} \bmod N) + (P_{n-2} \bmod N) \bigr) \bmod N.
\end{equation}
This ensures that the numbers involved never exceed $(a_n + 1) N$.
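
A minimal sketch of this recurrence is given below. It uses the standard integer recurrence for the partial quotients $a_n$ of $\sqrt{N}$; the function name `sqrt_cf_convergents_mod` and the small example $N = 13$ are chosen for illustration only.

```python
import math

def sqrt_cf_convergents_mod(N, count):
    """Return [(a_n, P_n mod N)] for the first `count` convergents of sqrt(N)."""
    a0 = math.isqrt(N)
    if a0 * a0 == N:
        raise ValueError("N must not be a perfect square")
    m, d, a = 0, 1, a0          # state of the continued fraction expansion
    p_prev, p_curr = 0, 1       # P_{-2}, P_{-1}
    out = []
    for _ in range(count):
        # P_n = a_n * P_{n-1} + P_{n-2}, reduced modulo N at every step
        p_prev, p_curr = p_curr, (a * p_curr + p_prev) % N
        out.append((a, p_curr))
        # advance to the next partial quotient a_{n+1}
        m = a * d - m
        d = (N - m * m) // d
        a = (a0 + m) // d
    return out

# sqrt(13) = [3; 1, 1, 1, 1, 6, ...] with numerators 3, 4, 7, 11, 18, ...
print(sqrt_cf_convergents_mod(13, 5))
# [(3, 3), (1, 4), (1, 7), (1, 11), (1, 5)]   (18 mod 13 = 5)
```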

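To square $P_n$ we can use a modular multiplication routine that computes $P_n^2 \bmod N$ using only additions and doublings, avoiding a large intermediate product. Python integers have arbitrary precision, so the program below simply calls the built-in `pow(p, 2, N)`.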
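
The additions-and-doublings idea can be sketched as follows. This is an illustrative double-and-add routine, not the method used by the program in this section; the name `mulmod` is hypothetical. Every intermediate value stays below $2N$.

```python
def mulmod(a, b, N):
    """Compute (a * b) mod N using only additions, doublings and subtractions of N."""
    a %= N
    b %= N
    result = 0
    while b:
        if b & 1:
            result += a           # add the current multiple of a
            if result >= N:
                result -= N
        a += a                    # double a
        if a >= N:
            a -= N
        b >>= 1                   # move to the next bit of b
    return result

x, N = 214480650, 2012449237
print(mulmod(x, x, N) == pow(x, 2, N))   # True
```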

The following program implements the continued fraction algorithm for $\sqrt{N}$ while keeping track of the convergents $P_n$ modulo $N$. It iterates through the convergents until it finds one $P_n$ such that $P_n^2 \bmod N$ is a perfect square. This provides a pair $(x, y)$ where $x = P_n \mod N$ and $y = \sqrt{P_n^2 \bmod N}$, satisfying the condition $x^2 \equiv y^2 \mod N$.

In [181]:
import math

def find_congruent_squares(N):
    '''
    Computes the convergents Pn of sqrt(N) and finds the first Pn
    such that Pn^2 mod N is a perfect square.
    Args:
        N: The integer to be factored.

    Returns:
        A tuple (n, x, y) where x = Pn mod N and y = sqrt(Pn^2 mod N).
        Returns None if N is a perfect square or if no solution is
        found within the search limit.
    '''
    m0 = math.isqrt(N)
    if m0 * m0 == N:
        print(f"N = {N} is a perfect square.")
        return None

    # State for the continued fraction algorithm: m_n, d_n, a_n
    # Initialise for n=0
    m = 0
    d = 1
    a = m0

    # State for the convergents P_n mod N
    # P_{-2} = 0, P_{-1} = 1
    p_prev, p_curr = 0, 1

    print(f"\nSearching for a solution for N = {N}.")
    for n in range(2 * m0 * m0): # A safety break for the loop
        # Calculate the next convergent P_n using a_n
        # The 'a' here is a_n, carried over from the previous iteration.
        p_next = (a * p_curr + p_prev) % N
        p_prev, p_curr = p_curr, p_next

        # Calculate P_n^2 mod N
        p_squared_mod_N = pow(p_curr, 2, N)

        # Check if we found a non-trivial solution
        if n > 0: # Avoid the n=0 case
            y = math.isqrt(p_squared_mod_N)
            if y * y == p_squared_mod_N:
                # We have x^2 = y^2 mod N. Now check if it's a useful solution.
                # x = p_curr, y = y
                # A trivial solution is one where x = y or x = N-y
                if p_curr != y and p_curr != (N - y):
                    print(f"Success! Found non-trivial solution at n = {n}.")
                    print(f"  x = P_{n} mod N = {p_curr}")
                    print(f"  y = sqrt(P_{n}^2 mod N) = {y}")

                    # Calculate the factor
                    factor = math.gcd(abs(p_curr - y), N)
                    print(f"  Factor found: gcd(|x-y|, N) = {factor}")
                    return n, p_curr, y

        # Calculate the *next* a (a_{n+1}) for the next iteration
        m_next = a * d - m
        d_next = (N - m_next * m_next) // d
        a_next = (m0 + m_next) // d_next
        m, d, a = m_next, d_next, a_next

    print("No solution found within the search limit.")
    return None

numbers_to_run = [
    2012449237,
    2575992413,
    3548710699
]

for number in numbers_to_run:
    find_congruent_squares(number)


Searching for a solution for N = 2012449237.
Success! Found non-trivial solution at n = 19.
  x = P_19 mod N = 214480650
  y = sqrt(P_19^2 mod N) = 38
  Factor found: gcd(|x-y|, N) = 43987

Searching for a solution for N = 2575992413.
Success! Found non-trivial solution at n = 143.
  x = P_143 mod N = 1455799129
  y = sqrt(P_143^2 mod N) = 22
  Factor found: gcd(|x-y|, N) = 36467

Searching for a solution for N = 3548710699.
Success! Found non-trivial solution at n = 365.
  x = P_365 mod N = 3295385430
  y = sqrt(P_365^2 mod N) = 35
  Factor found: gcd(|x-y|, N) = 113497


We wish to find a non-zero vector $v$ in the kernel of a binary matrix $A$, i.e., a solution $Av = 0$. This is equivalent to finding a linear dependency among the columns of $A$. Gaussian elimination over the field $F_2$ is particularly simple as addition is equivalent to XOR and multiplication is equivalent to AND.

The only row operation needed to clear a column is adding one row to another (row-wise XOR), since the only non-zero scalar is $1$.
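
Because row addition is just XOR, a whole row can be packed into a single Python integer. The sketch below is an illustrative variant, not the implementation used in the next cell: it takes one exponent vector per relation, keeps a second bit mask recording which original vectors have been added into each row, and returns every dependency it encounters rather than a single one. The name `dependencies_f2` is hypothetical.

```python
def dependencies_f2(vectors):
    """Return index lists; the vectors in each list sum to zero modulo 2."""
    pivots = {}   # lowest set bit -> (row bits, history bits)
    deps = []
    for i, vec in enumerate(vectors):
        bits = 0
        for j, e in enumerate(vec):
            if e & 1:
                bits |= 1 << j
        hist = 1 << i                     # this row starts as vector i alone
        while bits:
            low = bits & -bits            # lowest set bit of the row
            if low not in pivots:
                pivots[low] = (bits, hist)
                break
            pivot_bits, pivot_hist = pivots[low]
            bits ^= pivot_bits            # row addition over F_2
            hist ^= pivot_hist
        else:
            # the row reduced to zero: hist marks a dependent subset
            deps.append([k for k in range(len(vectors)) if (hist >> k) & 1])
    return deps

vectors = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]
print(dependencies_f2(vectors))   # [[0, 1, 2], [0, 3]]
```

Having several dependencies available is useful when the first one leads only to a trivial factor.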

In [182]:
import numpy as np
from collections import defaultdict

def solve_binary_system(A):
    '''
    Solves for a non-zero vector v such that Av = 0 over the field F2,
    using Gaussian elimination.
    Args:
        A: The input matrix with entries 0 or 1.

    Returns:
        A non-zero solution vector v, or None if only the trivial solution v = 0 exists.
    '''
    num_rows, num_cols = A.shape
    matrix = np.copy(A)

    pivot_row = 0
    pivot_cols = []

    # Gauss-Jordan elimination to reduced row echelon form
    for j in range(num_cols): # Iterate through columns
        if pivot_row < num_rows:
            # Find a row with a 1 in the current column
            i = pivot_row
            while i < num_rows and matrix[i, j] == 0:
                i += 1
            if i < num_rows: # Pivot found at (i, j)
                # Swap rows to move the pivot to the current pivot_row
                matrix[[pivot_row, i]] = matrix[[i, pivot_row]]
                pivot_cols.append(j)
                # Eliminate other 1s in this column
                for k in range(num_rows):
                    if k != pivot_row and matrix[k, j] == 1:
                        # Add pivot_row to row k (XOR operation)
                        matrix[k, :] = (matrix[k, :] + matrix[pivot_row, :]) % 2
                pivot_row += 1

    # Identify free variables and back substitution
    free_cols = [j for j in range(num_cols) if j not in pivot_cols]

    if not free_cols:
        # Only the trivial solution v=0 exists
        return None

    # Find a solution
    v = np.zeros(num_cols, dtype=int)

    # Set the first free variable to 1 to ensure a non-zero solution
    first_free_col = free_cols[0]
    v[first_free_col] = 1

    # Solve for the pivot variables using back substitution
    for i in range(len(pivot_cols) - 1, -1, -1):
        pivot_col = pivot_cols[i]
        pivot_val = matrix[i, pivot_col] # Should be 1
        if pivot_val == 1:
            # Sum of other terms in the row
            row_sum = np.dot(matrix[i, :], v) % 2
            v[pivot_col] = row_sum

    return v


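def get_exponent_vector(n, factor_base):
    '''
    Returns the exponent vector of n over factor_base, reduced modulo 2,
    or None if n is zero or is not smooth over factor_base.
    The entry -1 in factor_base records the sign of n.
    '''
    if n == 0:
        return None  # 0 is divisible by every prime; avoid an endless loop
    exponents = defaultdict(int)
    if n < 0:
        exponents[-1] = 1
        n = -n
    for p in factor_base:
        if p == -1:
            continue
        while n % p == 0:
            exponents[p] += 1
            n //= p
    if n == 1:
        return np.array([exponents[p] % 2 for p in factor_base], dtype=int)
    return None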

*   Choose a factor base $B$: Select a set of small prime numbers $\{p_1, \dots, p_k\}$ and append $-1$ to handle signs.
*   Generate relations:
    *   Compute the convergents $P_n/Q_n$ of the continued fraction of $\sqrt{N}$.
    *   For each convergent we have the approximation $P_n/Q_n \approx \sqrt{N}$, which gives $P_n^2 \approx NQ_n^2$.
    *   Let $K_n = P_n^2 - NQ_n^2$, so that $P_n^2 \equiv K_n \mod{N}$. These values are small, $|K_n| < 2\sqrt{N}$, and we are interested in the convergents where $K_n$ is $B$-smooth.
*   Collect smooth relations: We generate convergents and test $K_n$ for $B$-smoothness until we have collected at least $|B| + 1 = k + 2$ such relations. For each smooth $K_n$, we find its prime factorisation over $B$:
\begin{equation}
    K_n = (-1)^{e_0} p_1^{e_1} \cdots p_k^{e_k}.
\end{equation}
*   Linear algebra over $F_2$:
    *   For each smooth $K_n$, we create an exponent vector $v_n = [e_0, \dots, e_k]$ modulo $2$.
    *   We then form a matrix $A$ where each row corresponds to an exponent vector.
    *   The goal is to find a set of these $K_n$ values whose product is a perfect square. This happens if the sum of their corresponding exponent vectors is the zero vector modulo $2$.
    *   This is equivalent to finding a non-zero solution $v$ to the system $A^\top v = 0.$ The vector $v$ will have $1$s indicating which $K_n$ values should be combined.

*   Form the congruence:
    *   Let the solution $v$ select the relations corresponding to indices $\{i_1, \dots, i_m\}$.
    *   Define
\begin{equation}
\begin{aligned}
    x &= (P_{i_1} \cdots P_{i_m}) \bmod N, \\
    y^2 &= K_{i_1} \cdots K_{i_m}.
\end{aligned}
\end{equation}
    *   Since the sum of the selected exponent vectors is zero modulo $2$, all exponents in the factorisation of $y^2$ are even, so $y$ is an integer. Because $P_i^2 \equiv K_i \mod{N}$ for each $i$, this generates the congruence $x^2 \equiv y^2 \mod{N}$.
*   Factor $N$: Calculate $d = \gcd(x - y, N)$. If $1 < d < N$, then we have found a factor. Otherwise, if $d$ is a trivial factor, then we must find another linear dependency in the matrix $A$ and try again.
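
The last two steps can be sketched on a classic small example, $N = 1649$, using the relations $41^2 \equiv 32$ and $43^2 \equiv 200 \mod{N}$, whose product $32 \cdot 200 = 6400 = 80^2$ is a perfect square. The function name `combine_relations` is hypothetical, and these relations are picked by hand for illustration rather than generated from convergents. For brevity the sketch multiplies the $K_i$ directly and takes an integer square root, which is only sensible for small inputs; halving the exponents of the known factorisations avoids the large product.

```python
import math

def combine_relations(N, relations):
    """relations: list of (x_i, K_i) with x_i^2 = K_i (mod N).

    Returns (x, y, d) with x^2 = y^2 (mod N) and d = gcd(x - y, N).
    """
    x, product = 1, 1
    for x_i, k_i in relations:
        x = (x * x_i) % N
        product *= k_i
    if product < 0:
        raise ValueError("product of the K_i is negative")
    y = math.isqrt(product)
    if y * y != product:
        raise ValueError("product of the K_i is not a perfect square")
    return x, y % N, math.gcd(x - y, N)

print(combine_relations(1649, [(41, 32), (43, 200)]))   # (114, 80, 17)
```

Here $1649 = 17 \cdot 97$, so the dependency yields a non-trivial factor.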

In [183]:
def cfrac_final(N, factor_base_size=50, max_convergents=10000):
    '''
    CFRAC implementation for factorisation.
    '''
    print(f"\nStarting CFRAC factorisation for N = {N}")

    # Choose factor base
    primes = [2]
    i = 3
    while len(primes) < factor_base_size:
        is_prime = all(i % p != 0 for p in primes)
        if is_prime: primes.append(i)
        i += 2
    factor_base = [-1] + primes
    print(f"Using factor base of size {len(factor_base)} (primes up to {primes[-1]}).")

    # Generate and collect smooth relations
    m0 = math.isqrt(N)
    m, d, a = 0, 1, m0

    p_prev, p_curr = 0, 1
    q_prev, q_curr = 1, 0

    smooth_relations = []
    exponent_vectors = []

    print(f"Searching for {len(factor_base) + 5} B-smooth relations (up to {max_convergents} convergents).")

    for n in range(max_convergents):
        if n > 0 and n % 500 == 0:
            print(f"  Checked {n} convergents, found {len(smooth_relations)} smooth relations.")

        # Calculate P_n and Q_n for the current convergent
        p_next = a * p_curr + p_prev
        q_next = a * q_curr + q_prev
        p_prev, p_curr = p_curr, p_next
        q_prev, q_curr = q_curr, q_next

        # Test the small value Kn = Pn^2 - N * Qn^2
        Kn = p_curr**2 - N * q_curr**2

        exp_vec = get_exponent_vector(Kn, factor_base)
        if exp_vec is not None:
            # We store P_n mod N, not the full P_n, to save memory
            smooth_relations.append({'p_mod_n': p_curr % N, 'k': Kn})
            exponent_vectors.append(exp_vec)

        if len(smooth_relations) >= len(factor_base) + 5:
            print(f"Found enough smooth relations at convergent n = {n}.")
            break

        m_next = a * d - m
        d_next = (N - m_next * m_next) // d
        a_next = (m0 + m_next) // d_next
        m, d, a = m_next, d_next, a_next

    if len(smooth_relations) < len(factor_base) + 1:
        print(f"Could not find enough smooth relations within {max_convergents} convergents.")
        return

    # Linear algebra and factorization
    print("Constructing matrix and finding linear dependency.")
    # Columns of A are the exponent vectors, so A v = 0 selects a subset of relations
    A = np.array(exponent_vectors).T

    v = solve_binary_system(A)
    if v is None:
        print("\n--- FAILED ---")
        print("Could not find a linear dependency.")
        return

    x, y_squared_factors = 1, defaultdict(int)
    for i in range(len(v)):
        if v[i] == 1:
            relation = smooth_relations[i]
            x = (x * relation['p_mod_n']) % N
            temp_n = relation['k']
            if temp_n < 0: y_squared_factors[-1] += 1; temp_n = -temp_n
            for p in factor_base:
                if p == -1: continue
                while temp_n % p == 0: y_squared_factors[p] += 1; temp_n //= p
    y = 1
    for p, exponent in y_squared_factors.items():
        y = (y * pow(p, exponent // 2))
    y %= N

    factor = math.gcd(abs(x - y), N)

    if 1 < factor < N:
        print(f"Success! Found a non-trivial factor: {factor}")
        print(f"{N} = {factor} * {N // factor}")
    else:
        print("Failed. Found a trivial factor (x = +/-y).")

numbers_to_factor = [
    2012449237,
    2575992413,
    3548710699
]
for n in numbers_to_factor:
    cfrac_final(n, factor_base_size=10, max_convergents=10000)


Starting CFRAC factorisation for N = 2012449237
Using factor base of size 11 (primes up to 29).
Searching for 16 B-smooth relations (up to 10000 convergents).
Found enough smooth relations at convergent n = 160.
Constructing matrix and finding linear dependency.
Success! Found a non-trivial factor: 43987
2012449237 = 43987 * 45751

Starting CFRAC factorisation for N = 2575992413
Using factor base of size 11 (primes up to 29).
Searching for 16 B-smooth relations (up to 10000 convergents).
Found enough smooth relations at convergent n = 490.
Constructing matrix and finding linear dependency.
Failed. Found a trivial factor (x = +/-y).

Starting CFRAC factorisation for N = 3548710699
Using factor base of size 11 (primes up to 29).
Searching for 16 B-smooth relations (up to 10000 convergents).
Found enough smooth relations at convergent n = 198.
Constructing matrix and finding linear dependency.
Success! Found a non-trivial factor: 113497
3548710699 = 113497 * 31267


The number of convergents required is hard to predict and varies considerably from one $N$ to another, as the runs above show. It depends on the size of $N$ and on the factor base $B$: since $|K_n| < 2\sqrt{N}$, the values to be tested grow with $N$, and the density of $B$-smooth numbers decreases as the numbers grow.

For a small choice of factor base, the linear algebra step is fast because the matrix is small, but the probability of a number $K_n$ being $B$-smooth is very low and we must generate a huge number of convergents. Conversely, for a large choice of factor base the probability of finding smooth numbers is much higher, so we need fewer convergents per relation, but more relations are required, the matrix becomes very large, and the Gaussian elimination step with complexity around $O(k^3)$ becomes the bottleneck.
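
To get a feeling for this trade-off, the sketch below counts how many integers up to a bound are smooth over a few small factor bases, using trial division. The helper names `is_smooth` and `smooth_count` are illustrative, and the bound of $100$ is far smaller than anything that arises in practice.

```python
def is_smooth(n, primes):
    """True if |n| factors completely over the given primes."""
    n = abs(n)
    if n == 0:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def smooth_count(limit, primes):
    """Number of integers in [1, limit] that are smooth over `primes`."""
    return sum(1 for n in range(1, limit + 1) if is_smooth(n, primes))

for base in ([2], [2, 3], [2, 3, 5], [2, 3, 5, 7]):
    print(base, smooth_count(100, base))
# [2] 7
# [2, 3] 20
# [2, 3, 5] 34
# [2, 3, 5, 7] 46
```

Enlarging the factor base raises the proportion of smooth candidates, at the price of one more matrix column per added prime.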

*   For the huge but sparse matrices that arise when factoring large numbers, iterative methods like the Block Lanczos algorithm or Wiedemann algorithm can be used.
*   Instead of checking each candidate for smoothness one by one by trial division, it can be more efficient to sieve in the style of the sieve of Eratosthenes. Sieving requires candidates that are values of a polynomial at consecutive integers, which the $K_n$ are not.
*   While historically important, CFRAC is no longer the standard factorisation algorithm. The quadratic sieve replaces the $P_n^2 - NQ_n^2$ relation generation with values of the simpler polynomial $(x+m)^2 - N$, where $m \approx \sqrt{N}$, which can be sieved.
*   The general number field sieve is used for very large numbers and is asymptotically the fastest known classical algorithm. It works with more complex number field structures, but follows the same fundamental principle of finding a congruence of squares via linear algebra.
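
The sieving idea for the polynomial $(x+m)^2 - N$ can be sketched as follows. This is a toy illustration under simplifying assumptions, not a quadratic sieve implementation: the roots modulo each prime are found by brute force, the values are divided exactly rather than tracked with logarithms, and $N$ is assumed not to be a perfect square so that no value is zero. The function name `sieve_smooth_poly` is hypothetical.

```python
def sieve_smooth_poly(N, m, primes, length):
    """Return the x in [0, length) for which (x + m)^2 - N is smooth over `primes`."""
    remaining = [abs((x + m) ** 2 - N) for x in range(length)]
    for p in primes:
        # residues r with (r + m)^2 - N = 0 (mod p), found by brute force
        roots = [r for r in range(p) if ((r + m) ** 2 - N) % p == 0]
        for r in roots:
            # only positions x = r (mod p) can be divisible by p
            for x in range(r, length, p):
                while remaining[x] and remaining[x] % p == 0:
                    remaining[x] //= p
    return [x for x in range(length) if remaining[x] == 1]

# N = 1649, m = 41: the values 32, 200 and 560 at x = 0, 2, 6 are smooth
print(sieve_smooth_poly(1649, 41, [2, 3, 5, 7], 10))   # [0, 2, 6]
```

Each prime touches only every $p$-th position per root, instead of every candidate being divided by every prime.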