# Notebook 05a: The Discrete Logarithm Problem

**Module 05. The Discrete Logarithm and Diffie-Hellman**

---

**Motivating Question.** In the real numbers, if someone tells you $2^x = 1024$, you solve $x = \log_2 1024 = 10$ instantly. But what if someone tells you $3^x \equiv 13 \pmod{17}$? You can't just take a logarithm: the group structure of $\mathbb{Z}/p\mathbb{Z}^*$ scrambles the usual patterns. This "discrete logarithm" problem is *believed* to be fundamentally hard for large primes, and that hardness is the foundation of Diffie-Hellman, ElGamal, DSA, and elliptic curve cryptography.

---

**Prerequisites.** You should be comfortable with:
- Modular arithmetic and cyclic groups (Module 01)
- The multiplicative group $\mathbb{Z}/p\mathbb{Z}^*$ and Euler's totient (Module 04)

**Learning objectives.** By the end of this notebook you will be able to:
1. State the discrete logarithm problem precisely.
2. Distinguish the DLP from ordinary (real-valued) logarithms.
3. Compute discrete logs by brute force and understand why brute force fails for large groups.
4. Use SageMath's `discrete_log()` to solve DLP instances.
5. Appreciate the *asymmetry* that makes DLP useful for cryptography: exponentiation is fast, but inversion (the discrete log) is slow.

## 1. Ordinary vs Discrete Logarithms

Over the real numbers, for any base $b > 0$ with $b \ne 1$, the exponential function $f(x) = b^x$ is a **bijection** from $\mathbb{R}$ to $\mathbb{R}_{>0}$, so it has a well-defined inverse: the logarithm $\log_b$.

| Setting | Given | Find | Method |
|---------|-------|------|--------|
| $\mathbb{R}$ | $2^x = 1024$ | $x = 10$ | $x = \log_2(1024)$, closed-form formula |
| $\mathbb{Z}/p\mathbb{Z}^*$ | $g^x \equiv h \pmod{p}$ | $x = ?$ | No known efficient general method! |

The key difference: in $\mathbb{Z}/p\mathbb{Z}^*$, the exponential function "wraps around" modulo $p$. The smooth, monotone structure that makes real logarithms easy is **destroyed** by the modular reduction.
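
To make the contrast in the table concrete, here is a minimal sketch in plain Python (it should also run unchanged in a Sage cell) using the two equations from the motivating question. The helper name `powers_mod` is hypothetical and exists only for this illustration.

```python
import math

# Real logarithm: one closed-form evaluation recovers the exponent.
x_real = math.log2(1024)

def powers_mod(g, p):
    """List g^0, g^1, ..., g^(p-2) reduced modulo p (hypothetical helper)."""
    return [pow(g, x, p) for x in range(p - 1)]

# Discrete setting: the powers of 3 modulo 17 are not monotone,
# so the only obvious way to find x is to search the list.
table = powers_mod(3, 17)
x_disc = table.index(13)   # first x with 3^x = 13 (mod 17)

print("log2(1024) =", x_real)
print("powers of 3 mod 17:", table)
print("3^x = 13 (mod 17) at x =", x_disc)
```

The real case needs a single function call, while the modular case falls back on scanning a table whose length grows with $p$.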

In [None]:
# Real logarithm: smooth, predictable
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Left: continuous exponential
x_vals = [i/10 for i in range(1, 161)]
y_real = [2.0^x for x in x_vals]
ax1.plot(x_vals, y_real)
ax1.set_title('Real: $2^x$ (smooth, invertible)')
ax1.set_xlabel('x')
ax1.set_ylabel('$2^x$')

# Right: discrete exponential mod p, scrambled!
p = 23
g = Mod(5, p)
x_disc = list(range(0, p - 1))
y_disc = [int(g^x) for x in x_disc]
ax2.scatter(x_disc, y_disc, s=20, color='red')
ax2.set_title(f'Discrete: $5^x \\mod {p}$ (scrambled!)')
ax2.set_xlabel('x')
ax2.set_ylabel('$5^x \\mod p$')
ax2.set_yticks(range(0, p, 2))

plt.tight_layout()
plt.show()

Look at the right plot: the values jump around unpredictably. Given a $y$-value, you cannot "read off" the $x$ from the plot; you would have to check each point. This visual scrambling is the heart of the DLP's difficulty.

## 2. The Formal Definition

**Definition (Discrete Logarithm Problem).** Let $G$ be a cyclic group of order $n$, and let $g$ be a generator. Given $h \in G$, find an integer $x$ with $0 \le x < n$ such that
$$g^x = h.$$

We write $x = \log_g h$ and call $x$ the **discrete logarithm** of $h$ with respect to base $g$.

The most common concrete setting is $G = \mathbb{Z}/p\mathbb{Z}^*$ (the multiplicative group modulo a prime $p$), which has order $p - 1$.
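
As a quick sanity check of the definition, the sketch below (plain Python, assuming the notebook's running example $p = 23$, $g = 5$) confirms that each $h$ has exactly one exponent in $[0, n)$ and that exponents differing by a multiple of $n$ give the same element. The dictionary name `log_table` is illustrative.

```python
p = 23
g = 5
n = p - 1            # order of the multiplicative group mod p

# Map each group element h to the exponent x in [0, n) with g^x = h.
log_table = {}
for x in range(n):
    h = pow(g, x, p)
    assert h not in log_table, "g would not be a generator"
    log_table[h] = x

# Every nonzero residue appears exactly once, so log_g h is well defined.
print(sorted(log_table) == list(range(1, p)))

# Exponents that differ by a multiple of n give the same group element.
x = log_table[8]
print(pow(g, x, p) == pow(g, x + n, p) == pow(g, x + 5 * n, p))
```

This is why the definition restricts $x$ to $0 \le x < n$: without that range, the answer is only determined modulo $n$.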

---

> **Misconception alert.** "The discrete log might not exist."  
> If $g$ is a **generator** of the cyclic group $G$, then every element $h \in G$ is a power of $g$, so the discrete log *always* exists and is unique modulo $|G|$. The problem is not *existence*; it is *finding* it efficiently.

In [None]:
# Concrete example: DLP in Z/23Z*
p = 23
g = Mod(5, p)   # 5 is a primitive root mod 23 (we'll prove this in 05b)

# Build the complete "discrete log table"
print(f"Powers of g = {g} in Z/{p}Z*:")
print("x | g^x")
print("-" * 14)
for x in range(p - 1):
    print(f"{x} | {int(g^x)}")

> **Checkpoint 1.** Using the table above, find $\log_5 8 \pmod{23}$ (i.e., find $x$ such that $5^x \equiv 8 \pmod{23}$). Scan the right column for 8 and read off $x$. Could you do this if $p$ had 300 digits?

## 3. Brute-Force Search

The simplest approach: try $x = 0, 1, 2, \ldots$ until $g^x = h$. This requires at most $|G|$ multiplications.

For $|G| = p - 1 = 22$, this is instant. For $|G| \approx 2^{2048}$, the number of operations vastly exceeds the number of atoms in the observable universe ($\approx 2^{266}$).
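
To put rough numbers on this, the following back-of-the-envelope sketch estimates worst-case brute-force time. The rate of $10^9$ group multiplications per second is an assumption made purely for illustration, and `brute_force_years` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # ignoring leap years
RATE = 10**9                         # assumed: one billion multiplications per second

def brute_force_years(bits):
    """Worst-case whole years to try all 2^bits exponents at RATE operations per second."""
    # Integer arithmetic only: 2**2048 is far too large to convert to a float.
    return 2**bits // (RATE * SECONDS_PER_YEAR)

for bits in (32, 64, 128, 2048):
    years = brute_force_years(bits)
    if years == 0:
        print(bits, "bits: under a year")
    else:
        print(bits, "bits: roughly 10^" + str(len(str(years)) - 1), "years")
```

Even a much faster machine only shifts these estimates by a few orders of magnitude, which does not help when the exponent of 10 runs into the hundreds.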

In [None]:
def discrete_log_brute(g, h, group_order):
    """
    Find x such that g^x = h by trying all x in [0, group_order).
    Returns x if found, None otherwise.
    """
    power = g^0   # start at g^0 = 1
    for x in range(group_order):
        if power == h:
            return x
        power = power * g
    return None

# Small example
p = 23
g = Mod(5, p)
h = Mod(8, p)
x = discrete_log_brute(g, h, p - 1)
print(f"log_{5}({8}) mod {p} = {x}")
print(f"Verification: 5^{x} mod {p} = {int(g^x)}")

In [None]:
# Brute force works fine for small groups
p = 23
g = Mod(5, p)

print("Solving all DLPs in Z/23Z*:")
for target in range(1, p):
    h = Mod(target, p)
    x = discrete_log_brute(g, h, p - 1)
    print(f"  log_5({target}) = {x}   [check: 5^{x} = {int(g^x)}]")

## 4. The One-Way Function: Fast Forward, Slow Backward

The cryptographic magic is the **asymmetry** between exponentiation and discrete log:

| Operation | Complexity | 2048-bit prime |
|-----------|-----------|----------------|
| Compute $g^x \bmod p$ (square-and-multiply) | $O(\log x)$ multiplications | 2048 squarings plus at most 2048 further multiplications, milliseconds |
| Find $x$ from $g^x \bmod p$ (best known: number field sieve) | $O(\exp(c \cdot (\log p)^{1/3} (\log \log p)^{2/3}))$ | Sub-exponential but still infeasible |

This makes exponentiation a **one-way function** (under the DLP assumption): easy to compute, hard to invert.
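
The square-and-multiply method named in the table can be sketched as follows. This is a minimal plain-Python version written for illustration (not Sage's internal implementation); it counts squarings and multiplications separately so the $O(\log x)$ cost is visible. The modulus and exponent are arbitrary choices.

```python
def square_and_multiply(g, x, p):
    """Return (g^x mod p, squarings, multiplies) for an integer x >= 0."""
    result = 1
    base = g % p
    squarings = 0
    multiplies = 0
    while x > 0:
        if x & 1:                      # low bit set: fold the current power into the result
            result = result * base % p
            multiplies += 1
        base = base * base % p         # square to move on to the next bit
        squarings += 1
        x >>= 1
    return result, squarings, multiplies

p = 2**127 - 1                         # a Mersenne prime, used only as a convenient large modulus
x = 3**70                              # an arbitrary exponent of roughly 111 bits
value, squarings, multiplies = square_and_multiply(5, x, p)
print(value == pow(5, x, p))           # agrees with the built-in modular power
print(squarings, "squarings and", multiplies, "extra multiplications")
```

The loop runs once per bit of the exponent, so doubling the bit length of $x$ only doubles the work.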

Let us see this asymmetry in action.

In [None]:
import time

# Exponentiation: fast even for large numbers
p = next_prime(2^64)
g = Mod(primitive_root(p), p)
x_secret = randint(2, p - 2)

start = time.time()
h = g^x_secret
exp_time = time.time() - start
print(f"p has {p.ndigits()} digits")
print(f"Exponentiation: g^x mod p computed in {exp_time*1000:.3f} ms")

# Brute force: noticeably slow even for a small 8-digit prime
p_small = next_prime(10^7)
g_small = Mod(primitive_root(p_small), p_small)
x_small = randint(2, p_small - 2)
h_small = g_small^x_small

start = time.time()
x_found = discrete_log_brute(g_small, h_small, p_small - 1)
brute_time = time.time() - start
print(f"\np_small has {p_small.ndigits()} digits ({p_small - 1} elements in group)")
print(f"Brute force:    found x in {brute_time*1000:.1f} ms")
print(f"Correct? {x_found == x_small}")

> **Checkpoint 2.** If brute force on an 8-digit prime takes about 1 second, roughly how long would brute force take on a 20-digit prime? (Hint: the group order grows with $p$, and brute force is $O(p)$.)

## 5. Visualising the Scramble

Let us build a more detailed picture of *why* discrete logs are hard. We will map out which exponent maps to which group element and look for patterns.

In [None]:
# Visualise the permutation induced by x -> g^x mod p
p = 47
g = Mod(primitive_root(p), p)

exponents = list(range(p - 1))
values = [int(g^x) for x in exponents]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Plot 1: x -> g^x (the "scramble")
axes[0].scatter(exponents, values, s=10, color='blue')
axes[0].set_title(f'$g^x \\mod {p}$ (g={int(g)})')
axes[0].set_xlabel('Exponent x')
axes[0].set_ylabel('$g^x \\mod p$')

# Plot 2: histogram of values (should be uniform)
axes[1].hist(values, bins=range(1, p + 1), color='green', alpha=0.7, edgecolor='black')
axes[1].set_title('Distribution of $g^x \\mod p$')
axes[1].set_xlabel('Value')
axes[1].set_ylabel('Count')

# Plot 3: differences between consecutive powers
diffs = [(values[i+1] - values[i]) % p for i in range(len(values)-1)]
axes[2].scatter(range(len(diffs)), diffs, s=5, color='red')
axes[2].set_title('Consecutive differences $g^{x+1} - g^x$')
axes[2].set_xlabel('x')
axes[2].set_ylabel('Difference mod p')

plt.tight_layout()
plt.show()

**Observations:**
- The scatter plot looks essentially random: there is no visible pattern linking $x$ to $g^x \bmod p$.
- The histogram is perfectly uniform: every value $1, 2, \ldots, p-1$ appears exactly once (since $g$ is a generator).
- The consecutive differences are also erratic. Since $g^{x+1} - g^x \equiv (g - 1)\,g^x \pmod{p}$, they are just the same sequence multiplied by a constant, so they inherit its scrambled appearance and reveal nothing new about $x$.

This "pseudorandom" behaviour is the intuition behind the DLP's hardness: no simple pattern in the output reveals the exponent.
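
The observations can also be checked numerically rather than by eye. The sketch below is plain Python and assumes $g = 5$, which is a primitive root modulo 47; it verifies that the powers form a permutation of $1, \ldots, p-1$ and that the consecutive differences satisfy $g^{x+1} - g^x \equiv (g-1)\,g^x \pmod{p}$.

```python
p = 47
g = 5        # assumed generator: 5 is a primitive root modulo 47

values = [pow(g, x, p) for x in range(p - 1)]

# Histogram claim: each of 1, ..., p-1 occurs exactly once.
print(sorted(values) == list(range(1, p)))

# Consecutive differences equal the original sequence scaled by (g - 1),
# because g^(x+1) - g^x = (g - 1) * g^x.
diffs = [(values[i + 1] - values[i]) % p for i in range(len(values) - 1)]
scaled = [(g - 1) * v % p for v in values[:-1]]
print(diffs == scaled)
```

So the third plot carries the same information as the first one: it looks erratic for the same reason.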

## 6. SageMath's `discrete_log()`

SageMath has a built-in `discrete_log()` function that uses a combination of algorithms (baby-step giant-step, Pohlig-Hellman, index calculus) to solve DLPs much faster than brute force. We will study these algorithms individually in notebooks 05e and 05f.
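
As a preview of one of the named techniques, here is a minimal baby-step giant-step sketch in plain Python. It is an illustration under stated assumptions ($p$ prime, $g$ a generator), not the code SageMath runs internally, and the function name `bsgs` is hypothetical. The modulus $2^{31} - 1$ and base $7$ are illustrative choices.

```python
from math import isqrt

def bsgs(g, h, p):
    """Baby-step giant-step sketch: return x in [0, p-1) with g^x = h (mod p), or None.

    Assumes p is prime and g generates the multiplicative group mod p.
    """
    n = p - 1                      # group order
    m = isqrt(n) + 1               # block size, chosen so that m*m > n

    # Baby steps: remember the exponent j for each value g^j, 0 <= j < m.
    baby = {}
    cur = 1
    for j in range(m):
        baby.setdefault(cur, j)
        cur = cur * g % p

    # Giant steps: multiply h by g^(-m) repeatedly until we land in the table.
    # The inverse of g^m is computed via Fermat's little theorem (p prime).
    step = pow(pow(g, m, p), p - 2, p)
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return (i * m + baby[gamma]) % n
        gamma = gamma * step % p
    return None

p = 2**31 - 1                      # a Mersenne prime
g = 7                              # assumed primitive root modulo p
h = pow(g, 123456789, p)
x = bsgs(g, h, p)
print(x, pow(g, x, p) == h)
```

The table holds about $\sqrt{p}$ entries and the search takes about $\sqrt{p}$ steps, compared with up to $p$ steps for brute force, at the price of storing the table in memory.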

---

> **Bridge from Module 04.** In Module 04 we used `power_mod(g, x, p)` to compute $g^x \bmod p$ efficiently. Now `discrete_log()` does the *inverse*: given the result $h$, it recovers $x$. Notice the asymmetry: `power_mod` is always fast, while `discrete_log` becomes slow as $p$ grows.

In [None]:
# discrete_log(h, g) finds x such that g^x = h
p = next_prime(10^9)
g = Mod(primitive_root(p), p)
x_secret = randint(2, p - 2)
h = g^x_secret

print(f"p = {p} ({p.ndigits()} digits)")
print(f"g = {g}")
print(f"h = g^x = {h}")
print(f"(x_secret = {x_secret})")

start = time.time()
x_found = discrete_log(h, g)
elapsed = time.time() - start

print(f"\ndiscrete_log found: x = {x_found}")
print(f"Correct? {x_found == x_secret}")
print(f"Time: {elapsed*1000:.1f} ms")

In [None]:
# Scaling experiment: how does solve time grow with the group size?
print(f"{'Bits':>6} {'p digits':>10} {'Time (ms)':>12}")
for bits in [16, 20, 24, 28, 32, 36, 40]:
    p_test = next_prime(2^bits)
    g_test = Mod(primitive_root(p_test), p_test)
    x_test = randint(2, p_test - 2)
    h_test = g_test^x_test

    start = time.time()
    x_result = discrete_log(h_test, g_test)
    t = (time.time() - start) * 1000

    assert x_result == x_test, "discrete_log returned wrong answer!"
    print(f"{int(bits):>6} {int(p_test.ndigits()):>10} {t:>12.1f}")

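Notice how the time tends to grow rapidly with the number of bits. The growth is not perfectly smooth, because the running time also depends on how $p - 1$ factors (this is what Pohlig-Hellman exploits). At cryptographic sizes (2048 bits, with $p - 1$ having a large prime factor), even SageMath's sophisticated algorithms cannot solve the DLP in reasonable time.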

---

> **Crypto foreshadowing.** This one-way property is the bedrock of:
> - **Diffie-Hellman key exchange** (notebook 05c): Alice publishes $g^a$, Bob publishes $g^b$; neither can recover the other's secret exponent, yet both can compute $g^{ab}$.
> - **ElGamal encryption**: the public key is $h = g^x$; encrypting is easy, but decrypting without $x$ is believed to be infeasible, and anyone who could solve the DLP could recover $x$ and decrypt.
> - **DSA / Schnorr signatures** (Module 09): signing uses the secret exponent, verification uses only the public power.
> - **Elliptic curve cryptography** (Module 06): the same DLP idea, but in elliptic curve groups where the problem is even harder per bit.
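
The Diffie-Hellman item in the list above can be illustrated with a toy sketch in plain Python. The parameters are far too small to be secure, and the secret exponents `a` and `b` are fixed arbitrary values chosen only so the example is reproducible.

```python
p, g = 23, 5          # toy public parameters (the notebook's running example)
a, b = 6, 15          # secret exponents, fixed here for reproducibility

A = pow(g, a, p)      # Alice publishes A = g^a
B = pow(g, b, p)      # Bob publishes B = g^b

shared_alice = pow(B, a, p)   # Alice computes (g^b)^a
shared_bob = pow(A, b, p)     # Bob computes (g^a)^b
print(A, B, shared_alice, shared_bob)
```

Both parties arrive at the same value $g^{ab}$, while an eavesdropper sees only $g$, $p$, $g^a$ and $g^b$.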

## 7. Exercises

### Exercise 1 (Worked): Manual DLP in a Small Group

**Problem.** In $\mathbb{Z}/11\mathbb{Z}^*$ with generator $g = 2$, find $\log_2 7$ (i.e., find $x$ such that $2^x \equiv 7 \pmod{11}$).

**Solution.** Compute successive powers of 2 modulo 11:

| $x$ | $2^x \bmod 11$ |
|-----|----------------|
| 0 | 1 |
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 5 |
| 5 | 10 |
| 6 | 9 |
| 7 | 7 |

So $\log_2 7 = 7$ in $\mathbb{Z}/11\mathbb{Z}^*$.

In [None]:
# Exercise 1, verification
p = 11
g = Mod(2, p)
h = Mod(7, p)
x = discrete_log_brute(g, h, p - 1)
print(f"log_2(7) mod 11 = {x}")
print(f"Check: 2^{x} mod 11 = {int(g^x)}")

### Exercise 2 (Guided): Timing the One-Way Function

**Problem.** For a 50-bit prime $p$:
1. Compute $h = g^x \bmod p$ for a random $x$, and time it.
2. Recover $x$ using `discrete_log()`, and time it.
3. Compute the ratio: how many times slower is the "backward" direction?

*Hint: Use `next_prime(2^50)` for $p$ and `time.time()` for timing.*

In [None]:
# Exercise 2, fill in the TODOs
import time

p = next_prime(2^50)
g = Mod(primitive_root(p), p)
x_secret = randint(2, p - 2)

# TODO 1: Time the forward direction (exponentiation)
# start = time.time()
# h = ???
# forward_time = time.time() - start

# TODO 2: Time the backward direction (discrete log)
# start = time.time()
# x_found = ???
# backward_time = time.time() - start

# TODO 3: Print results and compute the ratio
# print(f"Forward:  {forward_time*1000:.3f} ms")
# print(f"Backward: {backward_time*1000:.1f} ms")
# print(f"Ratio:    {backward_time/forward_time:.0f}x slower")

### Exercise 3 (Independent): DLP Existence and Uniqueness

**Problem.**
1. In $\mathbb{Z}/13\mathbb{Z}^*$, let $g = 2$. Build the complete discrete log table (i.e., for each $h \in \{1, 2, \ldots, 12\}$, find $\log_2 h$). Verify that every element $h$ has a unique discrete log in $\{0, 1, \ldots, 11\}$.
2. Now try $g = 3$ in $\mathbb{Z}/13\mathbb{Z}^*$. Is $g = 3$ a generator? Build its power table and explain what you observe.
3. What happens if you try to compute $\log_3 2 \pmod{13}$ using brute force with only the powers of 3? Does it exist?

In [None]:
# Exercise 3, write your solution here


## Summary

| Concept | Key Fact |
|---------|----------|
| **Discrete log problem** | Given $g, h$ in a cyclic group, find $x$ such that $g^x = h$ |
| **Existence** | If $g$ generates the group, $\log_g h$ always exists and is unique mod $|G|$ |
| **Brute force** | Try all $x$: $O(|G|)$ time, infeasible for large groups |
| **One-way asymmetry** | Exponentiation takes $O(\log x)$ multiplications; discrete log is believed hard (sub-exponential best known for $\mathbb{Z}/p\mathbb{Z}^*$) |
| **Cryptographic use** | DLP hardness underlies DH, ElGamal, DSA, and ECC |

The DLP is only hard when the group is chosen carefully. In the next notebook, we will study **primitive roots and generators**: the elements whose powers run through all of $\mathbb{Z}/p\mathbb{Z}^*$ and give the DLP its full strength.

---

**Next:** [05b. Primitive Roots and Generators](05b-primitive-roots-generators.ipynb)