### Maximum likelihood decoder

When a codeword is sent through a physical channel, some bits may be corrupted. As a result, the received message might differ from the original codeword. Decoding aims to recover the original codeword despite these errors. Without knowing the exact errors that occurred, we cannot be certain that decoding will find the intended message. So instead we try to find the codeword that is most likely to have been sent. More precisely, let $\mathcal{C}$ be a code of length $n \in \mathbb{Z}_{>0}$ and suppose we receive a message $m \in \{0,1\}^n$. Then we wish to find a codeword $\hat{c} \in \mathcal{C}$ such that 
\begin{equation*}
    \mathbb{P}(\text{ Sent } \hat{c} \mid \text{ Received } m) = \mathrm{max}_{c \in \mathcal{C}} \{\mathbb{P}(\text{ Sent } c \mid \text{ Received } m)\}.
\end{equation*}

We denote $\mathbb{P}(\text{ Sent } c \mid \text{ Received } m)$ by  $\mathbb{P}(c \mid m)$. 
Applying Bayes' theorem to $\mathbb{P}(c \mid m)$ gives
$$\mathbb{P}(c \mid m) = \frac{\mathbb{P}(m \mid c)\cdot \mathbb{P}(c)}{\mathbb{P}(m)}.$$
Notice that $\mathbb{P}(m)$ does not depend on the specific codeword $c$, and so, to maximise $\mathbb{P}(c \mid m)$ over the code $\mathcal{C}$ it suffices to maximise $$\mathbb{P}(m \mid c)\cdot \mathbb{P}(c)$$ over $\mathcal{C}$. To do this, we make three assumptions:

1. **Uniform codeword distribution:** Each codeword is equally likely to be sent. That is, for all codewords $c \in \mathcal{C}$ we have that $$\mathbb{P}(c) = \frac{1}{|\mathcal{C}|}.$$ Thus, maximising $\mathbb{P}(m \mid c)\cdot \mathbb{P}(c)$ over $\mathcal{C}$ is equivalent to maximising $\mathbb{P}(m \mid c)$ over $\mathcal{C}$.

2. **All errors are bit-flips:** A codeword $c \in \mathcal{C}$ produces a message $m$ that is a binary string of the same length. 

3. **Bit-flips are independent:** Each bit flips independently of others. That is, for a message $m=m_1m_2\dots m_n$ and a codeword $c=c_1c_2\dots c_n$, we have that
    $$\mathbb{P}(m \mid c) = \prod_{i=1}^n \mathbb{P}(m_i \mid c_i).$$
    In particular, since both $m_i$ and $c_i$ are bits, we have that:
    $$ \mathbb{P}(m_i \mid c_i) = \begin{cases} \mathbb{P}(c_i \text{ not flipped}) &\text{ if } m_i=c_i \\ \mathbb{P}(c_i \text{ flipped to } m_i) &\text{ if } m_i \neq c_i. \end{cases} $$

Therefore, to compute $\mathbb{P}(m \mid c)$ for any codeword $c$ and message $m$ of length $n$, it suffices to know, for each $1 \leq i \leq n$, the probabilities
$$ \mathbb{P}(c_i \text{ flips } 0 \to 1) \; \text{ and } \; \mathbb{P}(c_i \text{ flips } 1 \to 0). $$
A _binary symmetric channel_ is a channel where these two probabilities coincide, that is, a bit flips from $0$ to $1$ with the same probability as it flips from $1$ to $0$.
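
The product formula from assumption 3 can be sketched in a few lines. This is only an illustration: the function name `likelihood` and the parameters `p01` and `p10` (per-position probabilities of a $0 \to 1$ and a $1 \to 0$ flip) are hypothetical and are not part of the `linearcodes` module used later in this notebook.

```python
from math import prod

def likelihood(message, codeword, p01, p10):
    """P(m | c) under independent bit flips.

    p01[i] and p10[i] are assumed to be the probabilities that bit i
    flips 0 -> 1 and 1 -> 0 respectively.
    """
    if not (len(message) == len(codeword) == len(p01) == len(p10)):
        raise ValueError("All arguments must have the same length.")
    factors = []
    for m_i, c_i, a, b in zip(message, codeword, p01, p10):
        flip = a if c_i == 0 else b  # flip probability depends on the sent bit
        factors.append(flip if m_i != c_i else 1 - flip)
    return prod(factors)

# Binary symmetric channel: both directions share the same probability.
p = [0.25, 0.25, 0.25]
print(likelihood([1, 0, 0], [0, 0, 0], p, p))
```

Passing the same list for `p01` and `p10` models a binary symmetric channel; passing different lists models an asymmetric one.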

---
##### Example.

Let us consider the code $\mathcal{C} = \{000, 111\}$. Suppose that we use a binary symmetric channel and each bit has a probability $p=\frac{1}{4}$ of flipping regardless of position. Suppose that we receive the message $m=100$ and we wish to compute the most likely codeword it came from. Since maximising $\mathbb{P}(c \mid m)$ over $\mathcal{C}$ is equivalent to maximising $\mathbb{P}(m \mid c)$ over $\mathcal{C}$, it suffices to compute $\mathbb{P}(m \mid c)$ for each codeword $c \in \mathcal{C}$. That is, for $c=000$ and $c=111$.

First suppose the codeword $c$ was $000$. Then receiving the message $m=100$ means the first bit flipped and the other two did not. (Each bit passes through the channel once, so it is either flipped or left unchanged.) In particular, we have
\begin{align*}
    \mathbb{P}(m=100 \mid c=000) &= \mathbb{P}(m_1=1 \mid c_1=0) \cdot \mathbb{P}(m_2=0 \mid c_2=0) \cdot \mathbb{P}(m_3=0 \mid c_3=0) \\ &= \mathbb{P}(c_1 \text{ did flip}) \cdot \mathbb{P}(c_2 \text{ did not flip}) \cdot \mathbb{P}(c_3 \text{ did not flip}) \\ &= \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} = \frac{9}{64}. 
\end{align*}
Now suppose the codeword $c$ was $111$. Then receiving the message $m=100$ means the first bit did not flip and the other two did, so we have
\begin{equation*}
    \mathbb{P}(m=100 \mid c=111) = \frac{3}{4} \cdot \frac{1}{4} \cdot \frac{1}{4} = \frac{3}{64}. 
\end{equation*}
Therefore, as $\mathbb{P}(m=100 \mid c=000)$ is larger than $\mathbb{P}(m=100 \mid c=111)$ it is most likely that the message $m=100$ came from the codeword $c=000$.
<!-- 
Therefore, we have that
\begin{align*}
    \sum_{b \in \mathcal{C}} \mathbb{P}(m \mid b) &= P(m \mid c=000) + P(m \mid c=111) 
    \\
    &= \frac{9}{64} + \frac{3}{64} = \frac{3}{16}.
\end{align*}
Thus, substituting these probabilities in, it follows that
$$\mathbb{P}(c=000 \mid m) = \frac{1}{|\mathcal{C}|} \cdot \frac{\mathbb{P}(m \mid c=000)}{\sum_{d \in \mathcal{C}} \mathbb{P}(m \mid d)} =  \frac{1}{1}\cdot \frac{\frac{9}{64}}{\frac{3}{16}} = \frac{3}{4},$$
and
$$\mathbb{P}(c=111 \mid m) = \frac{1}{|\mathcal{C}|} \cdot \frac{\mathbb{P}(m \mid c=111)}{\sum_{d \in \mathcal{C}} \mathbb{P}(m \mid d)} =  \frac{1}{1}\cdot \frac{\frac{3}{64}}{\frac{3}{16}} = \frac{1}{4}.$$
Therefore, if $m=100$ is the transmitted message, then the most likely codeword that was sent is $c=000$ with a probability of $\frac{3}{4}$. -->

We can repeat this computation to find the most likely codeword for each of the possible messages $m \in \{0,1\}^3$.

<!-- | Message $m$   | $\mathbb{P}(m \mid c=000)$ | $\mathbb{P}(m \mid c=111)$ | $\sum_{b \in \mathcal{C}} \mathbb{P}(m \mid b)$ | $\mathbb{P}(c=000 \mid m)$ | $\mathbb{P}(c=111 \mid m)$ |
| ---                    | ---                    | ---                    | ---             |---                    | ---                    |
| 000                    | $\frac{27}{64}$        | $\frac{1}{64}$         | $\frac{7}{16}$  | $\frac{27}{28}$        | $\frac{1}{28}$         |
| 100, 010, 001          | $\frac{9}{64}$         | $\frac{3}{64}$         | $\frac{3}{16}$  | $\frac{3}{4}$         | $\frac{1}{4}$         |
| 110, 101, 011          | $\frac{3}{64}$         | $\frac{9}{64}$         | $\frac{3}{16}$  | $\frac{1}{4}$         | $\frac{3}{4}$         |
| 111                    | $\frac{1}{64}$         | $\frac{27}{64}$        | $\frac{7}{16}$  | $\frac{1}{28}$         | $\frac{27}{28}$        |  -->

| Message $m$   | $\mathbb{P}(m \mid c=000)$ | $\mathbb{P}(m \mid c=111)$  | Most likely codeword $c$ |
| ---                    | ---                    | ---                    | ---             |
| 000                    | $\frac{27}{64}$        | $\frac{1}{64}$         | 000 <!-- with probability $\frac{27}{28}$ --> |
| 100, 010, 001          | $\frac{9}{64}$         | $\frac{3}{64}$         | 000 <!-- with probability $\frac{3}{4}$ --> |
| 110, 101, 011          | $\frac{3}{64}$         | $\frac{9}{64}$         | 111 <!-- with probability $\frac{3}{4}$ --> |
| 111                    | $\frac{1}{64}$         | $\frac{27}{64}$        | 111 <!-- with probability $\frac{27}{28}$ --> |
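
As a check, the table can be reproduced with exact rational arithmetic. The sketch below is illustrative only: the helper `bsc_likelihood` is a hypothetical name, and the last printed column is the posterior $\mathbb{P}(c \mid m)$ obtained from Bayes' theorem under the uniform codeword assumption.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 4)  # flip probability from the example
code = ["000", "111"]

def bsc_likelihood(message, codeword, p):
    """P(m | c) for a binary symmetric channel with flip probability p."""
    result = Fraction(1)
    for m_i, c_i in zip(message, codeword):
        result *= p if m_i != c_i else 1 - p
    return result

table = {}
for bits in product("01", repeat=3):
    m = "".join(bits)
    likelihoods = {c: bsc_likelihood(m, c, p) for c in code}
    best = max(likelihoods, key=likelihoods.get)
    # Uniform prior: posterior = likelihood / sum of likelihoods over the code.
    posterior = likelihoods[best] / sum(likelihoods.values())
    table[m] = (best, posterior)
    print(m, likelihoods["000"], likelihoods["111"], best, posterior)
```

For $m=100$ this prints the likelihoods $\frac{9}{64}$ and $\frac{3}{64}$ computed above, and the decision $000$.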


---


In [None]:
from linearcodes import Codeword, LinearCode, HammingCode
import random
import numpy as np
import time

def send_codeword(
        code_length: int,
        channel_probabilities: list[float],
        codeword: Codeword):
    if len(channel_probabilities) != code_length:
        raise ValueError(f"List of probabilities must be of length code_length={code_length}. Got {len(channel_probabilities)}.")
    if len(codeword) != code_length:
        raise ValueError(f"Codeword must be of length code_length={code_length}. Got {len(codeword)}.")
    message = []
    for bit_idx in range(code_length):
        p = channel_probabilities[bit_idx]
        c = codeword.bits[bit_idx]
        r = np.random.uniform(0,1)
        if r < p:
            flip_bit = (c+1)%2
            message.append(flip_bit)
        else:
            message.append(c)
    return message

def maximum_likelihood_decoder(channel_probabilities: list[float],
                               code: LinearCode, 
                               message: list[int]) -> Codeword:
    """
    Returns the codeword that was most likely to be sent given message is received.
    Args:
        - channel_probabilities (list[float]):
            entry with index i is the probability bit i will flip.
        - code (LinearCode):
            code used for encoding.
        - message (list[int]):
            received message.
    """
    n = code.length
    if len(channel_probabilities) != n:
        raise ValueError(f"List of probabilities must be of length {n}. Got {len(channel_probabilities)}.")
    if len(message) != n:
        raise ValueError(f"Message must be of length {n}. Got {len(message)}.")
    codewords = code.codewords
    most_likely_codeword = Codeword([0]*n)
    most_likely_codeword_prob = 0
    for c in codewords:       
        c_probability = 1
        for bit_idx in range(n):
            p = channel_probabilities[bit_idx]
            if c.bits[bit_idx] != message[bit_idx]:
                c_probability *= p
            else:
                c_probability *= (1-p)
        if c_probability > most_likely_codeword_prob:
            most_likely_codeword = c
            most_likely_codeword_prob = c_probability
    return most_likely_codeword

code = HammingCode(3)
code_length = code.length
codewords = code.codewords
channel_probabilities = [float(0.1)]*code_length
results: dict[int, tuple[bool, float]] = {}

num_trials = 10
for trial in range(num_trials):
    print(f"\n=== Maximum likelihood decoder {trial+1} ===")
    codeword = random.choice(codewords)

    message = send_codeword(code_length, channel_probabilities, codeword)
    start_time = time.time()
    most_likely_codeword = maximum_likelihood_decoder(channel_probabilities, code, message)
    decode_time = time.time() - start_time

    # print(f"\nRows are: sent codeword c, received message m, most likely sent codeword c'.")
    data = np.array([codeword.bits, message, most_likely_codeword.bits])
    # print(data)

    decode_correct = codeword == most_likely_codeword
    # print(f"\nDecoding correct? {decode_correct}")
    # print(f"Actual number of errors: {sum(1 for bit_idx in range(code_length) if codeword.vector[bit_idx] != message[bit_idx])}")
    # print(f"Guessed number of errors: {sum(1 for bit_idx in range(code_length) if most_likely_codeword.vector[bit_idx] != message[bit_idx])}")

    results[trial] = (decode_correct, decode_time)
    print(f"Decode correct? {decode_correct}")
    # decode_time is in seconds; scaling by 10^5 expresses it in units of 10^-5 s.
    print(f"Time to decode = {decode_time*(10**5):.3f}x10^-5 s")

#### Hamming distance

We can simplify the Maximum Likelihood Decoder by using the Hamming distance. Recall that maximising $\mathbb{P}(c \mid m)$ is equivalent to maximising $\mathbb{P}(m \mid c) = \prod_{i=1}^n \mathbb{P}(m_i \mid c_i)$. Let us assume we have a binary symmetric channel in which each bit flips with probability $p$. Therefore, we have that
$$
\mathbb{P}(m_i \mid c_i) =
\begin{cases}
p & \text{if } m_i \neq c_i, \\
1-p & \text{if } m_i = c_i.
\end{cases}
$$
Notice that the probabilities depend only on whether the bits are the same or different. In particular, the power of $p$ in the product $\prod_{i=1}^n \mathbb{P}(m_i \mid c_i)$ is precisely the number of bits in $m$ that differ from the corresponding bits in $c$, that is, the Hamming distance $d(m,c)$. Therefore, we have that 
$$\mathbb{P}(m \mid c) = \prod_{i=1}^n \mathbb{P}(m_i \mid c_i) = p^{d(m,c)}(1-p)^{n-d(m,c)}.$$
Rearranging this expression we obtain
$$\mathbb{P}(m \mid c) = \left( \frac{p}{1-p}\right)^{d(m,c)}  \cdot (1-p)^{n}.$$
Note that $(1-p)^{n}$ depends neither on the codeword $c$ nor the message $m$. Thus, maximising $\mathbb{P}(m \mid c)$ over $\mathcal{C}$ is equivalent to maximising
$$\left( \frac{p}{1-p}\right)^{d(m,c)}$$
over $\mathcal{C}$.
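
The rearrangement can be checked numerically. The following sketch uses exact fractions with the illustrative values $p=\frac{1}{4}$ and $n=3$ from the earlier example; the variable names are hypothetical.

```python
from fractions import Fraction

p = Fraction(1, 4)
n = 3
rows = []
for d in range(n + 1):
    direct = p**d * (1 - p)**(n - d)            # p^d (1-p)^(n-d)
    rearranged = (p / (1 - p))**d * (1 - p)**n  # (p/(1-p))^d (1-p)^n
    assert direct == rearranged
    rows.append((d, direct, (p / (1 - p))**d))
    print(d, direct, (p / (1 - p))**d)
```

The last column is the only factor that depends on the codeword, and for these values it shrinks as $d$ grows.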

By convention we assume that $p < \frac{1}{2}$. If instead $p > \frac{1}{2}$, then each received bit is more likely to be wrong than right, so we can flip every bit of the received message and treat the channel as a binary symmetric channel with flip probability $1-p < \frac{1}{2}$. (If $p = \frac{1}{2}$, the received message carries no information about which codeword was sent.) With $p < \frac{1}{2}$, the value $\frac{p}{1-p}$ is less than $1$ and so to maximise $$\left( \frac{p}{1-p}\right)^{d(m,c)}$$ we must minimise $d(m,c)$. In particular, we have that the codeword $\hat{c}$ that maximises $\mathbb{P}(c \mid m)$ is precisely the codeword that minimises $d(m,c)$.
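
The case $p > \frac{1}{2}$ can be illustrated with a small sketch. It relies on the fact that flipping every bit of $m$ turns a distance $d(m,c)$ into $n - d(m,c)$. The function names below are hypothetical, and codewords are represented as strings purely for brevity.

```python
def hamming_distance(x, y):
    """Number of positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def ml_decode_bsc(message, code, p):
    """Brute-force maximum likelihood decoding over a BSC with flip probability p."""
    n = len(message)

    def likelihood(c):
        d = hamming_distance(message, c)
        return p**d * (1 - p)**(n - d)

    return max(code, key=likelihood)

def nearest_codeword(message, code):
    """Codeword at minimum Hamming distance from the message."""
    return min(code, key=lambda c: hamming_distance(message, c))

code = ["000", "111"]
m = "100"
flipped = "".join("1" if b == "0" else "0" for b in m)

# p < 1/2: the likeliest codeword is the nearest one.
print(ml_decode_bsc(m, code, 0.25), nearest_codeword(m, code))
# p > 1/2: the likeliest codeword is the one nearest to the flipped message.
print(ml_decode_bsc(m, code, 0.75), nearest_codeword(flipped, code))
```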

---
##### Example.

Let us consider the example $\mathcal{C} = \{000,111\}$ again. Suppose we receive the message $m=100$. We know from previous computations that the most likely codeword that was sent is $c=000$. We can recompute this using the Hamming distance. In particular, we have that
$$ d(m=100,c=000) = 1 \; \text{ and } \; d(m=100,c=111) = 2 $$
and so $c=000$ minimises the Hamming distance.

As before, we can compute the Hamming distance and most likely codeword for all possible messages $m \in \{0,1\}^3$.

| Message $m$   | $d(m, c=000)$ | $d(m, c=111)$ | Most likely codeword $c$ |
| ---                    | ---                    | ---                    | ---             |
| 000                    | 0        | 3         | 000  |
| 100, 010, 001          | 1         | 2         | 000  |
| 110, 101, 011          | 2         | 1         | 111  |
| 111                    | 3         | 0        | 111  |
---
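
The agreement between the two decoding rules can also be checked by brute force. The sketch below compares the nearest codeword with the likeliest codeword for every message in $\{0,1\}^3$ and a few illustrative values of $p < \frac{1}{2}$; the helper names are hypothetical.

```python
from itertools import product

def hamming_distance(x, y):
    """Number of positions at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("Both x and y must be the same length.")
    return sum(a != b for a, b in zip(x, y))

def bsc_likelihood(message, codeword, p):
    """P(m | c) = p^d (1-p)^(n-d) for a binary symmetric channel."""
    d = hamming_distance(message, codeword)
    return p**d * (1 - p)**(len(message) - d)

code = ["000", "111"]
agreement = True
for bits in product("01", repeat=3):
    m = "".join(bits)
    nearest = min(code, key=lambda c: hamming_distance(m, c))
    for p in (0.05, 0.25, 0.45):
        likeliest = max(code, key=lambda c: bsc_likelihood(m, c, p))
        agreement = agreement and (nearest == likeliest)
    print(m, [hamming_distance(m, c) for c in code], nearest)
print(agreement)
```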

<!-- 

Moreover, since the log function is monotonically increasing maximising $\mathbb{P}(m \mid c)$ is equivalent to maximising $\mathrm{log}(\mathbb{P}(m \mid c)) $. (TODO: make this a more natural jump.) By applying the log function we obtain
$$ \mathrm{log}(\mathbb{P}(m \mid c)) = \mathrm{log}(\prod_i \mathbb{P}(m_i \mid c_i)) = \sum_i \mathrm{log}(\mathbb{P}(m_i \mid c_i)). $$
Now recall that for a binary symmetric channel we assume that 
In particular, it depends only on when bits are the same and when they are not, similar to the Hamming distance. Thereofore, by using the Hamming distance between two codewords we obtain
$$ \sum_i \mathrm{log}(\mathbb{P}(m_i \mid c_i)) = d(c,m)\mathrm{log}(p) + (1-d(c,m))\mathrm{log}(1-p). $$
Simplifying this we obtain
$$ \sum_i \mathrm{log}(\mathbb{P}(m_i \mid c_i)) =  d(c,m)\mathrm{log}(\frac{p}{1-p}) + \mathrm{log}(1-p). $$
Notice that $\mathrm{log}(1-p)$ does not depend on $c$, so maximising $\mathbb{P}(c \mid m)$ is equivalent to maximising $$d(c,m)\mathrm{log}(\frac{p}{1-p})$$. Since we assume that $p < 0.5$, we have that $\mathrm{log}(\frac{p}{1-p})$ is negative. Thus, maximising over $d(c,m)\mathrm{log}(\frac{p}{1-p})$ is equivalent to minimsiing over $d(c,m)$. Therefore, we have that
$$ \mathrm{max}_{c \in \mathcal{C}} (\mathbb{P}(c \mid m)) = \mathrm{min}_{c \in \mathcal{C}} (d(c,m)).$$

#### Example

Take the previous example with the code $\mathcal{C} = \{000,111\}$. We can easily compute the Hamming distances between any message $m$ and codeword $c$ to obtain the following table:

| Message $m$   | $d(m, c=000)$ | $d(m, c=111)$ | Closest = most likely codeword |
| ---                    | ---                    | ---                    | ---             |
| 000                    | 0        | 3         | 000  |
| 100, 010, 001          | 1         | 2         | 000  |
| 110, 101, 011          | 2         | 1         | 111  |
| 111                    | 3         | 0        | 111  | 

---
-->
The Hamming distance makes it easier to find the most likely codeword. However, for a code of dimension $k$, it still requires computing and comparing $2^k$ distances, one for each codeword. Therefore, as the code gets bigger, the Maximum Likelihood Decoder becomes computationally impractical. We need more efficient algorithms that find, or estimate, the most likely codeword without computing the Hamming distance to every codeword.
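
To see the growth concretely, the sketch below enumerates all codewords spanned by a generator matrix over $\mathbb{Z}/2\mathbb{Z}$. The generator matrices are hypothetical (an identity block followed by a single parity column, chosen only so that the rows are linearly independent), and the function does not use the `linearcodes` module.

```python
from itertools import product

def codewords_from_generator(G):
    """Enumerate the codewords spanned by the k rows of G over GF(2)."""
    k, n = len(G), len(G[0])
    words = set()
    for coeffs in product((0, 1), repeat=k):
        word = tuple(
            sum(a * row[j] for a, row in zip(coeffs, G)) % 2 for j in range(n)
        )
        words.add(word)
    return words

counts = {}
for k in (2, 4, 8, 12):
    # Hypothetical generator matrix [I_k | 1]: k independent rows of length k + 1.
    G = [[1 if i == j else 0 for j in range(k)] + [1] for i in range(k)]
    counts[k] = len(codewords_from_generator(G))
    print(k, counts[k])
```

A brute-force decoder has to visit every one of these codewords for each received message, so its running time doubles with each additional dimension.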

In [None]:
from linearcodes import Codeword, LinearCode, HammingCode
import random
import numpy as np
import time

def send_codeword(
        code_length: int,
        channel_probabilities: list[float],
        codeword: Codeword):
    if len(channel_probabilities) != code_length:
        raise ValueError(f"List of probabilities must be of length code_length={code_length}. Got {len(channel_probabilities)}.")
    if len(codeword) != code_length:
        raise ValueError(f"Codeword must be of length code_length={code_length}. Got {len(codeword)}.")
    message = []
    for bit_idx in range(code_length):
        p = channel_probabilities[bit_idx]
        c = codeword.bits[bit_idx]
        r = np.random.uniform(0,1)
        if r < p:
            flip_bit = (c+1)%2
            message.append(flip_bit)
        else:
            message.append(c)
    return message

def hamming_distance(x: list[int], y: list[int]) -> int:
    if len(x) != len(y):
        raise ValueError("Both x and y must be the same length.")
    # Count the positions at which the two bit lists differ.
    return sum(a != b for a, b in zip(x, y))


def minimise_hamming_distance(code_length: int,
                              codewords: list[Codeword],
                              message: list[int]) -> Codeword:
    """
    Returns the codeword closest to the received message in Hamming distance.
    Over a binary symmetric channel with flip probability p < 1/2, this is the
    codeword most likely to have been sent.
    Args:
        - code_length (int):
            length n of the code.
        - codewords (list[Codeword]):
            all codewords of the code used for encoding.
        - message (list[int]):
            received message.
    """
    if len(message) != code_length:
        raise ValueError(f"Message must be of length {code_length}. Got {len(message)}.")
    most_likely_codeword = Codeword([0]*code_length)
    min_hamming_dist = code_length
    for c in codewords:       
        hamming_dist = hamming_distance(c.bits, message)
        if hamming_dist < min_hamming_dist:
            most_likely_codeword = c
            min_hamming_dist = hamming_dist
    return most_likely_codeword

code = HammingCode(3)
code_length = code.length
codewords = code.codewords
channel_probabilities = [float(0.1)]*code_length
results: dict[int, tuple[bool, float]] = {}

num_trials = 10
for trial in range(num_trials):
    print(f"\n=== Minimise Hamming distance decoder {trial+1} ===")
    codeword = random.choice(codewords)

    message = send_codeword(code_length, channel_probabilities, codeword)
    start_time = time.time()
    most_likely_codeword = minimise_hamming_distance(code_length, codewords, message)
    decode_time = time.time() - start_time

    print(f"\nRows are: sent codeword c, received message m, most likely sent codeword c'.")
    data = np.array([codeword.bits, message, most_likely_codeword.bits])
    print(data)

    decode_correct = codeword == most_likely_codeword
    print(f"\nDecoding correct? {decode_correct}")
    print(f"Actual number of errors: {sum(1 for bit_idx in range(code_length) if codeword.bits[bit_idx] != message[bit_idx])}")
    print(f"Guessed number of errors: {sum(1 for bit_idx in range(code_length) if most_likely_codeword.bits[bit_idx] != message[bit_idx])}")

    results[trial] = (decode_correct, decode_time)
    print(f"\nDecode correct? {decode_correct}")
    # decode_time is in seconds; scaling by 10^5 expresses it in units of 10^-5 s.
    print(f"Time to decode = {decode_time*(10**5):.3f}x10^-5 s")


=== Minimise Hamming distance decoder 1 ===

Rows are: sent codeword c, received message m, most likely sent codeword c'.
[[1 0 1 0 0 1 0]
 [1 1 1 0 0 1 0]
 [1 0 1 0 0 1 0]]

Decoding correct? True
Actual number of errors: 1
Guessed number of errors: 1

Decode correct? True
Time to decode = 12.827x10^5

=== Minimise Hamming distance decoder 2 ===

Rows are: sent codeword c, received message m, most likely sent codeword c'.
[[0 0 0 0 1 1 1]
 [0 0 0 0 1 1 1]
 [0 0 0 0 1 1 1]]

Decoding correct? True
Actual number of errors: 0
Guessed number of errors: 0

Decode correct? True
Time to decode = 2.313x10^5

=== Minimise Hamming distance decoder 3 ===

Rows are: sent codeword c, received message m, most likely sent codeword c'.
[[1 0 1 0 1 0 1]
 [1 0 1 0 1 1 1]
 [1 0 1 0 1 0 1]]

Decoding correct? True
Actual number of errors: 1
Guessed number of errors: 1

Decode correct? True
Time to decode = 2.170x10^5

=== Minimise Hamming distance decoder 4 ===

Rows are: sent codeword c, received messa