# **Error Correcting Codes**

There are two main kinds of errors that can occur:
1) Erasure Errors
2) Corruption Errors

### **Erasure Errors**


Erasure errors occur during the transmission of data over unreliable channels such as the Internet, where packets of data can be lost. 
* To address this, we use a method of encoding the data which allows the reconstruction of the entire message even if some packets are lost.

1. **Assumptions**:
   - Each packet's content is treated as a number modulo $q$, where $q$ is a prime number, so that all calculations are performed in $GF(q)$, the finite field of $q$ elements.
   - The message consists of $n$ packets, and we can tolerate the loss of up to $k$ packets.
   - We need $q > n + k$ so that the evaluation points $1, 2, \dots, n+k$ are distinct modulo $q$.

2. **Encoding the Message**:
   - The unique polynomial $P(x)$ of degree at most $n-1$ is constructed such that $P(i) = m_i$ for $i = 1, \dots, n$, where $m_i$ is the content of the $i$-th packet.
   - The polynomial is evaluated at the additional points $x = n+1, \dots, n+k$ to create $n+k$ packets in total. This redundancy allows the message to be recovered despite the loss of any $k$ packets.

3. **Decoding the Message**:
   - Using the received packets, one can reconstruct $P(x)$ using Lagrange interpolation from any $n$ out of the $n+k$ transmitted packets.
   - This reconstructed polynomial is then evaluated at $x = 1, 2, \dots, n$ to retrieve the original message packets $m_1, m_2, \dots, m_n$.

#### Mathematical Basis:
- **Polynomial Construction**: The polynomial $P(x)$ encodes the packets such that $P(i) = m_i$ for $i = 1, 2, \dots, n$.
- **Error Correction Capability**: A polynomial of degree at most $n-1$ is uniquely determined by its values at any $n$ distinct points, so the receiver can reconstruct $P(x)$, and hence the message, from any $n$ of the $n+k$ packets. The scheme therefore tolerates up to $k$ packet losses.
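
The encoding and decoding steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production codec: the function names `lagrange_eval`, `encode`, and `decode` are hypothetical, packets are plain integers, and $q$ is assumed to be a prime larger than $n + k$. Inverses in $GF(q)$ are computed with Fermat's little theorem.

```python
def lagrange_eval(points, x, q):
    """Evaluate at x the unique polynomial through `points` (dict x_j -> y_j) over GF(q)."""
    total = 0
    for xj, yj in points.items():
        num, den = 1, 1
        for xm in points:
            if xm != xj:
                num = num * (x - xm) % q
                den = den * (xj - xm) % q
        # Division in GF(q) is multiplication by the inverse; for prime q,
        # Fermat's little theorem gives den^(q-2) as the inverse of den.
        total = (total + yj * num * pow(den, q - 2, q)) % q
    return total


def encode(message, k, q):
    """Turn n message packets into the n + k packets P(1), ..., P(n + k)."""
    n = len(message)
    points = {i: m % q for i, m in enumerate(message, start=1)}
    return [lagrange_eval(points, x, q) for x in range(1, n + k + 1)]


def decode(received, n, q):
    """Recover m_1..m_n from a dict {position: value} of at least n surviving packets."""
    points = dict(list(received.items())[:n])  # any n packets suffice
    return [lagrange_eval(points, x, q) for x in range(1, n + 1)]


# Hypothetical message of n = 3 packets, protected against k = 2 losses.
packets = encode([2, 4, 0], k=2, q=7)
survivors = {i: p for i, p in enumerate(packets, start=1) if i not in (1, 4)}
print(decode(survivors, n=3, q=7))  # [2, 4, 0]
```

Evaluating the interpolating polynomial directly at the needed points avoids ever computing its coefficients, which is enough for both encoding and decoding.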

<br>

### **Example: Erasure Errors in Error-Correcting Codes**

Alice wants to send Bob a message of $n = 4$ packets over an unreliable channel that can lose packets. To guard against the loss of up to $k = 2$ packets, they use a polynomial-based encoding scheme.

#### Message Encoding:
- **Message**: Alice's message consists of packets $m_1 = 3$, $m_2 = 1$, $m_3 = 5$, $m_4 = 0$.
- **Polynomial**: The unique polynomial of degree $n-1 = 3$ with $P(i) = m_i$ is $P(x) = x^3 + 4x^2 + 5$ over $GF(7)$ (the finite field of size 7).
- **Encoding**: Evaluate $P(x)$ at points $x = 1$ to $x = 6$ to create $n+k = 6$ packets:
  - Encoded packets: $P(1) = 3$, $P(2) = 1$, $P(3) = 5$, $P(4) = 0$, $P(5) = 6$, $P(6) = 1$.

#### Packet Transmission and Loss:
- Suppose packets 2 and 6 are lost during transmission. Bob receives packets: $P(1) = 3$, $P(3) = 5$, $P(4) = 0$, $P(5) = 6$.

#### Decoding the Message:
- **Lagrange Interpolation**: Bob uses the received values to reconstruct $P(x)$.
  - Basis polynomials (the denominators are $-24 \equiv 4$, $4$, $-3 \equiv 4$, and $8 \equiv 1 \pmod 7$, and $4^{-1} \equiv 2 \pmod 7$):
    $$
    \Delta_1(x) = \frac{(x-3)(x-4)(x-5)}{(1-3)(1-4)(1-5)} \equiv 2(x-3)(x-4)(x-5) \pmod 7
    $$
    $$
    \Delta_3(x) = \frac{(x-1)(x-4)(x-5)}{(3-1)(3-4)(3-5)} \equiv 2(x-1)(x-4)(x-5) \pmod 7
    $$
    $$
    \Delta_4(x) = \frac{(x-1)(x-3)(x-5)}{(4-1)(4-3)(4-5)} \equiv 2(x-1)(x-3)(x-5) \pmod 7
    $$
    $$
    \Delta_5(x) = \frac{(x-1)(x-3)(x-4)}{(5-1)(5-3)(5-4)} \equiv (x-1)(x-3)(x-4) \pmod 7
    $$
- **Reconstruct $P(x)$**:
  $$
  P(x) = 3\Delta_1(x) + 5\Delta_3(x) + 0\Delta_4(x) + 6\Delta_5(x) \equiv x^3 + 4x^2 + 5 \pmod 7
  $$
- **Message Recovery**: Evaluate $P(x)$ at $x = 1, 2, 3, 4$ to recover $m_1, m_2, m_3, m_4$.
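
To double-check the reconstruction, the Lagrange sum can be expanded into coefficient form by machine. The sketch below is one possible way to do this; the helper names `poly_mul` and `interpolate_coeffs` are made up for illustration, and coefficients are stored lowest degree first. It builds each basis polynomial as a product of linear factors and accumulates the weighted sum modulo $7$.

```python
def poly_mul(a, b, q):
    """Multiply two polynomials over GF(q); coefficients listed lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out


def interpolate_coeffs(points, q):
    """Coefficients (lowest degree first) of the polynomial through `points` over GF(q)."""
    result = [0] * len(points)
    for xj, yj in points.items():
        basis, den = [1], 1
        for xm in points:
            if xm != xj:
                basis = poly_mul(basis, [-xm % q, 1], q)  # multiply by (x - xm)
                den = den * (xj - xm) % q
        # Scale the basis polynomial by y_j / den, using Fermat's inverse (q prime).
        scale = yj * pow(den, q - 2, q) % q
        result = [(r + scale * b) % q for r, b in zip(result, basis)]
    return result


received = {1: 3, 3: 5, 4: 0, 5: 6}  # the four packets Bob received
print(interpolate_coeffs(received, 7))  # [5, 0, 4, 1], i.e. 5 + 0x + 4x^2 + x^3
```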





### **General Errors / Corruption Errors**

In this class, we defined general errors as errors that occur during data transmission or storage and cause the data to become altered or damaged in some way.
* The received message has the same length as the one that was sent, but some of its packets have been changed, and we don't know which ones

If we anticipate that up to $k$ packets will be corrupted during transmission, we can send a total of $n + 2k$ packets across the channel to account for this.

Once these packets have been received, we don't know which of them are corrupted.

We can make use of the error-locator polynomial, given by:
$$E(x) = (x - e_1)(x - e_2) \cdots (x - e_k)$$
* This polynomial has degree $k$, as there are $k$ corruption points
* We will eventually solve for the positions $e_i$ of the errors
* If we expand this into $x^k + b_{k-1}x^{k-1} + \cdots + b_1x + b_0$, the leading coefficient is always $1$

We will define the polynomial 

$$Q(x) = P(x)E(x)$$ 

* This polynomial has degree $n + k - 1$ and is described by $n + k$ coefficients $a_0, a_1, ... a_{n + k - 1}$

We can write the following equations for both $Q(x)$ and $E(x)$

$$Q(x) = a_{n+ k - 1}x^{n+ k - 1} +  a_{n+ k - 2}x^{n+ k - 2} + ... + a_1x + a_0$$

$$E(x) = x^k + b_{k - 1}x^{k-1} + ... + b_1x + b_0$$

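We don't know any of these coefficients yet. However, writing $r_i$ for the value received in position $i$, notice that:

$$Q(i) = r_iE(i) \text{ for } 1 \leq i \leq n + 2k$$

* If position $i$ is not corrupted, then $r_i = P(i)$, so $Q(i) = P(i)E(i) = r_iE(i)$
* If position $i$ is corrupted, then $E(i) = 0$, so both sides are $0$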

Writing out the $i$-th equation using the coefficients of $Q(x)$ and $E(x)$, we get

$$a_{n+k-1} i^{n+k-1} + a_{n+k-2} i^{n+k-2} + \cdots + a_1 i + a_0 = r_i (i^k + b_{k-1} i^{k-1} + \cdots + b_1 i + b_0) \pmod{q}$$

* This is a set of $n + 2k$ linear equations, one for each value of $i$, in the $n + 2k$ unknowns $a_0, a_1, \dots, a_{n + k - 1}, b_0, b_1, \dots, b_{k - 1}$

We can solve the system to get $E(x)$ and $Q(x)$. We can then compute the ratio $\frac{Q(x)}{E(x)}$ to obtain $P(x)$.
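
As a rough illustration of how the linear system is assembled, the sketch below moves the unknown $b_j$ terms to the left-hand side, so that row $i$ reads $\sum_j a_j i^j - r_i \sum_{j<k} b_j i^j = r_i i^k \pmod q$. The function name `build_system` and the column order ($a_0, \dots, a_{n+k-1}$, then $b_0, \dots, b_{k-1}$, then the right-hand side) are assumptions made for this example only.

```python
def build_system(received, n, k, q):
    """Augmented rows [a_0..a_{n+k-1}, b_0..b_{k-1} | rhs] of Q(i) = r_i E(i) over GF(q).

    `received` lists r_1, ..., r_{n+2k} in order of position.
    """
    rows = []
    for i, r_i in enumerate(received, start=1):
        q_part = [pow(i, j, q) for j in range(n + k)]            # coefficients of a_j
        e_part = [(-r_i * pow(i, j, q)) % q for j in range(k)]   # coefficients of b_j
        rhs = (r_i * pow(i, k, q)) % q                           # r_i * i^k from the monic term
        rows.append(q_part + e_part + [rhs])
    return rows


# Hypothetical received word with n = 3, k = 1 over GF(7).
for row in build_system([2, 0, 6, 0, 3], n=3, k=1, q=7):
    print(row)
```

Each printed row is one equation in the $n + 2k$ unknowns; any linear solver that works modulo a prime can then be applied to the rows.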

#### Problem Setup:
Alice sends a message to Bob over a noisy channel. The message consists of $n = 3$ packets, and up to $k = 1$ of them might be corrupted. The message is encoded as the values of a polynomial $P(x)$ of degree at most $n-1$, evaluated at $n+2k = 5$ points to account for errors.

#### Encoding and Transmission:
- **Polynomial Encoding**: The message packets $m_1, \dots, m_n$ are encoded using a polynomial $P(x)$ over a finite field $GF(q)$.
  * In this example, the polynomial is $P(x) = x^2 + x + 1$


- **Transmission**: $P(x)$ is evaluated at $n+2k$ points, providing redundancy that allows for error correction.

Let's assume that we choose $q = 7$

In this setup, we assume the following encoded and received messages respectively:
$$
\begin{bmatrix}
c_1 & c_2 & c_3 & c_4 & c_5 \\
3 & 0 & 6 & 0 & 3
\end{bmatrix}
\rightarrow
\begin{bmatrix}
r_1 & r_2 & r_3 & r_4 & r_5 \\
2 & 0 & 6 & 0 & 3
\end{bmatrix}
$$

Recall that $c_1, c_2, c_3$ are the original message packets, while $c_4$ and $c_5$ are the extra packets added to account for errors. The two rows differ only in the first position: $c_1 = 3$ was received as $r_1 = 2$.

#### Error Correction via Berlekamp-Welch Algorithm:
- **Received Data**: Bob receives $n+2k$ packets, among which up to $k$ might be erroneous.
- **Algorithm Task**: Construct two polynomials, $Q(x)$ and the error-locator polynomial $E(x)$, such that:
  $$
  Q(x) = P(x)E(x)
  $$
  where $E(x)$ has roots at the error positions and degree $k$.

  * In this case, $Q(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$, where each coefficient $a_i$ is unknown
  * Bob also doesn't know that $r_1$ is the corrupted position of the message
  * Since $k = 1$, the degree of $E(x)$ is also $1$. Thus, $E(x) = x - e_1 = x + b_0$, which means $b_0 = -e_1$

He then sets up the system of equations by substituting $i=1$ to $5$ into $Q(i) = r_i E(i)$, and simplifies to determine the coefficients of $E(x)$:


$$
\begin{aligned}
a_3 + a_2 + a_1 + a_0 + 5b_0 &= 2 \\
a_3 + 4a_2 + 2a_1 + a_0 &= 0 \\
6a_3 + 2a_2 + 3a_1 + a_0 + b_0 &= 4 \\
a_3 + 2a_2 + 4a_1 + a_0 &= 0 \\
6a_3 + 4a_2 + 5a_1 + a_0 + 4b_0 &= 1
\end{aligned}
$$

Note that all calculations in this example are performed modulo $7$.

Solving this system, Bob finds that the coefficients are $a_3 = 1, a_2 = 0, a_1 = 0, a_0 = 6,$ and $b_0 = 6$. 
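
One way to check this solution is Gauss-Jordan elimination carried out modulo $7$. The sketch below is a minimal version under two assumptions: the modulus is prime and the system has a unique solution. The name `solve_mod` is hypothetical, and the columns follow the order of the displayed equations ($a_3, a_2, a_1, a_0, b_0$, then the right-hand side).

```python
def solve_mod(rows, q):
    """Gauss-Jordan elimination over GF(q); assumes q is prime and the solution is unique."""
    m = [[v % q for v in row] for row in rows]
    unknowns = len(m[0]) - 1
    for col in range(unknowns):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, len(m)) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], q - 2, q)  # inverse via Fermat's little theorem
        m[col] = [(v * inv) % q for v in m[col]]
        for r in range(len(m)):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [(a - factor * b) % q for a, b in zip(m[r], m[col])]
    return [m[i][-1] for i in range(unknowns)]


system = [
    [1, 1, 1, 1, 5, 2],
    [1, 4, 2, 1, 0, 0],
    [6, 2, 3, 1, 1, 4],
    [1, 2, 4, 1, 0, 0],
    [6, 4, 5, 1, 4, 1],
]
print(solve_mod(system, 7))  # [1, 0, 0, 6, 6] -> a_3, a_2, a_1, a_0, b_0
```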

The polynomial $E(x) = x + 6 \equiv x - 1 \pmod 7$ has its root at $x = 1$, which indicates that the error is in the first packet.
  * $b_0 = 6$ is equivalent to $-1$ modulo $7$; writing $E(x)$ in the form $x - e_1$ lets us read off the error position $e_1 = 1$ directly

To retrieve the original message, Bob calculates $P(x) = Q(x) / E(x) = \frac{x^3 + 6}{x - 1} = x^2 + x + 1 \pmod 7$
* Bob knows the first packet was corrupted ($e_1 = 1$), so he computes $P(1) = 3$ to replace the received value $r_1 = 2$ and recover the original packet at that position.
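
The final division can also be carried out mechanically. The following sketch performs polynomial long division over $GF(7)$ and then re-evaluates the quotient at all five positions. The helper names `poly_divmod` and `poly_eval` are illustrative only, coefficients are listed highest degree first, and the modulus is assumed to be prime.

```python
def poly_divmod(num, den, q):
    """Long division of polynomials over GF(q); coefficients listed highest degree first."""
    num = [c % q for c in num]
    inv_lead = pow(den[0], q - 2, q)  # inverse of the divisor's leading coefficient
    quotient = []
    for i in range(len(num) - len(den) + 1):
        coef = (num[i] * inv_lead) % q
        quotient.append(coef)
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % q
    remainder = num[len(num) - len(den) + 1:]
    return quotient, remainder


def poly_eval(coeffs, x, q):
    """Horner evaluation over GF(q); coefficients listed highest degree first."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % q
    return acc


Q = [1, 0, 0, 6]  # x^3 + 6
E = [1, 6]        # x + 6, i.e. x - 1 modulo 7
P, rem = poly_divmod(Q, E, 7)
print(P, rem)                                      # [1, 1, 1] [0]
print([poly_eval(P, x, 7) for x in range(1, 6)])   # [3, 0, 6, 0, 3]
```

A zero remainder confirms that $E(x)$ divides $Q(x)$ exactly, and the evaluations reproduce the packets that were originally sent.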