### Execute the cell below before proceeding.

The code in this cell downloads a Python script, `hamming.py`, from the Internet and imports the functions it defines. Make sure you have a network connection before executing it.

In [1]:
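import requests
# download hamming.py into the current working directory
url = "https://raw.githubusercontent.com/bbadzioch/MTH309_F2019/master/notebooks_2024/hamming.py"
response = requests.get(url)
response.raise_for_status()  # stop with an error if the download failed
with open("hamming.py", "w", encoding="utf-8") as f:
    f.write(response.text)
from hamming import *  # hamming_encode(), hamming_decode(), text2bits(), ...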

# Error correction with Hamming codes

This notebook illustrates the process of correcting errors in data using the [Hamming (7,4) error correcting code](https://en.wikipedia.org/wiki/Hamming(7,4)).  As an example, take an array consisting of 4 bits (zeros and ones): 

In [2]:
data = [0,0,1,1]

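The `hamming_encode()` function uses the Hamming (7,4) code to add three parity bits to this array, which makes error correction possible. In the result, the first four bits are the original data and the last three are the added parity bits: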

In [3]:
#encode data
encoded_data = hamming_encode(data)
#print the result
print(encoded_data)

[0, 0, 1, 1, 0, 0, 1]
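To see what such an encoder can look like, here is a minimal sketch of a Hamming (7,4) encoder for a single 4-bit array. The function name `encode_block` is hypothetical, and the three parity equations are an assumption: they were inferred from the outputs printed in this notebook, so they reproduce those outputs, but the `hamming.py` module may compute the parity bits differently. Each parity bit is the sum modulo 2 (XOR) of three of the four data bits.

```python
# A minimal sketch of a Hamming (7,4) encoder.
# ASSUMPTION: the parity equations were inferred from the outputs printed
# in this notebook; hamming.py may be implemented differently.

def encode_block(data):
    """Return the 4 data bits followed by 3 parity bits (arithmetic mod 2)."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected a list of 4 bits")
    d1, d2, d3, d4 = data
    p1 = (d2 + d3 + d4) % 2  # parity over data bits 2, 3, 4
    p2 = (d1 + d3 + d4) % 2  # parity over data bits 1, 3, 4
    p3 = (d1 + d2 + d4) % 2  # parity over data bits 1, 2, 4
    return [d1, d2, d3, d4, p1, p2, p3]

print(encode_block([0, 0, 1, 1]))  # [0, 0, 1, 1, 0, 0, 1]
```

Every data bit enters at least two of the parity equations, so flipping any single one of the 7 bits changes a different combination of parity checks. That is what lets a decoder locate the error.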


Next, we introduce an error in the encoded data by changing its first bit:

In [4]:
# set the first element to 1 (it was 0, so this flips the bit)
# Python list indices start at 0, so the first element is encoded_data[0]
encoded_data[0] = 1
print(encoded_data)

[1, 0, 1, 1, 0, 0, 1]


The `hamming_decode()` function can then be applied to correct the error and recover the original 4-bit array:

In [5]:
hamming_decode(encoded_data)

[0, 0, 1, 1]
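One standard way to perform this correction is syndrome decoding: recompute the three parity checks, and if any of them fails, the pattern of failed checks (the syndrome) identifies which bit to flip back. The sketch below assumes the same inferred parity equations as the hypothetical `encode_block` example; `decode_block` is likewise a hypothetical name, and `hamming_decode()` may work differently.

```python
# A minimal sketch of syndrome decoding for one 7-bit Hamming (7,4) block.
# ASSUMPTION: parity equations inferred from this notebook's outputs;
# hamming.py may be implemented differently.

# Syndrome produced by an error at each position 0..6 (data bits d1..d4,
# then parity bits p1..p3). All seven patterns are distinct and nonzero.
ERROR_SYNDROMES = {
    (0, 1, 1): 0,  # d1
    (1, 0, 1): 1,  # d2
    (1, 1, 0): 2,  # d3
    (1, 1, 1): 3,  # d4
    (1, 0, 0): 4,  # p1
    (0, 1, 0): 5,  # p2
    (0, 0, 1): 6,  # p3
}

def decode_block(block):
    """Correct at most one flipped bit and return the 4 data bits."""
    if len(block) != 7:
        raise ValueError("expected a list of 7 bits")
    d1, d2, d3, d4, p1, p2, p3 = block
    # Recompute each parity check; 1 means the check fails.
    syndrome = (
        (d2 + d3 + d4 + p1) % 2,
        (d1 + d3 + d4 + p2) % 2,
        (d1 + d2 + d4 + p3) % 2,
    )
    corrected = list(block)
    if syndrome != (0, 0, 0):
        pos = ERROR_SYNDROMES[syndrome]
        corrected[pos] = 1 - corrected[pos]  # flip the faulty bit back
    return corrected[:4]

print(decode_block([1, 0, 1, 1, 0, 0, 1]))  # [0, 0, 1, 1]
```

Here the syndrome is `(0, 1, 1)`, which points at position 0, the bit that was changed above.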

## Application: correcting errors in text

The next example shows how the Hamming code can be used in practice: we will use it to correct errors that can occur during the transmission of text strings. Like all data stored and processed by computers, text strings are represented in computer memory as arrays of zeros and ones. The `text2bits()` function returns the binary representation of a text string:

In [6]:
s = "hi"
bits = text2bits(s)
print(bits)

[0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]


In the array above, each group of 8 consecutive bits encodes one character of the text: the first 8 bits encode "h" and the last 8 bits encode "i". Since the string "hi" consists of 2 characters, the array is 16 bits long.
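For plain ASCII text like this, each character can be replaced by its character code (104 for "h", 105 for "i") written as an 8-digit binary number. The sketch below shows one way to do this; `text_to_bits` is a hypothetical name, and restricting it to character codes 0 to 255 is an assumption, since the notebook does not show how `text2bits()` treats other characters.

```python
# A minimal sketch of converting text to bits, 8 bits per character.
# ASSUMPTION: only character codes 0-255 are handled; how text2bits()
# in hamming.py treats other characters is not shown in this notebook.

def text_to_bits(s):
    bits = []
    for ch in s:
        code = ord(ch)  # character code, e.g. ord("h") == 104
        if code > 255:
            raise ValueError(f"character {ch!r} does not fit in 8 bits")
        # "08b" gives an 8-digit binary string, most significant bit first
        bits.extend(int(digit) for digit in format(code, "08b"))
    return bits

print(text_to_bits("hi"))
# [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
```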

The `bits2text()` function converts an array of bits into a text string:

In [7]:
bits2text(bits)

'hi'

Changing a single element of the bit array introduces an error into the encoded text:

In [8]:
bits[0] = 1
print(bits)

[1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]


As a result, the text recovered from the array is distorted:

In [9]:
bits2text(bits)

'èi'
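The distortion can be traced by hand. Setting `bits[0] = 1` changes the first 8-bit group from `01101000` (104, the code of "h") to `11101000`, which is 104 + 128 = 232, the code of "è". The sketch below reproduces this with a hypothetical `bits_to_text` function; reading each 8-bit group as a character code with `chr()` is an assumption that is consistent with the output above, but `bits2text()` may be implemented differently.

```python
# A minimal sketch of converting bits back to text, 8 bits per character.
# ASSUMPTION: each 8-bit group is read as a character code 0-255 with chr();
# bits2text() in hamming.py may be implemented differently.

def bits_to_text(bits):
    if len(bits) % 8 != 0:
        raise ValueError("the number of bits must be a multiple of 8")
    chars = []
    for i in range(0, len(bits), 8):
        group = bits[i:i + 8]
        code = int("".join(str(b) for b in group), 2)  # binary digits -> integer
        chars.append(chr(code))
    return "".join(chars)

bits = [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(bits_to_text(bits))  # hi
bits[0] = 1                # flip the most significant bit of "h"
print(bits_to_text(bits))  # èi
```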

With the Hamming code we can recover from such errors. First, we convert the text into bits:

In [10]:
s = 'hi'
bits = text2bits(s)
print(bits)

[0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]


Next, we apply Hamming encoding. Each group of 4 bits is encoded as a block of 7 bits, so the 16-bit array becomes 28 bits long:

In [11]:
encoded = hamming_encode(bits)
print(encoded)

[0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
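A longer bit array can be encoded block by block: split it into groups of 4 bits and encode each group separately. The sketch below reproduces the 28-bit output above under the same inferred parity equations as before; `encode_block` and `encode_bits` are hypothetical names, and requiring the length to be a multiple of 4 is an assumption (the notebook does not show how `hamming_encode()` handles other lengths).

```python
# A minimal sketch of encoding a longer bit array block by block.
# ASSUMPTION: parity equations inferred from this notebook's outputs, and a
# bit count that is a multiple of 4; hamming_encode() may differ.

def encode_block(data):
    d1, d2, d3, d4 = data
    return [d1, d2, d3, d4,
            (d2 + d3 + d4) % 2,
            (d1 + d3 + d4) % 2,
            (d1 + d2 + d4) % 2]

def encode_bits(bits):
    if len(bits) % 4 != 0:
        raise ValueError("the number of bits must be a multiple of 4")
    encoded = []
    for i in range(0, len(bits), 4):
        encoded.extend(encode_block(bits[i:i + 4]))  # 4 data bits -> 7 bits
    return encoded

hi_bits = [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
encoded = encode_bits(hi_bits)
print(len(encoded))  # 28
print(encoded)
```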


Then we introduce two errors into the encoded array by changing its first and last bits:

In [12]:
encoded[0] = 1
encoded[-1] = 1
print(encoded)

[1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1]


Finally, we apply Hamming decoding to the array with errors and convert the resulting bits back into text:

In [13]:
decoded = hamming_decode(encoded)
print(decoded)
bits2text(decoded)

[0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]


'hi'

The original text is recovered!
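Both errors could be corrected because they fall into different 7-bit blocks: position 0 lies in block 0 // 7 = 0 and position 27 in block 27 // 7 = 3, so each block contains at most one error. The sketch below decodes the corrupted array block by block, assuming the same inferred parity equations as before; `decode_block` and `decode_bits` are hypothetical names, not the `hamming.py` implementation.

```python
# A minimal sketch of block-by-block Hamming (7,4) decoding.
# ASSUMPTION: parity equations inferred from this notebook's outputs;
# hamming_decode() may be implemented differently.

SYNDROME_TO_POSITION = {(0, 1, 1): 0, (1, 0, 1): 1, (1, 1, 0): 2, (1, 1, 1): 3,
                        (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6}

def decode_block(block):
    d1, d2, d3, d4, p1, p2, p3 = block
    syndrome = ((d2 + d3 + d4 + p1) % 2,
                (d1 + d3 + d4 + p2) % 2,
                (d1 + d2 + d4 + p3) % 2)
    block = list(block)
    if syndrome != (0, 0, 0):
        block[SYNDROME_TO_POSITION[syndrome]] ^= 1  # flip the indicated bit
    return block[:4]

def decode_bits(encoded):
    if len(encoded) % 7 != 0:
        raise ValueError("the number of bits must be a multiple of 7")
    decoded = []
    for i in range(0, len(encoded), 7):
        decoded.extend(decode_block(encoded[i:i + 7]))
    return decoded

# the corrupted array from above: errors at positions 0 and 27
received = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1,
            0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
print(decode_bits(received))
# [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
```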

## Testing error correction with larger texts

The function `text_compare()` lets us experiment with larger text strings. It takes a text string, converts it into an array of bits, and randomly introduces errors into this array. The second argument, `p`, is the probability that any given bit of the array is corrupted. For example, if `p=0.01`, each bit is correct with probability 99% and contains an error with probability 1%. The higher the value of `p`, the more likely each bit is to be corrupted and, as a result, the more errors the bit array will contain.

`text_compare()` displays the text recovered from the bit array with errors in two columns. In one column the text is obtained directly from the array, without error correction; in the other column error correction is applied. A red background marks characters that are incorrect:
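The kind of random errors described above can be sketched as follows: each bit is flipped independently with probability `p`. The helper `flip_bits` is hypothetical, and independent flips are an assumption based on the description of `p`; `text_compare()` may generate its errors differently.

```python
import random

# A minimal sketch of random bit errors: each bit is flipped independently
# with probability p. ASSUMPTION: flip_bits is a hypothetical helper, not
# part of hamming.py.

def flip_bits(bits, p, rng=random):
    """Return a copy of bits in which each bit is flipped with probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so it is below p with probability p
    return [1 - b if rng.random() < p else b for b in bits]

rng = random.Random(309)  # fixed seed so the run is reproducible
noisy = flip_bits([0] * 10_000, 0.01, rng)
print(sum(noisy))  # number of flipped bits, on average 10_000 * 0.01 = 100
```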

In [14]:
sample_text = "This is a test of the Hamming (7,4) code"

In [15]:
text_compare(sample_text, p=0.01)

Increasing the value of `p` produces more errors. The Hamming (7,4) code can correct at most one error in each 7-bit block (which carries 4 bits of data). If two errors fall into the same block, the code cannot recover the data in that block:
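The failure can be seen on a single block. With two flipped bits, the syndrome is the sum (mod 2) of the two single-error patterns, which equals the pattern of some third position, so a syndrome decoder "corrects" the wrong bit. The sketch uses the same inferred parity equations as before; `decode_block` is a hypothetical name, not the `hamming.py` implementation.

```python
# A minimal sketch of what happens with two errors in one 7-bit block.
# ASSUMPTION: parity equations inferred from this notebook's outputs.

SYNDROME_TO_POSITION = {(0, 1, 1): 0, (1, 0, 1): 1, (1, 1, 0): 2, (1, 1, 1): 3,
                        (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6}

def decode_block(block):
    d1, d2, d3, d4, p1, p2, p3 = block
    syndrome = ((d2 + d3 + d4 + p1) % 2,
                (d1 + d3 + d4 + p2) % 2,
                (d1 + d2 + d4 + p3) % 2)
    block = list(block)
    if syndrome != (0, 0, 0):
        block[SYNDROME_TO_POSITION[syndrome]] ^= 1
    return block[:4]

codeword = [0, 0, 1, 1, 0, 0, 1]  # encoding of [0, 0, 1, 1]
received = list(codeword)
received[0] ^= 1                  # first error
received[1] ^= 1                  # second error in the same block
# syndrome (0,1,1) + (1,0,1) = (1,1,0) points at position 2, a correct bit
print(decode_block(received))     # [1, 1, 0, 1]: three data bits are now wrong
```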

In [16]:
text_compare(sample_text, p=0.05)
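The effect of `p` can also be estimated numerically. Assuming each bit is flipped independently with probability `p`, a syndrome decoder returns wrong data for a 7-bit block exactly when two or more of its bits are flipped, while an uncoded 8-bit character is wrong as soon as one bit flips. With error correction each character's 8 bits are spread over two blocks, as in the "hi" example. The function names below are hypothetical.

```python
# Error probabilities under the assumption of independent bit flips.

def block_failure_prob(p, n=7):
    """P(at least 2 of the n bits of a Hamming block are flipped)."""
    no_error = (1 - p) ** n
    one_error = n * p * (1 - p) ** (n - 1)
    return 1 - no_error - one_error

def uncoded_char_error_prob(p, bits_per_char=8):
    """P(at least 1 of the 8 bits of an uncoded character is flipped)."""
    return 1 - (1 - p) ** bits_per_char

def coded_char_error_prob(p):
    """P(a character is wrong after decoding): either of its 2 blocks fails."""
    return 1 - (1 - block_failure_prob(p)) ** 2

for p in (0.005, 0.01, 0.05):
    print(f"p={p}: uncoded {uncoded_char_error_prob(p):.4f}, "
          f"with Hamming code {coded_char_error_prob(p):.4f}")
```

For `p=0.01` roughly 7.7% of uncoded characters are wrong versus about 0.4% with the Hamming code; for `p=0.05` the figures grow to about 34% and 8.7%, which is why errors start to survive correction at the higher value of `p`.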

As a larger text sample we will use a fragment of "The Adventures of Tom Sawyer" downloaded from the [Project Gutenberg](http://www.gutenberg.org) website.

In [17]:
#download the file
r = requests.get('http://www.gutenberg.org/files/74/74-0.txt')
r.encoding = 'utf8'
sawyer = r.text

#select a text sample from a given range of characters in the text file
sample = sawyer[30000:31400]

Here is the selected text sample:

In [18]:
print(sample)

 the middle of the afternoon came, from being a
poor poverty-stricken boy in the morning, Tom was literally rolling in
wealth. He had besides the things before mentioned, twelve marbles, part
of a jews-harp, a piece of blue bottle-glass to look through, a spool
cannon, a key that wouldn’t unlock anything, a fragment of chalk, a
glass stopper of a decanter, a tin soldier, a couple of tadpoles,
six fire-crackers, a kitten with only one eye, a brass door-knob, a
dog-collar—but no dog—the handle of a knife, four pieces of orange-peel,
and a dilapidated old window sash.

He had had a nice, good, idle time all the while—plenty of company—and
the fence had three coats of whitewash on it! If he hadn’t run out of
whitewash he would have bankrupted every boy in the village.

Tom said to himself that it was not such a hollow world, after all. He
had discovered a great law of human action, without knowing it—namely,
that in order to make a man or a boy covet a thing, it is only necessary
to make t

Here is the `text_compare()` function applied to this text sample:

In [19]:
text_compare(sample, p=0.005)