# Lecture 9.1 - coding and decoding messages as integers

## Summary 

### Programming

- ASCII encoding 
- The functions `chr`, `ord` and `bin` 
- Coding characters as binary strings
- Converting text/strings to integers 
- Converting integers to text/strings

## Main topics 

- Coding and decoding texts as integers.

### Note on this Lecture 

This lecture is a guide to coding and decoding messages (strings) as integers. The main motivation is to learn how to develop encoding and decoding functions such as `convert_to_integer` and `convert_to_text` below. These allow us to encode a message as an integer that can be encrypted and transmitted and then, following decryption, to decode that same integer back into the original message.

## Coding and decoding texts as integers 

Modern computers and programming languages make available more than $100000$ **Unicode** characters covering the alphabets of the world. **ASCII** (American Standard Code for Information Interchange) strictly comprises $128$ characters; here we work with the first $256$ character codes (see the note below). This subset is sufficient for communicating in most Western European languages, so we will restrict ourselves to these characters only.

Python has built-in functions `chr` and `ord` for converting between code numbers (here in the interval $[0,255]$) and characters.

**Note.** The character encoding in Python corresponds exactly to ASCII encoding only for the first 128 characters (codes $0$ to $127$). For simplicity we refer here to the encoding of the first 256 characters as **ASCII**.

In [None]:
chr(65)

In [None]:
ord('A')

In order to help us convert messages (i.e. strings of characters) to integers and back we will also be interested in the binary representation of characters. 

In [None]:
bin(ord('A'))

The result is `'0b1000001'`: the prefix `'0b'` only marks the string as binary, so we need to get rid of it.

In [None]:
bin(ord('A'))[2:]

Now let's get an overview of the characters that will be available to us. Below we print the numbers `0` to `255` in the first column, then the binary representation and the character itself in the next two columns. When the third column is empty, either we have the space `' '` (number `32`) or the associated 'character' is a special non-printing character. For example `7` stands for the 'bell', which can be used to make your computer beep in some way.

In [None]:
print("ASCII   Binary ASCII   Character")
print("=====   ============   =========")
for n in range(0,256):
    n_bin = bin(n)[2:]
    n_char = chr(n)
    print("{:8s}{:15s}{:s}".format(str(n),n_bin, n_char))

Note that the binary string representations are of variable length. For the purpose of coding 
and decoding it will be useful to have all representations of the same (i.e. maximal) length 8. The function `char_to_byte` below does just that. 


In [None]:
def char_to_byte(char): 
    """
    Returns the 8-bit binary representation (padded with
    leading zeros when necessary) of ord(char), i.e. of
    the code number of the input character char.
    """
    byte_string = bin(ord(char))[2:]            # The code number of char as a binary string
    num_zeros = 8 - len(byte_string)            # The number of zeros needed to pad out byte_string
    byte_string = '0' * num_zeros + byte_string # Now pad out byte_string with num_zeros many zeros
                                                # to obtain the 8-bit binary representation
    return byte_string  
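
As a cross-check, the same padding can be obtained from Python's built-in `format` with the format specification `'08b'` (binary digits, zero-padded to width 8). The sketch below is an alternative shown for comparison only; the name `char_to_byte_fmt` is introduced here purely for illustration and is not used elsewhere in the lecture.

```python
def char_to_byte_fmt(char):
    """Illustrative alternative to char_to_byte using format()."""
    # '08b' means: binary digits, padded with leading zeros to width 8
    return format(ord(char), '08b')

print(char_to_byte_fmt('A'))   # 01000001
print(char_to_byte_fmt(' '))   # 00100000
```

Writing the padding out by hand, as in `char_to_byte`, makes each step visible; the `format` version is simply shorter.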

Let's now check the result on all the ASCII characters, this time skipping the initial special characters by starting at $32$. In an extra column (labelled `Check`) we also confirm that we can convert the $8$-character binary string back to the original integer. For this we use Python's built-in function `int(b,2)`, where `b` is a binary string and the `2` tells the function to interpret the string in base 2.

In [None]:
print("ASCII   Binary ASCII   Character   Check")
print("=====   ============   =========   =====")

for n in range(32,256):
    n_char = chr(n)
    n_bin = char_to_byte(n_char)
    n_result = int(n_bin,2)
    print("{:7s} {:14s} {:11s} {}".format(str(n),n_bin, n_char, n_result))

We now encode any message (in the form of a string of characters) using the $8$-bit binary string representation of each symbol in the message. To do this we simply concatenate the binary string representations and put a single `'1'` in front of the result. We then convert this binary string into a decimal integer that is unique to this message. (Make sure that you understand why.)
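
One way to explore the role of the leading `'1'` is to compare the encoding with and without it. The helper `to_int` below is a hypothetical name used only for this experiment; it is a compact sketch of the concatenation described above, not the lecture's own function.

```python
def to_int(text, leading_one):
    """Illustrative helper: concatenate 8-bit codes, optionally with a leading '1'."""
    bits = ''.join(format(ord(c), '08b') for c in text)
    return int(('1' if leading_one else '') + bits, 2)

# Without the leading '1', leading zero bits vanish and two
# different messages collapse to the same integer.
print(to_int('A', False), to_int(chr(0) + 'A', False))   # 65 65

# With the leading '1', the two messages get different integers.
print(to_int('A', True), to_int(chr(0) + 'A', True))     # 321 65601
```

The leading `'1'` also keeps the zero at the start of `'01000001'` from being dropped, so the bit string can still be cut into whole bytes when decoding.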

**Note.** Instead of the present method we could of course have used the decimal representation directly, i.e. strings of decimal digits of length $3$, since $255$ is the largest number that we use. There are in fact infinitely many ways of encoding/decoding text, and many of these are more efficient than the method given here. Note however that internally the computer stores these characters as bytes, i.e. binary sequences of length $8$, so our method simulates to a certain extent the internal workings of the computer.
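
For comparison, here is a minimal sketch of the decimal variant mentioned in the note, using three decimal digits per character. The function names `to_int_decimal` and `to_text_decimal` are assumptions made for this illustration; the leading `'1'` is kept for the same reason as in the binary method.

```python
def to_int_decimal(text):
    """Encode text using three decimal digits per character (illustration only)."""
    digits = ''.join('{:03d}'.format(ord(c)) for c in text)
    return int('1' + digits)            # the leading '1' protects leading zeros

def to_text_decimal(number):
    """Decode an integer produced by to_int_decimal."""
    digits = str(number)[1:]            # drop the leading '1'
    return ''.join(chr(int(digits[i:i+3])) for i in range(0, len(digits), 3))

n = to_int_decimal("Hi!")
print(n)                     # 1072105033
print(to_text_decimal(n))    # Hi!
```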

In [None]:
message = "This is a secret message meant only for Alice."

Let's check how long `message` and its binary representation are.

In [None]:
length = len(message)
bin_length = 1 + (8 * length)
print("The message contains {} characters".format(length))
print("The binary representation of the message", end = " ")
print("will contain {} bits".format(bin_length))

Now we convert `message` into an integer in the way described above.

In [None]:
def convert_to_integer(text,verbose=False): 
    """
    Returns an integer that encodes the input string text. 
    Each character of text is encoded as a binary string of 
    8 bits. These strings are concatenated with a leading 1
    and the resulting binary string is converted into the 
    returned integer.
    """
    bin_string = '1'
    for letter in text: 
        bin_string = bin_string + char_to_byte(letter)
    if verbose: 
        print("The binary representation of this message is:")
        print(bin_string)
    return int(bin_string,2)

And now check what happens. 

In [None]:
print("The message is: '{}'\n".format(message))
result = convert_to_integer(message,True)
print()
print("The resulting decimal integer is: ")
print(result)
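
The result can be checked in two independent ways. The sketch below restates the encoding compactly (the name `encode` is used only here), confirms the predicted bit count with `int.bit_length`, and rebuilds the same integer from raw bytes with `int.from_bytes`. It assumes the message contains only characters with codes in $[0,255]$, which is what the `'latin-1'` codec covers.

```python
def encode(text):
    """Compact restatement of the lecture's encoding (for this check only)."""
    return int('1' + ''.join(format(ord(c), '08b') for c in text), 2)

message = "This is a secret message meant only for Alice."
n = encode(message)

# The integer has exactly 1 + 8 * len(message) bits.
print(n.bit_length() == 1 + 8 * len(message))    # True

# Same integer from the raw bytes: a byte 0x01 followed by the
# Latin-1 bytes of the message, read as one big-endian number.
m = int.from_bytes(b'\x01' + message.encode('latin-1'), 'big')
print(n == m)                                    # True
```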

Now to decode the resulting integer `result` back to a string of characters, i.e. the original message, we do the following. 

1. Convert `result` to a binary string `bin_string`. 
2. Remove `'0b1'` from the front of the string. (Why?)
3. Slice `bin_string` into strings containing 8 bits each, convert these to characters, and concatenate the results.

In [None]:
def convert_to_text(number): 
    """ 
    Returns a string that is the decoding of the input integer number.
    This is done by converting number to a binary string, removing the 
    leading character '1', slicing out each 8 bit substring consecutively,
    converting each such string to the character it encodes and concatenating
    these characters to obtain the decoded string.    
    """
    # Remove '0b1' from the string 
    bin_string = bin(number)[3:] 
    text = ''                           
    length = len(bin_string)
    for i in range(0,length,8):  
        # Pick out binary strings, 8 bits at a time
        byte_string = bin_string[i:i+8]   
        # Convert byte_string to a character before 
        # appending it to text 
        text = text + chr(int(byte_string,2))  
    return text

In [None]:
convert_to_text(result)
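
The same decoding can also be done with integer arithmetic instead of string slicing. The sketch below is an alternative for illustration, not the method used above; `decode_arithmetic` is a hypothetical name. It peels off the lowest 8 bits with `divmod(number, 256)` until only the leading `1` remains.

```python
def decode_arithmetic(number):
    """Decode by peeling off one byte at a time (illustrative alternative)."""
    chars = []
    while number > 1:                         # only the leading 1 remains at the end
        number, code = divmod(number, 256)    # code = value of the lowest 8 bits
        chars.append(chr(code))
    return ''.join(reversed(chars))           # bytes come out last-to-first

# Encode "Alice" as in the lecture: a leading '1' followed by 8 bits per character
n = int('1' + ''.join(format(ord(c), '08b') for c in "Alice"), 2)
print(decode_arithmetic(n))   # Alice
```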

To see how this works we'll try a verbose version.

In [None]:
def convert_to_text_verbose(number): 
    bin_string = bin(number)[3:] 
    print("The number is: ")
    print(number)
    print("\nThe binary string representation is:")
    print(bin_string)
    print("\nThe conversion/concatenation happens as follows:\n")
    text = ''
    length = len(bin_string)
    for i in range(0,length,8):
        byte_string = bin_string[i:i+8]
        print(byte_string,chr(int(byte_string,2)))
        text += chr(int(byte_string,2))
    print("\nThe output text is:")
    return text

In [None]:
convert_to_text_verbose(result)

During the encryption process we will want to work with numbers whose binary representation contains at most $512$ bits. (See the next lecture.) Thus for long messages we could, for example, slice the message into strings of length $60$ (with the last string being of length $\le 60$). Under our coding method each such string yields a binary representation containing at most $1 + 60 \times 8 = 481$ bits. We would then encode each string individually, and encrypt and send the resulting list of numbers (instead of encrypting and sending a single number, as in the example above).
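
The slicing idea can be sketched as follows. This is a minimal illustration under the assumptions stated above (blocks of at most $60$ characters, codes in $[0,255]$); the names `encode`, `decode`, `encode_in_blocks` and `decode_blocks` are introduced here for the example and compactly restate the lecture's encoding and decoding.

```python
def encode(text):
    """Leading '1' followed by 8 bits per character, read as an integer."""
    return int('1' + ''.join(format(ord(c), '08b') for c in text), 2)

def decode(number):
    """Inverse of encode: drop '0b1', then read 8 bits at a time."""
    bits = bin(number)[3:]
    return ''.join(chr(int(bits[i:i+8], 2)) for i in range(0, len(bits), 8))

def encode_in_blocks(text, block_size=60):
    """Split text into blocks of at most block_size characters and encode each."""
    return [encode(text[i:i+block_size]) for i in range(0, len(text), block_size)]

def decode_blocks(numbers):
    """Decode each integer and join the pieces in order."""
    return ''.join(decode(n) for n in numbers)

long_message = "This is a secret message meant only for Alice. " * 5
blocks = encode_in_blocks(long_message)

print(len(long_message), len(blocks))         # 235 4
print(max(n.bit_length() for n in blocks))    # 481
print(decode_blocks(blocks) == long_message)  # True
```

Each integer in `blocks` stays below the $512$-bit limit, and the receiver only needs to decode the numbers in the order in which they were sent.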