# Bits, bytes and encoded messages

The only information computers understand is in binary form, but we (as humans) usually work in other bases, most commonly base 10 (decimal). Here we will learn how to convert between bases and how characters are represented in the computer.

## Bits
A **bit** is a binary digit, 0 or 1. A sequence of bits forms a binary number, for example:

$$1101$$

So what is this number in base 10? Each digit is weighted by a power of 2 given by its position (counting from 0 at the right), so the binary number 1101 can also be written as follows: ($1 * 2^3$) + ($1 * 2^2$) + ($0 * 2^1$) + ($1 * 2^0$) = $13$.

In [40]:
from random import randrange

bits = '0b' + ''.join([str(randrange(2)) for _ in range(10)])
bits

'0b1010110011'

In [42]:
int(bits, 2)

691

In [44]:
# Manual conversion: weight each digit after the '0b' prefix by its power of 2
sum(int(k) * 2**(len(bits[2:]) - i - 1) for i, k in enumerate(bits[2:]))

691
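The cells above go from binary to decimal. The opposite direction can be done by hand with repeated division by 2, collecting the remainders. The sketch below is only an illustration of that idea; `to_binary` is a helper name introduced here, not something from the original notebook, and in practice the built-in `bin` does the same job.

```python
def to_binary(value: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        # divmod returns the quotient and the remainder (the next bit)
        value, remainder = divmod(value, 2)
        digits.append(str(remainder))  # least significant bit comes first
    # The remainders were collected from right to left, so reverse them
    return "".join(reversed(digits))


print(to_binary(13))   # 1101
print(to_binary(691))  # 1010110011
```

Both results agree with the examples above: 13 is `1101` and 691 is the random value `0b1010110011`.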

What's the largest number we can represent in $n$ bits?

In [48]:
n = 8

print(f"max integer (base 10) value for {n} bits is {2**n -1}")

max integer (base 10) value for 8 bits is 255
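The reason for $2^n - 1$ is that $n$ bits give $2^n$ distinct patterns and counting starts at 0. A small sketch of this, assuming unsigned integers only: `max_unsigned` and `fits` are illustrative helper names, while `int.bit_length()` is the Python built-in that reports how many bits a value needs.

```python
def max_unsigned(n_bits: int) -> int:
    """Largest unsigned integer that fits in n_bits bits."""
    return 2**n_bits - 1


def fits(value: int, n_bits: int) -> bool:
    """True if value can be stored as an unsigned integer of n_bits bits."""
    return 0 <= value <= max_unsigned(n_bits)


print(fits(255, 8))        # True
print(fits(256, 8))        # False
print((255).bit_length())  # 8
print((256).bit_length())  # 9
```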


## Bytes

A **byte** is a collection of 8 bits. For example, $00111010$ is a byte expressed in binary form (58 in base 10). The maximum value (in base 10) of a byte is $2^8-1$, therefore 255.

In [68]:
bytes_int = [randrange(256) for _ in range(10)]
print(f"bytes_int\n\t{bytes_int}")

bytes_bin = [bin(x) for x in bytes_int]
print(f"bytes_bin\n\t{bytes_bin}")

bytes_int
	[58, 112, 250, 142, 72, 254, 105, 221, 88, 97]
bytes_bin
	['0b111010', '0b1110000', '0b11111010', '0b10001110', '0b1001000', '0b11111110', '0b1101001', '0b11011101', '0b1011000', '0b1100001']


In [71]:
int("0b00111010", 2)

58
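Note that `bin` drops leading zeros, so the values printed above have fewer than 8 digits whenever the byte is smaller than 128. One possible way to always show the full 8 bits is the built-in `format` with a zero-padded width; the variable names below are only illustrative.

```python
value = 58

unpadded = bin(value)                    # leading zeros are dropped
padded = format(value, "08b")            # 8 binary digits, zero-padded, no prefix
padded_prefixed = format(value, "#010b") # same, with the '0b' prefix (10 chars total)

print(unpadded)         # 0b111010
print(padded)           # 00111010
print(padded_prefixed)  # 0b00111010
```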

## Decimal, binary, hexadecimal and octal

So far we have represented numbers in base 2 (binary form) or base 10 (decimal form). But we can also express numbers in octal (base 8) or hexadecimal (base 16); see [this](https://www.rapidtables.com/convert/number/base-converter.html) converter.

**Hexadecimal characters**: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f]

**Octal characters**: [0, 1, 2, 3, 4, 5, 6, 7]

**Decimal characters**: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


For instance:

$654_{10}$=$1010001110_2$=$1216_8$=$28\rm{E}_{16}$

A byte is 8 binary characters, so its maximum value is $11111111_2$=$377_8$=$255_{10}$=$\rm{FF}_{16}$.
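The equalities for 654 can be checked with Python built-ins: `format` writes an integer in a given base, and `int(text, base)` reads it back. This is just a quick verification sketch; the variable names are arbitrary.

```python
value = 654

as_bin = format(value, "b")  # binary digits without the '0b' prefix
as_oct = format(value, "o")  # octal digits without the '0o' prefix
as_hex = format(value, "X")  # uppercase hexadecimal digits without '0x'

print(as_bin, as_oct, as_hex)  # 1010001110 1216 28E

# Parsing each string back in its own base recovers the same integer
assert int(as_bin, 2) == int(as_oct, 8) == int(as_hex, 16) == value
```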


In [72]:
n_bytes = 2
x = randrange(2**(8*n_bytes))

print(f"x in binary (base 2):\n\t{bin(x)}")
print(f"x in octal (base 8):\n\t{oct(x)}")
print(f"x in decimal (base 10):\n\t{x}")
print(f"x in hexadecimal (base 16):\n\t{hex(x)}")

x in binary (base 2):
	0b1001110100111001
x in octal (base 8):
	0o116471
x in decimal (base 10):
	40249
x in hexadecimal (base 16):
	0x9d39


## Encoding

How can we represent characters in the computer? We need a dictionary that maps each character to an integer (and back), so that the character can be stored as bytes. This mapping is known as an encoding.

UTF-8 is the 8-bit Unicode Transformation Format. It is a variable-length encoding: each character is stored as a sequence of one to four bytes. The 128 ASCII (American Standard Code for Information Interchange) characters take a single byte each, both because they are the most frequently used and for historical reasons (ASCII came first, and UTF-8 stays compatible with it).
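To make the variable length concrete, the following sketch encodes four characters that take one, two, three and four bytes respectively. The characters are arbitrary examples chosen here and are written as escape sequences so the code does not depend on the file's own encoding.

```python
# One example character per UTF-8 length (chosen for illustration)
chars = ["a", "\u00e9", "\u0a07", "\U0001F600"]

lengths = []
for char in chars:
    encoded = char.encode("utf-8")
    lengths.append(len(encoded))
    print(f"U+{ord(char):04X}: {len(encoded)} byte(s), hex = {encoded.hex()}")

print(lengths)  # [1, 2, 3, 4]
```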

Let's print the ASCII characters!

In [73]:
for i in range(0, 128):
    b = i.to_bytes(1, byteorder='big')
    print(f"int = {i}, hex = {hex(i)}, bytes = {b}, decoded = {b.decode(encoding='UTF-8')}")

int = 0, hex = 0x0, bytes = b'\x00', decoded =  
int = 1, hex = 0x1, bytes = b'\x01', decoded = 
int = 2, hex = 0x2, bytes = b'\x02', decoded = 
int = 3, hex = 0x3, bytes = b'\x03', decoded = 
int = 4, hex = 0x4, bytes = b'\x04', decoded = 
int = 5, hex = 0x5, bytes = b'\x05', decoded = 
int = 6, hex = 0x6, bytes = b'\x06', decoded = 
int = 7, hex = 0x7, bytes = b'\x07', decoded = 
int = 8, hex = 0x8, bytes = b'\x08', decoded = 
int = 9, hex = 0x9, bytes = b'\t', decoded = 	
int = 10, hex = 0xa, bytes = b'\n', decoded = 

int = 11, hex = 0xb, bytes = b'\x0b', decoded = 
int = 12, hex = 0xc, bytes = b'\x0c', decoded = 
int = 13, hex = 0xd, bytes = b'\r', decoded = 
int = 14, hex = 0xe, bytes = b'\x0e', decoded = 
int = 15, hex = 0xf, bytes = b'\x0f', decoded = 
int = 16, hex = 0x10, bytes = b'\x10', decoded = 
int = 17, hex = 0x11, bytes = b'\x11', decoded = 
int = 18, hex = 0x12, bytes = b'\x12', decoded = 
int = 19, hex = 0x13, bytes = b'\x13', decoded = 
int = 20, he

In [76]:
x = int('e0a887', 16)
x_b = x.to_bytes(3, byteorder='big')
dec_char = x_b.decode(encoding="UTF-8")

print(f"int = {x}")
print(f"bytes = {x_b}")
print(f"decoded = {dec_char}")

int = 14723207
bytes = b'\xe0\xa8\x87'
decoded = ਇ


We can turn a message made of ASCII letters into bytes (according to the UTF-8 encoding) and then transform it into its binary, hexadecimal or octal form:

In [77]:
from crypto import bytes_to_bin, bytes_to_hex

message = b"simple message"

bin_repr = bytes_to_bin(message, pre="")
hex_repr = bytes_to_hex(message, pre="")

print(f"message:\n{str(message)}\n")
print(f"message in binary:\n{bin_repr}\n")
print(f"message in hexadecimal:\n{hex_repr}\n")

message:
b'simple message'

message in binary:
0111001101101001011011010111000001101100011001010010000001101101011001010111001101110011011000010110011101100101

message in hexadecimal:
73696d706c65206d657373616765
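The `bytes_to_bin` and `bytes_to_hex` helpers come from a `crypto` module whose implementation is not shown here. As a rough standard-library equivalent (an assumption about what they do, based only on the output above), the same representations can be built with `format` and `bytes.hex`, and then converted back to the original message:

```python
message = b"simple message"

# Iterating over a bytes object yields integers in the range 0..255
bin_repr = "".join(format(byte, "08b") for byte in message)  # 8 bits per byte
oct_repr = "".join(format(byte, "03o") for byte in message)  # 3 octal digits per byte
hex_repr = message.hex()                                     # 2 hex digits per byte

print(f"binary:      {bin_repr}")
print(f"octal:       {oct_repr}")
print(f"hexadecimal: {hex_repr}")

# Round trip: rebuild the bytes from the hexadecimal and binary strings
from_hex = bytes.fromhex(hex_repr)
from_bin = bytes(int(bin_repr[i:i + 8], 2) for i in range(0, len(bin_repr), 8))

print(from_hex.decode("utf-8"))  # simple message
```

Using a fixed width per byte (8, 3 or 2 digits) is what makes the strings reversible, since the byte boundaries can be recovered by slicing.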

