<a href="https://colab.research.google.com/github/SCS-Technology-and-Innovation/IntroComp/blob/main/digital.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Digital representation of information

Most of us are used to thinking of numbers in base 10, called the decimal system. To us, 123 means one hundred (**one** *hundred*), plus twenty (**two** *tens*), plus three (**three** *ones*):
$$ 123_{10}  = 1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0.$$



In [1]:
int('123', 10)

123

The math would work just as well with any other number base, with the caveat that any individual digit must be smaller than the base, just as the largest digit is nine in base ten.

Hence, 123 could, for example, be a base 5 number:
$$ 123_5 =  1 \times 5^2 + 2 \times 5^1 + 3 \times 5^0 = 25 + 10 + 3 = 38.$$

In [2]:
int('123', 5)

38
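To see that `int` is only evaluating the positional formula, here is a minimal sketch of that formula written out by hand. The name `positional_value` is hypothetical (it is not a built-in), and the sketch assumes the digits are plain `0` to `9` characters.

```python
def positional_value(digits, base):
  """Evaluate a string of digits 0 to 9 in the given base using the positional formula."""
  total = 0
  # reversed() walks right to left, so the rightmost digit gets power 0
  for power, digit in enumerate(reversed(digits)):
    value = int(digit)
    if value >= base:
      raise ValueError(f'digit {value} is not smaller than the base {base}')
    total += value * base ** power
  return total

print(positional_value('123', 10))  # 123
print(positional_value('123', 5))   # 38, the same as int('123', 5)
```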

Base two is relevant in computing since information is stored in ones and zeroes. A single base-two digit is called a **bit** and numbers represented in base two are known as *binary* numbers.

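For example, $20_{10}$ in binary is $10100_2$, since $20 = 16 + 4$ with $16 = 2^4$ and $4 = 2^2$. Going through the powers of two from $2^4$ down to $2^0$, and writing a 1 for each power that is used and a 0 for each one that is not, gives us the bits: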
$$1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0.$$

The rightmost digit always corresponds to the zeroth power of the base, and each step to the left increases the power by one.
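The procedure used for $20_{10}$ (find the largest power of two that fits, then decide for each smaller power whether it is used) can be sketched as follows. The helper `to_binary_by_powers` is hypothetical and meant for non-negative integers only.

```python
def to_binary_by_powers(n):
  """Write a non-negative integer in binary by testing powers of two from the largest down."""
  # find the largest power of two that is not larger than n
  power = 0
  while 2 ** (power + 1) <= n:
    power += 1
  bits = ''
  # walk back down to 2^0, writing 1 when the power fits in what is left of n
  while power >= 0:
    if 2 ** power <= n:
      bits += '1'
      n -= 2 ** power
    else:
      bits += '0'
    power -= 1
  return bits

print(to_binary_by_powers(20))  # 10100
```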

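Knowing how this works, you can convert decimal values to binary by hand, and vice versa. In Python, `bin` returns the binary representation as a string prefixed with `0b`, and `int` with base 2 converts the digits back: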

In [7]:
decimal = 35
bits = bin(decimal)
binary = bits[2:]
print(binary)
print(int(binary, 2))

100011
35
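Slicing off the `0b` prefix works, but Python's built-in `format` (and f-strings, which use the same format specification) can also produce the digits without the prefix, and can pad them with leading zeroes to a fixed width such as one byte.

```python
decimal = 35
print(format(decimal, 'b'))        # 100011, no prefix to slice off
print(f'{decimal:08b}')            # 00100011, padded with zeroes to 8 bits (one byte)
print(f'{decimal:#b}')             # 0b100011, the '#' option keeps the prefix
print(f'{decimal:o} {decimal:x}')  # 43 23, octal and hexadecimal
```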


Because powers of two are special for computers, bases 8 (**octal**) and 16 (**hexadecimal**) are also commonly used. The former appears, for example, in *Linux file permissions*, whereas the latter is frequently used to express *colors*.
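For file permissions, each octal digit (as in `chmod 755`) is exactly three bits: read, write and execute, for the owner, the group and everyone else. The sketch below, using a hypothetical helper `permissions_to_text`, turns such a mode into the `rwx` notation printed by `ls -l`; it assumes the common three-digit form and ignores special bits such as setuid.

```python
def permissions_to_text(octal_mode):
  """Turn an octal permission string such as '755' into rwx notation."""
  text = ''
  for digit in octal_mode:  # one digit each for owner, group and others
    bits = format(int(digit, 8), '03b')  # one octal digit is three bits, e.g. '5' -> '101'
    for bit, letter in zip(bits, 'rwx'):
      text += letter if bit == '1' else '-'
  return text

print(permissions_to_text('755'))  # rwxr-xr-x
print(permissions_to_text('640'))  # rw-r-----
```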

In [18]:
n = 84
print(bin(n), int(bin(n)[2:], 2))
print(oct(n), int(oct(n)[2:], 8))
print(hex(n), int(hex(n)[2:], 16))

0b1010100 84
0o124 84
0x54 84


Note how binary values get a prefix `0b`, octal ones `0o` and hexadecimal ones `0x`.
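The same prefixes are valid in Python source code, so a number can be written directly in any of the three bases. `int` also accepts a string that still has its prefix, either when the matching base is given or when the base is `0`, which tells it to infer the base from the prefix.

```python
print(0b1010100, 0o124, 0x54)  # 84 84 84, three spellings of the same integer
print(int('0x54', 16))         # 84, the prefix is allowed when it matches the base
print(int('0o124', 0))         # 84, base 0 means "read the base from the prefix"
```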

Experiment with different values for `n` until you get hexadecimal representations that contain *letters*. The letters are actually just digits: since the base is 16, a digit can take any value from 0 to 15, that is, 0 to 9 as in decimal plus the additional **digits** 10, 11, 12, 13, 14 and 15.

Since we have to use a single symbol per digit (so as not to ruin the relationship between the digit position from right to left and the power of the base that it corresponds to), these are coded as `a = 10`, `b = 11`, `c = 12`, `d = 13`, `e = 14` and `f = 15`.  A larger base, such as base 20, would use more letters.
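A quick check of the letter-to-value mapping: `int` reads each letter as a single hexadecimal digit, accepts both lower and upper case, while `hex` writes the letters in lower case.

```python
# each letter is a single-symbol name for a digit value from 10 to 15
for letter in 'abcdef':
  print(letter, int(letter, 16))

print(int('ff', 16), int('FF', 16))  # 255 255, both cases are accepted
print(hex(255))                      # 0xff
```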

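Once lowercase letters run out (bases above 36, since 10 digits plus 26 letters make 36 symbols), another design choice would need to be made, but such high bases are not commonly used; Python's `int`, for instance, only accepts bases up to 36.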



In [22]:
int('hello', 28)

10773556
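Spelling out the base-28 cell confirms that the letters are just digits. The sketch assumes the symbol order `0` to `9` followed by `a` to `z`, which is the order `int` uses, and it also shows that `int` rejects a base above 36 with a `ValueError`.

```python
import string

alphabet = string.digits + string.ascii_lowercase  # '0' to '9', then 'a' to 'z': 36 symbols
word, base = 'hello', 28

values = [alphabet.index(symbol) for symbol in word]
print(values)  # [17, 14, 21, 21, 24]

# the rightmost symbol gets power 0, as with any positional number
total = sum(v * base ** p for p, v in enumerate(reversed(values)))
print(total, total == int(word, base))  # 10773556 True

try:
  int('10', 37)  # there is no 37th symbol, so Python refuses this base
except ValueError as error:
  print('ValueError:', error)
```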

Another very cool thing to notice while experimenting is that if you group the bits of the binary number in triplets, right to left, the values of those triplets correspond to the octal digits because $2^3 = 8$.

In [35]:
n = 497
bits = bin(n)[2:] # skip the prefix
l = len(bits)

print(bits)

segment = 3
while not l % segment == 0:
  bits = '0' + bits # add leading zeroes to make triplets
  l = len(bits)

digits = oct(n)[2:] # skip the prefix here as well
print(digits)

for i in range(0, l, segment):
  print(bits[i : i + segment], digits[i // segment])

111110001
761
111 7
110 6
001 1
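The padding loop can also be written with `str.zfill`, which pads a string with leading zeroes. The hypothetical helper `bits_to_octal` below uses it to build the octal digits straight from the bits, without calling `oct` at all.

```python
def bits_to_octal(bits):
  """Convert a string of binary digits to octal by reading it three bits at a time."""
  width = -(-len(bits) // 3) * 3  # round the length up to a multiple of 3
  bits = bits.zfill(width)        # add leading zeroes to make triplets
  triplets = [bits[i:i + 3] for i in range(0, width, 3)]
  return ''.join(str(int(t, 2)) for t in triplets)  # each triplet is one octal digit

print(bits_to_octal('111110001'))  # 761
```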


Exactly the same thing happens for groups of four bits and the hexadecimal representation, since $2^4 = 16$.

In [38]:
n = 6786
bits = bin(n)[2:] # skip the prefix
l = len(bits)

print(bits)

segment = 4
while not l % segment == 0:
  bits = '0' + bits # add leading zeroes to make quadruplets
  l = len(bits)

digits = hex(n)[2:] # skip the prefix here as well
print(digits)

for i in range(0, l, segment):
  d = digits[i // segment]
  print(bits[i : i + segment], d, int(d, 16))

1101010000010
1a82
0001 1 1
1010 a 10
1000 8 8
0010 2 2
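The grouping also works in the other direction: each hexadecimal digit expands into exactly four bits. A minimal sketch, with `hex_to_bits` as a hypothetical helper name:

```python
def hex_to_bits(hex_digits):
  """Expand each hexadecimal digit into exactly four bits."""
  # format(value, '04b') writes the value in binary, padded to four digits
  return ''.join(format(int(d, 16), '04b') for d in hex_digits)

print(hex_to_bits('1a82'))  # 0001101010000010
```

The result carries the same leading zeroes as the padded quadruplets in the cell above; stripping them gives back `bin(6786)[2:]`.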


Whereas the special bases 2, 8 and 16 have dedicated conversion functions (`bin`, `oct` and `hex`, respectively), converting a decimal number *into* any other base requires crafting our own; the opposite direction is already covered by `int`. It is a bit of math, so it is okay not to understand exactly how it works yet.

The trick is to start with the decimal number as the current value. The modulus (the remainder of dividing the current value by the base) gives the next digit, starting from the rightmost one; for bases above ten, digits above nine are converted into letters. Then divide the current value by the base using integer division, which discards the remainder, and keep going until the current value reaches zero.

In [53]:
import string
digits = string.digits + string.ascii_lowercase # first 0 to 9 followed by a to z

def dec2base(n, b): # assumes a non-negative n and a base b from 2 to 36
  if n == 0: # the loop below would produce an empty string for zero
    return '0'
  result = '' # accumulate digits in the reverse order
  while n > 0: # continue until hitting zero
    result += digits[n % b] # append the digit, since appending is simpler than prepending
    n //= b # integer division: discard the remainder
  return result[::-1] # flip the order of the digits

decimal = 10240985
base = 22
other = dec2base(decimal, base)
print(decimal, other, decimal == int(other, base))

10240985 1lfh17 True
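Tracing the method on the earlier base-5 example makes each step visible: every division yields one digit, starting from the rightmost. The helper `trace_dec2base` is a hypothetical variant of `dec2base` that uses the built-in `divmod` and, to stay short, only handles positive numbers and bases up to 10.

```python
def trace_dec2base(n, b):
  """Print each step of the repeated-division method and return the digits."""
  remainders = []
  while n > 0:
    q, r = divmod(n, b)  # quotient and remainder in one call
    print(f'{n} = {q} x {b} + {r}')
    remainders.append(str(r))  # digits 0 to 9 only; larger bases would need letters
    n = q
  return ''.join(reversed(remainders))  # the remainders come out right to left

print(trace_dec2base(38, 5))
```

This prints `38 = 7 x 5 + 3`, `7 = 1 x 5 + 2` and `1 = 0 x 5 + 1`, followed by `123`: reading the remainders from the bottom up gives $123_5$.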


All **numbers** are stored in a computer's memory as zeroes and ones (bits), physically represented, for example, by opposite states of magnetization or by any other physical or chemical phenomenon that has two distinguishable states (on and off, so to speak).

All **other** information is *encoded* into numbers. Letters (and other characters) become numbers through an encoding such as UTF-8 or ASCII; the former also includes emojis. This is how text is stored.
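A short sketch of text becoming numbers: `ord` gives a character's code, and `str.encode` produces the bytes for a given encoding. The emoji used here (U+1F600, "grinning face") is written as an escape sequence so the example stays plain ASCII.

```python
letter = 'A'
print(ord(letter))                 # 65, the same code in ASCII and in Unicode
print(format(ord(letter), '08b'))  # 01000001, which fits in one byte
print(letter.encode('utf-8'))      # b'A', a single byte in UTF-8

emoji = '\U0001F600'
encoded = emoji.encode('utf-8')
print(len(encoded), encoded.hex())  # 4 f09f9880, four bytes in UTF-8

try:
  emoji.encode('ascii')  # ASCII only covers codes 0 to 127
except UnicodeEncodeError:
  print('no ASCII code for this character')
```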

Images are stored as pixels: a matrix (think of a grid of small squares) with a given width and height (its resolution). The pixel colors are stored as numbers by assigning a numerical value to one or more color channels. Black-and-white pictures only need one bit per pixel, whereas a grayscale picture could use a byte (eight bits) per pixel to represent $2^8 = 256$ shades of gray. A common way of storing color images is to use three channels, R (for red), G (for green) and B (for blue), often with one byte per channel.
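This connects back to hexadecimal colors: a color written as `#ff8800` (the notation used, for example, in HTML and CSS) is simply three bytes, one per channel, and two hexadecimal digits are exactly one byte. The helper `hex_to_rgb` is a hypothetical sketch that assumes the six-digit form.

```python
def hex_to_rgb(color):
  """Split a six-digit color such as '#ff8800' into its red, green and blue values."""
  color = color.lstrip('#')
  # two hexadecimal digits = 8 bits = one byte, so each channel ranges from 0 to 255
  return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb('#ff8800'))  # (255, 136, 0), an orange
```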

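Audio is commonly stored by sampling the sound wave, that is, measuring its amplitude many times per second (44,100 times per second for CD audio) and storing each measurement as a number. Compressed formats such as MP3 additionally apply signal processing related to the Fourier transform to obtain a more compact numerical representation of the sound. A video is typically stored as a sequence of image frames and the corresponding audio track.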
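A minimal sketch of pulse-code modulation (PCM), the plain sampling approach used by uncompressed formats such as WAV: measure the wave at regular intervals and store each measurement as a whole number. The sample rate, the pure 440 Hz tone and the 16-bit sample size are illustrative assumptions, and the result is raw sample bytes only, not a complete audio file.

```python
import math
import struct

sample_rate = 8000           # samples per second (an assumption; CD audio uses 44,100)
frequency = 440              # pitch of the tone in hertz
max_amplitude = 2 ** 15 - 1  # largest value a signed 16-bit sample can hold

# sample the first millisecond of a sine tone, rounding each sample to an integer
samples = [
  round(max_amplitude * math.sin(2 * math.pi * frequency * t / sample_rate))
  for t in range(sample_rate // 1000)
]
print(samples)

# pack the samples as little-endian signed 16-bit integers: two bytes each
data = struct.pack(f'<{len(samples)}h', *samples)
print(len(data))  # 16
```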

When you compress a file, repeated patterns within the bit sequence are replaced with more compact representations, reducing the total size of the digital information.
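Run-length encoding is one of the simplest ways to replace repeated patterns with something more compact; `run_length_encode` below is a hypothetical helper meant only as an illustration. Real formats such as ZIP typically use more elaborate methods like DEFLATE, which Python's standard `zlib` module implements, so the sketch also compresses a repetitive byte string with it.

```python
import zlib
from itertools import groupby

def run_length_encode(bits):
  """Replace each run of a repeated symbol with a (symbol, run length) pair."""
  return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

print(run_length_encode('0000000011110000'))  # [('0', 8), ('1', 4), ('0', 4)]

data = b'01' * 500  # 1000 bytes made of one repeated pattern
print(len(data), len(zlib.compress(data)))  # the compressed version is much smaller
```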

It's all numbers. Binary numbers. Embrace it and you might find cool ways to make your code faster. For example, multiplying a number by two is equivalent to appending a zero to the end of its binary representation (a left shift by one bit, written `n << 1` in Python), with no need for any other kind of math. Multiplying by any other number can be achieved by writing that number as a sum of powers of two, multiplying only by those powers (that is, shifting) and then adding the results together. Following the same logic, exponentiation can be simplified as well, by repeatedly squaring.
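A sketch of both ideas, shift-and-add multiplication and exponentiation by repeated squaring, for non-negative integers. The function names are hypothetical, and in Python itself the built-in `*` and `**` are already optimized, so this illustrates the principle rather than offering a speedup.

```python
def multiply(a, b):
  """Multiply two non-negative integers using only shifts and additions."""
  result = 0
  while b > 0:
    if b & 1:  # the lowest bit of b is 1, so the current a belongs in the sum
      result += a
    a <<= 1    # shifting left appends a zero: a doubles
    b >>= 1    # shifting right drops the lowest bit of b
  return result

def power(base, exponent):
  """Raise base to a non-negative integer exponent by repeated squaring."""
  result = 1
  while exponent > 0:
    if exponent & 1:  # this bit of the exponent contributes the current base
      result *= base
    base *= base      # square: base^(2^k) becomes base^(2^(k+1))
    exponent >>= 1
  return result

print(5 << 1, bin(5), bin(5 << 1))  # 10 0b101 0b1010
print(multiply(13, 11))             # 143
print(power(3, 13))                 # 1594323
```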