# Useless Information: QR Codes

A brief walkthrough on how QR codes are generated.

<style>
  table, td, th {
    border: 1px solid gray;
  }
  th {
    padding-top: 5px;
    padding-right: 10px;
    padding-bottom: 5px;
    padding-left: 10px;
  }
</style>

## Mathematics Refresher

<br>

**Monomial**: $3x^2$ (single term)

**Polynomial**: $4x^6+16x^4+2x^2+5x+1$ (multiple terms or monomials)

<br>

### Logarithms 

$y = \log_b x \implies b^y=x$

Example: $\log_2(8) = 3 \implies 2^3=8$

<br>

#### Multiplication via logarithms - A trivial example with base 2

$128 \times 512 = ?$

$\log_2(128) = 7,\; \log_2(512) = 9$

$128 \times 512 = 2^7 \times 2^9 = 2^{16} = 65536$
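The same trick can be sketched in Python using only the standard `math` module. The helper name `multiply_via_logs` is made up for illustration: it takes the base-2 logarithm of each factor, adds the exponents, and raises 2 to the sum.

```python
import math

def multiply_via_logs(a, b):
    """Multiply two positive numbers by adding their base-2 logarithms."""
    exponent = math.log2(a) + math.log2(b)  # 7 + 9 for 128 and 512
    return 2 ** exponent

product = multiply_via_logs(128, 512)
print(round(product))  # both factors are powers of two, so this prints 65536
```

A slide rule does the same thing mechanically by adding lengths on logarithmic scales.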

<br>
<img src="https://www.math.utah.edu/~alfeld/sliderules/side2half.jpg" alt="A slide rule" width="500" height="150"/>

## Some Fun Facts About QR Codes

(Also known as: I skimmed the Wikipedia page, https://en.wikipedia.org/wiki/QR_code)

- QR code -> Quick Response code
- Invented in 1994 by Masahiro Hara at Denso Wave (a Japanese automotive company)
- A matrix barcode (2D barcode) -> faster to read and stores more data than a 1D barcode
- Error correction (Reed-Solomon) allows damaged QR Codes to still be read

<br>

<img src="https://www.thonky.com/qr-code-tutorial/hello-world-final.png" alt="Hello World QR Code" width="250" height="250"/>
&nbsp;&nbsp;&nbsp;
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/QR_Code_Damaged.jpg/800px-QR_Code_Damaged.jpg" alt="Damaged QR Code" width="250" height="250"/>

<hr>

## Let's Make a QR Code

### Choose an Encoding Mode

| Mode         | Maximum Characters (40-L) | Mode Indicator |
| ------------ | ------------------------- | -------------- |
| Numeric      | 7089 characters           | 0001           |
| Alphanumeric | 4296 characters           | 0010           |
| Byte         | 2953 characters           | 0100           |
| Kanji        | 1817 characters           | 1000           |

In [5]:
MODE_NUMERIC = 1   # 0001
MODE_ALPHANUM = 2  # 0010
MODE_BYTE = 4      # 0100
MODE_KANJI = 8     # 1000

In [37]:
# Declare payload and encode it (byte mode)

# UTF-8 encode -> one 8-bit binary string per byte
# (encoding the whole string keeps multi-byte characters as separate 8-bit words)
def encode_byte_mode(s):
    return [format(byte, '08b') for byte in s.encode('utf-8')]

# convert integer to a zero-padded bit string of width word_size
def int_to_bits(i, word_size):
    return format(i, f'0{word_size}b')

payload = 'https://github.com/barrettotte'
encoded = encode_byte_mode(payload)
encoded_len = len(encoded)
mode = int_to_bits(MODE_BYTE, 4)

print(f"encoded '{payload}' to\n\n{encoded}\n\nsize: {encoded_len} byte(s)")
print(f'mode: {mode}')

encoded 'https://github.com/barrettotte' to

['01101000', '01110100', '01110100', '01110000', '01110011', '00111010', '00101111', '00101111', '01100111', '01101001', '01110100', '01101000', '01110101', '01100010', '00101110', '01100011', '01101111', '01101101', '00101111', '01100010', '01100001', '01110010', '01110010', '01100101', '01110100', '01110100', '01101111', '01110100', '01110100', '01100101']

size: 30 byte(s)
mode: 0100
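As a quick sanity check, the bit strings can be turned back into text. This is a minimal sketch: `decode_byte_mode` is a hypothetical helper that is not part of the notebook, and it assumes every entry is exactly 8 bits wide.

```python
def decode_byte_mode(bit_strings):
    """Turn a list of 8-bit binary strings back into a UTF-8 string."""
    raw = bytes(int(bits, 2) for bits in bit_strings)
    return raw.decode('utf-8')

# first five words of the encoded payload shown above
sample = ['01101000', '01110100', '01110100', '01110000', '01110011']
print(decode_byte_mode(sample))  # prints https
```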


### Choose an Error Correction Level

The higher the error correction level, the fewer characters fit in the code.

| Level | Name     | Recovery | Level Indicator |
| ----- | -------- | -------- | --------------- |
| L     | Low      | 7%       | 01              |
| M     | Medium   | 15%      | 00              |
| Q     | Quartile | 25%      | 11              |
| H     | High     | 30%      | 10              |

In [20]:
ERROR_L = 1  # 01
ERROR_M = 0  # 00
ERROR_Q = 3  # 11
ERROR_H = 2  # 10

err_lvl = ERROR_Q

### Find the Version

Versions 1-40; select the smallest one that fits the payload!

| Version + Error Correction Level | Numeric | Alphanumeric | Byte | Kanji |
| -------------------------------- | ------- | ------------ | ---- | ----- |
| 1-L                              | 41      | 25           | 17   | 10    |
| 1-M                              | 34      | 20           | 14   | 8     |
| 1-Q                              | 27      | 16           | 11   | 7     |
| 1-H                              | 17      | 10           | 7    | 4     |
| ...                              | ...     | ...          | ...  | ...   |
| 3-L                              | 127     | 77           | 53   | 32    |
| 3-M                              | 101     | 61           | 42   | 26    |
| 3-Q                              | 77      | 47           | 32   | 20    |
| 3-H                              | 58      | 35           | 24   | 15    |
| ...                              | ...     | ...          | ...  | ...   |
| 40-L                             | 7089    | 4296         | 2953 | 1817  |
| 40-M                             | 5596    | 3391         | 2331 | 1435  |
| 40-Q                             | 3993    | 2420         | 1663 | 1024  |
| 40-H                             | 3057    | 1852         | 1273 | 784   |


In [17]:
# trimmed down version of capacity table
BYTE_MODE_CAPACITY_LOOKUP = [
    # L, M, Q, H
    [0, 0, 0, 0],       # (one-indexing)
    [17, 14, 11, 7],    # 1
    [32, 26, 20, 14],   # 2
    [53, 42, 32, 24],   # 3
    [78, 62, 46, 34],   # 4
    [106, 84, 60, 44],  # 5
    # and so on...to 40
]

# maps each level indicator value (M=0, L=1, H=2, Q=3) to its column (L, M, Q, H) in the lookup tables
ERROR_IDX_TO_LOOKUP = [1, 0, 3, 2]

In [22]:
# find the smallest version that fits the payload at the given error correction level
def get_version(size, err_lvl):
    err_idx = ERROR_IDX_TO_LOOKUP[err_lvl]
    for version, row in enumerate(BYTE_MODE_CAPACITY_LOOKUP):
        # skip the placeholder row 0; a payload exactly at capacity still fits
        if version > 0 and row[err_idx] >= size:
            return version
    raise Exception("couldn't find version")

# find smallest version for our payload
version = get_version(encoded_len, err_lvl)
assert version == 3  # should be using a 3-Q configuration


### Fetch Error Correction Configuration

The scheme used to break up our encoded bytes into groups/blocks to run through error correction.

In [25]:
EC_CONFIG_LOOKUP = [
    [],  # L                      M                       Q                    H
    [[19, 7, 1, 19, 0, 0], [16, 10, 1, 16, 0, 0], [13, 13, 1, 13, 0, 0], [9, 17, 1, 9, 0, 0]],         # 1
    [[34, 10, 1, 34, 0, 0], [28, 16, 1, 28, 0, 0], [22, 22, 1, 22, 0, 0], [16, 28, 1, 16, 0, 0]],      # 2
    [[55, 15, 1, 55, 0, 0], [44, 26, 1, 44, 0, 0], [34, 18, 2, 17, 0, 0], [26, 22, 2, 13, 0, 0]],      # 3
    [[80, 20, 1, 80, 0, 0], [64, 18, 2, 32, 0, 0], [48, 26, 2, 24, 0, 0], [36, 16, 4, 9, 0, 0]],       # 4
    [[108, 26, 1, 108, 0, 0], [86, 24, 2, 43, 0, 0], [62, 18, 2, 15, 2, 16], [46, 22, 2, 11, 2, 12]],  # 5
    # and so on...to 40
]

In [55]:
def get_ec_config(version, err_lvl):
    return EC_CONFIG_LOOKUP[version][ERROR_IDX_TO_LOOKUP[err_lvl]]

# fetch error correction configuration
ec_config = get_ec_config(version, err_lvl)

print('error correction config:')
print(f'  Total data words                           = {ec_config[0]}')
print(f'  Error correction words per block           = {ec_config[1]}')
print(f'  Number of blocks in group 1                = {ec_config[2]}')
print(f'  Number of data words in each group 1 block = {ec_config[3]}')
print(f'  Number of blocks in group 2                = {ec_config[4]}')
print(f'  Number of data words in each group 2 block = {ec_config[5]}')

error correction config:
  Total data words                           = 34
  Error correction words per block           = 18
  Number of blocks in group 1                = 2
  Number of data words in each group 1 block = 17
  Number of blocks in group 2                = 0
  Number of data words in each group 2 block = 0
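To make the six numbers in each configuration row easier to read, here is a small sketch that gives them names. The `ECConfig` field names and the `total_codewords` helper are hypothetical and chosen only for illustration; the values are the 3-Q and 5-Q rows from the lookup table above.

```python
from collections import namedtuple

# Hypothetical field names for the six values in each configuration row
ECConfig = namedtuple('ECConfig', [
    'data_words', 'ec_words_per_block',
    'g1_blocks', 'g1_block_size',
    'g2_blocks', 'g2_block_size',
])

def total_codewords(cfg):
    """Data words plus the error correction words of every block."""
    blocks = cfg.g1_blocks + cfg.g2_blocks
    return cfg.data_words + cfg.ec_words_per_block * blocks

cfg_3q = ECConfig(34, 18, 2, 17, 0, 0)
cfg_5q = ECConfig(62, 18, 2, 15, 2, 16)

for cfg in (cfg_3q, cfg_5q):
    # the block sizes must add up to the total number of data words
    in_blocks = cfg.g1_blocks * cfg.g1_block_size + cfg.g2_blocks * cfg.g2_block_size
    assert in_blocks == cfg.data_words

print(total_codewords(cfg_3q))  # 34 + 18 * 2 = 70
print(total_codewords(cfg_5q))  # 62 + 18 * 4 = 134
```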


### Fetch Character Count Indicator

The number of bits used to store the payload size depends on the version and encoding mode; larger versions need a wider field.

| Version Range | Numeric | Alphanumeric | Byte    | Kanji   |
| ------------- | ------- | ------------ | ------- | ------- |
| 1-9           | 10 bits | 9 bits       | 8 bits  | 8 bits  |
| 10-26         | 12 bits | 11 bits      | 16 bits | 10 bits |
| 27-40         | 14 bits | 13 bits      | 16 bits | 12 bits |

In [43]:
# is test between low and high (inclusive)?
def is_between(low, high, test):
    return low <= test <= high

# determine character count indicator
def get_count(size, version, mode):
    if int(mode, 2) == MODE_BYTE:
        if is_between(1, 9, version):
            word_size = 8
        elif is_between(10, 26, version):
            word_size = 16
        elif is_between(27, 40, version):
            word_size = 16
        else:
            raise Exception("Invalid version")
    else:
        raise Exception("Only byte mode implemented!")
    return int_to_bits(size, word_size)


count = get_count(encoded_len, version, mode)
capacity = ec_config[0]
capacity_bits = capacity * 8

print(f"size: {encoded_len} byte(s) - char count: {count}")
print(f"version {version} with max capacity of {capacity} byte(s)", end='')
print(f" or {capacity_bits} bit(s)")

size: 30 byte(s) - char count: 00011110
version 3 with max capacity of 34 byte(s) or 272 bit(s)


### Pad the Payload

Before feeding the encoded payload into the error correction algorithm, it needs to be byte/bit padded.

In [48]:
# utility to build string of byte/bit size
def byte_size_str(d):
    size = len(d)
    return f"{size} bit(s) => {size // 8} byte(s), {size % 8} bit(s)"

seg = mode + count + ''.join(encoded)
print('before padding: ' + byte_size_str(seg))

# Add terminator of zeros up to four bits (if there is room)
terminal_bits = 0
while terminal_bits < 4 and len(seg) < capacity_bits:
    seg += '0'
    terminal_bits += 1

# pad bits to nearest byte
while len(seg) % 8 != 0 and len(seg) < capacity_bits:
    seg += '0'

# pad bytes to full capacity (alternating 0xEC and 0x11)
use_EC = True
while len(seg) < capacity_bits:
    seg += int_to_bits(0xEC if use_EC else 0x11, 8)
    use_EC = not use_EC

print(f'after padding:  {byte_size_str(seg)}')
print("seg: {0:0>4X}".format(int(seg, 2)))

assert len(seg) == capacity_bits

before padding: 252 bit(s) => 31 byte(s), 4 bit(s)
after padding:  272 bit(s) => 34 byte(s), 0 bit(s)
seg: 41E68747470733A2F2F6769746875622E636F6D2F626172726574746F7474650EC11
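To see how the padded segment is laid out, the hex string above can be pulled apart again. This is a rough sketch under a few assumptions: `parse_segment` is a hypothetical helper, the segment is in byte mode, and the character count field is 8 bits wide (versions 1-9).

```python
SEG_HEX = '41E68747470733A2F2F6769746875622E636F6D2F626172726574746F7474650EC11'

def parse_segment(seg_hex, total_bits=272, count_bits=8):
    """Split a padded byte-mode segment into mode, count, payload, and padding."""
    bits = bin(int(seg_hex, 16))[2:].zfill(total_bits)
    mode = bits[:4]                              # 4-bit mode indicator
    count = int(bits[4:4 + count_bits], 2)       # character count indicator
    start = 4 + count_bits
    end = start + count * 8
    data = bits[start:end]
    payload = bytes(int(data[i:i + 8], 2) for i in range(0, len(data), 8))
    return mode, count, payload.decode('utf-8'), bits[end:]

mode_bits, char_count, text, padding = parse_segment(SEG_HEX)
print(mode_bits, char_count, text)
print(padding)  # 4 terminator bits, then the 0xEC and 0x11 pad bytes
```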


### Split the Payload

Using the error correction configuration, split the payload into groups and blocks.

In [68]:
# split segment into words (bytes)
code_words = [seg[i: i + 8] for i in range(0, len(seg), 8)]
print(f'total word(s) = {len(code_words)}')

g1_blocks = []  # only two groups
g2_blocks = []  # so we can be lazy

ecw_per_block = ec_config[1]
g1_block_count = ec_config[2]
g1_data_block_size = ec_config[3]
g2_block_count = ec_config[4]
g2_data_block_size = ec_config[5]

print('\nerror correction config:')
print(f'  Total data words                           = {capacity}')
print(f'  Error correction words per block           = {ecw_per_block}')
print(f'  Number of blocks in group 1                = {g1_block_count}')
print(f'  Number of data words in each group 1 block = {g1_data_block_size}')
print(f'  Number of blocks in group 2                = {g2_block_count}')
print(f'  Number of data words in each group 2 block = {g2_data_block_size}')
print('')

# build group 1
cw_idx = 0
while len(g1_blocks) < g1_block_count:
    to_idx = g1_data_block_size * (len(g1_blocks) + 1)
    g1_blocks.append(code_words[cw_idx: to_idx])
    cw_idx += g1_data_block_size
assert len(g1_blocks) == g1_block_count

print(f'group 1 blocks:')
for i, b in enumerate(g1_blocks):
    print(f'\nblock {i}: {b}')

# build group 2
g2_offset = cw_idx
while len(g2_blocks) < g2_block_count:
    to_idx = (g2_data_block_size * (len(g2_blocks) + 1)) + g2_offset
    g2_blocks.append(code_words[cw_idx:to_idx])
    cw_idx += g2_data_block_size
assert len(g2_blocks) == g2_block_count

print(f'\ngroup 2 blocks:')
for i, b in enumerate(g2_blocks):
    print(f'\nblock {i}: {b}')

total word(s) = 34

error correction config:
  Total data words                           = 34
  Error correction words per block           = 18
  Number of blocks in group 1                = 2
  Number of data words in each group 1 block = 17
  Number of blocks in group 2                = 0
  Number of data words in each group 2 block = 0

group 1 blocks:

block 0: ['01000001', '11100110', '10000111', '01000111', '01000111', '00000111', '00110011', '10100010', '11110010', '11110110', '01110110', '10010111', '01000110', '10000111', '01010110', '00100010', '11100110']

block 1: ['00110110', '11110110', '11010010', '11110110', '00100110', '00010111', '00100111', '00100110', '01010111', '01000111', '01000110', '11110111', '01000111', '01000110', '01010000', '11101100', '00010001']

group 2 blocks:


### Reed-Solomon Error Correction

An error correction algorithm that lets a damaged payload still be read correctly.

#### A Very High Level Overview of Galois Fields

A Galois field, or finite field, is a field with a finite number of elements.

A finite field with $p^n$ elements is denoted $\text{GF}(p^n)$, where $p$ is a prime number and $n$ is a positive integer.

Think of a wagon wheel with $p^n$ spokes: every result of the field's arithmetic lands back on one of the spokes.

When $p$ is 2, we can start thinking in binary.
$\text{GF}(2^3) = (0,1,2,3,4,5,6,7)$

- $\text{GF}(8)[0] = 0(2^2) + 0(2^1) + 0(2^0) = 000$
- $\text{GF}(8)[1] = 0(2^2) + 0(2^1) + 1(2^0) = 001$
- $\text{GF}(8)[2] = 0(2^2) + 1(2^1) + 0(2^0) = 010$
- $\text{GF}(8)[3] = 0(2^2) + 1(2^1) + 1(2^0) = 011$
- $\text{GF}(8)[4] = 1(2^2) + 0(2^1) + 0(2^0) = 100$
- $\text{GF}(8)[5] = 1(2^2) + 0(2^1) + 1(2^0) = 101$
- $\text{GF}(8)[6] = 1(2^2) + 1(2^1) + 0(2^0) = 110$
- $\text{GF}(8)[7] = 1(2^2) + 1(2^1) + 1(2^0) = 111$

So, any binary number can be represented as a polynomial with coefficients of 0 or 1, and vice versa.

$10011100 = 1x^7 + 0x^6 + 0x^5 + 1x^4 + 1x^3 + 1x^2 + 0x^1 + 0x^0 = x^7+x^4+x^3+x^2$
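The conversion from bits to polynomial terms can be sketched in a few lines. `to_polynomial` is a hypothetical helper written for this walkthrough; it simply lists the powers of $x$ whose bit is set.

```python
def to_polynomial(value):
    """Render the set bits of a non-negative integer as a polynomial in x."""
    terms = []
    for power in range(value.bit_length() - 1, -1, -1):
        if (value >> power) & 1:
            if power == 0:
                terms.append('1')
            elif power == 1:
                terms.append('x')
            else:
                terms.append(f'x^{power}')
    return ' + '.join(terms) if terms else '0'

print(to_polynomial(0b10011100))  # x^7 + x^4 + x^3 + x^2
```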

<br>

Finite fields are used in cryptographic algorithms since they allow bytes to be easily and rapidly scrambled using polynomial arithmetic.

The Reed-Solomon error correction in QR codes uses $\text{GF}(2^8) = \text{GF}(256)$.

#### Finite Field Arithmetic in $\text{GF}(256)$

<br>

**Addition and subtraction**:

$(x^6+x^4+x+1) + (x^7+x^6+x^3+x) = x^7+x^4+x^3+1$

$01010011 + 11001010 = 10011001 \implies \text{XOR}$

Subtraction is the same XOR, since every element is its own additive inverse. Think of a wagon wheel.
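In code, the example above is a single XOR. This short sketch just replays the two bytes from the equation:

```python
a = 0b01010011  # x^6 + x^4 + x + 1
b = 0b11001010  # x^7 + x^6 + x^3 + x

total = a ^ b   # addition in GF(256): coefficients add modulo 2
print(format(total, '08b'))  # 10011001

# subtraction is the same XOR, so applying b a second time gets back to a
assert total ^ b == a
```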

<br>

**Multiplication**:

I cheat and use a lookup table. The underlying algorithm, though, is [Russian peasant multiplication](https://en.wikipedia.org/wiki/Ancient_Egyptian_multiplication#Russian_peasant_multiplication), which is actually a special case of the algorithm used in ancient Egyptian multiplication.
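For the curious, here is a rough sketch of both approaches. It assumes the reducing polynomial $x^8+x^4+x^3+x^2+1$ (`0x11D`), which is the one QR codes use. The names `gf256_mul`, `gf256_mul_table`, `EXP`, and `LOG` are made up for this example, and the log/antilog tables are just one common form of lookup table, not necessarily the one used later in this notebook.

```python
PRIMITIVE = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

def gf256_mul(a, b):
    """Russian peasant multiplication in GF(256)."""
    result = 0
    while b:
        if b & 1:           # lowest bit of b set: add (XOR) the current a
            result ^= a
        a <<= 1             # double a, i.e. multiply it by x
        if a & 0x100:       # overflowed 8 bits: reduce by the primitive polynomial
            a ^= PRIMITIVE
        b >>= 1             # halve b
    return result

# Lookup tables: EXP[i] is 2^i in the field, LOG is the inverse mapping
EXP = [0] * 255
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value = gf256_mul(value, 2)

def gf256_mul_table(a, b):
    """Multiply by adding logarithms, like the base-2 refresher at the top."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

print(gf256_mul(2, 128))        # 2^8 wraps around to 29
print(gf256_mul_table(2, 128))  # same answer from the tables
```

The table version is the logarithm trick from the refresher: multiplication becomes an addition of exponents followed by one lookup.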

<br>

This is already too in the weeds, so let's go back to code.