# Side-Channel Analysis of the hardware AES co-processor

This notebook demonstrates a Correlation Power Analysis (CPA) attack on the hardware AES implementation in the CC2640R2F.

## Hardware Setup

The required hardware modifications are outlined in the main README of the repository.

* Remove the 3V3 jumper and connect the target side pin to the ChipWhisperer's 3V3 output
* Remove the RESET jumper and connect the target side to the ChipWhisperer's NRST output
* Connect the ChipWhisperer's IO4/TRG to the target's DIO6 pin
* Connect the ChipWhisperer's ground to a ground pin on the target board

In addition to these connections, connect an external bench supply set to ~1.45V to the shunt resistor.
Optionally, you can remove the external 24 MHz oscillator from the development board and provide a clock to the board from the ChipWhisperer. This is explained in more detail in the Preparation section.

## Preparation

In [1]:
import sys
import time
import os
import numpy as np
import chipwhisperer as cw
from tqdm.notebook import tqdm
import serial
import matplotlib.pyplot as plt
from Crypto.Cipher import AES
import itertools  
from bokeh.plotting import figure, show
from bokeh.io import output_notebook, push_notebook
from bokeh.palettes import Dark2_5 as palette
from bokeh.layouts import column
from bokeh.models import Span

ser = 0

The results presented in the paper used synchronous sampling when targeting the CC2640R2F. This notebook allows you to choose between asynchronous and synchronous sampling. 

1. `SYNCHRONOUS = False` and no external clock supplied by the ChipWhisperer --> Asynchronous sampling
  - The firmware will use the external 24 MHz crystal oscillator as the main clock source
  - The main CPU operating frequency will be 48 MHz
  - If the external crystal oscillator has been removed, the board will fall back on the internal RC oscillator

2. `SYNCHRONOUS = True` and an external clock supplied by the ChipWhisperer --> Synchronous sampling
  - The firmware will automatically try to use an externally supplied clock as the main clock source
  - In this notebook we configure the ChipWhisperer to supply a 12 MHz clock on the HS2 pin
  - This means that the CPU operating frequency will be 24 MHz
  - As the main CPU operating frequency is half of what the firmware expects, we have to configure the serial interface at half the baud rate (115200/2 = 57600)
  
Note that for both scenarios we configure the ChipWhisperer Husky to sample at 240 MSPS. If you are using a ChipWhisperer Lite or Pro you will have to modify the sampling rate accordingly.
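
The relationship between the two modes can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: `derive_settings` is a hypothetical helper that mirrors the values used later in this notebook. It assumes that the ADC sample rate equals the ChipWhisperer clock generator frequency multiplied by `adc_mul`, and that the CPU runs at twice the frequency of its main clock source, as described in the two scenarios above.

```python
def derive_settings(synchronous, base_clock_hz=24_000_000, base_baud=115_200, base_adc_mul=10):
    """Derive clock, baud rate and sampling settings for one of the two modes."""
    div = 2 if synchronous else 1
    clkgen_hz = base_clock_hz // div      # clock generated by the ChipWhisperer
    adc_mul = base_adc_mul * div          # scaled so the sample rate stays constant
    cpu_hz = 2 * clkgen_hz                # assumption: CPU runs at 2x the main clock source
    sample_rate_hz = clkgen_hz * adc_mul  # assumption: ADC rate = clkgen frequency * adc_mul
    return {
        "clkgen_hz": clkgen_hz,
        "cpu_hz": cpu_hz,
        "baud": base_baud // div,
        "adc_mul": adc_mul,
        "sample_rate_hz": sample_rate_hz,
        "samples_per_cpu_cycle": sample_rate_hz // cpu_hz,
    }


for synchronous in (False, True):
    name = "synchronous" if synchronous else "asynchronous"
    print(name, derive_settings(synchronous))
```

Under these assumptions both modes sample at 240 MSPS, but the synchronous mode yields twice as many samples per CPU clock cycle because the CPU runs at half the speed.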

  
### Supplying an external clock for synchronous sampling
Admittedly, using synchronous sampling and supplying an external clock to the microcontroller is a bit more involved compared to asynchronous sampling. To supply an external clock we replaced the high frequency oscillator with a pin header. If you go down this road, take care to remove the correct oscillator and provide the external clock on the correct pin (X24M_P). An additional picture showing this modification is provided in the main README of the repository.

![SCA setup](img/sca_setup.jpg)

Remember to connect the ChipWhisperer's HS2 pin to supply the clock to the development board if you set `SYNCHRONOUS` to `True` in the next cell.

In [2]:
SYNCHRONOUS = True

if SYNCHRONOUS:
    div = 2
else:
    div = 1

basebaud = 115200
baseclock = 24e6
clock = baseclock // div
baud = basebaud // div

# Adjust the base adc_mul value if you are not using a ChipWhisperer Husky
adc_mul = 10 * div 

print(clock, baud)

12000000.0 57600


In [3]:
# Connect to the ChipWhisperer and perform some basic initialization

scope = cw.scope()

scope.adc.clear_clip_errors()
scope.adc.samples = 2500
scope.adc.presamples = 0
scope.clock.clkgen_src = 'system'
scope.clock.clkgen_freq = clock
scope.clock.adc_mul = adc_mul
scope.adc.offset= 0
scope.trigger.triggers = "tio4"
scope.adc.basic_mode = "rising_edge"
scope.glitch.enabled = False
scope.gain.db = 24
scope.io.target_pwr = True
scope.io.hs2 = "clkgen"
scope.adc.bits_per_sample = 12
scope.adc.segments = 1

In [4]:
# Connect to the LAUNCHXL-CC2640R2 UART
# You may have to change the serial port ('/dev/ttyACM0')

if ser:
    ser.close()

ser = serial.Serial('/dev/ttyACM0', baud)

In [5]:
# Modify the dslite_path variable to point to your installation of Uniflash
# Running this cell will load the example target firmware
# THIS WILL OVERWRITE THE FIRMWARE ON YOUR LAUNCHXL-CC2640R2

import subprocess
from pathlib import Path

home_dir = str(Path.home()) 
dslite_path = home_dir + '/ti/uniflash_7.0.0/dslite.sh'
erase_cmd = dslite_path + ' --mode cc13xx-cc26xx-mass-erase -d XDS110'
flash_cmd = dslite_path + ' --config ./bin/CC2640R2F.ccxml --flash ./bin/VFI_SCA_CC2640R2.out' 

process = subprocess.Popen(erase_cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output = process.communicate()

if b'Device Unlocked' not in output[0]:
    print('There was an error while trying to erase the microcontroller')
    print(output)
else:
    scope.io.nrst = 'low'
    scope.io.target_pwr = False
    time.sleep(0.1)
    scope.io.target_pwr = True
    scope.io.nrst = 'high'
    
    process = subprocess.Popen(flash_cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output = process.communicate()
    if b'Board Reset Complete' in output[0]:
        print('Target has been flashed!')
    else:
        print('Error flashing target. Check your connections and try again.')

Target has been flashed!


In [6]:
# Simple function to reset the target microcontroller
def reset_dut(delay=0.1):
    scope.io.nrst = 'low'
    scope.io.target_pwr = False
    time.sleep(delay)
    scope.io.target_pwr = True
    scope.io.nrst = 'high'
    time.sleep(0.05)
    ser.reset_input_buffer()  # flushInput() is the deprecated name for this call
    ser.write(b's') # Select the segmented hardware AES function of the firmware

## The interface

The ChipWhisperer supports [segmented memory](https://github.com/newaetech/chipwhisperer-jupyter/blob/master/demos/Using%20Segmented%20Memory%20for%20Hardware%20AES%20(STM32F4).ipynb), which allows us to fill the sample buffer with multiple traces. This can greatly increase the number of traces we can capture per second. To accommodate this feature we implemented the following interface in the firmware.

1. PC --> DUT: `| # AES operations (1 byte) | AES-128 key (16 bytes) | AES-128 initial plaintext (16 bytes) |`
2. The DUT will perform the provided number of AES-128 operations, each time taking the ciphertext from the previous block as the input to the next block
3. DUT --> PC: `| Final AES-128 ciphertext |`

In this case we are providing the key, so we can compute the plaintext and ciphertext associated with each block on the PC end. This minimizes the number of bytes that have to be communicated over serial, which is relatively slow.
Note that you can also use this interface to acquire the side-channel measurements trace by trace by setting the number of AES operations to be performed to 1.

As stated earlier, this is a rather artificial scenario (we rely on trigger signals and we know the key), but it works well for evaluation purposes.
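
To make the framing concrete, here is a minimal sketch of the request message and a rough estimate of the serial transfer time. `build_request` and `uart_seconds` are hypothetical helpers, not part of the firmware or the notebook. The timing assumes 8N1 framing (10 bits on the wire per byte) and ignores the short pauses the notebook inserts between writes. The per-trace alternative used for comparison is made up for illustration.

```python
def build_request(count, key, plaintext):
    """Pack one request: count (1 byte) | key (16 bytes) | initial plaintext (16 bytes)."""
    if not 1 <= count <= 255:
        raise ValueError("count must be non-zero and fit in a single byte")
    if len(key) != 16 or len(plaintext) != 16:
        raise ValueError("key and plaintext must be 16 bytes each")
    return bytes([count]) + bytes(key) + bytes(plaintext)


def uart_seconds(n_bytes, baud, bits_per_byte=10):
    """Transfer time, assuming 8N1 framing (start bit + 8 data bits + stop bit)."""
    return n_bytes * bits_per_byte / baud


request = build_request(100, bytes(range(16)), bytes(16))

# Chained interface: one 33-byte request out, one 16-byte ciphertext back.
chained = uart_seconds(len(request) + 16, 57600)

# Hypothetical alternative: send a plaintext and read a ciphertext for every trace.
per_trace = 100 * uart_seconds(16 + 16, 57600)

print(len(request), "byte request")
print("chained:   %.2f ms" % (chained * 1e3))
print("per trace: %.2f ms" % (per_trace * 1e3))
```

For 100 operations the chained interface moves 49 bytes in total, whereas the hypothetical per-trace exchange would move 3200.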

In [7]:
def encrypt_rand_block(count=1, key=None):
    assert count < 256
    
    ser.reset_input_buffer()

    if key is None:
        # Generate a random key
        key = list(np.random.randint(0,256, size=16))
    else:
        assert len(key) == 16
    
    # Generate a random initial plaintext
    pt = list(np.random.randint(0,256, size=16))
    
    ser.write(bytes([count])) # The number of AES operations to perform
    time.sleep(0.002)
    ser.write(bytes(key)) # The key used for each AES operation
    time.sleep(0.002)
    ser.write(bytes(pt)) # The initial plaintext
    # At this point the development board starts performing the AES operations
    
    # Arrays to store the intermediate plaintexts and ciphertexts
    cts = np.zeros((count, 16), dtype='uint8')
    pts = np.zeros((count, 16), dtype='uint8')
    
    # Calculate the intermediate plaintexts and ciphertexts
    cipher = AES.new(bytes(list(key)), AES.MODE_ECB)
    temp = pt
    for i in range(count):
        ct = cipher.encrypt(bytes(temp))
        pts[i] = temp
        cts[i] = list(ct)
        temp = list(ct)      
    
    # Wait for the development board to send back the final ciphertext
    while ser.in_waiting != 16:
        continue
        
    ct = ser.read(ser.in_waiting)
    # Confirm that the retrieved ciphertext is the same as the calculated one
    assert(list(ct) == list(cts[-1]))
    
    return list(key), pts, cts

In [8]:
reset_dut()

In [9]:
# Perform 5 AES operations with a random (but fixed) key
key, pts, cts = encrypt_rand_block(5)

print('Key:', key)
print('Plaintexts:', pts)
print('Ciphertexts:', cts)

Key: [241, 238, 76, 194, 32, 86, 115, 210, 144, 4, 236, 4, 74, 111, 64, 232]
Plaintexts: [[240 250  79 230 156 193 152 180  35 206  58 214  40 133 198  64]
 [216 116  79 227 217 218 210 210 145 245  73   3 159 170 216 205]
 [ 30 158  81  44  45 244 152 155  29 180 103 247  24  50 117 159]
 [204 182 111 174  74 128  77  97 100  24   3  81  53  95  24 243]
 [165  51  72  85 140 244  56 221  30 251 139  52 107 124  93 242]]
Ciphertexts: [[216 116  79 227 217 218 210 210 145 245  73   3 159 170 216 205]
 [ 30 158  81  44  45 244 152 155  29 180 103 247  24  50 117 159]
 [204 182 111 174  74 128  77  97 100  24   3  81  53  95  24 243]
 [165  51  72  85 140 244  56 221  30 251 139  52 107 124  93 242]
 [230  25 250  59  25  70  41 236  80 214 191 223 245  12 117 109]]


## Acquiring a few traces for visualization

At this point you may want to adjust the gain to ensure that you are using the full ADC range without clipping.
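
One way to judge the gain setting is to look at how much of the ADC range a captured trace occupies. The helper below is a hypothetical sketch. It assumes that integer traces are unsigned and span `0` to `2**bits_per_sample - 1` (4095 for the 12-bit setting used here), and the sample values in the example are made up.

```python
def adc_usage(trace, bits_per_sample=12):
    """Report how much of the ADC range a trace uses and how often it hits the rails."""
    full_scale = (1 << bits_per_sample) - 1  # assumption: unsigned samples, 0..full_scale
    n = len(trace)
    return {
        "range_used": (max(trace) - min(trace)) / full_scale,
        "clipped_low": sum(1 for s in trace if s <= 0) / n,
        "clipped_high": sum(1 for s in trace if s >= full_scale) / n,
    }


# Made-up samples: half of the trace is stuck at the top of the range.
example = [0, 100, 4095, 4095]
print(adc_usage(example))
```

A trace with a small `range_used` suggests that the gain can be increased, while non-zero clipping fractions suggest that it should be lowered.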

In the plot you will likely see that not all traces align nicely. Section 5.1 in the paper provides a strategy to cope with this phenomenon. However, to keep this notebook simple we will not be bothering with splitting the traces into two groups, aligning and standardizing each group and then recombining the traces. Instead we will consider the observed effects to be noise and compensate for that noise by using more traces in the attack.

In [10]:
scope.gain.db = 30

# Feel free to change the samples and offset parameters, but this window should be enough to perform the attack
# Note that these values were selected for the CC2640R2F running from an external 12MHz clock
scope.adc.samples = 800
scope.adc.offset= 1000

scope.adc.clear_clip_errors()
scope.adc.segments = 1

In [11]:
ntraces = 100
traces = np.zeros((ntraces, scope.adc.samples), dtype='uint16')

for i in tqdm(range(ntraces)):
    scope.arm()
    
    #a = scope.adc.trig_count
    key, pt, ct = encrypt_rand_block(count=1)
    #print(scope.adc.trig_count-a) 
    
    scope.capture(poll_done=True)
    traces[i] = scope.get_last_trace(as_int=True)


In [12]:
output_notebook()
p = figure(sizing_mode='scale_width', plot_height=300, plot_width=900)

x_range = range(0, traces.shape[1])
colors = itertools.cycle(palette) 

for i, color in zip(range(5), colors):
    p.line(x_range, traces[i], color=color, legend_label=str(i))

p.line(x_range, np.mean(traces, axis=0), color='blue', legend_label='mean')
    
p.legend.click_policy="hide"
show(p)

## Acquiring traces for attack
Let's acquire 200,000 traces, just to make sure that the attack will work. This also shows the power of the segmented memory, as acquiring that many traces takes only two minutes!

Before we start the actual acquisition we have to configure the segmented memory feature.
First, we have to figure out how many AES operations fit in the ChipWhisperer's sample buffer.
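
The number of segments per capture is constrained in three ways: the segments have to fit in the sample buffer, the count has to fit in the single byte of the serial interface, and (for the acquisition loop used below) it has to divide the total number of traces. The following sketch picks the largest value that satisfies all three. `pick_framesize` is a hypothetical helper, and the value 122 is the buffer limit printed by the next cell for 800 samples per segment.

```python
def pick_framesize(ntraces, max_segments, max_count=255):
    """Largest segment count that fits the buffer, fits one byte, and divides ntraces."""
    limit = min(max_segments, max_count)
    for candidate in range(limit, 0, -1):
        if ntraces % candidate == 0:
            return candidate
    raise ValueError("no valid frame size: the buffer holds less than one segment")


print(pick_framesize(200_000, 122))  # → 100
```

This matches the frame size of 100 that the notebook uses for the acquisition.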

In [13]:
buffer_size = scope.adc.oa.hwMaxSegmentSamples
max_operations = buffer_size // scope.adc.samples

print(max_operations)

122


In [14]:
framesize = 100 # we will fit 100 trace segments in the CW memory
scope.adc.segments = framesize

In [15]:
ntraces = 200000
assert(ntraces % framesize == 0)
nframes = ntraces // framesize

# The key we will try to recover later
targetkey = list(np.random.randint(0,256, size=16))

traces = np.zeros((ntraces, scope.adc.samples), dtype='uint16')
plaintexts = np.zeros((ntraces, 16), dtype='uint8')
ciphertexts = np.zeros((ntraces, 16), dtype='uint8')

for i in tqdm(range(nframes)):
    scope.arm()

    keys, pts, cts = encrypt_rand_block(count=framesize, key=targetkey)
    plaintexts[(i*framesize):(i*framesize)+framesize,:] = pts
    ciphertexts[(i*framesize):(i*framesize)+framesize,:] = cts
    
    scope.capture(poll_done=True)
    trace = scope.get_last_trace(as_int=True)
    
    traces[(i*framesize):(i*framesize)+framesize,:] = trace.reshape((framesize, scope.adc.samples))


In [16]:
output_notebook()
p = figure(sizing_mode='scale_width', plot_height=300, plot_width=900)

x_range = range(0, traces.shape[1])
colors = itertools.cycle(palette) 

for i, color in zip(range(5), colors):
    t = np.random.randint(traces.shape[0])
    p.line(x_range, traces[t], color=color, legend_label=str(i))

p.line(x_range, np.mean(traces, axis=0), color='blue', legend_label='mean')
    
p.legend.click_policy="hide"
show(p)

## Recover one key byte
For the attack phase we will use the [eShard Scared](https://gitlab.com/eshard/scared) library, mainly for its numpy based progressive CPA implementation. This will help us to process the 200,000 traces in no time.

As documented in the paper, we found that this specific hardware implementation leaks the Hamming distance between the output of the AddRoundKey operation in round 9 and the ciphertext. This is a fairly common leakage model; in fact, it is already covered by a [ChipWhisperer tutorial](https://github.com/newaetech/chipwhisperer-jupyter/blob/master/courses/sca201/Lab%202_2%20-%20CPA%20on%20Hardware%20AES%20Implementation.ipynb). I suggest reading through that tutorial if you want to understand the leakage model in more detail.
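
The leakage model can be written down in plain Python for a single trace. The sketch below is a standalone illustration and does not use the Scared library: the S-box is generated from its mathematical definition, and `SHIFT_SRC` is a hand-written ShiftRows index table that assumes the usual column-major state layout. The test data is the final round of the cipher example in FIPS-197 Appendix B.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, shift):
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index

# Assumption: column-major state. Entry b is the state position whose round-9
# content ends up, after SubBytes and ShiftRows, in ciphertext byte b.
SHIFT_SRC = [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]


def hd_label(ct, b, k):
    """Hypothesised Hamming distance for ciphertext byte b and key byte guess k."""
    before = INV_SBOX[ct[b] ^ k]  # round-9 state byte, if the guess k is correct
    after = ct[SHIFT_SRC[b]]      # ciphertext byte that overwrites that position
    return bin(before ^ after).count("1")


def last_round(state9, k10):
    """SubBytes, ShiftRows and AddRoundKey of the final AES-128 round."""
    return [SBOX[state9[SHIFT_SRC[b]]] ^ k10[b] for b in range(16)]


state9 = list(bytes.fromhex("eb40f21e592e38848ba113e71bc342d2"))
k10 = list(bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6"))
ct = last_round(state9, k10)
assert bytes(ct) == bytes.fromhex("3925841d02dc09fbdc118597196a0b32")

# With the correct key byte, the label equals the true register Hamming distance.
for b in range(16):
    true_hd = bin(state9[SHIFT_SRC[b]] ^ ct[SHIFT_SRC[b]]).count("1")
    assert hd_label(ct, b, k10[b]) == true_hd
```

The attack computes this label for all 256 values of `k` and for every trace, which is what the vectorized NumPy expression in the following cells does.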

In [17]:
import scared

# A numpy array to perform Hamming weight lookups
HW = np.array([bin(n).count("1") for n in range(0,256)], dtype='uint8')

# Loading a few AES constants from the Scared library
inv_sbox = scared.aes.base.INV_SBOX # You can also use the scared.aes.base.inv_sub_bytes function
undo_invshift = scared.aes.base.SHIFT_ROWS

In [18]:
b = 0 # Change this value to attack a different key byte

# Create an array of labels, one column for each possible value of the key byte
labels = np.zeros((traces.shape[0], 256), dtype='uint8')

# Compute the labels for each partial key guess
for k in range(labels.shape[1]):
    labels[:, k] = HW[ciphertexts[:, undo_invshift[b]] ^ inv_sbox[ciphertexts[:,b] ^ k]]

In [19]:
# The CPA implementation provided by scared is progressive so we can add batches of traces to the CPADistinguisher

d = scared.CPADistinguisher()
batchSize = 5000

for i in tqdm(range(traces.shape[0]//batchSize)):
    tracesBatch = traces[i*batchSize:i*batchSize+batchSize]
    labelsBatch = labels[i*batchSize:i*batchSize+batchSize]
    d.update(tracesBatch, labelsBatch)
    
cors = d.compute()

keyguess = np.argmax(np.max(np.abs((cors)), axis=1))


At this point we have an array that contains 256 correlation traces, one for each possible value of the key.
The correct key guess is most likely the one that results in the highest absolute correlation value. The following plot overlays all correlation traces and plots the most likely key guess in red.

Hopefully you will see that the red correlation curve is clearly distinguishable from all others.
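
To illustrate what a progressive CPA does internally, here is a toy version in plain Python. It is a sketch under several simplifying assumptions and not the Scared implementation: each trace consists of a single sample, the leakage is simulated as Hamming weight plus Gaussian noise, and `TOY_SBOX` is a random permutation standing in for the real AES tables. `RunningCorrelation` is a hypothetical class that keeps only running sums, so traces can be added batch by batch.

```python
import math
import random


class RunningCorrelation:
    """Accumulates the sums needed for a Pearson correlation, batch by batch."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, xs, ys):
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.syy += y * y
            self.sxy += x * y

    def compute(self):
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
HW = [bin(v).count("1") for v in range(256)]
TOY_SBOX = list(range(256))  # toy non-linear table, NOT the AES S-box
rng.shuffle(TOY_SBOX)

# Simulated measurements: one noisy sample per trace.
secret = 0xA7
inputs = [rng.randrange(256) for _ in range(600)]
samples = [HW[TOY_SBOX[x ^ secret]] + rng.gauss(0, 1.0) for x in inputs]


def toy_cpa(inputs, samples, batch_size=200):
    """Return the best key guess and the absolute correlation of every guess."""
    scores = []
    for guess in range(256):
        acc = RunningCorrelation()
        for start in range(0, len(inputs), batch_size):
            batch = inputs[start:start + batch_size]
            labels = [HW[TOY_SBOX[x ^ guess]] for x in batch]
            acc.update(labels, samples[start:start + batch_size])
        scores.append(abs(acc.compute()))
    return max(range(256), key=scores.__getitem__), scores


recovered, scores = toy_cpa(inputs, samples)
print("recovered 0x%02X, secret 0x%02X" % (recovered, secret))
```

Because only the running sums are stored, memory use does not grow with the number of traces. The real attack applies the same idea to every sample point of the trace at once.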

In [21]:
output_notebook()
p = figure(sizing_mode='scale_width', plot_height=300, plot_width=900)

x_range = range(0, traces.shape[1])
colors = itertools.cycle(palette) 

for i in range(256):
    if i == keyguess:
        p.line(x_range, np.abs(cors[keyguess,:]), color='red', legend_label=str(keyguess))
    else:
        p.line(x_range, np.abs(cors[i,:]), color='gray')

p.legend.click_policy="hide"
show(p)

## Recovering the full key

At this point we believe we have recovered the first key byte of the last round key. To recover the full key we have to repeat the attack 16 times, once for each key byte.

In [22]:
results = np.zeros((16,256))
keyguess = np.zeros((16), dtype='uint8')

batchSize = 5000

for b in tqdm(range(16)):
    if b == 0:
        print('index\tguess\tcorrelation difference')
    
    # Compute labels
    labels = np.zeros((traces.shape[0], 256), dtype='uint8')
    for k in range(labels.shape[1]):
        labels[:, k] = HW[ciphertexts[:, undo_invshift[b]] ^ inv_sbox[ciphertexts[:,b] ^ k]]
    
    # Compute correlation
    d = scared.CPADistinguisher()
    for i in range(traces.shape[0]//batchSize):
        tracesBatch = traces[i*batchSize:i*batchSize+batchSize]
        labelsBatch = labels[i*batchSize:i*batchSize+batchSize]
        d.update(tracesBatch, labelsBatch)
    cors = np.abs(d.compute())
    
    # Make a key guess and provide some statistics
    results[b,:] = np.max(cors, axis=1)
    ksort = np.argsort(np.max(cors, axis=1))[::-1]
    kguess_cor = np.max(cors[ksort[0],:])
    runnerup_cor = np.max(cors[ksort[1],:])
    
    keyguess[b] = ksort[0]
    
    print('%d\t%d\t%f' % (b, ksort[0], kguess_cor - runnerup_cor))


index	guess	correlation difference
0	244	0.016284
1	200	0.017090
2	25	0.012659
3	3	0.009674
4	9	0.020025
5	18	0.013968
6	108	0.011804
7	247	0.015365
8	181	0.015608
9	42	0.009456
10	205	0.040275
11	236	0.023644
12	217	0.002489
13	173	0.022615
14	109	0.038697
15	197	0.022731


In [23]:
print(keyguess)

[244 200  25   3   9  18 108 247 181  42 205 236 217 173 109 197]


At this point we have a full key guess, so let's check if our key guess is correct.
Recall that we attacked the last round of the AES operation and thus obtained a key guess for the last round.
Luckily we can reverse the key schedule and obtain the first round key.
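
For AES-128 the key schedule is invertible, because every round key is computed from the previous one with XOR operations and a single S-box lookup per byte of one word. The following standalone sketch walks the schedule backwards without relying on Scared. `first_round_key` is a hypothetical helper, and the check uses the key expansion example from FIPS-197 Appendix A.1.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, shift):
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def first_round_key(last_round_key):
    """Walk the AES-128 key schedule backwards from round key 10 to round key 0."""
    rk = list(last_round_key)
    for rnd in range(10, 0, -1):
        prev = [0] * 16
        # Words 1 to 3: each word is the XOR of two adjacent words of the later key.
        for i in range(4, 16):
            prev[i] = rk[i] ^ rk[i - 4]
        # Word 0: undo the RotWord / SubWord / Rcon step applied to word 3.
        temp = [SBOX[prev[13]], SBOX[prev[14]], SBOX[prev[15]], SBOX[prev[12]]]
        temp[0] ^= RCON[rnd - 1]
        for i in range(4):
            prev[i] = rk[i] ^ temp[i]
        rk = prev
    return rk


rk10 = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")
master = bytes(first_round_key(rk10))
assert master == bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(master.hex())
```

Note that this only works so directly for AES-128, where a single round key contains as many bytes as the master key.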

The key used to acquire the side-channel traces is stored in the `targetkey` variable.

In [24]:
roundkeys = scared.aes.base.inv_key_schedule(keyguess, round_in=10)
print(roundkeys[0,0])
print(targetkey)

[ 52 222 106 159 151  35 187 176 197 100  65   2 193 170  30 144]
[52, 222, 106, 159, 151, 35, 187, 176, 197, 100, 65, 2, 193, 170, 30, 144]


In some cases the first round key obtained from our key guess and the actual `targetkey` are not the same, so the attack failed.
This doesn't mean that all of our key byte guesses are wrong though! A single incorrectly guessed key byte can cause this. While doing these experiments the first time around we noticed that key byte 12 is particularly difficult to recover with this attack. You can also see from the output of the attack (under correlation difference) that the guess for key byte 12 has the lowest 'confidence'.

We could try to improve the attack in several ways, but if one or a few key bytes are guessed incorrectly we might as well try all possible values. Alternatively you could look into key enumeration strategies, but we won't be needing those here.
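
A simple way to decide which bytes to brute-force is to rank them by the correlation difference between the best and the second-best guess. The sketch below uses the differences printed above. `weakest_bytes` and `candidate_keys` are hypothetical helpers, and this ranking is only a heuristic, not a full key enumeration algorithm.

```python
import itertools


def weakest_bytes(margins, n=1):
    """Indices of the n key bytes with the smallest correlation margin."""
    return sorted(range(len(margins)), key=margins.__getitem__)[:n]


def candidate_keys(keyguess, positions):
    """Yield every key obtained by replacing the given positions with all byte values."""
    for values in itertools.product(range(256), repeat=len(positions)):
        candidate = list(keyguess)
        for position, value in zip(positions, values):
            candidate[position] = value
        yield candidate


# Correlation differences as printed by the full-key attack above.
margins = [0.016284, 0.017090, 0.012659, 0.009674, 0.020025, 0.013968, 0.011804, 0.015365,
           0.015608, 0.009456, 0.040275, 0.023644, 0.002489, 0.022615, 0.038697, 0.022731]
keyguess = [244, 200, 25, 3, 9, 18, 108, 247, 181, 42, 205, 236, 217, 173, 109, 197]

positions = weakest_bytes(margins, n=1)
print(positions)  # → [12]
print(sum(1 for _ in candidate_keys(keyguess, positions)), "candidates")
```

The search space grows as 256 to the power of the number of uncertain bytes, so this is only practical for a handful of bytes. In a setting where the key is unknown, each candidate would be checked against a known plaintext/ciphertext pair instead of `targetkey`.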

In [26]:
# if the keyguess for byte 12 had been wrong...
keyguess[12] = ~keyguess[12]
roundkeys = scared.aes.base.inv_key_schedule(keyguess, round_in=10)
print(roundkeys[0,0])
print(targetkey)

[136 129  97 115 151  48 146 219 197 121 112   2  24 124 119 137]
[52, 222, 106, 159, 151, 35, 187, 176, 197, 100, 65, 2, 193, 170, 30, 144]


In [27]:
for k in range(256):
    keyguess[12] = k
    roundkeys = scared.aes.base.inv_key_schedule(keyguess, round_in=10)
    if list(roundkeys[0,0]) == list(targetkey):
        print('Key recovered!')
        print('Last round key:', keyguess)
        print('First round key:', roundkeys[0,0])

Key recovered!
Last round key: [244 200  25   3   9  18 108 247 181  42 205 236 217 173 109 197]
First round key: [ 52 222 106 159 151  35 187 176 197 100  65   2 193 170  30 144]
