# Topic 2, Part 2 - CPA on Hardware AES Implementation

---
NOTE: This lab references some (commercial) training material on [ChipWhisperer.io](https://www.ChipWhisperer.io). You can freely execute and use the lab per the open-source license (including using it in your own courses if you distribute similarly), but you must maintain notice about this source location. Consider joining our training course to enjoy the full experience.

---

**SUMMARY:** *By now you should have a pretty good understanding of how software implementations of AES are vulnerable to CPA attacks. You might be wondering: are hardware implementations of AES also vulnerable to CPA attacks?*

*In this lab, we'll perform a CPA attack on the hardware AES implementation in the STM32F415. We'll also introduce LASCAR for increased performance when analyzing large datasets.*

**LEARNING OUTCOMES:**
* Understanding how leakage differs between software AES and hardware AES implementations
* Using LASCAR for CPA attacks
* Identifying different leakage points

Capture traces as normal. We need to select the HWAES crypto target instead of TINYAES or MBEDTLS. We also don't need as many samples per trace: the whole AES block fits in fewer than 2000 samples. We'll also boost the gain a little, since HWAES doesn't produce power spikes as large as software AES does:

In [1]:
SCOPETYPE = 'OPENADC'
PLATFORM = 'CW308_STM32F4'
CRYPTO_TARGET = 'HWAES'

In [2]:
%%bash -s "$PLATFORM" "$CRYPTO_TARGET"
cd ../../../hardware/victims/firmware/simpleserial-aes
make PLATFORM=$1 CRYPTO_TARGET=$2

Building for platform CW308_STM32F4 with CRYPTO_TARGET=HWAES
SS_VER set to SS_VER_1_1
Blank crypto options, building for AES128
rm -f -- simpleserial-aes-CW308_STM32F4.hex
rm -f -- simpleserial-aes-CW308_STM32F4.eep
rm -f -- simpleserial-aes-CW308_STM32F4.cof
rm -f -- simpleserial-aes-CW308_STM32F4.elf
rm -f -- simpleserial-aes-CW308_STM32F4.map
rm -f -- simpleserial-aes-CW308_STM32F4.sym
rm -f -- simpleserial-aes-CW308_STM32F4.lss
rm -f -- objdir/*.o
rm -f -- objdir/*.lst
rm -f -- simpleserial-aes.s simpleserial.s stm32f4_hal.s stm32f4_hal_lowlevel.s stm32f4_sysmem.s aes-independant.s
rm -f -- simpleserial-aes.d simpleserial.d stm32f4_hal.d stm32f4_hal_lowlevel.d stm32f4_sysmem.d aes-independant.d
rm -f -- simpleserial-aes.i simpleserial.i stm32f4_hal.i stm32f4_hal_lowlevel.i stm32f4_sysmem.i aes-independant.i
.
Welcome to another exciting ChipWhisperer target build!!
arm-none-eabi-gcc.exe (GNU Tools for ARM Embedded Processors 6-2017-q1-update) 6.3.1 20170215 (release) [ARM/embedded-

In file included from .././hal/stm32f4/stm32f4_hal.c:3:0:
 #define STM32F415xx
 
<command-line>:0:0: note: this is the location of the previous definition
In file included from .././hal/stm32f4/stm32f4_hal_lowlevel.c:39:0:
 #define STM32F415xx
 
<command-line>:0:0: note: this is the location of the previous definition


In [2]:
%run "../../Helper_Scripts/Setup_Generic.ipynb"

Serial baud rate = 38400
INFO: Found ChipWhisperer😍


In [3]:
fw_path = '../../../hardware/victims/firmware/simpleserial-aes/simpleserial-aes-{}.hex'.format(PLATFORM)
cw.program_target(scope, prog, fw_path)

Serial baud rate = 115200
Detected known STMF32: STM32F40xxx/41xxx
Extended erase (0x44), this can take ten seconds or more
Attempting to program 4367 bytes at 0x8000000
STM32F Programming flash...
STM32F Reading flash...
Verified flash OK, 4367 bytes
Serial baud rate = 38400


In [56]:
project = cw.create_project("32bit_AES.cwp", overwrite=True)

In [57]:
# Capture traces
from tqdm import trange
import numpy as np
import time

ktp = cw.ktp.Basic()  # key/text pattern generator

N = 15000  # number of traces
scope.adc.samples = 2000  # samples per trace

scope.gain.db = 38  # boosted gain, since HWAES power spikes are small


for i in trange(N, desc='Capturing traces'):
    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here

    trace = cw.capture_trace(scope, target, text, key)
    if trace is None:
        continue
    project.traces.append(trace)

# Number of ADC cycles the trigger was high during the last capture
print(scope.adc.trig_count)

Capturing traces: 100%|██████████████████| 15000/15000 [06:20<00:00, 39.42it/s]

1832





## Introducing LASCAR

With this many traces, analysis with ChipWhisperer Analyzer would take a long time, since Analyzer wasn't designed for performance. For this reason we will use LASCAR, an open-source side-channel analysis library that puts more emphasis on speed than ChipWhisperer Analyzer. Normally, it would take a bit of work to massage ChipWhisperer data into the LASCAR format; however, ChipWhisperer has recently integrated some basic LASCAR support, making it easy to use LASCAR with ChipWhisperer projects. Note that this support is a work in progress and not officially documented, so the interface can change at any time!

Basic setup is as follows:

In [71]:
import chipwhisperer.common.api.lascar as cw_lascar
from lascar import *
# Optional start and end args bound the range of samples used in the analysis
cw_container = cw_lascar.CWContainer(project, project.textouts, start=None, end=None)
guess_range = range(256)

## Leakage Model

Thus far, we've focused exclusively on software AES. In software, each AES operation (ShiftRows, AddRoundKey, MixColumns, etc.) is built from basic operations (XORs, reads/writes, multiplies, etc.) that execute roughly one per clock cycle. With a hardware implementation, it's often possible not only to combine basic operations into a block that runs in a single clock cycle, but also to combine multiple AES operations and run them in a single block! For example, the CW305 FPGA board can run each round of AES in a single clock cycle!

Because of this, running a CPA attack on hardware AES is much trickier than on software AES. In software, we found that it was easy to search for the outputs of the s-boxes because these values would need to be loaded from memory onto a high-capacitance data bus. This is not necessarily true for hardware AES, where the output of the s-boxes may be directly fed into the next stage of the algorithm. In general, we may need some more knowledge of the hardware implementation to successfully complete an attack. That being said, if we take a look at a block diagram of AES:

![](https://wiki.newae.com/images/8/8e/AES_Encryption.png)

the last round jumps out for a few reasons:

* It's not far removed from the ciphertext or the plaintext
* It's got an AddRoundKey and a SubBytes, meaning we get a nonlinear addition of the key between the ciphertext and the input of the round
* There's no MixColumns

Let's make a guess at the implementation and say that it does the last round in a single clock cycle and stores the input and output in the same memory block. The reset assumption that let us use the Hamming weight instead of the Hamming distance in the software attacks probably won't be valid here either. As such, let's use the Hamming distance between the output and the input of the last round.
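
To make the model concrete, here is a minimal pure-Python sketch of a last-round Hamming distance for one ciphertext byte and one key guess. It assumes the standard column-major AES state layout and that the round input and output overwrite each other byte-for-byte in the same register. The names `SHIFT_ROWS_SRC` and `last_round_hd` are made up for this illustration, and whether `lastround_HD_gen` uses exactly this indexing is an assumption.

```python
# For each output byte index, the state index ShiftRows reads from
# (column-major AES state: index = row + 4 * column).
SHIFT_ROWS_SRC = [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]

def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def _build_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = _gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):  # XOR with four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = _build_sbox()
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value

def last_round_hd(ciphertext, key_guess, byte_index):
    """Hamming distance between the round-10 input and output of one state byte."""
    # Undo AddRoundKey and SubBytes to recover one byte of the round input.
    state_before = INV_SBOX[ciphertext[byte_index] ^ key_guess]
    # That byte lived at the ShiftRows source position, which ends up holding
    # the ciphertext byte with the same index.
    state_after = ciphertext[SHIFT_ROWS_SRC[byte_index]]
    return bin(state_before ^ state_after).count("1")
```

Each call needs only the ciphertext and a single key byte guess, which is what makes a byte-by-byte attack on the last round key possible.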

ChipWhisperer now includes a few leakage models for use with LASCAR:

In [72]:
leakage = cw_lascar.lastround_HD_gen

Then, we can actually run the analysis. It should chew through our 15k traces in only a minute or two!

In [73]:
cpa_engines = [CpaEngine("cpa_%02d" % i, leakage(i), guess_range) for i in range(16)]
session = Session(cw_container, engines=cpa_engines).run(batch_size=50)

2020-06-25 13:54:15,231 - lascar.session - INFO - Session Session: 15000 traces, 18 engines, batch_size=50, leakage_shape=(2000,)
INFO:lascar.session:Session Session: 15000 traces, 18 engines, batch_size=50, leakage_shape=(2000,)
Session |100%||15000 trc/15000 | (18 engines, batch_size=50, leakage_shape=(2000,)) |Time:  0:00:50
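
For intuition about what each CPA engine computes, the following is a small self-contained sketch on synthetic data. It is not LASCAR's implementation. It assumes a toy 4-bit cipher step (the PRESENT S-box stands in for AES SubBytes), a Hamming weight leakage with Gaussian noise, and a single sample per trace. All names here (`simulate`, `cpa`, `TOY_SBOX`) are made up for the illustration.

```python
import random

# Toy 4-bit S-box (from the PRESENT cipher), standing in for AES SubBytes.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(true_key, n_traces=2000, noise=0.5, seed=0):
    """Synthetic one-sample 'traces': HW of the S-box output plus noise."""
    rng = random.Random(seed)
    data = [rng.randrange(16) for _ in range(n_traces)]
    samples = [hamming_weight(TOY_SBOX[d ^ true_key]) + rng.gauss(0, noise)
               for d in data]
    return data, samples

def cpa(data, samples, guesses=range(16)):
    """Score every key guess by |correlation| between model and measurement."""
    scores = {}
    for guess in guesses:
        model = [hamming_weight(TOY_SBOX[d ^ guess]) for d in data]
        scores[guess] = abs(pearson(model, samples))
    best = max(scores, key=scores.get)
    return best, scores

data, samples = simulate(true_key=0xB)
best_guess, scores = cpa(data, samples)
print("Best guess: {:X} (|corr| = {:.3f})".format(best_guess, scores[best_guess]))
```

The real analysis does the same thing for 256 guesses per byte and for every one of the 2000 samples in each trace, which is why a library optimized for speed pays off.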


Let's print out our results and plot the correlation of our guesses:

In [89]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()
p = figure()
key_guess = []
for i in range(16):
    # Correlation matrix: one row per key guess, one column per sample
    results = cpa_engines[i].finalize()
    abs_results = abs(results)
    # Pick the guess whose peak absolute correlation is largest
    guess = abs_results.max(1).argmax()
    print("Best Guess is {:02X} (Corr = {})".format(guess, abs_results.max()))
    p.line(range(results.shape[1]), results[guess], color="red")
    key_guess.append(guess)

show(p)

Best Guess is 46 (Corr = 0.09317418248614152)
Best Guess is 14 (Corr = 0.0441912760822443)
Best Guess is F9 (Corr = 0.0676244343932001)
Best Guess is A8 (Corr = 0.04301744339084774)
Best Guess is A2 (Corr = 0.08204123866585493)
Best Guess is EE (Corr = 0.055352669848042844)
Best Guess is 25 (Corr = 0.07524755042227176)
Best Guess is 89 (Corr = 0.04430631852227416)
Best Guess is B9 (Corr = 0.09432552107253776)
Best Guess is 3F (Corr = 0.06062796030617779)
Best Guess is 0C (Corr = 0.06657933737090871)
Best Guess is C8 (Corr = 0.05832810065924525)
Best Guess is BB (Corr = 0.08423980929501468)
Best Guess is 63 (Corr = 0.04881991916738606)
Best Guess is 0C (Corr = 0.04492449427658079)
Best Guess is A6 (Corr = 0.048426521423534846)


ChipWhisperer also includes a class to interpret the results of the analysis:

In [75]:
import chipwhisperer.analyzer as cwa
# Expand the known key from round 0 to round 10 to get the last round key
last_round_key = cwa.aes_funcs.key_schedule_rounds(list(project.keys[0]), 0, 10)
disp = cw_lascar.LascarDisplay(cpa_engines, last_round_key)
disp.show_pge()

| Rank | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46 0.093 | 14 0.044 | F9 0.068 | A8 0.043 | A2 0.082 | EE 0.055 | 25 0.075 | 89 0.044 | B9 0.094 | 3F 0.061 | 0C 0.067 | C8 0.058 | BB 0.084 | 63 0.049 | 0C 0.045 | A6 0.048 |
| 1 | BB 0.090 | 71 0.037 | 32 0.037 | B6 0.041 | 83 0.082 | D7 0.043 | 57 0.041 | 31 0.040 | BB 0.082 | D3 0.039 | F5 0.041 | BC 0.040 | B9 0.083 | 43 0.041 | 7F 0.039 | 4C 0.037 |
| 2 | B9 0.086 | 59 0.036 | 07 0.037 | CE 0.039 | A9 0.078 | C8 0.037 | C1 0.038 | A8 0.037 | A9 0.080 | 8F 0.039 | 50 0.040 | 70 0.040 | A9 0.083 | 48 0.039 | 09 0.038 | 0F 0.036 |
| 3 | 44 0.084 | 6F 0.036 | D1 0.036 | E2 0.038 | EE 0.074 | 47 0.036 | 63 0.037 | 09 0.037 | 56 0.080 | EA 0.038 | 3B 0.039 | 10 0.039 | A2 0.079 | 4D 0.038 | B8 0.037 | B4 0.036 |
| 4 | A2 0.083 | 8C 0.036 | A9 0.036 | 47 0.037 | B9 0.074 | 5A 0.036 | 52 0.037 | 26 0.036 | 44 0.079 | 13 0.037 | 37 0.038 | C5 0.039 | 44 0.077 | BC 0.037 | B0 0.037 | BC 0.036 |
| 5 | EC 0.083 | F6 0.035 | D8 0.036 | 93 0.037 | 56 0.074 | 1F 0.036 | 53 0.036 | AA 0.036 | 83 0.078 | 23 0.036 | DA 0.037 | E1 0.037 | EC 0.077 | 86 0.036 | A0 0.037 | 47 0.035 |
| 6 | A9 0.082 | 18 0.035 | 66 0.035 | 0F 0.036 | 54 0.074 | 6E 0.036 | C8 0.036 | 50 0.036 | 13 0.077 | 20 0.036 | 3E 0.036 | E8 0.037 | 56 0.076 | 06 0.035 | 1C 0.036 | DB 0.035 |
| 7 | 56 0.080 | 4D 0.035 | F7 0.035 | E8 0.036 | B2 0.073 | 03 0.036 | 15 0.035 | FA 0.036 | 5D 0.077 | FF 0.036 | 57 0.036 | 7B 0.036 | AB 0.070 | 73 0.035 | A7 0.035 | C7 0.035 |
| 8 | 54 0.080 | 63 0.034 | 8F 0.035 | FA 0.035 | 99 0.072 | CC 0.035 | F5 0.035 | 41 0.036 | 93 0.074 | 4B 0.035 | 41 0.036 | E0 0.036 | 46 0.070 | 6C 0.035 | 32 0.035 | 8A 0.034 |
| 9 | 13 0.078 | B8 0.034 | 6B 0.035 | 48 0.035 | 11 0.071 | CB 0.035 | 6D 0.035 | 0F 0.036 | B2 0.073 | C2 0.034 | 12 0.036 | 69 0.036 | 54 0.070 | F4 0.035 | 3F 0.034 | F9 0.034 |
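
Each column of the table above is one key byte, and the rows list the ten best guesses for that byte with their peak correlations, best first. PGE (partial guessing entropy) is the position of the correct key byte in that ranking, so a PGE of 0 means the byte was recovered. A minimal sketch of the ranking step, with a made-up helper name `guess_rank` and made-up numbers rather than the lab's data:

```python
def guess_rank(peak_corr_by_guess, correct_value):
    """Position of correct_value when guesses are sorted by descending
    peak absolute correlation (0 means it is the top guess)."""
    ranking = sorted(range(len(peak_corr_by_guess)),
                     key=lambda g: peak_corr_by_guess[g],
                     reverse=True)
    return ranking.index(correct_value)

# Made-up peak correlations for a toy example with only four guesses.
toy_peaks = [0.031, 0.090, 0.045, 0.036]
print(guess_rank(toy_peaks, correct_value=1))  # guess 1 has the largest peak
print(guess_rank(toy_peaks, correct_value=3))  # guesses 1 and 2 rank above guess 3
```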


Interestingly, you should see that the attack has worked fairly well for most of the bytes: all of them, in fact, except bytes 0, 4, 8, and 12. Looking at the correlation plot, you should see two large spikes instead of the single one you might expect. Try focusing the attack on either one of these points by adjusting `start=` and `end=` when creating the `cw_container`, and try answering the following questions:

* Which spike was our expected leakage actually at (last round state diff)?
* How might you be able to tell that the attack failed for certain bytes at the incorrect leakage point?
* Why might this other spike be occurring?
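
One way to pick values for `start=` and `end=` is to locate the largest well-separated peaks in a correlation trace and build a window around each. The helper below is a hypothetical sketch, not part of ChipWhisperer or LASCAR. `spike_windows` and its parameters are made up, and the sample indices in the example are placeholders rather than the spike positions in this lab.

```python
def spike_windows(corr, n_spikes=2, half_width=50, min_separation=100):
    """Return (start, end) sample windows around the n_spikes largest
    peaks of |corr| that are at least min_separation samples apart."""
    magnitudes = [abs(c) for c in corr]
    # Visit sample indices from the largest magnitude to the smallest.
    order = sorted(range(len(magnitudes)),
                   key=lambda i: magnitudes[i], reverse=True)
    peaks = []
    for idx in order:
        if all(abs(idx - p) >= min_separation for p in peaks):
            peaks.append(idx)
        if len(peaks) == n_spikes:
            break
    peaks.sort()
    return [(max(0, p - half_width), min(len(corr), p + half_width))
            for p in peaks]

# Placeholder correlation trace with two artificial spikes.
example = [0.0] * 2000
example[1100] = 0.09
example[1101] = 0.08
example[1400] = -0.07
windows = spike_windows(example)
print(windows)
```

With real data, `corr` could be a row such as `list(results[guess])` from the plotting cell, and each resulting window can be tried as the `start` and `end` arguments when re-creating `cw_container`.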

In [90]:
scope.dis()
target.dis()