# Topic 2, Part 2 - CPA on Hardware AES Implementation

---
NOTE: This lab references some (commercial) training material on [ChipWhisperer.io](https://www.ChipWhisperer.io). You can freely execute and use the lab per the open-source license (including using it in your own courses if you distribute similarly), but you must maintain notice about this source location. Consider joining our training course to enjoy the full experience.

---

**SUMMARY:** *By now you should have a pretty good understanding of how software implementations of AES are vulnerable to CPA attacks. You might be wondering: are hardware implementations of AES also vulnerable to CPA attacks?*

*In this lab, we'll perform a CPA attack on the hardware AES implementation in the STM32F415. We'll also introduce LASCAR for increased performance when analyzing large datasets.*

**LEARNING OUTCOMES:**
* Understanding how leakage differs between software AES and hardware AES implementations
* Using LASCAR for CPA attacks
* Identifying different leakage points

Note: with the usual gain setting, this attack requires about 19k traces.

Capture traces as normal. We'll need to select the HWAES crypto target instead of TINYAES or MBEDTLS. We also don't need to capture as many samples per trace - the whole AES operation fits in fewer than 2000 samples. We'll boost the gain a little as well, since HWAES doesn't produce power spikes as large as the software implementations do:

In [1]:
SCOPETYPE = 'OPENADC'
PLATFORM = 'CW308_STM32F4'
CRYPTO_TARGET = 'HWAES'

In [2]:
%%bash -s "$PLATFORM" "$CRYPTO_TARGET"
cd ../../../hardware/victims/firmware/simpleserial-aes
make PLATFORM=$1 CRYPTO_TARGET=$2

Building for platform CW308_STM32F4 with CRYPTO_TARGET=HWAES
SS_VER set to SS_VER_1_1
Blank crypto options, building for AES128
rm -f -- simpleserial-aes-CW308_STM32F4.hex
rm -f -- simpleserial-aes-CW308_STM32F4.eep
rm -f -- simpleserial-aes-CW308_STM32F4.cof
rm -f -- simpleserial-aes-CW308_STM32F4.elf
rm -f -- simpleserial-aes-CW308_STM32F4.map
rm -f -- simpleserial-aes-CW308_STM32F4.sym
rm -f -- simpleserial-aes-CW308_STM32F4.lss
rm -f -- objdir/*.o
rm -f -- objdir/*.lst
rm -f -- simpleserial-aes.s simpleserial.s stm32f4_hal.s stm32f4_hal_lowlevel.s stm32f4_sysmem.s aes-independant.s
rm -f -- simpleserial-aes.d simpleserial.d stm32f4_hal.d stm32f4_hal_lowlevel.d stm32f4_sysmem.d aes-independant.d
rm -f -- simpleserial-aes.i simpleserial.i stm32f4_hal.i stm32f4_hal_lowlevel.i stm32f4_sysmem.i aes-independant.i
.
Welcome to another exciting ChipWhisperer target build!!
arm-none-eabi-gcc.exe (GNU Tools for ARM Embedded Processors 6-2017-q1-update) 6.3.1 20170215 (release) [ARM/embedded-

In file included from .././hal/stm32f4/stm32f4_hal.c:3:0:
 #define STM32F415xx
 
<command-line>:0:0: note: this is the location of the previous definition
In file included from .././hal/stm32f4/stm32f4_hal_lowlevel.c:39:0:
 #define STM32F415xx
 
<command-line>:0:0: note: this is the location of the previous definition


In [2]:
%run "../../Helper_Scripts/Setup_Generic.ipynb"

Serial baud rate = 38400
INFO: Found ChipWhisperer😍


In [3]:
fw_path = '../../../hardware/victims/firmware/simpleserial-aes/simpleserial-aes-{}.hex'.format(PLATFORM)
cw.program_target(scope, prog, fw_path)

Serial baud rate = 115200
Detected known STMF32: STM32F40xxx/41xxx
Extended erase (0x44), this can take ten seconds or more
Attempting to program 4367 bytes at 0x8000000
STM32F Programming flash...
STM32F Reading flash...
Verified flash OK, 4367 bytes
Serial baud rate = 38400


In [56]:
project = cw.create_project("32bit_AES.cwp", overwrite=True)

In [57]:
# Capture traces
from tqdm import trange
import numpy as np

ktp = cw.ktp.Basic()

N = 15000  # Number of traces
scope.adc.samples = 2000  # the whole hardware AES operation fits in under 2000 samples

scope.gain.db = 38  # boosted gain, since HWAES produces smaller power spikes

for i in trange(N, desc='Capturing traces'):
    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here

    trace = cw.capture_trace(scope, target, text, key)
    if trace is None:
        continue  # skip failed captures
    project.traces.append(trace)

# Number of samples for which the trigger was active during the last capture
print(scope.adc.trig_count)

Capturing traces: 100%|██████████████████| 15000/15000 [06:20<00:00, 39.42it/s]

1832





## Introducing LASCAR

Given how many traces we're capturing, analyzing them with ChipWhisperer Analyzer would take a long time, since Analyzer wasn't designed for performance. For this reason we will be using LASCAR, an open source side channel analysis library that puts a bigger emphasis on speed than ChipWhisperer Analyzer does. Normally, it would take a bit of work to massage ChipWhisperer data into the LASCAR format; however, ChipWhisperer has recently integrated some basic LASCAR support, making it easy to combine LASCAR and ChipWhisperer projects! Note that this support is a WIP and not officially documented - the interface can change at any time!

Basic setup is as follows:

In [58]:
import chipwhisperer.common.api.lascar as cw_lascar
from lascar import *
cw_container = cw_lascar.CWContainer(project, project.textouts, 1140, 1160) #optional start and end args set start and end points for analysis
guess_range = range(256)

## Leakage Model

Thus far, we've focused exclusively on software AES. In software, each AES operation (ShiftRows, AddRoundKey, MixColumns, etc.) is carried out as a sequence of basic operations (XOR, reads/writes, multiplies, etc.), one per clock cycle. With a hardware implementation, it's often possible not only to combine basic operations into a block that runs in a single clock cycle, but also to combine multiple AES operations and run them in a single block! For example, the CW305 FPGA board can run each round of AES in a single clock cycle!

Because of this, running a CPA attack on hardware AES is much trickier than on software AES. In software, we found that it was easy to search for the outputs of the s-boxes because these values would need to be loaded from memory onto a high-capacitance data bus. This is not necessarily true for hardware AES, where the output of the s-boxes may be directly fed into the next stage of the algorithm. In general, we may need some more knowledge of the hardware implementation to successfully complete an attack. That being said, if we take a look at a block diagram of AES:

![](https://wiki.newae.com/images/8/8e/AES_Encryption.png)

the last round jumps out for a few reasons:

* It's not far removed from the ciphertext, which we already know
* It's got an AddRoundKey and a SubBytes, meaning we get a nonlinear addition of the key between the ciphertext and the input of the round
* There's no MixColumns

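Let's make a guess at the implementation and say that it does the last round in a single clock cycle and stores the input and output in the same memory block. Under that guess, a natural leakage model is the Hamming distance between the S-box input and output, which is what `sboxInOut_HD_gen` in the next cell provides: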

In [64]:
leakage = cw_lascar.sboxInOut_HD_gen
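
To make the guessed leakage model concrete, here is a minimal pure-Python sketch of a last-round Hamming-distance model. It is not the source of `cw_lascar.sboxInOut_HD_gen`; the names `sbox_in_out_hd_gen`, `SBOX` and `INV_SBOX` are hypothetical, and the sketch assumes that each ciphertext byte overwrites the state byte at the same index (that is, it ignores ShiftRows).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            # XOR in the inverse rotated left by 1, 2, 3 and 4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index


def sbox_in_out_hd_gen(byte_index):
    """Return a leakage function for one key byte (hypothetical helper)."""
    def leakage(ciphertext, guess):
        ct_byte = ciphertext[byte_index]
        # Undo AddRoundKey and SubBytes to get the state before the last round
        state_before = INV_SBOX[ct_byte ^ guess]
        # ASSUMPTION: the ciphertext byte overwrites that state byte in place
        return bin(state_before ^ ct_byte).count("1")
    return leakage
```

A leakage function built this way takes the known ciphertext and a guess for one byte of the last round key, and predicts how many bits flip in the register when the round completes.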

In [65]:
cpa_engines = [CpaEngine("cpa_%02d" % i, leakage(i), guess_range) for i in range(16)]
session = Session(cw_container, engines=cpa_engines).run(batch_size=50)

2020-06-25 13:04:36,798 - lascar.session - INFO - Session Session: 15000 traces, 18 engines, batch_size=50, leakage_shape=(20,)
INFO:lascar.session:Session Session: 15000 traces, 18 engines, batch_size=50, leakage_shape=(20,)
Session |100%||15000 trc/15000 | (18 engines, batch_size=50, leakage_shape=(20,)) |Time:  0:00:10
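
Conceptually, each CPA engine correlates the predicted leakage with the measured samples for every key guess. LASCAR does this incrementally over batches of traces; the following is only a plain-Python sketch of the underlying statistic at a single sample point, with the hypothetical names `pearson` and `correlate_guesses`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no information
    return cov / math.sqrt(var_x * var_y)


def correlate_guesses(model, values, samples, guesses=range(256)):
    """Correlate the modelled leakage with one sample point for each guess.

    model(value, guess) predicts the leakage for one trace,
    values holds the known data (e.g. ciphertext bytes) per trace,
    samples holds the measured power at one time index per trace.
    """
    return [pearson([model(v, g) for v in values], samples) for g in guesses]
```

Repeating this for every sample in the analysis window gives one row of correlations per key guess, which is the shape of the result inspected in the next cell.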


In [66]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()
p = figure()
key_guess = []
for i in range(16):
    # Correlation results: one row per key guess, one column per sample
    results = cpa_engines[i].finalize()
    sample_idx = range(len(results[0]))
    # Best guess is the row with the largest peak absolute correlation
    guess = abs(results).max(1).argmax()
    print("Best Guess is {:02X} (Corr = {})".format(guess, abs(results).max()))
    p.line(sample_idx, results[guess], color="red")
    key_guess.append(guess)

show(p)

Best Guess is 6B (Corr = 0.03762624551445041)
Best Guess is CC (Corr = 0.03487534689737563)
Best Guess is C1 (Corr = 0.032644450776893395)
Best Guess is 4F (Corr = 0.033203414741610045)
Best Guess is 59 (Corr = 0.0329549415243069)
Best Guess is 1D (Corr = 0.03947250867842839)
Best Guess is 52 (Corr = 0.039308554620955054)
Best Guess is 7A (Corr = 0.03366780365339469)
Best Guess is B0 (Corr = 0.03608062434073711)
Best Guess is 3D (Corr = 0.030201100111510056)
Best Guess is 54 (Corr = 0.03127024507727141)
Best Guess is 02 (Corr = 0.029886033325560155)
Best Guess is 06 (Corr = 0.030932734035530183)
Best Guess is 85 (Corr = 0.03248011022736156)
Best Guess is 8D (Corr = 0.03319938237937852)
Best Guess is D7 (Corr = 0.033099765515097454)


In [67]:
key_guess

[107, 204, 193, 79, 89, 29, 82, 122, 176, 61, 84, 2, 6, 133, 141, 215]

In [68]:
import chipwhisperer.analyzer as cwa
last_round_key = cwa.aes_funcs.key_schedule_rounds(list(project.keys[0]),0,10)
disp = cw_lascar.LascarDisplay(cpa_engines, last_round_key)

In [69]:
disp.show_pge()

| Rank | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6B 0.038 | CC 0.035 | C1 0.033 | 4F 0.033 | 59 0.033 | 1D 0.039 | 52 0.039 | 7A 0.034 | B0 0.036 | 3D 0.030 | 54 0.031 | 02 0.030 | 06 0.031 | 85 0.032 | 8D 0.033 | D7 0.033 |
| 1 | DB 0.033 | 18 0.030 | 9B 0.031 | 39 0.029 | C6 0.031 | C4 0.033 | 3F 0.030 | 7C 0.033 | 0F 0.031 | CB 0.030 | 19 0.031 | EC 0.030 | F0 0.029 | 01 0.029 | 3C 0.031 | 09 0.030 |
| 2 | 74 0.032 | 08 0.029 | C8 0.028 | 33 0.029 | 56 0.030 | 7B 0.029 | B4 0.030 | 8C 0.032 | F6 0.030 | B1 0.029 | 76 0.029 | A5 0.029 | F7 0.029 | E9 0.027 | AA 0.031 | 35 0.028 |
| 3 | 29 0.031 | AD 0.029 | 24 0.028 | 09 0.029 | BA 0.030 | CA 0.028 | EA 0.029 | 29 0.028 | 4E 0.030 | 9D 0.028 | 3A 0.028 | C7 0.028 | C4 0.028 | AD 0.027 | 56 0.030 | 01 0.028 |
| 4 | 39 0.031 | C9 0.028 | A5 0.028 | 53 0.027 | EA 0.028 | 96 0.027 | 71 0.029 | 03 0.027 | 96 0.030 | E6 0.028 | 55 0.027 | 3C 0.028 | 0E 0.028 | D2 0.026 | 6D 0.030 | C2 0.027 |
| 5 | F9 0.030 | AE 0.028 | 93 0.027 | E4 0.026 | D3 0.027 | 46 0.027 | 68 0.029 | 5F 0.027 | BE 0.029 | 65 0.027 | 49 0.027 | 5B 0.028 | DB 0.026 | 3D 0.026 | 6E 0.029 | FD 0.026 |
| 6 | 65 0.029 | A2 0.027 | 07 0.025 | E5 0.026 | 05 0.027 | F9 0.027 | 95 0.029 | 47 0.027 | 03 0.029 | 21 0.026 | 66 0.026 | 42 0.027 | 33 0.026 | BD 0.026 | 45 0.029 | AD 0.026 |
| 7 | D0 0.029 | 94 0.027 | 37 0.025 | 2A 0.026 | 02 0.027 | 3A 0.027 | 91 0.027 | E8 0.027 | CF 0.029 | 3F 0.026 | 90 0.026 | C8 0.026 | CD 0.026 | 26 0.026 | 22 0.028 | 7E 0.026 |
| 8 | 21 0.028 | 42 0.027 | 8E 0.025 | 32 0.026 | 7B 0.027 | 1A 0.027 | B1 0.026 | 5A 0.027 | 54 0.028 | D4 0.026 | 4C 0.026 | 63 0.026 | D9 0.026 | 9B 0.025 | AC 0.028 | 1A 0.026 |
| 9 | 43 0.027 | 48 0.026 | 77 0.025 | A2 0.026 | C8 0.026 | 0D 0.026 | 7C 0.026 | F9 0.026 | AE 0.028 | E8 0.026 | EE 0.026 | A1 0.026 | 07 0.025 | F2 0.025 | FC 0.027 | 88 0.026 |
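
Each column of this output lists the top candidates for one key byte. One way to summarize how close an attack is to succeeding is the rank of the known correct byte within that ordering, where rank 0 means the correct byte is the best guess. The sketch below assumes guesses are ordered by peak absolute correlation, as the values above suggest; `guess_rank` is a hypothetical helper, not part of `LascarDisplay`.

```python
def guess_rank(results, true_byte):
    """Rank of the known key byte among all guesses (0 = best guess).

    results[guess][sample] holds the correlation for each guess and sample.
    ASSUMPTION: guesses are ordered by peak absolute correlation.
    """
    peaks = [max(abs(c) for c in row) for row in results]
    # Sort guess indices from the highest peak to the lowest
    ranking = sorted(range(len(peaks)), key=lambda g: peaks[g], reverse=True)
    return ranking.index(true_byte)


# Toy example with three guesses and two samples
toy = [[0.01, -0.02], [0.03, -0.05], [0.04, 0.00]]
print(guess_rank(toy, 1))  # guess 1 has the highest peak, so its rank is 0
```

Comparing each entry of `last_round_key` with the corresponding column in this way shows whether more traces or a different leakage model are needed.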
