# Breaking Hardware AES on CW305 FPGA

This tutorial relies on previous knowledge from the SCA101 course notebooks (in `../courses/sca101/`); make sure you go through these first to understand how a CPA attack works.

In this notebook, we'll apply knowledge from sca101 to break a hardware AES implementation on the CW305 Artix FPGA.

Some out-of-date background on the target FPGA project can be found here: [Tutorial CW305-1 Building a Project](http://wiki.newae.com/Tutorial_CW305-1_Building_a_Project) (ignore the "capture setup" section, which uses the obsolete ChipWhisperer GUI; this notebook shows all you need to know about capture setup on the CW305 with Jupyter).

This notebook can also be used with the newer CW312T-A35 FPGA target.

## Background Theory
During this tutorial, we'll be working with a hardware AES implementation. This type of attack can be much more difficult than a software AES attack. In the software AES attacks, we needed hundreds or thousands of clock cycles to capture the algorithm's full execution. In contrast, a hardware AES implementation can finish far sooner, because hardware can execute many operations in a single clock cycle; how many depends on the design. It is theoretically possible to execute the entire AES encryption in a single cycle, given enough hardware space and provided that the clock is not too fast. Most hardware accelerators are designed to complete one round or one large part of a round in a single cycle.

This fast execution may cause problems with a regular CPA attack. In software, we found that it was easy to search for the outputs of the s-boxes because these values would need to be loaded from memory onto a high-capacitance data bus. This is not necessarily true on an FPGA, where the output of the s-boxes may be directly fed into the next stage of the algorithm. In general, we may need some more knowledge of the hardware implementation to successfully complete an attack.

In our case, let's suppose that every round of AES is completed in a single clock cycle. Recall the execution of AES:

<img src="img/aes_operations.png" width="250">

Here, every blue block is executed in one clock cycle. This means that an excellent candidate for a CPA attack is the difference between the input and output of the final round. It is likely that this state is stored in a register that is updated every round, so we expect the Hamming distance between the round input and output to be the dominant contributor to the power consumption. Also, the last round is the easiest to attack because it has no MixColumns operation. We'll use this Hamming distance as the target in our CPA attack.
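As a rough illustration of this leakage model, the sketch below computes the predicted Hamming distance for one byte of the last-round state register. It is plain Python written for this tutorial, not ChipWhisperer's implementation; the helper names (`build_sbox`, `shift_rows_source`, `last_round_hd`) and the column-major state layout are assumptions of the sketch.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) using the AES reduction polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
INV_SBOX = [0] * 256
for i, s in enumerate(SBOX):
    INV_SBOX[s] = i

def shift_rows_source(j):
    """Index of the state byte that ShiftRows moves to position j (column-major)."""
    row, col = j % 4, j // 4
    return row + 4 * ((col + row) % 4)

def last_round_hd(ciphertext, key_guess, j):
    """Predicted register Hamming distance for a guess of last-round key byte j."""
    src = shift_rows_source(j)
    state9 = INV_SBOX[ciphertext[j] ^ key_guess]  # register byte before the last round
    state10 = ciphertext[src]                     # the same register byte afterwards
    return bin(state9 ^ state10).count("1")
```

Note that the prediction needs only ciphertext bytes and a guess for a single byte of the last round key, which is what makes a byte-by-byte attack possible.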

## Capture Notes

Most of the capture settings used below are similar to the standard ChipWhisperer scope settings. However, there are a couple of interesting points:

- We're only capturing 129 samples (the minimum allowed with CW-lite), and the encryption is completed in less than 60 samples with an x4 ADC clock. This makes sense - as we mentioned above, our AES implementation is computing each round in a single clock cycle.
- We're using EXTCLK x4 for our ADC clock. This means that the FPGA is outputting a clock signal, and we aren't driving it.
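To put these numbers in perspective, here is a short back-of-the-envelope calculation. It assumes the 10 MHz target clock that this notebook configures for the CW305 and the x4 ADC multiplier; the function name `samples_to_cycles` is made up for this sketch.

```python
def samples_to_cycles(samples, adc_mul=4):
    """Convert a number of ADC samples into target clock cycles."""
    return samples / adc_mul

target_clock_hz = 10e6                # assumed CW305 PLL1 setting
adc_rate_hz = target_clock_hz * 4     # x4 ADC clock

print(samples_to_cycles(129))         # 32.25 target clock cycles in the capture window
print(samples_to_cycles(60))          # 15.0 cycles: upper bound on the encryption time
print(129 / adc_rate_hz * 1e6)        # capture window length in microseconds
```

With one AES round per clock cycle, the whole encryption fits comfortably in the minimum capture window.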

Other than these, the last interesting setting is the number of traces. The capture loop in this notebook is set up to capture 5000 traces, many more than were required for software AES! It is difficult for us to measure the small power spikes from the Hamming distance on the last round: these signals are dwarfed by noise and the other operations on the chip. To deal with this small signal level, we need to capture many more traces.
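To see why a weak signal calls for more traces, the toy simulation below correlates a Hamming-distance-like signal buried in Gaussian noise with the noiseless prediction. It is not a model of the CW305: the noise level, seed, and function names are assumptions chosen purely for illustration. The spread of a correlation estimate shrinks roughly as 1/sqrt(N), so a small correlation only stands out once N is large.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(n_traces, noise_sigma, seed=0):
    """Correlation between a byte's Hamming weight and a noisy 'power' sample."""
    rng = random.Random(seed)
    hd = [bin(rng.randrange(256)).count("1") for _ in range(n_traces)]
    power = [h + rng.gauss(0.0, noise_sigma) for h in hd]
    return pearson(hd, power)

for n in (50, 500, 5000):
    estimate = simulate(n, noise_sigma=20.0)
    print(n, round(estimate, 3), "estimate spread ~", round(1 / math.sqrt(n), 3))
```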

## Capture Setup

Setup is somewhat similar to other targets, except that we are using an external clock (driven from the FPGA, unless you're using the CW312T-A35 target). We'll also do the rest of the setup manually:

In [1]:
import chipwhisperer as cw
scope = cw.scope()
scope.adc.samples = 129
scope.adc.offset = 0
scope.adc.basic_mode = "rising_edge"
scope.trigger.triggers = "tio4"
scope.io.tio1 = "serial_rx"
scope.io.tio2 = "serial_tx"
scope.io.hs2 = "disabled"



Before setting the ADC clock, we connect to the CW305 board. Here we'll need to specify our bitstream file to load as well as the usual scope and target_type arguments.

Pick the correct bitfile for your CW305 board (e.g. either '35t' or '100t'). With `force=False`, the bitfile is only programmed if the FPGA is uninitialized (e.g. after powering up). With `force=True`, as used in the cell below, the FPGA is always programmed (e.g. if you have generated a new bitfile).

In [2]:
#TARGET_PLATFORM = 'CW305_100t'
#TARGET_PLATFORM = 'CW305_35t'
TARGET_PLATFORM = 'CW312T_A35'

In [3]:
if TARGET_PLATFORM == 'CW312T_A35':
    scope.gain.db = 45 # this is a good setting for the inductive shunt; if using another, adjust as needed
    scope.io.hs2 = 'clkgen'
    fpga_id = 'cw312t_a35'
    platform = 'ss2'
else:
    scope.gain.db = 25
    scope.io.hs2 = "disabled"
    platform = 'cw305'
    if TARGET_PLATFORM == 'CW305_100t':
        fpga_id = '100t'
    elif TARGET_PLATFORM == 'CW305_35t':
        fpga_id = '35t'

bitfile = '/home/cw/Desktop/ss2_aes_wrapper.bit'
target = cw.target(scope, cw.targets.CW305, force=True, fpga_id=fpga_id, platform=platform, bsfile = bitfile)
#target = cw.target(scope, cw.targets.CW305, force=True, fpga_id=fpga_id, platform=platform)



Next we set all the PLLs. This step applies to the CW305 only. We enable CW305's PLL1; this clock will feed both the target and the CW ADC. As explained [here](http://wiki.newae.com/Tutorial_CW305-1_Building_a_Project#Capture_Setup), **make sure the DIP switches on the CW305 board are set as follows**:
- J16 = 0
- K16 = 1

In [4]:
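# TARGET_PLATFORM is 'CW305_100t' or 'CW305_35t' for the CW305, never plain 'CW305'
if TARGET_PLATFORM in ['CW305_100t', 'CW305_35t']: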
    target.vccint_set(1.0)
    # we only need PLL1:
    target.pll.pll_enable_set(True)
    target.pll.pll_outenable_set(False, 0)
    target.pll.pll_outenable_set(True, 1)
    target.pll.pll_outenable_set(False, 2)

    # run at 10 MHz:
    target.pll.pll_outfreq_set(10E6, 1)

    # 1ms is plenty of idling time
    target.clkusbautooff = True
    target.clksleeptime = 1

CW-Husky requires a different setup when the ADC clock is driven by the target:

In [5]:
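# TARGET_PLATFORM is 'CW305_100t' or 'CW305_35t' for the CW305, never plain 'CW305'
if TARGET_PLATFORM in ['CW305_100t', 'CW305_35t']: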
    if scope._is_husky:
        scope.clock.clkgen_freq = 40e6
        scope.clock.clkgen_src = 'extclk'
        scope.clock.adc_mul = 4
        # if the target PLL frequency is changed, the above must also be changed accordingly
    else:
        scope.clock.adc_src = "extclk_x4"

If using the CW312T-A35 target, the capture hardware needs to drive the target clock:

In [6]:
if TARGET_PLATFORM == 'CW312T_A35':
    scope.clock.clkgen_freq = 7.37e6
    scope.io.hs2 = 'clkgen'
    if scope._is_husky:
        scope.clock.clkgen_src = 'system'
        scope.clock.adc_mul = 4
        scope.clock.reset_dcms()
    else:
        scope.clock.adc_src = "clkgen_x4"
    import time
    time.sleep(0.1)
    target._ss2_test_echo()
    

Finally, ensure the ADC clock is locked:

In [7]:
import time
for i in range(5):
    scope.clock.reset_adc()
    time.sleep(1)
    if scope.clock.adc_locked:
        break 
assert (scope.clock.adc_locked), "ADC failed to lock"

Occasionally the ADC will fail to lock on the first try; when that happens, the above assertion will fail (and on the CW-Lite, the red LED will be on). Simply re-running the above cell should fix things.

## Trace Capture
Below is the capture loop. The main body of the loop loads some new plaintext, arms the scope, sends the key and plaintext, then finally records and appends our new trace to the `traces[]` list.

Because we're capturing 5000 traces, this takes a bit longer than the attacks against software AES implementations.

Note that the encryption result is read from the target and compared to the expected results, as a sanity check.

In [8]:
project_file = "projects/Tutorial_HW_CW305.cwp"
project = cw.create_project(project_file, overwrite=True)

In [9]:
from tqdm.notebook import tnrange
import numpy as np
import time
from Crypto.Cipher import AES

ktp = cw.ktp.Basic()

key, text = ktp.next()

In [10]:
ret = cw.capture_trace(scope, target, text, key)

In [11]:
ret.textout

[120, 113, 225, 62, 21, 35, 32, 168, 44, 235, 7, 102, 38, 230, 142, 133]

In [12]:
key

CWbytearray(b'2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c')

In [13]:
text

CWbytearray(b'53 0f f4 28 3d 8d f2 0e 87 1c 12 ee 2f 29 c1 b9')

In [14]:
0x2b^0x53

120

In [30]:
project.traces.append(traces)

(ChipWhisperer Other ERROR|File ProjectFormat.py:683) Invalid type appended to traces. Try appending cw.Trace(trace_data, textin, textout, key)


TypeError: Expected Trace object, got <class 'list'>.

In [31]:
traces

[array([ 0.18432617,  0.08862305, -0.07080078,  0.1027832 ,  0.20751953,
         0.06762695, -0.08911133,  0.06884766,  0.0390625 , -0.11694336,
        -0.28149414,  0.02368164,  0.0378418 , -0.06201172, -0.10205078,
         0.15893555,  0.36181641,  0.27148438,  0.08349609,  0.14916992,
         0.24072266,  0.06884766, -0.15722656,  0.01489258,  0.14282227,
         0.06079102, -0.11962891,  0.09423828,  0.22729492,  0.12744141,
        -0.02929688,  0.1328125 ,  0.21826172,  0.10107422, -0.09472656,
         0.04467773,  0.15087891,  0.02807617, -0.16772461,  0.02661133,
         0.1796875 ,  0.07861328, -0.07983398,  0.10473633,  0.2277832 ,
         0.12353516, -0.03369141,  0.10180664,  0.19677734,  0.09960938,
        -0.10205078,  0.07836914,  0.16333008,  0.05688477, -0.07983398,
         0.06982422,  0.18041992,  0.11279297, -0.10327148,  0.09130859,
         0.17919922,  0.0637207 , -0.13793945,  0.06176758,  0.17138672,
         0.08349609, -0.10302734,  0.05932617,  0.2

In [35]:
np.save('/home/cw/Desktop/traces.npy', traces)

In [39]:
np.save('/home/cw/Desktop/p.npy', textin)
np.save('/home/cw/Desktop/k.npy', keys)

In [None]:
# save the ciphertexts to their own file so the keys in k.npy are not overwritten
np.save('/home/cw/Desktop/textout.npy', textout)

In [41]:
t = np.load('/home/cw/Desktop/p.npy')

In [42]:
len(t)

50000

In [43]:
t[0]

array([ 78, 194, 181, 146,  42, 226, 156, 187, 231, 238, 241,  54,  45,
       238, 190,  80], dtype=uint8)

In [10]:
from tqdm.notebook import tnrange
import numpy as np
import time
from Crypto.Cipher import AES

ktp = cw.ktp.Basic()
N = 20000  # Number of traces
key, text = ktp.next()
cipher = AES.new(bytes(key), AES.MODE_ECB)

traces = []
textin = []
keys = []
textout = []
rets = []


for i in tnrange(N, desc='Capturing traces'):
    # run aux stuff that should come before trace here

    # ktp.Basic() keeps the key fixed by default, so draw two random texts
    # and use the first one as the key to get a fresh key for every trace
    key1, text1 = ktp.next()
    key2, text2 = ktp.next()
    key = text1
    text = text2

    # capture_trace() expects the plaintext first, then the key
    ret = cw.capture_trace(scope, target, text, key)
    if not ret:
        print("Failed capture")
        continue

    # sanity check on the first output byte, which this loop compares to text[0] ^ key[0]
    if ret.textout[0] != text[0] ^ key[0]:
        print('failed')

    textout.append(ret.textout)
    rets.append(ret)
    traces.append(ret.wave)
    project.traces.append(ret)


Capturing traces:   0%|          | 0/20000 [00:00<?, ?it/s]

In [26]:
np.save('/home/cw/Desktop/p.npy', textin)
np.save('/home/cw/Desktop/k.npy', keys)
np.save('/home/cw/Desktop/textout.npy', textout)
np.save('/home/cw/Desktop/traces.npy', traces)
np.save('/home/cw/Desktop/ret.npy', keys)

In [13]:
ret

Trace(wave=array([ 0.17260742,  0.06933594, -0.06542969,  0.08520508,  0.13183594,
        0.02319336, -0.08544922,  0.0625    ,  0.04760742, -0.03344727,
       -0.15698242,  0.09765625,  0.10009766, -0.01708984, -0.10131836,
        0.1496582 ,  0.31445312,  0.25146484,  0.10766602,  0.18701172,
        0.20874023,  0.04736328, -0.17504883, -0.03540039,  0.07983398,
        0.03393555, -0.09545898,  0.06762695,  0.19311523,  0.12109375,
       -0.0456543 ,  0.11865234,  0.18774414,  0.09545898, -0.04150391,
        0.08959961,  0.18261719,  0.1027832 , -0.02783203,  0.10766602,
        0.18164062,  0.10229492, -0.06811523,  0.08203125,  0.15698242,
        0.05126953, -0.08666992,  0.04931641,  0.17041016,  0.03198242,
       -0.08813477,  0.05224609,  0.14501953,  0.05004883, -0.09765625,
        0.07543945,  0.17651367,  0.09033203, -0.07885742,  0.08496094,
        0.18139648,  0.07763672, -0.07470703,  0.08081055,  0.20703125,
        0.13330078, -0.03686523,  0.09350586,  0.1801

In [25]:
for i in rets:
    keys.append(i.key)
    textin.append(i.textin)
    textout.append(i.textout)
    if (i.key[0]^i.textin[0] != i.textout[0]):
        print('error')
    

In [21]:
keys

[CWbytearray(b'd9 9c 03 c2 35 cf 85 ba e8 b1 ca 64 97 90 92 59'),
 CWbytearray(b'e6 a7 b7 6a 85 ea 3e e0 cc 41 c2 14 65 80 a3 a2'),
 CWbytearray(b'4c b1 f0 30 76 af 2d cd 7e c7 c9 96 b7 24 df f7'),
 CWbytearray(b'20 00 0f d4 64 53 45 fb 53 0f 5f 0d f6 77 9b 99'),
 CWbytearray(b'6b 27 cc 69 c7 70 bb ad 80 dd a5 9f 4e 5a ee bb'),
 CWbytearray(b'7d 40 b0 1d c3 6a 3c 21 66 c4 ed 08 dd 8e 0f 07'),
 CWbytearray(b'f5 1f cf b8 36 86 09 c0 dd 3f 06 3d 07 45 12 54'),
 CWbytearray(b'1d 76 e9 04 63 2b 0e ae 06 73 ef f5 35 7b 44 25'),
 CWbytearray(b'63 e4 25 5e 5b e3 d8 6a f0 d5 7c a6 bd 19 fe 54'),
 CWbytearray(b'2e 52 d1 21 8d 94 29 04 0a e4 7e 35 6e 31 0b 37'),
 CWbytearray(b'97 64 d9 de 10 a9 e1 29 fc 9b 96 b0 90 45 a9 0c'),
 CWbytearray(b'70 2c 7d 80 5e bc a6 5d a1 c8 f5 ee 60 ae 9c 39'),
 CWbytearray(b'6b c9 3a eb 7d 31 7e 81 5e 3d 9b fd ad 8d 40 86'),
 CWbytearray(b'24 c1 66 c2 0a e4 83 42 42 96 1a 20 94 31 f7 0e'),
 CWbytearray(b'a1 5a 46 32 9f 1e 6b eb f1 b6 bd 73 a4 e2 7c 3e'),
 CWbytearr

In [23]:
keys = []
textin = []
textout = []

In [20]:
project.save()
scope.dis()
target.dis()

In [21]:
traces[0]

array([ 0.18432617,  0.08862305, -0.07080078,  0.1027832 ,  0.20751953,
        0.06762695, -0.08911133,  0.06884766,  0.0390625 , -0.11694336,
       -0.28149414,  0.02368164,  0.0378418 , -0.06201172, -0.10205078,
        0.15893555,  0.36181641,  0.27148438,  0.08349609,  0.14916992,
        0.24072266,  0.06884766, -0.15722656,  0.01489258,  0.14282227,
        0.06079102, -0.11962891,  0.09423828,  0.22729492,  0.12744141,
       -0.02929688,  0.1328125 ,  0.21826172,  0.10107422, -0.09472656,
        0.04467773,  0.15087891,  0.02807617, -0.16772461,  0.02661133,
        0.1796875 ,  0.07861328, -0.07983398,  0.10473633,  0.2277832 ,
        0.12353516, -0.03369141,  0.10180664,  0.19677734,  0.09960938,
       -0.10205078,  0.07836914,  0.16333008,  0.05688477, -0.07983398,
        0.06982422,  0.18041992,  0.11279297, -0.10327148,  0.09130859,
        0.17919922,  0.0637207 , -0.13793945,  0.06176758,  0.17138672,
        0.08349609, -0.10302734,  0.05932617,  0.21411133,  0.12

In [28]:
len(traces)

50000

In [None]:
from tqdm.notebook import tnrange
import numpy as np
import time
from Crypto.Cipher import AES

ktp = cw.ktp.Basic()

traces = []
textin = []
keys = []
N = 5000  # Number of traces

# initialize cipher to verify DUT result:
key, text = ktp.next()
cipher = AES.new(bytes(key), AES.MODE_ECB)

for i in tnrange(N, desc='Capturing traces'):
    # run aux stuff that should come before trace here

    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here

    ret = cw.capture_trace(scope, target, text, key)
    if not ret:
        print("Failed capture")
        continue

    # compare the DUT result against a software AES as a sanity check
    expected = list(cipher.encrypt(bytes(text)))
    assert (list(ret.textout) == expected), "Incorrect encryption result!\nGot {}\nExp {}\n".format(list(ret.textout), expected)

    # only record inputs for successful captures so the lists stay aligned with traces
    textin.append(text)
    keys.append(key)
    traces.append(ret.wave)
    project.traces.append(ret)

This shows how a captured trace can be plotted:

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()
p = figure(width=800)  # recent Bokeh releases use width; plot_width was removed in Bokeh 3

xrange = range(len(traces[0]))
p.line(xrange, traces[0], line_color="red")
show(p)

Finally we save our traces and disconnect. By saving the traces, the attack can be repeated in the future without having to repeat the trace acquisition steps above.

In [None]:
project.save()
scope.dis()
target.dis()

## Attack
Now we re-open our saved project and specify the attack parameters. For this hardware AES implementation, we use a different leakage model and attack than what is used for the software AES implementations.

Note that this attack requires only the ciphertext, not the plaintext.

In [None]:
import chipwhisperer as cw
import chipwhisperer.analyzer as cwa
project_file = "projects/Tutorial_HW_CW305"
project = cw.open_project(project_file)
attack = cwa.cpa(project, cwa.leakage_models.last_round_state_diff)
cb = cwa.get_jupyter_callback(attack)

This runs the attack:

In [None]:
attack_results = attack.run(cb)
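Conceptually, a CPA attack with this leakage model tries all 256 guesses for each byte of the last round key and keeps the guess whose predicted Hamming distance correlates best with the measured power. The toy sketch below does this for a single register byte on simulated data. It is not how `cwa.cpa` is implemented; the simulated traces, the noise level, and all function names are assumptions of the sketch.

```python
import math
import random

def gmul(a, b):
    """Multiply two bytes in GF(2^8) using the AES reduction polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

SBOX = []
for x in range(256):
    inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
    s = inv
    for r in range(1, 5):
        s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
    SBOX.append(s ^ 0x63)
INV_SBOX = [0] * 256
for i, s in enumerate(SBOX):
    INV_SBOX[s] = i

def hw(x):
    return bin(x).count("1")

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate_traces(n, key_byte, sigma, rng):
    """Return (ct_key, ct_reg, leakage) tuples for one simulated register byte."""
    rows = []
    for _ in range(n):
        st9 = rng.randrange(256)        # register byte before the last round
        ct_reg = rng.randrange(256)     # ciphertext byte that overwrites the register (random here)
        ct_key = SBOX[st9] ^ key_byte   # ciphertext byte derived from st9 and the key byte
        rows.append((ct_key, ct_reg, hw(st9 ^ ct_reg) + rng.gauss(0.0, sigma)))
    return rows

def cpa_byte(rows):
    """Return the key guess whose hypothesis correlates best with the leakage."""
    leak = [row[2] for row in rows]
    scores = []
    for guess in range(256):
        hyp = [hw(INV_SBOX[ct_key ^ guess] ^ ct_reg) for ct_key, ct_reg, _ in rows]
        scores.append(abs(pearson(hyp, leak)))
    return max(range(256), key=scores.__getitem__)

rows = simulate_traces(1000, key_byte=0xA6, sigma=2.0, rng=random.Random(42))
recovered = cpa_byte(rows)
print(hex(recovered))
```

On real traces the same correlation is computed at every sample point of the trace, and the noise is far larger than in this simulation, which is why thousands of traces are captured.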

The attack results can be saved for later viewing or processing without having to repeat the attack:

In [None]:
import pickle
pickle_file = project_file + ".results.pickle"
# use a context manager so the file is closed once the results are written
with open(pickle_file, "wb") as f:
    pickle.dump(attack_results, f)

You may notice that we didn't get the expected key from this attack, but still got a good difference in correlation between the best guess and the next best guess. This is because we actually recovered the key from the last round of AES. We'll use the analyzer's key schedule helper to roll this round key back to the actual AES key:

In [None]:
from chipwhisperer.analyzer.attacks.models.aes.key_schedule import key_schedule_rounds
recv_lastroundkey = [kguess[0][0] for kguess in attack_results.find_maximums()]
recv_key = key_schedule_rounds(recv_lastroundkey, 10, 0)
for subkey in recv_key:
    print(hex(subkey))
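For readers curious about what rolling the key back involves, the AES-128 key schedule can be inverted one round at a time. The sketch below is a plain-Python illustration written for this tutorial, not the code behind `key_schedule_rounds`; the function names are hypothetical. It is checked against the key expansion example from FIPS-197.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) using the AES reduction polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

SBOX = []
for x in range(256):
    inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
    s = inv
    for r in range(1, 5):
        s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
    SBOX.append(s ^ 0x63)

RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def _g(word, rnd):
    """RotWord, SubWord and the round constant for round rnd (1..10)."""
    rot = list(word[1:]) + list(word[:1])
    sub = [SBOX[b] for b in rot]
    sub[0] ^= RCON[rnd - 1]
    return sub

def _xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def next_round_key(rk, rnd):
    """Round key rnd computed from round key rnd - 1 (AES-128)."""
    w = [list(rk[4 * i:4 * i + 4]) for i in range(4)]
    out = [_xor(w[0], _g(w[3], rnd))]
    for i in range(1, 4):
        out.append(_xor(w[i], out[i - 1]))
    return [b for word in out for b in word]

def prev_round_key(rk, rnd):
    """Round key rnd - 1 computed from round key rnd (AES-128)."""
    w = [list(rk[4 * i:4 * i + 4]) for i in range(4)]
    prev = [None] * 4
    for i in (3, 2, 1):
        prev[i] = _xor(w[i], w[i - 1])       # undo the chained XOR of the words
    prev[0] = _xor(w[0], _g(prev[3], rnd))   # undo the S-box/Rcon step
    return [b for word in prev for b in word]

def rollback_last_round_key(rk10):
    """Recover the AES-128 cipher key from the round 10 key."""
    key = list(rk10)
    for rnd in range(10, 0, -1):
        key = prev_round_key(key, rnd)
    return key

rk10 = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")  # FIPS-197 example, round 10
print(bytes(rollback_last_round_key(rk10)).hex())
```

Because every step of the schedule is invertible, any single AES-128 round key determines the cipher key.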

## Tests
Check that the key obtained by the attack is the key that was used.
This attack targets the last round key, so we have to roll it back to compare against the key we provided.

In [None]:
key = list(key)
assert (key == recv_key), "Failed to recover encryption key\nGot:      {}\nExpected: {}".format(recv_key, key)

## Next steps

The `jupyter/demos/CW305_ECC/` folder contains a series of tutorials for attacking hardware ECC on the CW305.

This CW305 appnote contains additional details on the CW305 platform: http://media.newae.com/appnotes/NAE0010_Whitepaper_CW305_AES_SCA_Attack.pdf