# Part 4, Topic 3: ChipWhisperer Analyzer CPA Attack (MAIN)


---
NOTE: This lab references some (commercial) training material on [ChipWhisperer.io](https://www.ChipWhisperer.io). You can freely execute and use the lab per the open-source license (including using it in your own courses if you distribute similarly), but you must maintain notice about this source location. Consider joining our training course to enjoy the full experience.

---

**SUMMARY:** *Last time, we saw how correlation can be used to recover an AES key, as well as the effectiveness of such an attack. In this lab, we'll repeat the attack with ChipWhisperer Analyzer and gain some additional information about the attack.*

**LEARNING OUTCOMES:**

* Use ChipWhisperer Analyzer to perform a CPA attack
* Plot additional information about the attack

## Prerequisites

Hold up! Before you continue, check you've done the following tutorials:

* ☑ CPA on Firmware Implementation of AES (you should understand how a CPA attack works).
* ☑ SCA101 Intro (you should have an idea of how to get hardware-specific versions running).

## Projects

There's no need for any models, SBox implementations, or anything like that this time. Instead, everything's contained in ChipWhisperer Analyzer. Another change from previous tutorials is that we're using ChipWhisperer projects instead of numpy arrays, since most of ChipWhisperer Analyzer only works with ChipWhisperer projects.

As usual, see the associated notebook for details of the trace capture (or existing project) and copy below:

In [1]:
SCOPE="OPENADC"
PLATFORM="CWLITEXMEGA"
CRYPTO_TARGET="AVRCRYPTOLIB"
SS_VER = "SS_VER_1_1"

In [2]:
%run "Lab 4_3 - ChipWhisperer Analyzer CPA Attack (HARDWARE).ipynb"

INFO: Found ChipWhisperer😍
Building for platform CWLITEXMEGA with CRYPTO_TARGET=AVRCRYPTOLIB
SS_VER set to SS_VER_1_1
Blank crypto options, building for AES128
rm -f -- simpleserial-aes-CWLITEXMEGA.hex
rm -f -- simpleserial-aes-CWLITEXMEGA.eep
rm -f -- simpleserial-aes-CWLITEXMEGA.cof
rm -f -- simpleserial-aes-CWLITEXMEGA.elf
rm -f -- simpleserial-aes-CWLITEXMEGA.map
rm -f -- simpleserial-aes-CWLITEXMEGA.sym
rm -f -- simpleserial-aes-CWLITEXMEGA.lss
rm -f -- objdir-CWLITEXMEGA/*.o
rm -f -- objdir-CWLITEXMEGA/*.lst
rm -f -- simpleserial-aes.s simpleserial.s XMEGA_AES_driver.s uart.s usart_driver.s xmega_hal.s aes-independant.s aes_enc.s aes_keyschedule.s aes_sbox.s aes128_enc.s
rm -f -- simpleserial-aes.d simpleserial.d XMEGA_AES_driver.d uart.d usart_driver.d xmega_hal.d aes-independant.d aes_enc.d aes_keyschedule.d aes_sbox.d aes128_enc.d
rm -f -- simpleserial-aes.i simpleserial.i XMEGA_AES_driver.i uart.i usart_driver.i xmega_hal.i aes-independant.i aes_enc.i aes_keyschedule.i aes_sb

Capturing traces:   0%|          | 0/50 [00:00<?, ?it/s]

Before we continue on with our CPA attack, let's take a quick look at the projects:

In [3]:
# we can access wave(a.k.a. trace_array), textin, etc as a whole with proj.traces
for trace in proj.traces:
    print(trace.wave, trace.textin, trace.textout, trace.key)
# can also access individually with proj.waves, proj.textins, etc.
for wave in proj.waves:
    print(wave[0])
# print(np.shape(proj.waves[:]))
# proj.keys

[0.10742188 0.12597656 0.13964844 ... 0.15722656 0.03417969 0.04101562] CWbytearray(b'22 2b 1b d9 7b b3 13 49 5e 10 71 c3 1d 84 3d cd') CWbytearray(b'8e 55 45 54 2d c3 ae a9 84 f9 df 17 24 a5 91 94') CWbytearray(b'2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c')
[0.11914062 0.14257812 0.14550781 ... 0.15625    0.04003906 0.0390625 ] CWbytearray(b'6c 8d bc 18 bc 1a c9 22 5b f0 9d cd 54 a5 25 6d') CWbytearray(b'59 7d b8 78 7b b9 1d 19 c2 96 27 15 20 7d f4 23') CWbytearray(b'2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c')
[0.11425781 0.13183594 0.14257812 ... 0.15527344 0.03808594 0.04394531] CWbytearray(b'29 a8 75 30 0e c4 46 1f 1b 02 a2 a3 30 a9 dd 64') CWbytearray(b'21 0a 2a 4f 72 74 1a 1f 42 6d 6d 70 e8 69 c3 16') CWbytearray(b'2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c')
[0.11523438 0.140625   0.14355469 ... 0.15722656 0.03320312 0.03125   ] CWbytearray(b'a5 ba 92 63 b8 f2 20 14 18 62 a8 5e aa dc 18 50') CWbytearray(b'b1 06 5f 04 a6 a0 60 65 e3 2f ff 77 98 1c d4 e0') CWbytear

## ChipWhisperer Analyzer

We can access Analyzer via `chipwhisperer.analyzer`:

In [4]:
import chipwhisperer.analyzer as cwa

We also have to set our leakage model to be the SBox output. ChipWhisperer Analyzer includes a bunch of different leakage models which are useful in different situations. We'll look more at that in SCA201.

In [5]:
leak_model = cwa.leakage_models.sbox_output
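
To make the leakage model less of a black box, here is a minimal sketch of what an SBox-output model predicts, assuming it uses the Hamming weight of the first-round SBox output as in the previous lab. The helper names (`gf_mul`, `build_sbox`, `sbox_output_hw`) are ours for illustration and are not part of ChipWhisperer Analyzer:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def rotl8(x, shift):
    """Rotate a byte left by `shift` bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox():
    """Derive the AES SBox: multiplicative inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        # x^254 is the inverse of x in GF(2^8); 0 maps to 0.
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        s = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def hamming_weight(value):
    """Count the bits set in `value`."""
    return bin(value).count("1")


def sbox_output_hw(plaintext_byte, key_guess):
    """Hypothetical leakage: Hamming weight of SBox(plaintext XOR key guess)."""
    return hamming_weight(SBOX[plaintext_byte ^ key_guess])


# First byte of the first trace printed earlier: textin 0x22, key 0x2B
print(sbox_output_hw(0x22, 0x2B))
```

For that byte, the SBox input is `0x09`, the SBox output is `0x01`, and the predicted leakage is 1.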

The rest of the setup only takes 1 line:

In [6]:
attack = cwa.cpa(proj, leak_model)
# from chipwhisperer.analyzer.attacks.cpa_new import CPA
# print(attack.known_key())

If you want to see the attack settings, you can print the cpa object:

In [7]:
print(attack)

<chipwhisperer.analyzer.attacks.cpa_new.CPA object at 0x7f5304583590>
project     = <chipwhisperer.common.api.ProjectFormat.Project object at 0x7f5304583990>
leak_model  = <chipwhisperer.analyzer.attacks.models.AES128_8bit.AES128_8bit object at 0x7f530459ae10>
algorithm   = <chipwhisperer.analyzer.attacks.cpa_algorithms.progressive.CPAProgressive object at 0x7f52d6550f10>
trace_range = [0, 50]
point_range = [0, 5000]
subkey_list = range(0, 16)



Running the attack is also done in a single line:

In [8]:
results = attack.run()

Let's see if we got the AES key:

In [9]:
print(results)

Subkey KGuess Correlation
  00    0x2B    0.81440
  01    0x7E    0.86644
  02    0x15    0.75698
  03    0x16    0.78012
  04    0x28    0.82423
  05    0xAE    0.82611
  06    0xD2    0.83488
  07    0xA6    0.81697
  08    0xAB    0.88867
  09    0xF7    0.73794
  10    0x15    0.92607
  11    0x88    0.85840
  12    0x09    0.82099
  13    0xCF    0.79437
  14    0x4F    0.76138
  15    0x3C    0.83755



We can get the full information from the attack by calling `results.find_maximums()`, which returns:

```Python
find_maximums() ->
    [subkey0_data, subkey1_data, subkey2_data, ...]
    
subkey0_data ->
    [guess0, guess1, guess2, ...]
    
guess0 ->
    (key_guess, location_of_max, correlation)
```

For example, since everything is zero-indexed, printing the correlation of the fourth-best guess (index 3) of subkey 4 looks like this:

```python
print(results.find_maximums()[4][3][2])
```
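
The nesting is easier to see with a stand-in. The sketch below builds a mock with the same shape from random numbers (no real attack data, and `mock_find_maximums` is a hypothetical helper, not an Analyzer function), then indexes it the same way:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the peak correlation of every guess: 16 subkeys x 256 guesses.
peak_corr = rng.random((16, 256))


def mock_find_maximums(peak_corr):
    """Return [subkey][rank] -> (key_guess, location_of_max, correlation)."""
    out = []
    for subkey_corr in peak_corr:
        # Best guess first, i.e. sorted by descending correlation.
        order = np.argsort(subkey_corr)[::-1]
        # The location is left at 0, mirroring the untracked value in the lab.
        out.append([(int(g), 0, float(subkey_corr[g])) for g in order])
    return out


maxes = mock_find_maximums(peak_corr)
guess, location, corr = maxes[4][3]  # subkey 4, fourth-best guess
print(hex(guess), location, corr)
```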

Note the "point location of the max" is normally not calculated/tracked, and thus returns as a 0. Using the pandas library lets us print them nicely in a DataFrame. We have to transpose the frame to get our expected orientation:

In [10]:
import pandas as pd
stat_data = results.find_maximums()
df = pd.DataFrame(stat_data).transpose()
print(df.head())

                             0                             1   \
0   [43, 0, 0.8143974955500567]  [126, 0, 0.8664412808369742]   
1  [243, 0, 0.6374081197542437]  [127, 0, 0.6788186033422249]   
2   [42, 0, 0.6318849326681818]  [153, 0, 0.6380487878932812]   
3  [154, 0, 0.6295544531539015]  [135, 0, 0.6276715094543264]   
4  [145, 0, 0.6275370018568226]   [77, 0, 0.6220499502392447]   

                             2                             3   \
0   [21, 0, 0.7569833791333525]   [22, 0, 0.7801172968532625]   
1  [177, 0, 0.6749333754828293]  [131, 0, 0.6470365937058445]   
2    [5, 0, 0.6311578735032589]     [54, 0, 0.62772344723351]   
3  [168, 0, 0.6106052765023195]   [86, 0, 0.6208677451043874]   
4   [74, 0, 0.6051186017178136]     [1, 0, 0.620280682086912]   

                             4                             5   \
0   [40, 0, 0.8242318813940666]  [174, 0, 0.8261100735666651]   
1  [222, 0, 0.6191076544296469]   [50, 0, 0.6394739051635645]   
2   [176, 0, 0.61755125

Even better, we can use the DataFrame's `.style` accessor to customize this further. This also lets us chain formatting functions. For example, we can remove the extra 0 and clean up the data. Since we know the correct key, we can even do things like printing the key in a different colour!

You can do lots of formatting thanks to the pandas library! Check out https://pandas.pydata.org/pandas-docs/stable/style.html for more details.

In [11]:
import pandas as pd
key = proj.keys[0]
def format_stat(stat):
    return str("{:02X}<br>{:.3f}".format(stat[0], stat[2]))

def color_corr_key(row):
    global key
    ret = [""] * 16
    for i,bnum in enumerate(row):
        if bnum[0] == key[i]:
            ret[i] = "color: green"
        else:
            ret[i] = ""
    return ret
df.head().style.format(format_stat).apply(color_corr_key, axis=1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,2B 0.814,7E 0.866,15 0.757,16 0.780,28 0.824,AE 0.826,D2 0.835,A6 0.817,AB 0.889,F7 0.738,15 0.926,88 0.858,09 0.821,CF 0.794,4F 0.761,3C 0.838
1,F3 0.637,7F 0.679,B1 0.675,83 0.647,DE 0.619,32 0.639,D3 0.714,A7 0.685,5B 0.655,BA 0.698,14 0.762,60 0.637,5C 0.634,5C 0.640,D3 0.678,3D 0.624
2,2A 0.632,99 0.638,05 0.631,36 0.628,B0 0.618,79 0.627,FE 0.690,44 0.634,B7 0.638,8A 0.660,78 0.645,70 0.617,D5 0.618,09 0.620,42 0.618,BF 0.622
3,9A 0.630,87 0.628,A8 0.611,56 0.621,2E 0.613,1C 0.609,DA 0.629,AF 0.631,13 0.629,45 0.641,44 0.616,57 0.611,98 0.603,6F 0.615,41 0.617,88 0.593
4,91 0.628,4D 0.622,4A 0.605,01 0.620,6C 0.603,80 0.600,57 0.623,06 0.626,41 0.624,A2 0.610,2F 0.614,CA 0.593,78 0.599,49 0.602,7F 0.615,C1 0.587


You should see green numbers printed in the top row of the table: these are the correct key bytes. Congratulations, you've now completed a successful CPA attack against AES!

Next, we'll look at how we can use some of Analyzer's other features to improve the attack process, as well as better interpret the data we have.

## Reporting Intervals

When we ran `attack.run()`, we processed all of the traces before getting any information back. ChipWhisperer Analyzer actually uses the "online" correlation calculation that we mentioned last time, meaning we can get feedback during the attack. This can be done by creating a callback function and passing it to `attack.run()`. This function is called each time we pass the update interval (default 25, which is the second parameter for `attack.run()`).

Let's use this to update our table every 10 traces. Most of this is just putting our existing code into the callback function. We also need to use the `clear_output` function to clear the table, as well as `display()` to actually get it to show up:

In [12]:
from IPython.display import clear_output
import numpy as np
import pandas as pd
def stats_callback():
    results = attack.results
    results.set_known_key(key)
    stat_data = results.find_maximums()
    df = pd.DataFrame(stat_data).transpose()
    clear_output(wait=True)
    display(df.head().style.format(format_stat).apply(color_corr_key,axis=1))
    
results = attack.run(stats_callback, 10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,2B 0.814,7E 0.866,15 0.757,16 0.780,28 0.824,AE 0.826,D2 0.835,A6 0.817,AB 0.889,F7 0.738,15 0.926,88 0.858,09 0.821,CF 0.794,4F 0.761,3C 0.838
1,F3 0.637,7F 0.679,B1 0.675,83 0.647,DE 0.619,32 0.639,D3 0.714,A7 0.685,5B 0.655,BA 0.698,14 0.762,60 0.637,5C 0.634,5C 0.640,D3 0.678,3D 0.624
2,2A 0.632,99 0.638,05 0.631,36 0.628,B0 0.618,79 0.627,FE 0.690,44 0.634,B7 0.638,8A 0.660,78 0.645,70 0.617,D5 0.618,09 0.620,42 0.618,BF 0.622
3,9A 0.630,87 0.628,A8 0.611,56 0.621,2E 0.613,1C 0.609,DA 0.629,AF 0.631,13 0.629,45 0.641,44 0.616,57 0.611,98 0.603,6F 0.615,41 0.617,88 0.593
4,91 0.628,4D 0.622,4A 0.605,01 0.620,6C 0.603,80 0.600,57 0.623,06 0.626,41 0.624,A2 0.610,2F 0.614,CA 0.593,78 0.599,49 0.602,7F 0.615,C1 0.587


A default jupyter callback is also available - the following **three lines** are all you need to run an attack!

In [13]:
import chipwhisperer as cw
cb = cwa.get_jupyter_callback(attack)
results = attack.run(cb, 5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
PGE=,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,2B 0.814,7E 0.866,15 0.757,16 0.780,28 0.824,AE 0.826,D2 0.835,A6 0.817,AB 0.889,F7 0.738,15 0.926,88 0.858,09 0.821,CF 0.794,4F 0.761,3C 0.838
1,F3 0.637,7F 0.679,B1 0.675,83 0.647,DE 0.619,32 0.639,D3 0.714,A7 0.685,5B 0.655,BA 0.698,14 0.762,60 0.637,5C 0.634,5C 0.640,D3 0.678,3D 0.624
2,2A 0.632,99 0.638,05 0.631,36 0.628,B0 0.618,79 0.627,FE 0.690,44 0.634,B7 0.638,8A 0.660,78 0.645,70 0.617,D5 0.618,09 0.620,42 0.618,BF 0.622
3,9A 0.630,87 0.628,A8 0.611,56 0.621,2E 0.613,1C 0.609,DA 0.629,AF 0.631,13 0.629,45 0.641,44 0.616,57 0.611,98 0.603,6F 0.615,41 0.617,88 0.593
4,91 0.628,4D 0.622,4A 0.605,01 0.620,6C 0.603,80 0.600,57 0.623,06 0.626,41 0.624,A2 0.610,2F 0.614,CA 0.593,78 0.599,49 0.602,7F 0.615,C1 0.587


Here we used a reporting interval of 10 traces for our own callback and 5 traces for the default one. Depending on the attack and what you want to learn from it, you may want to use higher or lower values: in general, reporting less often is faster, but more frequent reporting can allow you to end a long attack early. More frequent reporting also increases the resolution of some plot data (which we will look at next).
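
The idea behind the online calculation can be sketched with plain numpy: keep running sums, and report whenever the trace count crosses the interval. This is an illustration on synthetic data under our own naming (`OnlinePearson`, `run_with_reports`), not Analyzer's actual implementation. Note that the callback here receives arguments, unlike the argument-free callback passed to `attack.run()`:

```python
import numpy as np


class OnlinePearson:
    """Running-sum Pearson correlation of one model value against every sample."""

    def __init__(self, num_points):
        self.n = 0
        self.sum_h = 0.0
        self.sum_hh = 0.0
        self.sum_t = np.zeros(num_points)
        self.sum_tt = np.zeros(num_points)
        self.sum_ht = np.zeros(num_points)

    def update(self, h, trace):
        """Fold one (model value, trace) pair into the running sums."""
        self.n += 1
        self.sum_h += h
        self.sum_hh += h * h
        self.sum_t += trace
        self.sum_tt += trace * trace
        self.sum_ht += h * trace

    def correlation(self):
        """Correlation at every sample point from the sums collected so far."""
        n = self.n
        num = n * self.sum_ht - self.sum_h * self.sum_t
        den = np.sqrt((n * self.sum_hh - self.sum_h ** 2)
                      * (n * self.sum_tt - self.sum_t ** 2))
        return num / den


def run_with_reports(model_values, traces, callback, interval):
    """Process traces one by one, calling `callback` every `interval` traces."""
    acc = OnlinePearson(traces.shape[1])
    for count, (h, trace) in enumerate(zip(model_values, traces), start=1):
        acc.update(h, trace)
        if count % interval == 0:
            callback(count, acc.correlation())
    return acc.correlation()


# Synthetic data: 50 traces of 20 samples, with leakage injected at sample 7.
rng = np.random.default_rng(0)
model = rng.integers(0, 9, size=50).astype(float)
traces = rng.normal(0.0, 0.1, size=(50, 20))
traces[:, 7] += 0.5 * model

reports = []
final = run_with_reports(model, traces,
                         lambda count, corr: reports.append((count, corr)), 10)
print([count for count, _ in reports])
print(int(np.argmax(np.abs(final))))
```

Because only the sums are stored, a report costs the same no matter how many traces have been processed, which is what makes frequent reporting practical.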

## Plot Data

Analyzer also includes a module to create plots to help you interpret the data. These act on one subkey at a time and return some data that we can plot using bokeh (or your graphing module of choice). Let's start by grabbing the class that does all the calculations:

In [14]:
plot_data = cwa.analyzer_plots(results)

We'll start by looking at the Output Vs. Time module, which will allow us to plot correlation of our guesses in time. This is useful for finding exactly where the operations we're attacking are. Like in previous tutorials, we'll use bokeh to plot the data we get back.

The method we're interested in is `output_vs_time(bnum)`, which returns a list: `[xrange, correct_key, incorrect_key_data, incorrect_key_data]` for the position `bnum` passed to it. The method returns two sets of incorrect key data because one is for the key guesses below the correct one, and the other is for guesses above the correct one.

We'll have a lot of points, so we'll plot as usual, but at the end decimate the output:

In [19]:
import holoviews as hv
from holoviews.operation.datashader import datashade, shade, dynspread, rasterize
from holoviews.operation import decimate
import pandas as pd, numpy as np

def byte_to_color(idx):
    return hv.Palette.colormaps['Category20'](idx/16.0)

a = []
b = []
c = []
hv.extension('bokeh')
for i in range(0, 16):
    data = plot_data.output_vs_time(i)
    a.append(np.array(data[1]))
    b.append(np.array(data[2]))
    c.append(np.array(data[3]))
    
pda = pd.DataFrame(a).transpose().rename(str, axis='columns')
pdb = pd.DataFrame(b).transpose().rename(str, axis='columns')
pdc = pd.DataFrame(c).transpose().rename(str, axis='columns')

# pda = pda[:2000]
# pdb = pdb[:2000]
# pdc = pdc[:2000]

curve = hv.Curve(pdb['0'], "Sample").options(color='black')
for i in range(1, 16):
    curve *= hv.Curve(pdb[str(i)]).options(color='black')
for i in range(0, 16):
    curve *= hv.Curve(pdc[str(i)]).options(color='black')
for i in range(0, 16):
    curve *= hv.Curve(pda[str(i)]).options(color=byte_to_color(i))
decimate(curve.opts(width=900, height=600))



You should see some distinctive spikes in your plot. The largest of these is where the sbox lookup is actually happening (the smaller ones are typically other AES operations that move the sbox data around). The attack normally works with the absolute value of the correlation, but this plot shows the signed values, so you'll see negative spikes in there too.

This information can be useful in many ways. For example, you can probably see the first 16 spikes that make up the sbox lookup are a small portion of the total trace length. If we ever needed to rerun the attack, we could capture a much smaller number of samples and speed up analysis significantly!
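
As a rough sketch of how such a window could be picked, the snippet below finds the sample range that covers the strongest peak of every subkey, plus a safety margin. It runs on synthetic data, and `leakage_window` and its `margin` parameter are hypothetical names of ours. With real data, the array would come from the correct-key curves (one row per subkey):

```python
import numpy as np


def leakage_window(corr_vs_time, margin=50):
    """Return (start, stop) sample indices covering each subkey's largest peak.

    corr_vs_time: array of shape (num_subkeys, num_samples).
    """
    # Peaks may be negative, so compare magnitudes.
    corr = np.abs(np.asarray(corr_vs_time))
    peaks = corr.argmax(axis=1)
    start = max(int(peaks.min()) - margin, 0)
    stop = min(int(peaks.max()) + margin + 1, corr.shape[1])
    return start, stop


# Synthetic stand-in: low-level noise with one strong spike per subkey.
rng = np.random.default_rng(1)
corr = rng.normal(0.0, 0.05, size=(16, 5000))
for subkey in range(16):
    corr[subkey, 1000 + 40 * subkey] = -0.8

start, stop = leakage_window(corr)
print(start, stop)  # 950 1651
```

Keep in mind that this only works when the key is known, so it is a tuning step for repeated attacks on the same target rather than part of a blind attack.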

### PGE vs. Traces

The next data we'll look at is a plot of partial guessing entropy (PGE) vs. the number of traces. As mentioned before, PGE is just how many spots away from the top the actual subkey is in our table of guesses. For example, if there are 7 subkey guesses that have a higher correlation than the actual subkey, the subkey has a PGE of 7.
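
That definition fits in a few lines of numpy. The sketch below uses made-up correlation values, and `partial_guessing_entropy` is our own helper rather than an Analyzer function:

```python
import numpy as np


def partial_guessing_entropy(guess_corr, correct_guess):
    """Count how many guesses have a higher correlation than the correct one."""
    guess_corr = np.asarray(guess_corr)
    return int(np.sum(guess_corr > guess_corr[correct_guess]))


# Made-up peak correlations for the 256 guesses of one subkey.
corr = np.full(256, 0.10)
corr[0x2B] = 0.50                                          # correct guess
corr[[0x01, 0x10, 0x3C, 0x7E, 0x99, 0xAE, 0xF3]] = 0.60    # seven better guesses
print(partial_guessing_entropy(corr, 0x2B))  # 7

corr[0x2B] = 0.90                                          # correct guess now on top
print(partial_guessing_entropy(corr, 0x2B))  # 0
```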

This plot is useful for seeing how many traces were needed to actually break the AES implementation. Keep in mind, however, that the resolution of the plot is determined by the reporting interval (also note that `results.find_maximums()` must be called in the callback function). In our case, the most recent run used a reporting interval of 5, so we'll have a resolution of 5 traces.

This method is similar to the previous plot in that it takes `bnum` as an argument and returns a list of `[xrange, PGE]`. 

In [20]:
ret = plot_data.pge_vs_trace(0)
curve = hv.Curve((ret[0],ret[1]), "Traces Used in Calculation", "Partial Guessing Entropy of Byte")
for bnum in range(1, 16):
    ret = plot_data.pge_vs_trace(bnum)
    curve *= hv.Curve((ret[0],ret[1])).opts(color=byte_to_color(bnum))
curve.opts(width=900, height=600)

You should see a number of lines that start off with high values, then rapidly drop off. You may notice that we broke the AES implementation without needing to use all of our traces. 

Even though we may have broken the AES implementation in fewer traces, we may not want to reduce how many traces we capture. Remember that, while we know the key here, for a real attack we won't, and must therefore use the correlation to determine when we've broken a key. Our next plot will help us determine how feasible capturing fewer traces is.

### Correlation vs. Traces

The last plot we'll take a look at is correlation vs. the number of traces. Like with PGE vs. Traces, this plot's resolution is determined by the reporting interval (5 for our most recent run). This method returns a list of `[xrange, [data_for_kguess]]`, so we'll need to go through each guess for each subkey. Like before, we'll plot the correct subkey in a changing color and the rest in black.

As you will see, all the subkey guesses start off with large correlations, but all of them except for the correct guess quickly drop off. If you didn't know the key, at what point would you be sure that the guess with the highest correlation was actually the correct subkey?
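
One simple way to think about that question, offered here only as a heuristic of ours and not something Analyzer computes, is to look at the gap between the best and second-best correlation for each subkey. The sketch below uses the top two rows of the results table printed earlier in this lab:

```python
import numpy as np

# Best and second-best correlations per subkey, copied from the table above.
best = np.array([0.814, 0.866, 0.757, 0.780, 0.824, 0.826, 0.835, 0.817,
                 0.889, 0.738, 0.926, 0.858, 0.821, 0.794, 0.761, 0.838])
second = np.array([0.637, 0.679, 0.675, 0.647, 0.619, 0.639, 0.714, 0.685,
                   0.655, 0.698, 0.762, 0.637, 0.634, 0.640, 0.678, 0.624])

# A small gap means the top guess barely stands out from the runner-up.
margins = best - second
weakest = int(np.argmin(margins))
print(weakest, round(float(margins[weakest]), 3))  # 9 0.04
```

Subkey 9 has the smallest gap after 50 traces, so it is the one you would want to watch most closely before trusting the result without a known key.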

Let's continue and plot the correlations for the right guess and the next best one:

In [41]:
a = []
b = []
for bnum in range(0, 16):
    data = plot_data.corr_vs_trace(bnum)
    best = [0] * len(data[1][0])
    for i in range(256):
        if i == key[bnum]:
            a.append(np.array(data[1][i]))
        else:
            if max(best) < max(data[1][i]): best = data[1][i]
    b.append(np.array(best))

print(np.shape(a))
print(np.shape(b))


pda = pd.DataFrame(a).transpose().rename(str, axis='columns')
pdb = pd.DataFrame(b).transpose().rename(str, axis='columns')

print(pda)

curve = hv.Curve(pdb['0'].tolist(), "Iteration Number", "Max Correlation").options(color='black')
for i in range(1,len(pdb.columns)):
    curve *= hv.Curve(pdb[str(i)]).options(color='black')
    
for i in range(len(pda.columns)):
    curve *= hv.Curve(pda[str(i)]).options(color=byte_to_color(i))
            
curve.opts(width=900, height=600)

(16, 10)
(16, 10)
          0         1         2         3         4         5         6  \
0  0.998288  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000   
1  0.979535  0.934784  0.892428  0.897837  0.921250  0.883280  0.954796   
2  0.889171  0.855818  0.812705  0.828411  0.883856  0.834709  0.903747   
3  0.882907  0.863381  0.801078  0.824567  0.870050  0.772571  0.884029   
4  0.811313  0.874652  0.766828  0.787167  0.832245  0.782416  0.895305   
5  0.803823  0.865363  0.783086  0.772198  0.822227  0.765777  0.870665   
6  0.829158  0.851119  0.772597  0.751609  0.832520  0.842748  0.850765   
7  0.818677  0.867492  0.782823  0.736600  0.820687  0.863859  0.832516   
8  0.828665  0.879365  0.766181  0.732570  0.832498  0.836501  0.839269   
9  0.814397  0.866441  0.756983  0.780117  0.824232  0.826110  0.834884   

          7         8         9        10        11        12        13  \
0  0.996790  0.997712  1.000000  0.993422  1.000000  0.997780  0.998583   
1  0.9

## Conclusions & Next Steps

As you've seen, Analyzer makes launching a CPA attack much easier than our manual way. It also has the advantage of capturing some interesting data for us, and reporting the attack success every so often.

Congratulations, you've reached the end of the main part of SCA101! If you've got a ChipWhisperer-Lite or ChipWhisperer 1200 (Pro), there's a bonus lab that will showcase using a more realistic trigger. This is also discussed in the slides and training videos.

---
<small>NO-FUN DISCLAIMER: This material is Copyright (C) NewAE Technology Inc., 2015-2020. ChipWhisperer is a trademark of NewAE Technology Inc., claimed in all jurisdictions, and registered in at least the United States of America, European Union, and Peoples Republic of China.

Tutorials derived from our open-source work must be released under the associated open-source license, and notice of the source must be *clearly displayed*. Only original copyright holders may license or authorize other distribution - while NewAE Technology Inc. holds the copyright for many tutorials, the github repository includes community contributions which we cannot license under special terms and **must** be maintained as an open-source release. Please contact us for special permissions (where possible).

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</small>