# Correlation Power Analysis Attack
## Hardware Security Project
### Master of Cybersecurity

The goal of the project is to implement a correlation power analysis (CPA) attack on the 128-bit AES encryption algorithm and retrieve the 16 bytes of the key.

The students will be provided with a set of power consumption traces acquired while a microcontroller (PIC18F4520) runs the AES encryption algorithm over several plaintexts. The current is probed via the voltage drop across a series-connected resistor, as indicated in Figure 1.

![Figure 1: Diagram of the experimental setup to acquire the power consumption traces](assets/image.png)

_Figure 1: Diagram of the experimental setup to acquire the power consumption traces_

Experimental setup information: The microcontroller runs at 250 kHz (clock signal) and the
oscilloscope samples at a rate of 25 Msamples/s. The power supply is 5 V and the series connected
resistor has a value of 100 Ω.
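
As a quick sanity check on these figures, and assuming a stable clock, the number of oscilloscope samples per clock cycle works out as follows:

$$
\frac{25 \times 10^{6}\ \text{samples/s}}{250 \times 10^{3}\ \text{cycles/s}} = 100\ \text{samples per clock cycle}
$$

Under the same assumption, a trace of 50000 samples covers about 500 clock cycles, or 2 ms.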

## Dataset 1

Dataset 1 contains:

- **cleartext.txt**: 150 lines and 16 columns. Contains the 150 plaintexts of 16 bytes fed to the AES encryption algorithm implemented in the microcontroller.

- **trace{N}.txt**: 150 lines and 50000 columns. Contains the current consumption traces for every plaintext during the time span in which byte *{N}* of the key is used in the first key addition and first S-box. Each line is a trace made up of 50000 oscilloscope samples. The units of the current consumption traces are arbitrary.

## Performing the attack

- Read the data into NumPy arrays

Initialize a 2D NumPy array for the cleartexts
- Type: int
- Dimensions: 150 lines * 16 bytes

Initialize a 3D NumPy array to hold all the traces
- Type: float
- Dimensions: 16 files * 150 lines * 50000 values
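
The two arrays described above could be filled as in the following minimal sketch. It assumes that the files are whitespace-separated text, that the cleartext bytes are written as decimal integers, and that the trace files are numbered `trace0.txt` to `trace15.txt`. The names `load_dataset` and `first_index` are hypothetical and not taken from the project code.

```python
from pathlib import Path

import numpy as np

N_KEY_BYTES = 16  # one trace file per key byte


def load_dataset(data_dir, first_index=0):
    """Load the cleartexts and the traces found in data_dir.

    Returns (cleartexts, traces) with shapes
    (n_lines, 16) and (16, n_lines, n_samples).
    """
    data_dir = Path(data_dir)
    # Assumption: one plaintext per line, 16 decimal byte values per line.
    cleartexts = np.loadtxt(data_dir / "cleartext.txt", dtype=int, ndmin=2)
    # Assumption: file numbering starts at first_index (0 by default).
    traces = np.stack([
        np.loadtxt(data_dir / f"trace{n}.txt", dtype=float, ndmin=2)
        for n in range(first_index, first_index + N_KEY_BYTES)
    ])
    return cleartexts, traces
```

If the files turn out to be numbered from 1, passing `first_index=1` is enough. If the bytes are stored in hexadecimal, the cleartext parsing would need to be adapted.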

- Compute Hamming weights

For each possible key byte value (0x00 to 0xFF), we computed the output of the AES S-box that follows the first ``AddRoundKey`` operation, for each plaintext.

We used the Hamming weight model to estimate the current consumption for each key byte guess.
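
A minimal sketch of this model is shown below. To avoid hard-coding 256 constants, the S-box is rebuilt from its definition (multiplicative inverse in GF(2^8) followed by the AES affine transformation). The helper names (`build_sbox`, `hamming_weight_model`, `HW`) are illustrative and are not the project's own code.

```python
import numpy as np


def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def _rotl8(x, shift):
    """Rotate a byte left by shift bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox():
    """Rebuild the AES S-box from its mathematical definition."""
    sbox = np.zeros(256, dtype=np.uint8)
    for x in range(256):
        # Brute-force multiplicative inverse; 0 has none and maps to 0.
        inv = next((y for y in range(1, 256) if _gf_mul(x, y) == 1), 0)
        # AES affine transformation.
        sbox[x] = (inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                   ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
# HW[v] is the number of bits set in the byte v.
HW = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)


def hamming_weight_model(plaintext_bytes, sbox=SBOX):
    """Return an (n_traces, 256) array of HW(SBox(p XOR k)) for every guess k."""
    p = np.asarray(plaintext_bytes, dtype=np.uint8)[:, None]
    k = np.arange(256, dtype=np.uint8)[None, :]
    return HW[sbox[p ^ k]]
```

For key byte `n`, the input would be column `n` of the cleartext array, giving one column of predicted consumption per key guess.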

- Compute the correlation matrix

To find the key bytes, we correlated each trace set with the Hamming weight estimates of the current consumption computed in the previous step. For each key byte, the guess with the strongest correlation is the most likely value.
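
One way to compute this correlation without explicit Python loops is sketched below. It is a generic vectorized Pearson correlation, not necessarily the code used in the project. `correlate` and `best_key_guess` are hypothetical names, and the inputs are assumed to have shapes `(n_traces, 256)` for the model and `(n_traces, n_samples)` for the traces of one key byte.

```python
import numpy as np


def correlate(hypotheses, traces):
    """Pearson correlation of every key-guess column with every sample column.

    hypotheses: (n_traces, 256), traces: (n_traces, n_samples).
    Returns an array of shape (256, n_samples).
    """
    h = np.asarray(hypotheses, dtype=float)
    t = np.asarray(traces, dtype=float)
    h = h - h.mean(axis=0)
    t = t - t.mean(axis=0)
    numerator = h.T @ t
    denominator = np.sqrt(
        (h ** 2).sum(axis=0)[:, None] * (t ** 2).sum(axis=0)[None, :]
    )
    # Constant columns have zero variance; report a correlation of 0 there.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = numerator / denominator
    return np.nan_to_num(corr)


def best_key_guess(corr):
    """Key guess whose correlation peak has the largest absolute value."""
    return int(np.abs(corr).max(axis=1).argmax())
```

The single matrix product replaces 256 * n_samples separate correlation computations, which is usually where a loop-based version spends most of its time.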



## Observations / Issues

- Compute the correlation matrix

This is where we had most of our issues. Our code was not very efficient and our computers were not the fastest: the first version was estimated to need about 2 million hours to finish.

After some optimization, we brought this down to about 30 hours, which was still too slow.

Finally, one of us had the idea to compute the key bytes in parallel across the computer's CPU cores. The time needed went down to less than a minute!
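
The per-byte parallelization could look like the sketch below, which uses the standard library's `concurrent.futures`. This is an illustration under assumptions, not the project's code: `attack_byte`, `recover_key` and `leak_table` are hypothetical names, and `leak_table` is assumed to be a 256-entry array holding the predicted leakage for each S-box input (for example the Hamming weight of the S-box output).

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def attack_byte(job):
    """Return the best guess for one key byte.

    job is a tuple (plaintext_bytes, traces, leak_table).
    """
    plaintext_bytes, traces, leak_table = job
    guesses = np.arange(256)
    # Predicted leakage for every (trace, key guess) pair.
    hyp = leak_table[plaintext_bytes[:, None] ^ guesses[None, :]].astype(float)
    h = hyp - hyp.mean(axis=0)
    t = traces - traces.mean(axis=0)
    norm = np.sqrt((h ** 2).sum(axis=0)[:, None] * (t ** 2).sum(axis=0)[None, :])
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.nan_to_num((h.T @ t) / norm)
    return int(np.abs(corr).max(axis=1).argmax())


def recover_key(cleartexts, traces, leak_table,
                executor_cls=ProcessPoolExecutor, max_workers=None):
    """Attack each key byte in its own worker and return the list of guesses."""
    leak_table = np.asarray(leak_table)
    jobs = [(cleartexts[:, n], traces[n], leak_table)
            for n in range(traces.shape[0])]
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves the order of the jobs, so guess n is key byte n.
        return list(pool.map(attack_byte, jobs))
```

With the default `ProcessPoolExecutor`, the call to `recover_key` should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Each job also copies one trace set to a worker, so memory use grows with the number of workers.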

## Dataset 2

Dataset 2 contains:

- **cleartext.txt**
- **trace{N}.txt**
- **clock{N}.txt**: 150 lines and 50000 columns. Contains the microcontroller clock signal acquisition corresponding to each of the consumption traces.

Steps:

What we tried

1. We first ran the attack as we did for Dataset 1, to check whether the clock files were really needed. As expected, our result did not match the key checksum.

2. We plotted the clock files to look for anything useful, but we couldn't see much (join graph).

3. We plotted the clock files alongside the trace files to look for a correlation, but we didn't identify any.

4. We plotted only some parts of the traces alongside the corresponding clocks, but we couldn't exploit any information since the traces are unaligned.

5. We tried to align the traces by shifting each trace so that its maximum aligns with the minimum index. After saving the aligned traces to files, we ran the attack again as we did for Dataset 1, but the key checksum didn't match.

6. Finally, we decided to look for flanks (edges) in the clock signals and capture the associated trace data.
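
For reference, the alignment attempt of step 5 could be sketched as follows. This is one possible reading of "the minimum index", namely the smallest peak position found among all traces. The function name `align_on_maximum` is hypothetical.

```python
import numpy as np


def align_on_maximum(traces):
    """Shift each trace so that its maximum lands on a common index.

    traces: (n_traces, n_samples). The common index is the smallest
    peak position among all traces (an assumption, see above).
    """
    traces = np.asarray(traces)
    peaks = traces.argmax(axis=1)
    target = peaks.min()
    # np.roll is circular: samples shifted out on the left reappear on the right.
    return np.stack([
        np.roll(row, target - peak) for row, peak in zip(traces, peaks)
    ])
```

Because a single global maximum says little about where each clock cycle starts, this kind of alignment can leave the rest of the trace misaligned, which is consistent with the attack still failing after this step.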

How we achieved it

1. Data reading

- Initialize a 2D NumPy array for the cleartexts
    - Type: int
    - Dimensions: 150 lines * 16 bytes
- Initialize a 3D NumPy array to hold all the traces
    - Type: float
    - Dimensions: 16 files * 150 lines * 50000 values
- Initialize a 3D NumPy array to hold all the clocks
    - Type: float
    - Dimensions: 16 files * 150 lines * 50000 values

2. Hamming Weights

...

3. Rising edge detection

The code detects rising edges (rising flanks) in a clock signal and extracts a specific window of data from the trace signal associated with each detected edge.

The first loop iterates over all 150 traces of a file. The second loop iterates over the 50,000 values of each trace.

After some tests, we defined a limit value above which a sample is considered to belong to a flank.

- a value that is too high does not detect any flank
- a value too low ...

To detect a flank we set the following conditions:

- The previous sample must be under the fixed limit
- The current sample must be above the fixed limit

If these conditions are met, we capture the trace samples around the current clock sample at index j (from j-5 to j+4) for further analysis.
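
The two conditions and the window extraction can be sketched in vectorized form as below. The function names and the `threshold` parameter are hypothetical, and the limit value itself has to be chosen from the data as described above. The window bounds (5 samples before, 4 after) follow the text.

```python
import numpy as np


def rising_edges(clock, threshold):
    """Indices j where clock[j-1] is under and clock[j] is above the threshold."""
    clock = np.asarray(clock)
    crossing = (clock[:-1] < threshold) & (clock[1:] > threshold)
    return np.flatnonzero(crossing) + 1


def extract_windows(trace, clock, threshold, before=5, after=4):
    """Return trace[j-before : j+after+1] for every rising edge j of the clock.

    The result has shape (n_edges, before + after + 1).
    """
    trace = np.asarray(trace)
    edges = rising_edges(clock, threshold)
    # Skip edges whose window would run past either end of the trace.
    edges = edges[(edges >= before) & (edges + after < trace.size)]
    offsets = np.arange(-before, after + 1)
    return trace[edges[:, None] + offsets[None, :]]
```

Different traces may contain a different number of detected edges, so before stacking the windows of all traces into one array, they would need to be truncated to a common number of edges.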