# **AES Hardware Implementation**

**Note:** The architecture shown here corresponds to the design implemented in my code, and the variable names match those used in the code for easy reference.

#### Design



aes\_top.v

The design consists of 3 modules:

1. aes\_rkg()
2. aes\_enc()
3. aes\_dec()

Each of the 3 modules has its own control signals and is controlled by the control logic in the aes\_top() module. Each module runs an independent FSM.

The aes\_rkg() module computes the 10 round key values, and these values are stored separately in both aes\_enc() and aes\_dec() to keep the modules independent.
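As a point of reference, the round key generation can be modelled in software. The following is a minimal C sketch of the standard AES-128 key expansion, not the Verilog of aes\_rkg(). The function names `gf_mul8`, `rotl8`, `aes_sbox` and `aes128_expand_key` are hypothetical, and the S-box is computed on the fly instead of being stored as a table only to keep the example short.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B). */
uint8_t gf_mul8(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi)
            a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

/* Rotate a byte left by n bits (1 <= n <= 7). */
uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-box entry: multiplicative inverse followed by the affine transform. */
uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = 0;                      /* 0 has no inverse and maps to 0 */
    for (int c = 1; c < 256; c++) {
        if (gf_mul8(x, (uint8_t)c) == 1) {
            inv = (uint8_t)c;
            break;
        }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Expand a 16-byte key: rk[0] is the key itself,
 * rk[1]..rk[10] are the ten generated round keys. */
void aes128_expand_key(const uint8_t key[16], uint8_t rk[11][16])
{
    uint8_t rcon = 0x01;

    for (int i = 0; i < 16; i++)
        rk[0][i] = key[i];

    for (int r = 1; r <= 10; r++) {
        const uint8_t *prev = rk[r - 1];

        /* First word: previous first word XOR SubWord(RotWord(last word)) XOR Rcon. */
        rk[r][0] = (uint8_t)(prev[0] ^ aes_sbox(prev[13]) ^ rcon);
        rk[r][1] = (uint8_t)(prev[1] ^ aes_sbox(prev[14]));
        rk[r][2] = (uint8_t)(prev[2] ^ aes_sbox(prev[15]));
        rk[r][3] = (uint8_t)(prev[3] ^ aes_sbox(prev[12]));

        /* Remaining three words chain off the word just produced. */
        for (int i = 4; i < 16; i++)
            rk[r][i] = (uint8_t)(prev[i] ^ rk[r][i - 4]);

        rcon = gf_mul8(rcon, 2);          /* 01, 02, 04, ..., 80, 1b, 36 */
    }
}
```

For the test key 0x000102030405060708090a0b0c0d0e0f, the first generated round key in this model is 0xd6aa74fdd2af72fadaa678f1d6ab76fe, which can be compared against the value seen in simulation.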

This design targets the Avnet MiniZed board, which has limited resources, so LUT usage has been kept as low as possible.

E.g., the column mixing operation in both encryption and decryption is implemented using **Galois Multiplication lookup tables** rather than an actual multiplication or shift operation.
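The lookup-table idea can be illustrated with a small software model. This is a sketch in C, not the Verilog used in the design; the names `gf_mul`, `build_gf_table` and `mix_column_lut` are hypothetical. It generates a 256-entry table for a constant multiplier and then mixes one column using only table lookups and XOR. Encryption needs tables for the constants 2 and 3; decryption needs 9, 11, 13 and 14.

```c
#include <stdint.h>

/* Bitwise multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
 * Used here only to generate the tables. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi)
            a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

/* Fill a 256-entry table with c * x for every byte value x. */
void build_gf_table(uint8_t c, uint8_t table[256])
{
    for (int x = 0; x < 256; x++)
        table[x] = gf_mul(c, (uint8_t)x);
}

/* Encryption MixColumns on one 4-byte column: table lookups and XOR only. */
void mix_column_lut(const uint8_t mul2[256], const uint8_t mul3[256],
                    const uint8_t in[4], uint8_t out[4])
{
    out[0] = (uint8_t)(mul2[in[0]] ^ mul3[in[1]] ^ in[2] ^ in[3]);
    out[1] = (uint8_t)(in[0] ^ mul2[in[1]] ^ mul3[in[2]] ^ in[3]);
    out[2] = (uint8_t)(in[0] ^ in[1] ^ mul2[in[2]] ^ mul3[in[3]]);
    out[3] = (uint8_t)(mul3[in[0]] ^ in[1] ^ in[2] ^ mul2[in[3]]);
}
```

In hardware the tables would be constants (ROM or LUT contents), so the per-byte cost becomes a lookup instead of a chain of conditional shifts and XORs.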

Each of the 3 modules takes 10 cycles to complete one round key generation, encryption, or decryption.

The clock used in the implementation is 50 MHz.

#### Simulation

The following screenshot shows the test example for the following inputs:

Key: 0x000102030405060708090a0b0c0d0e0f

Plain Text: 0x00112233445566778899aabbccddeeff

Cipher Text: 0x69c4e0d86a7b0430d8cdb78070b4c55a



As can be seen from the waveform,

## Throughput:

$$\text{Throughput} = \frac{50\ \text{MHz}}{10\ \text{cycles}} = 5 \times 10^{6}\ \text{enc/dec per second}$$

(neglecting the initial 10 cycles for rkg)
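Since each encryption or decryption processes one 128-bit block, the same figure can also be expressed as a data rate. This is a rough upper bound that assumes the core is kept continuously busy and ignores any AXI transfer overhead:

$$5 \times 10^{6}\ \frac{\text{blocks}}{\text{s}} \times 128\ \frac{\text{bit}}{\text{block}} = 640\ \text{Mbit/s} = 80\ \text{MB/s}$$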

### Latency:

#### Area:

| Name            | Slice LUTs (14400) | Slice Registers (28800) | F7 Muxes (8800) | F8 Muxes (4400) | Block RAM Tile (50) | Bonded IOB (54) | BUFGCTRL (32) |
|-----------------|--------------------|-------------------------|-----------------|-----------------|---------------------|-----------------|---------------|
| aes_top         | 3711               | 1834                    | 336             | 128             | 1.5                 | 651             | 1             |
| DEC (aes_dec)   | 1622               | 143                     | 0               | 0               | 0                   | 0               | 0             |
| ENC (aes_enc)   | 1921               | 1552                    | 336             | 128             | 0                   | 0               | 0             |
| RKG (aes_rkg)   | 168                | 137                     | 0               | 0               | 0.5                 | 0               | 0             |

### **Testing**

The aes\_top() module was then packaged into an AXI4 IP and interfaced with the ARM Cortex-A9 processor on the Zynq chip on the MiniZed board.



The block diagram of the entire system is as follows:



The inputs and outputs of the aes\_top() module were mapped to registers of the AXI packaged IP.

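| Register       | Offset (decimal) | Offset (hex) |
|----------------|------------------|--------------|
| Status         | 0                | 0x00         |
| Control        | 4                | 0x04         |
| Key            | 8-20             | 0x08-0x14    |
| PlainText      | 24-36            | 0x18-0x24    |
| Unknown Cipher | 40-52            | 0x28-0x34    |
| CipherText     | 56-68            | 0x38-0x44    |
| Recovered Text | 72-84            | 0x48-0x54    |

Each 128-bit value occupies four consecutive 32-bit registers; each range gives the byte offsets of its first and last word.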
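The register mapping can be sketched as a pair of C helpers that move a 128-bit value to and from four consecutive 32-bit registers. This is an illustrative sketch, not the code used in the project: the names `AES_REG_*`, `aes_write_block` and `aes_read_block` are hypothetical, the byte offsets are assumed to be the ones used by the C test program (for example +8 for the key and +24 for the plain text), and the most significant word is assumed to be written first.

```c
#include <stdint.h>

/* Byte offsets of the first 32-bit word of each register group
 * (assumed from the offsets used in the C test program). */
enum {
    AES_REG_STATUS     = 0,
    AES_REG_CONTROL    = 4,
    AES_REG_KEY        = 8,
    AES_REG_PLAINTEXT  = 24,
    AES_REG_UNKNOWN_CT = 40,
    AES_REG_CIPHERTEXT = 56,
    AES_REG_RECOVERED  = 72
};

/* Write a 16-byte value as four 32-bit words, most significant word first. */
void aes_write_block(volatile uint32_t *base, uint32_t offset,
                     const uint8_t block[16])
{
    for (int w = 0; w < 4; w++) {
        uint32_t word = ((uint32_t)block[4 * w]     << 24) |
                        ((uint32_t)block[4 * w + 1] << 16) |
                        ((uint32_t)block[4 * w + 2] << 8)  |
                         (uint32_t)block[4 * w + 3];
        base[offset / 4 + w] = word;
    }
}

/* Read four 32-bit words back into a 16-byte value. */
void aes_read_block(const volatile uint32_t *base, uint32_t offset,
                    uint8_t block[16])
{
    for (int w = 0; w < 4; w++) {
        uint32_t word = base[offset / 4 + w];
        block[4 * w]     = (uint8_t)(word >> 24);
        block[4 * w + 1] = (uint8_t)(word >> 16);
        block[4 * w + 2] = (uint8_t)(word >> 8);
        block[4 * w + 3] = (uint8_t)word;
    }
}
```

On the target, `base` would point at the AXI base address of the IP; on a host machine a plain `uint32_t` array can stand in for the register file, which makes the packing easy to check.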

This design was then exported to Xilinx SDK, and a simple test program was written in C to test the AES accelerator.

```c
/* aes/src/helloworld.c (excerpt recovered from the Xilinx SDK screenshot) */

void display_status();
void display_input();
void display_output(uint8_t type);

int main()
{
    init_platform();

    // Get system out of reset, start RKG
    print("Starting RKG\n\r");
    // Key words; the last two are garbled in the screenshot and follow the test key above
    Xil_Out32(XPAR_AES_0_S00_AXI_BASEADDR + 8,  0x00010203);
    Xil_Out32(XPAR_AES_0_S00_AXI_BASEADDR + 12, 0x04050607);
    Xil_Out32(XPAR_AES_0_S00_AXI_BASEADDR + 16, 0x08090a0b);
    Xil_Out32(XPAR_AES_0_S00_AXI_BASEADDR + 20, 0x0c0d0e0f);
    // remaining writes (including the control register) are not legible in the screenshot

    sleep(2);
    display_input();
    display_status();

    // start ENC
    print("\n\rstarting ENC\n\r");
    Xil_Out32(XPAR_AES_0_S00_AXI_BASEADDR + 24, 0x00112233);
    // ... (screenshot ends here)
}
```





## Appendix A: Hardware vs Software

Additionally, the AES encryption was also run entirely in software on the single-core ARM Cortex-A9 processor.



#### Result:



#### Comparison:

|                     | Software | Hardware |
|---------------------|----------|----------|
| Clock (MHz)         | 666.67   | 50       |
| Execution Time (µs) | 401      | 0.6      |

The hardware implementation was about 668 times faster (401 µs / 0.6 µs ≈ 668.3).
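Because the two implementations run at different clock frequencies, it can also help to compare them in clock cycles. The following is a rough conversion that assumes the execution times in the table were measured at exactly the listed clock rates:

$$\text{Software: } 401\ \mu\text{s} \times 666.67\ \text{MHz} \approx 2.67 \times 10^{5}\ \text{cycles}$$

$$\text{Hardware: } 0.6\ \mu\text{s} \times 50\ \text{MHz} = 30\ \text{cycles}$$

$$\text{Speedup in time: } \frac{401\ \mu\text{s}}{0.6\ \mu\text{s}} \approx 668.3$$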

## Appendix B: Deliverables

# Name

