
Cryptography: Man In The Middle attack on a block cipher

by Theophile Molinatti and Gabriel Dos Santos

PRESENT24 block cipher specifications

The block cipher implemented in this project is a scaled-down version of PRESENT, a lightweight cipher designed in 2007 by Bogdanov et al. and standardized in ISO/IEC 29192-2:2019. The version implemented here (PRESENT24) takes a 24-bit input message and produces a 24-bit ciphertext.

PRESENT belongs to the family of Substitution-Permutation Network (SPN) ciphers, one of the two main structures for designing block ciphers, the other being Feistel networks.

The concept behind SPN ciphers is simple: each round is composed of three layers:

  • An XOR of the subkey with the register.
  • A substitution to ensure confusion.
  • A permutation to ensure diffusion.

This implementation of PRESENT24 performs eleven rounds; a sketch of a single round is given below.
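For illustration, here is a minimal C sketch of one SPN round on a 24-bit state. It assumes PRESENT's standard 4-bit S-box applied to the six nibbles of the state, and a p_layer() helper that is assumed to be implemented elsewhere (see the permutation layer section below); it is not the project's actual code.

#include <stdint.h>

/* PRESENT's 4-bit S-box, applied to each of the six nibbles of the
 * 24-bit state (assumed to be reused as-is by PRESENT24). */
static const uint8_t SBOX[16] = {
    0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
    0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2
};

/* Hypothetical helper: the bit permutation (P-box), discussed in the
 * "The permutation layer" section below. */
uint32_t p_layer(uint32_t state);

/* One SPN round: key addition, substitution (confusion), then
 * permutation (diffusion). */
static uint32_t spn_round(uint32_t state, uint32_t subkey)
{
    state ^= subkey & 0xFFFFFFu;                   /* 1. subkey XOR   */

    uint32_t substituted = 0;
    for (int nibble = 0; nibble < 6; ++nibble)     /* 2. substitution */
        substituted |= (uint32_t)SBOX[(state >> (4 * nibble)) & 0xF]
                       << (4 * nibble);

    return p_layer(substituted);                   /* 3. permutation  */
}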

Usage

Dependencies

Even though the application has been written to run on any operating system, we strongly advise running it on a Linux-based OS (with kernel 2.6.28 or higher).

As the provided Makefile uses compiler-specific optimization flags, you will need gcc (version 7.0 or higher) installed on your system to be able to compile the source code.

Building from source

To build the executable, please use the provided Makefile:

make

Running the algorithm

You can then run the program with one of the following options:

  • -e lets you use encryption mode, provided a message and a key (24 bits each).
./present24 -e MESSAGE KEY
  • -d lets you use decryption mode, provided a cipher and a key (24 bits each).
./present24 -d CIPHER KEY
  • -a lets you use attack mode on 2PRESENT24. The attack option takes two pairs of plain and cipher texts as inputs, (m1, c1) and (m2, c2). You can also provide the extra -t option to set the number of threads used to run the application.
./present24 -a MESSAGE1 CIPHER1 MESSAGE2 CIPHER2 [-t NB_THREADS]

If the -t option is not specified, the program will run with 4 threads by default.

The Makefile also comes with pre-made targets to test the application:

  • make encrypt runs PRESENT24 encryption on the 4 test vectors provided in the subject.
  • make decrypt runs PRESENT24 decryption on the 4 test vectors provided in the subject.
  • make attack_theophile runs the attack on PRESENT24 with Theophile's pairs of plain/cipher texts.
  • make attack_gabriel runs the attack on PRESENT24 with Gabriel's pairs of plain/cipher texts.

Feel free to run it with values of your own!

Performance analysis

Compiler optimization flags

For this project, we used the widely available GNU Compiler Collection (gcc), which provides numerous optimization flags that helped us achieve faster execution times. We describe some of them below:

  • -O3 enables all of the optimizations that gcc provides, apart from specific math and floating-point arithmetic flags (which are not relevant in our case). Note: The compiler cannot guarantee that all the requested optimizations will be applied, as the source code may not expose patterns that gcc is able to recognize. However, the code produced with -O3 may become eligible for further enhancements that were not applicable before.
  • -march=native and -mtune=native enable architecture-specific optimizations that allow the compiler to generate code using specialized instructions (e.g. SSE and/or AVX instructions for vectorization, if they are available).
  • -finline-functions enables function inlining whenever possible, which improves execution times by removing the function call and copying the function's code where needed. This works especially well for small functions that are called often or in a loop. The use of this flag is paired with the inline keyword in the C source code, which helps the compiler make better decisions.
  • Loop optimization flags such as -funroll-loops or -ftree-loop-vectorize are requested to ensure that the code is as optimized as possible, in case the compiler could not apply them on the first pass with -O3.
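For reference, a compilation command combining these flags might look like the following (the source file name and the -lpthread linkage are illustrative assumptions; the actual Makefile may differ):

gcc -O3 -march=native -mtune=native -finline-functions -funroll-loops -ftree-loop-vectorize -o present24 present24.c -lpthread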

The permutation layer

The permutation box (P-box) is arguably the optimization that has the most significant impact on performance. Because it is so heavily used throughout the attack, both during encryption and decryption, it can become the bottleneck of the application if not carefully implemented.

The naive approach we first took was to use a static array mapping every bit to its destination, combined with a for loop that moves each bit of the input text to its final position. However, using a loop means performing as many iterations as there are bits in the block, each of them involving several memory accesses, which are known to be expensive operations.

Using MAQAO (Modular Assembly Quality Analyzer and Optimizer), the performance analysis and optimization framework developed at the University of Versailles (UVSQ), we noticed that over 50% of the execution time was spent in the P-box layer, so we decided to take another approach.

Instead of using arrays and loops, we implemented the entire P-box manually, directly placing the bits of the input message at their final positions using masks and shifts. This method also reduces the number of memory accesses, thus drastically improving performance and delivering an overall speedup of 3.6x.
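The contrast between the two approaches can be sketched as follows. The permutation table below is a made-up example, not the actual PRESENT24 P-box, and only the first few terms of the unrolled version are shown:

#include <stdint.h>

/* Hypothetical permutation table: bit i of the input goes to position
 * PBOX[i] in the output. */
static const uint8_t PBOX[24] = {
     0,  6, 12, 18,  1,  7, 13, 19,  2,  8, 14, 20,
     3,  9, 15, 21,  4, 10, 16, 22,  5, 11, 17, 23
};

/* Naive approach: table lookup and loop, one iteration per bit. */
static uint32_t p_layer_loop(uint32_t in)
{
    uint32_t out = 0;
    for (int i = 0; i < 24; ++i)
        out |= ((in >> i) & 1u) << PBOX[i];
    return out;
}

/* Unrolled approach: every bit is isolated with a mask and shifted
 * straight to its destination, removing the loop and the table
 * lookups. Only the first four bits are shown here; the remaining
 * twenty follow the same pattern. */
static uint32_t p_layer_unrolled(uint32_t in)
{
    return  (in & 0x000001u)          /* bit 0 stays at position 0  */
         | ((in & 0x000002u) <<  5)   /* bit 1 moves to position 6  */
         | ((in & 0x000004u) << 10)   /* bit 2 moves to position 12 */
         | ((in & 0x000008u) << 15)   /* bit 3 moves to position 18 */
         /* ... bits 4 through 23 handled the same way ... */
         ;
}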

Choosing the sorting algorithm

To be able to perform a binary search on the dictionaries, we first need to sort them. As advised by Mrs Christina BOURA, we started by implementing a quick sort algorithm, as it often yields very good results on any type of data, with an average time complexity of O(n log n).

Nonetheless, we decided to switch to a radix sort algorithm, which is more efficient on the type of data (unsigned integers) we are dealing with. Indeed, radix sort is noticeably better at sorting fixed-size integers, with a time complexity of O(nk) where k is the number of digit passes, making it a very good fit in our case.
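A minimal sketch of the idea, assuming an LSD radix sort with three 8-bit passes over bare 24-bit values (the project's actual dictionaries pair each value with the key that produced it, and the exact radix may differ):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort on 24-bit keys stored in uint32_t: three passes of
 * 8 bits, each pass being a stable counting sort on one byte. */
static void radix_sort_24(uint32_t *keys, size_t n)
{
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return;

    for (int pass = 0; pass < 3; ++pass) {
        size_t count[256] = { 0 };
        int shift = 8 * pass;

        /* Histogram of the current byte. */
        for (size_t i = 0; i < n; ++i)
            count[(keys[i] >> shift) & 0xFF]++;

        /* Exclusive prefix sum: count[b] becomes the first output
         * index for bucket b. */
        size_t offset = 0;
        for (int b = 0; b < 256; ++b) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }

        /* Stable scatter into the temporary buffer. */
        for (size_t i = 0; i < n; ++i)
            tmp[count[(keys[i] >> shift) & 0xFF]++] = keys[i];

        memcpy(keys, tmp, n * sizeof *keys);
    }
    free(tmp);
}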

This made the sorting phase up to 9.6 times faster, allowing for an overall acceleration of 1.26x.
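Once the dictionaries are sorted, the matching step itself is a plain binary search. A minimal sketch, with a hypothetical entry layout (intermediate value plus the candidate key that produced it):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary entry. */
typedef struct {
    uint32_t value;  /* 24-bit intermediate ciphertext */
    uint32_t key;    /* 24-bit candidate key           */
} entry_t;

/* Returns the index of the first entry whose value equals target, or
 * -1 if none exists. The dictionary must be sorted by value. */
static ptrdiff_t dict_search(const entry_t *dict, size_t n, uint32_t target)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (dict[mid].value < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && dict[lo].value == target) ? (ptrdiff_t)lo : -1;
}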

Parallelizing the attack

The last and probably most important optimization we made is to parallelize the attack. Thanks to Mr Salah IBN AMAR (teacher-researcher in HPCS and at the LI-PaRAD and Exascale Computing Research labs), we had the opportunity to measure the performance of our program on larger machines, comparing execution times with different numbers of threads; a sketch of the threading scheme follows the list of machines below. We ran the benchmarks on the following machines:

  • AMD Zen2 (EPYC 7302), 64 cores.
  • Intel Skylake (Xeon Platinum 8170), 52 cores.
  • AMD Zen3 (Ryzen 7 3800X), 8 cores.
  • Intel Tigerlake (i5-1135G7), 4 cores.
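As a rough sketch of how the 2^24 key space might be split across threads, here is a minimal POSIX-threads skeleton. The process_key() work function and the argument handling are placeholders, not the project's actual code:

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Each thread scans a contiguous slice of the 2^24 key space. */
typedef struct {
    uint32_t first_key;
    uint32_t last_key;   /* exclusive upper bound */
} slice_t;

/* Placeholder for the per-key work: in the real attack this would
 * encrypt the plaintext under the candidate key and store or look up
 * the result in a dictionary. */
static void process_key(uint32_t key)
{
    (void)key;
}

static void *worker(void *arg)
{
    slice_t *s = arg;
    for (uint32_t k = s->first_key; k < s->last_key; ++k)
        process_key(k);
    return NULL;
}

int main(int argc, char **argv)
{
    uint32_t nb_threads = (argc > 1) ? (uint32_t)atoi(argv[1]) : 4;
    if (nb_threads == 0)
        nb_threads = 4;

    pthread_t *threads = malloc(nb_threads * sizeof *threads);
    slice_t *slices = malloc(nb_threads * sizeof *slices);
    if (!threads || !slices)
        return EXIT_FAILURE;

    uint32_t keyspace = 1u << 24;
    uint32_t chunk = keyspace / nb_threads;

    for (uint32_t t = 0; t < nb_threads; ++t) {
        slices[t].first_key = t * chunk;
        slices[t].last_key = (t == nb_threads - 1) ? keyspace
                                                   : (t + 1) * chunk;
        pthread_create(&threads[t], NULL, worker, &slices[t]);
    }
    for (uint32_t t = 0; t < nb_threads; ++t)
        pthread_join(threads[t], NULL);

    free(threads);
    free(slices);
    return EXIT_SUCCESS;
}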
