From 3466bc526186189b04c3ed9c5421928ca9f8eb1c Mon Sep 17 00:00:00 2001
From: Florian Hofhammer
Date: Tue, 21 Nov 2023 11:36:39 +0100
Subject: [PATCH] Add Florian's projects

---
 epflprojects/index.html | 518 ++++++++++++++++++++++++++++++++--------
 1 file changed, 422 insertions(+), 96 deletions(-)

diff --git a/epflprojects/index.html b/epflprojects/index.html
index bcb6015..6283a92 100644
--- a/epflprojects/index.html
+++ b/epflprojects/index.html
@@ -44,34 +44,76 @@

HexHive PhD, MSc, BSc projects

Library Fuzzing
-

Unlike fuzzing CLI programs, whose input is modeled as a stream of bytes, fuzzing libraries requires drivers (library consumers) to bridge an input into a sequence of APIs. The code coverage and error discovery depend on the API combinations within the driver. Therefore, it is crucial having interesting drivers to deeply test a target library. Unfortunately, building such drivers is challenging due to a lack of semantic information about the APIs and their usage. Moreover, insidious errors may appear only with rare API sequences. Current techniques infer API usage from already-existing programs, however, the quality of the new drivers is inevitably limited by the existing consumers. In this project, we aim at generating library drivers without looking into existing consumers. Precisely, we use a combination of static analysis and automatic testing to mine the API usage and automatically build drivers able to explore a vaster library portion of code and trigger more complex errors.

+

Unlike fuzzing CLI programs, whose input is modeled as a stream of +bytes, fuzzing libraries requires drivers (library consumers) to bridge +an input into a sequence of API calls. The code coverage and error discovery +depend on the API combinations within the driver. Therefore, it is +crucial to have interesting drivers to deeply test a target library. +Unfortunately, building such drivers is challenging due to a lack of +semantic information about the APIs and their usage. Moreover, insidious +errors may appear only with rare API sequences. Current techniques infer +API usage from already-existing programs; however, the quality of the +new drivers is inevitably limited by the existing consumers. In this +project, we aim to generate library drivers without looking into +existing consumers. Specifically, we use a combination of static analysis +and automatic testing to mine the API usage and automatically build +drivers able to explore a larger portion of the library code and trigger +more complex errors.
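As a rough illustration of what "bridging an input into a sequence of APIs" means, here is a minimal sketch in Python. The `ToyStack` library and the `run_driver` function are hypothetical names invented for this example; they are not part of any HexHive tool. Each input byte selects one API call, so the byte stream chosen by a fuzzer determines which API sequence is exercised.

```python
class ToyStack:
    """Hypothetical stand-in for a target library with a small stateful API."""

    def __init__(self):
        self.items = []

    def push(self, value):
        self.items.append(value)

    def pop(self):
        # Deliberate bug: popping an empty stack raises IndexError.
        return self.items.pop()

    def clear(self):
        self.items.clear()


def run_driver(data: bytes) -> list:
    """Interpret each input byte as one API call and return the call trace."""
    stack = ToyStack()
    trace = []
    for byte in data:
        opcode = byte % 3  # select one of the three API functions
        if opcode == 0:
            stack.push(byte)
            trace.append("push")
        elif opcode == 1:
            stack.pop()
            trace.append("pop")
        else:
            stack.clear()
            trace.append("clear")
    return trace
```

In this toy setting, the error only surfaces for sequences such as push, clear, pop, which hints at why rare API combinations matter for error discovery.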

The research questions in this project are:

-

The candidate will require to assist the design and develop of a prototype for testing different driver building strategies. The prototype will be a combination of different technologies, such as static analysis over LLVM IR, Python modules for the driver generation, and fuzzer for the automatic testing.

-

A candidate should be interested in (or familiar with) at least one of the following topics.

+

The candidate will be required to assist with the design and development of a +prototype for testing different driver-building strategies. The +prototype will be a combination of different technologies, such as +static analysis over LLVM IR, Python modules for the driver generation, +and a fuzzer for the automatic testing.

+

A candidate should be interested in (or familiar with) at least one +of the following topics.

Android acropalypse
-

You might have heard about the recent security disaster that is aCropalypse. Well, it turns out that the reason behind this bug is Google silently updating some Android’s API for opening files which causes files not to be truncated anymore when opening them.

-

This is pretty wild and we think that there might be many more applications of aCropalypse, not just cropped screenshots. This project is about writing tooling to automatically analyze Android apks and searching for potential alternative data leaks.

+

You might have heard about the recent security disaster that is aCropalypse. +Well, it turns out that the reason behind this bug is Google silently +updating an Android +API for opening files, which causes files to no longer be truncated +when they are opened for writing.
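The effect of a missing truncation can be reproduced on any POSIX system. The sketch below is an analogy using Python's `os.open`, not the Android API itself; the file name and contents are made up for illustration. Writing a shorter payload over a longer file without `O_TRUNC` leaves the tail of the old data in place.

```python
import os
import tempfile

ORIGINAL = b"ORIGINAL-SECRET-PIXELS"  # hypothetical full screenshot
CROPPED = b"CROPPED"                  # hypothetical shorter, edited version


def overwrite(path: str, data: bytes, truncate: bool) -> bytes:
    """Write data at offset 0 and return the resulting file contents."""
    flags = os.O_WRONLY | (os.O_TRUNC if truncate else 0)
    fd = os.open(path, flags)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def demo() -> tuple:
    """Return the file contents after overwriting with and without truncation."""
    results = []
    for truncate in (True, False):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "screenshot.bin")
            with open(path, "wb") as f:
                f.write(ORIGINAL)
            results.append(overwrite(path, CROPPED, truncate))
    return tuple(results)
```

Without truncation, the second result still contains the trailing bytes of the original file, which is the kind of leftover data a tool for this project would look for.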

+

This is pretty wild and we think that there might be many more +applications of aCropalypse, not just cropped screenshots. This project +is about writing tooling to automatically analyze Android APKs and +search for potential alternative data leaks.

A candidate should be interested in:

-
Software Compartmentalization Benchmark suite
+
Software +Compartmentalization Benchmark suite
-

Compartmentalization is a software-development principle to reduce a program’s attack surface, and limit the exploitability of bugs. A compartmentalized program is separated into a number of compartments, each of which executes with minimal privileges and rights, and communicates through structured API only. Essentially, an exploit in one compartment should not trivially compromise other compartments.

-

We propose a semester/thesis project for masters students with software development expertise to compartmentalize high-risk software. Prime examples of such software are webservers, browsers and operating systems. We are open to other suggestions. We would like to eventually have a set of representative software comprising a benchmark suite against which to evaluate the different compartmentalization techniques.

-

A benchmark suite would preferably be portable, running on different operating systems/libraries, hardware, and be amenable to be ported onto hardware or software research proposals for better compartmentalization.

-
WebAssembly-based protection, strengths and limitations
+

Compartmentalization is a software-development principle to reduce a +program’s attack surface, and limit the exploitability of bugs. A +compartmentalized program is separated into a number of compartments, +each of which executes with minimal privileges and rights, and +communicates through a structured API only. Essentially, an exploit in one +compartment should not trivially compromise other compartments.

+

We propose a semester/thesis project for master’s students with +software development expertise to compartmentalize high-risk software. +Prime examples of such software are web servers, browsers and operating +systems. We are open to other suggestions. We would like to eventually +have a set of representative software comprising a benchmark suite +against which to evaluate the different compartmentalization +techniques.

+

A benchmark suite would preferably be portable, running on different +operating systems, libraries, and hardware, and be amenable to porting onto +hardware or software research proposals for better +compartmentalization.

+
WebAssembly-based +protection, strengths and limitations
-

WebAssembly is an standard virtual architecture in which a program can be compiled to. Thanks to its high performance and isolation through a sandbox, a developer can compile regular source code (e.g., written in C or Rust) to WebAssembly, ensuring that the interaction with the WebAssembly module is limited to the interfaces it exports. Software known for containing vulnerabilities can therefore be set in an external module.

-

In this project tailored for a MSc project/thesis, the student will analyze existing code and determine the shortcomings produced by its conversion to WebAssembly for security purposes. Ideally, a monolithic program can be split in such a way that the resulting version will be composed by several WebAssembly modules. This study requires the characterization of the limitations of running WebAssembly code and a fine-grained runtime analysis of the resulting software. The outcome shall be compared with other existing techniques.

-

This project also can also be accomplished by extending the features of the WebAssembly standard to support more software.

+

WebAssembly is a standard virtual architecture to which a program +can be compiled. Thanks to its high performance and isolation through +a sandbox, a developer can compile regular source code (e.g., written in +C or Rust) to WebAssembly, ensuring that the interaction with the +WebAssembly module is limited to the interfaces it exports. Software +known for containing vulnerabilities can therefore be placed in an external +module.

+

In this project tailored for an MSc project/thesis, the student will +analyze existing code and determine the shortcomings produced by its +conversion to WebAssembly for security purposes. Ideally, a monolithic +program can be split in such a way that the resulting version will be +composed of several WebAssembly modules. This study requires the +characterization of the limitations of running WebAssembly code and a +fine-grained runtime analysis of the resulting software. The outcome +shall be compared with other existing techniques.

+

This project can also be accomplished by extending the features +of the WebAssembly standard to support more software.

Type confusion test suite
-

Type confusion is a common vulnerability in C/C++ programs. It occurs when a type is incorrectly casted to another type. This can lead to memory corruption and code execution. HexHive has published a number of works trying to detect and mitigate the impact of type confusions. The goal of this project is to create a test suite for type confusion detection tools. Recent works have been evaluated on a common run time performance benchmark but they miss a validation on a common set of type confusion bugs. The test suite will be composed of a set of programs and unit test with type confusion bugs. Some bugs should be based on real world vulnerabilities while others can be purely synthetic.

+

Type confusion is a common vulnerability in C/C++ programs. It occurs +when an object of one type is incorrectly cast to another type. This can lead to +memory corruption and code execution. HexHive has published a number of works trying to detect +and mitigate the impact of type confusions. The goal of this project is +to create a test suite for type confusion detection tools. Recent works +have been evaluated on a common runtime performance benchmark but they +lack validation on a common set of type confusion bugs. The test suite +will be composed of a set of programs and unit tests with type confusion +bugs. Some bugs should be based on real-world vulnerabilities while +others can be purely synthetic.

We would aim to:

-

Students should have a basic understanding of how C/C++ programs are built and a good grasp of Linux internals.

+

Students should have a basic understanding of how C/C++ programs are +built and a good grasp of Linux internals.

Fuzzing C++ libraries
-

Unlike fuzzing CLI programs, whose input is modeled as a stream of bytes, fuzzing libraries requires drivers (library consumers) to bridge an input into a sequence of APIs. The code coverage and error discovery depend on the API combinations within the driver. Recent work at HexHive has shown promising result for automatically generating these drivers for C libraries. The goal of this project is to extend this work to C++ libraries. In particular, some adaptations will be necessary to handle the object-oriented nature of C++ as well as supporting casting operations.

-

The candidate will be required to identify the necessary adaptations to the existing C library fuzzing tool as well as implement support for them in the existing framework. The prototype will be a combination of different technologies, such as static analysis over LLVM IR, Python modules for the driver generation, and fuzzer for the automatic testing. The candidate will also be in charged of finding and motivating the choice of suitable C++ libraries to test.

-

A candidate should be interested in (or familiar with) the following topics.

+

Unlike fuzzing CLI programs, whose input is modeled as a stream of +bytes, fuzzing libraries requires drivers (library consumers) to bridge +an input into a sequence of API calls. The code coverage and error discovery +depend on the API combinations within the driver. Recent work at HexHive +has shown promising results for automatically generating these drivers +for C libraries. The goal of this project is to extend this work to C++ +libraries. In particular, some adaptations will be necessary to handle +the object-oriented nature of C++ as well as to support casting +operations.

+

The candidate will be required to identify the necessary adaptations +to the existing C library fuzzing tool as well as implement support for +them in the existing framework. The prototype will be a combination of +different technologies, such as static analysis over LLVM IR, Python +modules for the driver generation, and a fuzzer for the automatic testing. +The candidate will also be in charge of finding and motivating the +choice of suitable C++ libraries to test.

+

A candidate should be interested in (or familiar with) the following +topics.

-
ARM64 Kernel Driver Retrowriting
+
ARM64 Kernel Driver +Retrowriting
-

A common feature of the Android ecosystem are proprietary binary blobs. Vendors may not update these and may not compile them with the latest exploit mitigations. A particular cause of concern are kernel modules given their privileged access.

-

Hexhive’s Retrowrite project is a state-of-the-art binary rewriting tool that can retrofit mitigations to legacy binaries without the need for source code. This currently works on ARM64 and x86-64 platforms, and x86-64 in kernel mode. The goal of this project would be to target ARM64 kernel modules, with the ability to add for example kASAN. We would aim to:

+

Proprietary binary blobs are a common feature of the Android +ecosystem. Vendors may not update these and may not compile them with the +latest exploit mitigations. Kernel modules are a particular cause of +concern given their privileged access.

+

HexHive’s Retrowrite project is a state-of-the-art binary rewriting +tool that can retrofit mitigations to legacy binaries without the need +for source code. This currently works on ARM64 and x86-64 platforms, and +x86-64 in kernel mode. The goal of this project would be to target ARM64 +kernel modules, with the ability to add, for example, kASAN. We would aim +to:

-

Students should have a basic understanding of how Linux kernel modules are built and loaded, and a good grasp of Linux internals. Ambitious students may also have Android Internals knowledge and be interested in testing their work on Android hardware.

-
Leveraging application security through memory tagging
+

Students should have a basic understanding of how Linux kernel +modules are built and loaded, and a good grasp of Linux internals. +Ambitious students may also have Android Internals knowledge and be +interested in testing their work on Android hardware.

+
Leveraging +application security through memory tagging
-

Memory tagging is a hardware extension that adds a level of restriction when dereferencing memory addresses: the key held should match the memory key. This extension can be found implemented both by Memory Protection Keys (MPK) and Memory Tagging Extension (MTE), corresponding respectively to x86-64 and ARM64 architectures, which have a different granularity (page vs 16 bytes) and way to store the key (register or per-pointer), resulting in a substantially different programming model.

-

The adoption of such a technology would be decisive for finding memory safety bugs in existing pieces of code such as databases, cryptographic toolkits, operating system kernels, web servers, web browsers… Albeit this technologies are acknowledged (like MPK for which the Linux kernel provides an interface), their adoption from the application side requires a previous study which remains to be done.

+

Memory tagging is a hardware extension that adds a level of +restriction when dereferencing memory addresses: the key held should +match the memory key. This extension can be found implemented both by +Memory Protection Keys (MPK) and Memory Tagging Extension (MTE), +corresponding respectively to x86-64 +and ARM64 +architectures, which have a different granularity (page vs 16 bytes) and +way to store the key (register or per-pointer), resulting in a +substantially different programming model.

+

The adoption of such a technology would be decisive for finding +memory safety bugs in existing pieces of code such as databases, cryptographic +toolkits, operating system kernels, web servers, web browsers… Although +these technologies are acknowledged (like MPK for which the Linux kernel +provides an +interface), their adoption from the application side requires a +preliminary study which remains to be done.

This project includes:

-

This project can be performed by either bachelor or master students, as there are different challenging codebases that can be addressed. It is also possible to do a master thesis out of it by creating a compiler-based framework that outlines in a sound way the possible protections an application can receive and analyzes them.

-
Benchmarking Fuzzers for Structured Text Input Software
+

This project can be performed by either bachelor’s or master’s students, +as there are different challenging codebases that can be addressed. It +is also possible to do a master’s thesis out of it by creating a +compiler-based framework that outlines in a sound way the possible +protections an application can receive and analyzes them.

+
Benchmarking +Fuzzers for Structured Text Input Software
-

Fuzzing is an effective technique for finding bugs in software. Prior works have created benchmarks to assess the performance of fuzzers. However, these benchmarks are biased towards targets that accept binary inputs and towards fuzzers that mutate at the byte level. Additionally, they suffer from saturation, meaning the performance differences between top fuzzers are often insignificant. It is a known issue that existing byte-level fuzzers do not perform well on targets accepting structured text inputs. Current fuzzing benchmarks do not include state-of-the-art structure-aware fuzzers, such as grammar fuzzers, in their baselines. This is due to the fact that these fuzzers typically require additional grammars, dictionaries, or large seed corpora. Furthermore, existing structure-aware fuzzers have been evaluated on a limited set of disparate targets, run with different specifications, making it challenging to compare their performance quantitatively or even qualitatively.

-

In this project, you will create an extensive benchmark for targets that accept structured text inputs. You are expected to integrate at least 8 structure/syntax-aware fuzzers and 16 new targets (latest version), along with the required grammars, dictionaries, and corpora. It is suggested to use the Nix build system, as its build configurations are written declaratively and build artifacts are deterministic. This choice is anticipated to streamline the benchmarking process and ensure reproducibility. You will then conduct fuzzing campaigns and analyze the results quantitatively. A potential focus could be assessing the impact of the provided grammars, dictionaries, and corpora on the performance of the fuzzers. The build, run, and analysis scripts will be open-sourced to facilitate future research.

+

Fuzzing is an effective technique for finding bugs in software. Prior +works have created benchmarks to assess the performance of fuzzers. +However, these benchmarks are biased towards targets that accept binary +inputs and towards fuzzers that mutate at the byte level. Additionally, +they suffer from saturation, meaning the performance differences between +top fuzzers are often insignificant. It is a known issue that existing +byte-level fuzzers do not perform well on targets accepting structured +text inputs. Current fuzzing benchmarks do not include state-of-the-art +structure-aware fuzzers, such as grammar fuzzers, in their baselines. +This is due to the fact that these fuzzers typically require additional +grammars, dictionaries, or large seed corpora. Furthermore, existing +structure-aware fuzzers have been evaluated on a limited set of +disparate targets, run with different specifications, making it +challenging to compare their performance quantitatively or even +qualitatively.

+

In this project, you will create an extensive benchmark for targets +that accept structured text inputs. You are expected to integrate at +least 8 structure/syntax-aware fuzzers and 16 new targets (latest +version), along with the required grammars, dictionaries, and corpora. +It is suggested to use the Nix build system, as its build configurations +are written declaratively and build artifacts are deterministic. This +choice is anticipated to streamline the benchmarking process and ensure +reproducibility. You will then conduct fuzzing campaigns and analyze the +results quantitatively. A potential focus could be assessing the impact +of the provided grammars, dictionaries, and corpora on the performance +of the fuzzers. The build, run, and analysis scripts will be +open-sourced to facilitate future research.

Examples of interesting fuzzers and targets for integration:

Recommended Background:

@@ -186,42 +380,143 @@
Benchmarking Fu
  • Familiarity with NixOS and Nix-based build tools.
  • Experience with fuzzing and triaging compiler/interpreter bugs.
  • -
    Emulating Trusted Applications
    +
    Emulating Trusted +Applications
    -

    To safely manage a user’s secrets, modern Android devices leverage TAs (trusted applications), running in a TEE (Trusted Execution Environment). These TAs are closed-source and hard to analyze, since they run isolated from the rest of the Android framework.

    -

    The goal of this project is to build an emulator that can run TAs. By emulating TAs we’ll be able to debug or even fuzz the TAs. For this project we’ll focus on TAs from the beanpod TEE. The beanpod TEE implementation runs on low-end xiaomi devices. We will build our emulator on top of qiling, an emulator written in python.

    -

    Project tasks (in no particular order): - Reverse-engineering of TAs to check if emulation is working correctly. - Implementing emulation support for Global Platform APIs and standard libc functions. (The Global Platform API is a standard for TAs) - Reverse-engineering of the relevant beanpod libraries to add emulation support for custom beanpod specific APIs used by TAs. - Adding cross-TA communication support. - (optional) implement a fuzzing framework on top of our emulator using AFLs unicorn mode.

    -

    Students interested in this project should be comfortable with both reverse engineering (think ghidra, binja or ida) and programming in python. Familiarity with ARM or TEE/TAs is a plus but not required.

    -
    SECCOMP implementation for double fetch protection
    +

    To safely manage a user’s secrets, modern Android devices leverage +TAs (trusted applications), running in a TEE (Trusted Execution +Environment). These TAs are closed-source and hard to analyze, since +they run isolated from the rest of the Android framework.

    +

    The goal of this project is to build an emulator that can run TAs. By +emulating TAs we’ll be able to debug or even fuzz the TAs. For this +project we’ll focus on TAs from the beanpod TEE. The beanpod TEE +implementation runs on low-end Xiaomi devices. We will build our +emulator on top of Qiling, an emulator written in Python.

    +

    Project tasks (in no particular order): - Reverse-engineering of TAs +to check if emulation is working correctly. - Implementing emulation +support for Global Platform APIs and standard libc functions. (The +Global Platform API is a standard for TAs) - Reverse-engineering of the +relevant beanpod libraries to add emulation support for custom +beanpod-specific APIs used by TAs. - Adding cross-TA communication support. - +(optional) Implementing a fuzzing framework on top of our emulator using +AFL’s Unicorn mode.

    +

    Students interested in this project should be comfortable with both +reverse engineering (think Ghidra, Binary Ninja or IDA) and programming in +Python. Familiarity with ARM or TEE/TAs is a plus but not required.

    +
    Modeling Embedded +Peripherals in Software
    +

    In contrast to your usual userspace program that leverages kernel +APIs, embedded firmware oftentimes accesses hardware peripherals for +communication with the outside world directly. This behavior makes +dynamic analysis of embedded firmware difficult, since such hardware +behavior needs to be replicated with sufficient precision if we want to +execute embedded software in a virtualized environment. Previous work in +this area suffers from a significant tradeoff: either the hardware’s +behavior is only approximated with low precision or the engineering +effort to implement more precise hardware modeling is prohibitively +high.

    +

    In this project, we aim to close the gap and reduce the tradeoffs +that need to be made in such an environment. For this reason, an +interested student should be familiar with low-level software (device +drivers in normal OSs or embedded firmware are a plus), should be +willing to reverse engineer code interacting with hardware and link the +behavior to hardware specifications, and have a strong background in +systems programming languages (mainly C, but ideally also decent +knowledge of C++). Familiarity with load-store RISC architectures as +commonly used in embedded systems (Arm, MIPS, RISC-V, PPC) is a +plus.

    +
    BLE Protocol Analysis
    + +

    In our connected world, Bluetooth and Bluetooth Low Energy (BLE) play +an important role for exchanging information between devices. This +exchange of information is not always properly secured. Even under the +generous assumption that BLE itself is secure, the protocols implemented +on top of this transmission channel might be broken and not adhere to +proper security standards.

    +

    In this project, the student is tasked to analyse the security of +BLE-enabled devices with regard to the application protocols deployed on +top of BLE. This includes (among others) questions such as:

    + +

    Students interested in this project should be familiar with +networking principles such as layered protocols, should be able to +reverse engineer protocol implementations in software and correlate +their findings with recorded traffic traces, and should be willing to +extend their knowledge about protocol stacks and software across the +whole stack (firmware, OS, application code).

    +
    SECCOMP +implementation for double fetch protection
    + -

    System call filtering is a crucial part of protection policies ubiquitous in cloud, desktop and mobile environments (Android, Docker, etc.). The existing SECCOMP filter system is unable to inspect arguments passed by reference since the user can modify the values in memory, resulting in a TOCTTOU exploit.

    -

    Midas is a novel mitigation for TOCTTOU bugs in the kernel, exploiting the user memory access API to provide double fetch protection. In this project, you will implement and evaluate SECCOMP filtering for system call arguments passed by reference, leveraging Midas to protect the kernel from the double fetch introduced in the process.

    +

    System call filtering is a crucial part of protection policies +ubiquitous in cloud, desktop and mobile environments (Android, Docker, +etc.). The existing SECCOMP filter system is unable to inspect arguments +passed by reference since the user can modify the values in memory, +resulting in a TOCTTOU exploit.

    +

    Midas is a novel mitigation for TOCTTOU bugs in the kernel, +exploiting the user memory access API to provide double fetch +protection. In this project, you will implement and evaluate SECCOMP +filtering for system call arguments passed by reference, leveraging +Midas to protect the kernel from the double fetch introduced in the +process.
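    The double fetch problem described above can be sketched with a deterministic simulation. Everything here is hypothetical (`UserMemory`, the `racer` hook, both filter functions); it models neither the real SECCOMP interface nor Midas's actual mechanism. The single-copy variant only illustrates the general idea of checking and using the same snapshot of user memory.

```python
class UserMemory:
    """Models user-controlled memory; `racer` imitates a concurrent user thread."""

    def __init__(self, value: bytes, racer=None):
        self.value = value
        self.racer = racer

    def fetch(self) -> bytes:
        current = self.value
        if self.racer is not None:
            self.racer(self)  # the racing thread rewrites memory after this fetch
        return current


def filter_double_fetch(mem: UserMemory, allowed_prefix: bytes = b"/tmp/"):
    """Vulnerable pattern: the check and the use each fetch from user memory."""
    if not mem.fetch().startswith(allowed_prefix):  # time of check
        return None
    return mem.fetch()  # time of use: may differ from what was checked


def filter_single_copy(mem: UserMemory, allowed_prefix: bytes = b"/tmp/"):
    """Safer pattern: fetch once, then check and use the same private copy."""
    snapshot = mem.fetch()
    if not snapshot.startswith(allowed_prefix):
        return None
    return snapshot


def swap_to_secret(mem: UserMemory) -> None:
    """Hypothetical attacker action: replace the path after it was fetched."""
    mem.value = b"/etc/shadow"
```

    With `swap_to_secret` as the racer, `filter_double_fetch` approves `/tmp/ok` but ends up using `/etc/shadow`, while `filter_single_copy` uses exactly the path it checked.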

    -
    Leveraging Static Analysis on Binaries to Uncover Time-of-Check-Time-of-Use Bugs
    +
    Leveraging +Static Analysis on Binaries to Uncover Time-of-Check-Time-of-Use +Bugs
    -

    TOCTOU bugs can lead to severe memory corruptions. These memory corruptions might allow adversaries to compromise and take full control of the affected system. In this project, we want to port and adapt an exisiting binary static analysis to uncover TOCTOU bugs in proprietary real-world software.

    -

    A candidate should be interested in (and ideally already be familiar with):

    +

    TOCTOU bugs can lead to severe memory corruptions. These memory +corruptions might allow adversaries to compromise and take full control +of the affected system. In this project, we want to port and adapt an +existing binary static analysis to uncover TOCTOU bugs in proprietary +real-world software.

    +

    A candidate should be interested in (and ideally already be familiar +with):