
RCE via Unsigned Binaries: TensorFlow Fails to Validate Integrity of Core Compiled Libraries (.so, .dll)

Author: KaidanDEV
Date: August - September 2025
Contact: kaidanddev@proton.me


0. Foreword

I am making this public because the Google Bug Hunters team responded to my report as follows:

Hi,

Thanks for the report, but we do not consider this a valid security bug.
Creating a typo-squatted malicious library has nothing to do with TensorFlow. The scenario you described can be abused against any other package.

Thanks again for your report and time,
The Google Bug Hunter Team

One more note: it is not legal to use this; I am providing it for informational purposes only. Here's to the white hats! Stay safe, and don't let yourself be hacked using this method.

1. Abstract

This paper details a high-impact supply-chain attack vector that applies a well-established attack pattern to the AI/ML ecosystem, an industry that has critically overlooked this foundational threat. By modifying the core compiled C/C++ libraries (.pyd, .dll, .so) of a framework like TensorFlow, an attacker can achieve arbitrary code execution. The attack is "phantom" in nature, as the malicious code resides within the trusted runtime itself rather than in the model files. This method bypasses security measures focused on model file analysis, is not detected by static analyzers of Python code, and poses a critical threat to the entire AI/ML industry.

The ability to exploit this vulnerability depends on the software supply chain. An attacker can deliver a maliciously modified binary package through various distribution channels. The essence of the problem is that the runtime environment implicitly trusts the library on the disk. In the absence of any form of integrity or authenticity verification (e.g., digital signatures), the runtime environment will load and execute compromised native code with full privileges of the parent process. Although pip does use a RECORD file, my research has shown that it is extremely easy to forge.

The security flaw described is not an implementation bug, but a systemic architectural choice. The practice of loading unsigned native code from a high-level language environment is common among major AI/ML frameworks, making this a cross-ecosystem vulnerability. As such, this report should be interpreted as an exposé of a critical weakness in the industry's software delivery model and serves to underscore the necessity of adopting cryptographic code signing as a baseline security requirement. While the industry has invested heavily in securing the ML model supply chain (the data), this report demonstrates a critical oversight in securing the runtime supply chain (the code), leaving a foundational layer of the AI ecosystem vulnerable.

2. The Attack Vector Explained

The industry's security focus has historically been on mitigating risks from serialized artifacts, such as malicious model files. While security measures have been implemented to address that specific vector, this report demonstrates a more fundamental vulnerability: the implicit trust placed in the execution environment itself. When the core libraries of the framework are compromised, the user has no indicators of compromise until the payload is already active, as the trust boundary has been breached at a lower level.

2.1. Exploitation via the Software Supply Chain

This vulnerability is exploited through a supply-chain attack. The following walkthrough details the process from the attacker's preparation to the end-user's compromise.

Step 1: Weaponization of the Package

The attacker's goal is to create a malicious wheel (.whl) package that is functionally identical to the official one but contains a hidden payload.

  1. Source Code Modification: The attacker obtains the official TensorFlow source code and injects a payload into a core C++ file (e.g., tensorflow/core/platform/cpu_feature_guard.cc).
  2. Compilation: They compile the modified code, producing a malicious binary library (.so, .pyd, or .dll) that is a drop-in replacement for the original.
  3. Package Forgery: The attacker downloads the official .whl package. They unpack it, replace the legitimate binary with their compromised version, and then proceed to the critical step: forging the package manifest.
    • The RECORD file inside the package's .dist-info directory contains a list of all files and their corresponding SHA256 hashes. This is pip's primary mechanism for verifying package integrity.
    • The attacker simply recalculates the SHA256 hash of their new, malicious binary and updates the corresponding line in the RECORD file (a minimal sketch of how such an entry is computed follows this list).
  4. Repackaging: The modified files are repacked into a new .whl file. From pip's perspective, this package is now perfectly valid and internally consistent, as the RECORD file correctly matches the contents of the archive.
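
To make the manifest-forgery step concrete, the following is a minimal sketch (Python, illustrative only) of how a wheel RECORD entry is computed for a single file per the wheel specification: the urlsafe-base64-encoded SHA256 digest with padding stripped, followed by the file size. The path below is an assumption for illustration; the point is that anyone holding the package can regenerate these values.

import base64
import hashlib

def record_entry(path: str) -> str:
    """Build a wheel RECORD line of the form "<path>,sha256=<digest>,<size>".

    The digest is the urlsafe-base64-encoded SHA256 of the file contents with
    the trailing '=' padding stripped, as the wheel format requires.
    """
    with open(path, "rb") as f:
        data = f.read()
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{path},sha256={digest.decode('ascii')},{len(data)}"

# Illustrative path only; an attacker would point this at the replaced binary.
print(record_entry("tensorflow/python/_pywrap_tensorflow_internal.so"))

Because the same computation is available to whoever builds the archive, a manifest that travels inside the package can only prove internal consistency, never authenticity.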

Step 2: Distribution

The attacker now distributes the weaponized .whl file. This does not require prior compromise of the victim's machine. Common distribution vectors include:

  • PyPI Typosquatting: Publishing the package under a similar name (e.g., tensroflow).
  • Dependency Confusion: Targeting internal corporate networks where an internal package name might overlap with a public one.
  • Direct Social Engineering: Engaging with developers on platforms like Stack Overflow, GitHub, or Discord, and offering the malicious .whl as a "pre-release," "patched," or "performance-optimized" version to solve a specific problem. For example: "Hey, I had that same bug. The official release is broken, but try this build, it works with my model. Here is the .whl and the code."

Step 3: Compromise

  1. Installation: The victim, trusting the source, installs the package (e.g., pip install malicious_tensorflow.whl). pip inspects the RECORD file, confirms that all hashes match the files within the package, and installs it without any warnings.
  2. Execution: The victim runs their Python script. The simple act of import tensorflow loads the compromised C++ library into memory, and the attacker's payload is executed with the full permissions of the user.

This attack succeeds because there is no external, authoritative chain of trust. pip's integrity check is internal to the package and is therefore controlled by whoever created the package. The absence of mandatory code signing by the original vendor (Google) is the specific architectural flaw that makes this entire scenario possible.

The trust model of the Python packaging ecosystem implicitly relies on the integrity of the package contents—a trust that TensorFlow currently does not validate for its native components, unlike other security-critical software. In essence, the ecosystem has trained developers to trust the pip install process, creating a false sense of security that this attack vector directly exploits.
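
For contrast, the sketch below (placeholder values, not published TensorFlow artifacts) shows the kind of external check described above as missing: hashing the entire wheel and comparing it against a reference value obtained out-of-band, for example from a vendor-published and signed manifest.

import hashlib

# Placeholder reference hash; a real check would obtain this value out-of-band,
# never from inside the wheel being verified.
EXPECTED_WHEEL_SHA256 = "0" * 64

def wheel_matches_reference(wheel_path: str) -> bool:
    """Hash the whole wheel and compare it against the out-of-band reference."""
    sha256 = hashlib.sha256()
    with open(wheel_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == EXPECTED_WHEEL_SHA256

# Illustrative file name.
print(wheel_matches_reference("tensorflow_build_under_test.whl"))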

3. Proof of Concept

The viability of this attack vector is demonstrated through proof-of-concept modifications to the core libraries of TensorFlow. Each modification embeds a benign, non-destructive payload to confirm that arbitrary code execution is achieved. As the primary subject of this report, the TensorFlow case study is presented first.

Case Study: TensorFlow

For the TensorFlow Proof of Concept, the file tensorflow/core/platform/cpu_feature_guard.cc was selected as the injection point. This component is part of the library's early initialization sequence, guaranteeing payload execution immediately upon package import.

A single line of code was injected into the public section of the CPUFeatureGuard class:

*(int*)0 = 0;

This payload intentionally writes through a null pointer. Depending on how the compiler lowers this undefined behavior, the process dies with a segmentation fault or a trap instruction; either way, it crashes immediately and visibly.

Results

1. Execution with the Unmodified TensorFlow Library:

The standard, unmodified library imports successfully, printing informational messages to the console:

(venv) whoami@root:/home/whoami/tf$ python -c "import tensorflow"
I0000 00:00:1756730745.507629   67483 cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2. Execution with the Compromised TensorFlow Library:

The modified library triggers a fatal exception and terminates with a core dump upon import:

(venv) whoami@root:/home/whoami/tf$ python -c "import tensorflow"
Trace/breakpoint trap (core dumped)

Analysis of Results

The results show a clear and immediate divergence in behavior. The successful import of the original library versus the fatal crash of the modified one confirms that the injected native code was executed during the library's initialization phase, triggered by nothing more than an import tensorflow statement in Python. Arbitrary code execution is therefore achievable through a compromised binary alone.

4. Impact Analysis

Achieving arbitrary code execution within the security context of the ML framework's host process leads to several critical impact scenarios:

  • Complete Data Exfiltration: The most immediate impact is the theft of sensitive data. An attacker can exfiltrate any information accessible to the compromised process, including:

    • Proprietary AI models and training datasets.
    • API keys, database credentials, and other secrets loaded into the environment.
    • Sensitive user data being processed by the application.
  • Persistence and Lateral Movement: The compromised library can function as a persistent backdoor. The initial payload can be used to establish a permanent foothold on the system, for instance, by launching a reverse shell. From this point, the attacker can move laterally across the internal network to compromise other systems.

  • Ransomware and Data Destruction: The payload could execute a ransomware routine, encrypting not only user files but also mission-critical assets like proprietary AI models and datasets, demanding a ransom for their recovery. Alternatively, the payload could be designed for simple data destruction.

5. Mitigation Strategies

Addressing this vulnerability requires the implementation of a robust chain of trust, originating from the vendor and verifiable at runtime. The current model of implicit trust in on-disk binaries must be replaced with an explicit verification mechanism.

  • Primary Mitigation: Mandatory Code Signing (OS-Level Verification) The most critical and effective mitigation is the adoption of mandatory cryptographic code signing for all pre-compiled binaries (.pyd, .dll, .so). Every binary shipped within the official TensorFlow package must be signed with a private key controlled by Google.

    This allows the host operating system to perform an integrity and authenticity check before the library is loaded into memory. Any modification to the binary would invalidate the signature, causing the OS to block its execution. This single, industry-standard control would effectively neutralize the described attack vector at the operating system level.
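
    As an illustration of what vendor signing makes possible, here is a hedged sketch that verifies a hypothetical detached Ed25519 signature over a compiled library before it is loaded, using the third-party cryptography package. The vendor key, the .sig sidecar file, and the layout are assumptions for this sketch; they are not an existing TensorFlow or operating-system facility.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

# Placeholder vendor public key; in practice it would ship with the verifier,
# pinned and distributed separately from the packages it validates.
VENDOR_PUBLIC_KEY = bytes.fromhex("00" * 32)

def verify_library_signature(library_path: str) -> None:
    """Verify a detached signature (<library>.sig) before the library is loaded."""
    public_key = Ed25519PublicKey.from_public_bytes(VENDOR_PUBLIC_KEY)
    with open(library_path, "rb") as f:
        data = f.read()
    with open(library_path + ".sig", "rb") as f:
        signature = f.read()
    try:
        public_key.verify(signature, data)  # raises InvalidSignature on any modification
    except InvalidSignature as exc:
        raise RuntimeError(f"Signature check failed for {library_path}; refusing to load.") from exc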

  • Secondary Mitigation: Runtime Integrity Check via Hardcoded Hash (Application-Level Verification) To provide a defense-in-depth layer independent of the OS, TensorFlow should implement a self-verification mechanism at runtime (a minimal sketch of such a check follows the steps below).

    1. Build-Time Hashing: During the official build process, a cryptographic hash (e.g., SHA256) of the primary C++ core library (e.g., _pywrap_tensorflow_internal.so) must be generated.
    2. Hardcoding the Root of Trust: This trusted hash value must be hardcoded as a constant string within a separate, pure Python module of the framework (e.g., tensorflow.python.util.integrity_check). This module effectively becomes the "root of trust" for the installation.
    3. Runtime Verification: Upon import tensorflow, before the core C++ library is fully utilized, this Python module must: a. Locate the on-disk C++ library file. b. Compute its SHA256 hash. c. Compare the computed hash against the hardcoded trusted hash.
    4. Action on Mismatch: If the hashes do not match, the framework must immediately raise a fatal IntegrityError and terminate, explicitly warning the user that their TensorFlow installation has been tampered with and is unsafe to use.

    An attacker capable of modifying the compiled .so file would also need to reverse-engineer and modify the Python bytecode of the verification module to replace the hardcoded hash. While not impossible, this raises the complexity of the attack by several orders of magnitude compared to a simple file replacement.
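
    A minimal sketch of the proposed self-check follows, assuming a hypothetical module location (e.g., tensorflow/python/util/integrity_check.py) and a placeholder hash constant; the real value would be generated and hardcoded by the official build pipeline.

import hashlib
import os

# Placeholder; the official build would hardcode the real digest of
# the core native library here at build time.
TRUSTED_SHA256 = "0" * 64

class IntegrityError(RuntimeError):
    """Raised when the on-disk native library does not match the trusted hash."""

def verify_native_library(library_path: str) -> None:
    """Hash the compiled core library and compare it against the build-time constant."""
    sha256 = hashlib.sha256()
    with open(library_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
    if sha256.hexdigest() != TRUSTED_SHA256:
        raise IntegrityError(
            f"{os.path.basename(library_path)} does not match the expected hash; "
            "this TensorFlow installation may have been tampered with and is unsafe to use."
        )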

6. Conclusion

In conclusion, this report has detailed a critical architectural flaw in TensorFlow: the absence of integrity verification for its core compiled libraries. This creates a high-impact supply-chain attack vector that allows for arbitrary code execution by replacing legitimate binaries with malicious ones.

The fundamental issue is the framework's implicit trust in its own on-disk components—a trust that, as demonstrated, can be easily subverted. This attack vector operates below the detection threshold of conventional security tools that are focused on analyzing model files, thereby compromising the entire execution environment in a manner invisible to the end-user.

The mitigation of this systemic vulnerability is straightforward and aligns with established industry best practices. The adoption of mandatory cryptographic code signing for all distributed binaries is the primary and necessary control to neutralize this threat completely.
