Obfuscation_Tool is a Python-based framework for analyzing and quantifying obfuscation in Ethereum smart contract bytecode. This repository implements the methods described in the paper “Obfuscation Unmasked: Revealing Hidden Logic in Ethereum Scam Contracts via Bytecode-Level Transfer Analysis.” The tool extracts seven bytecode-level obfuscation features (F1–F7) from smart contracts, computes a Z-score for each contract, and outputs detailed per-contract metrics.
Key capabilities include:
- Single-contract analysis: Parse a raw bytecode string and extract F1–F7 as defined in the paper:
  - F1. Number of steps in address generation: Backward dataflow analysis on the `address` variable, counting distinct arithmetic, hash, bitwise, and external-call steps.
  - F2. Number of string operations: Count of all string-manipulation and hash instructions involved in `address` generation.
  - F3. Presence of external call: Binary flag indicating whether any `CALL`, `DELEGATECALL`, or `STATICCALL` appears in the `addr`/`value` dataflow.
  - F4. Height of branch tree: Maximum nesting depth of conditional branches (`JUMPI`) along the transfer’s control-flow path.
  - F5. Transfer-related instruction ratio (TIR): Ratio of effective transfer- and state-update instructions to total instructions in the transfer-residing function.
  - F6. Transfer operation similarity: Cosine similarity between R-GCN–embedded PDG representations of transfer-containing functions.
  - F7. Relevance of log events: Binary flag indicating whether logs emitted within two CFG hops of a transfer are semantically relevant.
- Batch analysis (multi-threaded): Use Dask to process a CSV of bytecode strings in parallel and store all per-contract results (F1–F7 and Z-score) in `output.csv`.
- Reproducibility: Includes scripts for training Word2Vec embeddings (`word2vec_Train/`) and for running on a Dask cluster.
- Note: The current code outputs a total of 8 values. The 5th value indicates whether the parameters of the transfer instruction are controlled externally, and the 8th value indicates whether the conditions for executing the transfer instruction are controlled externally. Together, these two values constitute the F3 feature.
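To make three of the simpler feature definitions concrete, here is a minimal sketch of an F3-style flag, an F5-style ratio, and an F6-style cosine similarity on toy inputs. It is not the tool's implementation: the function names, the flat opcode lists, and the `TRANSFER_RELATED_OPS` set are assumptions made for illustration, and the real pipeline works on dataflow slices and R-GCN embeddings rather than hand-written lists.

```python
import math

# Opcodes named in the F3 definition above.
EXTERNAL_CALL_OPS = {"CALL", "DELEGATECALL", "STATICCALL"}

# Assumption for illustration only: the paper defines which instructions count
# as "effective transfer- and state-update instructions"; this set is a stand-in.
TRANSFER_RELATED_OPS = {"CALL", "SSTORE"}


def has_external_call(dataflow_ops):
    """F3-style flag: True if any external-call opcode appears in the slice."""
    return any(op in EXTERNAL_CALL_OPS for op in dataflow_ops)


def transfer_instruction_ratio(function_ops):
    """F5-style ratio: transfer-related instructions over all instructions."""
    if not function_ops:
        return 0.0
    related = sum(1 for op in function_ops if op in TRANSFER_RELATED_OPS)
    return related / len(function_ops)


def cosine_similarity(u, v):
    """F6-style similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy opcode sequence standing in for one transfer-residing function.
toy_function = ["PUSH1", "SLOAD", "AND", "CALL", "SSTORE", "JUMPI", "STOP", "LOG1"]

print(has_external_call(["SLOAD", "AND"]))        # False
print(transfer_instruction_ratio(toy_function))   # 0.25
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

The ratio is 2 of 8 instructions in the toy function. The point of the sketch is only the shape of each metric: a boolean, a fraction in [0, 1], and a similarity in [-1, 1].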
- `rattle-cli.py`: The single-contract CLI entry point. Reads a raw bytecode string from STDIN (no extension), invokes the analysis pipeline, and prints a human-readable report plus a final numeric summary (as a list of values).
- `rundask.py`: The batch-mode script. It expects a CSV file with the columns:

      bytecode,address
      <hex string without 0x>,<contract address>

  Modify the `csv_path` variable near the top of `rundask.py` to point to your CSV (e.g., `test.csv`). Then run:

      python rundask.py

  This will:
  - Spin up a local Dask cluster.
  - Repartition the dataset (by default, from 1 to 10 partitions).
  - Dispatch each partition to a worker, run the same analysis pipeline as `rattle-cli.py`, and collect the results.
  - Save all per-contract metrics into `output.csv` in the repository root.
  - Print progress messages like:

        Dashboard: http://127.0.0.1:8787/status
        Initial partitions: 1
        Final partitions: 10
        ✅ Process 0 completed: ...
        …
        ✅ All tasks finished! Saved to output.csv
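The partition-and-dispatch pattern described for `rundask.py` can be sketched with the standard library alone. This is a stand-in that uses a thread pool instead of Dask, so it shows only the shape of the workflow, not the script's actual code. The names `split_into_partitions`, `analyze_contract`, `analyze_partition`, and `run_batch` are hypothetical, and `analyze_contract` is a placeholder that reports the bytecode length rather than running the real analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_partitions):
    """Split rows into at most n_partitions contiguous, near-equal chunks."""
    n = max(1, min(n_partitions, len(rows)))
    size, extra = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def analyze_contract(row):
    """Placeholder for the real pipeline: reports bytecode length in bytes."""
    return {"address": row["address"], "n_bytes": len(row["bytecode"]) // 2}


def analyze_partition(chunk):
    """Run the per-contract analysis over every row of one partition."""
    return [analyze_contract(row) for row in chunk]


def run_batch(rows, n_partitions=10, max_workers=4):
    """Partition the rows, analyze partitions concurrently, flatten the results."""
    chunks = split_into_partitions(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves partition order, so results stay aligned with rows.
        per_partition = list(pool.map(analyze_partition, chunks))
    return [result for part in per_partition for result in part]


rows = [{"bytecode": "6080604052", "address": f"contract_{i}"} for i in range(25)]
results = run_batch(rows)
print(len(results), results[0])
```

With 25 rows and 10 partitions, the first five partitions hold three rows each and the rest hold two. Dask adds scheduling across processes and the dashboard on top of this basic pattern.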
- Clone the repository:

      git clone https://github.com/dcszhang/Obfuscation_Tool.git
      cd Obfuscation_Tool

- Set up a Python 3.10+ environment (recommended: use `venv` or `conda`):

      python3 -m venv venv
      source venv/bin/activate   # macOS/Linux
      # OR
      # venv\Scripts\activate    # Windows PowerShell

- Install the required packages:

      pip install \
        "dask[complete]" \
        pandas \
        numpy \
        networkx \
        scikit-learn \
        gensim \
        pyparsing \
        tqdm

  Adjust versions as needed. For GPU-accelerated experiments (RGCN, t-SNE with CUDA), install the appropriate CUDA-enabled libraries.
Single-contract mode is useful when you have a single raw bytecode string (no file extension). For example:

    # Suppose you have a file `example_bytecode.txt` containing:
    # 608060405234801561001057600080fd5b506040516101003803806101008339818101604052
    # You can run:
    python rattle-cli.py < example_bytecode.txt

This prints a formatted report, for example:

    --------------------------------------------------------------------------------------
    Smart contract analysis process
    --------------------------------------------------------------------------------------
    This is the 1 transfer
    (1) Found Transfer Address instruction:
    %145 = AND(%144, #4c36d2919e407f0cc2ee3c993ccf8ac26d9ce64e)
    (2) trace_step: 3
    (3) Tree height: 4
    (4) String Operation times: 0
    -----------------------------------------------------------
    This is the 2 transfer
    …
    --------------------------------------------------------------------------------------
    END
    --------------------------------------------------------------------------------------
    [3, 4, 0, 91.01775288581848, False, 0.5882352941176471, False, False]
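If you want to post-process the final summary line programmatically, a small parser along these lines may help. It is a sketch under two assumptions: that the summary is the last non-empty line of the report, and that the 5th and 8th values (which the note above says jointly make up F3) are combined with a logical OR. The README does not state the combination rule, and the helper name `derive_f3` is hypothetical.

```python
import ast


def derive_f3(report_text):
    """Parse the final 8-value summary line and derive an F3-style flag."""
    lines = [line for line in report_text.strip().splitlines() if line.strip()]
    values = ast.literal_eval(lines[-1].strip())
    if not isinstance(values, list) or len(values) != 8:
        raise ValueError("expected a summary list with exactly 8 values")

    params_external = values[4]     # 5th value: transfer parameters controlled externally
    condition_external = values[7]  # 8th value: execution condition controlled externally

    # Assumption: the two flags are OR-ed; the source only says they
    # "together constitute" F3.
    return bool(params_external or condition_external)


sample_report = """END
--------------------------------------------------------------------------------------
[3, 4, 0, 91.01775288581848, False, 0.5882352941176471, False, False]
"""

print(derive_f3(sample_report))  # False
```

`ast.literal_eval` is used instead of `eval` so that only Python literals are accepted from the report text.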
When you have many contracts to analyze, you can process them in parallel:
- Prepare a CSV (`test.csv` or your own) with two columns and a header row:

      bytecode,address
      6080604052348015610010...,0xAbCdEf123...
      60806040526004361...,0xDeFaCe456...
      …

  - `bytecode`: Hex string (no `0x` prefix, no file extension) per row.
  - `address`: Contract address or identifier (used for labeling output).
- Edit `rundask.py`:
  - Open `rundask.py` in a text editor.
  - Modify the `csv_path` variable (near the top) to point to your CSV file. For example:

        csv_path = "test.csv"

- Run the batch script:

      python rundask.py
You will see output like:

    Dashboard: http://127.0.0.1:8787/status
    Initial partitions: 1
    Final partitions: 10
    ✅ Process 0 completed: <contract_address_1>
    ✅ Process 1 completed: <contract_address_2>
    ✅ Process 2 completed: <contract_address_3>
    …
    ✅ All tasks finished! Saved to output.csv

- A Dask dashboard will be available at `http://127.0.0.1:8787/status` (open it in a browser to monitor real-time progress).
- The script automatically repartitions the dataset (default: from 1 to 10 partitions) and distributes tasks across available CPU cores.
- Once all partitions are processed, a single file `output.csv` will be created in the repository root.
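Because the batch input expects hex strings without a `0x` prefix, it can help to normalize bytecode before writing the CSV. The helper below is a sketch and not part of the repository. The names `normalize_bytecode` and `write_input_csv` are hypothetical, and the even-length check is an added assumption (hex-encoded bytes come in pairs of digits).

```python
import csv
import io
import string

HEX_DIGITS = set(string.hexdigits)


def normalize_bytecode(raw):
    """Strip whitespace and an optional 0x prefix; reject non-hex input."""
    code = raw.strip()
    if code[:2].lower() == "0x":
        code = code[2:]
    # Assumption: valid bytecode is non-empty and has an even number of hex digits.
    if not code or len(code) % 2 != 0 or not set(code) <= HEX_DIGITS:
        raise ValueError("bytecode must be a non-empty, even-length hex string")
    return code


def write_input_csv(contracts, out):
    """Write (bytecode, address) pairs under the bytecode,address header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["bytecode", "address"])
    for bytecode, address in contracts:
        writer.writerow([normalize_bytecode(bytecode), address])


# Demonstration with an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_input_csv([("0x6080604052", "0xAbCdEf123")], buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open("test.csv", "w", newline="")` in place of the `StringIO` buffer.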
Contributions are welcome! If you find a bug or want to add a new feature (e.g., additional obfuscation metrics, support for alternative EVM versions), please:
- Fork the repository.
- Create a new feature branch (`git checkout -b feature/YourFeature`).
- Make your changes, ensuring all existing tests pass.
- Submit a Pull Request with a clear description of your changes.
Please follow PEP 8 style guidelines and add appropriate documentation or unit tests for new modules.
This project is released under the MIT License. See LICENSE for details.