TaintMini is a framework for detecting flows of sensitive data in Mini-Programs using static taint analysis. It builds a novel universal data flow graph that captures data flows both within and across Mini-Programs.
We implemented TaintMini based on pdg_js (from DoubleX by Aurore Fass et al.). For more implementation details, please refer to our paper and the DoubleX paper.
For optimal performance, we recommend allocating at least 4 cores and 16 GiB of memory to run the tool. Additionally, for best IO performance during analysis, we recommend using SSDs rather than hard disk drives, due to the large number of small files (less than one page size) that Mini-Programs typically have. As a reference, we used 16 vCPUs of Intel Xeon Silver 4314, 128 GiB of 3200 MHz DDR4 memory, and 2 TiB of NVMe SSD (700 KIOPS) as the host for building and validating our artifact evaluation submission.
Install Node.js dependencies for pdg_js first.
# make sure Node.js and npm are installed
node --version && cd pdg_js && npm i
Install the Python requirements.
# install requirements
pip install -r requirements.txt
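Before running the install commands above, it can help to confirm that the required executables are on the PATH. The following is a minimal sketch, not part of TaintMini; the function name `missing_tools` and the default tool list are assumptions made for illustration.

```python
import shutil


def missing_tools(tools=("node", "npm", "python", "pip")):
    """Return the names from `tools` that cannot be found on the PATH.

    The default list is an assumption based on the install steps;
    adjust it if your system uses e.g. python3/pip3 instead.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing tools:", ", ".join(absent))
    else:
        print("All required tools were found.")
```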
TaintMini operates on unpacked WeChat Mini-Programs, necessitating the use of a WeChat Mini-Program unpacking tool in advance. Please note that we are unable to provide such a tool directly due to potential legal implications. We recommend seeking it out on external websites.
usage: mini-taint [-h] -i path [-o path] [-c path] [-j number] [-b]

optional arguments:
  -h, --help            show this help message and exit
  -i path, --input path
                        path of input mini program(s). Single mini program directory or index files will both be fine.
  -o path, --output path
                        path of output results. The output file will be stored outside of the mini program directories.
  -c path, --config path
                        path of config file. See default config file for example. Leave the field empty to include all results.
  -j number, --jobs number
                        number of workers.
  -b, --bench           enable benchmark data log. Default: False
Results will be written to the directory provided by the -o/--output flag. Result files are named $(basename <directory>)-result.csv, along with $(basename <directory>)-bench.csv if the -b/--bench option is present.
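The naming rule above can be sketched in Python as follows. This is an illustrative helper, not code from TaintMini: the function names `result_paths` and `read_rows` are assumptions, and the reader deliberately makes no assumption about the column layout of the CSV files.

```python
import csv
import os


def result_paths(miniprogram_dir, output_dir, bench=False):
    """Return the CSV paths expected for one Mini-Program directory."""
    # Mirror $(basename <directory>): strip any trailing separator first.
    name = os.path.basename(os.path.normpath(miniprogram_dir))
    paths = [os.path.join(output_dir, f"{name}-result.csv")]
    if bench:
        # The bench file only exists when -b/--bench was passed.
        paths.append(os.path.join(output_dir, f"{name}-bench.csv"))
    return paths


def read_rows(csv_path):
    """Read a result file as raw rows; no column names are assumed."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

For example, `result_paths("/path/to/miniprogram", "./results", bench=True)` yields the two paths to look for after a run with `-b`.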
The config.json is a JSON-formatted file with two fields, sources and sinks:
- sources is an array indicating the source APIs that need to be included. Please note that there is a special value named [double_binding], which indicates data flows from WXML.
- sinks is an array indicating the sink APIs that need to be included.
For examples, please refer to the config.json file.
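As a rough illustration of the structure described above, the sketch below builds such a file with Python's json module. The entries `wx.getLocation` and `wx.request` are WeChat API names chosen only as examples, and writing each entry as a plain string is an assumption; the config.json shipped with the repository is the authoritative reference for the exact entry format. The helper names `build_config` and `write_config` are hypothetical.

```python
import json


def build_config(sources, sinks):
    """Assemble the two documented fields into a dictionary."""
    return {"sources": list(sources), "sinks": list(sinks)}


def write_config(path, config):
    """Serialize the configuration to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


# Example entries are assumptions; "[double_binding]" is the special
# source value that stands for data flows from WXML.
example = build_config(
    sources=["wx.getLocation", "[double_binding]"],
    sinks=["wx.request"],
)
example_text = json.dumps(example, indent=2)
```

The resulting file would then be passed to the tool with the -c/--config flag.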
Analyze a single Mini-Program: include all sources and sinks, enable multi-processing (all available CPU cores), and skip the benchmark log.
python main.py -i /path/to/miniprogram -o ./results -j $(nproc)
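When the analysis is driven from a script, the same invocation can be assembled programmatically. The sketch below only uses the flags listed in the usage text; the helper name `build_command` is hypothetical, and `os.cpu_count()` is used as an approximation of `$(nproc)` (it does not account for CPU affinity limits).

```python
import os
import sys


def build_command(input_path, output_dir, jobs=None, bench=False, config=None):
    """Build the argument list for one run of main.py."""
    # Fall back to 1 worker if the CPU count cannot be determined.
    jobs = jobs or os.cpu_count() or 1
    cmd = [sys.executable, "main.py", "-i", input_path,
           "-o", output_dir, "-j", str(jobs)]
    if config:
        cmd += ["-c", config]  # optional config file
    if bench:
        cmd.append("-b")       # enable the benchmark data log
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.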
Analyze multiple Mini-Programs: include all sources and sinks, enable multi-processing (all available CPU cores), and write the benchmark log.
# generate index
find /path/to/miniprograms -maxdepth 1 -type d -name "wx*" > index.txt
# start analysis
python main.py -i ./index.txt -o ./results -j $(nproc) -b
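On systems without `find`, the index file can be generated with a few lines of Python. This is a sketch of an approximate equivalent, not part of TaintMini: the function name `build_index` is hypothetical, the entries are sorted for reproducibility (which `find` does not guarantee), and only immediate subdirectories are considered, so the root directory itself is never listed.

```python
import os


def build_index(root, index_path, prefix="wx"):
    """Write one Mini-Program directory per line and return the list."""
    entries = sorted(
        entry.path
        for entry in os.scandir(root)
        # Like `find -type d`, do not treat symlinks as directories.
        if entry.is_dir(follow_symlinks=False) and entry.name.startswith(prefix)
    )
    with open(index_path, "w", encoding="utf-8") as handle:
        for path in entries:
            handle.write(path + "\n")
    return entries
```

Calling `build_index("/path/to/miniprograms", "index.txt")` produces a file that can be passed to `-i` in the same way as the one generated above.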
If you find TaintMini useful, please consider citing our paper and DoubleX:
@inproceedings{wang2023taintmini,
title={TAINTMINI: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis},
author={Wang, Chao and Ko, Ronny and Zhang, Yue and Yang, Yuqing and Lin, Zhiqiang},
booktitle={Proceedings of the 45th International Conference on Software Engineering},
year={2023}
}
@inproceedings{fass2021doublex,
author="Aurore Fass and Doli{\`e}re Francis Som{\'e} and Michael Backes and Ben Stock",
title="{\textsc{DoubleX}: Statically Detecting Vulnerable Data Flows in Browser Extensions at Scale}",
booktitle="ACM CCS",
year="2021"
}
This project is licensed under the terms of the AGPLv3 license.
- pdg_js is credited to DoubleX