We implemented a prototype of ProbetheProto in "Probe the Proto: Measuring Client-Side Prototype Pollution Vulnerabilities of One Million Real-world Websites" and evaluated it on one million websites. The research reveals 2,917 zero-day, exploitable prototype pollution vulnerabilities in 2,738 real-world websites, including ten among the top 1,000 Tranco websites. Of these, 48 vulnerabilities further lead to XSS, 736 to cookie manipulations, and 830 to URL manipulations. A mostly complete list (excluding some websites that cannot be reached or are still in the process of vulnerability patching) is here.
This repository contains the source code of ProbetheProto: the Chromium-based dynamic-taint-analysis engine, the Exploit Generator module, the Result Validation module, and the Defense Analysis module. Should you have any questions about the instructions below, feel free to create a GitHub issue and we will respond ASAP.
The engine is based on Melicher et al.'s work "Riding out DOMsday: Toward Detecting and Preventing DOM Cross-Site Scripting" (https://github.com/wrmelicher/ChromiumTaintTracking).
We provide a compiled version of the engine at https://drive.google.com/file/d/1NYySaSmhDP-kE4xupcl_TMUKvjfjj7Q9/view?usp=sharing, as well as its source code at https://drive.google.com/file/d/1v-H7DifvlKdjV5lzYfVXJ5XYCrCzaXxp/view?usp=sharing. Please download and unzip the source code tar file as `./taint_engine/` and the compiled version as `./taint_engine/out/PP/`. If you do not need the source code to build from scratch, downloading the compiled binary files alone is sufficient.
Afterwards, please check `INSTALL.md` for installation and compilation instructions and `CRAWL.md` for crawling instructions. You may also refer to https://github.com/wrmelicher/ChromiumTaintTracking/blob/master/TAINT_TRACKING_README for installing `depot_tools`, `capnp` (https://capnproto.org), and other prerequisites for setting up the environment.
Important: The engine is for academic use only. It is based on a very old version of Chromium and runs without a sandbox, so security and privacy will suffer. We provide NO guarantee about its security and privacy and take NO responsibility for any possible consequences if you choose to run it.
Important: ProbetheProto respects `robots.txt` during crawling. All the information that we obtained is public, just like what a normal web bot would see. For reference, a minimal illustration of such a `robots.txt` check is sketched below.
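The sketch below shows one common way a crawler can honour `robots.txt` before fetching a page. It is only an illustration, not ProbetheProto's actual crawler code; the function name and example URL are placeholders.

```python
# Minimal sketch (not ProbetheProto's actual crawler code): checking whether a
# site's robots.txt permits fetching a given URL before crawling it.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_by_robots(url: str, user_agent: str = "*") -> bool:
    """Return True if the target site's robots.txt permits fetching `url`."""
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    parser = RobotFileParser(robots_url)
    parser.read()  # downloads and parses robots.txt
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(allowed_by_robots("https://example.com/some/page"))
```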
ProbetheProto depends on the log files generated during crawling by the Chromium-based taint engine. After you have successfully compiled and run the taint engine, and it has generated the log files, proceed as follows:
First, `cd python-analysis` and change the configurations in `match_configs.py` using either `vim match_configs.py` or `nano match_configs.py`. Set `stem` to the root path of this directory, e.g., `/home/client-pp/Documents/ProbetheProto/python-analysis`. When generating exploits, set `generating_exploits` and `write_to_txt_files` to `True`. Remember to always set `recursive_pp_log_dir` to the log-file path you assigned in the previous crawling step. (If you keep both of them at their default values, you will be fine!) A hypothetical sketch of these settings is shown below.
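As a rough illustration, the relevant `match_configs.py` entries might look like the sketch below. The option names are taken from the steps in this README, but the values are placeholders and the real file may define further options that are not shown here.

```python
# Hypothetical sketch of the match_configs.py entries discussed above.
# Paths are placeholders for your own setup.

# root path of the python-analysis directory
stem = "/home/client-pp/Documents/ProbetheProto/python-analysis"

# enable exploit generation and writing the results to .txt files
generating_exploits = True
write_to_txt_files = True

# log-file directory produced by the taint engine in the previous crawling step
recursive_pp_log_dir = "/path/to/log_files"  # placeholder

# configured later, in the result-validation steps below
# check_pp_log_dir      = "/path/to/check_log_files"
# count_flow_log_file   = stem + "/count_flow_log.txt"
# check_other_prototype = True
```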
Second, run `python3 generate_exploits.py` on the command line. It only uses popular Python modules, so the environment setup should not be troublesome. When it finishes, you should get not only a list of potentially vulnerable websites but also the exploits for the storage type in `extensions/cookie_storage_modify_extension/storage_data.js`. To generate exploits for the message type, uncomment line #321 and comment line #322 of `generate_exploits.py` and run it again. As for the URL-based types, we have a default exploit stored in `extensions/crawler-extension-pp/content.js`; a generic illustration of that kind of payload is sketched below.
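For reference, the sketch below illustrates the general shape of a URL-based prototype-pollution probe that writes only a harmless dummy property. It is not the default exploit shipped in `extensions/crawler-extension-pp/content.js`, whose exact contents may differ; the key, value, and vectors here are placeholders.

```python
# Generic illustration of URL-based prototype-pollution probes carrying a dummy
# value. This is NOT the default exploit from extensions/crawler-extension-pp/content.js.
from urllib.parse import quote

def build_probe_urls(target: str, key: str = "pp_probe", value: str = "dummy"):
    """Return candidate probe URLs using common __proto__/constructor vectors."""
    vectors = [
        f"__proto__[{key}]={value}",
        f"__proto__.{key}={value}",
        f"constructor[prototype][{key}]={value}",
    ]
    urls = []
    for vector in vectors:
        encoded = quote(vector, safe="[]=._")
        urls.append(f"{target}?{encoded}")  # query-string variant
        urls.append(f"{target}#{encoded}")  # hash variant
    return urls

if __name__ == "__main__":
    for url in build_probe_urls("https://example.com/"):
        print(url)
```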
Important: The exploits only contain dummy values. Our exploitation posed no real damage to anybody on the web and happened only on the client side, without incurring any additional network traffic. We take NO responsibility if you change the dummy values to something else on your own.
Third, before running the result validation, you may consider crawling the discovered websites once more by running:
`sudo bash ./sh/check-pp-auto-recursive.sh 0 1000 2 20`
The parameters `0 1000 2 20` have the same meaning as in the crawling command for `sh/recursive-pp-key1key2-auto.sh` described in `taint_engine/crawl.md`. The engine will crawl the websites with the URL-based exploit and try to pollute the prototype with a dummy value; since only a dummy value is involved, no harm is done in this process.
Then, try running the other two bash scripts for the other types: `sudo bash ./sh/check-pp-auto-recursive.sh 0 1000 2 10` for the cookie type, and `sudo bash ./sh/postMessage-auto.sh 0 100 2 1` for the message type. This step simulates an adversary taking control of `document.cookie`, or pretending to be the message origin and sending messages via `postMessage`, in order to pollute the prototype with a dummy value on the target website. Again, since only a dummy value is involved, no harm is done in this process.
Fourth, `cd python-analysis` and change the configurations in `match_configs.py` again. Remember to always set `check_pp_log_dir` to the log-file path you assigned in the previous re-crawling step. (If you keep both of them at their default values, you will be fine!)
Fifth, run `python3 generate_exploits.py` again. When it finishes, you should get not only a list of vulnerable websites but also the number of vulnerabilities and taint flows. You may also want to change the log path used by `generate_exploits.py`: set `count_flow_log_file` as desired in `match_configs.py`, adjust line #571 of `generate_exploits.py` accordingly, and rerun `python3 generate_exploits.py` after applying the changes.
Lastly, if `generate_exploits.py` runs with the config `check_other_prototype = True`, you will find in `count_flow_log_file` the statistics on the number of taint flows with defenses. The result is in Python dict format, and its values follow the form `[0, set(), 0, set(), 0, set()]`, where the last two elements contain the number of taint flows with defenses and the set of corresponding websites. A minimal sketch of how these statistics might be read back is shown below.
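As a minimal sketch, the statistics could be read back along the lines below. This assumes the file stores the Python repr of the dict described above; the path is a placeholder, and because the values contain `set()` calls, `ast.literal_eval` cannot parse them, so `eval()` is used purely for illustration on trusted, locally generated output.

```python
# Minimal sketch: summarising the defense statistics written to count_flow_log_file.
# Assumes the file stores the Python repr of the dict described above; eval() is
# used only because the values contain set() calls, which ast.literal_eval rejects.

stats_path = "count_flow_log.txt"  # placeholder; use the path set in match_configs.py

with open(stats_path) as f:
    stats = eval(f.read())  # values look like [0, set(), 0, set(), 0, set()]

for flow_type, value in stats.items():
    defended_flows = value[4]   # number of taint flows observed with a defense
    defended_sites = value[5]   # set of websites where such flows appeared
    print(f"{flow_type}: {defended_flows} defended flow(s) on {len(defended_sites)} site(s)")
```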
`python-analysis/miscellaneous.py` contains useful helper functions for the evaluation of ProbetheProto.
`sh/show-load-time-key1key2-1k.sh` acquires the page load times for the taint engine of ProbetheProto during crawling.