README

This is the implementation of the ICLR 2022 submission "COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks". The code is adapted from the offline RL training repository https://github.com/google-research/batch_rl.

We provide two certifications (per-state action certification and cumulative reward certification) for three aggregation protocols (PARL, TPARL, DPARL). Example commands for running these certifications are given below.

Dataset Partitioning via Hashing

Generate the trajectory indices for each hash number in $[100]$ (that is, $0$ to $99$):

python split.py --train-data-folder /data/common/kahlua/dqn_replay/$1/$2/replay_logs \
                --output-folder /data/common/kahlua/dqn_replay/hash_split/$1_$2

With the above command saved in split_script.sh (the positional arguments $1 and $2 fill the path placeholders), run the following commands:

bash split_script.sh highway 1
bash split_script.sh highway 2
bash split_script.sh highway 3
...
bash split_script.sh highway 20
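
As a rough illustration of what hash-based partitioning does, the sketch below assigns each trajectory to one of 100 partitions with a deterministic hash. It is not the code in split.py: the key format, the choice of SHA-256, and the function names partition_id and split_indices are assumptions made for this example.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 100  # hash numbers 0..99, as used in this README


def partition_id(trajectory_key, num_partitions=NUM_PARTITIONS):
    """Map a trajectory key to a partition using a deterministic hash."""
    # Assumption: SHA-256 of a string key; split.py may hash differently.
    digest = hashlib.sha256(trajectory_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def split_indices(trajectory_keys, num_partitions=NUM_PARTITIONS):
    """Group trajectory indices by the partition their key hashes to."""
    partitions = defaultdict(list)
    for index, key in enumerate(trajectory_keys):
        partitions[partition_id(key, num_partitions)].append(index)
    return dict(partitions)


if __name__ == "__main__":
    keys = [f"trajectory-{i}" for i in range(1000)]  # hypothetical keys
    parts = split_indices(keys)
    # Every trajectory index lands in exactly one partition.
    print(sum(len(indices) for indices in parts.values()))
```

A cryptographic digest is used here rather than Python's built-in hash(), because the latter is salted per process for strings and would not give reproducible partitions across runs.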

For each hash number, generate the corresponding datasets:

python gen_split.py --train-data-folder /data/common/kahlua/dqn_replay/$1/$2/replay_logs \
                    --epi-index-path /data/common/kahlua/dqn_replay/hash_split/$1_$2/partition_$3.pt \
                    --output-folder /data/common/kahlua/dqn_replay/hash_split/$1_$2/dataset/hash_$3 \
                    --start-id 0 --end-id 400

With the above command saved in gen_split_script.sh (the third positional argument $3 is the hash number), run the following commands:

bash gen_split_script.sh highway 1 0
bash gen_split_script.sh highway 2 0
bash gen_split_script.sh highway 3 0
...
bash gen_split_script.sh highway 20 0

The above commands generate the $20$ datasets for hash number $0$. Repeat them $100$ times, once per hash number, to generate the datasets for hash numbers $0$ to $99$.
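
The repetition described above can be scripted. The following minimal sketch builds the gen_split_script.sh invocations for the run indices shown in the commands above (1 to 20) and for every hash number (0 to 99), and by default only prints them. The helper names gen_split_commands and run_all are hypothetical and are not part of this repository.

```python
import subprocess


def gen_split_commands(env="highway", runs=range(1, 21), hash_nums=range(100)):
    """Build one gen_split_script.sh invocation per (hash number, run) pair."""
    return [
        ["bash", "gen_split_script.sh", env, str(run), str(hash_num)]
        for hash_num in hash_nums
        for run in runs
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run over hash number 0 only; pass dry_run=False to execute.
    run_all(gen_split_commands(hash_nums=range(1)), dry_run=True)
```

Keeping the default as a dry run makes it easy to inspect the generated commands before launching them.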

For each hash number, merge the $20$ datasets:

python merge_splits.py --input-folder /data/common/kahlua/dqn_replay/hash_split/highway --hash-num 0

The above command merges the $20$ datasets for hash number $0$. Repeat it $100$ times, once for each hash number.

Model Training

The following command trains a model on the highway dataset for hash number $1$, using the DQN algorithm for $100$ iterations.

CUDA_VISIBLE_DEVICES=2 python -um batch_rl.fixed_replay.train \
    --base_dir=/data/common/kahlua/COPA/highway/hash_1 \
    --replay_dir=/data/common/kahlua/dqn_replay/hash_split/highway/hash_1/ \
    --agent_name=dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'
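
The certification commands in the following sections expect a directory of trained subpolicies. Assuming one model is trained per hash number (0 to 99) with the same directory layout as the command above, a driver could look like the following sketch. The helper names train_command and train_all, and the choice to reuse GPU 2 for every run, are assumptions for illustration.

```python
import os
import subprocess

BASE_ROOT = "/data/common/kahlua/COPA/highway"
REPLAY_ROOT = "/data/common/kahlua/dqn_replay/hash_split/highway"


def train_command(hash_num):
    """Build the training command for one hash number."""
    return [
        "python", "-um", "batch_rl.fixed_replay.train",
        f"--base_dir={BASE_ROOT}/hash_{hash_num}",
        f"--replay_dir={REPLAY_ROOT}/hash_{hash_num}/",
        "--agent_name=dqn",
        "--gin_files=batch_rl/fixed_replay/configs/dqn_highway.gin",
    ]


def train_all(hash_nums=range(100), gpu="2", dry_run=True):
    """Print, or run sequentially, the training command for each hash number."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for hash_num in hash_nums:
        command = train_command(hash_num)
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, env=env, check=True)


if __name__ == "__main__":
    # Dry run for the first three hash numbers only.
    train_all(hash_nums=range(3), dry_run=True)
```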

Certifying Per-State Action

PARL

python -um batch_rl.fixed_replay.test \
    --base_dir [base_dir]  \
    --model_dir [model_dir] \
    --cert_alg tight \
    --total_num 100 \
    --max_steps_per_episode 30 \
    --agent_name dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'

where base_dir is the path for storing experimental logs and results, and model_dir is the path to the $u$ trained subpolicies.
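
To give some intuition for per-state action certification over $u$ subpolicies, the sketch below aggregates their actions by majority vote and computes a generic partition-aggregation bound: the largest number of changed votes that cannot alter the aggregated action. It assumes each poisoned trajectory affects at most one subpolicy and that ties go to the smaller action index. Whether this matches what cert_alg tight computes is not confirmed by this README; the function names are hypothetical.

```python
from collections import Counter


def aggregate_action(votes, num_actions):
    """Return the action with the most votes; ties go to the smaller index."""
    counts = Counter(votes)
    return max(range(num_actions), key=lambda a: (counts[a], -a))


def tolerable_poisoning(votes, num_actions):
    """Return (chosen action, largest K of changed votes that keeps it).

    Assumes num_actions >= 2. Each changed vote can move one vote from the
    chosen action to a competitor, closing the gap between them by two.
    """
    counts = Counter(votes)
    chosen = aggregate_action(votes, num_actions)
    bounds = []
    for other in range(num_actions):
        if other == chosen:
            continue
        # A competitor with a smaller index wins ties, so it needs one less vote.
        gap = counts[chosen] - counts[other] - (1 if other < chosen else 0)
        bounds.append(gap // 2)
    return chosen, min(bounds)


if __name__ == "__main__":
    # Hypothetical votes from 100 subpolicies over 3 actions.
    votes = [0] * 60 + [1] * 30 + [2] * 10
    print(tolerable_poisoning(votes, num_actions=3))
```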

TPARL

python -um batch_rl.fixed_replay.test \
    --base_dir [base_dir]  \
    --model_dir [model_dir] \
    --cert_alg window \
    --window_size 4 \
    --total_num 100 \
    --max_steps_per_episode 30 \
    --agent_name dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'

For TPARL, set the cert_alg option to window and pass the predetermined window size $W$ via window_size.
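
One way to picture a windowed aggregation is to pool the subpolicies' votes over the most recent $W$ time steps before taking the majority. The sketch below does exactly that and nothing more; it is an assumption about what a window of size $W$ means here, not a description of the code behind cert_alg window, and the name windowed_action is hypothetical.

```python
from collections import Counter


def windowed_action(vote_history, window_size, num_actions):
    """Majority vote over the last `window_size` steps of subpolicy votes.

    vote_history[t] is the list of actions chosen by the subpolicies at step t.
    Ties go to the smaller action index.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    counts = Counter()
    for step_votes in vote_history[-window_size:]:
        counts.update(step_votes)
    return max(range(num_actions), key=lambda a: (counts[a], -a))


if __name__ == "__main__":
    # Hypothetical votes of 3 subpolicies over 3 time steps.
    history = [[0, 0, 1], [1, 1, 1], [2, 2, 1]]
    print(windowed_action(history, window_size=1, num_actions=3))
    print(windowed_action(history, window_size=3, num_actions=3))
```

With a window of 1 only the latest step counts, while a larger window lets earlier votes outweigh the most recent ones.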

DPARL

python -um batch_rl.fixed_replay.test \
    --base_dir [base_dir]  \
    --model_dir [model_dir] \
    --cert_alg dynamic \
    --max_window_size 5 \
    --total_num 100 \
    --max_steps_per_episode 30 \
    --agent_name dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'

For DPARL, set the cert_alg option to dynamic and pass the maximum window size $W_{\max}$ via max_window_size.

Certifying Cumulative Reward

PARL

python -um batch_rl.fixed_replay.test_reward \
    --base_dir [base_dir]  \
    --model_dir [model_dir] \
    --cert_alg tight \
    --total_num 100 \
    --max_steps_per_episode 30 \
    --agent_name dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'

where base_dir is the path for storing experimental logs and results, and model_dir is the path to the $u$ trained subpolicies.

TPARL

python -um batch_rl.fixed_replay.test_reward \
    --base_dir [base_dir]  \
    --model_dir [model_dir] \
    --cert_alg window \
    --window_size 4 \
    --total_num 100 \
    --max_steps_per_episode 30 \
    --agent_name dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'

For TPARL, set the cert_alg option to window and pass the predetermined window size $W$ via window_size.

DPARL

python -um batch_rl.fixed_replay.test_reward \
    --base_dir [base_dir]  \
    --model_dir [model_dir] \
    --cert_alg dynamic \
    --max_window_size 5 \
    --total_num 100 \
    --max_steps_per_episode 30 \
    --agent_name dqn \
    --gin_files='batch_rl/fixed_replay/configs/dqn_highway.gin'

For DPARL, set the cert_alg option to dynamic and pass the maximum window size $W_{\max}$ via max_window_size.
