# Tactical Approaches to Adversarial Attacks on RL

We have already touched upon several reinforcement learning methods and why they are considered notable breakthroughs in the field. One concern remains to be discussed, however. If these methods are to be applied in day-to-day life, we have to be sure that they are robust and safe to use.

So how vulnerable are these algorithms to attacks? How easy would it be to interfere with a model's ability to pick good actions, say by hindering its perception? Can agents that control self-driving cars be manipulated into crashing by modifying their observations, e.g. by altering traffic signs or lane markings?

## Adversarial attacks

<!-- 
- fundamentals of adversarial attacks
- keywords
- goals of attacks
- how agent and environment are affected 
- focus of this post 
-->

First of all, let us introduce the fundamental concepts of adversarial attacks - a special class of attacks. The overall goal of an adversarial attack is to reduce the agent's reward to a minimum by manipulating its choices.

To achieve this goal, an adversarial attack impairs the performance of a trained model - in our case an RL-trained model - by feeding it false information. This so-called **_adversarial sample_** usually consists of a perturbed version of the original observation returned by the environment. The adversarial sample should manipulate the agent into taking an undesirable action, ideally the least desired one, while remaining similar enough to a valid observation that it is not easily detected.

The **_adversarial perturbation_** is the noise added to the observation when crafting the sample, and the instance or agent crafting the samples is called the **_adversary_**. Furthermore, we differentiate so-called **_white-box attacks_** from **_black-box attacks_**. In a white-box attack the adversary has full knowledge of the target model, whereas in a black-box attack it has no information about the model it attacks. Cases in which the adversary has limited information about the target model, but never its parameters, are further sub-classified as **_semi-black-box attacks_**.
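To make the terminology concrete, here is a minimal sketch of how an adversarial sample relates to an observation and a perturbation. The function name `make_adversarial_sample`, the size limit `epsilon` and the valid range `[0, 1]` are assumptions made for illustration. How the `direction` of the perturbation is chosen is the actual attack and is not shown here.

```python
import numpy as np

def make_adversarial_sample(observation, direction, epsilon, low=0.0, high=1.0):
    """Return observation + perturbation, with the perturbation bounded by epsilon.

    `direction` is a placeholder for whatever the adversary computes
    (hypothetical here); only its sign is used, so every entry of the
    perturbation has magnitude epsilon at most.
    """
    delta = epsilon * np.sign(direction)          # the adversarial perturbation
    # Keep the result inside the valid observation range so it still
    # looks like something the environment could have returned.
    return np.clip(observation + delta, low, high)

observation = np.array([0.2, 0.5, 1.0])
direction = np.array([1.0, -2.0, 3.0])            # illustrative values only
adversarial_sample = make_adversarial_sample(observation, direction, epsilon=0.05)
print(adversarial_sample - observation)           # the perturbation actually applied
```

A small `epsilon` keeps the sample close to the original observation, which is what makes it hard to detect.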

This post limits itself to **_tactical approaches_** to adversarial attacks as presented in Lin, _et al._ (2017).

## Different types of adversarial attacks

<!-- 
( strategically timed to critical point)(maybe enchanting,antagonist)

- explain basic idea(strategically timed and enchanting)
- explain attack strategy and present functions(informal or formal)
- effects on agent and environment

- introduce critical point strategy( and antagonist attack) ( what are the differences?)
- basic idea and principle
- attack strategy
- effect on agent and environment compared to strategically timed attack 
-->

The simplest and most approachable way to attack an agent using adversarial methods is the **_uniform attack_**. Here an adversarial sample is crafted at every single timestep. The agent is therefore attacked constantly, which results in a large total adversarial perturbation and somewhat defeats the idea that adversarial attacks should be difficult to detect.

### Strategically timed attack

Lin, _et al._ introduce the idea of so-called **_strategically timed attacks_**. Even for simple examples it is quite intuitive that attacks are not equally effective at every timestep. For example, attacking an agent that acts in OpenAI Gym's **CarRacing** environment (introduced in more detail later on) would be less effective on long straight sections of the track than in curves.
To determine when the adversary should craft an adversarial sample, we first compute a function $c$ that measures how strongly the agent prefers its most likely action over its least likely one under the policy $\pi$:
$$c(s_t) = \max_{a_t}\pi(s_t, a_t) - \min_{a_t}\pi(s_t, a_t)$$

Note that this way of computing $c$ only applies to policy gradient-based methods like A3C or PPO, since it relies on the action probabilities $\pi(s_t, a_t)$ that such methods output.
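As a rough illustration, $c$ can be computed directly from the action probabilities a policy outputs for one state. The function name `preference_gap` and the example probabilities are made up for this sketch.

```python
import numpy as np

def preference_gap(action_probs):
    """c(s_t): highest minus lowest action probability for one state."""
    probs = np.asarray(action_probs, dtype=float)
    return float(probs.max() - probs.min())

# A state in which the agent strongly prefers one action (e.g. in a curve) ...
confident = [0.90, 0.05, 0.05]
# ... and one in which all actions look about equally good (e.g. on a straight).
indifferent = [0.34, 0.33, 0.33]

print(preference_gap(confident))
print(preference_gap(indifferent))
```

The first state yields a much larger value of $c$ than the second, which marks it as the more promising moment for an attack.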

Next, an adversarial sample is crafted only if $c$ reaches at least a certain threshold $\beta$, i.e. if $c(s_t) \geq \beta$. The number of attacks during an episode therefore depends directly on this threshold. Put simply, a large threshold results in few attacks while a small threshold results in many. This affects not only the overall adversarial perturbation but also the effectiveness of the attack. Choosing $\beta$ wisely therefore determines both the success of an adversarial attack and its perceptibility.
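The effect of the threshold can be sketched by counting how many timesteps of an episode would be attacked for different values of $\beta$. The episode below is not taken from a real agent: the action probabilities are drawn at random purely for illustration, and `count_attacks` is a hypothetical helper.

```python
import numpy as np

def count_attacks(policy_outputs, beta):
    """Count the timesteps whose preference gap c reaches the threshold beta."""
    attacks = 0
    for probs in policy_outputs:
        c = max(probs) - min(probs)
        if c >= beta:          # an adversarial sample is crafted only in this case
            attacks += 1
    return attacks

# Stand-in for one episode: 200 timesteps with 3 actions each.
# Random probabilities are an assumption, not output of a trained policy.
rng = np.random.default_rng(0)
episode = rng.dirichlet(alpha=[0.5, 0.5, 0.5], size=200)

for beta in (0.0, 0.3, 0.6, 0.9):
    print(beta, count_attacks(episode, beta))
```

Since $c$ is never negative, $\beta = 0$ attacks every timestep and thus corresponds to the uniform attack, while larger thresholds attack fewer and fewer timesteps.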

### Enchanting attack

The next attack proposed by Lin, _et al._ is the **_enchanting attack_**. Its basic principle is to __lure__ the agent to a target state $s_g$ chosen by the adversary, so that the agent minimizes its rewards in the process of reaching it. The main difference from the strategically timed attack is that it does not try to reduce the agent's rewards directly, but instead misleads the agent towards the target state, so that it misses out on reaching its optimal states.

To achieve this, the adversary has to craft a series of adversarial examples using a planning algorithm. In addition, it needs a **_generative model_** to predict future states, so that a suitable sequence of actions can be planned. The series of adversarial examples is:
$$s_{t+1} + \delta_{t+1}, \dots, s_{t+H} + \delta_{t+H}$$

$H$ is the number of timesteps over which the attack is planned and $\delta$ is the perturbation added to the respective state. This sequence is not crafted as a whole at the beginning, however, but progressively, one timestep at a time. First the current state $s_t$ is perturbed into $s_t+\delta_t$. After the agent executes the planned action, the environment returns a new state $s_{t+1}$, which is in turn perturbed into $s_{t+1}+\delta_{t+1}$. This process continues until $s_{t+H}+\delta_{t+H}$ is reached and the agent is, ideally, in the target state $s_g$. The actions the adversary wants the agent to take are chosen based on the distance to the target state: the action that leads closer to the target state is selected.
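The step-by-step procedure can be sketched as a simple loop. Everything below is a toy stand-in and not the setup used by Lin, _et al._: the state is a single number, the agent tries to stay near the origin, and the helpers `plan_action`, `craft_perturbation`, `agent_act` and `env_step` are hypothetical. Note that the perturbations in this toy example are not small. The sketch only illustrates the order of the steps.

```python
def enchanting_attack(state, target, horizon,
                      plan_action, craft_perturbation, agent_act, env_step):
    """Lure the agent towards `target`, crafting one perturbation per timestep."""
    visited = [state]
    for _ in range(horizon):
        desired_action = plan_action(state, target)       # what the adversary wants
        delta = craft_perturbation(state, desired_action)
        action = agent_act(state + delta)                 # agent only sees s_t + delta_t
        state = env_step(state, action)                   # environment returns s_{t+1}
        visited.append(state)
    return visited

# --- toy stand-ins (assumptions for illustration only) ---
def agent_act(observed):
    # The agent wants to stay near 0: step right if left of 0, else step left.
    return 1 if observed < 0 else -1

def env_step(state, action):
    return state + action

def plan_action(state, target):
    # Pick the action that reduces the distance to the target state.
    return 1 if target > state else -1

def craft_perturbation(state, desired_action):
    if agent_act(state) == desired_action:
        return 0.0                                        # no attack needed
    fake_observation = -0.1 if desired_action == 1 else 0.1
    return fake_observation - state

print(enchanting_attack(0.0, 3.0, 3,
                        plan_action, craft_perturbation, agent_act, env_step))
```

Left alone, this toy agent would oscillate around the origin. Under attack it walks step by step to the target state `3.0`.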

As mentioned above, a generative model is needed to predict future states and thereby evaluate possible sequences of actions.
In this case, a video prediction model $M$ is used, which predicts future video frames. Its inputs are the current state $s_t$ and a series of future actions starting from the current action, $A_{t:t+H} = \{a_t, \dots, a_{t+H}\}$, so the model is written as $M(s_t, A_{t:t+H})$. Its output is the state the agent is predicted to reach after executing these actions:
$$s^{M}_{t+H} = M(s_t, A_{t:t+H})$$
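The role of the model $M$ in planning can be illustrated with a toy example. Instead of a video prediction model, the sketch below assumes a trivially simple, hand-written dynamics model on a one-dimensional state, and the planner simply tries every action sequence. Both `predict_state` and `plan` are hypothetical names, and exhaustive search is a swapped-in simplification that is only feasible for tiny action spaces and short horizons.

```python
from itertools import product

ACTIONS = (-1, 0, 1)   # toy action set: step left, stay, step right

def predict_state(state, actions):
    """Toy stand-in for M(s_t, A): roll the state forward through the actions."""
    for action in actions:
        state = state + action
    return state

def plan(state, target, horizon):
    """Return the action sequence whose predicted end state is closest to target."""
    return min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: abs(predict_state(state, seq) - target),
    )

best_sequence = plan(state=0, target=2, horizon=3)
print(best_sequence, predict_state(0, best_sequence))
```

The adversary would then craft its perturbation so that the agent takes the first action of the selected sequence, and repeat the planning at the next timestep.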

### Critical point attack

### Antagonist attack


## Implementation and results
- show process and results through implementation on example
- compare results of attacks 

## Conclusion
- evaluate results of attacks
- explain why critical point attack is more "evolved"
- outlook on application of these attacks on Real world scenarios
- (maybe comparison with other attacks)

# References

- [Akhtar and Mian (2017)](https://ieeexplore.ieee.org/abstract/document/8294186)
- [Lin, _et al._ (2017)](https://arxiv.org/abs/1703.06748)
- [Carlini and Wagner (2016)](https://arxiv.org/abs/1608.04644)
- [CarRacing](https://gym.openai.com/envs/CarRacing-v0/)