# Bucket-Probability-Distribution for Threatened Habitats Attack Strategy

### Problem:
The task is to assign a probability to each bucket from which a node is picked for removal.
We want the removal probability of a node to be proportional to how strongly the node is associated with the threatened habitat, i.e. the share of its habitats that is threatened.
We first create the required buckets and fill them with the corresponding nodes.
In the robustness analysis, we then pick a bucket according to the computed discrete probability distribution and remove a randomly chosen node from that bucket.
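
The two-stage draw described above can be sketched as follows. This is only a minimal illustration, not the analysis code itself: the function name `pick_node_for_removal`, the bucket labels, the node names and the probabilities are all hypothetical placeholders, and the node within a bucket is assumed to be drawn uniformly.

```python
import random

def pick_node_for_removal(buckets, probabilities, rng=random):
    """Pick a bucket by its probability, then a node uniformly within it.

    `buckets` maps a bucket label to a list of node names and
    `probabilities` maps the same labels to bucket probabilities.
    Both structures are assumptions made for this sketch.
    """
    # Only non-empty buckets can be drawn from.
    labels = [label for label, nodes in buckets.items() if nodes]
    weights = [probabilities[label] for label in labels]
    # Stage 1: draw a bucket from the discrete distribution.
    bucket = rng.choices(labels, weights=weights, k=1)[0]
    # Stage 2: draw a node uniformly at random from that bucket.
    node = rng.choice(buckets[bucket])
    return bucket, node

# Hypothetical buckets and placeholder probabilities.
example_buckets = {"high": ["n1", "n2"], "mid": ["n3"], "low": ["n4", "n5"]}
example_probabilities = {"high": 0.6, "mid": 0.3, "low": 0.1}
bucket, node = pick_node_for_removal(
    example_buckets, example_probabilities, random.Random(0)
)
print(bucket, node)
```

Because `random.choices` treats the weights as relative, skipping a bucket that has run empty implicitly renormalises the remaining probabilities. Whether that is the desired behaviour once buckets are exhausted is a design choice that the text above does not settle.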

### Mock Example

Unique Habitats: 
- Grassland
- Cropland; Grassland
- Forest; Grassland
- Forest; Shrubland; Urban
- Shrubland; Urban
- Food-group

Threatened Habitat: 
- Grassland.

### Derivation of Distribution

1. Proportions

First, for each node we compute the proportion of its habitats that is threatened: the number of threatened habitats the node occurs in (here only Grassland, so 0 or 1) divided by the total number of habitats in which the node is present.
We build the buckets from the unique values of this proportion.
Nodes that belong to the habitat Food-group are never removed, so we leave them out for now.
Later on we will assign these nodes a removal probability of zero.
Proportions $a_i$, where $i = 1, \dots, n$:
- $a_1 = 1/1 = 100\%$: (Grassland)
- $a_2 = 1/2 = 50\%$: (Cropland; Grassland) and (Forest; Grassland)
- $a_3 = 0\%$: (Forest; Shrubland; Urban), i.e. $0/3$, and (Shrubland; Urban), i.e. $0/2$
- not considered for now: Food-group

From these proportions we extract three unique buckets (proportions $1$, $0.5$ and $0$), which we later fill with the corresponding nodes; the Food-group nodes are kept apart and are never removed.
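
As a rough sketch of this step, the snippet below computes the proportion for each habitat combination of the mock example and groups the combinations by that value. It assumes that a node's habitats are stored as a single `;`-separated string, as in the list above; the helper name `threatened_proportion` and the constants are hypothetical.

```python
from collections import defaultdict

THREATENED = "Grassland"
EXCLUDED = "Food-group"  # never removed, so no proportion is assigned

def threatened_proportion(habitat_label, threatened=THREATENED):
    """Share of the habitats in `habitat_label` that is the threatened one."""
    habitats = [h.strip() for h in habitat_label.split(";")]
    return habitats.count(threatened) / len(habitats)

unique_habitats = [
    "Grassland",
    "Cropland; Grassland",
    "Forest; Grassland",
    "Forest; Shrubland; Urban",
    "Shrubland; Urban",
    "Food-group",
]

# Group habitat combinations by proportion; each distinct value is one bucket.
buckets = defaultdict(list)
for label in unique_habitats:
    if label == EXCLUDED:
        continue  # handled separately with removal probability zero
    buckets[threatened_proportion(label)].append(label)

for proportion, labels in sorted(buckets.items(), reverse=True):
    print(proportion, labels)
```

Grouping by the proportion value rather than by the habitat string is what merges (Cropland; Grassland) with (Forest; Grassland), and (Forest; Shrubland; Urban) with (Shrubland; Urban).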

2. Smallest Bucket-Removal-Probability

Second, we want nodes that are not present in the threatened habitat (Grassland) to still have a removal probability $p$ greater than zero.
Specifically, we define the target probability $y$ of their bucket:
$$y = p({\{habitats \setminus grassland\}})$$

In our case, nodes belonging to the habitats (Forest; Shrubland; Urban) and (Shrubland; Urban) form the bucket $\{habitats \setminus grassland\}$, which has a removal probability of $y$.

3. Probabilities

The probabilities of removal from the different buckets must sum to 1:
$$
\begin{align}
\sum_{i=1}^{n}p_i = 1 \\
\end{align}
$$

The individual probabilities are then:

$$
\begin{align}
p_1 &= \frac{1 + x}{\sum_{i=1}^{n}(a_i + x)} \\
p_2 &= \frac{0.5 + x}{\sum_{i=1}^{n}(a_i + x)} \\
p_3 &= \frac{\mathrm{min\_prop} + x}{\sum_{i=1}^{n}(a_i + x)} \\
\end{align}
$$

where $\mathrm{min\_prop}$ is the smallest proportion (here $a_3 = 0$) and $x$ is the offset that yields the target probability $y$ ($p({\{habitats \setminus grassland\}})$) for the bucket in which the proportion of the threatened habitat is zero.

To find $x$ we solve the following equation:
$$
\begin{align}
y &= \frac{\mathrm{min\_prop} + x}{\sum_{i=1}^{n}(a_i + x)} \\
\mathrm{min\_prop} + x &= y \sum_{i=1}^{n}(a_i + x) \\
\mathrm{min\_prop} + x &= ynx + y \sum_{i=1}^{n}a_i \\
x(1-yn) &= y \sum_{i=1}^{n}a_i - \mathrm{min\_prop} \\
x &= \frac{y \sum_{i=1}^{n}a_i - \mathrm{min\_prop}}{1-yn} \\
\end{align}
$$

The last step divides by $1 - yn$, so it requires $y < 1/n$.
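
Solving $y = (\mathrm{min\_prop} + x) / \sum_{i=1}^{n}(a_i + x)$ for $x$ gives $x = (y \sum_{i=1}^{n} a_i - \mathrm{min\_prop}) / (1 - yn)$. The sketch below wraps this closed form in a function and guards the cases where it has no valid solution. The function names `solve_offset` and `bucket_probabilities` are hypothetical, and taking $\mathrm{min\_prop}$ to be the smallest entry of the proportion list is an assumption of this sketch.

```python
def solve_offset(proportions, target):
    """Return the offset x that gives the lowest bucket probability `target`."""
    n = len(proportions)
    total = sum(proportions)
    min_prop = min(proportions)  # assumption: min_prop is the smallest a_i
    # The closed form divides by 1 - y*n, so y must stay below 1/n.
    if not 0 < target < 1 / n:
        raise ValueError("target must lie strictly between 0 and 1/n")
    # If all proportions are equal, every bucket has probability 1/n
    # regardless of x, so no offset can reach a different target.
    if total == n * min_prop:
        raise ValueError("proportions must not all be equal")
    return (target * total - min_prop) / (1 - target * n)

def bucket_probabilities(proportions, target):
    """Probabilities p_i = (a_i + x) / sum(a_j + x) for the solved offset x."""
    x = solve_offset(proportions, target)
    denominator = sum(a + x for a in proportions)
    return [(a + x) / denominator for a in proportions]

probs = bucket_probabilities([1, 0.5, 0], 0.05)
print([round(p, 4) for p in probs])
```

The upper bound $1/n$ has a simple reading: the lowest bucket can never receive more than the uniform share, which is only approached as $x$ grows without bound.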


For example, if we pick $y = 5\%$, with $n = 3$, $\sum_{i=1}^{n}a_i = 1.5$ and $\mathrm{min\_prop} = 0$:

$$
x = \frac{0.05 \cdot 1.5 - 0}{1 - 0.05 \cdot 3} = \frac{0.075}{0.85} \approx 0.088
$$

The distribution will then look as follows:

$$
\begin{align}
p_1 = \frac{1 + 0.088}{\sum_{i=1}^{n}(a_i + 0.088)} \\
p_2 = \frac{0.5 + 0.088}{\sum_{i=1}^{n}(a_i + 0.088)} \\
p_3 = \frac{0 + 0.088}{\sum_{i=1}^{n}(a_i + 0.088)} \\
\end{align}
$$

In the next cell we verify that the probabilities sum to one.

In [8]:
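a = [1, 0.5, 0]  # proportions a_i of the three buckets
n = len(a)       # number of buckets
s = sum(a)       # sum of the proportions
y = 0.05         # target probability for the bucket with proportion 0
# Offset x solving y = (0 + x) / sum(a_i + x), i.e. with min_prop = 0
x = y*s / (1-y*n)

print('x =',x)

# Shared normalising constant of all bucket probabilities
denominator = sum(a_i + x for a_i in a)

p_1 = (1 + x) / denominator
p_2 = (0.5 + x) / denominator
p_3 = (0 + x) / denominator

print(f"p_1 = {round(p_1, 4)}")
print(f"p_2 = {round(p_2, 4)}")
print(f"p_3 = {round(p_3, 4)}")

# Sanity check: the bucket probabilities must sum to one
print('sum p_i =', round(p_1 + p_2 + p_3, 3))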

x = 0.08823529411764708
p_1 = 0.6167
p_2 = 0.3333
p_3 = 0.05
sum p_i = 1.0
