<img src="img/athena_logo.png" alt="ATHENA" style="width: 800px;" class="left"/>

# Quantification of heterogeneity




### Information-theoretic scores

Quantifying the diversity of an ecosystem or community is a long-standing problem in ecology, and a vast body of scientific literature addresses it. The concepts developed in ecology transfer directly to cancer research: cell types play the role of species, and tumor micro-environments the role of ecological niches. In general, ecological diversity metrics describe the number of species and their relative abundance within an ecosystem, weighting the two aspects differently depending on the metric. The mathematical foundation of these metrics is rooted in information theory.

![entropic-measures.png](img/entropic-measures.png)
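
As a rough illustration of such metrics, the sketch below computes a few common ones from a list of phenotype labels. The function names and the toy labels are hypothetical and are not part of the framework's API. Conventions also differ between sources: here the Simpson index is taken as $\sum_i p_i^2$, which is an assumption.

```python
import math
from collections import Counter


def proportions(labels):
    """Relative abundance of each phenotype in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon(p):
    """Shannon index H = -sum(p_i * ln(p_i))."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def simpson(p):
    """Simpson index (assumed convention): sum of squared proportions."""
    return sum(pi ** 2 for pi in p)


def hill(p, q):
    """Hill number of order q, the effective number of phenotypes."""
    if q == 1:
        # The limit q -> 1 is the exponential of the Shannon index.
        return math.exp(shannon(p))
    return sum(pi ** q for pi in p if pi > 0) ** (1.0 / (1.0 - q))


def renyi(p, alpha):
    """Renyi entropy of order alpha, the logarithm of the Hill number."""
    return math.log(hill(p, alpha))


# Hypothetical sample with three phenotypes in proportions 0.6, 0.3, 0.1.
labels = ["tumor"] * 6 + ["immune"] * 3 + ["stromal"] * 1
p = proportions(labels)
print(shannon(p), simpson(p), hill(p, 2), renyi(p, 2))
```

The order parameter ($q$ or $\alpha$) controls how strongly abundant phenotypes dominate the score: order 0 counts phenotypes regardless of abundance, while higher orders emphasise the most common ones.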


### Spatial adaptation
To harness the spatial information about the tumor architecture, we adjusted the computation of the diversity indices to consider the phenotype distributions of the single observations (cells). Diversity measures can be computed on a _global_ scope (top panel of the figure below) or on a _local_ scope (bottom panel).

The _global_ scope uses only the phenotype distribution of the whole sample and does not
exploit the spatial information in the data. It therefore quantifies diversity only at the sample level.
This is how traditional diversity scores in ecology work.

In contrast, the _local_ scope exploits the graph representation to compute an individual
phenotype distribution for each cell based on its neighborhood, which enables a cell-level quantification of diversity.
The resulting distribution of diversity scores can be aggregated or summarised to obtain a sample-level diversity score.

![local-global.png](img/local-global.png)
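
The difference between the two scopes can be sketched on a toy graph. This is a minimal illustration, not the framework's implementation: the cell ids, the adjacency dictionary, the choice to include the cell itself in its neighborhood, and the use of the mean as aggregation are all assumptions.

```python
import math
from collections import Counter


def shannon(labels):
    """Shannon index of the phenotype distribution of a collection of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical toy sample: cell id -> phenotype, plus an undirected cell graph.
phenotype = {0: "tumor", 1: "tumor", 2: "tumor", 3: "immune", 4: "immune", 5: "tumor"}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Global scope: one score from the phenotype distribution of the whole sample.
global_score = shannon(phenotype.values())

# Local scope: one score per cell from the phenotypes in its neighborhood
# (assumption: the neighborhood includes the cell itself).
local_scores = {
    cell: shannon([phenotype[cell]] + [phenotype[nb] for nb in nbrs])
    for cell, nbrs in neighbors.items()
}

# Aggregate the cell-level scores to a sample-level score (here: the mean).
sample_score = sum(local_scores.values()) / len(local_scores)
print(global_score, local_scores, sample_score)
```

Cells 0 and 1 only see tumor cells and therefore get a local score of zero, even though the sample as a whole is mixed. The global score cannot express this kind of spatial structure.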


### Overview
The result column indicates if the metric is computed on a global (sample) level or on a local (cell or spot) level. The input column specifies the input information used by the metrics. A metric that uses the phenotype distribution does not rely on spatial information. In contrast, metrics that require a graph input use the spatial information encoded in this data representation. Results of some methods depend on hyperparameter choices, as indicated by the last column. Every metric depends on the phenotyping process employed in the experimental setting.

<!--![metrics-overview.png](img/metrics-overview.png)-->

<!--
| Metric             | Result | Input                  | Hyperparameter         |
|--------------------|--------|------------------------|------------------------|
| Shannon index      | global | phenotype distribution | --                     |
| Shannon index      | local  | graph                  | graph choice           |
| Shannon's evenness | global | phenotype distribution | --                     |
| Shannon's evenness | local  | graph                  | graph choice           |
| Simpson index      | global | phenotype distribution | --                     |
| Simpson index      | local  | graph                  | graph choice           |
| Simpson's evenness | global | phenotype distribution | --                     |
| Simpson's evenness | local  | graph                  | graph choice           |
| Gini-Simpson index | global | phenotype distribution | --                     |
| Gini-Simpson index | local  | graph                  | graph choice           |
| Renyi entropy      | global | phenotype distribution | $\alpha$               |
| Renyi entropy      | local  | graph                  | $\alpha$, graph choice |
| Hill numbers       | global | phenotype distribution | $q$                    |
| Hill numbers       | local  | graph                  | $q$, graph choice      |
| Ripley's K         | global | graph                  | radius, graph choice   |
| Infiltration       | global | graph                  | graph choice           |
| Classic            | global | graph                  | graph choice           |
| HistoCAT           | global | graph                  | graph choice           |
| Proportion         | global | graph                  | graph choice           |
| kNN score          | global | graph                  | graph choice           |
-->

| Metric             | Result | Input                  | Hyperparameter         |
|--------------------|--------|------------------------|------------------------|
| Shannon index      | global | phenotype distribution | --                     |
| Shannon index      | local  | graph                  | graph choice           |
| Simpson index      | global | phenotype distribution | --                     |
| Simpson index      | local  | graph                  | graph choice           |
| Renyi entropy      | global | phenotype distribution | $\alpha$               |
| Renyi entropy      | local  | graph                  | $\alpha$, graph choice |
| Hill numbers       | global | phenotype distribution | $q$                    |
| Hill numbers       | local  | graph                  | $q$, graph choice      |
| Quadratic Entropy  | global | phenotype distribution | $D(x,y)$               |
| Quadratic Entropy  | local  | graph                  | $D(x,y)$, graph choice |
| Ripley's K         | global | graph                  | radius, graph choice   |
| Infiltration       | global | graph                  | graph choice           |
| Classic            | global | graph                  | graph choice           |
| HistoCAT           | global | graph                  | graph choice           |
| Proportion         | global | graph                  | graph choice           |

## Infiltration
The infiltration score was introduced by Keren _et al._ to measure the degree of immune cell infiltration into the tumor mass.

$\text{score}=\frac{N_{it}}{N_{ii}}$

where $N_{it}$ is the number of edges between tumor and immune cells and $N_{ii}$ the number of edges between immune cells.
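
The ratio above can be sketched as follows, assuming an undirected graph given as a list of edges in which each edge appears once. The function name, the phenotype labels, and the handling of $N_{ii}=0$ are illustrative assumptions.

```python
def infiltration(edges, phenotype, tumor="tumor", immune="immune"):
    """Ratio of tumor-immune edges to immune-immune edges."""
    n_it = 0  # edges between a tumor cell and an immune cell
    n_ii = 0  # edges between two immune cells
    for u, v in edges:
        pair = {phenotype[u], phenotype[v]}
        if pair == {tumor, immune}:
            n_it += 1
        elif pair == {immune}:
            n_ii += 1
    if n_ii == 0:
        # The score is undefined without immune-immune edges (assumed handling).
        return float("nan")
    return n_it / n_ii


# Hypothetical toy sample.
phenotype = {0: "tumor", 1: "tumor", 2: "immune", 3: "immune", 4: "immune"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
print(infiltration(edges, phenotype))
```

In this example there are two tumor-immune edges and two immune-immune edges, so the score is 1.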

## Phenotype interactions

The interaction strength between a pair of phenotypes is computed from the number or proportion of interactions that a given phenotype has with another phenotype, on average across a sample. A permutation test is used to determine whether the observed interaction strength represents an enrichment or a depletion.

![interactions.png](img/interactions.png)
<!-- ![interactions-quant.png](interactions-quant.png) -->

The framework implements three different flavours to determine the pair-wise interaction strength between phenotypes.

- classic / [histoCAT](http://www.nature.com/articles/nmeth.4391): Methods developed by the Bodenmiller lab. They estimate the pair-wise interaction strength by counting the number of edges between each pair of phenotypes.
- proportion: Flavour of the classic method that normalises the number of edges between phenotypes by the total number of edges present and thus bounds the score to the interval [0, 1].
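
The edge counting and its normalisation can be sketched as follows. This is a simplified illustration and not the framework's implementation: the function names are hypothetical, the graph is assumed to be undirected with each edge listed once, and "total number of edges" is taken to mean all edges in the graph.

```python
from collections import Counter


def interaction_counts(edges, phenotype):
    """Number of edges for each unordered pair of phenotypes."""
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((phenotype[u], phenotype[v])))
        counts[pair] += 1
    return counts


def interaction_proportions(edges, phenotype):
    """Edge counts divided by the total number of edges, so scores lie in [0, 1]."""
    counts = interaction_counts(edges, phenotype)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


# Hypothetical toy sample.
phenotype = {0: "tumor", 1: "tumor", 2: "immune", 3: "immune", 4: "immune"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
counts = interaction_counts(edges, phenotype)
props = interaction_proportions(edges, phenotype)
print(counts)
print(props)
```

Raw counts grow with the size of the sample, whereas the normalised scores are comparable across samples with different numbers of edges.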

All of these methods assess the direction of the interaction (attraction / avoidance) with a permutation test.
That is, the phenotype labels are randomly permuted and the interaction strength is recomputed.
This is repeated multiple times to generate a null distribution against which the observed interaction strength is compared.
If `prediction_type=pvalue`, we compute P-values for the interaction strength based on the two individual one-tailed permutation tests.
If `prediction_type=diff`, the score is simply the difference between the average interaction strength across all permutations and the observed interaction strength.
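
The permutation procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the framework's implementation: the interaction strength is a plain edge count, the function names and the `n_perm` and `seed` parameters are hypothetical, the P-values use a common add-one correction, and the sign of `diff` (observed minus permutation average) is an assumed convention.

```python
import random


def count_edges(edges, labels, a, b):
    """Number of edges connecting phenotype a with phenotype b."""
    return sum(1 for u, v in edges if {labels[u], labels[v]} == {a, b})


def permutation_test(edges, phenotype, a, b, n_perm=1000, seed=0):
    """Compare the observed a-b edge count against label permutations."""
    rng = random.Random(seed)
    cells = list(phenotype)
    labels = [phenotype[c] for c in cells]
    observed = count_edges(edges, phenotype, a, b)

    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute phenotype labels, keep the graph fixed
        null.append(count_edges(edges, dict(zip(cells, labels)), a, b))

    return {
        "observed": observed,
        # One-tailed tests for enrichment and depletion (add-one correction).
        "p_enriched": (1 + sum(x >= observed for x in null)) / (n_perm + 1),
        "p_depleted": (1 + sum(x <= observed for x in null)) / (n_perm + 1),
        # Assumed sign convention: positive means more edges than expected.
        "diff": observed - sum(null) / n_perm,
    }


# Hypothetical sample: two dense clusters joined by a single edge.
phenotype = {i: "tumor" for i in range(4)}
phenotype.update({i: "immune" for i in range(4, 8)})
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3),
         (4, 5), (5, 6), (6, 7), (4, 6), (5, 7), (3, 4)]
result = permutation_test(edges, phenotype, "tumor", "immune")
print(result)
```

Because tumor and immune cells are segregated in this toy sample, the observed number of tumor-immune edges lies below the permutation average, which indicates avoidance.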

In the following cell we compute the interaction strength between the `meta_id` phenotypes.