# PANIC Model

Prototypical additive neural network for interpretable classification that integrates 3D image and tabular data.

PANIC consists of one neural net for 3D image data and one neural net for each tabular feature, and combines their outputs via summation to yield the final prediction.

PANIC can therefore be seen as a generalized additive model (GAM) extended by functions that measure similarities between an input image and a set of class-specific prototypes:

$$
\begin{aligned}
p(c \mid x_1, \ldots, x_N, \mathcal{I}) &= \text{softmax}(\mu^c), \\
\mu^c &= \beta_0^c + \sum_{n=1}^N f_n^c(x_n) + \sum_{k=1}^K g_k^c(\mathcal{I})
\end{aligned}
\tag{1}
$$


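- $p(c \mid x_1, \ldots, x_N, \mathcal{I})$ is the probability that an individual belongs to class $c$ given the $N$ tabular features and the image $\mathcal{I}$.
- $\mu^c$ is a GAM with class-specific intercept $\beta_0^c$, where $f_n^c(x_n)$ is the class-specific output for feature $n$ and $g_k^c(\mathcal{I})$ is the similarity of the image to the $k$-th prototype of class $c$.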
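
The additive structure of Eq. (1) can be illustrated with a minimal pure-Python sketch. The function names (`softmax`, `panic_logits`) and all numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def panic_logits(bias, tabular_terms, image_terms):
    """Compute mu^c = beta_0^c + sum_n f_n^c(x_n) + sum_k g_k^c(I) for every class c.

    bias:          list of C floats (beta_0^c)
    tabular_terms: C lists, each holding the N values f_n^c(x_n)
    image_terms:   C lists, each holding the K values g_k^c(I)
    """
    return [b + sum(f) + sum(g) for b, f, g in zip(bias, tabular_terms, image_terms)]

# Hypothetical values for C = 2 classes, N = 3 tabular features, K = 2 prototypes.
bias = [0.1, -0.1]
tabular_terms = [[0.5, -0.2, 0.0], [-0.5, 0.2, 0.0]]
image_terms = [[0.9, 0.3], [0.1, -0.4]]

mu = panic_logits(bias, tabular_terms, image_terms)
probs = softmax(mu)
```

Because every term enters the logit as a plain summand, the contribution of a single feature or prototype to a class can be read off directly, which is what makes the model interpretable.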

## Tabular Model $f_n^c(x_n)$

If $x_n$ is continuous, $f_n^c(x_n)$ is a multi-layer perceptron (MLP) [1]. \
If $x_n$ is discrete, $f_n^c(x_n)$ is estimated with a linear model, which makes it a step function over the categories.
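
As a rough sketch of the two cases, the snippet below uses a one-hidden-layer ReLU MLP for a continuous feature and a per-category coefficient lookup for a discrete one. The hidden size, the activation, the function names, and all weights are assumptions made for illustration; the source does not specify them.

```python
def mlp_shape_function(x, w1, b1, w2, b2):
    """Map a scalar feature x to C class-specific outputs f^c(x).

    w1, b1: lists of H hidden-layer weights and biases (assumed architecture)
    w2:     C lists of H output weights
    b2:     list of C output biases
    """
    hidden = [max(0.0, w * x + b) for w, b in zip(w1, b1)]  # ReLU activation
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]

def linear_shape_function(category, coefficients):
    """Linear model on a one-hot encoded discrete feature.

    With one-hot inputs, the linear model reduces to a lookup of the
    per-class coefficients of the active category, i.e. a step function.
    """
    return coefficients[category]

# Hypothetical weights for H = 2 hidden units and C = 2 classes.
continuous_out = mlp_shape_function(
    2.0, w1=[1.0, -1.0], b1=[0.0, 0.5], w2=[[0.5, 0.0], [-0.5, 1.0]], b2=[0.0, 0.1]
)

# Hypothetical coefficients for a discrete feature with categories "A" and "B".
coefficients = {"A": [0.2, -0.2], "B": [-0.1, 0.1]}
discrete_out = linear_shape_function("B", coefficients)
```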

### Regularization

Following [1], an $\ell_2$ penalty is applied to the outputs of $f_n^c(x_n)$:

$$
\mathcal{L}_{\text{Tab}}(x_1, \ldots, x_N) \;=\; \frac{1}{C} \sum_{c=1}^C \sum_{n=1}^N \Big[ f_n^c(x_n) \Big]^2
\tag{2}
$$
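
For a single sample, the penalty above amounts to summing the squared shape-function outputs and dividing by the number of classes. A minimal sketch, with a hypothetical function name and made-up values:

```python
def tabular_penalty(outputs):
    """Return (1/C) * sum_c sum_n f_n^c(x_n)^2.

    outputs[c][n] holds the value f_n^c(x_n) for class c and feature n.
    """
    num_classes = len(outputs)
    return sum(v * v for row in outputs for v in row) / num_classes

# Hypothetical outputs for C = 2 classes and N = 3 tabular features.
outputs = [[0.5, -0.2, 0.0], [-0.5, 0.2, 0.0]]
penalty = tabular_penalty(outputs)
```

The penalty shrinks each feature's contribution toward zero, so a feature only moves the prediction when the data supports it.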


## Image Model $g_k^c$

Multiple types of prototypes are possible:
- ProtoPNet can learn a single prototype for the disease.
  - This might not be sufficient, as a disease might manifest in multiple regions (e.g. the hippocampus appears in both the left and the right hemisphere).
- Deformable ProtoPNet represents a prototype by multiple fine-grained prototypical parts, but is bound to a fixed number of prototypical parts per prototype.
- XProtoNet [11] overcomes this limitation by defining prototypes based on attention masks rather than patches; it has been applied to lung disease classification from radiographic images.


### XProtoNet

An image is classified based on the cosine similarity between a latent feature vector $z^{pc}_k$ and a learned class-specific prototype $p_k^c$:

$$
g_k^c(\mathcal{I}) = \text{sim}(p_k^c, z^{pc}_k)
= \frac{p_k^c \cdot z^{pc}_k}{\lVert p_k^c \rVert \, \lVert z^{pc}_k \rVert}
\tag{3}
$$

- Measures the **angle** between two vectors, ignoring their magnitude.
- Range: $[-1, 1]$
  - $1$: vectors perfectly aligned (high similarity)  
  - $0$: orthogonal (no similarity)  
  - $-1$: opposite directions (high dissimilarity)  
- Used to decide how close a latent feature vector is to a class prototype.
- The denominator removes the effect of vector length.
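
The properties listed above can be checked with a small pure-Python sketch of cosine similarity. The function name is hypothetical, and the sketch assumes both vectors are non-zero.

```python
import math

def cosine_similarity(p, z):
    """Cosine of the angle between prototype p and latent vector z.

    Assumes neither vector is all zeros (otherwise the denominator vanishes).
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return dot / (norm_p * norm_z)

aligned = cosine_similarity([1.0, 0.0], [2.0, 0.0])      # same direction, different length
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])   # perpendicular vectors
opposite = cosine_similarity([1.0, 0.0], [-4.0, 0.0])    # opposite directions
```

The three calls cover the aligned, orthogonal, and opposite cases; in each the vector lengths differ, but only the direction affects the result.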


#### Latent Feature Vector $z^{pc}_k$ 

A latent feature vector $z^{pc}_k$ is obtained by passing an image $\mathcal{I}$ into a CNN backbone

$$
U : \mathbb{R}^{1 \times H \times D \times W} \;\to\; \mathbb{R}^{R \times H' \times D' \times W'}
$$  

where $R$ is the number of output channels. The result is passed into two separate modules:  

1. **Feature extractor**  
   $$
   V : \mathbb{R}^{R \times H' \times D' \times W'} \;\to\; \mathbb{R}^{L \times H' \times D' \times W'}
   $$  
   which maps the feature map to the dimensionality of the prototype space $L$.  

2. **Occurrence module**  
   $$
   O^c : \mathbb{R}^{R \times H' \times D' \times W'} \;\to\; \mathbb{R}^{K \times H' \times D' \times W'}
   $$  
   which produces $K$ class-specific attention masks.  

Finally, the latent feature vector is defined as  

$$
z^{pc}_k = \text{GAP}\!\left[ \; \sigma\!\big(O^c(U(\mathcal{I}))_k\big) \;\odot\; \text{softplus}\!\big(V(U(\mathcal{I}))\big) \;\right],
\tag{4}
$$  

where
- $\odot$ denotes the Hadamard product,
- $\sigma$ is the sigmoid function,
- $\text{softplus}(x) = \log(1 + e^x)$,
- **GAP** is global average pooling.


**Intuition of each step:**

- **Backbone $U$:** extracts general CNN feature maps from the input image $\mathcal{I}$.
- **Feature extractor $V$:** projects these features into the **prototype space** of dimension $L$.
- **Occurrence module $O^c$:** generates $K$ class-specific attention masks that highlight *where* prototypes are present in the image.  
  - Useful even if anatomy is fixed: abnormalities vary in location, irrelevant regions are suppressed, and robustness is added across datasets.
- **Sigmoid + Softplus:**  
  - Sigmoid $(\sigma)$: squashes mask values into $(0, 1)$ so they act as attention weights.
  - Softplus: keeps feature activations strictly positive while remaining smooth.
- **Hadamard product ($\odot$):** combines the attention mask with the feature map elementwise, keeping only the attended regions.
- **Global Average Pooling (GAP):** averages the weighted features over all spatial positions, yielding a single latent feature vector $z^{pc}_k$ of dimension $L$.
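
The steps of Eq. (4) can be sketched in pure Python by flattening the $H' \times D' \times W'$ grid into a list of positions. The backbone $U$, feature extractor $V$, and occurrence module $O^c$ are not modeled here; their outputs are passed in as plain lists with made-up values, and all function names are hypothetical.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def latent_feature_vector(occurrence_logits, features):
    """Masked global average pooling for one prototype k of class c.

    occurrence_logits: P floats, O^c(U(I))_k flattened over P = H'*D'*W' positions
    features:          L lists of P floats, V(U(I)) flattened per channel
    Returns the L-dimensional latent feature vector.
    """
    num_positions = len(occurrence_logits)
    mask = [sigmoid(v) for v in occurrence_logits]  # attention weights in (0, 1)
    return [
        sum(m * softplus(f) for m, f in zip(mask, channel)) / num_positions
        for channel in features
    ]

# Hypothetical example: P = 4 positions, L = 2 channels; the mask attends to position 0.
occurrence_logits = [10.0, -10.0, -10.0, -10.0]
features = [[2.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]]
z = latent_feature_vector(occurrence_logits, features)
```

In this toy input the second channel has its largest activation at a position the mask suppresses, so it contributes less to `z` than the first channel, whose activation sits in the attended position.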


# References

[1] Agarwal, R., Melnick, L., Frosst, N., et al.: Neural additive models: interpretable machine learning with neural nets. In: NeurIPS, vol. 34, pp. 4699–4711 (2021)

[11] Kim, E., Kim, S., Seo, M., Yoon, S.: XProtoNet: diagnosis in chest radiography with global and local explanations. In: CVPR, pp. 15719–15728 (2021)