---
title: Infodemics
math: 
    '\abs': '\left\lvert #1 \right\rvert' 
    '\norm': '\left\lVert #1 \right\rVert' 
    '\Set': '\left\{ #1 \right\}'
    '\set': '\operatorname{set}'    
    '\mc': '\mathcal{#1}'
    '\M': '\boldsymbol{#1}'
    '\R': '\mathsf{#1}'
    '\RM': '\boldsymbol{\mathsf{#1}}'
    '\op': '\operatorname{#1}'
    '\E': '\op{E}'
    '\d': '\mathrm{\mathstrut d}'
---

## Problem Formulation

To capture information flow in a network consisting of a discrete set $V$ of nodes, we generalize the SI model to use the following infection rate tuple:

::::{prf:definition} Submodular flow rate tuple

A rate tuple $\lambda_V:=(\lambda_u|u\in V)$ where $\lambda_u:2^V\to \mathbb{R}_+$ is (normalized, monotonic, and) submodular iff

$$
\begin{align}
\lambda_u(\emptyset) &= 0\\
\lambda_u(B') &\leq \lambda_u(B) && \forall B'\subseteq B \subseteq V\\
\lambda_u(B_1) + \lambda_u(B_2) &\geq \lambda_u(B_1\cap B_2) + \lambda_u(B_1\cup B_2)&&  \forall B_1,B_2\subseteq V.
\end{align}
$$ (submodular-flow)

::::

::::{prf:example} Information flow

A possible choice of the submodular flow rate tuple is 

$$
\lambda_u(B) := I(\R{X}_B\wedge \R{Y}_u|\R{X}_{V\setminus B}),
$$

where $\R{X}_u$ and $\R{Y}_u$ are discrete input and output random variables of node $u\in V$ such that $H(\R{X}_V)=\sum_{u\in V} H(\R{X}_u)$ and $H(\R{Y}_V|\R{X}_V)=0$.[^sub] For graphical models where $\lambda_{uw}$ denotes the weight of the edge $(u,w)$, we can set 

$$
\lambda_u(B) := \sum_{w\in B} \lambda_{uw}.
$$

::::
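For small node sets, the three conditions on a single rate function can be checked by brute force. The following is a minimal sketch, assuming the standard submodular inequality $\lambda_u(B_1)+\lambda_u(B_2)\geq \lambda_u(B_1\cap B_2)+\lambda_u(B_1\cup B_2)$; the names `is_submodular_rate`, `edge_rate`, and `squared_rate`, and the numerical weights, are hypothetical and only serve as illustration.

```python
from itertools import combinations


def powerset(nodes):
    """Yield every subset of `nodes` as a frozenset."""
    items = list(nodes)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular_rate(rate, nodes, tol=1e-12):
    """Brute-force check that `rate` is normalized, monotonic, and submodular."""
    subsets = list(powerset(nodes))
    if abs(rate(frozenset())) > tol:
        return False  # not normalized
    for a in subsets:
        for b in subsets:
            if a <= b and rate(a) > rate(b) + tol:
                return False  # not monotonic
            if rate(a) + rate(b) + tol < rate(a & b) + rate(a | b):
                return False  # not submodular
    return True


# Hypothetical edge weights lambda_{uw} for one fixed node u (assumed values).
weights = {"a": 0.5, "b": 1.0, "c": 2.0}


def edge_rate(B):
    """Edge-weight rate: sum of lambda_{uw} over w in B."""
    return sum(weights[w] for w in B)


def squared_rate(B):
    """A counterexample: |B|^2 is normalized and monotonic but not submodular."""
    return float(len(B)) ** 2


print(is_submodular_rate(edge_rate, weights))
print(is_submodular_rate(squared_rate, weights))
```

The check enumerates all pairs of subsets, so it is only practical for a handful of nodes.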

[^sub]: Submodularity follows from the submodularity of entropy and rewriting $\lambda_u(B)$ as $H(\R{Y}_u|\R{X}_{V\setminus B})=H(\R{Y}_u,\R{X}_{V\setminus B})+H(\R{X}_B)-H(\R{X}_V)$.

To define the SI model for infodemics, let the time for $u\in V$ to be infected by $B\subseteq V\setminus \Set{u}$ be the exponentially distributed random variable

$$
\R{T}_u(B) \sim \text{Exp}(\lambda_u(B))
$$ 

mutually independent over $u$ and $B$. Let $\R{U}(t)$ be the sequence of infected nodes at time $t\geq 0$, i.e., $\R{U}(t)=u^k$ means $u_i$ is the $i$-th node infected by time $t$ for $i\in [k]$. Then, the infection times determine the sequence of infected nodes as follows:

::::{prf:definition} SI model beyond graphs
Given $\R{U}(\tau)=u^{k}$, for all $t>\tau$,

$$
\R{U}(t)=u^{k+1}  \iff u_{k+1} =\arg\min_{w\in V\setminus \set(u^k)} \R{T}_w(\set(u^k))
$$

where the setify function 

$$
\set(u^k):=\Set{u_i}_{i=1}^k
$$ 

turns the input sequence $u^k$ into an unordered set. 
::::

A unique choice of $u_{k+1}$ is possible because, almost surely, the finite values among $\R{T}_w(\set(u^k))$ for $w\in V\setminus \set(u^k)$ are distinct.
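The dynamics can be simulated directly. The sketch below assumes that $\R{T}_w(B)$ is the waiting time measured from the moment the infected set becomes $B$, and that a zero rate means the node is never infected by $B$; both are interpretive assumptions. The function `simulate_si`, the helper `make_rate`, and the path graph with its weights are hypothetical.

```python
import math
import random


def simulate_si(rates, nodes, sources, rng):
    """Return the infection order and the infection times.

    rates[w] is a callable B -> lambda_w(B); `sources` are infected at time 0.
    """
    order = list(sources)
    times = [0.0] * len(order)
    now = 0.0
    while len(order) < len(nodes):
        infected = frozenset(order)
        best_time, best_node = math.inf, None
        for w in nodes:
            if w in infected:
                continue
            lam = rates[w](infected)
            # Assumption: rate 0 means an infinite infection time.
            t = rng.expovariate(lam) if lam > 0 else math.inf
            if t < best_time:
                best_time, best_node = t, w
        if best_node is None:
            break  # no remaining node can be infected
        now += best_time  # waiting time counted from the previous infection
        order.append(best_node)
        times.append(now)
    return order, times


# Hypothetical path graph a - b - c with assumed edge weights.
edges = {("a", "b"): 1.0, ("b", "c"): 2.0}


def make_rate(w):
    """Edge-weight rate of node w: sum of the weights of edges between w and B."""
    def rate(B):
        return sum(wt for (x, y), wt in edges.items()
                   if (x == w and y in B) or (y == w and x in B))
    return rate


nodes = ["a", "b", "c"]
rates = {w: make_rate(w) for w in nodes}
order, times = simulate_si(rates, nodes, ["a"], random.Random(0))
print(order, times)
```

On this path graph the order from source `a` is forced, because `c` has rate zero until `b` is infected; only the infection times are random.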

The source detection problem is to find $\set(\R{U}(0))$ given $\set(\R{U}(\R{T}^{(k)}))=S$, where

$$
\R{T}^{(k)} = \inf\Set{t\geq 0| \abs{\set(\R{U}(t))}=k},
$$ (eq:Tk)

namely, the time to infect the first $k$ nodes.

A standard approach is to find the maximum likelihood estimator. We propose the following maximum likelihood estimator:

::::{prf:definition} Maximum likelihood source detection

Let

$$
\begin{align}
L_{S,t}(W) &:= P\left[\set(\R{U}(t))=S\middle|\set(\R{U}(0))=W\right],
\end{align}
$$ (eq:L)

be the likelihood of observing the set $S$ of infected nodes at time $t\geq 0$ given that $W$ is the set of infected nodes at time $0$. The maximum likelihood estimate of $\set(\R{U}(0))$ is a solution to

$$
\begin{align}
\max_{W\in \mathcal{W}_S} E[L_{S,\R{T}}(W)]
\end{align}
$$ (eq:ML)

where $S\subseteq V$ is a given set of infected nodes observed at some independently chosen random time $\R{T}$, and $\mathcal{W}_S\subseteq 2^S$ is a set of hypotheses.

::::

The single-source detection problem corresponds to the case $\mathcal{W}_S=\Set{\Set{s}|s\in S}$. In this case, we will show that our proposal [](#eq:ML) can be more meaningful than the existing formulations below:

$$
\begin{align}
\max_{s\in S} L_{S,\R{T}^{(k)}}(\Set{s})
\end{align}
$$ (ML:D)

$$
\begin{align}
\max_{s\in S} \sup_{t\geq 0} L_{S,t}(\Set{s})
\end{align}
$$ (ML:I)

where $L_{S,t}$ and $\R{T}^{(k)}$ are defined in [](#eq:L) and [](#eq:Tk), respectively.
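For the formulation evaluated at $\R{T}^{(k)}$ with $k=\abs{S}$, the likelihood depends only on which nodes are infected first, not on when. Under the exponential model, the next infected node given the infected set $B$ is $w$ with probability $\lambda_w(B)/\sum_{v\in V\setminus B}\lambda_v(B)$, so the likelihood can be computed by a recursion over infected sets. The following is a minimal sketch of that computation, not necessarily the method intended here; `first_k_likelihood`, the `rate(w, B)` calling convention, and the unit-weight path graph are hypothetical.

```python
from functools import lru_cache


def first_k_likelihood(rate, nodes, S, W):
    """P[the first |S| infected nodes form S | the initially infected set is W].

    rate(w, B) returns lambda_w(B).
    """
    nodes, S = frozenset(nodes), frozenset(S)

    @lru_cache(maxsize=None)
    def reach(B):
        if B == S:
            return 1.0
        total = sum(rate(w, B) for w in nodes - B)
        if total == 0:
            return 0.0  # the infection cannot spread any further
        # The next infected node must stay inside S.
        return sum(rate(w, B) / total * reach(B | {w}) for w in S - B)

    return reach(frozenset(W))


# Hypothetical path graph a - b - c - d with unit edge weights (assumed values).
edges = {frozenset(("a", "b")): 1.0,
         frozenset(("b", "c")): 1.0,
         frozenset(("c", "d")): 1.0}


def rate(w, B):
    """Edge-weight rate: sum of the weights of edges between w and nodes in B."""
    return sum(edges.get(frozenset((w, v)), 0.0) for v in B)


nodes = {"a", "b", "c", "d"}
S = {"a", "b", "c"}
scores = {s: first_k_likelihood(rate, nodes, S, {s}) for s in sorted(S)}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy case the end node `a` maximizes the likelihood, since starting from `b` or `c` the infection may reach `d` before covering $S$.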

### Main results

The likelihood can be computed by summing the probabilities of all possible infection sequences:

\begin{align}
L_{S,t}(W) &= \sum_{u^k\in \Pi_{S|W}} P\left[\R{U}(t)=u^k|\set(\R{U}(0))=W\right]
\end{align}

where $\Pi_{S|W}$ is the set of permitted infection sequences $u^k$ from $W$ to $S$ satisfying

$$
\begin{align}
u_l &\in S\setminus (W\cup \set(u^{l-1})) \text{ and }
\lambda_{u_l}(W\cup \set(u^{l-1})) > 0 && \forall l\in [k].
\end{align}
$$
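The permitted sequences can be enumerated by a depth-first search that extends a prefix only with nodes of $S$ whose rate from the currently infected set is positive. This is a sketch under the assumption $W\subseteq S$; `permitted_sequences`, the `rate(u, B)` calling convention, and the example graph are hypothetical, and nodes are assumed to be sortable so that the output order is deterministic.

```python
def permitted_sequences(rate, S, W):
    """Yield every order of the nodes in S minus W in which each step has positive rate."""
    S, W = frozenset(S), frozenset(W)

    def extend(prefix, infected):
        if infected == S:
            yield tuple(prefix)
            return
        for u in sorted(S - infected):
            # Keep u only if the current infected set can infect it.
            if rate(u, infected) > 0:
                yield from extend(prefix + [u], infected | {u})

    yield from extend([], W)


# Hypothetical path graph a - b - c - d with unit edge weights (assumed values).
edges = {frozenset(("a", "b")): 1.0,
         frozenset(("b", "c")): 1.0,
         frozenset(("c", "d")): 1.0}


def rate(u, B):
    """Edge-weight rate: sum of the weights of edges between u and nodes in B."""
    return sum(edges.get(frozenset((u, v)), 0.0) for v in B)


print(list(permitted_sequences(rate, {"a", "b", "c"}, {"b"})))
print(list(permitted_sequences(rate, {"a", "b", "c"}, {"a"})))
```

The number of permitted sequences can grow factorially in $\abs{S\setminus W}$, so the enumeration is meant for small examples.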

To compute the probability of each permitted infection sequence, define

\begin{align}
f_{W,u^k}^{(l)}(t)&:=
P\left[\R{U}(t)=u^k_{l}\middle|\set(\R{U}(0))=W\cup \set(u^{l-1})\right]
\end{align}

The function $f_{W,u^k}^{(l)}(t)$ can be computed recursively using the following recurrence:

::::{prf:proposition}

$L_{S,t}(W) = \sum_{u^k\in \Pi_{S|W}} f_{W,u^k}^{(1)}(t)$ where 

\begin{align}
f_{W,u^k}^{(l)}(t) &=
\begin{cases}
\left(f_{W,u^k}^{(l+1)} * g_{u^l|W}\right)(t) & l\in [k]\\
1 & l=k+1
\end{cases}\\
g_{u^l|W}(t) &:= \left.p_{\R{T}_{u_l}(B)}(t)\prod_{u\in (V\setminus B)\setminus \Set{u_l}} P\left[\R{T}_u(B)>t\right]\right|_{B=W\cup\set(u^{l-1})}.
\end{align}

::::

::::{prf:proof}

It suffices to consider $W:=\Set{u_0}$ by contracting all nodes in $W$ into a single node $u_0$. For $u^k\in \Pi_{S|W}$, we have

\begin{align}
P\left[\R{U}(t)=u_l^k\middle |\R{U}(0)=u^{l-1}_0\right]
&= 
\int_{\tau=0}^t P\left[\R{U}(t)=u_l^k\middle |\R{U}(\tau)=u^{l}_0\right] \\
&\quad \times P\left[\R{T}_u(\set(u^{l-1}_0))>\tau\;\forall u\in V\setminus \set(u^l_0)\right]dP_{\R{T}_{u_l}(\set(u^{l-1}_0))}(\tau).
\end{align}

::::
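As a worked special case, the kernel $g_{u^l|W}$ has a closed form under the exponential model. Since $\R{T}_u(B)\sim \text{Exp}(\lambda_u(B))$ independently over $u$, the density is $p_{\R{T}_{u_l}(B)}(t)=\lambda_{u_l}(B)e^{-\lambda_{u_l}(B)t}$ and the tail is $P[\R{T}_u(B)>t]=e^{-\lambda_u(B)t}$, so the product collapses to a single exponential:

$$
g_{u^l|W}(t) = \left.\lambda_{u_l}(B)\exp\left(-t\sum_{u\in V\setminus B}\lambda_u(B)\right)\right|_{B=W\cup\set(u^{l-1})} \qquad \text{for } t\geq 0.
$$

The exponent sums over all of $V\setminus B$, including $u_l$, because the factor contributed by the density of $\R{T}_{u_l}(B)$ supplies the term missing from the product.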