# The conceptual foundations
There are two main NBDA variants: order-of-acquisition diffusion analysis (OADA), which takes as data the order in which individuals acquired the target behaviour, and time-of-acquisition diffusion analysis (TADA), which uses the times of acquisition of the target behaviour. TADA can be further subdivided into versions that treat time either as a continuous variable (continuous TADA or ‘cTADA’) or as a discrete variable split into units (discrete TADA or ‘dTADA’). The choice of OADA versus cTADA versus dTADA depends on the diffusion data available and the assumptions one is willing to make about how the rate of learning changes over time (i.e. the shape of the baseline rate function, $\lambda_0(t)$).

OADA makes no assumptions about the shape of $\lambda_0(t)$; it assumes only that this function is the same for every individual in the diffusion. In contrast, TADA requires a researcher to make assumptions about the form of $\lambda_0(t)$ and to fit parameters controlling its shape. When these assumptions are met, TADA offers more statistical power than OADA (Hoppitt, Boogert, et al., 2010), particularly when the network is densely connected with little variation in connection strength. Indeed, when the network is completely homogeneous (i.e. all possible connections exist and are of equal strength), OADA cannot distinguish social transmission from asocial learning, since all orders of acquisition would be equally likely in both models.

In the simplest case, one can fit a TADA that assumes a constant baseline learning rate, $\lambda_0(t) = \lambda_0$, with an extra parameter, $\lambda_0$, fitted to the data (Franz & Nunn, 2009; Hoppitt, Boogert, et al., 2010). However, this assumption will often not hold. For example, individuals might initially exhibit neophobic responses towards a learning task, but as neophobia fades over time, asocial learning rates should increase. Such circumstances can cause a spurious positive result for social transmission in a TADA (Hoppitt, Kandler, Kendal, & Laland, 2010).

Fortunately, TADA can be modified to have a non-constant baseline rate. Although any positive function can be specified for $\lambda_0(t)$, the NBDA package has two built-in functions which will be sufficient in most cases. One corresponds to a *gamma* distribution of latencies under asocial learning (Hoppitt, Kandler, et al., 2010), and the other to a *Weibull* distribution of latencies (a common choice in survival analysis; Moore, 2016). Both offer flexible modelling of $\lambda_0(t)$ with a shape parameter that allows for the possibility of systematically increasing, decreasing or constant baseline functions.
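
To make the role of the shape parameter concrete, here is a minimal sketch of a Weibull hazard in the standard (shape, scale) form. The function name `weibull_hazard` and the parameter values are illustrative assumptions, and the NBDA package may parameterise its baseline differently.

```python
def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull distribution in the standard (shape, scale) form.

    This is the textbook formula; it is not taken from the NBDA package.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [0.5, 1.0, 2.0, 4.0]  # arbitrary positive time points

# shape < 1: decreasing baseline; shape == 1: constant; shape > 1: increasing
for shape in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, shape, scale=2.0) for t in times]
    print(shape, [round(h, 3) for h in hazards])
```

With shape equal to 1 the hazard does not depend on time, which recovers the constant-baseline case described earlier.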

![image.png](attachment:image.png)

https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.13307


## Basic formulation

### Previous formulation of time of acquisition diffusion analysis
There are two parameters of interest in the basic time of acquisition diffusion analysis model: the rate of social transmission between individuals per unit of network connection, *s*, and the baseline rate of trait performance in the absence of social transmission, $\lambda_0$. Throughout this chapter, we refer to *s* as the social transmission parameter, and to $\lambda_0$ as the baseline parameter. The hazard function for the model is expressed as:

$$
\lambda_i(t) = \lambda_0(t) (1- z_i(t)) R_i(t)
$$

such that:

$$
R_i(t) =   \left[ s \sum_{j = 1}^{N} a_{ij} z_j (t) + 1 \right] 
$$

Where:

- $\lambda_i(t)$ is the rate at which individual *i* acquires the task solution at time *t*.
  
- $\lambda_0(t)$ is a baseline acquisition function determining the distribution of latencies to acquisition in the absence of social transmission (that is, through asocial learning).
  
- $z_i(t)$ gives the status (1 = informed, 0 = naïve) of individual *i* at time *t*.
  
- The $(1 - z_i(t))$ and $z_j(t)$ terms ensure that the task solution is only transmitted from informed to uninformed individuals:
$$
z_j(t) = \begin{cases} 
0, & \text{if } j \text{ is naive} \\
1, & \text{if } j \text{ is informed}
\end{cases}
$$
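
The hazard function above can be sketched in a few lines of Python for the constant-baseline case. The function name `acquisition_rates`, the three-individual network and the parameter values are hypothetical and chosen only for illustration.

```python
def acquisition_rates(adjacency, informed, s, baseline):
    """lambda_i(t) = lambda_0 * (1 - z_i(t)) * (s * sum_j a_ij z_j(t) + 1).

    adjacency[i][j] is a_ij, informed[i] is z_i(t) (1 = informed, 0 = naive),
    and `baseline` is a constant lambda_0.
    """
    n = len(adjacency)
    rates = []
    for i in range(n):
        # Total connection strength from i to currently informed individuals
        social = sum(adjacency[i][j] * informed[j] for j in range(n))
        rates.append(baseline * (1 - informed[i]) * (s * social + 1.0))
    return rates


# Hypothetical network: individual 0 is informed, 1 and 2 are naive.
adjacency = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
informed = [1, 0, 0]

rates = acquisition_rates(adjacency, informed, s=2.0, baseline=0.1)
print(rates)  # approximately [0.0, 0.3, 0.2]
```

Individual 0 has a rate of zero because it is already informed, and individual 1 learns faster than individual 2 because its connection to the informed individual is stronger.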

Previous versions of time of acquisition diffusion analysis allow for an increasing or decreasing baseline rate $\lambda_0 (t)$ (Hoppitt, Kandler, et al. 2010). Here, we restrict ourselves to expanding the version for a constant baseline rate (i.e. $\lambda_0(t) = \lambda_0$) (Hoppitt, Boogert, et al. 2010), although the version for a non-constant baseline rate can be expanded in the same way.

The following is taken from the supporting information of https://besjournals.onlinelibrary.wiley.com/doi/10.1111/1365-2656.13307:

Here, we show how a basic OADA model, containing only a single parameter, s, is fitted to the data by maximum likelihood. Note that this process is carried out automatically by the NBDA package (Hoppitt et al. 2019) when fitting an OADA model, but it is useful for a researcher to understand how the model is fitted. Maximum likelihood works by finding the values of the parameters for which the observed data is most likely. This is done by first deriving a likelihood function that specifies the likelihood of the data for a given set of parameter values. For OADA, the likelihood for a single acquisition event, E, is:

$$L(E) = \frac{\lambda_e (t_E) }{\sum_{l = 1}^{N} \lambda_l (t_E)}$$

Where $e$ is the individual that learns on event $E$, and $t_E$ is the time immediately prior to event $E$. In other words, $L(E)$ is the probability that $e$ would be the next individual to learn, which is the rate of learning for $e$ at time $t_E$, divided by the sum of rates for everyone in the population, $\sum_{l = 1}^{N} \lambda_l (t_E)$. If we define the relative rate of learning, which here includes the $(1 - z_i(t))$ factor, to be:
$$
R_i(t) = \frac{\lambda_i (t)}{\lambda_0(t)} = (1- z_i(t)) \left[ s \sum_{j = 1}^{N} a_{ij} z_j (t) + 1 \right] 
$$

$L(E)$ then reduces to: 

$$L(E) = \frac{\lambda_0 (t_E) R_e (t_E) }{\lambda_0(t_E) \sum_{l = 1}^{N} R_l (t_E)} = \frac{R_e (t_E)}{\sum_{l = 1}^{N} R_l (t_E)}$$

Therefore, $\lambda_0 (t)$ drops out of the likelihood function. The likelihood function for the whole diffusion, L, is the product of the likelihoods for all acquisition events. In principle, the value of s could be chosen to directly maximise the likelihood. However, for computational stability, one equivalently takes the negative logarithm of the likelihoods for each event and adds them together, -log(L), then finds the value of s that minimizes -log(L), where:

$$
\log(L) = \sum_{E = 1}^{D} \log(R_e (t_E)) - \sum_{E = 1}^{D} \log\Biggl(\sum_{l = 1}^{N}R_l (t_E)\Biggr) 
$$

Here $D$ is the number of acquisition events in the diffusion. The value of $s$ that minimizes $-\log(L)$ is known as the maximum likelihood estimator for $s$, and the corresponding value of $-\log(L)$ is known as the negative log-likelihood (or -log-likelihood) for the model. When there is more than one parameter in the model, the optimization algorithm finds the combination of parameter values that minimizes $-\log(L)$. A review of the likelihood functions for NBDA, including cTADA and dTADA, is found in Hoppitt and Laland (2013).
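
The fitting procedure described above can be sketched as follows. This is a toy illustration and not the NBDA package's implementation. The function names, the four-individual line network, the observed order and the grid search are all assumptions made for the example, and a real analysis would use a proper optimiser.

```python
import math


def relative_rates(adjacency, informed, s):
    """R_i(t) = (1 - z_i(t)) * (s * sum_j a_ij z_j(t) + 1) for every individual."""
    n = len(adjacency)
    return [
        (1 - informed[i])
        * (s * sum(adjacency[i][j] * informed[j] for j in range(n)) + 1.0)
        for i in range(n)
    ]


def oada_neg_log_lik(s, adjacency, order, seeded=()):
    """-log(L) for an observed order of acquisition.

    `order` lists individuals in the order they learned; `seeded` lists
    individuals assumed to be informed before the diffusion started.
    """
    informed = [1 if i in seeded else 0 for i in range(len(adjacency))]
    total = 0.0
    for learner in order:
        rates = relative_rates(adjacency, informed, s)
        total -= math.log(rates[learner]) - math.log(sum(rates))
        informed[learner] = 1  # the learner counts as informed for later events
    return total


# Hypothetical line network 0 - 1 - 2 - 3, with individual 0 seeded.
line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
order = [1, 3, 2]  # individual 3 learns before its only neighbour, 2

# Crude grid search over s in [0, 5] with step 0.01.
grid = [k * 0.01 for k in range(501)]
best_s = min(grid, key=lambda s: oada_neg_log_lik(s, line, order, seeded=(0,)))
print(best_s)

# Fully connected, equal-strength network: -log(L) is the same for every s.
full = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
flat = [oada_neg_log_lik(s, full, order, seeded=(0,)) for s in (0.0, 1.0, 5.0)]
print(flat)
```

For this toy diffusion the likelihood is $(s+1)/((s+3)(s+2))$, which is maximised at $s = \sqrt{2} - 1$, so the grid search should land close to that value. The second check illustrates the earlier point that OADA cannot estimate $s$ on a completely homogeneous network, because every order of acquisition is equally likely whatever the value of $s$.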

## Why do we need Bayesian network-based diffusion analysis?
- The expansion of network-based diffusion analysis to multiple diffusions could also be valuable where researchers have repeated diffusions across the same group, or groups, of animals (e.g. Boogert et al. 2008). This is especially useful when only a limited number of animals is available, because pooling diffusions allows good statistical power to be obtained. However, a statistical problem arises if the analysis fails to account for the fact that the same individuals are involved in multiple diffusions. By including an individual random effect on the asocial rate of learning, the model accounts for the fact that the same individuals have the same (or similar) asocial learning ability in each diffusion. Random effects can be difficult to implement using maximum likelihood methods (used to fit network-based diffusion analysis models thus far), especially when the random effects structure is complex, because one has to integrate the likelihood function across all the possible values the random effects could take. It is easier to include random effects in a Bayesian model, using Markov chain Monte Carlo methods (Gelman et al. 2004).

- To combine NBDA with Bayesian network models that account more accurately for sampling biases.
  



## Bayesian formulation of time of acquisition diffusion analysis
In principle, the formulation of the model can remain the same for a Bayesian approach as for a model fitted by maximum likelihood. However, here we wish to include random effects, and reparameterize the model in a way that makes it easier to use in a Bayesian context. Thus, we apply a *Bayesian time of acquisition diffusion analysis* to the simulated dataset described in ‘Previous formulation of time of acquisition diffusion analysis’ to assess its performance under different circumstances. To illustrate the importance of both random effects and social transmission, four models were considered based on their inclusion/exclusion. Two of the models (Models 1 and 2) do not include random effects, while Models 3 and 4 do. Likewise, two of the models (Models 1 and 3) do not include an *s* parameter, while Models 2 and 4 do. Please see Table 5.1 for details.

The linear predictors are easily adapted to include random effects. For Models 3 and 4 for example, random effects $\epsilon$ at the individual level were considered such that $\epsilon = \{\epsilon_1, \dots, \epsilon_{10}\}$ and the total number of individuals is ten. The term $R_i(t)$ in Equation 2 is therefore expanded to:


$$
R_i(t) =   \left[ s \sum_{j = 1}^{N} a_{ij} z_j (t) + \exp(\epsilon_k) \right] 
$$

where:
- $k \in \{1, \dots, 5\}$ depends on which task is involved.

The rate of trait performance $\lambda_i (t)$ for individual *i* at time *t* therefore becomes:

$$
\lambda_i(t) = \lambda_0(t) (1-z_i(t))  \left[ s \sum_{j = 1}^{N} a_{ij} z_j (t) + \exp(\epsilon_k) \right] 
$$

To allow us to more easily set a prior distribution reflecting our state of knowledge (see below), Equation 7 is then re-parameterized to obtain:

$$
\lambda_i(t) = (1-z_i(t))  \left[\lambda_0 s \sum_{j = 1}^{N} a_{ij} z_j (t) + \lambda_0\exp(\epsilon_k) \right] 
$$

giving:

$$
\lambda_i(t) = (1-z_i(t))  \left[s' \sum_{j = 1}^{N} a_{ij} z_j (t) + \lambda_0\exp(\epsilon_k) \right]
$$

where:
- $s' = \lambda_0s$
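
As a quick numerical check of the re-parameterization, the sketch below evaluates the hazard in both forms and confirms that they agree when $s' = \lambda_0 s$. The function names, the network row and the parameter values are hypothetical, and the random effect is passed in as a single number `epsilon` standing for $\epsilon_k$.

```python
import math


def hazard_scaled(adjacency, informed, i, s, baseline, epsilon):
    """lambda_0 * (1 - z_i) * (s * sum_j a_ij z_j + exp(epsilon))."""
    social = sum(a * z for a, z in zip(adjacency[i], informed))
    return baseline * (1 - informed[i]) * (s * social + math.exp(epsilon))


def hazard_unscaled(adjacency, informed, i, s_prime, baseline, epsilon):
    """(1 - z_i) * (s' * sum_j a_ij z_j + lambda_0 * exp(epsilon))."""
    social = sum(a * z for a, z in zip(adjacency[i], informed))
    return (1 - informed[i]) * (s_prime * social + baseline * math.exp(epsilon))


# Hypothetical values, for illustration only.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
informed = [1, 0, 1]  # individual 1 is the only naive individual
s, baseline, epsilon = 2.0, 0.1, 0.3

scaled = hazard_scaled(adjacency, informed, 1, s, baseline, epsilon)
unscaled = hazard_unscaled(adjacency, informed, 1, baseline * s, baseline, epsilon)
print(math.isclose(scaled, unscaled))
```

The two forms describe the same model. The unscaled form simply lets priors be placed on $s'$ and $\lambda_0$ separately.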

The effect of social interactions on the rate of learning, $s'$, and the baseline rate of learning, $\lambda_0$, are the two parameters of interest. We refer to the re-parameterized $s'$ as the unscaled social transmission parameter, since, unlike $s$, it is not quantified relative to the rate of asocial learning. The social effect parameter $s'$ may be interpreted as the median effect of the social interactions on the overall hazard rate. For instance, where social transmission underlies the diffusion, $s'$ is a measure of its strength. The baseline rate parameter expresses the hazard of an individual in an asocial environment (for instance, the rate of asocial learning). This parameter is frequently termed a ‘nuisance’ parameter because the parameter of interest in the NBDA analyses is predominantly $s'$. The full parameter vector $\theta$ is defined as $\theta = \{ s', \lambda_0, \epsilon, \sigma_{\epsilon}^{2}\}$, where $\epsilon$ refers to random effects at the task level. The variance term $\sigma_{\epsilon}^{2}$ denotes the variance for the distribution of the task level random effects.



Source: Quantifying diffusion in social networks: a Bayesian approach. Available from: https://www.researchgate.net/publication/270048687_Quantifying_diffusion_in_social_networks_a_Bayesian_approach [accessed Oct 24 2024].