# Appendix

- Perpay is hiring
- Link to slides
- Resources

## As Seen on Machine Learning

::: {.center style="font-size: .8em;"}
<table>
    <tr>
        <td>
            Constraint
        </td>
        <td></td>
        <td>
            Transformation
        </td>
        <td></td>
    </tr>
    <tr>
        <td>
            $f(\theta)\in [0, 1]$
        </td>
        <td>
            $\rightarrow$
        </td>
        <td>
            $\frac{1}{1+e^{-x}}$
        </td>
        <td>
            sigmoid
        </td>
    </tr>
    <tr>
        <td>
            $f(\theta)\in \mathbb{R}^{+}$
        </td>
        <td>
            $\rightarrow$
        </td>
        <td>
            $\ln(1+e^{x})$
        </td>
        <td>
            softplus
        </td>
    </tr>
    <tr>
        <td>
            $f(\theta)\in [0, 1]^{n}, \langle f(\theta),\textbf{1}\rangle=1$
        </td>
        <td>
            $\rightarrow$
        </td>
        <td>
            $\frac{e^{x_{i}}}{\sum_{j=1}^{K}{e^{x_{j}}}}$
        </td>
        <td>
            softmax
        </td>
    </tr>
</table>
:::
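
The three transformations in the table can be sketched in NumPy as follows. This is a minimal illustration, not the code used in the talk: the function names and the sample input `x` are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    """Map any real number to a strictly positive value."""
    # logaddexp(0, x) = ln(e^0 + e^x) = ln(1 + e^x), computed stably
    return np.logaddexp(0.0, x)

def softmax(x):
    """Map a real vector to non-negative entries that sum to 1."""
    shifted = x - np.max(x)  # subtracting the max avoids overflow; result is unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

x = np.array([-3.0, 0.0, 2.5])  # arbitrary unconstrained values (hypothetical)

bounded = sigmoid(x)       # every entry lies strictly between 0 and 1
positive = softplus(x)     # every entry is strictly positive
simplex = softmax(x)       # entries are non-negative and sum to 1
```

Because each function accepts any real input, an optimizer can work on the unconstrained `x` while the transformed value always satisfies the constraint.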

## As Seen on Machine Learning: Transformations

$$\sigma(x)=\frac{1}{1+e^{-x}}$$

$$s\left(\vec{z}\right)_{i}=\frac{e^{z_{i}}}{\sum_{j=1}^{K}{e^{z_{j}}}}$$

$$f(x)=\ln(1+e^{x})$$

$$f(\theta)=\sum_{k=1}^{\infty}{\beta_{k}\cdot\delta_{\theta_{k}}(\theta)}$$

$$\underset{w}{\text{argmin}}\  \mathcal{L}(f(w),y) + \lambda \lVert w\rVert^{2}_{2}$$
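
As a small illustration of the penalized objective above, here is a sketch that assumes a squared-error loss and a linear model $f(w)=Xw$; neither is specified on the slide. Under those assumptions the minimizer has a closed form (ridge regression). The data and `true_w` are made up for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise ||X w - y||^2 + lam * ||w||^2 in closed form."""
    n_features = X.shape[1]
    # Normal equations with the penalty added to the diagonal
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth weights
y = X @ true_w + rng.normal(scale=0.1, size=50)

w_plain = ridge_fit(X, y, lam=0.0)   # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # penalty pulls the weights toward zero

print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

The penalty acts as a soft constraint on the size of $w$: a larger $\lambda$ gives a solution with a smaller norm.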

# Lifetimes

## CLV

Imagine a customer with $x$ transactions during the period $(0,T]$.

- Can we estimate the customer's future lifetime value?
- How many more transactions will they generate before they churn?

![](presentation-assets/clv.png)

## CLV: Beta-Geo Model

![](presentation-assets/beta-geo.png)
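
To make the setting concrete, the sketch below simulates purchase histories under one common reading of a "beta-geometric" model (BG/NBD). This is an assumption about what the figure shows. While active, a customer buys at a constant rate, and after each purchase they churn with a fixed probability. Rates are drawn from a gamma distribution and churn probabilities from a beta distribution; all hyperparameters and names are hypothetical.

```python
import numpy as np

def simulate_customer(rate, churn_prob, horizon, rng):
    """Simulate one customer's purchase times on (0, horizon].

    While active, purchases arrive as a Poisson process with `rate`;
    after each purchase the customer churns with probability `churn_prob`.
    """
    times = []
    t = rng.exponential(1.0 / rate)  # waiting time until the first purchase
    while t <= horizon:
        times.append(t)
        if rng.random() < churn_prob:  # dropout can only happen after a purchase
            break
        t += rng.exponential(1.0 / rate)
    return times

rng = np.random.default_rng(0)
horizon = 52.0       # hypothetical observation window T
n_customers = 1000

# Heterogeneity across customers (hypothetical hyperparameters)
rates = rng.gamma(shape=2.0, scale=0.1, size=n_customers)
churn_probs = rng.beta(a=1.0, b=4.0, size=n_customers)

histories = [
    simulate_customer(r, p, horizon, rng)
    for r, p in zip(rates, churn_probs)
]
counts = np.array([len(h) for h in histories])  # x for each customer
print(counts.mean())
```

Fitting the model runs this logic in reverse: given each customer's observed history on $(0,T]$, estimate the gamma and beta hyperparameters, then predict future transactions.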

## Simple Example
$$
\begin{align*}
\underset{\theta}{\text{argmin}}&\ f({\theta, X}) &\text{Objective Function}\\
\text{s.t.}& &\text{Subject to}\\
&\ \theta_{i}\in{\mathbb{Z}} &\text{Assign whole resources}\\
&\ \theta_{i}\geq 0 &\text{Non-negativity}\\
&\ \sum_{i=1}^{n}{\theta_{i}} \leq B &\text{Budget} \\
\end{align*}
$$
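
For a tiny instance, the problem above can be solved by exhaustive search. The sketch below treats the bounds as inclusive (non-negative integers whose total is at most the budget), which is an assumption. The objective, `targets`, and budget are made up for illustration.

```python
from itertools import product

def best_allocation(objective, n, budget):
    """Search every integer allocation theta with sum(theta) <= budget."""
    best_theta, best_value = None, float("inf")
    for theta in product(range(budget + 1), repeat=n):
        if sum(theta) > budget:  # budget constraint
            continue
        value = objective(theta)
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value

# Hypothetical objective: squared distance from fractional targets
targets = (2.5, 1.2, 4.0)

def objective(theta):
    return sum((t - g) ** 2 for t, g in zip(theta, targets))

theta, value = best_allocation(objective, n=3, budget=6)
print(theta, value)
```

Exhaustive search grows exponentially with $n$, which is why larger problems are handed to an integer programming solver or relaxed into a continuous problem.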

## Tips
### Absolute Value

$$
\begin{align}
\underset{A\in{\{0,1\}}^{n,m}}{\text{argmin}}\ \sum_{i=1}^{n}{U_{i}}&&\\
&g_{t}-\sum_{j=1}^{m}{A_{i,j}M_{j}}\leq U_{i} & i=1\ldots n \\
&g_{t}-\sum_{j=1}^{m}{A_{i,j}M_{j}}\geq -U_{i} & i=1\ldots n \\
\end{align}
$$
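
The two inequalities above replace an absolute value with a linear formulation: the smallest $U$ satisfying both is exactly $|g-\sum_{j}A_{j}M_{j}|$. The brute-force check below illustrates this for a single row of $A$. The values `M` and `g` are hypothetical, and a real model would pass the linear constraints to a solver instead of enumerating.

```python
from itertools import product

def smallest_feasible_u(residual):
    """Smallest U with residual <= U and residual >= -U."""
    # U must be at least residual and at least -residual
    return max(residual, -residual)

M = [3.0, 5.0, 4.0]  # hypothetical item values M_j
g = 7.2              # hypothetical target

rows = []
for a in product([0, 1], repeat=len(M)):  # every binary assignment for one row
    residual = g - sum(a_j * m_j for a_j, m_j in zip(a, M))
    rows.append((smallest_feasible_u(residual), a, residual))

best_u, best_a, best_residual = min(rows)
print(best_a, best_u)
```

Minimizing $U$ under the two constraints therefore minimizes the absolute deviation, while the objective and the constraints stay linear.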

## Objective function: likelihood

::: {.incremental}
- ~~Character frequency~~
- ~~Word frequency~~
- Conditional probability
:::
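
One way to read "conditional probability" as an objective is a character bigram model: score a text by the probability of each character given the one before it. The sketch below assumes that reading. The toy corpus, the add-one smoothing, and all names are illustrative choices, not taken from the talk.

```python
import math
from collections import Counter

def fit_bigram_model(corpus, alphabet):
    """Estimate P(next char | current char) with add-one smoothing."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    first_counts = Counter(corpus[:-1])
    size = len(alphabet)
    return {
        (a, b): (pair_counts[(a, b)] + 1) / (first_counts[a] + size)
        for a in alphabet
        for b in alphabet
    }

def log_likelihood(text, model):
    """Sum of log conditional probabilities over consecutive characters."""
    return sum(math.log(model[pair]) for pair in zip(text, text[1:]))

corpus = "the cat sat on the mat and the hat sat on the cat"  # toy corpus
alphabet = sorted(set(corpus))
model = fit_bigram_model(corpus, alphabet)

plausible = log_likelihood("the cat", model)
scrambled = log_likelihood("hte tac", model)
print(plausible, scrambled)
```

Both strings contain the same characters, so a character-frequency score cannot separate them. The bigram score can, because it depends on the order of the characters.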

## Model fitting

```python
import numpy as np
from tqdm.notebook import tqdm

# make_π, score, σ, X, V, mask, and optimizer are not shown here.

def doubly_stochastic_constraint(π):
    """Penalty: total absolute deviation of row and column sums from 1."""
    return sum(
        (1 - π.sum(axis=axis)).abs().sum()
        for axis in [0, 1]
    )

last_loss = np.inf
with tqdm(range(10_000)) as pbar:
    for _ in pbar:
        π = make_π(σ)
        # Negative mean score plus the doubly stochastic penalty
        loss = (
            -score(π, X, V, mask).mean()
            + doubly_stochastic_constraint(π)
        )
        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        loss = loss.item()
        pbar.set_postfix(loss=f'{loss:>7f}')
        # Stop once the loss has stopped changing
        if abs(last_loss - loss) < 1e-8:
            break
        last_loss = loss
```
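
To see what the penalty term measures, here is a NumPy version of the same computation applied to three small matrices. It is a stand-in for illustration only: the training loop above operates on tensors, and the example matrices are made up.

```python
import numpy as np

def doubly_stochastic_penalty(pi):
    """Sum of absolute deviations of every row sum and column sum from 1."""
    return sum(np.abs(1 - pi.sum(axis=axis)).sum() for axis in (0, 1))

identity = np.eye(3)              # a permutation matrix
uniform = np.full((3, 3), 1 / 3)  # doubly stochastic, but not a permutation
lopsided = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])                                # columns sum to 1, rows do not

penalties = {
    "identity": doubly_stochastic_penalty(identity),
    "uniform": doubly_stochastic_penalty(uniform),
    "lopsided": doubly_stochastic_penalty(lopsided),
}
print(penalties)
```

The penalty is zero for any doubly stochastic matrix, including the uniform one, so on its own it does not force a permutation. In the loss it is paired with the score term.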