# Biterm Topic Modeling Algorithm Introduction

### Joseph Jinn and Keith VanderLinden

This Jupyter Notebook file provides a very simple high-level overview of the Biterm topic modeling algorithm.  We briefly discuss the plate notation diagram, pseudocode, and statistical formula for the model.

### Biterm Model:

Note: The description below is taken from https://github.com/bnosac/BTM.

The Biterm Topic Model (BTM) is a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns (i.e., biterms).

* A biterm consists of two words co-occurring in the same context, for example, in the same short text window.
* BTM models the biterm occurrences in a corpus (unlike LDA, which models the word occurrences in a document).
* It's a generative model. In the generation procedure, a biterm is generated by drawing two words independently from the same topic $z$. In other words, the distribution of a biterm $b=(w_{i},w_{j})$ is defined as: $P(b) = \sum_{z}{P(w_{i}|z) \, P(w_{j}|z) \, P(z)}$, where the sum runs over the $K$ topics you want to extract.
* Estimation of the topic model is done with the Gibbs sampling algorithm, which provides estimates for $P(w|z)=\phi$ and $P(z)=\theta$.
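
To make the notion of a biterm concrete, here is a minimal sketch of how biterms could be extracted from one tokenized short text. The function name `extract_biterms` and the `max_distance` parameter are illustrative assumptions, not part of any BTM library; each pair is stored in sorted order because a biterm is an unordered pair.

```python
from itertools import combinations


def extract_biterms(tokens, max_distance=None):
    """Return the unordered word pairs (biterms) found in one short text.

    max_distance=None treats the whole text as a single context window.
    Otherwise only words at most `max_distance` positions apart are paired.
    """
    biterms = []
    for (i, w1), (j, w2) in combinations(enumerate(tokens), 2):
        if max_distance is None or j - i <= max_distance:
            # Sort each pair so (a, b) and (b, a) count as the same biterm.
            biterms.append(tuple(sorted((w1, w2))))
    return biterms


doc = ["apple", "iphone", "store"]
print(extract_biterms(doc))
# [('apple', 'iphone'), ('apple', 'store'), ('iphone', 'store')]
```

A text with $n$ words yields $n(n-1)/2$ biterms when the whole text is one window, which is why BTM is usually applied to short texts.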

![biterm](../images/biterm_visualization.png)

### Plate Notation for the Biterm Algorithm:

![biterm](../images/biterm_model.png)

Explanation of the notation for Figure 1.c:

$\theta$ - topic distribution for the entire corpus.

$\phi$ - topic-specific word distribution.

$B$ - the entire set of biterms for the corpus.

$K$ - the number of topics for the corpus.

$z$ - a single topic.

$w_{i}$ - a single word.

$w_{j}$ - a single word.

### Generative Process and Statistical Formula for the Biterm Algorithm:

1. For each topic $z$
    - (a) draw a topic-specific word distribution $\phi_{z} \sim \mathrm{Dir}(\beta)$


2. Draw a topic distribution $\theta \sim \mathrm{Dir}(\alpha)$ for the whole collection


3. For each biterm $b$ in the biterm set $B$
    - (a) draw a topic assignment $z \sim \mathrm{Multi}(\theta)$
    - (b) draw two words: $w_{i}, w_{j} \sim \mathrm{Multi}(\phi_{z})$
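
The three steps above can be simulated directly. The following is a minimal sketch using only the Python standard library; the names `sample_dirichlet` and `generate_biterms`, the toy vocabulary, and the hyperparameter values are assumptions chosen for illustration. A symmetric Dirichlet draw is obtained by normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_biterms(vocab, num_topics, num_biterms, alpha, beta, seed=0):
    """Simulate the BTM generative process and return (theta, phi, biterms)."""
    rng = random.Random(seed)
    # Step 1: one word distribution phi_z ~ Dir(beta) per topic.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    # Step 2: one topic distribution theta ~ Dir(alpha) for the whole collection.
    theta = sample_dirichlet(alpha, num_topics, rng)
    # Step 3: for each biterm, draw a topic, then two words from that topic.
    biterms = []
    for _ in range(num_biterms):
        z = rng.choices(range(num_topics), weights=theta)[0]
        w_i, w_j = rng.choices(vocab, weights=phi[z], k=2)
        biterms.append((w_i, w_j))
    return theta, phi, biterms


vocab = ["apple", "iphone", "banana", "fruit"]  # toy vocabulary (assumption)
theta, phi, biterms = generate_biterms(vocab, num_topics=2, num_biterms=5,
                                       alpha=1.0, beta=0.5)
print(biterms)
```

Note that both words of a biterm share the same topic draw, which is the key difference from LDA, where every word position gets its own topic.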

![biterm formula](../images/biterm_equation.png)

$\theta$ - topic distribution for the entire corpus.

$\phi$ - topic-specific word distribution.

$b = (w_{i}, w_{j})$ - a single biterm (word co-occurrence pair).

$B$ - the entire set of biterms for the corpus.

$z$ - a single topic.

$w_{i}$ - a single word.

$w_{j}$ - a single word.
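
As a small worked example of the biterm distribution $P(b) = \sum_{z}{P(z) \, P(w_{i}|z) \, P(w_{j}|z)}$, the sketch below evaluates it for two topics. The values of `theta` and `phi` are made-up numbers for illustration only, and `biterm_probability` is a hypothetical helper name.

```python
def biterm_probability(w_i, w_j, theta, phi):
    """P(b) = sum over topics z of P(z) * P(w_i | z) * P(w_j | z)."""
    return sum(theta[z] * phi[z][w_i] * phi[z][w_j] for z in range(len(theta)))


# Made-up parameters: two topics over a three-word vocabulary.
theta = [0.6, 0.4]
phi = [
    {"apple": 0.5, "iphone": 0.4, "banana": 0.1},  # topic 0
    {"apple": 0.3, "iphone": 0.1, "banana": 0.6},  # topic 1
]

p = biterm_probability("apple", "iphone", theta, phi)
# 0.6 * 0.5 * 0.4 + 0.4 * 0.3 * 0.1 = 0.12 + 0.012
print(round(p, 3))  # 0.132
```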

### Pseudocode for the Gibbs Sampler for the Biterm algorithm:

![biterm pseudocode](../images/biterm_pseudocode.png)

The model uses collapsed Gibbs sampling, an approximate (not exact) inference method, to estimate $\phi$ and $\theta$.

![biterm formula 2](../images/biterm_equation_2.png)

Explanation of the notation used in the pseudocode and in Equations 5 and 6 above:

$\alpha$ - the Dirichlet prior hyperparameter for the topic distribution $\theta$.

$\beta$ - the Dirichlet prior hyperparameter for the topic-specific word distributions $\phi$.

$\theta$ - topic distribution for the entire corpus.

$\phi$ - topic-specific word distribution.

$b = (w_{i}, w_{j})$ - a single biterm (word co-occurrence pair).

$B$ - the entire set of biterms for the corpus.

$z$ - a single topic.

$w_{i}$ - a single word.

$w_{j}$ - a single word.

$z_{b}$ - the topic assignment for biterm $b$.

$z_{-b}$ - the topic assignments for all biterms except $b$.

$n_{z}$ - the # of biterms assigned to topic $z$.

$n_{w|z}$ - the # of times word $w$ is assigned to topic $z$.

$|B|$ - the total # of biterms.
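
The notation above can be tied together in a minimal sketch of the collapsed Gibbs sampler. It assumes the conditional takes the form commonly given for BTM, $P(z|z_{-b}, B, \alpha, \beta) \propto (n_{z} + \alpha) \frac{(n_{w_{i}|z} + \beta)(n_{w_{j}|z} + \beta)}{(\sum_{w} n_{w|z} + M\beta)^{2}}$, and that the point estimates are $\phi_{w|z} = \frac{n_{w|z} + \beta}{\sum_{w} n_{w|z} + M\beta}$ and $\theta_{z} = \frac{n_{z} + \alpha}{|B| + K\alpha}$, where $M$ is the vocabulary size. The function name `fit_btm`, the default hyperparameters, and the toy biterms are illustrative assumptions, not the reference implementation.

```python
import random


def fit_btm(biterms, num_topics, alpha=1.0, beta=0.01, num_iters=50, seed=0):
    """Sketch of collapsed Gibbs sampling for BTM; returns (theta, phi, z_b)."""
    rng = random.Random(seed)
    vocab = sorted({w for b in biterms for w in b})
    M = len(vocab)
    topics = range(num_topics)

    n_z = [0] * num_topics                             # biterms per topic
    n_wz = [dict.fromkeys(vocab, 0) for _ in topics]   # word counts per topic

    def update_counts(z, w_i, w_j, delta):
        n_z[z] += delta
        n_wz[z][w_i] += delta
        n_wz[z][w_j] += delta

    # Randomly assign an initial topic to every biterm.
    z_b = []
    for w_i, w_j in biterms:
        z = rng.randrange(num_topics)
        z_b.append(z)
        update_counts(z, w_i, w_j, +1)

    for _ in range(num_iters):
        for idx, (w_i, w_j) in enumerate(biterms):
            # Remove biterm b so the counts reflect z_{-b}.
            update_counts(z_b[idx], w_i, w_j, -1)
            # Unnormalized conditional probability of each topic for b.
            weights = []
            for k in topics:
                denom = 2 * n_z[k] + M * beta  # sum_w n_{w|k} equals 2 * n_k
                weights.append((n_z[k] + alpha)
                               * (n_wz[k][w_i] + beta)
                               * (n_wz[k][w_j] + beta) / (denom * denom))
            # Draw the new topic and add b back into the counts.
            z_b[idx] = rng.choices(topics, weights=weights)[0]
            update_counts(z_b[idx], w_i, w_j, +1)

    # Point estimates of phi and theta from the final counts.
    phi = [{w: (n_wz[k][w] + beta) / (2 * n_z[k] + M * beta) for w in vocab}
           for k in topics]
    theta = [(n_z[k] + alpha) / (len(biterms) + num_topics * alpha)
             for k in topics]
    return theta, phi, z_b


# Toy biterms (assumption): two loosely separated themes.
toy_biterms = [("apple", "iphone"), ("iphone", "store"), ("apple", "store"),
               ("banana", "fruit"), ("fruit", "juice"), ("banana", "juice")]
theta, phi, z_b = fit_btm(toy_biterms, num_topics=2)
print([round(t, 3) for t in theta])
```

Because each biterm contributes exactly two words to its topic, $\sum_{w} n_{w|z} = 2 n_{z}$, which is why the sketch never stores that sum separately.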

## A Simplified Biterm Topic Modeling Algorithm Example:

Placeholder.

**TODO - implement simple hand-worked example of one iteration through the algorithm (provided we can find an example)** 

### Completion of the FIRST iteration of the Biterm algorithm:

Repeat the same steps for each remaining iteration.

## Resources Referenced:

- https://en.wikipedia.org/wiki/Greek_alphabet
    - for Greek alphabet name reference.
    

- https://sutheeblog.wordpress.com/2017/03/20/a-biterm-topic-model-for-short-texts/
    - blog explaining the biterm topic model in a more palatable way.


- https://www.cs.toronto.edu/~jstolee/projects/topic.pdf
    - contains a short section explaining the algorithm; includes plate notation diagram.
    

- https://stackoverflow.com/questions/29786985/whats-the-disadvantage-of-lda-for-short-texts
    - contains a response by the author of the biterm model.
 
 
- https://github.com/bnosac/BTM
    - contains a brief description of what a biterm is and what the Biterm Topic Model is.
    

- https://www.slideserve.com/baeddan-williams/a-biterm-topic-model-for-short-texts
    - presentation slides containing a nice visual representation of the BTM.
    