# Latent Dirichlet Allocation (LDA) Introduction

### Joseph Jinn and Keith VanderLinden

<span style="font-family:Papyrus; font-size:1.25em;">

<p>Latent Dirichlet Allocation (LDA) is a probabilistic method of text analysis for topic modeling.  It identifies the topics that exist within a set of documents and maps those documents to their associated topics.  The process uses a bag-of-words feature representation for the documents of interest.  In LDA, each document is described by a distribution over topics and each topic is described by a distribution over words.  An LDA model has two primary layers.  The observed layer consists of the documents (also called composites) and the words that comprise those documents (the parts).  The hidden (or latent) layer consists of the topics (also called categories) as well as the other variables used by the algorithm.  The output of the algorithm is a list of the topics associated with the entire set of documents and the top words associated with each topic.  The topics themselves are identified only by integer indices; English descriptors are assigned to them afterwards, based on their top words.</p>

</span>

### Plate Notation for the LDA Algorithm:

<span style="font-family:Papyrus; font-size:1.25em;">
    
<p>Plate notation is a method for displaying repeated variables in a graphical model. ("Plate notation" 2019)  The LDA algorithm contains variables that are repeated within three nested for loops, so we find it useful to depict them in this form.  Each rectangle (plate) is a subgraph of grouped variables that are repeated together.  The variables are represented as circles, while the notation in the lower right-hand corner of each plate indicates how many times it is repeated ("M" or "k" times, for example).  The diagram as a whole represents the observed and latent variables and the parameters used in LDA, with the directed edges indicating their interdependencies.  The list below describes the purpose of each variable and notation in the plate diagram. (Ganegedara & Ganegedara "Intuitive Guide to Latent Dirichlet Allocation" 2018)</p><br>

</span>

![lda](images/lda_model.jpeg)

<span style="font-family:Papyrus; font-size:1.25em;">

k — Number of topics in the model (a fixed number chosen in advance).<br>

V — Size of the vocabulary.<br>

M — Number of documents.<br>

N — Number of words in each document.<br>

w — A word in a document. This is represented as a one-hot encoded vector of size V (the vocabulary size).<br>

z — A topic from a set of k topics. A topic is a distribution of words. For example, it might be Animal = (0.3 Cats, 0.4 Dogs, 0 AI, 0.2 Loyal, 0.1 Evil).<br>

α — Distribution-related parameter that governs what the distribution of topics looks like for all the documents in the corpus.<br>

η — Distribution-related parameter that governs what the distribution of words in each topic looks like.<br>

θ — A random matrix where θ(i,j) represents the probability of the i-th document containing the j-th topic.<br>

β — A random matrix where β(i,j) represents the probability of the i-th topic containing the j-th word.<br>

</span>

<span style="font-family:Papyrus; font-size:1.25em;">

<p>The plate notation above represents the algorithm in a graphical format.  α is the parameter of the Dirichlet prior that influences the topic-document distribution described by θ.  η is the parameter of the Dirichlet prior that influences the word-topic distribution described by β.  While each is shown as a single value in the diagram, α and η are in general 1-d vectors: α has one entry per topic (length k) and η has one entry per vocabulary word (length V).</p><br><br>

<p>The largest plate surrounds all the variables related to a single document in the set of documents (M) that comprise the corpus of interest.  The plate indicates that the variables contained within are repeated M times, once for each document, which also corresponds to a for loop in the pseudocode for the algorithm.</p><br><br>

<p>The smaller plate within the largest plate surrounds all the variables related to a single word within a single document.  The plate indicates that the variables contained within are repeated N times, once for each of the N words that comprise each of the M documents.  This smaller plate represents a nested for loop within the outer for loop represented by the largest plate.</p><br><br>

<p>Within the smaller plate, the variable "z" represents a single topic drawn from the document's topic distribution; that topic is in turn a distribution over words.  The variable "w" represents the actual word itself.</p><br><br>
    
<p>"w" is shaded because it is an observed variable belonging to the observed layer.  All other variables are unshaded as they belong to the latent (hidden) layer that cannot be directly observed.</p><br><br>

<p>The directed edges between the circles indicate dependencies between the variables.  The variable at the head of an edge depends on the variable at its tail.</p><br><br>

<p>The topmost plate surrounds the β word-topic distribution and indicates a for loop where we determine the word-topic distribution for each topic in the set of topics (k).  This is similar to the largest plate surrounding the θ topic-document distribution, where there is a for loop that determines the topic-document distribution for each document in the set of documents (M).</p><br><br>

</span>
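
The plates and edges described above can be read as a generative recipe. The sketch below is a minimal illustration of that reading, assuming NumPy is available and that α and η are symmetric (a single value repeated across the vector); the function name `generate_corpus` and the chosen parameter values are hypothetical and only for illustration.

```python
import numpy as np

def generate_corpus(M, N, k, V, alpha, eta, seed=0):
    """Sample a toy corpus by following the plate diagram top-down."""
    rng = np.random.default_rng(seed)
    # Topmost plate (repeated k times): one word distribution per topic.
    beta = rng.dirichlet(np.full(V, eta), size=k)      # shape (k, V)
    # Largest plate (repeated M times): one topic distribution per document.
    theta = rng.dirichlet(np.full(k, alpha), size=M)   # shape (M, k)
    z = np.empty((M, N), dtype=int)  # latent topic of each word position
    w = np.empty((M, N), dtype=int)  # observed word index of each position
    for d in range(M):               # outer plate: documents
        for n in range(N):           # inner plate: word positions
            z[d, n] = rng.choice(k, p=theta[d])        # z depends on theta
            w[d, n] = rng.choice(V, p=beta[z[d, n]])   # w depends on z and beta
    return beta, theta, z, w

# Sizes borrowed from the simplified example later in this document.
beta, theta, z, w = generate_corpus(M=5, N=3, k=2, V=11, alpha=0.5, eta=0.5)
print(w)  # each row is one document, each entry a vocabulary index
```

Fitting an LDA model runs this story in reverse: only `w` is observed, and the algorithm has to infer `z`, `theta`, and `beta`.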

### Statistical Formula for Calculating "w" in the LDA Algorithm:

<span style="font-family:Papyrus; font-size:1.25em;">

The formula below summarizes the plate notation for the LDA algorithm as a probabilistic statement.  It encapsulates the idea that at each step of the algorithm all the latent (hidden) variables are updated and the revised values are then used to calculate the probability of the word (w) being assigned a given topic.<br>  

</span>

![mathematical_model](images/lda_equation.png)

<span style="font-family:Papyrus; font-size:1.25em;">

Given a set of M documents each containing N words with each word generated by a topic from a set of k topics, find the joint posterior probability of the theta (θ) distribution, beta distribution (β), and topic (z) given the data (D) using the prior parameters alpha (α) and eta (η). (Ganegedara & Ganegedara "Intuitive Guide to Latent Dirichlet Allocation" 2018)

</span>
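
For reference, the generative story in the plate diagram is commonly written as the joint distribution below. This is the standard LDA factorization expressed in this document's symbols; the subscripts $i$, $d$, and $n$ (topic, document, and word position) are added here as assumed indices, and the notation may differ from the figure above.

$$
p(\beta, \theta, z, w \mid \alpha, \eta) = \prod_{i=1}^{k} p(\beta_i \mid \eta) \prod_{d=1}^{M} p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}})
$$

Each factor corresponds to one edge group in the plate diagram: η feeds each β, α feeds each θ, θ feeds each z, and z together with β produces each w.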

<span style="font-family:Papyrus; font-size:1.25em;">

Joint posterior probability: 

The revised or updated probability of an event occurring given new information.<br>
Calculated by updating the prior probability using Bayes' Theorem.<br>
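In other words, it is a conditional probability: the probability of event A occurring given that event B has occurred.  It is "joint" here because it covers θ, β, and z together.<br>

Prior probability:

The probability of an event occurring before new information is given.<br>
It is the starting point that Bayes' Theorem updates, not something calculated from it.  In LDA, the priors are the Dirichlet distributions parameterized by α and η.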

</span>
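
In symbols, Bayes' Theorem relates the two quantities defined above. The general form is shown first, followed by a sketch of what it looks like for LDA, where $w$ stands for the observed words (the data D). The second line is written in this document's symbols as an illustration and is not a transcription of the figure.

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
$$

$$
p(\theta, \beta, z \mid w, \alpha, \eta) = \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{p(w \mid \alpha, \eta)}
$$

The denominator requires summing over every possible topic assignment, which is why practical LDA implementations approximate the posterior rather than compute it exactly.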

### Dirichlet Distribution Visualization (example):

<span style="font-family:Papyrus; font-size:1.25em;">

Dirichlet distributions are a family of continuous multivariate probability distributions parameterized by a vector (in our case α and η) of positive real numbers. ("Dirichlet distribution" 2019)  They are often used as priors in Bayesian statistics, and here they serve as priors for the theta (document-topic) and beta (topic-word) distributions used in the LDA algorithm.  Large values of α and η (above 1) push the probability mass toward the center of the simplex, whereas small values (below 1) push it toward the corners and edges.  Ideally, we want values that result in a distribution similar to the one displayed in the top middle.<br>

</span>

![dirichlet](images/dirichlet_distribution.png)

<span style="font-family:Papyrus; font-size:1.25em;">

<p>The graphs above visualize Dirichlet distributions using 3 topics (k = 3).  The values for α (alpha) and η (eta) influence the shape of the graphs.  By shape, we mean the shape of the probability density function that determines the θ and β distributions.  In this example, the graph is 3-d because we have k = 3 topics.  As k increases, the graphs would become k-dimensional Dirichlet distribution graphs.</p><br>

</span>
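
The effect of the parameter value on the shape can also be checked numerically. The sketch below assumes NumPy and a symmetric parameter (the same value for all k entries); the specific values 0.1, 1.0, and 10.0 and the name `mean_max_by_alpha` are chosen here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # number of topics, as in the figure

mean_max_by_alpha = {}
for alpha in (0.1, 1.0, 10.0):
    # Each sample is a length-k probability vector (a point on the simplex).
    samples = rng.dirichlet(np.full(k, alpha), size=5000)
    # Average size of the largest component: close to 1 means samples sit
    # near a corner, close to 1/k means they sit near the center.
    mean_max_by_alpha[alpha] = samples.max(axis=1).mean()

for alpha, mean_max in mean_max_by_alpha.items():
    print(f"alpha = {alpha:>4}: average largest component = {mean_max:.2f}")
```

Smaller values should give a larger average, reflecting documents dominated by a single topic, while larger values give averages closer to 1/k, reflecting documents that mix all topics evenly.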

### Pseudocode for the LDA algorithm:

<span style="font-family:Papyrus; font-size:1.25em;">
    
Assign topic (z) to each word (w) in each document (d) (randomly or based on some probabilistic distribution)

while(NOT exhausted time constraints)

    for each document (d)
        for each word (w)
            for each topic(z)

                Compute Probability(topic (z) | document (d))
                Compute Probability(word (w) | topic (z))

            Assign a new topic (z') to word (w) in the document (d), sampled using the computed probabilities.

</span>

<span style="font-family:Papyrus; font-size:1.25em;">

<p>The algorithm for Latent Dirichlet Allocation iteratively assigns a topic to each word in each document based on the computed conditional probabilities of a topic belonging to a document and a word belonging to a topic.  This is repeated until the allocated computation time is exhausted.</p><br>

</span>
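
The pseudocode above can be sketched in Python as follows. This is a simplified, count-based illustration that mirrors the worked example later in this document, not a production sampler: it uses a fixed iteration count in place of a time limit, recounts after every reassignment, and omits details that a full collapsed Gibbs sampler would include (removing the current word from the counts and smoothing them with α and η). The function name `run_simplified_lda` is hypothetical.

```python
import random
from collections import Counter

def run_simplified_lda(docs, k, iterations=20, seed=0):
    """Repeatedly resample a topic for every word from simple count products."""
    rng = random.Random(seed)
    # Step 1: assign a random topic (0 .. k-1) to every word in every document.
    z = [[rng.randrange(k) for _ in doc] for doc in docs]
    for _ in range(iterations):                 # stands in for the time limit
        for d, doc in enumerate(docs):          # for each document
            for n, word in enumerate(doc):      # for each word
                # How often each (word, topic) pair occurs across all documents.
                word_topic = Counter(
                    (w, t) for dd, zz in zip(docs, z) for w, t in zip(dd, zz)
                )
                # How often each topic occurs in this document.
                doc_topic = Counter(z[d])
                # For each topic: (word | topic) count times (topic | document) count.
                weights = [word_topic[(word, t)] * doc_topic[t] for t in range(k)]
                # Assign the new topic by sampling in proportion to the weights.
                z[d][n] = rng.choices(range(k), weights=weights)[0]
    return z

docs = [
    ["eat", "broccoli", "banana"],
    ["banana", "spinach", "lunch"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "kitten", "today"],
    ["hamster", "eat", "broccoli"],
]
for doc, topics in zip(docs, run_simplified_lda(docs, k=2)):
    print(list(zip(doc, topics)))
```

The weight for a word's current topic is always at least 1, so the sampling step never sees an all-zero weight list.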

## A Simplified Latent Dirichlet Example:

### Topics for our example:

| Topics (k=2) |
|--------------|
| Topic 1      |
| Topic 2      |

<span style="font-family:Papyrus; font-size:1.25em;">

It should be noted that the topics in an LDA model are just integer indices (for example, 0 to k - 1) and are not described by any sort of noun, verb, etc.  We later assign "food" and "animals" as the descriptors for the two topics once we see that the top N words for each indexed topic are strongly associated with those descriptors.  The number of topics and the number of top words shown for each topic are hyperparameters set by the user.<br>

</span>

### Initial topic assignment for each word in each document:

|    Documents (M = 5, N = 3)    |   Word 1   |  Word 2  |  Word 3  |   |
|:------------------------------:|:----------:|:--------:|:--------:|:-:|
| Doc 1 Word Topic Assignment--> |      1     |     2    |     1    |   |
|           Document 1           |     eat    | broccoli |  banana  |   |
| Doc 2 Word Topic Assignment--> |      2     |     1    |     2    |   |
|           Document 2           |   banana   |  spinach |   lunch  |   |
| Doc 3 Word Topic Assignment--> |      1     |     2    |     1    |   |
|           Document 3           | chinchilla |  kitten  |   cute   |   |
| Doc 4 Word Topic Assignment--> |      2     |     1    |     2    |   |
|           Document 4           |   sister   |  kitten  |   today  |   |
| Doc 5 Word Topic Assignment--> |      1     |     2    |     1    |   |
|           Document 5           |   hamster  |    eat   | broccoli |   |

<span style="font-family:Papyrus; font-size:1.25em;">

The above is step 1 in the LDA algorithm pseudocode.  For the purposes of this example, we simply randomly assign a topic to each word for each document rather than use a probabilistic distribution.<br>

M = 5 indicates that we have five documents in total.<br>
N = 3 indicates that we have 3 words per document.<br>

</span>

### The list of unique words in our vocabulary (V):

| Words (V = 11) |
|----------------|
| eat            |
| broccoli       |
| banana         |
| spinach        |
| lunch          |
| chinchilla     |
| kitten         |
| cute           |
| sister         |
| today          |
| hamster        |

<span style="font-family:Papyrus; font-size:1.25em;">

The above lists all the unique words in our vocabulary across all documents.  These are the words to which we will assign topics from our set of k topics.<br>

</span>
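
As a small illustration, the vocabulary can be derived directly from the five example documents. The variable names `docs` and `vocab` are assumptions made for this sketch.

```python
# The five example documents, three words each.
docs = [
    ["eat", "broccoli", "banana"],
    ["banana", "spinach", "lunch"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "kitten", "today"],
    ["hamster", "eat", "broccoli"],
]

# dict.fromkeys drops duplicates while keeping first-seen order,
# so the result lines up with the vocabulary table above.
vocab = list(dict.fromkeys(word for doc in docs for word in doc))

print(len(vocab))  # V, the vocabulary size
print(vocab)
```

Fifteen word positions collapse to eleven unique words because "eat", "broccoli", "banana", and "kitten" each appear twice.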

### Computing the β (Beta) Distribution:
<br>
β — A distribution of words, one for each topic.<br>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-kiyi{font-weight:bold;border-color:inherit;text-align:left}
.tg .tg-u0o7{font-weight:bold;text-decoration:underline;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-xldj{border-color:inherit;text-align:left}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-fymr{font-weight:bold;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-xwhs{font-weight:bold;text-decoration:underline;border-color:inherit;text-align:left}
</style>
<table class="tg">
  <tr>
    <th class="tg-xldj"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
  </tr>
  <tr>
    <td class="tg-0pky"></td>
    <td class="tg-u0o7">Words</td>
    <td class="tg-fymr">eat</td>
    <td class="tg-fymr">broccoli</td>
    <td class="tg-fymr">banana</td>
    <td class="tg-fymr">spinach</td>
    <td class="tg-fymr">lunch</td>
    <td class="tg-fymr">chinchilla</td>
    <td class="tg-fymr">kitten</td>
    <td class="tg-fymr">cute</td>
    <td class="tg-fymr">sister</td>
    <td class="tg-fymr">today</td>
    <td class="tg-fymr">hamster</td>
  </tr>
  <tr>
    <td class="tg-xwhs">Topics</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
  </tr>
  <tr>
    <td class="tg-kiyi">1</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
  </tr>
  <tr>
    <td class="tg-fymr">2</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">

To compute the Beta distribution, we look at our initial topic assignment for each word in each document.<br>

We count the # of times each word is associated with a particular topic across all documents.<br>

For example, we see here that the word "eat" appears two times in total.  The first time "eat" appears, it is associated with topic 1.  The second time "eat" appears, it is associated with topic 2.<br>

Therefore, we put a 1 in the cell corresponding to Topic 1 and the Word "eat" and we also put a 1 in the cell corresponding to Topic 2 and the Word "eat".<br>

We do this for each word (w) in our vocabulary (V) across all documents (d) based on our initial topic assignment for each word in each document.<br>

Note: "placeholder" simply means that we are not inputting an actual value for the sake of simplicity in this example.

</span>
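
The counting described above can be sketched in a few lines. The data is copied from the initial topic assignment table; the names `docs`, `topics`, and `word_topic` are assumptions made for this illustration.

```python
from collections import Counter

docs = [
    ["eat", "broccoli", "banana"],
    ["banana", "spinach", "lunch"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "kitten", "today"],
    ["hamster", "eat", "broccoli"],
]
# Initial topic assignment for each word, row by row as in the table.
topics = [[1, 2, 1], [2, 1, 2], [1, 2, 1], [2, 1, 2], [1, 2, 1]]

# Count how many times each word is paired with each topic across all documents.
word_topic = Counter()
for doc, doc_topics in zip(docs, topics):
    for word, topic in zip(doc, doc_topics):
        word_topic[(topic, word)] += 1

for word in ("eat", "broccoli"):
    print(word, [word_topic[(t, word)] for t in (1, 2)])  # [Topic 1, Topic 2]
```

Both words come out as one count under each topic, matching the two filled-in columns of the table.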

### Computing the θ (Theta) Distribution:
<br>
θ — A distribution of topics, one for each document.<br>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-kiyi{font-weight:bold;border-color:inherit;text-align:left}
.tg .tg-xldj{border-color:inherit;text-align:left}
.tg .tg-xwhs{font-weight:bold;text-decoration:underline;border-color:inherit;text-align:left}
.tg .tg-fymr{font-weight:bold;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-xldj"></th>
    <th class="tg-xwhs">Documents</th>
    <th class="tg-kiyi">1</th>
    <th class="tg-fymr">2</th>
    <th class="tg-fymr">3</th>
    <th class="tg-fymr">4</th>
    <th class="tg-fymr">5</th>
  </tr>
  <tr>
    <td class="tg-xwhs">Topics</td>
    <td class="tg-xldj"></td>
    <td class="tg-xldj"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
  </tr>
  <tr>
    <td class="tg-kiyi">1</td>
    <td class="tg-xldj"></td>
    <td class="tg-xldj">2</td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
  </tr>
  <tr>
    <td class="tg-fymr">2</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">2</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">

To compute the Theta distribution, we look at our initial topic assignment for each word in each document.<br>

We count the # of times each document is associated with each topic in our set of topics.<br>

Since we have three words per document, we see that Document 1 is associated with Topic 1 two times since two words are associated with Topic 1.  We also see that Document 1 is associated with Topic 2 one time since one word is associated with Topic 2.<br>

Therefore, we put a 2 in the cell corresponding to Topic 1 and Document 1 and we also put a 1 in the cell corresponding to Topic 2 and Document 1.<br>

We do this for each topic (z) for each document (d) based on our initial topic assignment for each word in each document.<br>

Note: "placeholder" simply means that we are not inputting an actual value for the sake of simplicity in this example.

</span>
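
The same kind of counting produces the Theta table. This sketch reuses the initial topic assignments from the earlier table; the names `topics` and `doc_topic` are assumptions made for this illustration.

```python
# Initial topic assignment for each word in Documents 1 through 5.
topics = [[1, 2, 1], [2, 1, 2], [1, 2, 1], [2, 1, 2], [1, 2, 1]]
k = 2

# One row per topic, one column per document, as in the table above.
doc_topic = [
    [doc_topics.count(t) for doc_topics in topics]
    for t in range(1, k + 1)
]

for t, row in enumerate(doc_topic, start=1):
    print(f"Topic {t}:", row)
```

The first two columns reproduce the filled-in cells of the table: Document 1 has counts 2 and 1, and Document 2 has counts 1 and 2.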

### Updating the initial topic assignment for each word in each document:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-s268{text-align:left}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-s268"></th>
    <th class="tg-s268"></th>
    <th class="tg-s268"></th>
    <th class="tg-0lax"></th>
    <th class="tg-0lax"></th>
    <th class="tg-0lax"></th>
  </tr>
  <tr>
    <td class="tg-0lax"></td>
    <td class="tg-0lax">Document 1</td>
    <td class="tg-0lax">Document 2</td>
    <td class="tg-0lax">Document 3</td>
    <td class="tg-0lax">Document 4</td>
    <td class="tg-0lax">Document 5</td>
  </tr>
  <tr>
    <td class="tg-0lax">Broccoli-Topic 1</td>
    <td class="tg-0lax">1 X 2 = 2</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
  </tr>
  <tr>
    <td class="tg-0lax">Broccoli-Topic 2</td>
    <td class="tg-0lax">1 X 1 = 1</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">

In order to update our initial topic assignments for each word in each document, we look at the Beta and Theta distributions we calculated previously.<br>

Notice that "broccoli" is associated with Topic 1 one time and Topic 2 one time in the Beta distribution while Document 1 is associated with Topic 1 two times and Topic 2 one time in the Theta distribution.<br>

Now, to calculate the new topic (z) assignment for the word (w) "broccoli", we do some simple arithmetic operations.<br>

We multiply the value in the cell associated with Topic 1 and "broccoli" in the Beta distribution with the value in the cell associated with Topic 1 and Document 1 in the Theta distribution.  This gives us 1 X 2 = 2.<br>

We then multiply the value in the cell associated with Topic 2 and "broccoli" in the Beta distribution with the value in the cell associated with Topic 2 and Document 1 in the Theta distribution.  This gives us 1 X 1 = 1.<br>

##### Important Note:  This process is repeated for each word in each document BEFORE moving on to the next document.

</span>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-88nc{font-weight:bold;border-color:inherit;text-align:center}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-uys7{border-color:inherit;text-align:center}
.tg .tg-y2k2{font-weight:bold;text-decoration:underline;border-color:inherit;text-align:center}
.tg .tg-7btt{font-weight:bold;border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-9353{font-weight:bold;text-decoration:underline;border-color:inherit;text-align:center;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-uys7"></th>
    <th class="tg-y2k2">Documents (M = 5)</th>
    <th class="tg-88nc">Document 1</th>
    <th class="tg-7btt">Document 2</th>
    <th class="tg-7btt">Document 3</th>
    <th class="tg-7btt">Document 4</th>
    <th class="tg-7btt">Document 5</th>
  </tr>
  <tr>
    <td class="tg-9353">Words (in Vocabulary)</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-7btt">Eat-Topic 1</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
  </tr>
  <tr>
    <td class="tg-7btt">Eat-Topic 2</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
  </tr>
  <tr>
    <td class="tg-7btt">Broccoli-Topic 1</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow">1 X 2 = 2 --&gt; 2 / (2 + 1) = 2/3</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
  </tr>
  <tr>
    <td class="tg-7btt">Broccoli-Topic 2</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow">1 X 1 = 1 --&gt; 1 / (2 + 1) = 1/3</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
  </tr>
  <tr>
    <td class="tg-7btt">...</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-7btt">...</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-7btt">Hamster-Topic 1</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
  </tr>
  <tr>
    <td class="tg-7btt">Hamster-Topic 2</td>
    <td class="tg-c3ow"></td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
    <td class="tg-c3ow">placeholder</td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">

Next, we sum the resulting values for the above multiplications to obtain (1 X 2) + (1 X 1) = 3.<br>

Then, we divide the resulting values for each of the above multiplications by the value of the sum, 3.<br>

These are the probabilities we use to select a new topic for the word "broccoli".<br>

In this case, there is a 2/3 ≈ 0.667 chance that we assign "broccoli" to Topic 1 in Document 1 and a 1/3 ≈ 0.333 chance that we assign "broccoli" to Topic 2 in Document 1.<br>

Notice that we are assigning a new topic to the word "broccoli" in Document 1 according to PROBABILITIES that are calculated using the arithmetic operations above.<br>

We are NOT simply arbitrarily assigning a new topic (z) to the word (w) "broccoli".  Everything is based on the Beta and Theta distributions and the conditional probabilities in the LDA pseudocode described above.<br>

We repeat this for each word (w) in each document (d) in our set of documents (M).<br>

</span>
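
The arithmetic for "broccoli" in Document 1 can be reproduced as a short sketch. The counts are taken from the Beta and Theta tables above; the variable names and the fixed random seed are assumptions made for this illustration.

```python
import random
from fractions import Fraction

# Counts read off the tables above.
beta_counts = {1: 1, 2: 1}    # times "broccoli" is assigned to Topic 1 / Topic 2
theta_counts = {1: 2, 2: 1}   # times Document 1 is assigned to Topic 1 / Topic 2

# Multiply the two counts for each topic, then normalize by their sum.
products = {t: beta_counts[t] * theta_counts[t] for t in (1, 2)}
total = sum(products.values())
probs = {t: Fraction(p, total) for t, p in products.items()}
print(probs[1], probs[2])  # 2/3 and 1/3

# Draw the new topic at random in proportion to those probabilities.
rng = random.Random(0)
new_topic = rng.choices(list(probs), weights=[float(p) for p in probs.values()])[0]
print("new topic for 'broccoli' in Document 1:", new_topic)
```

Using `Fraction` keeps the results exact, which makes it easy to compare them with the 2/3 and 1/3 shown in the table.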

### Updated topic assignment for "broccoli" in Document 1:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-c3ow">Documents (M = 5, N = 3)</th>
    <th class="tg-c3ow">Word 1</th>
    <th class="tg-c3ow">Word 2</th>
    <th class="tg-c3ow">Word 3</th>
    <th class="tg-c3ow"></th>
  </tr>
  <tr>
    <td class="tg-c3ow">Doc 1 Word Topic Assignment--&gt;</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow">2 --&gt; 1</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Document 1</td>
    <td class="tg-c3ow">eat</td>
    <td class="tg-c3ow">broccoli</td>
    <td class="tg-c3ow">banana</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Doc 2 Word Topic Assignment--&gt;</td>
    <td class="tg-c3ow">2</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow">2</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Document 2</td>
    <td class="tg-c3ow">banana</td>
    <td class="tg-c3ow">spinach</td>
    <td class="tg-c3ow">lunch</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Doc 3 Word Topic Assignment--&gt;</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow">2</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Document 3</td>
    <td class="tg-c3ow">chinchilla</td>
    <td class="tg-c3ow">kitten</td>
    <td class="tg-c3ow">cute</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Doc 4 Word Topic Assignment--&gt;</td>
    <td class="tg-c3ow">2</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow">2</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Document 4</td>
    <td class="tg-c3ow">sister</td>
    <td class="tg-c3ow">kitten</td>
    <td class="tg-c3ow">today</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Doc 5 Word Topic Assignment--&gt;</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow">2</td>
    <td class="tg-c3ow">1</td>
    <td class="tg-c3ow"></td>
  </tr>
  <tr>
    <td class="tg-c3ow">Document 5</td>
    <td class="tg-c3ow">hamster</td>
    <td class="tg-c3ow">eat</td>
    <td class="tg-c3ow">broccoli</td>
    <td class="tg-c3ow"></td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">
    
The table above shows the new topic (z) assigned to the word (w) "broccoli", ASSUMING that the random draw based on the probabilities we just calculated selects Topic 1 for "broccoli" in Document 1.<br>

It is important to know that we could also have assigned "broccoli" to Topic 2 instead.  However, based on the calculated probabilities for each topic (z) in our set of topics (k), a randomized selection is twice as likely to pick Topic 1 as Topic 2 (since Topic 1 = 2/3 chance and Topic 2 = 1/3 chance).<br>

In an actual implementation of the LDA model, we would do this reassignment for each word (w) in each document (d) based on the probabilities calculated for each word (w) using the Beta and Theta distributions.<br>

However, we are not done yet with just the first iteration of the LDA algorithm.  We still need to update the values for the Beta and Theta distributions for the next iteration of the LDA algorithm.<br>

</span>

### Computing the Updated β (Beta) Distribution:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-xldj{border-color:inherit;text-align:left}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-xldj"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
    <th class="tg-0pky"></th>
  </tr>
  <tr>
    <td class="tg-0pky"></td>
    <td class="tg-0pky">Words</td>
    <td class="tg-0pky">eat</td>
    <td class="tg-0pky">broccoli</td>
    <td class="tg-0pky">banana</td>
    <td class="tg-0pky">spinach</td>
    <td class="tg-0pky">lunch</td>
    <td class="tg-0pky">chinchilla</td>
    <td class="tg-0pky">kitten</td>
    <td class="tg-0pky">cute</td>
    <td class="tg-0pky">sister</td>
    <td class="tg-0pky">today</td>
    <td class="tg-0pky">hamster</td>
  </tr>
  <tr>
    <td class="tg-xldj">Topics</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky"></td>
  </tr>
  <tr>
    <td class="tg-xldj">1</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">1 --&gt; 2</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
  </tr>
  <tr>
    <td class="tg-0pky">2</td>
    <td class="tg-0pky"></td>
    <td class="tg-0pky">1</td>
    <td class="tg-0pky">1 --&gt; 0</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
    <td class="tg-0pky">placeholder</td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">

Note that the cell associated with Topic 1 and "broccoli" has changed from 1 --> 2 and that the cell associated with Topic 2 and "broccoli" has changed from 1 --> 0.<br>

Refer to the updated topic assignment for "broccoli" in Document 1 in the table in the previous section.<br>

In that table, notice that the word (w) "broccoli" is now only associated with Topic 1 across all documents (d) and that the word (w) "broccoli" occurs twice across all documents (d).<br>

Therefore, we update the cell associated with Topic 1 and "broccoli" in the Beta distribution to 2 and we also update the cell associated with Topic 2 and "broccoli" in the Beta distribution to 0.<br>

We repeat this update for every word (w) in our vocabulary (V) and for every topic (z) in our set of k topics.<br>

</span>
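
<span style="font-family:Papyrus; font-size:1.25em;">

The Beta update described above amounts to recounting how often each word is assigned to each topic.  The sketch below is only an illustration: the two-document `assignments` dictionary and the function name `topic_word_counts` are hypothetical, chosen so that the counts for "eat" and "broccoli" agree with the filled-in cells of the table.<br>

</span>

```python
from collections import Counter

# Hypothetical topic assignments after the update, as (word, topic) pairs.
# These two toy documents are an assumption, not the tutorial's full corpus.
assignments = {
    1: [("eat", 1), ("broccoli", 1), ("banana", 1)],
    2: [("broccoli", 1), ("eat", 2), ("lunch", 2)],
}
topics = [1, 2]


def topic_word_counts(assignments):
    """Count how many times each word is assigned to each topic across all documents."""
    counts = Counter()
    for tokens in assignments.values():
        for word, topic in tokens:
            counts[(topic, word)] += 1
    return counts


beta_counts = topic_word_counts(assignments)
for topic in topics:
    # A Counter returns 0 for (topic, word) pairs that never occur.
    print(topic, {word: beta_counts[(topic, word)] for word in ["eat", "broccoli"]})
```

<span style="font-family:Papyrus; font-size:1.25em;">

Under these assumed assignments, "broccoli" is counted twice for Topic 1 and zero times for Topic 2, matching the updated cells.<br>

</span>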

### Computing the Updated θ (Theta) Distribution:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-s268{text-align:left}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-s268"></th>
    <th class="tg-s268">Documents</th>
    <th class="tg-s268">1</th>
    <th class="tg-0lax">2</th>
    <th class="tg-0lax">3</th>
    <th class="tg-0lax">4</th>
    <th class="tg-0lax">5</th>
  </tr>
  <tr>
    <td class="tg-s268">Topics</td>
    <td class="tg-s268"></td>
    <td class="tg-s268"></td>
    <td class="tg-0lax"></td>
    <td class="tg-0lax"></td>
    <td class="tg-0lax"></td>
    <td class="tg-0lax"></td>
  </tr>
  <tr>
    <td class="tg-s268">1</td>
    <td class="tg-s268"></td>
    <td class="tg-s268">2 --&gt; 3</td>
    <td class="tg-0lax">1</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
  </tr>
  <tr>
    <td class="tg-0lax">2</td>
    <td class="tg-0lax"></td>
    <td class="tg-0lax">1 --&gt; 0</td>
    <td class="tg-0lax">2</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
    <td class="tg-0lax">placeholder</td>
  </tr>
</table>

<span style="font-family:Papyrus; font-size:1.25em;">

Note that the cell associated with Topic 1 and Document 1 has changed from 2 --> 3 and that the cell associated with Topic 2 and Document 1 has changed from 1 --> 0.<br>

Refer to the updated topic assignment for "broccoli" in Document 1 in the table in the previous section.<br>

In that table, notice that Document 1 contains 3 words (N) that are now all associated with Topic 1.  So, there are no words in Document 1 that are associated with Topic 2.<br>

Therefore, we update the cell associated with Topic 1 and Document 1 in the Theta distribution to 3 and we also update the cell associated with Topic 2 and Document 1 in the Theta distribution to 0.<br>

We repeat this update for every document (d) and for every topic (z) in our set of k topics.<br>

</span>
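
<span style="font-family:Papyrus; font-size:1.25em;">

The Theta update can be sketched the same way, this time counting topics per document rather than per word.  As before, the `assignments` dictionary and the function names are hypothetical and were chosen to agree with the Document 1 and Document 2 cells of the table.  The final step, dividing each count by the number of words (N) in the document, is an added illustration of how the counts relate to per-document topic proportions.<br>

</span>

```python
from collections import Counter

# Hypothetical topic assignments after the update, as (word, topic) pairs.
# These two toy documents are an assumption, not the tutorial's full corpus.
assignments = {
    1: [("eat", 1), ("broccoli", 1), ("banana", 1)],
    2: [("broccoli", 1), ("eat", 2), ("lunch", 2)],
}
topics = [1, 2]


def document_topic_counts(assignments):
    """Count how many words in each document are assigned to each topic."""
    counts = Counter()
    for doc_id, tokens in assignments.items():
        for _word, topic in tokens:
            counts[(topic, doc_id)] += 1
    return counts


def topic_proportions(assignments, counts, topics):
    """Divide each document's topic counts by its word count N."""
    return {
        doc_id: [counts[(topic, doc_id)] / len(tokens) for topic in topics]
        for doc_id, tokens in assignments.items()
    }


theta_counts = document_topic_counts(assignments)
proportions = topic_proportions(assignments, theta_counts, topics)
for doc_id in assignments:
    print(doc_id, [theta_counts[(topic, doc_id)] for topic in topics], proportions[doc_id])
```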

### Completion of the FIRST iteration of the LDA algorithm:

<span style="font-family:Papyrus; font-size:1.25em;">

We would now return to Step 2 and repeat the process, iteration after iteration, until we have exhausted our allocated computational time.<br>

</span>
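
<span style="font-family:Papyrus; font-size:1.25em;">

To show how the repeated iterations fit together, here is a minimal sketch of one common formulation, collapsed Gibbs sampling.  It is not necessarily the exact procedure of the steps above: the function name `run_lda_sketch`, the smoothing values `alpha` and `eta`, the toy documents, and the fixed iteration count (standing in for a computational time budget) are all assumptions.  Topics are indexed from 0 in the code.<br>

</span>

```python
import random
from collections import defaultdict


def run_lda_sketch(docs, n_topics=2, n_iterations=50, alpha=0.1, eta=0.01, seed=0):
    """Collapsed Gibbs sampling sketch; alpha and eta are assumed smoothing values."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    # Assumed initialization: give every word a random topic.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]

    # Count tables playing the roles of Beta (topic x word) and Theta (document x topic).
    beta = [defaultdict(int) for _ in range(n_topics)]
    theta = [[0] * n_topics for _ in docs]
    topic_totals = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, word in enumerate(doc):
            t = z[d][i]
            beta[t][word] += 1
            theta[d][t] += 1
            topic_totals[t] += 1

    for _ in range(n_iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the word's current assignment from the counts.
                t = z[d][i]
                beta[t][word] -= 1
                theta[d][t] -= 1
                topic_totals[t] -= 1

                # Weight each topic by how common it is in the document
                # and by how strongly the word is associated with it.
                weights = [
                    (theta[d][k] + alpha)
                    * (beta[k][word] + eta)
                    / (topic_totals[k] + eta * vocab_size)
                    for k in range(n_topics)
                ]

                # Sample a new topic and add the word back into the counts.
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                beta[t][word] += 1
                theta[d][t] += 1
                topic_totals[t] += 1

    return z, beta, theta


# Hypothetical toy documents, already reduced to bags of words.
docs = [
    ["eat", "broccoli", "banana"],
    ["broccoli", "eat", "lunch"],
    ["kitten", "cute", "hamster"],
]
z, beta, theta = run_lda_sketch(docs)
print(theta)
```

<span style="font-family:Papyrus; font-size:1.25em;">

Each pass through the inner loops updates the Beta and Theta counts incrementally as words are reassigned, rather than recomputing both tables from scratch at the end of the iteration.<br>

</span>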

## Resources Used:

<span style="font-family:Papyrus; font-size:1.25em;">

- https://en.wikipedia.org/wiki/Dirichlet_distribution<br>


- https://en.wikipedia.org/wiki/Plate_notation<br>


- https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation<br>


- https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-latent-dirichlet-allocation-437c81220158<br>
    - Used two diagrams, the formula, and the explanation of the associated notation for LDA.<br>


- https://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/<br>
    - Utilized blog's example as the basis for the explanation of the LDA algorithm pseudocode.<br>


- https://www.coursera.org/learn/ml-clustering-and-retrieval<br>
    - Information on collapsed Gibbs sampling and variational inference in relation to LDA.<br>


- https://www.investopedia.com/terms/p/posterior-probability.asp<br>
    - Explanation of statistical terminology including posterior and prior probability.<br>


- https://cs.calvin.edu/courses/cs/x95/videos/2018-2019/<br>
    - Used Derek Fisher's explanation of why LDA does not work well on Tweets (with the standard scikit-learn implementation).<br>

</span>