# Table of Contents

- [Probability and Bayes Rule](#probability-and-bayes-rule)
- [Naive Bayes for Sentiment Analysis](#naive-bayes-for-sentiment-analysis)
- [Laplacian Smoothing](#laplacian-smoothing)
- [Log Likelihood](#log-likelihood)

## Probability and Bayes Rule

Probability measures how frequently a certain event occurs. Given a corpus of tweets, we can calculate the probability P(A) that a tweet is positive, and likewise the probability that it is negative.

![Probabilities](images/probabilities.png)

We can also define a different event B, for example that a tweet contains the word "happy", with probability P(B).

The probability that a tweet is both positive and contains "happy" is then given by the intersection of the positive set and the "happy" set.

![Adjoint Probability](images/adjoint_probability.png)

Now let's consider only the tweets in the corpus that contain "happy". Within this subset, the probability that a tweet is positive is much higher (75% in the example). You can do the same in the other direction, for example by looking only at the positive tweets and asking how likely one of them is to contain "happy".

This is called conditional probability: the probability of B given that A happened or, looking only at the elements of set A, the probability that an element belongs to B as well.
Conditional probabilities help us reduce the sample search space. For example, suppose a specific event has already happened, i.e. we know the tweet contains "happy":

![Bayes Venn Diagram](images/bayes_venn_diagram.png)

Then you would only search in the blue circle above. The numerator is the red part (the intersection) and the denominator is the blue part. This leads us to conclude the following:
$$
P(\text{Positive} \mid \text{``happy''}) = \frac{P(\text{Positive} \cap \text{``happy''})}{P(\text{``happy''})}
$$

$$
P(\text{``happy''} \mid \text{Positive}) = \frac{P(\text{``happy''} \cap \text{Positive})}{P(\text{Positive})}
$$

And since $P(\text{``happy''} \cap \text{Positive}) = P(\text{Positive} \cap \text{``happy''})$, the two equations can be combined:

$$
P(\text{Positive} \mid \text{``happy''}) = P(\text{``happy''} \mid \text{Positive}) \times \frac{P(\text{Positive})}{P(\text{``happy''})}
$$

This gives us the basic Bayes equation:

$$
P(X \mid Y) = \frac{P(Y \mid X) P(X)}{P(Y)}
$$
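As a minimal sketch of the equations above, the following Python snippet computes both conditional probabilities from raw counts and checks that Bayes' rule recovers one from the other. The counts are hypothetical, chosen only so that the numbers are easy to follow, and the variable names are assumptions made for this illustration.

```python
from math import isclose

# Hypothetical counts, chosen only for illustration.
n_tweets = 20          # total number of tweets in the corpus
n_positive = 13        # tweets labelled positive
n_happy = 4            # tweets containing the word "happy"
n_positive_happy = 3   # positive tweets that also contain "happy"

p_positive = n_positive / n_tweets
p_happy = n_happy / n_tweets
p_positive_and_happy = n_positive_happy / n_tweets

# Conditional probabilities from the definition P(A | B) = P(A ∩ B) / P(B).
p_positive_given_happy = p_positive_and_happy / p_happy
p_happy_given_positive = p_positive_and_happy / p_positive

# Bayes' rule: P(X | Y) = P(Y | X) * P(X) / P(Y).
bayes_estimate = p_happy_given_positive * p_positive / p_happy

print(f"P(Positive | happy) from the definition: {p_positive_given_happy:.2f}")
print(f"P(Positive | happy) from Bayes' rule:    {bayes_estimate:.2f}")
print("Both agree:", isclose(p_positive_given_happy, bayes_estimate))
```

Both routes give the same value because the intersection term is shared between the two conditional probabilities.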



## Naive Bayes for Sentiment Analysis

Similar to before, you begin with two corpora: one for the positive tweets and one for the negative tweets. From these you extract all the different words that appear, along with their counts in the positive and negative classes.

For each word you can calculate its conditional probability within the positive class and within the negative class. This lets you find "power words", that is, words whose probability differs significantly between the two classes.
<center>
<img src="images/word_conditional_probability.png" width="250" alt="Conditional Word Probability"/>
</center>

To classify a tweet of $m$ words, you multiply the ratios of the conditional probabilities of its words. This expression is called the Naive Bayes inference condition rule for binary classification:

$$\prod_{i=1}^m \frac{P(w_i|pos)}{P(w_i|neg)}$$

Using this calculation you get a value greater than 1 for positive sentiment and a value less than 1 for negative sentiment.

Sometimes $P(w_i | \text{class})$ will be zero, which makes the ratio zero or undefined and the comparison impossible. You can solve this problem by applying smoothing.
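The product of ratios can be sketched in a few lines of Python. The probability tables below are made-up, nonzero values used purely for illustration, and `likelihood_ratio` is a hypothetical helper name. Words missing from the tables are simply skipped, which is one possible design choice rather than the only one.

```python
from math import prod

# Hypothetical conditional probabilities P(word | class), all nonzero.
p_word_given_pos = {"i": 0.19, "am": 0.19, "happy": 0.14, "sad": 0.10}
p_word_given_neg = {"i": 0.19, "am": 0.19, "happy": 0.10, "sad": 0.14}


def likelihood_ratio(tweet: str) -> float:
    """Return the product of P(w | pos) / P(w | neg) over the known words."""
    words = tweet.lower().split()
    return prod(
        p_word_given_pos[w] / p_word_given_neg[w]
        for w in words
        if w in p_word_given_pos and w in p_word_given_neg
    )


happy_score = likelihood_ratio("I am happy")
sad_score = likelihood_ratio("I am sad")

# A score above 1 suggests positive sentiment, below 1 negative sentiment.
print("I am happy ->", "positive" if happy_score > 1 else "negative")
print("I am sad   ->", "positive" if sad_score > 1 else "negative")
```

Neutral words such as "I" and "am" contribute a ratio of 1 here, so only the words with different probabilities in the two classes move the score.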



## Laplacian Smoothing

Sometimes a word in your vocabulary never shows up in one of the classes. Its conditional probability is then zero, and the probability of an entire sequence containing it might go to zero. Laplacian smoothing is a technique you can use to avoid zero probabilities.

Instead of calculating

$$P(w_i|\text{class}) = \frac{\text{freq}(w_i, \text{class})}{N_\text{class}} \quad \text{class} \in \{\text{Positive}, \text{Negative}\}$$

you add one to the numerator and add the number of unique words in your entire vocabulary to the denominator:

$$
\begin{aligned}
P(w_i|\text{class}) &= \frac{\text{freq}(w_i, \text{class}) + 1}{N_\text{class} + V_\text{vocabulary}} \\[1em]
N_\text{class} &= \text{frequency of all words in class} \\[1em]
V_\text{vocabulary} &= \text{number of unique words in vocabulary}
\end{aligned}
$$

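With this adjustment, the probabilities within each class still sum to 1, because the smoothed numerators add up to $N_\text{class} + V_\text{vocabulary}$.

For a simple example, the word counts (left) and the smoothed probabilities (right) are: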

<table>
    <tr>
        <th>word</th><th>Pos</th><th>Neg</th>
        <th style="border: none; width: 20px;"></th>
        <th>word</th><th>Pos</th><th>Neg</th>
    </tr>
    <tr>
        <td>I</td><td>3</td><td>3</td>
        <td style="border: none;"></td>
        <td>I</td><td>0.19</td><td>0.19</td>
    </tr>
    <tr>
        <td>am</td><td>3</td><td>3</td>
        <td style="border: none;"></td>
        <td>am</td><td>0.19</td><td>0.19</td>
    </tr>
    <tr>
        <td>happy</td><td>2</td><td>1</td>
        <td style="border: none;"></td>
        <td>happy</td><td>0.14</td><td>0.10</td>
    </tr>
    <tr>
        <td>because</td><td>1</td><td>0</td>
        <td style="border: none;"></td>
        <td>because</td><td>0.10</td><td>0.05</td>
    </tr>
    <tr>
        <td>learning</td><td>1</td><td>1</td>
        <td style="border: none;"></td>
        <td>learning</td><td>0.10</td><td>0.10</td>
    </tr>
    <tr>
        <td>NLP</td><td>1</td><td>1</td>
        <td style="border: none;"></td>
        <td>NLP</td><td>0.10</td><td>0.10</td>
    </tr>
    <tr>
        <td>sad</td><td>1</td><td>2</td>
        <td style="border: none;"></td>
        <td>sad</td><td>0.10</td><td>0.14</td>
    </tr>
    <tr>
        <td>not</td><td>1</td><td>2</td>
        <td style="border: none;"></td>
        <td>not</td><td>0.10</td><td>0.14</td>
    </tr>
    <tr>
        <td><b>Nclass</b></td><td><b>13</b></td><td><b>13</b></td>
        <td style="border: none;"></td>
        <td><b>Sum</b></td><td><b>1</b></td><td><b>1</b></td>
    </tr>
</table>

<div align="left">
    <strong>Laplacian Smoothing</strong><br>
    V = 8
</div>
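As a rough sketch, the smoothed probabilities can be recomputed in Python from the word counts in the table above. The names `counts` and `smoothed` are hypothetical and chosen for this illustration only. Values are printed to two decimal places, so the printed columns may differ from an exact sum of 1 by rounding.

```python
# Word counts per class as (positive, negative), copied from the table.
counts = {
    "I": (3, 3),
    "am": (3, 3),
    "happy": (2, 1),
    "because": (1, 0),
    "learning": (1, 1),
    "NLP": (1, 1),
    "sad": (1, 2),
    "not": (1, 2),
}

vocab_size = len(counts)                          # V: unique words
n_pos = sum(pos for pos, _ in counts.values())    # N_class for Positive
n_neg = sum(neg for _, neg in counts.values())    # N_class for Negative


def smoothed(freq: int, n_class: int, v: int = vocab_size) -> float:
    """Laplacian smoothing: (freq + 1) / (N_class + V)."""
    return (freq + 1) / (n_class + v)


p_pos = {word: smoothed(pos, n_pos) for word, (pos, _) in counts.items()}
p_neg = {word: smoothed(neg, n_neg) for word, (_, neg) in counts.items()}

for word in counts:
    print(f"{word:10s} {p_pos[word]:.2f} {p_neg[word]:.2f}")

# Each class still sums to 1 (up to floating-point error).
print(f"sum pos = {sum(p_pos.values()):.2f}, sum neg = {sum(p_neg.values()):.2f}")
```

Note that "because", which has a count of zero in the negative class, now gets a small nonzero probability instead of zero.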
