# Foundations of Natural Language Processing

Natural language processing (NLP) is the set of methods for making human language accessible to computers. Its primary problems are the analysis of text, the generation of context-appropriate text, and, in the modern context, accomplishing these tasks by applying statistical methods to vast amounts of data. Natural languages have loose rules of syntax that are ambiguous and not well defined. Computer languages, by contrast, are non-natural languages with well-defined syntax and rules of construction. We first review some concepts from probability and statistics that are useful for natural language modeling.

## Hidden Variables

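Recall that for conditional probabilities $P(X|Y) = P(X,Y) / P(Y)$. A conditional probability query doesn't always reference all variables in the full distribution. Suppose there are random variables $Y, E, H$, where $Y$ is the query variable, $E$ is the observed evidence, and $H$ is a hidden variable that the query does not mention. Then we must sum the joint distribution over every value of $H$: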

$$P(Y = y|E = e) = \alpha \sum_h P(Y = y, E = e, H = h)$$ 

Here $\alpha$ is the normalization constant that ensures the conditional probabilities over all values of $y$ sum to one. It is equal to the reciprocal of the total probability of the evidence, $1 / P(E = e)$.

$$\alpha = \frac{1}{\sum_y P(Y = y, E = e)} = \frac{1}{\sum_y \sum_h P(Y = y, E = e, H = h)}$$
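
As a short derivation connecting the two formulas above: start from the definition of conditional probability, then write both the numerator and the denominator as marginals of the full joint distribution. No new symbols are introduced.

$$P(Y = y | E = e) = \frac{P(Y = y, E = e)}{P(E = e)} = \frac{\sum_h P(Y = y, E = e, H = h)}{\sum_{y'} \sum_h P(Y = y', E = e, H = h)}$$

The denominator does not depend on $y$, which is why it can be pulled out as the single constant $\alpha = 1 / P(E = e)$.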

For example, suppose we want to compute $P(Y=1 | E = positive)$, but there is some hidden variable $H$ that the query does not mention.

1. Calculate the numerator $\sum_h P(Y = 1, E = positive, H = h)$; we are interested in all worlds where $Y = 1$ and $E = positive$, whatever the value of $H$.
2. To normalize, divide by the total probability of all worlds where $E = positive$, which is $\sum_y \sum_h P(Y = y, E = positive, H = h)$.
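
The two steps above can be sketched in Python. This is a minimal illustration, not a general inference routine: the joint table `joint`, its probabilities, and the function name `conditional` are all hypothetical, and $Y$ and $H$ are assumed to be binary.

```python
# Hypothetical joint distribution P(Y, E, H), keyed by (y, e, h).
# The numbers are made up for illustration and sum to 1.
joint = {
    (1, "positive", 0): 0.10,
    (1, "positive", 1): 0.20,
    (0, "positive", 0): 0.05,
    (0, "positive", 1): 0.15,
    (1, "negative", 0): 0.05,
    (1, "negative", 1): 0.05,
    (0, "negative", 0): 0.30,
    (0, "negative", 1): 0.10,
}


def conditional(joint, y, e):
    """Return P(Y = y | E = e), summing out the hidden variable H."""
    # Step 1: numerator, all worlds with Y = y and E = e (any h).
    numerator = sum(
        p for (yi, ei, _h), p in joint.items() if yi == y and ei == e
    )
    # Step 2: denominator P(E = e), all worlds with E = e (any y, any h).
    evidence = sum(p for (_yi, ei, _h), p in joint.items() if ei == e)
    if evidence == 0:
        raise ValueError("evidence has zero probability")
    return numerator / evidence


p = conditional(joint, 1, "positive")
print(round(p, 3))
```

With this made-up table the numerator is $0.10 + 0.20$ and the denominator is $0.50$, so the result is approximately $0.6$.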

If we have multiple hidden variables $H_1, H_2, \dots, H_n$, we sum over every joint assignment of their values.

$$P(Y = y | E = e) = \alpha \sum_{x_1 \in H_1} \sum_{x_2 \in H_2} \dots \sum_{x_n \in H_n} P(Y = y, E = e, H_1 = x_1, \dots, H_n = x_n)$$
$$\alpha = \frac{1}{P(E = e)} = \frac{1}{\sum_y \sum_{x_1 \in H_1} \sum_{x_2 \in H_2} \dots \sum_{x_n \in H_n} P(Y = y, E = e, H_1 = x_1, \dots, H_n = x_n)}$$
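
The nested sums can be sketched with `itertools.product`, which enumerates every combination $(x_1, \dots, x_n)$ of hidden values. This is a sketch under stated assumptions: the function names `query` and `toy_joint`, the binary domains, and the weights inside `toy_joint` are hypothetical and chosen only so that the joint is a valid distribution.

```python
from itertools import product


def query(joint, y, e, y_domain, hidden_domains):
    """Return P(Y = y | E = e) by summing out every hidden variable.

    joint(y, e, hs) must return P(Y = y, E = e, H_1 = hs[0], ..., H_n = hs[n-1]).
    """
    def marginal(y_value):
        # Nested sums over x_1 in H_1, ..., x_n in H_n.
        return sum(joint(y_value, e, hs) for hs in product(*hidden_domains))

    evidence = sum(marginal(yv) for yv in y_domain)  # P(E = e), i.e. 1 / alpha
    if evidence == 0:
        raise ValueError("evidence has zero probability")
    return marginal(y) / evidence


def toy_joint(y, e, hs):
    """Hypothetical joint over binary Y, E, H_1, H_2 (illustration only)."""
    h1, h2 = hs
    # Made-up positive weights; 44 is their total over all 16 assignments.
    return (1 + y + e + h1 * y + 2 * h2 * e) / 44


p = query(toy_joint, y=1, e=1, y_domain=[0, 1], hidden_domains=[[0, 1], [0, 1]])
print(round(p, 3))
```

Enumerating all assignments takes time exponential in the number of hidden variables, so this direct approach is only practical for small $n$.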