# Adjacent fields of NLP

<img src="./adjacent_fields.png" width="800">

# Levels of language structure
- From a linguistic perspective, there are different levels of structure, going from superficial to conceptual

## Morphology
- Understanding the internal **structure of words**
- The study of **morphemes** and the way they are sequenced in a word
- Morpheme: smallest meaningful unit of language
- Morphemes include: 
    - stems 
    - prefixes 
    - suffixes 
    - infixes
- Examples of analysing words into morphemes:
    - word: 
        - one morpheme
        - words: word + s, two morphemes
    - mistreating: 
        - mis + treat + ing, three morphemes
    - bet: 
        - one morpheme
        - betting: bet + ing (the final t is doubled in spelling), two morphemes
        - better: one morpheme
    - care: 
        - one morpheme
        - scare: one morpheme
- Applications:
    - Morphological segmentation: dividing words into individual units called morphemes
    - Stemming: cutting off a prefix or suffix to reduce a word to its stem, which need not be a valid word (e.g., studies -> studi)
    - Lemmatization: reducing the various inflected forms of a word to its dictionary form, the lemma (e.g., studies -> study)
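
The difference between stemming and lemmatization can be sketched with a toy example. The suffix list and the small lookup table below are hypothetical and only cover a handful of words; real stemmers and lemmatizers rely on much larger rule sets and dictionaries.

```python
# Toy illustration only: SUFFIXES and LEMMAS are invented for this example.
SUFFIXES = ["ing", "es", "s"]
LEMMAS = {"studies": "study", "studying": "study", "words": "word"}

def toy_stem(word):
    """Chop off the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in a dictionary of known inflected forms."""
    return LEMMAS.get(word, word)

print(toy_stem("studies"))       # studi
print(toy_lemmatize("studies"))  # study
```

The stemmer only manipulates characters, so it can produce non-words such as `studi`, while the lemmatizer needs vocabulary knowledge to return `study`.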
    

## Syntax
- The **structure of sentences**: how a sentence is built up
- The arrangement of words and phrases in a sentence such that they make grammatical sense

- Applications: 
    - Part-of-speech tagging
    - Entity extraction
    - Syntactic parsing (CFG): undertaking grammatical analysis for the provided sentence
    - Syntactic parsing (dependencies)
    - Named entity recognition (NER): 
        - determining the parts of a text that can be identified and categorized into preset groups.
        - Examples of such groups include names of people and names of places.

- Examples:
    - Kim loves Mary; Mary loves Kim; 
        - exactly the same words, different meanings; subject and object are swapped;
    - Kim Mary loves; 
        - Not a well-formed sentence;
        - Every sentence is a sequence of words, but not every sequence of words is a well-formed sentence.

    - He is *dancing* with Rose. (verb)
    - *Dancing* is a great exercise. (noun)
    - Don't forget to pack your *dancing* shoes. (adj)

<br>
- List of common Part-of-Speech tags

<img src="./pos_tag.png" width="400" align="left">

## Semantics
- The study of word and sentence **meaning**
- Understanding the meaning and interpretation of words and of the sentences built from them
- At different levels:
    - word / lexical
    - sentence / sequence 
    - text / document
- Applications:
    - Word embedding / encoding
    - Word sense disambiguation
    - Semantic role labeling
    - Natural language generation: 
        - using databases to derive semantic intentions and convert them into human language
    
- Semantic analysis is one of the difficult aspects of NLP that has not been fully resolved yet


## Pragmatics
- How language is used to achieve specific intentions
- The intention of the speaker during conversation (e.g., chat)
- Conversational implicatures:
    - how I interpret what you say based on the context

- Applications:
    - Dialogue systems
    - Speech act labeling
    - Discourse structure parsing
    

# Why is NLP hard?
- Hidden structure of language is **ambiguous** and complex **at all levels**

- low-level rules:
    - singular -> plural: 
         - girl -> girls
         - class -> classes 
         - candy -> candies 
         
    - abbreviation: 
        - TBA -> To Be Announced


- high-level rules:
     - jokes 
     - sarcasm
     - Irony: 
         - if I say "Lovely day" to you while we are both being soaked by heavy rain, you will use the knowledge that people don’t usually like rain to infer that I am being ironic <br>

Consider the proverb: **"Time flies like an arrow"**

## Word sense ambiguity
- Time:
    - abstract time
    - a specific point in time
    - to measure time
- flies:
    - moves through the air
    - little pesky insects
- like:
    - similar to
    - have affection for
- arrow:
    - pointy stick shot from a bow
    - to move straight towards a target
    
- Meet me at the bank.
    - bank: the organization that provides financial services
    - bank: the side of a river
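
To get a feel for how quickly word sense ambiguity compounds, the sketch below enumerates every combination of senses for "Time flies like an arrow". The sense inventory is taken from the lists above (the word "an" is skipped) and is illustrative only, not a real lexicon.

```python
from itertools import product

# Illustrative sense inventory based on the lists above; not a real lexicon.
senses = {
    "Time": ["abstract time", "a specific point in time", "to measure time"],
    "flies": ["moves through the air", "little pesky insects"],
    "like": ["similar to", "have affection for"],
    "arrow": ["pointy stick shot from a bow", "to move straight towards a target"],
}

# One reading = one sense chosen for each word.
readings = list(product(*senses.values()))
print(len(readings))  # 24

for reading in readings[:3]:
    print(dict(zip(senses, reading)))
```

Even with two or three senses per word, a five-word sentence already has 24 candidate readings before syntax or context is taken into account.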

## Part-of-speech ambiguity

<img src="./pos_amb.png" width="400" align="left">

## Syntactic ambiguity 
- structure of a sentence
- obvious to you but not to computers

<img src="./syn_amb.png" width="800">

<img src="./syn_amb_4.png" width="500" align="left">

<img src="./syn_amb_5.png" width="300" align="left">

## A changing target
- New words and phrases appear, and different parts of the language change at different rates
    - googling, blogger, wi-fi
- Sentence structure
    - changes at a much slower, subtler rate

## Much we can do
- Lots of data
- Increased computational ability
- Complex algorithms
- ...

## Much we still can't do
- Limited-data contexts and low-resource languages
- Manual labeling based on human judgements
- Integrate information across modalities
    - text, image, sound, action sequence, video
    - multimodal learning, the way a child learns 
- Transfer learning across tasks and domains
    - getting better but still not enough <br>
- Open-ended problems
- ...

# Mathematical review

## Probability
- $P(A)$: the fraction of possible worlds (given what I know) in which A is true 

- $0 \leq P(A) \leq 1$
- $P(true) = 1$
- $P(false) = 0$
- $P(A \lor B) = P(A) + P(B) - P(A \land B)$ 

<img src="./union.png" width="300">

- Boolean variable
    > $P(\sim A) = P(\text{not } A) = 1 - P(A)$  <br>
    
\begin{equation} P(A) = P(A \& B) + P(A \& \sim B)  \end{equation} <br>
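
The "fraction of possible worlds" reading can be checked numerically. The sketch below assumes four possible worlds over two boolean variables, with invented weights that sum to 1, and verifies the rule for $P(A \lor B)$ and the decomposition of $P(A)$.

```python
# Hypothetical weights for the four possible worlds over two boolean variables (A, B).
worlds = {
    (True, True): 0.2,
    (True, False): 0.3,
    (False, True): 0.1,
    (False, False): 0.4,
}

def prob(event):
    """Sum the weight of every world in which the event holds."""
    return sum(p for (a, b), p in worlds.items() if event(a, b))

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_or_b = prob(lambda a, b: a or b)
p_a_and_b = prob(lambda a, b: a and b)
p_a_and_not_b = prob(lambda a, b: a and not b)

# P(A or B) = P(A) + P(B) - P(A & B)
print(abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9)  # True
# P(A) = P(A & B) + P(A & ~B)
print(abs(p_a - (p_a_and_b + p_a_and_not_b)) < 1e-9)   # True
```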

- Multivalued random variables
    - Variables that can take more than two values in some set $\{v_1, v_2, ..., v_k\}$
        - e.g., POS of a word : $\{noun, verb, adjective, adverb\}$ <br>
    > $P(A=v_i \& A=v_j) = 0$ if $i \neq j$ <br>
    > $P(A=v_1 \lor A=v_2 \lor \quad ... \quad \lor A=v_k) = 1$ <br>
    

- disjunction
> $P(A=v_1 \lor A=v_2 \lor \quad...\quad \lor A=v_i) = \sum_{k=1}^{i} P(A=v_k)$    

- conjunction: sum the joint probabilities of B with each of the mutually exclusive values A could take (this gives P(B) when the values are exhaustive)
> $P(B) = P(B \land [ A=v_1 \lor \quad...\quad \lor A=v_i]) = \sum_{k=1}^{i} P(B \land A=v_k)$
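
As a small sketch of this summation, suppose B is "the word is capitalized" and A is the word's part of speech. The joint probabilities below are invented for illustration; only the summing step reflects the formula above.

```python
# Hypothetical values of P(B and A = v) for each part-of-speech value v.
joint_b_and_pos = {"noun": 0.10, "verb": 0.02, "adjective": 0.02, "adverb": 0.01}

# The values of A are mutually exclusive and exhaustive, so P(B) is the sum over them.
p_b = sum(joint_b_and_pos.values())
print(round(p_b, 2))  # 0.15
```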

### Conditional probability
- $P(A|B)$: probability of A given B, the fraction of possible worlds with B true that also have A true <br>
> P(Headache) = 0.1 <br>
> P(Flu) = 0.02 <br>
> P(Headache|Flu) = 0.5 <br>
> Headache is rare and flu is rarer, but if you have the flu, you have a 50% chance of having a headache <br>

<img src="./union.png" width="300" align="left">

- definition:
> $P(A|B) = \frac{P(A \land B)}{P(B)}$ <br>
- Chain rule:
> $P(A \land B) = P(A|B)P(B)$ <br>
> $P(B \land A) = P(B|A)P(A)$ <br>
> $P(A \land B) = P(B \land A) = P(A|B)P(B) =  P(B|A)P(A)$ <br>
> $P(A_1 \land A_2 \land ... \land A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1A_2)...P(A_n|A_1A_2...A_{n-1})$ <br>

- Language model (language generation):
> $P(I \quad have \quad a \quad dog) = P(I)P(have|I)P(a | I \quad have)P(dog | I \quad have \quad a)$
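
The chain-rule factorization can be sketched in a few lines. The conditional probabilities below are hypothetical placeholders; a real language model would estimate them from a corpus.

```python
# Hypothetical conditional probabilities P(word | history); the values are invented.
cond_prob = {
    ((), "I"): 0.2,
    (("I",), "have"): 0.1,
    (("I", "have"), "a"): 0.3,
    (("I", "have", "a"), "dog"): 0.05,
}

def sentence_prob(words):
    """Multiply P(w_i | w_1 ... w_{i-1}) over the sentence, following the chain rule."""
    p = 1.0
    for i, word in enumerate(words):
        history = tuple(words[:i])
        p *= cond_prob[(history, word)]
    return p

print(sentence_prob(["I", "have", "a", "dog"]))  # approximately 0.0003
```

Each factor conditions on the full history, which is why practical models shorten or compress the history in some way.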

### Atomic events
- A complete specification of the state of the world about which the agent is uncertain
- E.g., if the world consists of two boolean variables, A and B, then there are four distinct atomic events:
> A = true & B = true <br>
> A = true & B = false <br>
> A = false & B = true <br>
> A = false & B = false <br>
- Atomic events are mutually **exclusive** and exhaustive

### Prior probability
- The belief prior to arrival of any (new) evidence
- Unconditional probability
> prior probability: $P(A=true)=0.1$ <br>
> conditional probability: $P(A=true| B=true)=0.1$ <br>

- Probability distribution
> Values for all possible assignments <br>
> Boolean: $P(A) = \langle true: 0.1;\ false: 0.9 \rangle$ <br>
> Multivalued: $P(Weather) = \langle sunny: 0.72;\ rainy: 0.1;\ cloudy: 0.08;\ snow: 0.1 \rangle$

- Joint probability distribution
> the probability of every atomic event on the set of random variables <br>
> $P(Weather, Cavity)$ is a matrix of $4 \times 2$ <br>
<img src="./4_2_matrix.png" width="500" align="left">


## Inference
- Given some information about the probability distribution, determine the probability of some proposition
> $P(\sim Study \quad \& \quad (GoodGrade \quad or \quad GoodJob)) $

### Inference by enumeration


- Joint probability distribution
    - $P(toothache)$
    - $P(toothache \quad or \quad cavity)$
    - $P(\sim cavity | toothache)$

> <img src="./prob_table.png" width="300" align="left">

- Sum the atomic events where the proposition is true
> $P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2$ <br>
> <br>
> $P(toothache \quad or \quad cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28$ <br>
> <br>
> $P(\sim cavity | toothache)$ = $\frac{P(\sim cavity \& toothache)}{P(toothache)}$ = $\frac{0.016 + 0.064}{0.108+0.012+0.016+0.064}$ = 0.4 <br>
> <br>
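
The sums above can be reproduced with a short enumeration over atomic events. Six of the table entries appear in the calculations above; which of them belong to catch = true, and the two remaining entries (0.144 and 0.576), are assumptions that follow the standard textbook dentist example, since the table itself is only available as an image.

```python
# Joint distribution over (toothache, catch, cavity).
# ASSUMPTION: the catch assignments and the last two entries follow the usual
# textbook version of this table; only six values are spelled out in the notes.
joint = {
    (True,  True,  True):  0.108,
    (True,  False, True):  0.012,
    (True,  True,  False): 0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,
    (False, False, True):  0.008,
    (False, True,  False): 0.144,
    (False, False, False): 0.576,
}

def prob(event):
    """Sum the probability of every atomic event in which `event` is true."""
    return sum(p for (toothache, catch, cavity), p in joint.items()
               if event(toothache, catch, cavity))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    both = lambda t, c, v: event(t, c, v) and given(t, c, v)
    return prob(both) / prob(given)

has_toothache = lambda toothache, catch, cavity: toothache
no_cavity = lambda toothache, catch, cavity: not cavity

print(round(prob(has_toothache), 3))                                  # 0.2
print(round(prob(lambda toothache, catch, cavity: toothache or cavity), 3))  # 0.28
print(round(cond_prob(no_cavity, has_toothache), 3))                  # 0.4
```

Any proposition over the three variables can be queried the same way, at the cost of touching every atomic event.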

> <img src="./P_toothache_catch.png" width="600" align="left">

- Chain rule:
> $P(Toothache, Catch, Cavity) $ <br>
> $ = P(Toothache|Catch, Cavity)P(Catch, Cavity) $ <br> 
> $ = P(Toothache|Catch, Cavity)P(Catch| Cavity)P(Cavity) $ <br>

## Independence

- Two boolean random variables A and B are independent if and only if:
> $P(A|B) = P(A)$, the probability of A is not affected by knowing B <br>
> $P(B|A) = P(B)$, the probability of B is not affected by knowing A <br>

- Facts about independent boolean variables
> $P(A \& B) = P(A|B)P(B)=P(A)P(B)$ <br>
> <br>
> $P(\sim A| B) = 1-P(A|B) = 1-P(A) = P(\sim A)$ <br>
> <br>
> $P(A| \sim B) = P(A \& \sim B) / P(\sim B) = P(\sim B | A)P(A) / P(\sim B) = P(\sim B)P(A)/P(\sim B) = P(A)$ <br>

- Multivalued independence
- For multivalued random variables A and B, A is independent of B if and only if:
> $\forall u,v: P(A=u | B=v) = P(A=u)$ <br>
> <br>
> $\forall u,v: P(B=v | A=u) = P(B=v)$ <br>
> <br>
> $\forall u,v: P(A=u \land B=v) = P(A=u)P(B=v)$ <br>
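
The product condition can be tested directly on a joint table. In the sketch below, P(Weather) is the distribution given earlier in these notes, P(Cavity = true) = 0.2 is an assumed value, and the dependent table is invented for contrast.

```python
# P(Weather) comes from the notes; P(Cavity) is an assumed value for illustration.
p_weather = {"sunny": 0.72, "rainy": 0.1, "cloudy": 0.08, "snow": 0.1}
p_cavity = {True: 0.2, False: 0.8}

def marginals(joint):
    """Recover P(A=u) and P(B=v) from a joint table keyed by (u, v)."""
    p_a, p_b = {}, {}
    for (u, v), p in joint.items():
        p_a[u] = p_a.get(u, 0.0) + p
        p_b[v] = p_b.get(v, 0.0) + p
    return p_a, p_b

def is_independent(joint, tol=1e-9):
    """Check P(A=u and B=v) == P(A=u) P(B=v) for every pair (u, v)."""
    p_a, p_b = marginals(joint)
    return all(abs(joint.get((u, v), 0.0) - p_a[u] * p_b[v]) <= tol
               for u in p_a for v in p_b)

# A joint built as a product of marginals is independent by construction.
independent_joint = {(w, c): pw * pc
                     for w, pw in p_weather.items()
                     for c, pc in p_cavity.items()}

# Hypothetical joint in which the two variables are clearly related.
dependent_joint = {("rainy", True): 0.3, ("rainy", False): 0.2,
                   ("sunny", True): 0.1, ("sunny", False): 0.4}

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
print(len(independent_joint), len(p_weather) + len(p_cavity))  # 8 6
```

The last line shows the storage saving: the full joint needs 8 entries, while the two separate distributions need only 6.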

- Make **independence assumptions** on random variables (based on our domain knowledge):
> <img src="./independence.png" width="500" align="left"> <br>
> <br><br><br><br><br><br>
> P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather) <br>

- $2 \times 2 \times 2 \times 4$ (= 32) entries reduced to $2 \times 2 \times 2 + 4$ (= 12)

## Conditional independence
- Absolute independence is powerful but rare
- Conditional independence: weaker form of independence
> For **boolean** random variables, A is conditionally independent of B given C iff: <br>
> <br>
> $P(A|B,C) = P(A|C)$ <br>
> <br>
> $P(A| \sim B,C) = P(A|C)$ <br>
> <br>
>
> For **multivalued** random variables, A is conditionally independent of B given C iff: <br>
> <br>
> $\forall u,v,w: P(A=u| B=v \land C=w) = P(A=u|C=w)$

## Bayes' Theorem
- Bayes' rule:
> $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
- Assessing diagnostic probability from causal probability:
> $P(Cause|Effect) = \frac{P(Effect|Cause)P(Cause)}{P(Effect)}$ <br>
> <br>
> $P(Cold|Flu) = \frac{P(Flu|Cold)P(Cold)}{P(Flu)}$ <br>
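
As a quick worked check, Bayes' rule can be applied to the headache and flu numbers given earlier in these notes (P(Headache) = 0.1, P(Flu) = 0.02, P(Headache|Flu) = 0.5):

> $P(Flu|Headache) = \frac{P(Headache|Flu)P(Flu)}{P(Headache)} = \frac{0.5 \times 0.02}{0.1} = 0.1$ <br>

So observing a headache raises the probability of flu from 2% to 10%, even though most headaches are still not caused by flu.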

> Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418

### Bayes' rule and gambling
- Suppose there are two sealed envelopes: 
> one with 2 red beads, 2 black beads, and \$1; <br> 
> the other with 1 red bead, 2 black beads, and no money. <br>

- I draw an envelope at random, and offer to sell it to you. How much should you be willing to pay? <br>
> <img src="./bayes.png" width="300" align="left">

- Now, you are allowed to see one (randomly drawn) bead from the selected envelope:
> If it is black, how much should you be willing to pay?
> <br>
> If it is red, how much should you be willing to pay? <br>
> <br>
> $P(Win|Black) = \frac{P(Win,Black)}{P(Black)} = \frac{P(Black|Win)P(Win)}{P(Black)} = \frac{P(Black|Win)P(Win)}{P(Black|Win)P(Win) + P(Black|Lose)P(Lose)} = \frac{1/2 \times 1/2}{1/2 \times 1/2 + 2/3 \times 1/2} = 3/7$
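
The same calculation, for both bead colours, can be sketched with exact fractions. The helper name `posterior_win` is made up for this example; the bead counts come from the problem statement above.

```python
from fractions import Fraction

p_win = Fraction(1, 2)               # the envelope containing $1 is drawn with probability 1/2
p_lose = 1 - p_win
p_black_given_win = Fraction(2, 4)   # 2 black beads among 4 in the winning envelope
p_black_given_lose = Fraction(2, 3)  # 2 black beads among 3 in the other envelope

def posterior_win(p_obs_given_win, p_obs_given_lose):
    """Bayes' rule: P(Win | observation), expanding the evidence over Win and Lose."""
    numerator = p_obs_given_win * p_win
    evidence = numerator + p_obs_given_lose * p_lose
    return numerator / evidence

p_win_black = posterior_win(p_black_given_win, p_black_given_lose)
p_win_red = posterior_win(1 - p_black_given_win, 1 - p_black_given_lose)

print(p_win_black)  # 3/7
print(p_win_red)    # 3/5
```

Since the prize is \$1, the expected value of the envelope equals the posterior probability of winning: about \$0.43 after seeing a black bead and \$0.60 after seeing a red one, compared with \$0.50 before seeing any bead.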