# Exercise 2 - Probabilistic CKY Parser
(14 points)

Finish the `ProbabilisticCKYParser` class given below (you may reuse your code from the first exercise). The parser takes a grammar and a sentence, and generates the most probable parse tree for the sentence, serialized in bracketed notation. The grammar does not need to be checked for its structure, since all rules already fit the algorithm described in the slides.

The two methods are as follows:
- `Constructor`
  - takes the start symbol of the grammar
  - takes the non-terminal to non-terminal rules in CNF as a map (the key is the left side of the rule, the value holds the possible right sides), with a probability assigned to each rule
  - takes the probabilistic lexical rules (i.e., non-terminals pointing to the terminals they can be replaced with) as a map (the key is the non-terminal on the left side, the value holds the possible tokens on the right side and their probabilities)
  - for all rules, the simple `RightSide` structure is used. It contains the right side of a rule (in this case either a single terminal or two non-terminals) and the probability of the rule.
- `getParseTree`
  - takes a sentence as a single string
    - the tokens in the sentence are lowercased
    - the tokens are separated by whitespace
    - the sentence does not contain any punctuation
  - returns the parsing tree as a string serialized in the bracketed notation
    - if parsing does not lead to a tree (i.e., the sentence cannot be derived from the given grammar), an empty String `""` should be returned
    - if the sentence contains an unknown terminal symbol, `null` should be returned
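
The input conventions listed above could be handled with a few small helpers. The following is only a sketch: the class `SentenceInput`, its method names, and the simplified lexicon type `Map<String, String[]>` are assumptions made for illustration. They are not part of the exercise template, which stores `RightSide[]` values instead.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the exercise template.
class SentenceInput {

    /** Splits a sentence on runs of whitespace; a blank sentence yields no tokens. */
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Collects every terminal of a simplified lexicon (non-terminal -> terminals). */
    static Set<String> knownTerminals(Map<String, String[]> lexicon) {
        Set<String> terminals = new HashSet<>();
        for (String[] rightSides : lexicon.values()) {
            for (String terminal : rightSides) {
                terminals.add(terminal);
            }
        }
        return terminals;
    }

    /** Returns the first token that is not a known terminal, or null if all are known. */
    static String firstUnknownToken(String[] tokens, Set<String> terminals) {
        for (String token : tokens) {
            if (!terminals.contains(token)) {
                return token;
            }
        }
        return null;
    }
}
```

With helpers like these, `getParseTree` could return `null` as soon as `firstUnknownToken` reports a token that is missing from the lexicon, before any table is filled.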

#### Example

Imagine a very simple grammar with the non-terminals $A, B, C, S$ and the terminals $a, b, c$. $S$ is the start symbol and we have the following rules:

<table>
    <tr><td><p align="left">$S$</p></td><td>$\rightarrow$</td><td><p align="left">$A\;\; B$</p></td><td>$0.4$</td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$A\;\; C$</p></td><td>$0.6$</td></tr>
    <tr><td><p align="left">$A$</p></td><td>$\rightarrow$</td><td><p align="left">$a$</p></td><td>$1.0$</td></tr>
    <tr><td><p align="left">$B$</p></td><td>$\rightarrow$</td><td><p align="left">$b$</p></td><td>$1.0$</td></tr>
    <tr><td><p align="left">$C$</p></td><td>$\rightarrow$</td><td><p align="left">$b$</p></td><td>$0.6$</td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$c$</p></td><td>$0.4$</td></tr>
</table>

For the sentence `"a b"`, the `getParseTree` method should return the following tree in the bracketed notation:
```
[S[A[a]B[b]]]
```
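
The bracketed notation can be read as follows: a node is written as its label, followed by its children enclosed in one pair of brackets, and the whole tree is wrapped in an outer pair of brackets. A minimal recursive serializer, assuming a hypothetical `TreeNode` class that is not part of the exercise template, might look like this:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node, used only to illustrate the bracketed notation.
class TreeNode {
    final String label;
    final List<TreeNode> children;

    TreeNode(String label, TreeNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    /** Appends the label and, for inner nodes, the bracketed list of children. */
    private void append(StringBuilder out) {
        out.append(label);
        if (!children.isEmpty()) {
            out.append('[');
            for (TreeNode child : children) {
                child.append(out);
            }
            out.append(']');
        }
    }

    /** Serializes the tree rooted at this node, wrapped in the outer brackets. */
    String toBracketed() {
        StringBuilder out = new StringBuilder("[");
        append(out);
        return out.append(']').toString();
    }
}
```

Building the node `S` with the children `A` (child `a`) and `B` (child `b`) and calling `toBracketed()` on it yields the string shown above.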
Note that the alternative tree `[S[A[a]C[b]]]` is not a correct result, since it has a lower probability:

$P([S[A[a]B[b]]]) = P(S \rightarrow A\;\; B) \times P(A \rightarrow a) \times P(B \rightarrow b) = 0.4 \times 1.0 \times 1.0 = 0.4$

$P([S[A[a]C[b]]]) = P(S \rightarrow A\;\; C) \times P(A \rightarrow a) \times P(C \rightarrow b) = 0.6 \times 1.0 \times 0.6 = 0.36$
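
The `RightSide` structure also stores the log probability of each rule. Because the logarithm is monotonic, comparing sums of log probabilities selects the same tree as comparing products of probabilities, while avoiding numerical underflow on longer sentences. The sketch below repeats the comparison of the two trees in log space; the class `TreeScore` and its method are hypothetical names chosen for this illustration.

```java
// Hypothetical helper, used only to illustrate scoring in log space.
class TreeScore {

    /** Sums the natural logarithms of the probabilities of all rules used in a tree. */
    static double logProbability(double... ruleProbabilities) {
        double sum = 0.0;
        for (double p : ruleProbabilities) {
            sum += Math.log(p);
        }
        return sum;
    }
}
```

Here, `TreeScore.logProbability(0.4, 1.0, 1.0)` is larger than `TreeScore.logProbability(0.6, 1.0, 0.6)`, which matches the comparison of $0.4$ and $0.36$.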

#### Grammar

An extended version of the flight example grammar will be used in this exercise. All rules can be found in the following two tables:

<table>
    <tr>
        <th>Left side non-terminal</th>
        <th></th>
        <th><p align="left">Right side non-terminals</p></th>
        <th>Probability</th>
    </tr>
    <tr><td><p align="left">$S$</p></td><td>$\rightarrow$</td><td><p align="left">$NP\;\; VP$</p></td><td><p align="left">$0.8$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$X1\;\; VP$</p></td><td><p align="left">$0.15$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$Verb\;\; NP$</p></td><td><p align="left">$0.01$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$X2\;\; PP$</p></td><td><p align="left">$0.005$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$Verb\;\; PP$</p></td><td><p align="left">$0.0075$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$VP\;\; PP$</p></td><td><p align="left">$0.0075$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$X3\;\; PP$</p></td><td><p align="left">$0.0025$</p></td></tr>
    <tr><td><p align="left">$Nominal$</p></td><td>$\rightarrow$</td><td><p align="left">$Nominal\;\; Noun$</p></td><td><p align="left">$0.2$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$Nominal\;\; PP$</p></td><td><p align="left">$0.05$</p></td></tr>
    <tr><td><p align="left">$NP$</p></td><td>$\rightarrow$</td><td><p align="left">$Det\;\; Nominal$</p></td><td><p align="left">$0.2$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$Nominal\;\; Noun$</p></td><td><p align="left">$0.03$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$Nominal\;\; PP$</p></td><td><p align="left">$0.0075$</p></td></tr>
    <tr><td><p align="left">$PP$</p></td><td>$\rightarrow$</td><td><p align="left">$Preposition\;\; NP$</p></td><td><p align="left">$1.0$</p></td></tr>
    <tr><td><p align="left">$VP$</p></td><td>$\rightarrow$</td><td><p align="left">$Verb\;\; NP$</p></td><td><p align="left">$0.2$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$X2\;\; PP$</p></td><td><p align="left">$0.1$</p></td></tr>
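    <tr><td></td><td>$|$</td><td><p align="left">$Verb\;\; PP$</p></td><td><p align="left">$0.15$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$VP\;\; PP$</p></td><td><p align="left">$0.15$</p></td></tr>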
    <tr><td></td><td>$|$</td><td><p align="left">$X3 \;\; PP$</p></td><td><p align="left">$0.05$</p></td></tr>
    <tr><td><p align="left">$X1$</p></td><td>$\rightarrow$</td><td><p align="left">$Aux\;\; NP$</p></td><td><p align="left">$1.0$</p></td></tr>
    <tr><td><p align="left">$X2$</p></td><td>$\rightarrow$</td><td><p align="left">$Verb\;\; NP$</p></td><td><p align="left">$1.0$</p></td></tr>
    <tr><td><p align="left">$X3$</p></td><td>$\rightarrow$</td><td><p align="left">$NP\;\; NP$</p></td><td><p align="left">$1.0$</p></td></tr>
</table>

<table>
    <tr>
        <th><p align="left">Non-terminal</p></th>
        <th></th>
        <th><p align="left">Terminal</p></th>
        <th><p align="left">Probability</p></th>
    </tr>
    <tr><td><p align="left">$S$</p></td><td>$\rightarrow$</td><td><p align="left">$book$</p></td><td><p align="left">$0.00525$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$include$</p></td><td><p align="left">$0.00525$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$prefer$</p></td><td><p align="left">$0.007$</p></td></tr>
    <tr><td><p align="left">$Aux$</p></td><td>$\rightarrow$</td><td><p align="left">$can$</p></td><td><p align="left">$0.4$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$does$</p></td><td><p align="left">$0.6$</p></td></tr>
    <tr><td><p align="left">$Det$</p></td><td>$\rightarrow$</td><td><p align="left">$a$</p></td><td><p align="left">$0.6$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$that$</p></td><td><p align="left">$0.1$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$the$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td><p align="left">$Nominal$</p></td><td>$\rightarrow$</td><td><p align="left">$boat$</p></td><td><p align="left">$0.02625$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$book$</p></td><td><p align="left">$0.075$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$dinner$</p></td><td><p align="left">$0.075$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$flight$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$flights$</p></td><td><p align="left">$0.225$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$meal$</p></td><td><p align="left">$0.01125$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$money$</p></td><td><p align="left">$0.0375$</p></td></tr>
    <tr><td><p align="left">$Noun$</p></td><td>$\rightarrow$</td><td><p align="left">$boat$</p></td><td><p align="left">$0.035$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$book$</p></td><td><p align="left">$0.1$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$dinner$</p></td><td><p align="left">$0.1$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$flight$</p></td><td><p align="left">$0.4$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$flights$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$meal$</p></td><td><p align="left">$0.015$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$money$</p></td><td><p align="left">$0.05$</p></td></tr>
    <tr><td><p align="left">$NP$</p></td><td>$\rightarrow$</td><td><p align="left">$boat$</p></td><td><p align="left">$0.0039375$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$book$</p></td><td><p align="left">$0.01125$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$dinner$</p></td><td><p align="left">$0.01125$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$flight$</p></td><td><p align="left">$0.045$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$flights$</p></td><td><p align="left">$0.03375$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$houston$</p></td><td><p align="left">$0.18$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$i$</p></td><td><p align="left">$0.14$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$me$</p></td><td><p align="left">$0.0525$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$meal$</p></td><td><p align="left">$0.0016875$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$money$</p></td><td><p align="left">$0.005625$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$nwa$</p></td><td><p align="left">$0.12$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$she$</p></td><td><p align="left">$0.0175$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$you$</p></td><td><p align="left">$0.14$</p></td></tr>    
    <tr><td><p align="left">$Preposition$</p></td><td>$\rightarrow$</td><td><p align="left">$from$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$near$</p></td><td><p align="left">$0.15$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$on$</p></td><td><p align="left">$0.2$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$through$</p></td><td><p align="left">$0.05$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$to$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td><p align="left">$Verb$</p></td><td>$\rightarrow$</td><td><p align="left">$book$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$include$</p></td><td><p align="left">$0.3$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$prefer$</p></td><td><p align="left">$0.4$</p></td></tr>
    <tr><td><p align="left">$VP$</p></td><td>$\rightarrow$</td><td><p align="left">$book$</p></td><td><p align="left">$0.105$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$include$</p></td><td><p align="left">$0.105$</p></td></tr>
    <tr><td></td><td>$|$</td><td><p align="left">$prefer$</p></td><td><p align="left">$0.14$</p></td></tr>
</table>
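
Both tables are keyed by the left side of a rule, whereas a CKY-style parser typically asks the opposite question: which left sides can produce a given right side? For example, $Verb\;\; NP$ appears under $S$, $VP$ and $X2$. One possible way to answer such lookups is to invert the rules once. The sketch below assumes a hypothetical `Rule` class and joins the right-side symbols with a single space to form the key; neither choice comes from the exercise template.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index, used only to illustrate a right-side lookup.
class RuleIndex {

    /** Hypothetical stand-in for one rule: left side, probability, right-side symbols. */
    static final class Rule {
        final String left;
        final double probability;
        final String[] right;

        Rule(String left, double probability, String... right) {
            this.left = left;
            this.probability = probability;
            this.right = right;
        }
    }

    /** Groups the rules by their right side, joined with a single space. */
    static Map<String, List<Rule>> byRightSide(List<Rule> rules) {
        Map<String, List<Rule>> index = new HashMap<>();
        for (Rule rule : rules) {
            String key = String.join(" ", rule.right);
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(rule);
        }
        return index;
    }
}
```

The same idea applies to the lexical rules, where the key would be a single terminal.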

#### Notes

- You are free to use a different IDE to develop your solution. However, you have to copy the solution into this notebook to submit it.
- Do not add additional external libraries.
- Interface
  - You can use _[TAB]_ for autocompletion and _[SHIFT]_+_[TAB]_ for code inspection.
  - Use _Menu_ -> _View_ -> _Toggle Line Numbers_ for debugging.
  - Check _Menu_ -> _Help_ -> _Keyboard Shortcuts_.
- Known issues
  - All global variables will be set to void after an import.
  - Missing spaces around `%` (modulo) can cause unexpected errors, so please make sure that there are spaces around every `%` character.
- Finish
  - Save your solution by clicking on the _disk icon_.
  - Make sure that all necessary imports are listed at the beginning of your cell.
  - Run a final check of your solution:
    - Click on _restart the kernel, then re-run the whole notebook_ (the fast-forward arrow in the toolbar).
    - Wait for the kernel to restart and execute all cells (all executable cells should have numbers in front of them instead of a `[*]`).
    - Check all executed cells for errors. If an exception is thrown, please check your code. Although the error might look cryptic, so far we have never encountered an exception that was not caused by the submitted code. A good way to check the code is to copy the solution into a new class in your favorite IDE and check
      - errors reported by the IDE
      - imports the IDE adds to your code which might be missing in your submission.
  - Finally, choose _Menu_ -> _File_ -> _Close and Halt_.
  - Do not forget to _Submit_ your solution in the _Assignments_ view.

In [None]:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * A simple structure containing the right side of a rule and its probability.
 */
public class RightSide {
    /**
     * The probability of the rule.
     */
    public final double probability;
    /**
     * The log probability of the rule.
     */
    public final double logProbability;
    /**
     * The right side of the rule.
     */
    public final String[] values;

    public RightSide(double probability, String... values) {
        this.probability = probability;
        this.values = values;
        logProbability = Math.log(probability);
    }
}

public class ProbabilisticCKYParser {

    // YOUR CODE HERE

    /**
     * Constructor.
     * 
     * @param startSymbol
     *            the start symbol of the grammar
     * @param grammar
     *            the non-terminal to non-terminal rules
     * @param lexicon
     *            the non-terminal to terminal rules
     */
    public ProbabilisticCKYParser(String startSymbol, Map<String, RightSide[]> grammar,
            Map<String, RightSide[]> lexicon) {
        // YOUR CODE HERE
    }

    /**
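     * Takes the sentence and returns its most probable parse tree serialized as
     * a single String.
     * 
     * @param sentence
     *            the sentence that should be parsed
     * @return the parse tree in bracketed notation, an empty String if the
     *         sentence cannot be parsed, or {@code null} if it contains an
     *         unknown terminal.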
     */
    public String getParseTree(String sentence) {
        final double SENTINEL = 1.0d;
        String parseTree = null;
        // YOUR CODE HERE
        return parseTree;
    }
}
// This line should make sure that compile errors are directly identified when executing this cell
// (the line itself does not produce any meaningful result)
new ProbabilisticCKYParser("", new HashMap<>(), new HashMap<>());
System.out.println("compiled");

# Evaluation

- Run the following cell to test your implementation.
- You can ignore the cells afterwards.

In [None]:
%maven org.junit.jupiter:junit-jupiter-api:5.3.1
import org.junit.jupiter.api.Assertions;
import org.opentest4j.AssertionFailedError;

public static void checkParsingTree(ProbabilisticCKYParser recognizer, String sentence, String expectedTree) {
    try {
        long time1 = System.currentTimeMillis();
        String result = recognizer.getParseTree(sentence);
        time1 = System.currentTimeMillis() - time1;
        if (expectedTree == null) {
            Assertions.assertNull(result,
                    "The result was expected to be null. However, the result of your solution is \"" + result
                            + "\".");
        } else {
            Assertions.assertEquals(expectedTree, result);
        }
        System.out.println("Test successful. Calculation took " + time1 + "ms");
    } catch (AssertionFailedError e) {
        throw e;
    } catch (Throwable e) {
        System.err.println("Your solution caused an unexpected error:");
        throw e;
    }
}
Map<String, RightSide[]> grammar = new HashMap<>();
Map<String, RightSide[]> lexicon = new HashMap<>();
String expectedTree;
ProbabilisticCKYParser parser;

// Check the very simple S A B C Grammar
grammar.put("S", new RightSide[] { new RightSide(0.4, "A", "B"), new RightSide(0.6, "A", "C") });
lexicon.put("A", new RightSide[] { new RightSide(1.0, "a") });
lexicon.put("B", new RightSide[] { new RightSide(1.0, "b") });
lexicon.put("C", new RightSide[] { new RightSide(0.6, "b"), new RightSide(0.4, "c") });
parser = new ProbabilisticCKYParser("S", grammar, lexicon);
expectedTree = "[S[A[a]B[b]]]";
checkParsingTree(parser, "a b", expectedTree);

// Define the flight grammar
grammar.clear();
grammar.put("Nominal",
        new RightSide[] { new RightSide(0.2, "Nominal", "Noun"), new RightSide(0.05, "Nominal", "PP") });
grammar.put("NP", new RightSide[] { new RightSide(0.2, "Det", "Nominal"),
        new RightSide(0.03, "Nominal", "Noun"), new RightSide(0.0075, "Nominal", "PP") });
grammar.put("PP", new RightSide[] { new RightSide(1.0, "Preposition", "NP") });
grammar.put("S", new RightSide[] { new RightSide(0.8, "NP", "VP"), new RightSide(0.15, "X1", "VP"),
        // S → VP [.05] x VP → Verb NP [.20] = 0.01
        new RightSide(0.01, "Verb", "NP"),
        // S → VP [.05] x VP → Verb NP PP [.10] = 0.005
        new RightSide(0.005, "X2", "PP"),
        // S → VP [.05] x VP → Verb PP [.15] = 0.0075
        new RightSide(0.0075, "Verb", "PP"),
        // S → VP [.05] x VP → VP PP [.15] = 0.0075
        new RightSide(0.0075, "VP", "PP"),
        // S → VP [.05] x VP → Verb NP NP [.05] = 0.0025
        new RightSide(0.0025, "X3", "PP"), });
grammar.put("VP", new RightSide[] { new RightSide(0.2, "Verb", "NP"), new RightSide(0.1, "X2", "PP"),
        new RightSide(0.15, "Verb", "PP"), new RightSide(0.15, "VP", "PP"), new RightSide(0.05, "X3", "PP") });
grammar.put("X1", new RightSide[] { new RightSide(1.0, "Aux", "NP") });
grammar.put("X2", new RightSide[] { new RightSide(1.0, "Verb", "NP") });
grammar.put("X3", new RightSide[] { new RightSide(1.0, "NP", "NP") });

lexicon.clear();
lexicon.put("Aux", new RightSide[] { new RightSide(0.4, "can"), new RightSide(0.6, "does") });
lexicon.put("Det",
        new RightSide[] { new RightSide(0.6, "a"), new RightSide(0.10, "that"), new RightSide(0.30, "the") });
// Nominal → Noun [.75] x Noun → book [.10] | flights [.30] | meal [.015] |
// money [.05] | flight [.40] | dinner [.10] | boat [.035]
lexicon.put("Nominal",
        new RightSide[] { new RightSide(0.075, "book"), new RightSide(0.225, "flights"),
                new RightSide(0.01125, "meal"), new RightSide(0.0375, "money"), new RightSide(0.3, "flight"),
                new RightSide(0.075, "dinner"), new RightSide(0.02625, "boat") });
lexicon.put("Noun",
        new RightSide[] { new RightSide(0.1, "book"), new RightSide(0.3, "flights"),
                new RightSide(0.015, "meal"), new RightSide(0.05, "money"), new RightSide(0.4, "flight"),
                new RightSide(0.1, "dinner"), new RightSide(0.035, "boat") });
// NP → Pronoun [.35] x Pronoun → I [.40] | she [.05] | me [.15] | you [.40]
lexicon.put("NP", new RightSide[] { new RightSide(0.14, "i"), new RightSide(0.0175, "she"),
        new RightSide(0.0525, "me"), new RightSide(0.14, "you"),
        // NP → Proper-Noun [.30] x Proper-Noun → Houston [.60] | NWA [.40]
        new RightSide(0.18, "houston"), new RightSide(0.12, "nwa"),
        // NP → Nominal [.15] x Nominal → Noun [.75] x Noun → book [.10] | flights [.30]
        // | meal [.015] | money [.05] | flight [.40] | dinner [.10] | boat [.035]
        new RightSide(0.01125, "book"), new RightSide(0.03375, "flights"), new RightSide(0.0016875, "meal"),
        new RightSide(0.005625, "money"), new RightSide(0.045, "flight"), new RightSide(0.01125, "dinner"),
        new RightSide(0.0039375, "boat") });
lexicon.put("Preposition", new RightSide[] { new RightSide(0.3, "from"), new RightSide(0.15, "near"),
        new RightSide(0.2, "on"), new RightSide(0.05, "through"), new RightSide(0.3, "to") });
// S → VP [.05] x VP → Verb [.35] x Verb → book [.30] | include [.30] | prefer
// [.40]
lexicon.put("S", new RightSide[] { new RightSide(0.00525, "book"), new RightSide(0.00525, "include"),
        new RightSide(0.007, "prefer") });
lexicon.put("Verb", new RightSide[] { new RightSide(0.3, "book"), new RightSide(0.3, "include"),
        new RightSide(0.4, "prefer") });
// VP → Verb [.35] x Verb → book [.30] | include [.30] | prefer [.40]
lexicon.put("VP", new RightSide[] { new RightSide(0.105, "book"), new RightSide(0.105, "include"),
        new RightSide(0.14, "prefer") });

parser = new ProbabilisticCKYParser("S", grammar, lexicon);

// Check a simple sentence
expectedTree = "[S[X2[Verb[book]NP[Det[the]Nominal[flight]]]PP[Preposition[through]NP[houston]]]]";
checkParsingTree(parser, "book the flight through houston", expectedTree);

// Check a sentence with an unknown word
checkParsingTree(parser, "my flight to paris", null);

// Check a sentence with an unknown structure
checkParsingTree(parser, "to houston i prefer", "");

In [None]:
// Ignore this cell

In [None]:
// Ignore this cell