# Analyzing the Meaning of Sentences

## Natural Language Understanding

### Querying a Database

Suppose we want to answer the question `What cities are located in China?`, and the relevant information is stored in a database that can be queried with SQL.

We can use a feature-based grammar that translates an English sentence into SQL, as shown below:

In [4]:
import nltk

In [9]:
nltk.data.show_cfg('grammars/book_grammars/sql0.fcfg')

% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'


Parsing the query into SQL:

In [11]:
from nltk import load_parser
cp = load_parser("grammars/book_grammars/sql0.fcfg")
query = "What cities are located in China"

In [12]:
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = " ".join(answer)
print(q)

SELECT City FROM city_table WHERE Country="china"
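
The post-processing step in the cell above can be pictured without NLTK. In this minimal sketch, `sem_fragments` is a hand-written stand-in (an assumption, not output captured from the parser) for the `SEM` value on the root node: one fragment per grammar rule, with empty strings for the words 'are', 'located' and 'in'.

```python
# Hypothetical stand-in for trees[0].label()['SEM']: one fragment per
# grammar rule, with '' for words that contribute nothing to the SQL.
sem_fragments = ("SELECT", "City FROM city_table", "WHERE", "", "", 'Country="china"')

# Drop the empty fragments, then join the rest with single spaces.
sql = " ".join(fragment for fragment in sem_fragments if fragment)
print(sql)
```

Filtering before joining avoids the doubled spaces that the empty fragments would otherwise leave in the query string.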


Executing the query over a pre-loaded database:

In [13]:
from nltk.sem import chat80
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows: print(r[0], end=" ")

canton chungking dairen harbin kowloon mukden peking shanghai sian tientsin 
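
`chat80.sql_query` runs the SQL string against the city database that ships with NLTK's data. The same idea can be sketched with Python's standard `sqlite3` module. The in-memory table and its three rows below are made up for illustration; only the table and column names are taken from the generated query, and the literal is written with single quotes, the standard SQL form.

```python
import sqlite3

# Hypothetical in-memory table; the real data lives in NLTK's city.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_table (City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO city_table VALUES (?, ?)",
    [("athens", "greece"), ("peking", "china"), ("shanghai", "china")],
)

# Same shape as the generated query, with a standard single-quoted literal.
q_demo = "SELECT City FROM city_table WHERE Country='china' ORDER BY City"
cities = [row[0] for row in conn.execute(q_demo)]
conn.close()
print(" ".join(cities))
```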

## Propositional Logic

Propositional logic allows us to represent those parts of linguistic structure which correspond to certain sentential connectives.

The basic expressions of propositional logic are propositional symbols, often written as P, Q, R, etc. There are varying conventions for representing boolean operators. Since we will be focusing on ways of exploring logic within NLTK, we will stick to the following ASCII versions of the operators. 

The operators are listed below:

In [14]:
nltk.boolean_ops()

negation       	-
conjunction    	&
disjunction    	|
implication    	->
equivalence    	<->
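
To make the meaning of these operators concrete, here is a minimal sketch of their truth functions in plain Python. The names `OPERATORS` and `truth_table` are illustrative and not part of NLTK.

```python
from itertools import product

# Truth functions for the binary ASCII operators listed above
# (negation '-' is simply Python's `not`).
OPERATORS = {
    "&": lambda p, q: p and q,
    "|": lambda p, q: p or q,
    "->": lambda p, q: (not p) or q,
    "<->": lambda p, q: p == q,
}

def truth_table(symbol):
    """Return the rows (p, q, result) of the truth table for one operator."""
    op = OPERATORS[symbol]
    return [(p, q, op(p, q)) for p, q in product([True, False], repeat=2)]

for p, q, result in truth_table("->"):
    print(p, q, result)
```

An implication is false only in the row where the antecedent is true and the consequent is false.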


NLTK's `Expression` class parses logical expressions into instances of various subclasses of `Expression`:

In [15]:
read_expr = nltk.sem.Expression.fromstring

In [16]:
read_expr("-(P & Q)")

<NegatedExpression -(P & Q)>

In [18]:
read_expr("P & Q")

<AndExpression (P & Q)>

In [20]:
read_expr("P | (R -> Q)")

<OrExpression (P | (R -> Q))>

In [21]:
read_expr("P <-> -- P")

<IffExpression (P <-> --P)>

**Syntactic Validity**

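Arguments can be tested for "syntactic validity" by using a proof system. In the example below, which follows the NLTK book, `SnF` stands for "Sylvania is to the north of Freedonia" and `FnS` for "Freedonia is to the north of Sylvania". NLTK passes the goal and the assumptions to the third-party prover Prover9, which must be installed separately.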

In [25]:
SnF = read_expr("SnF")
NotFnS = read_expr('-FnS')
R = read_expr('SnF -> -FnS')
prover = nltk.Prover9()
prover.prove(NotFnS, [SnF, R])

True
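
For a propositional argument this small, the prover's verdict can be cross-checked semantically by enumerating every valuation. Note that this is a different technique from proof search (a truth-table check, not what Prover9 does), and the helper `entails` is a hypothetical name used only for this sketch.

```python
from itertools import product

def entails(premises, conclusion, symbols):
    """True if every valuation that satisfies all premises satisfies the conclusion."""
    for values in product([True, False], repeat=len(symbols)):
        val = dict(zip(symbols, values))
        if all(p(val) for p in premises) and not conclusion(val):
            return False  # found a counterexample valuation
    return True

# Premises: SnF, and SnF -> -FnS. Conclusion: -FnS.
premises = [
    lambda v: v["SnF"],
    lambda v: (not v["SnF"]) or (not v["FnS"]),
]
conclusion = lambda v: not v["FnS"]

print(entails(premises, conclusion, ["SnF", "FnS"]))
```

The check needs 2^n valuations for n symbols, so it only scales to small arguments.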

**Valuations**

A Valuation is a mapping from basic expressions of the logic to their values. Here's an example:

In [27]:
val = nltk.Valuation([('P', True), ('Q', True), ('R', False)])

In [28]:
val['P']

True

In [29]:
# let's initialize a model m that uses val
dom = set()
g = nltk.Assignment(dom)
m = nltk.Model(dom, val)

In [30]:
print(m.evaluate('(P & Q)', g))

True


In [31]:
print(m.evaluate('-(P & Q)', g))

False


In [32]:
print(m.evaluate('(P & R)', g))

False


In [33]:
print(m.evaluate('(P | R)', g))

True
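
The four evaluations above can be reproduced with a small recursive evaluator. This is a sketch of the idea, not NLTK's implementation: formulas are written as nested tuples instead of being parsed from strings, and `evaluate` and `val_demo` are illustrative names.

```python
def evaluate(formula, val):
    """Evaluate a formula given as a symbol name or a nested tuple."""
    if isinstance(formula, str):          # propositional symbol
        return val[formula]
    op, *args = formula
    if op == "-":
        return not evaluate(args[0], val)
    left, right = (evaluate(a, val) for a in args)
    if op == "&":
        return left and right
    if op == "|":
        return left or right
    if op == "->":
        return (not left) or right
    if op == "<->":
        return left == right
    raise ValueError(f"unknown operator: {op!r}")

val_demo = {"P": True, "Q": True, "R": False}
print(evaluate(("&", "P", "Q"), val_demo))
print(evaluate(("-", ("&", "P", "Q")), val_demo))
print(evaluate(("&", "P", "R"), val_demo))
print(evaluate(("|", "P", "R"), val_demo))
```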


## First-Order Logic

First-order logic keeps all the boolean operators of propositional logic, but it adds some important new mechanisms.

The standard construction rules for first-order logic recognize terms such as individual variables and individual constants, and predicates which take differing numbers of arguments. For example, "Angus walks" might be formalized as *walk(angus)* and "Angus sees Bertie" as *see(angus, bertie)*. We will call *walk* a unary predicate, and *see* a binary predicate.

### Syntax

The usual way of expressing first-order logic syntax is to assign **types** to expressions. We will use two basic types: e is the type of entities, while t is the type of formulas, i.e., expressions which have truth values. 

Given these two basic types, we can form complex types for function expressions. That is, given any types σ and τ, 〈σ, τ〉 is a complex type corresponding to functions from 'σ things' to 'τ things'. For example, 〈e, t〉 is the type of expressions from entities to truth values, namely unary predicates. 
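
The way complex types combine can be sketched in a few lines. The representation below is an assumption made for illustration (basic types as strings, a complex type 〈σ, τ〉 as a 2-tuple, a binary predicate in curried form as 〈e, 〈e, t〉〉). It is not how NLTK stores types internally.

```python
# Basic types are strings; a complex type <sigma, tau> is a 2-tuple.
E, T = "e", "t"
UNARY_PRED = (E, T)            # <e, t>
BINARY_PRED = (E, (E, T))      # <e, <e, t>>

def apply_type(function_type, argument_type):
    """Return the result type of applying a function type to an argument type."""
    if not isinstance(function_type, tuple):
        raise TypeError(f"{function_type} is not a function type")
    expected, result = function_type
    if expected != argument_type:
        raise TypeError(f"expected {expected}, got {argument_type}")
    return result

print(apply_type(UNARY_PRED, E))                      # walk(angus)
print(apply_type(apply_type(BINARY_PRED, E), E))      # see(angus, bertie)
```

Applying a predicate to all of its entity arguments yields type `t`, i.e. a formula with a truth value.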

In [48]:
read_expr = nltk.sem.Expression.fromstring
expr = read_expr("walk(Angus)", type_check=True)
expr.argument

<ConstantExpression Angus>

In [35]:
expr.argument.type

e

In [36]:
expr.function

<ConstantExpression walk>

In [37]:
expr.function.type

<e,?>

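Why `<e,?>`? The type-checker infers as many types as it can, but in this context it cannot determine the result type of `walk`, which could be `<e, t>`, `<e, e>`, or something else. To help the type-checker, we need to specify a signature, implemented as a dictionary that explicitly associates types with non-logical constants: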

In [38]:
sig = {"walk": "<e, t>"}
# type_check=True is needed for the signature to be applied
expr = read_expr("walk(angus)", type_check=True, signature=sig)
expr.function.type

<e,t>

In first-order logic, arguments of predicates can also be individual variables such as x, y and z. In NLTK, we adopt the convention that variables of type e are all lowercase.

In general, an occurrence of a variable x in a formula φ is free in φ if that occurrence doesn't fall within the scope of all x or some x in φ. Conversely, if x is free in formula φ, then it is bound in all x.φ and exists x.φ. If all variable occurrences in a formula are bound, the formula is said to be closed.

As mentioned earlier, `Expression.fromstring` parses strings and returns objects of class `Expression`. Each instance `expr` of this class comes with a method `free()` which returns the set of variables that are free in `expr`.

In [39]:
read_expr("dog(cyril)").free()

set()

In [40]:
read_expr("dog(x)").free()

{Variable('x')}

In [41]:
read_expr("exists x.dog(x)").free()

set()

In [42]:
read_expr('((some x. walk(x)) -> sing(x))').free()

{Variable('x')}

In [45]:
read_expr('exists x.own(y, x)').free()

{Variable('y')}
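
The definition of free and bound occurrences translates directly into a recursive function. The sketch below works on formulas written as nested tuples and, as a simplification, treats only `x`, `y` and `z` as individual variables. Both the tuple encoding and the name `free_vars` are assumptions for illustration, not NLTK's API.

```python
def free_vars(formula):
    """Return the free variables of a formula given as nested tuples.

    ("pred", name, *terms)          atomic formula
    ("exists" | "all", var, body)   quantified formula
    ("-", f), ("&", f, g), ("|", f, g), ("->", f, g)
    """
    kind = formula[0]
    if kind == "pred":
        # Simplification: only x, y and z count as individual variables.
        return {t for t in formula[2:] if t in {"x", "y", "z"}}
    if kind in ("exists", "all"):
        # The quantifier binds its variable inside the body.
        return free_vars(formula[2]) - {formula[1]}
    # Boolean connective: union over the subformulas.
    return set().union(*(free_vars(sub) for sub in formula[1:]))

# ((some x. walk(x)) -> sing(x)): the x in sing(x) is outside the quantifier's scope.
mixed = ("->", ("exists", "x", ("pred", "walk", "x")), ("pred", "sing", "x"))
print(free_vars(mixed))
print(free_vars(("exists", "x", ("pred", "own", "y", "x"))))
```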

### First-Order Theorem Proving

Let's use first-order logic to formalize rules such as

`all x. all y.(north_of(x, y) -> -north_of(y, x))`

In [49]:
NotFnS = read_expr("-north_of(f,s)")
SnF = read_expr("north_of(s,f)")
R = read_expr("all x. all y.(north_of(x, y) -> -north_of(y, x))")

In [50]:
prover = nltk.Prover9()

In [52]:
prover.prove(NotFnS, [SnF, R])

True

In [55]:
FnS = read_expr("north_of(f,s)")
prover.prove(FnS, [SnF, R])

False
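
For intuition, the two results above can be mirrored by a brute-force check over a tiny domain. The sketch assumes that `f` and `s` name two distinct individuals and that they are the only individuals. It enumerates every possible extension of `north_of` over that domain and keeps those that satisfy the premises. This is not how Prover9 works and it is not a general proof procedure: a counter-model shows that a goal does not follow, whereas agreement on one small domain is only suggestive.

```python
from itertools import product

DOMAIN = ["f", "s"]
PAIRS = list(product(DOMAIN, repeat=2))

def interpretations():
    """Yield every possible extension of north_of over DOMAIN as a set of pairs."""
    for bits in product([False, True], repeat=len(PAIRS)):
        yield {pair for pair, keep in zip(PAIRS, bits) if keep}

def satisfies_premises(north_of):
    """north_of(s,f) holds, and north_of(x,y) -> -north_of(y,x) for all x, y."""
    snf = ("s", "f") in north_of
    asymmetric = all((y, x) not in north_of for (x, y) in north_of)
    return snf and asymmetric

def holds_in_all_models(goal):
    """True if goal holds in every interpretation that satisfies the premises."""
    return all(goal(n) for n in interpretations() if satisfies_premises(n))

print(holds_in_all_models(lambda n: ("f", "s") not in n))  # -north_of(f,s)
print(holds_in_all_models(lambda n: ("f", "s") in n))      # north_of(f,s)
```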

### Truth in Model

Relations are represented semantically in NLTK in the standard set-theoretic way: as sets of tuples. For example, let's suppose we have a domain of discourse consisting of the individuals Bertie, Olive and Cyril, where Bertie is a boy, Olive is a girl and Cyril is a dog. For mnemonic reasons, we use b, o and c as the corresponding labels in the model. We can declare the domain as follows:

In [56]:
dom = {"b", "o", "c"}

In [59]:
v = """
    bertie => b
    olive => o
    cyril => c
    boy => {b}
    girl => {o}
    dog => {c}
    walk => {o, c}
    see => {(b, o), (c, b), (o, c)}
    """
val = nltk.Valuation.fromstring(v)
print(val)

{'bertie': 'b',
 'boy': {('b',)},
 'cyril': 'c',
 'dog': {('c',)},
 'girl': {('o',)},
 'olive': 'o',
 'see': {('b', 'o'), ('o', 'c'), ('c', 'b')},
 'walk': {('c',), ('o',)}}


### Individual Variables and Assignments

A variable assignment is a mapping from individual variables to entities in the domain. Assignments are created using the Assignment constructor, which also takes the model's domain of discourse as a parameter.

In [63]:
g = nltk.Assignment(dom, [('x', 'o'), ('y', 'c')])
g

{'x': 'o', 'y': 'c'}

In [64]:
print(g)

g[c/y][o/x]


Evaluating an atomic formula of first-order logic:

In [65]:
m = nltk.Model(dom, val)
m.evaluate("see(olive, y)", g)

True

In the cell above, we are evaluating a formula similar to `see(olive, cyril)`.

However, when the interpretation function encounters the variable `y`, rather than checking for a value in `val`, it asks the variable assignment `g` to come up with a value:

In [67]:
g['y']

'c'

Checking the inverse relation:

In [68]:
m.evaluate("see(y, x)", g)

False

The method `purge()` clears all bindings from an assignment:

In [70]:
g.purge()
g

{}

If we now try to evaluate the formula `see(olive, y)`, it is like trying to interpret a sentence containing *him* when we don't know what *him* refers to. In this case, the evaluation is undefined:

In [71]:
m.evaluate("see(olive, y)", g)

'Undefined'
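
The lookup order described above (valuation for constants, assignment for variables) can be sketched in plain Python. This is a simplified picture, not NLTK's `Model` implementation. The names `valuation`, `denote`, `evaluate_atom` and `g_demo` are illustrative, and only the parts of the valuation needed for `see` are reproduced.

```python
# Part of the valuation from the cells above, as plain Python data.
valuation = {
    "bertie": "b", "olive": "o", "cyril": "c",
    "see": {("b", "o"), ("c", "b"), ("o", "c")},
}

def denote(term, assignment):
    """Look a term up in the valuation first, then in the assignment."""
    if term in valuation:
        return valuation[term]
    return assignment.get(term)  # None if the variable is unbound

def evaluate_atom(predicate, terms, assignment):
    """Check whether the tuple of denotations is in the predicate's extension."""
    values = tuple(denote(t, assignment) for t in terms)
    if None in values:
        return "Undefined"
    return values in valuation[predicate]

g_demo = {"x": "o", "y": "c"}
print(evaluate_atom("see", ["olive", "y"], g_demo))
print(evaluate_atom("see", ["y", "x"], g_demo))
print(evaluate_atom("see", ["olive", "y"], {}))   # empty assignment, as after purge()
```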

Since our models already contain rules for interpreting boolean operators, arbitrarily complex formulas can be composed and evaluated.

In [73]:
m.evaluate('see(bertie, olive) & boy(bertie) & -walk(bertie)', g)

True

The general process of determining truth or falsity of a formula in a model is called model checking.

### Quantification

Consider the following:

In [74]:
m.evaluate("exists x.(girl(x) & walk(x))", g)

True

`evaluate()` returns True here because there is some u in `dom` such that the formula is satisfied by an assignment which binds x to u. In fact, o is such a u:

In [75]:
m.evaluate('girl(x) & walk(x)', g.add('x', 'o'))

True

One useful tool offered by NLTK is the `satisfiers()` method. This returns a set of all the individuals that satisfy an open formula. 

In [76]:
fmla1 = read_expr("girl(x) | boy(x)")
m.satisfiers(fmla1, "x", g)

{'b', 'o'}

In [77]:
fmla2 = read_expr("girl(x) -> walk(x)")
m.satisfiers(fmla2, "x", g)

{'b', 'c', 'o'}
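
Conceptually, computing satisfiers amounts to filtering the domain, and the quantifiers can be read off the resulting set. The sketch below uses plain sets and Python predicates in place of parsed formulas. The function `satisfiers` here is a stand-in written for illustration, not NLTK's method.

```python
dom_demo = {"b", "o", "c"}
girl, boy, walk = {"o"}, {"b"}, {"o", "c"}

def satisfiers(open_formula, domain):
    """Return the individuals u for which open_formula(u) is true."""
    return {u for u in domain if open_formula(u)}

girl_or_boy = satisfiers(lambda u: u in girl or u in boy, dom_demo)
girl_implies_walk = satisfiers(lambda u: (u not in girl) or u in walk, dom_demo)

# exists x.(girl(x) & walk(x)) is true iff at least one individual satisfies the body.
exists_walking_girl = bool(satisfiers(lambda u: u in girl and u in walk, dom_demo))
# all x.(girl(x) -> walk(x)) is true iff every individual satisfies the body.
all_girls_walk = girl_implies_walk == dom_demo

print(sorted(girl_or_boy), sorted(girl_implies_walk), exists_walking_girl, all_girls_walk)
```

Note that `b` and `c` satisfy `girl(x) -> walk(x)` only because the antecedent is false for them, which is why all three individuals appear in the second result.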

The rest of this chapter is text- and theory-heavy. It focuses on using first-order logic to represent the semantics of English sentences, and on discourse processing.

Reproducing all of the textual explanations in this notebook would be redundant, since the code blocks correspond directly to them. It is therefore recommended to read the chapter in its entirety in the book.

Further reading: https://www.nltk.org/book/ch10.html#ex-sem8