# NLTK Chapter 10

## Analyzing the Meaning of Sentences

*The HTML version of this chapter of the book is available [here](https://www.nltk.org/book/ch10.html "ch10").*

### 1 Natural Language Understanding

#### 1.1   Querying a Database

The grammar used below lets us convert a natural language query into SQL, though only in very limited situations.  Unfortunately, the grammar no longer seems to be in the `book_grammars` folder, so I loaded it directly from GitHub, which took a bit of tinkering to get working:

In [27]:
import nltk
from nltk import load_parser

# Load the feature grammar straight from GitHub, since it was missing locally.
cp = load_parser('https://raw.githubusercontent.com/nltk/nltk_teach/master/examples/grammars/book_grammars/sql0.fcfg')

query = 'What cities are located in China'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)

SELECT City FROM city_table WHERE Country="china"
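
The last three lines of the cell above are plain string handling: the `SEM` value behaves like a sequence of SQL fragments, some of them empty strings contributed by words such as *are* and *in*. A minimal sketch of that step, using a hand-written tuple as a stand-in (the exact contents of the tuple are an assumption inferred from the grammar rules in the trace, not captured parser output):

```python
# Hypothetical stand-in for trees[0].label()['SEM']; the real value comes
# from the parser and may differ in detail.
sem_fragments = ('SELECT', 'City FROM city_table', 'WHERE', '', '', 'Country="china"')

# Drop the empty fragments contributed by semantically empty words.
non_empty = [s for s in sem_fragments if s]

# Join what is left into a single SQL string.
sql = ' '.join(non_empty)
print(sql)
```

Filtering before joining is what keeps the result from containing runs of extra spaces where the empty fragments were.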


__Your Turn:__ Run the parser with maximum tracing on, i.e., `cp = load_parser('grammars/book_grammars/sql0.fcfg', trace=3)`, and examine how the values of `sem` are built up as complete edges are added to the chart.

In [28]:
cp = nltk.parse.util.load_parser('https://raw.githubusercontent.com/nltk/nltk_teach/master/examples/grammars/book_grammars/sql0.fcfg', 
                                 trace = 3)

query = 'What cities are located in China'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)

|.W.c.a.l.i.C.|
Leaf Init Rule:
|[-] . . . . .| [0:1] 'What'
|. [-] . . . .| [1:2] 'cities'
|. . [-] . . .| [2:3] 'are'
|. . . [-] . .| [3:4] 'located'
|. . . . [-] .| [4:5] 'in'
|. . . . . [-]| [5:6] 'China'
Feature Bottom Up Predict Combine Rule:
|[-] . . . . .| [0:1] Det[SEM='SELECT'] -> 'What' *
Feature Bottom Up Predict Combine Rule:
|[-> . . . . .| [0:1] NP[SEM=(?det+?n)] -> Det[SEM=?det] * N[SEM=?n] {?det: 'SELECT'}
Feature Bottom Up Predict Combine Rule:
|. [-] . . . .| [1:2] N[SEM='City FROM city_table'] -> 'cities' *
Feature Single Edge Fundamental Rule:
|[---] . . . .| [0:2] NP[SEM=(SELECT, City FROM city_table)] -> Det[SEM='SELECT'] N[SEM='City FROM city_table'] *
Feature Bottom Up Predict Combine Rule:
|[---> . . . .| [0:2] S[SEM=(?np+WHERE+?vp)] -> NP[SEM=?np] * VP[SEM=?vp] {?np: (SELECT, City FROM city_table)}
Feature Bottom Up Predict Combine Rule:
|. . [-] . . .| [2:3] IV[SEM=''] -> 'are' *
Feature Bottom Up Predict Combine Rule:
|. . [-> . . .| [2:3] VP[SEM=(?v+?pp)] 

Getting results:

In [31]:
from nltk.sem import chat80
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows: print(r[0], end = " ")

canton chungking dairen harbin kowloon mukden peking shanghai sian tientsin 

__Your Turn__: Extend the grammar `sql0.fcfg` so that it will translate [(4a)](https://www.nltk.org/book/ch10.html#ex-dbq21) into [(4b)](https://www.nltk.org/book/ch10.html#ex-dbq22), and check the values returned by the query.

You will probably find it easiest to first extend the grammar to handle queries like *What cities have populations above 1,000,000* before tackling conjunction. After you have had a go at this task, you can compare your solution to `grammars/book_grammars/sql1.fcfg` in the NLTK data distribution.

*Usually the __Your Turn__ exercises in this book are fairly trivial, but not this one.  My implementation didn't work until I peeked at the solution to find out what I was doing wrong.*

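*First, the code to build an SQL query from a natural language query asking for cities with more than 1,000,000 inhabitants.  Note that the grammar maps* 1,000,000 *to `1000`, which matches the threshold `Population > 1000` in the book's target query (4b):*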

In [46]:
from nltk import grammar, parse

g = """
% start S

S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]

VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
VP[SEM=(?v + ?np)] -> TV[SEM=?v] NP[SEM=?np]

NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
NP[SEM=(?n + ?pp)] -> N[SEM = ?n] PP[SEM=?pp]
NP[SEM=?n] -> N[SEM=?n] | Num[SEM=?n]

Num[SEM='1000'] -> '1,000,000'

PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]

NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'

Det[SEM='SELECT'] -> 'Which' | 'What'

N[SEM='City FROM city_table'] -> 'cities'
N[SEM='Population'] -> 'populations'

IV[SEM=''] -> 'are'
TV[SEM=''] -> 'have'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in' 
P[SEM='>'] -> 'above'

"""

gram = grammar.FeatureGrammar.fromstring(g)
cp = parse.FeatureEarleyChartParser(gram)

query = 'What cities have populations above 1,000,000'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)

SELECT City FROM city_table WHERE Population > 1000


*Answer to first query:*

In [47]:
from nltk.sem import chat80
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows: print(r[0], end = " ")

athens bangkok barcelona berlin birmingham bombay bucharest budapest buenos_aires cairo calcutta canton chicago chungking delhi detroit glasgow hamburg hongkong_city hyderabad istanbul karachi kyoto leningrad london los_angeles madras madrid manila melbourne mexico_city milan montreal moscow mukden nagoya nanking naples new_york osaka paris peking philadelphia rio_de_janeiro rome santiago sao_paulo seoul shanghai singapore_city sydney tehran tientsin tokyo vienna yokohama 
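
The threshold `1000` only makes sense if the `Population` column is stored in thousands, which is what the grammar's `Num` rule appears to assume. A small sketch of that reading, run against an in-memory SQLite table rather than the real `city.db` (the table layout follows the column names used in the queries above; the rows and figures are invented for illustration and are not Chat-80 data):

```python
import sqlite3

# Toy stand-in for city.db: the rows and population figures below are
# invented for illustration and are NOT taken from the Chat-80 data.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE city_table (City TEXT, Country TEXT, Population INTEGER)')
conn.executemany(
    'INSERT INTO city_table VALUES (?, ?, ?)',
    [
        ('big_town', 'examplia', 2500),   # 2,500 thousand = 2.5 million
        ('mid_town', 'examplia', 1000),   # exactly one million, so not "above"
        ('small_town', 'examplia', 40),   # 40 thousand
    ],
)

# Same shape as the query the grammar produced above.
q_toy = 'SELECT City FROM city_table WHERE Population > 1000'
cities = [row[0] for row in conn.execute(q_toy)]
print(cities)
conn.close()
```

Only the city whose stored value exceeds 1000, that is, more than one million inhabitants under the thousands assumption, comes back.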

In [48]:
from nltk import grammar, parse

g = """
% start S

S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]

VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
VP[SEM=(?v + ?np)] -> TV[SEM=?v] NP[SEM=?np]
VP[SEM=(?v1 + ?cc + ?v2)] -> VP[SEM=?v1] CC[SEM=?cc] VP[SEM=?v2]

NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
NP[SEM=(?n + ?pp)] -> N[SEM = ?n] PP[SEM=?pp]
NP[SEM=?n] -> N[SEM=?n] | Num[SEM=?n]

Num[SEM='1000'] -> '1,000,000'

PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]

NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'

Det[SEM='SELECT'] -> 'Which' | 'What'

N[SEM='City FROM city_table'] -> 'cities'
N[SEM='Population'] -> 'populations'

IV[SEM=''] -> 'are'
TV[SEM=''] -> 'have'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in' 
P[SEM='>'] -> 'above'
CC[SEM='AND'] -> 'and'
"""

gram = grammar.FeatureGrammar.fromstring(g)
cp = parse.FeatureEarleyChartParser(gram)

query = 'What cities are in China and have populations above 1,000,000'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)

SELECT City FROM city_table WHERE Country="china" AND Population > 1000


*Answer to the second query.  Bear in mind that this data is nearly 40 years old; the answers would be very different today:*

In [49]:
from nltk.sem import chat80
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows: print(r[0], end = " ")

canton chungking mukden peking shanghai tientsin 

#### 1.2   Natural Language, Semantics and Logic

*__No notes.__*

### 2   Propositional Logic

Boolean operators in NLTK:

In [50]:
nltk.boolean_ops()

negation       	-
conjunction    	&
disjunction    	|
implication    	->
equivalence    	<->


*__Table 2.1:__*

Truth conditions for the Boolean Operators in Propositional Logic.


|Boolean Operator|Truth Conditions| | |
|:---|:---|:---:|:---|
|negation (*it is not the case that…*)|-φ is true in $s$|iff|φ is false in $s$|
|conjunction (*and*)|(φ & ψ) is true in $s$|iff|φ is true in $s$ and ψ is true in $s$|
|disjunction (*or*)|(φ \| ψ) is true in $s$|iff|φ is true in $s$ or ψ is true in $s$|
|implication (*if ..., then …*)|(φ -> ψ) is true in $s$|iff|φ is false in $s$ or ψ is true in $s$|
|equivalence (*if and only if*)|(φ <-> ψ) is true in $s$|iff|φ and ψ are both true in $s$ or both false in $s$|


Most of these rules are straightforward, with the exception of *implication*.  An implication of the form (`P -> Q`) is only false when `P` is true and `Q` is false.  Thus, a formula where `P` corresponds to 'Elvis is still alive and is working at a Wal-Mart in Arkansas' and `Q` corresponds to 'Thursday comes after Wednesday' would come out true.
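
The truth conditions for implication can be checked by enumerating all four combinations of truth values. This is a plain-Python sketch, not an NLTK call; the helper name `implies` is made up for the illustration:

```python
from itertools import product

def implies(p, q):
    # Material implication: false only when p is true and q is false.
    return (not p) or q

# Every combination of truth values for P and Q, with the value of P -> Q.
rows = [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

for p, q, result in rows:
    print(f'{p!s:<6}{q!s:<6}{result}')
```

The Elvis example corresponds to the row where `P` is false and `Q` is true, which comes out true.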

NLTK's `Expression` class can parse logical expressions into objects whose type reflects the main operator:

In [51]:
read_expr = nltk.sem.Expression.fromstring
read_expr('-(P & Q)')

<NegatedExpression -(P & Q)>

In [52]:
read_expr('P & Q')

<AndExpression (P & Q)>

In [53]:
read_expr('P | (R -> Q)')

<OrExpression (P | (R -> Q))>

In [54]:
read_expr('P <-> -- P')

<IffExpression (P <-> --P)>

NLTK can hand theorem proving off to a third-party application called Prover9, but Prover9 has to be downloaded separately and I can't seem to find anyone who's currently hosting the file, so I'm skipping the next section.
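
For purely propositional formulas, validity can still be checked without a prover by trying every assignment of truth values. The sketch below is a brute-force stand-in, not Prover9 and not an NLTK API; the function names are hypothetical, and the approach does not extend to first-order logic:

```python
from itertools import product

def is_valid(formula, symbols):
    """Return True if formula holds under every assignment of truth values."""
    return all(
        formula(**dict(zip(symbols, values)))
        for values in product([True, False], repeat=len(symbols))
    )

def double_negation(P):
    # P <-> --P
    return P == (not (not P))

def modus_ponens(P, Q):
    # ((P -> Q) & P) -> Q
    premises = ((not P) or Q) and P
    return (not premises) or Q

def bare_implication(P, Q):
    # P -> Q on its own: true in some situations, but not in all of them
    return (not P) or Q

results = {
    'P <-> --P': is_valid(double_negation, ['P']),
    '((P -> Q) & P) -> Q': is_valid(modus_ponens, ['P', 'Q']),
    'P -> Q': is_valid(bare_implication, ['P', 'Q']),
}
for text, valid in results.items():
    print(text, valid)
```

The number of assignments doubles with each added symbol, so this only works for small formulas.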



Paused at "Recall that we interpret sentences of a logical language relative to a model, which is a very simplified version of the world...."