# 10.1 - Natural Language Understanding

## Querying a Database
Suppose we have a database of cities and the countries they belong to, e.g.

In [6]:
import pandas as pd
city_table = [{"City": "Athens", "Country":"Greece"},
              {"City": "Bangkok", "Country":"Thailand"},
              {"City": "Barcelona", "Country":"Spain"}]
pd.DataFrame(city_table)

|   | City | Country |
|---|---|---|
| 0 | Athens | Greece |
| 1 | Bangkok | Thailand |
| 2 | Barcelona | Spain |


Then when a question like "*What country is Athens in?*" is asked, the answer is "*Greece*". One way to answer such a question is to query the database with SQL. For example, the question can be translated to:

`SELECT country FROM city_table WHERE city = 'Athens'; `
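As a minimal sketch of how that query could be run, the snippet below loads the three example rows into an in-memory SQLite database and executes the SQL statement. The use of `sqlite3` and the in-memory database are assumptions made for illustration; the original does not say which database engine holds `city_table`.

```python
import sqlite3

# Rows copied from the example table above.
city_table = [{"City": "Athens", "Country": "Greece"},
              {"City": "Bangkok", "Country": "Thailand"},
              {"City": "Barcelona", "Country": "Spain"}]

# An in-memory SQLite database stands in for a real one (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_table (City TEXT, Country TEXT)")
conn.executemany("INSERT INTO city_table VALUES (:City, :Country)", city_table)

# SQLite column names are case-insensitive, so `country` matches `Country`.
rows = conn.execute(
    "SELECT country FROM city_table WHERE city = 'Athens';"
).fetchall()
countries = [row[0] for row in rows]
print(countries)  # ['Greece']
```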

However, in order to address a task in a more general domain, we will need a whole new set of tools.

How do we get the same effect using English as our input? The feature-based grammar formalism introduced in Chapter 9 makes it possible to assemble a meaning representation for a sentence in tandem with parsing it. In the grammar below, the meaning representation is an SQL query.

In [2]:
import nltk
nltk.data.show_cfg('grammars/book_grammars/sql0.fcfg')

% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'


This grammar lets us translate an English query into SQL. Two examples follow:

In [7]:
from nltk import load_parser
cp = load_parser('grammars/book_grammars/sql0.fcfg')
query = 'What cities are located in China'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)

SELECT City FROM city_table WHERE Country="china"


In [9]:
query2 = 'What cities are located in Greece'
trees2 = list(cp.parse(query2.split()))
answer2 = trees2[0].label()['SEM']
answer2 = [s for s in answer2 if s]
q2 = ' '.join(answer2)
print(q2)

SELECT City FROM city_table WHERE Country="greece"
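To see where the SQL string comes from, the grammar rules can be applied by hand. The sketch below is a hand simulation, not NLTK's implementation: plain Python tuples stand in for the `SEM` feature values, and tuple concatenation stands in for the `+` in rules such as `S[SEM=(?np + WHERE + ?vp)]`. The variable names are chosen for illustration only.

```python
# Lexical entries from sql0.fcfg, each SEM value wrapped in a 1-tuple.
det = ('SELECT',)                 # 'What' / 'Which'
n = ('City FROM city_table',)     # 'cities'
iv = ('',)                        # 'are'
a = ('',)                         # 'located' (its SEM is ignored by the AP rule)
p = ('',)                         # 'in'
np_china = ('Country="china"',)   # 'China'

# Phrasal rules, applied bottom-up in the order the parse tree dictates.
np_subj = det + n                 # NP -> Det N
pp = p + np_china                 # PP -> P NP
ap = pp                           # AP -> A PP   (SEM=?pp)
vp = iv + ap                      # VP -> IV AP
s = np_subj + ('WHERE',) + vp     # S  -> NP VP

print(s)
# Drop the empty fragments and join the rest, as the notebook cells do.
sql = ' '.join(fragment for fragment in s if fragment)
print(sql)  # SELECT City FROM city_table WHERE Country="china"
```

The empty strings contributed by `are` and `in` explain why the notebook filters the `SEM` value with `[s for s in answer if s]` before joining.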


By running the generated SQL against the city database, we can now return useful data in response to a query in natural language.

In [5]:
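from nltk.sem import chat80
# Uncomment on first use to fetch the database:
# nltk.download('city_database')
# q is the SQL string generated above for 'What cities are located in China'.
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows:
    print(r[0])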

canton
chungking
dairen
harbin
kowloon
mukden
peking
shanghai
sian
tientsin


There are two criticisms of this approach.

* The grammar is heavily hard-wired. The production `N[SEM='City FROM city_table'] -> 'cities'` requires us to know the table name in advance, even though the same data could be stored in a different table.
* The words `Which` and `cities` correspond to the SQL fragments `SELECT` and `City FROM city_table` respectively, but neither fragment has a well-defined meaning in isolation from the other.

Hence, we need an approach more abstract and generic than SQL.

## Natural Language, Semantics and Logic

Let's introduce two fundamental notions in semantics. The first is that **declarative sentences are *true* or *false* in certain situations**. The second is that **definite noun phrases and proper nouns refer to things in the world**.

Once we have adopted the notion of truth in a situation, we have a powerful tool for reasoning. We can look at sets of sentences and ask if they can be true together in some situation. 

**Example 1**

* Singapore is to the south of Malaysia
* Singapore is a republic

Both sentences in **Example 1** can be true together; such sentences are called **consistent**.

**Example 2**

* Singapore is to the south of Malaysia
* Singapore is to the north of Malaysia

**Example 3**
* The population of Kuala Lumpur is 1.6 million.
* No city in Malaysia has a population of 1.6 million.

**Example 4**
* The population of Singapore is 5.4 million.
* The population of Singapore is 3.0 million.

The two sentences in **Example 2** cannot both be true, because nothing can lie both to the south and to the north of the same place: the relations *to the south of* and *to the north of* exclude each other. Similarly, Kuala Lumpur, as the capital of Malaysia, is a city within that country, so the sentences in **Example 3** contradict each other. A numeric property such as population can take only one value at a time, so the sentences in **Example 4** cannot both be true either. Such sets of sentences are called **inconsistent**.
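The judgements for Example 1 and Example 2 can be mimicked with a brute-force check: a set of sentences is consistent if at least one assignment of truth values makes all of them true while respecting background knowledge. The sketch below is a deliberately simplified propositional version; the atom names (`south`, `north`, `republic`) and the `consistent` helper are hypothetical and chosen only for illustration.

```python
from itertools import product

def consistent(sentences, background, atoms):
    """Return True if some truth assignment satisfies every sentence and constraint."""
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(check(world) for check in sentences + background):
            return True
    return False

# south:    "Singapore is to the south of Malaysia"
# north:    "Singapore is to the north of Malaysia"
# republic: "Singapore is a republic"
atoms = ["south", "north", "republic"]

# Background knowledge: Singapore cannot be both south and north of Malaysia.
background = [lambda w: not (w["south"] and w["north"])]

example_1 = [lambda w: w["south"], lambda w: w["republic"]]
example_2 = [lambda w: w["south"], lambda w: w["north"]]

print(consistent(example_1, background, atoms))  # True
print(consistent(example_2, background, atoms))  # False
```

Enumerating every assignment only works for a handful of atoms, but it makes the idea concrete: consistency means that a satisfying situation exists.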

Logic-based approaches to natural language semantics focus on those aspects of natural language that guide our judgements of consistency and inconsistency. Determining whether a set of sentences is consistent can be reduced to a symbolic task that a computer can carry out. To do so, we first need a technique for representing a situation in which sentences are true, which logicians call a **model**.

A **model** for a set $W$ of sentences is a formal representation of a situation in which all the sentences in $W$ are true.

The usual way of representing models involves set theory. The domain $D$ of discourse (all the entities we currently care about) is a set of individuals, while relations are treated as sets built up from $D$.

For example, suppose the domain $D$ consists of three children: Stefan ($s$), Klaus ($k$) and Evi ($e$). This is written as

$$
D = \left\{s, k, e\right\}
$$

Expressions are then interpreted as sets built from $D$:

* The expression *boy* denotes the set consisting of Stefan and Klaus. 
* The expression *girl* denotes the set consisting of Evi.
* The expression *is running* denotes the set consisting of Stefan and Evi.

These three sets and their members can be drawn as a Venn diagram inside the universal set $D$.
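The same model can be written down directly with Python sets, which makes it possible to check simple sentences against it. This is a minimal sketch: the variable names and the three test sentences are illustrative assumptions, and quantified sentences are evaluated with ordinary set operations rather than with a logic library.

```python
# Domain of discourse: Stefan (s), Klaus (k) and Evi (e).
D = {"s", "k", "e"}

# Each one-place predicate denotes a subset of D.
boy = {"s", "k"}
girl = {"e"}
is_running = {"s", "e"}

# Every denotation must be built from individuals in D.
assert boy <= D and girl <= D and is_running <= D

some_boy_runs = bool(boy & is_running)   # "Some boy is running"
every_boy_runs = boy <= is_running       # "Every boy is running"
every_girl_runs = girl <= is_running     # "Every girl is running"

print(some_boy_runs, every_boy_runs, every_girl_runs)  # True False True
```

"Every boy is running" is false in this model because Klaus is a boy but is not in the set denoted by *is running*.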

Can a computer understand the meaning of a sentence? And how could we tell if it did? In other words, can a computer think? *Alan Turing* famously proposed to answer this by examining the ability of a computer to hold sensible conversations with a human.

Suppose you are having a chat session with a partner who is either a person or a computer, but you are not told at the outset which it is. If, after chatting, you cannot tell whether your partner is a computer, then the computer has successfully imitated a human, or has passed the "Turing Test". On this view, if a computer can pass the Turing Test then we should be prepared to say that it *can think* and can be said to be intelligent.