Code Preparations
===

In [2]:
import nltk
from nltk import grammar, parse
from nltk.parse.generate import generate
from platform import python_version
python_version()

'3.7.16'

In [3]:
print(nltk.__version__)

3.5


In [4]:
nltk.data.show_cfg('h2.fcfg')

% start S
# Grammar Rules
S[SEM=<?subj(?vp)>] -> DP[SEM=?subj] VP[SEM=?vp]
S[SEM=<?vp(?subj)>] -> NP[SEM=?subj] VP[SEM=?vp]
S[SEM=<?vp1(?subj) & ?vp2(?subj)>] -> NP[SEM=?subj] VP[SEM=?vp1] 'and' VP[SEM=?vp2]
DP[ SEM=<?X(?P)>] -> Det[SEM=?X] N[SEM=?P] 
VP[SEM=?Q] -> 'is' A[SEM=?Q]
VP[SEM=?P] -> 'is' 'a' N[SEM=?P]
# This is included for testing.
VP[SEM=<\x.offend(x)>] -> 'offends'
# Transitive verb with individual object.
VP[ SEM=<?R(?n)>] -> TV[SEM=?R] NP[SEM=?n]
# Transitive verb with quantifier object.
# The object is given minimal scope.
VP[ SEM=<\m.?X(\n.(?R(n)(m)))>] -> TV[SEM=?R] DP[SEM=?X]
# Lexical Rules
A[SEM=<\n.exists c.(vowel(c) & char(n,c))>] -> 'vocalic'
A[SEM=<\n.exists c.((-vowel(c)) & char(n,c))>] -> 'consonantal'
A[SEM=<\n.exists c.(capital(n) & char(n,c))>] -> 'capitalized'
A[SEM=<\n. (n = 1) >] -> 'initial'
A[SEM=<\n. all m.-le(n,m)  >] -> 'final'
Det[SEM=<\P Q.all n.(P(n) -> Q(n))>] -> 'every'
Det[SEM=<\P Q.exists n.(P(n) & Q(n))>] -> 'a'
Det[SEM=<\P Q.exists n.(P(n

In [5]:
cp = nltk.load_parser('h2.fcfg', trace=0)
sent = 'letter two is vocalic'.split()
for tree in cp.parse(sent): print(tree)

(S[SEM=<exists c.(vowel(c) & char(2,c))>]
  (NP[SEM=<2>] letter two)
  (VP[SEM=<\n.exists c.(vowel(c) & char(n,c))>]
    is
    (A[SEM=<\n.exists c.(vowel(c) & char(n,c))>] vocalic)))


In [6]:
tree.draw()

![1.png](1.png)

Next, use Angela Liu's implementation for mapping a word to a valuation.

In [7]:
from typing import Callable, List, Set

def to_model_str(word: str, special_rels: List[Callable[[str], str]]=[]) -> str:
    """
    Creates the string form of the model for the input word. This string is meant to be passed to `nltk.Valuation.fromstring`.
    By default, the function will only add the relations mapping i => i for i from 1 to the length of `word` and a relation 
    mapping char => the set of tuples (i, word[i]). The `special_rels` function allows you to specify additional relations to 
    be added to the valuation string.
    
    :param word: The word to create a model string for.
    :param special_rels: A list of functions that when called return a string of the form {relation_name} => {relation_contents}. Defaults to the empty list.
    :returns: a string representing the model for word
    """
    n = len(word)
    model_str = []
    char = []
    for i in range(1, n+1):
        model_str.append(f'{i} => {i}')
        char.append((i, word[i-1]))
    model_str.append(f'char => {set(char)}'.lower())
    return '\n'.join(model_str + [rel(word) for rel in special_rels]).replace("'", "")
# Angela Liu
import re


get_vowel = lambda w: f'vowel => {set(re.findall(r"[AEIOUaeiou]", w))}'.lower()
get_cons = lambda w: 'cons => {}'.format(set(re.findall(r"[^AEIOUaeiou\W0-9]", w))).lower()
follows = lambda w: f'le => {set([(i+1,j+1) for i in range(len(w)) for j in range(i, len(w)) if i != j])}'
get_capital = lambda w: f'capital => {set([m.span()[0] + 1 for m in re.finditer(r"[A-Z]", w)])}'
# Angela Liu

get_glide = lambda w: f'glide => {set(re.findall(r"[YWyw]", w))}'.lower()
# is_final = lambda w: f'final => {set((len(w),))}'
# added

def emptysets(val:nltk.sem.evaluate.Valuation):
  val.update([(k,set()) for (k,v) in val.items() if v == 'set()'])

words = ['cat', 'mAtch', 'peRiLOuSy']
vals = [nltk.Valuation.fromstring(to_model_str(w, [get_vowel, get_cons, follows, get_capital,get_glide])) for w in words]
for v in vals: emptysets(v)
models = [nltk.Model(val.domain, val) for val in vals]
for w, m in zip(words, models):
    print(f'{w}\n----------------\n{m}\n')
# Angela Liu

cat
----------------
Domain = {'a', '2', '3', '1', 'c', 't'},
Valuation = 
{'1': '1',
 '2': '2',
 '3': '3',
 'capital': set(),
 'char': {('3', 't'), ('1', 'c'), ('2', 'a')},
 'cons': {('t',), ('c',)},
 'glide': set(),
 'le': {('1', '3'), ('1', '2'), ('2', '3')},
 'vowel': {('a',)}}

mAtch
----------------
Domain = {'4', '5', 'a', '2', '3', '1', 'm', 'h', 'c', 't'},
Valuation = 
{'1': '1',
 '2': '2',
 '3': '3',
 '4': '4',
 '5': '5',
 'capital': {('2',)},
 'char': {('2', 'a'), ('5', 'h'), ('4', 'c'), ('3', 't'), ('1', 'm')},
 'cons': {('h',), ('t',), ('c',), ('m',)},
 'glide': set(),
 'le': {('1', '2'),
        ('1', '3'),
        ('1', '4'),
        ('1', '5'),
        ('2', '3'),
        ('2', '4'),
        ('2', '5'),
        ('3', '4'),
        ('3', '5'),
        ('4', '5')},
 'vowel': {('a',)}}

peRiLOuSy
----------------
Domain = {'u', '4', '6', 'r', '5', '2', 'o', '3', '1', 's', 'i', 'y', '9', 'l', '7', 'e', '8', 'p'},
Valuation = 
{'1': '1',
 '2': '2',
 '3': '3',
 '4': '4',
 '5'

Start of the work:
===

```
Semantics of sentences about strings
Computational Linguistics Spring 2023
Problem Set 2
```

The text for this module is the NLTK book
Chapter 9. Building Feature Based Grammars and
Chapter 10. Analyzing the Meaning of Sentences

See also lecture8_2023.ipynb and string_2023.ipynb

The purpose of the assignment is to develop feature-based grammars that include
logical semantics, and to evaluate the adequacy of the semantics by computing
truth in logically constructed models.  For instance, we want to be able to
evaluate whether the sentence

   every consonant is capitalized

is true or false as a description of the word

   CINEMA

or

   Cinema

or

   CINEmA.
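
As a rough illustration of the target truth values for this example, the check below works directly on the strings, without any grammar or NLTK model. It is only a sketch: the helper name `every_consonant_is_capitalized` is hypothetical, and it assumes that a consonant is any letter other than a, e, i, o, u.

```python
VOWELS = set("aeiou")  # assumption: every other letter counts as a consonant


def every_consonant_is_capitalized(word: str) -> bool:
    """True iff every non-vowel letter of `word` is upper case."""
    return all(
        ch.isupper()
        for ch in word
        if ch.isalpha() and ch.lower() not in VOWELS
    )


# Intuitive truth values of the sentence as a description of each word.
targets = {w: every_consonant_is_capitalized(w) for w in ["CINEMA", "Cinema", "CINEmA"]}
print(targets)
# {'CINEMA': True, 'Cinema': False, 'CINEmA': False}
```

These are the values that the logical formula extracted from the parse is expected to reproduce when evaluated in the corresponding models.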

In each problem n do these steps. The problem statement gives sentence sn. See
Chapters 9 and 10 for the methodology.

(i) Define a feature-based grammar gn that includes all the words in sentence sn
in its lexicon.  The feature grammars will usually add a word and/or construction
to a base grammar which will be similar to simple-sem.fcfg.  This base grammar
will be distributed. (It will be helpful to figure out how to add a lexical item
or production to a grammar in Python. Discuss the method for this on the forum.
Or you can define the grammar from scratch.)

(ii) Parse the sentence and display the tree.

(iii) Map sn to a logical formula fn by parsing with gn and extracting the
semantics that annotates the root.

(iv) Define a combination of four words (serving as models) and the intuitive
truth values of sentence sn as a description of each word.

(v) Transform the four words into four models or valuations in the sense of
Chapter 10. This can be done as in Lecture 8, or by using a function.
Code for this may be shared and discussed on the forum.

(vi) Evaluate formula fn in the four models to obtain four truth values. Compare them
to the target truth values.
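
For step (i), one possible way to add a lexical item in Python is to treat the grammar as text and append productions to it. The sketch below assumes this text-based approach; the helper `add_rules` and the cut-down `BASE` grammar are hypothetical, with rules copied from the grammar listing above. The extended text could then be passed to `nltk.grammar.FeatureGrammar.fromstring` and wrapped in `nltk.parse.FeatureChartParser`, which is not run here.

```python
from typing import List

# A cut-down base grammar; the rules are copied from the h2.fcfg listing.
BASE = r"""% start S
S[SEM=<?vp(?subj)>] -> NP[SEM=?subj] VP[SEM=?vp]
VP[SEM=?Q] -> 'is' A[SEM=?Q]
A[SEM=<\n.exists c.(vowel(c) & char(n,c))>] -> 'vocalic'
"""


def add_rules(base_grammar: str, new_rules: List[str]) -> str:
    """Return the grammar text with extra productions appended.

    Rules that already occur verbatim in the base grammar are skipped.
    """
    existing = {line.strip() for line in base_grammar.splitlines()}
    extra = [r.strip() for r in new_rules if r.strip() not in existing]
    return base_grammar.rstrip("\n") + "\n" + "\n".join(extra) + "\n"


extended = add_rules(BASE, [r"A[SEM=<\n.(n = 1)>] -> 'initial'"])
print(extended)
```

Keeping the grammar as a string avoids editing the `.fcfg` file for every problem, at the cost of no syntax checking until the text is parsed.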

Work individually, except that code for mapping a word to a valuation may be
shared.  Post technical questions and requests for hints on the forum.

Notes
The problems are selected so that quantifiers are used only in subject position.
So it is not necessary (except perhaps in challenge problems) to apply the strategy
from simple-sem.fcfg to fit quantified NPs into object positions.

The words 'precedes' and 'follows' are interpreted in the sense of 'not necessarily
immediately'.


Problems
=======

1. letter two is consonantal   
Base your analysis on 'letter two is vocalic', which is covered by the base grammar.


I added a grammar rule 
```
A[SEM=<\n.exists c.((-vowel(c)) & char(n,c))>] -> 'consonantal'
```

parse and display

In [8]:
g1 = nltk.load_parser('h2.fcfg', trace=0)
s1 = 'letter two is consonantal'
s1_split = s1.split()
for tree in g1.parse(s1_split): print(tree)

(S[SEM=<exists c.(-vowel(c) & char(2,c))>]
  (NP[SEM=<2>] letter two)
  (VP[SEM=<\n.exists c.(-vowel(c) & char(n,c))>]
    is
    (A[SEM=<\n.exists c.(-vowel(c) & char(n,c))>] consonantal)))


In [9]:
tree.draw()

![2](2.png)

now the logical formula

In [10]:
t1=next(g1.parse(s1_split))
f1 = t1.label()['SEM']
print(f1)

exists c.(-vowel(c) & char(2,c))


define four sample words with their target truth values

In [11]:
e1 = [('emu',True),('bat',False),('cat',False),('aka',True)]

In [12]:
words = [e[0] for e in e1]
truths = [e[1] for e in e1]
vals = [nltk.Valuation.fromstring(to_model_str(w, [get_vowel, follows])) for w in words]
assignments = [nltk.Assignment(val.domain) for val in vals]
for val in vals: emptysets(val)
models = [nltk.Model(val.domain, val) for val in vals]

In [13]:
print(f'{s1}\n---------------')
for w, a, m in zip(words, assignments, models):
    print(f'{w}\n{m.evaluate(str(f1),a)}\n----------------')

letter two is consonantal
---------------
emu
True
----------------
bat
False
----------------
cat
False
----------------
aka
True
----------------


```diff
+ The answer is correct! In emu and aka, the second letter is a consonant.
```
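
The comparison with the target truth values in step (vi) can also be done programmatically rather than by eye. This is a minimal sketch: `compare` is a hypothetical helper, and the `computed` list is copied by hand from the output above. In the notebook it could instead be built as `[m.evaluate(str(f1), a) for a, m in zip(assignments, models)]`.

```python
from typing import List, Tuple


def compare(expected: List[Tuple[str, bool]], computed: List[bool]) -> List[str]:
    """Return the words whose computed truth value differs from the target."""
    return [w for (w, target), value in zip(expected, computed) if value != target]


e1 = [('emu', True), ('bat', False), ('cat', False), ('aka', True)]
computed = [True, False, False, True]  # values copied from the printed output

mismatches = compare(e1, computed)
print(mismatches or 'all targets matched')
# all targets matched
```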

6. letter three is final

   Define "final" in a way that works for words of any length. Don't include a
   corresponding constant in the valuations. Decide what should happen with words of
   length one.


I added a grammar rule 
```
A[SEM=<\n. all m.-le(n,m)  >] -> 'final'
```
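
To see what this rule says independently of NLTK, the sketch below restates `all m.-le(n,m)` over plain Python sets. The helper names `le_relation` and `is_final` are hypothetical; `le_relation` rebuilds the same pairs that the `follows` lambda writes into the `le` relation. Treating a position outside the word as not final is an assumption of this sketch, not something the rule itself decides.

```python
from typing import Set, Tuple


def le_relation(word: str) -> Set[Tuple[int, int]]:
    """Pairs (i, j) of 1-based positions with i strictly before j."""
    n = len(word)
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)}


def is_final(word: str, n: int) -> bool:
    """Mirror of `all m.-le(n,m)`: no position comes after position n."""
    if not 1 <= n <= len(word):
        return False  # assumption: out-of-range positions are not final
    le = le_relation(word)
    return all((n, m) not in le for m in range(1, len(word) + 1))


for w in ['emu', 'batd', 'a']:
    print(w, [n for n in range(1, len(w) + 1) if is_final(w, n)])
# emu [3]
# batd [4]
# a [1]
```

For a word of length one the `le` relation is empty, so under this definition letter one counts as final. The sketch quantifies over positions only; in the NLTK model `m` ranges over the whole domain, but characters never occur in `le`, so the result should be the same.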

parse and display

In [14]:
g6 = nltk.load_parser('h2.fcfg', trace=0, cache=False)
s6 = 'letter three is final'
s6_split = s6.split()
for tree in g6.parse(s6_split): print(tree)

(S[SEM=<all m.-le(3,m)>]
  (NP[SEM=<3>] letter three)
  (VP[SEM=<\n.all m.-le(n,m)>] is (A[SEM=<\n.all m.-le(n,m)>] final)))


In [15]:
tree.draw()

![6](6.png)

now the logical formula

In [16]:
t6=next(g6.parse(s6_split))
f6 = t6.label()['SEM']
print(f6)

all m.-le(3,m)


define four sample words with their target truth values

In [17]:
e6 = [('emu',True),('batd',False),('cbja',False),('awe',True)]

In [18]:
words = [e[0] for e in e6]
truths = [e[1] for e in e6]
vals = [nltk.Valuation.fromstring(to_model_str(w, [get_vowel, follows])) for w in words]
assignments = [nltk.Assignment(val.domain) for val in vals]
for val in vals: emptysets(val)
models = [nltk.Model(val.domain, val) for val in vals]

In [19]:
print(f'{s6}\n---------------')
for w, a, m in zip(words, assignments, models):
    print(f'{w}\n{m.evaluate(str(f6),a)}\n----------------')

letter three is final
---------------
emu
True
----------------
batd
False
----------------
cbja
False
----------------
awe
True
----------------


```diff
+ The answer is correct! In emu and awe, letter three is the final letter.
```


7. every vowel is adjacent to letter three

   Include 'adjacent' in the grammar. Use the strategy with PP[to] to select the
   preposition.  Either define the semantics of 'adjacent' in terms of the available
   primitives, or add to the function that constructs valuations.
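
For the second option, adding to the function that constructs valuations, a relation builder in the style of the existing `special_rels` helpers might look like the sketch below. The function `get_adjacent` and the relation name `adj` are hypothetical. The output string follows the `name => {...}` shape that `to_model_str` expects from its `special_rels` entries, and an empty relation is written as `set()` so that the existing `emptysets` helper can repair it.

```python
def get_adjacent(w: str) -> str:
    """Build a valuation line for a symmetric adjacency relation on positions.

    Positions are 1-based; (i, j) is included when |i - j| == 1.
    """
    pairs = set()
    for i in range(1, len(w)):
        pairs.add((i, i + 1))
        pairs.add((i + 1, i))
    if not pairs:
        return 'adj => set()'  # same text the other helpers emit for empty sets
    body = ', '.join(str(p) for p in sorted(pairs))
    return 'adj => {' + body + '}'


print(get_adjacent('cat'))
# adj => {(1, 2), (2, 1), (2, 3), (3, 2)}
```

It could then be passed alongside the other helpers, for example `to_model_str(w, [get_vowel, follows, get_adjacent])`. Making the relation symmetric means the grammar does not need separate entries for the letter before and the letter after.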

8. every vowel that follows letter two is capitalized
   This has a subject relative clause. Methodology for the
   semantics of subject relative clauses is in lecture8.ipynb.

9. some vowel immediately precedes letter three
   some vowel immediately follows letter three

  Define "immediately" in a way that works for both "precedes" and "follows".
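
One candidate definition that is neutral between the two verbs: "immediately R" holds of x and y when R(x,y) holds and there is no z with R(x,z) and R(z,y). The sketch below checks this idea on plain sets of position pairs. All names (`precedes_pairs`, `follows_pairs`, `immediately`) are hypothetical, and this is only one possible reading of the problem.

```python
from typing import Set, Tuple

Rel = Set[Tuple[int, int]]


def precedes_pairs(word: str) -> Rel:
    """(i, j) such that position i comes before position j, not necessarily immediately."""
    n = len(word)
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)}


def follows_pairs(word: str) -> Rel:
    """Converse of precedes_pairs: (j, i) such that j comes after i."""
    return {(j, i) for (i, j) in precedes_pairs(word)}


def immediately(rel: Rel) -> Rel:
    """Keep the pairs (x, y) of rel that have no z with rel(x, z) and rel(z, y)."""
    elements = {x for pair in rel for x in pair}
    return {
        (x, y)
        for (x, y) in rel
        if not any((x, z) in rel and (z, y) in rel for z in elements)
    }


print(sorted(immediately(precedes_pairs('cats'))))
# [(1, 2), (2, 3), (3, 4)]
print(sorted(immediately(follows_pairs('cats'))))
# [(2, 1), (3, 2), (4, 3)]
```

Because the definition only refers to the relation it modifies, the same operator applies unchanged to both "precedes" and "follows".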

10. Post at least one challenge problem to PS2 challenge on the forum before
   the target date for challenge problems.  Be creative rather than varying
   challenge problems from others in mechanical ways. Include your challenge problem
   here.

11. Solve at least one challenge problem. When you solve it, post the core
   part of the result to ed.  Give your solution here in the format above. Don't pick
   a problem that somebody else has solved.

*12. If you made a contribution to defining the function that maps word strings
to valuations, describe it here.
