A Python interface to Google Translate

If `googletrans` is not already installed, the following will likely install it:

In [1]:
!pip install googletrans



Once installed, import it into Python and create a `Translator` instance.

In [2]:
from googletrans import Translator

In [3]:
t = Translator()

Then to translate, ask the Translator instance to translate, giving it source and destination languages. The result will be in the attribute `text`.

In [4]:
ja = t.translate("This is interesting", src='en', dest='ja').text

In [5]:
print(ja)

これは面白い


In [6]:
en2 = t.translate("これは面白い", src="ja", dest="en").text

In [7]:
print(en2)

This is funny


In [8]:
kr = t.translate("This is interesting", src='en', dest='ko').text

In [9]:
print(kr)

이것은 흥미 롭다


A bit more on part-of-speech tagging.

In [10]:
import nltk

In [11]:
text1 = "They refuse to permit us to obtain the refuse permit."

In [12]:
text = nltk.word_tokenize(text1)

In [13]:
text

['They',
 'refuse',
 'to',
 'permit',
 'us',
 'to',
 'obtain',
 'the',
 'refuse',
 'permit',
 '.']

The built-in magic tagger will do a somewhat reasonable job on this.  (This is the same thing that is used in the homework.)

In [14]:
nltk.pos_tag(text)

[('They', 'PRP'),
 ('refuse', 'VBP'),
 ('to', 'TO'),
 ('permit', 'VB'),
 ('us', 'PRP'),
 ('to', 'TO'),
 ('obtain', 'VB'),
 ('the', 'DT'),
 ('refuse', 'NN'),
 ('permit', 'NN'),
 ('.', '.')]

You can get some information about the definitions using `help`.  These tags come from the Penn Treebank.  (This is also described in the book.)

In [15]:
nltk.help.upenn_tagset('DT')

DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those


In [16]:
nltk.help.upenn_tagset('N.*')

NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...
NNP: noun, proper, singular
    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
    Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
    Shannon A.K.C. Meltex Liverpool ...
NNPS: noun, proper, plural
    Americans Americas Amharas Amityvilles Amusements Anarcho-Syndicalists
    Andalusians Andes Andruses Angels Animals Anthony Antilles Antiques
    Apache Apaches Apocrypha ...
NNS: noun, common, plural
    undergraduates scotches bric-a-brac products bodyguards facets coasts
    divestitures storehouses designs clubs fragrances averages
    subjectivists apprehensions muses factory-jobs ...


Looking at `similar()` to find things that share the same distribution in a text (often this will correspond to syntactic category)

In [17]:
from nltk.book import *

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908


In [18]:
t = nltk.Text(w.lower() for w in text6)

In [19]:
t.similar('unladen')

african


In [20]:
t.similar('knight')

and so well question oh strand couple sort system shall launcelot task
way string glass cry feint
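
As a rough illustration of what "sharing the same distribution" means, here is a minimal pure-Python sketch that ranks words by how many (previous word, next word) contexts they share with a target word. The function name `similar_words` and the toy sentence are made up for this example, and the scoring is a simplification; NLTK's own `similar()` may weight contexts differently.

```python
from collections import defaultdict

def similar_words(tokens, target, top_n=5):
    """Rank words by how many (previous, next) contexts they share with target."""
    contexts = defaultdict(set)  # word -> set of (previous word, next word)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[word].add((prev, nxt))
    target_contexts = contexts[target]
    # Keep only words that share at least one context with the target.
    scores = {
        word: len(ctx & target_contexts)
        for word, ctx in contexts.items()
        if word != target and ctx & target_contexts
    }
    # Most shared contexts first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

# Toy text (hypothetical, not taken from text6).
tokens = ("the african swallow flies and the european swallow flies "
          "and the unladen swallow flies").split()
print(similar_words(tokens, "unladen"))  # ['african', 'european']
```

Words that appear between the same neighbors ("the ___ swallow") come out as similar, which is why the results often line up with syntactic category.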


Looking at the universal tagset

In [21]:
from nltk.corpus import brown

In [22]:
brown.categories()

['adventure',
 'belles_lettres',
 'editorial',
 'fiction',
 'government',
 'hobbies',
 'humor',
 'learned',
 'lore',
 'mystery',
 'news',
 'religion',
 'reviews',
 'romance',
 'science_fiction']

In [23]:
' '.join(brown.sents(categories='reviews')[0])

'It is not news that Nathan Milstein is a wizard of the violin .'

In [24]:
brown.tagged_words(categories='reviews')[:10]

[('It', 'PPS'),
 ('is', 'BEZ'),
 ('not', '*'),
 ('news', 'NN'),
 ('that', 'CS'),
 ('Nathan', 'NP'),
 ('Milstein', 'NP'),
 ('is', 'BEZ'),
 ('a', 'AT'),
 ('wizard', 'NN')]

In [25]:
brown.tagged_words(categories='reviews', tagset='universal')[:10]

[('It', 'PRON'),
 ('is', 'VERB'),
 ('not', 'ADV'),
 ('news', 'NOUN'),
 ('that', 'ADP'),
 ('Nathan', 'NOUN'),
 ('Milstein', 'NOUN'),
 ('is', 'VERB'),
 ('a', 'DET'),
 ('wizard', 'NOUN')]
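
Comparing the two outputs above, the universal tagset amounts to a many-to-one relabeling of the Brown tags. The sketch below shows that idea with a lookup table containing only the seven tags visible in these outputs; the names `BROWN_TO_UNIVERSAL` and `to_universal` are hypothetical, and falling back to `'X'` (the universal tagset's "other" category) for unlisted tags is an assumption made for this example, not how NLTK handles them.

```python
# Partial Brown -> universal mapping, read off the two outputs above.
# The mapping NLTK actually uses covers the full Brown tagset.
BROWN_TO_UNIVERSAL = {
    'PPS': 'PRON',
    'BEZ': 'VERB',
    '*': 'ADV',
    'NN': 'NOUN',
    'CS': 'ADP',
    'NP': 'NOUN',
    'AT': 'DET',
}

def to_universal(tagged_words):
    """Relabel (word, brown_tag) pairs; unknown tags fall back to 'X' (assumption)."""
    return [(word, BROWN_TO_UNIVERSAL.get(tag, 'X')) for word, tag in tagged_words]

brown_tagged = [('It', 'PPS'), ('is', 'BEZ'), ('not', '*'), ('news', 'NN')]
print(to_universal(brown_tagged))
# [('It', 'PRON'), ('is', 'VERB'), ('not', 'ADV'), ('news', 'NOUN')]
```

Note that both `NN` and `NP` collapse to `NOUN`: the universal tagset is coarser, so the common/proper distinction is lost.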

In [26]:
import nltk

Semantic representations. Here we define `dom` as the domain of individuals, the entities we can name.

In [27]:
dom = {'a', 'b', 'c', 'd', 'm', 's'}

Then we define the translation from English to those abstract individuals. We put this into a "valuation function" by using `nltk.Valuation.fromstring()`. The string `names` contains a kind of human-readable way to write the mapping, and then `nltk.Valuation.fromstring()` converts it into an internally-valid valuation function.

In [28]:
names = """
andrea => a
bobby => b
chris => c
dana => d
the_sun => s
the_moon => m
"""

In [29]:
val = nltk.Valuation.fromstring(names)

In [30]:
type(val)

nltk.sem.evaluate.Valuation

In [31]:
print(val)

{'andrea': 'a',
 'bobby': 'b',
 'chris': 'c',
 'dana': 'd',
 'the_moon': 'm',
 'the_sun': 's'}


We can ask the valuation function what the semantic value is of an English word it knows.

In [32]:
print(val['bobby'])

b


In [33]:
print(val['the_moon'])

m


We'll define some more mappings from English to the abstract model of the world.  We will define the English word "person" to be a property that is true of individuals `a`, `b`, `c`, and `d`.  And "spaceball" to be a property that is true of `m` and `s`.

In [34]:
valp = nltk.Valuation.fromstring("person => {a, b, c, d}")

We will also define a shortcut so we don't have to type `nltk.Valuation.fromstring` all the time.  After this we should just be able to type `vfs` ('valuation from string' is the idea).

In [35]:
vfs = nltk.Valuation.fromstring

In [36]:
valsb = vfs("spaceball => {s, m}")

In [37]:
print(valp)

{'person': {('a',), ('b',), ('c',), ('d',)}}


In [38]:
print(valsb)

{'spaceball': {('m',), ('s',)}}


In [39]:
print(val)

{'andrea': 'a',
 'bobby': 'b',
 'chris': 'c',
 'dana': 'd',
 'the_moon': 'm',
 'the_sun': 's'}


To add the assignments from the valuation function `valp` into the existing valuation function `val`, we can use `.update()`.  The way this works is that you tell the target valuation function to update itself with the contents of the argument.

In [40]:
val.update(valp)

In [41]:
print(val)

{'andrea': 'a',
 'bobby': 'b',
 'chris': 'c',
 'dana': 'd',
 'person': {('a',), ('b',), ('c',), ('d',)},
 'the_moon': 'm',
 'the_sun': 's'}


In [42]:
val.update(valsb)

In [43]:
print(val)

{'andrea': 'a',
 'bobby': 'b',
 'chris': 'c',
 'dana': 'd',
 'person': {('a',), ('b',), ('c',), ('d',)},
 'spaceball': {('m',), ('s',)},
 'the_moon': 'm',
 'the_sun': 's'}


Now let's add definitions for the properties of being from Boston and from Cambridge.

In [44]:
val.update(vfs("bostonian => {a, b}"))

In [45]:
val.update(vfs("cantabrigian => {c, d}"))

And we'll assert that andrea likes bobby.  "Likes" is a transitive verb, so it is a relation between individuals (rather than being a property of individuals). That is, "likes" should be represented by a set of pairs, which pair up the likers and the likees.  Note that this is not guaranteed to be a mutual relationship.

In [46]:
val.update(vfs("likes => {(a,b)}"))

In [47]:
print(val['likes'])

{('a', 'b')}


And now we try to make it mutual by updating `val` with "bobby likes andrea". Except it doesn't fully work.  Bobby comes to like Andrea, but Andrea doesn't like Bobby anymore, because we replaced the "likes" relation rather than adding to it.

In [48]:
val.update(vfs("likes => {(b, a)}"))

In [49]:
print(val['likes'])

{('b', 'a')}


We can of course just fully replace the thing.

In [50]:
val.update(vfs("likes => {(a,b), (b,a)}"))

In [51]:
print(val['likes'])

{('a', 'b'), ('b', 'a')}


It would be nice to be able to add to whatever is already in the set.  We can do that as follows.  Recall that `val['likes']` will give us the set of pairs that's already there. So, what we want to do is ask that set of pairs to `.update()` itself with a set of pairs we would like to add.

In [52]:
val['likes'].update({('a','a')})

In [53]:
print(val['likes'])

{('a', 'b'), ('a', 'a'), ('b', 'a')}


That works, but it requires typing a lot of `'` characters.  We can also use `vfs` to do some of this for us.  If we write a throwaway valuation function like the one below, the only purpose of which is to parse a string like `"{(a,b),(b,a)}"` into the proper set `{('a', 'b'), ('b', 'a')}`, we can do something like this:

In [54]:
vfs("x => {(c, d)}")['x']

{('c', 'd')}

In [55]:
val['likes'].update(vfs("x => {(b, b)}")['x'])

In [56]:
print(val['likes'])

{('a', 'b'), ('b', 'b'), ('a', 'a'), ('b', 'a')}
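
The two kinds of `.update()` used above behave differently: updating the valuation replaces whatever is stored under a key, while updating the set stored under that key adds to it. The sketch below shows the same contrast with a plain Python dictionary of sets (the name `toy` is made up for this example); it assumes the valuation behaves like an ordinary dictionary in this respect, which is consistent with the outputs above.

```python
toy = {'likes': {('a', 'b')}}

# dict.update replaces the whole value stored under an existing key.
toy.update({'likes': {('b', 'a')}})
print(toy['likes'])            # {('b', 'a')}

# set.update adds members to the set that is already there.
toy['likes'].update({('a', 'b')})
print(sorted(toy['likes']))    # [('a', 'b'), ('b', 'a')]

# The |= operator is an equivalent in-place union.
toy['likes'] |= {('a', 'a')}
print(sorted(toy['likes']))    # [('a', 'a'), ('a', 'b'), ('b', 'a')]
```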


So does Andrea like Bobby now?  Well, we can check in our model of the world.  That means: is the pair consisting of the semantic value of 'andrea' and the semantic value of 'bobby' in the semantic value of 'likes'?

In [57]:
(val['andrea'], val['bobby']) in val['likes']

True

Who are the people?

In [58]:
val['person']

{('a',), ('b',), ('c',), ('d',)}

Does Andrea like everyone? We can figure that out by going through all the people and seeing if Andrea is paired with each one of them. First, we can go through the people and make a list of Andrea's "like" status with each one.

Note that the valuation function for "person" above gave us a set of 1-member tuples (like pairs, but with only one member).  What that means is that `'a'` is not in `val['person']`; rather, `('a',)` is.  So if we want to go through the people, we need to pull out the actual individuals from these tuples.  The first way below does not work (we know it's not true that Andrea likes nobody, so there should be some `True` in this list somewhere).  The second and third ways work.

In [59]:
[(val['andrea'], x) in val['likes'] for x in val['person']]

[False, False, False, False]

In [60]:
[(val['andrea'], x[0]) in val['likes'] for x in val['person']]

[True, True, False, False]

In [61]:
[(val['andrea'], x) in val['likes'] for (x,) in val['person']]

[True, True, False, False]
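
The 1-member tuple issue can be seen in isolation with plain Python, without NLTK. This is a small sketch using a cut-down `person` set made up for the example.

```python
person = {('a',), ('b',)}

print('a' in person)        # False: the set holds tuples, not strings
print(('a',) in person)     # True: the trailing comma makes a 1-member tuple
print(('a') == 'a')         # True: parentheses alone do not make a tuple

# Unpacking with (x,) pulls the individual out of each 1-member tuple.
individuals = {x for (x,) in person}
print(sorted(individuals))  # ['a', 'b']
```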

Now we can answer the question we were after.  Does Andrea like everyone?  If so, then there should be no `False` values in the lists we just found.  So "Andrea likes everyone" is true if...

In [62]:
not False in [(val['andrea'], x) in val['likes'] for (x,) in val['person']]

False
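
An equivalent and somewhat more idiomatic way to write this check is Python's built-in `all()`, which is true only if every item is true and stops at the first `False`. The sketch below copies the "likes" and "person" values printed earlier into plain sets so that it runs on its own.

```python
# Values copied from the valuation as printed above.
likes = {('a', 'b'), ('b', 'b'), ('a', 'a'), ('b', 'a')}
person = {('a',), ('b',), ('c',), ('d',)}
andrea = 'a'

# True only if Andrea is paired with every person in the "likes" relation.
likes_everyone = all((andrea, x) in likes for (x,) in person)
print(likes_everyone)  # False: Andrea is not paired with 'c' or 'd'
```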

Ok, now let's do something more sophisticated. NLTK has a concept of a "Model" which is based on a domain of individuals and a valuation function (again, translation from English into this model).  The valuation function contains information about names like "andrea", properties like "person", and relations like "likes".  We also need to specify an Assignment, which tells us what variables (roughly, the logical counterpart of pronouns) refer to.

In [63]:
m = nltk.Model(dom, val)
g = nltk.Assignment(dom)

Once we have this, we can ask the model to evaluate simple formulas, so we can find out if they are true or not.  This gives us a simpler way to find out whether someone likes someone else: ask the model to evaluate it.

In [64]:
print(m.evaluate('likes(dana, chris)', g))

False


In [65]:
print(m.evaluate('likes(andrea, bobby)', g))

True


The power of this is that it also can handle quantifiers and logic.  So, the work we did before to figure out whether Andrea likes everyone can be accomplished in the following way as well.  Maybe this doesn't seem all that much simpler, but it kind of is.

If you haven't seen this before, the way this works is this: It goes through all of the individuals (including both spaceballs and people) and evaluates whether being a person implies being liked by Andrea.  Because A -> B is true if A is false (A -> B is true in all circumstances except when A is true and B is false), this gets the right truth conditions.

In [66]:
print(m.evaluate('all x.(person(x) -> likes(andrea, x))', g))

False
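
To make the truth conditions concrete, here is a hand-rolled version of the same check in plain Python. It is only a sketch of the logic described above, not of how NLTK evaluates formulas internally; the helper `implies` is made up for the example, and the sets are copied from the valuation at this point in the notes (with the 1-member tuples already unpacked).

```python
dom = {'a', 'b', 'c', 'd', 'm', 's'}
person = {'a', 'b', 'c', 'd'}
likes = {('a', 'b'), ('b', 'b'), ('a', 'a'), ('b', 'a')}

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# all x.(person(x) -> likes(andrea, x)), checked over the whole domain.
for x in sorted(dom):
    print(x, implies(x in person, ('a', x) in likes))

result = all(implies(x in person, ('a', x) in likes) for x in dom)
print(result)  # False: 'c' and 'd' are persons Andrea does not like
```

The spaceballs `m` and `s` come out `True` because the antecedent `person(x)` is false for them, so they never count against the universal statement.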


In [67]:
print(val)

{'andrea': 'a',
 'bobby': 'b',
 'bostonian': {('a',), ('b',)},
 'cantabrigian': {('c',), ('d',)},
 'chris': 'c',
 'dana': 'd',
 'likes': {('a', 'b'), ('b', 'b'), ('a', 'a'), ('b', 'a')},
 'person': {('a',), ('b',), ('c',), ('d',)},
 'spaceball': {('m',), ('s',)},
 'the_moon': 'm',
 'the_sun': 's'}


Let's actually update it so Andrea likes everyone, and then try it again.

In [68]:
val.update(vfs("likes => {(a, b), (a,c), (a,a), (a, d)}"))

In [69]:
val

{'andrea': 'a',
 'bobby': 'b',
 'chris': 'c',
 'dana': 'd',
 'the_sun': 's',
 'the_moon': 'm',
 'person': {('a',), ('b',), ('c',), ('d',)},
 'spaceball': {('m',), ('s',)},
 'bostonian': {('a',), ('b',)},
 'cantabrigian': {('c',), ('d',)},
 'likes': {('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')}}

In [70]:
print(m.evaluate('all x.(person(x) -> likes(andrea, x))', g))

True


There was a question in class about whether False -> True evaluates as True.  It does, and below is my attempt to show this.  "The moon" is not a person, so the formula below reduces to "False -> True" and the whole thing evaluates as True.  As does False -> False.  And True -> True as well.  The only one where the formula evaluates as false is True -> False.

In [71]:
print(m.evaluate('person(the_moon) -> person(andrea)', g))
print(m.evaluate('person(the_moon) -> person(the_moon)', g))
print(m.evaluate('person(andrea) -> person(andrea)', g))
print(m.evaluate('person(andrea) -> person(the_moon)', g))

True
True
True
False


Now, on to building a syntax for semantic combination.  We will start by defining the semantic representation of NP nodes.

In [72]:
npdef = """
NP[SEM=<andrea>] -> 'andrea'
NP[SEM=<bobby>] -> 'bobby'
NP[SEM=<chris>] -> 'chris'
NP[SEM=<dana>] -> 'dana'
NP[SEM=<the_sun>] -> 'the_sun'
NP[SEM=<the_moon>] -> 'the_moon'
"""

In [73]:
cfgdef = r"""
% start S
S[SEM=<?vp(?subj)> ] -> NP[SEM=?subj ] VP[SEM=?vp ]
"""

But actually, we only scratched the surface of that; we'll come back to it.  Below are some notes about how `+=` works and about raw strings, and that was it.

In [74]:
a_string = "hello"

In [75]:
a_number = 4

In [76]:
print(a_string + ", world")

hello, world


In [77]:
a_string = a_string + ", world"

In [78]:
a_string

'hello, world'

In [79]:
a_number = a_number + 2

In [80]:
a_number

6

In [81]:
a_number += 3

In [82]:
print(a_number)

9


In [83]:
a_string += '!!!'

In [84]:
print(a_string)

hello, world!!!


In [85]:
a_number -= 3

In [86]:
a_number

6

In [87]:
print(cfgdef)


% start S
S[SEM=<?vp(?subj)> ] -> NP[SEM=?subj ] VP[SEM=?vp ]



In [88]:
cfgdef += r"""
VP[SEM=<?v(?obj)>] -> V[SEM=?v] NP[SEM=?obj]
V[SEM=<\y.\x.likes(x,y)>] -> 'likes'
"""

In [89]:
print(r"Hello\", world")

Hello\", world
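
The reason the grammar strings above are written with an `r` prefix is that a raw string keeps backslashes as literal characters instead of treating them as the start of an escape sequence. A short sketch of the difference (the variable names are made up for this example):

```python
plain = "a\nb"     # \n is a single newline character
raw = r"a\nb"      # backslash and n are kept as two separate characters
print(len(plain), len(raw))   # 3 4

# The lambda notation used in the grammar needs literal backslashes.
# Without the r prefix, "\x." would be rejected by Python, because \x
# must be followed by two hexadecimal digits in an ordinary string.
lam = r"\y.\x.likes(x,y)"
print(lam)                    # \y.\x.likes(x,y)

# In a raw string, \" does not end the string, and both characters are kept.
quoted = r"Hello\", world"
print(len(quoted))            # 14
```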
