# Grounded Semantics 

* According to Steven Pinker:

```
“Semantics is about the relation of words to thoughts, but it also about the relation of words to other human concerns. Semantics is about the relation of words to reality—the way that speakers commit themselves to a shared understanding of the truth, and the way their thoughts are anchored to things and situations in the world.”
```

So now we have to ask the following questions:

* How can we represent the meaning of words in machines?
* How can we represent "the world" in machines?
  * Symbolically? As pixels? Something in between?

#### Troubleshooting:

* When a dataframe doesn't resolve dtypes, it means there are columns with the same name. 

# 'TAKE' corpus


### A user seated in front of a large screen would see a scene like this:

![title](data/r6_11.png) 

### User determines which object to refer to, then says something like:
#### "the pink object on the very top right"

![title](data/r6_11_s.png) 

### the Words-as-Classifiers model

* Represent each word as a logistic regression classifier
* Features for the classifiers are the object features (e.g., RGB, HSV, X/Y, skewness ... using OpenCV)
* Each classifier has a bunch of positive and negative training examples

In [1]:
import sqlite3
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
con = sqlite3.connect('take.db')

In [3]:
import pandas as pd

In [4]:
#
#  read in data as query from sqlite database into a DataFrame
#

tiles = pd.read_sql_query("SELECT * FROM cv_piece_raw", con)

In [5]:
pd.read_sql_query("SELECT * FROM cv_piece_raw where episode_id = 'r6.163'", con)

Unnamed: 0,episode_id,id,r,g,b,h,s,v,orientation,h_skew,v_skew,num_edges,position,pos_x,pos_y
0,r6.163,tile-3,230.513646,0.305808,238.0,145.663401,249.536039,238.657803,44.035262,right-skewed,symmetric,7,right center,541.0,76.444444
1,r6.163,tile-7,8.220617,239.236376,0.317137,57.334209,248.110965,238.117531,-33.125853,left-skewed,symmetric,10,center center,139.333333,78.222222
2,r6.163,tile-6,193.17341,182.899807,182.899807,0.0,13.836224,192.75851,19.269183,right-skewed,top-skewed,6,left center,60.666667,81.333333
3,r6.163,tile-9,0.355263,239.186678,231.658717,86.438322,247.976974,238.333059,12.886634,symmetric,bottom-skewed,8,left center,104.666667,125.333333
4,r6.163,tile-5,195.056657,184.686261,184.686261,0.0,13.804533,193.267705,25.7109,right-skewed,symmetric,6,left center,60.0,126.222222
5,r6.163,tile-2,232.880282,0.161972,240.447183,147.163732,251.992958,240.973592,-10.85005,right-skewed,symmetric,8,right center,577.333333,124.888889
6,r6.163,tile-12,233.111111,0.150327,240.691028,146.609626,251.085561,240.480689,7.177631,left-skewed,bottom-skewed,8,right center,499.0,128.0
7,r6.163,tile-4,193.261189,182.993988,182.993988,0.0,13.780227,190.502338,2.037598,symmetric,symmetric,12,left center,100.666667,176.444444
8,r6.163,tile-14,0.279476,8.174672,239.302038,117.527656,251.981077,240.982533,14.827662,right-skewed,top-skewed,6,right center,537.0,185.333333
9,r6.163,tile-8,240.426959,0.144201,0.144201,0.0,252.858307,241.339812,3.801395,symmetric,bottom-skewed,6,left top,63.666667,326.666667


In [14]:
pd.read_sql_query("SELECT * FROM referent where episode_id = 'r6.163'", con)

Unnamed: 0,episode_id,object
0,r6.163,tile-0


In [15]:
pd.read_sql_query("SELECT * FROM hand where episode_id = 'r6.163'", con)

Unnamed: 0,episode_id,inc,word,start_time,end_time
0,r6.163,1,dann,0.07,0.68
1,r6.163,2,<sil>,0.68,1.68
2,r6.163,3,<sil>,1.68,1.78
3,r6.163,4,aus,1.78,2.06
4,r6.163,5,der,2.06,2.24
5,r6.163,6,linken,2.24,2.65
6,r6.163,7,hälfte,2.65,3.4
7,r6.163,8,das,3.4,3.74
8,r6.163,9,<sil>,3.74,3.8
9,r6.163,10,lila,3.8,4.1


In [6]:
#
#    do a one-hot encoding for string features
#

tiles['v_top_skewed'] = tiles.v_skew == 'top_skewed'
tiles.v_top_skewed = tiles.v_top_skewed.astype(int)
tiles['v_symmetric'] = tiles.v_skew == 'symmetric'
tiles.v_symmetric = tiles.v_symmetric.astype(int)
tiles['v_bottom_skewed'] = tiles.v_skew == 'bottom-skewed'
tiles.v_bottom_skewed = tiles.v_bottom_skewed.astype(int)

tiles['h_top_skewed'] = tiles.h_skew == 'right_skewed'
tiles.h_top_skewed = tiles.v_bottom_skewed.astype(int)
tiles['h_symmetric'] = tiles.h_skew == 'symmetric'
tiles.h_symmetric = tiles.v_bottom_skewed.astype(int)
tiles['h_bottom_skewed'] = tiles.h_skew == 'left-skewed'
tiles.h_bottom_skewed = tiles.v_bottom_skewed.astype(int)

In [7]:
#
#   drop columns that don't have continuous features
#

tiles.drop(['h_skew','v_skew','position'], 1, inplace=True)

In [8]:
#
#  get euclidean distance from center
#

from scipy.spatial import distance

center = (0,0)

tiles['c_diff'] = tiles.apply(lambda x: distance.euclidean(center, (x['pos_x'], x['pos_y'])), axis=1)

In [9]:
tiles[:15]

Unnamed: 0,episode_id,id,r,g,b,h,s,v,orientation,num_edges,pos_x,pos_y,v_top_skewed,v_symmetric,v_bottom_skewed,h_top_skewed,h_symmetric,h_bottom_skewed,c_diff
0,r2.10,tile-13,143.137517,244.64486,11.174232,42.626836,241.875834,245.367156,32.541958,6,59.333333,73.777778,0,1,0,0,0,0,94.676317
1,r2.10,tile-7,11.154438,241.869822,111.37574,72.049704,240.81716,242.338462,22.245955,8,141.666667,72.444444,0,0,1,1,1,1,159.115185
2,r2.10,tile-5,11.483669,240.073662,110.711605,71.37665,238.566366,240.884642,6.379068,9,101.666667,80.444444,0,0,1,1,1,1,129.643433
3,r2.10,tile-4,11.179123,143.223595,244.788758,102.109327,241.885732,245.515133,30.315793,6,534.333333,124.888889,0,1,0,0,0,0,548.734312
4,r2.10,tile-9,11.769231,240.94076,111.261715,71.386384,238.613616,242.626879,21.37789,6,99.0,178.666667,0,1,0,0,0,0,204.261543
5,r2.10,tile-1,241.915171,11.662316,141.810767,159.54323,238.823817,243.602773,15.818443,6,61.333333,180.444444,0,1,0,0,0,0,190.58325
6,r2.10,tile-0,244.332317,11.141463,142.926829,161.509146,241.765244,244.720732,34.277244,6,572.0,326.666667,0,1,0,0,0,0,658.707151
7,r2.10,tile-12,111.263126,11.262523,241.694629,130.913096,240.164152,241.809294,1.882938,8,499.666667,332.0,0,0,1,1,1,1,599.908975
8,r2.10,tile-8,11.75817,139.829248,238.365196,99.213235,235.023693,237.897876,43.787662,7,65.0,332.0,0,1,0,0,0,0,338.303119
9,r2.10,tile-14,11.065722,243.089518,111.771105,72.875921,243.579603,244.397167,-9.805954,6,139.0,329.333333,0,1,0,0,0,0,357.465305


In [12]:
#
#    now, get the targets (referents) as DataFrames
#
targs = pd.read_sql_query("SELECT * FROM referent", con)

In [13]:
targs.columns = ['episode_id', 'target']

In [14]:
targs[:5]

Unnamed: 0,episode_id,target
0,r6.163,tile-0
1,r6.162,tile-1
2,r6.161,tile-6
3,r6.160,tile-1
4,r6.165,tile-11


In [15]:
from pandasql import sqldf
from pandasql import *

pysqldf = lambda q: sqldf(q, globals())

In [14]:
#
#   the result of this should be a df of the target objects' features
#

query = '''
SELECT tiles.* FROM
targs 
INNER JOIN
tiles
ON targs.episode_id = tiles.episode_id
AND targs.target = tiles.id;
'''

targets = pysqldf(query)

In [15]:
pysqldf("select * from targets")

Unnamed: 0,episode_id,id,r,g,b,h,s,v,orientation,num_edges,pos_x,pos_y,v_top_skewed,v_symmetric,v_bottom_skewed,h_top_skewed,h_symmetric,h_bottom_skewed,c_diff
0,r6.163,tile-0,233.084091,0.143750,240.659659,147.222159,252.088636,240.796591,-5.293050,8,102.000000,428.000000,0,1,0,0,0,0,439.986363
1,r6.162,tile-1,195.261845,39.054239,0.000000,5.836658,247.846010,193.120948,-3.267609,12,575.666667,125.777778,0,1,0,0,0,0,589.247113
2,r6.161,tile-6,9.110448,240.550746,0.879104,56.974627,249.061194,240.691791,17.594086,8,67.666667,75.555556,0,0,1,1,1,1,101.426918
3,r6.160,tile-1,94.209113,72.955248,72.955248,0.000000,56.328723,92.538649,-36.517355,8,104.666667,431.111111,0,0,1,1,1,1,443.634874
4,r6.165,tile-11,110.459885,204.237822,38.082378,46.014327,202.752149,203.656877,8.334750,8,101.333333,130.666667,0,0,1,1,1,1,165.354837
5,r6.164,tile-8,99.305848,0.877193,243.987135,129.614035,251.162573,244.852047,-2.428847,7,64.000000,72.000000,0,0,1,1,1,1,96.332757
6,r7.175,tile-1,238.527867,4.868674,152.513133,158.421525,245.993594,237.588725,36.929499,8,63.666667,127.555556,0,1,0,0,0,0,142.561791
7,r6.169,tile-14,96.279748,77.738377,77.738377,0.000000,47.707644,94.838455,10.650640,8,139.666667,130.666667,0,0,1,1,1,1,191.260439
8,r6.168,tile-13,22.958254,79.822581,177.995256,106.008539,216.186907,179.277040,-23.855756,6,578.000000,436.444444,0,0,1,1,1,1,724.270497
9,r7.174,tile-4,210.667506,194.443325,188.782116,7.858942,26.829345,208.678841,28.462700,6,496.666667,378.666667,0,1,0,0,0,0,624.552818


In [16]:
#
#   the result of this should be a df of the non-target objects' features
#

query = '''
SELECT tiles.* FROM
tiles
LEFT OUTER JOIN
targs
ON targs.episode_id = tiles.episode_id
AND targs.target = tiles.id
WHERE targs.target is null;
'''

non_targets = pysqldf(query)

In [17]:
pysqldf("select * from non_targets where episode_id = 'r6.163'")

Unnamed: 0,episode_id,id,r,g,b,h,s,v,orientation,num_edges,pos_x,pos_y,v_top_skewed,v_symmetric,v_bottom_skewed,h_top_skewed,h_symmetric,h_bottom_skewed,c_diff
0,r6.163,tile-3,230.513646,0.305808,238.0,145.663401,249.536039,238.657803,44.035262,7,541.0,76.444444,0,1,0,0,0,0,546.374188
1,r6.163,tile-7,8.220617,239.236376,0.317137,57.334209,248.110965,238.117531,-33.125853,10,139.333333,78.222222,0,1,0,0,0,0,159.788904
2,r6.163,tile-6,193.17341,182.899807,182.899807,0.0,13.836224,192.75851,19.269183,6,60.666667,81.333333,0,0,0,0,0,0,101.467017
3,r6.163,tile-9,0.355263,239.186678,231.658717,86.438322,247.976974,238.333059,12.886634,8,104.666667,125.333333,0,0,1,1,1,1,163.28979
4,r6.163,tile-5,195.056657,184.686261,184.686261,0.0,13.804533,193.267705,25.7109,6,60.0,126.222222,0,1,0,0,0,0,139.757109
5,r6.163,tile-2,232.880282,0.161972,240.447183,147.163732,251.992958,240.973592,-10.85005,8,577.333333,124.888889,0,1,0,0,0,0,590.686899
6,r6.163,tile-12,233.111111,0.150327,240.691028,146.609626,251.085561,240.480689,7.177631,8,499.0,128.0,0,0,1,1,1,1,515.155316
7,r6.163,tile-4,193.261189,182.993988,182.993988,0.0,13.780227,190.502338,2.037598,12,100.666667,176.444444,0,1,0,0,0,0,203.141379
8,r6.163,tile-14,0.279476,8.174672,239.302038,117.527656,251.981077,240.982533,14.827662,6,537.0,185.333333,0,0,0,0,0,0,568.082251
9,r6.163,tile-8,240.426959,0.144201,0.144201,0.0,252.858307,241.339812,3.801395,6,63.666667,326.666667,0,0,1,1,1,1,332.813094


In [16]:
#
#   now get the utterances / referring expressions (REs)
#
utts = pd.read_sql_query("SELECT * FROM hand", con)

In [17]:
utts.columns

Index(['episode_id', 'inc', 'word', 'start_time', 'end_time'], dtype='object')

In [19]:
utts[:25]

Unnamed: 0,episode_id,inc,word,start_time,end_time
0,r6.163,1,dann,0.07,0.68
1,r6.163,2,<sil>,0.68,1.68
2,r6.163,3,<sil>,1.68,1.78
3,r6.163,4,aus,1.78,2.06
4,r6.163,5,der,2.06,2.24
5,r6.163,6,linken,2.24,2.65
6,r6.163,7,hälfte,2.65,3.4
7,r6.163,8,das,3.4,3.74
8,r6.163,9,<sil>,3.74,3.8
9,r6.163,10,lila,3.8,4.1


In [20]:
#
#   the result of this should be words and object features
#

query = '''
SELECT utts.word, utts.inc, targets.* FROM
targets 
INNER JOIN
utts
ON targets.episode_id = utts.episode_id
'''

positive = pysqldf(query)

In [21]:
#
#   the result of this should be words and object features
#

query = '''
SELECT utts.word, utts.inc, non_targets.* FROM
non_targets 
INNER JOIN
utts
ON non_targets.episode_id = utts.episode_id
'''

negative = pysqldf(query)

In [22]:
negative.drop_duplicates(subset=['inc', 'episode_id', 'id'], inplace=True)

In [23]:
pysqldf("select * from negative where episode_id = 'r6.163'")

Unnamed: 0,word,inc,episode_id,id,r,g,b,h,s,v,...,num_edges,pos_x,pos_y,v_top_skewed,v_symmetric,v_bottom_skewed,h_top_skewed,h_symmetric,h_bottom_skewed,c_diff
0,dann,1,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
1,<sil>,2,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
2,<sil>,3,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
3,aus,4,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
4,der,5,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
5,linken,6,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
6,hälfte,7,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
7,das,8,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
8,<sil>,9,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188
9,lila,10,r6.163,tile-3,230.513646,0.305808,238.000000,145.663401,249.536039,238.657803,...,7,541.000000,76.444444,0,1,0,0,0,0,546.374188


In [24]:
len(negative), len(positive)

(194019, 13863)

# train

In [25]:
import random

num_eval = 100

eids = set(positive.episode_id)
test_eids = set(random.sample(eids, num_eval))
train_eids = eids - test_eids
positive_train = positive[positive.episode_id.isin(train_eids)]
negative_train = negative[negative.episode_id.isin(train_eids)]

In [26]:
len(positive_train), len(negative_train)

(12414, 173733)

In [None]:
words = list(set(utts.word))

In [28]:
#
#    train the WACs using logres, wac dictionary will have them
#

from sklearn import linear_model


wac = {}
todrop = ['word', 'inc', 'episode_id', 'id']

for word in words:
    # get all pos and neg examples for current word
    pos_word_frame = positive_train[positive_train.word == word]
    pos_word_frame = np.array(pos_word_frame.drop(todrop, 1))
    neg_word_frame = negative_train[negative_train.word == word]
    neg_word_frame = np.array(neg_word_frame.drop(todrop, 1))
    sample = random.sample(list(neg_word_frame), len(pos_word_frame))

    if len(sample) == 0: continue
    examples = np.concatenate((pos_word_frame, sample))
    labels = np.concatenate(([1] * len(pos_word_frame), [0] * len(pos_word_frame)))
    if len(set(labels)) < 2: continue #TODO: this shouldn't be necessary
    regr = linear_model.LogisticRegression(penalty='l2')
    regr.fit(examples, labels)
    wac[word] = regr

# test

In [29]:
#
#   prepare the evalation data
#
if '<sil>' in wac: del wac['<sil>']
utts_eval = utts[utts.episode_id.isin(test_eids)]
utts_eval = utts[utts.word.isin(wac)] # remove words not in WAC
tiles_eval = tiles[tiles.episode_id.isin(test_eids)]

In [30]:
#
#   the result of this should be words and object features
#

query = '''
SELECT utts_eval.word, utts_eval.inc, tiles_eval.* FROM
utts_eval
INNER JOIN
tiles_eval
ON utts_eval.episode_id = tiles_eval.episode_id
'''

eval_data = pysqldf(query)

In [35]:
eval_data[:10]

Unnamed: 0,word,inc,episode_id,id,r,g,b,h,s,v,...,num_edges,pos_x,pos_y,v_top_skewed,v_symmetric,v_bottom_skewed,h_top_skewed,h_symmetric,h_bottom_skewed,c_diff
0,das,1,r6.169,tile-0,167.759236,0.0,0.0,0.0,251.859606,166.188424,...,6,533.0,122.222222,0,0,0,0,0,0,546.833861
1,das,1,r6.169,tile-1,168.328144,0.0,0.0,0.0,252.404192,166.446707,...,6,57.666667,172.444444,0,0,0,0,0,0,181.83105
2,das,1,r6.169,tile-10,167.527122,72.858966,0.0,12.810466,251.257179,166.029994,...,6,59.666667,125.777778,0,1,0,0,0,0,139.212645
3,das,1,r6.169,tile-11,167.45501,72.832802,0.0,12.822591,251.582642,165.892151,...,6,571.666667,377.333333,0,1,0,0,0,0,684.969505
4,das,1,r6.169,tile-12,164.736806,0.0,0.0,0.0,248.271313,163.689445,...,8,63.333333,329.777778,0,1,0,0,0,0,335.804249
5,das,1,r6.169,tile-13,0.0,93.34913,165.182963,101.085783,250.257948,164.347331,...,8,140.333333,428.888889,0,1,0,0,0,0,451.263918
6,das,1,r6.169,tile-14,96.279748,77.738377,77.738377,0.0,47.707644,94.838455,...,8,139.666667,130.666667,0,0,1,1,1,1,191.260439
7,das,1,r6.169,tile-2,166.992006,0.0,0.0,0.0,251.293605,165.768895,...,6,135.333333,175.555556,0,0,0,0,0,0,221.663854
8,das,1,r6.169,tile-3,167.353373,0.0,0.0,0.0,252.61523,166.140949,...,6,496.666667,176.0,0,1,0,0,0,0,526.928627
9,das,1,r6.169,tile-4,93.791228,165.948246,0.0,42.629825,252.763158,166.363158,...,5,578.333333,133.333333,0,0,1,1,1,1,593.504189


In [31]:
#
#   prepare the DataFrame that will hold the results
#

res = pd.DataFrame()

res['word'] = eval_data.word
res['episode_id'] = eval_data.episode_id
res['id'] = eval_data.id
res['inc'] = eval_data.inc


In [32]:
#
# for scikit classifiers
# 
res['p'] = eval_data.apply(lambda x: wac[x.word].predict_proba(np.array(list(map(lambda x: np.float(x), x[4:]))).reshape(1, -1))[0][1], axis=1)
res[:10]

Unnamed: 0,word,episode_id,id,inc,p
0,das,r6.169,tile-0,1,0.540574
1,das,r6.169,tile-1,1,0.492154
2,das,r6.169,tile-10,1,0.542769
3,das,r6.169,tile-11,1,0.465901
4,das,r6.169,tile-12,1,0.565137
5,das,r6.169,tile-13,1,0.408972
6,das,r6.169,tile-14,1,0.600581
7,das,r6.169,tile-2,1,0.447428
8,das,r6.169,tile-3,1,0.549868
9,das,r6.169,tile-4,1,0.571395


In [33]:
#
#   the result of this should be words and object features
#

query = '''
SELECT targs.target, res.* FROM
res
INNER JOIN
targs
ON targs.episode_id = res.episode_id
'''

res = pysqldf(query)

In [34]:
#
#   run evaluation as a sanity check
#

# sum up each object's values
query = '''
SELECT episode_id, target, sum(p) as p, id FROM
res 
group by episode_id, id
'''

result = pysqldf(query)

# find the argmax object for each episode
query = '''
SELECT episode_id, target, id, max(p) as max FROM
result 
GROUP BY episode_id
'''

result = pysqldf(query)

len(result[result.id == result.target]) / float(num_eval)

0.68