# Assignment 5: Grounded, Lexical Semantics

## Natural Language Processing - Boise State University

### Instructions

* Attached to the corresponding Trello card for this assignment are the files `features.txt` and `segmented-labeled.txt` which have data for a reference resolution task. I have already done a lot of the data munging for you. The rdg_munging.html / rdg_munging.ipynb notebook shows how I did that. At the very end I saved two data frames as two pickles named `scenedata.pkl` and `refexpdata.pkl`. You will use these two files. 
* You are to use the `scenedata.pkl` and `refexpdata.pkl` files to train logistic regression classifiers, one per word, that take low-level object ('visual') data as features and output the probability that an object fits that word.

**scenedata** scenes are separated by `episodeid`. For each `episodeid`, there are 8 images, each with an `imageid`. Each image contains between 1 and 7 pieces, each with a `pieceid`, depending on the scene type.

Below is an example Scene where each image has two pieces (see http://www.sigdial.org/workshops/conference17/proceedings/pdf/SIGDIAL30.pdf for more information):

![title](rdg_scene_example.png) 

Using these kinds of scenes, the task was for the *Director*, who knew which object needed to be selected, to tell the *Matcher* which object that was. The *Director*'s game screen had the same images on it, but they were usually in a different order, forcing the *Director* to describe the objects in the image rather than the image's placement on the grid (e.g., a *Director* couldn't just say something like "first row, second column" to indicate an image).

The goal of this assignment is to use the data to train logistic regression classifiers for each word in the corpus and evaluate how well they can be used for resolving references to visual objects. **Note** that the goal is to resolve references to individual objects, not individual images (i.e., images can have more than one object in them). 

First, load the data and get an idea of what it contains:

The dataframe `scenes` is like a database that has the features of each object in each image for each episode. 

The dataframe `refs` has the referring expressions, where each `id` represents an individual referring expression (i.e., grouping by `id` groups all the words in a referring expression). The `episodeid`, `imageid`, and `targetid` (stored in the `target` column) denote the episode, the image of the episode, and the target object in the image that the referring expression refers to. Note that for all rows of a referring expression grouped by an `id`, the `id`, `episodeid`, `imageid`, and `targetid` are the same. The only thing that differs is the word in the `word` column. The words are ordered by row. (See the `refs[refs.id == 4]` example below.)

Note that the targetid is the pieceid of the referred object in a particular `episodeid`/`imageid`.

In [2]:
import pandas as pd
import numpy as np

In [3]:
scenes = pd.read_pickle('scenedata.pkl')
refs = pd.read_pickle('refexpdata.pkl')

refs['type'] = refs.episodeid.map(lambda x: x.split('/')[0])
refs = refs[refs.type == 'Set0'] # we only use images where there is only one object in the image

In [3]:
scenes.columns

Index(['pieceid', 'imageid', 'episodeid', 'r', 'g', 'b', 'h', 's', 'v',
       'orientation', 'num_edges', 'pos_x', 'pos_y', 'h_skew_left-skewed',
       'h_skew_right-skewed', 'h_skew_symmetric', 'v_skew_bottom-skewed',
       'v_skew_symmetric', 'v_skew_top-skewed', 'c_diff'],
      dtype='object')

In [6]:
scenes[:5]

Unnamed: 0,pieceid,imageid,episodeid,r,g,b,h,s,v,orientation,num_edges,pos_x,pos_y,h_skew_left-skewed,h_skew_right-skewed,h_skew_symmetric,v_skew_bottom-skewed,v_skew_symmetric,v_skew_top-skewed,c_diff
0,0,1,Set0/1,86.480225,57.164215,46.304261,8.293657,127.795376,86.661635,5.742743,8,199,164,0,1,0,1,0,0,257.870122
1,0,2,Set0/1,79.55544,74.452909,59.535351,22.51474,74.233586,79.337073,41.51936,10,222,159,1,0,0,0,0,1,273.065926
2,0,3,Set0/1,130.428545,111.25028,86.211567,17.137593,94.26875,131.00056,-7.716261,12,203,161,0,0,1,1,0,0,259.094577
3,0,4,Set0/1,69.591751,55.848775,83.48426,135.273859,92.572226,83.479976,-21.40881,8,222,151,0,0,1,0,0,1,268.486499
4,0,5,Set0/1,36.108723,79.887808,112.033928,102.723919,177.755478,112.230646,42.677817,6,220,169,1,0,0,0,0,1,277.418456


In [7]:
refs.columns

Index(['id', 'episodeid', 'imageid', 'target', 'word'], dtype='object')

In [13]:
refs[refs.id == 4] # show the referring expression for id=4

Unnamed: 0,id,episodeid,imageid,target,word
3,4,Set0/1,8,0,like
3,4,Set0/1,8,0,off
3,4,Set0/1,8,0,to
3,4,Set0/1,8,0,the
3,4,Set0/1,8,0,left
3,4,Set0/1,8,0,like
3,4,Set0/1,8,0,a
3,4,Set0/1,8,0,reverse
3,4,Set0/1,8,0,l
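
Because the words of a referring expression are spread over rows, it can help to view each expression as a single string. The following is a minimal sketch that rebuilds the `id == 4` rows shown above as a toy frame (on the real data you would use `refs` directly) and joins the words per `id`. It relies on the words already being ordered by row.

```python
import pandas as pd

# Toy copy of the rows shown above for id == 4 (the real data comes from refexpdata.pkl).
words = ['like', 'off', 'to', 'the', 'left', 'like', 'a', 'reverse', 'l']
refs = pd.DataFrame({
    'id': [4] * len(words),
    'episodeid': ['Set0/1'] * len(words),
    'imageid': [8] * len(words),
    'target': [0] * len(words),
    'word': words,
})

# groupby keeps the row order within each group, so the words stay in sequence.
expressions = refs.groupby('id')['word'].agg(' '.join)
print(expressions.loc[4])  # like off to the left like a reverse l
```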


### Procedure and Hints

* This was made easier for me using pandasql / pysqldf, but anything that can be done using pandasql/pysqldf can be done using pandas merge functions. 
* You will need to somehow join/merge the refs with the scene data such that you can get the object features of the target objects (the key columns will be episodeid, imageid, and targetid/pieceid).
* Split your data into train/test. You can randomly choose 100 referring expressions (ids) for testing. 
* Training is tricky. You need to do the following for each word in the vocabulary:
   * Get all of the features for the objects where that word was used. These are your positive training examples. 
   * Randomly choose features for objects where that word was *not* used. These are your negative training examples. 
   * You should have the same number of negative and positive training examples.
   * Use `0` to label the negative training examples and `1` to label the positive training examples. 
   * Train the logistic regression classifier using the labeled positive and negative examples (penalty='l2' helps here). 
   * I recommend using a dictionary where key=word, value=classifier
* Testing is also tricky. You need to make sure you are conducting a realistic test. You want to represent your data as if you are looking at a scene. That means, for a referring expression, you want the 8 corresponding images and all of the objects in those images. You then take the words in the referring expression, get their respective classifiers, and test them on each of the objects in each of the images. For each object, you will sum the probabilities that are returned for each classifier. The object with the highest score (i.e., the highest sum of probabilities) will be the guessed referent object. To calculate accuracy, you will check to see if that object's pieceid matches the targetid. If they do, then your accuracy increases. 
    * I was able to do testing using a query that joined the test and scene data into a dataframe such that all words and all objects were represented in individual rows. 
    * I then made a new column in that dataframe that was the probability of applying the word in a row to the object features in the same row. 
    * I then used a query to sum the results over the objects (accomplished by grouping by certain columns).
    * I then used a query to find the max-scored object and compared that with the target. 
* For this assignment, your accuracy needs to be above 50%. That seems low, but even in the simplest case, when there is one object in each of the 8 images, the random baseline is only 1/8 (12.5%).
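
The join and the train/test split described in the hints can be sketched with plain pandas. This is only an illustration on made-up toy frames, not the reference solution: the values, the reduced feature set (`r`, `num_edges`), and `n_test = 1` are assumptions chosen to keep the example small. It assumes the target column in `refs` is named `target`, as shown by `refs.columns`.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for `scenes` and `refs` (made-up values, only two feature columns).
scenes = pd.DataFrame({
    'episodeid': ['Set0/1'] * 4,
    'imageid':   [1, 2, 3, 4],
    'pieceid':   [0, 0, 0, 0],
    'r':         [86.5, 79.6, 130.4, 69.6],
    'num_edges': [8, 10, 12, 8],
})
refs = pd.DataFrame({
    'id':        [1, 1, 2, 2, 3],
    'episodeid': ['Set0/1'] * 5,
    'imageid':   [1, 1, 3, 3, 4],
    'target':    [0, 0, 0, 0, 0],
    'word':      ['the', 'brown', 'the', 'tan', 'purple'],
})

# Attach the features of the target object to every word row.
merged = refs.merge(
    scenes,
    left_on=['episodeid', 'imageid', 'target'],
    right_on=['episodeid', 'imageid', 'pieceid'],
    how='inner',
)

# Hold out whole referring expressions (ids), not individual word rows.
rng = np.random.default_rng(0)
all_ids = merged['id'].unique()
n_test = 1  # toy value; the hints suggest 100 ids on the real data
test_ids = rng.choice(all_ids, size=n_test, replace=False)
test = merged[merged['id'].isin(test_ids)]
train = merged[~merged['id'].isin(test_ids)]
```

Splitting by `id` rather than by row keeps all words of one referring expression on the same side of the split, which matters for the scene-level test described above.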

## Train
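
One possible shape for the per-word training loop, shown on synthetic data. Everything here is an assumption made for illustration: the feature columns `f1` and `f2`, the toy vocabulary, and the helper name `train_word_classifiers` are hypothetical. The sketch treats each referring expression as describing one object, takes the objects of expressions containing the word as positives, and samples an equal number of negatives from the remaining objects. L2 regularization is scikit-learn's default for `LogisticRegression`, so it is not passed explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training rows: one row per (referring expression, word),
# carrying the features of the target object (hypothetical f1, f2).
rows = []
for i in range(40):
    colour = 'red' if i < 20 else 'blue'
    f1 = rng.normal(2.0 if colour == 'red' else -2.0, 0.5)
    f2 = rng.normal(0.0, 1.0)
    for w in ('the', colour):
        rows.append({'id': i, 'word': w, 'f1': f1, 'f2': f2})
train = pd.DataFrame(rows)

def train_word_classifiers(train, feature_cols, seed=0):
    """Return a dict mapping word -> fitted LogisticRegression."""
    # One row per expression id gives one row per described object.
    objects = train.drop_duplicates('id').set_index('id')[feature_cols]
    classifiers = {}
    for word, word_rows in train.groupby('word'):
        pos_ids = word_rows['id'].unique()
        pos = objects.loc[pos_ids]
        neg_pool = objects.drop(index=pos_ids)
        if neg_pool.empty:
            continue  # the word occurs in every expression: nothing to contrast with
        # Same number of negatives as positives; resample only if the pool is too small.
        neg = neg_pool.sample(n=len(pos), replace=len(neg_pool) < len(pos),
                              random_state=seed)
        X = np.vstack([pos.to_numpy(), neg.to_numpy()])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        classifiers[word] = LogisticRegression(max_iter=1000).fit(X, y)
    return classifiers

classifiers = train_word_classifiers(train, ['f1', 'f2'])
```

In this toy data the word `the` appears in every expression, so it gets no classifier. How to handle such words on the real data (skip them, or sample negatives differently) is a design choice left open here.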

## Test
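
The scene-level evaluation can be sketched as follows, again on toy data. The single feature `f1`, the two hand-built classifiers, and the rule that a word without a classifier contributes a probability of 0 are all assumptions for illustration. Because every image in this toy scene has a single piece with `pieceid` 0, the sketch compares both the image and the piece of the best-scoring object against the target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

feature_cols = ['f1']  # hypothetical single feature

# Two toy word classifiers: 'red' prefers high f1, 'blue' prefers low f1.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
is_red = np.array([0, 0, 0, 1, 1, 1])
classifiers = {
    'red': LogisticRegression().fit(X, is_red),
    'blue': LogisticRegression().fit(X, 1 - is_red),
}

# Toy scene: one episode, three images, one object per image.
scenes = pd.DataFrame({
    'episodeid': ['Set0/1'] * 3,
    'imageid': [1, 2, 3],
    'pieceid': [0, 0, 0],
    'f1': [1.8, -1.7, 0.1],
})
# Toy held-out referring expressions, one word per row as in `refs`.
test_refs = pd.DataFrame({
    'id': [10, 10, 11],
    'episodeid': ['Set0/1'] * 3,
    'imageid': [1, 1, 2],
    'target': [0, 0, 0],
    'word': ['the', 'red', 'blue'],
})

# Pair every word with every candidate object of the same episode.
pairs = test_refs.merge(scenes, on='episodeid', suffixes=('_ref', '_obj'))

def word_prob(row):
    clf = classifiers.get(row['word'])
    if clf is None:
        return 0.0  # assumption: unknown words add nothing to the score
    x = row[feature_cols].to_numpy(dtype=float).reshape(1, -1)
    return clf.predict_proba(x)[0, 1]

pairs['prob'] = pairs.apply(word_prob, axis=1)

# Sum the word probabilities per candidate object, then pick the best object per expression.
scores = pairs.groupby(
    ['id', 'imageid_ref', 'target', 'imageid_obj', 'pieceid'], as_index=False
)['prob'].sum()
best = scores.loc[scores.groupby('id')['prob'].idxmax()]

correct = (best['imageid_obj'] == best['imageid_ref']) & (best['pieceid'] == best['target'])
accuracy = correct.mean()
print(accuracy)
```

Calling `predict_proba` row by row is slow on the full data; grouping `pairs` by word and scoring each group in one call is a straightforward speed-up.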