# MIT AI for Code and Science workshop

In today's workshop we will learn

[list learning outcomes]

Domains help us restrict the search space of our problem. 

## CLEVR Domain

We will demonstrate the neurosymbolic paradigm using CLEVR, a dataset for [Compositional Language and Elementary Visual Reasoning](https://cs.stanford.edu/people/jcjohns/clevr/). The dataset contains visual scenes of objects with various attributes and spatial relations. Each scene comes with a set of natural language question-answer pairs, e.g.,

> <img src='https://cs.stanford.edu/people/jcjohns/clevr/teaser.jpg'>
>
> Question: "*There is a sphere with the same size as the metal cube; is it made of the same material as the small red sphere?*"
>
> Answer: No (the small red sphere is rubber, whereas the sphere with the same size as the metal cube is metal).



## Load dataset

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%cd /content/drive/MyDrive/ucla/aishni-omar/

/content/drive/MyDrive/ucla/aishni-omar


## Task

Visual question answering (VQA): given an image of a scene and a natural language question about it, produce the correct answer.

## Neural Representation

Implement a neural network to solve the task. Pros: neural embeddings are an effective way to capture nuances about natural visual scenes, e.g. geometry, color, and shape.

In [43]:
# in progress

### Failure Cases

In [44]:
# in progress

Run the cell below to inspect a neural embedding. You will notice that this representation is not interpretable. Has the network selected features that are truly relevant to the task?

In [45]:
# in progress

## Symbolic Representation

Reasoning with symbolic programs is interpretable.

#### Defining CLEVR as a Python class (ignore)

In [None]:
# pass image into a symbolic function to get an output (select red cubes)

# pass sentence into a symbolic function to get an answer (how many red cubes? 3)

In [None]:
# embed an image with a neural network
# show plot

# embed a sentence with neural network
# show plot

### CLEVR Dataset Grammar

Questions in the CLEVR dataset can be represented as functional programs.
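
As a minimal sketch of what "a question as a functional program" means, the example below answers "How many red cubes are there?" by chaining function calls over a toy scene. The scene contents and the Python implementations are invented for illustration; only the function names (`scene`, `filter_color`, `filter_shape`, `count`) follow the CLEVR grammar.

```python
# Toy scene: a list of objects described by CLEVR-style attributes.
# These four objects are made up for this example.
TOY_SCENE = [
    {"size": "small", "color": "red", "shape": "cube", "material": "rubber"},
    {"size": "large", "color": "red", "shape": "cube", "material": "metal"},
    {"size": "large", "color": "red", "shape": "sphere", "material": "metal"},
    {"size": "small", "color": "blue", "shape": "cube", "material": "rubber"},
]

def scene():
    """Return every object in the scene (an ObjectSet)."""
    return list(TOY_SCENE)

def filter_color(objects, color):
    """Keep only the objects with the given color."""
    return [o for o in objects if o["color"] == color]

def filter_shape(objects, shape):
    """Keep only the objects with the given shape."""
    return [o for o in objects if o["shape"] == shape]

def count(objects):
    """Return the number of objects in the set."""
    return len(objects)

# "How many red cubes are there?" as a composition of functions
answer = count(filter_shape(filter_color(scene(), "red"), "cube"))
print(answer)  # 2
```

Each intermediate result (all objects, then the red ones, then the red cubes) can be inspected directly, which is what makes the symbolic representation interpretable.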

The set of valid programs is typically specified by a grammar. [define grammar].

Run the cell below to inspect the CLEVR dataset grammar. Here it is represented as a Python dictionary in which each (key, value) pair describes one function in the grammar. Every function has a name, a list of inputs, an optional list of side_inputs, an output type, and a boolean indicating whether the function is a terminal.

In [28]:
import json


with open('datasets/clevr-dataset-gen-main/question_generation/metadata.json') as f:
    metadata = json.load(f)

functions_by_name = {}
for f in metadata['functions']:
  functions_by_name[f['name']] = f
metadata['_functions_by_name'] = functions_by_name
grammar = metadata['_functions_by_name']
grammar

{'scene': {'name': 'scene',
  'inputs': [],
  'output': 'ObjectSet',
  'terminal': False},
 'filter_color': {'name': 'filter_color',
  'inputs': ['ObjectSet'],
  'side_inputs': ['Color'],
  'output': 'ObjectSet',
  'terminal': False},
 'filter_shape': {'name': 'filter_shape',
  'inputs': ['ObjectSet'],
  'side_inputs': ['Shape'],
  'output': 'ObjectSet',
  'terminal': False},
 'filter_size': {'name': 'filter_size',
  'inputs': ['ObjectSet'],
  'side_inputs': ['Size'],
  'output': 'ObjectSet',
  'terminal': False},
 'filter_material': {'name': 'filter_material',
  'inputs': ['ObjectSet'],
  'side_inputs': ['Material'],
  'output': 'ObjectSet',
  'terminal': False},
 'unique': {'name': 'unique',
  'inputs': ['ObjectSet'],
  'output': 'Object',
  'terminal': False,
  'properties': []},
 'relate': {'name': 'relate',
  'inputs': ['Object'],
  'side_inputs': ['Relation'],
  'output': 'ObjectSet',
  'terminal': False},
 'union': {'name': 'union',
  'inputs': ['ObjectSet', 'ObjectSet'],
  'out
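
One way to read these entries is as typed function signatures. The sketch below formats two entries copied from the output above; the helper name `signature` and the `name[side_inputs](inputs) -> output` notation are choices made for this example, not part of the CLEVR code.

```python
# Two entries copied from the grammar printed above.
SAMPLE_GRAMMAR = {
    "scene": {
        "name": "scene",
        "inputs": [],
        "output": "ObjectSet",
        "terminal": False,
    },
    "filter_color": {
        "name": "filter_color",
        "inputs": ["ObjectSet"],
        "side_inputs": ["Color"],
        "output": "ObjectSet",
        "terminal": False,
    },
}

def signature(entry):
    """Format a grammar entry as name[side_inputs](inputs) -> output."""
    inputs = ", ".join(entry["inputs"])
    # side_inputs is optional: 'scene' has none
    side = ", ".join(entry.get("side_inputs", []))
    side_part = f"[{side}]" if side else ""
    return f"{entry['name']}{side_part}({inputs}) -> {entry['output']}"

for entry in SAMPLE_GRAMMAR.values():
    print(signature(entry))
# scene() -> ObjectSet
# filter_color[Color](ObjectSet) -> ObjectSet
```

Reading the grammar this way shows which functions can be chained: the output type of one function must match an input type of the next.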

In [58]:
class CLEVR_OBJECT():
    def __init__(self):
      # attribute name -> values that attribute can take
      self.attribute = {
        'size' : ["small", "large"],
        'color' : ["gray", "red", "blue", "green", "brown", "purple", "cyan", "yellow"],
        'shape' : ["cube", "sphere", "cylinder"],
        'material' : ["rubber", "metal"]
      }
      self.position = None  # not set yet
    def get_attr(self, x):
      # name of the attribute that value x belongs to, e.g. "red" -> "color"
      return [k for k, v in self.attribute.items() if x in v][0]

class CLEVR_DSL():
  def __init__(self):
    self.obj = CLEVR_OBJECT()
    self.integer = [str(i) for i in range(11)]  # "0" .. "10"
    self.relation = ["left", "right", "behind", "front"]
    self.str2func = {'count': self.count}  # function name -> implementation

  def query(self, obj, attr):
    # assumes obj is a one-row table (e.g. a pandas DataFrame)
    i = obj.index[0]
    return obj[attr][i]

  def count(self, objects):
    return len(objects)

  #def relate(self, object, relation):

class ProgramExecutor(CLEVR_DSL):
  def __init__(self):
    super().__init__()

  def execute(self, function, params, output):
    # params (side inputs) are unused until functions that need them are added
    output = self.str2func[function](output)
    return output

  def __call__(self, scene, program):
    self.scene = scene
    output = scene  # assumption: every program starts from the full scene
    for seq in program:
      args = seq.split()
      # first token is the function name, the rest are its parameters
      output = self.execute(args[0], args[1:], output)
    return output

**Cons:** Here we have 35 functions, each taking up to two inputs. Multiplying the input counts across all functions already gives 1024. Moreover, the functions can be composed to form larger programs, leading to a combinatorial explosion of the program space.

In [56]:
num_params = 1
for k, v in metadata['_functions_by_name'].items():
  #print(v['inputs'], len(v['inputs']))
  if (len(v['inputs']) > 0):
    num_params = num_params * len(v['inputs'])
print(len(grammar.keys()), "functions and ", num_params, "possible input permutations!")

35 functions and  1024 possible input permutations!
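
To get a feel for how quickly composition blows up, the sketch below counts the distinct programs of a given maximum depth for a small typed subset of the grammar. The subset is chosen for illustration; the types follow the grammar above, and the number of side-input choices follows the attribute lists used in this notebook (8 colors, 3 shapes, 4 relations).

```python
# Toy typed grammar: name -> (input types, number of side-input choices, output type).
# Only a hand-picked subset of the 35 CLEVR functions is included.
TOY_GRAMMAR = {
    "scene": ([], 1, "ObjectSet"),
    "filter_color": (["ObjectSet"], 8, "ObjectSet"),
    "filter_shape": (["ObjectSet"], 3, "ObjectSet"),
    "unique": (["ObjectSet"], 1, "Object"),
    "relate": (["Object"], 4, "ObjectSet"),
    "union": (["ObjectSet", "ObjectSet"], 1, "ObjectSet"),
    "count": (["ObjectSet"], 1, "Integer"),
}

def count_programs(max_depth):
    """Count syntactically distinct programs of depth <= max_depth, per output type."""
    types = {out for _, _, out in TOY_GRAMMAR.values()}
    counts = {t: 0 for t in types}  # depth 0: no programs at all
    for _ in range(max_depth):
        new_counts = {t: 0 for t in types}
        for inputs, n_side, out in TOY_GRAMMAR.values():
            n = n_side  # one program per side-input choice ...
            for t in inputs:
                n *= counts[t]  # ... times any shallower program for each input
            new_counts[out] += n
        counts = new_counts
    return counts

for depth in range(1, 5):
    print(depth, sum(count_programs(depth).values()))
# 1 1
# 2 15
# 3 343
# 4 104663
```

Even with only seven functions, the count grows from 15 programs at depth 2 to over 100,000 at depth 4.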


Therefore, we need smart ways to search the space of programs.

The simplest form of program synthesis is enumeration: a search through the space of programs defined by our domain-specific language (DSL) that tests each candidate against the specification.

The most basic enumeration strategy is bottom-up search, which starts with the terminals of the language and repeatedly combines existing programs into larger ones until one satisfies the specification.

### Symbolic Search
[introduce]

#### Top-down enumeration

In [48]:
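# in progress
# Assumption: grammar = {'terminals': [...], 'functions': [...]}, where each
# terminal is a callable on the input and each function maps a value to a value.

def synthesize(grammar, input, output, max_rounds=3):
  bank = list(grammar['terminals']) # initialize with terminals
  for _ in range(max_rounds): # bounded, so the search always stops
    for p in bank:
      if (p(input) == output):
        return p
    bank += grow(grammar, bank)
  return None # no program found within max_rounds

def grow(grammar, bank):
  # compose every function in the grammar with every program in the bank
  g = []
  for f in grammar['functions']:
    for p in bank:
      g.append(lambda x, f=f, p=p: f(p(x)))
  return g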

#### Bottom-up enumeration

In [47]:
# in progress
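
A minimal sketch of bottom-up enumeration on a toy problem: given a scene and a target answer, find a program that produces it. The scene, the reduced function set (two colors, two shapes, `count`), and the pruning of programs that produce an already-seen value are all assumptions made for this example.

```python
# Toy scene invented for this example.
TOY_SCENE = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]

def make_filter(attr, value):
    """Build a function that keeps the objects whose attr equals value."""
    return lambda objs: [o for o in objs if o[attr] == value]

# Each function: (name, input type, output type, implementation)
FUNCTIONS = [
    ("filter_color[red]", "ObjectSet", "ObjectSet", make_filter("color", "red")),
    ("filter_color[blue]", "ObjectSet", "ObjectSet", make_filter("color", "blue")),
    ("filter_shape[cube]", "ObjectSet", "ObjectSet", make_filter("shape", "cube")),
    ("filter_shape[sphere]", "ObjectSet", "ObjectSet", make_filter("shape", "sphere")),
    ("count", "ObjectSet", "Integer", len),
]

def grow(programs):
    """Apply every function to every program whose output type matches its input."""
    grown = []
    for name, in_type, out_type, fn in FUNCTIONS:
        for expr, typ, value in programs:
            if typ == in_type:
                grown.append((f"{name}({expr})", out_type, fn(value)))
    return grown

def synthesize(scene, target, max_rounds=5):
    """Return the first enumerated program whose integer output equals target."""
    frontier = [("scene()", "ObjectSet", scene)]  # start from the terminal
    seen = {("ObjectSet", repr(scene))}
    for _ in range(max_rounds):
        for expr, typ, value in frontier:
            if typ == "Integer" and value == target:
                return expr
        next_frontier = []
        for expr, typ, value in grow(frontier):
            key = (typ, repr(value))
            if key not in seen:  # skip programs that behave like one already kept
                seen.add(key)
                next_frontier.append((expr, typ, value))
        frontier = next_frontier
    return None  # nothing found within max_rounds

print(synthesize(TOY_SCENE, 2))
# count(filter_color[red](filter_shape[cube](scene())))
```

A single input-output example underdetermines the program: the search returns the first program that matches, which is not necessarily the one the question intended. More examples, or a different scene, would narrow the candidates.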

### Learning to synthesize
(input output examples)

In [49]:
# in progress

## Neuro-symbolic Representation

We can combine neural and symbolic components to make more powerful representations. Why are they powerful? What are the ways they can be combined?

In [50]:
# in progress
# implement neurosymbolic concept learner
# https://arxiv.org/pdf/1904.12584.pdf
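
One way the two components can be combined, sketched under strong simplifying assumptions: a neural perception module assigns each object a probability for every concept, and a symbolic program is executed over those probabilities instead of hard labels. The probabilities below are invented stand-ins for network outputs, and multiplying masks is one simple choice made for this example, not necessarily the rule used in the paper linked above.

```python
# Stand-in for a neural perception module: for each detected object, the
# probability that it has each concept. All numbers are invented.
PERCEPTION = [
    {"red": 0.9, "blue": 0.1, "cube": 0.8, "sphere": 0.2},
    {"red": 0.8, "blue": 0.2, "cube": 0.9, "sphere": 0.1},
    {"red": 0.1, "blue": 0.9, "cube": 0.7, "sphere": 0.3},
]

def scene(perception):
    """Every object starts fully selected (mask value 1.0)."""
    return [1.0 for _ in perception]

def soft_filter(mask, perception, concept):
    """Scale each object's mask by the probability that it has the concept."""
    return [m * obj[concept] for m, obj in zip(mask, perception)]

def soft_count(mask):
    """Expected number of selected objects."""
    return sum(mask)

# "How many red cubes are there?" executed over soft object masks
mask = soft_filter(scene(PERCEPTION), PERCEPTION, "red")
mask = soft_filter(mask, PERCEPTION, "cube")
expected = soft_count(mask)
print(round(expected, 2), round(expected))  # 1.51 2
```

The program structure stays symbolic and inspectable, while the uncertain judgments about color and shape are left to the learned component. Because every step is a product or a sum, the result is differentiable with respect to the probabilities, which is what would allow the perception module to be trained from answers alone.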

# Conclusion

# Further Reading