# Image recognition with TTR


## Bridging between perceptual and conceptual domains

Let's apply the object detection representation proposed in Dobnik & Cooper's *Interfacing language, spatial perception and cognition in TTR* to image recognition.

![Fig 8](fig/lspc-fig8.png)

Here, we use `Image` instead of `PointMap` for the whole scene. For the region, instead of `reg:PointMap` we use a separate type under a new label, `seg:Segment`. In Dobnik & Cooper's formulation the same type can represent both the region and the whole, because a `PointMap` is a set of absolute positions. In an `Image`, positions are relative to the image's origin, so the origin has to be specified when cropping.

In the general case, I suppose, the domain of an `ObjectDetector` function need not be of the same type as the `reg` fields of its output elements.
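
To make the point about origins concrete, here is a minimal sketch in plain Python (no pyttr or PIL needed). The helpers `to_crop_coords` and `to_parent_coords` are hypothetical names introduced only for this illustration, and a segment is assumed to be a plain dict with `cx`, `cy`, `w` and `h` keys.

```python
def to_crop_coords(seg, origin):
    """Re-express a segment relative to a crop whose top-left corner is `origin`."""
    ox, oy = origin
    return {**seg, 'cx': seg['cx'] - ox, 'cy': seg['cy'] - oy}

def to_parent_coords(seg, origin):
    """Translate a crop-relative segment back into the parent image."""
    ox, oy = origin
    return {**seg, 'cx': seg['cx'] + ox, 'cy': seg['cy'] + oy}

seg = {'cx': 100, 'cy': 150, 'w': 40, 'h': 20}   # relative to the full image
origin = (80, 140)                               # top-left corner of a crop (assumed)

in_crop = to_crop_coords(seg, origin)            # centre moves, size does not
round_trip = to_parent_coords(in_crop, origin)   # only recoverable if origin is known
```

The size fields survive cropping unchanged, but the centre is only meaningful together with the origin it was measured from. A set of absolute positions, as in `PointMap`, does not have this problem.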

In [1]:
import sys
sys.path.append('pyttr')
from pyttr.ttrtypes import *
from pyttr.utils import *
import PIL.Image

ttrace()

# Basic types.

Ind = BType('Ind')

Int = BType('Int')
Int.learn_witness_condition(lambda x: isinstance(x, int))
print(Int.query(365))

Image = BType('Image')
Image.learn_witness_condition(lambda x: isinstance(x, PIL.Image.Image))
img = PIL.Image.open('res/dogcar.jpg')
print(Image.query(img))

# Segment type: a rectangular area of a given image.

Segment = RecType({#'i': Image,
    'cx': Int, 'cy': Int, 'w': Int, 'h': Int})
print(Segment.query(Rec({#'i': img,
    'cx': 100, 'cy': 150, 'w': 40, 'h': 20})))

# Redefine Image.show() to work with Rec.show().
def image_show(self):
    return str(self)
PIL.Image.Image.show = image_show
show(img)

True
True
True


'<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1080x1080 at 0x7FF40400E588>'

In [2]:
Ppty = FunType(Ind, Ty)
ImageDetection = RecType({'seg': Segment, 'pfun': Ppty})
ImageDetections = ListType(ImageDetection)
ObjectDetector = FunType(Image, ImageDetections)

## Object detection model YOLO

Requires OpenCV and [Darkflow](https://github.com/thtrieu/darkflow). `yolo.weights` is from [Yolo](https://pjreddie.com/darknet/yolo/).

In [3]:
from darkflow.net.build import TFNet
import numpy as np

tfnet = TFNet({"model": "yolo/yolo.cfg", "load": "yolo/yolo.weights",
    'config': 'yolo', "threshold": 0.1})

Parsing yolo/yolo.cfg
Loading yolo/yolo.weights ...
Successfully identified 203934260 bytes
Finished in 0.03660297393798828s
Model has a coco model name, loading coco labels.

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 608, 608, 3)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 608, 608, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 304, 304, 32)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 304, 304, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 152, 152, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 152, 152, 128)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 152, 152, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 152, 152, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 76, 76, 128)
 Load  |  Yep!  | conv 3x3p1_1  +bn

In [4]:
# Make preds and ptypes identifiable by their predicate names.
# From now on, use mkptype().
ptypes = dict()
def mkptype(sym, types=[Ind], vars=['v']):
    # Key on symbol, argument types and variable names so that equal
    # requests return the same PType object.
    key = '/'.join([sym, ','.join(show(t) for t in types), ','.join(vars)])
    if key not in ptypes:
        ptypes[key] = PType(Pred(sym, types), vars)
    return ptypes[key]

print(show(mkptype('rabbit') is mkptype('rabbit')))

True
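
The caching pattern can be sketched without pyttr. `FakePType` is a stand-in class and `intern_ptype` a hypothetical name, both introduced only for this illustration; the point is that two requests with the same arguments yield the very same object, whereas constructing twice does not.

```python
class FakePType:
    """Stand-in for pyttr's PType; it only stores its arguments."""
    def __init__(self, sym, types, vars):
        self.sym, self.types, self.vars = sym, types, vars

_cache = {}

def intern_ptype(sym, types=('Ind',), vars=('v',)):
    """Return one shared object per (sym, types, vars) combination."""
    key = (sym, tuple(types), tuple(vars))
    if key not in _cache:
        _cache[key] = FakePType(*key)
    return _cache[key]

# Same arguments: the cached object is returned.
same = intern_ptype('rabbit') is intern_ptype('rabbit')
# Different predicate: a different object.
different = intern_ptype('left', ('Ind', 'Ind'), ('a', 'b')) is intern_ptype('rabbit')
# Without the cache, every construction gives a new object.
fresh = FakePType('rabbit', ('Ind',), ('v',)) is FakePType('rabbit', ('Ind',), ('v',))
```

Sharing one object per predicate presumably matters because anything attached to a type after creation, such as a witness condition, is then visible wherever that predicate is used.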


In [5]:
def xy1xy2_to_cwh(x1, y1, x2, y2):
    '''Transform to center, width and height.'''
    return {'cx': int(x1/2 + x2/2), 'cy': int(y1/2 + y2/2), 'w': x2 - x1, 'h': y2 - y1}
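
As a worked example of this conversion, the sketch below repeats the function and adds a hypothetical inverse, `cwh_to_xy1xy2`, which is not part of the notebook. It assumes non-negative integer corners with `x2 >= x1` and `y2 >= y1`; under that assumption the truncation in `int()` loses no information, because the width and height keep the parity of the corner sums.

```python
def xy1xy2_to_cwh(x1, y1, x2, y2):
    """Same conversion as above: corners to centre, width and height."""
    return {'cx': int(x1/2 + x2/2), 'cy': int(y1/2 + y2/2), 'w': x2 - x1, 'h': y2 - y1}

def cwh_to_xy1xy2(seg):
    """Hypothetical inverse: recover the corners from a centre/size dict."""
    x1 = seg['cx'] - seg['w'] // 2
    y1 = seg['cy'] - seg['h'] // 2
    return (x1, y1, x1 + seg['w'], y1 + seg['h'])

# The height is odd here, so the exact centre 654.5 is truncated to 654.
seg = xy1xy2_to_cwh(0, 250, 276, 1059)
corners = cwh_to_xy1xy2(seg)
```

Here `seg` is `{'cx': 138, 'cy': 654, 'w': 276, 'h': 809}` and `corners` is `(0, 250, 276, 1059)`, so the round trip is exact even though the centre was truncated.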

In [6]:
def yolo_detector(i):
    return [Rec({
        'seg': Rec({
            #'i': i,
            **xy1xy2_to_cwh(o['topleft']['x'], o['topleft']['y'], o['bottomright']['x'], o['bottomright']['y']),
        }),
        'pfun': Fun('v', Ind, mkptype(o['label'])),  # defaults: types=[Ind], vars=['v']
    }) for o in tfnet.return_predict(np.array(i))] # @todo RGB/BGR? PIL gives RGB, whereas cv2.imread gives BGR.

image_detections = yolo_detector(img)

print(ImageDetections.query(image_detections))
print(ImageDetection.query(image_detections[0]))
print(Ppty.query(image_detections[0].pfun))
print(Segment.query(image_detections[0].seg))

for image_detection in image_detections:
    print(show(image_detection))

True
True
True
True
{pfun = lambda v:Ind . person(v), seg = {w = 276, cx = 138, cy = 654, h = 809}}
{pfun = lambda v:Ind . person(v), seg = {w = 706, cx = 714, cy = 657, h = 796}}
{pfun = lambda v:Ind . person(v), seg = {w = 380, cx = 194, cy = 888, h = 381}}
{pfun = lambda v:Ind . car(v), seg = {w = 774, cx = 490, cy = 589, h = 979}}
{pfun = lambda v:Ind . dog(v), seg = {w = 687, cx = 704, cy = 714, h = 718}}
{pfun = lambda v:Ind . chair(v), seg = {w = 219, cx = 757, cy = 541, h = 210}}
{pfun = lambda v:Ind . chair(v), seg = {w = 778, cx = 547, cy = 687, h = 783}}
{pfun = lambda v:Ind . sofa(v), seg = {w = 957, cx = 486, cy = 677, h = 803}}
{pfun = lambda v:Ind . cell phone(v), seg = {w = 187, cx = 93, cy = 588, h = 423}}
{pfun = lambda v:Ind . clock(v), seg = {w = 71, cx = 44, cy = 544, h = 107}}


Here's a version where individuals are created too.

In [7]:
DetectedInd = RecType({'seg': Segment, 'pfun': Ppty, 'ind': Ind})
DetectedInds = ListType(DetectedInd)

def yolo_detector_ind(i):
    return [Rec({
        'seg': Rec({
            #'i': i,
            **xy1xy2_to_cwh(o['topleft']['x'], o['topleft']['y'], o['bottomright']['x'], o['bottomright']['y']),
        }),
        'pfun': Fun('v', Ind, mkptype(o['label'])),  # defaults: types=[Ind], vars=['v']
        'ind': Ind.create(),
    }) for o in tfnet.return_predict(np.array(i))]

ind_detections = yolo_detector_ind(img)
print(DetectedInds.query(ind_detections))
print(show(ind_detections[0]))
print(list(r.ind for r in ind_detections))

True
{seg = {w = 276, cx = 138, cy = 654, h = 809}, pfun = lambda v:Ind . person(v), ind = a_{0}}
['a_{0}', 'a_{1}', 'a_{2}', 'a_{3}', 'a_{4}', 'a_{5}', 'a_{6}', 'a_{7}', 'a_{8}', 'a_{9}']


## Spatial relations

In [22]:
# An index of DetectedInd records by Ind.
ind_dets = dict((r.ind, r) for r in ind_detections)

Left = mkptype('left', [Ind, Ind], ['a', 'b'])
Left.learn_witness_condition(lambda ab: ind_dets[ab[0]].seg.cx < ind_dets[ab[1]].seg.cx)
print(show(Left))

print(Left.query((ind_detections[0].ind, ind_detections[1].ind)))
print(Left.query((ind_detections[1].ind, ind_detections[2].ind)))

<pyttr.ttrtypes.PType object at 0x7ff39a12fba8>
left(a, b)
True
False


<pyttr.ttrtypes.PType at 0x7ff39a12fba8>
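
The witness condition for `left` can be checked by hand in plain Python. The centre x-coordinates below are copied from the first three detections printed earlier, assuming the second detector run returned the same boxes. The `right` function is an assumed definition (the converse of `left`) and does not appear in the notebook.

```python
# Centre x-coordinates of the first three detected individuals.
centres = {'a_0': 138, 'a_1': 714, 'a_2': 194}

def left(a, b):
    """a is left of b if a's centre has a smaller x-coordinate than b's."""
    return centres[a] < centres[b]

def right(a, b):
    """Assumed converse of left: a is right of b if b is left of a."""
    return left(b, a)

results = (left('a_0', 'a_1'), left('a_1', 'a_2'), right('a_1', 'a_2'))
```

The first two values agree with the `True` and `False` printed by the two `Left.query` calls. Comparing centres only is a coarse criterion: it ignores width, so two heavily overlapping boxes can still count as one being to the left of the other.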

In [9]:
ClassifiedInd = RecType({'ind': Ind, 'pfun': Ppty})
ClassifiedInds = ListType(ClassifiedInd)
LocatedInd = RecType({'ind': Ind, 'seg': Segment})
LocatedInds = ListType(LocatedInd)
print(ClassifiedInds.query(ind_detections))
print(LocatedInds.query(ind_detections))

True
True


In [10]:
# Inspired by RobotState in Dobnik, Cooper & Larsson (2013).
State = RecType({
    'image': Image,
    'objects': LocatedInds, # ???
    'beliefs': RecTy,
})

In [11]:
def observe(image):
    '''Perform classification and return a State.'''
    dets = yolo_detector_ind(image)
    beliefs = RecType({})
    objects = []
    for det in dets:
        objects.append(Rec({'ind': det.ind, 'seg': det.seg}))
        beliefs.addfield(gensym('c'), (det.pfun, [det.ind])) # We'd like to use ⇑objects[i] but ⇑ is not implemented?
    return Rec({
        'image': image,
        'objects': objects,
        'beliefs': beliefs,
    })

state = observe(img)

In [12]:
from IPython.display import Latex
Latex(to_ipython_latex(state))

<IPython.core.display.Latex object>

## Questions

A first sketch of what the output of parsing might look like.

In [25]:
print("A dog is to the left of a car")

def create_abc(prop_a, prop_b, rel):
    '''Creates a record type describing two individuals and a relation between them.'''
    return RecType({
        'a_1': Ind,
        'a_2': Ind,
        'c_{' + prop_a + '}': (Fun('v', Ind, mkptype(prop_a)), ['a_1']),
        'c_{' + prop_b + '}': (Fun('v', Ind, mkptype(prop_b)), ['a_2']),
        'c_{' + rel + '}': (Fun('a', Ind, Fun('b', Ind, mkptype(rel, [Ind, Ind], ['a', 'b']))), ['a_1', 'a_2'])
    })

question = create_abc('dog', 'car', 'left')

Latex(to_ipython_latex(question))

A dog is to the left of a car


<IPython.core.display.Latex object>
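
One possible way to read such a record type is existentially: the sentence holds if some pair of detected individuals satisfies all three constraints. The sketch below checks this in plain Python rather than through pyttr. The `holds` function is a hypothetical helper, and the `(label, cx)` pairs are copied from the detections printed earlier.

```python
from itertools import product

# (label, centre x) pairs for some of the detected objects.
detections = [('person', 138), ('person', 714), ('person', 194),
              ('car', 490), ('dog', 704)]

def holds(prop_a, prop_b, rel):
    """Is there a pair (a_1, a_2) with prop_a(a_1), prop_b(a_2) and rel(a_1, a_2)?"""
    relations = {'left': lambda a, b: a < b, 'right': lambda a, b: a > b}
    xs_a = [cx for label, cx in detections if label == prop_a]
    xs_b = [cx for label, cx in detections if label == prop_b]
    return any(relations[rel](a, b) for a, b in product(xs_a, xs_b))

dog_left_of_car = holds('dog', 'car', 'left')   # 704 < 490 fails
car_left_of_dog = holds('car', 'dog', 'left')   # 490 < 704 succeeds
```

On these detections the example sentence comes out false and its converse true. This is only an assumption about how the question might eventually be evaluated; the notebook itself stops at constructing the record type.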

A very simple and naive parser.

In [30]:
import nltk
grammar = nltk.CFG.fromstring("""
S -> Det N 'is' P Det N
Det -> 'a' | 'an'
N -> 'dog' | 'car' | 'sofa' | 'person' | 'chair'
P -> 'to' 'the' 'left' 'of' | 'to' 'the' 'right' 'of'
""")
parser = nltk.ChartParser(grammar)

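def parse(sent):
    '''Return the record type for the first parse of sent, or None if there is none.'''
    tokens = sent.lower().split()
    for tree in parser.parse(tokens):
        # tree[1] and tree[5] are the two N nodes; tree[3][2] is 'left' or 'right'.
        return create_abc(tree[1][0], tree[5][0], tree[3][2])
    return None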
r = parse('A dog is to the left of a car')
Latex(to_ipython_latex(r))

<IPython.core.display.Latex object>