# Image recognition with TTR


## Bridging between perceptual and conceptual domains

Let's apply the object detection representation proposed in Dobnik & Cooper's *Interfacing language, spatial perception and cognition in TTR* to image recognition.

![Fig 8](fig/lspc-fig8.png)

Here we use `Image` rather than `PointMap` for the whole scene, and we replace the `reg:PointMap` field with a field of a different type under a new label, `seg:Segment`. In Dobnik & Cooper's case the same type can be used to represent both the region and the whole, because a `PointMap` is a set of absolute positions. In an `Image`, positions are relative to an origin, which has to be specified when cropping.

I guess that, in the general case, the domain of an `ObjectDetector` function need not be the same type as the `reg` fields of its output elements.
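
To make the point about origins concrete, here is a minimal sketch in plain Python. It assumes a segment is given as a dict with the `cx`, `cy`, `w`, `h` fields used later in this notebook; `to_image_coords` and the numbers are hypothetical, for illustration only.

```python
def to_image_coords(seg, origin):
    """Translate a segment given relative to a crop into full-image coordinates.

    `origin` is the (x, y) position of the crop's top-left corner in the image.
    """
    ox, oy = origin
    return {'cx': seg['cx'] + ox, 'cy': seg['cy'] + oy, 'w': seg['w'], 'h': seg['h']}

# A segment found inside a crop whose top-left corner sits at (300, 200).
seg_in_crop = {'cx': 100, 'cy': 150, 'w': 40, 'h': 20}
seg_in_image = to_image_coords(seg_in_crop, origin=(300, 200))
print(seg_in_image)
```

Only the centre moves; width and height are unaffected by the change of origin.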

In [1]:
import sys
sys.path.append('pyttr')
from pyttr.ttrtypes import *
from pyttr.utils import *
import PIL.Image

ttrace()

# Basic types.

Ind = BType('Ind')

Int = BType('Int')
Int.learn_witness_condition(lambda x: isinstance(x, int))
print(Int.query(365))

Image = BType('Image')
Image.learn_witness_condition(lambda x: isinstance(x, PIL.Image.Image))
img = PIL.Image.open('res/dogcar.jpg')
print(Image.query(img))

# Segment type: a rectangular area of a given image.

Segment = RecType({#'i': Image,
    'cx': Int, 'cy': Int, 'w': Int, 'h': Int})
print(Segment.query(Rec({#'i': img,
    'cx': 100, 'cy': 150, 'w': 40, 'h': 20})))

# Redefine Image.show() to work with Rec.show().
def image_show(self):
    return str(self)
PIL.Image.Image.show = image_show
show(img)

True
True
True


'<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1080x1080 at 0x7FD6480714A8>'

In [2]:
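from IPython.display import Latex  # explicit import, in case the star imports above do not provide it

def latex(*objs):
    """Render each object as LaTeX in the notebook output."""
    texcode = '\n\n'.join(to_ipython_latex(obj) for obj in objs)
    #print(texcode)
    return Latex(texcode)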

$Ind$ and $Image$ are basic types.

$Segment = \left[\begin{array}{rcl}
\text{cx} &:& Int\\
\text{cy} &:& Int\\
\text{w} &:& Int\\
\text{h} &:& Int\\
\end{array}\right]$

$Ppty = (Ind \rightarrow Type)$

$Object = \left[ \begin{array}{rcl}
    \text{pfun} &:& Ppty \\
    \text{seg} &:& Segment \\
\end{array} \right]$

$ObjectDetector = ( Image \rightarrow [Object] )$

In [3]:
Ppty = FunType(Ind, Ty)
Object = RecType({'seg': Segment, 'pfun': Ppty})
Objects = ListType(Object)
ObjectDetector = FunType(Image, Objects)

latex(Segment, Ppty, ObjectDetector)

<IPython.core.display.Latex object>

In [4]:
# Custom PyTTR utilities

from functools import reduce
    
def copy_rectype(T):
    R = RecType()
    for k, v in T.comps.__dict__.items():
        R.addfield(k, v)
    return R

def rectype_relabels(T, rlbs):
    for k1, k2 in rlbs.items():
        T.Relabel(k1, k2)
    return T

def rectype_merges(Ts):
    return reduce((lambda T, U: T.merge(U)), Ts, RecType())

def is_basic_type(T):
    tn = lambda T: type(T).__name__
    return (tn(T) == 'BType') if tn(T) != 'SingletonType' else is_basic_type(T.comps.base_type)

def basic_fields(T):
    return [k for k, v in T.comps.__dict__.items() if is_basic_type(v)]

def nonbasic_fields(T):
    return [k for k, v in T.comps.__dict__.items() if not is_basic_type(v)]

ptypes = dict()
def mkptype(sym, types=[Ind], vars=['v']):
    """Make preds and ptypes identifiable by their predicate names."""
    # Cache key: predicate name, argument types and variable names.
    key = '/'.join([sym, ','.join(show(t) for t in types), ','.join(vars)])
    if key not in ptypes:
        ptypes[key] = PType(Pred(sym, types), vars)
    return ptypes[key]

def create_fun(pred_name, vars=['a']):
    fun = mkptype(pred_name, vars=vars)
    for v in reversed(vars):
        fun = Fun(v, Ind, fun)
    return fun

# print(show(create_fun('bamba', 'abcd')))

## Object detection model YOLO

We use an object detection model to detect and recognize objects in an image. The output is modeled as a list of TTR records.

Requires OpenCV and [Darkflow](https://github.com/thtrieu/darkflow). `yolo.weights` is from [Yolo](https://pjreddie.com/darknet/yolo/).
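
For orientation, Darkflow's `return_predict` returns a list of dictionaries, one per detection, with `label`, `confidence`, `topleft` and `bottomright` keys. The sketch below uses made-up detections (the values are hypothetical, not output for `dogcar.jpg`) and a hypothetical helper `keep_confident` to show the shape that the later cells rely on.

```python
# Hypothetical detections in the format returned by Darkflow's return_predict.
detections = [
    {'label': 'dog', 'confidence': 0.82,
     'topleft': {'x': 60, 'y': 400}, 'bottomright': {'x': 420, 'y': 900}},
    {'label': 'car', 'confidence': 0.64,
     'topleft': {'x': 500, 'y': 300}, 'bottomright': {'x': 1000, 'y': 700}},
    {'label': 'traffic light', 'confidence': 0.12,
     'topleft': {'x': 10, 'y': 10}, 'bottomright': {'x': 40, 'y': 80}},
]

def keep_confident(dets, threshold):
    """Keep only the detections at or above a confidence threshold."""
    return [d for d in dets if d['confidence'] >= threshold]

# Labels may contain spaces; predicate names below replace them with underscores.
confident_labels = [d['label'].replace(' ', '_') for d in keep_confident(detections, 0.5)]
print(confident_labels)
```

With a low threshold such as the `0.1` used in this notebook, weak detections like the third one are kept as well.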

In [5]:
from darkflow.net.build import TFNet
import numpy as np

tfnet = TFNet({"model": "yolo/yolo.cfg", "load": "yolo/yolo.weights",
    'config': 'yolo', "threshold": 0.1})
yolo_out = dict()
def yolo(img):
    if str(img) not in yolo_out:
        yolo_out[str(img)] = tfnet.return_predict(np.array(img))
    return yolo_out[str(img)]

Parsing yolo/yolo.cfg
Loading yolo/yolo.weights ...
Successfully identified 203934260 bytes
Finished in 0.027223825454711914s
Model has a coco model name, loading coco labels.

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 608, 608, 3)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 608, 608, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 304, 304, 32)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 304, 304, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 152, 152, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 152, 152, 128)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 152, 152, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 152, 152, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 76, 76, 128)
 Load  |  Yep!  | conv 3x3p1_1  +b

In [6]:
def xy1xy2_to_cwh(x1, y1, x2, y2):
    '''Transform to center, width and height.'''
    return {'cx': int(x1/2 + x2/2), 'cy': int(y1/2 + y2/2), 'w': x2 - x1, 'h': y2 - y1}
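
Going the other way is handy when a segment has to be cropped or drawn. A minimal sketch, assuming non-negative integer pixel coordinates; `cwh_to_xy1xy2` is a hypothetical helper that is not part of this notebook. The resulting 4-tuple is in the `(left, upper, right, lower)` order that `PIL.Image.Image.crop` expects.

```python
def xy1xy2_to_cwh(x1, y1, x2, y2):
    """Corner coordinates to center, width and height (as in the cell above)."""
    return {'cx': int(x1/2 + x2/2), 'cy': int(y1/2 + y2/2), 'w': x2 - x1, 'h': y2 - y1}

def cwh_to_xy1xy2(cx, cy, w, h):
    """Center, width and height back to corner coordinates.

    For non-negative integers, cx equals x1 + w // 2, so the round trip is exact.
    """
    x1 = cx - w // 2
    y1 = cy - h // 2
    return (x1, y1, x1 + w, y1 + h)

seg = xy1xy2_to_cwh(80, 140, 120, 160)
box = cwh_to_xy1xy2(**seg)
print(seg, box)
```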

In [7]:
def yolo_detector(i):
    return [Rec({
        'seg': Rec({
            #'i': i,
            **xy1xy2_to_cwh(o['topleft']['x'], o['topleft']['y'], o['bottomright']['x'], o['bottomright']['y']),
        }),
        'pfun': create_fun(o['label'].replace(' ', '_')),
        
    }) for o in yolo(i)]  # TODO: check channel order; PIL gives RGB, OpenCV-loaded images are BGR.

objs = yolo_detector(img)

print(Objects.query(objs))
print(Object.query(objs[0]))
print(Ppty.query(objs[0].pfun))
print(Segment.query(objs[0].seg))

latex(objs[-1])

True
True
True
True


<IPython.core.display.Latex object>

## Individualization function

The object detection model gave us evidence that certain segments contain something that presents certain properties/classes.

Now let's recognize that there are individuals which are located at those segments and have those properties.

**Is the domain of $Individualize$ really objects *of type* $IndObj$? Can a record type be *of* another record type?**

$IndObj = \left[\begin{array}{rcl}
\text{x} &:& Ind\\
\text{c}_{prop} &:& Type\\
\text{c}_{loc} &:& Type\\
\end{array}\right]$?

$Individualize : (Object \rightarrow IndObj)$ or $Individualize : (Object \rightarrow Type)$ ?

$Individualize = \lambda r : Object\ . \left[\begin{array}{rcl}
    \text{x} &:& Ind \\
    \text{c}_{prop} &:& r.\text{pfun}(\text{x}) \\
    \text{c}_{loc} &:& \text{location}(\text{x}, r.\text{seg}) \\
\end{array}\right]$

In [8]:
LocFun = create_fun('location', 'ab')

def individualize(r):
    x = gensym('x')
    return RecType({
        x: Ind,
        gensym('prop'): r.pfun.app(x),
        gensym('loc'): LocFun.app(x).app(r.seg),
    })
latex(individualize(objs[-1]))

<IPython.core.display.Latex object>

## Combining commitments

All observed situations are combined into one, so they can be considered simultaneously.
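
As a rough illustration of why fresh labels matter here, the sketch below models a record type as a plain `dict` from labels to field types written as strings. This is an assumption made for illustration; the actual merge in this notebook is PyTTR's `RecType.merge`, which is not reproduced here. `merge_disjoint` is a hypothetical helper.

```python
from functools import reduce

def merge_disjoint(t, u):
    """Union of two label -> type dicts; refuse clashing labels."""
    clash = set(t) & set(u)
    if clash:
        raise ValueError('label clash: {}'.format(sorted(clash)))
    return {**t, **u}

# Two individualized situations with distinct, generated labels.
sit1 = {'x_1': 'Ind', 'prop_1': 'dog(x_1)', 'loc_1': 'location(x_1, seg_1)'}
sit2 = {'x_2': 'Ind', 'prop_2': 'car(x_2)', 'loc_2': 'location(x_2, seg_2)'}

merged = reduce(merge_disjoint, [sit1, sit2], {})
print(sorted(merged))
```

Because every situation brings its own labels, the combined type simply contains all fields side by side, and the dependent fields keep pointing at the right individual.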

In [9]:
from itertools import product

objs_few = objs[2:5]
situations = [individualize(r) for r in objs_few]

In [10]:
from functools import reduce
sitmerge = rectype_merges(situations)
latex(sitmerge)

<IPython.core.display.Latex object>

## Spatial relations

In [11]:
location_relation_classifiers = {
    'left': lambda a, b: a.cx < b.cx,
    'right': lambda a, b: a.cx > b.cx,
    'above': lambda a, b: a.cy < b.cy,
    'below': lambda a, b: a.cy > b.cy,
}

def get_locs(T):
    locs = dict()
    for c in nonbasic_fields(T):
        t = T.comps.__dict__[c]
        if isinstance(t, PType) and t.comps.pred.name == 'location':
            locs[t.comps.args[0]] = t.comps.args[1]
    return locs

def detect_relations(T, classifiers):
    locs = get_locs(T)
    rels = []
    for k, f in classifiers:
        for x1, x2 in product(locs, locs):
            if f(locs[x1], locs[x2]):
                rels.append(create_fun(k, 'ab').app(x1).app(x2))
    return rels
    
rels = detect_relations(sitmerge, location_relation_classifiers.items())
sitmergerels = rectype_merges([sitmerge] + [RecType({gensym('rel'): rel}) for rel in rels])
latex(sitmergerels)

<IPython.core.display.Latex object>
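
The classifiers above compare segment centres only, and image coordinates grow downwards, so a smaller `cy` means "above". A minimal sketch on plain named tuples, without PyTTR; the `Seg` type and the two segments are hypothetical values for illustration.

```python
from collections import namedtuple
from itertools import product

Seg = namedtuple('Seg', 'cx cy w h')

classifiers = {
    'left': lambda a, b: a.cx < b.cx,
    'right': lambda a, b: a.cx > b.cx,
    'above': lambda a, b: a.cy < b.cy,
    'below': lambda a, b: a.cy > b.cy,
}

# Hypothetical segments: the dog is further left and lower in the image than the car.
locs = {'dog': Seg(240, 650, 360, 500), 'car': Seg(750, 500, 500, 400)}

# Try every classifier on every ordered pair; strict comparisons rule out self-pairs.
relations = sorted(
    (name, x1, x2)
    for name, f in classifiers.items()
    for x1, x2 in product(locs, locs)
    if f(locs[x1], locs[x2])
)
print(relations)
```

Note that overlapping boxes still yield relations under this scheme, since widths and heights are ignored.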

## Text parsing

In [12]:
def create_abc(prop_a, prop_b, rel):
    '''Creates a record type describing two individuals and a relation between them.'''
    return RecType({
        'x': Ind,
        'y': Ind,
        'c_{' + prop_a + '}': create_fun(prop_a).app('x'),
        'c_{' + prop_b + '}': create_fun(prop_b).app('y'),
        'c_{' + rel + '}': create_fun(rel, 'ab').app('x').app('y'),
    })

print("A dog is to the left of a car")
question = create_abc('dog', 'car', 'left')
latex(question)

A dog is to the left of a car


<IPython.core.display.Latex object>

In [13]:
from lark.lark import Lark
ttr_parser = Lark(r'''
?start: type
?type: btype | rectype | ptype
btype: name
rectype: "{" tfield ("," tfield)* "}"
tfield: sym ":" type
ptype: sym "(" args ")"
args: sym ("," sym)*
?sym: name
?name: /[a-zA-Z0-9{}_]+/
%ignore " "
''')

def ttr_inflate(tree):
    if tree.data == 'btype':
        return BType(str(tree.children[0]))
    elif tree.data == 'rectype':
        return RecType(dict((str(f.children[0]), ttr_inflate(f.children[1])) for f in tree.children))
    elif tree.data == 'ptype':
        return mkptype(str(tree.children[0]), vars=tree.children[1].children)
    
def ttr_parse(s):
    return ttr_inflate(ttr_parser.parse(s))

s = '{c_rel : left(x, y), c_y : car(y), x : Ind, y : Ind, c_x : dog(x)}'
T = ttr_parse(s)
print(show(T))

{c_x : dog(x), c_rel : left(x, y), c_y : car(y), y : Ind, x : Ind}


In [14]:
import nltk

grammar = nltk.grammar.FeatureGrammar.fromstring(r'''
%start S
S[SEM=("x:Ind, y:Ind", <?s(x) & ?vp(x, y)>)] -> NP[SEM=?s] VP[SEM=?vp]
NP[SEM=<?det(?n)>] -> Det[SEM=?det] N[SEM=?n]
Det[SEM=<\P a.P(a)>] -> 'a' | 'an'
N[SEM=<dog>] -> 'dog'
N[SEM=<car>] -> 'car'
N[SEM=<person>] -> 'person'
N[SEM=<chair>] -> 'chair'
VP[SEM=?pp] -> 'is' PP[SEM=?pp]
PP[SEM=<\a b.(?prep(a, b) & ?o(b))>] -> Prep[SEM=?prep] NP[SEM=?o]
Prep[SEM=<left>] -> 'to' 'the' 'left' 'of'
Prep[SEM=<right>] -> 'to' 'the' 'right' 'of'
Prep[SEM=<above>] -> 'above'
Prep[SEM=<below>] -> 'below' | 'under'
''')
parser = nltk.FeatureChartParser(grammar)

texts = [
    'A dog is to the left of a car',
    'A car is to the left of a dog',
#     'There is a dog to the left of a car',
#     'Is the dog to the left of the car',
#     'Is there a dog to the left of the car',
]

def and_list(a):
    """Flatten a tree of And expressions."""
    from nltk.sem.logic import AndExpression
    if isinstance(a, AndExpression):
        return and_list(a.first) + and_list(a.second)
    return [a]

def parse_text(text):
    trees = parser.parse(text.lower().split())
    sem = nltk.sem.root_semrep(list(trees)[0])
    fields = [sem[0]] + ['c_{}:{}'.format(i+1, str(x)) for i, x in enumerate(and_list(sem[1]))]
    ttr_text = '{' + ', '.join(fields) + '}'
    T = ttr_parse(ttr_text)
    return T

for text in texts:
    print(text)
    r = parse_text(text)
    print(show(r))

latex(r)

A dog is to the left of a car
{c_3 : car(y), x : Ind, c_1 : dog(x), y : Ind, c_2 : left(x, y)}
A car is to the left of a dog
{c_3 : dog(y), x : Ind, c_1 : car(x), y : Ind, c_2 : left(x, y)}


<IPython.core.display.Latex object>

## Checking text against image

Essentially, we would like to check whether the type of the observed situation ($A$) is a subtype of the type of the situation described by the text/question ($Q$), i.e. whether $Q \sqsupseteq A$. A new problem here is that field labels do not match, even when the field values (the types) match. We thus need to consider all (?) relabelings of $Q$:

A record type $T_1$ is a *relabel-subtype* of $T_2$ if there is a relabeling $T_{1_{rlb}}$ of $T_1$ such that $T_{1_{rlb}} \sqsubseteq T_2$. Since a relabeling only renames labels one-to-one, we can equally well relabel $T_2$ instead, which is what the code below does.

Could we forget field labels and just look at the two sets of field values? Not really, because we have dependent types, so $\text{dog}(x_1) \neq \text{dog}(x_2)$. We need to carry out each candidate *relabeling* and check subtyping. In practice, and in this case, relabeling the basic-type ($Ind$) fields is enough, because those are the only ones whose labels appear in dependent fields. For each relabeling of the basic fields, we can then ignore the remaining labels and just check subtyping between field values.
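
The idea can be sketched without PyTTR. The block below assumes a simplified representation: a record type is a `dict` whose values are either the string `'Ind'` or a tuple `(predicate, arg, ...)`, and subtyping between dependent fields is reduced to equality. `find_relabeling` and the example types are hypothetical and only mirror the approach described above.

```python
from itertools import permutations

def basic(T):
    """Labels of the Ind fields."""
    return [k for k, v in T.items() if v == 'Ind']

def find_relabeling(A, Q):
    """Map Q's Ind labels to A's Ind labels so that every dependent field
    of Q also occurs in A; return the mapping, or None if there is none."""
    q_inds = basic(Q)
    a_deps = {v for v in A.values() if v != 'Ind'}
    for cand in permutations(basic(A), len(q_inds)):
        rlb = dict(zip(q_inds, cand))
        # Rename the arguments of Q's dependent fields, then ignore Q's labels.
        renamed = [(v[0],) + tuple(rlb.get(a, a) for a in v[1:])
                   for v in Q.values() if v != 'Ind']
        if all(dep in a_deps for dep in renamed):
            return rlb
    return None

observed = {
    'x_1': 'Ind', 'x_2': 'Ind',
    'prop_1': ('dog', 'x_1'), 'prop_2': ('car', 'x_2'),
    'rel_1': ('left', 'x_1', 'x_2'), 'rel_2': ('right', 'x_2', 'x_1'),
}
dog_left_of_car = {'x': 'Ind', 'y': 'Ind',
                   'c_1': ('dog', 'x'), 'c_2': ('left', 'x', 'y'), 'c_3': ('car', 'y')}
car_left_of_dog = {'x': 'Ind', 'y': 'Ind',
                   'c_1': ('car', 'x'), 'c_2': ('left', 'x', 'y'), 'c_3': ('dog', 'y')}

print(find_relabeling(observed, dog_left_of_car))
print(find_relabeling(observed, car_left_of_dog))
```

In this toy scene the first description finds a relabeling and the second does not, because no assignment of `x` and `y` makes all three dependent fields hold at once.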

In [15]:
from itertools import permutations

def find_subtype_relabeling(T, U):
    '''Could record type T be a subtype of record type U if relabeling is allowed?'''
    # Candidate relabelings: every ordered choice of T's basic-type labels,
    # one for each basic-type field of U.
    basic_label_permutations = permutations(basic_fields(T), len(basic_fields(U)))
    
    for tks in basic_label_permutations:
        # Copy U and try a basic-fields relabeling
        U2 = copy_rectype(U)
        rlb = list(zip(basic_fields(U), tks))
        for uk, tk in rlb:
            U2.Relabel(uk, tk)
        
        # For each U field, find a T field that is a subtype
        match = dict()
        for uk in nonbasic_fields(U2):
            for tk in nonbasic_fields(T):
                if T.comps.__dict__[tk].subtype_of(U2.comps.__dict__[uk]):
                    match[uk] = tk
                    break
            if uk not in match:
                break

        # Successful if all non-basic fields match.
        if len(match) == len(nonbasic_fields(U2)):
            return dict(list(rlb) + list(match.items()))
    return None

obs = sitmergerels
r = parse_text(texts[1])
rlb = find_subtype_relabeling(obs, r)  # compute once, reuse below
print(rlb)
r2 = rectype_relabels(copy_rectype(r), rlb)
print(obs.subtype_of(r2))
latex(r2)

{'c_3': 'prop_{3}', 'x': 'x_{2}', 'c_1': 'prop_{2}', 'y': 'x_{3}', 'c_2': 'rel_{9}'}
True


<IPython.core.display.Latex object>