# Composite

Can a composite pattern make templating easier?

## Templating

There are three template targets:

* plain text
* PDF
* Word

That's because some of the users are lawyers who work in Word documents.

I'm building the templating system from the ground up, because everyone tries to start in the middle, and I've never found or built a templating system I've been satisfied with. That's partly because it's an end-user product, and a finicky one: in a tool where users can do anything, any change we make can have brittle consequences.

Still, the goal is to provide merged documents in a safe way.

## Workflow

A good system will have:

* a valid source document
* a valid set of instructions to merge
* validation on inputs, even partial ones
* conditional sections of documents, based on the dataset
* incremental control over merging (partial merges OK)
* clear instructions when something failed
* document lifecycle, including versions
* the ability to target text, PDF, and Word
* support from the document and inputs to clean up and guide the process
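To make one of those requirements concrete, here is a sketch of "validation on inputs, even partial ones", assuming inputs arrive as a `dict` of field names to values and the template declares the field names it needs. `validate_inputs` and its `partial` flag are hypothetical names for illustration.

```python
def validate_inputs(required, data, partial=True):
    """Hypothetical check of merge inputs against the fields a template needs.

    With partial=True, missing fields are reported but allowed, so a partial
    merge can proceed; unknown fields (likely typos) are always problems.
    """
    missing = sorted(set(required) - set(data))
    unknown = sorted(set(data) - set(required))
    problems = [f"unknown field: {name}" for name in unknown]
    if not partial:
        problems += [f"missing field: {name}" for name in missing]
    return {'ok': not problems, 'missing': missing, 'problems': problems}


print(validate_inputs(['client', 'date', 'fee'], {'client': 'Acme', 'fees': 100}))
# → {'ok': False, 'missing': ['date', 'fee'], 'problems': ['unknown field: fees']}
```

Reporting `missing` separately from `problems` keeps a partial merge possible while still saying what is left to fill in.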

If that's what I want generally, a composite only adds a tree to the system.


## Embedments

Embedments are basically fields, with foreign code and some smart collections of operations. Using a composite of embedments, we are saying there's a tree of instructions, one instruction per field. That may be the wrong way to think of the problem, considering the real problem defined in Workflow.

The Composite comes from the GoF Composite Pattern. The Embedment comes from Martin Fowler's Domain Specific Languages.

## Basic Model

The basic model uses a Component, Composite, and Leaf to create an operational workflow.

The original source for this can be [found here](https://sourcemaking.com/design_patterns/composite/python/1).

In [1]:
import abc

class Component(metaclass=abc.ABCMeta):
    
    @abc.abstractmethod
    def operation(self):
        pass
    
class Composite(Component):
    
    def __init__(self):
        self._children = set()
        
    def operation(self):
        for child in self._children:
            child.operation()
            
    def add(self, component):
        self._children.add(component)
        
    def remove(self, component):
        self._children.discard(component)
        
class Leaf(Component):
    def operation(self):
        pass

class L1(Leaf):
    def operation(self):
        print(id(self))

In [2]:
composite = Composite()
composite.add(L1())
composite.add(L1())
composite.operation()

4527109008
4527108784


In [3]:
root = Composite()
b1 = Composite()
l1 = L1()
l2 = L1()
b1.add(l1)
b1.add(l2)
root.add(b1)
b2 = Composite()
b2.add(l2)
root.add(b2)
root.operation()

4552970760
4552970704
4552970760


### Making Sense

Strengths:

* a leaf can work with state
* Component can be state smart

Modifications:

* can pass/share state so leaves can work from a collective state
* work with a tree and a builder of some sort
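A minimal sketch of both modifications, assuming a plain `dict` as the shared state: the composite hands the same state object to every child, so later leaves can read what earlier leaves wrote, and a small builder turns nested lists into a tree. `StatefulComposite`, `SetField`, `CountFields`, and `build` are hypothetical names.

```python
import abc


class Component(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def operation(self, state):
        """Do this node's work against a shared, mutable state."""


class StatefulComposite(Component):
    def __init__(self, children=None):
        # A list (not a set) keeps execution order predictable.
        self._children = list(children or [])

    def add(self, component):
        self._children.append(component)
        return self

    def operation(self, state):
        # Every child sees the same state object, in order.
        for child in self._children:
            child.operation(state)
        return state


class SetField(Component):
    """Hypothetical leaf: records a field value in the shared state."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def operation(self, state):
        state.setdefault('fields', {})[self.name] = self.value


class CountFields(Component):
    """Hypothetical leaf: reads what earlier leaves wrote."""

    def operation(self, state):
        state['count'] = len(state.get('fields', {}))


def build(spec):
    # Hypothetical builder: nested lists become composites, anything else is a leaf.
    if isinstance(spec, list):
        return StatefulComposite(build(item) for item in spec)
    return spec


tree = build([[SetField('client', 'Acme'), SetField('date', '2020-01-01')], CountFields()])
state = tree.operation({})
print(state)
# → {'fields': {'client': 'Acme', 'date': '2020-01-01'}, 'count': 2}
```

Because the state is shared, order matters, which is one reason this sketch keeps children in a list rather than the `set` used in In [1].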


In [4]:
import re
import abc
from collections.abc import Iterable
from functools import partial

In [5]:
def listify(o):
    # Normalize input to a list, treating strings and dicts as single items.
    if o is None: return []
    if isinstance(o, list): return o
    if isinstance(o, (str, dict)): return [o]
    if isinstance(o, Iterable): return list(o)
    return [o]

class Component(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def __call__(self):
        pass

class Composite(Component):

    @classmethod
    def build(cls, components, **kw):
        return cls(**kw).add(components)

    def __init__(self, **kw):
        self._children = []
        self.sort_key = kw.get('key')

    @property
    def sorter(self):
        # Without a key, keep insertion order: arbitrary components can't be compared with <.
        if self.sort_key is None: return list
        return partial(sorted, key=self.sort_key)

    def add(self, components):
        # listify also accepts tuples and generators, which list + tuple would reject.
        self._children = self.sorter(self._children + listify(components))
        return self

    def remove(self, components):
        for component in listify(components):
            self._children.remove(component)
        return self

    @property
    def length(self):
        # Count leaves, recursing through nested composites.
        return sum(child.length for child in self._children)

    def __call__(self, *a, **kw):
        # Pass the same arguments to every child, in order.
        for child in self._children:
            child(*a, **kw)

    def __repr__(self):
        return f"{self.__class__.__name__}: {self.length} leaves"

class Leaf(Composite):
    # A Leaf reuses Composite's interface but has no children of its own.

    @classmethod
    def hydrate(cls, item):
        # Accept an existing leaf, a dict of keyword arguments, or a single positional argument.
        if isinstance(item, cls): return item
        if isinstance(item, dict): return cls(**item)
        return cls(item)

    @classmethod
    def build(cls, items):
        items = listify(items)
        leaves = [cls.hydrate(item) for item in items]
        return Composite.build(leaves)

    length = 1

    def __call__(self, *a, **kw):
        pass


In [6]:
class NaiveEmbedment(Leaf):
    def __init__(self, name=None, pattern=None, **kw):
        self.name = name
        self.pattern = re.compile(name if pattern is None else pattern)
        self.kw = kw
        
    def __call__(self, document, replacement, *a, **kw):
        return re.sub(self.pattern, replacement, document)
    
    def __repr__(self):
        return f"{self.__class__.__name__}: {self.name} {self.pattern}"
    
class InsertEmbedment(Leaf):
    @classmethod
    def build(cls, items):
        items = listify(items)
        leaves = [cls.hydrate(item) for item in items]
        return Composite.build(leaves, key=lambda e: e.location)

    def __init__(self, location, name=None, **kw):
        self.location = location
        self.name = name
        self.offset = kw.get('offset', 0)
        self.kw = kw
        
    def __call__(self, document, text, *a, **kw):
        offset = kw.get('offset', self.offset)
        position = self.location + offset
        self.offset = offset + len(text)
        result = document[:position] + text + document[position:]
        print(result)
        return result
    
    def __repr__(self):
        return f"{self.__class__.__name__} {self.location}"
    

In [7]:
r = re.compile('foo')
doc = "foo bar baz foo bar foo"

In [8]:
NaiveEmbedment('foo')(doc, 'ccc')

'ccc bar baz ccc bar ccc'

In [9]:
s = InsertEmbedment.build([15, 10, 5])
s(doc, 'abc')

foo babcar baz foo bar foo
foo bar baabcz foo bar foo
foo bar baz fooabc bar foo


In [10]:
s = InsertEmbedment(12, name='incomplete')
print(s(doc, 'xxx '))
s.offset

foo bar baz xxx foo bar foo
foo bar baz xxx foo bar foo


4

### Making Sense

So, the embedment composite is different:

* uses `__call__` instead of `operation`
* has some builders and a hydration mechanism
* addresses a sort

This makes it easier to assemble, but it's still lacking clarity:

* all the steps?
* error control?
* executable/valid?
* state/incremental process?
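One way to start answering those questions, sketched under assumptions: each step is a function from document to document, the runner threads the document from step to step (unlike In [9], where every leaf works on its own copy), and it records each step's outcome so a partial merge stops at the first failure and says which step failed. `MergeRun`, `run_steps`, and `insert_at` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MergeRun:
    """Hypothetical record of an incremental merge."""
    document: str
    completed: list = field(default_factory=list)
    error: Optional[str] = None

    @property
    def ok(self):
        return self.error is None


def run_steps(document, steps):
    # Thread the document through each (name, step) pair; stop on the first failure.
    run = MergeRun(document)
    for name, step in steps:
        try:
            run.document = step(run.document)
        except Exception as exc:  # report which step failed and why
            run.error = f"{name}: {exc}"
            break
        run.completed.append(name)
    return run


def insert_at(position, text):
    # Hypothetical step factory: insert text at a position, refusing out-of-range positions.
    def step(doc):
        if not 0 <= position <= len(doc):
            raise ValueError(f"position {position} outside 0..{len(doc)}")
        return doc[:position] + text + doc[position:]
    return step


run = run_steps("Dear ,", [("name", insert_at(5, "Ms. Smith")), ("bad", insert_at(99, "!"))])
print(run.document, run.completed, run.error)
# → Dear Ms. Smith, ['name'] bad: position 99 outside 0..15
```

The document kept in `run` is the last good state, so a failed run is still a usable partial merge.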

Also, it's hard to say what an embedment should be doing. It could know a location to insert at, or a slice to replace, but finding either is difficult. A difficult embedment might still be right if the environment needs to learn those kinds of things; it's easier to get a slice or a location from another tool. Compared to a Jinja template, though, it's opaque: we need confidence from the other tool that this is the right way to address the source document.

This makes sense for use cases like:

* Given a text document, I want to build a template programmatically rather than pre-build it.
* Given a PDF document, I want to create an overlay set of fields.
* Given a document and an ML model, I want to see if I can build a fieldset and survey pair.

## Embedments Composite

"Embedment" names what a field does: it applies code to embed itself.

* All embedments work on documents.
* Allow embedments to have an order.
* A simple embedment can apply text to a position in the document.
* The position is relative to the original location, or something easier to use.
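A minimal sketch of those points, assuming each embedment is a `(position, text)` pair addressed against the original document. Applying them from the highest position down means an insertion only shifts text after it, which has already been handled, so no offset bookkeeping is needed. `apply_in_reverse` is a hypothetical helper, and this is an alternative to tracking offsets, not the approach the cells below take.

```python
def apply_in_reverse(document, embedments):
    """Insert each (position, text) pair, with positions measured in the original document.

    Working from the end toward the start keeps every remaining position valid,
    because an insertion only shifts text that comes after it.
    """
    original_length = len(document)
    for position, text in sorted(embedments, key=lambda e: e[0], reverse=True):
        if not 0 <= position <= original_length:
            raise ValueError(f"position {position} is outside the document")
        document = document[:position] + text + document[position:]
    return document


doc = "foo bar baz foo bar foo"
print(apply_in_reverse(doc, [(15, "abc"), (10, "abc"), (5, "abc")]))
# → foo babcar baabcz fooabc bar foo
```

Compare with In [9], where each insertion lands in a separate copy of the document; here all three end up in one result.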


In [11]:
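class TextEmbedment:
    """Base class: apply text to a document at a location measured in the original document."""

    def __init__(self, *a, **kw):
        self.kw = kw

    @property
    def prior_offset(self):
        # How far earlier embedments in a chain have shifted the document.
        return self.kw.get('offset', 0)

    @property
    def removed_length(self):
        # Characters of the document this embedment removes (none for a plain insert).
        return 0

    @property
    def posterior_offset(self):
        # The offset the next embedment in the chain should start from.
        length = len(getattr(self, 'text', ''))
        return self.prior_offset + length - self.removed_length

    def _get(self, name):
        # Translate an original-document position into the current document.
        return getattr(self, name) + self.prior_offset

class PlainEmbedment(TextEmbedment):

    def __init__(self, position, text, **kw):
        self.position = position
        self.text = text
        self.kw = kw

    def __call__(self, document):
        position = self._get('position')
        return document[:position] + self.text + document[position:]

class ReplacingEmbedment(TextEmbedment):

    def __init__(self, begin, end, text, **kw):
        self.begin = begin
        self.end = end
        self.text = text
        self.kw = kw

    @property
    def removed_length(self):
        # The replaced slice disappears from the document.
        return self.end - self.begin

    def __call__(self, document):
        begin = self._get('begin')
        end = self._get('end')
        return document[:begin] + self.text + document[end:]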

In [12]:
document = "This is a document."
embedment = PlainEmbedment(10, 'nice ')
assert embedment(document) == "This is a nice document."

embedment = ReplacingEmbedment(8, 9, 'one fine')
assert embedment(document) == 'This is one fine document.'

### Making Sense

If I know where something goes, I can apply it.  Therefore:

* Learn where something goes by an easier standard (encode what I know when I decide to add a field).
* Create a chain of embedments.
* Store and use the incrementing offset to apply the embedment to the right location.
* Use validation and error control throughout.
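A sketch of that chain, assuming each embedment is a `(begin, end, text)` triple in original-document coordinates, where an insertion has `begin == end`. Sorted by `begin`, the running offset is the sum of `len(text) - (end - begin)` over the embedments already applied, and overlapping spans are rejected rather than guessed at. `apply_chain` is a hypothetical name.

```python
def apply_chain(document, embedments):
    """Apply (begin, end, text) embedments given in original-document coordinates.

    An insertion has begin == end. The running offset records how far earlier
    embedments have shifted the text, so each original span can be translated
    into the current document.
    """
    original_length = len(document)
    offset = 0
    previous_end = 0
    for begin, end, text in sorted(embedments, key=lambda e: (e[0], e[1])):
        # Validation: the span must fit the original document...
        if not 0 <= begin <= end <= original_length:
            raise ValueError(f"span ({begin}, {end}) is outside the document")
        # ...and must not overlap a span that has already been applied.
        if begin < previous_end:
            raise ValueError(f"span ({begin}, {end}) overlaps an earlier embedment")
        document = document[:begin + offset] + text + document[end + offset:]
        offset += len(text) - (end - begin)
        previous_end = end
    return document


print(apply_chain("This is a document.", [(10, 10, "nice "), (8, 9, "one")]))
# → This is one nice document.
```

The caller never computes offsets; it only records where each field sits in the original, which is the "easier standard" the first bullet asks for.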

## Regular Expression Field Identity

I'm thinking that a hint could be placed in a document. Say `field1` is a document marker. I can create a regular expression for it with a function and then ensure it's unique. If it's a stable identifier, great. If not, use the keywords to make a better regular expression. Default keywords make it obvious what I'm looking for.

I need an obvious way to see how these locations are being developed. Transparency.
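A sketch of that transparency, assuming the "keywords" are literal text expected immediately before or after the marker: `describe_pattern` reports every match so it's obvious why a pattern is or isn't stable, and `refine` wraps the field in a lookbehind and/or lookahead built from those keywords. Both function names and the `before`/`after` parameters are hypothetical.

```python
import re


def describe_pattern(pattern, doc):
    """Report how a pattern addresses a document: match count and every span."""
    compiled = re.compile(pattern)
    spans = [m.span() for m in compiled.finditer(doc)]
    return {'pattern': compiled.pattern, 'count': len(spans), 'spans': spans, 'stable': len(spans) == 1}


def refine(field, before='', after=''):
    """Build a pattern for a literal field, optionally anchored by literal context.

    Lookbehind/lookahead keep the context out of the match, so the span still
    covers only the field itself.
    """
    pattern = re.escape(field)
    if before:
        pattern = f"(?<={re.escape(before)})" + pattern
    if after:
        pattern = pattern + f"(?={re.escape(after)})"
    return pattern


doc = "uber super duper doc"
print(describe_pattern('uper', doc))
# → {'pattern': 'uper', 'count': 2, 'spans': [(6, 10), (12, 16)], 'stable': False}
print(describe_pattern(refine('uper', before='d'), doc))
# → {'pattern': '(?<=d)uper', 'count': 1, 'spans': [(12, 16)], 'stable': True}
```

Escaping the keywords keeps them literal, and Python's lookbehind needs a fixed-width pattern, which an escaped literal always is.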

In [13]:
import re

In [14]:
def match_count(r, doc, **kw):
    if not isinstance(r, re.Pattern): r = re.compile(r)
    return len(r.findall(doc))

def is_stable(r, docs, **kw):
    if not isinstance(docs, list): docs = [docs]
    counts = [match_count(r, doc) == 1 for doc in docs]
    return all(counts)

def location_for(r, doc, **kw):
    if not isinstance(r, re.Pattern): r = re.compile(r)
    if not is_stable(r, doc, **kw): return (0, 0)
    return r.search(doc).span()

def insertion_point_for(r, doc, **kw):
    begin, _end = location_for(r, doc, **kw)
    return begin

def replacing_embedment(r, value, doc, **kw):
    begin, end = location_for(r, doc, **kw)
    return ReplacingEmbedment(begin, end, value)

In [15]:
doc = "uber super duper doc"
field = "super"
r = re.compile(field)
assert match_count(r, doc) == 1
assert match_count(field, doc) == 1

assert is_stable(field, doc)
assert is_stable('uper', doc) is False

assert location_for(r, doc) == (5, 10)

In [16]:
e = replacing_embedment(r, 'fantastic', doc)
# e('fantastic', doc)

In [17]:
e(doc)

'uber fantastic duper doc'

### FIXME

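Fix this: `replacing_embedment` needs the replacement value before the embedment exists. I'd rather have an embedment without the replacement: locate it first, then give it the value when it's called. I don't like this yet, but it's almost there.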
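One possible shape for that, sketched with a hypothetical `Field` class: locate the field once (and fail loudly if the pattern isn't stable), then supply the value later, either by calling the field with a value and a document, matching the commented-out `e('fantastic', doc)` call in In [16], or by binding a value to get a `(begin, end, text)` triple. The class and method names are illustrative only.

```python
import re


class Field:
    """Hypothetical field: knows where it goes, not what goes there."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def locate(self, document):
        # Refuse to guess: the field must match exactly once.
        matches = list(self.pattern.finditer(document))
        if len(matches) != 1:
            raise ValueError(f"{self.pattern.pattern!r} matched {len(matches)} times; expected 1")
        return matches[0].span()

    def bind(self, value, document):
        # Produce a (begin, end, text) triple once the value is known.
        begin, end = self.locate(document)
        return begin, end, value

    def __call__(self, value, document):
        begin, end, text = self.bind(value, document)
        return document[:begin] + text + document[end:]


doc = "uber super duper doc"
field = Field('super')
print(field('fantastic', doc))
# → uber fantastic duper doc
```

Splitting "where" from "what" also replaces the silent `(0, 0)` fallback in `location_for` with an error that says how many times the pattern matched.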

## Jinja Templates

I can create templates out of Jinja instead.

Benefits:

* no guesswork on fields
* loops, conditions, logic
* stable, well-executed

Costs:

* larger framework
* not a stepping stone to PDF models

### Not Here

I'm not going to fully explore Jinja templates here; I don't want to leave that code in this lab right now. Better to come back to it later, so I'm leaving notes here. The template I threw away came from the [Jinja documentation](https://jinja.palletsprojects.com/en/2.11.x/).

In [18]:
# from jinja2 import Environment, PackageLoader, select_autoescape

In [19]:
# title = "Some Title"
# class User:
#     def __init__(self, url, username):
#         self.url = url
#         self.username = username
# users = [User('http://example.com', 'Some User')]

In [20]:
# env = Environment(
#     loader=PackageLoader('slip_box', 'templates'),
#     autoescape=select_autoescape(['html', 'xml'])
# )

In [21]:
# template = env.get_template('test.html')

In [22]:
# print(template.render(title=title, users=users))