# Building DSLs in Python with Operator Overloading

## DSL (Domain specific language)
A [Domain-Specific Language (DSL)](https://martinfowler.com/dsl.html) is a computer language that's targeted to a particular kind of problem.

We're going to dive into a specific type of DSL called an internal DSL. Internal DSLs live inside a host language and conform to its syntax, but they are structured in a way that provides the feel of a separate language.

In our case, the host language is Python and we'll be using operator overloading to achieve the feel of the language.

## Python's Operators
Python gives you a ton of control over an object's [operators](https://docs.python.org/3/reference/datamodel.html): their behaviour can be overridden using dunder (double underscore) methods. The example below overrides addition on a custom object.

In [1]:
from dataclasses import dataclass
from typing import Self


@dataclass
class Pair:
    a: int
    b: int

    def __add__(self, other: Self) -> Self:
        return type(self)(self.a + other.a, self.b + other.b)


Pair(1, 2) + Pair(3, 4)

Pair(a=4, b=6)

## Existing Examples

This kind of operator overloading is very common. If you squint you might see that it's not too far from being an actual DSL.

The objects here can form the building blocks of the DSL and the operators are just syntax on top of it. 

Indeed the most used DSL of this kind is probably Python's [pathlib](https://docs.python.org/3/library/pathlib.html) in the standard library. 

In [2]:
from pathlib import Path

p = Path('/etc')
print(p / "init.d" / "reboot")


/etc/init.d/reboot


In `pathlib`, `__truediv__` is overridden to provide the feel of a POSIX path such as `/etc/init.d/reboot`. If I had to guess, the code is roughly something like:


In [3]:
import os


class PseudoPosixPath:
    path: tuple[str, ...]

    def __init__(self, *path: str) -> None:
        self.path = path

    def __truediv__(self, other: str) -> Self:
        return type(self)(*self.path, other)
    
    def __str__(self) -> str:
        return os.path.join(*self.path)
    

print(PseudoPosixPath('/etc') / "init.d" / "reboot")

/etc/init.d/reboot


There are plenty of other examples. One is [langchain](https://www.langchain.com/), which uses `|` to create a DSL with the feel of a Unix [pipeline](https://en.wikipedia.org/wiki/Pipeline_(Unix)):

```python
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

# Initialize the chat model
llm = ChatOpenAI()

# Define prompts
prompt1 = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
prompt2 = ChatPromptTemplate.from_template("Explain the joke: {joke}")

# Chain the prompts and the model
chain = prompt1 | llm | StrOutputParser() | prompt2 | llm | StrOutputParser()
```

## Advanced Operator Overloading

Let's go a bit deeper into this topic. 

Not all operators are created equal: Python has both bitwise and arithmetic operators. For the purposes of a DSL we don't really care about that distinction.

We can also group operators by their arity. Binary operators take 2 operands, like addition in `1 + 1`, while unary operators such as negation `-` take only one. Since the arity affects the syntax, it will become important in our DSL.
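To make the arity distinction concrete, here is a minimal sketch using a hypothetical `Vec` class (it is not part of the DSL built below). It overloads one unary and one binary operator, and shows that the unary one binds tighter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    """Hypothetical 2D vector, used only to illustrate operator arity."""

    x: int
    y: int

    def __neg__(self) -> "Vec":
        # Unary: the only operand is `self`, written as a prefix (`-v`).
        return Vec(-self.x, -self.y)

    def __sub__(self, other: "Vec") -> "Vec":
        # Binary: `self` is the left operand, `other` the right (`v - w`).
        return Vec(self.x - other.x, self.y - other.y)


v = Vec(1, 2)
w = Vec(3, 5)

negated = -v        # calls v.__neg__()
difference = w - v  # calls w.__sub__(v)
mixed = -v - w      # unary minus binds tighter, so this is (-v) - w
```

Here `negated` is `Vec(x=-1, y=-2)`, `difference` is `Vec(x=2, y=3)` and `mixed` is `Vec(x=-4, y=-7)`.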

To illustrate these techniques, let's build a DSL that uses `|` in a similar way to a Unix pipe.

In [4]:
from dataclasses import dataclass
from typing import Callable


@dataclass
class apply[T, R]:
    """
    Wrap a callable allowing `|` operator to be used.
    """
    fn: Callable[[T], R]

    def __call__(self, v: T) -> R:
        return self.fn(v)
    
    __ror__ = __call__


[5, 4, 3, 2, 1] | apply(sorted)
    

[1, 2, 3, 4, 5]

In terms of the operator, you were probably expecting `__or__`, not the `__ror__` used here. This is one thing I left out earlier. Python's binary operators are overloaded on an object, which raises the question of what happens when the 2 operands are of different types.

Python checks the left-hand operand for the overload first. In this case, it checks whether the list implements `__or__`. If that were the only option, the usefulness of overloading would be quite limited, so most binary operators in Python also come with a right-hand (reflected) version, in this case `__ror__`. Python tries it when the left-hand operand doesn't implement the operator or returns `NotImplemented`. Comparison operators such as `>`, `<` and `==` are the notable exception: they have no `__r*__` variants.

This is how we applied `sorted` from the right-hand side in the example above: `list` doesn't implement `__or__`, so Python falls through to `apply.__ror__`.
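The fallback from the left-hand method to the reflected one can be seen in a small sketch. The `Left` and `Right` classes are hypothetical and exist only for this illustration. The key detail is that a left-hand `__or__` can return `NotImplemented` to hand the operation over to the right-hand operand.

```python
class Left:
    """Hypothetical left operand that only understands other `Left` objects."""

    def __or__(self, other):
        if isinstance(other, Left):
            return "Left.__or__"
        # Signal "I can't handle this" so Python tries the right operand.
        return NotImplemented


class Right:
    """Hypothetical right operand that accepts anything on its left."""

    def __ror__(self, other):
        return f"Right.__ror__ got {type(other).__name__}"


both_left = Left() | Left()     # handled by Left.__or__
fallback = Left() | Right()     # Left declines, so Right.__ror__ runs
builtin_left = [1, 2] | Right() # list has no __or__, so Right.__ror__ runs

try:
    Left() | 1  # neither side can handle this pair
except TypeError:
    unsupported = True
else:
    unsupported = False
```

If neither side handles the operation, Python raises `TypeError`, which is what the last few lines check.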

### Mixing operators
As an example, let's add a unary operator `~` to mean `or None`. That is, if the input of the function is `None` we return `None` directly; otherwise we return the result of calling the function.

In [5]:
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class apply[T, R]:
    """
    Wrap a callable allowing `|` operator to be used.
    """
    fn: Callable[[T], R]

    def __call__(self, v: T) -> R:
        return self.fn(v)
    
    __ror__ = __call__

    def __invert__(self) -> apply[T | None, R | None]:
        def _fn(v: T | None) -> R | None:
            if v is None:
                return None
            return self.fn(v)
        return apply(_fn)
    

series: dict[str, list[int]] = {
    "a": [5, 4, 3, 2, 1],
}


print(series.get("a") | ~apply(sorted))
print(series.get("b") | ~apply(sorted) | ~apply(reversed))

[1, 2, 3, 4, 5]
None


### Precedence
Pretty cool, right? This does bring up an unexpected bit of complexity: precedence, that is, which operator executes first. Luckily the behaviour here is well defined. In general, operators you're familiar with from algebra follow the same rules you learnt in school. For more complex combinations, consult the [official docs](https://docs.python.org/3/reference/expressions.html#operator-precedence).

Operator precedence can actually be very powerful when designing our DSL. In the above example I can apply `~` without brackets, because `~` binds more tightly than `|`. As we are designing a language, the ergonomic feel of the operators really does matter!

In my package better-functools, I used `@` and `|` operators. These were chosen carefully to allow mixed use without the need for brackets. e.g.

```py
Pipeline(inputs)
| func(itertools.combinations @ func.arg(Iterable[int]) @ bind(2))
| filter @ bind(sum @ compose(eq @ bind(2020)))
| map @ bind(prod)
| sum
| print
```
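One way to check how Python groups a mix of operators is a tiny recorder class. The sketch below uses a hypothetical `Expr` class (it is not part of better-functools) whose operators only build a string with explicit brackets, so the grouping becomes visible.

```python
class Expr:
    """Hypothetical recorder that renders how Python grouped an expression."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __or__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.text} | {other.text})")

    def __matmul__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.text} @ {other.text})")

    def __invert__(self) -> "Expr":
        return Expr(f"(~{self.text})")


a, b, c = Expr("a"), Expr("b"), Expr("c")

unary_first = (a | ~b).text      # "(a | (~b))"
matmul_first = (a | b @ c).text  # "(a | (b @ c))"
left_to_right = (a | b | c).text # "((a | b) | c)"
```

Because `~` and `@` both bind more tightly than `|`, each `|` stage can contain them without brackets, and the `|` stages themselves are evaluated left to right.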

### Sided precedence
You might have wondered what happens if we mix `__or__` and `__ror__`, that is, when both operands provide an overload:

In [6]:
class LHS:
    def __or__(self, _):
        return "Left Wins"


class RHS:
    def __ror__(self, _):
        return "Right Wins"
    

LHS() | RHS()

'Left Wins'

As you can see, the left-hand side overload wins. That's another thing to keep in mind, but also another opportunity. Let's beef up our DSL again.

In [7]:
from __future__ import annotations

from typing import Any, Self


class coalesce:
    class null:
        def __or__(self, _: Any) -> Self:
            return self
    
        def __repr__(self) -> str:
            return "<null>"

    def __ror__[T](self, v: T | None) -> T | null:
        if v is None:
            return self.null()
        return v

    

series: dict[str, list[int]] = {
    "a": [5, 4, 3, 2, 1],
}


print(series.get("a") | coalesce() | apply(sorted))
print(series.get("b") | coalesce() | apply(sorted) | apply(reversed))

[1, 2, 3, 4, 5]
<null>


We've written something in the spirit of a null-coalescing expression. JavaScript has related syntax in `??` (null coalescing) and `?.` (optional chaining), and what we've built behaves most like the latter. We can chain expressions after the `coalesce()` stage, but if the expression before it results in `None`, the whole chain returns `null` no matter how many operations follow on the right-hand side. Otherwise the operations are applied as expected.

We achieve this by taking advantage of the sided precedence of the `|` operator. The `null` object's left-hand `__or__` is tried before any `__ror__` later in the chain, so if `coalesce` returns `null`, the rest of the expression evaluates to `null`. Otherwise we get the value we'd expect.

We should probably have an extra operation to convert the `null` back into `None`, but we'll leave that out of this example.
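For completeness, here is one way such a conversion could look. This is only a sketch under a few assumptions: `unwrap` is a hypothetical name, `Null` is a top-level stand-in for `coalesce.null`, and `apply` is a simplified version without type parameters. The trick is that `Null.__or__` returns `NotImplemented` when it meets `unwrap`, which lets the right-hand `__ror__` take over.

```python
from typing import Any, Callable


class apply:
    """Simplified version of `apply`, without the type parameters."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __ror__(self, v: Any) -> Any:
        return self.fn(v)


class Null:
    """Stand-in for `coalesce.null`, with one extra rule for `unwrap`."""

    def __or__(self, other: Any) -> Any:
        if isinstance(other, unwrap):
            # Decline, so Python falls back to `unwrap.__ror__`.
            return NotImplemented
        return self

    def __repr__(self) -> str:
        return "<null>"


class coalesce:
    def __ror__(self, v: Any) -> Any:
        return Null() if v is None else v


class unwrap:
    """Hypothetical closing stage: turns `Null` back into `None`."""

    def __ror__(self, v: Any) -> Any:
        return None if isinstance(v, Null) else v


series = {"a": [5, 4, 3, 2, 1]}

found = series.get("a") | coalesce() | apply(sorted) | unwrap()
missing = series.get("b") | coalesce() | apply(sorted) | unwrap()
```

With this, `found` is `[1, 2, 3, 4, 5]` and `missing` is a plain `None` again.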

### Inplace Operators
I also left out inplace operators (operators followed by `=`). These are statements where the result of the operator is assigned back to the original variable.

In [8]:
a = 10
a += 1
print(a)

b = [1, 2, 3]
b += [4]
print(b)

11
[1, 2, 3, 4]


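We automatically get the inplace version for free. When the left-hand object doesn't define `__ior__`, Python falls back to the regular `__or__` and `__ror__` and assigns the result back to the variable: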

In [9]:
seq = series.get("a")
seq |= coalesce()
seq |= apply(sorted)
seq

[1, 2, 3, 4, 5]

I used this to solve a specific problem in [sqlalchemy-builder](https://github.com/Jamie-Chang/sqlalchemy-builder):

```py
statement = select(Model)
if value:
    statement |= where(Model.value == value)
```

There is also a way to override the behaviour of the inplace version, using `__ior__`. This only works on the left-hand side, as we're modifying the left-hand object.

In [10]:
from dataclasses import dataclass
from typing import Literal, Self


@dataclass
class MyList:
    lst: list[int]

    def __ior__(self, operation: Literal["sort", "reverse"]) -> Self:
        if operation == "sort":
            self.lst.sort()
        elif operation == "reverse":
            self.lst.reverse()
        return self

l = MyList([4, 3, 6, 7]) 
l |= "sort"
l |= "reverse"
l

MyList(lst=[7, 6, 4, 3])
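
The difference between the two behaviours shows up when another name refers to the same object. The sketch below uses two hypothetical classes, `Rebinding` and `Mutating`, to contrast the fallback with a real `__ior__`.

```python
class Rebinding:
    """Hypothetical type with only `__or__`: `|=` falls back to it."""

    def __init__(self, items):
        self.items = list(items)

    def __or__(self, item):
        # Returns a new object; the original is left untouched.
        return Rebinding(self.items + [item])


class Mutating(Rebinding):
    """Hypothetical type that also defines `__ior__` and mutates itself."""

    def __ior__(self, item):
        self.items.append(item)
        return self  # `x |= y` assigns whatever `__ior__` returns back to `x`


r = Rebinding([1])
r_alias = r
r |= 2  # no `__ior__`, so this runs `r = r | 2`

m = Mutating([1])
m_alias = m
m |= 2  # runs `m = m.__ior__(2)`
```

After this, `r` points at a new object and `r_alias.items` is still `[1]`, while `m` and `m_alias` are the same object with `items == [1, 2]`. Which behaviour is preferable depends on whether the DSL's objects are meant to be treated as immutable values.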