### Decorator Application: Single Dispatch Generic Functions

> Aaron's Experiments on Decorator Application: Single Dispatch Generic Functions

---
## 💡 **Overview - Single Dispatch**
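> 1. **Python has no built-in function overloading** (unlike Java or C#).
> 2. **Checking types with `if-elif` chains is tedious and hard to maintain.**
> 3. <font color=lightcoral>**`singledispatch` picks the right function for each type elegantly**, keeping the code cleaner and more flexible.</font>

> This Jupyter Notebook is about **how to use `singledispatch` to solve this kind of problem**, learning from practical examples! 🚀
> 
> A possible next step: using `singledispatchmethod` so that **methods in a class** can use the same single dispatch technique! 😃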

---
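As a rough preview of the `singledispatchmethod` idea, here is a minimal sketch, assuming Python 3.8 or later. The `Htmlizer` class and its method bodies are hypothetical and only illustrate that dispatch happens on the first argument after `self`.

```python
from functools import singledispatchmethod
from html import escape


class Htmlizer:  # hypothetical class, for illustration only
    @singledispatchmethod
    def htmlize(self, arg):
        # default handler: escape the string representation
        return escape(str(arg))

    @htmlize.register
    def _(self, arg: int):
        # integers: decimal value plus hex value
        return f'{arg}(<i>{hex(arg)}</i>)'

    @htmlize.register
    def _(self, arg: str):
        # strings: escape, then mark line breaks
        return escape(arg).replace('\n', '<br/>\n')


h = Htmlizer()
print(h.htmlize(255))
print(h.htmlize('a < b'))
print(h.htmlize(1.5))
```

The registration syntax mirrors `singledispatch`; the only difference is that `self` is skipped when the dispatch type is determined.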
### Aaron - In plain words: what is overloading trying to do here? 😆

---

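Imagine we want to write a program that handles its input differently depending on the **type of the argument passed in**. For example:

- **An integer** 👉 render it as HTML, showing both the decimal and the hexadecimal value.
- **A float** 👉 round it to two decimal places.
- **A string** 👉 escape HTML special characters such as `<` and `>`, and turn newlines (`\n`) into `<br/>`.
- **A list or tuple** 👉 render it as an HTML bulleted list (like `<ul><li>item</li></ul>`).
- **A dictionary** 👉 also render it as an HTML list, but show each entry in `key: value` form.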

---

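### 🎯 **Here's the problem: can Python do function overloading?**
In languages like **Java** or **C#**, we can write separate functions for different types, and the compiler automatically picks the matching one, for example:

```java
void doSomething(int x);    // handles integers
void doSomething(String x); // handles strings
```
Calling `doSomething(100)` and `doSomething("java")` then runs different functions.

<font color=yellowgreen>But Python is not a statically typed language, so we can't do this directly! 😵‍💫 Python will not decide by itself "oh, this is a number, run this function" or "this is a string, run that other function".</font>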

---

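### 🚀 **So what do we use?**
We can use **`functools.singledispatch`**, a "single dispatch" tool from Python's standard library that lets us define separate versions of a function for **different types**, without a pile of `if-elif` type checks.

For example, we can do this:
> ```python
> from functools import singledispatch
> 
> @singledispatch
> def htmlize(obj):
>     # default handler, used when no registered type matches
>     return f"<p>{str(obj)}</p>"
> 
> @htmlize.register
> def _(arg: int):
>     return f"<p>{arg} (0x{arg:x})</p>"
> 
> @htmlize.register
> def _(arg: float):
>     return f"<p>{round(arg, 2)}</p>"
> 
> @htmlize.register
> def _(arg: str):
>     # escape < and >, then turn real newline characters into <br/>
>     text = arg.replace('<', '&lt;').replace('>', '&gt;').replace('\n', '<br/>')
>     return f"<p>{text}</p>"
> ```

🔹 **The benefits:**
1. **The handling logic for each type lives in its own function**, so the code is clear and easy to maintain.
2. **To support a new type**, just write one more `@htmlize.register` function; the original function stays untouched.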
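To make the second benefit concrete, here is a small sketch of adding `list` support after the fact. It is a trimmed-down, self-contained variant of the example above (the default handler here also escapes its input, which is an assumption on my part), not the notebook's final implementation.

```python
from functools import singledispatch
from html import escape


@singledispatch
def htmlize(obj):
    # default handler for any type without its own version
    return f'<p>{escape(str(obj))}</p>'


@htmlize.register
def _(arg: int):
    return f'<p>{arg} (0x{arg:x})</p>'


# Later, possibly in a different module: support lists
# without editing any of the code above.
@htmlize.register
def _(arg: list):
    items = ''.join(f'<li>{htmlize(item)}</li>' for item in arg)
    return f'<ul>{items}</ul>'


print(htmlize([1, 'a']))
```

Each list item goes back through `htmlize`, so nested values pick up their own type-specific handler.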



---
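### Step by Step: thinking through how to build it 🦶

---
Consider this scenario: we want to provide **similar functionality**, but the exact behavior varies slightly with the type of the argument passed in.

In this set of examples we explore that problem, focusing on behavior that differs based on the **type of a single argument** (this is "**single dispatch**") rather than on the types of several arguments (that would be "**multiple dispatch**").

If you know an object-oriented language such as **Java** or **C#**, you may know that they support this easily through **function overloading**, that is, defining different function signatures for different parameter types. For example:
```java
void doSomething(int x);
void doSomething(String x);
```
This way, even though the function name is the same, `doSomething(100)` and `doSomething("java")` resolve to different functions: the first runs `doSomething(int)`, the second runs `doSomething(String)`.

However, **Python is not a statically typed language**. Even if it had a built-in overloading mechanism, it **could not** tell versions of a function apart by their signatures, because Python does not enforce parameter types. So in the best case we could **only distinguish versions by the number of parameters**. And we would still have to deal with variable numbers of **positional arguments** and **keyword arguments**... Ugh, it is tiring just to think about! 😵

In short, **traditional function overloading does not work in Python, and nothing dispatches on the argument type for us automatically.**
So we need to **find another solution**.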
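A quick sketch of why same-name "overloads" do not work in Python: a second `def` with the same name simply rebinds the name. The function bodies below are made up for illustration.

```python
def do_something(x):
    return f'first version got {x!r}'


first = do_something  # keep a reference to the first function object


def do_something(x):
    # this def rebinds the name; it does not add an overload
    return f'second version got {x!r}'


result_int = do_something(100)
result_str = do_something('java')
print(result_int)
print(result_str)
print(first is do_something)
```

Both calls reach the second definition regardless of the argument type; the first function is only reachable through the saved `first` reference.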

---

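Suppose we want to convert different types of data to HTML, with a different presentation for each type:
- **Integers (int)**: show both the **decimal (base 10)** and the **hexadecimal (hex)** form.
- **Floats (float)**: always **round to 2 decimal places**.
- **Strings (str)**: apply an **HTML escape** (escape HTML special characters) and **replace the newline character `\n` with `<br/>`**.
- **Lists and tuples**: convert to an **HTML bulleted list**.
- **Dictionaries (dict)**: also convert to a **bulleted list**, but show each item in `key: value` form.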

---

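To get started, we **first define a separate function for each type**.

The functions here are kept as simple as possible, but in a real application we should consider:
- **Handling `None`** (so that `None` values do not cause errors)
- **Handling empty lists and empty dictionaries**
- **Guarding against arguments of the wrong type**

Handling these cases makes the functions more robust and more widely usable. 🚀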
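As one possible illustration of those three considerations, here is a sketch of a more defensive list renderer. The name `safe_html_list` and the choices it makes (empty string for `None`, `TypeError` for other types) are assumptions for this example, not part of the notebook.

```python
from html import escape


def safe_html_list(items):  # hypothetical helper, not used elsewhere
    # None is rendered as an empty string instead of raising
    if items is None:
        return ''
    # reject anything that is not a list or tuple (strings included)
    if not isinstance(items, (list, tuple)):
        raise TypeError(f'expected list or tuple, got {type(items).__name__}')
    # an empty sequence becomes an empty <ul> element
    if not items:
        return '<ul></ul>'
    rows = (f'<li>{escape(str(item))}</li>' for item in items)
    return '<ul>\n' + '\n'.join(rows) + '\n</ul>'


print(safe_html_list(['<a>', 2]))
```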

--- 
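### Aaron - Notes on html escape
---

In Python, `escape` (imported with `from html import escape`) escapes HTML special characters, to prevent HTML injection and to make sure HTML content is displayed correctly.

### Usage
The `escape()` function converts the following special characters to their HTML entities:
- `&` → `&amp;`
- `<` → `&lt;`
- `>` → `&gt;`
- `"` → `&quot;` (only when `quote=True`, which is the default)
- `'` (single quote) → `&#x27;` (also only when `quote=True`); pass `quote=False` to leave both kinds of quotes unescaped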

### Example

```python
from html import escape

# basic escaping
html_str = '<div class="content">Hello & Welcome!</div>'
escaped_str = escape(html_str)
print(escaped_str)  # &lt;div class=&quot;content&quot;&gt;Hello &amp; Welcome!&lt;/div&gt;

# quote=True is the default, so the single quote is escaped as well
html_str_with_quotes = "It's a test!"
escaped_str_quotes = escape(html_str_with_quotes, quote=True)
print(escaped_str_quotes)  # It&#x27;s a test!
```

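### Main uses
1. **Preventing XSS (cross-site scripting)**: when user input ends up in HTML, escaping the special characters keeps malicious code from being executed.
2. **Making HTML display correctly**: characters such as `<` and `>` are shown as plain text instead of being interpreted as HTML tags.

So when handling dynamic content, `escape()` is a useful tool for keeping output safe and readable.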
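For comparison, a short sketch of what the `quote` parameter changes. The sample text is made up.

```python
from html import escape

text = 'He said "it\'s <fine>"'

# default (quote=True): both kinds of quotes are escaped
default_result = escape(text)

# quote=False: only &, < and > are escaped
no_quotes = escape(text, quote=False)

print(default_result)
print(no_quotes)
```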

---

In [1]:
from html import escape

In [2]:
html_str = '<div class="content">Hello & Welcome!</div>'
html_str

'<div class="content">Hello & Welcome!</div>'

In [3]:
escape_str = escape(html_str)
print(escape_str)
escape_str

&lt;div class=&quot;content&quot;&gt;Hello &amp; Welcome!&lt;/div&gt;


'&lt;div class=&quot;content&quot;&gt;Hello &amp; Welcome!&lt;/div&gt;'

In [5]:
html_str_with_quotes = "It's a test!"
escaped_str_quote = escape(html_str_with_quotes, quote=True)
print(escaped_str_quote)
escaped_str_quote

It&#x27;s a test!


'It&#x27;s a test!'

In [13]:
from html import escape

def html_escape(arg):
    return escape(str(arg))

def html_int(a):
    return f'{a}(<i>{str(hex(a))}</i>)'

def html_real(a):
    return f'{round(a, 2):.2f}'

def html_str(s):
    return html_escape(s).replace('\n', '<br/>\n')

def html_list(l):
    items = (f'<li>{html_escape(item)}</li>' for item in l)
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

def html_dict(d):
    items = (f'<li>{html_escape(k)}={html_escape(v)}</li>' for k, v in d.items())
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'
             

In [14]:
print(html_str(
    """this is
a multi line string
with special characters: 10 < 100"""))

this is<br/>
a multi line string<br/>
with special characters: 10 &lt; 100


In [15]:
print(html_int(255))

255(<i>0xff</i>)


In [16]:
print(html_escape(3+10j))

(3+10j)


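#### Ideally, we would only have to call a single function, perhaps `htmlize`, that automatically decides which particular `html_xxx` function to call based on the type of the argument. For example: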

In [21]:
from decimal import Decimal

def htmlize(arg):
    if isinstance(arg, int):
        return html_int(arg)
    elif isinstance(arg, float) or isinstance(arg, Decimal):
        return html_real(arg)
    elif isinstance(arg, str):
        return html_str(arg)
    elif isinstance(arg, list) or isinstance(arg, tuple):
        return html_list(arg)
    elif isinstance(arg, dict):
        return html_dict(arg)
    else:
        # default behavior - just html escape string representation
        return html_escape(str(arg))

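#### Now we can use essentially the same function call to handle different types: the `htmlize` function is a **dispatcher** that forwards the request to a different function based on the type of the argument. (There is actually a better way to implement this, but it has to wait until we cover **abstract base classes (ABCs)**.)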

In [23]:
print(htmlize([1, 2, 3]))

<ul>
<li>1</li>
<li>2</li>
<li>3</li>
</ul>


In [24]:
print(htmlize(255))

255(<i>0xff</i>)


#### But this approach has some shortcomings:

In [26]:
print(htmlize(["""first element is
a multi-line string""", (1, 2, 3)]))

<ul>
<li>first element is
a multi-line string</li>
<li>(1, 2, 3)</li>
</ul>


#### As you can see, the newlines in the multi-line string were not replaced with `<br/>`, the tuple was not rendered as an HTML list, and the integers have no hexadecimal representation

Let's try changing that.

Original:
```python
def html_escape(arg):
    return escape(str(arg))

def html_int(a):
    return f'{a}(<i>{str(hex(a))}</i>)'

def html_real(a):
    return f'{round(a, 2):.2f}'

def html_str(s):
    return html_escape(s).replace('\n', '<br/>\n')

def html_list(l):
    items = (f'<li>{html_escape(item)}</li>' for item in l)
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

def html_dict(d):
    items = (f'<li>{html_escape(k)}={html_escape(v)}</li>' for k, v in d.items())
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'
```

In [27]:
def html_list(l):
    items = [f'<li>{htmlize(item)}</li>' for item in l]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'


In [28]:
def html_dict(d):
    items = [f'<li>{htmlize(k)}={htmlize(v)}</li>' for k, v in d.items()]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [29]:
print(htmlize(["""first element is
a multi-line string""", (1, 2, 3)]))

<ul>
<li>first element is<br/>
a multi-line string</li>
<li><ul>
<li>1(<i>0x1</i>)</li>
<li>2(<i>0x2</i>)</li>
<li>3(<i>0x3</i>)</li>
</ul></li>
</ul>


Consider an application where we want to provide similar functionality but that varies slightly depending on the argument types passed in.

In this set of examples we consider this problem where functionality differs based on a single argument's type (hence single dispatch) instead of the types of multiple arguments (which would be multiple dispatch).

If you have a background in some other OO languages such as Java or C#, you'll know that we can easily do something like this by basically **overloading** functions: using a different data type for the function parameter, hence changing the function signature. Then although the name of the function is the same, calling `do_something(100)` and `do_something('java')` would call a different function, the first one would call the `do_something(int)` function, and the second would call the `do_something(String)` function.

Of course, Python is not statically typed, so even if Python had function overloading built-in, we would not be able to make such a distinction in our function signatures since there is nothing that says that a parameter must be of a specific type. So in a best case scenario we would have to "distinguish" functions with the same name only by the number of parameters they take. And then we'd have to somehow deal with variable numbers of positional and keyword arguments too... Uuugh!
In any event, dispatching on type through overloaded signatures could never work.

Instead we have to come up with a different solution.
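To see that nothing forces a parameter to be of a specific type, here is a small sketch showing that even type annotations are not enforced at runtime. The `double` function is a made-up example.

```python
def double(x: int) -> int:
    # the annotation is only metadata; Python does not check it on call
    return x * 2


from_int = double(21)
from_str = double('ab')   # runs fine despite the int annotation
hints = double.__annotations__

print(from_int, from_str)
print(hints)
```

The annotations are stored on the function object, which is what lets tools (and, later in this notebook, `functools.singledispatch`) inspect them, but the interpreter itself never rejects a call because of them.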

Let's say we want to display various data types in HTML format, with a different presentation for each: integers (we want both base 10 and hex values), floats (we always want them rounded to 2 decimal places), strings (we want the string html-escaped, and all newline characters replaced by `<br/>`), lists and tuples (rendered as bulleted lists), and dictionaries (also bulleted lists, except we want each name/value pair displayed).

For starters, let's just implement individual functions to do each of those things.

I am going to keep the functions very simple, but in practice you should handle situations like None objects, empty lists and dictionaries, possibly the wrong type being passed to the function, etc.

In [1]:
from html import escape

def html_escape(arg):
    return escape(str(arg))
                      
def html_int(a):
    return '{0}(<i>{1}</i>)'.format(a, str(hex(a)))

def html_real(a):
    return '{0:.2f}'.format(round(a, 2))
                                  
def html_str(s):
    return html_escape(s).replace('\n', '<br/>\n')
                                  
def html_list(l):
    items = ('<li>{0}</li>'.format(html_escape(item)) 
             for item in l)
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'
                                  
def html_dict(d):
    items = ('<li>{0}={1}</li>'.format(html_escape(k), html_escape(v)) 
             for k, v in d.items())    
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [2]:
print(html_str("""this is 
a multi line string
with special characters: 10 < 100"""))

this is <br/>
a multi line string<br/>
with special characters: 10 &lt; 100


In [3]:
print(html_int(255))

255(<i>0xff</i>)


In [4]:
print(html_escape(3+10j))

(3+10j)


Ideally we would want to just have to call a single function, maybe `htmlize` that would figure out which particular flavor of the `html_xxx` function to call depending on the argument type.

We could try it as follows:

In [5]:
from decimal import Decimal

def htmlize(arg):
    if isinstance(arg, int):
        return html_int(arg)
    elif isinstance(arg, float) or isinstance(arg, Decimal):
        return html_real(arg)
    elif isinstance(arg, str):
        return html_str(arg)
    elif isinstance(arg, list) or isinstance(arg, tuple):
        return html_list(arg)
    elif isinstance(arg, dict):
        return html_dict(arg)
    else:
        # default behavior - just html escape string representation
        return html_escape(str(arg))

Now we can essentially use the same function call to handle different types - the `htmlize` function is a dispatcher - it dispatches the request to a different function based on the argument type. (There's a much better way to do some of this, but we'll have to wait until we cover abstract base classes to do so).

In [6]:
print(htmlize([1, 2, 3]))

<ul>
<li>1</li>
<li>2</li>
<li>3</li>
</ul>


In [7]:
print(htmlize(dict(key1=1, key2=2)))

<ul>
<li>key1=1</li>
<li>key2=2</li>
</ul>


In [8]:
print(htmlize(255))

255(<i>0xff</i>)


But there are a number of shortcomings here:

In [9]:
print(htmlize(["""first element is 
a multi-line string""", (1, 2, 3)]))

<ul>
<li>first element is 
a multi-line string</li>
<li>(1, 2, 3)</li>
</ul>


As you can see, the multi-line string did not get the newline characters replaced, the tuple was not rendered as an html list, and the integers do not have their hex representation.

So we just need to redefine the `html_list` and `html_dict` functions to use the `htmlize` function:

In [10]:
def html_list(l):
    items = ['<li>{0}</li>'.format(htmlize(item)) for item in l]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [11]:
def html_dict(d):
    items = ['<li>{0}={1}</li>'.format(html_escape(k), htmlize(v)) for k, v in d.items()]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [12]:
print(htmlize(["""first element is 
a multi-line string""", (1, 2, 3)]))

<ul>
<li>first element is <br/>
a multi-line string</li>
<li><ul>
<li>1(<i>0x1</i>)</li>
<li>2(<i>0x2</i>)</li>
<li>3(<i>0x3</i>)</li>
</ul></li>
</ul>


Much better, but hopefully you spotted something that might seem problematic!

Do we not have a circular reference?

In order to define `html_list` and `html_dict` we needed to call `htmlize`, but in order to define `htmlize` we needed to call `html_list` and `html_dict`.

Remember that in Python we can reference a function **inside** the body of another function **before** that function has been defined, as long as by the time we **call** the first function, the second one has been defined. So this is actually OK.
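A minimal sketch of that rule, with made-up function names: the name `helper` is only looked up when `outer` actually runs.

```python
def outer():
    # helper does not exist yet when outer is defined;
    # the name is resolved only when this line executes
    return helper() + 1


try:
    outer()
    early_call = 'worked'
except NameError:
    # calling too early fails, because helper is still undefined
    early_call = 'NameError'


def helper():
    return 41


late_call = outer()  # now the lookup succeeds
print(early_call, late_call)
```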

If you don't believe me and want to make sure of this yourself, go ahead and reset your Kernel (click on the Kernel | Restart menu option), and run the following code without running anything prior to this.

The `htmlize` function body makes calls to other functions such as `html_escape`, `html_int`, etc. that have not actually been defined yet:

In [1]:
from html import escape
from decimal import Decimal

def htmlize(arg):
    if isinstance(arg, int):
        return html_int(arg)
    elif isinstance(arg, float) or isinstance(arg, Decimal):
        return html_real(arg)
    elif isinstance(arg, str):
        return html_str(arg)
    elif isinstance(arg, list) or isinstance(arg, tuple) or isinstance(arg, set):
        return html_list(arg)
    elif isinstance(arg, dict):
        return html_dict(arg)
    else:
        # default behavior - just html escape string representation
        return html_escape(str(arg))

Now we define all the functions that `htmlize` uses before we actually call `htmlize` and all is good:

In [14]:
def html_escape(arg):
    return escape(str(arg))
                      
def html_int(a):
    return '{0}(<i>{1}</i>)'.format(a, str(hex(a)))

def html_real(a):
    return '{0:.2f}'.format(round(a, 2))
                                  
def html_str(s):
    return html_escape(s).replace('\n', '<br/>\n')
                                  
def html_list(l):
    items = ['<li>{0}</li>'.format(htmlize(item)) for item in l]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'
                                  
def html_dict(d):
    items = ['<li>{0}={1}</li>'.format(html_escape(k), htmlize(v)) for k, v in d.items()]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [15]:
print(htmlize(["""first element is 
a multi-line string""", (1, 2, 3)]))

<ul>
<li>first element is <br/>
a multi-line string</li>
<li><ul>
<li>1(<i>0x1</i>)</li>
<li>2(<i>0x2</i>)</li>
<li>3(<i>0x3</i>)</li>
</ul></li>
</ul>


As you can see this works just fine.

But we still have something undesirable. You'll notice that the dispatch function `htmlize` needs to have this big `if...elif...else` statement that will just keep growing as we need to handle more and more types (including potentially custom types).

This will just get unwieldy, and it is not very flexible: every time someone creates a new type that needs a special HTML representation, they will have to go into the `htmlize` function and modify it.

So instead, we are going to try a more flexible approach using decorators.

The way we are going to approach this is to create a dispatcher function, and then separately "register" each type-specific function with the dispatcher.

First, we are going to create a decorator that will do something that may seem kind of silly - it is going to take the decorated function and store it in a dictionary, using a key consisting of the **type** `object`.

Then when the returned closure is called, the closure will call the function stored in that dictionary.

In [16]:
def singledispatch(fn):
    registry = dict()
    registry[object] = fn
    
    def inner(arg):
        return registry[object](arg)

    return inner

In [17]:
@singledispatch
def htmlizer(arg):
    return escape(str(arg))

In [18]:
htmlizer('a < 10')

'a &lt; 10'

Next, we are going to add some functions to that `registry` dictionary, and modify our inner function to choose the correct function from the registry based on the type of the argument, or fall back to the default if that type is not registered:

In [19]:
def singledispatch(fn):
    registry = dict()
    
    registry[object] = fn
    registry[int] = lambda arg: '{0}(<i>{1}</i>)'.format(arg, str(hex(arg)))
    registry[float] = lambda arg: '{0:.2f}'.format(round(arg, 2))
    
    def inner(arg):
        fn = registry.get(type(arg), registry[object])
        return fn(arg)
    return inner

In [20]:
@singledispatch
def htmlize(a):
    return escape(str(a))

In [21]:
htmlize(10)

'10(<i>0xa</i>)'

In [22]:
htmlize(3.1415)

'3.14'

Now, we want a way to add the specialized functions to the `registry` dictionary from **outside** the `singledispatch` function - to do so we will create a parametrized decorator that will (1) take the type as a parameter, and (2) return a closure that will decorate the function associated with the type:

In [23]:
def singledispatch(fn):
    registry = dict()
    
    registry[object] = fn
    
    def register(type_):
        def inner(fn):
            registry[type_] = fn
        return inner
        
    
    def decorator(arg):
        fn = registry.get(type(arg), registry[object])
        return fn(arg)
    
    return decorator

But of course this is not good enough - how do we get a hold of the `register` function from outside `singledispatch`? Remember, `singledispatch` is a decorator that returns the `decorator` closure, not the `register` closure.

We can do this by adding the `register` function as an **attribute** of the `decorator` function before we return it. 

While we're at it we're also going to:

* add the keys of the `registry` dictionary as an attribute so we can look into it to see what it contains.

* add another function that, given a type, will return the function associated with that type (or the default function if the type is not found in the dictionary)

In [24]:
def singledispatch(fn):
    registry = dict()
    
    registry[object] = fn
    
    def register(type_):
        def inner(fn):
            registry[type_] = fn
            return fn  # we do this so we can stack register decorators!
        return inner
   
    def decorator(arg):
        fn = registry.get(type(arg), registry[object])
        return fn(arg)
    
    def dispatch(type_):
        return registry.get(type_, registry[object])

    decorator.register = register
    decorator.registry = registry.keys()
    decorator.dispatch = dispatch
    return decorator

In [25]:
@singledispatch
def htmlize(arg):
    return escape(str(arg))

And we can see that the `htmlize` function (which is now the returned `decorator` closure) has an attribute called `register`:

In [26]:
htmlize.register

<function __main__.singledispatch.<locals>.register>

as well as that `registry` attribute that we put in just so we could see what keys are in the `registry` dictionary:

In [27]:
htmlize.registry

dict_keys([<class 'object'>])

We can also ask it what function it is going to use for any specific type (currently we only have one registered, the default, for the most general `object` type):

In [28]:
htmlize.dispatch(str)

<function __main__.htmlize>

And you'll note that the extended scope of `register` and `dispatch` is the same as the extended scope of `htmlize`.

So now we can register some functions (it will store the function with associated data type in the `registry` dictionary):

In [29]:
@htmlize.register(int)
def html_int(a):
    return '{0}(<i>{1}</i>)'.format(a, str(hex(a)))

We can peek into the registered types:

In [30]:
htmlize.registry

dict_keys([<class 'object'>, <class 'int'>])

and we can ask the decorated `htmlize` function what function it is going to use for the `int` type:

In [31]:
htmlize.dispatch(int)

<function __main__.html_int>

and we can actually call it as well:

In [32]:
htmlize(100)

'100(<i>0x64</i>)'

The huge advantage now is that we can keep registering new handlers from anywhere in our module, or even from outside our module!
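As an illustration of registering a handler for a custom type, here is a self-contained sketch. It repeats a compact copy of the decorator built above, and the `Person` class and `html_person` handler are hypothetical.

```python
from html import escape


def singledispatch(fn):
    # compact copy of the decorator developed in this notebook
    registry = {object: fn}

    def register(type_):
        def inner(fn):
            registry[type_] = fn
            return fn
        return inner

    def decorator(arg):
        return registry.get(type(arg), registry[object])(arg)

    decorator.register = register
    return decorator


@singledispatch
def htmlize(arg):
    return escape(str(arg))


class Person:  # hypothetical custom type
    def __init__(self, name, age):
        self.name = name
        self.age = age


# the dispatcher itself is not modified; we only register a new handler
@htmlize.register(Person)
def html_person(p):
    return f'<b>{escape(p.name)}</b> ({p.age})'


print(htmlize(Person('A & B', 30)))
```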

In [33]:
@htmlize.register(float)
def html_real(a):
    return '{0:.2f}'.format(round(a, 2))

@htmlize.register(str)
def html_str(s):
    return escape(s).replace('\n', '<br/>\n')

@htmlize.register(tuple)
@htmlize.register(list)
def html_list(l):
    items = ['<li>{0}</li>'.format(htmlize(item)) for item in l]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

@htmlize.register(dict)
def html_dict(d):
    items = ['<li>{0}={1}</li>'.format(htmlize(k), htmlize(v)) for k, v in d.items()]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [34]:
htmlize.registry

dict_keys([<class 'object'>, <class 'int'>, <class 'float'>, <class 'str'>, <class 'list'>, <class 'tuple'>, <class 'dict'>])

In [35]:
print(htmlize([1, 2, 3]))

<ul>
<li>1(<i>0x1</i>)</li>
<li>2(<i>0x2</i>)</li>
<li>3(<i>0x3</i>)</li>
</ul>


In [36]:
print(htmlize((1, 2, 3)))

<ul>
<li>1(<i>0x1</i>)</li>
<li>2(<i>0x2</i>)</li>
<li>3(<i>0x3</i>)</li>
</ul>


In [37]:
print(htmlize("""this
is a multi line string with
a < 10"""))

this<br/>
is a multi line string with<br/>
a &lt; 10


Our single dispatch decorator works quite well - but it has some limitations. For example, it cannot handle functions that take more than one argument (in which case dispatching would be based on the type of the **first** argument), and it does not account for parent classes - for example, integers and Booleans are both integral numbers, i.e. both are instances of the `Integral` abstract base class. Similarly, lists and tuples are both more generic `Sequence` types. We'll see this in more detail when we get to the topic of abstract base classes (ABCs).

In [38]:
from numbers import Integral

In [39]:
isinstance(100, Integral)

True

In [40]:
isinstance(True, Integral)

True

In [41]:
isinstance(100.5, Integral)

False

In [42]:
type(100) is Integral

False

In [43]:
type(True) is Integral

False

In [44]:
(100).__class__

int

In [45]:
(True).__class__

bool

The way we have implemented our decorator, if we register an `Integral` generic function, it won't pick up either integers or Booleans.
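One way this gap could be closed in our own decorator is sketched below, under simplifying assumptions: it scans the registry with `issubclass` and picks the most specific match. This is only an illustration and not how `functools` resolves types; the handler name `html_integral` is made up.

```python
from html import escape
from numbers import Integral


def singledispatch(fn):
    registry = {object: fn}

    def register(type_):
        def inner(fn):
            registry[type_] = fn
            return fn
        return inner

    def dispatch(type_):
        # an exact match wins, exactly as before
        if type_ in registry:
            return registry[type_]
        # otherwise gather every registered type that type_ is a subclass of;
        # issubclass also honors ABC registrations such as int -> Integral
        matches = [t for t in registry if issubclass(type_, t)]
        # most specific match: the one that is a subclass of all the others
        best = next(
            (t for t in matches
             if all(issubclass(t, other) for other in matches)),
            object,  # fall back to the default if the matches are ambiguous
        )
        return registry[best]

    def decorator(arg, *args, **kwargs):
        # dispatch on the first argument only; pass the rest through
        return dispatch(type(arg))(arg, *args, **kwargs)

    decorator.register = register
    decorator.dispatch = dispatch
    return decorator


@singledispatch
def htmlize(arg):
    return escape(str(arg))


@htmlize.register(Integral)
def html_integral(a):
    return f'{a}(<i>{hex(a)}</i>)'


print(htmlize(10), htmlize(True), htmlize(2.5))
```

Ambiguous cases (two unrelated matching types) simply fall back to the default here, which is a deliberate shortcut to keep the sketch short.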

We can certainly fix this shortcoming ourselves, but of course...

We can use Python's built-in single dispatch support, in ...

you guessed it!

the `functools` module.

In [46]:
from functools import singledispatch
from numbers import Integral
from collections.abc import Sequence

In [47]:
@singledispatch
def htmlize(a):
    return escape(str(a))

The closure returned by `singledispatch` has a few attributes we can use:
1. A `register` decorator (just like ours did)
2. A `registry` property that is a read-only view (a `mappingproxy`) of the registry dictionary
3. A `dispatch` function that can be used to determine which registered function it will use for the specified type.
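A small sketch of those three attributes in action. The `show` and `show_int` functions are made up so that the example does not interfere with `htmlize`.

```python
from functools import singledispatch
from types import MappingProxyType


@singledispatch
def show(arg):
    return 'default'


@show.register(int)
def show_int(arg):
    return 'int'


# registry is a read-only mapping of type -> function
is_read_only = isinstance(show.registry, MappingProxyType)
registered_types = list(show.registry)

# dispatch returns the function that would handle a given type
handler_for_bool = show.dispatch(bool)   # bool is a subclass of int
handler_for_str = show.dispatch(str)     # nothing registered: the default

print(registered_types)
print(handler_for_bool.__name__, handler_for_str.__name__)
```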

In [48]:
@htmlize.register(Integral)
def htmlize_int(a):
    return '{0}(<i>{1}</i>)'.format(a, str(hex(a))) 

In [49]:
htmlize.dispatch(int)

<function __main__.htmlize_int>

In [50]:
htmlize.dispatch(bool)

<function __main__.htmlize_int>

In [51]:
htmlize(100)

'100(<i>0x64</i>)'

In [52]:
htmlize(True)

'True(<i>0x1</i>)'

In [53]:
@htmlize.register(Sequence)
def html_sequence(l):
    items = ['<li>{0}</li>'.format(htmlize(item)) for item in l]
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

In [54]:
htmlize.dispatch(list)

<function __main__.html_sequence>

In [55]:
htmlize.dispatch(tuple)

<function __main__.html_sequence>

In [56]:
htmlize.dispatch(str)

<function __main__.html_sequence>

You'll note that a string is also a sequence type, hence our dispatcher will call the `html_sequence` function on a string.

In fact, at this point things would not even run properly.

If we were to call

`htmlize('abc')`

we'd get an infinite recursion!

The call to `htmlize` with the string `abc` would treat it as a sequence, which would call `htmlize` character by character. But each character is itself just a string of length 1, so `htmlize` would treat that single character as a sequence too and call `htmlize` on it again, and so on, in an infinite loop. 

In [57]:
htmlize('abc')

RecursionError: maximum recursion depth exceeded

Instead, we are going to register a string handler specifically - that way we will avoid that problem entirely:

In [58]:
@htmlize.register(str)
def html_str(s):
    return escape(s).replace('\n', '<br/>\n')

In [59]:
htmlize.dispatch(str)

<function __main__.html_str>

So, even though a string is both an `str` instance and in general a sequence type, the "closest" type will be picked by the dispatcher (again something our own implementation did not do).

This means, we have something for generic sequences, but something specific for more specialized strings.
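The same "closest type wins" behavior can be seen with an ordinary class hierarchy. The classes and the `describe` function below are hypothetical and only serve as a sketch.

```python
from functools import singledispatch


class Animal:
    pass


class Dog(Animal):
    pass


class Puppy(Dog):
    pass


@singledispatch
def describe(obj):
    return 'object'


@describe.register(Animal)
def _(obj):
    return 'animal'


@describe.register(Dog)
def _(obj):
    return 'dog'


# Puppy has no handler of its own, so the nearest registered
# ancestor (Dog) is used rather than the more general Animal
print(describe(Puppy()), describe(Animal()), describe(42))
```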

In [60]:
htmlize('abc')

'abc'

We can do the same thing with sequences - right now `html_sequence` will be used for both lists and tuples. 

But suppose we want slightly different handling of tuples:

In [61]:
@htmlize.register(tuple)
def html_tuple(t):
    items = [escape(str(item)) for item in t]
    return '({0})'.format(', '.join(items))

In [62]:
htmlize.dispatch(list)

<function __main__.html_sequence>

In [63]:
htmlize.dispatch(tuple)

<function __main__.html_tuple>

In [64]:
print(htmlize(['a', 100, 3.14]))

<ul>
<li>a</li>
<li>100(<i>0x64</i>)</li>
<li>3.14</li>
</ul>


In [65]:
print(htmlize(('a', 100, 3.14)))

(a, 100, 3.14)


One thing of note is that we started our decoration with a `@singledispatch` decorator - you'll notice that no specific type was indicated here - and in fact this means the dispatcher will use the generic `object` type.

This means that any object type not specifically handled by our dispatcher will fall back on that `object` key - hence you can think of it as the default for the dispatcher.

In [66]:
type(None)

NoneType

In [67]:
htmlize.dispatch(type(None))

<function __main__.htmlize>

In [68]:
type(1+1j)

complex

In [69]:
htmlize.dispatch(complex)

<function __main__.htmlize>

In [70]:
type(3)

int

In [71]:
htmlize.dispatch(int)

<function __main__.htmlize_int>

Lastly, because the name of the individual specialized functions does not really matter to us (the dispatcher will pick the appropriate function), it is quite common for an underscore character (`_`) to be used for the function name. A reference to each specialized function is stored in the `registry` dictionary, so the function name does not matter - in fact we can even add lambdas to the registry.

In [72]:
@singledispatch
def htmlize(a):
    return escape(str(a))

In [73]:
@htmlize.register(int)
def _(a):
    return '{0}({1})'.format(a, str(hex(a)))

In [74]:
@htmlize.register(str)
def _(s):
    return escape(s).replace('\n', '<br/>\n')

In [75]:
htmlize.register(float)(lambda f: '{0:.2f}'.format(f))

<function __main__.<lambda>>

In [76]:
htmlize.registry

mappingproxy({object: <function __main__.htmlize>,
              int: <function __main__._>,
              str: <function __main__._>,
              float: <function __main__.<lambda>>})

But note that the `__main__._` functions for `int` and `str` are not the same function (even though they have the same name):

In [77]:
id(htmlize.registry[str])

3104966916432

In [78]:
id(htmlize.registry[int])

3104967451784

And everything works as expected:

In [79]:
htmlize(100)

'100(0x64)'

In [80]:
htmlize(3.1415)

'3.14'

In [81]:
print(htmlize("""this
is a multi-line string
a < 10"""))

this<br/>
is a multi-line string<br/>
a &lt; 10


If this same name but different function thing has you confused, look at it this way:

In [82]:
def my_func():
    print('my_func initial')

In [83]:
id(my_func)

3104966916296

In [84]:
f = my_func

In [85]:
id(f)

3104966916296

So, `f` and `my_func` point to the same function in memory.

Let's go ahead and "redefine" the function `my_func`:

In [86]:
def my_func():
    print('second my_func')

In fact, we did not "redefine" the previous `my_func`, it still exists in memory (and `f` still points to it). Instead we have re-assigned the function that `my_func` points to:

In [87]:
id(my_func)

3104966914800

But the original `my_func` is still around, and `f` still has a reference to it:

In [88]:
id(f)

3104966916296

So, we can call each one:

In [89]:
f()

my_func initial


In [90]:
my_func()

second my_func


But the `__name__` attributes of both functions have the same value:

In [91]:
f.__name__

'my_func'

In [92]:
my_func.__name__

'my_func'

Just always keep in mind that a label points to something in memory; it is not the object itself. So in this case we have two distinct objects (functions) which happen to have the same name, but are two very different objects - `f` points to the first one we created, and `my_func` points to the second.