In [1]:
%load_ext autoreload
%autoreload 2

Import dependencies

In [2]:
from haystack import component

Haystack can serialize pipelines and simple components, including custom ones, out of the box. A component like this one just works, because its only init parameter is a plain `int`.

In [3]:
@component
class RepeatWordComponent:
    def __init__(self, times: int):
        self.times: int = times
        
    @component.output_types(result=str)
    def run(self, word: str) -> dict:
        # components return a dict keyed by the declared output names
        return {"result": word * self.times}

On the other hand, this code does not work if the final format is JSON, as the `set` type is not JSON serializable.

```python
@component
class SetIntersector:
    def __init__(self, intersect_with: set):
        self.intersect_with = intersect_with

    @component.output_types(result=set)
    def run(self, data: set) -> dict:
        return {"result": data.intersection(self.intersect_with)}
```
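
The failure can be reproduced with the standard library alone. This is a minimal sketch that does not use Haystack; the `init_params` dictionary is a hypothetical stand-in for the init parameters of a component that holds a set.

```python
import json

# Hypothetical stand-in for the init parameters of a component that holds a set
init_params = {"intersect_with": {1, 2, 3}}

try:
    json.dumps(init_params)
    error = None
except TypeError as exc:
    # The json module has no encoding for Python sets
    error = exc

print(type(error).__name__)

# A list holds the same values and is JSON serializable
as_json = json.dumps({"intersect_with": sorted(init_params["intersect_with"])})
print(as_json)
```

Sorting the values is optional; it only makes the serialized output stable across runs.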

In such cases, you can provide your own implementations of `to_dict` and `from_dict` for the component:

In [4]:
from haystack import component, default_to_dict, default_from_dict

In [5]:
@component
class SetIntersector:
    def __init__(self, intersect_with: set) -> None:
        self.intersect_with: set = intersect_with
        
    @component.output_types(result=set)
    def run(self, data: set) -> dict:
        return {"result": data.intersection(self.intersect_with)}
    
    def to_dict(self):
        # convert the set into a list for the dict representation, so it can be converted to JSON
        return default_to_dict(self, intersect_with=list(self.intersect_with))
    
    @classmethod
    def from_dict(cls, data):
        # default_to_dict stores init arguments under "init_parameters"; turn the list back into a set
        data["init_parameters"]["intersect_with"] = set(data["init_parameters"]["intersect_with"])
        return default_from_dict(cls, data)
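
To see the round trip that `to_dict` and `from_dict` enable, here is a small standalone sketch that does not use Haystack. `PlainSetIntersector` and its hand-written dictionary layout are assumptions made for illustration; in a real component, `default_to_dict` and `default_from_dict` build and read the dictionary for you.

```python
import json


class PlainSetIntersector:
    """Hypothetical plain-Python class that mirrors the to_dict/from_dict pattern."""

    def __init__(self, intersect_with: set) -> None:
        self.intersect_with = intersect_with

    def to_dict(self) -> dict:
        # Store the set as a sorted list so the dictionary is JSON serializable
        return {"init_parameters": {"intersect_with": sorted(self.intersect_with)}}

    @classmethod
    def from_dict(cls, data: dict) -> "PlainSetIntersector":
        # Turn the list back into the set that __init__ expects
        params = dict(data["init_parameters"])
        params["intersect_with"] = set(params["intersect_with"])
        return cls(**params)


original = PlainSetIntersector({3, 1, 2})
text = json.dumps(original.to_dict())
restored = PlainSetIntersector.from_dict(json.loads(text))

print(text)
print(restored.intersect_with == original.intersect_with)
```

The conversion lives in the two methods, so callers keep working with a `set` and only the serialized form uses a list.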

Saving the pipeline to a **custom format**. Once a pipeline is available in its dictionary format, the last step of serialization is to convert that dictionary into a format you can store or send over the wire. Haystack supports YAML out of the box, but if you need a different format, you can write a custom **marshaller**.

A `Marshaller` is a Python class responsible for converting text to a dictionary and a dictionary to text according to a certain format. Marshallers must respect the `Marshaller` protocol, providing the methods `marshal` and `unmarshal`.

This is the code for a custom TOML marshaller that relies on the `rtoml` library.

In [6]:
from typing import Dict, Any, Union
import rtoml

In [7]:
class TomlMarshaller:
    def marshal(self, dict_: Dict[str, Any]) -> str:
        return rtoml.dumps(dict_)
    
    def unmarshal(self, data_: Union[str, bytes]) -> Dict[str, Any]:
        return rtoml.loads(data_)
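
The same two-method shape works for other formats. As a sketch, here is a hypothetical `JsonMarshaller` built on the standard library `json` module (it is not part of Haystack), with a round trip on a made-up dictionary that stands in for a serialized pipeline.

```python
import json
from typing import Any, Dict, Union


class JsonMarshaller:
    """Hypothetical marshaller that uses JSON as the text format."""

    def marshal(self, dict_: Dict[str, Any]) -> str:
        return json.dumps(dict_, indent=2)

    def unmarshal(self, data_: Union[str, bytes]) -> Dict[str, Any]:
        # json.loads accepts both str and bytes
        return json.loads(data_)


marshaller = JsonMarshaller()

# Made-up dictionary standing in for a serialized pipeline
pipeline_dict = {"metadata": {}, "components": {}, "connections": []}

text = marshaller.marshal(pipeline_dict)
from_text = marshaller.unmarshal(text)
from_bytes = marshaller.unmarshal(text.encode("utf-8"))

print(from_text == pipeline_dict)
```

Because the marshaller only sees a dictionary and text, it needs no knowledge of pipelines or components.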

You can then pass a Marshaller instance to the methods `dump`, `dumps`, `load`, and `loads` of the `Pipeline` class.

In [8]:
from haystack import Pipeline

In [9]:
pipe = Pipeline()
pipe.dumps(TomlMarshaller())

'max_runs_per_component = 100\nconnections = []\n\n[metadata]\n\n[components]\n'