bug: serializing and then deserializing to/from json or dict does not preserve dataclass properties #429

Closed
JohannesMessner opened this issue Jul 6, 2022 · 0 comments

JohannesMessner commented Jul 6, 2022

How to reproduce:

from docarray import Document, dataclass
from docarray.typing import Text


@dataclass
class MyMMDoc:
    t: Text
    

m = MyMMDoc(t='hi')
d = Document(m)
d.is_multimodal
Out[0]: True
d_proto = Document.from_protobuf(d.to_protobuf())
d_proto.is_multimodal
Out[1]: True
d_dict = Document.from_dict(d.to_dict())
d_dict.is_multimodal
Out[2]: False
d_json = Document.from_json(d.to_json())
d_json.is_multimodal
Out[3]: False

Probable Cause:

When calling d.to_dict() or d.to_json(), d is first converted to a PydanticDocument through to_pydantic_model().
The PydanticDocument holds _metadata (which is responsible for enabling the multimodal features) as a private attribute, and pydantic does not keep private attributes when a model is exported (more info here: https://pydantic-docs.helpmanual.io/usage/models/#private-model-attributes).

Possible Solution:

One way to solve this could be a small piece of special-case logic that renames _metadata to metadata when converting to PydanticDocument.

Something like this:

    def to_pydantic_model(self) -> 'PydanticDocument':
        """Convert a Document object into a Pydantic model."""
        from ..pydantic_model import PydanticDocument as DP

        _p_dict = {}
        for f in self.non_empty_fields:
            v = getattr(self, f)
            ...
            elif f == '_metadata':
                _p_dict['metadata'] = v
            else:
                _p_dict[f] = v
        return DP(**_p_dict)

from_pydantic_model() would need the equivalent reverse mapping, from metadata back to _metadata.
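
To make the round trip concrete, here is a minimal standalone sketch of the renaming idea. It does not use docarray or pydantic: export_public_fields is a hypothetical stand-in that drops underscore-prefixed keys the way the export is described above, to_model_fields and from_model_fields are made-up helper names, and the content of _metadata is a placeholder.

```python
from typing import Any, Dict

PRIVATE_KEY = '_metadata'  # attribute name on the Document, as named in this issue
PUBLIC_KEY = 'metadata'    # proposed public name on the pydantic side


def export_public_fields(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the model export: drops underscore-prefixed keys."""
    return {k: v for k, v in fields.items() if not k.startswith('_')}


def to_model_fields(doc_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Rename _metadata to metadata before the fields reach the model."""
    return {PUBLIC_KEY if k == PRIVATE_KEY else k: v for k, v in doc_fields.items()}


def from_model_fields(model_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Reverse mapping: rename metadata back to _metadata."""
    return {PRIVATE_KEY if k == PUBLIC_KEY else k: v for k, v in model_fields.items()}


# Placeholder payload; the real content of _metadata is not shown in this issue.
doc_fields = {'text': 'hi', '_metadata': {'placeholder': True}}

# Without the rename, the private key is lost on export.
lossy = export_public_fields(doc_fields)

# With the rename applied on the way in and out, the key survives.
restored = from_model_fields(export_public_fields(to_model_fields(doc_fields)))
```

One thing this sketch glosses over: if a Document could also carry a public field literally named metadata, the reverse mapping would be ambiguous, so the real change would need to rule that out.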
