Bug in to_pydantic_model/from_pydantic_model #221

JohannesMessner · 2022-03-23T13:19:46Z

Problem description:

The methods to_pydantic_model() and from_pydantic_model() are not inverses of each other: a round trip through the pydantic model fails for a Document that has a blob.

How to reproduce:

>>> Document.from_pydantic_model(Document(blob=b'hello').to_pydantic_model())
binascii.Error: Invalid base64-encoded string: number of data characters (5) cannot be 1 more than a multiple of 4

Seems like this part of the conversion (in the from_pydantic_model() method) might be based on an assumption that does not always hold:

elif f_name == 'blob':
    # here is a dirty fishy itchy trick
    # the original bytes will be encoded two times:
    # first time is real during `to_dict/to_json`, it converts into base64 string
    # second time is at `from_dict/from_json`, it is unnecessary yet inevitable, the result string get
    # converted into a binary string and encoded again.
    # consequently, we need to decode two times here!
    fields[f_name] = base64.b64decode(base64.b64decode(value))

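My guess is that the blob is actually encoded only once, but from_pydantic_model() tries to decode it twice. The second decode then receives the raw bytes b'hello' (5 characters) instead of a base64 string, which gives the error above.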
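The failure can be reproduced with the standard library alone. The following is a minimal sketch, not the library's code: `decode_twice` is a hypothetical helper that only mimics the double decode quoted above.

```python
import base64
import binascii

original = b'hello'
encoded = base64.b64encode(original)          # encoded once: b'aGVsbG8='
once = base64.b64decode(encoded)              # a single decode restores the bytes


def decode_twice(data: bytes) -> bytes:
    """Hypothetical helper mimicking the double decode in from_pydantic_model()."""
    return base64.b64decode(base64.b64decode(data))


# Data that was encoded only once cannot be decoded twice:
# the second decode is handed b'hello', which is not valid base64.
try:
    decode_twice(encoded)
    double_decode_failed = False
except binascii.Error:
    double_decode_failed = True

# Data that really was encoded twice survives the double decode.
twice_encoded = base64.b64encode(encoded)
roundtrip = decode_twice(twice_encoded)
```

So the double decode is only correct if the blob was encoded twice on the way in, which does not seem to be the case for to_pydantic_model().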

Related Issue

The .dict() method of PydanticDocument does not implement any decoding of blob, but I think it should:

>>> d = Document(blob=b'hello')
>>> d.blob
b'hello'
>>> d_pyd = d.to_pydantic_model()  # encodes the blob
>>> d_pyd.blob
'aGVsbG8='
>>> d_pyd.dict()['blob']  # I think this should decode the blob and return b'hello'
'aGVsbG8='
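As a rough illustration of the behaviour I would expect, here is a sketch on a plain dict. `decode_blob_field` is a hypothetical helper, not part of docarray or pydantic, and it assumes the blob is stored as a base64 string under the key 'blob'.

```python
import base64


def decode_blob_field(payload: dict) -> dict:
    """Return a copy of payload with a base64 string under 'blob' decoded to bytes.

    Hypothetical helper for illustration only; other keys are left untouched.
    """
    result = dict(payload)
    blob = result.get('blob')
    if isinstance(blob, str):
        result['blob'] = base64.b64decode(blob)
    return result


as_dict = {'blob': 'aGVsbG8=', 'text': 'unchanged'}
decoded = decode_blob_field(as_dict)
```

With decoding done at this point, the blob would be encoded once and decoded once.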

@JoanFM I believe that if this behaviour of .dict() is fixed then the trick in the core of creating a DocumentArray becomes unnecessary.

JohannesMessner added the type/bug label Mar 23, 2022

alaeddine-13 mentioned this issue Mar 23, 2022

fix: encode and decode only once #223

Merged

alaeddine-13 closed this as completed in #223 Mar 24, 2022
