Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LargeBinary and nullable fields crashes when base64 encode is set #409

Closed
collerek opened this issue Nov 3, 2021 · 7 comments
Closed

LargeBinary and nullable fields crashes when base64 encode is set #409

collerek opened this issue Nov 3, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@collerek
Copy link
Owner

collerek commented Nov 3, 2021

Seems there is some issue with LargeBinary and nullable fields when base64 encode is set.

The code:

import databases
import ormar as orm
import sqlalchemy as sa

DB_URL = 'sqlite:///db.sqlite'
database = databases.Database = databases.Database(DB_URL, force_rollback=True)
metadata = sa.MetaData()

class Picture(orm.Model):
    class Meta:
        database = database
        metadata = metadata

    id: int = orm.Integer(primary_key=True)
    name: str = orm.String(max_length=32, nullable=True)
    file: bytes = orm.LargeBinary(max_length=1000000, represent_as_base64_str=True, nullable=True)


async def create():
    engine = sa.create_engine(DB_URL)
    await database.connect()
    metadata.drop_all(engine)
    metadata.create_all(engine)
    pict_a = await Picture.objects.create(name='A picture')
    print(pict_a.json(indent=4))
    await database.disconnect()


if __name__ == '__main__':
    import asyncio
    asyncio.run(create())

The error:

➤ python largebin.py                                                                                                                                                                21/11/03 05:56:20
Traceback (most recent call last):
  File "/home/igor/ormar_test/largebin.py", line 34, in <module>
    asyncio.run(create())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/igor/ormar_test/largebin.py", line 25, in create
    pict_a = await Picture.objects.create(name='A picture')
  File "/home/igor/ormar_test/venv/lib/python3.9/site-packages/ormar/queryset/queryset.py", line 1028, in create
    instance = await instance.save()
  File "/home/igor/ormar_test/venv/lib/python3.9/site-packages/ormar/models/model.py", line 73, in save
    self.update_from_dict(
  File "/home/igor/ormar_test/venv/lib/python3.9/site-packages/ormar/models/newbasemodel.py", line 847, in update_from_dict
    setattr(self, key, value)
  File "/home/igor/ormar_test/venv/lib/python3.9/site-packages/ormar/models/newbasemodel.py", line 179, in __setattr__
    if hasattr(self, name):
  File "/home/igor/ormar_test/venv/lib/python3.9/site-packages/ormar/models/descriptors/descriptors.py", line 63, in __get__
    value = base64.b64encode(value).decode()
  File "/usr/lib/python3.9/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'NoneType'

Or am I doing something very wrong?

PS: If base64 representation is set to "False" , it works nicely. Or if I pass a base64 encoded stream (when base64 is set to True) as

pict_a = await Picture.objects.create(name='A picture', file=base64.b64encode(b'afakepict'))

Also works.

Ugly workaround

An ugly workaround is set the field with a default encoded base64 empty stream as

file: bytes = orm.LargeBinary(max_length=1000000, represent_as_base64_str=True, default=base64.b64encode(b''))

Originally posted by @igormorgado in #406

@collerek collerek added the bug Something isn't working label Nov 3, 2021
@xkortex
Copy link

xkortex commented Dec 10, 2021

The workaround isn't working for me, I'm still seeing

asyncpg.exceptions.DataError: invalid input for query argument $3: '' (a bytes-like object is required, not 'str')

Same situation, here's my annotation:

features: bytes = ormar.LargeBinary(
        max_length=512, represent_as_base64_str=True, nullable=True
    )  

I'm on py3.8, ormar 0.10.19, at the moment.

I've also tried

features: bytes = ormar.LargeBinary( default=base64.b64encode(b''),
        max_length=512, represent_as_base64_str=True, nullable=True
    )
features: Optional[bytes] = ormar.LargeBinary( default=base64.b64encode(b''),
        max_length=512, represent_as_base64_str=True, nullable=True
    )
features: str = ormar.LargeBinary( default=base64.b64encode(b''),
        max_length=512, represent_as_base64_str=True, nullable=True
    )    

and a bunch of other variations on str/bytes/None.

This bug appears to affect bulk_update and bulk_create but save seems OK.

@igormorgado
Copy link

igormorgado commented Dec 10, 2021

Pay attention I do not set nullable=True. Check if it "works"

@xkortex
Copy link

xkortex commented Dec 10, 2021

@igormorgado - I'm aware of the difference between "" and b"". I've tried many different variants. I'm starting with an object that's already in the database, dumping it to json (which correctly renders b64 string), parsing it back to object, and trying to run bulk_update. Modelname.save() works when I start with a dict with a bytes field, parse it using internal_obj = Modelname(**mydict) and then internal_obj.save(). But other operations seem to fail. I'm still exploring the state space.

@xkortex
Copy link

xkortex commented Dec 10, 2021

Somewhat related, I noticed when trying different combos, this gives an error if you try to pass it None:

  File "/app/REPO/api/v0/events_api.py", line 67, in post_bulk_update
    await model.objects.bulk_update(map(model.parse, object_list[:1]))
  File "/usr/local/lib/python3.8/site-packages/ormar/queryset/queryset.py", line 1090, in bulk_update
    for objt in objects:
  File "/app/REPO/models/events.py", line 36, in parse
    return cls.parse_obj(obj)
  File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
  File "/usr/local/lib/python3.8/site-packages/ormar/models/newbasemodel.py", line 134, in __init__
    new_kwargs, through_tmp_dict = self._process_kwargs(kwargs)
  File "/usr/local/lib/python3.8/site-packages/ormar/models/newbasemodel.py", line 254, in _process_kwargs
    new_kwargs: Dict[str, Any] = {
  File "/usr/local/lib/python3.8/site-packages/ormar/models/newbasemodel.py", line 255, in <dictcomp>
    k: self._convert_to_bytes(
  File "/usr/local/lib/python3.8/site-packages/ormar/models/newbasemodel.py", line 789, in _convert_to_bytes
    value = base64.b64decode(value)
  File "/usr/local/lib/python3.8/base64.py", line 80, in b64decode
    s = _bytes_from_decode_data(s)
  File "/usr/local/lib/python3.8/base64.py", line 45, in _bytes_from_decode_data
    raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType'

if not isinstance(value, str) and field.represent_as_base64_str:
return base64.b64encode(value).decode()

value should probably be None-checked and return None if field type is nullable.

@xkortex
Copy link

xkortex commented Dec 10, 2021

So this is the line that trips in the traceback:

await self.database.execute_many(expr, ready_objects)

I suspect this is where it is going sideways. I think it isn't doing the correct conversion. I'm going to hacking around a bit and see what I find.

for objt in objects:
new_kwargs = objt.dict()
if pk_name not in new_kwargs or new_kwargs.get(pk_name) is None:
raise ModelPersistenceError(
"You cannot update unsaved objects. "
f"{self.model.__name__} has to have {pk_name} filled."
)
new_kwargs = self.model.parse_non_db_fields(new_kwargs)
new_kwargs = self.model.substitute_models_with_pks(new_kwargs)
new_kwargs = self.model.translate_columns_to_aliases(new_kwargs)
new_kwargs = {"new_" + k: v for k, v in new_kwargs.items() if k in columns}
ready_objects.append(new_kwargs)

@xkortex
Copy link

xkortex commented Dec 10, 2021

In new_kwargs, the field is a string type:

2021-12-10 21:24:45.575 | DEBUG    | ormar.queryset.queryset:bulk_update:1119 - [{..., 'new_features': 'aGVsbG8='}]

in other words, it's a Dict[str, str] being executed.

Whereas in .save(), we see that it is self being sent to the db engine:

await self.signals.post_save.send(sender=self.__class__, instance=self)

The culprit is definitely new_kwargs = objt.dict(). This does the "expected" behavior of casting bytes to str-b64 (from the perspective of wanting to convert something to the correct wire format) but now it's in the wrong type for the database driver.

@collerek
Copy link
Owner Author

Should be fixed in 0.10.23 - please update and check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants