Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Jsonschema of DocList not correct #1521

Open
JohannesMessner opened this issue May 9, 2023 · 4 comments · May be fixed by #1876
Open

Bug: Jsonschema of DocList not correct #1521

JohannesMessner opened this issue May 9, 2023 · 4 comments · May be fixed by #1876
Assignees

Comments

@JohannesMessner
Copy link
Member

JohannesMessner commented May 9, 2023

DocList does not generate a correct JSONSchema; the schema of the documents in the doclist do not show up:

from docarray import BaseDoc, DocList
import pydantic


class Doc(BaseDoc):
    ...


print(pydantic.schema_of(DocList[Doc]))
print(pydantic.schema_of(List[Doc]))

class DocDoc(BaseDoc):
    docs: DocList[Doc]

print(print(DocDoc.schema()))

This is the cause of OpenAPI through FastAPI not working properly, as seen in #1449

@JohannesMessner
Copy link
Member Author

JohannesMessner commented May 16, 2023

After digging into this for a while it seems like we have a bit of a problem.

tl;dr: We cannot have working validation for DocList and have working jsonschema at the same time. The reason our validation currently works at all is that we have a bug that breaks pydantic in just the right way.

longer version:

  • I discovered that DocList does not fully behave like a built-in generic: get_origin(DocList[Doc]) returns None instead of DocList. This means that pydantic does not pick it up as a list and does not treat it properly. For most use cases this can be worked around though, by addiing __origin__ and __args__ like below, and pydantic's custom implementation of get_origin will pick it up properly:
# in AnyDocArray.__class_getitem__
class _DocArrayTyped(cls):  # type: ignore
    doc_type: Type[BaseDoc] = cast(Type[BaseDoc], item)
    __origin__: Type['AnyDocArray'] = cls  # add this
    __args__: Tuple[Any, ...] = (item,)  # add this
  • The step above fixes Jsonschema and FastAPI, great success!
  • However, then validation does not work anymore:
class MyDoc(BaseDoc):
    docs: DocList[ImageDoc]
d = MyDoc(docs=DocList[ImageDoc]([ImageDoc()]))
print(type(d.docs))  # this will be a `list` instead of `DocList[ImageDoc]`

This error can be reproduced in "pure" pydantic, showing that it is not per se a docarray issue:

T_item = TypeVar('T_item')


class MyOtherList(List[T_item], typing.Generic[T_item]):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value, field, config):
        print('validating :)')  # this is never called/printed
        if isinstance(value, MyList):
            print(f'{type(value)=}')
            return value
        raise TypeError(f'Cannot validate')


class OtherDoc(BaseModel):
    docs: MyOtherList[int]


d = OtherDoc(docs=MyOtherList[int]([1, 2, 3]))
print(type(d.docs))  # this will be a list instead of MyList

Conclusion: The reason validation on DocList works right now is that we failed to implement some list-like things, and so pydantic doesn't pick it up as a real list. If we fix this bug, validation will break. Thus, we cannot have working validation and working jsonschema/fastapi at the same time.

@samsja
Copy link
Member

samsja commented May 16, 2023

So we wait for Pydantic v2 to fix this ? Or we can undo the list like things and try to fix it (I don't think we should do it )

@JohannesMessner
Copy link
Member Author

I would wait

@JoanFM
Copy link
Member

JoanFM commented May 16, 2023

We wait and document

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Blocked
Development

Successfully merging a pull request may close this issue.

3 participants