feat: allow modalities of multimodal docs to be accessed #425

Merged
merged 15 commits into from
Jul 21, 2022

Conversation

JohannesMessner

@JohannesMessner JohannesMessner commented Jun 30, 2022

This allows Documents created from a multimodal dataclass to expose their multimodal attributes like any other 'native' attribute (see the last two lines of the code snippet):

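from typing import TypeVar

from docarray import Document, dataclass, field
from docarray.typing import Image, Text

MyText = TypeVar('MyText', bound=str)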


def my_setter(value) -> 'Document':
    return Document(text=value + ' but custom!')


def my_getter(doc: 'Document'):
    return doc.text


@dataclass
class MyMultiModalDoc:
    avatar: Image
    description: Text
    heading: MyText = field(setter=my_setter, getter=my_getter, default='')


m = MyMultiModalDoc(avatar='testflow.jpg', description='hello, world', heading='hello, world')
d = Document(m)
print(d.heading)  # prints a Document with text='hello, world but custom!'
print(d.heading.text)  # prints 'hello, world but custom!'
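
The custom setter/getter pair above can be illustrated without docarray. The following is only a sketch under stated assumptions: `ToyDocument`, `ToyMultiModalDoc`, `to_documents` and `from_documents` are hypothetical names, and the standard library's `dataclasses.field(metadata=...)` stands in for docarray's `field(setter=..., getter=...)`. It shows the idea that a setter turns a plain value into a document and a getter turns it back, not how docarray implements it.

```python
from dataclasses import dataclass, field, fields


class ToyDocument:
    """Hypothetical stand-in for Document: it only holds text."""

    def __init__(self, text=''):
        self.text = text


def my_setter(value):
    # Plain value -> document, mirroring the setter in the snippet above.
    return ToyDocument(text=value + ' but custom!')


def my_getter(doc):
    # Document -> plain value, mirroring the getter in the snippet above.
    return doc.text


@dataclass
class ToyMultiModalDoc:
    # Assumption: the converters live in stdlib field metadata here.
    heading: str = field(
        default='', metadata={'setter': my_setter, 'getter': my_getter}
    )


def to_documents(obj):
    """Apply each field's setter and return a name -> ToyDocument mapping."""
    return {f.name: f.metadata['setter'](getattr(obj, f.name)) for f in fields(obj)}


def from_documents(cls, docs):
    """Apply each field's getter to rebuild a dataclass instance."""
    return cls(**{f.name: f.metadata['getter'](docs[f.name]) for f in fields(cls)})


m = ToyMultiModalDoc(heading='hello, world')
docs = to_documents(m)
print(docs['heading'].text)  # hello, world but custom!

restored = from_documents(ToyMultiModalDoc, docs)
print(restored.heading)  # hello, world but custom!
```

Note that in this toy the getter returns the already modified text, so the round trip does not restore the original `'hello, world'`.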

The same attribute access works inside an Executor:

from jina import Executor, requests


class MyExec(Executor):
    @requests
    def foo(self, docs, **kwargs):
        model = ...  # embedding model
        for d in docs:
            d.heading.embed(model)

Advanced usage example

This also handles list types, nested dataclasses, and lists of nested dataclasses:

from typing import List, TypeVar

from docarray import Document, dataclass, field
from docarray.typing import Image, Text

MyText = TypeVar('MyText', bound=str)


def my_setter(value) -> 'Document':
    return Document(text=value + ' but custom!')


def my_getter(doc: 'Document'):
    return doc.text


@dataclass
class InnerDoc:
    avatar: Image
    description: Text
    heading: MyText = field(setter=my_setter, getter=my_getter, default='')


@dataclass
class MyMultiModalDoc:
    avatar: Image
    description: Text
    heading_list: List[Text]
    other_doc: InnerDoc
    other_doc_list: List[InnerDoc]
    heading: MyText = field(setter=my_setter, getter=my_getter, default='')


inner_doc_list = [InnerDoc(avatar='testflow.jpg', description='hello, world', heading=f'{i} hello, world') for i in range(3)]

m = MyMultiModalDoc(avatar='testflow.jpg', description='hello, world', heading='hello, world',
                    heading_list=['hello', 'world'], other_doc=InnerDoc(avatar='testflow.jpg',
                                                                        description='inner hello, world',
                                                                        heading='inner hello, world'),
                    other_doc_list=inner_doc_list)
d = Document(m)
print(d.heading.text)  # returns 'hello, world but custom!'
print(d.heading_list.texts)  # returns ['hello', 'world']
print(d.other_doc)  # returns the inner doc
print(d.other_doc.heading.text)  # returns 'inner hello, world but custom!'
print(d.other_doc_list)  # returns DocArray of inner docs
print(d.other_doc_list[1].heading.text)  # returns '1 hello, world but custom!'
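
How a single attribute name can resolve to either one sub-document or a collection of them can be sketched in plain Python. This is a minimal toy, not the merged implementation: `ToyDoc` and its `_modalities` dict are hypothetical, and only the method name `get_multi_modal_attribute` and the "more than one element" rule are taken from the code fragment quoted in the review thread of this PR.

```python
class ToyDoc:
    """Hypothetical stand-in for a Document with named modalities."""

    def __init__(self, text='', **modalities):
        self.text = text
        # Maps a modality name to the list of sub-documents stored under it.
        self._modalities = {name: list(subdocs) for name, subdocs in modalities.items()}

    def get_multi_modal_attribute(self, attr):
        return self._modalities[attr]

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal attribute lookup fails.
        modalities = self.__dict__.get('_modalities', {})
        if attr in modalities:
            mm_attr = self.get_multi_modal_attribute(attr)
            # Several sub-documents: return the collection; one: return it directly.
            return mm_attr if len(mm_attr) > 1 else mm_attr[0]
        raise AttributeError(f'{self.__class__.__name__} has no attribute {attr}')


d = ToyDoc(
    heading=[ToyDoc(text='hello, world but custom!')],
    heading_list=[ToyDoc(text='hello'), ToyDoc(text='world')],
)
print(d.heading.text)                   # hello, world but custom!
print([x.text for x in d.heading_list])  # ['hello', 'world']
```

One consequence of this rule, at least in the toy: a list-typed field that happens to hold a single element comes back as that element rather than as a one-element collection.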

This should solve the issue of multimodal features not being easily usable inside of Jina.

Still TODO:

@codecov

codecov bot commented Jun 30, 2022

Codecov Report

Merging #425 (186b45c) into main (64721a6) will increase coverage by 2.18%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
+ Coverage   84.45%   86.63%   +2.18%     
==========================================
  Files         134      134              
  Lines        6486     6497      +11     
==========================================
+ Hits         5478     5629     +151     
+ Misses       1008      868     -140     
Flag | Coverage Δ
docarray | 86.63% <100.00%> (+2.18%) ⬆️

Impacted Files | Coverage Δ
docarray/document/mixins/multimodal.py | 94.18% <100.00%> (+0.85%) ⬆️
docarray/array/mixins/setitem.py | 81.81% <0.00%> (+0.82%) ⬆️
docarray/array/storage/weaviate/find.py | 78.48% <0.00%> (+1.26%) ⬆️
docarray/base.py | 98.64% <0.00%> (+1.35%) ⬆️
docarray/array/storage/sqlite/seqlike.py | 84.31% <0.00%> (+1.96%) ⬆️
docarray/array/mixins/find.py | 87.35% <0.00%> (+2.29%) ⬆️
docarray/array/mixins/io/binary.py | 97.45% <0.00%> (+2.54%) ⬆️
docarray/array/storage/weaviate/seqlike.py | 77.41% <0.00%> (+3.22%) ⬆️
docarray/array/mixins/io/from_gen.py | 83.63% <0.00%> (+3.63%) ⬆️
docarray/array/storage/weaviate/backend.py | 87.50% <0.00%> (+3.67%) ⬆️
... and 14 more


@JoanFM JoanFM changed the title feat: allow modalities of multimodal docs to be accessed like normal … feat: allow modalities of multimodal docs to be accessed Jul 20, 2022
@JoanFM JoanFM left a comment

I am missing tests that validate and show how these attributes can be changed from the Document itself rather than through the dataclass.

For example:

How do I set d.header.text, or d.header itself, if the Document has header as a sub-document?

@JohannesMessner JohannesMessner marked this pull request as ready for review July 21, 2022 06:49
        mm_attr_da = self.get_multi_modal_attribute(attr)
        return mm_attr_da if len(mm_attr_da) > 1 else mm_attr_da[0]
    else:
        raise AttributeError(f'{self.__class__.__name__} has no attribute {attr}')

I don't think we should refer to the class here but rather to the object. A Python object can have an attribute without it being defined in the class (we can add attributes on the fly after object instantiation).


Are you referring to the error message? I don't think the object itself has a __name__. Could you write a code change suggestion showing what you would do?
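
Both points in this exchange can be checked with a few lines of plain Python. The sketch below uses a hypothetical `Toy` class and is not a proposal for what was merged: it shows that an attribute added after instantiation is found by normal lookup before `__getattr__` is consulted, that an instance has no `__name__` of its own, and that `type(obj).__name__` is one way to name the object's type in the message.

```python
class Toy:
    def __getattr__(self, attr):
        # Reached only when regular lookup (instance, then class) fails.
        raise AttributeError(f'{type(self).__name__} has no attribute {attr}')


obj = Toy()
obj.extra = 42                    # added on the fly, not defined in the class
print(obj.extra)                  # 42, found without calling __getattr__
print(hasattr(obj, '__name__'))   # False: the instance itself has no __name__
print(type(obj).__name__)         # Toy

try:
    obj.missing
except AttributeError as err:
    message = str(err)
print(message)                    # Toy has no attribute missing
```

For comparison, CPython's built-in message for a missing attribute reads `'Toy' object has no attribute 'missing'`, which names the type while speaking about the object.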

@samsja samsja left a comment

small comment

@JohannesMessner JohannesMessner mentioned this pull request Jul 21, 2022
10 tasks
@JoanFM JoanFM merged commit bb132b3 into main Jul 21, 2022
@JoanFM JoanFM deleted the feat-multimodal-attribute branch July 21, 2022 12:22
@hanxiao

hanxiao commented Jul 22, 2022

Please make sure to bump the minor version before the release; this is a significant feature and worth a minor release, not a patch release. @JoanFM

@JoanFM

JoanFM commented Jul 23, 2022

@hanxiao we know; we will work on the blog post and documentation first, though.

4 participants