docs: multi modalities (#1317)

* docs: add multi modalities section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: data types section add : Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: first draft image modality Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: image display with mkdocs Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: image display with mkdocs Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: image display with mkdocs Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: fix second image Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: second image Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add empty sections and 3d mesh iframe Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add first draft of 3d mesh section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: update image display_notebook.jpg for image section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: remove duplicate mesh display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: section header in mesh section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add first draft of audio section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: update audio file Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add first draft of video section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: fix video display in video section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: first draft table section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * chore: add mkdocs-video Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: move mkdocs-video from markdown-extensions to plugins section Signed-off-by: anna-charlotte 
<charlotte.gerhaher@jina.ai> * docs: add header to empty sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: fix video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: use resized video Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video display Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: display video Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * feat: enable copy to clipboard in mkdocs for code snippets Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * feat: add extra.css file to change highlight color in code blocks Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: image and other sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply samis suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: note with cmd instead of python field Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: fix audio section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: fix black docs Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: audio tensor import in docarray.typing and audiodoc documentation Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: update video section Signed-off-by: 
anna-charlotte <charlotte.gerhaher@jina.ai> * fix: video doc and audio docs Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: mesh 3d section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: table section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: remove duplicates in intro sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: move indexing part in video bytes to make more readable Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * refactor: change all DocArray to DocList Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: rebase missed dash Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: mypy, add type hints Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add emojis to headers Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: text section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: getting started sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: multimodal section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: collapse output sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: collapse sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up data types section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: add data types section to tests Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: add books.csv to toydata Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: move apple png to toydata dir Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply johannes' suggestions from code review Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: move apple.pngfix: fix docstrings for predefined docs, without testing for now Signed-off-by: 
anna-charlotte <charlotte.gerhaher@jina.ai> * docs: mark missing links Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: adjust links Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: remove link placeholders Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add missing links Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: add csv and tsv file to toydata dir Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: docs tests Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: fix audio section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: image section Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: fix tests Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: adjust test_docs Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: adjust paths to github files Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: doc string test for documents Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: swap docvec and anydocarray sections Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: run grammarly on .md files Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com>
docarray · Apr 12, 2023 · b8f178e · b8f178e
1 parent 7b47249
commit b8f178e
Showing 42 changed files with 4,439 additions and 518 deletions.
diff --git a/docarray/array/any_array.py b/docarray/array/any_array.py
@@ -121,7 +121,7 @@ def _set_data_column(
         field: str,
         values: Union[List, T, 'AbstractTensor'],
     ):
-        """Set all Documents in this DocList using the passed values
+        """Set all Documents in this [`DocList`][docarray.typing.DocList] using the passed values
 
         :param field: name of the fields to set
         :param values: the values to set at the DocList level
@@ -140,7 +140,7 @@ def to_protobuf(self) -> 'DocListProto':
         ...
 
     def _to_node_protobuf(self) -> 'NodeProto':
-        """Convert a DocList into a NodeProto protobuf message.
+        """Convert a [`DocList`][docarray.typing.DocList] into a NodeProto protobuf message.
         This function should be called when a DocList
         is nested into another Document that needs to be converted into a protobuf
 
@@ -157,82 +157,81 @@ def traverse_flat(
     ) -> Union[List[Any], 'AbstractTensor']:
         """
         Return a List of the accessed objects when applying the `access_path`. If this
-        results in a nested list or list of DocLists, the list will be flattened
+        results in a nested list or list of [`DocList`s][docarray.typing.DocList], the list will be flattened
         on the first level. The access path is a string that consists of attribute
-        names, concatenated and "__"-separated. It describes the path from the first
-        level to an arbitrary one, e.g. 'content__image__url'.
+        names, concatenated and `"__"`-separated. It describes the path from the first
+        level to an arbitrary one, e.g. `'content__image__url'`.
 
-        :param access_path: a string that represents the access path ("__"-separated).
+        :param access_path: a string that represents the access path (`"__"`-separated).
         :return: list of the accessed objects, flattened if nested.
 
-        EXAMPLE USAGE
-        .. code-block:: python
-            from docarray import BaseDoc, DocList, Text
+        ```python
+        from docarray import BaseDoc, DocList, Text
 
 
-            class Author(BaseDoc):
-                name: str
+        class Author(BaseDoc):
+            name: str
 
 
-            class Book(BaseDoc):
-                author: Author
-                content: Text
+        class Book(BaseDoc):
+            author: Author
+            content: Text
 
 
-            docs = DocList[Book](
-                Book(author=Author(name='Jenny'), content=Text(text=f'book_{i}'))
-                for i in range(10)  # noqa: E501
-            )
+        docs = DocList[Book](
+            Book(author=Author(name='Jenny'), content=Text(text=f'book_{i}'))
+            for i in range(10)  # noqa: E501
+        )
 
-            books = docs.traverse_flat(access_path='content')  # list of 10 Text objs
+        books = docs.traverse_flat(access_path='content')  # list of 10 Text objs
 
-            authors = docs.traverse_flat(access_path='author__name')  # list of 10 strings
+        authors = docs.traverse_flat(access_path='author__name')  # list of 10 strings
+        ```
 
         If the resulting list is a nested list, it will be flattened:
 
-        EXAMPLE USAGE
-        .. code-block:: python
-            from docarray import BaseDoc, DocList
-
+        ```python
+        from docarray import BaseDoc, DocList
 
-            class Chapter(BaseDoc):
-                content: str
 
+        class Chapter(BaseDoc):
+            content: str
 
-            class Book(BaseDoc):
-                chapters: DocList[Chapter]
 
+        class Book(BaseDoc):
+            chapters: DocList[Chapter]
 
-            docs = DocList[Book](
-                Book(chapters=DocList[Chapter]([Chapter(content='some_content') for _ in range(3)]))
-                for _ in range(10)
-            )
 
-            chapters = docs.traverse_flat(access_path='chapters')  # list of 30 strings
+        docs = DocList[Book](
+            Book(chapters=DocList[Chapter]([Chapter(content='some_content') for _ in range(3)]))
+            for _ in range(10)
+        )
 
-        If your DocList is in doc_vec mode, and you want to access a field of
-        type AnyTensor, the doc_vec tensor will be returned instead of a list:
+        chapters = docs.traverse_flat(access_path='chapters')  # list of 30 Chapter objs
+        ```
 
-        EXAMPLE USAGE
-        .. code-block:: python
-            class Image(BaseDoc):
-                tensor: TorchTensor[3, 224, 224]
+        If your [`DocList`][docarray.typing.DocList] is in doc_vec mode, and you want to access a field of
+        type [`AnyTensor`][docarray.typing.AnyTensor], the doc_vec tensor will be returned instead of a list:
 
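+        ```python
+        import torch
+
+        from docarray import BaseDoc, DocList
+        from docarray.typing import TorchTensor
+
+
+        class Image(BaseDoc):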
+            tensor: TorchTensor[3, 224, 224]
 
-            batch = DocList[Image](
-                [
-                    Image(
-                        tensor=torch.zeros(3, 224, 224),
-                    )
-                    for _ in range(2)
-                ]
-            )
 
-            batch_stacked = batch.stack()
-            tensors = batch_stacked.traverse_flat(
-                access_path='tensor'
-            )  # tensor of shape (2, 3, 224, 224)
+        batch = DocList[Image](
+            [
+                Image(
+                    tensor=torch.zeros(3, 224, 224),
+                )
+                for _ in range(2)
+            ]
+        )
 
+        batch_stacked = batch.stack()
+        tensors = batch_stacked.traverse_flat(
+            access_path='tensor'
+        )  # tensor of shape (2, 3, 224, 224)
+        ```
         """
         ...
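
The `"__"`-separated access path described in the `traverse_flat` docstring above can be illustrated with a small standalone sketch. This is not DocArray's implementation: `traverse_flat_sketch` and the plain dataclasses standing in for documents are hypothetical names, and flattening at every step is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Chapter:
    content: str


@dataclass
class Book:
    title: str
    chapters: List[Chapter]


def traverse_flat_sketch(items: List[Any], access_path: str) -> List[Any]:
    """Follow a "__"-separated attribute path (hypothetical helper, not the library's)."""
    current = list(items)
    for name in access_path.split('__'):
        next_level: List[Any] = []
        for item in current:
            value = getattr(item, name)
            if isinstance(value, list):
                # A nested list is flattened by one level.
                next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


books = [
    Book(title=f'book_{i}', chapters=[Chapter(content='some_content') for _ in range(3)])
    for i in range(10)
]

titles = traverse_flat_sketch(books, 'title')  # 10 strings
chapters = traverse_flat_sketch(books, 'chapters')  # 30 Chapter objects
contents = traverse_flat_sketch(books, 'chapters__content')  # 30 strings
```

Accessing `'chapters'` yields the 30 nested objects themselves; only the longer path `'chapters__content'` reaches the strings inside them.
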
 
@@ -264,7 +263,7 @@ def _flatten_one_level(sequence: List[Any]) -> List[Any]:
 
     def summary(self):
         """
-        Print a summary of this DocList object and a summary of the schema of its
+        Print a summary of this [`DocList`][docarray.typing.DocList] object and a summary of the schema of its
         Document type.
         """
         DocArraySummary(self).summary()
@@ -276,13 +275,13 @@ def _batch(
         show_progress: bool = False,
     ) -> Generator[T, None, None]:
         """
-        Creates a `Generator` that yields `DocList` of size `batch_size`.
+        Creates a `Generator` that yields [`DocList`][docarray.typing.DocList] of size `batch_size`.
         Note that the last batch might be smaller than `batch_size`.
 
         :param batch_size: Size of each generated batch.
         :param shuffle: If set, shuffle the Documents before dividing into minibatches.
         :param show_progress: if set, show a progress bar when batching documents.
-        :yield: a Generator of `DocList`, each in the length of `batch_size`
+        :yield: a Generator of [`DocList`][docarray.typing.DocList], each of length `batch_size`
         """
         from rich.progress import track
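
The batching behaviour described in the `_batch` docstring, including the shorter final batch, can be sketched without DocArray. `batch_sketch` is a hypothetical helper over plain sequences; it omits the progress bar and is only meant to show the slicing logic under those assumptions.

```python
import random
from typing import Generator, List, Sequence, TypeVar

T = TypeVar('T')


def batch_sketch(
    items: Sequence[T], batch_size: int, shuffle: bool = False
) -> Generator[List[T], None, None]:
    """Yield consecutive batches of at most `batch_size` items (hypothetical helper)."""
    indices = list(range(len(items)))
    if shuffle:
        # Shuffle the order once, before dividing into batches.
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [items[i] for i in indices[start : start + batch_size]]


batches = list(batch_sketch(list(range(10)), batch_size=4))
batch_lengths = [len(batch) for batch in batches]
```

With 10 items and `batch_size=4`, the last batch holds only the two remaining items.
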
 

diff --git a/docarray/array/doc_list/doc_list.py b/docarray/array/doc_list/doc_list.py
@@ -68,9 +68,8 @@ class DocList(
     homogeneous and follow the same schema. To specify this schema you can use
     the `DocList[MyDocument]` syntax where MyDocument is a Document class
     (i.e. schema). This creates a DocList that can only contain Documents of
-    the type 'MyDocument'.
+    the type `MyDocument`.
 
-    ---
 
     ```python
     from docarray import BaseDoc, DocList
@@ -86,36 +85,39 @@ class Image(BaseDoc):
     docs = DocList[Image](
         Image(url='http://url.com/foo.png') for _ in range(10)
     )  # noqa: E510
-    ```
 
-    ---
+
+    # If your DocList is homogeneous (i.e. follows the same schema), you can access
+    # fields at the DocList level (for example `docs.tensor` or `docs.url`).
+
+    print(docs.url)
+    # [ImageUrl('http://url.com/foo.png', host_type='domain'), ...]
 
 
-    If your DocList is homogeneous (i.e. follows the same schema), you can access
-    fields at the DocList level (for example `docs.tensor` or `docs.url`).
-    You can also set fields, with `docs.tensor = np.random.random([10, 100])`:
+    # You can also set fields, with `docs.tensor = np.random.random([10, 100])`:
 
-        print(docs.url)
-        # [ImageUrl('http://url.com/foo.png', host_type='domain'), ...]
-        import numpy as np
+    import numpy as np
 
-        docs.tensor = np.random.random([10, 100])
-        print(docs.tensor)
-        # [NdArray([0.11299577, 0.47206767, 0.481723  , 0.34754724, 0.15016037,
-        #          0.88861321, 0.88317666, 0.93845579, 0.60486676, ... ]), ...]
+    docs.tensor = np.random.random([10, 100])
 
-    You can index into a DocList like a numpy doc_list or torch tensor:
+    print(docs.tensor)
+    # [NdArray([0.11299577, 0.47206767, 0.481723  , 0.34754724, 0.15016037,
+    #          0.88861321, 0.88317666, 0.93845579, 0.60486676, ... ]), ...]
 
 
-        docs[0]  # index by position
-        docs[0:5:2]  # index by slice
-        docs[[0, 2, 3]]  # index by list of indices
-        docs[True, False, True, True, ...]  # index by boolean mask
+    # You can index into a DocList like a numpy array or torch tensor:
 
-    You can delete items from a DocList like a Python List
+    docs[0]  # index by position
+    docs[0:5:2]  # index by slice
+    docs[[0, 2, 3]]  # index by list of indices
+    docs[[True, False, True, True, ...]]  # index by boolean mask
 
-        del docs[0]  # remove first element from DocList
-        del docs[0:5]  # remove elements for 0 to 5 from DocList
+
+    # You can delete items from a DocList like a Python list:
+
+    del docs[0]  # remove first element from DocList
+    del docs[0:5]  # remove the elements at indices 0 to 4 from DocList
+    ```
 
     :param docs: iterable of Document
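
The four indexing forms and the slice deletion listed in the docstring can be mimicked on a plain Python list to make their semantics concrete. `select_sketch` is a hypothetical helper written for this illustration and does not reflect how `DocList` implements indexing.

```python
from typing import Any, List, Sequence, Union

Index = Union[int, slice, Sequence[int], Sequence[bool]]


def select_sketch(items: List[Any], index: Index) -> Any:
    """Index by position, slice, list of indices, or boolean mask (hypothetical helper)."""
    if isinstance(index, (int, slice)):
        return items[index]
    index = list(index)
    # Check for booleans first: bool is a subclass of int in Python.
    if index and all(isinstance(i, bool) for i in index):
        return [item for item, keep in zip(items, index) if keep]
    return [items[i] for i in index]


docs = [f'doc_{i}' for i in range(10)]

first = select_sketch(docs, 0)
every_other = select_sketch(docs, slice(0, 5, 2))
picked = select_sketch(docs, [0, 2, 3])
masked = select_sketch(docs, [True, False, True] + [False] * 7)

del docs[0]  # removes 'doc_0'
del docs[0:5]  # removes the next five items (indices 0 to 4)
```

As with any Python slice, `0:5` excludes index 5, so the second `del` removes five elements.
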
 
@@ -135,10 +137,10 @@ def construct(
         docs: Sequence[T_doc],
     ) -> T:
         """
-        Create a DocList without validation any data. The data must come from a
+        Create a `DocList` without validating any data. The data must come from a
         trusted source.
         :param docs: a Sequence (list) of Document with the same schema
-        :return:
+        :return: a `DocList` object
         """
         new_docs = cls.__new__(cls)
         new_docs._data = docs if isinstance(docs, list) else list(docs)
@@ -154,13 +156,13 @@ def __eq__(self, other: Any) -> bool:
 
     def _validate_docs(self, docs: Iterable[T_doc]) -> Iterable[T_doc]:
         """
-        Validate if an Iterable of Document are compatible with this DocList
+        Validate that an Iterable of Documents is compatible with this `DocList`
         """
         for doc in docs:
             yield self._validate_one_doc(doc)
 
     def _validate_one_doc(self, doc: T_doc) -> T_doc:
-        """Validate if a Document is compatible with this DocList"""
+        """Validate if a Document is compatible with this `DocList`"""
         if not issubclass(self.doc_type, AnyDoc) and not isinstance(doc, self.doc_type):
             raise ValueError(f'{doc} is not a {self.doc_type}')
         return doc
@@ -178,25 +180,25 @@ def __bytes__(self) -> bytes:
 
     def append(self, doc: T_doc):
         """
-        Append a Document to the DocList. The Document must be from the same class
-        as the doc_type of this DocList otherwise it will fail.
+        Append a Document to the `DocList`. The Document must be from the same class
+        as the `.doc_type` of this `DocList` otherwise it will fail.
         :param doc: A Document
         """
         self._data.append(self._validate_one_doc(doc))
 
     def extend(self, docs: Iterable[T_doc]):
         """
-        Extend a DocList with an Iterable of Document. The Documents must be from
-        the same class as the doc_type of this DocList otherwise it will
+        Extend a `DocList` with an Iterable of Document. The Documents must be from
+        the same class as the `.doc_type` of this `DocList` otherwise it will
         fail.
         :param docs: Iterable of Documents
         """
         self._data.extend(self._validate_docs(docs))
 
     def insert(self, i: int, doc: T_doc):
         """
-        Insert a Document to the DocList. The Document must be from the same
-        class as the doc_type of this DocList otherwise it will fail.
+        Insert a Document into the `DocList`. The Document must be from the same
+        class as the `.doc_type` of this `DocList` otherwise it will fail.
         :param i: index to insert
         :param doc: A Document
         """
@@ -238,10 +240,10 @@ def _set_data_column(
         field: str,
         values: Union[List, T, 'AbstractTensor'],
     ):
-        """Set all Documents in this DocList using the passed values
+        """Set all Documents in this `DocList` using the passed values
 
         :param field: name of the fields to set
-        :values: the values to set at the DocList level
+        :param values: the values to set at the `DocList` level
         """
         ...
 
@@ -253,11 +255,11 @@ def stack(
         tensor_type: Type['AbstractTensor'] = NdArray,
     ) -> 'DocVec':
         """
-        Convert the DocList into a DocVec. `Self` cannot be used
+        Convert the `DocList` into a `DocVec`. `Self` cannot be used
         afterwards
         :param tensor_type: Tensor Class used to wrap the doc_vec tensors. This is useful
         if the BaseDoc has some undefined tensor type like AnyTensor or Union of NdArray and TorchTensor
-        :return: A DocVec of the same document type as self
+        :return: A `DocVec` of the same document type as self
         """
         from docarray.array.doc_vec.doc_vec import DocVec
 
@@ -291,7 +293,7 @@ def traverse_flat(
     @classmethod
     def from_protobuf(cls: Type[T], pb_msg: 'DocListProto') -> T:
         """create a `DocList` from a protobuf message
-        :param pb_msg: The protobuf message from where to construct the DocList
+        :param pb_msg: The protobuf message from where to construct the `DocList`
         """
         return super().from_protobuf(pb_msg)
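
The `append`, `extend` and `insert` docstrings in this file all state that a Document of the wrong class is rejected. A minimal standalone sketch of that idea follows. `TypedListSketch`, `Image` and `Text` are hypothetical stand-ins, not DocArray classes, and the sketch leaves out the `AnyDoc` exception that appears in the `_validate_one_doc` hunk.

```python
from typing import Any, Iterable, List


class Image:
    def __init__(self, url: str):
        self.url = url


class Text:
    def __init__(self, text: str):
        self.text = text


class TypedListSketch:
    """Hypothetical list that only accepts instances of one document class."""

    def __init__(self, doc_type: type, docs: Iterable[Any] = ()):
        self.doc_type = doc_type
        self._data: List[Any] = [self._validate_one(doc) for doc in docs]

    def _validate_one(self, doc: Any) -> Any:
        # Reject anything that is not an instance of the declared class.
        if not isinstance(doc, self.doc_type):
            raise ValueError(f'{doc!r} is not a {self.doc_type}')
        return doc

    def append(self, doc: Any) -> None:
        self._data.append(self._validate_one(doc))

    def extend(self, docs: Iterable[Any]) -> None:
        # Sketch-only choice: validate everything first, so a failure adds nothing.
        validated = [self._validate_one(doc) for doc in docs]
        self._data.extend(validated)

    def insert(self, i: int, doc: Any) -> None:
        self._data.insert(i, self._validate_one(doc))

    def __len__(self) -> int:
        return len(self._data)


images = TypedListSketch(Image, [Image(url='http://url.com/foo.png')])
images.append(Image(url='http://url.com/bar.png'))

try:
    images.append(Text(text='hello'))
    rejected = False
except ValueError:
    rejected = True
```

The rejected `Text` instance leaves the list at two items, since validation runs before the underlying list is modified.
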