chore(docs): support pydantic data model
hanxiao committed Jan 14, 2022
1 parent abb332b commit b3debde
Showing 5 changed files with 21 additions and 3 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -14,13 +14,13 @@

DocArray is a library for nested, unstructured data such as text, image, audio, video, or 3D mesh. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the data with a Pythonic API.

🌌 **All data types**: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.
🌌 **Rich data types**: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.

🐍 **Pythonic experience**: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotation simplify the code you write.

🧑‍🔬 **Data science powerhouse**: greatly accelerate data scientists' work on embedding, matching, visualizing, evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 **Portable**: ready-to-wire at anytime with efficient and compact serialization from/to Protobuf, bytes, base64, JSON, CSV, DataFrame.
🚡 **Portable**: ready-to-wire at any time with fast and compressed serialization from/to Protobuf, bytes, base64, JSON, CSV, DataFrame. Built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

<!-- end elevator-pitch -->

6 changes: 6 additions & 0 deletions docs/fundamentals/document/serialization.md
@@ -10,10 +10,16 @@ One should use {ref}`DocumentArray for serializing multiple Documents<docarray-s

## From/to JSON

```{tip}
If you are building a web service and want to pass DocArray objects as JSON, data validation and field filtering can be crucial. In that case, it is highly recommended to check out {ref}`fastapi-support` and follow the methods there.
```

```{important}
This feature requires the `protobuf` dependency. Run `pip install "docarray[full]"` to install it.
```



You can serialize a Document as a JSON string via {meth}`~docarray.document.mixins.porting.PortingMixin.to_json`, and then read from it via {meth}`~docarray.document.mixins.porting.PortingMixin.from_json`.

```python
10 changes: 9 additions & 1 deletion docs/fundamentals/documentarray/serialization.md
@@ -3,7 +3,8 @@

DocArray is designed to be "ready-to-wire" at any time, so serialization matters. DocumentArray provides multiple serialization methods that let you transfer a DocumentArray object over the network and across different microservices:

- JSON string: `.from_json()`/`.to_json()`
- JSON string: `.from_json()`/`.to_json()`
- Pydantic model: `.from_pydantic_model()`/`.to_pydantic_model()`
- Bytes (compressed): `.from_bytes()`/`.to_bytes()`
- Base64 (compressed): `.from_base64()`/`.to_base64()`
- Protobuf Message: `.from_protobuf()`/`.to_protobuf()`
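
To make "compressed bytes" and "compressed Base64" concrete, here is a minimal standard-library sketch of the general idea: serialize, compress, then optionally wrap the bytes in Base64 so they survive text-only channels. It does not use DocArray, and the JSON payload and `zlib` compression are assumptions chosen for illustration; the actual wire format and compression used by `.to_bytes()`/`.to_base64()` may differ.

```python
import base64
import json
import zlib

# Plain dicts stand in for Documents here (an assumption for illustration).
docs = [{"text": "hello"}, {"text": "world"}]

# Serialize to bytes, then compress.
raw = json.dumps(docs).encode("utf-8")
packed = zlib.compress(raw)

# Base64 turns the compressed bytes into an ASCII-safe string.
as_b64 = base64.b64encode(packed).decode("ascii")

# The receiving side reverses each step in the opposite order.
restored = json.loads(zlib.decompress(base64.b64decode(as_b64)).decode("utf-8"))
```

The Base64 form is larger than the raw compressed bytes, so it mainly makes sense when the transport only accepts text.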
@@ -13,10 +14,17 @@ DocArray is designed to be "ready-to-wire" at anytime. Serialization is importan

## From/to JSON


```{tip}
If you are building a web service and want to pass DocArray objects as JSON, data validation and field filtering can be crucial. In that case, it is highly recommended to check out {ref}`fastapi-support` and follow the methods there.
```

```{important}
This feature requires the `protobuf` dependency. Run `pip install "docarray[full]"` to install it.
```



```python
from docarray import DocumentArray, Document

1 change: 1 addition & 0 deletions docs/fundamentals/fastapi-support/index.md
@@ -1,3 +1,4 @@
(fastapi-support)=
# FastAPI/pydantic Support

Long story short, DocArray supports [pydantic data models](https://pydantic-docs.helpmanual.io/) via {class}`~docarray.document.pydantic_model.PydanticDocument` and {class}`~docarray.document.pydantic_model.PydanticDocumentArray`.
3 changes: 3 additions & 0 deletions docs/get-started/what-is.md
@@ -37,6 +37,7 @@ DocArray is designed to maximize the local experience, with the requirement of c
| Nested data ||||||
| Mixed data of the above four ||||||
| Easy to (de)serialize ||||||
| Data validation (of the output) ||||||
| Pythonic experience ||||✔️️||
| IO support for filetypes ||||||
| Deep learning framework support ||||||
@@ -118,6 +119,8 @@ Beside code refactoring and optimization, many features have been improved, incl
- revised documentation and examples
- ... and many more.

When first using DocArray, some Jina 2.x users may notice that static typing seems to be missing. This is a deliberate design decision: DocArray guarantees the types and constraints of the wire data, not the input data. In other words, only the functions listed in the {ref}`docarray-serialization` chapter trigger data validation.
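
The difference between validating input data and validating wire data can be sketched with the standard library alone. The `SketchDoc` class below is hypothetical and is not DocArray's `Document`; its fields and checks are assumptions chosen to show the pattern: construction accepts any value, and type checks run only when the object is serialized.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class SketchDoc:
    """Hypothetical stand-in for a document; not DocArray's Document class."""

    text: Any = None
    weight: Any = None

    def validate(self) -> None:
        # Types are checked only on request, right before data goes on the wire.
        if self.text is not None and not isinstance(self.text, str):
            raise TypeError(f"text must be str, got {type(self.text).__name__}")
        if self.weight is not None and not isinstance(self.weight, (int, float)):
            raise TypeError(f"weight must be a number, got {type(self.weight).__name__}")

    def to_json(self) -> str:
        # Serialization is the only place where validation is triggered.
        self.validate()
        return json.dumps(asdict(self))


# Construction accepts anything: the input data is not validated.
good = SketchDoc(text="hello", weight=1.0)
bad = SketchDoc(text=123, weight="heavy")

wire = good.to_json()  # passes validation and yields a JSON string

try:
    bad.to_json()
    rejected = False
except TypeError:
    rejected = True  # the invalid object is caught only at serialization time
```

The trade-off in this pattern is that mistakes surface later, at the serialization boundary, in exchange for cheap and permissive object construction.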

To learn DocArray, the recommendation is to forget everything about Jina 2.x, even though some interfaces may look familiar. Read [the fundamental sections](../fundamentals/document/index.md) from the beginning.

```{important}