Labels
good first issue, hacktoberfest, help wanted, python-sdk
Description
Some Background
Regarding data types / schemas, there are multiple forms:
1. Python native type annotations, e.g. `int`, `dict[str, Any]`, or a specific data class. They're used directly in user code as type hints.
2. `AnalyzedTypeInfo`: basically a more structured representation of 1. Used only internally by our Python SDK.
3. Strongly typed schema representation in CocoIndex's type system (these classes), mirroring the engine's data schema representation. They're exposed to some third-party APIs, e.g. custom targets (custom target connectors can inspect the schema of the data being exported to them), and also custom functions / sources in the future.
4. Generically typed JSON-equivalent values, in types such as `dict[str, Any]` (for a JSON object), `list[dict[str, Any]]` (for a JSON array), `str` (for a JSON string), etc. They can be passed directly to and from the engine in Rust.
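To make the difference between forms 1, 2 and 4 concrete, here is a minimal toy sketch. `ToyAnalyzedType`, `toy_analyze`, the kind names and the JSON keys are all hypothetical and chosen for illustration only; they are not the SDK's actual `AnalyzedTypeInfo` or the engine's real wire format.

```python
from dataclasses import dataclass
from typing import Any, get_args, get_origin


@dataclass(frozen=True)
class ToyAnalyzedType:
    """Hypothetical stand-in for form 2; NOT the SDK's AnalyzedTypeInfo."""
    kind: str                                # e.g. "Int64", "Str", "List"
    elem: "ToyAnalyzedType | None" = None    # element type for lists


# Assumed mapping from Python scalar types to kind names (illustration only).
_SCALARS = {int: "Int64", float: "Float64", str: "Str", bool: "Bool"}


def toy_analyze(annotation: Any) -> ToyAnalyzedType:
    """Form 1 -> form 2: turn a native annotation into a structured record."""
    if annotation in _SCALARS:
        return ToyAnalyzedType(kind=_SCALARS[annotation])
    if get_origin(annotation) is list:
        (elem,) = get_args(annotation)
        return ToyAnalyzedType(kind="List", elem=toy_analyze(elem))
    raise TypeError(f"unsupported annotation: {annotation!r}")


# Form 1 is the annotation itself; form 2 is the analyzed record.
analyzed = toy_analyze(list[int])

# Form 4: the same information as a generic JSON-equivalent value.
# Nothing type-checks the keys or the nesting here.
as_json: dict[str, Any] = {"kind": "List", "element_type": {"kind": "Int64"}}
```

The JSON-equivalent value at the end is an untyped `dict`, so a misspelled key would only surface at runtime; that is the gap form 3 is meant to close.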
Task
We have logic to convert Python's native type annotation to engine type. Currently we're doing 1->2->4 (code), because 3 was just introduced recently.
We want to:
- Change the logic of 2->4 to 2->3, i.e. convert `AnalyzedTypeInfo` to the strongly typed schema representation first. This will make our code easier to read and maintain (3 is easier to build than 4, and can leverage mypy type checks etc.).
- Once we have 3, existing callers can simply call the `encode()` method to get 4, so we don't have to expose convenience methods in the `typing` package that return 4 directly.
- Tests in `test_typing.py` should be updated accordingly, to check the output of 3 instead of 4 (3 is more structured than 4, and easier to check).
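The intended shape of the change could look roughly like the sketch below. Everything except the idea of an `encode()` method is an assumption: `ToyAnalyzed`, `ToySchema`, `toy_to_schema` and the JSON keys are hypothetical toy names, not the real classes or functions in the `typing` package.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyAnalyzed:
    """Hypothetical form 2 (stand-in for AnalyzedTypeInfo)."""
    kind: str
    elem: "ToyAnalyzed | None" = None


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical form 3: a strongly typed schema node."""
    kind: str
    element_type: "ToySchema | None" = None

    def encode(self) -> dict[str, Any]:
        """Form 3 -> form 4: produce the JSON-equivalent value."""
        out: dict[str, Any] = {"kind": self.kind}
        if self.element_type is not None:
            out["element_type"] = self.element_type.encode()
        return out


def toy_to_schema(info: ToyAnalyzed) -> ToySchema:
    """Form 2 -> form 3: build typed schema objects, not raw dicts."""
    elem = toy_to_schema(info.elem) if info.elem is not None else None
    return ToySchema(kind=info.kind, element_type=elem)


# A test in the new style compares structured objects (form 3) ...
schema = toy_to_schema(ToyAnalyzed("List", ToyAnalyzed("Int64")))
assert schema == ToySchema("List", ToySchema("Int64"))

# ... and callers that still need form 4 call encode() themselves.
encoded = schema.encode()
```

Because the conversion returns typed objects, a type checker such as mypy can flag a wrong field name or a missing element type before the tests run, and the test assertions compare dataclasses instead of nested dictionaries.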