[REFACTOR] type annotation -> CocoIndex type encoding logic in Python SDK should return strong-typed schema class #1083

# Some Background

Regarding data types and schemas, there are multiple forms:

1. Python native type annotations, e.g. `int`, `dict[str, Any]`, a specific data class. They're used directly in users' code as type hints.
2. [`AnalyzedTypeInfo`](https://github.com/cocoindex-io/cocoindex/blob/d064c52ce8f3b5cf93d21f5a6213d09c8b0ee18f/python/cocoindex/typing.py#L237-L249): basically a more structured representation of 1, used only internally by our Python SDK.
3. Strong-typed schema representation in CocoIndex's type system, i.e. [these classes](https://github.com/cocoindex-io/cocoindex/blob/d064c52ce8f3b5cf93d21f5a6213d09c8b0ee18f/python/cocoindex/typing.py#L514-L693). They mirror the engine's data schema representation. They're exposed to some third-party APIs, e.g. custom targets (custom target connectors can inspect the schema of the data exported to them), and, in the future, custom functions / sources as well.
4. Generic-typed JSON-equivalent values, in types such as `dict[str, Any]` (for a JSON object), `list[dict[str, Any]]` (for a JSON array), `str` (for a JSON string), etc. They can be passed directly to and from the Rust engine.
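To make forms 1, 3 and 4 concrete, here is a minimal sketch for a single dataclass. The schema classes below (`BasicValueType`, `FieldSchema`, `StructType`) and the dictionary layout produced by `encode()` are hypothetical simplifications written for this illustration; the real classes linked above carry more information and may use different names and keys. Form 2 is left out because it is internal to the SDK.

```python
from dataclasses import dataclass
from typing import Any


# Form 1: a Python native type annotation, as written in user code.
@dataclass
class Person:
    name: str
    age: int


# Form 3: hypothetical stand-ins for strong-typed schema classes
# (not the actual cocoindex classes).
@dataclass
class BasicValueType:
    kind: str  # e.g. "Str", "Int64"

    def encode(self) -> dict[str, Any]:
        return {"kind": self.kind}


@dataclass
class FieldSchema:
    name: str
    value_type: BasicValueType

    def encode(self) -> dict[str, Any]:
        return {"name": self.name, "type": self.value_type.encode()}


@dataclass
class StructType:
    fields: list[FieldSchema]

    def encode(self) -> dict[str, Any]:
        return {"kind": "Struct", "fields": [f.encode() for f in self.fields]}


# A form-3 schema corresponding to Person, built by hand here.
person_schema = StructType(
    fields=[
        FieldSchema("name", BasicValueType("Str")),
        FieldSchema("age", BasicValueType("Int64")),
    ]
)

# Form 4: a generic JSON-equivalent value (plain dicts, lists, strings).
encoded = person_schema.encode()
```

Form 3 is checked attribute by attribute (and by mypy), while form 4 is only nested dictionaries whose keys nothing verifies until they reach the engine.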

# Task

We have logic to convert Python's native type annotations to engine types. Currently we're doing 1->2->4 ([code](https://github.com/cocoindex-io/cocoindex/blob/d064c52ce8f3b5cf93d21f5a6213d09c8b0ee18f/python/cocoindex/typing.py#L357-L502)), because 3 was introduced only recently.

We want to:
- Change the logic of 2->4 to 2->3, i.e. convert `AnalyzedTypeInfo` to the strong-typed schema representation first. This will make our code easier to read and maintain (3 is easier to build than 4, and can leverage mypy type checks etc.).
- Once we have 3, existing callers can simply call its `encode()` method to get 4, so we don't need to expose convenience methods in the `typing` package that return 4 directly.
- Tests in [`test_typing.py`](https://github.com/cocoindex-io/cocoindex/blob/main/python/cocoindex/tests/test_typing.py) should be updated accordingly, to check the output of 3 instead of 4 (3 is more structured than 4, and easier to check).
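A minimal sketch of the proposed 1->2->3 flow, assuming heavily simplified stand-ins: `AnalyzedInfo`, `analyze()`, `to_schema()` and the schema classes are hypothetical and only loosely modeled on the linked code in `typing.py`. Only `str`, `int`, `float`, `bool` and dataclasses are handled. The point is the shape of the refactor: the 2->3 step builds typed objects, tests compare those objects directly, and callers that need form 4 call `encode()`.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Optional, Union


# Hypothetical, simplified stand-ins; the real AnalyzedTypeInfo and schema
# classes in cocoindex/typing.py carry more information.
@dataclass
class AnalyzedInfo:
    """Form 2: a structured view of a Python type annotation."""
    kind: str  # "Str", "Int64", "Float64", "Bool" or "Struct"
    struct_type: Optional[type] = None


@dataclass
class BasicValueType:
    kind: str

    def encode(self) -> dict[str, Any]:
        return {"kind": self.kind}


@dataclass
class FieldSchema:
    name: str
    value_type: "ValueType"

    def encode(self) -> dict[str, Any]:
        return {"name": self.name, "type": self.value_type.encode()}


@dataclass
class StructType:
    fields: list[FieldSchema]

    def encode(self) -> dict[str, Any]:
        return {"kind": "Struct", "fields": [f.encode() for f in self.fields]}


ValueType = Union[BasicValueType, StructType]

_BASIC_KINDS: dict[type, str] = {str: "Str", int: "Int64", float: "Float64", bool: "Bool"}


def analyze(t: Any) -> AnalyzedInfo:
    """Step 1 -> 2 (simplified): inspect a Python type annotation."""
    if dataclasses.is_dataclass(t):
        return AnalyzedInfo(kind="Struct", struct_type=t)
    return AnalyzedInfo(kind=_BASIC_KINDS[t])


def to_schema(info: AnalyzedInfo) -> ValueType:
    """Step 2 -> 3: build strong-typed schema objects instead of raw dicts."""
    if info.kind == "Struct":
        assert info.struct_type is not None
        return StructType(
            fields=[
                FieldSchema(f.name, to_schema(analyze(f.type)))
                for f in dataclasses.fields(info.struct_type)
            ]
        )
    return BasicValueType(info.kind)


@dataclass
class Person:
    name: str
    age: int


schema = to_schema(analyze(Person))

# Tests check form 3 directly, comparing structured objects ...
assert schema == StructType(
    fields=[
        FieldSchema("name", BasicValueType("Str")),
        FieldSchema("age", BasicValueType("Int64")),
    ]
)

# ... and callers that need form 4 just call encode().
engine_value = schema.encode()
```

Because the schema classes are dataclasses, a failing test compares structured objects field by field instead of nested dictionaries, and mypy can check that `to_schema()` only returns one of the schema types.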