Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3748] feat: add New IODescriptor "TypedJSON" #3749

Closed

Conversation

KimSoungRyoul
Copy link
Contributor

@KimSoungRyoul KimSoungRyoul commented Apr 9, 2023

#3748

todo List after discussion approve?

  • under the JSON IO descriptor. not new IO descriptor.
  • use BentoML openapi.Schema not depend on pydantic
  • import typing_extensions.TypedDict if less than py3.8
  • docs
  • examples
# example how to use 


class IrisFeatures(t.TypedDict, total=False): 
    sepal_len: float
    sepal_width: float
    petal_len: float
    petal_width: float

    # Optional Field
    request_id: t.Optional[int]


@svc.api(
    input=JSON(
        typed_dict=IrisFeatures,
        sample={"aa": 123, "bb": "sdfsdf323sf"},
    ),
    output=JSON.from_sample({"qqqq": "wwww"}),
)
async def predict_json(features: IrisFeatures) -> JSON:
    if features.get("request_id"):
       features["request_id"] = uuid()
    bentoml_logger.info(f"----------------{features}")

    return {"qqqq2222": "wwww2222"}

@KimSoungRyoul KimSoungRyoul requested a review from a team as a code owner April 9, 2023 08:03
@KimSoungRyoul KimSoungRyoul requested review from aarnphm and removed request for a team April 9, 2023 08:03
@parano parano requested a review from bojiang April 10, 2023 19:29
@parano
Copy link
Member

parano commented Apr 10, 2023

Hi @KimSoungRyoul, thanks for suggesting the feature and the PR!

Like @aarnphm suggested in the issue, I'd recommend adding typed_dict option to the JSON IODescriptor, as an alternative to Pydantic for schema validation. The example usage you write in the PR description looks good!

One question I have is - does typed_dict allow the user to define the behavior when an extra field is found in the JSON message? With pydantic, user can do something like this:

       class IrisFeatures(BaseModel):
           sepal_len: float
           sepal_width: float
           petal_len: float
           petal_width: float

           # Optional field
           request_id: Optional[int]

           # Use custom Pydantic config for additional validation options
           class Config:
               extra = 'forbid'

Does the same config field work when we use pydantic.create_model_from_typeddict under the hood?

@KimSoungRyoul
Copy link
Contributor Author

KimSoungRyoul commented Apr 13, 2023

@parano

yes TypedDict can allow and forbid extra field, when use total=False/True option

TypedDict is simply validated by mypy. (not in runtime like a pydantic)

# example service.py


import typing as t


class IrisFeatures(t.TypedDict, total=False): # default is total=True
    sepal_len: float
    sepal_width: float
    petal_len: float
    petal_width: float

    # Optional Field
    request_id: t.Optional[int]


@svc.api(
    input=JSON(
        typed_dict=IrisFeatures,
        sample={"aa": 123, "bb": "sdfsdf323sf"},
    ),
    output=JSON.from_sample({"qqqq": "wwww"}),
)
async def predict_json(features: IrisFeatures) -> JSON:
    # there is no error, when we run 'bentoml serve service:svc' but mypy command raise error
    print(features["not_defined_attr"])  # <-- [mypy](service.py:65: error: TypedDict "IrisFeatures" has no key "not_defined_attr"  [typeddict-item])
    
    
    # "mypy service.py" raise error, when IrisFeatures don't have option "total=True"
    typed_dict_features: IrisFeatures =  { # < -- error: Missing key "request_id" for TypedDict "IrisFeatures"  [typeddict-item]
        "sepal_len": 3.0,
        "sepal_width": 2.0,
        "petal_len": 1.0,
        "petal_width": 4.0,
        # "request_id": "~~~"
    }

    print(typed_dict_features)

    bentoml_logger.info(f"----------------{features}")

    return {"qqqq2222": "wwww2222"}

unlike pydantic, typedDict only check during the typed checking phase
At runtime, it is the same as a dict in python (and cannot even be checked when crate new instance(dict) : AFAIK)

@KimSoungRyoul
Copy link
Contributor Author

KimSoungRyoul commented Apr 13, 2023

CI / python3.7_http_e2e_tests (windows-latest) (pull_request) is failed caused by 422 (too many request)

how can I rerun ?

++

python/cpython#83015

TypedDict has issue with Optional

due to this issue, pydantic has imposed a restriction, if we don't impl our own typeddict_to_schema() function inside bentoml (instead of pydantic.create_model_from_typeddict()),
bentoml users will get an error when using TypedDict in python versions between py3.8 to py3.9.2.

# save to use TypedDict 
if sys.version_info >= (3, 9, 2):
    from typing import TypedDict, Optional 
else:
    from typing_extensions import TypedDict

# but
#  from typing_extensions import TypedDict <-- if use these import in py3.8.x  
#   File "pydantic/annotated_types.py", line 36, in pydantic.annotated_types.create_model_from_typeddict
# TypeError: You should use `typing_extensions.TypedDict` instead of `typing.TypedDict` with Python < 3.9.2. Without it, # there is no way to differentiate required and optional fields when subclassed.


class IrisFeatures(TypedDict):
    sepal_len: float
    sepal_width: float
    petal_len: float
    petal_width: float

   # Optional Field
    request_id: Optional[int]

@svc.api(
    input=JSON(
        typed_dict=IrisFeatures,
       # sample={"aa": 123, "bb": "sdfsdf323sf"},
    ),
    output=JSON.from_sample({"qqqq": "wwww"}),
)
async def predict_json(features: IrisFeatures) -> JSON:

therefore second todoList in PR Description might be implemented .... 😢

  • use BentoML openapi.Schema not depend on pydantic

Copy link
Member

@aarnphm aarnphm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR I think we should just wait until 3.7 reaches EOL. cc @parano

@@ -180,6 +186,8 @@ def __init__(
self,
*,
pydantic_model: t.Type[pydantic.BaseModel] | None = None,
typed_dict: t.Type[TypedDict] | None = None,
sample: JSONType | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not put sample here since there is no need to.

Suggested change
sample: JSONType | None = None,

Comment on lines +215 to +218
if typed_dict:
from pydantic import create_model_from_typeddict

self._typed_dict_model = create_model_from_typeddict(typed_dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean typed_dict depends on pydantic? it seems kinda strange that in order to use TypedDict you need to have pydantic installed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case then typed_dict and pydantic_model should be mutually exclusive then.

Might also need a import check or just lazy load it

Comment on lines +9 to +12
if sys.version_info >= (3, 9, 2):
from typing import TypedDict
else:
from typing_extensions import TypedDict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 3.7 will reach EOL in 2 months, it is probably best not to do this and just use t.TypedDict instead when BentoML drops 3.7 support

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this is checking for 3.9.2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, read the above comment. Looks like this will have to stay in even if we drop 3.7 support? Definitely not a reason to delay this PR, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also need to maintain typing_extensions in pyproject.toml

@parano
Copy link
Member

parano commented Apr 17, 2023

CI / python3.7_http_e2e_tests (windows-latest) (pull_request) is failed caused by 422 (too many request)

how can I rerun ?

Just triggered a rerun on the failed e2e test

@KimSoungRyoul
Copy link
Contributor Author

KimSoungRyoul commented Apr 25, 2023

Hi I've been busy at work lately and haven't been paying attention.
I feel positive about reflecting this feature after dropping python 3.7.
Also, separately, the current PR should be worked on to change to typedDict -> OAS3.0 Schema without the dependency on pydantic.
but I have a lot on my plate these days 😭 , so I'll close this PR temporarily and ask review again when the work is done.

P.S. TypedDict appears to have an unstable internal implementation between 3.8 and 3.11 in cpython history,
so we (bentoml) recommend using at typed_extensions in 3.11 and below to avoid having to deal with unstable internal implementation on a case-by-case basis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants