
support json serialize all kind's of huggingface pipelines inputs/outputs #692

Merged: 3 commits from bugfix-numpyencoder, Dec 9, 2022

Conversation

@pepesi (Contributor) commented Aug 17, 2022

The huggingface_runtime output JSON serializer does not support NumPy basic datatypes when they appear as dict values.
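A minimal sketch of the failure (the payload is illustrative, not an actual pipeline output):

import json
import numpy as np

# Hugging Face pipelines often nest NumPy scalars inside dicts.
output = {"label": "POSITIVE", "score": np.float32(0.99)}

try:
    json.dumps(output)
except TypeError as e:
    # Prints: Object of type float32 is not JSON serializable
    print(e)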

@adriangonz (Contributor) left a comment

Hey @pepesi ,

Nice spot! Thanks a lot for contributing this one.

Changes look good to me! 👍

Before merging though, would you be able to provide a test case that covers the issue that this PR fixes?

@pepesi pepesi marked this pull request as draft August 17, 2022 09:14
@pepesi pepesi marked this pull request as ready for review August 17, 2022 09:51
@pepesi (Contributor, Author) commented Aug 17, 2022

I fixed the lint error and added some tests for the NumpyEncoder, but I can't run the tests successfully in my local environment because of the error ImportError: cannot import name 'deepspeed_reinit' from 'transformers.deepspeed'. Can I run the tests on GitHub Actions?

@pepesi (Contributor, Author) commented Aug 17, 2022

Tests passed on my desktop.

@pepesi (Contributor, Author) commented Aug 18, 2022

After testing more Hugging Face models, I think NumpyEncoder should be renamed to HuggingfaceOutputJSONEncoder, because Hugging Face pipeline outputs contain not only NumPy datatypes but also Pillow's Image. Before running more tests, I'm not sure how many Python types can appear in a pipeline's outputs.
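For example, a quick sketch with an image-producing pipeline (the exact output schema depends on the transformers version, so treat the shape in the comment as an approximation):

from transformers.pipelines import pipeline
from PIL import Image

p = pipeline("image-segmentation")
out = p("fixtures/dogs.jpg")
# out looks roughly like [{"score": ..., "label": "dog", "mask": <PIL.Image.Image>}],
# and json.dumps(out) fails on the PIL image just as it does on NumPy scalars.
assert isinstance(out[0]["mask"], Image.Image)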

@pepesi pepesi marked this pull request as draft August 18, 2022 09:54
@pepesi pepesi changed the title from "fix NumpyEncoder not supported types" to "support json serialize all kind's of huggingface pipelines outputs" Aug 24, 2022
@pepesi (Contributor, Author) commented Aug 24, 2022

A small script that passes locally; things seem OK now. I won't add it to the tests, though, because I think it's too heavy:

from transformers.pipelines import pipeline
from transformers import Conversation
import json
import numpy as np
from PIL import Image
import io
import base64


class CommonJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # NumPy arrays become nested lists.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # NumPy scalar floats/ints become plain Python numbers.
        if isinstance(obj, (np.float_, np.float16, np.float32, np.float64)):
            return float(str(obj))
        if isinstance(obj, (np.int_, np.int8, np.int16, np.int32, np.int64)):
            return int(obj)
        # PIL images become base64-encoded PNG strings.
        if isinstance(obj, Image.Image):
            buf = io.BytesIO()
            obj.save(buf, format="PNG")
            return base64.b64encode(buf.getvalue()).decode()
        # Conversation objects become plain dicts.
        if isinstance(obj, Conversation):
            return {
                'uuid': str(obj.uuid),
                'past_user_inputs': obj.past_user_inputs,
                'generated_responses': obj.generated_responses,
                'new_user_input': obj.new_user_input
            }
        return json.JSONEncoder.default(self, obj)


summary_text = "The tower is 324 metres (1,063 ft) tall, about\
the same height as an 81-storey building, and the tallest structure \
in Paris. Its base is square, measuring 125 metres (410 ft) on each side.\
During its construction, the Eiffel Tower surpassed the\
Washington Monument to become the tallest man-made structure\
in the world, a title it held for 41 years until the Chrysler\
Building in New York City was finished in 1930. It was the\
first structure to reach a height of 300 metres. Due to the\
addition of a broadcasting aerial at the top of the tower in\
1957, it is now taller than the Chrysler Building by 5.2 metres\
(17 ft). Excluding transmitters, the Eiffel Tower is the second\
tallest free-standing structure in France after the Millau Viaduct."


ALL_TASKS_TESTS = {
    "audio-classification": [
        {"args": (), "kwargs": {"inputs": "fixtures/audio.mp3"}}
    ],
    "automatic-speech-recognition": [
        {"args": (), "kwargs": {"inputs": "fixtures/audio.mp3"}}
    ],
    "feature-extraction": [
        {
            "args": (),
            "kwargs": {
                "inputs": "this pure text",
            },
        }
    ],
    "text-classification": [
        {
            "args": (
                "A greeting brings the care of the heart,\
        a blessing brings the care of the body, and a short\
        message brings the love. May you be lucky and happy,\
        and your life is sweet as honey, and your career is \
        successful and official!",
            ),
            "kwargs": {},
        }
    ],
    "token-classification": [
        {
            "args": (),
            "kwargs": {"inputs": "Hello I'm Omar and I live in Zürich."}
        }
    ],
    "question-answering": [
        {
            "args": (),
            "kwargs": {
                "question": "what's her job?",
                "context": "Her name is Lily, she is a singer",
            },
        }
    ],
    "table-question-answering": [
        {
            "args": (),
            "kwargs": {
                "table": {
                    "actors": [
                        "brad pitt",
                        "leonardo di caprio",
                        "george clooney"
                    ],
                    "age": ["56", "45", "59"],
                    "number of movies": ["87", "53", "69"],
                    "date of birth": [
                        "7 february 1967",
                        "10 june 1996",
                        "28 november 1967",
                    ],
                },
                "query": "when brad pitt born?",
            },
        }
    ],
    "visual-question-answering": [
        {
            "args": (),
            "kwargs": {
                "image": "fixtures/dogs.jpg",
                "question": "how many dogs here?"
            },
        }
    ],
    "fill-mask": [
        {
            "args": (),
            "kwargs": {
                "inputs": "i am come from <mask>",
            },
        }
    ],
    "summarization": [{"args": (sumarytext,), "kwargs": {}}],
    "translation_en_to_de": [
        {"args": ("My name is Sarah and I live in London",), "kwargs": {}}
    ],
    "text2text-generation": [
        {
            "args": (
                "question: What is 42 ? context: 42 is the answer to life,\
            the universe and everything",
            ),
            "kwargs": {},
        },
        {
            "args": ("translate from English to French: I'm very happy",),
            "kwargs": {}
        },
    ],
    "text-generation": [
        {"args": ("My name is Tylor Swift and I am",), "kwargs": {}}
    ],
    "zero-shot-classification": [
        {
            "args": (),
            "kwargs": {
                "sequences": "Last week I upgraded my iOS version and ever\
            since then my phone has been overheating whenever I\
            use your app.",
                "candidate_labels": [
                    "mobile",
                    "website",
                    "billing",
                    "account",
                    "access",
                ],
                "multi_class": False,
            },
        }
    ],
    "zero-shot-image-classification": [
        {
            "args": (),
            "kwargs": {
                "images": "fixtures/dogs.jpg",
                "candidate_labels": ["dogs", "cats", "tigers"],
            },
        }
    ],
    "conversational": [
        {
            "args": (),
            "kwargs": {
                "conversations": [
                    Conversation("Hello"),
                    Conversation("how are you"),
                    Conversation("where are you from"),
                ],
            },
        }
    ],
    "image-classification": [
        {
            "args": (),
            "kwargs": {
                "images": "fixtures/dogs.jpg",
            },
        }
    ],
    "image-segmentation": [
        {
            "args": (),
            "kwargs": {
                "inputs": "fixtures/dogs.jpg",
            },
        }
    ],
    "object-detection": [
        {
            "args": (),
            "kwargs": {
                "inputs": "fixtures/dogs.jpg",
            },
        }
    ],
}


all_types = {}  # every leaf type observed across all pipeline outputs
outs = []  # raw outputs, serialized in one go at the end


def inspect_types(output):
    # Recursively walk a pipeline output and record every leaf type seen.
    if isinstance(output, dict):
        for k, v in output.items():
            inspect_types(k)
            inspect_types(v)
    elif isinstance(output, (list, tuple)):
        for el in output:
            inspect_types(el)
    else:
        all_types[type(output)] = 1


# Run each task's pipeline once per argument set and record the output types.
for task, argslist in ALL_TASKS_TESTS.items():
    print("-" * 80)
    print(task)
    p = pipeline(task)
    for arg in argslist:
        output = p(*arg["args"], **arg["kwargs"])
        inspect_types(output)
        outs.append(output)


print("all kinds of datatypes:")
print(list(all_types.keys()))


print(json.dumps(outs, cls=CommonJSONEncoder))

@pepesi pepesi marked this pull request as ready for review August 24, 2022 02:14
@adriangonz (Contributor) left a comment

Hey @pepesi ,

Massive thanks for the effort you're putting in to make the encoder for HuggingFace models fully complete! It's certainly not an easy task! 🚀 💪

Review thread on runtimes/huggingface/mlserver_huggingface/common.py (outdated, resolved)
@pepesi (Contributor, Author) commented Aug 25, 2022

timm is a requirement for some vision-related Hugging Face tasks, and Pillow is one of its dependencies.

ffmpeg is a system requirement for some audio-related Hugging Face tasks. I found there is no place to install a single apt package for a specific runtime, so I added it to MLServer's base image.

torch-scatter is needed for the table-question-answering task.

@pepesi pepesi marked this pull request as draft August 29, 2022 02:14
@pepesi (Contributor, Author) commented Aug 29, 2022

WIP

@pepesi pepesi marked this pull request as ready for review October 10, 2022 01:43
@pepesi pepesi changed the title from "support json serialize all kind's of huggingface pipelines outputs" to "support json serialize all kind's of huggingface pipelines inputs/outputs" Oct 11, 2022
@pepesi pepesi force-pushed the bugfix-numpyencoder branch 2 times, most recently from 84f6be5 to a3176e7 Compare October 14, 2022 01:40
@pepesi (Contributor, Author) commented Oct 14, 2022

What changes in this PR

I'm using Seldon and MLServer in a project aimed at quickly deploying and trying out any Hugging Face model. When I tested some vision-related models, I got JSON serialization errors like #656. At first I just wanted to fix the JSON serialization error, but the further I got, the more trouble I ran into with inputs.

Changes:

  1. Add an is_single field in parameters to support decoding a request input as a single value; this field currently affects StringCodec, NumpyCodec, and Base64Codec (see the sketch below).
  2. Add HuggingfaceRuntimeCodec, which can automatically decode the inputs as pipeline args.
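For illustration, a request using the proposed parameter might look like the following (a hypothetical payload; the content_type values and the placement of is_single are assumptions based on this discussion, not the merged implementation):

# Hypothetical V2 inference request: without "is_single", codecs such as
# NumpyCodec would decode each input as a list rather than a single value.
inference_request = {
    "inputs": [
        {
            "name": "image",
            "shape": [1],
            "datatype": "BYTES",
            "parameters": {"content_type": "pil_image", "is_single": True},
            "data": ["<base64-encoded PNG bytes>"],
        },
        {
            "name": "question",
            "shape": [1],
            "datatype": "BYTES",
            "parameters": {"content_type": "str", "is_single": True},
            "data": ["how many dogs here?"],
        },
    ]
}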

Possible incompatibility:
NumpyCodec previously always converted the inputs to a single np.ndarray; now, if no is_single parameter is provided, NumpyCodec decodes the inputs as a list.

One possible remedy: change the is_single default value to None, and change NumpyCodec's default behavior so that is_single=None is treated as True; but that may be confusing.

@adriangonz any advice here?

@adriangonz (Contributor) commented
Hey @pepesi ,

Thanks a lot for the time you've put into this PR; I know it hasn't been an easy journey!

It has grown quite massively, so it will take us a bit of time to review, particularly considering that there are breaking changes to some of the existing functionality. We will do our best though!

Dockerfile (outdated)

@@ -25,7 +25,7 @@ ENV MLSERVER_MODELS_DIR=/mnt/models \
 RUN apt-get update && \
     apt-get -y --no-install-recommends install \
         unattended-upgrades \
-        libgomp1 libgl1-mesa-dev libglib2.0-0 build-essential && \
+        libgomp1 libgl1-mesa-dev libglib2.0-0 build-essential ffmpeg && \
@adriangonz commented:
We have now moved to a different base image for Docker, but it should already install ffmpeg (although in a different way). Therefore, feel free to pick up master's version of the Dockerfile when rebasing 👍

Comment on lines 95 to 96:

if request_input.parameters is not None:
    new_input._parameters.update(request_input.parameters.dict())
@RafalSkolasinski (Contributor) commented:
Wondering about this change and the exact motivation behind it. It is true that there is currently no option to set parameters on the inputs and the request as a whole, but I am worried that modifying tritonclient's internal parameters (as indicated by the leading underscore) could lead to unexpected breaking changes in the future.

@adriangonz (Contributor) left a comment

Hey @pepesi ,

I've finally found the time to make a first pass at this one.

Before anything else, I have to say this PR is really impressive! The attention to detail shown, and the thoroughness, is incredible! Massive thanks for spending the time on these changes! 🚀

I've added a couple of comments below. It would be great to hear your thoughts on those. Besides that, I'm not 100% convinced about the introduction of the is_single parameter and wanted to hear your reasoning for why it is required.

My view is that codecs should always operate on lists of things (i.e. batches), and that it's then up to the runtime to deal with these. This simplifies other features like batching, and it also simplifies the code, which doesn't need to deal with multiple types (i.e. type T or a list of type T).

Review thread on mlserver/codecs/numpy.py (outdated, resolved)
Review thread on mlserver/codecs/utils.py (outdated, resolved)
Comment on lines 63 to 69:

if JSONCodec.can_encode(data):
    return JSONCodec
if ImageCodec.can_encode(data):
    return ImageCodec
if ConversationCodec.can_encode(data):
    return ConversationCodec
return find_input_codec_by_payload(data)
@adriangonz commented:
If we decorate them with @register_input_codec or @register_request_codec, we should then be able to find them with find_input_codec(payload=data) (as in, they'll go into the general codec registry, which is not a bad thing).

@pepesi (author) replied:
Some data may be encodable by multiple codecs, but there is no priority between codecs; mlserver.codecs._find_codec_by_payload just returns the first codec that matches. I want to look up codecs in priority order, and that's the reason.

If my understanding of the encoder registry is wrong, please let me know.

@adriangonz replied:
Oh, I see. OK, that makes sense. In that case, it may be worth dropping a comment on that method to briefly explain that reasoning (i.e. to avoid people changing it in the future).


@register_input_codec
class ImageCodec(InputCodecTy):
    ContentType = CONTENT_TYPE_IMAGE
@adriangonz commented:
Is there any reason why we need separate constants for these values? As in, instead of just keeping the actual value here and referring to ImageCodec.ContentType everywhere else?



@register_input_codec
class ImageCodec(InputCodecTy):
@adriangonz commented:

I was thinking that, to help future maintainers, we should split these codecs into their own files. As in, instead of keeping them all under mlserver_huggingface/codecs.py, we should create a new mlserver_huggingface/codecs package (i.e. folder), and then have:

  • mlserver_huggingface/codecs/image.py
  • mlserver_huggingface/codecs/conversation.py
  • mlserver_huggingface/codecs/json.py
  • ...

Each of these files could also have any codec-specific helpers (like get_img_bytes, which could live in mlserver_huggingface/codecs/image.py).

This will also help with the code review 👍

@pepesi (author) replied:
You're right; I'll revise it along those lines.

Review thread on runtimes/huggingface/mlserver_huggingface/common.py (outdated, resolved)
Review thread on runtimes/huggingface/mlserver_huggingface/runtime.py (outdated, resolved)
Comment on lines 95 to 96:

if request_input.parameters is not None:
    new_input._parameters.update(request_input.parameters.dict())
@adriangonz commented:
Good catch!

We initially considered leveraging the _parameters field, but decided not to, in order to avoid using tritonclient's internal interface. Instead, the general advice when using mlserver infer is to configure the content_type through the model's metadata (i.e. the model-settings.json file).

Would be great to hear your thoughts behind this change though.
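For reference, that metadata-based approach looks roughly like this model-settings.json sketch (the field layout follows MLServer's model settings schema; the model name and content_type values are illustrative assumptions, not taken from this PR):

{
    "name": "my-huggingface-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "inputs": [
        {
            "name": "image",
            "datatype": "BYTES",
            "shape": [1],
            "parameters": {"content_type": "pil_image"}
        },
        {
            "name": "question",
            "datatype": "BYTES",
            "shape": [1],
            "parameters": {"content_type": "str"}
        }
    ]
}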

runtimes/huggingface/mlserver_huggingface/codecs.py Outdated Show resolved Hide resolved
@pepesi (Contributor, Author) commented Nov 4, 2022

Firstly, I agree with the opinion that codecs should always operate on lists of things (i.e. batches), and that it's then up to the runtime to deal with these. Before this, I didn't realize it was a principle to follow, so I will remove is_single next.

Let me explain why I added is_single in the first place.

The visual-question-answering and table-question-answering task pipelines support args like pipeline(image=Image.Image, question=str), but the codecs would decode the request data as [Image.Image] and [question] (lists). So I added the parameter to support decoding data as a single value, which makes the codecs decode the request data to Image.Image and str (single values); see the sketch below.

Because of the added is_single parameter, the tritonclient HTTP request also needs to pass it to the server; that's why I modified InferInput._parameters. As @RafalSkolasinski said, it may cause breaking changes.
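A minimal sketch of that mismatch (illustrative values, not the runtime's actual code):

from PIL import Image

# What the codecs produce today: batched (list) values.
decoded = {
    "image": [Image.new("RGB", (64, 64))],
    "question": ["how many dogs here?"],
}

# What a visual-question-answering pipeline expects: single values, i.e.
# pipeline(image=<PIL.Image.Image>, question=<str>). Passing the lists
# through unchanged would call pipeline(image=[...], question=[...]),
# so something has to unwrap them first.
single = {k: v[0] for k, v in decoded.items()}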

@adriangonz (Contributor) commented Nov 9, 2022

Got it! Thanks for that explanation, @pepesi.

In that case, it's probably best to handle the list -> single element logic within the HuggingFace runtime itself. Actually, thinking about it, we may need to do that to support adaptive batching anyway (i.e. to handle the case where an incoming request has more than a single data point). That's totally outside the scope of this PR though! Don't worry about that use case!

Looking forward to the next batch of changes! Keep up the great work! We are getting closer. 🚀
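A rough sketch of what that runtime-side handling could look like (a hypothetical helper, not the merged implementation):

def unwrap_single(kwargs: dict) -> dict:
    # If every decoded input is a one-element list, treat the request as a
    # single data point and unwrap it before calling the pipeline.
    if all(isinstance(v, list) and len(v) == 1 for v in kwargs.values()):
        return {k: v[0] for k, v in kwargs.items()}
    return kwargs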

@pepesi pepesi marked this pull request as draft November 17, 2022 06:20
@pepesi pepesi force-pushed the bugfix-numpyencoder branch 2 times, most recently from 0a99402 to 766a0cd Compare December 1, 2022 08:34
@pepesi pepesi marked this pull request as ready for review December 1, 2022 08:35
@pepesi pepesi requested review from adriangonz and removed request for axsaucedo December 1, 2022 08:36
@adriangonz (Contributor) left a comment

This looks great @pepesi ! Thanks a lot for making those changes.

I know this has been a long journey - what started as a fix for numpy is now a full-blown update to the HF runtime - but it's such a great addition! Massive thanks for all the effort that has gone into this contribution. 🚀

I think the changes look great. I've added a minor question, but I don't think it's necessarily a blocker; it would be great to have your thoughts on that one. Besides that, once tests are green, this should be good to go! 👍

Review thread on runtimes/huggingface/tests/test_runtime.py (outdated, resolved)
@adriangonz (Contributor) commented
Thanks a lot for making those changes @pepesi. Once again, massive thanks for all the effort you've put into this one.

PR looks great and tests are green, so this should be good to go! 🚀
