
Convert numpy.float32 output values to float #656

Closed
ryanpmccaffrey opened this issue Jul 6, 2022 · 5 comments

Comments

@ryanpmccaffrey

Certain Hugging Face models (e.g., pipelines with the task of token-classification) return values of type numpy.float32. Data of this type is not JSON serializable, so these models fail when their output is dumped to JSON; this is typically addressed with a post-processing step.

In mlserver-huggingface's runtime.py, before Line 86, could you add a post-processing fix to convert values from numpy.float32 to float?

Minimal code to reproduce the issue:

import json
from transformers import pipeline

nlp = pipeline("ner", model='dslim/bert-base-NER')
output = nlp("My name is Wolfgang and I live in Berlin")
json.dumps(output)
# raises: TypeError: Object of type float32 is not JSON serializable
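For illustration (not part of the runtime, just a sketch of the kind of post-processing step requested above), a recursive helper can convert numpy scalars to native Python types before dumping:

```python
import json
import numpy as np

def to_serializable(obj):
    """Recursively convert numpy scalar types to native Python types."""
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, dict):
        return {k: to_serializable(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_serializable(v) for v in obj]
    return obj

# hypothetical token-classification output, shaped like the pipeline's result
output = [{"entity": "B-PER", "score": np.float32(0.998), "word": "Wolfgang"}]
print(json.dumps(to_serializable(output)))  # no TypeError
```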
@ryanpmccaffrey
Author

For reference:
numpy/numpy#16432

@adriangonz
Contributor

Hey @ryanpmccaffrey,

Thanks for raising this one.

MLServer supports the use of content types, which let you describe how MLServer should treat each input and output. This is how, for instance, MLServer can encode multiple response types back on the MLflow runtime. However, it seems like the HuggingFace runtime forces every response to come back as a JSON-encoded string.

We'll have a look into this to make the HF runtime more flexible (i.e. similar to how the MLServer-MLflow runtime behaves).
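For context, content types are hinted through request parameters in the V2 inference protocol. A sketch of what such a request payload looks like (field values here are illustrative, based on MLServer's content-type documentation):

```json
{
  "inputs": [
    {
      "name": "args",
      "shape": [1],
      "datatype": "BYTES",
      "parameters": {"content_type": "str"},
      "data": ["My name is Wolfgang and I live in Berlin"]
    }
  ]
}
```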

@dtpryce
Contributor

dtpryce commented Jul 18, 2022

@adriangonz I have also recently seen a similar error when trying to create a custom runtime that might include numpy int types.

e.g.

async def predict(self, payload: types.InferenceRequest) -> types.InferenceResponse:
    request = self._extract_json(payload)[0]
    outcomes = self.clf.predict(request)
    output = {
        "id": request["id"],
        "target": outcomes[0]  # numpy scalar, not JSON serializable
    }
    return types.InferenceResponse(
        id=payload.id,
        outputs=[
            types.ResponseOutput(
                name=request['id'],
                shape=[len(output)],
                datatype="BYTES",
                data=[output],
                parameters=types.Parameters(content_type="str")
            )
        ]
    )

The key line is the use of outcomes[0]: scikit-learn's predict returns a numpy array, so its values are numpy types and hence not serializable.
I think this is outside the current content types, since this is a custom input/output JSON encoding where the input is a JSON string and the return value is (essentially) a JSON dict. The easy fix in this example is to change the target value to list(outcomes)[0], so that standard Python types prevail. I wonder whether we should update the MLServer examples to discuss this, or whether the content types could be extended to types within types?
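To illustrate the failure mode described above (a standalone sketch with made-up values, not the runtime code), indexing a numpy array yields a numpy scalar that json.dumps rejects, whereas .tolist() yields native Python types:

```python
import json
import numpy as np

# stand-in for a scikit-learn classifier's predict() output
outcomes = np.array([3], dtype=np.int64)

try:
    json.dumps({"target": outcomes[0]})  # numpy scalar
except TypeError as e:
    print("numpy scalar fails:", e)

print(json.dumps({"target": outcomes.tolist()[0]}))  # native int: works
```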

Just my thoughts, hope it can help! If you need I can raise a new bug?

@adriangonz
Contributor

Hey @dtpryce ,

Not sure I follow.

In this case, wouldn't you need to call json.dumps() on your output variable so that the output gets sent back as a string? If that's not the case, could you open up a separate issue?
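A minimal sketch of that suggestion (illustrative names; assumes the values are already native Python types):

```python
import json

output = {"id": "abc-123", "target": 3}  # hypothetical prediction payload
data = [json.dumps(output)]              # JSON-encode so the response carries a plain string
print(data[0])
```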

Regarding the original issue, @ryanpmccaffrey, this should already be fixed by #664, so I'll be closing this one.

@pepesi
Contributor

pepesi commented Aug 17, 2022

I had the same problem when running some Hugging Face models with Seldon Core.

After checking the output data, I made some changes in PR #692.
