Can't save bento when using transformers custom pipeline #2534

Closed

reniew opened this issue Jun 2, 2022 · 11 comments

reniew commented Jun 2, 2022

Is your feature request related to a problem? Please describe.
A custom Transformers pipeline can't be saved with bentoml.transformers.save(), since a custom-defined pipeline task can't pass the task name validator.

Describe the solution you'd like
The pre-defined set of Transformers tasks is too restrictive; it would be more useful to support custom pipelines as well.

aarnphm commented Jun 2, 2022

Can you explain a bit more about your use case? Have you tried the new save_model API from the rc releases?

reniew commented Jun 2, 2022

Custom pipeline:

from transformers import Pipeline

class MyPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # Split kwargs into (preprocess, forward, postprocess) parameter dicts
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        text = inputs['text']
        input_ids = self.tokenizer(text, return_tensors='pt')
        return input_ids

    def _forward(self, model_inputs):
        outputs = self.model(**model_inputs)
        return outputs

    def postprocess(self, model_outputs):
        return model_outputs.logits.softmax(1)[0, 1].item()

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)

After defining the custom Transformers pipeline, saving it with save_model raises an error:

bentoml.transformers.save_model('qna_model', my_pipeline)

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-a44d1553bd60> in <module>
----> 1 bentoml.transformers.save_model('qna_model', my_pipeline)

~/.local/lib/python3.8/site-packages/bentoml/_internal/frameworks/transformers.py in save_model(name, pipeline, signatures, labels, custom_objects, metadata)
    240         framework_versions={"transformers": get_pkg_version("transformers")},
    241     )
--> 242     options = TransformersOptions(task=pipeline.task)
    243 
    244     if signatures is None:

~/.local/lib/python3.8/site-packages/bentoml/_internal/frameworks/transformers.py in __init__(self, task, pipeline, kwargs)
      8         _setattr('kwargs', __attr_factory_kwargs())
      9     if _config._run_validators is True:
---> 10         __attr_validator_task(self, __attr_task, self.task)
     11         __attr_validator_pipeline(self, __attr_pipeline, self.pipeline)

~/.local/lib/python3.8/site-packages/attr/_make.py in __call__(self, inst, attr, value)
   3094     def __call__(self, inst, attr, value):
   3095         for v in self._validators:
-> 3096             v(inst, attr, value)
   3097 
   3098 

~/.local/lib/python3.8/site-packages/bentoml/_internal/frameworks/transformers.py in <lambda>(instance, attribute, value)
     87         validator=[
     88             attr.validators.instance_of(str),
---> 89             lambda instance, attribute, value: transformers.pipelines.check_task(value),  # type: ignore
     90         ]
     91     )

~/.local/lib/python3.8/site-packages/transformers/pipelines/__init__.py in check_task(task)
    367         raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
    368 
--> 369     raise KeyError(f"Unknown task {task}, available tasks are {get_supported_tasks() + ['translation_XX_to_YY']}")
    370 
    371 

KeyError: "Unknown task , available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'zero-shot-classification', 'zero-shot-image-classification', 'translation_XX_to_YY']"

I'm using version 1.0.0rc0!

aarnphm commented Jun 2, 2022

Thanks for the code sample. I will get back to you ASAP.

ssheng commented Jun 2, 2022

The current Transformers save_model implementation checks the pipeline task name against the set of tasks supported by Transformers, so custom pipelines fail validation. @aarnphm and I will discuss offline and get back to you.

@ebbunnim

@reniew cc. @aarnphm @ssheng

Hi reniew,
After adding a {task_name}.py file inside the src/transformers/pipelines/ folder, you also have to specify the task spec. Did you update src/transformers/pipelines/__init__.py?

[Example from the existing code]

    "text-classification": {  // nlp task name
        "impl": TextClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": "distilbert-base-uncased-finetuned-sst-2-english",
                "tf": "distilbert-base-uncased-finetuned-sst-2-english",
            },
        },
        "type": "text",
    },

Following the process above, I made a customized pipeline and built a bento model successfully.
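
A minimal sketch of the same registration done at runtime, without editing the transformers source (assuming the MyPipeline class from the earlier comment; "my-custom-task" is a hypothetical task name):

# Add an entry to transformers' SUPPORTED_TASKS registry so that
# check_task() accepts the custom task name instead of raising KeyError.
from transformers.pipelines import SUPPORTED_TASKS
from transformers import AutoModelForSequenceClassification

SUPPORTED_TASKS["my-custom-task"] = {
    "impl": MyPipeline,          # the custom Pipeline subclass
    "tf": (),                    # no TensorFlow auto-model
    "pt": (AutoModelForSequenceClassification,),
    "default": {},               # no default checkpoint for this sketch
    "type": "text",
}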

reniew commented Jun 17, 2022

I don't understand what updating src/transformers/pipelines/__init__.py means.
Does it mean modifying the code of my installed transformers package, or giving you a task spec in the format above?

aarnphm commented Jun 17, 2022

Hmm, it seems from the transformers docs that you have to manually update SUPPORTED_TASKS (https://huggingface.co/docs/transformers/add_new_pipeline#adding-it-to-the-list-of-supported-tasks) in the transformers package code.

aarnphm commented Jun 17, 2022

I opened a ticket upstream, huggingface/transformers#17762, since I believe this should be supported by transformers itself. I will try to reach out to the Hugging Face team and we will see how it pans out.

FYI, we simplified the implementation for 1.0: we now only support the pipelines abstraction from transformers. We previously had support for saving models, tokenizers, configs, etc. separately, but that mingled with a lot of transformers' internal implementation, and we prefer not to couple such logic into BentoML.

For custom pipeline support, it would probably be best if we save the model and all the components of your pipeline to BentoML and leave loading the pipeline up to you.

@ssheng and I will discuss further.

@aarnphm aarnphm self-assigned this Jun 17, 2022

reniew commented Jun 20, 2022

Instead of adding a new pipeline to transformers, I tried another approach: saving and loading a custom Transformers pipeline as in the code below.

import bentoml
from transformers import TextClassificationPipeline

class MyPipeline(TextClassificationPipeline):
    def preprocess(self, text, maybe_arg=2):
        input_ids = self.tokenizer(text, padding=True, truncation=True, return_tensors='pt')
        return input_ids

    def _forward(self, model_inputs):
        outputs = self.model(**model_inputs)
        return outputs.logits.softmax(1)[:, 1]

    def postprocess(self, model_outputs):
        return model_outputs.item()

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, task='text-classification')
my_pipeline(text_list)

tag = bentoml.transformers.save_model("custom-prediction", my_pipeline)
loaded = bentoml.transformers.load_model("custom-prediction:latest")

But after loading with bentoml.transformers.load_model, it returns a different result:

my_pipeline(text_list)
# [0.06637582182884216, 0.5219951868057251]

loaded(text_list)
# [{'label': 'LABEL_0', 'score': 0.9336243271827698},
# {'label': 'LABEL_1', 'score': 0.5219951272010803}]

I think it's caused by the logic load_model uses to reconstruct the pipeline:

transformers.pipeline(task=pipeline_task, model=bento_model.path, **pipeline_kwargs)  # type: ignore

It doesn't seem to support custom pipeline classes: the saved task name resolves back to the stock TextClassificationPipeline, so my overridden preprocess/postprocess are lost (which is why loaded returns the standard label/score dicts). In this situation, would it be better to load it as a torch model rather than through transformers?

aarnphm commented Jun 20, 2022

I think that is probably the case here. Since transformers itself doesn't have good support for custom pipelines, saving the PyTorch model and mimicking the inference processing would be better here. I will follow up with the custom pipeline proposal on the Hugging Face end.
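
A minimal sketch of that workaround (assuming the model, tokenizer, and MyPipeline subclass from the previous comment; the model name "custom-prediction-model" is hypothetical): save the raw PyTorch model with bentoml.pytorch, keep the tokenizer as a custom object, and rebuild the pipeline yourself at load time.

import bentoml

# Save the underlying torch model; stash the tokenizer alongside it so
# everything the pipeline needs travels with the bento model.
tag = bentoml.pytorch.save_model(
    "custom-prediction-model",
    model,
    custom_objects={"tokenizer": tokenizer},
)

# At load time, reconstruct the custom pipeline manually instead of relying
# on transformers.pipeline() task-name resolution.
bento_model = bentoml.pytorch.get("custom-prediction-model:latest")
loaded_model = bentoml.pytorch.load_model(bento_model)
loaded_pipeline = MyPipeline(
    model=loaded_model,
    tokenizer=bento_model.custom_objects["tokenizer"],
    task="text-classification",
)
loaded_pipeline(text_list)  # should now match my_pipeline(text_list)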

aarnphm commented Jun 30, 2022

BentoML now supports custom pipelines; this will be included in the rc3 release. Thanks for opening this issue.
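
For reference, a minimal sketch of how this could look with the rc3 API, assuming save_model accepts task_name and task_definition parameters for custom pipelines (MyPipeline and my_pipeline are from the earlier comments; "my-custom-task" is a hypothetical task name):

import bentoml
from transformers import AutoModelForSequenceClassification

# Describe the custom task so BentoML can validate and later reconstruct
# the pipeline; same shape as a transformers SUPPORTED_TASKS entry.
task_definition = {
    "impl": MyPipeline,
    "tf": (),
    "pt": (AutoModelForSequenceClassification,),
    "default": {},
    "type": "text",
}

bentoml.transformers.save_model(
    "custom-prediction",
    pipeline=my_pipeline,
    task_name="my-custom-task",
    task_definition=task_definition,
)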

@aarnphm aarnphm closed this as completed Jun 30, 2022