Can't save bento when using transformers custom pipeline #2534

Closed

reniew opened this issue Jun 2, 2022 · 11 comments

reniew commented Jun 2, 2022

Is your feature request related to a problem? Please describe.
A custom Transformers pipeline can't be saved with bentoml.transformers.save(), since a custom-defined pipeline task can't pass the task name validator.

Describe the solution you'd like
The pre-defined set of Transformers tasks is too restrictive; it would be more useful to support custom pipelines as well.

aarnphm commented Jun 2, 2022

Can you explain a bit more about your use case? Have you tried the new save_model API from the rc releases?

reniew commented Jun 2, 2022

Custom pipeline:

from transformers import Pipeline

class MyPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # Split kwargs into (preprocess, forward, postprocess) parameter dicts
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        text = inputs['text']
        input_ids = self.tokenizer(text, return_tensors='pt')
        return input_ids

    def _forward(self, model_inputs):
        outputs = self.model(**model_inputs)
        return outputs

    def postprocess(self, model_outputs):
        return model_outputs.logits.softmax(1)[0, 1].item()

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)

After defining the custom Transformers pipeline, saving it with save_model raises an error:

bentoml.transformers.save_model('qna_model', my_pipeline)

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-a44d1553bd60> in <module>
----> 1 bentoml.transformers.save_model('qna_model', my_pipeline)

~/.local/lib/python3.8/site-packages/bentoml/_internal/frameworks/transformers.py in save_model(name, pipeline, signatures, labels, custom_objects, metadata)
    240         framework_versions={"transformers": get_pkg_version("transformers")},
    241     )
--> 242     options = TransformersOptions(task=pipeline.task)
    243 
    244     if signatures is None:

~/.local/lib/python3.8/site-packages/bentoml/_internal/frameworks/transformers.py in __init__(self, task, pipeline, kwargs)
      8         _setattr('kwargs', __attr_factory_kwargs())
      9     if _config._run_validators is True:
---> 10         __attr_validator_task(self, __attr_task, self.task)
     11         __attr_validator_pipeline(self, __attr_pipeline, self.pipeline)

~/.local/lib/python3.8/site-packages/attr/_make.py in __call__(self, inst, attr, value)
   3094     def __call__(self, inst, attr, value):
   3095         for v in self._validators:
-> 3096             v(inst, attr, value)
   3097 
   3098 

~/.local/lib/python3.8/site-packages/bentoml/_internal/frameworks/transformers.py in <lambda>(instance, attribute, value)
     87         validator=[
     88             attr.validators.instance_of(str),
---> 89             lambda instance, attribute, value: transformers.pipelines.check_task(value),  # type: ignore
     90         ]
     91     )

~/.local/lib/python3.8/site-packages/transformers/pipelines/__init__.py in check_task(task)
    367         raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
    368 
--> 369     raise KeyError(f"Unknown task {task}, available tasks are {get_supported_tasks() + ['translation_XX_to_YY']}")
    370 
    371 

KeyError: "Unknown task , available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'zero-shot-classification', 'zero-shot-image-classification', 'translation_XX_to_YY']"

I'm using version 1.0.0rc0!

aarnphm commented Jun 2, 2022

Thanks for the code sample. I will get back to you ASAP.

ssheng commented Jun 2, 2022

The current Transformers save_model implementation checks the pipeline task name against the set of tasks supported by Transformers, so custom pipelines fail validation. @aarnphm and I will discuss offline and get back to you.

@ebbunnim

@reniew cc. @aarnphm @ssheng

Hi reniew,
After adding a {task_name}.py file inside the src/transformers/pipelines/ folder, you also have to specify the task spec. Did you update src/transformers/pipelines/__init__.py?

[Example from the existing code]

    "text-classification": {  // nlp task name
        "impl": TextClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": "distilbert-base-uncased-finetuned-sst-2-english",
                "tf": "distilbert-base-uncased-finetuned-sst-2-english",
            },
        },
        "type": "text",
    },

Following the process above, I made a customized pipeline and built a bento model successfully.
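
A minimal sketch of the same registration done at runtime, without editing the transformers source (assuming the MyPipeline class from the earlier comment; "my-custom-task" is a hypothetical task name):

# Add an entry to transformers' SUPPORTED_TASKS registry so that
# check_task() accepts the custom task name instead of raising KeyError.
from transformers.pipelines import SUPPORTED_TASKS
from transformers import AutoModelForSequenceClassification

SUPPORTED_TASKS["my-custom-task"] = {
    "impl": MyPipeline,          # the custom Pipeline subclass
    "tf": (),                    # no TensorFlow auto-model
    "pt": (AutoModelForSequenceClassification,),
    "default": {},               # no default checkpoint for this sketch
    "type": "text",
}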

reniew commented Jun 17, 2022

I don't understand what updating src/transformers/pipelines/__init__.py means.
Does it mean modifying the code of my installed transformers package, or giving you a task spec in the format above?

aarnphm commented Jun 17, 2022

Hmm, it seems from the transformers docs that you have to manually update SUPPORTED_TASKS (https://huggingface.co/docs/transformers/add_new_pipeline#adding-it-to-the-list-of-supported-tasks) in the transformers package code.

aarnphm commented Jun 17, 2022

I opened a ticket upstream, huggingface/transformers#17762, since I believe this should be supported by transformers itself. I will try to reach out to the Hugging Face team and we will see how it pans out.

FYI, we simplified the implementation for 1.0: we now only support the pipelines abstraction from transformers. We previously had support for saving models, tokenizers, configs, etc. separately, but that mingled with a lot of transformers' internal implementation, and we prefer not to couple such logic into BentoML.

For custom pipeline support, it would probably be best if we save the model and all the components of your pipeline to BentoML and leave loading the pipeline up to you.

@ssheng and I will discuss further.

@aarnphm aarnphm self-assigned this Jun 17, 2022

reniew commented Jun 20, 2022

Instead of adding a new pipeline to transformers, I tried another approach: saving and loading a custom Transformers pipeline as in the code below.

import bentoml
from transformers import TextClassificationPipeline

class MyPipeline(TextClassificationPipeline):
    def preprocess(self, text, maybe_arg=2):
        input_ids = self.tokenizer(text, padding=True, truncation=True, return_tensors='pt')
        return input_ids

    def _forward(self, model_inputs):
        outputs = self.model(**model_inputs)
        return outputs.logits.softmax(1)[:, 1]

    def postprocess(self, model_outputs):
        return model_outputs.item()

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, task='text-classification')
my_pipeline(text_list)

tag = bentoml.transformers.save_model("custom-prediction", my_pipeline)
loaded = bentoml.transformers.load_model("custom-prediction:latest")

But after loading with bentoml.transformers.load_model, it returns a different result:

my_pipeline(text_list)
# [0.06637582182884216, 0.5219951868057251]

loaded(text_list)
# [{'label': 'LABEL_0', 'score': 0.9336243271827698},
# {'label': 'LABEL_1', 'score': 0.5219951272010803}]

I think it's caused by the logic load_model uses to reconstruct the pipeline:

transformers.pipeline(task=pipeline_task, model=bento_model.path, **pipeline_kwargs)  # type: ignore

It doesn't seem to support custom pipeline classes: the saved task name resolves back to the stock TextClassificationPipeline, so my overridden preprocess/postprocess are lost (which is why loaded returns the standard label/score dicts). In this situation, would it be better to load it as a torch model rather than through transformers?

aarnphm commented Jun 20, 2022

I think that is probably the case here. Since transformers itself doesn't have good support for custom pipelines, saving the PyTorch model and mimicking the inference processing would be better here. I will follow up with the custom pipeline proposal on the Hugging Face end.
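
A minimal sketch of that workaround (assuming the model, tokenizer, and MyPipeline subclass from the previous comment; the model name "custom-prediction-model" is hypothetical): save the raw PyTorch model with bentoml.pytorch, keep the tokenizer as a custom object, and rebuild the pipeline yourself at load time.

import bentoml

# Save the underlying torch model; stash the tokenizer alongside it so
# everything the pipeline needs travels with the bento model.
tag = bentoml.pytorch.save_model(
    "custom-prediction-model",
    model,
    custom_objects={"tokenizer": tokenizer},
)

# At load time, reconstruct the custom pipeline manually instead of relying
# on transformers.pipeline() task-name resolution.
bento_model = bentoml.pytorch.get("custom-prediction-model:latest")
loaded_model = bentoml.pytorch.load_model(bento_model)
loaded_pipeline = MyPipeline(
    model=loaded_model,
    tokenizer=bento_model.custom_objects["tokenizer"],
    task="text-classification",
)
loaded_pipeline(text_list)  # should now match my_pipeline(text_list)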

aarnphm commented Jun 30, 2022

BentoML now supports custom pipelines; this will be included in the rc3 release. Thanks for opening this issue.
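
For reference, a minimal sketch of how this could look with the rc3 API, assuming save_model accepts task_name and task_definition parameters for custom pipelines (MyPipeline and my_pipeline are from the earlier comments; "my-custom-task" is a hypothetical task name):

import bentoml
from transformers import AutoModelForSequenceClassification

# Describe the custom task so BentoML can validate and later reconstruct
# the pipeline; same shape as a transformers SUPPORTED_TASKS entry.
task_definition = {
    "impl": MyPipeline,
    "tf": (),
    "pt": (AutoModelForSequenceClassification,),
    "default": {},
    "type": "text",
}

bentoml.transformers.save_model(
    "custom-prediction",
    pipeline=my_pipeline,
    task_name="my-custom-task",
    task_definition=task_definition,
)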

@aarnphm aarnphm closed this as completed Jun 30, 2022