diff --git a/docs/source/index.rst b/docs/source/index.rst
index 26e950875ef97e..30ad430f4f3be3 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -207,3 +207,4 @@ conversion utilities for the following models:
    model_doc/dpr
    internal/modeling_utils
    internal/tokenization_utils
+   internal/pipelines_utils
\ No newline at end of file
diff --git a/docs/source/internal/pipelines_utils.rst b/docs/source/internal/pipelines_utils.rst
new file mode 100644
index 00000000000000..c6fda75803c291
--- /dev/null
+++ b/docs/source/internal/pipelines_utils.rst
@@ -0,0 +1,40 @@
+Utilities for pipelines
+-----------------------
+
+This page lists all the utility functions the library provides for pipelines.
+
+Most of those are only useful if you are studying the code of the pipelines in the library.
+
+
+Argument handling
+~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.pipelines.ArgumentHandler
+
+.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler
+
+.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler
+
+
+Data format
+~~~~~~~~~~~
+
+.. autoclass:: transformers.pipelines.PipelineDataFormat
+    :members:
+
+.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
+    :members:
+
+.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
+    :members:
+
+.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
+    :members:
+
+
+Utilities
+~~~~~~~~~
+
+.. autofunction:: transformers.pipelines.get_framework
+
+.. autoclass:: transformers.pipelines.PipelineException
diff --git a/docs/source/main_classes/model.rst b/docs/source/main_classes/model.rst
index bea43e94f65ae6..d89e788f191b77 100644
--- a/docs/source/main_classes/model.rst
+++ b/docs/source/main_classes/model.rst
@@ -41,3 +41,9 @@ The other methods that are common to each model are defined in :class:`~transfor
 .. autoclass:: transformers.modeling_tf_utils.TFModelUtilsMixin
    :members:
+
+
+Generative models
+~~~~~~~~~~~~~~~~~
+
+Coming soon
diff --git a/docs/source/main_classes/pipelines.rst b/docs/source/main_classes/pipelines.rst
index 214858fb5abe25..067b7eca9308b0 100644
--- a/docs/source/main_classes/pipelines.rst
+++ b/docs/source/main_classes/pipelines.rst
@@ -3,13 +3,23 @@ Pipelines
 
 The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most
 of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
-Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.
+Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
+:doc:`task summary <../task_summary>` for examples of use.
 
 There are two categories of pipeline abstractions to be aware about:
 
-- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
-- The other task-specific pipelines, such as :class:`~transformers.TokenClassificationPipeline`
-  or :class:`~transformers.QuestionAnsweringPipeline`
+- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
+- The other task-specific pipelines: + + - :class:`~transformers.ConversationalPipeline` + - :class:`~transformers.FeatureExtractionPipeline` + - :class:`~transformers.FillMaskPipeline` + - :class:`~transformers.QuestionAnsweringPipeline` + - :class:`~transformers.SummarizationPipeline` + - :class:`~transformers.TextClassificationPipeline` + - :class:`~transformers.TextGenerationPipeline` + - :class:`~transformers.TokenClassificationPipeline` + - :class:`~transformers.TranslationPipeline` The pipeline abstraction ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -21,61 +31,75 @@ other pipeline but requires an additional argument which is the `task`. The task specific pipelines -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Parent class: Pipeline -========================================= - -.. autoclass:: transformers.Pipeline - :members: predict, transform, save_pretrained +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -TokenClassificationPipeline +ConversationalPipeline ========================================== -.. autoclass:: transformers.TokenClassificationPipeline +.. autoclass:: transformers.Conversation -NerPipeline +.. autoclass:: transformers.ConversationalPipeline + :special-members: __call__ + :members: + +FeatureExtractionPipeline ========================================== -This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined above. Please refer to that pipeline for -documentation and usage examples. +.. autoclass:: transformers.FeatureExtractionPipeline + :special-members: __call__ + :members: FillMaskPipeline ========================================== .. autoclass:: transformers.FillMaskPipeline + :special-members: __call__ + :members: -FeatureExtractionPipeline -========================================== - -.. autoclass:: transformers.FeatureExtractionPipeline - -TextClassificationPipeline +NerPipeline ========================================== -.. autoclass:: transformers.TextClassificationPipeline +This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined below. Please refer to that +pipeline for documentation and usage examples. QuestionAnsweringPipeline ========================================== .. autoclass:: transformers.QuestionAnsweringPipeline - + :special-members: __call__ + :members: SummarizationPipeline ========================================== .. autoclass:: transformers.SummarizationPipeline + :special-members: __call__ + :members: +TextClassificationPipeline +========================================== + +.. autoclass:: transformers.TextClassificationPipeline + :special-members: __call__ + :members: TextGenerationPipeline ========================================== .. autoclass:: transformers.TextGenerationPipeline + :special-members: __call__ + :members: - -ConversationalPipeline +TokenClassificationPipeline ========================================== -.. autoclass:: transformers.Conversation +.. autoclass:: transformers.TokenClassificationPipeline + :special-members: __call__ + :members: + -.. autoclass:: transformers.ConversationalPipeline \ No newline at end of file +Parent class: :obj:`Pipeline` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
autoclass:: transformers.Pipeline + :members: diff --git a/src/transformers/pipelines.py b/src/transformers/pipelines.py index b40f734ef2b672..3cd252fd8f4a33 100755 --- a/src/transformers/pipelines.py +++ b/src/transformers/pipelines.py @@ -33,7 +33,7 @@ from .configuration_auto import AutoConfig from .configuration_utils import PretrainedConfig from .data import SquadExample, squad_convert_examples_to_features -from .file_utils import is_tf_available, is_torch_available +from .file_utils import add_end_docstrings, is_tf_available, is_torch_available from .modelcard import ModelCard from .tokenization_auto import AutoTokenizer from .tokenization_bert import BasicTokenizer @@ -82,8 +82,13 @@ def get_framework(model=None): - """ Select framework (TensorFlow/PyTorch) to use. - If both frameworks are installed and no specific model is provided, defaults to using PyTorch. + """ + Select framework (TensorFlow or PyTorch) to use. + + Args: + model (:obj:`str`, :class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`, `optional`): + If both frameworks are installed, picks the one corresponding to the model passed (either a model class or + the model name). If no specific model is provided, defaults to using PyTorch. """ if is_tf_available() and is_torch_available() and model is not None and not isinstance(model, str): # Both framework are available but the user supplied a model class instance. @@ -103,7 +108,12 @@ def get_framework(model=None): class PipelineException(Exception): """ - Raised by pipelines when handling __call__ + Raised by a :class:`~transformers.Pipeline` when handling __call__. + + Args: + task (:obj:`str`): The task of the pipeline. + model (:obj:`str`): The model used by the pipeline. + reason (:obj:`str`): The error message to display. """ def __init__(self, task: str, model: str, reason: str): @@ -115,7 +125,7 @@ def __init__(self, task: str, model: str, reason: str): class ArgumentHandler(ABC): """ - Base interface for handling varargs for each Pipeline + Base interface for handling arguments for each :class:`~transformers.pipelines.Pipeline`. """ @abstractmethod @@ -125,7 +135,7 @@ def __call__(self, *args, **kwargs): class DefaultArgumentHandler(ArgumentHandler): """ - Default varargs argument parser handling parameters for each Pipeline + Default argument parser handling parameters for each :class:`~transformers.pipelines.Pipeline`. """ @staticmethod @@ -178,18 +188,25 @@ class PipelineDataFormat: """ Base class for all the pipeline supported data format both for reading and writing. Supported data formats currently includes: - - JSON - - CSV - - stdin/stdout (pipe) + - JSON + - CSV + - stdin/stdout (pipe) - PipelineDataFormat also includes some utilities to work with multi-columns like mapping from datasets columns - to pipelines keyword arguments through the `dataset_kwarg_1=dataset_column_1` format. + :obj:`PipelineDataFormat` also includes some utilities to work with multi-columns like mapping from datasets + columns to pipelines keyword arguments through the :obj:`dataset_kwarg_1=dataset_column_1` format. + + Args: + output_path (:obj:`str`, `optional`): Where to save the outgoing data. + input_path (:obj:`str`, `optional`): Where to look for the input data. + column (:obj:`str`, `optional`): The column to read. + overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not to overwrite the :obj:`output_path`. 
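+
+    Example (a minimal sketch; the file names are placeholders)::
+
+        from transformers.pipelines import PipelineDataFormat
+
+        # Read the "text" column from in.csv; pipeline outputs will be written to out.csv
+        data_format = PipelineDataFormat.from_str(
+            "csv", output_path="out.csv", input_path="in.csv", column="text", overwrite=True
+        )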
""" SUPPORTED_FORMATS = ["json", "csv", "pipe"] def __init__( - self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False, + self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite: bool = False, ): self.output_path = output_path self.input_path = input_path @@ -212,19 +229,25 @@ def __iter__(self): raise NotImplementedError() @abstractmethod - def save(self, data: dict): + def save(self, data: Union[dict, List[dict]]): """ - Save the provided data object with the representation for the current `DataFormat`. - :param data: data to store - :return: + Save the provided data object with the representation for the current + :class:`~transformers.pipelines.PipelineDataFormat`. + + Args: + data (:obj:`dict` or list of :obj:`dict`): The data to store. """ raise NotImplementedError() def save_binary(self, data: Union[dict, List[dict]]) -> str: """ Save the provided data object as a pickle-formatted binary data on the disk. - :param data: data to store - :return: (str) Path where the data has been saved + + Args: + data (:obj:`dict` or list of :obj:`dict`): The data to store. + + Returns: + :obj:`str`: Path where the data has been saved. """ path, _ = os.path.splitext(self.output_path) binary_path = os.path.extsep.join((path, "pickle")) @@ -237,7 +260,26 @@ def save_binary(self, data: Union[dict, List[dict]]) -> str: @staticmethod def from_str( format: str, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False, - ): + ) -> "PipelineDataFormat": + """ + Creates an instance of the right subclass of :class:`~transformers.pipelines.PipelineDataFormat` depending + on :obj:`format`. + + Args: + format: (:obj:`str`): + The format of the desired pipeline. Acceptable values are :obj:`"json"`, :obj:`"csv"` or :obj:`"pipe"`. + output_path (:obj:`str`, `optional`): + Where to save the outgoing data. + input_path (:obj:`str`, `optional`): + Where to look for the input data. + column (:obj:`str`, `optional`): + The column to read. + overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not to overwrite the :obj:`output_path`. + + Returns: + :class:`~transformers.pipelines.PipelineDataFormat`: The proper data format. + """ if format == "json": return JsonPipelineDataFormat(output_path, input_path, column, overwrite=overwrite) elif format == "csv": @@ -249,6 +291,17 @@ def from_str( class CsvPipelineDataFormat(PipelineDataFormat): + """ + Support for pipelines using CSV data format. + + Args: + output_path (:obj:`str`, `optional`): Where to save the outgoing data. + input_path (:obj:`str`, `optional`): Where to look for the input data. + column (:obj:`str`, `optional`): The column to read. + overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not to overwrite the :obj:`output_path`. + """ + def __init__( self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False, ): @@ -264,6 +317,13 @@ def __iter__(self): yield row[self.column[0]] def save(self, data: List[dict]): + """ + Save the provided data object with the representation for the current + :class:`~transformers.pipelines.PipelineDataFormat`. + + Args: + data (:obj:`List[dict]`): The data to store. 
+ """ with open(self.output_path, "w") as f: if len(data) > 0: writer = csv.DictWriter(f, list(data[0].keys())) @@ -272,6 +332,17 @@ def save(self, data: List[dict]): class JsonPipelineDataFormat(PipelineDataFormat): + """ + Support for pipelines using JSON file format. + + Args: + output_path (:obj:`str`, `optional`): Where to save the outgoing data. + input_path (:obj:`str`, `optional`): Where to look for the input data. + column (:obj:`str`, `optional`): The column to read. + overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not to overwrite the :obj:`output_path`. + """ + def __init__( self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False, ): @@ -288,6 +359,12 @@ def __iter__(self): yield entry[self.column[0]] def save(self, data: dict): + """ + Save the provided data object in a json file. + + Args: + data (:obj:`dict`): The data to store. + """ with open(self.output_path, "w") as f: json.dump(data, f) @@ -298,6 +375,13 @@ class PipedPipelineDataFormat(PipelineDataFormat): For multi columns data, columns should separated by \t If columns are provided, then the output will be a dictionary with {column_x: value_x} + + Args: + output_path (:obj:`str`, `optional`): Where to save the outgoing data. + input_path (:obj:`str`, `optional`): Where to look for the input data. + column (:obj:`str`, `optional`): The column to read. + overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not to overwrite the :obj:`output_path`. """ def __iter__(self): @@ -317,6 +401,12 @@ def __iter__(self): yield line def save(self, data: dict): + """ + Print the data. + + Args: + data (:obj:`dict`): The data to store. + """ print(data) def save_binary(self, data: Union[dict, List[dict]]) -> str: @@ -343,24 +433,7 @@ def predict(self, X): raise NotImplementedError() -class Pipeline(_ScikitCompat): - """ - The Pipeline class is the class from which all pipelines inherit. Refer to this class for methods shared across - different pipelines. - - Base class implementing pipelined operations. - Pipeline workflow is defined as a sequence of the following operations: - - Input -> Tokenization -> Model Inference -> Post-Processing (Task dependent) -> Output - - Pipeline supports running on CPU or GPU through the device argument. Users can specify - device argument as an integer, -1 meaning "CPU", >= 0 referring the CUDA device ordinal. - - Some pipeline, like for instance FeatureExtractionPipeline ('feature-extraction') outputs large - tensor object as nested-lists. In order to avoid dumping such large structure as textual data we - provide the binary_output constructor argument. If set to True, the output will be stored in the - pickle format. - +PIPELINE_INIT_ARGS = r""" Arguments: model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`): The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from @@ -369,28 +442,44 @@ class Pipeline(_ScikitCompat): tokenizer (:obj:`~transformers.PreTrainedTokenizer`): The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from :class:`~transformers.PreTrainedTokenizer`. - modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`): + modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`): Model card attributed to the model for this pipeline. 
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
+        framework (:obj:`str`, `optional`):
+            The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
+            must be installed.

            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
+            and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no
+            model is provided.
+        task (:obj:`str`, defaults to :obj:`""`):
+            A task-identifier for the pipeline.
+        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`):
            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
+        device (:obj:`int`, `optional`, defaults to -1):
+            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive integer will run
+            the model
            on the associated CUDA device id.
        binary_output (:obj:`bool`, `optional`, defaults to :obj:`False`):
-            Flag indicating if the output the pipeline should happen in a binary format (i.e. pickle) or as raw text.
+            Flag indicating if the output of the pipeline should happen in a binary format (i.e., pickle) or as raw
+            text.
+"""

-    Return:
-        :obj:`List` or :obj:`Dict`:
-        Pipeline returns list or dictionary depending on:
-        - Whether the user supplied multiple samples
-        - Whether the pipeline exposes multiple fields in the output object
+@add_end_docstrings(PIPELINE_INIT_ARGS)
+class Pipeline(_ScikitCompat):
+    """
+    The Pipeline class is the class from which all pipelines inherit. Refer to this class for methods shared across
+    different pipelines.
+
+    Base class implementing pipelined operations.
+    Pipeline workflow is defined as a sequence of the following operations:
+
+        Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output
+
+    Pipeline supports running on CPU or GPU through the device argument (see below).
+
+    Some pipelines, like for instance :class:`~transformers.FeatureExtractionPipeline` (:obj:`'feature-extraction'`)
+    output large tensor objects as nested lists. In order to avoid dumping such large structures as textual data we
+    provide the :obj:`binary_output` constructor argument. If set to :obj:`True`, the output will be stored in the
+    pickle format.
+    """

    default_input_names = None
@@ -408,7 +497,7 @@ def __init__(
    ):

        if framework is None:
-            framework = get_framework()
+            framework = get_framework(model)

        self.task = task
        self.model = model
@@ -428,9 +517,13 @@ def __init__(
        if task_specific_params is not None and task in task_specific_params:
            self.model.config.update(task_specific_params.get(task))

-    def save_pretrained(self, save_directory):
+    def save_pretrained(self, save_directory: str):
        """
-        Save the pipeline's model and tokenizer to the specified save_directory
+        Save the pipeline's model and tokenizer.
+
+        Args:
+            save_directory (:obj:`str`):
+                A path to the directory where to save the pipeline. It will be created if it doesn't exist.
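+
+        Example (a minimal sketch; the task and directory name are arbitrary)::
+
+            from transformers import pipeline
+
+            pipe = pipeline("sentiment-analysis")
+            pipe.save_pretrained("my-sentiment-pipeline")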
""" if os.path.isfile(save_directory): logger.error("Provided path ({}) should be a directory, not a file".format(save_directory)) @@ -458,14 +551,17 @@ def predict(self, X): def device_placement(self): """ Context Manager allowing tensor allocation on the user-specified device in framework agnostic way. - example: - # Explicitly ask for tensor allocation on CUDA device :0 - nlp = pipeline(..., device=0) - with nlp.device_placement(): - # Every framework specific tensor allocation will be done on the request device - output = nlp(...) + Returns: Context manager + + Examples:: + + # Explicitly ask for tensor allocation on CUDA device :0 + pipe = pipeline(..., device=0) + with pipe.device_placement(): + # Every framework specific tensor allocation will be done on the request device + output = pipe(...) """ if self.framework == "tf": with tf.device("/CPU:0" if self.device == -1 else "/device:GPU:{}".format(self.device)): @@ -479,14 +575,22 @@ def device_placement(self): def ensure_tensor_on_device(self, **inputs): """ Ensure PyTorch tensors are on the specified device. - :param inputs: - :return: + + Args: + inputs (keyword arguments that should be :obj:`torch.Tensor`): The tensors to place on :obj:`self.device`. + + Return: + :obj:`Dict[str, torch.Tensor]`: The same as :obj:`inputs` but on the proper device. """ return {name: tensor.to(self.device) for name, tensor in inputs.items()} - def check_model_type(self, supported_models): + def check_model_type(self, supported_models: Union[List[str], dict]): """ - Check if the model class is in the supported class list of the pipeline. + Check if the model class is in supported by the pipeline. + + Args: + supported_models (:obj:`List[str]` or :obj:`dict`): + The list of models supported by the pipeline, or a dictionary with model class values. """ if not isinstance(supported_models, list): # Create from a model mapping supported_models = [item[1].__name__ for item in supported_models.items()] @@ -538,15 +642,14 @@ def _forward(self, inputs, return_tensors=False): return predictions.numpy() +# Can't use @add_end_docstrings(PIPELINE_INIT_ARGS) here because this one does not accept `binary_output` class FeatureExtractionPipeline(Pipeline): """ - Feature extraction pipeline using Model head. This pipeline extracts the hidden states from the base transformer, - which can be used as features in downstream tasks. + Feature extraction pipeline using no model head. This pipeline extracts the hidden states from the base + transformer, which can be used as features in downstream tasks. - This feature extraction pipeline can currently be loaded from the :func:`~transformers.pipeline` method using - the following task identifier(s): - - - "feature-extraction", for extracting features of a sequence. + This feature extraction pipeline can currently be loaded from :func:`~transformers.pipeline` using the task + identifier: :obj:`"feature-extraction"`. All models may be used for this pipeline. See a list of all models, including community-contributed models on `huggingface.co/models `__. @@ -559,18 +662,21 @@ class FeatureExtractionPipeline(Pipeline): tokenizer (:obj:`~transformers.PreTrainedTokenizer`): The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from :class:`~transformers.PreTrainedTokenizer`. 
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
+        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`):
            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
+        framework (:obj:`str`, `optional`):
+            The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
+            must be installed.

            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
+            and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no
+            model is provided.
+        task (:obj:`str`, defaults to :obj:`""`):
+            A task-identifier for the pipeline.
+        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`):
            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
+        device (:obj:`int`, `optional`, defaults to -1):
+            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive integer will run
+            the model
            on the associated CUDA device id.
    """
@@ -596,20 +702,29 @@ def __init__(
    )

    def __call__(self, *args, **kwargs):
+        """
+        Extract the features of the input(s).
+
+        Args:
+            args (:obj:`str` or :obj:`List[str]`): One or several texts (or one list of texts) to get the features of.
+
+        Return:
+            A nested list of :obj:`float`: The features computed by the model.
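+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            extractor = pipeline("feature-extraction")
+            features = extractor("We are very happy to show you the 🤗 Transformers library.")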
+        """
        return super().__call__(*args, **kwargs).tolist()

+@add_end_docstrings(PIPELINE_INIT_ARGS)
 class TextGenerationPipeline(Pipeline):
    """
-    Language generation pipeline using any ModelWithLMHead head. This pipeline predicts the words that will follow a specified text prompt.
+    Language generation pipeline using any :obj:`ModelWithLMHead`. This pipeline predicts the words that will follow a
+    specified text prompt.

-    This language generation pipeline can currently be loaded from the :func:`~transformers.pipeline` method using
-    the following task identifier(s):
+    This language generation pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"text-generation"`.

-    - "text-generation", for generating text from a specified prompt.
-
-    The models that this pipeline can use are models that have been trained with an autoregressive language modeling objective,
-    which includes the uni-directional models in the library (e.g. gpt2).
+    The models that this pipeline can use are models that have been trained with an autoregressive language modeling
+    objective, which includes the uni-directional models in the library (e.g. gpt2).
    See the list of available community models on
    `huggingface.co/models `__.
    """
@@ -673,7 +788,30 @@ def _parse_and_tokenize(self, *args, padding=True, add_special_tokens=True, **kw
    def __call__(
        self, *args, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs
    ):
+        """
+        Complete the prompt(s) given as inputs.
+
+        Args:
+            args (:obj:`str` or :obj:`List[str]`):
+                One or several prompts (or one list of prompts) to complete.
+            return_tensors (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to include the tensors of predictions (as token indices) in the outputs.
+            return_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
+                Whether or not to include the decoded texts in the outputs.
+            clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to clean up the potential extra spaces in the text output.
+            generate_kwargs:
+                Additional keyword arguments to pass along to the generate method of the model (see the generate
+                method corresponding to your framework `here <./model.html#generative-models>`__).
+
+        Return:
+            A list or a list of lists of :obj:`dict`: Each result comes as a dictionary with the
+            following keys:
+
+            - **generated_text** (:obj:`str`, present when ``return_text=True``) -- The generated text.
+            - **generated_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
+              -- The token ids of the generated text.
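+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            generator = pipeline("text-generation")
+            generator("The Transformers library is", max_length=30)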
+        """
        text_inputs = self._args_parser(*args)

        results = []
@@ -758,41 +896,25 @@ def __call__(
        return results

+@add_end_docstrings(
+    PIPELINE_INIT_ARGS,
+    r"""
+        return_all_scores (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether to return all prediction scores or just the one of the predicted class.
+    """,
+)
 class TextClassificationPipeline(Pipeline):
    """
-    Text classification pipeline using ModelForSequenceClassification head. See the
-    `sequence classification usage <../usage.html#sequence-classification>`__ examples for more information.
-
-    This text classification pipeline can currently be loaded from the :func:`~transformers.pipeline` method using
-    the following task identifier(s):
+    Text classification pipeline using any :obj:`ModelForSequenceClassification`. See the
+    `sequence classification examples <../task_summary.html#sequence-classification>`__ for more information.

-    - "sentiment-analysis", for classifying sequences according to positive or negative sentiments.
+    This text classification pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"sentiment-analysis"` (for classifying sequences according to positive or negative
+    sentiments).

    The models that this pipeline can use are models that have been fine-tuned on a sequence classification task.
    See the up-to-date list of available models on
    `huggingface.co/models `__.
-
-    Arguments:
-        model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`):
-            The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
-            :class:`~transformers.PreTrainedTokenizer`.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
    """

    def __init__(self, return_all_scores: bool = False, **kwargs):
@@ -807,6 +929,22 @@ def __init__(self, return_all_scores: bool = False, **kwargs):
        self.return_all_scores = return_all_scores

    def __call__(self, *args, **kwargs):
+        """
+        Classify the text(s) given as inputs.
+
+        Args:
+            args (:obj:`str` or :obj:`List[str]`):
+                One or several texts (or one list of texts) to classify.
+
+        Return:
+            A list or a list of lists of :obj:`dict`: Each result comes as a list of dictionaries with the
+            following keys:
+
+            - **label** (:obj:`str`) -- The label predicted.
+            - **score** (:obj:`float`) -- The corresponding probability.
+
+            If ``self.return_all_scores=True``, one such dictionary is returned per label.
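+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            classifier = pipeline("sentiment-analysis")
+            classifier("We are very happy to show you the 🤗 Transformers library.")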
+        """
        outputs = super().__call__(*args, **kwargs)
        scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
        if self.return_all_scores:
@@ -853,46 +991,23 @@ def __call__(self, sequences, labels, hypothesis_template):
        return sequence_pairs

+@add_end_docstrings(PIPELINE_INIT_ARGS)
 class ZeroShotClassificationPipeline(Pipeline):
    """
-    NLI-based zero-shot classification pipeline using a ModelForSequenceClassification head with models trained on
-    NLI tasks.
+    NLI-based zero-shot classification pipeline using a :obj:`ModelForSequenceClassification` trained on NLI (natural
+    language inference) tasks.

    Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis
-    pair and passed to the pre-trained model. Then logit for `entailment` is then taken as the logit for the
+    pair and passed to the pretrained model. Then, the logit for `entailment` is taken as the logit for the
    candidate label being valid. Any NLI model can be used as long as the first output logit corresponds
    to `contradiction` and the last to `entailment`.

-    This pipeline can currently be loaded from the :func:`~transformers.pipeline` method using the following task
-    identifier(s):
+    This NLI pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"zero-shot-classification"`.

-    - "zero-shot-classification"
-
-    The models that this pipeline can use are models that have been fine-tuned on a Natural Language Inference task.
+    The models that this pipeline can use are models that have been fine-tuned on an NLI task.
    See the up-to-date list of available models on
    `huggingface.co/models `__.
-
-    Arguments:
-        model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`):
-            The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
-            :class:`~transformers.PreTrainedTokenizer`.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
    """

    def __init__(self, args_parser=ZeroShotClassificationArgumentHandler(), *args, **kwargs):
@@ -915,29 +1030,33 @@ def _parse_and_tokenize(self, *args, padding=True, add_special_tokens=True, **kw
    def __call__(self, sequences, candidate_labels, hypothesis_template="This example is {}.", multi_class=False):
        """
-        NLI-based zero-shot classification. Any combination of sequences and labels can be passed and each
-        combination will be posed as a premise/hypothesis pair and passed to the pre-trained model. Then logit for
-        `entailment` is then taken as the logit for the candidate label being valid. Any NLI model can be used as
-        long as the first output logit corresponds to `contradiction` and the last to `entailment`.
+        Classify the sequence(s) given as inputs.

        Args:
-            sequences (:obj:`str` or obj:`List`):
-                The sequence or sequences to classify. Truncated if model input is too large.
-            candidate_labels (:obj:`str` or obj:`List`):
+            sequences (:obj:`str` or :obj:`List[str]`):
+                The sequence(s) to classify, will be truncated if the model input is too large.
+            candidate_labels (:obj:`str` or :obj:`List[str]`):
                The set of possible class labels to classify each sequence into. Can be a single label, a string of
                comma-separated labels, or a list of labels.
-            hypothesis_template (obj:`str`, defaults to "This example is {}."):
+            hypothesis_template (:obj:`str`, `optional`, defaults to :obj:`"This example is {}."`):
                The template used to turn each label into an NLI-style hypothesis. This template must include a {}
                or similar syntax for the candidate label to be inserted into the template. For example, the default
-                template is "This example is {}." With the candidate label "sports", this would be fed into the model
-                like `<cls> sequence to classify <sep> This example is sports . <sep>`. The default template works
-                well in many cases, but it may be worthwhile to experiment with different templates depending on the
-                task setting.
-            multi_class (obj:`bool`, defaults to False):
-                When False, it is assumed that only one candidate label can be true, and the scores are normalized
-                such that the sum of the label likelihoods for each sequence is 1. When True, the labels are
-                considered independent and probabilities are normalized for each candidate by doing a of softmax of
+                template is :obj:`"This example is {}."` With the candidate label :obj:`"sports"`, this would be fed
+                into the model like :obj:`"<cls> sequence to classify <sep> This example is sports . <sep>"`. The
+                default template works well in many cases, but it may be worthwhile to experiment with different
+                templates depending on the task setting.
+            multi_class (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not multiple candidate labels can be true.
+                If :obj:`False`, the scores are normalized such that the sum of the label likelihoods for each
+                sequence is 1. If :obj:`True`, the labels are considered independent and probabilities are
+                normalized for each candidate by doing a softmax of
                the entailment score vs. the contradiction score.

+        Return:
+            A :obj:`dict` or a list of :obj:`dict`: Each result comes as a dictionary with the
+            following keys:
+
+            - **sequence** (:obj:`str`) -- The sequence for which this is the output.
+            - **labels** (:obj:`List[str]`) -- The labels sorted by order of likelihood.
+            - **scores** (:obj:`List[float]`) -- The probabilities for each of the labels.
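+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            classifier = pipeline("zero-shot-classification")
+            classifier(
+                "Who are you voting for in 2020?",
+                candidate_labels=["politics", "public health", "economics"],
+            )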
+        """
        outputs = super().__call__(sequences, candidate_labels, hypothesis_template)
        num_sequences = 1 if isinstance(sequences, str) else len(sequences)
@@ -973,42 +1092,28 @@ def __call__(self, sequences, candidate_labels, hypothesis_template="This exampl
        return result

+@add_end_docstrings(
+    PIPELINE_INIT_ARGS,
+    r"""
+        topk (:obj:`int`, defaults to 5): The number of predictions to return.
+    """,
+)
 class FillMaskPipeline(Pipeline):
    """
-    Masked language modeling prediction pipeline using ModelWithLMHead head. See the
-    `masked language modeling usage <../usage.html#masked-language-modeling>`__ examples for more information.
+    Masked language modeling prediction pipeline using any :obj:`ModelWithLMHead`. See the
+    `masked language modeling examples <../task_summary.html#masked-language-modeling>`__ for more information.

-    This mask filling pipeline can currently be loaded from the :func:`~transformers.pipeline` method using
-    the following task identifier(s):
-
-    - "fill-mask", for predicting masked tokens in a sequence.
+    This mask filling pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"fill-mask"`.

    The models that this pipeline can use are models that have been trained with a masked language modeling objective,
    which includes the bi-directional models in the library.
    See the up-to-date list of available models on
    `huggingface.co/models `__.

-    Arguments:
-        model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`):
-            The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
-            :class:`~transformers.PreTrainedTokenizer`.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
+    .. note::

-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
+        This pipeline only works for inputs with exactly one token masked.
    """

    def __init__(
@@ -1053,6 +1158,21 @@ def ensure_exactly_one_mask_token(self, masked_index: np.ndarray):
        )

    def __call__(self, *args, **kwargs):
+        """
+        Fill the masked token in the text(s) given as inputs.
+
+        Args:
+            args (:obj:`str` or :obj:`List[str]`): One or several texts (or one list of texts) with masked tokens.
+
+        Return:
+            A list or a list of lists of :obj:`dict`: Each result comes as a list of dictionaries with the
+            following keys:
+
+            - **sequence** (:obj:`str`) -- The corresponding input with the mask token prediction.
+            - **score** (:obj:`float`) -- The corresponding probability.
+            - **token** (:obj:`int`) -- The predicted token id (to replace the masked one).
+            - **token_str** (:obj:`str`) -- The predicted token (to replace the masked one).
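+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            fill_mask = pipeline("fill-mask")
+            fill_mask("Send these {} back!".format(fill_mask.tokenizer.mask_token))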
+        """
        inputs = self._parse_and_tokenize(*args, **kwargs)
        outputs = self._forward(inputs, return_tensors=True)
@@ -1105,41 +1225,27 @@ def __call__(self, *args, **kwargs):
        return results

+@add_end_docstrings(
+    PIPELINE_INIT_ARGS,
+    r"""
+        ignore_labels (:obj:`List[str]`, defaults to :obj:`["O"]`):
+            A list of labels to ignore.
+        grouped_entities (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether or not to group the tokens corresponding to the same entity together in the predictions or not.
+    """,
+)
 class TokenClassificationPipeline(Pipeline):
    """
-    Named Entity Recognition pipeline using ModelForTokenClassification head. See the
-    `named entity recognition usage <../usage.html#named-entity-recognition>`__ examples for more information.
+    Named Entity Recognition pipeline using any :obj:`ModelForTokenClassification`. See the
+    `named entity recognition examples <../task_summary.html#named-entity-recognition>`__ for more information.

-    This token recognition pipeline can currently be loaded from the :func:`~transformers.pipeline` method using
-    the following task identifier(s):
-
-    - "ner", for predicting the classes of tokens in a sequence: person, organisation, location or miscellaneous.
+    This token recognition pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"ner"` (for predicting the classes of tokens in a sequence: person, organisation, location
+    or miscellaneous).

    The models that this pipeline can use are models that have been fine-tuned on a token classification task.
    See the up-to-date list of available models on
    `huggingface.co/models `__.
-
-    Arguments:
-        model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`):
-            The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
-            :class:`~transformers.PreTrainedTokenizer`.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
    """

    default_input_names = "sequences"
@@ -1179,6 +1285,24 @@ def __init__(
        self.grouped_entities = grouped_entities

    def __call__(self, *args, **kwargs):
+        """
+        Classify each token of the text(s) given as inputs.
+
+        Args:
+            args (:obj:`str` or :obj:`List[str]`):
+                One or several texts (or one list of texts) for token classification.
+
+        Return:
+            A list or a list of lists of :obj:`dict`: Each result comes as a list of dictionaries (one for each token
+            in the corresponding input, or each entity if this pipeline was instantiated with
+            :obj:`grouped_entities=True`) with the following keys:
+
+            - **word** (:obj:`str`) -- The token/word classified.
+            - **score** (:obj:`float`) -- The corresponding probability for :obj:`entity`.
+            - **entity** (:obj:`str`) -- The entity predicted for that token/word.
+            - **index** (:obj:`int`, only present when ``self.grouped_entities=False``) -- The index of the
+              corresponding token in the sentence.
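+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            ner = pipeline("ner")
+            ner("My name is Wolfgang and I live in Berlin.")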
+        """
        inputs = self._args_parser(*args, **kwargs)
        answers = []
        for sentence in inputs:
@@ -1235,7 +1359,10 @@ def __call__(self, *args, **kwargs):
    def group_sub_entities(self, entities: List[dict]) -> dict:
        """
-        Returns grouped sub entities
+        Group together the adjacent tokens with the same entity predicted.
+
+        Args:
+            entities (:obj:`List[dict]`): The entities predicted by the pipeline.
        """
        # Get the first entity in the entity group
        entity = entities[0]["entity"]
@@ -1251,7 +1378,10 @@ def group_sub_entities(self, entities: List[dict]) -> dict:
    def group_entities(self, entities: List[dict]) -> List[dict]:
        """
-        Returns grouped entities
+        Find and group together the adjacent tokens with the same entity predicted.
+
+        Args:
+            entities (:obj:`List[dict]`): The entities predicted by the pipeline.
        """
        entity_groups = []
@@ -1295,10 +1425,10 @@ def group_entities(self, entities: List[dict]) -> List[dict]:
 class QuestionAnsweringArgumentHandler(ArgumentHandler):
    """
    QuestionAnsweringPipeline requires the user to provide multiple arguments (i.e. question & context) to be mapped
-    to internal SquadExample / SquadFeature structures.
+    to the internal :class:`~transformers.SquadExample`.

-    QuestionAnsweringArgumentHandler manages all the possible to create SquadExample from the command-line supplied
-    arguments.
+    QuestionAnsweringArgumentHandler manages all the possible ways to create a :class:`~transformers.SquadExample`
+    from the command-line supplied arguments.
    """

    def __call__(self, *args, **kwargs):
@@ -1354,41 +1484,18 @@ def __call__(self, *args, **kwargs):
        return inputs

+@add_end_docstrings(PIPELINE_INIT_ARGS)
 class QuestionAnsweringPipeline(Pipeline):
    """
-    Question Answering pipeline using ModelForQuestionAnswering head. See the
-    `question answering usage <../usage.html#question-answering>`__ examples for more information.
-
-    This question answering can currently be loaded from the :func:`~transformers.pipeline` method using
-    the following task identifier(s):
+    Question Answering pipeline using any :obj:`ModelForQuestionAnswering`. See the
+    `question answering examples <../task_summary.html#question-answering>`__ for more information.

-    - "question-answering", for answering questions given a context.
+    This question answering pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"question-answering"`.

    The models that this pipeline can use are models that have been fine-tuned on a question answering task.
    See the up-to-date list of available models on
    `huggingface.co/models `__.
-
-    Arguments:
-        model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`):
-            The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
-            :class:`~transformers.PreTrainedTokenizer`.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
    """

    default_input_names = "question,context"
@@ -1423,15 +1530,19 @@ def create_sample(
        question: Union[str, List[str]], context: Union[str, List[str]]
    ) -> Union[SquadExample, List[SquadExample]]:
        """
-        QuestionAnsweringPipeline leverages the SquadExample/SquadFeatures internally.
-        This helper method encapsulate all the logic for converting question(s) and context(s) to SquadExample(s).
+        QuestionAnsweringPipeline leverages the :class:`~transformers.SquadExample` internally.
+        This helper method encapsulates all the logic for converting question(s) and context(s) to
+        :class:`~transformers.SquadExample`.
+
+        We currently support extractive question answering.
+
        Arguments:
-            question: (str, List[str]) The question to be ask for the associated context
-            context: (str, List[str]) The context in which we will look for the answer.
+            question (:obj:`str` or :obj:`List[str]`): The question(s) asked.
+            context (:obj:`str` or :obj:`List[str]`): The context(s) in which we will look for the answer.

        Returns:
-            SquadExample initialized with the corresponding question and context.
+            One or a list of :class:`~transformers.SquadExample`: The corresponding
+            :class:`~transformers.SquadExample` grouping question and context.
        """
        if isinstance(question, list):
            return [SquadExample(None, q, c, None, None, None) for q, c in zip(question, context)]
@@ -1440,18 +1551,45 @@ def create_sample(
    def __call__(self, *args, **kwargs):
        """
+        Answer the question(s) given as inputs by using the context(s).
+
+        Args:
-            We support multiple use-cases, the following are exclusive:
-            X: sequence of SquadExample
-            data: sequence of SquadExample
-            question: (str, List[str]), batch of question(s) to map along with context
-            context: (str, List[str]), batch of context(s) associated with the provided question keyword argument
-        Returns:
-            dict: {'answer': str, 'score": float, 'start": int, "end": int}
-            answer: the textual answer in the intial context
-            score: the score the current answer scored for the model
-            start: the character index in the original string corresponding to the beginning of the answer' span
-            end: the character index in the original string corresponding to the ending of the answer' span
+            args (:class:`~transformers.SquadExample` or a list of :class:`~transformers.SquadExample`):
+                One or several :class:`~transformers.SquadExample` containing the question and context.
+            X (:class:`~transformers.SquadExample` or a list of :class:`~transformers.SquadExample`, `optional`):
+                One or several :class:`~transformers.SquadExample` containing the question and context
+                (will be treated the same way as if passed as the first positional argument).
+            data (:class:`~transformers.SquadExample` or a list of :class:`~transformers.SquadExample`, `optional`):
+                One or several :class:`~transformers.SquadExample` containing the question and context
+                (will be treated the same way as if passed as the first positional argument).
+            question (:obj:`str` or :obj:`List[str]`):
+                One or several question(s) (must be used in conjunction with the :obj:`context` argument).
+            context (:obj:`str` or :obj:`List[str]`):
+                One or several context(s) associated with the question(s) (must be used in conjunction with the
+                :obj:`question` argument).
+            topk (:obj:`int`, `optional`, defaults to 1):
+                The number of answers to return (will be chosen by order of likelihood).
+            doc_stride (:obj:`int`, `optional`, defaults to 128):
+                If the context is too long to fit with the question for the model, it will be split in several chunks
+                with some overlap. This argument controls the size of that overlap.
+            max_answer_len (:obj:`int`, `optional`, defaults to 15):
+                The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
+            max_seq_len (:obj:`int`, `optional`, defaults to 384):
+                The maximum length of the total sentence (context + question) after tokenization. The context will be
+                split in several chunks (using :obj:`doc_stride`) if needed.
+            max_question_len (:obj:`int`, `optional`, defaults to 64):
+                The maximum length of the question after tokenization. It will be truncated if needed.
+            handle_impossible_answer (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not we accept impossible as an answer.
+
+        Return:
+            A :obj:`dict` or a list of :obj:`dict`: Each result comes as a dictionary with the
+            following keys:
+
+            - **score** (:obj:`float`) -- The probability associated to the answer.
+            - **start** (:obj:`int`) -- The start index of the answer (in the tokenized version of the input).
+            - **end** (:obj:`int`) -- The end index of the answer (in the tokenized version of the input).
+            - **answer** (:obj:`str`) -- The answer to the question.
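+
+        Example (a minimal sketch; assumes the default checkpoint selected for this task)::
+
+            from transformers import pipeline
+
+            qa = pipeline("question-answering")
+            qa(question="Where does Wolfgang live?", context="My name is Wolfgang and I live in Berlin.")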
""" # Set defaults values kwargs.setdefault("topk", 1) @@ -1551,17 +1689,18 @@ def __call__(self, *args, **kwargs): def decode(self, start: np.ndarray, end: np.ndarray, topk: int, max_answer_len: int) -> Tuple: """ - Take the output of any QuestionAnswering head and will generate probalities for each span to be + Take the output of any :obj:`ModelForQuestionAnswering` and will generate probalities for each span to be the actual answer. + In addition, it filters out some unwanted/impossible cases like answer len being greater than max_answer_len or answer end position being before the starting position. The method supports output the k-best answer through the topk argument. Args: - start: numpy array, holding individual start probabilities for each token - end: numpy array, holding individual end probabilities for each token - topk: int, indicates how many possible answer span(s) to extract from the model's output - max_answer_len: int, maximum size of the answer to extract from the model's output + start (:obj:`np.ndarray`): Individual start probabilities for each token. + end (:obj:`np.ndarray`): Individual end probabilities for each token. + topk (:obj:`int`): Indicates how many possible answer span(s) to extract from the model output. + max_answer_len (:obj:`int`): Maximum size of the answer to extract from the model's output. """ # Ensure we have batch axis if start.ndim == 1: @@ -1589,18 +1728,18 @@ def decode(self, start: np.ndarray, end: np.ndarray, topk: int, max_answer_len: start, end = np.unravel_index(idx_sort, candidates.shape)[1:] return start, end, candidates[0, start, end] - def span_to_answer(self, text: str, start: int, end: int): + def span_to_answer(self, text: str, start: int, end: int) -> Dict[str, Union[str, int]]: """ When decoding from token probalities, this method maps token indexes to actual word in the initial context. Args: - text: str, the actual context to extract the answer from - start: int, starting answer token index - end: int, ending answer token index + text (:obj:`str`): The actual context to extract the answer from. + start (:obj:`int`): The answer starting token index. + end (:obj:`int`): The answer end token index. Returns: - dict: {'answer': str, 'start': int, 'end': int} + Dictionary like :obj:`{'answer': str, 'start': int, 'end': int}` """ words = [] token_idx = char_start_idx = char_end_idx = chars_idx = 0 @@ -1634,9 +1773,18 @@ def span_to_answer(self, text: str, start: int, end: int): } +@add_end_docstrings(PIPELINE_INIT_ARGS) class SummarizationPipeline(Pipeline): """ - Summarize news articles and other documents + Summarize news articles and other documents. + + This summarizing pipeline can currently be loaded from :func:`~transformers.pipeline` using the following + task identifier: :obj:`"summarization"`. + + The models that this pipeline can use are models that have been fine-tuned on a summarization task, + which is currently, '`bart-large-cnn`', '`t5-small`', '`t5-base`', '`t5-large`', '`t5-3b`', '`t5-11b`'. + See the up-to-date list of available models on + `huggingface.co/models `__. 
@@ -1634,9 +1773,18 @@ def span_to_answer(self, text: str, start: int, end: int):
         }


+@add_end_docstrings(PIPELINE_INIT_ARGS)
 class SummarizationPipeline(Pipeline):
     """
-    Summarize news articles and other documents
+    Summarize news articles and other documents.
+
+    This summarizing pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"summarization"`.
+
+    The models that this pipeline can use are models that have been fine-tuned on a summarization task,
+    which are currently: '`bart-large-cnn`', '`t5-small`', '`t5-base`', '`t5-large`', '`t5-3b`', '`t5-11b`'.
+    See the up-to-date list of available models on
+    `huggingface.co/models <https://huggingface.co/models>`__.

     Usage::
@@ -1647,39 +1795,6 @@ class SummarizationPipeline(Pipeline):

         # use t5 in tf
         summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
         summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=20)
-
-    The models that this pipeline can use are models that have been fine-tuned on a summarization task,
-    which is currently, '`bart-large-cnn`', '`t5-small`', '`t5-base`', '`t5-large`', '`t5-3b`', '`t5-11b`'.
-    See the up-to-date list of available models on
-    `huggingface.co/models <https://huggingface.co/models>`__.
-
-    Arguments:
-        model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`, defaults to :obj:`None`):
-            The model that will be used by the pipeline to make predictions. This can be :obj:`None`, a string
-            checkpoint identifier or an actual pre-trained model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-
-            If :obj:`None`, the default of the pipeline will be loaded.
-        tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`, defaults to :obj:`None`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This can be :obj:`None`,
-            a string checkpoint identifier or an actual pre-trained tokenizer inheriting from
-            :class:`~transformers.PreTrainedTokenizer`.
-
-            If :obj:`None`, the default of the pipeline will be loaded.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
     """

     def __init__(self, *args, **kwargs):
@@ -1694,20 +1809,29 @@ def __call__(
         self, *documents, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs
     ):
         r"""
-        Args:
-            *documents: (list of strings) articles to be summarized
-            return_text: (bool, default=True) whether to add a decoded "summary_text" to each result
-            return_tensors: (bool, default=False) whether to return the raw "summary_token_ids" to each result
-
-            clean_up_tokenization_spaces: (`optional`) bool whether to include extra spaces in the output
-            **generate_kwargs: extra kwargs passed to `self.model.generate`_
+        Summarize the text(s) given as inputs.

-        Returns:
-            list of dicts with 'summary_text' and/or 'summary_token_ids' for each document_to_summarize
+        Args:
+            documents (:obj:`str` or :obj:`List[str]`):
+                One or several articles (or one list of articles) to summarize.
+            return_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
+                Whether or not to include the decoded texts in the outputs.
+            return_tensors (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to include the tensors of predictions (as token indices) in the outputs.
+            clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to clean up the potential extra spaces in the text output.
+            generate_kwargs:
+                Additional keyword arguments to pass along to the generate method of the model (see the generate
+                method corresponding to your framework `here <./model.html#generative-models>`__).

-        .. _`self.model.generate`:
-            https://huggingface.co/transformers/model_doc/bart.html#transformers.BartForConditionalGeneration.generate
+        Return:
+            A list or a list of lists of :obj:`dict`: Each result comes as a dictionary with the
+            following keys:
+
+            - **summary_text** (:obj:`str`, present when ``return_text=True``) -- The summary of the corresponding
+              input.
+            - **summary_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
+              -- The token ids of the summary.
         """
         assert return_tensors or return_text, "You must specify return_tensors=True or return_text=True"
         assert len(documents) > 0, "Please provide a document to summarize"
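Concretely, the ``__call__`` arguments documented above combine as in the following sketch; the checkpoint names mirror the docstring's own examples and are downloaded on first use::

    from transformers import pipeline

    summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")

    # min_length and max_length are not pipeline arguments: they travel through
    # **generate_kwargs to the model's generate method.
    result = summarizer(
        "Sam Shleifer writes the best docstring examples in the whole world.",
        min_length=5,
        max_length=20,
        return_text=True,
    )
    # result is a list like [{"summary_text": "..."}]; with return_tensors=True
    # each dict would also carry "summary_token_ids".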
@@ -1779,43 +1903,21 @@ def __call__(
         return results


+@add_end_docstrings(PIPELINE_INIT_ARGS)
 class TranslationPipeline(Pipeline):
     """
     Translates from one language to another.

-    Usage::
-        en_fr_translator = pipeline("translation_en_to_fr")
-        en_fr_translator("How old are you?")
+    This translation pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"translation_xx_to_yy"`.

-    The models that this pipeline can use are models that have been fine-tuned on a translation task,
-    currently: "t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"
+    The models that this pipeline can use are models that have been fine-tuned on a translation task.
     See the up-to-date list of available models on
     `huggingface.co/models <https://huggingface.co/models>`__.

-    Arguments:
-        model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`, defaults to :obj:`None`):
-            The model that will be used by the pipeline to make predictions. This can be :obj:`None`, a string
-            checkpoint identifier or an actual pre-trained model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-            If :obj:`None`, the default of the pipeline will be loaded.
-        tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`, defaults to :obj:`None`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This can be :obj:`None`,
-            a string checkpoint identifier or an actual pre-trained tokenizer inheriting from
-            :class:`~transformers.PreTrainedTokenizer`.
-            If :obj:`None`, the default of the pipeline will be loaded.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
+    Usage::
+        en_fr_translator = pipeline("translation_en_to_fr")
+        en_fr_translator("How old are you?")
     """

     def __init__(self, *args, **kwargs):
@@ -1829,17 +1931,28 @@ def __call__(
         self, *args, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs
     ):
         r"""
+        Translate the text(s) given as inputs.
+
         Args:
-            *args: (list of strings) texts to be translated
-            return_text: (bool, default=True) whether to add a decoded "translation_text" to each result
-            return_tensors: (bool, default=False) whether to return the raw "translation_token_ids" to each result
+            args (:obj:`str` or :obj:`List[str]`):
+                Texts to be translated.
+            return_tensors (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to include the tensors of predictions (as token indices) in the outputs.
+            return_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
+                Whether or not to include the decoded texts in the outputs.
+            clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to clean up the potential extra spaces in the text output.
+            generate_kwargs:
+                Additional keyword arguments to pass along to the generate method of the model (see the generate
+                method corresponding to your framework `here <./model.html#generative-models>`__).

-            **generate_kwargs: extra kwargs passed to `self.model.generate`_
+        Return:
+            A list or a list of lists of :obj:`dict`: Each result comes as a dictionary with the
+            following keys:

-        Returns:
-            list of dicts with 'translation_text' and/or 'translation_token_ids' for each text_to_translate
-        .. _`self.model.generate`:
-            https://huggingface.co/transformers/model_doc/bart.html#transformers.BartForConditionalGeneration.generate
+
+            - **translation_text** (:obj:`str`, present when ``return_text=True``) -- The translation.
+            - **translation_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
+              -- The token ids of the translation.
         """
         assert return_tensors or return_text, "You must specify return_tensors=True or return_text=True"
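Mirroring the ``Usage::`` block above, a short sketch of a call combining the newly documented keyword arguments; the produced French text is left elided rather than invented::

    from transformers import pipeline

    # Loads the default checkpoint registered for this task.
    en_fr_translator = pipeline("translation_en_to_fr")

    outputs = en_fr_translator(
        "How old are you?",
        return_text=True,
        return_tensors=False,
        clean_up_tokenization_spaces=True,
    )
    # outputs is a list like [{"translation_text": "..."}].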
@@ -1901,10 +2014,20 @@ def __call__(
 class Conversation:
     """
     Utility class containing a conversation and its history. This class is meant to be used as an input to the
-    :obj:`~transformers.ConversationalPipeline`. The conversation contains a number of utility function to manage the addition of new
-    user input and generated model responses. A conversation needs to contain an unprocessed user input before being
-    passed to the :obj:`~transformers.ConversationalPipeline`. This user input is either created when the class is instantiated, or by calling
-    `append_response("input")` after a conversation turn.
+    :class:`~transformers.ConversationalPipeline`. The conversation contains a number of utility functions to manage
+    the addition of new user input and generated model responses. A conversation needs to contain an unprocessed user
+    input before being passed to the :class:`~transformers.ConversationalPipeline`. This user input is either created
+    when the class is instantiated, or by calling :obj:`conversation.add_user_input("input")` after a conversation
+    turn.
+
+    Arguments:
+        text (:obj:`str`, `optional`):
+            The initial user input to start the conversation. If not provided, a user input needs to be provided
+            manually using the :meth:`~transformers.Conversation.add_user_input` method before the conversation can
+            begin.
+        conversation_id (:obj:`uuid.UUID`, `optional`):
+            Unique identifier for the conversation. If not provided, a random UUID4 id will be assigned to the
+            conversation.

     Usage::

         conversation = Conversation("Going to the movies tonight - any suggestions?")
@@ -1917,14 +2040,6 @@ class Conversation:

         conversation.append_response("The Big lebowski.")
         conversation.add_user_input("Is it good?")
-
-    Arguments:
-        text (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The initial user input to start the conversation.
-            If :obj:`None`, a user input needs to be provided manually using `add_user_input` before the conversation can begin.
-        conversation_id (:obj:`uuid.UUID`, `optional`, defaults to :obj:`None`):
-            Unique identifier for the conversation
-            If :obj:`None`, the random UUID4 id will be assigned to the conversation.
     """

     def __init__(self, text: str = None, conversation_id: UUID = None):
@@ -1938,12 +2053,13 @@ def __init__(self, text: str = None, conversation_id: UUID = None):

     def add_user_input(self, text: str, overwrite: bool = False):
         """
-        Add a user input to the conversation for the next round. This populates the internal `new_user_input` field.
+        Add a user input to the conversation for the next round. This populates the internal :obj:`new_user_input`
+        field.

         Args:
-            text: str, the user input for the next conversation round
-            overwrite: bool, flag indicating if existing and unprocessed user input should be overwritten when this function is called
-
+            text (:obj:`str`): The user input for the next conversation round.
+            overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not existing and unprocessed user input should be overwritten when this function is called.
         """
         if self.new_user_input:
             if overwrite:
@@ -1963,8 +2079,8 @@ def add_user_input(self, text: str, overwrite: bool = False):

     def mark_processed(self):
         """
-        Mark the conversation as processed (moves the content of `new_user_input` to `past_user_inputs`) and empties the
-        `new_user_input` field.
+        Mark the conversation as processed (moves the content of :obj:`new_user_input` to :obj:`past_user_inputs`) and
+        empties the :obj:`new_user_input` field.
         """
         if self.new_user_input:
             self.past_user_inputs.append(self.new_user_input)
@@ -1975,17 +2091,17 @@ def append_response(self, response: str):
         Append a response to the list of generated responses.

         Args:
-            response: str, the model generated response
+            response (:obj:`str`): The model generated response.
         """
         self.generated_responses.append(response)

     def set_history(self, history: List[int]):
         """
-        Updates the value of the history of the conversation. The history is represented by a list of `token_ids`. The
-        history is used by the model to generate responses based on the previous conversation turns.
+        Updates the value of the history of the conversation. The history is represented by a list of :obj:`token_ids`.
+        The history is used by the model to generate responses based on the previous conversation turns.

         Args:
-            history: (list of int), history of tokens provided and generated for this conversation
+            history (:obj:`List[int]`): History of tokens provided and generated for this conversation.
         """
         self.history = history
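The four methods documented in the hunks above manage a conversation's state without touching any model; a small sketch of the full round trip, where a pipeline would normally drive the last three calls itself::

    from transformers import Conversation

    conversation = Conversation("Going to the movies tonight - any suggestions?")

    # Pending, unprocessed input can only be replaced explicitly.
    conversation.add_user_input("Any comedies instead?", overwrite=True)

    # A ConversationalPipeline performs these steps internally:
    conversation.mark_processed()                      # new_user_input -> past_user_inputs
    conversation.append_response("The Big Lebowski.")  # record the model's reply
    conversation.set_history([0, 1, 2])                # dummy token ids standing in for model output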
         Generates a string representation of the conversation.

         Return:
-            :obj:`str` or :obj:`Dict`:
+            :obj:`str`:

         Example:
             Conversation id: 7d15686b-dc94-49f2-9c4b-c9eac6a1f114
@@ -2010,10 +2126,25 @@ def __repr__(self):
         return output


+@add_end_docstrings(
+    PIPELINE_INIT_ARGS,
+    r"""
+        min_length_for_response (:obj:`int`, `optional`, defaults to 32):
+            The minimum length (in number of tokens) for a response.
+    """,
+)
 class ConversationalPipeline(Pipeline):
     """
     Multi-turn conversational pipeline.

+    This conversational pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
+    task identifier: :obj:`"conversational"`.
+
+    The models that this pipeline can use are models that have been fine-tuned on a multi-turn conversational task,
+    currently: `'microsoft/DialoGPT-small'`, `'microsoft/DialoGPT-medium'`, `'microsoft/DialoGPT-large'`.
+    See the up-to-date list of available models on
+    `huggingface.co/models <https://huggingface.co/models>`__.
+
     Usage::

         conversational_pipeline = pipeline("conversational")
@@ -2027,36 +2158,6 @@ class ConversationalPipeline(Pipeline):
         conversation_2.add_user_input("What is the genre of this book?")

         conversational_pipeline([conversation_1, conversation_2])
-
-    The models that this pipeline can use are models that have been fine-tuned on a multi-turn conversational task,
-    currently: "microsoft/DialoGPT-small", "microsoft/DialoGPT-medium", "microsoft/DialoGPT-large"
-    See the up-to-date list of available models on
-    `huggingface.co/models <https://huggingface.co/models>`__.
-
-    Arguments:
-        model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`, defaults to :obj:`None`):
-            The model that will be used by the pipeline to make predictions. This can be :obj:`None`, a string
-            checkpoint identifier or an actual pre-trained model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-            If :obj:`None`, the default of the pipeline will be loaded.
-        tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`, defaults to :obj:`None`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This can be :obj:`None`,
-            a string checkpoint identifier or an actual pre-trained tokenizer inheriting from
-            :class:`~transformers.PreTrainedTokenizer`.
-            If :obj:`None`, the default of the pipeline will be loaded.
-        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`, defaults to :obj:`None`):
-            Model card attributed to the model for this pipeline.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
-            If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
-        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`, defaults to :obj:`None`):
-            Reference to the object in charge of parsing supplied pipeline parameters.
-        device (:obj:`int`, `optional`, defaults to :obj:`-1`):
-            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, >=0 will run the model
-            on the associated CUDA device id.
     """

     def __init__(self, min_length_for_response=32, *args, **kwargs):
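A sketch of how the :obj:`min_length_for_response` init argument documented above can be passed through the factory; the DialoGPT checkpoint comes from the docstring's own model list, and forwarding extra keyword arguments to the pipeline's init is the behaviour documented for :func:`~transformers.pipeline` below::

    from transformers import Conversation, pipeline

    # Extra keyword arguments given to pipeline() are passed along to the
    # pipeline class's __init__, so min_length_for_response lands there.
    chatbot = pipeline(
        "conversational",
        model="microsoft/DialoGPT-small",
        min_length_for_response=32,
    )

    conversation = Conversation("Going to the movies tonight - any suggestions?")
    result = chatbot(conversation)
    # result is the same Conversation object, now holding a generated response.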
@@ -2075,12 +2176,20 @@ def __call__(
         **generate_kwargs
     ):
         r"""
+        Generate responses for the conversation(s) given as inputs.
+
+        Args:
-            conversations: (list of :class:`~transformers.pipelines.Conversation`) Conversations to generate responses for
-            **generate_kwargs: extra kwargs passed to `self.model.generate`_
+            conversations (a :class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`):
+                Conversations to generate responses for.
+            clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not to clean up the potential extra spaces in the text output.
+            generate_kwargs:
+                Additional keyword arguments to pass along to the generate method of the model (see the generate
+                method corresponding to your framework `here <./model.html#generative-models>`__).

         Returns:
-            list of conversations with updated generated responses for those containing a new user input
+            :class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`: Conversation(s) with
+            updated generated responses for those containing a new user input.
         """

         # Input validation
@@ -2315,56 +2424,58 @@ def pipeline(
     **kwargs
 ) -> Pipeline:
     """
-    Utility factory method to build a pipeline.
+    Utility factory method to build a :class:`~transformers.Pipeline`.

-    Pipeline are made of:
-
-        - A Tokenizer instance in charge of mapping raw textual input to token
-        - A Model instance
-        - Some (optional) post processing for enhancing model's output
+    Pipelines are made of:
+
+        - A :doc:`tokenizer <tokenizer>` in charge of mapping raw textual input to tokens.
+        - A :doc:`model <model>` to make predictions from the inputs.
+        - Some (optional) post processing for enhancing the model's output.

     Args:
         task (:obj:`str`):
             The task defining which pipeline will be returned. Currently accepted tasks are:

-            - "feature-extraction": will return a :class:`~transformers.FeatureExtractionPipeline`
-            - "sentiment-analysis": will return a :class:`~transformers.TextClassificationPipeline`
-            - "ner": will return a :class:`~transformers.TokenClassificationPipeline`
-            - "question-answering": will return a :class:`~transformers.QuestionAnsweringPipeline`
-            - "fill-mask": will return a :class:`~transformers.FillMaskPipeline`
-            - "summarization": will return a :class:`~transformers.SummarizationPipeline`
-            - "translation_xx_to_yy": will return a :class:`~transformers.TranslationPipeline`
-            - "text-generation": will return a :class:`~transformers.TextGenerationPipeline`
-        model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`, defaults to :obj:`None`):
-            The model that will be used by the pipeline to make predictions. This can be :obj:`None`,
-            a model identifier or an actual pre-trained model inheriting from
-            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
-            TensorFlow.
-
-            If :obj:`None`, the default for this pipeline will be loaded.
-        config (:obj:`str` or :obj:`~transformers.PretrainedConfig`, `optional`, defaults to :obj:`None`):
-            The configuration that will be used by the pipeline to instantiate the model. This can be :obj:`None`,
-            a model identifier or an actual pre-trained model configuration inheriting from
+            - :obj:`"feature-extraction"`: will return a :class:`~transformers.FeatureExtractionPipeline`.
+            - :obj:`"sentiment-analysis"`: will return a :class:`~transformers.TextClassificationPipeline`.
+            - :obj:`"ner"`: will return a :class:`~transformers.TokenClassificationPipeline`.
+            - :obj:`"question-answering"`: will return a :class:`~transformers.QuestionAnsweringPipeline`.
+            - :obj:`"fill-mask"`: will return a :class:`~transformers.FillMaskPipeline`.
+            - :obj:`"summarization"`: will return a :class:`~transformers.SummarizationPipeline`.
+            - :obj:`"translation_xx_to_yy"`: will return a :class:`~transformers.TranslationPipeline`.
+            - :obj:`"text-generation"`: will return a :class:`~transformers.TextGenerationPipeline`.
+            - :obj:`"conversational"`: will return a :class:`~transformers.ConversationalPipeline`.
+        model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`):
+            The model that will be used by the pipeline to make predictions. This can be a model identifier or an
+            actual instance of a pretrained model inheriting from :class:`~transformers.PreTrainedModel` (for PyTorch)
+            or :class:`~transformers.TFPreTrainedModel` (for TensorFlow).
+
+            If not provided, the default for the :obj:`task` will be loaded.
+        config (:obj:`str` or :obj:`~transformers.PretrainedConfig`, `optional`):
+            The configuration that will be used by the pipeline to instantiate the model. This can be a model
+            identifier or an actual pretrained model configuration inheriting from
             :class:`~transformers.PretrainedConfig`.

-            If :obj:`None`, the default for this pipeline will be loaded.
-        tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`, defaults to :obj:`None`):
-            The tokenizer that will be used by the pipeline to encode data for the model. This can be :obj:`None`,
-            a model identifier or an actual pre-trained tokenizer inheriting from
+            If not provided, the default for the :obj:`task` will be loaded.
+        tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`):
+            The tokenizer that will be used by the pipeline to encode data for the model. This can be a model
+            identifier or an actual pretrained tokenizer inheriting from
             :class:`~transformers.PreTrainedTokenizer`.

-            If :obj:`None`, the default for this pipeline will be loaded.
-        framework (:obj:`str`, `optional`, defaults to :obj:`None`):
-            The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be
-            installed.
+            If not provided, the default for the :obj:`task` will be loaded.
+        framework (:obj:`str`, `optional`):
+            The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
+            must be installed.

             If no framework is specified, will default to the one currently installed. If no framework is specified
-            and both frameworks are installed, will default to PyTorch.
+            and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no
+            model is provided.
+        kwargs:
+            Additional keyword arguments passed along to the specific pipeline init (see the documentation for the
+            corresponding pipeline class for possible values).

     Returns:
-        :class:`~transformers.Pipeline`: Class inheriting from :class:`~transformers.Pipeline`, according to
-        the task.
+        :class:`~transformers.Pipeline`: A suitable pipeline for the task.

     Examples::