In this tutorial we will learn about **pipeline()**, an easy way of using models for inference. Pipelines are objects that abstract most of the complex code in the library, offering a simple API dedicated to several tasks such as Sentiment Analysis, Masked Language Modeling, Feature Extraction, and so on. <br>

*Pipelines* are made of: 

- A *tokenizer* in charge of mapping raw textual input to tokens.
- A *model* to make predictions from the inputs.
- Some (optional) post-processing to enhance the model's output.
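To see how these three parts fit together, here is a toy, pure-Python sketch of the same structure. It does not use transformers at all: `toy_tokenizer`, `toy_model`, `toy_postprocess`, `toy_pipeline`, and the word lists are hypothetical stand-ins, and the "model" is a word-counting rule rather than a neural network.

```python
import math


def toy_tokenizer(text: str) -> list:
    """Map raw text to tokens: lowercase, drop punctuation, split on whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


# Hypothetical vocabularies standing in for learned model weights.
POSITIVE_WORDS = {"like", "love", "great", "good"}
NEGATIVE_WORDS = {"sucks", "waste", "bad", "boring"}


def toy_model(tokens: list) -> list:
    """Return raw scores (logits) as [negative, positive] by counting sentiment words."""
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return [float(neg), float(pos)]


def toy_postprocess(logits: list) -> dict:
    """Turn logits into a label plus a softmax probability, shaped like the pipeline output."""
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # ties go to NEGATIVE (index 0)
    return {"label": ["NEGATIVE", "POSITIVE"][best], "score": probs[best]}


def toy_pipeline(text: str) -> dict:
    """Chain the three stages: tokenizer -> model -> post-processing."""
    return toy_postprocess(toy_model(toy_tokenizer(text)))


print(toy_pipeline("The movie sucks."))
print(toy_pipeline("I like the movie."))
```

The real pipeline follows the same flow, but its tokenizer produces token ids, its model is a trained network, and its post-processing is specific to the task.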



In [1]:
##importing pipeline 
from transformers import pipeline


### Understanding Parameters of pipeline()
We can call a pipeline for any of the tasks mentioned above by setting the "task" parameter. 

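- task: defines which pipeline will be returned. Some accepted "task" values are:
    - "audio-classification": returns an AudioClassificationPipeline
    - "text-classification": returns a TextClassificationPipeline
    - "image-classification": returns an ImageClassificationPipeline
    - "image-feature-extraction": returns an ImageFeatureExtractionPipeline
    - "document-question-answering": returns a DocumentQuestionAnsweringPipeline
    - "fill-mask": returns a FillMaskPipeline
    - "image-to-image": returns an ImageToImagePipeline
    - and so on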
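Conceptually, "task" works like a key into a registry of pipeline classes. The sketch below mimics that lookup in plain Python using only the tasks listed above; `TASK_TO_PIPELINE` and `resolve_task` are hypothetical names, and the real registry inside transformers holds many more tasks together with a default model for each.

```python
# Hypothetical, simplified registry: task string -> pipeline class name.
TASK_TO_PIPELINE = {
    "audio-classification": "AudioClassificationPipeline",
    "text-classification": "TextClassificationPipeline",
    "image-classification": "ImageClassificationPipeline",
    "image-feature-extraction": "ImageFeatureExtractionPipeline",
    "document-question-answering": "DocumentQuestionAnsweringPipeline",
    "fill-mask": "FillMaskPipeline",
    "image-to-image": "ImageToImagePipeline",
}


def resolve_task(task: str) -> str:
    """Return the pipeline class name for `task`, failing loudly for unknown tasks."""
    if task not in TASK_TO_PIPELINE:
        raise KeyError(f"Unknown task {task!r}, available tasks are {sorted(TASK_TO_PIPELINE)}")
    return TASK_TO_PIPELINE[task]


print(resolve_task("fill-mask"))  # FillMaskPipeline
```

In the installed library, `transformers.pipelines.get_supported_tasks()` should return the full list of accepted task strings; it is worth checking against the version you have installed.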

In [2]:
### We can call a pipeline for different tasks by setting the "task" parameter.

sentiment_pipe=pipeline(task="text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [3]:
model_name=sentiment_pipe.model.config.name_or_path
print('Model Name: ',model_name)

Model Name:  distilbert/distilbert-base-uncased-finetuned-sst-2-english


In [4]:
sentiment_pipe.model ## this is the model architecture

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [5]:
### now we can just feed the input for classification
single_input="The movie sucks."
sentiment_pipe(single_input)


[{'label': 'NEGATIVE', 'score': 0.9996511936187744}]

In [6]:
multiple_inputs=["I like the movie.","The movie was waste of time."] ## multiple inputs must be provided as a list
sentiment_pipe(multiple_inputs) ## returns a list with one result per input

[{'label': 'POSITIVE', 'score': 0.9998428821563721},
 {'label': 'NEGATIVE', 'score': 0.9998151659965515}]
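Because the pipeline returns one dictionary per input, in the same order as the inputs, working with the results is ordinary Python. The sketch below reuses the output printed above as literal data (there is no model call), so it runs without transformers installed; `results` and `THRESHOLD` are illustrative names, and the 0.9 threshold is an arbitrary example value.

```python
multiple_inputs = ["I like the movie.", "The movie was waste of time."]

# Output copied from the cell above: one dict per input, in the same order.
results = [
    {"label": "POSITIVE", "score": 0.9998428821563721},
    {"label": "NEGATIVE", "score": 0.9998151659965515},
]

THRESHOLD = 0.9  # arbitrary confidence cut-off for this example

# Pair each input with its prediction.
for text, result in zip(multiple_inputs, results):
    confident = result["score"] >= THRESHOLD
    print(f"{text!r}: {result['label']} ({result['score']:.4f}, confident={confident})")

# Keep only the inputs predicted as NEGATIVE.
negative_texts = [t for t, r in zip(multiple_inputs, results) if r["label"] == "NEGATIVE"]
print(negative_texts)
```

With the real pipeline, `results = sentiment_pipe(multiple_inputs)` would replace the literal list.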

### Parameters: model, config, tokenizer, feature_extractor

- model (str or PreTrainedModel or TFPreTrainedModel, *optional*): The model that will be used by the pipeline to make predictions. This can be a model identifier (a "str") or an actual instance of a pretrained model inheriting from PreTrainedModel (PyTorch) or TFPreTrainedModel (TensorFlow).

    Pass the model identifier to load the model directly from the Hugging Face Hub. *When "model" is given as an identifier (a "str"), the "task" parameter can be omitted, since the task can be inferred from the model on the Hub. However, when "model" is a pretrained model instance, the "task" parameter must also be provided. If no model is provided at all, the default model for the task is used.*

- config (str or PretrainedConfig, *optional*): The configuration that will be used by the pipeline to instantiate the model. This can be a model identifier (a "str") or an actual pretrained model configuration inheriting from PretrainedConfig. 
    **If not provided**, the default configuration is used: if a model is given, the model's default config is used; if no model is given, the config of the task's default model is used.

- tokenizer (str or PreTrainedTokenizer, *optional*): The tokenizer that will be used by the pipeline to encode data for the model. This can be a model identifier (a "str") or an actual pretrained tokenizer inheriting from PreTrainedTokenizer. 

    **If not provided**, the default tokenizer for the *given model* is loaded (*if the model is a string*). If the model is not given or is not a string, the default tokenizer for the *config* is loaded (if the config is a string). However, if the *config* is also not given or not a string, the default tokenizer for the given task is used. 

- feature_extractor (str or PreTrainedFeatureExtractor, *optional*): The feature extractor that will be used by the pipeline to encode data for the model. This can be a model identifier or an actual pretrained feature extractor inheriting from *PreTrainedFeatureExtractor*.

    **Feature extractors are used for non-NLP models (speech or vision models); NLP models use a "tokenizer" to encode the data.** Multimodal models use both: a "tokenizer" for text and a "feature_extractor" for the other modalities (vision, audio).

    The selection of "feature_extractor" works similarly to that of the "tokenizer" parameter.
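The fallback order for "tokenizer" described above can be written as a small decision function. This is a hypothetical, pure-Python sketch of the documented rules, not the library's actual code: `resolve_tokenizer_source` and the `"default tokenizer for task ..."` string are made up for illustration, and the real `pipeline()` handles more cases (local paths, feature extractors, processors).

```python
from typing import Optional, Union


def resolve_tokenizer_source(
    task: Optional[str] = None,
    model: Union[str, object, None] = None,
    config: Union[str, object, None] = None,
    tokenizer: Union[str, object, None] = None,
) -> Union[str, object]:
    """Hypothetical sketch of the documented order pipeline() uses to pick a tokenizer."""
    if tokenizer is not None:
        return tokenizer  # explicitly provided: identifier or tokenizer instance
    if isinstance(model, str):
        return model  # default tokenizer for the given model identifier
    if isinstance(config, str):
        return config  # default tokenizer for the given config identifier
    if task is not None:
        return f"default tokenizer for task {task!r}"
    raise ValueError("Need at least a task, a model, or a config to pick a tokenizer")


print(resolve_tokenizer_source(model="distilbert/distilbert-base-uncased-finetuned-sst-2-english"))
print(resolve_tokenizer_source(task="text-classification"))
```

The same shape applies to "feature_extractor", with the feature extractor taking the place of the tokenizer.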


### In the following, we will call a pipeline using identifiers (strings) for the model and tokenizer parameters.

In [7]:
## Retrieve the identifiers used by the sentiment_pipe above
model_id=sentiment_pipe.model.config.name_or_path
## The config identifier is the same value: model_id is read from the config's name_or_path
tokenizer_id=sentiment_pipe.tokenizer.name_or_path

print(f'Model ID:{model_id}\n\ntokenizer_id:{tokenizer_id}')

Model ID:distilbert/distilbert-base-uncased-finetuned-sst-2-english

tokenizer_id:distilbert/distilbert-base-uncased-finetuned-sst-2-english


In [8]:

sentiment_pipe=pipeline(model=model_id) ## model_id must be a model on the Hub (or a local path)

"""
Here, we only provide the model parameter. Thus, the default tokenizer and config for the given model are used, and the task is inferred from the model.
""";

In [10]:
## Providing both the model and the tokenizer identifiers
sentiment_pipe=pipeline(model=model_id,
                        tokenizer=tokenizer_id)

In [11]:
sentiment_pipe.model ## summary of the model architecture

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [12]:
sentiment_pipe(multiple_inputs)

[{'label': 'POSITIVE', 'score': 0.9998428821563721},
 {'label': 'NEGATIVE', 'score': 0.9998151659965515}]

In [13]:
### We can extract the name of the model 
model_name=sentiment_pipe.model.config.name_or_path
print('Model Name: ',model_name)

Model Name:  distilbert/distilbert-base-uncased-finetuned-sst-2-english


What if we provide a model trained for "text-classification" but set the task to "audio-classification"? 

In [14]:
## Passing model_name together with a matching task works as expected

sentiment_pipe=pipeline(task='text-classification',model=model_name)

In [15]:
sentiment_pipe=pipeline(task='audio-classification',model=model_name) ## an error message says the model is not supported for this task

The model 'DistilBertForSequenceClassification' is not supported for audio-classification. Supported models are ['ASTForAudioClassification', 'Data2VecAudioForSequenceClassification', 'HubertForSequenceClassification', 'SEWForSequenceClassification', 'SEWDForSequenceClassification', 'UniSpeechForSequenceClassification', 'UniSpeechSatForSequenceClassification', 'Wav2Vec2ForSequenceClassification', 'Wav2Vec2BertForSequenceClassification', 'Wav2Vec2ConformerForSequenceClassification', 'WavLMForSequenceClassification', 'WhisperForAudioClassification'].
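In the run above, the mismatch appears as a logged error message rather than as a raised exception, so it is easy to miss. One way to fail fast is to check compatibility yourself first. The sketch below is a hypothetical helper, `check_task_compatibility`, that compares an architecture name against the supported list copied verbatim from the message above. With the real library, the architecture name of a Hub checkpoint is usually available from `AutoConfig.from_pretrained(model_id).architectures`; that call is left out here because it needs network access.

```python
# Supported architectures per task, copied from the "audio-classification" message above.
# This mapping is hypothetical and incomplete: it only records that one task.
SUPPORTED_BY_TASK = {
    "audio-classification": [
        "ASTForAudioClassification",
        "Data2VecAudioForSequenceClassification",
        "HubertForSequenceClassification",
        "SEWForSequenceClassification",
        "SEWDForSequenceClassification",
        "UniSpeechForSequenceClassification",
        "UniSpeechSatForSequenceClassification",
        "Wav2Vec2ForSequenceClassification",
        "Wav2Vec2BertForSequenceClassification",
        "Wav2Vec2ConformerForSequenceClassification",
        "WavLMForSequenceClassification",
        "WhisperForAudioClassification",
    ],
}


def check_task_compatibility(architecture: str, task: str) -> None:
    """Raise (instead of only logging) when an architecture does not fit the task."""
    supported = SUPPORTED_BY_TASK.get(task)
    if supported is None:
        raise KeyError(f"No supported-architecture list recorded for task {task!r}")
    if architecture not in supported:
        raise ValueError(
            f"The model '{architecture}' is not supported for {task}. "
            f"Supported models are {supported}."
        )


check_task_compatibility("WhisperForAudioClassification", "audio-classification")  # passes silently
try:
    check_task_compatibility("DistilBertForSequenceClassification", "audio-classification")
except ValueError as err:
    print(err)
```

Raising early keeps an incompatible model from reaching inference, where the failure would be harder to trace.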


### In the following, we will call a pipeline using pretrained instances: PreTrainedModel (model), PretrainedConfig (config), and PreTrainedTokenizer (tokenizer) parameters.

In [16]:
## In order to load pretrained objects (model, config, and tokenizer) we need to import the corresponding Auto classes

from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

## We use AutoModelForSequenceClassification since text classification is a sequence classification task.

In [64]:
print(f'Model ID:{model_id}\n\ntokenizer_id:{tokenizer_id}')

Model ID:distilbert/distilbert-base-uncased-finetuned-sst-2-english

tokenizer_id:distilbert/distilbert-base-uncased-finetuned-sst-2-english


In [65]:
##load the model and tokenizer

model_pretrained=AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=model_id)
tokenizer_pretrained=AutoTokenizer.from_pretrained(pretrained_model_name_or_path=tokenizer_id)

In [66]:
model_pretrained

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [70]:
sentiment_pipe = pipeline(model=model_pretrained, tokenizer=tokenizer_pretrained, task="text-classification")

In [71]:
sentiment_pipe(multiple_inputs)

[{'label': 'POSITIVE', 'score': 0.9998428821563721},
 {'label': 'NEGATIVE', 'score': 0.9998151659965515}]


### pipeline(): Other Parameters

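- framework (*str*, optional): Which framework to use: "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed. If not specified, the installed framework is used; if both are installed, the framework of the model is used, or PyTorch if no model is provided.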

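- revision (str, *optional*, defaults to *main*): The specific version of the model to use when passing a task name or a string model identifier. Since models on the Hub are stored in git repositories, this can be a branch name, a tag name, or a commit id. By default, the *main* branch is used.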

- use_fast (bool, *optional*, defaults to True): Whether or not to use a fast tokenizer (PreTrainedTokenizerFast) if possible.

 
    
