# LLM Tasks with Hugging Face Transformers Pipelines

In [2]:
from transformers import pipeline



### 1. Conversational
Conversational response modelling is the task of generating conversational text that is relevant, coherent, and knowledgeable given a prompt. These models have applications in chatbots and as part of voice assistants.

In [3]:
from transformers import Conversation 

converse = pipeline("conversational")

# Let's have a conversation
conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")
converse([conversation_1, conversation_2])

No model was supplied, defaulted to microsoft/DialoGPT-medium and revision 8bada3b (https://huggingface.co/microsoft/DialoGPT-medium).
Using a pipeline without specifying a model name and revision in production is not recommended.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[Conversation id: 8c5e3aba-a4b4-41c5-988d-24045b444c27 
 user >> Going to the movies tonight - any suggestions? 
 bot >> The Big Lebowski ,
 Conversation id: ebde4284-7a61-4d07-b939-cae90e5c9e5a 
 user >> What's the last book you have read? 
 bot >> The Last Question ]

### 2. Fill-Mask
Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want a statistical understanding of the language on which the model was trained.

In [6]:
unmasker = pipeline("fill-mask")
unmasker("Paris is the <mask> of France")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.48739415407180786,
  'token': 812,
  'token_str': ' capital',
  'sequence': 'Paris is the capital of France'},
 {'score': 0.06109646335244179,
  'token': 1144,
  'token_str': ' heart',
  'sequence': 'Paris is the heart of France'},
 {'score': 0.05793645232915878,
  'token': 1867,
  'token_str': ' Capital',
  'sequence': 'Paris is the Capital of France'},
 {'score': 0.0470590777695179,
  'token': 32357,
  'token_str': ' birthplace',
  'sequence': 'Paris is the birthplace of France'},
 {'score': 0.04390862211585045,
  'token': 29778,
  'token_str': ' envy',
  'sequence': 'Paris is the envy of France'}]
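
Each prediction's `sequence` is the input with `<mask>` replaced by the decoded token. The leading space in `token_str` comes from RoBERTa's byte-level BPE vocabulary, which stores the preceding space as part of the token. The following is a minimal sketch in plain Python, using the first three predictions copied from the output above. The helper name `fill` is hypothetical and not part of the library.

```python
# Predictions copied from the fill-mask output above (top three only).
predictions = [
    {"score": 0.48739415407180786, "token_str": " capital"},
    {"score": 0.06109646335244179, "token_str": " heart"},
    {"score": 0.05793645232915878, "token_str": " Capital"},
]


def fill(template, token_str, mask="<mask>"):
    """Hypothetical helper: substitute the stripped token for the mask."""
    return template.replace(mask, token_str.strip())


template = "Paris is the <mask> of France"

# Pick the highest-scoring candidate and rebuild the full sentence.
best = max(predictions, key=lambda p: p["score"])
best_sentence = fill(template, best["token_str"])
print(best_sentence)
```

The listed scores do not sum to 1 because they are probabilities over the whole vocabulary, and only the top few candidates are returned.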

### 3. Question Answering
Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!

In [7]:
qa_model = pipeline("question-answering")

question = "where do I live?"
context = "My name is John and I live in London"
qa_model(question=question, context=context)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9793685674667358, 'start': 30, 'end': 36, 'answer': 'London'}
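
The `start` and `end` fields are character offsets into the context, so the answer can be recovered by slicing. The sketch below reuses the result shown above and needs no model. The `window` value is an arbitrary choice made for illustration.

```python
context = "My name is John and I live in London"

# Result copied from the question-answering output above.
result = {"score": 0.9793685674667358, "start": 30, "end": 36, "answer": "London"}

# `start` is inclusive and `end` is exclusive, like a Python slice.
span = context[result["start"]:result["end"]]
print(span)

# Show some preceding text, e.g. to highlight the answer inside a document.
window = 10  # hypothetical display width, in characters
snippet = context[max(0, result["start"] - window):result["end"]]
print(snippet)
```

Extractive models like this one always return a span of the context, which is why offsets are available at all.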

### 4. Sentence Similarity
Sentence Similarity is the task of determining how similar two texts are. Sentence similarity models convert input texts into vectors (embeddings) that capture semantic information, then measure how close (similar) those vectors are to each other. This task is particularly useful for information retrieval and clustering/grouping.

In [9]:
from sentence_transformers import SentenceTransformer, util

sentences = ["I'm happy", "I'm full of joy"]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

embedding_1 = model.encode(sentences[0], convert_to_tensor=True)
embedding_2 = model.encode(sentences[1], convert_to_tensor=True)

cosine_scores = util.pytorch_cos_sim(embedding_1, embedding_2)
print(cosine_scores)

tensor([[0.5912]])
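
The score above is a cosine similarity: the dot product of the two embeddings divided by the product of their lengths. As a rough illustration of what `util.pytorch_cos_sim` computes, here is a plain-Python version applied to small made-up vectors (real embeddings from this model have 384 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, chosen only for illustration.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u
w = [3.0, 0.0, -1.0]  # orthogonal to u

print(cosine_similarity(u, v))  # close to 1: same direction
print(cosine_similarity(u, w))  # close to 0: unrelated directions
```

Because the measure depends only on direction, scaling a vector does not change its similarity to another vector.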


### 5. Summarization
Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.

In [10]:
summarizer = pipeline("summarization")

summarizer("Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 142, but your input_length is only 96. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=48)


[{'summary_text': ' Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018 . The city is the centre and seat of government of the region and province of Île-de-France, or Paris Region . Paris Region has an estimated 18 percent of the population of France as of 2017 .'}]

### 6. Table Question Answering
Table Question Answering (Table QA) is the task of answering a question about information in a given table.

In [15]:
import pandas as pd

data = {"Actors": ["Tom Cruise", "Tom Hanks", "Brad Pitt", "Angelina Jolie", "Julia Roberts"], "Number of movies": [42, 47, 53, 46, 44]}
# TAPAS tokenizes every cell as text, so cast the whole table to strings
df = pd.DataFrame.from_dict(data).astype(str)
question = "how many movies does Brad Pitt have?"

tqa = pipeline("table-question-answering", model="google/tapas-large-finetuned-wtq")

tqa(query=question, table=df)['cells'][0]

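Text classification assigns a label to a piece of text. The next cell uses an MNLI model, which classifies the relationship between a premise and a hypothesis as entailment, neutral, or contradiction.

In [16]: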
classifier = pipeline("text-classification", model = "roberta-large-mnli")
classifier("A soccer game with multiple males playing. Some men are playing a sport.")

Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

[{'label': 'ENTAILMENT', 'score': 0.9883742332458496}]

### 7. Text Generation
Generating text is the task of producing new text. These models can, for example, fill in incomplete text or paraphrase.

In [17]:
generator = pipeline("text-generation", model="gpt2")
generator("Hello, I'm a language model", max_length=30, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, I\'m a language modeler. If I do this post, a blog post… and an interview, everyone\'s gonna win!"\n\n'},
 {'generated_text': "Hello, I'm a language model that is really good at capturing an important insight or a key concept and using that. I've done a couple things"},
 {'generated_text': "Hello, I'm a language model. It took the trouble to get my PhD. It's the same thing as every other work, but my colleagues"}]

### 8. Text-to-Text generation
Text-to-Text generation models have a separate pipeline called text2text-generation. This pipeline takes an input string that states the task along with the text to process (for example, a `question:` and `context:` pair) and returns the result of that task.



In [18]:
text2text_generator = pipeline("text2text-generation")
text2text_generator("question: What is 42 ? context: 42 is the answer to life, the universe and everything")

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'generated_text': 'the answer to life, the universe and everything'}]
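
Because the task is encoded in the input string itself, building prompts is plain string formatting. The sketch below shows two hypothetical helpers (`qa_prompt` and `prefixed` are illustrative names, not library functions). The `summarize` and `translate English to German` prefixes are ones the original T5 checkpoints were trained with. Other text2text models may expect different formats.

```python
def qa_prompt(question, context):
    """Hypothetical helper: format a question/context pair as in the cell above."""
    return f"question: {question} context: {context}"


def prefixed(task_prefix, text):
    """Hypothetical helper: prepend a T5-style task prefix to the input text."""
    return f"{task_prefix}: {text}"


prompt = qa_prompt("What is 42 ?", "42 is the answer to life, the universe and everything")
print(prompt)

# Same model, different task: only the prefix changes.
print(prefixed("summarize", "Paris is the capital and most populous city of France."))
print(prefixed("translate English to German", "How are you?"))
```

Any of these strings can be passed to `text2text_generator` in place of the hand-written prompt.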

### 9. Token classification 
Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models could be trained to identify specific entities in a text, such as dates, individuals and places; and PoS tagging would identify, for example, which words in a text are verbs, nouns, and punctuation marks.

In [19]:
classifier = pipeline("ner")
classifier("Hello I'm Omar and I live in Zürich.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model

[{'entity': 'I-PER',
  'score': 0.99770516,
  'index': 5,
  'word': 'Omar',
  'start': 10,
  'end': 14},
 {'entity': 'I-LOC',
  'score': 0.9968976,
  'index': 10,
  'word': 'Zürich',
  'start': 29,
  'end': 35}]
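
The `start` and `end` values are character offsets into the input, and the `I-` prefix on each label belongs to the tagging scheme rather than to the entity type. A small sketch, using the output above as data, that groups the detected text by entity type:

```python
text = "Hello I'm Omar and I live in Zürich."

# Entities copied from the NER output above.
entities = [
    {"entity": "I-PER", "score": 0.99770516, "index": 5, "word": "Omar", "start": 10, "end": 14},
    {"entity": "I-LOC", "score": 0.9968976, "index": 10, "word": "Zürich", "start": 29, "end": 35},
]

by_type = {}
for ent in entities:
    # Drop the tagging-scheme prefix: "I-PER" -> "PER".
    label = ent["entity"].split("-", 1)[-1]
    # Slice the original text instead of trusting `word`, which may be a sub-token.
    surface = text[ent["start"]:ent["end"]]
    by_type.setdefault(label, []).append(surface)

print(by_type)
```

For names that span several tokens, the pipeline's `aggregation_strategy` argument (for example `"simple"`) can merge the pieces before they are returned.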

### 10. Translation
Translation is the task of converting text from one language to another.

In [20]:
model_checkpoint = "Helsinki-NLP/opus-mt-en-fr"
translator = pipeline("translation", model=model_checkpoint)
translator("How are you?")



[{'translation_text': 'Comment allez-vous ?'}]

### 11. Zero-shot text classification
Zero-shot text classification is a task in natural language processing where a model is trained on a set of labeled examples but is then able to classify new examples from previously unseen classes.


In [21]:
pipe = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
pipe(
    "I have a problem with my iphone that needs to be resolved asap!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)



{'sequence': 'I have a problem with my iphone that needs to be resolved asap!',
 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'],
 'scores': [0.5227586627006531,
  0.4581395089626312,
  0.014264822006225586,
  0.0026850018184632063,
  0.002152073197066784]}
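
The result pairs `labels` and `scores` by position, sorted from highest to lowest score. By default the scores are normalized across the candidate labels, which is why they sum to roughly 1. Passing `multi_label=True` scores each label independently instead. A short sketch of reading the output shown above, where the `threshold` value is an arbitrary assumption:

```python
# Labels and scores copied from the zero-shot output above.
result = {
    "labels": ["urgent", "phone", "computer", "not urgent", "tablet"],
    "scores": [
        0.5227586627006531,
        0.4581395089626312,
        0.014264822006225586,
        0.0026850018184632063,
        0.002152073197066784,
    ],
}

# Pair each label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# Labels come back sorted by score, so the first one is the top prediction.
top_label = result["labels"][0]

# Keep every label above a cutoff (hypothetical value, tune for your data).
threshold = 0.1
confident = [label for label, score in label_scores.items() if score >= threshold]

total = sum(result["scores"])
print(top_label, confident, round(total, 4))
```

Here "urgent" and "phone" both clear the cutoff, which reflects that the sentence is about an urgent phone problem even though the labels compete for the same probability mass.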

### 12. Depth estimation
Depth estimation is the task of predicting the depth of the objects present in an image.


In [11]:
estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")
result = estimator(images="http://images.cocodataset.org/val2017/000000039769.jpg")
result

### 13. Image classification
Image classification is the task of assigning a label or class to an entire image. Each image is expected to have only one class. Image classification models take an image as input and return a prediction about which class the image belongs to.

In [23]:
clf = pipeline("image-classification")
clf("images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg")

No model was supplied, defaulted to google/vit-base-patch16-224 and revision 5dca96d (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.8808925151824951, 'label': 'golden retriever'},
 {'score': 0.0977909192442894, 'label': 'Labrador retriever'},
 {'score': 0.0021747981663793325, 'label': 'tennis ball'},
 {'score': 0.0019781359005719423, 'label': 'Sussex spaniel'},
 {'score': 0.001014324021525681, 'label': 'kuvasz'}]

### 14. Image Segmentation
Image Segmentation divides an image into segments where each pixel in the image is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation.

In [2]:
model = pipeline("image-segmentation")
model("images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg")

No model was supplied, defaulted to facebook/detr-resnet-50-panoptic and revision fc15262 (https://huggingface.co/facebook/detr-resnet-50-panoptic).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at facebook/detr-resnet-50-panoptic were not used when initializing DetrForSegmentation: ['detr.model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked', 'detr.model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'detr.model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'detr.model.backbone.conv_encoder.model.layer1.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing DetrForSegmentation from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from 

[{'score': 0.99892,
  'label': 'LABEL_193',
  'mask': <PIL.Image.Image image mode=L size=1200x1197>},
 {'score': 0.998526,
  'label': 'dog',
  'mask': <PIL.Image.Image image mode=L size=1200x1197>}]

### 15. Image-to-image
Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Image-to-image models support a wide range of image manipulation and enhancement tasks.

In [2]:
from PIL import Image
import torch
from diffusers import StableDiffusionImg2ImgPipeline
model_id_or_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

init_image = Image.open("images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg").convert("RGB").resize((768, 512))
prompt = "A running dog in the forest"

images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("images/running_dogs.png")


### 16. Object Detection
Object Detection models allow users to identify objects of certain defined classes. Object detection models receive an image as input and output bounding boxes and labels for the detected objects.

In [10]:
model = pipeline("object-detection")

model("images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg")

### 17. Text-to-Speech
Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages.

In [9]:
from espnet2.bin.tts_inference import Text2Speech

model = Text2Speech.from_pretrained("espnet/kan-bayashi_ljspeech_vits")

speech, *_ = model("Hello, I'm a language model.")


### 18. Feature extraction
Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original dataset.

In [8]:
checkpoint = "facebook/bart-base"
feature_extractor = pipeline("feature-extraction", framework="pt", model=checkpoint)
text = "Transformers is an awesome library!"

# Average over the token dimension to get a single 768-dimensional array
feature_extractor(text, return_tensors="pt")[0].numpy().mean(axis=0)
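
The cell above produces one vector per token and then averages them into a single vector for the whole sentence (mean pooling). To make that reduction concrete, here is a plain-Python sketch on a made-up 3-token, 4-dimensional example. The real output for this checkpoint has 768 dimensions per token.

```python
def mean_pool(token_vectors):
    """Average a list of per-token vectors into one vector of the same width."""
    n_tokens = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n_tokens for i in range(dim)]


# Toy "hidden states": 3 tokens, 4 features each (values chosen for illustration).
toy_features = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]

pooled = mean_pool(toy_features)
print(len(pooled), pooled)
```

Mean pooling is only one option. Other common choices include taking the first token's vector or a max over tokens, and which works best depends on the model and the downstream task.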

### 19. Image to text 
Image-to-text models output text from a given image. Image captioning and optical character recognition are the most common applications of image to text.

In [6]:
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
captioner("images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg")

[{'generated_text': 'a puppy sitting in the grass with its mouth open'}]

### 20. Text to Image 
Text-to-image models generate images from input text. These models can be used to generate and modify images based on text prompts.

In [1]:
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch
model_id = "stabilityai/stable-diffusion-2"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

### 21. Visual Question Answering
Visual Question Answering is the task of answering open-ended questions based on an image. VQA models output natural language responses to natural language questions.

In [5]:
from PIL import Image
from transformers import pipeline

vqa_pipeline = pipeline("visual-question-answering")

image =  Image.open("images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg")
question = "Is there a dog?"

vqa_pipeline(image, question, top_k=1)

No model was supplied, defaulted to dandelin/vilt-b32-finetuned-vqa and revision 4355f59 (https://huggingface.co/dandelin/vilt-b32-finetuned-vqa).
Using a pipeline without specifying a model name and revision in production is not recommended.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


[{'score': 0.999915599822998, 'answer': 'yes'}]

### 22. Zero shot image classification 
Zero shot image classification is the task of classifying images into classes that the model did not see during training.

In [7]:
model_name = "openai/clip-vit-large-patch14-336"
classifier = pipeline("zero-shot-image-classification", model = model_name)

image_to_classify = "images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg"
labels_for_classification =  ["dog", 
                              "lion", 
                              "rabbit"]
scores = classifier(image_to_classify, 
                    candidate_labels = labels_for_classification)