### The pipeline function
It returns an end-to-end object that performs an NLP task on one or several texts.
It's the most high level API on the transformers library.
It goes from raw text to usable predictions.

What happens inside the pipeline function?

1. Tokenizer -> From raw text to InputIds (vector)
2. Model -> Passing throw the model the logits are generated (-3.401,4.53)
3. PostProcessing -> Transforms Logits into predictions with the corresponding label. Positive or Negative

This is an example with sentiment analysis
### Tokenizer



In [7]:
text = "this course is amazing"
# tokenize text
token_text = text.split() #['this', 'course', 'is', 'amazing']

#Add special tokens
spectial_tokens = [['CLS'],'this', 'course', 'is', 'amazing',['SEP']]

# Tokenizer relates each token with the unique ID
input_IDs = [101,2023,2607,2003,6429,999,102]

In [None]:
from tokenizers import AutoTokenizer # Loading the tokenizer

checkpoint = "distilbert-base-uncased-fintuned-sst-2-english" # chosen checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint) #Most important method. Donwloads and caches the configuration 
raw_imputs = [
    "I'm very happy to be here",
    "I can´t believe how sad my life is"
]
inputs = tokenizer(raw_imputs,padding=True, truncation =True,return_tensors='tf') 


### Model

In [7]:
from transformers import TFAutoModel
checkpoint = "distilbert-base-uncased-fintuned-sst-2-english" # chosen checkpoint
model = TFAutoModel.from_pretrained(checkpoint) # Download and cache the configuration of the model as well as the weights
outputs = model(inputs)
outputs.last_hidden_state.shape
# It will generate a tensor of high dimension which is a representation of the sentences passed

# (2,16,768)


#For classification
from transformers import TFAutoModelForSequenceClassification
checkpoint = "distilbert-base-uncased-fintuned-sst-2-english" # chosen checkpoint
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint) # Download and cache the configuration of the model as well as the weights
outputs = model(inputs)
outputs.logits

#Our model will of shape (2,2)
#This output still is not a probability
# Our result will be logits

OSError: distilbert-base-uncased-fintuned-sst-2-english is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

### Postprocessing

In [None]:
# To convert logits into probabilities we need to use soft max
import tensorflow as tf
predictions = tf.math.softmax(outputs.logits,axis=1)
print(predictions )

In [1]:
from transformers import pipeline

2024-04-23 15:33:05.449600: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-23 15:33:05.768221: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


### Sentiment analysis pipeline

In [2]:
classifier = pipeline("sentiment-analysis")
classifier("I've been very happy this week")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998724460601807}]

In [5]:
classifier(['I was so sad today',"This looks like a wonderful day"])

[{'label': 'NEGATIVE', 'score': 0.9986395239830017},
 {'label': 'POSITIVE', 'score': 0.9998874664306641}]

### Zero shot classification pipeline
Let's you select labels for classification

In [6]:
classifier_zero_shot = pipeline('zero-shot-classification')
classifier_zero_shot(
    "My cat is very hungry",
    candidate_labels=['animals','sports','love']
)

No model was supplied, defaulted to FacebookAI/roberta-large-mnli and revision 130fb28 (https://huggingface.co/FacebookAI/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


{'sequence': 'My cat is very hungry',
 'labels': ['animals', 'love', 'sports'],
 'scores': [0.969976544380188, 0.018937118351459503, 0.01108638010919094]}

### Text generation task
Uses an input prompt to generate text

In [7]:
text_generator = pipeline('text-generation')
text_generator("My grandma was in the park ")


No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'My grandma was in the park \xa0when she saw a car. The passenger left before I could open the door and pull out my phone as quickly as I can. \xa0I walked up to her, saw her on horseback, and asked'}]

In [8]:
text_generator("Soccer is a sport where")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


KeyboardInterrupt: 

### Choosing the model of the whosen pipelina

In [None]:
generator = pipeline('text-generation',model="distilgpt2")
generator("In this house we always try to",
          max_length=30,
          num_return_sequences=2,
          )


All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this house we always try to be a good home and we don't care for families being forced to live in this neighborhood that doesn't support this"},
 {'generated_text': 'In this house we always try to work together with. When I was a kid, my mother had my whole life in debt. As far as I'}]

In [None]:
generator("In this house we always try to",
          max_length=30,
          num_return_sequences=5,
          )

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this house we always try to have some nice things with no fear of overnesting everything," he said.'},
 {'generated_text': 'In this house we always try to have kids, but we always go back and try and do things that are fun for the family. So there have'},
 {'generated_text': 'In this house we always try to do a simple little search and if you want to get ahold of items I think you should look really good.'},
 {'generated_text': 'In this house we always try to be a family, we don\'t want to have kids that will hurt someone," she said.\n\n\n\n'},
 {'generated_text': 'In this house we always try to get down to basics.\n\nThe first thing you know about this house is that it costs the right amount of'}]

## Fill-mask
The fill-mask pipeline will predict missing words in a sentence

In [None]:
get_mask= pipeline('fill-mask')
get_mask("At night my cat always <mask>" , top_k=2 )



No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFRobertaForMaskedLM.

All the weights of TFRobertaForMaskedLM were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForMaskedLM for predictions without further training.


[{'score': 0.7773540019989014,
  'token': 36831,
  'token_str': ' sleeps',
  'sequence': 'At night my cat always sleeps'},
 {'score': 0.03538073971867561,
  'token': 25355,
  'token_str': ' cries',
  'sequence': 'At night my cat always cries'}]

## Ner
The NER pipeline identifies entities such as persons, organizations or locations in a sentence

In [None]:
ner = pipeline('ner',grouped_entities=True)
ner('My name is Bruno and I want to work as a Machine Learning Engineer')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.


[{'entity_group': 'PER',
  'score': 0.99840504,
  'word': 'Bruno',
  'start': 11,
  'end': 16},
 {'entity_group': 'MISC',
  'score': 0.8509879,
  'word': 'Engineer',
  'start': 58,
  'end': 66}]

## Question answering
Extracts answers to a question from a given context

In [None]:
question_answer = pipeline('question-answering')
question_answer(
    question="How old is my cat?",
    context="My cat is 2 years old"
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for predictions without further training.


{'score': 0.43945616483688354, 'start': 10, 'end': 21, 'answer': '2 years old'}

## Summarization
Creates summarizes of long texts

In [None]:
summarize = pipeline('summarization')
summarize("""
          Argentina’s president Javier Milei is a throwback. He sports muttonchops and a leather jacket. His favorite band is the Rolling Stones. 
          He is a Cold Warrior, spouting anachronistic anti-communist rhetoric that doesn’t correspond to the geopolitical dynamics of the 
          twenty-first century. But the realities of Argentine and global economics have not deterred Milei from pursuing a program of trade and 
          foreign policy that seeks to realign the country with the United States and its interests, in a pantomime of the Cold War. By drastically 
          cutting spending, privatizing national industries, and deregulating the economy, Milei hopes not just to curb inflation but to draw foreign 
          investment and return Argentina to the fold of the international financial system. In this, too, he harks back to the past and to the 
          country’s long-standing desire to reenter the “first world” — while expressing old anxieties about its decades-long economic decline.
        But if Milei truly looked back at the past, he’d realize that overreliance on US support will not help the Argentine economy, or, more importantly, 
          its people. If Argentina is to resolve its fiscal woes and develop the infrastructure and industry necessary for its future, it should turn 
          away from US hegemony and embrace the promise of a multipolar approach to trade and foreign policy. This was an approach already taken up by 
          the previous administration — and one that the opposition should cherish as a vision for 
          """)

No model was supplied, defaulted to google-t5/t5-small and revision d769bba (https://huggingface.co/google-t5/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


[{'summary_text': "argentina's president is a Cold Warrior, spouting anachronistic anti-communist rhetoric . but the realities of Argentine and global economics have not deterred Milei from pursuing a program of trade and foreign policy . by drastically cutting spending, privatizing national industries, and deregulating the economy, Milei hopes not just to curb inflation but to draw foreign investment and return Argentina to the fold of the international financial system ."}]

## Translation

In [10]:
translator = pipeline('translation',model="Helsinki-NLP/opus-mt-es-en")
translator("Me encanta vivir en Argentina, mi próximo paso es trabajar para afuera, pero continuaré viviendo acá")

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-es-en.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


source.spm:   0%|          | 0.00/826k [00:00<?, ?B/s]

target.spm:   0%|          | 0.00/802k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.59M [00:00<?, ?B/s]

[{'translation_text': 'I love living in Argentina, my next step is working out, but I will continue living here'}]

### Text to audio

In [None]:
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch

import soundfile as sf
from datasets import load_dataset

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello, my dog is cute.", return_tensors="pt")

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

sf.write("speech.wav", speech.numpy(), samplerate=16000)


ModuleNotFoundError: No module named 'datasets'

In [None]:
from transformers import VitsModel, AutoTokenizer
import torch

model = VitsModel.from_pretrained("facebook/mms-tts-eng")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")

text = "some example text in the English language"
inputs = tokenizer(text, return_tensors="tf")

with torch.no_grad():
    output = model(**inputs).waveform


ModuleNotFoundError: No module named 'torch'

In [None]:
# Use a pipeline as a high-level helper

pipe = pipeline("text-to-speech", model="myshell-ai/MeloTTS-English")

ValueError: Unrecognized model in myshell-ai/MeloTTS-English. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chinese_clip, chinese_clip_vision_model, clap, clip, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, git, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, graphormer, grounding-dino, groupvit, hubert, ibert, idefics, idefics2, imagegpt, informer, instructblip, jamba, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, longformer, longt5, luke, lxmert, m2m_100, mamba, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mistral, mixtral, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nezha, nllb-moe, nougat, nystromformer, olmo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, pix2struct, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_moe, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso

In [None]:
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/3.wav",
    gpt_cond_len=3,
    language="en",
)


ModuleNotFoundError: No module named 'TTS'

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="suno/bark")

ValueError: Pipeline cannot infer suitable model classes from suno/bark

In [None]:
from transformers import pipeline
import scipy

synthesiser = pipeline("text-to-speech", "suno/bark")

speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})

scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])


ValueError: Pipeline cannot infer suitable model classes from suno/bark

In [None]:
pipe = pipeline("text-to-speech", model="microsoft/speecht5_tts")

ValueError: Pipeline cannot infer suitable model classes from microsoft/speecht5_tts

In [12]:
from transformers import pipeline
from datasets import load_dataset
import soundfile as sf

synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts")

embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
# You can replace this embedding with your own as well.

speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"speaker_embeddings": speaker_embedding})

sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])


ValueError: Pipeline cannot infer suitable model classes from microsoft/speecht5_tts

In [14]:
from transformers import pipeline
import scipy
!pip install git+https://github.com/suno-ai/bark.git
synthesiser = pipeline("text-to-speech", "suno/bark")

speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})

scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-9297arv4
  Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-9297arv4
  Resolved https://github.com/suno-ai/bark.git to commit f4f32d4cd480dfec1c245d258174bc9bde3c2148
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Installing backend dependencies ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting boto3 (from suno-bark==0.0.1a0)
  Downloading boto3-1.34.88-py3-none-any.whl.metadata (6.6 kB)
Collecting encodec (from suno-bark==0.0.1a0)
  Using cached encodec-0.1.1-py3-none-any.whl
Collecting funcy (from suno-bark==0.0.1a0)
  Downloading funcy-2.0-py2.py3-none-any.whl.metadata (5.9 kB)
Collecting torch (from suno-bark==0.0.1a0)
  Downloading torch-2.2.2-cp310-cp310-manylinux1_x86_64.whl.metadata (26 kB)
Collec

ValueError: Pipeline cannot infer suitable model classes from suno/bark

In [17]:
from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio
#synthesiser = pipeline("text-to-speech", "suno/bark")

#speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
preload_models()
#scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])

No GPU being used. Careful, inference might be very slow!


text_2.pt:   0%|          | 0.00/5.35G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/996k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/625 [00:00<?, ?B/s]

coarse_2.pt:   0%|          | 0.00/3.93G [00:00<?, ?B/s]

fine_2.pt:   0%|          | 0.00/3.74G [00:00<?, ?B/s]

KeyboardInterrupt: 

In [None]:
text_prompt = """
Hello, my name is Bruce. I'm glad I came
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array,rate=SAMPLE_RATE)

