
## Using pipelines

In [1]:
from transformers import pipeline

In [2]:
sentiment_pipeline = pipeline("sentiment-analysis", framework="tf")


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.



Device set to use 0


In [3]:
result = sentiment_pipeline(["I love my college!",'I hate people opinion about AI'])
print(result)


[{'label': 'POSITIVE', 'score': 0.9998700618743896}, {'label': 'NEGATIVE', 'score': 0.9985387325286865}]
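When given a list, the pipeline returns one dictionary per input, in the same order, so predictions can be paired back with their texts. A minimal sketch that reuses the output printed above instead of calling the model again; the helper name `pair_predictions` is illustrative and not part of `transformers`:

```python
def pair_predictions(texts, results):
    """Pair each input text with its predicted label and rounded score."""
    if len(texts) != len(results):
        raise ValueError("expected one result per input text")
    return [
        (text, res["label"], round(res["score"], 4))
        for text, res in zip(texts, results)
    ]


texts = ["I love my college!", "I hate people opinion about AI"]
# Copied from the printed pipeline output above.
results = [
    {"label": "POSITIVE", "score": 0.9998700618743896},
    {"label": "NEGATIVE", "score": 0.9985387325286865},
]

for text, label, score in pair_predictions(texts, results):
    print(f"{label:<8} {score:.4f}  {text}")
```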


Pipeline options: https://huggingface.co/docs/transformers/en/main_classes/pipelines

## Changing the default model

In [5]:
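# Note: this checkpoint is a reranker, not a sentiment classifier; it loads here only to show how to pass a model name.
sentiment_pipeline2 = pipeline("sentiment-analysis", model="BAAI/bge-reranker-v2-m3", framework="tf")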



All PyTorch model weights were used when initializing TFXLMRobertaForSequenceClassification.

All the weights of TFXLMRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFXLMRobertaForSequenceClassification for predictions without further training.



Device set to use 0
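The warning printed for the first pipeline notes that relying on the default model and revision is not recommended in production. A minimal sketch of pinning both, using the model name and revision hash from that log message. The function name `build_sentiment_pipeline` is illustrative, and calling it downloads the model:

```python
# Model name and revision copied from the "No model was supplied" log message.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_sentiment_pipeline():
    """Create the pipeline with an explicit model and revision."""
    # Imported here so the settings above can be inspected without loading the library.
    from transformers import pipeline

    return pipeline(**PINNED)
```

Keeping the settings in one dictionary makes it easy to log or review exactly which checkpoint a run used.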


## Using a tokenizer and model directly, without a pipeline

In [None]:
from transformers import BertTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf

In [None]:
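# The tokenizer must come from the same checkpoint as the model so the token IDs match its vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")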

In [None]:
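# This base checkpoint has no fine-tuned classification head, so the head is randomly initialized and predictions are not meaningful.
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased")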


In [None]:
text = "I love learning about NLP!"
tokens = tokenizer(text, return_tensors="tf", padding=True, truncation=True)

In [None]:
print("Tokenized Words:", tokenizer.tokenize(text))
print("Token IDs:", tokens["input_ids"].numpy().tolist()[0])

In [None]:
outputs = model(**tokens)


In [None]:
logits = outputs.logits
probs = tf.nn.softmax(logits, axis=-1)

In [None]:
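label_idx = int(tf.argmax(probs, axis=-1).numpy()[0])
# Read label names from the model config instead of hard-coding them; a checkpoint
# without a fine-tuned head reports generic names such as LABEL_0 and LABEL_1.
labels = model.config.id2label
print(f"Predicted label: {labels[label_idx]}, Probability: {probs.numpy().max():.4f}")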
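`tf.nn.softmax` turns the raw logits into probabilities that sum to 1, and `tf.argmax` picks the index of the largest one. A small pure-Python sketch of the same arithmetic on made-up logits (the values are illustrative, not model output):

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]  # hypothetical two-class logits
probs = softmax(logits)
label_idx = max(range(len(probs)), key=probs.__getitem__)  # argmax

print(probs, label_idx)
```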

## Other examples of pipelines

In [6]:
generator = pipeline("text-generation", model="gpt2")

prompt = "There once lived a king"
result = generator(prompt, max_length=30, num_return_sequences=1)

print(result)



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'There once lived a king, and once a king lives in an endless sea. If it was not for him, he would have no sons. Even'}]
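The text-generation pipeline returns a list of dictionaries, and in the output above `generated_text` begins with the prompt itself. A minimal sketch of separating the continuation from the prompt, using that printed output; `continuation` is an illustrative helper name:

```python
def continuation(prompt, generated_text):
    """Return only the newly generated part of the text."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Fall back to the full text if the prompt is not a prefix.
    return generated_text


prompt = "There once lived a king"
# Copied from the printed pipeline output above.
result = [
    {
        "generated_text": "There once lived a king, and once a king lives in an "
        "endless sea. If it was not for him, he would have no sons. Even"
    }
]

print(continuation(prompt, result[0]["generated_text"]))
```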


In [8]:
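# Note: s1-32B is a 32-billion-parameter checkpoint split into multi-gigabyte shards,
# so the download below was interrupted manually.
generator = pipeline("text-generation", model="simplescaling/s1-32B")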

prompt = "There once lived a king"
result = generator(prompt, max_length=30, num_return_sequences=1)

print(result)

config.json:   0%|          | 0.00/830 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/63.2k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/29 [00:00<?, ?it/s]

model-00001-of-00029.safetensors:   0%|          | 0.00/4.50G [00:00<?, ?B/s]

KeyboardInterrupt: 

In [9]:
qa_pipeline = pipeline("question-answering")

context = """The Great Wall of China is a historic fortification that stretches
over 13,000 miles. It was primarily built to protect against invasions and
was constructed during the Ming Dynasty."""

question = "Who built the Great Wall of China?"

result = qa_pipeline(question=question, context=context)

print(result)


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



Device set to use cpu


{'score': 0.44635364413261414, 'start': 171, 'end': 183, 'answer': 'Ming Dynasty'}
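The `score` here is fairly low (about 0.45): the context says when the wall was constructed, not who built it, so the model returns the closest span it can find. One way to handle such cases is to reject low-scoring answers. A sketch with an arbitrary cutoff of 0.5; both the helper `answer_or_none` and the threshold are assumptions for illustration:

```python
def answer_or_none(result, min_score=0.5):
    """Return the answer text, or None when the score is below the cutoff."""
    if result["score"] >= min_score:
        return result["answer"]
    return None


# Copied from the printed pipeline output above.
result = {"score": 0.44635364413261414, "start": 171, "end": 183, "answer": "Ming Dynasty"}

print(answer_or_none(result))                 # → None
print(answer_or_none(result, min_score=0.4))  # → Ming Dynasty
```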


In [15]:
qa_pipeline = pipeline("question-answering")

context = """The French inventor Nicolas-Joseph Cugnot built the first steam-powered road vehicle in 1769,
while the Swiss inventor François Isaac de Rivaz designed and constructed the first internal combustion-powered automobile in 1808.
The modern car—a practical, marketable automobile for everyday use—was invented in 1886,
when the German inventor Carl Benz patented his Benz Patent-Motorwagen. Commercial cars became widely available during the 20th century.
The 1901 Oldsmobile Curved Dash and the 1908 Ford Model T, both American cars, are widely considered the first mass-produced[3][4] and mass-affordable[5][6][7] cars
"""

question = "Who built first steam powered car?"

result = qa_pipeline(question=question, context=context)

print(result)


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.9416631460189819, 'start': 20, 'end': 41, 'answer': 'Nicolas-Joseph Cugnot'}
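`start` and `end` are character offsets into `context`, so slicing the context recovers the answer. A quick check using only the first line of the context and the output above:

```python
# First line of the context string used in the cell above.
context_start = (
    "The French inventor Nicolas-Joseph Cugnot built the first "
    "steam-powered road vehicle in 1769,"
)
# Copied from the printed pipeline output above.
result = {"score": 0.9416631460189819, "start": 20, "end": 41, "answer": "Nicolas-Joseph Cugnot"}

span = context_start[result["start"]:result["end"]]
print(span)
assert span == result["answer"]
```

The same offsets can be used to highlight the answer in the original text.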


In [18]:
qa_pipeline = pipeline("question-answering")

context = """The French inventor Nicolas-Joseph Cugnot built the first steam-powered road vehicle in 1769,
while the Swiss inventor François Isaac de Rivaz designed and constructed the first internal combustion-powered automobile in 1808.
The modern car—a practical, marketable automobile for everyday use—was invented in 1886,
when the German inventor Carl Benz patented his Benz Patent-Motorwagen. Commercial cars became widely available during the 20th century.
The 1901 Oldsmobile Curved Dash and the 1908 Ford Model T, both American cars, are widely considered the first mass-produced[3][4] and mass-affordable[5][6][7] cars
"""

question = input()

result = qa_pipeline(question=question, context=context)

print(result)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Who built first steam powered car
{'score': 0.9643811583518982, 'start': 20, 'end': 41, 'answer': 'Nicolas-Joseph Cugnot'}
