https://shorturl.at/9BEya

**Familiarize yourself with what state-of-the-art Natural Language Processing (NLP) models can do using Hugging Face pipelines**

(https://colab.research.google.com/notebooks/welcome.ipynb; https://huggingface.co/; https://huggingface.co/docs/transformers/main_classes/pipelines)

Pipelines are a simple way to use pretrained models for inference. They are objects that abstract away most of the complex code in the library and offer a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.
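Conceptually, a pipeline chains three steps: preprocessing (tokenizing the raw input), a model forward pass, and postprocessing (turning raw scores into readable labels). The library's `Pipeline` base class organizes its work into `preprocess`, `_forward` and `postprocess` methods. The toy class below only mimics that structure in plain Python; `ToySentimentPipeline` and its word-list "model" are hypothetical stand-ins, not the library's implementation.

```python
from typing import Dict, List


class ToySentimentPipeline:
    """Hypothetical stand-in that mirrors the preprocess -> forward -> postprocess flow."""

    # Hypothetical "model weights": a tiny word-level sentiment lexicon
    LEXICON = {"beautiful": 1.0, "great": 1.0, "love": 1.0, "boring": -1.0, "awful": -1.0}

    def preprocess(self, text: str) -> List[str]:
        # Real pipelines call a tokenizer here; we lowercase, split and strip punctuation
        return [tok.strip(".,!?") for tok in text.lower().split()]

    def _forward(self, tokens: List[str]) -> float:
        # Real pipelines run the neural network here; we sum the lexicon scores
        return sum(self.LEXICON.get(tok, 0.0) for tok in tokens)

    def postprocess(self, score: float) -> Dict[str, str]:
        # Map the raw score to a human-readable label
        return {"label": "POSITIVE" if score >= 0 else "NEGATIVE"}

    def __call__(self, text: str) -> Dict[str, str]:
        return self.postprocess(self._forward(self.preprocess(text)))


toy = ToySentimentPipeline()
print(toy("This movie is beautiful. I would like to watch this movie again."))
```

The point of the design is that callers only ever touch `__call__`; the tokenizer and model details stay hidden behind the three stages.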


**GPU set-up**

Before installing the required libraries, switch the runtime to a GPU: from the menu, select Runtime → Change runtime type → GPU. Colab provides a single GPU, which is sufficient for preliminary experiments and inference.

Check whether a GPU is enabled.

In [None]:
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()

if torch.cuda.is_available():
    print("Found GPU")
    # Only query the device name when a GPU exists; this call fails on CPU-only runtimes
    print(f"GPU name : {torch.cuda.get_device_name(0)}")
else:
    print("No GPU available, using CPU instead.")

Found GPU
GPU name : Tesla T4
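Pipelines created later in this notebook accept a `device` argument: `-1` selects the CPU and a non-negative integer selects that CUDA device (`device=0` is the first GPU). Below is a minimal sketch of turning the `n_gpu` count from the cell above into that argument. `pipeline_device` is a hypothetical helper name. Recent library versions may also pick the GPU on their own; the "Device set to use cuda:0" messages in this notebook appear even in cells that pass no `device`.

```python
def pipeline_device(n_gpu: int) -> int:
    """Hypothetical helper: map a GPU count to the pipeline `device` value (0 = first GPU, -1 = CPU)."""
    return 0 if n_gpu > 0 else -1


# In the notebook, n_gpu comes from torch.cuda.device_count()
for n in (0, 1, 2):
    print(n, "GPU(s) ->", pipeline_device(n))
```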


**Installing Required Libraries and Packages:**

Install the Hugging Face Transformers library (https://huggingface.co/docs/transformers/index) using pip (https://pip.pypa.io/en/stable/).

In [None]:
! pip install transformers



In [None]:
from transformers import pipeline

**Application 1: Sentiment Analysis**
Define the input text and call the Hugging Face pipeline function with the task name ("sentiment-analysis").

In [None]:
text = "This movie is beautiful. I would like to watch this movie again."
# Instantiate a pipeline by calling the pipeline function with the task name
classifier = pipeline("sentiment-analysis")
print(classifier(text))

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9998657703399658}]


In the cell above, we did not specify a model name, so the pipeline fell back to a default checkpoint, as the warning in its output shows. It is generally recommended to specify a model name explicitly; you can choose from the pretrained models on the Hugging Face Hub (https://huggingface.co/models).
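The warnings in this notebook also advise pinning a revision for production use. The sketch below collects the default checkpoints and the abbreviated commit hashes printed in this notebook's logs into keyword arguments for `pipeline`, which accepts `task`, `model` and `revision`. `PINNED_MODELS` and `pipeline_kwargs` are hypothetical names. The short hashes are copied from the logs; if your environment does not resolve an abbreviated hash, use the full commit hash shown on the model's Hub page.

```python
from typing import Dict

# Default checkpoints and (abbreviated) revisions reported in this notebook's logs
PINNED_MODELS: Dict[str, Dict[str, str]] = {
    "sentiment-analysis": {
        "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        "revision": "714eb0f",
    },
    "text-generation": {"model": "openai-community/gpt2", "revision": "607a30d"},
    "question-answering": {
        "model": "distilbert/distilbert-base-cased-distilled-squad",
        "revision": "564e9b5",
    },
}


def pipeline_kwargs(task: str) -> Dict[str, str]:
    """Build keyword arguments for transformers.pipeline with an explicit model and revision."""
    if task not in PINNED_MODELS:
        raise KeyError(f"No pinned model for task {task!r}")
    return {"task": task, **PINNED_MODELS[task]}


print(pipeline_kwargs("sentiment-analysis"))
# Usage in the notebook: classifier = pipeline(**pipeline_kwargs("sentiment-analysis"))
```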

In [None]:
text = "This movie is beautiful. I would like to watch this movie again."
# Instantiate a pipeline with an explicit model; device=0 runs it on the first GPU (device=-1 forces the CPU)
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=0)
print(classifier(text))


Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9998657703399658}]
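The `score` in this output is a probability. A two-class classifier like this one outputs one raw score (logit) per class; the pipeline applies a softmax and reports the most probable class. The sketch below reproduces that postprocessing step in plain Python. The logits are made-up values, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is assumed to match this checkpoint's `id2label` configuration.

```python
import math
from typing import Dict, List

# Assumed label mapping (mirrors the checkpoint's id2label config)
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits: List[float]) -> Dict[str, object]:
    # Pick the most probable class and report its probability as the score
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for a clearly positive review
print(postprocess([-4.3, 4.6]))
```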


**Application 2: Text Generation**

In [None]:
generator = pipeline("text-generation")
prompt = "This tutorial will walk you through how to"
generator(prompt)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.



Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'This tutorial will walk you through how to create a virtual environment that will let you create content on your Android phone with the Play Store and then display it in the Play Library for use online or on your smartphone.\n\nSetup Virtualization of Your Android'}]

In [None]:
generator = pipeline("text-generation", model="distilgpt2")
generator(prompt)


Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'This tutorial will walk you through how to convert the video into a HTML5 format that is easy to use and easily readable to use.\n\n\n\nThe video tutorial will contain an array of options for displaying the video on a file and a bit'}]
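Text-generation pipelines extend the prompt one token at a time. At each step the model scores every candidate next token, one token is chosen and appended, and the loop repeats until an end-of-sequence token or a length limit is reached. Always taking the most likely token is greedy decoding. Drawing from the distribution is sampling, which is why sampled outputs usually change from run to run. The sketch below shows that loop with a hypothetical hand-written bigram table standing in for GPT-2. `NEXT_TOKEN_PROBS`, `generate` and `toy_prompt` are illustrative names, not library API, and real models condition on the whole context rather than only the last token.

```python
import random
from typing import Dict, List, Optional

# Hypothetical next-token probabilities (a toy bigram "language model")
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "to": {"create": 0.5, "convert": 0.3, "install": 0.2},
    "create": {"a": 0.7, "content": 0.3},
    "convert": {"the": 0.6, "a": 0.4},
    "install": {"the": 1.0},
    "a": {"virtual": 0.6, "video": 0.4},
    "the": {"video": 1.0},
    "content": {"<eos>": 1.0},
    "virtual": {"environment": 1.0},
    "video": {"<eos>": 1.0},
    "environment": {"<eos>": 1.0},
}


def generate(prompt: List[str], max_new_tokens: int, rng: Optional[random.Random] = None) -> List[str]:
    """Greedy decoding when rng is None, otherwise sample from the next-token distribution."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1], {"<eos>": 1.0})
        if rng is None:
            next_token = max(probs, key=probs.get)  # greedy: most likely token
        else:
            choices, weights = zip(*probs.items())
            next_token = rng.choices(choices, weights=weights, k=1)[0]  # sampling
        if next_token == "<eos>":  # stop at end-of-sequence
            break
        tokens.append(next_token)
    return tokens


toy_prompt = "This tutorial will walk you through how to".split()
print(" ".join(generate(toy_prompt, max_new_tokens=10)))
print(" ".join(generate(toy_prompt, max_new_tokens=10, rng=random.Random(0))))
```

With `rng=None` the greedy path always continues with "create a virtual environment". Passing a seeded `random.Random` makes the sampled continuation reproducible.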

**Application 3: Question Answering**

In [None]:

question = "How many programming languages does BLOOM support?"
context = "BLOOM has 176 billion parameters and can generate text in 46 languages natural languages and 13 programming languages."
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
answer = question_answerer(question=question, context=context)
print(f"answer distilbert-base-cased-distilled-squad : {answer['answer']}")


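# This checkpoint was fine-tuned for sentiment analysis, not QA: its QA head is randomly
# initialized (see the warning below), so its answers are not meaningful
question_answerer = pipeline("question-answering", model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')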
answer = question_answerer(question=question, context=context)
print(f"answer distilbert-base-uncased-finetuned-sst-2-english : {answer['answer']}")



Device set to use cuda:0
Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0


answer distilbert-base-cased-distilled-squad : 13
answer distilbert-base-uncased-finetuned-sst-2-english : can generate text in 46 languages natural languages and 13 programming
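Extractive question-answering models such as distilbert-base-cased-distilled-squad do not generate text. Instead, they score every context token as a possible answer start and as a possible answer end, and the pipeline returns the highest-scoring valid span. The sentiment checkpoint has an untrained QA head, so its "answer" is an arbitrary stretch of the context. The sketch below shows span selection over hypothetical per-word scores. `best_span` is an illustrative name, and its length limit loosely mirrors the pipeline's `max_answer_len` parameter. The real pipeline works on subword tokens and maps them back to character offsets.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float], max_answer_len: int = 15) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end score, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start and within the length limit
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical context and scores: the model is confident the answer is "13"
words = "BLOOM supports 13 programming languages".split()
start_scores = [0.1, 0.2, 5.0, 1.0, 0.3]
end_scores = [0.0, 0.1, 4.0, 2.0, 1.5]
s, e = best_span(start_scores, end_scores)
print(" ".join(words[s : e + 1]))
```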


In [None]:
question_answerer = pipeline("question-answering")
context = """
🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch, and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""
question = "Which deep learning libraries back 🤗 Transformers?"
question_answerer(question=question, context=context)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'score': 0.98026043176651,
 'start': 78,
 'end': 106,
 'answer': 'Jax, PyTorch, and TensorFlow'}
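In this output, `start` and `end` are character offsets into `context`, so slicing the context recovers the answer. The quick check below uses the context string and the result dictionary copied from this cell.

```python
context = """
🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch, and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""
# Result dictionary copied from the pipeline output above
result = {"score": 0.98026043176651, "start": 78, "end": 106, "answer": "Jax, PyTorch, and TensorFlow"}

# start/end are character offsets into the original context string
span = context[result["start"] : result["end"]]
print(span)
assert span == result["answer"]
```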

**Application 4: Translation**

In [None]:
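translator = pipeline('translation', model='snehalyelmati/mt5-hindi-to-english')

Device set to use cuda:0


In [None]:
# Input (English gloss): "US TOP-10: Biden calls NATO important for America's existence, see the top stories"
print(translator("US TOP-10: बाइडेन ने अमेरिका के वजूद के लिए NATO को बताया अहम, देखें बड़ी खबरें"))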

Your input_length: 33 is bigger than 0.9 * max_length: 20. You might consider increasing your max_length manually, e.g. translator('...', max_length=400)


[{'translation_text': 'The rumor turned to the NATO, Please wait for a breath of fresh air.'}]
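The warning above reports an input length of 33 against a `max_length` of 20. Generation was therefore capped below the input length, and the translation may be cut short. The warning fires when `input_length > 0.9 * max_length`. The sketch below computes the smallest `max_length` that avoids it. `min_max_length` is a hypothetical helper, and it uses integer arithmetic to sidestep floating-point rounding. The warning itself also suggests simply passing a generous value such as `max_length=400`. A longer limit is not guaranteed to make this particular model's translation accurate.

```python
def min_max_length(input_length: int) -> int:
    """Smallest max_length for which input_length <= 0.9 * max_length (the warning's condition)."""
    # input_length <= 0.9 * m  <=>  10 * input_length <= 9 * m  <=>  m >= ceil(10 * input_length / 9)
    return (10 * input_length + 8) // 9


needed = min_max_length(33)  # input_length reported in the warning above
print(needed)
# e.g. pass max_length=needed (or a generous value such as 400) when calling the translation pipeline
```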


In [None]:

translator = pipeline('translation_en_to_de', model='t5-base')

print(translator('I love dogs!'))


Device set to use cuda:0


[{'translation_text': 'Ich liebe Hunde!'}]
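T5 is a single text-to-text model trained on several tasks, and it selects the task through a text prefix. As we understand it, `t5-base`'s configuration stores per-task settings in `task_specific_params`, including a prefix for `translation_en_to_de`. The pipeline then prepends this prefix to the input, which is one reason the task name matters here. The sketch below shows that preprocessing step in plain Python. The prefix strings are assumed to match the `config.json` of `t5-base` and should be checked there. `T5_TASK_PREFIXES` and `add_task_prefix` are hypothetical names.

```python
from typing import Dict

# Assumed to mirror the "prefix" entries in t5-base's task_specific_params
T5_TASK_PREFIXES: Dict[str, str] = {
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_ro": "translate English to Romanian: ",
    "summarization": "summarize: ",
}


def add_task_prefix(task: str, text: str) -> str:
    """Prepend the task prefix, as the pipeline's preprocessing does before tokenizing."""
    return T5_TASK_PREFIXES.get(task, "") + text


print(add_task_prefix("translation_en_to_de", "I love dogs!"))
```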
