
## **Transformers** are a neural network architecture for sequence modeling, proposed by researchers at Google in the 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762). Since then, they have become the state-of-the-art architecture for NLP tasks such as text classification, text generation, and text summarization.

### Implementation is much simpler with [HuggingFace](https://huggingface.co/) Transformers, since it provides pre-trained models for such tasks. The HuggingFace Hub hosts many open-source pre-trained models. In this notebook we will implement a few basic NLP tasks.

## 🤗 Transformers

In [1]:
# The text below is a customer review of a pen stand.
text = """
I recently purchased a Pen stand from XYZ Stationery Store,Budapest. and I must say, it was a huge disappointment. The product itself looked nice, but the quality was terrible. The pen stand was supposed to be made of durable plastic, but it felt flimsy and cheap.

The worst part was the design - it claimed to have multiple compartments for pens, pencils, and other stationery items, but they were so poorly designed that none of my pens could fit properly. The compartments were too narrow and unevenly spaced, making it impossible to store anything without it falling over.

I reached out to their customer support team at ABC Retailers, hoping for a replacement or refund, but they were not helpful at all. It took them forever to respond, and when they finally did, they were dismissive and unapologetic. They said the product was 'as advertised' and that I should learn to use it properly.

As someone who values organization and quality in their office supplies, this Pen stand was a complete letdown. I would advise anyone looking to buy a pen stand to stay away from this one and explore other options available in the market."

"""

### Pipelines in 🤗 Transformers connect a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.
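
As a rough illustration of the postprocessing step, the sketch below turns raw model scores (logits) into the label-and-score records that a classification pipeline returns. It is a minimal pure-Python sketch, not the library's implementation: the logits are made up, and `ID2LABEL`, `softmax`, and `postprocess` are hypothetical names chosen for this example.

```python
import math

# Hypothetical label mapping; a real model ships its own mapping in its config.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    """Pick the most probable class and report it as a list of label/score records."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": ID2LABEL[best], "score": probs[best]}]


# Made-up logits standing in for the model output on a clearly negative review.
result = postprocess([4.2, -4.2])
print(result)
```

In a real pipeline, preprocessing (tokenization) and the model's forward pass would come before this step and produce the logits.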

## Text classification (Sentiment analysis):

When handling customer reviews, we often need to classify each review, i.e. decide whether the feedback is "positive" or "negative".

In [2]:
from transformers import pipeline
# No model is named here, so the library falls back to its default for this task.
classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.




In [3]:
import pandas as pd
outputs = classifier(text)
pd.DataFrame(outputs)

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.999779 |
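
The log for the classifier cell warns against using a pipeline without naming a model and revision. One way to address this, sketched below, is to pass explicitly the model name and revision reported in that log. `MODEL_ID`, `REVISION`, and `build_classifier` are hypothetical names for this example; `model` and `revision` are arguments of `pipeline`.

```python
# Values copied from the log message printed when the default model was loaded.
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def build_classifier():
    """Create the sentiment classifier with a pinned model and revision."""
    # Imported inside the function so that defining the helper stays cheap;
    # the model is only downloaded when the helper is actually called.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID, revision=REVISION)


# Usage (downloads the model on first call):
# classifier = build_classifier()
# outputs = classifier(text)
```

Pinning both values means a later change to the library's default model should not silently change the results.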


## Named Entity Recognition:

Here the model tags the entities in the text, such as persons, organisations, and locations.

In [4]:
# aggregation_strategy="simple" groups the tokens of one entity into a single row.
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.97988 | XYZ Stationery Store | 39 | 59 |
| 1 | LOC | 0.999513 | Budapest | 60 | 68 |
| 2 | ORG | 0.988145 | ABC Retailers | 628 | 641 |
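
The `start` and `end` values in the output above are character offsets into the input text, so they can be used to slice or replace the entities directly. The sketch below, which uses only the first sentence of the review and the first two output rows copied by hand, replaces each entity with its group tag. `snippet`, `entities`, and `redact` are hypothetical names for this example.

```python
# First sentence of the review, including the leading newline of the
# triple-quoted string, so the character offsets match the NER output.
snippet = "\nI recently purchased a Pen stand from XYZ Stationery Store,Budapest."

# The first two rows of the NER output, copied by hand.
entities = [
    {"entity_group": "ORG", "word": "XYZ Stationery Store", "start": 39, "end": 59},
    {"entity_group": "LOC", "word": "Budapest", "start": 60, "end": 68},
]


def redact(text, entities):
    """Replace each entity span with its group tag, e.g. [ORG]."""
    # Work from the last span to the first so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text


# Each (start, end) pair slices the entity text straight out of the input.
for ent in entities:
    print(ent["entity_group"], repr(snippet[ent["start"] : ent["end"]]))

redacted = redact(snippet, entities)
print(repr(redacted))
```

Replacing from right to left matters because each replacement changes the length of the string and would otherwise shift the offsets of the spans that follow.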


## Question answering:

In [5]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.609771 | 654 | 677 | a replacement or refund |
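
The answer above is a span copied out of the context, located by `start` and `end`, and it comes with a confidence `score`. The sketch below shows one possible way to sanity-check such a result before using it. It works on a single paragraph of the review rather than the full `text`, so the offsets are recomputed; `accept_answer` and the `min_score` threshold of 0.5 are hypothetical choices for this example, not part of the library.

```python
# One paragraph of the review, used here as a short stand-in context.
context = (
    "I reached out to their customer support team at ABC Retailers, "
    "hoping for a replacement or refund, but they were not helpful at all."
)

# Answer and score copied from the output above; offsets are recomputed
# because this shorter context starts at a different position than `text`.
answer = "a replacement or refund"
start = context.find(answer)
result = {"score": 0.609771, "start": start, "end": start + len(answer), "answer": answer}


def accept_answer(result, context, min_score=0.5):
    """Return the answer if it is a real span of the context and confident enough."""
    span = context[result["start"] : result["end"]]
    if span != result["answer"]:
        return None  # the offsets do not point at the reported answer
    if result["score"] < min_score:
        return None  # the model is not confident enough
    return span


accepted = accept_answer(result, context)                 # default threshold
rejected = accept_answer(result, context, min_score=0.9)  # stricter threshold
print(accepted)
print(rejected)
```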


## Summarizing the text:

In [6]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=70, clean_up_tokenization_spaces=True)
print("\n\n")
print(outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.






 The pen stand was supposed to be made of durable plastic, but it felt flimsy and cheap. The design was so poorly designed that none of my pens could fit properly. The compartments were too narrow and unevenly spaced, making it impossible to store anything without it falling over.


### We have implemented a few basic NLP tasks, but the capabilities of transformers go far beyond this. We can even fine-tune a model on our own custom datasets using transfer learning.

### I hope you liked this notebook. If you are interested in learning NLP, [https://huggingface.co/learn/nlp-course/](https://huggingface.co/learn/nlp-course/) is a great place to start.