<div class="alert alert-info">
    <h3 align = center>Hugging Face Transformers for sentiment analysis.</h3>
</div>

#### The Transformers `pipeline` API makes it easy to use the models available on Hugging Face, together with their tokenizers, for tasks in NLP, computer vision, and other areas. [Tasks](https://huggingface.co/tasks)

In [11]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

#### Pipeline for `named entity recognition`. Here we choose a model available on Hugging Face. [Reference](https://huggingface.co/learn/nlp-course/en/chapter1/3?fw=pt), [Chosen model](https://huggingface.co/models?pipeline_tag=token-classification&sort=trending)

In [12]:
ner = pipeline('ner', model = "dslim/bert-base-NER")

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


In [13]:
ner(" I am Pavan Kumar a stduent in TUM university, I like to learn about the AI and computer vision also")

[{'entity': 'B-PER',
  'score': 0.99910825,
  'index': 3,
  'word': 'Pa',
  'start': 6,
  'end': 8},
 {'entity': 'B-PER',
  'score': 0.97229,
  'index': 4,
  'word': '##van',
  'start': 8,
  'end': 11},
 {'entity': 'I-PER',
  'score': 0.9955035,
  'index': 5,
  'word': 'Kumar',
  'start': 12,
  'end': 17},
 {'entity': 'B-ORG',
  'score': 0.994769,
  'index': 12,
  'word': 'T',
  'start': 31,
  'end': 32},
 {'entity': 'I-ORG',
  'score': 0.9623844,
  'index': 13,
  'word': '##UM',
  'start': 32,
  'end': 34},
 {'entity': 'B-MISC',
  'score': 0.72965735,
  'index': 22,
  'word': 'AI',
  'start': 73,
  'end': 75}]
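
The raw output above has one entry per sub-word token, so "Pavan" and "TUM" are split across several entries. As a rough sketch, the entries can be merged back into whole entities using their character offsets. The helper name `merge_entities`, the `max_gap` parameter, and the merging rule (same entity type with the `B-`/`I-` prefix ignored, plus adjacent offsets) are assumptions made for illustration and are not the library's own algorithm. The pipeline also accepts an `aggregation_strategy` argument for built-in grouping.

```python
# Sentence passed to the pipeline above (note the leading space).
sentence = (" I am Pavan Kumar a stduent in TUM university, "
            "I like to learn about the AI and computer vision also")

# Token-level predictions copied from the output above (scores omitted).
ner_tokens = [
    {"entity": "B-PER", "start": 6, "end": 8},
    {"entity": "B-PER", "start": 8, "end": 11},
    {"entity": "I-PER", "start": 12, "end": 17},
    {"entity": "B-ORG", "start": 31, "end": 32},
    {"entity": "I-ORG", "start": 32, "end": 34},
    {"entity": "B-MISC", "start": 73, "end": 75},
]


def merge_entities(sentence, tokens, max_gap=1):
    """Merge adjacent tokens of the same entity type into (type, text) pairs.

    Hypothetical helper: tokens are merged when they share an entity type
    and the gap between their character offsets is at most `max_gap`.
    """
    spans = []  # each item: [entity_type, start, end]
    for tok in tokens:
        entity_type = tok["entity"].split("-", 1)[-1]  # "B-PER" -> "PER"
        same_type = bool(spans) and spans[-1][0] == entity_type
        if same_type and tok["start"] - spans[-1][2] <= max_gap:
            spans[-1][2] = tok["end"]  # extend the current span
        else:
            spans.append([entity_type, tok["start"], tok["end"]])
    return [(etype, sentence[start:end]) for etype, start, end in spans]


entities = merge_entities(sentence, ner_tokens)
print(entities)
# [('PER', 'Pavan Kumar'), ('ORG', 'TUM'), ('MISC', 'AI')]
```

Slicing the original sentence by offsets, instead of concatenating the `word` fields, avoids having to strip the `##` markers by hand.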

### Let's try a different model.

In [14]:
ner_1 = pipeline( "ner", model = "FacebookAI/xlm-roberta-large-finetuned-conll03-english")

Some weights of the model checkpoint at FacebookAI/xlm-roberta-large-finetuned-conll03-english were not used when initializing XLMRobertaForTokenClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


In [15]:
ner_1('I am Pavan Kumar a stduent in TUM university, I like to learn about the AI and computer vision also')

[{'entity': 'I-PER',
  'score': 0.9999907,
  'index': 3,
  'word': '▁Pa',
  'start': 5,
  'end': 7},
 {'entity': 'I-PER',
  'score': 0.99998033,
  'index': 4,
  'word': 'van',
  'start': 7,
  'end': 10},
 {'entity': 'I-PER',
  'score': 0.99999213,
  'index': 5,
  'word': '▁Kumar',
  'start': 11,
  'end': 16},
 {'entity': 'I-ORG',
  'score': 0.99593395,
  'index': 11,
  'word': '▁',
  'start': 30,
  'end': 31},
 {'entity': 'I-ORG',
  'score': 0.99858934,
  'index': 12,
  'word': 'TUM',
  'start': 30,
  'end': 33},
 {'entity': 'I-MISC',
  'score': 0.69067717,
  'index': 21,
  'word': '▁AI',
  'start': 72,
  'end': 74}]

### Zero-shot classifier. (Classifying text into labels the model was not explicitly trained on.)

In [16]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

Device set to use mps:0


In [17]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)



{'sequence': 'one day I will see the world',
 'labels': ['travel', 'dancing', 'cooking'],
 'scores': [0.9938651323318481, 0.0032737720757722855, 0.002861043205484748]}
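
The result above lists the labels sorted by score, and the scores add up to roughly 1 because, by default, they are normalized across the candidate labels. A minimal sketch of reading the result, using the dictionary printed above; the variable names `label_scores` and `top_label` are illustrative only.

```python
# Result dictionary copied from the output above.
result = {
    "sequence": "one day I will see the world",
    "labels": ["travel", "dancing", "cooking"],
    "scores": [0.9938651323318481, 0.0032737720757722855, 0.002861043205484748],
}

# Pair each label with its score so it can be looked up by name.
label_scores = dict(zip(result["labels"], result["scores"]))

# Pick the label with the highest score.
top_label = max(label_scores, key=label_scores.get)
print(top_label)

# The scores are normalized over the candidate labels.
print(round(sum(result["scores"]), 4))
```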

### Pretrained tokenizer.

In [18]:
from transformers import AutoTokenizer


In [19]:
model = "bert-base-uncased"

In [20]:
tokenizer = AutoTokenizer.from_pretrained(model)

In [21]:
text = "I am Pavan Kumar love to learn about the technology"

In [22]:
ids = tokenizer(text)

In [23]:
print(ids)

{'input_ids': [101, 1045, 2572, 6643, 6212, 9600, 2293, 2000, 4553, 2055, 1996, 2974, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [24]:
tokens = tokenizer.tokenize(text)

In [25]:
print(tokens)

['i', 'am', 'pa', '##van', 'kumar', 'love', 'to', 'learn', 'about', 'the', 'technology']
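
In the WordPiece output above, a token starting with `##` continues the previous token, which is why "Pavan" appears as `'pa'` and `'##van'`. The sketch below rebuilds whole words from the token list; `merge_wordpieces` is a hypothetical helper written for illustration, not a tokenizer method.

```python
def merge_wordpieces(tokens):
    """Join WordPiece continuation tokens ("##...") onto the preceding token."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # drop the "##" marker and append
        else:
            words.append(tok)
    return words


# Token list copied from the output above.
wordpieces = ['i', 'am', 'pa', '##van', 'kumar', 'love', 'to',
              'learn', 'about', 'the', 'technology']
words = merge_wordpieces(wordpieces)
print(words)
```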


In [26]:
token_ids = tokenizer.convert_tokens_to_ids(tokens)
token_ids

[1045, 2572, 6643, 6212, 9600, 2293, 2000, 4553, 2055, 1996, 2974]

In [27]:
tokenizer.decode(101)

'[CLS]'
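
Comparing the two outputs above shows what the tokenizer call adds on top of `tokenize` plus `convert_tokens_to_ids`: one extra id at each end. The first, 101, decodes to `[CLS]` as shown; the last, 102, is not decoded in this notebook and is assumed here to be the `[SEP]` token. A small check using the lists printed earlier:

```python
# Ids returned by tokenizer(text), copied from the output above.
input_ids = [101, 1045, 2572, 6643, 6212, 9600, 2293, 2000,
             4553, 2055, 1996, 2974, 102]

# Ids returned by convert_tokens_to_ids(tokens), copied from the output above.
token_ids = [1045, 2572, 6643, 6212, 9600, 2293, 2000,
             4553, 2055, 1996, 2974]

# Stripping the first and last id recovers the plain token ids.
inner_ids = input_ids[1:-1]
special_ids = (input_ids[0], input_ids[-1])

print(inner_ids == token_ids)
print(special_ids)
```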

### Try a different model.

In [28]:
model_2 = "xlnet-base-cased"

In [29]:
tokenizer_2 = AutoTokenizer.from_pretrained(model_2)

In [30]:
input_ids_1 = tokenizer_2(text)  # encode the `text` variable, not the literal string "text"

In [31]:
tokens = tokenizer_2.tokenize(text)
print(tokens)

['▁I', '▁am', '▁Pa', 'van', '▁Kumar', '▁love', '▁to', '▁learn', '▁about', '▁the', '▁technology']


In [32]:
token_ids = tokenizer_2.convert_tokens_to_ids(tokens)
token_ids

[35, 569, 3069, 3207, 14851, 564, 22, 1184, 75, 18, 913]

In [33]:
tokenizer_2.decode(3207), tokenizer_2.decode(22)

('van', 'to')

- Special tokens.

### Hugging Face and PyTorch

In [34]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [35]:
print(text)
print(ids)


I am Pavan Kumar love to learn about the technology
{'input_ids': [101, 1045, 2572, 6643, 6212, 9600, 2293, 2000, 4553, 2055, 1996, 2974, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [36]:
ids_points = tokenizer(text, return_tensors = "pt")



In [37]:
print(ids_points['input_ids'].shape)

torch.Size([1, 13])


In [38]:
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

In [39]:
import torch

In [None]:
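# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    # Pass input_ids and attention_mask explicitly; the BERT tokenizer above
    # also returns token_type_ids, which DistilBERT does not take.
    logits = model(input_ids=ids_points['input_ids'],
                   attention_mask=ids_points['attention_mask']).logits

# Map the highest-scoring logit to its label name.
predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]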
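
The cell above returns only the label. To get a confidence score like the one the `sentiment-analysis` pipeline prints, the logits can be passed through a softmax. The sketch below does this in plain Python so the arithmetic is visible; the logit values and the `id2label` mapping are made-up assumptions for illustration, not output from the model. On the real tensor, `torch.softmax(logits, dim=-1)` computes the same thing.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence and a hypothetical label mapping.
example_logits = [-1.0, 2.0]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```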


### Saving the model

In [None]:
model_directory = "my_trained_model"

In [None]:
tokenizer.save_pretrained(model_directory)
model.save_pretrained(model_directory)

### Loading the saved model.

In [None]:
my_tokenizer = AutoTokenizer.from_pretrained(model_directory)
my_model = AutoModelForSequenceClassification.from_pretrained(model_directory)