# Sentiment Analysis Model

Objective: To build a sentiment analysis model with the Hugging Face Transformers library.

About Hugging Face: Hugging Face is an AI startup focused on natural language processing (NLP). It hosts and distributes open-source implementations of pretrained NLP models such as BERT, GPT-2, and T5 through its model hub. Its core libraries are open-source, and many of the hosted models are free to use under their individual licenses.
Hugging Face also created the Transformers library, which makes it easy to use these models for common NLP tasks. With just a few lines of code, you can run a pretrained model on a task such as sentiment analysis.

Hugging Face models can be used for:
Text generation, text classification, question answering, summarization, translation, sentiment analysis, and search.

References: https://huggingface.co/models and https://huggingface.co/docs

In [None]:
from transformers import pipeline

In [None]:
sentiment_classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
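
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so follows, reusing the model name and revision reported in the log message. The constant and function names are illustrative and not part of the library. The import is deferred, so `transformers` is only needed when the function is called.

```python
# Model name and revision copied from the warning message above.
PINNED_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
PINNED_REVISION = "714eb0f"


def build_pinned_classifier():
    """Create a sentiment pipeline tied to one model and revision (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=PINNED_MODEL,
        revision=PINNED_REVISION,
    )
```

Calling `build_pinned_classifier()` should give a classifier that behaves like the default one, but without depending on whatever default a future library version selects.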


In [None]:
sentiment_classifier("I'm so excited to be learning about Artifiacial Intelligence")

[{'label': 'POSITIVE', 'score': 0.9996956586837769}]

In [None]:
ner = pipeline("ner", model="dslim/bert-base-NER")

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [None]:
zeroshot_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

Device set to use cpu


In [None]:
sequence_to_classify = "one day I will cook Italian food"
candidate_labels = ['travel', 'cooking', 'dancing']

In [None]:
zeroshot_classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will cook Italian food',
 'labels': ['cooking', 'travel', 'dancing'],
 'scores': [0.98698890209198, 0.006510802078992128, 0.006500298157334328]}
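
The zero-shot result is a plain dictionary whose `labels` and `scores` lists are aligned and ordered from most to least likely. A small sketch of reading it follows, using the values copied from the output above. The `THRESHOLD` value is an arbitrary assumption for illustration.

```python
# Result dictionary copied from the output above.
result = {
    "sequence": "one day I will cook Italian food",
    "labels": ["cooking", "travel", "dancing"],
    "scores": [0.98698890209198, 0.006510802078992128, 0.006500298157334328],
}

# Pair each label with its score; the lists are aligned and sorted best-first.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

# Keep only labels above an illustrative confidence threshold (an assumption).
THRESHOLD = 0.5
confident = [label for label, score in ranked if score >= THRESHOLD]

print(top_label, round(top_score, 3))
print(confident)
```

With the default single-label setting the scores sum to roughly 1, so they can be read as a distribution over the candidate labels.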

## Pre-trained Tokenizers

In [None]:
from transformers import AutoTokenizer

In [None]:
model = "bert-base-uncased"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model)

In [None]:
sentence = "one day I will see the world"

In [None]:
# Calling the tokenizer returns input_ids together with token_type_ids and attention_mask
input_ids = tokenizer(sentence)
print(input_ids)

{'input_ids': [101, 2028, 2154, 1045, 2097, 2156, 1996, 2088, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [None]:
tokens = tokenizer.tokenize(sentence)

In [None]:
print(tokens)

['one', 'day', 'i', 'will', 'see', 'the', 'world']


In [None]:
token_ids = tokenizer.convert_tokens_to_ids(tokens)

In [None]:
print(token_ids)

[2028, 2154, 1045, 2097, 2156, 1996, 2088]


In [None]:
decoded_ids = tokenizer.decode(token_ids)
print(decoded_ids)

one day i will see the world


In [None]:
tokenizer.decode(101)

'[CLS]'

In [None]:
tokenizer.decode(102)

'[SEP]'
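
The outputs above suggest that calling the tokenizer directly is equivalent to tokenizing, converting to IDs, and then wrapping the result in `[CLS]` (101) and `[SEP]` (102). The following check uses only the values printed above, and the variable names are illustrative.

```python
# Values copied from the bert-base-uncased outputs above.
input_ids = [101, 2028, 2154, 1045, 2097, 2156, 1996, 2088, 102]
token_ids = [2028, 2154, 1045, 2097, 2156, 1996, 2088]
CLS_ID, SEP_ID = 101, 102  # IDs decoded above as '[CLS]' and '[SEP]'

# BERT-style layout: [CLS] + tokens + [SEP]
rebuilt = [CLS_ID] + token_ids + [SEP_ID]
print(rebuilt == input_ids)
```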

In [None]:
model2 = "xlnet-base-cased"

In [None]:
tokenizer2 = AutoTokenizer.from_pretrained(model2)

In [None]:
input_ids = tokenizer2(sentence)

In [None]:
print(input_ids)

{'input_ids': [65, 191, 35, 53, 197, 18, 185, 4, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [None]:
tokens = tokenizer2.tokenize(sentence)
print(tokens)

['▁one', '▁day', '▁I', '▁will', '▁see', '▁the', '▁world']
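
Unlike the BERT tokens earlier, the XLNet tokens keep the original casing (`I` rather than `i`) and carry a leading `▁` marker, which SentencePiece-based tokenizers use to mark where a word starts. A rough sketch of turning such tokens back into text follows, using the list printed above. This is a simplification for illustration, not the tokenizer's own decoding routine.

```python
# Tokens copied from the xlnet-base-cased output above.
tokens = ["▁one", "▁day", "▁I", "▁will", "▁see", "▁the", "▁world"]

# The marker stands for the space that preceded each word.
WORD_BOUNDARY = "▁"

# Concatenate the pieces, turn markers back into spaces, and drop the leading space.
text = "".join(tokens).replace(WORD_BOUNDARY, " ").strip()
print(text)
```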


In [None]:
token_ids = tokenizer2.convert_tokens_to_ids(tokens)
print(token_ids)

[65, 191, 35, 53, 197, 18, 185]


In [None]:
tokenizer2.decode(4)

'<sep>'

In [None]:
tokenizer2.decode(3)

'<cls>'
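
BERT places `[CLS]` at the start of the sequence, but the XLNet output above puts both special tokens at the end: `<sep>` (4) followed by `<cls>` (3). The following check uses only the printed values, and the variable names are illustrative.

```python
# Values copied from the xlnet-base-cased outputs above.
input_ids = [65, 191, 35, 53, 197, 18, 185, 4, 3]
token_ids = [65, 191, 35, 53, 197, 18, 185]
SEP_ID, CLS_ID = 4, 3  # IDs decoded above as '<sep>' and '<cls>'

# XLNet-style layout: tokens + <sep> + <cls>
rebuilt = token_ids + [SEP_ID, CLS_ID]
print(rebuilt == input_ids)
```

This difference in layout is one reason to load the tokenizer that matches the model rather than reusing IDs across models.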

## Hugging Face and PyTorch/TensorFlow

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [None]:
print(sentence)
print(input_ids)

one day I will see the world
{'input_ids': [65, 191, 35, 53, 197, 18, 185, 4, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

In [None]:
input_ids_pt = tokenizer(sentence, return_tensors="pt")
print(input_ids_pt)

{'input_ids': tensor([[ 101, 2028, 2154, 1045, 2097, 2156, 1996, 2088,  102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}


In [None]:
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

In [None]:
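# Disable gradient tracking, since this is inference only
with torch.no_grad():
    logits = model(**input_ids_pt).logits

# Pick the highest-scoring class for the single input sentence and map it to its label
predicted_class_id = logits.argmax(dim=-1).item()
model.config.id2label[predicted_class_id]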

'POSITIVE'
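
The model returns raw logits, whereas the pipeline earlier reported a score between 0 and 1. That score is typically obtained by applying a softmax to the logits. A plain-Python sketch follows. The logits are hypothetical values chosen for illustration, not taken from the run above, and the label mapping is assumed to match `model.config.id2label` for this SST-2 model.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for illustration only.
example_logits = [-4.0, 4.1]
# Assumed label mapping for the SST-2 model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], probs[best])
```

With the real tensors, `torch.softmax(logits, dim=-1)` performs the same computation.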

## Saving and loading models

In [None]:
model_directory = "my_saved_models"

In [None]:
tokenizer.save_pretrained(model_directory)

('my_saved_models/tokenizer_config.json',
 'my_saved_models/special_tokens_map.json',
 'my_saved_models/vocab.txt',
 'my_saved_models/added_tokens.json',
 'my_saved_models/tokenizer.json')

In [None]:
model.save_pretrained(model_directory)

In [None]:
my_tokenizer = AutoTokenizer.from_pretrained(model_directory)

In [None]:
my_model = AutoModelForSequenceClassification.from_pretrained(model_directory)
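
Before reloading from a local directory, it can help to confirm that the files are actually there. Below is a small sketch of such a check. The helper name and the list of expected files are assumptions for illustration, and the exact file names can vary with the library version and tokenizer type.

```python
from pathlib import Path

# Files assumed to be written by the two save_pretrained calls above.
EXPECTED_FILES = ["config.json", "tokenizer_config.json", "vocab.txt"]


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in the directory."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]
```

After both save calls, `missing_files("my_saved_models")` should return an empty list if the assumed file names hold for the installed version.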