# Installing and Importing Transformers

In [1]:
pip install transformers

Collecting transformers
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

## Using the pipeline() function to predict sentiment

The pipeline() function downloads and caches a default pretrained model and tokenizer for the sentiment-analysis task.

In [2]:
from transformers import pipeline
# No model is named, so pipeline() falls back to its default for this task
model = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
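
The warning in the output above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the checkpoint name and revision that the warning itself reports; the `PINNED` dictionary and the `build_pinned_pipeline` helper are hypothetical names introduced here for illustration:

```python
# Pin the checkpoint and revision reported in the warning above so that
# later runs do not silently pick up a different default model.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_pinned_pipeline():
    """Create the sentiment pipeline from the pinned settings (hypothetical helper)."""
    # Imported inside the function so that defining the settings
    # does not require the library to be installed.
    from transformers import pipeline

    return pipeline(**PINNED)
```

Calling `model = build_pinned_pipeline()` should then load the same checkpoint as the default call, without the "No model was supplied" warning.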


Use the model to predict the sentiment of a single input as well as a list of inputs.

In [9]:
print(model("I like to play violin !"))  # Single input

res = model(['The whether is great today', 'The clothes smell bad!'])  # List of inputs
for i in res:
    print(f"label: {i['label']},score: {i['score']}")

[{'label': 'POSITIVE', 'score': 0.9995766282081604}]
label: POSITIVE,score: 0.9998531341552734
label: NEGATIVE,score: 0.9997982382774353
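
Each `score` is the probability the model assigns to the reported label. As a rough sketch of how such a result could be derived from raw model outputs (logits), assuming a softmax over two classes: the logit values and the `to_prediction` helper below are made up for illustration and are not taken from the pipeline's internals.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits, id2label):
    """Pick the most probable class and report it as label and score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical label mapping and logits, chosen only for illustration.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
example_logits = [-4.0, 4.0]

prediction = to_prediction(example_logits, id2label)
print(prediction)
```

A large gap between the two logits yields a score close to 1, which is why the scores printed above are so high.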


# Using another model and tokenizer in the pipeline() function

In [16]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Load the TensorFlow model and its matching tokenizer from the same checkpoint
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Some layers from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


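The tokenizer returns a dictionary containing:

**input_ids**: the numerical IDs of your tokens.

**token_type_ids**: identifies which sequence each token belongs to; all zeros for a single-sentence input.

**attention_mask**: indicates which tokens the model should attend to (1) and which are padding to be ignored (0).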

In [19]:
encoding=tokenizer("My Hero Acamedia is the best anime!")
print(encoding)

{'input_ids': [101, 11153, 20837, 12181, 23054, 14302, 10127, 10103, 11146, 16665, 106, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
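
The lists in the output above are parallel: each position describes one token. The sketch below checks this on the printed encoding, copied verbatim. Reading the first and last ids (101 and 102) as the BERT `[CLS]` and `[SEP]` special tokens is an assumption based on the usual BERT vocabulary layout, not something the output itself states.

```python
# Encoding copied from the tokenizer output above.
encoding = {
    "input_ids": [101, 11153, 20837, 12181, 23054, 14302, 10127, 10103, 11146, 16665, 106, 102],
    "token_type_ids": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "attention_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}

# Every field has one entry per token, so all lengths should match.
lengths = {key: len(values) for key, values in encoding.items()}
same_length = len(set(lengths.values())) == 1

# Assumption: the first and last ids are special tokens added by the
# tokenizer; the ids in between come from the sentence itself.
inner_ids = encoding["input_ids"][1:-1]

print(lengths)
print(same_length, len(inner_ids))
```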


Pass the loaded model and tokenizer to pipeline() and predict the sentiment:

In [18]:
# Build a pipeline from the explicitly loaded model and tokenizer
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
classifier(["I love to watch anime", "I hate to read long mails"])

[{'label': '5 stars', 'score': 0.6129897832870483},
 {'label': '1 star', 'score': 0.33172473311424255}]
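
This model labels text with a star rating rather than POSITIVE or NEGATIVE. If a coarse sentiment is needed, the labels can be mapped by hand. A minimal sketch, using the two results printed above and assuming labels run from 1 to 5 stars; the `stars_to_sentiment` helper and its thresholds (1-2 negative, 3 neutral, 4-5 positive) are choices made here for illustration, not part of the model or the library:

```python
def stars_to_sentiment(label):
    """Map a label such as '5 stars' or '1 star' to a coarse sentiment."""
    stars = int(label.split()[0])  # leading number of the label
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Results copied from the classifier output above.
results = [
    {"label": "5 stars", "score": 0.6129897832870483},
    {"label": "1 star", "score": 0.33172473311424255},
]

sentiments = [stars_to_sentiment(r["label"]) for r in results]
print(sentiments)
```

Note that the scores here are spread over five classes, so they are naturally lower than the two-class scores from the default model.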

In [23]:
tf_batch = tokenizer(
    ["I love to watch anime","I hate to read long mails"],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="tf",
)

In [24]:
print(tf_batch)

{'input_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[  101,   151, 11157, 10114, 20367, 16665,   102,     0,     0],
       [  101,   151, 39487, 10114, 18593, 11134, 19385, 10107,   102]],
      dtype=int32)>, 'token_type_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}
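
With `padding=True`, the shorter sequence is padded to the length of the longest one in the batch, and the attention mask marks the padded positions with 0. A pure-Python sketch of that behaviour, using the unpadded token ids that can be read off the output above; the `pad_batch` helper is hypothetical, and the pad id 0 is taken from the printed tensors:

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Token ids of the two sentences, without padding, as read from the output above.
sequences = [
    [101, 151, 11157, 10114, 20367, 16665, 102],
    [101, 151, 39487, 10114, 18593, 11134, 19385, 10107, 102],
]

batch = pad_batch(sequences)
print(batch["input_ids"])
print(batch["attention_mask"])
```

`truncation=True` with `max_length=512` has no visible effect in this batch because both sentences are far shorter than 512 tokens.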
