* Transformers is a Python library designed to simplify the process of downloading and training cutting-edge machine learning models.

* While it was originally created for building language models, its capabilities have grown to encompass models for computer vision, audio processing, and more.

* The easiest way to start using the library is via the `pipeline()` function, which wraps NLP (and other) tasks in a single line of code.

For example, when performing sentiment analysis using Transformers, here's a general approach:

1. Select a Model: Choose a pre-trained model suitable for sentiment analysis, such as BERT or RoBERTa.
2. Tokenize the Input Text: Use the tokenizer associated with the chosen model to convert the input text into tokens that the model can understand.
3. Pass Through the Model: Feed the tokenized text into the pre-trained model. Internally the model builds numerical representations (embeddings) of the text, and a classification head turns them into raw scores (logits), one per sentiment class.
4. Decode the Output: Convert the logits into a sentiment label, typically by applying softmax to get probabilities and picking the class with the highest one.
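
Step 4 can be illustrated without downloading a model. The sketch below assumes a two-class model whose logits are ordered as negative, then positive; the `LABELS` list and the logit values are made up for illustration and do not come from a real model.

```python
import math

# Hypothetical label order; real checkpoints store theirs in model.config.id2label.
LABELS = ["NEGATIVE", "POSITIVE"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, labels=LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Made-up logits for a single input sentence.
label, score = decode([-2.0, 3.0])
print(label, round(score, 4))  # prints: POSITIVE 0.9933
```

Softmax does not change which class wins; it only rescales the logits so the winning score can be read as a confidence.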

With `pipeline()`, all four steps run in a single call:

In [6]:
from transformers import pipeline

# Perform sentiment analysis
result = pipeline(task="sentiment-analysis")("Love this!")

# Print the result
print("Predicted Sentiment:", result[0]['label'])


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Predicted Sentiment: POSITIVE


Because the code above does not specify a model, the pipeline falls back to its default for sentiment analysis, distilbert-base-uncased-finetuned-sst-2-english, as the log message notes.

To be explicit, pass the model name yourself:

In [7]:
pipeline(task="sentiment-analysis",
        model='distilbert-base-uncased-finetuned-sst-2-english')("Love this!")

[{'label': 'POSITIVE', 'score': 0.9998745918273926}]
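
The warning printed by the first cell recommends pinning both the model name and its revision. A minimal sketch of that, reusing the model ID and revision hash reported in that log; `build_classifier` and `confident_label` are hypothetical helper names, and the 0.9 threshold is an arbitrary value chosen for illustration.

```python
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"  # revision reported in the pipeline's log message

def build_classifier():
    """Create a sentiment pipeline pinned to a fixed model and revision."""
    # Imported here so the helper below can be used without transformers installed.
    from transformers import pipeline
    return pipeline(task="sentiment-analysis", model=MODEL_ID, revision=REVISION)

def confident_label(result, threshold=0.9):
    """Return the label of one pipeline result, or "UNCERTAIN" below the threshold."""
    return result["label"] if result["score"] >= threshold else "UNCERTAIN"

# Usage (downloads the model on first run); a pipeline also accepts a list of texts:
# classifier = build_classifier()
# for result in classifier(["Love this!", "Not sure how I feel."]):
#     print(confident_label(result))
```

Each pipeline result is a dictionary with a `label` and a `score`, as in the output above, so a threshold on `score` is one simple way to flag low-confidence predictions.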

The same four steps can also be written out by hand with a tokenizer and a model class:

In [5]:
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained model and tokenizer.
# Note: bert-base-uncased has no fine-tuned classification head, so the head
# is randomly initialized and the prediction below is not meaningful until
# the model is fine-tuned on a sentiment dataset.
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Input text
input_text = "I loved the movie! It was fantastic."

# Tokenize input text into PyTorch tensors
inputs = tokenizer(input_text, return_tensors="pt")

# Pass through the model (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Decode the output: pick the class with the highest logit
predicted_class = outputs.logits.argmax(dim=-1).item()
print(predicted_class)
sentiment_label = "positive" if predicted_class == 1 else "negative"
print("Predicted Sentiment:", sentiment_label)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


1
Predicted Sentiment: positive
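
As the warning above indicates, the classification head of `bert-base-uncased` is untrained, so the label printed by this cell is not a meaningful prediction. One way to keep the manual steps and still get a meaningful result is to load a checkpoint that has already been fine-tuned for sentiment, such as the DistilBERT SST-2 model the pipeline used earlier. The sketch below assumes that checkpoint and the library's `Auto*` classes; `predict_sentiment` is a hypothetical helper name.

```python
CHECKPOINT = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

def predict_sentiment(text, checkpoint=CHECKPOINT):
    """Tokenize, run the model, and map the top logit to the checkpoint's own label."""
    # Imported here so the function can be defined without the heavy dependencies loaded.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        logits = model(**inputs).logits

    probs = torch.softmax(logits, dim=-1)[0]
    class_id = int(probs.argmax())
    # id2label comes from the checkpoint's config, so no hard-coded mapping is needed.
    return model.config.id2label[class_id], float(probs[class_id])

# Usage (downloads the checkpoint on first run):
# print(predict_sentiment("I loved the movie! It was fantastic."))
```

Reading the label from `model.config.id2label` avoids assuming that class 1 means "positive", which depends on how a given checkpoint was trained.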
