Popular transformer models used for sentiment analysis include BERT, GPT, and RoBERTa. Practitioners typically start from pre-trained checkpoints released by organizations such as Google and OpenAI, then fine-tune them on a labeled sentiment dataset to improve task performance. Thanks to transfer learning, the same pre-trained model can be adapted to different domains and languages, which makes these models versatile for sentiment analysis applications.

In [None]:
!pip install transformers



In [None]:
from transformers import pipeline
sentimental_analysis = pipeline("sentiment-analysis")
data = ["she is not very good and not even beautiful"]

print(sentimental_analysis(data))

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'label': 'NEGATIVE', 'score': 0.9997904896736145}]


In [None]:
# Optimized model: a checkpoint chosen specifically for sentiment analysis

In [None]:
specific_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")
specific_model(data)


emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0


[{'label': 'NEG', 'score': 0.9768897294998169}]

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool, so it does not need to be trained on a task-specific dataset. Instead, it relies on a pre-built lexicon of words with associated sentiment scores, plus rules that handle combinations of words and punctuation that convey sentiment.

In Python, the nltk library provides an implementation of the VADER sentiment analysis tool, making it accessible for developers working on NLP projects.
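
To make the "lexicon plus rules" idea concrete, here is a minimal toy sketch. It is not VADER itself: the lexicon, the intensifier weights, the negation window, and the function name `toy_compound` are all hypothetical, and the final squashing step is only modeled on the normalization commonly described for VADER's compound score, so treat it as an assumption.

```python
import math

# Hypothetical mini-lexicon: word -> valence score (NOT the real VADER lexicon).
TOY_LEXICON = {"good": 1.9, "beautiful": 2.9, "bad": -2.5, "terrible": -2.1}
NEGATORS = {"not", "never", "no"}
BOOSTERS = {"very": 0.3, "really": 0.3}  # hypothetical intensifier weights


def toy_compound(text, window=3, alpha=15):
    """Score text with a toy lexicon plus two simple rules."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token)
        if valence is None:
            continue  # word carries no sentiment in the toy lexicon
        context = tokens[max(0, i - window):i]
        # Rule 1: an intensifier within the preceding window strengthens the word.
        for word in context:
            if word in BOOSTERS:
                valence += math.copysign(BOOSTERS[word], valence)
        # Rule 2: a negator within the preceding window flips the polarity.
        if any(word in NEGATORS for word in context):
            valence = -valence
        total += valence
    # Squash the running sum into the open interval (-1, 1).
    return total / math.sqrt(total * total + alpha)


print(toy_compound("She is not very good and not even beautiful."))
```

Under these assumptions the example sentence gets a clearly negative score because both sentiment words are preceded by "not"; the real tool uses a much larger lexicon and more rules.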

In [None]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download Vader lexicon if not already downloaded
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [None]:
# Function to analyze sentiment of provided text using Vader
def analyze_text_sentiment_vader(text):
    sid = SentimentIntensityAnalyzer()
    sentiment_scores = sid.polarity_scores(text)

    # Determine sentiment based on compound score
    if sentiment_scores['compound'] >= 0.05:
        sentiment = "Positive"
    elif sentiment_scores['compound'] <= -0.05:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"

    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}")

# Replace 'text_to_analyze' with your own text for analysis
text_to_analyze = 'She is not very good and not even beautiful.'
analyze_text_sentiment_vader(text_to_analyze)

Text: She is not very good and not even beautiful.
Sentiment: Negative


Word-list (algorithmic) sentiment analysis

In [None]:
!pip install streamlit


In [None]:
!pip install newspaper3k


In [None]:
!pip install textblob



In [None]:
import pandas as pd
data = pd.read_excel("/content/Positive and Negative Word List.xlsx")

In [None]:
data.duplicated().sum()

0

In [None]:
data.isnull().sum()

Unnamed: 0                   0
Negative Sense Word List     1
Positive Sense Word List    25
dtype: int64

In [None]:
negative = data["Negative Sense Word List"].dropna()
positive = data["Positive Sense Word List"].dropna()

In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

In [None]:
# Download NLTK datasets (run once)
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
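# Build lookup sets from the word lists loaded above.
# Note: `word in <pandas Series>` tests the index labels, not the values,
# so the Series are converted to sets of lowercase strings first.
positive_words = set(positive.astype(str).str.strip().str.lower())
negative_words = set(negative.astype(str).str.strip().str.lower())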

In [None]:
# Function to perform sentiment analysis
def sentiment_analysis(text):
    stop_words = set(stopwords.words('english'))  # Build the stop-word set once per call
    tokens = word_tokenize(text.lower())  # Tokenize and convert to lowercase
    tokens = [word for word in tokens if word.isalpha()]  # Remove non-alphabetic tokens
    # Remove stop words (note: the English list includes negations such as "not")
    tokens = [word for word in tokens if word not in stop_words]

    positive_count = sum(1 for word in tokens if word in positive_words)
    negative_count = sum(1 for word in tokens if word in negative_words)

    if positive_count > negative_count:
        return "Positive"
    elif negative_count > positive_count:
        return "Negative"
    else:
        return "Neutral"

In [None]:
# Example text for analysis
example_text = "she is not very good and not even beautiful"

In [None]:
# Perform sentiment analysis on the example text
sentiment = sentiment_analysis(example_text)
print("Sentiment:", sentiment)

Sentiment: Neutral
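
A limitation of plain word counting is that it ignores negation, and NLTK's English stop-word list contains "not", so negations are dropped before counting. The sketch below contrasts plain counting with a negation-aware variant. The word lists, the stop-word subset, the two-token window, and both function names are hypothetical and only serve as an illustration.

```python
# Hypothetical word lists and a small stop-word subset, for illustration only.
POSITIVE = {"good", "beautiful"}
NEGATIVE = {"bad", "ugly"}
STOP_WORDS = {"she", "is", "not", "very", "and"}
NEGATORS = {"not", "never", "no"}


def count_without_negation(text):
    """Mimic the counting approach: drop stop words, then count matches."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg


def count_with_negation(text, window=2):
    """Keep negators and flip a word's polarity if one occurs shortly before it."""
    tokens = text.lower().split()
    pos = neg = 0
    for i, token in enumerate(tokens):
        if token not in POSITIVE and token not in NEGATIVE:
            continue  # not a sentiment word
        negated = any(t in NEGATORS for t in tokens[max(0, i - window):i])
        # A positive word stays positive unless negated; a negative word
        # becomes positive only when negated.
        if (token in POSITIVE) != negated:
            pos += 1
        else:
            neg += 1
    return pos, neg


sentence = "she is not very good and not even beautiful"
print(count_without_negation(sentence))  # (positive, negative) counts
print(count_with_negation(sentence))
```

With these hypothetical lists, plain counting sees two positive words, while the negation-aware variant counts both as negative.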


Using emojis

In [None]:
!pip install emoji



In [None]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.nn.functional import softmax

# Load pre-trained BERT model and tokenizer.
# Caution: 'bert-base-uncased' has no fine-tuned sentiment head, so the classifier
# layer is randomly initialized and the predictions are not meaningful until the
# model is fine-tuned on a sentiment dataset.
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Example text with emojis
text = "I love this product! 😍"

# Tokenize and encode the text
inputs = tokenizer(text, return_tensors='pt', truncation=True)

# Forward pass through the model (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Get probabilities for each class
probs = softmax(outputs.logits, dim=1).numpy()[0]

# Interpret results.
# The model is loaded with the default of two output labels, so only the first
# two entries of this list can ever be selected.
sentiment_labels = ['Negative', 'Neutral', 'Positive']
predicted_sentiment = sentiment_labels[probs.argmax()]

# Map sentiment labels to emojis
emoji_mapping = {
    'Negative': '😠',
    'Neutral': '😐',
    'Positive': '😊'
}

# Get the corresponding emoji for the predicted sentiment
predicted_emoji = emoji_mapping.get(predicted_sentiment, '')

print(f"Predicted Sentiment: {predicted_sentiment} {predicted_emoji}")
print(f"Sentiment Probabilities: {probs}")



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Predicted Sentiment: Negative 😠
Sentiment Probabilities: [0.51927173 0.48072833]
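
The near-even probabilities are what one would expect from a classification head that has not been trained: softmax turns the model's raw scores (logits) into probabilities, and logits that are almost equal give probabilities close to 0.5. Here is a minimal pure-Python sketch of that step. The logit values are made up for illustration and are not taken from the model.

```python
import math


def softmax(logits):
    """Convert a list of raw scores (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: nearly equal scores give nearly equal probabilities,
# while a clear margin gives a confident prediction.
print(softmax([0.08, 0.0]))
print(softmax([3.0, 0.0]))
```

A fine-tuned sentiment checkpoint would usually produce a much larger margin between logits on a clearly positive sentence like the one above.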


In [None]:
!pip install tweepy



In [None]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.nn.functional import softmax
import emoji
import pandas as pd

# Load pre-trained BERT model and tokenizer
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Example CSV file with a 'tweet' column
csv_file_path = '/content/tweet_data.csv'
df = pd.read_csv(csv_file_path)

# Label names and their emojis (defined once, outside the loop).
# Caution: the classification head of 'bert-base-uncased' is randomly initialized
# and has two outputs by default, so these predictions are not meaningful.
sentiment_labels = ['Negative', 'Neutral', 'Positive']
emoji_mapping = {
    'Negative': '😠',
    'Neutral': '😐',
    'Positive': '😊'
}

# Process each text entry in the CSV file
for index, row in df.iterrows():
    text = row['tweet']

    # Tokenize and encode the text
    inputs = tokenizer(text, return_tensors='pt', truncation=True)

    # Forward pass through the model (no gradients needed for inference)
    with torch.no_grad():
        outputs = model(**inputs)

    # Get probabilities for each class
    probs = softmax(outputs.logits, dim=1).numpy()[0]

    # Pick the most likely label and its emoji
    predicted_sentiment = sentiment_labels[probs.argmax()]
    predicted_emoji = emoji_mapping.get(predicted_sentiment, '')

    print(f"Text: {text}")
    print(f"Predicted Sentiment: {predicted_sentiment} {predicted_emoji}")
    print(f"Sentiment Probabilities: {probs}")
    print("\n")


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Streaming output truncated to the last 5000 lines.
Text: @ditesh haha I'm unsure what I can deliver for Foss.my. I'm not using alot of opensource software ( Hail Adobe for being expensive  )
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.32657325 0.6734268 ]


Text: glues not coming off. it is sooo irritating 
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.36427847 0.6357215 ]


Text: Cannot believe she is awake at 6AM on a tuesday. *yawn* And I had a bad dream. Bummer 
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.31618333 0.68381673]


Text: @sentricmusic ...suffice to say their offer was ignored. Then EMI.com launched and they all laughed rather a lot... 
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.31009197 0.689908  ]


Text: feeling low today 
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.41562364 0.58437634]


Text: Bored&amp;tired. Miss the stay-back time 
Predicted Sentiment: Neutral 😐
Sentiment Prob

KeyboardInterrupt: ignored

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

In [None]:
import torch
import pandas as pd
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a RoBERTa model fine-tuned for tweet sentiment, and its tokenizer
roberta = "cardiffnlp/twitter-roberta-base-sentiment"

model = AutoModelForSequenceClassification.from_pretrained(roberta)
tokenizer = AutoTokenizer.from_pretrained(roberta)

# Example CSV file with a 'tweet' column
csv_file_path = '/content/tweet_data.csv'
df = pd.read_csv(csv_file_path)

# Class order for this checkpoint: 0 = negative, 1 = neutral, 2 = positive
sentiment_labels = ['Negative', 'Neutral', 'Positive']
emoji_mapping = {
    'Negative': '😠',
    'Neutral': '😐',
    'Positive': '😊'
}

# Process each text entry in the CSV file
for index, row in df.iterrows():
    text = row['tweet']

    # Tokenize and encode the text
    inputs = tokenizer(text, return_tensors='pt', truncation=True)

    # Forward pass through the model (no gradients needed for inference)
    with torch.no_grad():
        outputs = model(**inputs)

    # Get probabilities for each class (negative, neutral, positive)
    probs = softmax(outputs.logits, dim=1).numpy()[0]

    # Pick the most likely label and its emoji
    predicted_sentiment = sentiment_labels[probs.argmax()]
    predicted_emoji = emoji_mapping.get(predicted_sentiment, '')

    print(f"Text: {text}")
    print(f"Predicted Sentiment: {predicted_sentiment} {predicted_emoji}")
    print(f"Sentiment Probabilities: {probs}")
    print("\n")



Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Streaming output truncated to the last 5000 lines.
Text: ooooh.... LOL  that leslie.... and ok I won't do it again so leslie won't  get mad again 
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.20825444 0.4729253  0.31882024]


Text: Meh... Almost Lover is the exception... this track gets me depressed every time. 
Predicted Sentiment: Negative 😠
Sentiment Probabilities: [0.95390856 0.04078398 0.00530738]


Text: some1 hacked my account on aim  now i have to make a new one
Predicted Sentiment: Negative 😠
Sentiment Probabilities: [0.7788862  0.20941688 0.01169692]


Text: @alielayus I want to go to promote GEAR AND GROOVE but unfornately no ride there  I may b going to the one in Anaheim in May though
Predicted Sentiment: Neutral 😐
Sentiment Probabilities: [0.08944464 0.67448825 0.2360671 ]


Text: thought sleeping in was an option tomorrow but realizing that it now is not. evaluations in the morning and work in the afternoon! 
Predicted Sentiment: Neutral 😐
Sen