The line `!unzip -o sentiment_transfer_learning_tensorflow.zip` is a shell command run from inside the Jupyter notebook.

The `!` prefix tells Jupyter to pass the rest of the line to the system shell.

`unzip` is a utility that lists, tests and extracts ZIP archives.

The `-o` option overwrites existing files without prompting.

`sentiment_transfer_learning_tensorflow.zip` is the archive to extract.

In short, this line extracts `sentiment_transfer_learning_tensorflow.zip` and overwrites any existing files of the same name without asking for confirmation.
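
If the `unzip` utility is not available (for example on some Windows setups), the same step can be done with Python's standard `zipfile` module. This is a minimal sketch; the helper name `extract_archive` is an assumption for illustration and is not part of the original notebook.

```python
import zipfile


def extract_archive(archive_path, target_dir="."):
    """Extract every member of a ZIP archive into target_dir.

    Existing files with the same name are overwritten without prompting,
    which mirrors the behaviour of `unzip -o`.
    """
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        # Return the member names so the caller can see what was extracted
        return archive.namelist()
```

In the notebook this could be called as `extract_archive("sentiment_transfer_learning_tensorflow.zip")`.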

In [1]:
!unzip -o sentiment_transfer_learning_tensorflow.zip

Archive:  sentiment_transfer_learning_tensorflow.zip
  inflating: sentiment_transfer_learning_tensorflow/tokenizer_config.json  
  inflating: sentiment_transfer_learning_tensorflow/special_tokens_map.json  
  inflating: sentiment_transfer_learning_tensorflow/config.json  
  inflating: sentiment_transfer_learning_tensorflow/tokenizer.json  
  inflating: sentiment_transfer_learning_tensorflow/vocab.txt  
  inflating: sentiment_transfer_learning_tensorflow/tf_model.h5  


In [2]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

In [3]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("./sentiment_transfer_learning_tensorflow/")

# Load model
loaded_model = TFAutoModelForSequenceClassification.from_pretrained('./sentiment_transfer_learning_tensorflow/')

Some layers from the model checkpoint at ./sentiment_transfer_learning_tensorflow/ were not used when initializing TFBertForSequenceClassification: ['dropout_113']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at ./sentiment_transfer_learning_tensorflow/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [4]:
logitpreds = loaded_model(tokenizer(["He is useless, I dont know why he came to our neighbourhood",
                                     "That guy is well", "He is such a retard"],
                                    return_tensors="np",padding=True,truncation=True))['logits']

print(logitpreds)

tf.Tensor(
[[-0.42841133  0.23808081]
 [-0.45384634  0.21313922]
 [-0.39815122  0.22506994]], shape=(3, 2), dtype=float32)


In [5]:
import tensorflow as tf
import numpy as np

# Convert the logits to class probabilities, then pick the most likely class per row
probabilities = tf.nn.softmax(logitpreds).numpy()
predictions = np.argmax(probabilities, axis=1)
print(predictions)

[1 1 1]
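
To make the softmax and argmax step concrete, here is a small pure-Python sketch that reproduces it by hand. The logits are the values printed by the model call, rounded to four decimals; the `softmax` helper is written for illustration and is not the TensorFlow implementation.

```python
import math

# Logits printed by the model call, rounded to 4 decimals
logits = [
    [-0.4284, 0.2381],
    [-0.4538, 0.2131],
    [-0.3982, 0.2251],
]


def softmax(row):
    """Convert one row of logits into probabilities that sum to 1."""
    shift = max(row)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = [softmax(row) for row in logits]
# Index of the largest probability in each row is the predicted class
predictions = [row.index(max(row)) for row in probabilities]

for probs, label in zip(probabilities, predictions):
    print([round(p, 3) for p in probs], label)
```

Because softmax preserves the ordering of the logits, taking the argmax of the raw logits gives the same class as taking the argmax of the probabilities.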


In [6]:
# Map each predicted class index to a human-readable label
predict_score_and_class_dict = {0: 'Negative', 1: 'Positive'}

for pred in predictions:
    print(predict_score_and_class_dict[pred])

Positive
Positive
Positive


In [7]:
def predict_sentiment(text):
    # Process the text using the loaded tokenizer
    tokens = tokenizer(
        [text],
        return_tensors="tf",
        padding=True,
        truncation=True
    )

    # Get the model predictions
    preds = loaded_model(tokens)['logits']
    class_pred = np.argmax(preds, axis=1)[0]

    # Return the predicted sentiment label
    return predict_score_and_class_dict[class_pred]

In [8]:
import pandas as pd
df = pd.read_csv("unlabeled_data_cleaned.csv", encoding='latin-1')
df

Unnamed: 0,text
0,q be ready anons - public awakening coming - q...
1,enough is enough retruth
2,sjustthenewscompolitics-policyall-things-trump...
3,stmerealxreport
4,cecebloomwood
...,...
739728,bob lighthizer did a great job for america sww...
739729,the time to stand up to this growing tyranny i...
739730,swwwmiamiheraldcomnewspolitics-governmentartic...
739731,swwwfoxnewscompoliticstrump-loves-the-idea-of-...


In [9]:
# Make sure every entry is a string (missing values become the string 'nan')
df['text'] = df['text'].astype(str)

Produce a small subset of the data to test the model on

In [12]:
df_sub = df.sample(frac=0.05, random_state=100)

In [13]:
df_sub['result'] = df_sub['text'].apply(predict_sentiment)
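
Calling `predict_sentiment` through `apply` runs the model once per row, which can be slow for tens of thousands of texts. Below is a minimal sketch of a batched variant. The helper name `predict_sentiment_batched`, the `LABELS` constant and the default `batch_size` of 64 are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np

# Same mapping as predict_score_and_class_dict in the notebook
LABELS = {0: "Negative", 1: "Positive"}


def predict_sentiment_batched(texts, tokenizer, model, batch_size=64):
    """Label a list of texts in chunks instead of one model call per text."""
    labels = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        # Tokenize the whole chunk at once, padding to the longest text in it
        tokens = tokenizer(batch, return_tensors="tf", padding=True, truncation=True)
        logits = model(tokens)["logits"]
        # One predicted class index per text in the chunk
        class_ids = np.argmax(np.asarray(logits), axis=1)
        labels.extend(LABELS[int(i)] for i in class_ids)
    return labels
```

With the objects defined earlier, this could be used as `df_sub['result'] = predict_sentiment_batched(df_sub['text'].tolist(), tokenizer, loaded_model)`. A suitable batch size depends on available memory and text length.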

In [14]:
df_sub.to_csv('sentiment_transfer_learning_tensorflow.csv', index=False)

In [15]:
df_sub

Unnamed: 0,text,result
304329,syoutubecomwatchvkjhd_-boandfeatureshare,Positive
170013,its safer for a black woman to get an abortion...,Positive
389105,the radical left democrat prosecutors are ille...,Positive
727306,zzzzz,Positive
687490,true,Positive
...,...,...
522117,if god calls emoji telephone_receiver you to d...,Positive
195823,this is big sconservativebriefcomplea-utm_sour...,Positive
573835,i like trump and nikki haley save desantis for...,Positive
27482,godwins faith truthsocial trump covfefe inflat...,Positive


Inspecting the results

In [18]:
# show the distribution of the predicted sentiment as absolute numbers
df_sub['result'].value_counts()




result
Positive    36950
Negative       37
Name: count, dtype: int64
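
The absolute counts are easier to interpret as shares of the sample. A small worked calculation, using the counts copied from the output above:

```python
# Counts copied from the value_counts() output
counts = {"Positive": 36950, "Negative": 37}

total = sum(counts.values())
# Fraction of the sample assigned to each label
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
```

In pandas, `df_sub['result'].value_counts(normalize=True)` returns the same shares directly. A split this lopsided may be worth a closer look before relying on the labels.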