In [5]:
import torch
import pandas as pd
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

In [6]:
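# NOTE: "distilbert-base-uncased" is the base pretrained checkpoint. Its classification head
# is newly initialized (see the warning below), so its outputs are not meaningful sentiment
# scores until the model is fine-tuned or a fine-tuned sentiment checkpoint is loaded instead.
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)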

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'pre_classifier.bias', 'classifier.

In [7]:
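def get_sentiment_scores(comments):
    """Return the softmax probability of class index 1 for each comment."""
    # Pad to the longest comment and truncate to the model's maximum input length.
    encoded_comments = tokenizer(comments, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**encoded_comments).logits
    probabilities = torch.softmax(logits, dim=1)
    # Index 1 is treated as "positive"; this holds only if the model was trained with that label order.
    positive_scores = probabilities[:, 1].tolist()
    return positive_scores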

In [11]:
# Load the tweets and collect the raw text of the 'Tweet' column as a list for the scorer.
data = pd.read_csv('APPL_tweets.csv')
comments = data['Tweet'].tolist()

In [12]:
sentiment_scores = get_sentiment_scores(comments)
data['sentiment_score'] = sentiment_scores
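The cell above sends every tweet through the tokenizer and model in a single call, which can exhaust memory on a large CSV. A minimal sketch of scoring in fixed-size batches follows. The helper `score_in_batches`, its `batch_size` parameter, and the stand-in `dummy_score_fn` are hypothetical names introduced here for illustration; in the notebook, `get_sentiment_scores` would be passed as `score_fn`.

```python
def score_in_batches(comments, score_fn, batch_size=32):
    """Apply score_fn to consecutive slices of comments and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores = []
    for start in range(0, len(comments), batch_size):
        batch = comments[start:start + batch_size]
        scores.extend(score_fn(batch))  # one model call per batch
    return scores


# Stand-in scorer so this sketch runs on its own (hypothetical, not a real model).
def dummy_score_fn(batch):
    return [0.5 for _ in batch]


demo_scores = score_in_batches(["a", "b", "c", "d", "e"], dummy_score_fn, batch_size=2)
```

Assuming the notebook's function is used, the call would look like `score_in_batches(comments, get_sentiment_scores)`. Because padding is applied per batch, each batch is padded only to its own longest comment.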

In [13]:
data.head()

Unnamed: 0.1,Unnamed: 0,Time,User,Tweet,sentiment_score
0,0,2023-07-27 09:51:51+00:00,_li_glass_,@appl_esir 画这么牛逼……,0.52826
1,1,2023-07-27 09:46:24+00:00,kz_train,@appl_morning1 お疲れ様でした😅\n自分も若干疲れが出ました…( -.-) =...,0.522734
2,2,2023-07-27 09:45:30+00:00,youyou_appl,ガヴィルさんも文面のシンプルさ以上に強くなりそうで嬉しいな,0.533037
3,3,2023-07-27 09:44:04+00:00,youyou_appl,アビサルまだ上があるのか……\nまだ海イベあるだろうし楽しみになってきたな,0.531597
4,4,2023-07-27 09:24:48+00:00,APPL_RNP,「こんな時でもお腹は空くんだなぁ(夜飯作ってる)」,0.519389
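
The scores in the output above all sit between roughly 0.52 and 0.53. One plausible reading, given the warning that the classifier weights are newly initialized, is that the head emits two nearly equal logits for every input, and softmax maps nearly equal logits to probabilities near 0.5. The sketch below illustrates that effect in plain Python; the logit values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, not taken from the model in this notebook.
untrained_like = [-0.05, 0.06]  # small, nearly equal logits
confident = [-2.0, 3.0]         # well-separated logits

p_untrained = softmax(untrained_like)[1]  # probability of class index 1, close to 0.5
p_confident = softmax(confident)[1]       # probability of class index 1, close to 1.0
print(round(p_untrained, 3), round(p_confident, 3))
```

If this reading is right, the scores carry no sentiment signal, and a checkpoint fine-tuned for sentiment on text in the tweets' languages would be needed before the column can be interpreted.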


In [14]:
# Flag tweets whose class-1 probability exceeds the threshold.
threshold = 0.5
positive_sentiment = data['sentiment_score'] > threshold  # boolean Series, one entry per row
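
The comparison above yields a boolean Series rather than a filtered table. A short sketch of how such a mask is typically used follows; the `demo` DataFrame and its values are hypothetical, made up for illustration rather than taken from the tweet data.

```python
import pandas as pd

# Hypothetical scores for illustration only.
demo = pd.DataFrame({
    "Tweet": ["t1", "t2", "t3", "t4"],
    "sentiment_score": [0.91, 0.48, 0.50, 0.73],
})

threshold = 0.5
positive_sentiment = demo["sentiment_score"] > threshold  # boolean Series

positive_rows = demo[positive_sentiment]            # keep rows where the mask is True
positive_count = int(positive_sentiment.sum())      # True counts as 1
positive_share = float(positive_sentiment.mean())   # fraction of rows above the threshold
```

Note that the comparison is strict, so a score of exactly 0.5 is not flagged as positive.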