# Import Comment Data

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
import pandas as pd
import numpy as np

df = pd.read_csv('sentiment_data.tsv', sep='\t')
df

Unnamed: 0,comment
0,I’m interested in doing tech courses
1,"Not yet, there are many things I need to under..."
2,@berryberrystrawberry2428 Video explanations...
3,@gradehacker I'm conversational by now. I us...
4,Was looking for data analyst related boot camp...
...,...
582,Hey people i want to ask you that what if i co...
583,should i take the third course if i want to be...
584,Is this course out dated now?
585,I just need the best course to help become a m...


# Sentiment Analysis Using RoBERTa

## Import sentiment analyzer

The model used is Facebook's RoBERTa large model that was trained on Multi-Genre Natural Language Inference (MNLI).

In [5]:
from transformers import pipeline

model = pipeline('zero-shot-classification', model='roberta-large-mnli')

RuntimeError: Failed to import transformers.models.roberta.modeling_tf_roberta because of the following error (look up to see its traceback):
Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with `pip install tf-keras`.

## Define sentiment labels

In [None]:
labels = ['positive', 'negative', 'neutral']

## Classify all sentiments

### Create classifier function

In [None]:
def classify_comment(comment):
    classified = model(comment, labels)
    print(classified)
    return pd.Series({
        'sentiment': classified['labels'][0],
        'confidence interval': classified['scores'][0]
    })

### Classify sentiments

In [None]:
df[['sentiment', 'confidence interval']] = df['comment'].apply(classify_comment)

{'sequence': 'I’m interested in doing tech courses', 'labels': ['positive', 'neutral', 'negative'], 'scores': [0.6811542510986328, 0.24752849340438843, 0.07131725549697876]}
{'sequence': 'Not yet, there are many things I need to understand about this platform? How can we communicate', 'labels': ['negative', 'neutral', 'positive'], 'scores': [0.6512821912765503, 0.2479037046432495, 0.10081414133310318]}
{'sequence': ' @berryberrystrawberry2428  Video explanations but what I have used is very interractive and at the end of each video you can download worksheet with all the words just learned in the video. I have used yonsei university course to learn korean. The courses vary with different topics though.', 'labels': ['positive', 'neutral', 'negative'], 'scores': [0.6697995066642761, 0.2584182620048523, 0.07178216427564621]}
{'sequence': " @gradehacker  I'm conversational by now. I use mix of free resourses including the coursera courses.", 'labels': ['positive', 'neutral', 'negative'], '

## See the final result

In [None]:
df

Unnamed: 0,comment,sentiment,confidence interval
0,I’m interested in doing tech courses,positive,0.681154
1,"Not yet, there are many things I need to under...",negative,0.651282
2,@berryberrystrawberry2428 Video explanations...,positive,0.669800
3,@gradehacker I'm conversational by now. I us...,positive,0.517534
4,Was looking for data analyst related boot camp...,negative,0.884179
...,...,...,...
582,Hey people i want to ask you that what if i co...,neutral,0.581401
583,should i take the third course if i want to be...,neutral,0.377692
584,Is this course out dated now?,negative,0.674804
585,I just need the best course to help become a m...,positive,0.566242


# Get Conclusion

To get the conclusion of the topic, we'll going to get the mode (the value that appears the most). But before that, we're going to remove the sentiment data that are lower than 0.5 of confidence interval. The reason is we want to remove uncertainty from the data.

## Drop data that has confidence interval lower than 0.5

We decided to drop these data points because they indicate a low confidence level in the model's predictions (less than 0.5). Removing them ensures the dataset is cleaner and more reliable for sentiment analysis.

In [None]:
#df = df.drop(df[df['confidence interval'] < 0.5].index)

## Re-check the data

In [None]:
df

Unnamed: 0,comment,sentiment,confidence interval
0,I’m interested in doing tech courses,positive,0.681154
1,"Not yet, there are many things I need to under...",negative,0.651282
2,@berryberrystrawberry2428 Video explanations...,positive,0.669800
3,@gradehacker I'm conversational by now. I us...,positive,0.517534
4,Was looking for data analyst related boot camp...,negative,0.884179
...,...,...,...
578,Hey man where I can learn python intermediate ...,positive,0.634251
582,Hey people i want to ask you that what if i co...,neutral,0.581401
584,Is this course out dated now?,negative,0.674804
585,I just need the best course to help become a m...,positive,0.566242


## Get mode from the column sentiment

In [None]:
sentiment_table = pd.DataFrame(np.array(np.unique(df['sentiment'], return_counts = True)).transpose(), columns=['labels', 'counts'])
sentiment_table = sentiment_table.sort_values('counts', ascending = False).reset_index(drop = True)

Based on the table above, 421 people agree that Coursera is satisfying, 116 people feel neutral about Coursera, and 50 people agree that Coursera is not satisfying, according to the VADER algorithm, out of 587 people.