# Sentiment Analysis

Here we demonstrate sentiment analysis on a group of student feedback comments. This involves classifying the comments as 'positive', 'negative', or 'neutral'. We'll also show an even finer-grained sorting from most positive to most negative, along with some nice color-coded output. 

Behind the scenes, for those interested, we use the model's logprobs for the sentiment output token: what were its first, second, third, etc. choices, and how much probability did it assign to each? This gives a sense of how confident the model was in its prediction and what else it considered. GPT-4 (post-RLHF, which is the version we have access to) is somewhat well calibrated (it knows what it knows, a bit like metacognition for a model), and using the logprobs relies on this calibration.

## Imports and setup

In [40]:
import pandas as pd
import json
from typing import Any
from IPython.display import HTML, display
from functools import partial
from pprint import pprint
from pathlib import Path
from dotenv import load_dotenv, find_dotenv
from survey_analysis.sentiment_analysis import (
    SentimentAnalysis, 
    SentimentAnalysisResult, 
    classify_sentiment,
    sort_by_confidence,
)
from survey_analysis.models_common import CommentModel, LLMConfig
from survey_analysis.single_input_task import apply_task_with_logprobs
from survey_analysis.batch_runner import process_tasks

In [3]:
# this makes it more robust to run async tasks inside an already async environment (jupyter notebooks)
import nest_asyncio
nest_asyncio.apply()

Make sure to either set `OPENAI_API_KEY` as an environment variable or put it in a .env file and use the following cell to load the env var. The format in the .env file is:
```
OPENAI_API_KEY=yourKeyGoesHere
```

In [4]:
load_dotenv(find_dotenv())

True
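If the key is missing, the API calls later in the notebook will fail, so a quick check up front can save some confusion. This is a minimal sketch using only the standard library; `api_key_is_set` is a hypothetical helper for illustration, not part of `survey_analysis` or `dotenv`.

```python
import os
from typing import Mapping


def api_key_is_set(env: Mapping[str, str] = os.environ,
                   name: str = "OPENAI_API_KEY") -> bool:
    """Return True when the variable exists and is not blank."""
    return bool(env.get(name, "").strip())


if not api_key_is_set():
    print("OPENAI_API_KEY is not set; export it or add it to your .env file")
```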

In [5]:
%load_ext autoreload
%autoreload 2

This is a convenience function to make seeing Pandas dataframe values easier, especially when there are long strings like the student comments we will be using.

In [6]:
def full_show(df):
    with pd.option_context('display.max_columns', None, 'display.max_rows', None, 'display.max_colwidth', None):
        display(df)

## Load the example data

In [7]:
data_path = Path('../data/example_data')

Let's load up some fake data. 

All of these comments are synthetic to avoid sharing any sensitive or personally identifiable information, but they should work great for illustration purposes. There are 100 rows, with just a few null/NaN values here and there for realism. In most surveys I've seen, there are quite a number of null/None/blank values, and the functions are written to handle those.
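The library's own null handling is not shown in this notebook, but the general idea can be sketched as follows. `is_blank` is a hypothetical helper (an assumption for illustration) that treats `None`, `NaN`, and whitespace-only strings as missing.

```python
import math
from typing import Any


def is_blank(value: Any) -> bool:
    """True for None, float NaN, or a string with no visible characters."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


raw_comments = ["Great course", None, float("nan"), "   ", "Too fast"]
usable = [c for c in raw_comments if not is_blank(c)]
print(usable)  # ['Great course', 'Too fast']
```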

In [8]:
example_survey = pd.read_csv(data_path / 'example_survey_data_synthetic.csv')
full_show(example_survey.head())

Unnamed: 0,best_parts,enhanced_learning,improve_course
0,I valued the practical clinical aspects related to immune-related disorders and their management.,The illustrative visuals and straightforward explanatory clips.,Consider reducing the duration of certain videos. A few appeared to be slightly prolonged.
1,The flexibility to learn at a self-determined speed,The opportunity to review the lecture content,"The pace of some lectures could be slowed down. At times, it's challenging to follow the lecturer's speech or decipher their handwriting."
2,The educational content was extremely enriching and stimulating! The section on oncology was the highlight.,the self-assessment activities.,Nothing specific comes to mind.
3,Professional growth within the medical sector,"The practical integration workshops were highly beneficial, they significantly contributed to a deeper comprehension of the theories and their implementation in a healthcare environment.",Incorporating a few advanced projects as optional tasks could benefit learners who wish to delve deeper into the subject matter. These projects wouldn't need to influence exam scores.
4,The highlights of the class included the practical demonstration clips that made the complex biological principles more understandable by connecting them to daily well-being and actions. This connection was incredibly beneficial as I navigated the course content.,"The aspect of the course that most facilitated my learning was the regular assessments provided at each segment, which helped confirm my grasp of the material presented. These checkpoints effectively guided me in the correct learning direction. It's evident that considerable effort was invested in designing these educational modules to enable students to gain a deep comprehension rather than just a superficial understanding of the subject matter.","Extend the duration of the concept videos for the more challenging topics, as they require a deeper dive to fully grasp the intricacies involved. Additionally, consider introducing an additional educator to the mix. The dynamic of having multiple voices in another subject area is quite engaging, and it would be beneficial to replicate that experience in this subject to prevent monotony from setting in with just one instructor."


We'll also load up some Coursera comments (sourced from [this Kaggle dataset](https://www.kaggle.com/datasets/imuhammad/course-reviews-on-coursera)). The included example file is just the first 200 rows of the full 1.45 million rows. I didn't include the full set so as not to bloat the size of this repo.

In [62]:
coursera_survey = pd.read_csv(data_path / 'coursera_survey_200rows.csv', nrows=200)
full_show(coursera_survey.head())

Unnamed: 0,reviews,reviewers,date_reviews,rating,course_id
0,"Pretty dry, but I was able to pass with just two complete watches so I'm happy about that. As usual there were some questions on the final exam that were NO WHERE in the course, which is annoying but far better than many microsoft tests I have taken. Never found the suplimental material that the course references... but who cares... i passed!",By Robert S,"Feb 12, 2020",4,google-cbrs-cpi-training
1,would be a better experience if the video and screen shots would sho on the side of the text that the instructor is going thru so that user does not have to go all the way to beginning of text to be able to view any slides instructor is showing.,By Gabriel E R,"Sep 28, 2020",4,google-cbrs-cpi-training
2,Information was perfect! The program itself was a little annoying. I had to wait 30 to 45 minutes after watching the videos to to take the quiz. Other than that the information was perfect and passed the test with no issues!,By Jacob D,"Apr 08, 2020",4,google-cbrs-cpi-training
3,A few grammatical mistakes on test made me do a double take but all in all not bad.,By Dale B,"Feb 24, 2020",4,google-cbrs-cpi-training
4,Excellent course and the training provided was very detailed and easy to follow.,By Sean G,"Jun 18, 2020",4,google-cbrs-cpi-training


## Sentiment analysis

### Running

Let's run sentiment analysis on a single comment to see how it works. Notice that we record the model's reasoning as well to get a sense of how the results were arrived at.

In [139]:
survey_task = SentimentAnalysis(question="What could be improved about the course?")
task_input = CommentModel(comment=example_survey.iloc[0]['improve_course'])
llm_config = LLMConfig(logprobs=True, top_logprobs=3)

sample_sentiment = await apply_task_with_logprobs(task_input=task_input,
                                                  get_prompt=survey_task.prompt_messages,
                                                  result_class=survey_task.result_class,
                                                  llm_config=llm_config)

print(f'Student comment: "{task_input.comment}"')
pprint(json.loads(sample_sentiment.model_dump_json(include={'sentiment', 'reasoning'})))

Student comment: "Consider reducing the duration of certain videos. A few appeared to be slightly prolonged."
{'reasoning': 'The comment suggests a specific improvement (reducing the '
              'duration of certain videos) without expressing dissatisfaction '
              'or frustration. It acknowledges that some videos seemed '
              'slightly prolonged but does not convey strong negative feelings '
              'towards the course content or structure.',
 'sentiment': 'neutral'}


You can get a lot more detail from the result, but it's entirely optional - you can also just get the sentiment as above. Just for curiosity, here's some more of the detail. We can see that the top-ranked choice for the model was 'neutral' but next on its list was 'negative', hence we can do some finer-grained classification to call this comment 'neutral-negative', meaning that it's neutral but potentially negative-leaning.

In [140]:
print('logprobs for sentiment token:')
pprint(sample_sentiment.sentiment_logprobs)
print(f'fine-grained sentiment category: {sample_sentiment.fine_grained_sentiment_category}')

logprobs for sentiment token:
[{'linear_prob': 61.44, 'logprob': -0.4870711, 'token': 'neutral'},
 {'linear_prob': 38.45, 'logprob': -0.9558211, 'token': 'negative'},
 {'linear_prob': 0.11, 'logprob': -6.830821, 'token': 'positive'}]
fine-grained sentiment category: neutral-negative
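The `linear_prob` values above are consistent with exponentiating each logprob and expressing it as a percentage. The sketch below reproduces them and shows one plausible way to derive the fine-grained label. `to_percent` and `fine_grained` are hypothetical helpers, and the labeling rule is an assumption inferred from the output shown here, not the library's actual code.

```python
import math
from typing import Optional


def to_percent(logprob: float) -> float:
    """Convert a natural-log probability to a percentage with two decimals."""
    return round(math.exp(logprob) * 100, 2)


def fine_grained(top: str, runner_up: Optional[str]) -> str:
    """Assumed rule: a neutral top choice leans toward its runner-up sentiment."""
    if top == "neutral" and runner_up in ("positive", "negative"):
        return f"neutral-{runner_up}"
    return top


logprobs = {"neutral": -0.4870711, "negative": -0.9558211, "positive": -6.830821}
percents = {token: to_percent(lp) for token, lp in logprobs.items()}
print(percents)  # {'neutral': 61.44, 'negative': 38.45, 'positive': 0.11}
print(fine_grained("neutral", "negative"))  # neutral-negative
```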


### Sorting/ranking

Next let's run this on a small set of comments and look at sorting/ranking. We'll take 50 comments from the survey question related to 'what were the best parts of the course?'

In [172]:
best_parts_question = "What were the best parts of the course?"
comments = example_survey['best_parts'].tolist()[:50]
sentiment_results = await classify_sentiment(comments=comments, question=best_parts_question)

processing 50 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 12.104737997055054
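The log lines above suggest that inputs are sent concurrently in batches, with a pause between batches (presumably to stay under rate limits). The internals of `process_tasks` are not shown in this notebook, so the following is only a sketch of that general pattern; `run_in_batches` and its parameters are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(items: Sequence[T],
                         worker: Callable[[T], Awaitable[R]],
                         batch_size: int = 100,
                         sleep_seconds: float = 30.0) -> list[R]:
    """Run worker on every item, one batch at a time, preserving input order."""
    results: list[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # All items in a batch run concurrently; gather keeps their order.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
        # Pause only when another batch is still to come.
        if start + batch_size < len(items):
            await asyncio.sleep(sleep_seconds)
    return results
```

In a notebook you would call this with `await run_in_batches(...)`. With 50 inputs and a batch size of 100 there is a single batch, so no sleep happens, which matches the short elapsed time above.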


We'll sort by the finer-grained sentiment. (For those interested, this is the difference in confidence between the top logprob sentiment and the next different sentiment/token, considering the top three only.)
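As a rough illustration, the confidence gap can be computed like this. `confidence_gap` is a hypothetical helper; skipping case and whitespace variants of the top token, and ignoring tokens outside the three sentiment labels, are assumptions inferred from the outputs in this notebook rather than taken from the library's code.

```python
import math
from typing import Optional

SENTIMENTS = {"positive", "neutral", "negative"}


def confidence_gap(top_logprobs: list[tuple[str, float]]) -> tuple[str, Optional[str], float]:
    """Return (top_token, next_token, difference) for (token, logprob) pairs sorted best-first."""
    top_token, top_logprob = top_logprobs[0]
    top_label = top_token.strip().lower()
    for token, logprob in top_logprobs[1:]:
        label = token.strip().lower()
        # Skip variants of the top label (e.g. 'Positive') and non-sentiment tokens.
        if label in SENTIMENTS and label != top_label:
            return top_token, token, top_logprob - logprob
    # No competing sentiment among the top choices: treat the gap as infinite.
    return top_token, None, math.inf


example = [("neutral", -0.4870711), ("negative", -0.9558211), ("positive", -6.830821)]
print(confidence_gap(example))  # gap is about 0.47 between 'neutral' and 'negative'
```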

In [175]:
pairs = sort_by_confidence(comments, sentiment_results)
for comment, result in pairs:
    print(f'Student comment: "{comment}"')
    pprint(json.loads(result.model_dump_json(exclude={'logprobs'})))
    print('\n')

Student comment: "The educational content was extremely enriching and stimulating! The section on oncology was the highlight."
{'classification_confidence': {'difference': 'Infinity',
                               'next_token': None,
                               'top_token': 'positive'},
 'fine_grained_sentiment_category': 'positive',
 'reasoning': 'The comment expresses a high level of satisfaction and '
              'enthusiasm towards the educational content, specifically '
              "mentioning it as 'extremely enriching and stimulating'. The "
              'mention of the oncology section as a highlight further '
              'emphasizes the positive experience the student had with the '
              'course material.',
 'sentiment': 'positive',
 'sentiment_logprobs': [{'linear_prob': 100.0,
                         'logprob': 0.0,
                         'token': 'positive'},
                        {'linear_prob': 0.0,
                         'logprob': -23.25,
    

### Displaying results

Create a display helper function for pretty output, ranked by sentiment from most positive to most negative.

In [176]:
color_map = {'positive': 'MediumSlateBlue', 
            'neutral-positive': 'LightSkyBlue',
            'neutral': 'Snow',
            'neutral-negative': 'LightSalmon',
            'negative': 'Red'}
# this one is a diverging rdbu color map
# color_map = {'positive': '#0571b0', 
#             'neutral-positive': '#92c5de',
#             'neutral': '#f7f7f7',
#             'neutral-negative': '#f4a582',
#             'negative': '#ca0020'}

def colorized_html_output(pairs: list[tuple[str, SentimentAnalysisResult]]) -> HTML:
    html_output = ""
    for comment, result in pairs:
        color = color_map[result.fine_grained_sentiment_category]
        html_output += f'<span style="color:{color}; font-size:1.2em">"{comment}"</span><br>'

    return HTML(html_output)

def reference_colors():
    """This is kind of like a legend for the colorized_html_output"""
    html_output = ""
    for category, color in color_map.items():
        html_output += f'<span style="color:{color}; font-size:1.2em">{category}</span><br>'

    return HTML(html_output)
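Because the comments are interpolated directly into HTML, a comment containing characters such as `<` or `&` could garble the rendered output. One option is to escape the text first with the standard library's `html.escape`. The helper name `colorized_span` is hypothetical and only illustrates the idea.

```python
from html import escape


def colorized_span(comment: str, color: str) -> str:
    """Build one colored line of HTML with the comment text safely escaped."""
    return f'<span style="color:{color}; font-size:1.2em">"{escape(comment)}"</span><br>'


print(colorized_span("Loved the <b>labs</b> & quizzes", "MediumSlateBlue"))
```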

In [177]:
display(reference_colors())

Now let's take a look at the comments in a prettier format sorted from most positive (bluest in this scheme) to most negative (red in this scheme). Given that we are using the 'best parts' responses, one would expect that the vast majority would show up as positive sentiment.

In [179]:
display(colorized_html_output(pairs))

The color-coded version makes it pretty clear that most of the comments came across as positive, given that we were asking 'what were the best parts of the course?'

### Comparing to another survey question

Now let's take a look at the answers to "What could be improved about the course?". We might expect more comments that skew toward negative sentiment. Let's find out.

In [180]:
improve_course_question = 'What could be improved about the course?'
comments_improve = example_survey['improve_course'].tolist()[:50]
sentiment_results_improve = await classify_sentiment(comments=comments_improve, question=improve_course_question)


processing 50 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 13.374605178833008


Rank the comments using the confidence difference scores within each top sentiment.
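The implementation of `sort_by_confidence` is not shown here, so the following is only one plausible way to build such a ranking. `rank_key` is a hypothetical helper, and the exact ordering rule (especially within the neutral group) is an assumption: confident positives come first, neutral comments are ordered by which way and how strongly they lean, and confident negatives come last.

```python
import math
from typing import Optional


def rank_key(sentiment: str, next_token: Optional[str], difference: float) -> tuple[int, float]:
    """Sort key where smaller values sort first, i.e. are more positive."""
    if sentiment == "positive":
        return (0, -difference)   # larger gap = more confidently positive
    if sentiment == "negative":
        return (2, difference)    # larger gap = more confidently negative
    # Neutral: a smaller gap means a stronger lean toward the runner-up.
    lean = 1.0 / (1.0 + difference)
    if next_token == "positive":
        return (1, -lean)
    if next_token == "negative":
        return (1, lean)
    return (1, 0.0)


rows = [
    ("negative", "neutral", 3.0),
    ("neutral", "negative", 0.47),
    ("positive", None, math.inf),
    ("neutral", "positive", 5.03),
    ("positive", "neutral", 2.94),
]
ranked = sorted(rows, key=lambda row: rank_key(*row))
for row in ranked:
    print(row)
```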

In [181]:
pairs_improve = sort_by_confidence(comments_improve, sentiment_results_improve)
for comment, result in pairs_improve:
    print(f'Student comment: "{comment}"')
    pprint(json.loads(result.model_dump_json(exclude={'logprobs'})))
    print('\n')

In [182]:
display(reference_colors())

In [183]:
display(colorized_html_output(pairs_improve))

Sure enough - there are more comments that could be construed as negative, given that's what the "What could be improved about the course?" survey question is prompting for.

## Sentiment analysis for some Coursera comments

These comments are all from one course. For some reason there are a lot of duplicate rows in the dataset, hence the `drop_duplicates`.

In [184]:
coursera_review_question = 'What did you think of the course?'
comments_coursera = coursera_survey.iloc[100:200].drop_duplicates()['reviews'].tolist()
sentiment_results_coursera = await classify_sentiment(comments=comments_coursera, question=coursera_review_question)

processing 54 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 21.18484091758728


In [186]:
pairs_coursera = sort_by_confidence(comments_coursera, sentiment_results_coursera)
for comment, result in pairs_coursera:
    print(f'Student comment: "{comment}"')
    print(f'Sentiment: {result.sentiment}')
    print(f'Fine-grained sentiment: {result.fine_grained_sentiment_category}')
    pprint(json.loads(result.model_dump_json(exclude={'logprobs', 'sentiment', 'fine_grained_sentiment_category'})))
    print('\n')

Student comment: "pretty good"
Sentiment: positive
Fine-grained sentiment: positive
{'classification_confidence': {'difference': 15.265625210368999,
                               'next_token': 'neutral',
                               'top_token': 'positive'},
 'reasoning': "The phrase 'pretty good' indicates a level of satisfaction and "
              'approval, suggesting that the student had a generally positive '
              'experience with the course.',
 'sentiment_logprobs': [{'linear_prob': 100.0,
                         'logprob': -7.89631e-07,
                         'token': 'positive'},
                        {'linear_prob': 0.0,
                         'logprob': -14.546876,
                         'token': 'Positive'},
                        {'linear_prob': 0.0,
                         'logprob': -15.265626,
                         'token': 'neutral'}]}


Student comment: "Good"
Sentiment: positive
Fine-grained sentiment: positive
{'classification_confidence': 

In [187]:
display(reference_colors())

In [188]:
display(colorized_html_output(pairs_coursera))

The sentiment above is pretty negative overall. On a practical level, we can see that the model has classified the comments well.

Let's try it for a different course.

In [189]:
coursera_review_question = 'What did you think of the course?'
comments_coursera2 = coursera_survey.iloc[:86].drop_duplicates()['reviews'].tolist()
sentiment_results_coursera2 = await classify_sentiment(comments=comments_coursera2, question=coursera_review_question)

processing 33 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 12.047163009643555


In [190]:
display(reference_colors())

In [191]:
pairs_coursera2 = sort_by_confidence(comments_coursera2, sentiment_results_coursera2)
display(colorized_html_output(pairs_coursera2))

The comments' sentiment for this course seems much more positive.

We can also turn this into a dataframe for downloading or easier viewing. This is in the same order as the original dataframe, but you can use the sorted pairs and turn those into a dataframe if you want to keep the sort order shown above.

In [195]:
# here we omit the full logprobs field, as it is quite large (the top sentiment_logprobs are kept)
comments_df = pd.DataFrame({'comment': comments_coursera2})
values_df = pd.json_normalize([result.model_dump(exclude={'logprobs'}) for result in sentiment_results_coursera2])
results_df = pd.concat([comments_df, values_df], axis=1)

full_show(results_df.head())

Unnamed: 0,comment,reasoning,sentiment,sentiment_logprobs,fine_grained_sentiment_category,classification_confidence.top_token,classification_confidence.next_token,classification_confidence.difference
0,"Pretty dry, but I was able to pass with just two complete watches so I'm happy about that. As usual there were some questions on the final exam that were NO WHERE in the course, which is annoying but far better than many microsoft tests I have taken. Never found the suplimental material that the course references... but who cares... i passed!","The comment contains both positive and negative elements, leading to a neutral overall sentiment. The student expresses satisfaction with being able to pass the course after watching the material only twice ('I'm happy about that'), which is a positive aspect. However, they also mention several negative points: the course content is described as 'Pretty dry', there are complaints about questions on the final exam not being covered in the course material ('NO WHERE in the course'), and an inability to find supplemental material referenced by the course ('Never found the suplimental material'). The comparison to Microsoft tests ('far better than many microsoft tests I have taken') suggests some level of satisfaction despite the issues mentioned, further supporting a neutral sentiment.",neutral,"[{'token': 'neutral', 'logprob': -0.008209572, 'linear_prob': 99.18}, {'token': 'positive', 'logprob': -5.0394597, 'linear_prob': 0.65}, {'token': 'mixed', 'logprob': -6.9144597, 'linear_prob': 0.1}]",neutral-positive,neutral,positive,5.03125
1,would be a better experience if the video and screen shots would sho on the side of the text that the instructor is going thru so that user does not have to go all the way to beginning of text to be able to view any slides instructor is showing.,"The comment suggests an improvement to the course layout, indicating that the current setup, where videos and screenshots are not easily accessible alongside the text, is inconvenient. The student is providing constructive feedback on how the course could enhance its user experience, rather than expressing satisfaction or dissatisfaction with the course content or instructor. This indicates a neutral sentiment as it focuses on a specific aspect of course design rather than the overall quality or enjoyment of the course.",neutral,"[{'token': 'neutral', 'logprob': 0.0, 'linear_prob': 100.0}, {'token': 'Neutral', 'logprob': -16.75, 'linear_prob': 0.0}, {'token': ' neutral', 'logprob': -20.28125, 'linear_prob': 0.0}]",neutral,neutral,,inf
2,Information was perfect! The program itself was a little annoying. I had to wait 30 to 45 minutes after watching the videos to to take the quiz. Other than that the information was perfect and passed the test with no issues!,"The comment starts and ends on a positive note, emphasizing that the information provided was perfect and that the student passed the test with no issues. Although there is a mention of a negative aspect regarding the program being 'a little annoying' due to waiting times between videos and quizzes, the overall sentiment leans towards positive due to the emphasis on the quality of information and the successful outcome.",positive,"[{'token': 'positive', 'logprob': -0.051660582, 'linear_prob': 94.97}, {'token': 'neutral', 'logprob': -2.9891605, 'linear_prob': 5.03}, {'token': 'negative', 'logprob': -11.645411, 'linear_prob': 0.0}]",positive,positive,neutral,2.9375
3,A few grammatical mistakes on test made me do a double take but all in all not bad.,"The comment suggests a mixed feeling towards the course. The mention of 'a few grammatical mistakes on test' indicates a negative aspect, as it caused confusion or required extra effort ('made me do a double take'). However, the concluding phrase 'but all in all not bad' implies a general satisfaction or acceptance of the course despite the mentioned flaws. The overall sentiment is neither strongly positive nor strongly negative, hence classified as neutral.",neutral,"[{'token': 'neutral', 'logprob': -0.0015089125, 'linear_prob': 99.85}, {'token': 'positive', 'logprob': -6.5015087, 'linear_prob': 0.15}, {'token': 'negative', 'logprob': -12.126509, 'linear_prob': 0.0}]",neutral-positive,neutral,positive,6.5
4,Excellent course and the training provided was very detailed and easy to follow.,"The comment uses words like 'excellent' and 'easy to follow' to describe the course and training, indicating a high level of satisfaction and a positive experience.",positive,"[{'token': 'positive', 'logprob': -1.9361265e-07, 'linear_prob': 100.0}, {'token': 'Positive', 'logprob': -16.28125, 'linear_prob': 0.0}, {'token': 'posit', 'logprob': -20.125, 'linear_prob': 0.0}]",positive,positive,,inf
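To keep the sort order instead, the sorted pairs can be flattened into records first. The sketch below uses plain dictionaries in place of `SentimentAnalysisResult` objects (an assumption made so the example is self-contained; with the real objects you would call `result.model_dump(...)` as above), and `pairs_to_records` is a hypothetical helper.

```python
from typing import Any


def pairs_to_records(pairs: list[tuple[str, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Flatten (comment, result_dict) pairs into one record per comment, keeping order."""
    records = []
    for rank, (comment, result) in enumerate(pairs, start=1):
        records.append({"rank": rank, "comment": comment, **result})
    return records


sorted_pairs = [
    ("Excellent course", {"sentiment": "positive", "fine_grained_sentiment_category": "positive"}),
    ("All in all not bad", {"sentiment": "neutral", "fine_grained_sentiment_category": "neutral-positive"}),
]
records = pairs_to_records(sorted_pairs)
print(records[0]["rank"], records[0]["comment"])  # 1 Excellent course
```

Passing `records` to `pd.DataFrame(records)` then gives a dataframe whose row order matches the ranking.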
