## UW Pharmacy Student Self-Care Analysis (winter 5_6)

This notebook is a draft using the "winter5_6" page in the student comment data of the 2022-2023 SY in the UW SoP. The analysis roughly looks at the determinants of the categories of self-care students chose for this quarter using VADER and NLTK sentiment analysis and Pandas table manipulation.

Note: this notebook relies on uploading a single csv sheet for one quarter, along with adding two additional columns called "Category 1" and "Category 2" which categorize each comment based on the 8 facets of self-care.

### Setup

In [1]:
#import python libraries
import numpy as np
import pandas as pd

!pip install vaderSentiment
!pip install --upgrade pip
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from textblob import TextBlob
import nltk
nltk.download('brown')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

import spacy

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
  from .autonotebook import tqdm as notebook_tqdm
2023-06-01 04:51:25.186571: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 04:51:25.499456: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object 

In [2]:
# Can change the file csv right here
file_csv = "winter5_6.csv"

data = pd.read_csv(file_csv)

first_column = data.columns[1]
data = data[[first_column, "Category 1", "Category 2"]]

In [3]:
all_dups = pd.DataFrame()
for i in range(len(data)):
    if not pd.isnull(data["Category 2"][i]):
        temp = data[[first_column, "Category 2"]].iloc[i]
        all_dups = all_dups.append(temp)
    else:
        continue
all_dups = all_dups[[first_column, "Category 2"]]
all_dups = all_dups.rename(columns={"Category 2": "Category"})

### Working data

In [4]:
data = data.rename(columns={"Category 1": "Category"})
data = data[[first_column, "Category"]]
data = data.append(all_dups)
data

Unnamed: 0,Win 5_6,Category
0,Took a nap before work since my team this week...,Physical
1,One of the activities that I chose for self-ca...,Community
2,I listened to some of my favorite artists whil...,Emotional
3,I worked on my Paint by Diamond while listenin...,Mental
4,I soaked in the bathtub for an hour with relax...,Physical
...,...,...
94,I went to the gym and had dinner with my frien...,Community
95,This week I spent 20 minutes grocery shopping ...,Environmental
96,This week I wanted to note how I went to the I...,Community
98,I went to a walk around the neighborhood with ...,Community


### Category frequencies within student comments

The comments data is displayed here in terms of each of the 8 self-care categories (physical, mental, community, emotional, environmental, spiritual, and occupational) through a pie chart. 

In [5]:
import plotly.express as px

dfg = data.groupby("Category").count().sort_values(by=first_column, ascending=False)
print("All reflections processed (includes duplicate categories): ", np.sum(dfg[first_column]))

dfg_pie = px.pie(dfg.reset_index(), values='Win 5_6', names='Category', title='Category Frequencies')
dfg_pie

All reflections processed (includes duplicate categories):  161


### Sentiment analysis (positivity polarity score for each comment)

In this section, I scored each of the comments using the VADER sentiment analysis function. The function sentiment_scores(sentence) will return the positivity polarity score for each comment probabilistically between -1 and 1, with -1 being the most negative and 1 being the most positive. Note that this model is untrained and will be a very rough interpretation of the comments, i.e. not trained on what self-care is as a whole but on the general tone of each comment.

In [6]:
# function to print sentiments
# of the sentence (vader)

#taken from geekforgeeks
def sentiment_scores(sentence):
 
    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()
 
    # polarity_scores method of SentimentIntensityAnalyzer
    # object gives a sentiment dictionary.
    # which contains pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    return sentiment_dict['compound']

In [27]:
data[first_column]

0      Took a nap before work since my team this week...
1      One of the activities that I chose for self-ca...
2      I listened to some of my favorite artists whil...
3      I worked on my Paint by Diamond while listenin...
4      I soaked in the bathtub for an hour with relax...
                             ...                        
94     I went to the gym and had dinner with my frien...
95     This week I spent 20 minutes grocery shopping ...
96     This week I wanted to note how I went to the I...
98     I went to a walk around the neighborhood with ...
101    This week I called a friend whim we haven’t ta...
Name: Win 5_6, Length: 161, dtype: object

In [28]:
scores = pd.DataFrame(columns={"Win 1_2", "Score"})
for i in data[first_column]:
    score = sentiment_scores(i)
    scores = scores.append({first_column: i, "Score": score}, ignore_index=True)

In [8]:
scores_categories = data.merge(scores, on=first_column, how="right")
scores_categories

Unnamed: 0,Win 5_6,Category,Score,Win 1_2
0,Took a nap before work since my team this week...,Physical,0.0000,
1,One of the activities that I chose for self-ca...,Community,0.1531,
2,I listened to some of my favorite artists whil...,Emotional,0.4588,
3,I listened to some of my favorite artists whil...,Physical,0.4588,
4,I worked on my Paint by Diamond while listenin...,Mental,0.3400,
...,...,...,...,...
262,This week I wanted to note how I went to the I...,Community,0.5256,
263,I went to a walk around the neighborhood with ...,Environmental,0.0000,
264,I went to a walk around the neighborhood with ...,Community,0.0000,
265,This week I called a friend whim we haven’t ta...,Community,0.4939,


In [9]:
scores_categories['Category'].unique()

array(['Physical', 'Community', 'Emotional', 'Mental', 'Occupational',
       'Environmental', 'Spiritual'], dtype=object)

In [10]:
fig = px.scatter(scores_categories, y="Score", x="Category", title="Polarity Scores by Category")
mean_scores = scores_categories.groupby('Category').mean()
for c in scores_categories['Category'].unique():
    fig.add_scatter(x=[c],
                    y=[mean_scores.loc[c]['Score']],
                    marker=dict(
                        color='red',
                        size=10
                    ),
                name=f'{c} mean')

fig.show()

In [11]:
px.bar(mean_scores.reset_index().sort_values(by="Score", ascending=False), x="Category", y="Score", title="Average Scores per Category")

### Scores of each category 

In [12]:
print('Physical category table')
scores_categories_physical = scores_categories[scores_categories["Category"] == "Physical"]
scores_categories_physical

Physical category table


Unnamed: 0,Win 5_6,Category,Score,Win 1_2
0,Took a nap before work since my team this week...,Physical,0.0000,
3,I listened to some of my favorite artists whil...,Physical,0.4588,
6,I soaked in the bathtub for an hour with relax...,Physical,0.7579,
13,"Since the weather has been pretty nice, I got ...",Physical,0.7424,
14,I have been biking 4 miles to school and back ...,Physical,0.9004,
...,...,...,...,...
253,I was able to run at my high school's football...,Physical,0.4215,
255,Go to the gym.,Physical,0.0000,
257,I went to the gym and had dinner with my frien...,Physical,0.4767,
259,This week I spent 20 minutes grocery shopping ...,Physical,0.7003,


In [13]:
print('Mental category table')
scores_categories_mental = scores_categories[scores_categories["Category"] == "Mental"]
scores_categories_mental

Mental category table


Unnamed: 0,Win 5_6,Category,Score,Win 1_2
4,I worked on my Paint by Diamond while listenin...,Mental,0.34,
10,I had improv rehearsal and it was the first ti...,Mental,0.0,
12,I relaxed and watched some Netflix,Mental,0.4939,
16,The other day I was stressed and tired and fel...,Mental,0.9032,
25,I bought more plants!!!,Mental,0.0,
34,tried a new game in virtual reality,Mental,0.0,
40,"For my self care this week, it took more than ...",Mental,0.975,
46,I like to spend few minutes calming myself dow...,Mental,0.9789,
53,I spent last weekend watching Jurassic Park wi...,Mental,0.0,
60,I went on a walk and listened to a podcast!,Mental,0.0,


In [14]:
print('Community category table')
scores_categories_community = scores_categories[scores_categories["Category"] == "Community"]
scores_categories_community

Community category table


Unnamed: 0,Win 5_6,Category,Score,Win 1_2
1,One of the activities that I chose for self-ca...,Community,0.1531,
9,Social connectedness:I felted like I needed a ...,Community,0.8807,
11,I had improv rehearsal and it was the first ti...,Community,0.0,
23,I went out to eat at a buffet with a friend to...,Community,0.5411,
30,I enjoyed a coffee with my friend on a sunny d...,Community,0.9186,
38,Aside the 30-45 minutes' walk I do 3-4 times i...,Community,0.4939,
43,I grabbed coffee and had a chat with my mentor,Community,0.0,
45,I reached out to my dad and chatted for a while,Community,0.1027,
48,I went to my friends house for dinner and spen...,Community,0.7579,
50,I have met my best friend's mom for the first ...,Community,0.8225,


In [15]:
print('Emotional category table')
scores_categories_emotional = scores_categories[scores_categories["Category"] == "Emotional"]
scores_categories_emotional

Emotional category table


Unnamed: 0,Win 5_6,Category,Score,Win 1_2
2,I listened to some of my favorite artists whil...,Emotional,0.4588,
5,I worked on my Paint by Diamond while listenin...,Emotional,0.34,
7,I soaked in the bathtub for an hour with relax...,Emotional,0.7579,
26,Listened to relaxing environmental/raining sou...,Emotional,0.4939,
29,I spent 20 minutes on dancing with my favorite...,Emotional,0.4588,
55,I went for a drive and listened to relaxing mu...,Emotional,0.4939,
65,I decided to listen to some new music. Persona...,Emotional,0.8126,
93,Today I chose to rest and relax by discovering...,Emotional,0.4926,
99,I listened to music and cleaned,Emotional,0.0,
114,This week I needed a lot of grace so for my me...,Emotional,0.128,


In [16]:
print('Spiritual category table')
scores_categories_spiritual = scores_categories[scores_categories["Category"] == "Spiritual"]
scores_categories_spiritual

Spiritual category table


Unnamed: 0,Win 5_6,Category,Score,Win 1_2
37,Aside the 30-45 minutes' walk I do 3-4 times i...,Spiritual,0.4939,
47,I like to spend few minutes calming myself dow...,Spiritual,0.9789,
68,guided meditation for 20 minutes! felt recente...,Spiritual,0.0,
79,I took a walk at Green Lake Park last Sunday m...,Spiritual,0.5574,
117,I practice yoga and did some other exercises.,Spiritual,0.0,
154,This week I called a friend whim we haven’t ta...,Spiritual,0.4939,
185,Aside the 30-45 minutes' walk I do 3-4 times i...,Spiritual,0.4939,
194,I like to spend few minutes calming myself dow...,Spiritual,0.9789,
220,I took a walk at Green Lake Park last Sunday m...,Spiritual,0.5574,
242,I practice yoga and did some other exercises.,Spiritual,0.0,


In [17]:
print('Occupational category table')
scores_categories_occupational = scores_categories[scores_categories["Category"] == "Occupational"]
scores_categories_occupational

Occupational category table


Unnamed: 0,Win 5_6,Category,Score,Win 1_2
8,I Focused on Resilience and growth mindset. I ...,Occupational,0.6326,
72,Nurture connection - I spent time and ate out ...,Occupational,0.6705,
121,In the spirit of being transparent I have not ...,Occupational,0.9077,
214,Nurture connection - I spent time and ate out ...,Occupational,0.6705,
244,In the spirit of being transparent I have not ...,Occupational,0.9077,


### Frequency of nouns and verbs 

In [18]:
!pip install spacy -q
!python -m spacy download en_core_web_sm -q

from collections import Counter
import en_core_web_sm

2023-06-01 04:51:47.128984: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 04:51:47.386979: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-06-01 04:51:47.387017: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-06-01 04:51:47.424820: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-01 04:51:48.954315: W tensorflow/stream_executor/platform/de

Note: Just for the sake of experimentation, I only did this part on comments categorized as "physical." Can easily broaden it to more categories if useful.

This section takes the top 100 ('top' meaning most frequent) words from a particular category along with its count within the comments. Again, note that there may be duplicates depending on if the comment was repeated if it was categorized twice. I then created a function to extract nouns and verbs within that category. This can potentially be helpful if we were to see which action within each category students tended to choose.

In [19]:
def fig_frequencies(tbl, keyword):
    counter = Counter(" ".join(tbl[first_column]).split()).most_common(100)
    array = []
    for c in counter:
        array = np.append(array, c[0])

    sentence = ','.join(array)

    def extract_nouns_verbs(sentence):
        text = nltk.word_tokenize(sentence)
        pos_tagged = nltk.pos_tag(text)
        nouns_verbs = filter(lambda x:x[1]=='NN' or x[1] == 'VB',pos_tagged)
        return list(nouns_verbs)

    nouns_verbs = pd.DataFrame(extract_nouns_verbs(sentence), columns=['Word', 'Word type'])
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    noun_verb_frequency = nouns_verbs.merge(counted_words, on='Word', how='left').sort_values(by='Word frequency', ascending=False)

    nlp = en_core_web_sm.load()
    def all_nouns(sentence):
        doc = nlp(sentence)
        nouns = [(token.lemma_, "NN") for token in doc if token.pos_ == "NOUN"]
        return nouns

    def all_verbs(sentence):
        doc = nlp(sentence)
        verbs = [(token.lemma_, "VB") for token in doc if token.pos_ == "VERB"]
        return verbs

    nouns_verbs_spacy = all_nouns(sentence) + all_verbs(sentence)

    nouns_verbs_sp = pd.DataFrame(nouns_verbs_spacy, columns=['Word', 'Word type'])
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    noun_verb_frequency_sp = nouns_verbs_sp.merge(counted_words, on='Word', how='inner').sort_values(by='Word frequency', ascending=False)
    noun_verb_frequency_sp = noun_verb_frequency_sp.drop_duplicates(subset='Word', keep="last")

    fig = px.bar(noun_verb_frequency_sp, x='Word', y='Word frequency', color='Word type', title='Noun/verb frequencies for Category: ' + keyword)
    fig.update_layout(xaxis_categoryorder = 'total descending')

    fig.show()

In [20]:
fig_frequencies(scores_categories_physical, 'Physical');


[W123] Argument disable with value [] is used instead of ['senter'] as specified in the config. Be aware that this might affect other components in your pipeline.



In [21]:
fig_frequencies(scores_categories_mental, 'Mental');

In [22]:
fig_frequencies(scores_categories_emotional, 'Emotional');

In [23]:
fig_frequencies(scores_categories_community, 'Community');

In [24]:
fig_frequencies(scores_categories_spiritual, 'Spiritual');

In [25]:
fig_frequencies(scores_categories_occupational, 'Occupational');

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=443b2d8b-ed93-43a6-bbe3-c3fe69ac80e1' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>