## UW Pharmacy Student Self-Care Analysis (spring 5_6)

This notebook is a draft using the "spring 5_6" page in the student comment data of the 2022-2023 SY in the UW SoP. The analysis roughly looks at the determinants of the categories of self-care students chose for this quarter using VADER and NLTK sentiment analysis and Pandas table manipulation.

Note: this notebook relies on uploading a single csv sheet for one quarter, along with adding two additional columns called "Category 1" and "Category 2" which categorize each comment based on the 8 facets of self-care.

### Setup

In [1]:
#import python libraries
import numpy as np
import pandas as pd

!pip install vaderSentiment
!pip install --upgrade pip
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from textblob import TextBlob
import nltk
nltk.download('brown')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

import spacy

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m126.0/126.0 KB[0m [31m11.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.[0m[33m
Collecting pip
  Downloading pip-23.1.2-py3-none-any.whl (2.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m82.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.0.4
    Uninstalling pip-22.0.4:
      Successfully uninstalled pip-22.0.4
Successfully installed pip-23.1.2
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping

In [2]:
# Can change the file csv right here
file_csv = "spring5_6.csv"

data = pd.read_csv(file_csv)

first_column = data.columns[1]
data = data[[first_column, "Category 1", "Category 2"]]

In [3]:
all_dups = pd.DataFrame()
for i in range(len(data)):
    if not pd.isnull(data["Category 2"][i]):
        temp = data[[first_column, "Category 2"]].iloc[i]
        all_dups = all_dups.append(temp)
    else:
        continue
all_dups = all_dups[[first_column, "Category 2"]]
all_dups = all_dups.rename(columns={"Category 2": "Category"})

### Working data

In [4]:
data = data.rename(columns={"Category 1": "Category"})
data = data[[first_column, "Category"]]
data = data.append(all_dups)
data

Unnamed: 0,Spr 5_6,Category
0,Gave myself time to lie down on the couch and ...,Mental
1,I went on a run!,Physical
2,This week I spent at least 20 minutes reading ...,Mental
3,"I took a walk to my neighborhood park , sat on...",Physical
4,I went to watch movie at ipic theatre in redmo...,Community
...,...,...
97,I made myself lunch and read a book,Mental
98,I went to the Swanson Nursery with my mom and ...,Community
99,I used the remaining class time to practice pa...,Mental
103,I did some laundry and tidied up my house!,Mental


### Category frequencies within student comments

The comments data is displayed here in terms of each of the 8 self-care categories (physical, mental, community, emotional, environmental, spiritual, and occupational) through a pie chart. 

In [5]:
import plotly.express as px

dfg = data.groupby("Category").count().sort_values(by=first_column, ascending=False)
print("All reflections processed (includes duplicate categories): ", np.sum(dfg[first_column]))

dfg_pie = px.pie(dfg.reset_index(), values='Spr 5_6', names='Category', title='Category Frequencies')
dfg_pie

All reflections processed (includes duplicate categories):  167


### Sentiment analysis (positivity polarity score for each comment)

In this section, I scored each of the comments using the VADER sentiment analysis function. The function sentiment_scores(sentence) will return the positivity polarity score for each comment probabilistically between -1 and 1, with -1 being the most negative and 1 being the most positive. Note that this model is untrained and will be a very rough interpretation of the comments, i.e. not trained on what self-care is as a whole but on the general tone of each comment.

In [6]:
# function to print sentiments
# of the sentence (vader)

#taken from geekforgeeks
def sentiment_scores(sentence):
 
    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()
 
    # polarity_scores method of SentimentIntensityAnalyzer
    # object gives a sentiment dictionary.
    # which contains pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    return sentiment_dict['compound']

In [7]:
scores = pd.DataFrame(columns={"Win 1_2", "Score"})
for i in data[first_column]:
    sentence = str(i)
    score = sentiment_scores(sentence)
    scores = scores.append({first_column: sentence, "Score": score}, ignore_index=True)

In [8]:
scores_categories = data.merge(scores, on=first_column, how="right")
scores_categories

Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
0,Gave myself time to lie down on the couch and ...,Mental,0.5255,
1,I went on a run!,Physical,0.0000,
2,This week I spent at least 20 minutes reading ...,Mental,0.0000,
3,"I took a walk to my neighborhood park , sat on...",Physical,0.0000,
4,"I took a walk to my neighborhood park , sat on...",Environmental,0.0000,
...,...,...,...,...
289,I used the remaining class time to practice pa...,Mental,0.0000,
290,I did some laundry and tidied up my house!,Environmental,0.0000,
291,I did some laundry and tidied up my house!,Mental,0.0000,
292,This week for health and wellness I set-up a c...,Mental,0.3818,


In [9]:
fig = px.scatter(scores_categories, y="Score", x="Category", title="Polarity Scores by Category")
mean_scores = scores_categories.groupby('Category').mean()
for c in mean_scores.index:
    fig.add_scatter(x=[c],
                    y=[mean_scores.loc[c]['Score']],
                    marker=dict(
                        color='red',
                        size=10
                    ),
                name=f'{c} mean')

fig.show()

In [10]:
px.bar(mean_scores.reset_index().sort_values(by="Score", ascending=False), x="Category", y="Score", title="Average Scores per Category")

### Scores of each category 

In [11]:
print('Physical category table')
scores_categories_physical = scores_categories[scores_categories["Category"] == "Physical"]
scores_categories_physical

Physical category table


Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
1,I went on a run!,Physical,0.0000,
3,"I took a walk to my neighborhood park , sat on...",Physical,0.0000,
8,I sat on my balcony and just enjoyed my coffee...,Physical,0.7599,
12,Went to the IMA to workout,Physical,0.0000,
20,This week I was able to take some more time an...,Physical,0.0000,
...,...,...,...,...
267,I did some gentle yoga,Physical,0.4404,
268,I went for a 15 min walk each way to Safeway t...,Physical,0.0000,
273,"For my activity this week, I played softball w...",Physical,0.8065,
275,I went for a run out in the sun.,Physical,0.0000,


In [12]:
print('Mental category table')
scores_categories_mental = scores_categories[scores_categories["Category"] == "Mental"]
scores_categories_mental

Mental category table


Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
0,Gave myself time to lie down on the couch and ...,Mental,0.5255,
2,This week I spent at least 20 minutes reading ...,Mental,0.0,
11,"I've been struggling with anxiety lately, whic...",Mental,0.8173,
14,"For my wellness self-care, I am documenting ou...",Mental,0.8716,
18,I had more unconventional self-care activity t...,Mental,0.3365,
25,I have been spending more time with my chicken...,Mental,0.2714,
38,I play with cats and worked out for 1 hour.,Mental,0.34,
40,"I am recovering from being sick for a week, so...",Mental,0.3182,
67,I actually used this time to edit some photos ...,Mental,0.0,
68,I played ukulele and sang.,Mental,0.34,


In [13]:
print('Community category table')
scores_categories_community = scores_categories[scores_categories["Category"] == "Community"]
scores_categories_community

Community category table


Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
5,I went to watch movie at ipic theatre in redmo...,Community,0.9485,
15,"For my wellness self-care, I am documenting ou...",Community,0.8716,
17,My fiancé and I went and got dinner with my da...,Community,0.807,
24,I’ve been really stressed lately due to midter...,Community,0.6915,
33,I drank wine with my boyfriend and had a relax...,Community,0.4939,
42,For my health and wellness this week I took my...,Community,0.9951,
46,I will be facetiming a friend.,Community,0.4939,
48,Assessing Occupational Wellness: I opened my L...,Community,0.98,
50,I celebrated Eid with my friends and family.,Community,0.7783,
51,I went to a concert o Sunday night. Saw a very...,Community,0.8057,


In [14]:
print('Emotional category table')
scores_categories_emotional = scores_categories[scores_categories["Category"] == "Emotional"]
scores_categories_emotional

Emotional category table


Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
10,"I've been struggling with anxiety lately, whic...",Emotional,0.8173,
28,"I am sick currently, so I only listened to mus...",Emotional,-0.5106,
30,I spent 30 minutes to listening to my favorite...,Emotional,0.4588,
32,I spent 2 hours to re-read the Buddha's Brain ...,Emotional,0.836,
52,I went to a concert o Sunday night. Saw a very...,Emotional,0.8057,
54,I visited the Seattle Asian Art Museum in the ...,Emotional,0.0,
61,After coming from WIP I went to my room put on...,Emotional,0.6705,
69,I played ukulele and sang.,Emotional,0.34,
84,I started learning a new song on the piano.,Emotional,0.0,
86,"I was reading a book called ""Clarity and Conne...",Emotional,0.9313,


In [15]:
print('Spiritual category table')
scores_categories_spiritual = scores_categories[scores_categories["Category"] == "Spiritual"]
scores_categories_spiritual

Spiritual category table


Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
31,I spent 2 hours to re-read the Buddha's Brain ...,Spiritual,0.836,
39,"I am recovering from being sick for a week, so...",Spiritual,0.3182,
49,I celebrated Eid with my friends and family.,Spiritual,0.7783,
62,After coming from WIP I went to my room put on...,Spiritual,0.6705,
124,I did some gentle yoga,Spiritual,0.4404,
190,I spent 2 hours to re-read the Buddha's Brain ...,Spiritual,0.836,
198,"I am recovering from being sick for a week, so...",Spiritual,0.3182,
206,I celebrated Eid with my friends and family.,Spiritual,0.7783,
217,After coming from WIP I went to my room put on...,Spiritual,0.6705,
266,I did some gentle yoga,Spiritual,0.4404,


In [16]:
print('Occupational category table')
scores_categories_occupational = scores_categories[scores_categories["Category"] == "Occupational"]
scores_categories_occupational

Occupational category table


Unnamed: 0,Spr 5_6,Category,Score,Win 1_2
9,Reflection: I spent a little bit of time refle...,Occupational,0.5106,
47,Assessing Occupational Wellness: I opened my L...,Occupational,0.98,
66,I ate lunch and worked on my assignments this ...,Occupational,0.0,
88,I invited my friend and talked for a long time...,Occupational,0.8399,
135,I took this time going over all the onboarding...,Occupational,-0.4215,
140,This week/today for my well-being activity I m...,Occupational,0.8176,
153,Today I took some time to catch up on homework...,Occupational,0.891,
158,I used the remaining class time to practice pa...,Occupational,0.0,
167,This week for health and wellness I set-up a c...,Occupational,0.3818,
204,Assessing Occupational Wellness: I opened my L...,Occupational,0.98,


### Frequency of nouns and verbs 

In [17]:
!pip install spacy -q
!python -m spacy download en_core_web_sm -q

from collections import Counter
import en_core_web_sm

2023-06-01 20:57:49.557260: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 20:57:49.745487: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-06-01 20:57:49.745528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-06-01 20:57:49.778576: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-01 20:57:50.897456: W tensorflow/stream_executor/pla

Note: Just for the sake of experimentation, I only did this part on comments categorized as "physical." Can easily broaden it to more categories if useful.

This section takes the top 100 ('top' meaning most frequent) words from a particular category along with its count within the comments. Again, note that there may be duplicates depending on if the comment was repeated if it was categorized twice. I then created a function to extract nouns and verbs within that category. This can potentially be helpful if we were to see which action within each category students tended to choose.

In [18]:
def fig_frequencies(tbl, keyword):
    counter = Counter(" ".join(tbl[first_column]).split()).most_common(100)
    array = []
    for c in counter:
        array = np.append(array, c[0])

    sentence = ','.join(array)

    def extract_nouns_verbs(sentence):
        text = nltk.word_tokenize(sentence)
        pos_tagged = nltk.pos_tag(text)
        nouns_verbs = filter(lambda x:x[1]=='NN' or x[1] == 'VB',pos_tagged)
        return list(nouns_verbs)

    nouns_verbs = pd.DataFrame(extract_nouns_verbs(sentence), columns=['Word', 'Word type'])
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    noun_verb_frequency = nouns_verbs.merge(counted_words, on='Word', how='left').sort_values(by='Word frequency', ascending=False)

    nlp = en_core_web_sm.load()
    def all_nouns(sentence):
        doc = nlp(sentence)
        nouns = [(token.lemma_, "NN") for token in doc if token.pos_ == "NOUN"]
        return nouns

    def all_verbs(sentence):
        doc = nlp(sentence)
        verbs = [(token.lemma_, "VB") for token in doc if token.pos_ == "VERB"]
        return verbs

    nouns_verbs_spacy = all_nouns(sentence) + all_verbs(sentence)

    nouns_verbs_sp = pd.DataFrame(nouns_verbs_spacy, columns=['Word', 'Word type'])
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    noun_verb_frequency_sp = nouns_verbs_sp.merge(counted_words, on='Word', how='inner').sort_values(by='Word frequency', ascending=False)
    noun_verb_frequency_sp = noun_verb_frequency_sp.drop_duplicates(subset='Word', keep="last")

    fig = px.bar(noun_verb_frequency_sp, x='Word', y='Word frequency', color='Word type', title='Noun/verb frequencies for Category: ' + keyword)
    fig.update_layout(xaxis_categoryorder = 'total descending')

    fig.show()

In [19]:
fig_frequencies(scores_categories_physical, 'Physical');


[W123] Argument disable with value [] is used instead of ['senter'] as specified in the config. Be aware that this might affect other components in your pipeline.



In [20]:
fig_frequencies(scores_categories_mental, 'Mental');

In [21]:
fig_frequencies(scores_categories_emotional, 'Emotional');

In [22]:
fig_frequencies(scores_categories_community, 'Community');

In [23]:
fig_frequencies(scores_categories_spiritual, 'Spiritual');

In [24]:
fig_frequencies(scores_categories_occupational, 'Occupational');

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=443b2d8b-ed93-43a6-bbe3-c3fe69ac80e1' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>