## UW Pharmacy Student Self-Care Analysis (spring 7_8)

This notebook is a draft analysis of the "spring 7_8" page of the student comment data for the 2022-2023 school year at the UW School of Pharmacy (SoP). It takes a rough look at which self-care categories students chose this quarter, using VADER sentiment analysis, NLTK and spaCy part-of-speech tagging, and pandas table manipulation.

Note: this notebook expects a single CSV file for one quarter, with two added columns, "Category 1" and "Category 2", that assign each comment to one or two of the 8 facets of self-care.

### Setup

In [1]:
# Install third-party packages before importing them
!pip install --upgrade pip
!pip install vaderSentiment

# Import Python libraries
import numpy as np
import pandas as pd
import nltk
import spacy
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Download NLTK data (corpus, tokenizer, and part-of-speech tagger)
nltk.download('brown')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!

In [2]:
# Change the CSV file name here to analyze a different quarter
file_csv = "spring7_8.csv"

data = pd.read_csv(file_csv)

# The comment text is in the second column of the sheet (index 1)
first_column = data.columns[1]
data = data[[first_column, "Category 1", "Category 2"]]

In [3]:
# Collect the comments that were given a second category.
# DataFrame.append was removed in pandas 2.0, so filter the rows
# directly instead of appending them one by one.
has_second = data["Category 2"].notna()
all_dups = data.loc[has_second, [first_column, "Category 2"]]
all_dups = all_dups.rename(columns={"Category 2": "Category"})

### Working data

In [4]:
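data = data.rename(columns={"Category 1": "Category"})
data = data[[first_column, "Category"]]
# Stack the second-category rows under the first-category rows
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
data = pd.concat([data, all_dups])
data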

Unnamed: 0,Spr 7-8,Category
0,Took a nap after class and ended up sleeping t...,Physical
1,I watched TV with my dad!,Community
2,This week I spent about an hour getting dinner...,Physical
3,I went to go visit my high school teachers.,Community
4,I took a 30 minutes walk in my new neighborhoo...,Environmental
...,...,...
101,I am looking forward to our white coat ceremon...,Community
103,I went to U Village for lunch!,Physical
104,Jogged outside to gas works park for half an hour,Environmental
105,This week for health and wellness I learned ab...,Physical
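
The two-step reshaping above (pulling out "Category 2", renaming it, and stacking it under "Category 1") can also be written as a single `melt`. A minimal sketch on made-up rows, assuming the same three-column layout; the column name `Comment` and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the quarter's sheet: one comment column plus
# the two hand-labelled category columns.
wide = pd.DataFrame({
    "Comment": ["Went for a run", "Watched TV with my dad", "Read a book"],
    "Category 1": ["Physical", "Community", "Mental"],
    "Category 2": [None, "Mental", None],
})

# melt stacks both category columns into one "Category" column;
# dropna removes the rows where no second category was assigned.
long_df = (
    wide.melt(id_vars="Comment",
              value_vars=["Category 1", "Category 2"],
              value_name="Category")
        .dropna(subset=["Category"])
        .drop(columns="variable")
        .reset_index(drop=True)
)

print(long_df)
```

A comment with two categories ends up on two rows, one per category, which is the same shape as the working table above.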


### Category frequencies within student comments

The comments are summarized here by self-care category in a pie chart. The 8 categories include physical, mental, community, emotional, environmental, spiritual, and occupational.

In [5]:
import plotly.express as px

dfg = data.groupby("Category").count().sort_values(by=first_column, ascending=False)
print("All reflections processed (includes duplicate categories): ", np.sum(dfg[first_column]))

# Use first_column rather than a hard-coded name so the cell works for other quarters
dfg_pie = px.pie(dfg.reset_index(), values=first_column, names='Category', title='Category Frequencies')
dfg_pie

All reflections processed (includes duplicate categories):  162
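
The same tally can be read off without a chart. A small sketch using `value_counts` on made-up labels (the sample values are hypothetical, not taken from the data):

```python
import pandas as pd

# Hypothetical category labels, one per (comment, category) row.
categories = pd.Series(
    ["Physical", "Community", "Physical", "Mental"], name="Category"
)

# Absolute counts per category, most frequent first.
counts = categories.value_counts()

# Share of all rows per category (what the pie slices represent).
shares = categories.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```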


### Sentiment analysis (positivity polarity score for each comment)

In this section, I score each comment with the VADER sentiment analyzer. The function sentiment_scores(sentence) returns the compound polarity score of a comment, a normalized value between -1 (most negative) and 1 (most positive). Note that VADER is a general-purpose, lexicon- and rule-based model that has not been trained on this data, so the scores are only a rough reading of each comment: they reflect its general tone, not what self-care means as a whole.
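
To make the range of the score concrete: as far as I can tell from the vaderSentiment reference implementation, the compound value is the sum of the per-word valence scores squashed into (-1, 1) by x / sqrt(x² + α), with α = 15. The sketch below reproduces only that normalization step. The function name `normalize_compound` and the sample sums are illustrative, and the library's actual scoring also applies rules for negation, punctuation, and capitalization that are not shown here.

```python
import math

def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

# Illustrative valence sums, not taken from the student comments.
for total in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"{total:+.1f} -> {normalize_compound(total):+.4f}")
```

Under this reading, a score of 0.0 usually means that no scored words were found in the comment (or that positive and negative words cancelled out), not that the comment is neutral with some probability.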

In [6]:
# Return the VADER compound sentiment score of a sentence
# (adapted from GeeksforGeeks)
def sentiment_scores(sentence):

    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()

    # polarity_scores returns a dictionary with
    # neg, neu, pos, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    # Keep only the overall (compound) score.
    return sentiment_dict['compound']

In [7]:
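# Score every comment. Rows are collected in a list first because
# DataFrame.append was removed in pandas 2.0.
rows = []
for i in data[first_column]:
    sentence = str(i)
    rows.append({first_column: sentence, "Score": sentiment_scores(sentence)})

# "Win 1_2" looks like a stale column name from another quarter; it is always
# empty and is listed here only because it appears in the tables below.
scores = pd.DataFrame(rows, columns=[first_column, "Win 1_2", "Score"])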

In [8]:
scores_categories = data.merge(scores, on=first_column, how="right")
scores_categories

Unnamed: 0,Spr 7-8,Category,Win 1_2,Score
0,Took a nap after class and ended up sleeping t...,Physical,,0.0000
1,I watched TV with my dad!,Community,,0.0000
2,I watched TV with my dad!,Mental,,0.0000
3,This week I spent about an hour getting dinner...,Physical,,0.5696
4,This week I spent about an hour getting dinner...,Community,,0.5696
...,...,...,...,...
271,Jogged outside to gas works park for half an hour,Environmental,,0.0000
272,This week for health and wellness I learned ab...,Emotional,,0.0258
273,This week for health and wellness I learned ab...,Physical,,0.0258
274,Took a breather to do things for myself like w...,Mental,,-0.3182
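
Because a comment with two categories appears twice in both `data` and `scores`, merging on the comment text pairs every copy with every copy, which is why the table above runs to index 274 although only 162 reflections were counted earlier. A minimal sketch of the effect and of one way around it (scoring each row in place); the toy rows, the `Comment` column name, and the `toy_score` function standing in for `sentiment_scores` are all hypothetical:

```python
import pandas as pd

# Toy working table: "watched tv" was given two categories.
data = pd.DataFrame({
    "Comment": ["watched tv", "went for a run", "watched tv"],
    "Category": ["Community", "Physical", "Mental"],
})

def toy_score(text):
    # Hypothetical scorer used only so the example runs without VADER.
    return 0.5 if "run" in text else 0.0

# Separate score table built row by row, as in the cells above.
scores = pd.DataFrame({
    "Comment": data["Comment"],
    "Score": data["Comment"].map(toy_score),
})

# Merging on text: each of the 2 "watched tv" rows matches both copies.
merged = data.merge(scores, on="Comment", how="right")

# Scoring in place keeps exactly one row per (comment, category) pair.
in_place = data.assign(Score=data["Comment"].map(toy_score))

print(len(merged))    # 5 rows
print(len(in_place))  # 3 rows
```

The in-place version would change the row counts and averages reported in this notebook, so it is shown only as a possible follow-up.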


In [9]:
fig = px.scatter(scores_categories, y="Score", x="Category", title="Polarity Scores by Category")
# Average only the numeric Score column; the text columns cannot be averaged
mean_scores = scores_categories.groupby('Category')[['Score']].mean()
for c in mean_scores.index:
    fig.add_scatter(x=[c],
                    y=[mean_scores.loc[c]['Score']],
                    marker=dict(
                        color='red',
                        size=10
                    ),
                name=f'{c} mean')

fig.show()

In [10]:
px.bar(mean_scores.reset_index().sort_values(by="Score", ascending=False), x="Category", y="Score", title="Average Scores per Category")

### Scores of each category 

In [11]:
print('Physical category table')
scores_categories_physical = scores_categories[scores_categories["Category"] == "Physical"]
scores_categories_physical

Physical category table


Unnamed: 0,Spr 7-8,Category,Win 1_2,Score
0,Took a nap after class and ended up sleeping t...,Physical,,0.0000
3,This week I spent about an hour getting dinner...,Physical,,0.5696
7,I took a 30 minutes walk in my new neighborhoo...,Physical,,0.9232
8,I focused on health and well-being. I utilized...,Physical,,0.8860
12,Went for a run,Physical,,0.0000
...,...,...,...,...
260,I went grocery shopping and to the gym.,Physical,,0.0000
262,[photo of student at rock climbing gym],Physical,,0.0000
269,I went to U Village for lunch!,Physical,,0.0000
270,Jogged outside to gas works park for half an hour,Physical,,0.0000


In [12]:
print('Mental category table')
scores_categories_mental = scores_categories[scores_categories["Category"] == "Mental"]
scores_categories_mental

Mental category table


Unnamed: 0,Spr 7-8,Category,Win 1_2,Score
2,I watched TV with my dad!,Mental,,0.0
9,I focused on health and well-being. I utilized...,Mental,,0.886
11,Finished reading a really good book called The...,Mental,,0.4927
13,"For my wellness, I decided to read for 20 minu...",Mental,,0.861
23,I spent the entire day cleaning my room. Messy...,Mental,,-0.6476
24,"I went on a walk with Henry, the pup I am hous...",Mental,,0.3182
27,I honestly kind of forgot to do my wellness he...,Mental,,0.7968
38,"This week, I find myself in Chicago with my fa...",Mental,,0.9948
51,I have been spending more time outside in the ...,Mental,,0.5095
54,Pet and spent time with my doggies!,Mental,,0.0


In [13]:
print('Community category table')
scores_categories_community = scores_categories[scores_categories["Category"] == "Community"]
scores_categories_community

Community category table


Unnamed: 0,Spr 7-8,Category,Win 1_2,Score
1,I watched TV with my dad!,Community,,0.0
4,This week I spent about an hour getting dinner...,Community,,0.5696
5,I went to go visit my high school teachers.,Community,,0.0
18,This week I cooked up some tacos for my roomma...,Community,,0.0
29,"I called my mom in [country outside US], and w...",Community,,0.8126
31,I went hiking with my boyfriend.,Community,,0.0
36,I went to green lake in Seattle with my friend...,Community,,0.4767
37,"This week, I find myself in Chicago with my fa...",Community,,0.9948
42,I got lunch with my big [sibling from fraternity],Community,,0.0
46,I went to an engagement party.,Community,,0.6908


In [14]:
print('Emotional category table')
scores_categories_emotional = scores_categories[scores_categories["Category"] == "Emotional"]
scores_categories_emotional

Emotional category table


Unnamed: 0,Spr 7-8,Category,Win 1_2,Score
15,I took some time to draw. It was nice to break...,Emotional,,0.3804
16,Today I decided to start writing in my journal...,Emotional,,0.8398
45,I just want to be me. Be yourself: behave in a...,Emotional,,0.9939
59,I reflected on the things that have troubled m...,Emotional,,-0.4588
74,Practiced piano.,Emotional,,0.0
102,It’s been awhile since I last journaled so I d...,Emotional,,0.5574
137,I listened to music.,Emotional,,0.0
159,This week for health and wellness I learned ab...,Emotional,,0.0258
210,Practiced piano.,Emotional,,0.0
272,This week for health and wellness I learned ab...,Emotional,,0.0258


In [15]:
print('Spiritual category table')
scores_categories_spiritual = scores_categories[scores_categories["Category"] == "Spiritual"]
scores_categories_spiritual

Spiritual category table


Unnamed: 0,Spr 7-8,Category,Win 1_2,Score


In [16]:
print('Occupational category table')
scores_categories_occupational = scores_categories[scores_categories["Category"] == "Occupational"]
scores_categories_occupational

Occupational category table


Unnamed: 0,Spr 7-8,Category,Win 1_2,Score
10,I spent some time reflecting on my progress as...,Occupational,,0.7269
44,I will be working out at the gym and catching ...,Occupational,,0.0
81,Growth mindset - After performing poorly on a ...,Occupational,,0.875
129,Today for my Health wellness and self-care I a...,Occupational,,0.9022
138,I joined a study group to prepare the Friday e...,Occupational,,0.0
150,I used the remaining class time to practice fo...,Occupational,,0.0
152,I am looking forward to our white coat ceremon...,Occupational,,0.3182
162,Took a breather to do things for myself like w...,Occupational,,-0.3182
193,I will be working out at the gym and catching ...,Occupational,,0.0
250,Today for my Health wellness and self-care I a...,Occupational,,0.9022


### Frequency of nouns and verbs 

In [17]:
!pip install spacy -q
!python -m spacy download en_core_web_sm -q

from collections import Counter
import en_core_web_sm


Note: As an experiment, I first tried this part only on comments categorized as "Physical"; the cells below run the same function on the other categories as well.

This section takes the 100 most frequent words in the comments of a given category, along with their counts. Again, note that a comment categorized twice can appear more than once, which inflates the counts of its words. The function below then extracts the nouns and verbs among those words. This could help show which activities students tended to choose within each category.
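
One caveat about the counting step: splitting on whitespace treats "gym" and "gym." (or "Went" and "went") as different words. A minimal sketch of the difference, using two made-up comments and a simple lowercase-and-letters tokenizer that is my own assumption rather than part of the notebook:

```python
import re
from collections import Counter

# Hypothetical comments, not taken from the data.
comments = ["Went to the gym.", "I went to the gym"]
text = " ".join(comments)

# Whitespace splitting keeps case and punctuation attached to the words.
raw_counts = Counter(text.split())

# Lowercasing and keeping only letters/apostrophes merges the variants.
clean_counts = Counter(re.findall(r"[a-z']+", text.lower()))

print(raw_counts.most_common(3))
print(clean_counts.most_common(3))
```

With the raw split, "gym" and "gym." are each counted once; after normalizing, "gym" is counted twice.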

In [18]:
def fig_frequencies(tbl, keyword):
    # Count whitespace-separated words and keep the 100 most frequent
    counter = Counter(" ".join(tbl[first_column].astype(str)).split()).most_common(100)
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    # Join the top words into one string for the part-of-speech taggers
    sentence = ','.join(word for word, _ in counter)

    # NLTK pass: keep words tagged 'NN' or 'VB' (computed for comparison, not plotted)
    def extract_nouns_verbs(sentence):
        text = nltk.word_tokenize(sentence)
        pos_tagged = nltk.pos_tag(text)
        return [pair for pair in pos_tagged if pair[1] in ('NN', 'VB')]

    nouns_verbs = pd.DataFrame(extract_nouns_verbs(sentence), columns=['Word', 'Word type'])
    noun_verb_frequency = nouns_verbs.merge(counted_words, on='Word', how='left').sort_values(by='Word frequency', ascending=False)

    # spaCy pass: lemmatized nouns and verbs (this is what the chart shows)
    nlp = en_core_web_sm.load()
    doc = nlp(sentence)
    nouns = [(token.lemma_, "NN") for token in doc if token.pos_ == "NOUN"]
    verbs = [(token.lemma_, "VB") for token in doc if token.pos_ == "VERB"]
    nouns_verbs_sp = pd.DataFrame(nouns + verbs, columns=['Word', 'Word type'])

    # Keep only lemmas that match a counted word exactly, one row per word
    noun_verb_frequency_sp = nouns_verbs_sp.merge(counted_words, on='Word', how='inner').sort_values(by='Word frequency', ascending=False)
    noun_verb_frequency_sp = noun_verb_frequency_sp.drop_duplicates(subset='Word', keep="last")

    fig = px.bar(noun_verb_frequency_sp, x='Word', y='Word frequency', color='Word type', title='Noun/verb frequencies for Category: ' + keyword)
    fig.update_layout(xaxis_categoryorder='total descending')

    fig.show()

In [19]:
fig_frequencies(scores_categories_physical, 'Physical');


[W123] Argument disable with value [] is used instead of ['senter'] as specified in the config. Be aware that this might affect other components in your pipeline.



In [20]:
fig_frequencies(scores_categories_mental, 'Mental');

In [21]:
fig_frequencies(scores_categories_emotional, 'Emotional');

In [22]:
fig_frequencies(scores_categories_community, 'Community');

In [23]:
fig_frequencies(scores_categories_spiritual, 'Spiritual');

In [24]:
fig_frequencies(scores_categories_occupational, 'Occupational');
