## UW Pharmacy Student Self-Care Analysis (winter 9_10)

This notebook is a draft using the "winter 9_10" page in the student comment data of the 2022-2023 SY in the UW SoP. The analysis roughly looks at the determinants of the categories of self-care students chose for this quarter using VADER and NLTK sentiment analysis and Pandas table manipulation.

Note: this notebook relies on uploading a single csv sheet for one quarter, along with adding two additional columns called "Category 1" and "Category 2" which categorize each comment based on the 8 facets of self-care.

### Setup

In [1]:
#import python libraries
import numpy as np
import pandas as pd

!pip install vaderSentiment
!pip install --upgrade pip
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from textblob import TextBlob
import nltk
nltk.download('brown')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

import spacy

^C
Traceback (most recent call last):
  File "/root/venv/bin/pip", line 8, in <module>
    sys.exit(main())
  File "/root/venv/lib/python3.9/site-packages/pip/_internal/cli/main.py", line 77, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/root/venv/lib/python3.9/site-packages/pip/_internal/commands/__init__.py", line 114, in create_command
    module = importlib.import_module(module_path)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_w

In [2]:
# Can change the file csv right here
file_csv = "winter9_10.csv"

data = pd.read_csv(file_csv)

first_column = data.columns[1]
data = data[[first_column, "Category 1", "Category 2"]]

In [3]:
all_dups = pd.DataFrame()
for i in range(len(data)):
    if not pd.isnull(data["Category 2"][i]):
        temp = data[[first_column, "Category 2"]].iloc[i]
        all_dups = all_dups.append(temp)
    else:
        continue
all_dups = all_dups[[first_column, "Category 2"]]
all_dups = all_dups.rename(columns={"Category 2": "Category"})

### Working data

In [4]:
data = data.rename(columns={"Category 1": "Category"})
data = data[[first_column, "Category"]]
data = data.append(all_dups)
data

Unnamed: 0,Win 9_10,Category
0,Took a walk outside to take in some of the rar...,Physical
1,One of the activities that I chose for health/...,Mental
2,"Today, I spent some of my time watching a show...",Mental
3,I grabbed lunch with a friend.,Community
4,I tried to using peloton in our apartment bui...,Physical
...,...,...
98,I walked around the neighborhood with my family.,Environmental
99,This week I took a quick walk on this sunny We...,Mental
102,I took a walk around my neighborhood,Environmental
103,I talked to my parents and got gas! Did some l...,Environmental


### Category frequencies within student comments

The comments data is displayed here in terms of each of the 8 self-care categories (physical, mental, community, emotional, environmental, spiritual, and occupational) through a pie chart. 

In [5]:
import plotly.express as px

dfg = data.groupby("Category").count().sort_values(by=first_column, ascending=False)
print("All reflections processed (includes duplicate categories): ", np.sum(dfg[first_column]))

dfg_pie = px.pie(dfg.reset_index(), values='Win 9_10', names='Category', title='Category Frequencies')
dfg_pie

All reflections processed (includes duplicate categories):  157


### Sentiment analysis (positivity polarity score for each comment)

In this section, I scored each of the comments using the VADER sentiment analysis function. The function sentiment_scores(sentence) will return the positivity polarity score for each comment probabilistically between -1 and 1, with -1 being the most negative and 1 being the most positive. Note that this model is untrained and will be a very rough interpretation of the comments, i.e. not trained on what self-care is as a whole but on the general tone of each comment.

In [6]:
# function to print sentiments
# of the sentence (vader)

#taken from geekforgeeks
def sentiment_scores(sentence):
 
    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()
 
    # polarity_scores method of SentimentIntensityAnalyzer
    # object gives a sentiment dictionary.
    # which contains pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    return sentiment_dict['compound']

In [7]:
scores = pd.DataFrame(columns={"Win 1_2", "Score"})
for i in data[first_column]:
    sentence = str(i)
    score = sentiment_scores(sentence)
    scores = scores.append({first_column: sentence, "Score": score}, ignore_index=True)

In [8]:
scores_categories = data.merge(scores, on=first_column, how="right")
scores_categories

Unnamed: 0,Win 9_10,Category,Score,Win 1_2
0,Took a walk outside to take in some of the rar...,Physical,0.0000,
1,Took a walk outside to take in some of the rar...,Environmental,0.0000,
2,One of the activities that I chose for health/...,Mental,0.7339,
3,One of the activities that I chose for health/...,Community,0.7339,
4,"Today, I spent some of my time watching a show...",Mental,0.4215,
...,...,...,...,...
253,I took a walk around my neighborhood,Environmental,0.0000,
254,I talked to my parents and got gas! Did some l...,Community,0.4753,
255,I talked to my parents and got gas! Did some l...,Environmental,0.4753,
256,This week for self care I layed down and medit...,Spiritual,0.7906,


In [9]:
fig = px.scatter(scores_categories, y="Score", x="Category", title="Polarity Scores by Category")
mean_scores = scores_categories.groupby('Category').mean()
for c in mean_scores.index:
    fig.add_scatter(x=[c],
                    y=[mean_scores.loc[c]['Score']],
                    marker=dict(
                        color='red',
                        size=10
                    ),
                name=f'{c} mean')

fig.show()

In [10]:
px.bar(mean_scores.reset_index().sort_values(by="Score", ascending=False), x="Category", y="Score", title="Average Scores per Category")

### Scores of each category 

In [11]:
print('Physical category table')
scores_categories_physical = scores_categories[scores_categories["Category"] == "Physical"]
scores_categories_physical

Physical category table


Unnamed: 0,Win 9_10,Category,Score,Win 1_2
0,Took a walk outside to take in some of the rar...,Physical,0.0000,
7,I tried to using peloton in our apartment bui...,Physical,0.7897,
8,"Today, I focused on a growth mindset and rest ...",Physical,0.8981,
12,This week I worked out at the IMA for 1 hour.,Physical,0.0000,
17,"Today, I decided to make a homemade meal. I fe...",Physical,0.6167,
...,...,...,...,...
242,Go to the gym.,Physical,0.0000,
244,I took a walk outside.,Physical,0.0000,
246,I went to the gym and washed my car.,Physical,0.0000,
250,This week I took a quick walk on this sunny We...,Physical,0.5023,


In [12]:
print('Mental category table')
scores_categories_mental = scores_categories[scores_categories["Category"] == "Mental"]
scores_categories_mental

Mental category table


Unnamed: 0,Win 9_10,Category,Score,Win 1_2
2,One of the activities that I chose for health/...,Mental,0.7339,
4,"Today, I spent some of my time watching a show...",Mental,0.4215,
11,Started watching a new TV show,Mental,0.0,
20,Watched a Netflix documentary for 20 minutes.,Mental,0.0,
24,[photo] dog-sitting,Mental,0.0,
28,I was dancing with my favorite song in the bac...,Mental,0.4588,
36,"For self-care, this week I organized my apartm...",Mental,0.995,
43,I went shopping for clothes.,Mental,0.0,
44,I did some shopping and hung out with family f...,Mental,0.4767,
47,I went to a Cirque de Soleil show last weekend...,Mental,0.4767,


In [13]:
print('Community category table')
scores_categories_community = scores_categories[scores_categories["Category"] == "Community"]
scores_categories_community

Community category table


Unnamed: 0,Win 9_10,Category,Score,Win 1_2
3,One of the activities that I chose for health/...,Community,0.7339,
5,"Today, I spent some of my time watching a show...",Community,0.4215,
6,I grabbed lunch with a friend.,Community,0.4939,
23,I was strangely craving apple juice all day so...,Community,0.2263,
30,I spent time with my boyfriend and relaxed.,Community,0.4939,
39,I went out to lunch with my mentee,Community,0.0,
40,I took a walk with my friend on a trail,Community,0.4939,
45,I did some shopping and hung out with family f...,Community,0.4767,
46,I went to a Cirque de Soleil show last weekend...,Community,0.4767,
56,I called my bestfriend.,Community,0.0,


In [14]:
print('Emotional category table')
scores_categories_emotional = scores_categories[scores_categories["Category"] == "Emotional"]
scores_categories_emotional

Emotional category table


Unnamed: 0,Win 9_10,Category,Score,Win 1_2
10,I took some time to write a journal entry. At ...,Emotional,0.296,
16,This last weekend I went to a concert with my ...,Emotional,0.9683,
42,The activity I choose to day is gratitude. I w...,Emotional,0.9957,
49,I recently started journaling daily. I’ve foun...,Emotional,0.8316,
53,"I was watching videos of Vietnam, and I was th...",Emotional,0.9559,
55,"After WIP today, I decided to relax by listeni...",Emotional,0.8947,
100,My highlight of the week was that I got to dis...,Emotional,0.9305,
155,This week for self care I layed down and medit...,Emotional,0.7906,
167,This last weekend I went to a concert with my ...,Emotional,0.9683,
189,"After WIP today, I decided to relax by listeni...",Emotional,0.8947,


In [15]:
print('Spiritual category table')
scores_categories_spiritual = scores_categories[scores_categories["Category"] == "Spiritual"]
scores_categories_spiritual

Spiritual category table


Unnamed: 0,Win 9_10,Category,Score,Win 1_2
9,"For this assignment, I chose to do a rest and ...",Spiritual,0.6486,
21,"For my health and wellness this week, I chose ...",Spiritual,0.9359,
112,I made a meal and did some yoga.,Spiritual,0.0,
122,I switched WIP shifts with the gold cohort thi...,Spiritual,0.5411,
154,This week for self care I layed down and medit...,Spiritual,0.7906,
229,I made a meal and did some yoga.,Spiritual,0.0,
256,This week for self care I layed down and medit...,Spiritual,0.7906,


In [16]:
print('Occupational category table')
scores_categories_occupational = scores_categories[scores_categories["Category"] == "Occupational"]
scores_categories_occupational

Occupational category table


Unnamed: 0,Win 9_10,Category,Score,Win 1_2
64,Nurture- social connectedness. I spent time st...,Occupational,0.6705,
75,It was a beautiful and peaceful today! I start...,Occupational,0.8858,
89,on 9/10 : i used the extra time we had to go o...,Occupational,0.0,
104,I went to a study group for 1 hour,Occupational,0.0,
123,I took Wednesday's designated class time and s...,Occupational,0.1707,
195,Nurture- social connectedness. I spent time st...,Occupational,0.6705,
203,It was a beautiful and peaceful today! I start...,Occupational,0.8858,


### Frequency of nouns and verbs 

In [17]:
!pip install spacy -q
!python -m spacy download en_core_web_sm -q

from collections import Counter
import en_core_web_sm

^C
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/local/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/shared-libs/python3.9/py/lib/python3.9/site-packages/spacy/__init__.py", line 6, in <module>
  File "/shared-libs/python3.9/py/lib/python3.9/site-packages/spacy/errors.py", line 2, in <module>
    from .compat import Literal
  File "/shared-libs/python3.9/py/lib/python3.9/site-packages/spacy/compat.py", line 3, in <module>
    from thinc.util import copy_array
  File "/shared-libs/python3.9/py/lib/python3.9/site-packages/thinc/__init__.py", line 5, in <module>
    from .config import registry
  File "/shared-libs/python3.9/py/lib/python3.9/site-packages/thinc/config.py

Note: Just for the sake of experimentation, I only did this part on comments categorized as "physical." Can easily broaden it to more categories if useful.

This section takes the top 100 ('top' meaning most frequent) words from a particular category along with its count within the comments. Again, note that there may be duplicates depending on if the comment was repeated if it was categorized twice. I then created a function to extract nouns and verbs within that category. This can potentially be helpful if we were to see which action within each category students tended to choose.

In [18]:
def fig_frequencies(tbl, keyword):
    counter = Counter(" ".join(tbl[first_column]).split()).most_common(100)
    array = []
    for c in counter:
        array = np.append(array, c[0])

    sentence = ','.join(array)

    def extract_nouns_verbs(sentence):
        text = nltk.word_tokenize(sentence)
        pos_tagged = nltk.pos_tag(text)
        nouns_verbs = filter(lambda x:x[1]=='NN' or x[1] == 'VB',pos_tagged)
        return list(nouns_verbs)

    nouns_verbs = pd.DataFrame(extract_nouns_verbs(sentence), columns=['Word', 'Word type'])
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    noun_verb_frequency = nouns_verbs.merge(counted_words, on='Word', how='left').sort_values(by='Word frequency', ascending=False)

    nlp = en_core_web_sm.load()
    def all_nouns(sentence):
        doc = nlp(sentence)
        nouns = [(token.lemma_, "NN") for token in doc if token.pos_ == "NOUN"]
        return nouns

    def all_verbs(sentence):
        doc = nlp(sentence)
        verbs = [(token.lemma_, "VB") for token in doc if token.pos_ == "VERB"]
        return verbs

    nouns_verbs_spacy = all_nouns(sentence) + all_verbs(sentence)

    nouns_verbs_sp = pd.DataFrame(nouns_verbs_spacy, columns=['Word', 'Word type'])
    counted_words = pd.DataFrame(counter, columns=['Word', 'Word frequency'])

    noun_verb_frequency_sp = nouns_verbs_sp.merge(counted_words, on='Word', how='inner').sort_values(by='Word frequency', ascending=False)
    noun_verb_frequency_sp = noun_verb_frequency_sp.drop_duplicates(subset='Word', keep="last")

    fig = px.bar(noun_verb_frequency_sp, x='Word', y='Word frequency', color='Word type', title='Noun/verb frequencies for Category: ' + keyword)
    fig.update_layout(xaxis_categoryorder = 'total descending')

    fig.show()

In [19]:
fig_frequencies(scores_categories_physical, 'Physical');


[W123] Argument disable with value [] is used instead of ['senter'] as specified in the config. Be aware that this might affect other components in your pipeline.



In [20]:
fig_frequencies(scores_categories_mental, 'Mental');

In [21]:
fig_frequencies(scores_categories_emotional, 'Emotional');

In [22]:
fig_frequencies(scores_categories_community, 'Community');

In [23]:
fig_frequencies(scores_categories_spiritual, 'Spiritual');

In [24]:
fig_frequencies(scores_categories_occupational, 'Occupational');

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=443b2d8b-ed93-43a6-bbe3-c3fe69ac80e1' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>