# Quantified Self Assignment

## Part 2. Blog post

### COMM2550 (Spring 2023)

# Money can buy happiness!
Just kidding. Or at least it seemed so...

Every day for the past 2.5 months or so, I filled out a self-report questionnaire that asked about things like my mood, sleep time, how much I spent, and my activities for the day. These are my results.

### The questionnaire &#x1F4AC;

Google Forms was used to set up the questionnaire. 

There were a total of 8 questions, some of which were multiple choice or time input questions, while others were rating questions:

| No. | Question | Task |
| --- | --- | --- |
| 1 | How stressed do you feel right now? | Rate on a scale from 0-10 |
| 2 | What time did you sleep last night? | Input time |
| 3 | What time did you wake up today? | Input time |
| 4 | Did you feel lethargic when you woke up today? | Select one of the following: <br><ul><li>Yes</li> <li>No</li></ul> |
| 5 | What activities did you do/are you doing today? | Select one or more of the following: <br><ul><li> Attend classes </li> <li> Study/complete assignments </li> <li> Go out with friends </li> <li> Watch shows/entertainment activities </li> <li> Drink coffee </li> <li> Exercise </li> <li> Other: Specify the activity by typing in manually </li></ul> | 
| 6 | How much did you spend today? | Select one of the following: <br><ul><li>\$0-10 </li> <li>\$10-20</li> <li>\$20-30</li> <li>\$30-40</li> <li>\$40-50</li> <li>\$50 or more</li></ul> |
| 7 | How happy do you feel overall today? | Rate on a scale from 0-10 |
| 8 | Overall, did you enjoy/are you enjoying today more than yesterday? | Select one of the following: <br><ul><li>Yes</li><li>About the same</li><li>No</li></ul> |

### The data collected &#x1F4BE;
After several weeks of answering the questions every day, it's time to look at the data collected! 

Google Forms stored the responses in a spreadsheet, which was downloaded as a csv file for easier manipulation in Python. For reference, it is saved as `my_daily_log.csv` in the `data` folder.

<figure>
  <img src="data/form_results.png">
  <figcaption>Questionnaire responses shown in spreadsheet via Google Forms</figcaption>
</figure>

The downloaded csv file is then loaded into a Pandas dataframe in Python, as shown below.

In [1]:
from datetime import timedelta
import numpy as np
import pandas as pd

my_log = pd.read_csv('data/my_daily_log.csv')
my_log.head(10)

| | Timestamp | How stressed do you feel right now? | What time did you sleep last night? | What time did you wake up today? | Did you feel lethargic when you woke up today? | What activities did you do/are you doing today? | How much did you spend today? | How happy do you feel overall today? | Overall, did you enjoy/are you enjoying today more than yesterday? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2/1/2023 20:15:29 | 5 | 1:00:00 AM | 11:00:00 AM | Yes | Attend classes, Watch shows/entertainment acti... | \$10-20 | 7 | Yes |
| 1 | 2/2/2023 20:21:46 | 5 | 1:00:00 AM | 9:15:00 AM | Yes | Attend classes, Study/complete assignments | \$0-10 | 6 | About the same |
| 2 | 2/3/2023 20:19:07 | 4 | 1:00:00 AM | 7:00:00 AM | Yes | Go out with friends, Drink coffee | \$30-40 | 7 | Yes |
| 3 | 2/4/2023 20:28:40 | 2 | 11:00:00 PM | 8:00:00 AM | Yes | Go out with friends | \$30-40 | 8 | Yes |
| 4 | 2/5/2023 20:17:33 | 4 | 12:00:00 AM | 9:00:00 AM | Yes | Go out with friends | >\$50 | 8 | About the same |
| 5 | 2/6/2023 20:37:10 | 8 | 1:00:00 AM | 8:00:00 AM | Yes | Attend classes, Study/complete assignments | \$20-30 | 5 | No |
| 6 | 2/7/2023 20:17:43 | 4 | 12:30:00 AM | 9:30:00 AM | Yes | Attend classes, Study/complete assignments | \$20-30 | 5 | About the same |
| 7 | 2/8/2023 23:05:33 | 8 | 2:00:00 AM | 10:00:00 AM | Yes | Attend classes, Study/complete assignments, Dr... | \$10-20 | 2 | No |
| 8 | 2/9/2023 20:32:02 | 5 | 1:00:00 AM | 7:00:00 AM | Yes | Attend classes, Study/complete assignments, Dr... | \$0-10 | 2 | About the same |
| 9 | 2/10/2023 20:26:44 | 7 | 4:00:00 AM | 10:00:00 AM | No | Study/complete assignments, Go out with friend... | \$30-40 | 5 | Yes |


Argh, the entire question text became the columns... And also note the `Timestamp` column (kindly) added in automatically by Google Forms to track the date and time I submitted each day's entry!

I cleaned up some of the data and renamed the columns so they aren't thaaaat long - it's easier to see (and type)! 

![](data/yay.gif)  
_Source: https://i.imgflip.com/5tb0ro.gif_
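For the curious, the clean-up could look roughly like the sketch below. This is not my exact notebook code: it only uses a few columns and three rows copied from the raw table above, and the mapping names (`short_names`, `spend_order`) are made up for illustration.

```python
import pandas as pd

# Three rows (and only some columns) copied from the raw export shown above
raw = pd.DataFrame({
    'Timestamp': ['2/1/2023 20:15:29', '2/6/2023 20:37:10', '2/10/2023 20:26:44'],
    'How stressed do you feel right now?': [5, 8, 7],
    'Did you feel lethargic when you woke up today?': ['Yes', 'Yes', 'No'],
    'How much did you spend today?': ['$10-20', '$20-30', '$30-40'],
    'Overall, did you enjoy/are you enjoying today more than yesterday?': ['Yes', 'No', 'Yes'],
})

# Hypothetical mapping from the long question text to short column names
short_names = {
    'Timestamp': 'ts',
    'How stressed do you feel right now?': 'stress_level',
    'Did you feel lethargic when you woke up today?': 'lethargic',
    'How much did you spend today?': 'spent_amt',
    'Overall, did you enjoy/are you enjoying today more than yesterday?': 'enjoy_more',
}
clean = raw.rename(columns=short_names)

# Parse the timestamp strings (month/day/year) into datetime objects
clean['ts'] = pd.to_datetime(clean['ts'], format='%m/%d/%Y %H:%M:%S')

# Recode the yes/no style answers as integers
clean['lethargic'] = clean['lethargic'].map({'No': 0, 'Yes': 1})
clean['enjoy_more'] = clean['enjoy_more'].map({'No': -1, 'About the same': 0, 'Yes': 1})

# Store the spending brackets as an ordered categorical, lowest to highest
spend_order = ['$0-10', '$10-20', '$20-30', '$30-40', '$40-50', '>$50']
clean['spent_amt'] = pd.Categorical(clean['spent_amt'], categories=spend_order, ordered=True)

# Integer codes (0 to 5) follow the bracket order
clean['spent_amt_cd'] = clean['spent_amt'].cat.codes

print(clean.dtypes)
```

Using an ordered categorical means the brackets sort from cheapest to priciest instead of alphabetically, which is handy for plots later on.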

Here are the variables we now have:

| Variable | Data type | Brief description | Remarks |
| --- | --- | --- | --- |
| `ts` | Continuous | Date and time of each record | Stored as datetime object |
| `stress_level` | Continuous | Rating of my stress level for the day from 0 to 10 | Ratings are integer values only, but calculations can utilize floats |
| `bed_time` | Continuous | Date and time I went to bed | Stored as datetime object |
| `wake_time` | Continuous | Date and time I woke up | Stored as datetime object |
| `lethargic` | Categorical | Whether I felt lethargic when I woke up | 0: No <br> 1: Yes |
| `activities` | Categorical | Activities I did for the day | Stored as list of strings (activity types) |
| `spent_amt` | Categorical | Amount of money I spent for the day, in blocks of $10 | Stored as ordered Categorical type |
| `happy_level` | Continuous | Rating of my happiness level for the day from 0 to 10 | Ratings are integer values only, but calculations can utilize floats |
| `enjoy_more` | Categorical | Whether I enjoyed the day more than the previous day | -1: No <br> 0: About the same <br> 1: Yes |

I also derived some new variables from the original ones:

| Variable | Data type | Brief description | Remarks |
| --- | --- | --- | --- |
| `day_of_week` | Continuous | The day of the week | Technically categorical, but can be treated as continuous for some calculations <br><br> 0: Monday to 6: Sunday |
| `sleep_dur` | Continuous | My daily sleep duration | Calculated as the time difference between `bed_time` and `wake_time` |
| `spent_amt_cd` | Categorical | Integer coded values for `spent_amt` | 0: $0-10 <br> 1: $10-20 <br> 2: $20-30 <br> 3: $30-40 <br> 4: $40-50 <br> 5: $50 or more |
| `num_activities` | Categorical | Number of activity types I did for the day | Derived from the number of options selected in `activities` |
| `attend_class` | Categorical | Whether one of the activities for the day was attending class | Stored as 0: No and 1: Yes |
| `study_work` | Categorical | Whether one of the activities for the day was studying/completing assignments | Stored as 0: No and 1: Yes |
| `go_out` | Categorical | Whether one of the activities for the day was going out with friends | Stored as 0: No and 1: Yes |
| `entertain` | Categorical | Whether one of the activities for the day was entertainment activities | Stored as 0: No and 1: Yes |
| `coffee` | Categorical | Whether one of the activities for the day was drinking coffee | Stored as 0: No and 1: Yes |
| `exam` | Categorical | Whether one of the activities for the day was taking an exam | Stored as 0: No and 1: Yes |

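As a rough idea of how such derived columns can be built, here is a minimal sketch on a two-row toy log. The column names follow the tables above, but the rows are illustrative, and I am assuming `bed_time` and `wake_time` already carry the correct dates so that a plain subtraction gives the sleep duration.

```python
import pandas as pd

# Toy log with two illustrative rows; column names follow the tables above
log = pd.DataFrame({
    'ts': pd.to_datetime(['2023-02-03 20:19:07', '2023-02-04 20:28:40']),
    'bed_time': pd.to_datetime(['2023-02-03 01:00:00', '2023-02-03 23:00:00']),
    'wake_time': pd.to_datetime(['2023-02-03 07:00:00', '2023-02-04 08:00:00']),
    'activities': [['Go out with friends', 'Drink coffee'], ['Go out with friends']],
})

# Day of the week as an integer, 0: Monday to 6: Sunday
log['day_of_week'] = log['ts'].dt.weekday

# Sleep duration as a timedelta (assumes the dates on both columns are right)
log['sleep_dur'] = log['wake_time'] - log['bed_time']

# Number of activity types selected that day
log['num_activities'] = log['activities'].apply(len)

# One 0/1 flag column per activity type (only three shown here)
flags = {
    'attend_class': 'Attend classes',
    'go_out': 'Go out with friends',
    'coffee': 'Drink coffee',
}
for col, label in flags.items():
    log[col] = log['activities'].apply(lambda acts, label=label: int(label in acts))

print(log[['day_of_week', 'sleep_dur', 'num_activities', 'attend_class', 'go_out', 'coffee']])
```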
### The analysis and results &#x1F50E;

Alright, enough of the tables and descriptions of the variables. Let's get down to analyzing the data!

#### How many records were there? Which period of time do the data come from?
I honestly could not remember when I started filling in the questionnaire (or stopped doing so, for that matter), but `ts` to the rescue!

(Note that I saved the dataframe as a pickle so object types are preserved after analysis)

In [2]:
my_log = pd.read_pickle('data/blog_post.pkl')
my_log['ts'].describe()

count                               70
mean     2023-03-09 03:23:47.285714432
min                2023-02-01 20:15:29
25%         2023-02-20 02:28:47.500000
50%         2023-03-09 08:34:29.500000
75%                2023-03-26 14:56:26
max                2023-04-12 20:15:42
Name: ts, dtype: object

I started logging on February 1, and stopped logging on April 12.

But something seems amiss... If I logged daily (as I _should_ have been), there should be a total of 28 days (Feb) + 31 days (Mar) + 12 days (Apr) = 71 days, or 71 rows in the table!

![](data/ohno.gif)  
_Source: https://media.tenor.com/PqgTNvSN8wIAAAAM/kermit-worried.gif_

In [3]:
# Keep rows dated more than 1 day after the previous row (i.e. at least one day was skipped)
my_log[my_log['ts'].dt.date.diff() > timedelta(days=1)]

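| | ts | stress_level | bed_time | wake_time | lethargic | activities | spent_amt | happy_level | enjoy_more | day_of_week | spent_amt_cd | sleep_dur | num_activities | attend_class | study_work | go_out | entertain | coffee | exercise | exam |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | 2023-02-18 20:35:50 | 3 | 2023-02-19 01:30:00 | 2023-02-19 11:30:00 | 0 | {Watch shows/entertainment activities} | \$0-10 | 7 | 1 | 5 | 0 | 0 days 10:00:00 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |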


Okay, there's no record for Feb 17. That date sounds familiar... 

It was the Friday after Valentine's Day: I had just had 2 midterms in a row and could finally get a break on a trip to New Jersey with my friends! Oops, forgive me!
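Another way to hunt for skipped days is to compare the logged dates against a full calendar of expected dates. The sketch below uses four made-up timestamps (with Feb 17 left out on purpose) rather than my actual log, and it also double-checks the 71-day count.

```python
import pandas as pd

# Number of calendar days from Feb 1 to Apr 12, 2023 (both inclusive)
n_expected = len(pd.date_range('2023-02-01', '2023-04-12', freq='D'))
print(n_expected)

# Made-up timestamps with one day (Feb 17) deliberately missing
ts = pd.Series(pd.to_datetime([
    '2023-02-15 20:15:00',
    '2023-02-16 20:16:00',
    '2023-02-18 20:35:50',
    '2023-02-19 20:14:00',
]))

# Strip the time of day, then list every calendar day between first and last entry
logged_days = pd.DatetimeIndex(ts.dt.normalize().unique())
expected_days = pd.date_range(logged_days.min(), logged_days.max(), freq='D')

# Days that should have an entry but do not
missing_days = expected_days.difference(logged_days)
print(missing_days.strftime('%Y-%m-%d').tolist())
```

Unlike the `diff()` approach, this lists the missing dates themselves instead of the rows that come right after a gap.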

#### At what time of the day did I usually submit the questionnaire?

With the remaining 70 records, what time did I usually submit my daily log? 

I plotted a graph of all the time data in `ts` and found that the most common time was 8.15pm -- you're right, I set a daily alarm to remind me to complete my daily log at... 8.15pm! &#x23F0;

<div align="center">
<figure>
  <img src="data/logging_time.png">
  <figcaption>Count of logging times</figcaption>
</figure>
</div>

Clearly, the more frequent times I completed my daily log were 8.14pm and several times slightly past 8.15pm. These timings are close to the usual daily reminder time, so it's not surprising that I would generally remember to submit the daily log around that time.

However, there were about 10-15 occasions when the log was submitted later, past 9.30pm.

Nonetheless, there were no submissions after the day had lapsed (midnight), yay!

**_Are there specific days of the week when I tended to submit the daily log later?_**

Were there any patterns as to when I usually submitted the logs later and not at the scheduled time?

<div align="center">
<figure>
  <img src="data/weeklog.png">
  <figcaption>Count of logging times by hour of the day and day of the week</figcaption>
</figure>
</div>

Well, it turns out that there might be! There was a greater variation in the time I completed the daily log on weekends, which is not very surprising, since I tend to be out on the weekends and end up missing the daily reminder. 

It seems odd, though, that I sometimes submitted the questionnaire late on Mondays and Wednesdays, yet submitted it between 8 and 8.59pm on all Tuesdays and Thursdays.

<div align="center">
  <b>~~~~~~~~~~ &#x1F4C5; MY WEEKLY SCHEDULE (A PART OF IT) &#x1F4C5; ~~~~~~~~~~</b>
  <br>
  <br>
  <table>
  <tr>
    <th>Monday</th>
    <th>Tuesday</th> 
    <th>Wednesday</th>
    <th>Thursday</th>
    <th>Friday</th>
  </tr>
  <tr>
    <td>
      12.15 pm: Class A
    </td>
    <td>
      10.15 am: Class B <br>
      3:30 pm: Class C <br>
      <b><i>5.15 pm: COMM2550</i></b>
    </td>
    <td>
      12.15 pm: Class A <br>
      9pm: CCA
    </td>
    <td>
      10.15 am: Class B <br>
      3:30 pm: Class C <br>
    </td>
    <td>
      No class!
    </td>
  </tr>
</table>
</div>

So, this class (COMM2550) being held on Tuesdays likely explains why I always remember to submit the daily log on time. Either being in class or having just ended the class that I'm supposed to complete this assignment for certainly reminded me to submit the questionnaire!

And even though COMM2550 is not held on Thursdays, I somehow tended to complete the daily log at the same time intuitively, since the other classes follow the same schedule.

With my CCA being held close to the time period of 8-8.59pm on Wednesday, I guess that's why I occasionally forgot to complete the daily log on time and only did it later on.

On the rest of the days when I have nothing scheduled weekly around 8.15pm, that's when I tend to miss the scheduled timing.
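The counts behind a day-of-week by hour-of-day chart like the one above can be tabulated with `pd.crosstab`. This is only a sketch on four timestamps (three taken from the raw table earlier, one hypothetical), not the code that produced my figure.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    '2023-02-06 20:37:10',  # Monday
    '2023-02-07 20:17:43',  # Tuesday
    '2023-02-08 23:05:33',  # Wednesday
    '2023-02-14 20:15:10',  # Tuesday (hypothetical entry)
]))

# Rows: day of the week; columns: hour of the day (24-hour clock)
counts = pd.crosstab(ts.dt.day_name(), ts.dt.hour, rownames=['day'], colnames=['hour'])
print(counts)
```

Each cell is the number of submissions for that day and hour, so late submissions show up as counts in the columns for 21 and later.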

#### How stressed was I during this period?

School, assignments, tests, work... Oh, the sorrows of life and all the contributors to stress! So how did I fare, subjectively?

In [4]:
my_log['stress_level'].describe()

count    70.000000
mean      4.500000
std       2.276216
min       1.000000
25%       2.000000
50%       5.000000
75%       6.000000
max       9.000000
Name: stress_level, dtype: float64

I rated my stress level 4.5 out of 10 on average. That's actually better than I expected; I imagined it would be around 5-6, given the new pimples that have just started growing on my nose...

But my highest stress rating was 9/10 -- that had to be the time when I had 2 midterms back-to-back, argh!

<div align="center">
<figure>
  <img src="data/stress.png">
  <figcaption>My self-reported stress levels for the past 2.5 months</figcaption>
</figure>
</div>

Looking at the graph of my stress level during this period, I had 3 striking observations:

1. My stress levels appear to spike sporadically above the average (black dotted line).

2. My stress levels were generally higher in the earlier half of February and have been higher than the mean for the most part since the second half of March. Checking my calendar, I'm sure these periods align with when I had more submissions due... (Oh, the stress of not meeting deadlines!)

3. Interestingly, the entire period between Feb 15 and Mar 15 was below the mean. This... was spring break! No wonder my stress level was lower than usual; I thoroughly enjoyed myself in Miami.

#### How happy was I during this period?

Stress aside, how happy was I overall? Am I happier when I feel less stressed? Let's see.

In [5]:
my_log['happy_level'].describe()

count    70.000000
mean      5.671429
std       1.481494
min       2.000000
25%       5.000000
50%       6.000000
75%       7.000000
max       8.000000
Name: happy_level, dtype: float64

I rated my level of happiness about 6/10 overall. I honestly thought I felt happier, but I suppose the data speaks for itself. My happiness ratings ranged from 2 to 8, which is pretty good in my opinion. At least it wasn't 0 or 1 out of 10!

<div align="center">
<figure>
  <img src="data/happy.png">
  <figcaption>My self-reported happiness levels for the past 2.5 months</figcaption>
</figure>
</div>

Looking at the trend of happiness level across time, I think it looks normal with fluctuations around the mean (black dotted line). 

There aren't really any distinct periods of sustained lower or higher happiness levels, though the lowest rating was in the first half of February, along with 2 dips between late March and early April. Again, these periods corresponded generally to when I had more submissions/exams, much like my stress levels that spiked during the same time periods.

**_Were there any differences in my happiness ratings between weekdays and weekends, or days with and without classes?_**

I wondered, given how there were differences between weekends and weekdays or class days and non-class days for my daily log submission timings, could there also be differences in terms of my happiness level during such periods?

In [6]:
is_wd = my_log.set_index('ts').index.weekday < 5  # weekdays 0-4 (Mon-Fri)
is_cd = my_log.set_index('ts').index.weekday < 4  # class days 0-3 (Mon-Thu)
happy = my_log['happy_level']
happy_wd = happy[is_wd]
happy_we = happy[~is_wd]
pd.concat([happy_wd.describe(), happy_we.describe()], axis=1, keys=['weekday', 'weekend'])

| | weekday | weekend |
| --- | --- | --- |
| count | 50.0 | 20.0 |
| mean | 5.46 | 6.2 |
| std | 1.487499 | 1.361114 |
| min | 2.0 | 3.0 |
| 25% | 5.0 | 5.75 |
| 50% | 5.0 | 7.0 |
| 75% | 7.0 | 7.0 |
| max | 8.0 | 8.0 |


In [7]:
happy_cd = happy[is_cd]
happy_ncd = happy[~is_cd]
pd.concat([happy_cd.describe(), happy_ncd.describe()], axis=1, keys=['days with class', 'days without class'])

| | days with class | days without class |
| --- | --- | --- |
| count | 41.0 | 29.0 |
| mean | 5.195122 | 6.344828 |
| std | 1.400348 | 1.343659 |
| min | 2.0 | 3.0 |
| 25% | 5.0 | 6.0 |
| 50% | 5.0 | 7.0 |
| 75% | 6.0 | 7.0 |
| max | 8.0 | 8.0 |


It turns out, there are differences too.

On average, I rated both weekends and days without class as happier than weekdays and days with class (means of 6.20 vs 5.46, and 6.34 vs 5.20, respectively).

I guess having class makes me less happy...

#### How often did I have positive-negative or negative-positive mood transitions?

Since I rated my day overall in terms of enjoyment level compared to the day prior, I am able to see how often my enjoyment stayed positive or negative, or changed from positive to negative (or vice versa).

As a reminder, I recoded the log responses to -1, 0, and 1, which reflect negative transitions, remaining neutral, and positive transitions, respectively.

In [8]:
my_log['enjoy_more'].describe()

count    70.000000
mean      0.100000
std       0.725319
min      -1.000000
25%       0.000000
50%       0.000000
75%       1.000000
max       1.000000
Name: enjoy_more, dtype: float64

<div align="center">
<figure>
  <img src="data/enjoy_count.png">
  <figcaption>Frequency of whether I enjoyed the day more than the previous day</figcaption>
</figure>
</div>

It is not surprising that the mean is close to 0 (neutral), since the -1 and +1 values cancel each other out after recoding. This can also be seen from the similar counts of -1 and 1.

Nonetheless, the slight positive average enjoyment signifies that I did have slightly more positive transitions overall.

<div align="center">
<figure>
  <img src="data/enjoy_time.png">
  <figcaption>Whether I enjoyed the day better compared to the previous day <br> <i>Green: Yes; Red: No; White: About the same</i></figcaption>
</figure>
</div>

For the figure above:

| Color | Value | Meaning |
| --- | --- | --- |
| Green | 1 | Yes/Enjoyed more today than yesterday |
| Red | -1 | No/Enjoyed less today than yesterday |
| Unshaded/White | 0 | About the same/Enjoyed today as much as yesterday |

There do not seem to be any patterns in when I enjoyed the day more/less compared to the previous day, based on the day of the week. But there appear to be more frequent transitions from March onwards.

&rarr; This period corresponds to a sustained period of higher stress and lower happiness levels, as observed above. It may be that my mood was less stable, or that my stress level increased with more assignments due, so there were more transitions between enjoying and not enjoying the days.

<br>
There was also a period from mid-February to early March when the enjoyment compared to the previous day was only neutral (similar to yesterday) or positive.

&rarr; This period also corresponds to a previously mentioned period -- spring break. I definitely enjoyed myself then, which explains such an observation!

#### How well did I sleep during this period?

Beyond my mood, how much sleep did I manage to get every day? Am I sleeping enough (at least 7 hours per night, based on CDC guidelines)?

<div align="center">
<figure>
  <img src="data/sleep_reqd.png" style="width:70%">
  <figcaption>Recommended hours of sleep per night<br> <i>Source: https://www.cdc.gov/sleep/about_sleep/how_much_sleep.html</i></figcaption>
</figure>
</div>

In [9]:
my_log['sleep_dur'].describe()

count                           70
mean     0 days 07:45:51.428571428
std      0 days 01:35:44.102025080
min                0 days 02:00:00
25%                0 days 06:45:00
50%                0 days 08:00:00
75%                0 days 09:00:00
max                0 days 10:30:00
Name: sleep_dur, dtype: object

<div align="center">
<figure>
  <img src="data/sleepdur.png">
  <figcaption>Count of the number of hours of sleep I had for the past 2.5 months</figcaption>
</figure>
</div>

Sweet dreams! I actually averaged 7 hours 45 minutes of sleep in the past 2.5 months!

But the least I've slept (once) was 2 hours...

<div align="center">
<figure>
  <img src="data/sleepdur_time.png">
  <figcaption>Number of hours of sleep I had for the past 2.5 months</figcaption>
</figure>
</div>

Looking across the dates, there is one obvious spike down to only 2 hours of sleep  
&rarr; This corresponded to the night before my friends and I headed to New York and I could not sleep (I was excited!)

There was also a period of shorter sleep durations in the first half of March  
&rarr; Again, this corresponded to spring break when I was with my friends in Miami.

**_Were there any differences in my sleep duration between certain periods (e.g. weekday/weekend)?_**

Now, is there also a difference in sleep duration based on the day of the week, similar to my happiness levels and enjoyment cycles?

In [10]:
sleep_df = my_log.set_index('ts')['sleep_dur'].astype('timedelta64[s]')
sleep_wd = sleep_df[is_wd]
sleep_we = sleep_df[~is_wd]
pd.concat([sleep_wd.describe(), sleep_we.describe()], axis=1, keys=['weekday', 'weekend'])

| | weekday | weekend |
| --- | --- | --- |
| count | 50 | 20 |
| mean | 0 days 07:25:48 | 0 days 08:36:00 |
| std | 0 days 01:38:49 | 0 days 01:06:15 |
| min | 0 days 02:00:00 | 0 days 06:30:00 |
| 25% | 0 days 06:30:00 | 0 days 08:00:00 |
| 50% | 0 days 07:07:30 | 0 days 08:52:30 |
| 75% | 0 days 08:30:00 | 0 days 09:07:30 |
| max | 0 days 10:00:00 | 0 days 10:30:00 |


In [11]:
sleep_cd = sleep_df[is_cd]
sleep_ncd = sleep_df[~is_cd]
pd.concat([sleep_cd.describe(), sleep_ncd.describe()], axis=1, keys=['days with class', 'days without class'])

| | days with class | days without class |
| --- | --- | --- |
| count | 41 | 29 |
| mean | 0 days 07:38:02 | 0 days 07:56:53 |
| std | 0 days 01:21:22 | 0 days 01:53:37 |
| min | 0 days 04:00:00 | 0 days 02:00:00 |
| 25% | 0 days 06:45:00 | 0 days 06:30:00 |
| 50% | 0 days 07:30:00 | 0 days 08:30:00 |
| 75% | 0 days 08:30:00 | 0 days 09:00:00 |
| max | 0 days 10:00:00 | 0 days 10:30:00 |


It seems that I slept about 1h longer on weekends than on weekdays (8h 36min vs 7h 26min on average), which isn't really that surprising, since I usually went out on the weekends.

Comparing days when I have classes and days without, there appears to be little to no difference (about 19 minutes on average).

So overall, apart from the longer weekend sleep, there does not seem to be any strong cycle for sleep duration, unlike my happiness levels and enjoyment transitions.

#### How lethargic was I during this period?

In [12]:
cnt = my_log['lethargic'].value_counts()
pct = my_log['lethargic'].value_counts(normalize=True) * 100
pd.concat([cnt, pct], axis=1, keys=['counts', '%'])

| lethargic | counts | % |
| --- | --- | --- |
| 0 | 38 | 54.285714 |
| 1 | 32 | 45.714286 |


It appears that there were slightly more 0s ("No") than 1s ("Yes"), but the split between yes/no is still fairly even; 54% of the time I felt energized!

<div align="center">
<figure>
  <img src="data/lethargic_time.png">
  <figcaption>Whether I felt lethargic upon waking up for the past 2.5 months <br> <i>Dark purple: Yes, Light red: No</i></figcaption>
</figure>
</div>

The oscillations between yes/no across consecutive days appear to have become increasingly frequent as time passed.

Notably, the first 10 days of February had a prolonged period when I felt more lethargic, corresponding to the period of higher stress I felt as mentioned previously.

There were also intermittent periods of lethargy lasting several days in late February and mid March, again corresponding to the periods when I had more exams/submissions.
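To put numbers on "oscillations" and "prolonged periods", one option is to count how often the 0/1 value flips from one day to the next, and how long each unbroken streak lasts. The sketch below runs on a made-up sequence of 8 days, not my real `lethargic` column.

```python
import pandas as pd

# Made-up sequence of daily lethargy values (1: Yes, 0: No)
lethargic = pd.Series([1, 1, 1, 0, 1, 0, 0, 1])

# True on days where the value differs from the previous day
switches = lethargic.diff().fillna(0).ne(0)
n_switches = int(switches.sum())

# Give each unbroken streak its own id, then measure the streak lengths
run_id = switches.cumsum()
run_lengths = lethargic.groupby(run_id).size()

print(n_switches)
print(run_lengths.tolist())
```

Comparing the number of switches in the first and second halves of the log would be one way to check whether the flips really got more frequent over time.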

#### How much did I spend per day?

<div align="center">
<figure>
  <img src="data/spend_bar.png">
  <figcaption>Frequency of each daily spending category</figcaption>
</figure>
</div>

The most common spending bracket/category was $0-10 (28 times), but the next most common spending bracket/category was >$50! 

This could simply be due to ">$50" being an open category / a category of larger size compared to the other evenly-spaced categories, however.

I also wondered how much I spent in the past 2.5 months, so I multiplied the counts by the mid-way value of each $10 block/category (for the open-ended ">$50" category, I assumed $55):

| Category | Mid-way value (A) | Count (B) | Subtotal (AxB) |
| --- | --- | --- | --- |
| $0-10 | $5 | 28 | $140 |
| $10-20 | $15 | 11 | $165 |
| $20-30 | $25 | 10 | $250 |
| $30-40 | $35 | 6 | $210 |
| $40-50 | $45 | 2 | $90 |
| >$50 | $55 | 13 | $715 |
| **_Total_** | --- | **_70_** | **_$1570_** |

$1570 in 70 days = $22.43 per day

That does not sound too bad to me (read: I am afraid to even peek at my bank balance)! Anyway, I'm sure it's an underestimation, considering the open category of >$50 is too broad, but it's nonetheless a reasonable estimate from the collected data. 
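The same back-of-the-envelope estimate can be reproduced in a few lines of Python. The counts come from the bar chart above; the mid-way values are the middle of each bracket, and the $55 for ">$50" is purely an assumption since that bracket has no upper limit.

```python
# Number of days in each spending bracket (from the bar chart above)
counts = {'$0-10': 28, '$10-20': 11, '$20-30': 10, '$30-40': 6, '$40-50': 2, '>$50': 13}

# Mid-way value of each bracket; 55 for '>$50' is an assumed stand-in
midpoints = {'$0-10': 5, '$10-20': 15, '$20-30': 25, '$30-40': 35, '$40-50': 45, '>$50': 55}

# Estimated spending per bracket, then overall
subtotals = {cat: counts[cat] * midpoints[cat] for cat in counts}
total = sum(subtotals.values())
n_days = sum(counts.values())
per_day = total / n_days

print(subtotals)
print(f'Estimated total: ${total} over {n_days} days = ${per_day:.2f} per day')
```

Changing the assumed value for ">$50" shows how sensitive the estimate is to that one open-ended bracket.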

Looking at my daily spendings across time next...

<div align="center">
<figure>
  <img src="data/spending_time.png">
  <figcaption>How much I spent per day for the past 2.5 months <br><i>(Darker colored boxes show higher spending)</i></figcaption>
</figure>
</div>

It can be seen that the daily spending is usually below $20, but there seem to be somewhat cyclical spikes to $50 or more on the weekends or in the days leading up to them. That's somewhat expected, since I am usually out on Saturdays and Sundays.

There is also a short period in the first half of March when the daily spending was consistently $50 or more, which corresponds to (again) when I was in Miami on spring break.

#### What kinds of activities did I partake in during this period?

In the figure below, grids shaded dark purple represent values of 1 (the activity took place for the particular day), while grids shaded purple represent values of 0 (the activity did not happen).

I spent an approximately equal number of days going out with friends and going for classes, but they appear to be mutually exclusive for the most part! If I went for classes, I did not go out with my friends (and vice versa).

One thing I noticed was that `attend_class` did not always correspond to my weekly class schedule (which means I did not go for class, sometimes...) But I found that **I always attended class on Tuesday!**

> I love my Tuesday classes (emphasis: COMM2550 is on Tuesday)

It is also apparent that _I have not exercised for the past 2.5 months_......

<div align="center">
<figure>
  <img src="data/activities_time.png">
  <figcaption>Whether an activity was one of the activities I did in the day, for the past 2.5 months</figcaption>
</figure>
</div>

At this juncture I'm sure you're wondering...

#### Are there any correlations between the variables?

From everything that we've analyzed so far, I'm sure it can be gathered that many of the variables appear related. 

During the time periods when I felt more stressed, I felt less happy. When I felt more stressed, I also felt more lethargic. These time periods also aligned with my exam/submission dates.

But are there any concrete, numeric values to prove these correlations? We can calculate some correlation coefficients for the continuous variables. Note that categorical values recoded as integers are also included.

In [13]:
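# Drop columns that are not numeric codes, plus 'exercise' (always 0 in this log, so its correlation is undefined)
corr_df = my_log.drop(['bed_time', 'wake_time', 'activities', 'spent_amt', 'exercise'], axis=1)
# Pairwise Pearson correlation between the remaining columns
corr_mat = corr_df.corr()
corr_mat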

| | ts | stress_level | lethargic | happy_level | enjoy_more | day_of_week | spent_amt_cd | sleep_dur | num_activities | attend_class | study_work | go_out | entertain | coffee | exam |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ts | 1.0 | 0.394683 | -0.248586 | -0.120006 | -0.222627 | -0.029351 | -0.148201 | -0.099832 | 0.099361 | -0.04507877 | 0.324201 | 0.131155 | -0.231938 | 0.038579 | -0.1147882 |
| stress_level | 0.394683 | 1.0 | -0.101516 | -0.556554 | -0.337963 | -0.101153 | -0.4086 | 0.043229 | 0.49097 | 0.1290349 | 0.689786 | -0.136359 | -0.291409 | 0.462832 | 0.06136339 |
| lethargic | -0.248586 | -0.101516 | 1.0 | -0.048463 | -0.167255 | -0.158969 | 0.140215 | -0.204386 | 0.012256 | 0.2458538 | -0.074505 | -0.020199 | -0.044219 | -0.014997 | -0.1431641 |
| happy_level | -0.120006 | -0.556554 | -0.048463 | 1.0 | 0.51656 | 0.2445 | 0.623082 | -0.118051 | -0.357455 | -0.3132406 | -0.605605 | 0.536166 | 0.26739 | -0.35746 | -0.315167 |
| enjoy_more | -0.222627 | -0.337963 | -0.167255 | 0.51656 | 1.0 | 0.170624 | 0.208108 | -0.09413 | -0.107284 | -0.1133836 | -0.320697 | 0.216001 | 0.138292 | -0.127091 | 0.03851448 |
| day_of_week | -0.029351 | -0.101153 | -0.158969 | 0.2445 | 0.170624 | 1.0 | 0.198647 | 0.145182 | -0.230856 | -0.5424424 | -0.214466 | 0.348862 | 0.105104 | -0.040628 | -0.1069888 |
| spent_amt_cd | -0.148201 | -0.4086 | 0.140215 | 0.623082 | 0.208108 | 0.198647 | 1.0 | -0.400111 | -0.320241 | -0.2750059 | -0.515643 | 0.621082 | -0.172147 | -0.110338 | -0.2561037 |
| sleep_dur | -0.099832 | 0.043229 | -0.204386 | -0.118051 | -0.09413 | 0.145182 | -0.400111 | 1.0 | -0.04452 | 0.05399612 | 0.019958 | -0.317461 | 0.256431 | -0.010464 | -0.1163022 |
| num_activities | 0.099361 | 0.49097 | 0.012256 | -0.357455 | -0.107284 | -0.230856 | -0.320241 | -0.04452 | 1.0 | 0.4429031 | 0.621614 | 0.005307 | 0.14099 | 0.48033 | 0.2689805 |
| attend_class | -0.045079 | 0.129035 | 0.245854 | -0.313241 | -0.113384 | -0.542442 | -0.275006 | 0.053996 | 0.442903 | 1.0 | 0.294628 | -0.467275 | -0.039344 | -0.013344 | -5.2640230000000005e-17 |


Ew. That table is hard to read. 

Let me make this easier -- here are the variable pairs with the strongest correlations:

In [14]:
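# Flatten the matrix into sorted (variable, variable) pairs; drop_duplicates() removes the mirrored copy of each pair
corr_summ = corr_mat.unstack().sort_values(ascending=False).drop_duplicates()
# Keep pairs with |r| >= 0.5, excluding the self-correlations of exactly 1
corr_summ[(abs(corr_summ) < 1) & (abs(corr_summ) >= 0.5)]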

study_work      stress_level    0.689786
spent_amt_cd    happy_level     0.623082
num_activities  study_work      0.621614
spent_amt_cd    go_out          0.621082
happy_level     go_out          0.536166
enjoy_more      happy_level     0.516560
study_work      spent_amt_cd   -0.515643
attend_class    day_of_week    -0.542442
happy_level     stress_level   -0.556554
study_work      happy_level    -0.605605
dtype: float64
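One caveat with `drop_duplicates()` is that it works on the values, so two different pairs that happen to share exactly the same coefficient would be merged into one. A possible alternative is to keep only the upper triangle of the matrix before flattening it. The sketch below uses a tiny made-up dataset (the column names are borrowed from my log, the numbers are not).

```python
import numpy as np
import pandas as pd

# Tiny made-up dataset, only to demonstrate the technique
df = pd.DataFrame({
    'stress': [1, 2, 3, 4, 5],
    'happy': [5, 4, 3, 2, 1],
    'coffee': [1, 0, 1, 0, 1],
})
corr = df.corr()

# Boolean mask that is True strictly above the diagonal,
# so each pair appears once and self-correlations are excluded
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(upper_mask)

# Flatten to (variable, variable) pairs and sort from most positive to most negative
pairs = upper.stack().dropna().sort_values(ascending=False)

# Keep the strongly correlated pairs only
strong = pairs[pairs.abs() >= 0.5]
print(strong)
```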

And here is the correlation matrix, color-coded (DARKER (RED) VALUES ARE MORE STRONGLY CORRELATED):

In [15]:
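# Build a boolean mask covering the upper triangle (including the diagonal)
mask = np.zeros_like(corr_mat, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# Blank out the mirrored half so each pair is shown only once
corr_mat[mask] = np.nan
# Color the remaining cells on a fixed -1 to 1 scale and grey out the blanks
(corr_mat.style.background_gradient(cmap='twilight_shifted', axis=None, vmin=-1, vmax=1)
 .highlight_null(color='#f1f1f1').format(precision=3))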

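| | ts | stress_level | lethargic | happy_level | enjoy_more | day_of_week | spent_amt_cd | sleep_dur | num_activities | attend_class | study_work | go_out | entertain | coffee | exam |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ts | | | | | | | | | | | | | | | |
| stress_level | 0.395 | | | | | | | | | | | | | | |
| lethargic | -0.249 | -0.102 | | | | | | | | | | | | | |
| happy_level | -0.12 | -0.557 | -0.048 | | | | | | | | | | | | |
| enjoy_more | -0.223 | -0.338 | -0.167 | 0.517 | | | | | | | | | | | |
| day_of_week | -0.029 | -0.101 | -0.159 | 0.244 | 0.171 | | | | | | | | | | |
| spent_amt_cd | -0.148 | -0.409 | 0.14 | 0.623 | 0.208 | 0.199 | | | | | | | | | |
| sleep_dur | -0.1 | 0.043 | -0.204 | -0.118 | -0.094 | 0.145 | -0.4 | | | | | | | | |
| num_activities | 0.099 | 0.491 | 0.012 | -0.357 | -0.107 | -0.231 | -0.32 | -0.045 | | | | | | | |
| attend_class | -0.045 | 0.129 | 0.246 | -0.313 | -0.113 | -0.542 | -0.275 | 0.054 | 0.443 | | | | | | |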


In my opinion, the more highly correlated variables are mostly expected, and in accordance with what I have talked about previously. There were also some interesting correlations.
    
1. I felt more stress on days when I had to study/complete my assignments

2. Correspondingly, when I was more stressed, I felt less happy overall

3. Going out with friends is positively correlated with happiness and enjoyment, but also greatly increases my daily spending (I should take note when I'm out with my friends next time!)

4. The day of the week also correlates with going for class since schedules repeat weekly

5. When I slept more, I tended to spend less (hmm, maybe I was more prudent with my spending when I had more sleep!)

6. At the same time, when I went out with my friends, I slept noticeably less! This isn't very surprising if I consider that my friends and I were travelling on most of those days when I went out with them, so I naturally had fewer hours of sleep (we wanted to make the most of our time at the location during our travel!)

But some correlations were slightly unexpected/odd and stood out to me:

1. While it is not surprising that drinking coffee is moderately and positively correlated with stress level, it appears that drinking coffee is not correlated with lethargy or sleep duration

2. Sleep duration is moderately correlated with spending amount, but not correlated with stress
 
3. Having an exam is surprisingly not correlated with stress or studying, which seems slightly odd

4. Spending more money is highly correlated with greater happiness and less stress (in fact, it is the second most highly correlated pair of variables...!)

#### Is drinking coffee really not associated with lethargy or sleep duration?

I definitely expected that I would drink more coffee when I feel lethargic, but it seems that isn't the case, according to the correlation matrix. Let's look at the data points.

<div align="center">
<figure>
  <img src="data/corr_lethargic.png">
  <figcaption>Coffee, sleep duration, and lethargy</figcaption>
</figure>
</div>

Indeed, the data points for coffee days and non-coffee days cluster over much the same range of sleep durations!

For both coffee days and non-coffee days, lethargy also does not clearly separate longer sleep durations from shorter ones.

So, it seems that I do not drink coffee more often when I'm feeling lethargic. Interesting.

> I drink coffee not because I'm tired, but for... _fun_?
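One quick way to double-check this kind of "no association" reading is to compare how often I drank coffee on lethargic days versus other days. The snippet below is only a minimal sketch: the records are made-up toy values (not my real log), and the names `days` and `share` are hypothetical.

```python
def share(flags):
    """Fraction of True values in a list of booleans."""
    return sum(flags) / len(flags)

# Made-up toy records, NOT my real log: (felt_lethargic, drank_coffee)
days = [
    (True, True), (True, False), (True, True), (True, False),
    (False, True), (False, False), (False, True), (False, False),
]

# Share of coffee days within each group
coffee_when_lethargic = share([coffee for lethargic, coffee in days if lethargic])
coffee_when_not = share([coffee for lethargic, coffee in days if not lethargic])

print(f"coffee on lethargic days: {coffee_when_lethargic:.0%}")
print(f"coffee on other days:     {coffee_when_not:.0%}")
```

If the two shares come out close to each other, as in this toy example, lethargy tells me very little about whether I had coffee that day.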

#### Are sleep duration and stress level really not related to each other?

I would have believed that I sleep less when I'm feeling more stressed, or that I would feel more stressed if I do not have sufficient sleep. Is that really not the case based on the data?

<div align="center">
<figure>
  <img src="data/corr_sleep.png">
  <figcaption>Sleep duration and stress level for the past 2.5 months</figcaption>
</figure>
</div>

The line graphs show that sleep duration and stress moved together (a positive relationship) during certain periods, with the stress trend lagging slightly behind that of sleep duration.

Thinking back, this slight delay might be due to me being unable to sleep as I was stressed, or also because I felt stressed when I slept less the previous night.

However, between Feb 15 - Mar 15 (again, the period leading up to spring break), there is instead an obvious inverse relationship between sleep duration and stress. For this period, I recall sleeping much less as my friends and I were trying to maximize our time travelling!

This significant contrast could explain why the overall correlation is weak to none.

> Being stressed does not always mean less sleep or vice versa!
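Both effects (periods with opposite relationships cancelling out, and stress trailing sleep by about a day) can be illustrated with a few lines of Python. This is a rough sketch under my own assumptions: the numbers are made-up toy values rather than my real log, and `pearson` and `lagged_pearson` are hypothetical helper names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_pearson(xs, ys, lag):
    """Correlate xs on day t with ys on day t + lag (lag must be >= 1)."""
    return pearson(xs[:-lag], ys[lag:])

# Made-up toy values, NOT my real log.
# Days 0-5: stress rises and falls with sleep; days 6-11: the opposite.
sleep_hours = [6, 7, 8, 7, 6, 5, 6, 7, 8, 7, 6, 5]
stress      = [3, 4, 5, 4, 3, 2, 4, 3, 2, 3, 4, 5]

overall = pearson(sleep_hours, stress)
first_period = pearson(sleep_hours[:6], stress[:6])
second_period = pearson(sleep_hours[6:], stress[6:])
print(f"overall: {overall:.2f}, first: {first_period:.2f}, second: {second_period:.2f}")

# Second toy series: stress follows the sleep pattern one day later.
sleep_b  = [8, 6, 7, 5, 8, 6, 7]
stress_b = [5, 6, 4, 5, 3, 6, 4]

same_day = pearson(sleep_b, stress_b)
next_day = lagged_pearson(sleep_b, stress_b, lag=1)
print(f"same day: {same_day:.2f}, one day later: {next_day:.2f}")
```

In the first toy series, a strong positive period and a strong negative period cancel out in the overall coefficient. In the second, the relationship only shows up clearly once stress is shifted by a day.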

Considering that there are stronger correlations between sleep duration and other variables, as well as my prior observations (e.g. that I sleep less when I go out with friends), I was curious if my sleep and lethargy varied based on my stress level and whether I went out with my friends.

<div align="center">
<figure>
  <img src="data/sleep_other.png">
  <figcaption>Sleep duration and lethargy for the past 2.5 months, grouped by happiness level and going out with friends</figcaption>
</figure>
</div>

It can be observed from the graphs that:

1. **Sleep duration and going out**  
Sleep duration is indeed quite varied. When I did not go out with friends, I generally slept more than when I went out with them.

2. **Stress level and going out**  
When I did not go out with my friends, my stress levels were higher more often (perhaps because I had more assignments/exams &rarr; more stress &rarr; I chose not to go out with friends)

3. **Stress level, lethargy and going out**  
Being more stressed did not necessarily mean I would feel more lethargic. Moreover, when I went out with my friends, I felt less stressed and less lethargic more often than when I did not go out with them.

4. **Sleep duration, lethargy and going out**  
When I did go out with my friends, sleep duration varied more, and lethargy was related to my sleep duration. But when I did not go out with my friends, sleep duration was not really related to lethargy.

> Overall, sleep duration is affected by a multitude of factors and not just stress level. Moreover, being more stressed could also be related to less sleep as it means I have deadlines to meet for assignments/exams, but at the same time, having more sleep could also mean I feel less stressed the next day.

#### Did I really not feel more stressed or study more when I had exams?

Did I really not feel more stressed or study on the day I had an exam?! That seems to contradict my previous observation that I feel more stressed and less happy close to exams and assignment deadlines.

<div align="center">
<figure>
  <img src="data/corr_exam.png">
  <figcaption>Stress level for the past 2.5 months, with exam days and study days marked <br><i>Red dashed line: Exam day; Green dotted line: Study day</i></figcaption>
</figure>
</div>

In the graph above, red dashed lines indicate the days when I had an exam; green dotted lines indicate the days when I studied.

It all becomes apparent from the graph: stress level is usually higher in the days just preceding an exam, but drops on the day of the exam. This could be because I answered the questionnaire near the end of the day, when I had already taken the exam and felt much LESS stressed!

It can also be observed that stress level usually increases when study days cluster together, and stays at higher levels when study days are more frequent.

> Exams are still related to stress levels or studying, just that the indicators of stress and studying are displaced slightly earlier, prior to the exam.
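To make this "displaced earlier" pattern concrete, stress can be averaged by the number of days left until the next exam instead of by exam day alone. The sketch below uses made-up toy values (not my real log), and the function name and the `max_lead` window are my own hypothetical choices.

```python
from collections import defaultdict

def mean_stress_by_days_to_exam(stress, is_exam_day, max_lead=2):
    """Average stress grouped by days remaining until the next exam (0 = exam day)."""
    buckets = defaultdict(list)
    for day, level in enumerate(stress):
        for lead in range(max_lead + 1):
            target = day + lead
            if target < len(stress) and is_exam_day[target]:
                buckets[lead].append(level)
                break  # count each day once, against its nearest upcoming exam
    return {lead: sum(v) / len(v) for lead, v in sorted(buckets.items())}

# Made-up toy values, NOT my real log
stress      = [3, 6, 8, 4, 2, 5, 7, 3]
is_exam_day = [False, False, False, True, False, False, False, True]

by_lead = mean_stress_by_days_to_exam(stress, is_exam_day)
for lead, avg in by_lead.items():
    print(f"{lead} day(s) before exam: mean stress {avg:.1f}")
```

In this toy series the average is highest one day before the exam and lowest on the exam day itself, which is the same shape I see in my graph. A same-day correlation between "exam" and "stress" would miss it.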

And now for the million dollar question:

#### Does spending more money really mean more happiness and less stress?

<div align="center">
<figure>
  <img src="data/corr_money.png">
  <figcaption>Happiness level for the past 2.5 months, grouped by stress level and categorized by daily spending bracket</figcaption>
</figure>
</div>

From the plots above, it can be seen that the data points appear to cluster toward higher happiness and lower stress ratings when daily spending increases. So, spending more money does mean more happiness and lower stress!

_**No!**_

The plots also show that, as daily spending increases, the data points come increasingly from days when I went out with friends (vs. days when I did not). This means that higher spending could simply be a result of me going out!

Moreover, given the high correlations between stress level, happiness level, daily spending and going out with friends, it could be that spending just tended to be higher when going out with friends, and that when I go out or am with my friends, my stress level is naturally lower while my happiness level is higher.
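A simple way to probe this kind of confounding is to compare happiness on high- and low-spending days separately for days out and days in. The following is a minimal sketch with made-up toy records (not my real log); the tuple layout and the `mean_happiness` helper are hypothetical.

```python
from statistics import mean

# Made-up toy records, NOT my real log: (went_out, high_spending, happiness 0-10)
days = [
    (True, True, 8), (True, True, 9), (True, True, 8), (True, True, 9),
    (True, False, 8), (True, False, 9),
    (False, True, 5),
    (False, False, 5), (False, False, 6), (False, False, 4),
]

def mean_happiness(records, went_out=None, high_spending=None):
    """Mean happiness over records matching the given filters (None = ignore)."""
    scores = [
        happy for out, spend, happy in records
        if (went_out is None or out == went_out)
        and (high_spending is None or spend == high_spending)
    ]
    return mean(scores)

# Naive comparison: high-spending days look happier overall
naive_gap = (mean_happiness(days, high_spending=True)
             - mean_happiness(days, high_spending=False))

# The same comparison within each "went out" group
gap_out = mean_happiness(days, True, True) - mean_happiness(days, True, False)
gap_home = mean_happiness(days, False, True) - mean_happiness(days, False, False)

print(f"naive gap: {naive_gap:.2f}")
print(f"gap on days out: {gap_out:.2f}, gap on days in: {gap_home:.2f}")
```

In this toy data the happiness gap between high- and low-spending days disappears once days out and days in are looked at separately, which is what I would expect if going out drives both spending and happiness. My real log is too small to settle this, so I treat it only as a sanity check.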

Ultimately, as people always say:

> Causality cannot be established from correlations!

### Conclusion &#x1F4B8;

Money does not guarantee happiness. I repeat: money ≠ happiness!

This daily log assignment has truly allowed me to gain new and interesting insights about myself, about my mood, my daily spending... and my lack of exercise.

It was definitely not straightforward or easy trying to look through all the patterns in the data (and to be honest, I feel like there's more to be done, but there's so much here already), but I have surely learned a lot!

And now... it's probably a good time for me to get up from my desk, and start exercising.