# Introduction

Since March 11, 2020, I have kept a daily diary recording what I did, how the day unfolded, and how I felt. As the habit stuck, I grew curious about what the diary could reveal about me. Was I happier in the past, or am I happier now? How many bad days have I had? How many good days?  
In this notebook I set out to answer those questions and to gain some insight into myself and my life.

## Set Up

In [None]:
# Install dependencies
%pip install polars pandas matplotlib seaborn "black[jupyter]"

# Data Loading

> Before using this notebook, it's essential to first extract the metadata from my diary.

In [None]:
import polars as pl

In [None]:
# Define paths
csv_path = "../data/processed/diary.csv"
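
Since the notebook depends on the extracted metadata file, it may help to fail early with a readable message when that file is missing. The following is a minimal sketch using only the standard library; `require_file` is a hypothetical helper name, not part of the original notebook.

```python
from pathlib import Path


def require_file(path: str) -> Path:
    """Return `path` as a Path, or raise with a hint if the file is missing.

    Hypothetical helper: the name and the error message are illustrative.
    """
    resolved = Path(path)
    if not resolved.is_file():
        raise FileNotFoundError(
            f"{resolved} not found; extract the diary metadata first."
        )
    return resolved
```

In this notebook it could be used as `csv_path = require_file("../data/processed/diary.csv")` before calling `pl.read_csv`.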

In [None]:
df = pl.read_csv(
    csv_path,
    try_parse_dates=True,
)

In [None]:
# Downcast the date parts and mood columns to compact dtypes
df = df.with_columns(
    year=pl.col("year").cast(pl.Int16),
    month=pl.col("month").cast(pl.Int8),
    day=pl.col("day").cast(pl.Int8),
    week=pl.col("week").cast(pl.Int8),
    weekday=pl.col("weekday").cast(pl.Int16),
    mood=pl.col("mood").cast(pl.Categorical),
    encoded_mood=pl.col("encoded_mood").cast(pl.Int8),
)

In [None]:
df.sample(5)

# Initial Data Exploration

In [None]:
(
    df.group_by("mood")
    .agg(pl.col("mood").count().alias("count"))
    .sort("count", descending=True)
)

In [None]:
(
    df.group_by("encoded_mood")
    .agg(pl.col("encoded_mood").count().alias("count"))
    .sort("encoded_mood")
)

In [None]:
df["encoded_mood"].sum()

In [None]:
df.group_by("year").sum().select(["year", "encoded_mood"]).sort("year")

In [None]:
(
    df.filter(df["year"] == 2020)
    .group_by(["year", "month"])
    .sum()
    .select(["month", "encoded_mood"])
    .sort(["month"])
)

In [None]:
(
    df.filter(df["year"] == 2021)
    .group_by(["year", "month"])
    .sum()
    .select(["month", "encoded_mood"])
    .sort(["month"])
)

In [None]:
(
    df.filter(df["year"] == 2022)
    .group_by(["year", "month"])
    .sum()
    .select(["month", "encoded_mood"])
    .sort(["month"])
)

In [None]:
(
    df.filter(df["year"] == 2023)
    .group_by(["year", "month"])
    .sum()
    .select(["month", "encoded_mood"])
    .sort(["month"])
)

In [None]:
df.group_by("month").sum().select(["month", "encoded_mood"]).sort("month")

In [None]:
df.group_by("weekday").sum().select(["weekday", "encoded_mood"]).sort("weekday")

In [None]:
df.group_by("week").sum().select(["week", "encoded_mood"]).sort("week")

## Data Visualization

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
sns.set_theme(style="ticks", context="talk")

In [None]:
def df_by_year(year: int, df: pl.DataFrame) -> pl.DataFrame:
    return df.filter(df["year"] == year).sort(
        [
            "year",
            "month",
            "day",
        ],
    )

In [None]:
plt.figure(figsize=(5, 7))

sns.barplot(
    df.group_by("mood").agg(pl.col("mood").count().alias("count")),
    hue="mood",
    x="count",
    y="mood",
    legend=False,
    palette="rocket",
)

plt.xlabel("Count")
plt.ylabel("Mood")

plt.title("Number of Records per Mood")

plt.show()

In [None]:
df_2021 = df_by_year(2021, df)
df_2022 = df_by_year(2022, df)
df_2023 = df_by_year(2023, df)