# Introduction

I keep a diary in Obsidian, a Markdown writing application that supports features such as front matter for attaching metadata to documents. Every document in my diary has three main fields, which are the ones I want to extract and analyze:
- **Date**: as the name suggests, the date on which I wrote the entry or the day the entry is about.
- **Feeling**: a subjective rating system that I developed to keep track of my mood during the day.
- **Important**: indicates whether I considered the day to be special for any particular reason.

The purpose of this notebook is to extract these fields from each document and compile them into a CSV file. I also derive features such as the year, month, week, and weekday from the date, plus a numeric encoding of my mood so that it is easier to track later.
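
To make the extraction step concrete, here is a minimal sketch of how the three fields could be read from an entry's front matter using only the standard library. The `parse_front_matter` function and the sample entry are hypothetical illustrations; they are not the `extract_frontmatter` helper imported later in this notebook, whose implementation is not shown here.

```python
from typing import Dict

# Made-up diary entry, used only for illustration
SAMPLE_ENTRY = """---
date: 2023-05-14
feeling: happy
important: false
---
Today I went for a long walk.
"""


def parse_front_matter(text: str) -> Dict[str, str]:
    """Return the key/value pairs found between the leading '---' delimiters."""
    lines = text.splitlines()
    # Front matter must start on the very first line
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return fields
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not simple "key: value" pairs
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the block as invalid


print(parse_front_matter(SAMPLE_ENTRY))
```

This sketch only handles flat `key: value` lines and keeps every value as a string. Nested or typed front matter would call for a proper YAML parser.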

## Set Up

In [None]:
# Install dependencies
%pip install polars "black[jupyter]"

## Data Extraction

In [None]:
import sys
from pathlib import Path

# Add the "src" directory to the Python path
src_path = Path.cwd().parent / "src"
sys.path.append(str(src_path))

In [None]:
# Import util functions
from extract_frontmatter import extract_frontmatter, write_entries_to_csv

In [None]:
# Define paths
diary_entries = Path("../data/raw/diary/")
interim_csv_path = Path("../data/interim/diary.csv")
processed_csv_path = Path("../data/processed/diary.csv")

In [None]:
# Extract fields from front matter
raw_entries = extract_frontmatter(diary_entries)
raw_entries[:5]

In [None]:
# Write fields to interim CSV
write_entries_to_csv(raw_entries, interim_csv_path)
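
As an illustration of what this step amounts to, the sketch below writes a list of dictionaries to CSV with the standard library's `csv.DictWriter`. It assumes each entry is a dictionary keyed by field name, which may not match the actual shape of `raw_entries`. The `write_rows` function, the `FIELDNAMES` list, and the sample rows are hypothetical and are not the `write_entries_to_csv` helper itself.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Assumed column order; the real helper may use different names
FIELDNAMES = ["date", "feeling", "important"]


def write_rows(rows: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Write one CSV line per entry, preceded by a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing keys are written as empty strings


# Write to an in-memory buffer so the example leaves no files behind
buffer = io.StringIO()
write_rows(
    [
        {"date": "2023-05-14", "feeling": "happy", "important": "false"},
        {"date": "2023-05-15", "feeling": "tired"},
    ],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`, as the `csv` module documentation recommends.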

## Data Loading and Feature Engineering

In [None]:
# Import polars
import polars as pl

In [None]:
df = pl.read_csv(interim_csv_path, try_parse_dates=True)

In [None]:
df.shape

In [None]:
df.sample(5)

In [None]:
# Get a list of the feelings I used to track my mood
df["feeling"].unique().to_list()

In [None]:
# Map each feeling label to a score from -2 (very negative) to 2 (very positive)
feeling_mapping = {
    "unknown": 0,
    "very sad": -2,
    "sad": -1,
    "alone": -1,
    "drama": -1,
    "sick": -1,
    "stressful": -1,
    "angry": -1,
    "tired": -1,
    "emotional": -1,
    "normal": 0,
    "productive": 1,
    "inspired": 1,
    "happy": 1,
    "very happy": 2,
}
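
Any label that is missing from this mapping ends up as a null in `encoded_mood`, because the lookup falls back to `None`. A quick check such as the sketch below can surface those labels before encoding. The `find_unmapped` function, the reduced `example_mapping`, and the sample labels are hypothetical and only serve as illustration.

```python
from typing import Dict, Iterable, List


def find_unmapped(labels: Iterable[str], mapping: Dict[str, int]) -> List[str]:
    """Return the distinct labels that have no score in the mapping, sorted."""
    return sorted({label for label in labels if label not in mapping})


# Hypothetical subset of the mapping and some made-up labels
example_mapping = {"sad": -1, "normal": 0, "happy": 1}
observed = ["happy", "sad", "Happy", "anxious", "normal", "anxious"]

print(find_unmapped(observed, example_mapping))
```

In the notebook itself, the same check could be run as `find_unmapped(df["feeling"].unique().to_list(), feeling_mapping)`. Note that the lookup is case-sensitive, so `"Happy"` and `"happy"` count as different labels.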

In [None]:
# Create features like year, month, day, week, weekday, mood and encoded_mood
df = (
    df.with_columns(
        year=pl.col("date").dt.year().cast(pl.Int16),
        month=pl.col("date").dt.month().cast(pl.Int8),
        day=pl.col("date").dt.day().cast(pl.Int8),
        week=pl.col("date").dt.week().cast(pl.Int8),
        weekday=pl.col("date").dt.weekday().cast(pl.Int8),
        mood=pl.col("feeling").cast(pl.Categorical),
        # Labels missing from feeling_mapping become null
        encoded_mood=pl.col("feeling")
        .map_elements(lambda x: feeling_mapping.get(x), return_dtype=pl.Int64)
        .cast(pl.Int8),
    )
    .select(
        [
            "year",
            "month",
            "day",
            "week",
            "weekday",
            "mood",
            "encoded_mood",
        ],
    )
    .sort(["year", "month", "day"])
)
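
One detail worth keeping in mind: to my understanding, Polars' `dt.week()` and `dt.weekday()` follow ISO 8601, so weeks are numbered 1 to 53 and weekdays run from 1 (Monday) to 7 (Sunday). The standard-library sketch below mirrors those features for a single date and shows an edge case at a year boundary. The `date_features` function and the extra `iso_year` key are hypothetical additions for illustration, not columns produced by this notebook.

```python
from datetime import date
from typing import Dict


def date_features(d: date) -> Dict[str, int]:
    """Return calendar and ISO 8601 features for a single date."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return {
        "year": d.year,  # calendar year
        "month": d.month,
        "day": d.day,
        "week": iso_week,  # ISO week number, 1 to 53
        "weekday": iso_weekday,  # 1 = Monday ... 7 = Sunday
        "iso_year": iso_year,  # year the ISO week belongs to
    }


# 1 January 2021 was a Friday that belongs to ISO week 53 of 2020
print(date_features(date(2021, 1, 1)))
```

Because the ISO week can belong to the previous or next calendar year, grouping by `year` and `week` together may split or merge weeks around New Year. Grouping by an ISO year instead is one way to avoid that.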

In [None]:
df.sample(5)

## Data Export

In [None]:
# Write final CSV
df.write_csv(processed_csv_path)