QuantifiedMe
============

**Created by:** Erik Bjäreholt   ([GitHub](https://github.com/ErikBjare), [LinkedIn](https://www.linkedin.com/in/erikbjareholt/))
<br><b>Get the latest version at: https://github.com/ErikBjare/quantifiedme</b>

A bunch of helpful visualizations for managing behavior, productivity, health, and life in general.

> *I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.*
>
>   \-  ***William Thomson*** (Lord Kelvin), Lecture on "Electrical Units of Measurement" (1883)

# Table of contents

TODO: Build automatically

- [Setup](#Setup)
  - [Set window title](#Set-window-title)
- [Load data](#Load-data)
  - [Load ActivityWatch data](#Load-ActivityWatch-data)
  - [Load SmarterTime data](#Load-SmarterTime-data)
  - [Load Toggl data](#Load-Toggl-data)
- [Annotate](#Annotate-data)
- [Visualize](#Visualize)
    

# Setup

First we do some imports, and set some variables used in the rest of the program.

In [None]:
from collections import defaultdict
from datetime import datetime, time, date, timedelta, timezone
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import pytz

from IPython.utils import io
from IPython.display import display, HTML

import aw_core
from aw_core.models import Event
import aw_research, aw_research.classify
from aw_research.classify import _union_no_overlap

import scripts.location as locate
import build_dashboard as lib

your_timezone = pytz.timezone('Europe/Stockholm')

## Set window title

The cell below sets the window title to something more descriptive so that ActivityWatch can keep track of it. (Especially useful in 

In [None]:
%%javascript
document.title='QuantifiedMe - Jupyter'

# Load data

We want to load data from several sources. Each subsequent source fills any gaps left by the previous sources, which is made possible by `_union_no_overlap`.
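
The gap-filling idea can be sketched as follows. This is a minimal illustration on plain `(start, end)` tuples, not the actual `_union_no_overlap` implementation from `aw_research`; the helper name `fill_gaps` and the interval representation are assumptions made for the example, and the primary intervals are assumed not to overlap each other.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), hypothetical stand-in for an event


def fill_gaps(primary: List[Interval], secondary: List[Interval]) -> List[Interval]:
    """Keep every primary interval; add only the parts of secondary
    intervals that do not overlap any primary interval."""
    primary = sorted(primary)
    result = list(primary)
    for start, end in sorted(secondary):
        cursor = start
        for p_start, p_end in primary:
            if p_end <= cursor:
                continue  # primary interval lies entirely before the cursor
            if p_start >= end:
                break  # primary interval lies entirely after this one
            if p_start > cursor:
                result.append((cursor, p_start))  # uncovered piece before it
            cursor = max(cursor, p_end)
        if cursor < end:
            result.append((cursor, end))  # uncovered tail
    return sorted(result)


# The secondary source only contributes where the primary source has gaps
merged = fill_gaps([(0, 2), (5, 7)], [(1, 6), (8, 9)])
print(merged)
```

In this sketch the first source always wins where both sources have data, which matches the "fill gaps from previous sources" description above.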

## Load ActivityWatch data

Retrieves events from aw-server: the query combines active-window events with browser history and filters them by AFK/audible status.

In [None]:
days_back = 90
now = datetime.now()
since = datetime.combine((now - timedelta(days=days_back)).date(), time())
events_aw = aw_research.classify.get_events(since=since, end=now, include_smartertime=False, include_toggl=False)
for e in events_aw:
    e.data['$source'] = 'activitywatch'
events = events_aw

## Load SmarterTime data

SmarterTime is an Android app that tracks your device usage. This loads an ActivityWatch bucket that I converted from the app export.

In [None]:
smartertime_awbucket_path = 'smartertime2activitywatch/smartertime_export_2018-12-23_f64e5977.awbucket.json'

events_smartertime = aw_research.classify._get_events_smartertime(since, filepath=smartertime_awbucket_path)
for e in events_smartertime:
    e.data['$source'] = 'smartertime'
events = _union_no_overlap(events, events_smartertime)

## Load Toggl data

In [None]:
import logging

from toggl import api, utils
logging.getLogger('toggl.utils').setLevel(logging.WARNING)

import_toggl = True

if import_toggl:
    # [x] TODO: For some reason this doesn't get all history, consider just switching back to loading from export (at least for older events)
    # The maintainer of togglcli fixed it quickly, huge thanks! https://github.com/AuHau/toggl-cli/issues/87
    
    def entries_from_all_workspaces():
        # [ ] TODO: Several issues, such as not setting the user of each TimeEntry and setting the same workspace on every TimeEntry
        workspaces = list(api.Workspace.objects.all())
        print(f'Found {len(workspaces)} workspaces: {list(w.name for w in workspaces)}')
        # Flatten the per-workspace results into a single list
        entries = [entry
                   for workspace in workspaces
                   for entry in api.TimeEntry.objects.all_from_reports(start=since, stop=now, workspace=workspace)]
        for e in entries[-10:]:
            print(e['workspace'], e['project'])
        return [e.to_dict() for e in entries]
    
    def entries_from_main_workspace():
        entries = list(api.TimeEntry.objects.all_from_reports(start=since, stop=now))
        return [e.to_dict() for e in entries]
    
    entries = entries_from_main_workspace()
    print(f"Found {len(entries)} time entries in Toggl")
    events_toggl = []
    for e in entries:
        if e['start'] < since.astimezone(timezone.utc):
            continue
        project = e['project'].name if e['project'] else 'no project'
        description = e['description'] or 'no description'
        events_toggl.append(Event(timestamp=e['start'].isoformat(), 
                                  duration=e['duration'] / 1000,
                                  data={'app': project, 
                                        'title': f"{project} -> {description}",
                                        '$source': 'toggl'}))
    events_toggl = sorted(events_toggl, key=lambda e: e.timestamp)
    events = _union_no_overlap(events, events_toggl)

## Create fake data

In [None]:
data_weights = {
    
}

def create_fake_events():
    yield Event(duration=0, data={'': ''})

## Verify data
A few sanity checks to make sure there are no bugs in the underlying code.

In [None]:
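# Event timestamps are timezone-aware, so make the bounds aware before comparing
since_aware, now_aware = since.astimezone(pytz.utc), now.astimezone(pytz.utc)
# Ensure no events are older than `since` or end after `now`
assert all(since_aware <= e.timestamp for e in events)
assert all(e.timestamp + e.duration <= now_aware for e in events)
# Ensure no events overlap
assert all(e1.timestamp + e1.duration <= e2.timestamp for e1, e2 in zip(events[:-1], events[1:]))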

# Annotate data

## Define tagging rules

First we need to specify rules used in categorization and tagging.

The rules are specified as a list of tuples in the format `(regex, category, parent_category)`. You can write them within the notebook or load them from a CSV file.

In [None]:
classes = [
    # Media
    (r'Spotify|spotify.com', 'Music', 'Media'),
    (r'YouTube|youtube.com', 'Video', 'Media'),
    
    # Work
    (r'github.com|stackoverflow.com', 'Programming', 'Work'),
    (r'[Aa]ctivity[Ww]atch|aw-.*', 'ActivityWatch', 'Programming'),
    (r'[Qq]uantified[Mm]e', 'QuantifiedMe', 'Programming'),
]

# Now load the classes from within the notebook, or from a CSV file.
load_from_csv = True
if load_from_csv:
    aw_research.classify._init_classes(class_csv_filename="./aw-research/category_regexes.csv")
else:
    aw_research.classify._init_classes(new_classes=classes)

## Annotate events with tags and category

Now we annotate the events with the tags and categories defined above.
Classification adds `$tags` and `$category_hierarchy` fields to each event's data.
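
To make the idea concrete, here is a rough sketch of how `(regex, category, parent_category)` rules could be turned into tags and a hierarchy. It is not the `aw_research.classify` implementation: the helper names, the `' -> '` separator, and the "deepest match wins" choice are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str, str]  # (regex, category, parent_category)

rules: List[Rule] = [
    (r'YouTube|youtube.com', 'Video', 'Media'),
    (r'github.com|stackoverflow.com', 'Programming', 'Work'),
    (r'[Aa]ctivity[Ww]atch|aw-.*', 'ActivityWatch', 'Programming'),
]


def hierarchy(category: str, parents: Dict[str, str]) -> List[str]:
    """Walk parent links up to the root, returning root-first order."""
    chain = [category]
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return chain[::-1]


def classify_text(text: str, rules: List[Rule]) -> Tuple[Set[str], str]:
    """Return (tags, hierarchy string) for one piece of event text."""
    parents = {cat: parent for _, cat, parent in rules}
    tags = {cat for regex, cat, _ in rules if re.search(regex, text)}
    if not tags:
        return {'Uncategorized'}, 'Uncategorized'
    # Assumption: the deepest matching category becomes the event's category
    deepest = max(sorted(tags), key=lambda c: len(hierarchy(c, parents)))
    return tags, ' -> '.join(hierarchy(deepest, parents))


tags, cat = classify_text('aw-server - GitHub - github.com', rules)
print(tags, cat)
```

A title that matches several rules gets all matching categories as tags, while the hierarchy string follows the parent links of the most specific match.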

In [None]:
events = aw_research.classify.classify(events)

# Visualize

There are many ways to visualize the data; here are some of them.


## Daily time plot

Useful to see how much you've engaged in a particular activity over time.
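
The plot is built from two simple computations: a per-day sum of durations, and a trailing simple moving average (SMA) over those daily totals. The sketch below spells out both steps with the standard library only, on made-up sample data. The function names are hypothetical, and the notebook itself relies on pandas `resample` and `rolling` for the same purpose.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, List, Optional, Tuple


def daily_hours(events: List[Tuple[datetime, timedelta]]) -> Dict[date, float]:
    """Sum durations per calendar day, filling days without events with 0."""
    totals: Dict[date, float] = defaultdict(float)
    for start, duration in events:
        # Each event is credited to the day on which it starts
        totals[start.date()] += duration.total_seconds() / 3600
    first, last = min(totals), max(totals)
    n_days = (last - first).days + 1
    return {first + timedelta(days=i): totals.get(first + timedelta(days=i), 0.0)
            for i in range(n_days)}


def sma(values: List[float], window: int, min_periods: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until min_periods values exist."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk) if len(chunk) >= min_periods else None)
    return out


# Hypothetical events: (start, duration)
sample = [
    (datetime(2019, 1, 1, 9), timedelta(hours=1)),
    (datetime(2019, 1, 1, 14), timedelta(hours=2)),
    (datetime(2019, 1, 2, 10), timedelta(minutes=30)),
    (datetime(2019, 1, 4, 11), timedelta(hours=3)),
]
per_day = daily_hours(sample)
smoothed = sma(list(per_day.values()), window=3, min_periods=2)
print(per_day)
print(smoothed)
```

Filling empty days with zero matters: without it, the moving average would skip idle days and overstate the typical daily time.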

In [None]:
def categorytime_per_day(events, category):
    events = [e for e in events if category in e.data["$category_hierarchy"]]
    if not events:
        raise Exception('No events to calculate on')
    ts = pd.Series([e.duration.total_seconds() / 3600 for e in events], 
                   index=pd.DatetimeIndex([e.timestamp for e in events]).tz_convert("UTC"))
    return ts.resample('1D').apply('sum')

def plot_category(cat, big=False):
    fig = plt.figure(figsize=(18, 4 if big else 2.5))
    #aw_research.classify._plot_category_daily_trend(events, [cat])
    ts  = categorytime_per_day(events, cat)
    ts.plot(label=f"{cat}: daily", legend=True)
    ts.rolling(7, min_periods=3).mean().plot(label=f"{cat}: 7d SMA", legend=True)
    ts.rolling(30, min_periods=7).mean().plot(label=f"{cat}: 30d SMA", legend=True)
    plt.legend(loc='upper right')
    plt.ylim(0)
    plt.grid(linestyle='--')
    plt.tight_layout()

# All logged activity
plot_category('', big=True)

In [None]:
category_wages = {
    #"Work": 50,
    "ActivityWatch": 100,
    "QuantifiedMe": 100,
    "Thankful": 200,
    "School": 300,
    #"Self-directed": 200,
    #"Maths": 300,
    #"Control": 200,
}


def plot_wages(events, category_wages):
    df = pd.DataFrame()
    for cat, wage in category_wages.items():
        df[cat] = wage * categorytime_per_day(events, cat)
    df.plot.area(label='total', stacked=True, legend=True, figsize=(16, 5))
    df.sum(axis=1).rolling(7).mean().plot(label='Total 7d SMA', legend=True)
    plt.grid(linestyle='--')
    plt.tight_layout()
    
plot_wages(events, category_wages)

In [None]:
# Work-related
plot_category('Work', big=True)
plot_category('ActivityWatch')
plot_category('QuantifiedMe')
plot_category('Thankful')
plot_category('School')
plot_category('Maths')

In [None]:
# Entertainment
plot_category('Media', big=True)
plot_category('Social Media')
plot_category('Video')
plot_category('Music')

## Category sunburst

Uses the category hierarchy to give an overview of how time was spent during a given period.
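
Each ring of a sunburst shows one level of the hierarchy, so every category needs the total time of itself plus all of its descendants. A minimal sketch of that aggregation is shown here, assuming the hierarchy is available as a string with levels joined by `' -> '`. The separator, the function name, and the sample data are assumptions, and this is not the plotting code in `aw_research`.

```python
from collections import defaultdict
from datetime import timedelta
from typing import Dict, List, Tuple


def time_per_level(items: List[Tuple[str, timedelta]], sep: str = ' -> ') -> Dict[str, timedelta]:
    """Credit each duration to its category and to every ancestor category."""
    totals: Dict[str, timedelta] = defaultdict(timedelta)
    for path, duration in items:
        parts = path.split(sep)
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += duration
    return dict(totals)


# Hypothetical (category hierarchy, duration) pairs
sample = [
    ('Work -> Programming -> ActivityWatch', timedelta(hours=2)),
    ('Work -> Programming', timedelta(hours=1)),
    ('Media -> Video', timedelta(minutes=30)),
]
totals = time_per_level(sample)
for path, total in sorted(totals.items()):
    print(f'{path}: {total}')
```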

In [None]:
now = datetime.now()
start_of_today = datetime.combine(now.date(), time()).astimezone(your_timezone)
events_today = [e for e in events if start_of_today < e.timestamp]

In [None]:
import builtins  # Python 3 name of the module that holds the built-in `sum`

def plot_sunburst(events):
    plt.figure(figsize=(6, 6))
    aw_research.classify._plot_category_hierarchy_sunburst(events)
    display(HTML(f"<h2>Duration: {builtins.sum((e.duration for e in events), timedelta(0))}</h2>"))

In [None]:
plot_sunburst(events_today)
#HTML("<div style='height: 60em'>test1<br>test2</div>")

In [None]:
plot_sunburst(events)

# Uncategorized

In [None]:
def time_per_keyval(events, key):
    vals = defaultdict(lambda: timedelta(0))
    for e in events:
        if key in e.data:
            vals[e.data[key]] += e.duration
        else:
            vals[f'key {key} did not exist'] += e.duration
    return vals

def print_time_per_keyval(events, key):
    from tabulate import tabulate
    l = sorted([(v, k) for k, v in time_per_keyval(events, key).items()], reverse=True)
    print(tabulate(l[:20], headers=['time', 'val']))
    
events_uncategorized = [e for e in events if 'Uncategorized' in e.data['$tags']]
print_time_per_keyval(events_uncategorized, 'title')

In [None]:
print_time_per_keyval(events, '$source')

# Locations

In [None]:
me = "erik"
locs = ['actic']

In [None]:
dfs = locate.load_all_dfs()
dfs[me].resample("24H").apply("mean").tail(5)

In [None]:
#with io.capture_output():
start_loc = datetime(2015, 1, 1)
for loc in locs:
    plt.figure(figsize=(16, 3))
    plt.title(loc)
    locate.main_plot(dfs, me, loc, start=start_loc);

# Drugs & Supplements

In [None]:
import sys
sys.path.insert(0,'./QSlang')

from qslang import main

# Testing

In [None]:
#print(dir(lib))
display(HTML('<h1>Hello, world!</h1>'))

# Notebook hacks

In [None]:
# Doesn't work in Jupyter Lab because it lacks jQuery
HTML('''<script>
code_show=true; 
function code_toggle() {
  if(code_show){
    $('div.input').hide();
    $('div.jp-InputArea').hide();
  } else {
    $('div.input').show();
    $('div.jp-InputArea').show();
  }
  code_show = !code_show
} 
$(document).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

In [None]:
CSS = """
.output {
/*
    flex-direction: row;
    flex-wrap: wrap;
*/
}

div.output_area > div.prompt {
  /*
  display: none;
  min-width: 0em;
  border-left: 0.4em solid black;
    */
}

div.output > div.output_area {
  flex-grow: 1;
  min-width: 50%;
}
"""

HTML('<style>{}</style>'.format(CSS))