# RescueTime Downloader

Code to collect and export RescueTime activity logs, with options to collect in hourly or minute bins. The default is hourly.

**NOTE:** Collecting your full history takes some time, depending on how many years of data you have. I recommend configuring the script below to pull data in yearly chunks, though exporting the full history in one run should also work.
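One way to get yearly chunks is to split the overall range into per-calendar-year start and end dates. The sketch below is only an illustration: `yearly_chunks` is a hypothetical helper, not part of this notebook, and it assumes dates in the same `YYYY-MM-DD` format used in the configuration cell.

```python
from datetime import date, timedelta

def yearly_chunks(start_date, end_date):
    """Split an inclusive 'YYYY-MM-DD' range into per-calendar-year (start, end) pairs."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    chunks = []
    while start <= end:
        # Each chunk ends on Dec 31 of its year, or on the overall end date
        chunk_end = min(date(start.year, 12, 31), end)
        chunks.append((str(start), str(chunk_end)))
        start = chunk_end + timedelta(days=1)
    return chunks

chunks = yearly_chunks('2018-06-15', '2020-02-01')
for chunk_start, chunk_end in chunks:
    print(chunk_start, 'to', chunk_end)
```

Each pair could then be used as `start_date` and `end_date` for one export run.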

------

## Setup and Installation

* Go to [RescueTime API](https://www.rescuetime.com/anapi/manage) and copy an API key.
* Copy credentials-sample.json to credentials.json and add your RescueTime key.
* This project depends only on the standard Python library plus Requests and Pandas.
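For reference, the layout that credentials.json needs can be inferred from the loading code in the Credentials section: a top-level `rescuetime` object holding a `KEY` entry. The sketch below prints that layout; the key value is a placeholder, and the contents of credentials-sample.json itself are an assumption here.

```python
import json

# Layout inferred from the notebook's loading code; the value is a placeholder, not a real key
sample_credentials = {"rescuetime": {"KEY": "YOUR_RESCUETIME_API_KEY"}}

credentials_text = json.dumps(sample_credentials, indent=4)
print(credentials_text)

# Round trip: these are the same two lookups the notebook performs
loaded = json.loads(credentials_text)
key_preview = loaded['rescuetime']['KEY']
```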

-----

## Dependencies

In [1]:
import requests
import os
from datetime import date, datetime, timedelta as td
import pandas as pd

----

## Credentials

In [2]:
import json

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    rescuetime_cr = credentials['rescuetime']
    KEY = rescuetime_cr['KEY']

In [3]:
baseurl = 'https://www.rescuetime.com/anapi/data?key='

In [4]:
url =  baseurl + KEY

----

## Export Dates Configuration

In [5]:
# Configure These to Your Preferred Dates
start_date = '2019-09-01'  # Start date for data
end_date   = '2019-10-31'  # End date for data

------

## Function to Get RescueTime Activities

In [18]:
# Adjustable by Time Period
def rescuetime_get_activities(start_date, end_date, resolution='hour'):
    # Configuration for Query
    # SEE: https://www.rescuetime.com/apidoc
    payload = {
        'perspective':'interval',
        'resolution_time': resolution, #1 of "month", "week", "day", "hour", "minute"
        'restrict_kind':'document',
        'restrict_begin': start_date,
        'restrict_end': end_date,
        'format':'json' #csv
    }
    
    # Setup Iteration - by Day
    d1 = datetime.strptime(payload['restrict_begin'], "%Y-%m-%d").date()
    d2 = datetime.strptime(payload['restrict_end'], "%Y-%m-%d").date()
    delta = d2 - d1
    
    activities_list = []
    
    # Iterate through the days, making a request per day
    for i in range(delta.days + 1):
        # Find iter date and set begin and end values to this to extract at once.
        d3 = d1 + td(days=i) # Add a day
        if d3.day == 1: print('Pulling Monthly Data for ', d3)

        # Update the Payload
        payload['restrict_begin'] = str(d3) # Set payload days to current
        payload['restrict_end'] = str(d3)   # Set payload days to current

        # Request
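        try:
            r = requests.get(url, params=payload) # Make Request
            iter_result = r.json() # Parse result
            # print("Collecting Activities for " + str(d3))
        except (requests.RequestException, ValueError):
            print("Error collecting data for " + str(d3))
            continue # Skip this day so missing or stale results are not appended

        # Collect every row returned for this day
        for row in iter_result['rows']:
            activities_list.append(row)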
            
    return activities_list

---

## Collect Report of Activities By Day

In [19]:
# activities_day_log = rescuetime_get_activities(start_date, end_date, 'day')

In [20]:
# activities_daily = pd.DataFrame.from_dict(activities_day_log)

In [21]:
# activities_daily.info()

In [22]:
# activities_daily.describe()

In [23]:
# activities_daily.tail()

----

## Collect Report of Activities By Hour

In [24]:
activities_hour_log = rescuetime_get_activities(start_date, end_date, 'hour')

Pulling Monthly Data for  2019-09-01
Pulling Monthly Data for  2019-10-01


In [25]:
activities_hour_log

[['2019-09-01T00:00:00',
  1668,
  1,
  'Google Chrome',
  'No Details',
  'Browsers',
  0],
 ['2019-09-01T00:00:00',
  300,
  1,
  'youtube.com',
  'Human Headphones Just Changed The Game - YouTube - Google Chrome',
  'Video',
  -2],
 ['2019-09-01T00:00:00',
  42,
  1,
  'youtube.com',
  'YouTube - Google Chrome',
  'Video',
  -2],
 ['2019-09-01T00:00:00',
  14,
  1,
  'youtube.com',
  '宇哥 - YouTube - Google Chrome',
  'Video',
  -2],
 ['2019-09-01T00:00:00',
  13,
  1,
  'youtube.com',
  '【喵嗷污】如果地球停止转动，人类将面临怎样的灾难？这纪录片比灾难电影还精彩啊 - YouTube - Google Chrome',
  'Video',
  -2],
 ['2019-09-01T00:00:00',
  12,
  1,
  'youtube.com',
  '【赌博】心惊肉跳的捉内鬼行动，谁才是幕后黑手？《赌博默示录：中层管理录利根川22》 - YouTube - Google Chrome',
  'Video',
  -2],
 ['2019-09-01T00:00:00',
  11,
  1,
  'google.com/calendar',
  'YouTube - Google Chrome',
  'Calendars',
  0],
 ['2019-09-01T00:00:00',
  11,
  1,
  'google.com',
  'human headphones - Google Search - Google Chrome',
  'Search',
  1],
 ['2019-09-01T00:00:00',
  9,
  1,
  'go

In [26]:
activities_hourly = pd.DataFrame.from_dict(activities_hour_log)

In [27]:
activities_hourly.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [28]:
activities_hourly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33661 entries, 0 to 33660
Data columns (total 7 columns):
Date            33661 non-null object
Seconds         33661 non-null int64
NumberPeople    33661 non-null int64
Actitivity      33661 non-null object
Document        33661 non-null object
Category        33661 non-null object
Productivity    33661 non-null int64
dtypes: int64(3), object(4)
memory usage: 1.8+ MB


In [29]:
activities_hourly.describe()

Unnamed: 0,Seconds,NumberPeople,Productivity
count,33661.0,33661.0,33661.0
mean,71.458721,1.0,0.613678
std,230.440778,0.0,1.228696
min,1.0,1.0,-2.0
25%,4.0,1.0,0.0
50%,11.0,1.0,1.0
75%,37.0,1.0,1.0
max,3600.0,1.0,2.0


In [30]:
activities_hourly.tail()

Unnamed: 0,Date,Seconds,NumberPeople,Actitivity,Document,Category,Productivity
33656,2019-10-31T23:00:00,3,1,system idle process,No Details,Other,0
33657,2019-10-31T23:00:00,3,1,S Finder,No Details,General Utilities,1
33658,2019-10-31T23:00:00,2,1,Duo Mobile,No Details,General Business,2
33659,2019-10-31T23:00:00,2,1,Adobe Acrobat,Adobe Acrobat Pro DC,General Reference & Learning,2
33660,2019-10-31T23:00:00,1,1,Adobe Acrobat,Properties,General Reference & Learning,2


In [31]:
activities_hourly.to_csv('data/rescuetime-hourly-' + start_date + '-to-' + end_date + '.csv')
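The `Unnamed: 0` column that shows up when the exported file is read back in the analysis section comes from `to_csv` writing the DataFrame index by default. The sketch below illustrates the difference with a small stand-in frame and an in-memory buffer instead of a file; passing `index=False` is an optional change, not what the cell above does.

```python
import io
import pandas as pd

# Small stand-in frame; the real export has seven columns
df = pd.DataFrame({'Seconds': [1668, 300], 'Category': ['Browsers', 'Video']})

# Default: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```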

## Collect Report of Activities By Minute

In [32]:
# activities_minute_log = rescuetime_get_activities(start_date, end_date, 'minute')

In [33]:
# activities_per_minute = pd.DataFrame.from_dict(activities_minute_log)

In [34]:
# API column headers: 'Date', 'Time Spent (seconds)', 'Number of People', 'Activity', 'Document', 'Category', 'Productivity'
# activities_per_minute.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [35]:
# activities_per_minute.head()

In [36]:
# activities_per_minute.info()

In [37]:
# activities_per_minute.describe()

In [38]:
# activities_per_minute.to_csv('data/rescuetime-by-minute-' + start_date + '-to-' + end_date + '.csv')

-----

## Simple Analysis (Using Exported Logs)

In [39]:
import glob
import os

In [40]:
# Import hourly data exports and combine them into a single data frame
path = 'data/'
all_files = glob.glob(os.path.join(path, "rescuetime-hourly*.csv"))
frames = []
for file_ in all_files:
    df = pd.read_csv(file_, index_col=None, header=0)
    frames.append(df)
activities = pd.concat(frames)

In [41]:
len(activities)

33661

In [42]:
# total hours
activities.Seconds.sum() / 60 / 60

668.1588888888889

In [43]:
# total days
activities.Seconds.sum() / 60 / 60 / 24

27.839953703703703

In [44]:
activities.head()

Unnamed: 0.1,Unnamed: 0,Date,Seconds,NumberPeople,Actitivity,Document,Category,Productivity
0,0,2019-09-01T00:00:00,1668,1,Google Chrome,No Details,Browsers,0
1,1,2019-09-01T00:00:00,300,1,youtube.com,Human Headphones Just Changed The Game - YouTu...,Video,-2
2,2,2019-09-01T00:00:00,42,1,youtube.com,YouTube - Google Chrome,Video,-2
3,3,2019-09-01T00:00:00,14,1,youtube.com,宇哥 - YouTube - Google Chrome,Video,-2
4,4,2019-09-01T00:00:00,13,1,youtube.com,【喵嗷污】如果地球停止转动，人类将面临怎样的灾难？这纪录片比灾难电影还精彩啊 - YouTu...,Video,-2


In [45]:
activities.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33661 entries, 0 to 33660
Data columns (total 8 columns):
Unnamed: 0      33661 non-null int64
Date            33661 non-null object
Seconds         33661 non-null int64
NumberPeople    33661 non-null int64
Actitivity      33661 non-null object
Document        33661 non-null object
Category        33661 non-null object
Productivity    33661 non-null int64
dtypes: int64(4), object(4)
memory usage: 2.1+ MB


In [46]:
activities.describe()

Unnamed: 0.1,Unnamed: 0,Seconds,NumberPeople,Productivity
count,33661.0,33661.0,33661.0,33661.0
mean,16830.0,71.458721,1.0,0.613678
std,9717.238042,230.440778,0.0,1.228696
min,0.0,1.0,1.0,-2.0
25%,8415.0,4.0,1.0,0.0
50%,16830.0,11.0,1.0,1.0
75%,25245.0,37.0,1.0,1.0
max,33660.0,3600.0,1.0,2.0


In [47]:
# create columns for year, month, day, and dow
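This cell is empty in the notebook. A minimal sketch of one way to derive these columns, assuming the `Date` strings are ISO timestamps like those in the exports above; the new column names (`Year`, `Month`, `Day`, `DOW`) and the two-row stand-in frame are illustrative choices.

```python
import pandas as pd

# Stand-in frame using the notebook's column names
sample = pd.DataFrame({
    'Date': ['2019-09-01T00:00:00', '2019-10-31T23:00:00'],
    'Seconds': [1668, 3],
})

# Parse the ISO timestamps, then derive calendar parts via the .dt accessor
sample['Date'] = pd.to_datetime(sample['Date'])
sample['Year'] = sample['Date'].dt.year
sample['Month'] = sample['Date'].dt.month
sample['Day'] = sample['Date'].dt.day
sample['DOW'] = sample['Date'].dt.dayofweek  # Monday=0 ... Sunday=6

print(sample)
```

The same assignments should work on `activities` once its `Date` column has been parsed.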

In [48]:
# pivot table 
# activities.pivot_table(index='Date', columns='Category', values='Seconds', aggfunc='sum')
# temp.pivot(columns='Category', values='Seconds')
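Because each hour holds many rows per category, a plain `pivot` would fail on the duplicate index and column pairs, so an aggregating `pivot_table` is probably what is wanted here. The sketch below is one possible version on a small stand-in frame; the rows are illustrative, and summing `Seconds` is an assumption about the intended analysis.

```python
import pandas as pd

# Illustrative rows: several entries share the same hour and category
sample = pd.DataFrame({
    'Date': ['2019-09-01T00:00:00'] * 3 + ['2019-09-01T01:00:00'],
    'Category': ['Video', 'Video', 'Browsers', 'Video'],
    'Seconds': [300, 42, 1668, 10],
})

# Sum seconds per hour and category; hours with no time in a category get 0
category_seconds = sample.pivot_table(
    index='Date', columns='Category', values='Seconds',
    aggfunc='sum', fill_value=0,
)

print(category_seconds)
```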