In [20]:
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

# Sleep Data

The first thing I'm curious about is my sleep data. Occasionally I wake up with a migraine, which affects my productivity for the rest of the day. I want to investigate my quality of sleep and how often, and for how long, I am in each stage of sleep.

In [2]:
# Explore the structure of JSON sleep files
with open('../Data/CurtisHiga/user-site-export/sleep-2019-01-23.json', 'r') as json_file:
    json_data = json.load(json_file)

In [3]:
json_data

[{'logId': 21320930034,
  'dateOfSleep': '2019-02-22',
  'startTime': '2019-02-22T02:08:30.000',
  'endTime': '2019-02-22T11:52:00.000',
  'duration': 34980000,
  'minutesToFallAsleep': 0,
  'minutesAsleep': 501,
  'minutesAwake': 82,
  'minutesAfterWakeup': 0,
  'timeInBed': 583,
  'efficiency': 87,
  'type': 'stages',
  'infoCode': 0,
  'levels': {'summary': {'deep': {'count': 6,
     'minutes': 79,
     'thirtyDayAvgMinutes': 66},
    'wake': {'count': 31, 'minutes': 82, 'thirtyDayAvgMinutes': 57},
    'light': {'count': 37, 'minutes': 353, 'thirtyDayAvgMinutes': 219},
    'rem': {'count': 7, 'minutes': 69, 'thirtyDayAvgMinutes': 63}},
   'data': [{'dateTime': '2019-02-22T02:08:30.000',
     'level': 'wake',
     'seconds': 480},
    {'dateTime': '2019-02-22T02:16:30.000', 'level': 'light', 'seconds': 900},
    {'dateTime': '2019-02-22T02:31:30.000', 'level': 'deep', 'seconds': 960},
    {'dateTime': '2019-02-22T02:47:30.000', 'level': 'wake', 'seconds': 510},
    {'dateTime': '2019

It appears each entry is logged with a specific ``logId``, which can be used as an index for my data frame. The items I want as columns also appear to sit at the first level of each entry, while the stage details are nested under ``levels``. That's important to remember when applying the *read_json* function of the Pandas library to each of the sleep JSON files.

In [25]:
# Use read_json to read in json file as a data frame
sleep_jan19 = pd.read_json('../Data/CurtisHiga/user-site-export/sleep-2019-01-23.json',
                          orient = 'columns',
                          convert_dates = ['dateOfSleep', 'endTime', 'startTime'])
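Since the same call will eventually be applied to each sleep file, it could be wrapped in a small helper. This is only a sketch: the function name ``load_sleep_files`` and the glob pattern in the usage note are hypothetical, and it assumes every export file has the same top-level layout as the one explored above.

```python
import glob

import pandas as pd


def load_sleep_files(pattern):
    """Read every sleep JSON export matching `pattern` into one data frame.

    `pattern` is a glob pattern (hypothetical helper, not part of the
    original notebook). Returns an empty data frame if nothing matches.
    """
    frames = [
        pd.read_json(path,
                     orient='columns',
                     convert_dates=['dateOfSleep', 'endTime', 'startTime'])
        for path in sorted(glob.glob(pattern))
    ]
    if not frames:
        return pd.DataFrame()
    # ignore_index avoids repeated 0..n row labels from the separate files
    return pd.concat(frames, ignore_index=True)
```

A call might look like ``load_sleep_files('some/export/folder/sleep-*.json')``, where the folder is whatever directory holds the exports.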

As stated above, the ``logId`` value seems to be unique and could be used as an index for the data frame. I want to take a quick look at the structure of the data frame to make sure it imported as I expected. The *transpose* method is applied here only to make all the columns easier to see.

In [26]:
sleep_jan19.set_index('logId').transpose()

logId,21320930034,21295605623,21294114499,21281516129,21269375088,21246933300,21243791213,21235310288,21232657269,21195829631,...,21070993648,21046354964,21044987277,21034019972,21032213856,21011938251,20997357151,20995444650,20992920247,20977707058
dateOfSleep,2019-02-22 00:00:00,2019-02-20 00:00:00,2019-02-20 00:00:00,2019-02-19 00:00:00,2019-02-18 00:00:00,2019-02-16 00:00:00,2019-02-16 00:00:00,2019-02-15 00:00:00,2019-02-15 00:00:00,2019-02-12 00:00:00,...,2019-02-02 00:00:00,2019-01-31 00:00:00,2019-01-31 00:00:00,2019-01-30 00:00:00,2019-01-30 00:00:00,2019-01-28 00:00:00,2019-01-27 00:00:00,2019-01-27 00:00:00,2019-01-27 00:00:00,2019-01-26 00:00:00
duration,34980000,9780000,12900000,32400000,27120000,4380000,13320000,4560000,33120000,4080000,...,29880000,6300000,28680000,5040000,12660000,4320000,4200000,8400000,13320000,31680000
efficiency,87,95,91,93,95,92,91,94,93,99,...,91,87,92,90,94,93,94,97,91,91
endTime,2019-02-22 11:52:00,2019-02-20 12:21:30,2019-02-20 07:29:00,2019-02-19 09:16:00,2019-02-18 10:29:00,2019-02-16 17:27:00,2019-02-16 07:57:30,2019-02-15 17:55:30,2019-02-15 10:57:30,2019-02-12 13:24:30,...,2019-02-02 10:50:00,2019-01-31 12:57:30,2019-01-31 09:46:30,2019-01-30 11:36:00,2019-01-30 07:04:00,2019-01-28 18:05:00,2019-01-27 16:46:00,2019-01-27 12:12:30,2019-01-27 07:15:30,2019-01-26 10:57:30
infoCode,0,2,0,0,0,2,0,2,0,2,...,0,2,0,2,0,2,2,2,0,0
levels,"{'summary': {'deep': {'count': 6, 'minutes': 7...","{'summary': {'restless': {'count': 4, 'minutes...","{'summary': {'deep': {'count': 2, 'minutes': 5...","{'summary': {'deep': {'count': 3, 'minutes': 6...","{'summary': {'deep': {'count': 4, 'minutes': 6...","{'summary': {'restless': {'count': 4, 'minutes...","{'summary': {'deep': {'count': 2, 'minutes': 4...","{'summary': {'restless': {'count': 4, 'minutes...","{'summary': {'deep': {'count': 8, 'minutes': 8...","{'summary': {'restless': {'count': 2, 'minutes...",...,"{'summary': {'deep': {'count': 5, 'minutes': 9...","{'summary': {'restless': {'count': 5, 'minutes...","{'summary': {'deep': {'count': 3, 'minutes': 1...","{'summary': {'restless': {'count': 5, 'minutes...","{'summary': {'deep': {'count': 2, 'minutes': 3...","{'summary': {'restless': {'count': 1, 'minutes...","{'summary': {'restless': {'count': 2, 'minutes...","{'summary': {'restless': {'count': 6, 'minutes...","{'summary': {'deep': {'count': 3, 'minutes': 5...","{'summary': {'deep': {'count': 3, 'minutes': 9..."
minutesAfterWakeup,0,0,0,0,0,0,0,6,0,1,...,0,0,0,0,0,0,0,12,0,0
minutesAsleep,501,155,180,453,401,67,192,66,480,66,...,437,91,430,76,171,67,66,124,177,449
minutesAwake,82,8,35,87,51,6,30,4,72,1,...,61,14,48,8,40,5,4,4,45,79
minutesToFallAsleep,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The JSON file seems to have imported successfully and I'm satisfied with the result for the time being. Below is a list of things I want to do in terms of cleaning up this data before moving on.
+ Ensure ``dateOfSleep``, ``startTime``, and ``endTime`` are in *datetime* formats
    + Consider splitting dates into columns
+ Determine the difference between ``duration`` and ``timeInBed`` plus how they correlate to the different ``levels`` of sleep
+ Sort the data by ``dateOfSleep``
+ Handle naps
    + Naps could be indicative of a day where I had a migraine
    + Possibly remove them after deciding what to do with them
+ Investigate ``levels``
+ Determine what ``infoCode`` and ``type`` represent
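Two items on this list, sorting by ``dateOfSleep`` and understanding ``duration``, can be sketched on a few rows copied from the output above. The sketch assumes ``duration`` is recorded in milliseconds; that is a guess based on the values, not something the export states. The column names ``durationMinutes`` and ``minutesAccounted`` are made up for illustration.

```python
import pandas as pd

# Four rows copied from the transposed output above
sample = pd.DataFrame({
    'logId': [21320930034, 21295605623, 21235310288, 21195829631],
    'dateOfSleep': pd.to_datetime(['2019-02-22', '2019-02-20',
                                   '2019-02-15', '2019-02-12']),
    'duration': [34980000, 9780000, 4560000, 4080000],
    'minutesToFallAsleep': [0, 0, 0, 0],
    'minutesAsleep': [501, 155, 66, 66],
    'minutesAwake': [82, 8, 4, 1],
    'minutesAfterWakeup': [0, 0, 6, 1],
})

# Assumption: duration is in milliseconds, so dividing by 60,000 gives minutes
sample['durationMinutes'] = sample['duration'] / 60_000

# Add up every minutes column to see whether it accounts for the duration
minute_cols = ['minutesToFallAsleep', 'minutesAsleep',
               'minutesAwake', 'minutesAfterWakeup']
sample['minutesAccounted'] = sample[minute_cols].sum(axis=1)

# Sort chronologically, oldest night first
sample = sample.sort_values('dateOfSleep').reset_index(drop=True)

matches = sample['durationMinutes'] == sample['minutesAccounted']
print(sample[['dateOfSleep', 'durationMinutes', 'minutesAccounted']])
print(matches.all())
```

On these four rows the two totals agree, and for the first entry the 583 minutes also equal the ``timeInBed`` value shown in the raw JSON. Whether that holds for every entry still needs to be checked against the full data frame.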

## datetime Objects

In [27]:
sleep_jan19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 14 columns):
dateOfSleep            31 non-null datetime64[ns]
duration               31 non-null int64
efficiency             31 non-null int64
endTime                31 non-null datetime64[ns]
infoCode               31 non-null int64
levels                 31 non-null object
logId                  31 non-null int64
minutesAfterWakeup     31 non-null int64
minutesAsleep          31 non-null int64
minutesAwake           31 non-null int64
minutesToFallAsleep    31 non-null int64
startTime              31 non-null datetime64[ns]
timeInBed              31 non-null int64
type                   31 non-null object
dtypes: datetime64[ns](3), int64(9), object(2)
memory usage: 3.5+ KB


The columns ``dateOfSleep``, ``endTime``, and ``startTime`` already appear to be in *datetime* format. It may later be useful to split each of these into separate date and time columns, but for now they can stay as they are.
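If the split does become necessary, the ``.dt`` accessor on a datetime column is one way to do it. The sketch below uses the start and end times of the first entry shown earlier; the new column names (``startHour``, ``startWeekday``, ``endDate``) are hypothetical.

```python
import pandas as pd

# Start and end time of the first sleep entry shown earlier
times = pd.DataFrame({
    'startTime': pd.to_datetime(['2019-02-22T02:08:30.000']),
    'endTime': pd.to_datetime(['2019-02-22T11:52:00.000']),
})

# Pull individual components out of the datetime columns
times['startHour'] = times['startTime'].dt.hour
times['startWeekday'] = times['startTime'].dt.day_name()
times['endDate'] = times['endTime'].dt.date

print(times[['startHour', 'startWeekday', 'endDate']])
```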

## ``levels``

Before things get too complicated, I want to take a look at the ``levels`` column and what data lies in each observation.

In [35]:
sleep_jan19['levels'][0]

{'summary': {'deep': {'count': 6, 'minutes': 79, 'thirtyDayAvgMinutes': 66},
  'wake': {'count': 31, 'minutes': 82, 'thirtyDayAvgMinutes': 57},
  'light': {'count': 37, 'minutes': 353, 'thirtyDayAvgMinutes': 219},
  'rem': {'count': 7, 'minutes': 69, 'thirtyDayAvgMinutes': 63}},
 'data': [{'dateTime': '2019-02-22T02:08:30.000',
   'level': 'wake',
   'seconds': 480},
  {'dateTime': '2019-02-22T02:16:30.000', 'level': 'light', 'seconds': 900},
  {'dateTime': '2019-02-22T02:31:30.000', 'level': 'deep', 'seconds': 960},
  {'dateTime': '2019-02-22T02:47:30.000', 'level': 'wake', 'seconds': 510},
  {'dateTime': '2019-02-22T02:56:00.000', 'level': 'light', 'seconds': 1800},
  {'dateTime': '2019-02-22T03:26:00.000', 'level': 'deep', 'seconds': 480},
  {'dateTime': '2019-02-22T03:34:00.000', 'level': 'light', 'seconds': 660},
  {'dateTime': '2019-02-22T03:45:00.000', 'level': 'rem', 'seconds': 810},
  {'dateTime': '2019-02-22T03:58:30.000', 'level': 'light', 'seconds': 990},
  {'dateTime': '20

The data contained within the ``levels`` column seems to be a detailed summary of the count and duration of each phase of sleep. Right now, I don't need to know when each phase occurred during the night or how long each individual episode lasted. The overall totals for each phase will suffice for now. A function will have to be created to extract these statistics and append them to the data frame.

In [71]:
summaries = [x['summary'] for x in sleep_jan19['levels']]

In [73]:
pd.DataFrame(summaries)

Unnamed: 0,asleep,awake,deep,light,rem,restless,wake
0,,,"{'count': 6, 'minutes': 79, 'thirtyDayAvgMinut...","{'count': 37, 'minutes': 353, 'thirtyDayAvgMin...","{'count': 7, 'minutes': 69, 'thirtyDayAvgMinut...",,"{'count': 31, 'minutes': 82, 'thirtyDayAvgMinu..."
1,"{'count': 0, 'minutes': 155}","{'count': 0, 'minutes': 0}",,,,"{'count': 4, 'minutes': 8}",
2,,,"{'count': 2, 'minutes': 56, 'thirtyDayAvgMinut...","{'count': 14, 'minutes': 85, 'thirtyDayAvgMinu...","{'count': 4, 'minutes': 39, 'thirtyDayAvgMinut...",,"{'count': 16, 'minutes': 35, 'thirtyDayAvgMinu..."
3,,,"{'count': 3, 'minutes': 61, 'thirtyDayAvgMinut...","{'count': 32, 'minutes': 300, 'thirtyDayAvgMin...","{'count': 12, 'minutes': 92, 'thirtyDayAvgMinu...",,"{'count': 41, 'minutes': 87, 'thirtyDayAvgMinu..."
4,,,"{'count': 4, 'minutes': 67, 'thirtyDayAvgMinut...","{'count': 28, 'minutes': 228, 'thirtyDayAvgMin...","{'count': 12, 'minutes': 106, 'thirtyDayAvgMin...",,"{'count': 32, 'minutes': 51, 'thirtyDayAvgMinu..."
5,"{'count': 0, 'minutes': 67}","{'count': 0, 'minutes': 0}",,,,"{'count': 4, 'minutes': 6}",
6,,,"{'count': 2, 'minutes': 47, 'thirtyDayAvgMinut...","{'count': 16, 'minutes': 119, 'thirtyDayAvgMin...","{'count': 3, 'minutes': 26, 'thirtyDayAvgMinut...",,"{'count': 16, 'minutes': 30, 'thirtyDayAvgMinu..."
7,"{'count': 0, 'minutes': 66}","{'count': 1, 'minutes': 1}",,,,"{'count': 4, 'minutes': 9}",
8,,,"{'count': 8, 'minutes': 84, 'thirtyDayAvgMinut...","{'count': 40, 'minutes': 281, 'thirtyDayAvgMin...","{'count': 15, 'minutes': 115, 'thirtyDayAvgMin...",,"{'count': 43, 'minutes': 72, 'thirtyDayAvgMinu..."
9,"{'count': 0, 'minutes': 66}","{'count': 0, 'minutes': 0}",,,,"{'count': 2, 'minutes': 2}",
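Each cell in the table above still holds a nested dictionary. One possible way to get a flat column per stage and statistic is ``pd.json_normalize``. This is a sketch rather than the final function: it uses the summaries of the first two entries copied from the output above, and the underscore-separated column names are just what ``sep='_'`` happens to produce.

```python
import pandas as pd

# 'summary' dictionaries of the first two entries, copied from the output above
summaries = [
    {'deep': {'count': 6, 'minutes': 79, 'thirtyDayAvgMinutes': 66},
     'wake': {'count': 31, 'minutes': 82, 'thirtyDayAvgMinutes': 57},
     'light': {'count': 37, 'minutes': 353, 'thirtyDayAvgMinutes': 219},
     'rem': {'count': 7, 'minutes': 69, 'thirtyDayAvgMinutes': 63}},
    {'asleep': {'count': 0, 'minutes': 155},
     'awake': {'count': 0, 'minutes': 0},
     'restless': {'count': 4, 'minutes': 8}},
]

# Flatten nested keys into columns such as 'deep_minutes' or 'rem_count';
# stages missing from an entry are filled with NaN
flat = pd.json_normalize(summaries, sep='_')

# Keep only the total minutes per stage (case-sensitive match, so the
# 'thirtyDayAvgMinutes' columns are excluded)
minutes = flat.filter(regex='_minutes$')
print(minutes)
```

Because the rows of ``flat`` are in the same order as the original data frame, they could later be joined back onto it by position. For the first entry, the deep, light, and rem minutes add up to 501, which matches its ``minutesAsleep`` value.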
