# Basic usage

In [2]:
#import niimpy

In [3]:
%load_ext autoreload
%autoreload 1
%aimport niimpy
%aimport niimpy.util

In [4]:
data = niimpy.open("sampledata-singleuser.sqlite3")

Detected single-user database


# Conventions

## Common column names
* `time`: unixtime, integer or float
* `ts`: pandas.Timestamp
* DataFrame and Series indexes are, wherever possible, returned as a `pandas.DatetimeIndex` (whose elements are `pandas.Timestamp` objects)
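
As a small illustration of the two time conventions, the sketch below converts a `time` value (unixtime) into a timezone-aware datetime using only the standard library. The value itself is an assumption, chosen to match the first sample row shown later in this document (the `time` displayed there is rounded), and reading it as UTC is also an assumption.

```python
from datetime import datetime, timezone

# Assumed example value: unixtime in seconds (the `time` convention).
unixtime = 1531170797.933

# Interpret the value as UTC (an assumption made for this sketch).
ts = datetime.fromtimestamp(unixtime, tz=timezone.utc)

print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-07-09 21:13:17
```

With pandas, `pd.to_datetime(unixtime, unit='s')` performs the equivalent conversion and yields a `pandas.Timestamp` (the `ts` convention).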

# Common arguments

* `table=`: First required positional argument. The name of the table to read (for example `'AwareScreen'`).

* `user=`: Second required positional argument. User ID (str) used to filter the data.  It must always be given, but there are two special values:
  * Use `niimpy.ALL` for all users.
  * Use `None` for single-user databases.
  
* `start=`, `end=`: Limit the time range of the selected data.  Times can be given as unixtime (int or float), as a string (parsed with dateutil; this appears to be interpreted as localtime), or as a Python `datetime.datetime` object (Python appears to interpret naive datetime objects as localtime).

* `limit=<int>`: Return at most this many results.  This can be useful for initial testing: select a few tens or hundreds of results to see if it works before selecting everything.

* `offset=<int>`: Companion of `limit`: how many values to skip when returning results. 
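
To make the interplay of `start`, `end`, `limit`, and `offset` concrete, here is a minimal sketch using plain `sqlite3` on an in-memory table. It is not niimpy's implementation: the `select` helper, the table contents, and the choice of a half-open range (`start` included, `end` excluded) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical in-memory table with one row per minute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AwareScreen (time REAL, screen_status INTEGER)")
rows = [(1531170000.0 + 60 * i, i % 4) for i in range(10)]
conn.executemany("INSERT INTO AwareScreen VALUES (?, ?)", rows)


def select(conn, start=None, end=None, limit=None, offset=None):
    """Hypothetical helper: filter by time range, then page the results."""
    sql = "SELECT time, screen_status FROM AwareScreen"
    where, params = [], []
    if start is not None:
        where.append("time >= ?")
        params.append(start)
    if end is not None:
        where.append("time < ?")  # assumption: end is exclusive
        params.append(end)
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY time"
    if limit is not None or offset is not None:
        # In SQLite, LIMIT -1 means "no limit", so offset can be used alone.
        sql += " LIMIT ? OFFSET ?"
        params += [-1 if limit is None else limit, offset or 0]
    return conn.execute(sql, params).fetchall()


# Skip the first matching row, then return at most three.
page = select(conn, start=1531170120.0, limit=3, offset=1)
print(page)
```

Paging with a small `limit` like this is a cheap way to check that a query behaves as expected before selecting everything.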

# Metadata functions
These functions give you information about the data within the database.

### Users in database
Returns `None` for a single-user database, otherwise a `set` of usernames.

In [5]:
data.users()

### Tables in database
Returns the set of all tables in the database.

In [6]:
data.tables()

{'AwareScreen'}

### Amount of data per user and per table
Returns a `pandas.DataFrame` with one row per table (converter) and one column per user.  Each value is the number of data points for that (converter, user) pair.

The single-user version is more limited: rows are tables and there is a single column, `count`.

In [7]:
data.user_table_counts()

Unnamed: 0,count
AwareScreen,1156


### First, last timestamp in database
Let's say you want to find the first and last datapoint for a certain (converter, user).  Remember, for single-user databases we use `None` as the user argument.

The output is a one-row DataFrame with columns `time` (unixtime) and `datetime` (pandas.Timestamp).

In [8]:
data.first('AwareScreen', None)

Unnamed: 0,time,datetime
0,1531171000.0,2018-07-09 21:13:17.933000088


In [9]:
data.first('AwareScreen', None)['datetime'][0]

Timestamp('2018-07-09 21:13:17.933000088')

In [10]:
data.first('AwareScreen', None)['datetime'][0].strftime('%Y-%m-%d')

'2018-07-09'

### Count of data for (converter, user)
This works the same way as the first/last timestamp functions; the result is a one-row DataFrame with a single `count` column:

In [11]:
data.count('AwareScreen', None)

Unnamed: 0,count
0,1156


In [12]:
data.count('AwareScreen', None)['count'][0]

1156

# Accessing data

### Raw data
This returns the raw data of a table as a DataFrame.  It is a quick way to see which columns are available.

In [13]:
data.raw("AwareScreen", None).head(1)

Unnamed: 0,time,screen_status,datetime
2018-07-09 21:13:17.933000088,1531171000.0,1,2018-07-09 21:13:17.933000088


### Data hourly summaries

In [17]:
data.hourly("AwareScreen", None, columns=['screen_status']).head(1)

Unnamed: 0,day,hour,count,screen_status_mean,screen_status_std,screen_status_count
2018-07-10,2018-07-10,0,3,1.0,1.0,3


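If you give it a list of columns, it also reports the mean, standard deviation, and count of each listed column for every hour, as in the `screen_status_mean`, `screen_status_std`, and `screen_status_count` columns above.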


### Data quality
This computes a measure of data quality for sensors that should be sending data continuously.  To do this, it:
* Divides all time into hours
* Divides each hour into five 12-minute intervals
* Counts the number of 12-minute intervals that have data.  This is $quality$
* For each hour, reports $quality$.  If it is 5, we assume the hour has high-quality data.  If it is 0, there was no data.

This isn't a perfect measure, but it is reasonably effective and simple to calculate.  For data which isn't continuous (like the screen data we are actually using here), it shows how much the sensor has been used.

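Column meanings: `day` is the date, `hour` is the hour of the day, `quality` is the measure described above, `count` is the total number of data points in that hour, and `withdata` lists which of the 12-minute intervals (0-4) have data.  In the output below the interval numbers appear run together as a single integer (so `234` means intervals 2, 3, and 4), and a leading 0 seems to be dropped.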

In [19]:
data.quality("AwareScreen", None).head()

Unnamed: 0,day,hour,quality,count,withdata
2018-07-10 00:00:00,2018-07-10,0,1,3,1
2018-07-10 12:00:00,2018-07-10,12,4,18,123
2018-07-10 14:00:00,2018-07-10,14,2,6,13
2018-07-10 15:00:00,2018-07-10,15,3,13,234
2018-07-10 19:00:00,2018-07-10,19,2,7,3
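
The steps above can be sketched in a few lines of standard-library Python. This is an illustration of the described measure under stated assumptions, not niimpy's actual code: the function name `hourly_quality`, the use of UTC hours, and the list form of `withdata` are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_quality(unixtimes, intervals_per_hour=5):
    """Count, per hour, how many equal-width intervals contain data."""
    width = 3600 // intervals_per_hour  # 720 s = 12 minutes by default
    hours = defaultdict(lambda: {"count": 0, "withdata": set()})
    for t in unixtimes:
        hour_start = int(t) // 3600 * 3600
        interval = int(t - hour_start) // width  # 0..intervals_per_hour-1
        hours[hour_start]["count"] += 1
        hours[hour_start]["withdata"].add(interval)

    result = []
    for hour_start in sorted(hours):
        dt = datetime.fromtimestamp(hour_start, tz=timezone.utc)
        info = hours[hour_start]
        result.append({
            "day": dt.date().isoformat(),
            "hour": dt.hour,
            "quality": len(info["withdata"]),
            "count": info["count"],
            "withdata": sorted(info["withdata"]),
        })
    return result


base = 1531094400  # 2018-07-09 00:00:00 UTC
sample = [base, base + 100, base + 800, base + 3500, base + 3600 + 1500]
for row in hourly_quality(sample):
    print(row)
```

Hours without any data do not appear in the result at all, which matches the gaps between rows in the output above.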


# Miscellaneous calculations

### Sum of survey scores

TODO: needs further documenting and an example.

`get_survey_score` is a convenience method that returns the sum of the scores of a survey.  It can only be used on Survey tables.

It takes the standard `table` and `user` arguments plus a `survey` argument, which selects the survey questions to include: it is matched as a prefix of the "id" column.

TODO: get sample data and use it.

In [20]:
#data.get_survey_score(table='HyksSurveyAllAnswers', user=niimpy.ALL, survey='PHQ9')
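
Until sample data is available, the prefix filtering and summing described above can be sketched on plain Python data. Everything here is hypothetical: the `survey_score` function, the `user` and `answer` field names, the sample rows, and the decision to total the scores per user are assumptions, not the behaviour of `get_survey_score` itself.

```python
# Hypothetical survey answers; only the "id" prefix idea comes from the text.
answers = [
    {"user": "u1", "id": "PHQ9_1", "answer": 2},
    {"user": "u1", "id": "PHQ9_2", "answer": 1},
    {"user": "u1", "id": "GAD7_1", "answer": 3},
    {"user": "u2", "id": "PHQ9_1", "answer": 1},
]


def survey_score(rows, survey):
    """Sum the answers whose id starts with the survey prefix, per user."""
    totals = {}
    for row in rows:
        if row["id"].startswith(survey):
            totals[row["user"]] = totals.get(row["user"], 0) + row["answer"]
    return totals


print(survey_score(answers, "PHQ9"))  # {'u1': 3, 'u2': 1}
```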

# Utilities

## Bin calculation

In [22]:
import pandas as pd
pd.Series([1,2]).dtype

dtype('int64')

In [24]:
timestamps = data.raw("AwareScreen", None).index
occurrences = niimpy.util.occurrence(timestamps)
occurrences.head()

Unnamed: 0,day,hour,occurance
2018-07-09 21:00:00,2018-07-09,21,1
2018-07-10 09:00:00,2018-07-10,9,4
2018-07-10 11:00:00,2018-07-10,11,2
2018-07-10 12:00:00,2018-07-10,12,3
2018-07-10 16:00:00,2018-07-10,16,2


In [34]:
occurrences.index.hour
occurrences['hour2'] = occurrences.index.hour
occurrences.head()

Unnamed: 0,day,hour,occurance,hour2
2018-07-09 21:00:00,2018-07-09,21,1,21
2018-07-10 09:00:00,2018-07-10,9,2,9
2018-07-10 11:00:00,2018-07-10,11,2,11
2018-07-10 12:00:00,2018-07-10,12,2,12
2018-07-10 16:00:00,2018-07-10,16,2,16


In [26]:
# You can change the bin width, too.
occurrences = niimpy.util.occurrence(timestamps, bin_width=30)
occurrences.head()

Unnamed: 0,day,hour,occurance
2018-07-09 21:00:00,2018-07-09,21,1
2018-07-10 09:00:00,2018-07-10,9,2
2018-07-10 11:00:00,2018-07-10,11,2
2018-07-10 12:00:00,2018-07-10,12,2
2018-07-10 16:00:00,2018-07-10,16,2


In [None]:
# Scratch example: try interval_group on a small Series of integer timestamps.
timestamps = pd.Series([1, 10, 50, 600, 900, 3600, 3601, 4201])
gb2 = niimpy.util.interval_group(timestamps)
gb2



# Visualization

There is no built-in visualization yet.

To add visualization, please write functions which accept a `df` argument (a `pandas.DataFrame`) and an `ax` argument (a `matplotlib.axes.Axes`), and draw the visualization on that `Axes`.  This allows your function to be used for different purposes: making single plots, drawing to the screen, etc.  If `ax` is not given, you can draw on the default axes (that is, on the screen).

This means that the caller is responsible for the surrounding overhead: creating the axes and writing them out (for example, to PDFs).
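
A plotting function following this convention might look like the sketch below. The name `plot_counts` and the default `count` column are hypothetical; the only matplotlib calls used are `pyplot.gca`, `Axes.plot`, `Axes.set_xlabel`, and `Axes.set_ylabel`. matplotlib is imported lazily so that the function can also be called with any caller-supplied axes object.

```python
def plot_counts(df, ax=None, column="count"):
    """Draw one column of `df` against its index on the given Axes."""
    if ax is None:
        # No Axes given: fall back to the current default axes.
        import matplotlib.pyplot as plt
        ax = plt.gca()
    ax.plot(list(df.index), list(df[column]), marker="o")
    ax.set_xlabel("time")
    ax.set_ylabel(column)
    return ax
```

A caller that wants a PDF would create the axes itself, for example with `fig, ax = plt.subplots()`, then call `plot_counts(df, ax=ax)` and `fig.savefig("counts.pdf")`.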