# How to read data from DataFrames with pyActigraphy

**Original Author:** Grégory Hammad

**Note:** *This is a legacy tutorial originally developed for `pyActigraphy`. It remains useful for reference, but will be updated following the release of the first milestone version of the `circStudio` package.*

**Observation:** *Once the first milestone version of the `circStudio` package is released, this will become the default method for opening files. Input files must be pre-formatted to ensure they can be properly cleaned and analyzed.*

## Imported packages and input data

The usual suspects:

In [1]:
import numpy as np

In [2]:
import os

In [3]:
import pandas as pd

In [4]:
import circStudio

In this example, let's generate some input data:

NB: if you already have your data under a pandas.DataFrame format, jump directly to the next section.

In [24]:
N = 1440*7 # 7 days of acquisition at a frequency of 60s.

In [25]:
activity = np.random.normal(10,1,N)
light = np.random.normal(100,10,N)

In [26]:
non_wear = np.empty(N)

In [27]:
# Set up a segment of  spurious inactivity
activity[2060:2160] = 0.0
non_wear[2060:2160] = 1.0

In [28]:
d = {'Activity': activity, 'Light': light, 'Non-wear': non_wear}

In [29]:
index = pd.date_range(start='01-01-2020',freq='60s',periods=N)

In [30]:
data = pd.DataFrame(index=index,data=d)

In [31]:
data

Unnamed: 0,Activity,Light,Non-wear
2020-01-01 00:00:00,10.812449,109.286549,1.408156e-311
2020-01-01 00:01:00,9.574570,98.377043,1.408158e-311
2020-01-01 00:02:00,9.116109,99.835302,0.000000e+00
2020-01-01 00:03:00,9.244513,92.352330,0.000000e+00
2020-01-01 00:04:00,10.584692,96.152580,0.000000e+00
...,...,...,...
2020-01-07 23:55:00,10.170783,113.681926,0.000000e+00
2020-01-07 23:56:00,7.881578,104.624959,0.000000e+00
2020-01-07 23:57:00,11.432021,96.652484,0.000000e+00
2020-01-07 23:58:00,10.557881,103.235590,0.000000e+00


## Manual creation of a BaseRaw object

In [5]:
from circStudio.io import BaseRaw

### Set activity and light data (if available)

In [33]:
raw = BaseRaw(
    name="myName", 
    uuid='DeviceId', 
    format='Pandas', 
    axial_mode=None,
    start_time=data.index[0],
    period=(data.index[-1]-data.index[0]),
    frequency=data.index.freq,
    data=data['Activity'],
    light=data['Light']
)

In [34]:
raw.data

2020-01-01 00:00:00    10.812449
2020-01-01 00:01:00     9.574570
2020-01-01 00:02:00     9.116109
2020-01-01 00:03:00     9.244513
2020-01-01 00:04:00    10.584692
                         ...    
2020-01-07 23:55:00    10.170783
2020-01-07 23:56:00     7.881578
2020-01-07 23:57:00    11.432021
2020-01-07 23:58:00    10.557881
2020-01-07 23:59:00     9.992240
Freq: 60s, Name: Activity, Length: 10080, dtype: float64

### Opening a file not natively supported by pyActigraphy

In [6]:
# Open the tsv (tab separated file) file
df = pd.read_csv(os.path.join('esteban.txt'), sep='\t')

# Create a datatime column, containing timestamps, and drop the individual 'date' and 'time' columns
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%d/%m/%y %H:%M:%S')
df = df.drop(['date', 'time'], axis=1)

# Set the datetime column as an index
df = df.set_index('datetime')

# Set the frequency (in this case, 30 seconds)
df.index.freq = '30s'

In [7]:
# Create a new BaseRaw object
raw = BaseRaw(
    name="estevan", 
    uuid='kw3', 
    format='Pandas', 
    axial_mode=None,
    start_time=df.index[0],
    period=(df.index[-1]-df.index[0]),
    #frequency=pd.Timedelta(30,'s'),
    frequency = df.index.freq,
    data=df['t_mov'], # Assuming that activity is stored here
    light=df['visible_light'] # Working with visible_light
)

### Set up a mask

Most devices that have a wear sensor return this information as a binary time series with "1" when the device is most likely not worn and "0" otherwise.
In pyActigraphy, this information can be used to create a mask and thus invalidate the corresponding data points (set to "0" most probably). However, the mask, the value "1" correspond to "no masking". So, depending on your "non-wear" data, be careful to transform them appropriately:

In [17]:
# Here, I assume that 0: the device is worn, 1: device not worn. 
# As mentioned aboce, for the mask, 1: no masking. (NB: it is a convolution: data*mask)
raw.mask = np.abs(data['Non-wear']-1)

## Tests

In [8]:
raw.duration()

<866190 * Seconds>

In [9]:
raw.length()

28873

In [10]:
raw.ADAT(binarize=False)

259046.46074933468

In [11]:
raw.IV(binarize=False)

0.5778833611564067

In [12]:
# If you want to mask the data
raw.mask_inactivity = True

In [32]:
# For a gaussian noise, IV should be close to 2.
raw.IV(binarize=False)

0.5778833611564067

The masking seems to work!

Et voilà!