# How to read data from DataFrames with pyActigraphy

**Original Author:** Grégory Hammad

**Note:** *This is a legacy tutorial originally developed for `pyActigraphy`. It remains useful for reference, but will be updated following the release of the first milestone version of the `circStudio` package.*

**Observation:** *Once the first milestone version of the `circStudio` package is released, this will become the default method for opening files. Input files must be pre-formatted to ensure they can be properly cleaned and analyzed.*

## Imported packages and input data

The usual suspects:

In [2]:
import numpy as np

In [3]:
import pandas as pd

In [4]:
import circStudio

In this example, let's generate some input data:

NB: if your data are already in a pandas.DataFrame, jump directly to the next section.

In [5]:
N = 1440*7 # 7 days of acquisition, one sample every 60 s.

In [6]:
activity = np.random.normal(10,1,N)
light = np.random.normal(100,10,N)

In [7]:
# Start from all zeros (device worn); np.empty would leave arbitrary uninitialized values.
non_wear = np.zeros(N)

In [8]:
# Set up a segment of spurious inactivity
activity[2060:2160] = 0.0
non_wear[2060:2160] = 1.0

In [9]:
d = {'Activity': activity, 'Light': light, 'Non-wear': non_wear}

In [10]:
index = pd.date_range(start='2020-01-01',freq='60s',periods=N)

In [11]:
data = pd.DataFrame(index=index,data=d)

In [12]:
data

| | Activity | Light | Non-wear |
|---|---|---|---|
| 2020-01-01 00:00:00 | 12.348719 | 111.992564 | 1.052974e-311 |
| 2020-01-01 00:01:00 | 9.509186 | 106.229410 | 1.052512e-311 |
| 2020-01-01 00:02:00 | 11.565412 | 106.166361 | 0.000000e+00 |
| 2020-01-01 00:03:00 | 10.747845 | 95.196543 | 0.000000e+00 |
| 2020-01-01 00:04:00 | 10.947222 | 107.643642 | 0.000000e+00 |
| ... | ... | ... | ... |
| 2020-01-07 23:55:00 | 10.641293 | 119.427444 | 0.000000e+00 |
| 2020-01-07 23:56:00 | 9.070925 | 95.883033 | 0.000000e+00 |
| 2020-01-07 23:57:00 | 10.157441 | 100.787815 | 0.000000e+00 |
| 2020-01-07 23:58:00 | 9.503215 | 114.941433 | 0.000000e+00 |
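
If your recording comes from a text file rather than from generated arrays, the DataFrame needs a `DatetimeIndex` with a known sampling frequency before it can be used as in this tutorial. The following is a minimal sketch using only pandas. The CSV content and the column name `timestamp` are hypothetical placeholders, not a format required by `circStudio`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """timestamp,Activity,Light
2020-01-01 00:00:00,12.3,111.9
2020-01-01 00:01:00,9.5,106.2
2020-01-01 00:02:00,11.6,106.1
2020-01-01 00:03:00,10.7,95.2
"""

# Parse the timestamp column and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), index_col="timestamp", parse_dates=True)

# An index parsed from text carries no frequency information.
print(df.index.freq)  # None

# Infer the sampling interval from the timestamps and attach it to the index.
# Note: asfreq inserts NaN rows wherever a timestamp is missing.
inferred = pd.infer_freq(df.index)
df = df.asfreq(inferred)
print(df.index.freq)
```

After this step, `df.index.freq` can be read in the same way as `data.index.freq` is in the next section.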


## Manual creation of a BaseRaw object

In [14]:
from circStudio.io import BaseRaw

help(BaseRaw)

### Set activity and light data (if available)

In [15]:
raw = BaseRaw(
    name="myName", 
    uuid='DeviceId', 
    format='Pandas', 
    axial_mode=None,
    start_time=data.index[0],
    period=(data.index[-1]-data.index[0]),
    frequency=data.index.freq,
    data=data['Activity'],
    light=data['Light']
)

In [16]:
raw.data

2020-01-01 00:00:00    12.348719
2020-01-01 00:01:00     9.509186
2020-01-01 00:02:00    11.565412
2020-01-01 00:03:00    10.747845
2020-01-01 00:04:00    10.947222
                         ...    
2020-01-07 23:55:00    10.641293
2020-01-07 23:56:00     9.070925
2020-01-07 23:57:00    10.157441
2020-01-07 23:58:00     9.503215
2020-01-07 23:59:00     9.744828
Freq: 60s, Name: Activity, Length: 10080, dtype: float64
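
One detail worth checking when filling in `period` and `frequency` by hand: the difference between the last and first timestamps is one sampling interval shorter than the number of samples times the sampling interval. The sketch below, which uses only pandas, makes the two quantities explicit. Which of the two `BaseRaw` expects for `period` is not stated in this tutorial, so treat this as a sanity check rather than a prescription.

```python
import pandas as pd

N = 1440 * 7
index = pd.date_range(start="2020-01-01", freq="60s", periods=N)

# Time elapsed between the first and the last timestamps.
span = index[-1] - index[0]

# N sampling intervals, i.e. the span plus the duration of the last sample.
full = N * pd.Timedelta(index.freq)

print(span)         # 6 days 23:59:00
print(full)         # 7 days 00:00:00
print(full - span)  # 0 days 00:01:00
```

The second quantity, 7 days or 604800 seconds, matches the value returned by `raw.duration()` in the tests further down.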

### Set up a mask

Most devices that have a wear sensor return this information as a binary time series with "1" when the device is most likely not worn and "0" otherwise.
In pyActigraphy, this information can be used to create a mask that invalidates the corresponding data points (most probably by setting them to "0"). In the mask, however, the value "1" corresponds to "no masking". So, depending on your "non-wear" data, be careful to transform them appropriately:

In [17]:
# Here, I assume that 0: the device is worn, 1: the device is not worn.
# As mentioned above, for the mask, 1: no masking. (NB: the mask is applied as an element-wise product: data*mask)
raw.mask = np.abs(data['Non-wear']-1)
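
To see what this inversion does, here is a small self-contained sketch on six made-up samples. It only illustrates the arithmetic described above (inverting the non-wear flag, then multiplying element-wise). It does not call `circStudio`, and the way the library applies the mask internally may differ.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2020-01-01", freq="60s", periods=6)

# Made-up values: the device is not worn during the third and fourth samples.
activity = pd.Series([10.0, 11.0, 0.3, 0.2, 9.0, 12.0], index=index)
non_wear = pd.Series([0.0, 0.0, 1.0, 1.0, 0.0, 0.0], index=index)

# Invert the flag: 1 now means "keep this sample", 0 means "mask it".
mask = np.abs(non_wear - 1)

# Element-wise product: masked samples become 0, the others are unchanged.
masked = activity * mask

print(mask.tolist())    # [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
print(masked.tolist())  # [10.0, 11.0, 0.0, 0.0, 9.0, 12.0]
```

If your device uses the opposite convention (1 when worn), the non-wear series can be used as the mask without inversion.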

## Tests

In [18]:
raw.duration()

<604800 * Seconds>

In [19]:
raw.length()

10080

In [20]:
raw.ADAT(binarize=False)

14262.576447031717
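
As a rough cross-check of the order of magnitude, an average daily activity total can be computed by hand with pandas. The sketch below is an assumption about what such a quantity looks like (sum per calendar day, then mean across days), not the library's implementation. It uses a constant activity of 10 counts per minute instead of random data so that the result is exact.

```python
import numpy as np
import pandas as pd

N = 1440 * 7
index = pd.date_range(start="2020-01-01", freq="60s", periods=N)

# Constant activity of 10 counts per minute, with the same zeroed segment
# as in the tutorial.
activity = pd.Series(np.full(N, 10.0), index=index)
activity.iloc[2060:2160] = 0.0

# Total activity per calendar day, then the average across the 7 days.
daily_totals = activity.resample("1D").sum()
adat = daily_totals.mean()

print(adat)  # 99800 / 7, about 14257.14
```

The value obtained on the random data above, about 14262.58, is of the same magnitude, as expected for an activity distribution centred on 10.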

In [21]:
raw.IV(binarize=False)

1.1156970219746885

In [22]:
# If you want to mask the data
raw.mask_inactivity = True

In [23]:
# For Gaussian noise, IV should be close to 2.
raw.IV(binarize=False)

1.9207928861776853
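
The expectation that IV is close to 2 for Gaussian noise can be checked independently of the package. The sketch below implements the usual textbook definition of intradaily variability, i.e. the ratio of the mean squared first difference to the overall variance, using only NumPy. The function name `intradaily_variability` is hypothetical, and the library may resample the data before computing IV, so the values need not match those shown above.

```python
import numpy as np


def intradaily_variability(x):
    """IV = N * sum((x[i] - x[i-1])**2) / ((N - 1) * sum((x[i] - mean)**2))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    numerator = n * np.sum(np.diff(x) ** 2)
    denominator = (n - 1) * np.sum((x - x.mean()) ** 2)
    return numerator / denominator


rng = np.random.default_rng(0)

# Pure Gaussian noise, same parameters as the activity data in this tutorial.
clean = rng.normal(10, 1, 1440 * 7)

# Same data with a segment of spurious inactivity.
spurious = clean.copy()
spurious[2060:2160] = 0.0

iv_clean = intradaily_variability(clean)
iv_spurious = intradaily_variability(spurious)

print(iv_clean)     # close to 2
print(iv_spurious)  # noticeably lower
```

The zeroed segment inflates the overall variance much more than the sum of squared successive differences, which is why unmasked spurious inactivity pulls IV down.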

The masking seems to work!

Et voilà!