# Tutorial on how to use PAAT

This tutorial provides a basic introduction to [Python](https://www.python.org/) and [PAAT](https://github.com/Trybnetic/paat) for processing and analysing ActiGraph's GT3X files. To make getting started easy, we provide a GT3X file in the `data/` folder, which we use throughout. If you are coming from ActiLife, writing your analysis as a Python script instead of clicking through a graphical interface might be new to you, so we describe what happens in each step.

The first thing you need to do in any Python analysis is to import your dependencies. In this tutorial this is easy, because we only use PAAT and the `os` module, which we need to silence TensorFlow's log messages:

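```python
# Silence TensorFlow's log messages. The environment variable only takes
# effect if it is set before TensorFlow is loaded, so set it before importing PAAT.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import paat
```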

Next, we load the GT3X file from `data/example.gt3x`. The `read_gt3x()` function returns a [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and an integer giving the file's sampling frequency in hertz. If you have worked with [R](https://www.r-project.org/) before, you can think of a Pandas DataFrame as the equivalent of a `data.frame` in R.

```python
data, sample_freq = paat.read_gt3x('data/example.gt3x')
```
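To get a feeling for what `read_gt3x()` returns without a GT3X file at hand, the following sketch builds a small stand-in DataFrame with one row per sample, a timestamp index and `X`, `Y`, `Z` columns in g. The 100 Hz rate and the random values are assumptions for illustration only. The sketch also shows how `sample_freq` relates to the spacing of the timestamps.

```python
import numpy as np
import pandas as pd

sample_freq = 100  # Hz; assumed here, read_gt3x() reports the real value
# Hypothetical stand-in for the DataFrame returned by read_gt3x():
# five seconds of samples, 10 ms apart, with X/Y/Z accelerations in g.
index = pd.date_range("2022-01-03 10:20:00", periods=5 * sample_freq, freq="10ms")
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(0.0, 0.05, size=(len(index), 3)),
                    index=index, columns=["X", "Y", "Z"])

# The typical gap between consecutive timestamps is 1 / sample_freq seconds.
step = data.index.to_series().diff().median()
inferred_freq = round(pd.Timedelta(seconds=1) / step)
print(inferred_freq)  # 100
```

With the real data, `data.index.to_series().diff().median()` should likewise correspond to `1 / sample_freq` seconds.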

If you like, try loading your own GT3X file at this point by replacing `data/example.gt3x` with the [file path](https://www.wikihow.com/Find-a-File%27s-Path-on-Windows) to your GT3X file.

PAAT implements several methods for detecting non-wear time and sleep periods that are based on our lab's research on acceleration data processing, and we recommend using these. PAAT also implements other methods, such as the non-wear time detection method proposed by [Van Hees et al. (2011)](https://doi.org/10.1371/journal.pone.0022922). Here, we use the non-wear time detection method described by [Syed et al. (2021)](https://doi.org/10.1038/s41598-021-87757-z):

```python
data.loc[:, "Non Wear Time"] = paat.detect_non_wear_time_syed2021(data, sample_freq)
```

Next, we apply the sleep detection method described by [Weitz et al. (2022)](https://doi.org/10.1101/2022.03.07.22270992):

```python
data.loc[:, "Sleep"] = paat.detect_sleep_weitz2022(data, sample_freq)
```
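The detectors above are called as ready-made functions. To illustrate the general idea behind simpler threshold-based non-wear detection, here is a sketch in the spirit of the standard-deviation and value-range criterion of Van Hees et al. (2011): a block of data is flagged when at least two of the three axes barely vary. The helper `flag_low_variation_blocks()`, the 30-minute blocks and the 3 mg / 50 mg thresholds are illustrative assumptions. This is not how `detect_non_wear_time_syed2021()` works, and it is not PAAT's implementation of the Van Hees method.

```python
import numpy as np
import pandas as pd


def flag_low_variation_blocks(data, block="30min", sd_threshold=0.003,
                              range_threshold=0.05):
    """Hypothetical helper: flag samples in blocks where >= 2 of 3 axes barely vary.

    Illustrative heuristic in the spirit of Van Hees et al. (2011), NOT the
    PAAT implementation. Thresholds are in g (0.003 g = 3 mg, 0.05 g = 50 mg).
    """
    blocks = data[["X", "Y", "Z"]].resample(block)
    low_sd = (blocks.std() < sd_threshold).sum(axis=1) >= 2
    low_range = ((blocks.max() - blocks.min()) < range_threshold).sum(axis=1) >= 2
    block_flag = low_sd | low_range
    # Copy each block's flag to every sample inside that block.
    return block_flag.reindex(data.index, method="ffill")


# Hypothetical 10 Hz recording: one hour lying still, then one hour of movement.
sample_freq = 10
index = pd.date_range("2022-01-03 00:00:00", periods=2 * 3600 * sample_freq,
                      freq="100ms")
half = len(index) // 2
rng = np.random.default_rng(1)
xyz = np.vstack([
    rng.normal(loc=[0.0, 0.0, 1.0], scale=0.001, size=(half, 3)),  # still
    rng.normal(loc=[0.0, 0.0, 1.0], scale=0.3, size=(half, 3)),    # worn
])
data = pd.DataFrame(xyz, index=index, columns=["X", "Y", "Z"])

flags = flag_low_variation_blocks(data)
print(flags.iloc[0], flags.iloc[-1])  # True False
```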

You can inspect the DataFrame at any time by placing it as the last expression in a cell:

```python
data
```

| Timestamp | X | Y | Z | Non Wear Time | Sleep |
|---|---|---|---|---|---|
| 2022-01-03 10:20:00.000 | 0.804688 | 0.621094 | 0.085938 | False | True |
| 2022-01-03 10:20:00.010 | 0.804688 | 0.597656 | 0.085938 | False | True |
| 2022-01-03 10:20:00.020 | 0.804688 | 0.585938 | 0.078125 | False | True |
| 2022-01-03 10:20:00.030 | 0.804688 | 0.582031 | 0.074219 | False | True |
| 2022-01-03 10:20:00.040 | 0.800781 | 0.585938 | 0.074219 | False | True |
| ... | ... | ... | ... | ... | ... |
| 2022-01-03 10:29:59.950 | 0.289062 | 0.960938 | -0.050781 | False | False |
| 2022-01-03 10:29:59.960 | 0.289062 | 0.960938 | -0.054688 | False | False |
| 2022-01-03 10:29:59.970 | 0.285156 | 0.957031 | -0.054688 | False | False |
| 2022-01-03 10:29:59.980 | 0.289062 | 0.957031 | -0.054688 | False | False |
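Besides displaying the whole DataFrame, it is often useful to look at the first or last rows or at a specific time window. Because the index holds timestamps, `.loc` can slice by time. The sketch below uses a randomly filled stand-in DataFrame (an assumption, not the example file); with `pd.Timestamp` bounds, both endpoints are included in the slice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: ten minutes of 100 Hz data with random values.
index = pd.date_range("2022-01-03 10:20:00", periods=10 * 60 * 100, freq="10ms")
rng = np.random.default_rng(2)
data = pd.DataFrame(rng.normal(size=(len(index), 3)), index=index,
                    columns=["X", "Y", "Z"])

print(data.head(3))  # first three rows
print(data.tail(3))  # last three rows

# Slicing with Timestamp bounds includes both endpoints.
start = pd.Timestamp("2022-01-03 10:25:00")
end = pd.Timestamp("2022-01-03 10:25:01")
window = data.loc[start:end]
print(len(window))  # 101
```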


To analyse physical activity, PAAT implements the standard vector-magnitude-based method for estimating physical activity levels, which can be used with different thresholds. Here, we use the thresholds presented by [Sanders et al. (2019)](https://doi.org/10.1080/02640414.2018.1555904), which are 0.069 g (69 mg) for moderate-to-vigorous physical activity (MVPA) and below 0.015 g (15 mg) for sedentary behaviour (SB):

```python
data.loc[:, ["MVPA", "SB"]] = paat.calculate_pa_levels(data, sample_freq, mvpa_cutpoint=0.069, sb_cutpoint=0.015)
```
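The following sketch illustrates one common vector-magnitude metric, the Euclidean norm minus one g (ENMO), averaged over 1-second epochs and compared against the two cut-points. Whether `calculate_pa_levels()` uses exactly this metric, epoch length and boundary handling is an assumption we have not verified, and the helper `classify_epochs()` is hypothetical. Epochs that are neither MVPA nor SB would correspond to light activity.

```python
import numpy as np
import pandas as pd


def classify_epochs(data, mvpa_cutpoint=0.069, sb_cutpoint=0.015, epoch="1s"):
    """Hypothetical helper: flag MVPA and SB per epoch using ENMO (in g).

    An assumption for illustration; PAAT's calculate_pa_levels() may use a
    different metric, epoch length or boundary handling.
    """
    vector_magnitude = np.sqrt((data[["X", "Y", "Z"]] ** 2).sum(axis=1))
    enmo = (vector_magnitude - 1.0).clip(lower=0.0)  # subtract 1 g of gravity
    epoch_mean = enmo.resample(epoch).mean()
    levels = pd.DataFrame({
        "MVPA": epoch_mean >= mvpa_cutpoint,
        "SB": epoch_mean < sb_cutpoint,
    })
    # Broadcast each epoch's labels back to the individual samples.
    return levels.reindex(data.index, method="ffill")


# Two seconds at 100 Hz: at rest (1 g on Z), then 0.2 g above gravity.
index = pd.date_range("2022-01-03 10:20:00", periods=200, freq="10ms")
xyz = np.zeros((200, 3))
xyz[:100, 2] = 1.0
xyz[100:, 2] = 1.2
data = pd.DataFrame(xyz, index=index, columns=["X", "Y", "Z"])

levels = classify_epochs(data)
print(levels.iloc[[0, -1]])
```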

From these columns, we can now create a single column that assigns one label to each time step:

```python
data.loc[:, "Activity"] = paat.create_activity_column(data, columns=["SB", "MVPA", "Sleep", "Non Wear Time"])
```
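We assume, without having verified it in PAAT's source, that `create_activity_column()` labels each row with the name of a column that is `True` in that row, with later entries in `columns` taking precedence. The hypothetical `combine_flags()` below sketches that logic; its `"Light"` default for rows without any flag is likewise an illustrative assumption.

```python
import pandas as pd


def combine_flags(data, columns, default="Light"):
    """Hypothetical helper: collapse boolean flag columns into one label.

    Assumes later entries in `columns` override earlier ones; rows without
    any flag get `default` (an illustrative choice, not necessarily PAAT's).
    """
    labels = pd.Series(default, index=data.index, dtype=object)
    for column in columns:
        labels[data[column].astype(bool)] = column
    return labels


flags = pd.DataFrame({
    "SB":            [True,  True,  False, False],
    "MVPA":          [False, False, True,  False],
    "Sleep":         [True,  False, False, False],
    "Non Wear Time": [False, False, False, False],
})
labels = combine_flags(flags, ["SB", "MVPA", "Sleep", "Non Wear Time"])
print(labels.tolist())  # ['Sleep', 'SB', 'MVPA', 'Light']
```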

The columns `"Non Wear Time"`, `"Sleep"`, `"MVPA"` and `"SB"` are no longer needed, so we keep only the acceleration data and the newly created `"Activity"` column:

```python
# keep only the acceleration data and the combined activity labels
data = data[["X", "Y", "Z", "Activity"]]
data
```

| Timestamp | X | Y | Z | Activity |
|---|---|---|---|---|
| 2022-01-03 10:20:00.000 | 0.804688 | 0.621094 | 0.085938 | Sleep |
| 2022-01-03 10:20:00.010 | 0.804688 | 0.597656 | 0.085938 | Sleep |
| 2022-01-03 10:20:00.020 | 0.804688 | 0.585938 | 0.078125 | Sleep |
| 2022-01-03 10:20:00.030 | 0.804688 | 0.582031 | 0.074219 | Sleep |
| 2022-01-03 10:20:00.040 | 0.800781 | 0.585938 | 0.074219 | Sleep |
| ... | ... | ... | ... | ... |
| 2022-01-03 10:29:59.950 | 0.289062 | 0.960938 | -0.050781 | SB |
| 2022-01-03 10:29:59.960 | 0.289062 | 0.960938 | -0.054688 | SB |
| 2022-01-03 10:29:59.970 | 0.285156 | 0.957031 | -0.054688 | SB |
| 2022-01-03 10:29:59.980 | 0.289062 | 0.957031 | -0.054688 | SB |
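A natural next step is to summarise how much time was spent in each category. Since each row covers `1 / sample_freq` seconds, counting the rows per label and dividing by the sampling frequency yields durations. The sketch uses made-up labels and assumes 100 Hz; with the real data you would apply the same expression to `data["Activity"]`.

```python
import pandas as pd

sample_freq = 100  # Hz; assumed here, use the value returned by read_gt3x()
# Hypothetical labelled data: 3 minutes of SB followed by 1 minute of MVPA.
index = pd.date_range("2022-01-03 10:20:00", periods=4 * 60 * sample_freq, freq="10ms")
activity = pd.Series(["SB"] * (3 * 60 * sample_freq) + ["MVPA"] * (60 * sample_freq),
                     index=index, name="Activity")

# Each row covers 1 / sample_freq seconds, so row counts convert to minutes.
minutes_per_label = activity.value_counts() / sample_freq / 60
print(minutes_per_label)
```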


Finally, when you are done processing, export your data. Pandas' [to_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) writes the DataFrame to a Comma-Separated Values (CSV) file, which can, for example, be imported into Excel:

```python
data.to_csv("data/raw_data.csv")
```
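The CSV file can be loaded back into Python with `pd.read_csv()`. Passing `index_col=0` and `parse_dates=True` restores the timestamps as a `DatetimeIndex`; without `index_col`, the unnamed index column is read as an ordinary column called `Unnamed: 0`. The sketch writes a tiny made-up DataFrame to an in-memory buffer instead of `data/raw_data.csv`, and `index_label="Timestamp"` is an optional addition that gives the index column a name.

```python
import io

import pandas as pd

# Tiny made-up DataFrame shaped like the processed data above.
index = pd.date_range("2022-01-03 10:20:00", periods=3, freq="10ms")
data = pd.DataFrame({
    "X": [0.80, 0.80, 0.79],
    "Y": [0.62, 0.60, 0.59],
    "Z": [0.09, 0.09, 0.08],
    "Activity": ["Sleep", "Sleep", "Sleep"],
}, index=index)

buffer = io.StringIO()  # stands in for a file such as "data/raw_data.csv"
data.to_csv(buffer, index_label="Timestamp")
buffer.seek(0)

# index_col=0 makes the first column the index; parse_dates=True parses it as datetimes.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(restored)
```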

You can also compute ActiLife-style activity counts, here in 10-second epochs (`"10s"`), and export them:

```python
counts = paat.calculate_actigraph_counts(data, sample_freq, "10s")
counts.to_csv("data/count_data.csv")
```
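If you later need longer epochs, 10-second counts can be summed into longer ones, because counts accumulate over time. The sketch uses a made-up `counts` DataFrame with one column per axis; that layout is an assumption, so check the column names of the DataFrame returned by `calculate_actigraph_counts()` before adapting it.

```python
import pandas as pd

# Hypothetical 10-second count epochs for three axes (values are made up).
index = pd.date_range("2022-01-03 10:20:00", periods=12, freq="10s")
counts = pd.DataFrame({
    "Y": list(range(12)),
    "X": [5] * 12,
    "Z": [0, 10] * 6,
}, index=index)

# Six 10-second epochs add up to one 60-second epoch.
counts_60s = counts.resample("60s").sum()
print(counts_60s)
```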