# Preparing Your Data for wearablehrv 

After you have run the individual pipeline and preprocessed and analyzed the data for each participant, device, and condition, save each participant's data as `[ParticipantID].csv`, for instance `P01.csv`, `P02.csv`, ..., `P10.csv`. These files are then ready for the group pipeline of `wearablehrv`.

<div style="border:1px solid; padding:10px; border-radius:5px; margin:10px 0;">

**Note**: Throughout the example notebooks and in the code, we use the term "<u>criterion</u>" for the device that all other devices are compared against. In the literature, this is also called the "reference system," "ground truth," or "gold standard." The criterion is usually an electrocardiography (ECG) device.

</div>

## Previous Steps

If you have not done so already, first go through the following notebooks:

- [How to prepare your data for the individual pipeline](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/individual_pipeline/1.individual_data_preparation.ipynb)
- [Preprocess your data](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/individual_pipeline/2.individual_data_preprocessing.ipynb)
- [Analyze your data](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/individual_pipeline/3.individual_data_analysis.ipynb)
- [Plot your data](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/individual_pipeline/4.individual_data_plotting.ipynb)


## Shape of Datasets

Place all your participant files in a single folder. We suggest naming each file with the participant's ID, like this:

- P01.csv
- P02.csv
- etc.

Each file should contain the table you created and saved with the individual pipeline for that participant.

**Note: Do not place any other files in this folder.**
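Because every file in the folder is treated as a participant file, it can help to check the folder before importing. The sketch below uses a hypothetical helper, `check_data_folder`, which is not part of `wearablehrv`. It assumes participant files follow the `P<number>.csv` naming suggested above and reports everything else it finds, including hidden files such as `.DS_Store`.

```python
import os
import re

# Hypothetical helper (not part of wearablehrv).
# ASSUMPTION: participant files are named like P01.csv, P02.csv, ...
PARTICIPANT_FILE = re.compile(r"^P\d+\.csv$")


def check_data_folder(path):
    """Split the folder contents into (participant_files, unexpected_files), both sorted."""
    participant_files, unexpected_files = [], []
    for name in sorted(os.listdir(path)):
        if PARTICIPANT_FILE.match(name):
            participant_files.append(name)
        else:
            unexpected_files.append(name)
    return participant_files, unexpected_files
```

For example, `ok, extra = check_data_folder(path)` followed by a check that `extra` is empty flags stray files before they reach the group import. Adapt the regular expression if your participant IDs use a different pattern.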

To try out the following functionality, you can download example data from 10 participants. These data have already been run through the individual pipeline and saved as .csv files. The code below first clears the `wearablehrv` cache, then downloads only the files relevant to the group analysis.

For your own analysis, replace `path` with the location of your datasets that have gone through the individual pipeline.

In [None]:
# Import the package
import wearablehrv

In [None]:
# Clear the wearablehrv cache to make sure no old data is left over
wearablehrv.data.clear_wearablehrv_cache()

# Download and save only the files relevant for the group analysis (P01.csv ... P10.csv)
participant_files = [f"P{i:02d}.csv" for i in range(1, 11)]
path = wearablehrv.data.download_data_and_get_path(participant_files)

In [None]:
import os

# Check which files were downloaded and where they are located
print(path)
os.listdir(path)

Modify the following variables to match your datasets:

* `conditions`: the experimental conditions you used, for example: `['sitting', 'breathing']`.
* `devices`: the devices you used, for example: `['empatica', 'heartmath', 'vu']`. Make sure these names are exactly the same as the device names used when saving your files. Always specify the criterion device as the last element of the list.
* `criterion`: the name of your criterion device, for example: `vu`. Make sure it is exactly the same as the name used when saving your files.
* `features`: the HRV features you wish to include in your final group analysis.

**Note:** Make sure to include `nibi_after_cropping` and `artefact` in your `features` list.
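The two rules above (criterion last in `devices`, required entries in `features`) can be checked before importing. The following is a minimal sketch using a hypothetical helper, `validate_settings`, which is not part of `wearablehrv` and encodes only the rules stated here.

```python
# Hypothetical helper (not part of wearablehrv): checks only the rules listed above.
REQUIRED_FEATURES = ("nibi_after_cropping", "artefact")


def validate_settings(devices, criterion, features):
    """Raise ValueError if the settings break one of the rules above."""
    # Rule 1: the criterion device must be the last element of `devices`
    if not devices or devices[-1] != criterion:
        raise ValueError(
            f"The criterion '{criterion}' must be the last element of devices: {devices}"
        )
    # Rule 2: the required features must be present in `features`
    missing = [f for f in REQUIRED_FEATURES if f not in features]
    if missing:
        raise ValueError(f"features is missing required entries: {missing}")


# Passes silently for settings that follow both rules
validate_settings(
    devices=["empatica", "heartmath", "vu"],
    criterion="vu",
    features=["rmssd", "nibi_after_cropping", "artefact"],
)
```

Failing early with a clear message is usually easier to debug than tracing an error that only surfaces later in the group analysis.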

In [None]:
# Define your experimental conditions
conditions = ['sitting', 'arithmetic', 'recovery', 'standing', 'breathing', 'neurotask', 'walking', 'biking']

# Define the devices you want to validate against the criterion.
# Note: MAKE SURE THE CRITERION DEVICE IS THE LAST ONE IN THE LIST
devices = ["kyto", "heartmath", "rhythm", "empatica", "vu"]

# Define the name of the criterion device
criterion = "vu"

# HRV features to include: make sure at least these features are present, but feel free to add more
features = ["rmssd", "hf", "pnni_50", "mean_hr", "sdnn", "nibi_after_cropping", "artefact"]

## Importing Data

Once you have set these variables, **it is very easy to read the files of all your participants in one go**. Just run the following code:

In [None]:
data, file_names = wearablehrv.group.import_data(path, conditions, devices, features)

`data` is now a nested dictionary indexed by device, then feature, then condition. For instance, to retrieve the RMSSD values of the "kyto" device in the "biking" condition for all participants, run:

In [None]:
data["kyto"]["rmssd"]["biking"]
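Since `data` is a plain nested dictionary, it can be traversed with ordinary Python. The sketch below builds a small hypothetical `toy_data` dictionary with the same `[device][feature][condition]` indexing and computes a per-device mean across participants. It assumes, for illustration only, that each participant contributes a one-element list such as `[42.0]` (consistent with the `[nan]` entries handled by `nan_handling`) and that missing participants appear as empty lists; inspect your own `data` to confirm the actual shape before reusing it.

```python
from statistics import mean

# Hypothetical toy data with the same data[device][feature][condition] indexing.
# ASSUMPTION: one inner list per participant, e.g. [42.0]; [] when the value is missing.
toy_data = {
    "kyto": {"rmssd": {"biking": [[21.0], [35.0], []]}},
    "vu": {"rmssd": {"biking": [[24.0], [33.0], [40.0]]}},
}


def condition_mean(data, device, feature, condition):
    """Mean of a feature over participants, skipping participants without a value."""
    values = [entry[0] for entry in data[device][feature][condition] if entry]
    return mean(values) if values else None


for device in ["kyto", "vu"]:
    print(device, condition_mean(toy_data, device, "rmssd", "biking"))
```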

### Handling NaN Values

When you run the individual pipeline, the time-domain or frequency-domain features may not be calculated if a device or condition is missing for a participant. In these cases, the values are stored as NaN, which interferes with some of the upcoming statistical analyses. The `nan_handling` function below converts any `[nan]` values, if present, into empty lists `[]`, which solves this issue.

**Note:** We recommend always running the following code after reading your files.

In [None]:
data = wearablehrv.group.nan_handling(data, devices, features, conditions)
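To make the transformation concrete, here is a minimal sketch of the idea described above. It is not the library's implementation of `nan_handling`; `replace_nan_entries` is a hypothetical name, and the sketch assumes each participant's entry is a list such as `[42.0]` or `[nan]`.

```python
import math


# Minimal sketch of the idea behind nan_handling (not wearablehrv's code).
# ASSUMPTION: each participant's entry is a list such as [42.0] or [nan].
def replace_nan_entries(data, devices, features, conditions):
    """Return a copy of data in which every entry containing NaN becomes []."""
    cleaned = {}
    for device in devices:
        cleaned[device] = {}
        for feature in features:
            cleaned[device][feature] = {}
            for condition in conditions:
                entries = data[device][feature][condition]
                cleaned[device][feature][condition] = [
                    [] if any(isinstance(v, float) and math.isnan(v) for v in entry)
                    else list(entry)
                    for entry in entries
                ]
    return cleaned


toy = {"kyto": {"rmssd": {"biking": [[21.0], [float("nan")]]}}}
print(replace_nan_entries(toy, ["kyto"], ["rmssd"], ["biking"]))
# {'kyto': {'rmssd': {'biking': [[21.0], []]}}}
```

Returning a new dictionary instead of modifying `data` in place mirrors how the notebook reassigns the result (`data = ...`), and leaves the original values available for inspection.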

That's it! At this point, you have read the data of all individual participants and removed the NaN values.

## Next Steps

You're now ready to move on to the next notebook examples. 

Continue by consulting: 

- [Determine the signal quality of your wearables](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/group_pipeline/2.group_signal_quality.ipynb)