# Signal Quality

After loading the data from all your participants, including the calculated HR and HRV outcome variables for every device and experimental condition, an important next step is to determine the overall signal quality for each device and condition.

`wearablehrv` makes this easy with a comprehensive signal quality check function. It can also generate informative .csv reports and visualizations that help you quickly understand the signal quality of your devices.

<div style="border:1px solid; padding:10px; border-radius:5px; margin:10px 0;">

**Note**: Throughout the example notebooks and the code, we use the term "<u>criterion</u>" to refer to the device against which all other devices are compared. The literature also calls this the "reference system," "ground truth," or "gold standard." It is usually an electrocardiography (ECG) device.

</div>

## Previous Steps

If you have not done so, first take a look at the following notebooks:

- [How to prepare your data for the group pipeline](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/group_pipeline/1.group_data_preparation.ipynb)

In [None]:
# Importing Module
import wearablehrv

The code in the following cell has been explained in the previous notebook. Run it, so we can continue with the examples in this notebook.

In [None]:
# Clear any previously cached example data
wearablehrv.data.clear_wearablehrv_cache()
# Download the example files for ten participants and get the path to them
path = wearablehrv.data.download_data_and_get_path(["P01.csv", "P02.csv", "P03.csv", "P04.csv", "P05.csv", "P06.csv", "P07.csv", "P08.csv", "P09.csv", "P10.csv"])
# Experimental conditions, devices, criterion device, and features of interest
conditions = ['sitting', 'arithmetic', 'recovery', 'standing', 'breathing', 'neurotask', 'walking', 'biking']
devices = ["kyto", "heartmath", "rhythm", "empatica", "vu"]
criterion = "vu"
features = ["rmssd", "hf", 'pnni_50', 'mean_hr', 'sdnn', 'nibi_after_cropping', 'artefact']
# Import the data and handle missing values
data, file_names = wearablehrv.group.import_data (path, conditions, devices, features)
data = wearablehrv.group.nan_handling (data, devices, features, conditions)

## Calculating Signal Quality

The `signal_quality` function helps you report signal quality and, optionally, exclude readings from devices whose signal quality is deemed insufficient according to certain thresholds. Excluding poor data is a matter of preference.

Two key metrics are used to determine signal quality:

- `nibi_after_cropping`: The number of detected beat-to-beat intervals. A difference of more than `ibi_threshold` (20% by default) between a given device and the criterion device for the same participant and condition indicates a poor signal.

- `artefact`: The number of detected artefacts. If the artefacts represent more than `artefact_threshold` (20% by default) of the detected beats in the given device for the same participant and condition, this also indicates a poor signal.

If `exclude = True`, <u>all feature</u> values of a poor-quality reading (that device, participant, and condition) will be replaced with empty lists. Otherwise, they are kept for further analysis.

Both the `ibi_threshold` and `artefact_threshold` parameters can be tuned according to the specific needs of your analysis. A lower value makes the criteria more stringent, leading to more readings being excluded, while a higher value makes the criteria more lenient, leading to fewer exclusions.

When very few beats are detected in a signal, you may want to flag that signal as missing. To do this, set `manual_missing=True` and define a `missing_threshold`. For example, setting `missing_threshold` to 10 means that if only 10 beats are detected, the signal will be flagged as missing.
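The decision rule described above can be sketched in a few lines of Python. This is only an illustration, not the library's implementation: the function name `classify_signal`, its argument names, expressing the beat difference relative to the criterion count, treating zero detected beats as missing, and the `<=` comparison for `missing_threshold` are all assumptions made for this sketch.

```python
def classify_signal(n_beats_device, n_beats_criterion, n_artefacts,
                    ibi_threshold=0.20, artefact_threshold=0.20,
                    manual_missing=False, missing_threshold=10):
    """Hypothetical sketch of the quality decision for one reading.

    Assumes n_beats_criterion is greater than zero.
    """
    # Assumption: a reading without any detected beats counts as missing
    if n_beats_device == 0:
        return "Missing"
    # Assumption: "only N beats" is read as "N beats or fewer"
    if manual_missing and n_beats_device <= missing_threshold:
        return "Missing"
    # Relative difference in detected beats versus the criterion device
    beat_diff = abs(n_beats_device - n_beats_criterion) / n_beats_criterion
    # Share of the device's detected beats that are artefacts
    artefact_ratio = n_artefacts / n_beats_device
    if beat_diff > ibi_threshold or artefact_ratio > artefact_threshold:
        return "Poor"
    return "Acceptable"


print(classify_signal(300, 310, 5))    # Acceptable
print(classify_signal(200, 300, 5))    # Poor (beat count differs by about 33%)
print(classify_signal(300, 300, 90))   # Poor (30% artefacts)
print(classify_signal(8, 300, 0, manual_missing=True))  # Missing
```

Lowering either threshold in this sketch turns more readings into "Poor", which mirrors the stricter-versus-lenient trade-off described above.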

After processing, the code generates two pandas dataframes that can be saved if `save_as_csv = True`:

- `quality_report1.csv`: Detailed report showing for each participant, device, and condition, the number of detected beats and artefacts, and the decision to keep or exclude the data.

- `quality_report2.csv`: Summary report showing the total count and percentage of decisions ("Acceptable", "Poor", "Missing") for each device and condition.

The function saves the outputs of the signal quality assessment in two variables, `summary_df` and `quality_df`, which can be used for further plotting in the upcoming functions.
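Conceptually, the summary report is a count of the decisions in the detailed report, grouped by device and condition. The sketch below reproduces that idea on a small made-up table with `pandas.crosstab`. The column names `Device`, `Condition`, and `Decision` follow the ones used when filtering `quality_df` in this notebook; the data values and the variable names `toy_quality`, `counts`, and `percentages` are invented for illustration, and the actual layout of `summary_df` may differ.

```python
import pandas as pd

# Made-up detailed report: one row per participant, device, and condition
toy_quality = pd.DataFrame({
    "Device": ["heartmath"] * 4 + ["empatica"] * 4,
    "Condition": ["sitting"] * 8,
    "Decision": ["Acceptable", "Acceptable", "Acceptable", "Poor",
                 "Poor", "Poor", "Missing", "Acceptable"],
})

# Count each decision per device and condition (absent decisions become 0)
counts = pd.crosstab(
    [toy_quality["Device"], toy_quality["Condition"]],
    toy_quality["Decision"],
)

# Convert the counts to row-wise percentages
percentages = counts.div(counts.sum(axis=1), axis=0) * 100

print(counts)
print(percentages.round(1))
```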

At the end of the process, the 'artefact' and 'nibi_after_cropping' entries are removed from the `data` and `features` variables.

**Note: it is important to run this code even if you do not wish to save the reports or exclude poor-quality data.**

In [None]:
data, features, summary_df, quality_df = wearablehrv.group.signal_quality (data, path, conditions, devices, features, criterion,  file_names, exclude = False, save_as_csv = False, ibi_threshold = 0.20, artefact_threshold = 0.20, manual_missing=False, missing_threshold=10)

For each device, participant, and condition, `quality_df` shows how many beats were detected in that device and in the criterion device, and how many artefacts were detected in the device. It also gives the decision made based on the thresholds you defined: whether the signal quality is acceptable or poor, or whether the data is missing.

In [None]:
quality_df

This makes it convenient to check, for instance, what the signal quality looks like for the "heartmath" device in the "sitting" condition, by running the following code:

In [None]:
quality_df[(quality_df["Device"] == "heartmath") & (quality_df["Condition"] == "sitting")]["Decision"].value_counts()
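If proportions are more useful than raw counts, pandas' `value_counts` accepts `normalize=True`. The following self-contained sketch shows this on a made-up stand-in for `quality_df`; the values and the name `toy_quality_df` are invented, and only the column names are taken from the cell above.

```python
import pandas as pd

# Made-up stand-in for quality_df with the same three columns
toy_quality_df = pd.DataFrame({
    "Device": ["heartmath", "heartmath", "heartmath", "heartmath", "rhythm", "rhythm"],
    "Condition": ["sitting", "sitting", "sitting", "sitting", "sitting", "biking"],
    "Decision": ["Acceptable", "Acceptable", "Acceptable", "Poor", "Poor", "Missing"],
})

# Select one device in one condition
mask = (toy_quality_df["Device"] == "heartmath") & (toy_quality_df["Condition"] == "sitting")

# normalize=True returns fractions that sum to 1 instead of counts
shares = toy_quality_df.loc[mask, "Decision"].value_counts(normalize=True)
print(shares)
```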

`summary_df` is a dataframe that summarizes the decisions for each device and condition. Some interesting patterns stand out: for instance, for the Empatica device, out of 4 available participants (Total = 4), none had acceptable signal quality in the biking condition.

What constitutes a poor signal again? With the thresholds used here, a signal is considered poor if either more than 20% of the detected beats were artefacts, or the number of beats detected by Empatica in this condition deviated by more than 20% from the criterion (VU) device.

Obviously, you can change these thresholds and get different results.
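To get a feel for how sensitive the outcome is to these thresholds, the toy sweep below applies the two rules to a made-up table at three threshold values. Everything here is hypothetical: the column names `device_beats`, `criterion_beats`, and `artefacts`, the helper `poor_fraction`, and the numbers themselves. It does not call `wearablehrv`, and it expresses the beat difference relative to the criterion count, which is an assumption.

```python
import pandas as pd

# Made-up readings: beats detected by a device, by the criterion, and artefacts
toy = pd.DataFrame({
    "device_beats":    [300, 280, 250, 310, 150],
    "criterion_beats": [305, 300, 300, 300, 300],
    "artefacts":       [3, 20, 40, 70, 5],
})


def poor_fraction(df, ibi_threshold, artefact_threshold):
    """Fraction of readings flagged as poor under the given thresholds."""
    beat_diff = (df["device_beats"] - df["criterion_beats"]).abs() / df["criterion_beats"]
    artefact_ratio = df["artefacts"] / df["device_beats"]
    poor = (beat_diff > ibi_threshold) | (artefact_ratio > artefact_threshold)
    return float(poor.mean())


# Stricter (lower) thresholds flag more readings as poor
for threshold in (0.05, 0.10, 0.20):
    print(threshold, poor_fraction(toy, threshold, threshold))
```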

In [None]:
summary_df

## Plotting Signal Quality

After using the `signal_quality` function, you now have two plotting options to visually inspect your signal quality. 

To use `signal_quality_plot1`, you first need to define a `condition_mapping` dictionary that groups the conditions so they can be plotted in a more meaningful way. The grouping can be based on your estimate of the movement involved in each condition (or objectively on accelerometer data, if available), or on anything else you wish.

Set `criterion_exclusion = True` to exclude the criterion device from the plot's calculations. With `device_selection = False`, the plot pools all devices together; to plot the signal quality for a specific device, set `device_selection = True` and select the device, for instance with `device = 'empatica'`.

An example of a `condition_mapping` dictionary is as follows:

In [None]:
# Categorizing conditions
condition_mapping = {
    'sitting': 'no movement', 
    'recovery': 'no movement', 
    'breathing': 'no movement', 
    'standing': 'no movement',
    'arithmetic': 'subtle movement', 
    'neurotask': 'subtle movement',
    'walking': 'involves movement', 
    'biking': 'involves movement'
}
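
To see what such a mapping does, the sketch below applies a shortened version of it to a made-up table of decision counts and pools the counts per category. The table layout (columns `Condition`, `Acceptable`, `Poor`, `Missing`), the names `toy_mapping` and `toy_summary`, and all numbers are assumptions for illustration; the real `summary_df` may be organized differently. The plotting function takes the mapping directly, so you do not need to do this yourself.

```python
import pandas as pd

# Shortened, hypothetical mapping from condition to movement category
toy_mapping = {
    "sitting": "no movement",
    "standing": "no movement",
    "walking": "involves movement",
    "biking": "involves movement",
}

# Made-up decision counts per condition
toy_summary = pd.DataFrame({
    "Condition":  ["sitting", "standing", "walking", "biking"],
    "Acceptable": [8, 7, 3, 2],
    "Poor":       [1, 2, 5, 6],
    "Missing":    [1, 1, 2, 2],
})

# Attach the category to each condition, then pool the counts per category
toy_summary["Category"] = toy_summary["Condition"].map(toy_mapping)
by_category = toy_summary.groupby("Category")[["Acceptable", "Poor", "Missing"]].sum()

# Express each decision as a percentage of the readings in its category
percent = by_category.mul(100).div(by_category.sum(axis=1), axis=0)
print(percent)
```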

In [None]:
wearablehrv.group.signal_quality_plot1 (summary_df, condition_mapping, criterion, device_selection = False, device=None, criterion_exclusion = True, x_label = "Condition Categories")

You can now see that, in the conditions that involve movement (defined here as the "walking" and "biking" conditions), 48.75% of the signals were flagged as poor based on the defined criteria. A further 25% are missing, and only 26.25% are acceptable.

If you wish to see for a specific device, let's say, "heartmath," you can run the following code:

In [None]:
wearablehrv.group.signal_quality_plot1 (summary_df, condition_mapping, criterion, device_selection = True, device="heartmath", criterion_exclusion = True, x_label = "Condition Categories")

As you can see, this looks a lot better!

You can also plot the signal quality for each device, regardless of the conditions, using the `signal_quality_plot2` function. If you wish to zoom into a specific condition, set `condition_selection = True`, and select the condition, for instance: `condition = "sitting"`.

In [None]:
wearablehrv.group.signal_quality_plot2 (summary_df, condition_selection=False, condition=None)

Here, for instance, you can see that the "rhythm" device shows poor signal quality in 73.75% of cases when all conditions are pooled. What if you zoom into, say, the "sitting" condition?

In [None]:
wearablehrv.group.signal_quality_plot2 (summary_df, condition_selection=True, condition="sitting")

Then it's a bit better (50% poor signal quality).

That's it! At this point, you should have been able to determine your signal quality, which is an essential component of reporting the validation of the devices.

## Next Steps

You're now ready to move on to the next notebook examples. 

Continue by consulting: 

- [Perform four major statistical analyses to determine validity](https://github.com/Aminsinichi/wearable-hrv/blob/master/docs/examples/group_pipeline/3.group_data_analysis.ipynb)