### Testing the DIHC Feature Manager Package

In [1]:
# Uncomment to install ipywidgets and tqdm (used for the progress bars) and enable the Jupyter widget extension
# %pip install ipywidgets tqdm
# !jupyter nbextension enable --py widgetsnbextension

#### Settings and Data Reading

#####  Importing the package
Load the package "DIHC_FeatureManager", which is in the same directory as this notebook (or your main Python script/notebook).

In [2]:
# Importing necessary modules
import numpy as np
import pandas as pd
from DIHC_FeatureManager.DIHC_FeatureManager import *  # provides DIHC_FeatureManager and DIHC_FeatureGroup used below

##### Data loading
Read the sample data from the file "signal_data.csv", which is in the same directory as this notebook (or your main Python script/notebook).

In [3]:
print(f'Data reading started...')
sample_df = pd.read_csv('./signal_data.csv')
print(f'Data reading completed...')
print(f"Data read from file: ")
sample_df

Data reading started...
Data reading completed...
Data read from file: 


| | signal | label |
|---|---|---|
| 0 | -17.777778 | 0 |
| 1 | 0.195360 | 0 |
| 2 | 0.195360 | 0 |
| 3 | 0.586081 | 0 |
| 4 | 0.195360 | 0 |
| ... | ... | ... |
| 921595 | -33.797314 | 0 |
| 921596 | -27.545788 | 0 |
| 921597 | -17.777778 | 0 |
| 921598 | -8.791209 | 0 |


##### Data inspection
Inspect the column names, shape, and other basic information of the data.

In [4]:
print(f'Columns available in the dataframe: {sample_df.columns}')
print(f'Shape of the dataframe: {sample_df.shape}')
print(f'Dataframe details: ')
sample_df.info()

Columns available in the dataframe: Index(['signal', 'label'], dtype='object')
Shape of the dataframe: (921600, 2)
Dataframe details: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 921600 entries, 0 to 921599
Data columns (total 2 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   signal  921600 non-null  float64
 1   label   921600 non-null  int64  
dtypes: float64(1), int64(1)
memory usage: 14.1 MB


##### Settings and partial sample data access
This data file contains 2 columns: "signal" and "label".
Only the "signal" column, which holds the time-series data, is used for feature extraction.
For simplicity, only the first `total_segments * segment_length` seconds of the signal are used (4 segments of 5 s = 20 s, i.e. 5120 samples at 256 Hz).
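As a quick check of the arithmetic above, the sketch below computes how many samples the sub-sample covers. It assumes the whole file was recorded at the 256 Hz used in this notebook; under that assumption the 921600 rows correspond to one hour of signal.

```python
# Settings used in this notebook
signal_frequency = 256   # Hz (assumed to apply to the whole file)
segment_length = 5       # seconds per segment
total_segments = 4
rows_in_file = 921600    # from sample_df.shape

samples_per_segment = segment_length * signal_frequency   # 5 s * 256 Hz = 1280
samples_used = total_segments * samples_per_segment       # 4 * 1280 = 5120
recording_minutes = rows_in_file / signal_frequency / 60  # 921600 / 256 / 60 = 60.0

print(samples_per_segment, samples_used, recording_minutes)
```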

In [5]:
# signal_frequency = 10
# segment_length = 5
# segment_overlap = 0
# total_segments = 40000

# signal_frequency = 32
# segment_length = 90*60
# segment_overlap = 0
# total_segments = 4

signal_frequency = 256
segment_length = 5
segment_overlap = 0
total_segments = 4

print(f"Signal frequency: {signal_frequency} Hz \nSegment length: {segment_length} s \nSegment overlap: {segment_overlap} s \nTotal segments: {total_segments}")

Signal frequency: 256 Hz 
Segment length: 5 s 
Segment overlap: 0 s 
Total segments: 4


In [6]:
print(f'Data sub-sampling started...')
# Keep total_segments * segment_length seconds of the signal at signal_frequency Hz
# .loc is label-based and includes the end label, hence the -1
sample_data = sample_df.loc[:(total_segments*segment_length)*signal_frequency-1, 'signal'].values
# sample_data = sample_df['signal'].values  # use the whole signal instead
print(f'Data sub-sampling completed...')

print(f"Sample data shape: {sample_data.shape}")
sample_data

Data sub-sampling started...
Data sub-sampling completed...
Sample data shape: (5120,)


array([-17.77777778,   0.19536019,   0.19536019, ...,  -0.58608059,
         0.19536019,  -0.19536019])
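The `-1` in the `.loc` slice above is needed because `.loc` is label-based and includes the end label. With the default `RangeIndex` (labels 0, 1, 2, ...), this selects the same rows as the position-based `.iloc[:n]`. A minimal check on a toy DataFrame (hypothetical data, not the notebook's):

```python
import numpy as np
import pandas as pd

# Toy DataFrame with the same column layout and a default RangeIndex
df = pd.DataFrame({"signal": np.arange(10.0), "label": 0})
n = 4  # number of samples to keep

via_loc = df.loc[:n - 1, "signal"].to_numpy()  # label-based: end label included
via_iloc = df["signal"].iloc[:n].to_numpy()    # position-based: end label excluded

print(via_loc, via_iloc)
```

With a non-default index (for example timestamps), the two forms are no longer interchangeable, so `.iloc` is the safer choice when the intent is "the first n samples".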

#### Segmentation Test

##### Segmentation using `get_segments_for_data` function
- Create an object of the class "DIHC_FeatureManager" and call its method "get_segments_for_data" to split the data into segments.
- Change the parameters of "get_segments_for_data" (e.g., `segment_length`, `segment_overlap`) to obtain a different number of segments.
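The package's segmentation code is not shown in this notebook. As a rough mental model only, the sketch below splits a 1-D signal into fixed-length windows with an optional overlap, both given in seconds. The function name `segment_signal` is hypothetical, and dropping trailing samples that do not fill a whole window is an assumption; `get_segments_for_data` may behave differently.

```python
import numpy as np

def segment_signal(data, segment_length, segment_overlap, signal_frequency):
    """Hypothetical helper: split a 1-D signal into (possibly overlapping) windows.

    segment_length and segment_overlap are in seconds, signal_frequency in Hz.
    Trailing samples that do not fill a whole window are dropped (assumption).
    """
    data = np.asarray(data)
    window = int(segment_length * signal_frequency)                    # samples per window
    step = int((segment_length - segment_overlap) * signal_frequency)  # hop between window starts
    if step <= 0:
        raise ValueError("segment_overlap must be smaller than segment_length")
    if len(data) < window:
        return np.empty((0, window), dtype=data.dtype)
    n_segments = (len(data) - window) // step + 1
    starts = np.arange(n_segments) * step
    return np.stack([data[s:s + window] for s in starts])

x = np.arange(5120)
print(segment_signal(x, 5, 0, 256).shape, segment_signal(x, 5, 3, 256).shape)
```

With the notebook's settings (5 s segments, no overlap, 256 Hz, 5120 samples) this sketch yields 4 windows of 1280 samples, the same shape reported for `segmented_data_array`; with a 3 s overlap the hop shrinks to 512 samples and the sketch gives 8 windows.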

In [7]:
print(f'Data segmentation started...')

feature_manager = DIHC_FeatureManager()
segmented_data_array = feature_manager.get_segments_for_data(sample_data, segment_length=segment_length, segment_overlap=segment_overlap, signal_frequency=signal_frequency)
# segmented_data_array = feature_manager.get_segments_for_data(sample_data, segment_length=5, signal_frequency=signal_frequency)
# segmented_data_array = feature_manager.get_segments_for_data(sample_data, segment_length=5, segment_overlap=3, signal_frequency=signal_frequency)

print(f'Data segmentation completed...')

Data segmentation started...


Data segmentation completed...


##### Display the segmented data

In [8]:
print(f"Segmented data array shape: {segmented_data_array.shape}")
print(f"Extracted segments data: ")
segmented_data_array

Segmented data array shape: (4, 1280)
Extracted segments data: 


array([[-17.77777778,   0.19536019,   0.19536019, ..., -15.82417582,
        -15.43345543, -18.94993895],
       [-21.68498168, -18.94993895, -16.21489621, ...,  29.89010989,
         42.002442  ,  50.5982906 ],
       [ 59.97557998,  67.78998779,  74.43223443, ...,   0.58608059,
          3.32112332,   4.88400488],
       [  4.49328449,   2.93040293,   4.88400488, ...,  -0.58608059,
          0.19536019,  -0.19536019]])

#### Feature Extraction Test

##### Feature extraction using `extract_features_from_data` function
- Create an object of the class "DIHC_FeatureManager" and call its method "extract_features_from_data" to extract features from the data.
- Change the parameters of "extract_features_from_data" to extract different features.
- Optionally, remove computationally expensive features to save time and/or avoid running out of memory on longer segments.
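To make a few of the time-domain feature names concrete, the sketch below computes some of them for a single segment with NumPy only. It is an illustration under assumptions, not the package's implementation: standard deviation and variance use the population form (`ddof=0`), kurtosis is excess (Fisher) kurtosis, skewness is the biased moment estimator, and zero crossings count sign changes while ignoring exact zeros. The package's definitions may differ, and the helper name `basic_time_domain_features` is hypothetical.

```python
import numpy as np

def basic_time_domain_features(segment):
    """Hypothetical helper: a few time-domain features for one 1-D segment."""
    x = np.asarray(segment, dtype=float)
    mean = x.mean()
    centered = x - mean
    m2 = np.mean(centered ** 2)  # population variance (ddof=0)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    # Higher-order moments are undefined for a constant segment
    skewness = m3 / m2 ** 1.5 if m2 > 0 else float("nan")
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else float("nan")  # excess kurtosis
    signs = np.sign(x[x != 0])  # drop exact zeros before comparing signs
    return {
        "maximum": x.max(),
        "minimum": x.min(),
        "mean": mean,
        "median": np.median(x),
        "standardDeviation": np.sqrt(m2),
        "variance": m2,
        "kurtosis": kurtosis,
        "skewness": skewness,
        "numberOfZeroCrossing": int(np.count_nonzero(np.diff(signs))),
        "meanAbsoluteValue": np.mean(np.abs(x)),
    }

print(basic_time_domain_features([-1.0, 1.0, -1.0, 1.0]))
```

Applied to each row of `segmented_data_array`, e.g. `pd.DataFrame([basic_time_domain_features(s) for s in segmented_data_array])`, this gives one row per segment, mirroring the layout of `feature_df`.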

###### Select features and/or remove computationally expensive features

In [9]:
# Uncomment one of the following lines to choose the feature set (e.g., to drop computationally expensive features)

# feature_names = None  # All features; None behaves like [DIHC_FeatureGroup.all]
# feature_names = [DIHC_FeatureGroup.tdNl, DIHC_FeatureGroup.fdLin]  # Specific groups: time-domain non-linear and frequency-domain linear features
feature_names = DIHC_FeatureGroup.remove_computationally_expensive_features( comp_exp_list_index=4 )  # All features except the level-4 computationally expensive ones
# feature_names = DIHC_FeatureGroup.remove_computationally_expensive_features( feature_list=[DIHC_FeatureGroup.tdNl, DIHC_FeatureGroup.fdLin], comp_exp_list_index=4 )  # Time-domain non-linear and frequency-domain linear features, except the level-4 computationally expensive ones

print(f"Final feature list: {feature_names}")

Final feature list: [<DIHC_FeatureGroup.all: ['maximum', 'minimum', 'mean', 'median', 'standardDeviation', 'variance', 'kurtosis', 'skewness', 'numberOfZeroCrossing', 'positiveToNegativeSampleRatio', 'positiveToNegativePeakRatio', 'meanAbsoluteValue', 'approximateEntropy', 'sampleEntropy', 'permutationEntropy', 'singularValueDecompositionEntropy', 'fuzzyEntropy', 'shannonEntropy', 'renyiEntropy', 'lempelZivComplexity', 'hjorthMobility', 'hjorthComplexity', 'fisherInfo', 'petrosianFd', 'katzFd', 'higuchiFd', 'detrendedFluctuation', 'fd_maximum', 'fd_minimum', 'fd_mean', 'fd_median', 'fd_standardDeviation', 'fd_variance', 'fd_kurtosis', 'fd_skewness', 'fd_maximum_alpha', 'fd_minimum_alpha', 'fd_mean_alpha', 'fd_median_alpha', 'fd_standardDeviation_alpha', 'fd_variance_alpha', 'fd_kurtosis_alpha', 'fd_skewness_alpha', 'fd_maximum_beta', 'fd_minimum_beta', 'fd_mean_beta', 'fd_median_beta', 'fd_standardDeviation_beta', 'fd_variance_beta', 'fd_kurtosis_beta', 'fd_skewness_beta', 'fd_maximum_

In [10]:
print(f'Feature extraction started...')

feature_manager = DIHC_FeatureManager()
feature_df = pd.DataFrame()

if feature_names is None:
    feature_df = feature_manager.extract_features_from_data(sample_data, segment_length=segment_length, segment_overlap=segment_overlap, signal_frequency=signal_frequency)
else:
    feature_df = feature_manager.extract_features_from_data(sample_data, feature_names=feature_names, segment_length=segment_length, segment_overlap=segment_overlap, signal_frequency=signal_frequency)
# feature_df = feature_manager.extract_features_from_data(sample_data, segment_length=segment_length, segment_overlap=segment_overlap, signal_frequency=signal_frequency)
# feature_df = feature_manager.extract_features_from_data(sample_data, segment_length=5, signal_frequency=signal_frequency)
# feature_df = feature_manager.extract_features_from_data(sample_data, segment_length=5, segment_overlap=4, signal_frequency=signal_frequency)
# feature_df = feature_manager.extract_features_from_data(sample_data, feature_names=[DIHC_FeatureGroup.fdNlPw, DIHC_FeatureGroup.fdNlPwBnd], segment_length=5, signal_frequency=signal_frequency)
# feature_df = feature_manager.extract_features_from_data(sample_data, feature_names=[DIHC_FeatureGroup.tdNlEn, DIHC_FeatureGroup.td], segment_length=5, signal_frequency=signal_frequency)
# feature_df = feature_manager.extract_features_from_data(sample_data, feature_names=[DIHC_FeatureGroup.tdNlEn, DIHC_FeatureGroup.tdNl], segment_length=5, signal_frequency=signal_frequency)

print(f'Feature extraction completed...')

Feature extraction started...
Data started segmenting for features: [<DIHC_FeatureGroup.all: ['maximum', 'minimum', 'mean', 'median', 'standardDeviation', 'variance', 'kurtosis', 'skewness', 'numberOfZeroCrossing', 'positiveToNegativeSampleRatio', 'positiveToNegativePeakRatio', 'meanAbsoluteValue', 'approximateEntropy', 'sampleEntropy', 'permutationEntropy', 'singularValueDecompositionEntropy', 'fuzzyEntropy', 'shannonEntropy', 'renyiEntropy', 'lempelZivComplexity', 'hjorthMobility', 'hjorthComplexity', 'fisherInfo', 'petrosianFd', 'katzFd', 'higuchiFd', 'detrendedFluctuation', 'fd_maximum', 'fd_minimum', 'fd_mean', 'fd_median', 'fd_standardDeviation', 'fd_variance', 'fd_kurtosis', 'fd_skewness', 'fd_maximum_alpha', 'fd_minimum_alpha', 'fd_mean_alpha', 'fd_median_alpha', 'fd_standardDeviation_alpha', 'fd_variance_alpha', 'fd_kurtosis_alpha', 'fd_skewness_alpha', 'fd_maximum_beta', 'fd_minimum_beta', 'fd_mean_beta', 'fd_median_beta', 'fd_standardDeviation_beta', 'fd_variance_beta', 'fd_

Feature extraction completed...


##### Display all features

In [11]:
print(f"For a total of {feature_df.shape[0]} segments, {feature_df.shape[1]} features were extracted")
print(f"The names of the features are: {feature_df.columns}")
print(f"Extracted features: ")
feature_df


For a total of 4 segments, 76 features were extracted
The names of the features are: Index(['maximum', 'minimum', 'mean', 'median', 'standardDeviation', 'variance',
       'kurtosis', 'skewness', 'numberOfZeroCrossing',
       'positiveToNegativeSampleRatio', 'positiveToNegativePeakRatio',
       'meanAbsoluteValue', 'approximateEntropy', 'sampleEntropy',
       'permutationEntropy', 'singularValueDecompositionEntropy',
       'fuzzyEntropy', 'shannonEntropy', 'renyiEntropy', 'lempelZivComplexity',
       'hjorthMobility', 'hjorthComplexity', 'fisherInfo', 'petrosianFd',
       'katzFd', 'higuchiFd', 'detrendedFluctuation', 'fd_maximum',
       'fd_minimum', 'fd_mean', 'fd_median', 'fd_standardDeviation',
       'fd_variance', 'fd_kurtosis', 'fd_skewness', 'fd_maximum_alpha',
       'fd_minimum_alpha', 'fd_mean_alpha', 'fd_median_alpha',
       'fd_standardDeviation_alpha', 'fd_variance_alpha', 'fd_kurtosis_alpha',
       'fd_skewness_alpha', 'fd_maximum_beta', 'fd_minimum_beta',
      

| | maximum | minimum | mean | median | standardDeviation | variance | kurtosis | skewness | numberOfZeroCrossing | positiveToNegativeSampleRatio | ... | fd_skewness_theta | fd_maximum_gamma | fd_minimum_gamma | fd_mean_gamma | fd_median_gamma | fd_standardDeviation_gamma | fd_variance_gamma | fd_kurtosis_gamma | fd_skewness_gamma | spectralEntropy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67.79 | -88.11 | 0.44 | 0.2 | 27.23 | 741.72 | 0.14 | -0.33 | 86.0 | 1.04 | ... | 0.22 | 7304.63 | 64.79 | 1324.17 | 757.75 | 1414.1 | 1999673.39 | 4.19 | 1.96 | 3.59 |
| 1 | 92.01 | -81.86 | 2.73 | 2.74 | 26.46 | 700.24 | 0.18 | 0.02 | 77.0 | 1.16 | ... | 0.7 | 5466.18 | 39.44 | 1173.54 | 874.03 | 1058.23 | 1119851.09 | 3.99 | 1.79 | 3.69 |
| 2 | 96.7 | -133.43 | -3.39 | -2.54 | 35.9 | 1288.69 | 0.9 | -0.08 | 82.0 | 0.89 | ... | 0.06 | 9289.86 | 50.07 | 1675.66 | 1001.09 | 1690.16 | 2856644.69 | 4.43 | 1.8 | 3.57 |
| 3 | 62.32 | -66.62 | 2.58 | 2.93 | 23.42 | 548.52 | 0.2 | -0.23 | 107.0 | 1.35 | ... | 0.25 | 5741.63 | 37.5 | 1103.65 | 748.27 | 1120.48 | 1255472.71 | 5.98 | 2.23 | 3.8 |


##### Save all features

In [12]:
# # save_file_path = './all_features_matlab.csv'
# save_file_path = './all_features_python.csv'
# feature_df.to_csv(save_file_path, index=False)
# print(f"All features successfully saved to: {save_file_path}")

#### Entropy (SampEn) Profile Test

##### Sample Entropy (SampEn) Profile extraction
- Create an object of the class "DIHC_FeatureManager" and call its method "extract_sampEn_profile_from_data" to extract the sample entropy (SampEn) profile from the data.
- Change the parameters of "extract_sampEn_profile_from_data" (e.g., `segment_length`, `segment_overlap`) to compute the SampEn profile under different segmentation settings.
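For reference, sample entropy compares templates of length m and m + 1: with B the number of template pairs of length m within tolerance r (Chebyshev distance, self-matches excluded) and A the same count for length m + 1, SampEn = -ln(A/B). An entropy profile evaluates this over a range of tolerances r instead of a single one. The sketch below assumes m = 2 and a user-supplied grid of r values; the package may choose its r values differently (for example from the distribution of pairwise distances), so its profile lengths and values need not match. The function names `sampen_profile` and `_pair_distances` are hypothetical, and the pairwise-distance matrix needs O(N^2) memory, so this is only suited to short segments such as the 1280-sample ones here.

```python
import numpy as np

def _pair_distances(x, length, n_templates):
    """Chebyshev distances for all template pairs i < j (self-matches excluded)."""
    # sliding_window_view requires NumPy >= 1.20
    templates = np.lib.stride_tricks.sliding_window_view(x, length)[:n_templates]
    dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
    i, j = np.triu_indices(n_templates, k=1)
    return dist[i, j]

def sampen_profile(x, r_values, m=2):
    """Hypothetical helper: SampEn evaluated for each tolerance in r_values."""
    x = np.asarray(x, dtype=float)
    n_templates = len(x) - m  # same template count for lengths m and m + 1
    d_m = _pair_distances(x, m, n_templates)
    d_m1 = _pair_distances(x, m + 1, n_templates)
    profile = []
    for r in r_values:
        b = np.count_nonzero(d_m <= r)   # matching pairs of length m
        a = np.count_nonzero(d_m1 <= r)  # matching pairs of length m + 1
        profile.append(-np.log(a / b) if a > 0 and b > 0 else np.inf)
    return np.array(profile)

rng = np.random.default_rng(0)
signal = rng.standard_normal(300)
r_values = np.array([0.2, 0.5, 1.0]) * np.std(signal)
print(sampen_profile(signal, r_values))
```

Because a pair that matches at length m + 1 also matches at length m, A never exceeds B, so every finite value is non-negative; larger tolerances admit more matches and typically give lower entropy, which is consistent with the decreasing values seen in the profile output of this notebook.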

In [13]:
print(f'Entropy profile extraction started...')

feature_manager = DIHC_FeatureManager()

sampEn_Profile_df = feature_manager.extract_sampEn_profile_from_data(sample_data, segment_length=segment_length, segment_overlap=segment_overlap, signal_frequency=signal_frequency)
# sampEn_Profile_df = feature_manager.extract_sampEn_profile_from_data(sample_data, segment_length=5, signal_frequency=signal_frequency)
# sampEn_Profile_df = feature_manager.extract_sampEn_profile_from_data(sample_data, segment_length=5, segment_overlap=0, signal_frequency=signal_frequency)

print(f'Entropy profile extraction completed...')

Entropy profile extraction started...
Entropy profile calculation started...


Entropy profile extraction completed...


##### Display entropy profile data

In [14]:
print(f"SampEn entropy profile shape: {sampEn_Profile_df.shape}")
print(f"SampEn entropy profile values: ")
sampEn_Profile_df

SampEn entropy profile shape: (1722, 2)
SampEn entropy profile values: 


| | Segment_No | sampEn_profile |
|---|---|---|
| 0 | 1 | 3.595747 |
| 1 | 1 | 2.218863 |
| 2 | 1 | 1.868785 |
| 3 | 1 | 1.579396 |
| 4 | 1 | 1.368360 |
| ... | ... | ... |
| 1717 | 4 | 0.000006 |
| 1718 | 4 | 0.000004 |
| 1719 | 4 | 0.000002 |
| 1720 | 4 | 0.000001 |
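Because the profile is stored in long format (one row per value, with `Segment_No` identifying the segment), a per-segment summary is a single `groupby` call. The sketch uses a small hypothetical stand-in for `sampEn_Profile_df` with invented values; on the real DataFrame the same call applies unchanged.

```python
import pandas as pd

# Hypothetical stand-in for sampEn_Profile_df (values invented for illustration)
profile_df = pd.DataFrame({
    "Segment_No": [1, 1, 1, 2, 2],
    "sampEn_profile": [3.6, 2.2, 1.9, 3.1, 0.4],
})

# Number of profile values and their range for each segment
summary = profile_df.groupby("Segment_No")["sampEn_profile"].agg(["size", "min", "max"])
print(summary)
```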


##### Save entropy profile data

In [15]:
# # save_file_path = './sampEn_profile_data_matlab.csv'
# save_file_path = './sampEn_profile_data_python.csv'
# sampEn_Profile_df.to_csv(save_file_path, index=False)
# print(f"SampEn profile data successfully saved to: {save_file_path}")