# Feature Extraction Pipeline

In this tutorial, we show how to use the ```cmda``` pipeline to extract features from a PhysioNet database.

We extract features from the [ECGRDVQ](https://physionet.org/content/ecgrdvq/1.0.0/) database. From the database description: "The ECGRDVQ database contains 5232 extracted 10-second standard 12 lead ECG segments".
The metadata of the database is available through ```ecgrdvq_metadata``` in ```cmda.data.wfdb```. It returns a dictionary containing the record names (```record_name```) and their corresponding public directories (```public_dir```).
The pipeline has three stages: importing the data, filtering the records, and extracting features from the filtered records. To import the data from PhysioNet in WFDB format we use ```cmda.read_data.ReadWFDB```. We create the filtering object with ```cmda.filter.Filters``` and the feature object with ```cmda.feature_extraction.Features```. A detailed tutorial on creating feature objects can be found here.
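
Conceptually, each record passes through the three stages in order: the importer returns the selected channels, every filter is applied to each channel, and every feature is computed on the filtered signal. The standard-library sketch below illustrates that flow; it is not ```cmda```'s implementation. The toy records, the moving-average filter (standing in for the Butterworth filter), the features `mean_abs` and `peak`, and the helpers `read_record`, `process` and `run_pipeline` are all hypothetical. Processing records on a thread pool with `n_jobs` workers and naming columns `<channel>_<feature>` are assumptions chosen to mirror the `n_jobs` argument and the column names in the results table of this tutorial.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: record name -> {channel name: samples}
RECORDS = {
    "rec_a": {"II": [0.0, 1.0, 0.0, -1.0, 0.0, 1.0], "V1": [0.5] * 6, "V2": [9.0] * 6},
    "rec_b": {"II": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "V1": [0.0, -1.0, 0.0, 1.0, 0.0, -1.0], "V2": [9.0] * 6},
}


def read_record(name, channels):
    """Import stage (hypothetical): keep only the requested channels of one record."""
    return {ch: RECORDS[name][ch] for ch in channels}


def moving_average(x, width=3):
    """Filter stage (hypothetical stand-in for a low-pass filter): trailing moving average."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def mean_abs(x):
    """Hypothetical feature: mean absolute amplitude."""
    return sum(abs(v) for v in x) / len(x)


FILTERS = [moving_average]                      # applied in order
FEATURES = {"mean_abs": mean_abs, "peak": max}  # feature name -> function


def process(name, channels):
    """Import -> filter -> extract for one record; returns one flat row."""
    row = {}
    for ch, signal in read_record(name, channels).items():
        for apply_filter in FILTERS:
            signal = apply_filter(signal)
        for feat_name, feat in FEATURES.items():
            row[f"{ch}_{feat_name}"] = feat(signal)
    return row


def run_pipeline(names, channels, n_jobs=2):
    """Process records independently on n_jobs worker threads; keyed by record name."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        rows = list(pool.map(lambda name: process(name, channels), names))
    return dict(zip(names, rows))


result = run_pipeline(["rec_a", "rec_b"], channels=["II", "V1"])
for name, row in result.items():
    print(name, row)
```

Because each record is processed independently of the others, the per-record work parallelizes naturally, and channels that are not requested (here `V2`) never reach the filter or feature stages.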

```python
from cmda.data.wfdb import ecgrdvq_metadata
from cmda.read_data import ReadWFDB
from cmda.filter import Filters
from cmda.feature_extraction import Features
from cmda.pipeline import Pipeline

# Load the metadata
metadata = ecgrdvq_metadata()

# Get the record names and their corresponding public directories.
# In this example we select only the first 12 records to keep the run fast.
record_names = metadata['record_name'][0:12]
public_dir = metadata['public_dir'][0:12]

# Build the importer object using ReadWFDB.
# Set channels=['II', 'V1'] to import only these two leads.
importer = ReadWFDB(record_names=record_names, public_dir=public_dir, channels=['II', 'V1'])

# Create the filter object and add a Butterworth low-pass filter (cutoff=60)
filters = Filters()
filters.add.butter_filter(cutoff=60, btype="lowpass")

# Create the feature object and add three features
features = Features()
features.add.mnf()
features.add.stdf()
features.add.band_sum(low=1, high=7)
```
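
The `butter_filter(cutoff=60, btype="lowpass")` call adds a low-pass Butterworth filter, which passes components below the cutoff and attenuates those above it. For the analog Butterworth prototype of order N, the magnitude response is `|H(f)| = 1 / sqrt(1 + (f / fc)^(2N))`, so the gain at the cutoff is `1 / sqrt(2)` (about -3 dB) for any order. The sketch below simply evaluates that formula; the filter order (`order=4`) and reading `cutoff=60` as 60 Hz are assumptions, since neither is stated in this tutorial.

```python
import math


def butterworth_gain(f, cutoff, order=4):
    """Magnitude |H(f)| of an analog Butterworth low-pass prototype (order is assumed)."""
    return 1.0 / math.sqrt(1.0 + (f / cutoff) ** (2 * order))


cutoff = 60.0  # assumed to be in Hz, matching butter_filter(cutoff=60, ...)
for f in (10.0, 60.0, 120.0, 240.0):
    g = butterworth_gain(f, cutoff)
    print(f"{f:6.1f} Hz  gain={g:.4f}  ({20 * math.log10(g):7.2f} dB)")
```

Under these assumptions the gain is essentially 1 well below the cutoff and drops to about -24 dB at 120 Hz, twice the cutoff; a higher order makes that roll-off steeper.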

```python
# Build the pipeline
ecgrdvq_pipeline = Pipeline(importer=importer, features=features, filters=filters)

# Run the pipeline; n_jobs sets the number of parallel jobs.
# Set dataframe_output=True to return the extracted features as a DataFrame.
res = ecgrdvq_pipeline.run(n_jobs=4, dataframe_output=True)
```

```
Running the pipeline on 12 instances...
100%|██████████| 12/12 [00:05<00:00,  2.04it/s]
finished!
```





```python
# Display the extracted features
res
```

|  | II_mnf | II_stdf | II_band_sum | V1_mnf | V1_stdf | V1_band_sum |
|---|---:|---:|---:|---:|---:|---:|
| 491af4aa-941a-4a89-b74c-b38d91cfc5e9 | 8.667959 | 6.152966 | 0.46475 | 7.560547 | 5.179558 | 0.503667 |
| db4d09aa-f26c-4acb-92fd-6ac316918bc8 | 8.791343 | 6.136429 | 0.466031 | 7.498114 | 5.225458 | 0.512484 |
| dd3caf18-354d-4c81-9ff5-6aed843cd84b | 8.994343 | 6.254645 | 0.427771 | 7.572526 | 5.366888 | 0.492164 |
| 12133f6e-efcf-48cc-a184-2e4d7cc05a89 | 8.70413 | 6.257274 | 0.449003 | 7.631221 | 5.24302 | 0.497363 |
| 2c179592-3c18-47f2-930a-7f17ae4bc596 | 8.956668 | 6.766684 | 0.441981 | 7.723886 | 5.29362 | 0.484876 |
| 43457f03-eb84-49ef-a00a-8cbad7d5108d | 9.5746 | 7.229321 | 0.375681 | 7.931221 | 8.430665 | 0.47349 |
| 00ed2097-cd14-4f03-ab33-853da5be5550 | 9.279138 | 6.419775 | 0.410092 | 7.540609 | 5.198409 | 0.493821 |
| 9d7c5729-9458-4f8f-bf32-532d6b5ee5a4 | 9.025684 | 6.204357 | 0.421404 | 7.546192 | 5.129932 | 0.49171 |
| cd50da94-9c66-4b62-88a2-5879fbc17999 | 9.012871 | 6.206022 | 0.431222 | 7.681139 | 5.088155 | 0.488235 |
| 3289874c-4da9-4a88-b68f-517ebacf0862 | 8.821893 | 6.395041 | 0.429677 | 7.547925 | 5.285196 | 0.487406 |
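
Each column of the result is named `<channel>_<feature>`. This tutorial does not define the features, so the sketch below uses common textbook definitions as an assumption: `mnf` as the power-weighted mean frequency of the spectrum, `stdf` as the power-weighted standard deviation of frequency around it, and `band_sum` as the fraction of total power between `low` and `high` Hz. Normalizing `band_sum` by the total power is assumed because the values in the table lie between 0 and 1. The plain periodogram (naive DFT) and the synthetic two-tone signal are illustrative choices; ```cmda``` may estimate the spectrum differently.

```python
import cmath
import math


def periodogram(x, fs):
    """One-sided power spectrum via a naive DFT (O(n^2), fine for a short sketch)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coef) ** 2)
    return freqs, power


def mnf(freqs, power):
    """Assumed definition: power-weighted mean frequency."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def stdf(freqs, power):
    """Assumed definition: power-weighted standard deviation of frequency around mnf."""
    m = mnf(freqs, power)
    return math.sqrt(sum((f - m) ** 2 * p for f, p in zip(freqs, power)) / sum(power))


def band_sum(freqs, power, low, high):
    """Assumed definition: fraction of total power with low <= f <= high."""
    in_band = sum(p for f, p in zip(freqs, power) if low <= f <= high)
    return in_band / sum(power)


# Synthetic 1-second signal sampled at 100 Hz: equal-amplitude tones at 5 Hz and 20 Hz
fs = 100
x = [math.sin(2 * math.pi * 5 * t / fs) + math.sin(2 * math.pi * 20 * t / fs) for t in range(fs)]
freqs, power = periodogram(x, fs)
print(f"mnf={mnf(freqs, power):.3f} Hz, stdf={stdf(freqs, power):.3f} Hz, "
      f"band_sum(1-7 Hz)={band_sum(freqs, power, 1, 7):.3f}")
```

Both tones fall exactly on DFT bins here, so the spectrum has two equal peaks and the sketch gives mnf = 12.5 Hz, stdf = 7.5 Hz and band_sum = 0.5, as expected by symmetry.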
