## EMG Signal Analysis and Feature Extraction

In this notebook, you will explore EMG (electromyography) signals and extract meaningful features from them in the temporal and spectral domains.
We will use three common libraries for this task: NeuroKit2, scipy.signal and pywt.

To get started, install NeuroKit2 by running the following command in the VS Code terminal. The pywt module is provided by the PyWavelets package; install it the same way if the import below fails.
- pip install neurokit2


In [None]:
import neurokit2 as nk
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
import pywt

### Temporal Features Extraction

Now that we have the required imports, we can extract features from the signal.
There are two approaches you can use:
- The libraries we are using already provide certain features. The last blocks of this notebook show which features the libraries extract automatically.
- Certain features are statistical. We can compute them with numpy by applying statistical operations to the signal.

The script below works as follows:
1. Clean the original signal using NeuroKit2.
2. Compute the RMS (root mean square) directly from the cleaned signal.

Now you can add and remove features according to the list of features you chose for your project.

In [None]:
def extract_emg_temporal_features(emg_signal: np.ndarray, sampling_rate: int) -> tuple:
    """
    Extracts temporal features from the EMG signal.

    Parameters:
        emg_signal (np.array): The EMG signal
        sampling_rate (int): Sampling frequency in Hz

    Returns:
        tuple of the extracted temporal features
    """
    # Clean EMG signal using NeuroKit2
    emg_cleaned = nk.emg_clean(emg_signal, sampling_rate=sampling_rate)

    # Add features here
    rms = np.sqrt(np.mean(np.square(emg_cleaned)))

    # The trailing comma makes this a one-element tuple; (rms) alone is just a float
    return (rms,)
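
The second approach, statistical features computed with numpy, can be sketched as follows. This is a minimal illustration, not a recommended feature set: the helper name `temporal_features_sketch` and the chosen features (mean absolute value, variance, waveform length, zero crossings) are assumptions made for this example, and the function expects a signal that has already been cleaned.

```python
import numpy as np


def temporal_features_sketch(emg_cleaned: np.ndarray) -> dict:
    """Hypothetical helper: a few time-domain statistics of a cleaned signal."""
    x = np.asarray(emg_cleaned, dtype=float)

    # Mean absolute value: average rectified amplitude
    mav = float(np.mean(np.abs(x)))
    # Population variance of the samples
    variance = float(np.var(x))
    # Waveform length: summed absolute difference between consecutive samples
    waveform_length = float(np.sum(np.abs(np.diff(x))))
    # Zero crossings: sign changes between consecutive samples
    # (np.signbit treats an exact 0.0 as non-negative)
    zero_crossings = int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))

    return {
        "mav": mav,
        "variance": variance,
        "waveform_length": waveform_length,
        "zero_crossings": zero_crossings,
    }


# Tiny hand-checkable input: an alternating +1 / -1 sequence
example = temporal_features_sketch(np.array([1.0, -1.0, 1.0, -1.0]))
print(example)
```

Any of these values could be added next to rms in the returned tuple, depending on the feature list chosen for your project.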

### Spectral Features Extraction

We can also extract spectral features from the signal.
To extract EMG spectral features, we'll use Welch's method, which speeds up and simplifies the transformation to the spectral domain.
You are also encouraged to look into other techniques and use them if you believe they are more suitable.

In the script below we:
1. Clean the signal using NeuroKit2.
2. Compute freqs and psd with the scipy.signal library: the frequencies detected in the signal and the power at each frequency, respectively.
    - You can view them as a histogram, where freqs are the bars on the x axis and psd is the height of each bar.
3. Extract the median frequency directly from the power distribution in the spectral domain.
4. Perform the wavelet-based frequency estimation: decompose the signal and compute the power in each detail band.
5. Extract the wavelet median frequency.

Now you can add and remove features according to the list of features you chose for your project.
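
For the wavelet step, it helps to know roughly which frequency band each detail level covers. The sketch below assumes ideal half-band filters, so detail level j spans about sampling_rate / 2^(j+1) to sampling_rate / 2^j; real wavelet filters such as db4 overlap between neighbouring bands, so treat these edges as nominal. The helper name `detail_band_edges` is hypothetical.

```python
def detail_band_edges(sampling_rate: float, levels: int) -> dict:
    """Hypothetical helper: nominal (low, high) band in Hz per detail level."""
    return {
        level: (sampling_rate / 2 ** (level + 1), sampling_rate / 2 ** level)
        for level in range(1, levels + 1)
    }


# Nominal bands for a 4-level decomposition at 1000 Hz
bands = detail_band_edges(sampling_rate=1000, levels=4)
for level, (low, high) in bands.items():
    print(f"cD{level}: {low:g} to {high:g} Hz")
```

Level 1 is the highest band and each further level halves the frequency range, which matters when pairing band powers with frequencies.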

In [None]:
def extract_emg_spectral_features(emg_signal: np.ndarray, sampling_rate: int) -> tuple:
    """
    Extracts spectral features from an EMG signal.

    Parameters:
        emg_signal (np.array): The EMG signal
        sampling_rate (int): Sampling frequency in Hz

    Returns:
        tuple of the extracted spectral features.
    """
    # Clean the EMG signal using NeuroKit2
    emg_cleaned = nk.emg_clean(emg_signal, sampling_rate=sampling_rate)

    ## Frequency domain analysis (Welch's method)
    freqs, psd = signal.welch(emg_cleaned, fs=sampling_rate, nperseg=1024)

    #Add features here
    cumulative_power = np.cumsum(psd)
    total_power = cumulative_power[-1]
    median_freq = freqs[np.where(cumulative_power >= total_power / 2)[0][0]]

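    ## Wavelet-based frequency estimation
    coeffs = pywt.wavedec(emg_cleaned, 'db4', level=4)
    # wavedec returns [cA4, cD4, cD3, cD2, cD1]; skip the approximation cA4
    detail_coeffs = coeffs[1:]
    wavelet_powers = [np.sum(np.square(c)) for c in detail_coeffs]
    # Nominal upper band edge of each detail level, in the same order as
    # detail_coeffs (cD4 is the lowest band, cD1 the highest)
    n_levels = len(detail_coeffs)
    wavelet_freqs = [sampling_rate / (2 ** level) for level in range(n_levels, 0, -1)]

    ## Cumulative wavelet power, from the lowest to the highest band
    wavelet_total_power = sum(wavelet_powers)
    wavelet_cumsum = np.cumsum(wavelet_powers)

    # Add features here
    wmf = wavelet_freqs[np.where(wavelet_cumsum >= wavelet_total_power / 2)[0][0]]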


    return (median_freq, wmf)
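
As a sanity check of the median-frequency logic, the sketch below applies the same cumulative-power rule to a synthetic 80 Hz sine wave with a little noise, where the answer is known in advance. The helper names and the test signal are assumptions for illustration. np.searchsorted is used as an equivalent alternative to the np.where lookup, and the mean frequency is included as one more feature you might add.

```python
import numpy as np
from scipy import signal


def median_frequency(freqs: np.ndarray, psd: np.ndarray) -> float:
    """Frequency at which the cumulative power first reaches half the total."""
    cumulative = np.cumsum(psd)
    idx = np.searchsorted(cumulative, cumulative[-1] / 2)
    return float(freqs[idx])


def mean_frequency(freqs: np.ndarray, psd: np.ndarray) -> float:
    """Power-weighted average frequency."""
    return float(np.sum(freqs * psd) / np.sum(psd))


# Synthetic test signal: 80 Hz sine plus weak noise, sampled at 1000 Hz
fs = 1000
t = np.arange(0, 5, 1 / fs)
rng = np.random.default_rng(0)
test_signal = np.sin(2 * np.pi * 80 * t) + 0.01 * rng.standard_normal(t.size)

freqs, psd = signal.welch(test_signal, fs=fs, nperseg=1024)
mdf = median_frequency(freqs, psd)
mnf = mean_frequency(freqs, psd)
print(f"median frequency: {mdf:.1f} Hz, mean frequency: {mnf:.1f} Hz")
```

Both values should land close to 80 Hz, within the frequency resolution of the Welch estimate (fs / nperseg, a little under 1 Hz here).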


### NeuroKit2 Structure

NeuroKit2 can process several kinds of physiological signals. For this module, we will use the method emg_process(), which returns the processed data (a DataFrame) and additional information (a dict).
The script below is meant to help you understand how the DataFrame is structured.

In [None]:
sr = 1000

# Simulate EMG signal
emg_signal = nk.emg_simulate(duration=10, sampling_rate=sr, burst_number=3)

# Plot raw EMG signal
nk.signal_plot(emg_signal, sampling_rate=sr)
plt.show()

# Process the EMG signal
signals, info = nk.emg_process(emg_signal, sampling_rate=sr)

# Plot processed EMG: Clean signal + Amplitude (Envelope)
plt.figure(figsize=(12, 4))
plt.plot(signals["EMG_Clean"], label="EMG Clean", color='slateblue')
plt.plot(signals["EMG_Amplitude"], label="EMG Amplitude", color='orange')
plt.title("💪 Processed EMG Signal with Amplitude (Envelope)")
plt.xlabel("Samples")
plt.ylabel("Amplitude")
plt.legend()
plt.show()

# Print columns and first few rows
print("\n📊 Available columns in processed signal:")
print(signals.columns.tolist())

print("\n🔍 First 5 rows of processed signal:")
print(signals.head())
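
Once you know the columns, you can compute features per muscle activation instead of over the whole recording. The sketch below is one possible way to do that, assuming the DataFrame contains a binary activity column (check the printed column list for a name such as EMG_Activity). To stay self-contained it runs on small hand-made arrays standing in for the clean signal and the activity mask; the helper name `rms_per_burst` is hypothetical.

```python
import numpy as np


def rms_per_burst(clean, activity) -> list:
    """Hypothetical helper: RMS of the clean signal inside each active segment."""
    clean = np.asarray(clean, dtype=float)
    active = np.asarray(activity).astype(bool)

    # Pad with False so bursts touching either end are still detected
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)   # first sample of each burst
    stops = np.flatnonzero(changes == -1)   # one past the last sample

    return [float(np.sqrt(np.mean(clean[a:b] ** 2))) for a, b in zip(starts, stops)]


# Stand-in data: two bursts, at samples 1-2 and at sample 4
demo_clean = [0.0, 2.0, 2.0, 0.0, 5.0]
demo_activity = [0, 1, 1, 0, 1]
burst_rms = rms_per_burst(demo_clean, demo_activity)
print(burst_rms)

# With the DataFrame from the cell above, and assuming those columns exist:
# burst_rms = rms_per_burst(signals["EMG_Clean"], signals["EMG_Activity"])
```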