# _Machine Learning Project: "Classification of EMG Signals"_
## Introduction: EMG Data Analysis
This notebook is based on the EMG Physical Action Data Set, hosted by the UCI Machine Learning Repository (University of California, Irvine). It analyzes the EMG recordings with machine learning models, with the goal of classifying physical actions from the signals.

Electromyography (EMG) measures muscle response or electrical activity in response to a nerve's stimulation of the muscle. The test is used to help detect neuromuscular abnormalities. During the test, one or more small needles (also called electrodes) are inserted through the skin into the muscle.

Possible use cases for this type of analysis include medical diagnosis and human-robot interfaces.

Dataset: https://archive.ics.uci.edu/ml/datasets/EMG+Physical+Action+Data+Set

Derived from the work by:

Theo Theodoridis\
School of Computer Science and Electronic Engineering\
University of Essex

Prepared by:

Bernard Maacaron\
Anna Hauschild\
EMARO - European Master on Advanced Robotics\
University of Genova - Grande Ecole Centrale de Nantes



## Understanding the Dataset
The dataset contains bioelectrical signals collected from a group of 4 subjects who were asked to perform specific physical activities. Each recording has 8 columns, one per electrode location on the body. The activities fall into two categories, normal and aggressive. Normal activities include standing, sitting, and hugging; aggressive activities include kicking, punching, and hammering.

- R-Bic: right bicep (C1)
- R-Tri: right tricep (C2)
- L-Bic: left bicep (C3)
- L-Tri: left tricep (C4)
- R-Thi: right thigh (C5)
- R-Ham: right hamstring (C6)
- L-Thi: left thigh (C7)
- L-Ham: left hamstring (C8)

Each file in the dataset contains these 8 columns, one per channel.

- Measurement frequency: $10^4$ samples per second.
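
The channel list above can be kept next to the data as a small lookup table. The sketch below assumes the columns are named `ch1` to `ch8`, as in the loading code of this notebook. The names `CHANNEL_MUSCLES` and `with_muscle_names` are illustrative and not part of the dataset.

```python
import pandas as pd

# Channel-to-electrode mapping copied from the list above (C1..C8 -> ch1..ch8).
CHANNEL_MUSCLES = {
    "ch1": "R-Bic",  # right bicep
    "ch2": "R-Tri",  # right tricep
    "ch3": "L-Bic",  # left bicep
    "ch4": "L-Tri",  # left tricep
    "ch5": "R-Thi",  # right thigh
    "ch6": "R-Ham",  # right hamstring
    "ch7": "L-Thi",  # left thigh
    "ch8": "L-Ham",  # left hamstring
}


def with_muscle_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose channel columns carry the electrode labels."""
    return df.rename(columns=CHANNEL_MUSCLES)


# Tiny hand-made frame, used only to show the renaming.
example = pd.DataFrame([[0] * 8], columns=[f"ch{k}" for k in range(1, 9)])
renamed = with_muscle_names(example)
print(list(renamed.columns))
```

Columns that are not in the mapping, such as a label column, are left untouched by `rename`.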

### Neurobiology and Machine Learning for Activity Prediction

In neurobiology, it has been observed that the same neural signals are activated when imagining an activity, such as jumping, as when physically performing the activity. This phenomenon presents a unique opportunity to leverage machine learning (ML) models for predicting physical activities based on neural signals. Such models have potential applications in controlling robotic movements and aiding in various diagnostic and monitoring tasks.

For each activity, the EMG signals are recorded from the onset to the completion of the activity. This results in approximately 10,000 rows of data per activity, capturing how muscle activity evolves throughout the movement.


## Data Loading and Preprocessing
The dataset is loaded and preprocessed to prepare it for model training.\
It is available in the form of text files, where each file corresponds to a different physical activity. The data is loaded into a pandas DataFrame and preprocessed to ensure that it is in the correct format for training machine learning models.

In [10]:
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


Num_Subjects = 1
data_path = os.path.join('emg+physical+action+data+set', 'EMG Physical Action Data Set')
channel_names = [f"ch{k}" for k in range(1, 9)]  # one column per electrode

frames = []     # one DataFrame per recording file
actions = {}    # action name -> numeric label, shared across subjects
action_ind = 0

# Extracting the data from the files and storing it in a dataframe
for i in range(Num_Subjects):
    for action_type in ['Aggressive', 'Normal']:
        input_path = os.path.join(data_path, f'sub{i+1}', action_type, 'txt')

        # Sort the listing so the label assignment does not depend on OS file order
        for file_name in sorted(os.listdir(input_path)):
            df = pd.read_csv(os.path.join(input_path, file_name),
                             sep='\t',
                             names=channel_names,
                             header=None)

            action_name = os.path.splitext(file_name)[0]  # Extract action name from filename
            if action_name not in actions:
                actions[action_name] = action_ind  # Assign a new numeric label if action is not in the dictionary
                action_ind += 1

            df["Action"] = action_name  # the scalar is broadcast to every row
            frames.append(df)

# Concatenate once at the end instead of growing the DataFrame inside the loop
data = pd.concat(frames, ignore_index=True)

### Exploratory data analysis

In [11]:
display(data.head())

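| | ch1 | ch2 | ch3 | ch4 | ch5 | ch6 | ch7 | ch8 | Action |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -245 | 266 | 2615 | -29 | -4000 | -549 | -4000 | 4000 | Elbowing |
| 1 | -814 | 391 | -22 | -277 | -4000 | -130 | -4000 | 4000 | Elbowing |
| 2 | -445 | 257 | -3628 | -428 | -4000 | 97 | -4000 | 4000 | Elbowing |
| 3 | -844 | 201 | -4000 | -498 | -4000 | 62 | -4000 | 4000 | Elbowing |
| 4 | -1996 | 233 | -4000 | -552 | -4000 | 109 | -4000 | 4000 | Elbowing |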


In [12]:
# Exploratory Data Analysis
data.info()  # info() prints its summary and returns None, so it is not wrapped in display()
display(data.describe())

# Splitting the data into features and labels
X = data.drop('Action', axis=1)
y = data['Action']

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197058 entries, 0 to 197057
Data columns (total 9 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   ch1     197058 non-null  int64 
 1   ch2     197058 non-null  int64 
 2   ch3     197058 non-null  int64 
 3   ch4     197058 non-null  int64 
 4   ch5     197058 non-null  int64 
 5   ch6     197058 non-null  int64 
 6   ch7     197058 non-null  int64 
 7   ch8     197058 non-null  int64 
 8   Action  197058 non-null  object
dtypes: int64(8), object(1)
memory usage: 13.5+ MB

| | ch1 | ch2 | ch3 | ch4 | ch5 | ch6 | ch7 | ch8 |
|---|---|---|---|---|---|---|---|---|
| count | 197058.0 | 197058.0 | 197058.0 | 197058.0 | 197058.0 | 197058.0 | 197058.0 | 197058.0 |
| mean | -20.789833 | 6.355266 | -5.971526 | -7.287119 | 8.434177 | 40.086898 | 32.72096 | 5.381725 |
| std | 1012.033723 | 676.259593 | 1264.403517 | 855.951252 | 2430.808542 | 1892.478019 | 2236.220824 | 1780.551824 |
| min | -4000.0 | -4000.0 | -4000.0 | -4000.0 | -4000.0 | -4000.0 | -4000.0 | -4000.0 |
| 25% | -142.0 | -77.0 | -141.0 | -89.0 | -1162.0 | -342.0 | -786.0 | -360.0 |
| 50% | -11.0 | 4.0 | -7.0 | -12.0 | 24.0 | 28.0 | 24.0 | 20.0 |
| 75% | 108.0 | 93.0 | 128.0 | 88.0 | 1148.0 | 439.0 | 887.0 | 373.0 |
| max | 4000.0 | 4000.0 | 4000.0 | 4000.0 | 4000.0 | 4000.0 | 4000.0 | 4000.0 |
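
The `min` and `max` rows sit at exactly -4000 and 4000 for every channel, and the first rows shown by `data.head()` are pinned to those values on `ch5`, `ch7`, and `ch8`. This suggests that some samples may be saturating at the recording limits. A minimal sketch for quantifying this follows, assuming ±4000 is the clipping level (an inference from the summary above, not a documented property of the dataset). `saturation_fraction` is a hypothetical helper.

```python
import pandas as pd


def saturation_fraction(df: pd.DataFrame, limit: int = 4000) -> pd.Series:
    """Fraction of samples per numeric column whose magnitude reaches `limit`."""
    numeric = df.select_dtypes("number")  # skips label columns such as "Action"
    return (numeric.abs() >= limit).mean()


# The five rows shown by data.head(), restricted to three channels for brevity.
head = pd.DataFrame({
    "ch1": [-245, -814, -445, -844, -1996],
    "ch3": [2615, -22, -3628, -4000, -4000],
    "ch5": [-4000, -4000, -4000, -4000, -4000],
})
fractions = saturation_fraction(head)
print(fractions)
```

Calling `saturation_fraction(data)` on the full frame would report the same quantity per channel. A high fraction on a channel would be a reason to inspect that electrode before trusting features derived from it.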


In [13]:
display(np.unique(y)) # Displaying the unique labels
display(actions) # Displaying the actions dictionary
display(y.value_counts()) # Value counts of each action


array(['Bowing', 'Clapping', 'Elbowing', 'Frontkicking', 'Hamering',
       'Handshaking', 'Headering', 'Hugging', 'Jumping', 'Kneeing',
       'Pulling', 'Punching', 'Pushing', 'Running', 'Seating',
       'Sidekicking', 'Slapping', 'Standing', 'Walking', 'Waving'],
      dtype=object)

{'Elbowing': 0,
 'Frontkicking': 1,
 'Hamering': 2,
 'Headering': 3,
 'Kneeing': 4,
 'Pulling': 5,
 'Punching': 6,
 'Pushing': 7,
 'Sidekicking': 8,
 'Slapping': 9,
 'Bowing': 10,
 'Clapping': 11,
 'Handshaking': 12,
 'Hugging': 13,
 'Jumping': 14,
 'Running': 15,
 'Seating': 16,
 'Standing': 17,
 'Walking': 18,
 'Waving': 19}

Action
Headering       10000
Hamering        10000
Clapping        10000
Kneeing         10000
Seating         10000
Walking         10000
Waving          10000
Jumping         10000
Running          9964
Bowing           9830
Sidekicking      9829
Frontkicking     9811
Slapping         9788
Elbowing         9772
Hugging          9756
Standing         9725
Pushing          9676
Pulling          9659
Punching         9637
Handshaking      9611
Name: count, dtype: int64
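
The counts above are close to uniform. One way to put a number on that is the ratio between the largest and the smallest class, sketched here with the counts copied from the output. The variable names are illustrative.

```python
import pandas as pd

# Per-action row counts copied from the value_counts() output above.
counts = pd.Series({
    "Headering": 10000, "Hamering": 10000, "Clapping": 10000, "Kneeing": 10000,
    "Seating": 10000, "Walking": 10000, "Waving": 10000, "Jumping": 10000,
    "Running": 9964, "Bowing": 9830, "Sidekicking": 9829, "Frontkicking": 9811,
    "Slapping": 9788, "Elbowing": 9772, "Hugging": 9756, "Standing": 9725,
    "Pushing": 9676, "Pulling": 9659, "Punching": 9637, "Handshaking": 9611,
})

total_rows = counts.sum()                      # should match the RangeIndex length
imbalance_ratio = counts.max() / counts.min()  # 1.0 would mean perfectly balanced
class_share = counts / total_rows              # fraction of all rows per action

print(total_rows)
print(round(imbalance_ratio, 3))
```

A ratio this close to 1 suggests that plain accuracy is not badly distorted by class imbalance here, although per-class metrics are still worth reporting.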

## Model Exploration

### Random Forest Model
Random Forest is chosen for its robustness and its ability to model non-linear decision boundaries. It is well suited to multi-class classification and can work on the raw EMG channels without extensive preprocessing.

In [14]:
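from sklearn.ensemble import RandomForestClassifier

# Hold out a stratified test set (the 80/20 ratio is an assumed choice),
# then fit the scaler on the training portion only to avoid leaking test statistics
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train_scaled, y_train)
rf_predictions = rf_model.predict(X_test_scaled)

# Evaluate Random Forest performance
# metrics like accuracy, precision, recall, F1 score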
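
The evaluation step named in the cell's comments (accuracy, precision, recall, F1 score) can be sketched with scikit-learn's metric functions. The helper name `summarize` and the toy labels are assumptions made for illustration. Macro averaging is one reasonable choice for a roughly balanced multi-class problem, not the only one.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def summarize(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 as a dict."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for demonstration only; these are not model outputs.
toy_true = ["Bowing", "Bowing", "Waving", "Waving"]
toy_pred = ["Bowing", "Waving", "Waving", "Waving"]
scores = summarize(toy_true, toy_pred)
print(scores)
```

In the notebook the same helper would be called with the held-out labels and a model's predictions, for example `summarize(y_test, rf_predictions)`, where `y_test` is assumed to hold the test labels.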

### Support Vector Machine (SVM) Model
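SVM is selected for its effectiveness in high-dimensional spaces and its ability to use the kernel trick for non-linear classification. The cell below uses a linear kernel as a baseline; a non-linear kernel such as `'rbf'` can be selected through the `kernel` parameter.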

In [None]:
from sklearn.svm import SVC

svm_model = SVC(kernel='linear')
svm_model.fit(X_train_scaled, y_train)
svm_predictions = svm_model.predict(X_test_scaled)

# Evaluate SVM performance
# metrics like accuracy, precision, recall, F1 score
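
The scikit-learn documentation notes that `SVC` fit time grows at least quadratically with the number of samples and points to `LinearSVC` or `SGDClassifier` for large datasets. With close to 200,000 rows, the cell above may therefore take a long time to train. The sketch below shows the `LinearSVC` alternative on synthetic data. `LinearSVC` is a different solver and loss from `SVC(kernel='linear')`, so results are not guaranteed to match. The arrays `X_demo` and `y_demo` are made-up stand-ins, not EMG recordings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the EMG frame: 8 channels, two well-separated classes.
X_demo = np.vstack([
    rng.normal(-500, 100, size=(200, 8)),
    rng.normal(500, 100, size=(200, 8)),
])
y_demo = np.array(["Standing"] * 200 + ["Punching"] * 200)

# Scaling lives inside the pipeline, so it is refit on whatever data fit() receives.
linear_svm = make_pipeline(StandardScaler(), LinearSVC(random_state=42))
linear_svm.fit(X_demo, y_demo)
demo_accuracy = linear_svm.score(X_demo, y_demo)
print(demo_accuracy)
```

Keeping the scaler inside the pipeline also avoids the need for separately named scaled arrays when the model is later cross-validated.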

## Conclusion
This notebook presented an analysis of EMG data using Random Forest and SVM models. The choice of models was based on their suitability for the dataset's characteristics. Further analysis could explore more complex models or deep learning approaches for potentially improved accuracy.

Future work may include the application of Convolutional Neural Networks (CNNs) to leverage spatial correlations in the data for potentially superior classification performance.
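
A convolutional model would consume fixed-length segments rather than single rows, so a first step toward that future work is cutting each recording into windows. The sketch below is one way to do this, assuming a window of 200 samples with a step of 100. Both values are arbitrary choices for illustration, and `make_windows` is a hypothetical helper.

```python
import numpy as np


def make_windows(signal, window, step):
    """Cut a (n_samples, n_channels) array into overlapping windows.

    Returns an array of shape (n_windows, window, n_channels).
    """
    signal = np.asarray(signal)
    starts = range(0, signal.shape[0] - window + 1, step)
    if len(starts) == 0:
        # Recording shorter than one window: return an empty, correctly shaped array.
        return np.empty((0, window, signal.shape[1]), dtype=signal.dtype)
    return np.stack([signal[s:s + window] for s in starts])


# Random stand-in for one recording: 10,000 samples on 8 channels.
rng = np.random.default_rng(0)
recording = rng.integers(-4000, 4001, size=(10_000, 8))

windows = make_windows(recording, window=200, step=100)
print(windows.shape)
```

Each window would inherit the action label of the file it came from. Splitting into training and test sets by recording rather than by window would keep overlapping windows from appearing on both sides of the split.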