# Introduction
This notebook explores how sensor reduction affects the model's ability to differentiate between similar sign language movements. By limiting the data to a specific combination of sensors whose readings remain consistent across movements, we aim to understand how sensor selection influences the model's accuracy and its ability to distinguish between similar signs, based on our current dataset.


In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from keras.models import Sequential
from keras.layers import SimpleRNN, Bidirectional, BatchNormalization, Dense, Dropout
from keras.callbacks import EarlyStopping





# Data Preparation and Column Reduction
In this section, we define a function that prepares the dataset the model will train on by selecting a specific combination of columns. In the following sections, this selection is adapted to two scenarios: sensor reduction and reduced recording frequency.


In [2]:
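def prepare_dataframe(dataframes, prefixes, suffixes):
    # Concatenate the per-signer dataframes into one
    df = pd.concat(dataframes, ignore_index=True)

    # Keep the columns whose names start with one of the prefixes (sensor names)
    # and end with one of the suffixes (frame names); None matches every column
    cols_to_keep = []
    for prefix in prefixes or ['']:
        for suffix in suffixes or ['']:
            cols_to_keep.extend([col for col in df.columns if col.startswith(prefix) and col.endswith(suffix)])

    # The last column holds the label: keep it exactly once, in last position,
    # and drop any duplicate column names while preserving order
    label_col = df.columns[-1]
    cols_to_keep = [col for col in dict.fromkeys(cols_to_keep) if col != label_col] + [label_col]

    # Return a copy so that later in-place changes do not act on a slice of df
    df = df[cols_to_keep].copy()

    return df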
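
As a rough illustration of the selection rule used in `prepare_dataframe`, the sketch below applies the same prefix/suffix matching to a plain list of column names. The helper `select_columns` and the names (`Flex-Right-1-Frame-4`, `label`, and so on) are hypothetical: they only assume that a column starts with its sensor name and ends with its frame name, and the real dataset may be laid out differently.

```python
def select_columns(columns, prefixes=None, suffixes=None):
    """Mimic the prefix/suffix matching on a plain list of column names."""
    kept = []
    for prefix in prefixes or ['']:
        for suffix in suffixes or ['']:
            kept.extend(c for c in columns if c.startswith(prefix) and c.endswith(suffix))
    # The last column is assumed to be the label and is always kept, once
    label = columns[-1]
    return [c for c in dict.fromkeys(kept) if c != label] + [label]


# Hypothetical column names, for illustration only
columns = [
    'Flex-Right-1-Frame-4', 'Flex-Right-1-Frame-8',
    'Flex-Right-2-Frame-4', 'Flex-Right-2-Frame-8',
    'label',
]

by_sensor = select_columns(columns, prefixes=['Flex-Right-1'])
by_frame = select_columns(columns, suffixes=['Frame-4'])
print(by_sensor)
print(by_frame)
```

Passing `None` for either argument matches every column on that side, which is how one function can serve both the sensor-reduction and the frame-reduction experiments.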

In [6]:
def train_model(dataframes, prefixes, suffixes):

    # Build the reduced dataframe: the selected sensor/frame columns plus the label column
    df = prepare_dataframe(dataframes, prefixes, suffixes)

    # Convert all feature columns to numeric and set non-convertible values to NaN
    for col in df.columns[:-1]:  # Excluding the last column
        df[col] = pd.to_numeric(df[col], errors='coerce')

    # Removing rows with NaN values
    df.dropna(inplace=True)

    # Separate features and labels
    X = df.iloc[:, :-1].values  # All columns except the last one
    y = df.iloc[:, -1].values   # Only the last column

    # Scale the features
    scaler = MinMaxScaler()
    X = scaler.fit_transform(X)

    # Reshape X to (samples, time steps, features) for the RNN;
    # each row is treated as a sequence with a single time step
    X = X.reshape((X.shape[0], 1, X.shape[1]))

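    # Encode the labels as one-hot vectors. The default sparse result is
    # converted with toarray(), which avoids the `sparse` argument that
    # newer scikit-learn releases renamed to `sparse_output`
    encoder = OneHotEncoder()
    y_encoded = encoder.fit_transform(y.reshape(-1, 1)).toarray()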

    # Define the RNN model
    model_rnn = Sequential()
    model_rnn.add(Bidirectional(SimpleRNN(30, activation='relu', return_sequences=True), input_shape=(X.shape[1], X.shape[2])))
    model_rnn.add(BatchNormalization())
    model_rnn.add(SimpleRNN(32, activation='relu'))
    model_rnn.add(Dropout(0.3))
    model_rnn.add(Dense(16, activation='relu'))
    model_rnn.add(Dense(y_encoded.shape[1], activation='softmax'))

    # Compile the model with categorical_crossentropy loss function
    model_rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # Add EarlyStopping as a callback
    early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

    # Split the dataset into training and held-out sets
    # (the held-out 20% also serves as validation data for early stopping)
    X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

    # Train the model
    history = model_rnn.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test), callbacks=[early_stopping])
    return history
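
To make the label-encoding step concrete, here is a minimal pure-Python sketch of what one-hot encoding produces. It is not the scikit-learn implementation, and the helper `one_hot` and the sign labels are made up; it only assumes, as `OneHotEncoder` does by default, that categories are ordered by sorted value.

```python
def one_hot(labels):
    """Return (categories, rows) with one indicator column per category."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for lab in labels:
        row = [0.0] * len(categories)
        row[index[lab]] = 1.0  # mark the column of this label's category
        rows.append(row)
    return categories, rows


labels = ['hello', 'thanks', 'hello', 'yes']  # hypothetical sign labels
categories, encoded = one_hot(labels)
print(categories)
print(encoded)
```

The width of each row equals the number of distinct signs, which is why the final `Dense` layer in `train_model` has `y_encoded.shape[1]` units with a softmax activation.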


In [7]:
df_1 = pd.read_csv('../dataset/sensor_data_badr.csv')
df_2 = pd.read_csv('../dataset/sensor_data_mouad.csv')
df_3 = pd.read_csv('../dataset/sensor_data_ismail.csv')

# Model Training and Evaluation with Reduced Sensor Data
Here, we retrain the model on the reduced sensor dataset. The column selection is based on the hypothesis that these sensors give similar readings across different sign language movements, which challenges the model's ability to differentiate between them. The goal is to observe how the model performs when its input is potentially less distinctive between signs.

In [8]:
prefixes = ['Flex-Right-1']

history = train_model([df_1, df_2, df_3], prefixes, None)

# Access the loss and accuracy values
train_loss = history.history['loss']
val_loss = history.history['val_loss']
train_accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']

print('Train loss: ', train_loss[-1])
print('Validation loss: ', val_loss[-1])
print('Train accuracy: ', train_accuracy[-1])
print('Validation accuracy: ', val_accuracy[-1])



Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Train loss:  0.5930998921394348
Validation loss:  0.5831094980239868
Train accuracy:  0.7353351712226868
Validation accuracy:  0.7541899681091309
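
The values printed above come from the last epoch that ran. Since `train_model` uses `EarlyStopping` with `patience=10` and `restore_best_weights=True`, the weights kept at the end are those of the epoch with the lowest validation loss, which can be an earlier epoch. The sketch below shows one way to read that epoch out of a history dictionary; the helper `best_epoch_metrics` is hypothetical and the numbers are invented for illustration, not results from this notebook.

```python
def best_epoch_metrics(history_dict, monitor='val_loss'):
    """Return the 1-based epoch with the lowest monitored value and its metrics."""
    values = history_dict[monitor]
    best = min(range(len(values)), key=values.__getitem__)
    return best + 1, {name: series[best] for name, series in history_dict.items()}


# Invented values shaped like `history.history`
fake_history = {
    'loss': [0.9, 0.7, 0.6, 0.55],
    'val_loss': [0.8, 0.6, 0.65, 0.7],
    'accuracy': [0.5, 0.6, 0.7, 0.72],
    'val_accuracy': [0.55, 0.7, 0.68, 0.66],
}
epoch, metrics = best_epoch_metrics(fake_history)
print(epoch, metrics)
```

With the notebook's objects, this would be called as `best_epoch_metrics(history.history)`.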


# Model Training with Reduced Recording Frequency
Here, we retrain the model with a reduced recording frequency. The column selection now reduces the number of frames kept for each sign, which tests the model's robustness. The goal is to observe how the model performs when given a less detailed sequence of frames.

In [9]:
# Return the column suffixes 'Frame-k' for every k in 1..20 that is a multiple of n,
# which simulates a recording frequency reduced by a factor of n
def get_frame_names(n):
    frame_names = []
    for k in range(1, 21):
        if k % n == 0:
            frame_names.append('Frame-' + str(k))
    return frame_names
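
The effect of `n` on the number of retained frames can be checked quickly. The sketch below restates the function as a list comprehension (an equivalent rewrite, not the notebook's original code) and counts the suffixes for a few values of `n`. The `total_frames` parameter is an added assumption that defaults to the 20 frames per sign used in the loop above.

```python
def get_frame_names(n, total_frames=20):
    """Names of the frames kept when only every n-th frame is recorded."""
    return ['Frame-' + str(k) for k in range(1, total_frames + 1) if k % n == 0]


# Number of frames kept for a few reduction factors
for n in (1, 2, 4, 5):
    names = get_frame_names(n)
    print(n, len(names), names)
```

With `n = 4`, five of the twenty frames remain, so the model sees a quarter of the original frame columns.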

In [10]:
suffixes = get_frame_names(4)
print(suffixes)

history = train_model([df_1, df_2, df_3], None, suffixes)

# Access the loss and accuracy values
train_loss = history.history['loss']
val_loss = history.history['val_loss']
train_accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']

print('Train loss: ', train_loss[-1])
print('Validation loss: ', val_loss[-1])
print('Train accuracy: ', train_accuracy[-1])
print('Validation accuracy: ', val_accuracy[-1])

['Frame-4', 'Frame-8', 'Frame-12', 'Frame-16', 'Frame-20']




Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Train loss:  0.006376589648425579
Validation loss:  0.0012447142507880926
Train accuracy:  0.9979050159454346
Validation accuracy:  1.0


# Conclusion
The results of this experiment underscore the importance of sensor diversity for the precise interpretation of sign language. When restricted to the columns matching the `Flex-Right-1` prefix, the model reached a validation accuracy of only about 0.75, which illustrates the difficulty of distinguishing similar signs and suggests that a comprehensive sensor setup may be required for optimal accuracy. In contrast, the model maintained its high performance (validation accuracy of 1.0) when given only every fourth frame, which supports the hypothesis that the model's accuracy is attributable to the distinct nature of the performed signs.