###Description

Data were acquired with three sensors (2 accelerometers - ADXL345/MMA8451Q and 1 gyroscope - ITG3200) at a frequency sample of 200 Hz



---


**Dataset fields:**

| Field | Description | Datatype |
| --- | --- | -- |
| ADXL345_X | X-axis acceleration data (ADXL345) | int |
| ADXL345_Y | Y-axis acceleration data (ADXL345) | int | 
| ADXL345_Z | Z-axis acceleration data (ADXL345) | int |
| ITG3200_X | X-axis rotation data (ITG3200) | int |
| ITG3200_Y | Y-axis rotation data (ITG3200) | int |
| ITG3200_Z | Z-axis rotation data (ITG3200) | int |
| MMA8451Q_X | X-axis acceleration data (MMA8451Q) | int |
| MMA8451Q_Y | Y-axis acceleration data (MMA8451Q) | int |
| MMA8451Q_Z | Z-axis acceleration data (MMA8451Q) | int |
| subject | Test participant code | string |
| trial | Test trial # | string |
| code | Activity code | string |
| code_ref | Activity code description | string |
| age | Test participant age | int |
| height | Test participant height | int |
| weight | Test participant weight | float |
| gender | Test participant sex | string |
| *class* | *Fall/No Fall (predictor label)* | *int* |

## Import Libraries

In [1]:
import glob
import os
import math
import time
import pandas as pd
import numpy as np
from typing import List

In [2]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>

## Download Dataset

In [None]:
# Install googleDriveFileDownloader
!pip install googleDriveFileDownloader

In [None]:
# Download SisFall dataset from google drive

from googleDriveFileDownloader import googleDriveFileDownloader
a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1orS2KkOVzBadAIOqN4Od6ZUKFhHolz5F&export=download")

In [None]:
# Unzip SisFall dataset from zip file
!unzip -oq SisFall_dataset.zip

## Load User data

In [3]:
# Subject attribute data from Readme.txt file
Subject_attrs = {"SA01": {"Age": 26, "Height": 165, "Weight": 53, "Gender": "F"},
                 "SA02": {"Age": 23, "Height": 176, "Weight": 58.5, "Gender": "M"},
                 "SA03": {"Age": 19, "Height": 156, "Weight": 48, "Gender": "F"},
                 "SA04": {"Age": 23, "Height": 170, "Weight": 72, "Gender": "M"},
                 "SA05": {"Age": 22, "Height": 172, "Weight": 69.5, "Gender": "M"},
                 "SA06": {"Age": 21, "Height": 169, "Weight": 58, "Gender": "M"},
                 "SA07": {"Age": 21, "Height": 156, "Weight": 63, "Gender": "F"},
                 "SA08": {"Age": 21, "Height": 149, "Weight": 41.5, "Gender": "F"},
                 "SA09": {"Age": 24, "Height": 165, "Weight": 64, "Gender": "M"},
                 "SA10": {"Age": 21, "Height": 177, "Weight": 67, "Gender": "M"},
                 "SA11": {"Age": 19, "Height": 170, "Weight": 80.5, "Gender": "M"},
                 "SA12": {"Age": 25, "Height": 153, "Weight": 47, "Gender": "F"},
                 "SA13": {"Age": 22, "Height": 157, "Weight": 55, "Gender": "F"},
                 "SA14": {"Age": 27, "Height": 160, "Weight": 46, "Gender": "F"},
                 "SA15": {"Age": 25, "Height": 160, "Weight": 52, "Gender": "F"},
                 "SA16": {"Age": 20, "Height": 169, "Weight": 61, "Gender": "F"},
                 "SA17": {"Age": 23, "Height": 182, "Weight": 75, "Gender": "M"},
                 "SA18": {"Age": 23, "Height": 181, "Weight": 73, "Gender": "M"},
                 "SA19": {"Age": 30, "Height": 170, "Weight": 76, "Gender": "M"},
                 "SA20": {"Age": 30, "Height": 150, "Weight": 42, "Gender": "F"},
                 "SA21": {"Age": 30, "Height": 183, "Weight": 68, "Gender": "M"},
                 "SA22": {"Age": 19, "Height": 158, "Weight": 50.5, "Gender": "F"},
                 "SA23": {"Age": 24, "Height": 156, "Weight": 48, "Gender": "F"},
                 "SE01": {"Age": 71, "Height": 171, "Weight": 102, "Gender": "M"},
                 "SE02": {"Age": 75, "Height": 150, "Weight": 57, "Gender": "F"},
                 "SE03": {"Age": 62, "Height": 150, "Weight": 51, "Gender": "F"},
                 "SE04": {"Age": 63, "Height": 160, "Weight": 59, "Gender": "F"},
                 "SE05": {"Age": 63, "Height": 165, "Weight": 72, "Gender": "M"},
                 "SE06": {"Age": 60, "Height": 163, "Weight": 79, "Gender": "M"},
                 "SE07": {"Age": 65, "Height": 168, "Weight": 76, "Gender": "M"},
                 "SE08": {"Age": 68, "Height": 163, "Weight": 72, "Gender": "F"},
                 "SE09": {"Age": 66, "Height": 167, "Weight": 65, "Gender": "M"},
                 "SE10": {"Age": 64, "Height": 156, "Weight": 66, "Gender": "F"},
                 "SE11": {"Age": 66, "Height": 169, "Weight": 63, "Gender": "F"},
                 "SE12": {"Age": 69, "Height": 164, "Weight": 56.5, "Gender": "M"},
                 "SE13": {"Age": 65, "Height": 171, "Weight": 72.5, "Gender": "M"},
                 "SE14": {"Age": 67, "Height": 163, "Weight": 58, "Gender": "M"},
                 "SE15": {"Age": 64, "Height": 150, "Weight": 50, "Gender": "F"}
                }
Subject_attrs_df = pd.DataFrame.from_dict(Subject_attrs).transpose()

# Downcast numeric columns
Subject_attrs_df[['Age', 'Height']] = Subject_attrs_df[['Age', 'Height']].astype(np.int16)
Subject_attrs_df[['Weight']] = Subject_attrs_df[['Weight']].astype("float")

Subject_attrs_df.head()

Unnamed: 0,Age,Height,Weight,Gender
SA01,26,165,53.0,F
SA02,23,176,58.5,M
SA03,19,156,48.0,F
SA04,23,170,72.0,M
SA05,22,172,69.5,M


In [4]:
# Code reference data from Readme.txt file
code_dict = {"D01": "Walking slowly",
              "D02": "Walking quickly",
              "D03": "Jogging slowly",
              "D04": "Jogging quickly",
              "D05": "Walking upstairs and downstairs slowly",
              "D06": "Walking upstairs and downstairs quickly",
              "D07": "Slowly sit in a half height chair, wait a moment, and up slowly",
              "D08": "Quickly sit in a half height chair, wait a moment, and up quickly",
              "D09": "Slowly sit in a low height chair, wait a moment, and up slowly",
              "D10": "Quickly sit in a low height chair, wait a moment, and up quickly",
              "D11": "Sitting a moment, trying to get up, and collapse into a chair",
              "D12": "Sitting a moment, lying slowly, wait a moment, and sit again",
              "D13": "Sitting a moment, lying quickly, wait a moment, and sit again",
              "D14": "Being on one’s back change to lateral position, wait a moment, and change to one’s back",
              "D15": "Standing, slowly bending at knees, and getting up",
              "D16": "Standing, slowly bending without bending knees, and getting up",
              "D17": "Standing, get into a car, remain seated and get out of the car",
              "D18": "Stumble while walking",
              "D19": "Gently jump without falling (trying to reach a high object)",
              "F01": "Fall forward while walking caused by a slip",
              "F02": "Fall backward while walking caused by a slip",
              "F03": "Lateral fall while walking caused by a slip",
              "F04": "Fall forward while walking caused by a trip",
              "F05": "Fall forward while jogging caused by a trip",
              "F06": "Vertical fall while walking caused by fainting",
              "F07": "Fall while walking, with use of hands in a table to dampen fall, caused by fainting",
              "F08": "Fall forward when trying to get up",  
              "F09": "Lateral fall when trying to get up",  
              "F10": "Fall forward when trying to sit down",  
              "F11": "Fall backward when trying to sit down",  
              "F12": "Lateral fall when trying to sit down",  
              "F13": "Fall forward while sitting, caused by fainting or falling asleep",  
              "F14": "Fall backward while sitting, caused by fainting or falling asleep",   
              "F15": "Lateral fall while sitting, caused by fainting or falling asleep"            
              }

## Load Dataset

In [5]:
# Datatype Optimization Functions
def optimize_floats(df: pd.DataFrame) -> pd.DataFrame:
    floats = df.select_dtypes(include=['float64']).columns.tolist()
    df[floats] = df[floats].apply(pd.to_numeric, downcast='float')
    return df

def optimize_ints(df: pd.DataFrame) -> pd.DataFrame:
    ints = df.select_dtypes(include=['int64']).columns.tolist()
    df[ints] = df[ints].apply(pd.to_numeric, downcast='integer')
    return df

def optimize_objects(df: pd.DataFrame, datetime_features: List[str]) -> pd.DataFrame:
    for col in df.select_dtypes(include=['object']):
        if col not in datetime_features:
            num_unique_values = len(df[col].unique())
            num_total_values = len(df[col])
            if float(num_unique_values) / num_total_values < 0.5:
                df[col] = df[col].astype('category')
        else:
            df[col] = pd.to_datetime(df[col])
    return df

def optimize(df: pd.DataFrame, datetime_features: List[str] = []):
    return optimize_floats(optimize_ints(optimize_objects(df, datetime_features)))

def LoadSisFallDataset(userPrefix="", userNames=[]):
  """Load the SisFall Dataset.

  Found at http://sistemic.udea.edu.co/wp-content/uploads/2016/03/SisFall_dataset.zip 

  Args:
      userPrefix: user profile (SA - adults, SE - Elderly).
      userNames: list of users e.g. ["SA01", "SE06"]

  Returns:
      n-tuple of accelerometer/Gyroscope features
  """

  # Create new dataframe for SisFall data
  ds = pd.DataFrame()

  # Get list of filepaths for SExx subjects
  filepaths = glob.glob('SisFall_dataset/**/*.txt', recursive=True)
  if(userPrefix != ""):
    filepaths = [f for f in filepaths if userPrefix in f]
  elif(len(userNames) != 0):
    filepaths = [f for f in filepaths if any(userName in f for userName in userNames)]

  column_names = ["ADXL345_X", "ADXL345_Y", "ADXL345_Z", 
                  "ITG3200_X", "ITG3200_Y", "ITG3200_Z", 
                  "MMA8451Q_X", "MMA8451Q_Y", "MMA8451Q_Z"]
  # Generate dataset from .txt files in filepaths
  for i, f in enumerate(filepaths):
    filename = os.path.splitext(os.path.basename(f))[0]
    code, subject, trial = filename.split("_")
    df = pd.read_csv(f, header=None, names=column_names)
    df["trial"] = trial
    df["code"] = code
    df["subject"] = subject

    ds = pd.concat([ds, df], ignore_index=True)

  # Remove trailing ; character from "MMA8451Q_Z" column
  ds["MMA8451Q_Z"] = ds["MMA8451Q_Z"].apply(lambda x: str(x)[:-1])

  # Convert "MMA8451Q_Z" column to integer datatype
  ds["MMA8451Q_Z"] = pd.to_numeric(ds["MMA8451Q_Z"], downcast="integer")

  # Merge subject attribute data to ds dataset
  ds = pd.merge(left=ds, right=Subject_attrs_df, left_on="subject", right_on=Subject_attrs_df.index)

  # Create "class" column for fall/non-fall classification labels
  ds["class"] = ds["code"].apply(lambda x: 1 if x.startswith("F") else 0)

  ds = optimize(ds, [])
  ds.info()

  return ds

## Feature Generation

In [6]:
def Featurize(accx, accy, accz, gyrox, gyroy, gyroz, accx2, accy2, accz2, fs):
  """Featurization of the accelerometer signal.

  Args:
      accx: (np.array) x-channel of the ADXL345.
      accy: (np.array) y-channel of the ADXL345.
      accz: (np.array) z-channel of the ADXL345.
      gyrox: (np.array) x-channel of the ITG3200.
      gyroy: (np.array) y-channel of the ITG3200.
      gyroz: (np.array) z-channel of the ITG3200.
      accx2: (np.array) x-channel of the MMA8451Q.
      accy2: (np.array) y-channel of the MMA8451Q.
      accz2: (np.array) z-channel of the MMA8451Q.
      fs: (number) the sampling rate of the accelerometer

  Returns:
      n-tuple of accelerometer features
  """

  # The mean of each channel
  mn_accx = np.mean(accx)
  mn_accy = np.mean(accy)
  mn_accz = np.mean(accz)
  mn_gyrox = np.mean(gyrox)
  mn_gyroy = np.mean(gyroy)
  mn_gyroz = np.mean(gyroz)
  mn_accx2 = np.mean(accx2)
  mn_accy2 = np.mean(accy2)
  mn_accz2 = np.mean(accz2)

  return (mn_accx,
          mn_accy,
          mn_accz,
          mn_gyrox,
          mn_gyroy,
          mn_gyroz,
          mn_accx2,
          mn_accy2,
          mn_accz2
  )
  
def GenerateFeatures(data, fs, window_length_s, window_shift_s, unique_users, unique_codes):
  """
  Generate features by sliding a window across each dataset and computing
  the features on each window.

  Args:
    data: As returned by LoadSisFallDataset()
    fs: (number) The sampling rate of the data
    window_length_s: (number) The length of the window in seconds
    window_shift_s: (number) The amount to shift the window by

  Returns:
    feature_ds: (np.array) 2D Array, n_samples X n_features. The feature matrix.
  """
  window_length = window_length_s * fs
  window_shift = window_shift_s * fs

  codes, subjects, features = [], [], []

  for user in unique_users:
    for code in unique_codes:
      df = data[(data['subject'] == user) & (data['code'] == code)]
      for i in range(0, len(df) - window_length, window_shift):
        window = df[i: i + window_length]
        accx = window.ADXL345_X.values
        accy = window.ADXL345_Y.values
        accz = window.ADXL345_Z.values
        gyrox = window.ITG3200_X.values 
        gyroy = window.ITG3200_Y.values 
        gyroz = window.ITG3200_Z.values 
        accx2 = window.MMA8451Q_X.values 
        accy2 = window.MMA8451Q_Y.values 
        accz2 = window.MMA8451Q_Z.values 
        features.append(Featurize(accx, accy, accz, gyrox, gyroy, gyroz, accx2, accy2, accz2, fs))
        subjects.append(user)
        codes.append(code)

  codes = np.array(codes)
  subjects = np.array(subjects)
  features = np.array(features)

  featureNames = ['mn_accx',
                  'mn_accy',
                  'mn_accz',
                  'mn_gyrox',
                  'mn_gyroy',
                  'mn_gyroz',
                  'mn_accx2',
                  'mn_accy2',
                  'mn_accz2'
]

  features_list = []

  for i in range(len(features)):
    feature_dict = {}
    for j in range(len(featureNames)):
      feature_dict[featureNames[j]] = features[i][j]
    feature_dict["subject"] = subjects[i]
    feature_dict["code"] = codes[i]
    feature_dict["class"] = 1 if codes[i].startswith("F") else 0
    features_list.append(feature_dict)

  feature_ds = pd.DataFrame.from_dict(features_list)

  feature_ds.info()

  return feature_ds

In [7]:
# Load Dataset
trainUsers = ["SA01", "SA02", "SA03", "SA04", "SA05",
              "SA06", "SA07", "SA08", "SA09", "SA10",
              "SA11", "SA12", "SA13", "SA14", "SA15",
              "SA16", "SA17", "SA18", "SA19", "SA20",
              "SE01", "SE02", "SE03", "SE04", "SE05",
              "SE07", "SE08", "SE09", "SE10", "SE11"]
valUsers = ["SA21", "SA22", "SA23", "SE12", "SE13", "SE14"]
testUsers = ["SE06", "SE15"]

userNames = trainUsers + valUsers + testUsers

start_time=time.time()
raw_ds = LoadSisFallDataset(userNames=userNames)
end_time=time.time()
print("Loading Time:", (end_time - start_time))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15858929 entries, 0 to 15858928
Data columns (total 17 columns):
 #   Column      Dtype   
---  ------      -----   
 0   ADXL345_X   int16   
 1   ADXL345_Y   int16   
 2   ADXL345_Z   int16   
 3   ITG3200_X   int16   
 4   ITG3200_Y   int16   
 5   ITG3200_Z   int16   
 6   MMA8451Q_X  int16   
 7   MMA8451Q_Y  int16   
 8   MMA8451Q_Z  int16   
 9   trial       category
 10  code        category
 11  subject     category
 12  Age         int16   
 13  Height      int16   
 14  Weight      float32 
 15  Gender      category
 16  class       int8    
dtypes: category(4), float32(1), int16(11), int8(1)
memory usage: 589.8 MB
Loading Time: 3613.6000788211823


In [8]:
raw_ds.to_csv("raw_data.csv", index=False)

In [9]:
unique_users = raw_ds['subject'].unique()
unique_codes = raw_ds['code'].unique()

# Generate Features
feature_ds = GenerateFeatures(raw_ds, 200, 10, 10, unique_users, unique_codes)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7145 entries, 0 to 7144
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   mn_accx   7145 non-null   float64
 1   mn_accy   7145 non-null   float64
 2   mn_accz   7145 non-null   float64
 3   mn_gyrox  7145 non-null   float64
 4   mn_gyroy  7145 non-null   float64
 5   mn_gyroz  7145 non-null   float64
 6   mn_accx2  7145 non-null   float64
 7   mn_accy2  7145 non-null   float64
 8   mn_accz2  7145 non-null   float64
 9   subject   7145 non-null   object 
 10  code      7145 non-null   object 
 11  class     7145 non-null   int64  
dtypes: float64(9), int64(1), object(2)
memory usage: 670.0+ KB


In [10]:
feature_ds.to_csv("feature_data.csv", index=False)

In [11]:
raw_ds.head(100)

Unnamed: 0,ADXL345_X,ADXL345_Y,ADXL345_Z,ITG3200_X,ITG3200_Y,ITG3200_Z,MMA8451Q_X,MMA8451Q_Y,MMA8451Q_Z,trial,code,subject,Age,Height,Weight,Gender,class
0,17,-179,-99,-18,-504,-352,76,-697,-279,R01,D01,SA01,26,165,53.0,F,0
1,15,-174,-90,-53,-568,-306,48,-675,-254,R01,D01,SA01,26,165,53.0,F,0
2,1,-176,-81,-84,-613,-271,-2,-668,-221,R01,D01,SA01,26,165,53.0,F,0
3,-10,-180,-77,-104,-647,-227,-34,-697,-175,R01,D01,SA01,26,165,53.0,F,0
4,-21,-191,-63,-128,-675,-191,-74,-741,-133,R01,D01,SA01,26,165,53.0,F,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,-53,-306,70,-173,145,106,-201,-1175,383,R01,D01,SA01,26,165,53.0,F,0
96,-52,-304,69,-106,100,144,-206,-1169,385,R01,D01,SA01,26,165,53.0,F,0
97,-50,-305,66,-56,67,183,-199,-1178,373,R01,D01,SA01,26,165,53.0,F,0
98,-45,-303,65,-18,48,229,-168,-1175,362,R01,D01,SA01,26,165,53.0,F,0
