The dataset was downloaded from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Activity+recognition+with+healthy+older+people+using+a+batteryless+wearable+sensor

Data Set Information:

This dataset contains the motion data of 14 healthy older people, aged between 66 and 86 years, who performed broadly scripted activities while wearing a batteryless sensor on top of their clothing at sternum level. The data is sparse and noisy because the sensor is passive.
Participants were allocated to one of two clinical room settings (S1 and S2). Setting S1 (Room1) uses 4 RFID reader antennas around the room (one at ceiling level and three at wall level) to collect data, whereas setting S2 (Room2) uses 3 RFID reader antennas (two at ceiling level and one at wall level).
The activities performed were:
walking to the chair,
sitting on the chair,
getting off the chair,
walking to bed,
lying on bed,
getting off the bed and
walking to the door.
Hence the possible class labels assigned for every sensor observation are:
- Sitting on bed
- Sitting on chair
- Lying on bed
- Ambulating, which includes standing and walking around the room.

**Note on the dataset:**

Characteristics of Dataset

Each file in the dataset represents an individual trial performed by a subject wearing 
a wireless, batteryless sensor. Each trial consists of scripted activities of daily living (ADL). 
The files are anonymized; however, the gender of the participant is 
indicated at the end of the file name.
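
Because the gender marker is the last character of each file name, it can be read off directly. The following is a minimal sketch, assuming file names whose stem ends in `M` or `F`; the helper `gender_from_filename` and the example names are hypothetical and not part of the dataset documentation.

```python
from pathlib import Path


def gender_from_filename(filename):
    """Return 'M' or 'F' based on the last character of the file stem.

    Hypothetical helper: assumes the stem (the name without any extension)
    ends with the gender marker, as the dataset notes describe.
    """
    stem = Path(filename).stem
    marker = stem[-1:].upper()
    if marker not in ("M", "F"):
        raise ValueError(f"No gender marker at the end of {filename!r}")
    return marker


# Made-up file names, used only to illustrate the convention.
print(gender_from_filename("trial01M"))
print(gender_from_filename("trial02F.csv"))
```

Using the stem means the helper works both before and after a `.csv` extension is added to the files.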

The content of the file is as follows:
Comma separated values (CSV) format.

Column 1: Time in seconds starting from 0 rounded to the closest 0.025s

Column 2: Acceleration reading in G for frontal axis

Column 3: Acceleration reading in G for vertical axis

Column 4: Acceleration reading in G for lateral axis

Column 5: Id of antenna reading sensor

Column 6: Received signal strength indicator (RSSI) 

Column 7: Phase

Column 8: Frequency

Column 9: Label of activity, 1: sit on bed, 2: sit on chair, 3: lying, 4: ambulating
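
To make the column layout concrete, the sketch below parses a single record with Python's `csv` module and names each field. The sample row is the first female record shown later in this notebook. The field names, the `parse_record` helper, and the `ACTIVITY_LABELS` dictionary are my own illustrative choices, and the sketch assumes the raw files have no header row.

```python
import csv
import io

# Field names are illustrative; the raw files are assumed to have no header row.
FIELDS = ["time", "acc_front", "acc_vert", "acc_lat",
          "antenna_id", "rssi", "phase", "frequency", "activity"]

# Label codes as listed for column 9.
ACTIVITY_LABELS = {1: "sit on bed", 2: "sit on chair", 3: "lying", 4: "ambulating"}


def parse_record(line):
    """Parse one comma-separated record into a dict with typed values."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(FIELDS):
        raise ValueError(f"Expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Antenna id and activity are integer codes; the other columns are floats.
    for name in FIELDS:
        if name in ("antenna_id", "activity"):
            record[name] = int(record[name])
        else:
            record[name] = float(record[name])
    record["activity_name"] = ACTIVITY_LABELS[record["activity"]]
    return record


sample = "0.0,0.51826,0.89339,0.13456,4,-56.5,5.8368,921.75,1"
print(parse_record(sample))
```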

Additional Information

If you wish to use these datasets please cite this paper.

# Loading the dataset

In [85]:
# Using Google Colab: mount Google Drive to access the data files
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [86]:
#imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import glob

I uploaded all the study files provided by UCI to my Google Drive directory "/content/drive/MyDrive/Colab Notebooks/DataScience_Project2/Data/". Next, I iterate through the files and load them all into one DataFrame. Note from the UCI Machine Learning Repository description that file names ending in "M" belong to male participants and file names ending in "F" belong to female participants.

In [72]:
# Add the .csv extension to the downloaded files (run once). Adapted from:
# https://stackoverflow.com/questions/45627352/python-renaming-all-files-in-a-directory-using-a-loop
import os

path = '/content/drive/MyDrive/Colab Notebooks/DataScience_Project2/Data/'
for filename in os.listdir(path):
    # Skip files that already have the extension so the cell is safe to re-run
    if not filename.endswith('.csv'):
        os.rename(os.path.join(path, filename), os.path.join(path, filename + '.csv'))

I need to load the male files into one DataFrame and the female files into another, so I followed this tip: https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe

In [88]:
header_list = ['Time', 'acc_front', 'acc_vert', 'acc_lat', 'antenna_id', 'rssi', 'phase', 'frequency', 'activity']
path = r'/content/drive/MyDrive/Colab Notebooks/DataScience_Project2/Data'
all_files = glob.glob(path + "/*.csv")

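li_m = []  # one DataFrame per male-participant file
li_f = []  # one DataFrame per female-participant file

for filename in all_files:
    # The last character before the extension marks the participant's gender
    if filename.endswith('M.csv'):
        df = pd.read_csv(filename, names=header_list)
        li_m.append(df)
    elif filename.endswith('F.csv'):
        df = pd.read_csv(filename, names=header_list)
        li_f.append(df)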

activity_df_m = pd.concat(li_m, axis=0, ignore_index=True)
activity_df_f = pd.concat(li_f, axis=0, ignore_index=True)
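
As an alternative sketch of this loading step, each file's rows could be tagged with the gender while reading, so the two groups never need separate lists. It assumes pandas is available and that file names end in `M.csv` or `F.csv`. `load_trials` is a hypothetical helper, and the demo writes two tiny made-up files to a temporary directory rather than touching the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

HEADER = ['Time', 'acc_front', 'acc_vert', 'acc_lat', 'antenna_id',
          'rssi', 'phase', 'frequency', 'activity']


def load_trials(data_dir):
    """Read every *M.csv / *F.csv file in data_dir into one DataFrame."""
    frames = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        gender = csv_path.stem[-1:]
        if gender not in ("M", "F"):
            continue  # skip files that do not follow the naming convention
        frame = pd.read_csv(csv_path, names=HEADER)
        frame["gender"] = gender
        frames.append(frame)
    return pd.concat(frames, axis=0, ignore_index=True)


# Demo on two tiny made-up files (values are placeholders, not real data).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo1M.csv").write_text("0.0,0.1,0.9,0.1,1,-58.0,0.2,920.75,4\n")
    Path(tmp, "demo2F.csv").write_text("0.0,0.5,0.9,0.1,4,-56.5,5.8,921.75,1\n"
                                       "0.25,0.5,0.9,0.1,3,-68.0,4.8,925.75,1\n")
    demo_df = load_trials(tmp)

print(demo_df[["Time", "activity", "gender"]])
```

The trade-off is that the male and female frames are no longer available separately, although they can be recovered by filtering on the `gender` column.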

Next, I add a gender column to each DataFrame (M or F, depending on the source files) before combining the two into one.

In [89]:
activity_df_f['gender'] = 'F'
activity_df_f.head()

Unnamed: 0,Time,acc_front,acc_vert,acc_lat,antenna_id,rssi,phase,frequency,activity,gender
0,0.0,0.51826,0.89339,0.13456,4,-56.5,5.8368,921.75,1,F
1,0.25,0.51826,0.89339,0.13456,3,-68.0,4.8412,925.75,1,F
2,0.75,0.51826,0.89339,0.13456,4,-55.5,3.6417,924.25,1,F
3,1.25,0.51826,0.89339,0.13456,3,-57.5,1.7779,924.75,1,F
4,1.75,0.51826,0.89339,0.13456,4,-61.5,0.24083,922.75,1,F


In [90]:
activity_df_m['gender'] = 'M'
activity_df_m.head()

Unnamed: 0,Time,acc_front,acc_vert,acc_lat,antenna_id,rssi,phase,frequency,activity,gender
0,0.0,-0.044557,0.93932,0.11175,1,-58.0,0.17794,920.75,4,M
1,0.25,-0.044557,0.93932,0.11175,1,-60.0,0.4694,920.25,4,M
2,0.75,-0.044557,0.93932,0.11175,3,-68.5,0.15033,923.25,1,M
3,1.5,-0.044557,0.93932,0.11175,4,-57.5,5.1082,925.75,1,M
4,2.5,0.61207,0.89339,0.009122,4,-57.5,4.3949,920.75,1,M


Appending the dataframes into one dataframe finally :)

A nice walkthrough of `DataFrame.append` is here: https://www.geeksforgeeks.org/python-pandas-dataframe-append/

In [91]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
activity_df = pd.concat([activity_df_f, activity_df_m], ignore_index=True)
activity_df

Unnamed: 0,Time,acc_front,acc_vert,acc_lat,antenna_id,rssi,phase,frequency,activity,gender
0,0.00,0.51826,0.89339,0.134560,4,-56.5,5.83680,921.75,1,F
1,0.25,0.51826,0.89339,0.134560,3,-68.0,4.84120,925.75,1,F
2,0.75,0.51826,0.89339,0.134560,4,-55.5,3.64170,924.25,1,F
3,1.25,0.51826,0.89339,0.134560,3,-57.5,1.77790,924.75,1,F
4,1.75,0.51826,0.89339,0.134560,4,-61.5,0.24083,922.75,1,F
...,...,...,...,...,...,...,...,...,...,...
75123,938.25,0.88175,0.75559,0.145960,1,-64.0,4.85810,925.25,2,M
75124,950.00,0.83485,0.77856,0.123150,1,-62.5,2.25340,923.75,2,M
75125,959.50,0.83485,0.77856,0.123150,1,-60.0,1.40970,920.25,2,M
75126,964.50,0.83485,0.77856,0.123150,1,-59.5,1.34380,920.25,2,M


I now have one full dataset with 75128 rows (which matches the number of rows reported on the UCI Machine Learning Repository page) and 10 columns: the original 9 plus the gender column.
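
A check like this can be scripted rather than read off the display. The sketch below assumes pandas and uses a hypothetical helper, `check_shape`. The tiny frame is made up for the demo; on the real data the call would be `check_shape(activity_df, 75128, 10)`.

```python
import pandas as pd


def check_shape(df, expected_rows, expected_cols):
    """Raise if the shape is unexpected; otherwise return row counts per gender."""
    rows, cols = df.shape
    if (rows, cols) != (expected_rows, expected_cols):
        raise ValueError(
            f"Expected {expected_rows} rows x {expected_cols} columns, "
            f"got {rows} x {cols}"
        )
    return df["gender"].value_counts().to_dict()


# Made-up frame for illustration only.
demo = pd.DataFrame({"activity": [1, 4, 2], "gender": ["F", "M", "F"]})
print(check_shape(demo, expected_rows=3, expected_cols=2))
```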