# Data Preprocessing

The dataset used below was downloaded from Kaggle (https://www.kaggle.com/datasets/die9origephit/human-activity-recognition/data).

This section imports the necessary packages and then:

1. Loads the dataset from sensor_data.csv using its relative path and stores it in a pandas DataFrame.
2. Prints the first five rows to confirm that the data loaded as expected.

In [44]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

# Loads data using relative path
data = pd.read_csv("../data/sensor_data.csv")

print(data.head())


   user activity      timestamp  x-axis  y-axis  z-axis
0     1  Walking  4991922345000    0.69   10.80   -2.03
1     1  Walking  4991972333000    6.85    7.44   -0.50
2     1  Walking  4992022351000    0.93    5.63   -0.50
3     1  Walking  4992072339000   -2.11    5.01   -0.69
4     1  Walking  4992122358000   -4.59    4.29   -1.95
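Rows with missing values are dropped in the next section. Before doing that, it can be useful to see how many rows would be affected. The snippet below is a minimal sketch on a made-up toy frame (the `toy` DataFrame and its values are assumptions for illustration, not rows from sensor_data.csv); the same calls should work on `data`.

```python
import pandas as pd

# Toy frame standing in for the sensor data; values are made up for illustration
toy = pd.DataFrame({
    "activity": ["Walking", "Jogging", None, "Sitting"],
    "x-axis": [0.69, None, 0.93, -2.11],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Number of rows that dropna() would remove (rows with at least one missing value)
rows_with_missing = int(toy.isna().any(axis=1).sum())

# Drop those rows, as the preprocessing step does
cleaned = toy.dropna()

print(missing_per_column)
print(f"Rows with missing values: {rows_with_missing}")
print(f"Rows remaining after dropna: {len(cleaned)}")
```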


## Data Normalization using Min-Max Scaling

This section cleans, normalizes, and encodes the data in the following steps:

1. Removing any data instances with missing values.
2. Normalizing the x-axis, y-axis, and z-axis values to the range [0, 1].
3. Encoding the activity labels as integers.
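Min-max scaling maps each value x in a column to (x - min) / (max - min), where min and max are taken over that column. The sketch below checks this by hand against `MinMaxScaler` on a toy column (the `values` array is an assumption for illustration, not data from the dataset).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column of readings (illustrative values, not taken from the dataset)
values = np.array([[-4.0], [0.0], [6.0]])

# Manual min-max scaling, computed per column: (x - min) / (max - min)
col_min = values.min(axis=0)
col_max = values.max(axis=0)
manual = (values - col_min) / (col_max - col_min)

# The same scaling from scikit-learn, using its default feature_range of (0, 1)
scaled = MinMaxScaler().fit_transform(values)

print(manual.ravel())
print(scaled.ravel())
print("Results match:", np.allclose(manual, scaled))
```

Because the scaling depends on each column's minimum and maximum, a single extreme reading can compress the rest of the values into a narrow band.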

In [None]:
# Removes any data instances with missing information
data = data.dropna()

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Normalize x-axis, y-axis, z-axis columns
data[['x-axis', 'y-axis', 'z-axis']] = scaler.fit_transform(data[['x-axis', 'y-axis', 'z-axis']])

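# Keep a reference to the original string labels before they are encoded
activities = data['activity']

# Encode the activity labels as integers
encoder = LabelEncoder()
data['activity'] = encoder.fit_transform(data['activity'])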

# Check the normalized data
print(data.head())
print("\n")

# Map each activity name to its encoded integer value
activity_catalog = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))

# Print the catalog
print("Activity Encoding Catalogue:")
for activity, encoded_value in activity_catalog.items():
    print(f"{activity}: {encoded_value}")


   user  activity      timestamp    x-axis    y-axis    z-axis
0     1         5  4991922345000  0.513145  0.766961  0.450901
1     1         5  4991972333000  0.668857  0.682219  0.489723
2     1         5  4992022351000  0.519211  0.636570  0.489723
3     1         5  4992072339000  0.442366  0.620933  0.484902
4     1         5  4992122358000  0.379676  0.602774  0.452931


Activity Encoding Catalogue:
Downstairs: 0
Jogging: 1
Sitting: 2
Standing: 3
Upstairs: 4
Walking: 5
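The integer codes above can be turned back into activity names with the encoder's `inverse_transform` method. The following is a minimal sketch that refits a fresh `LabelEncoder` on the six activity names listed above so that it runs on its own; in the notebook, the already-fitted `encoder` could be used directly. The `encoded` array is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Activity names as listed in the encoding catalogue above
labels = ["Downstairs", "Jogging", "Sitting", "Standing", "Upstairs", "Walking"]

# LabelEncoder assigns integers in sorted order of the class names
encoder = LabelEncoder().fit(labels)

# Made-up encoded values, e.g. model predictions
encoded = np.array([5, 1, 2])

# Recover the original activity names
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```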
