In [None]:
# Human activity prediction using mobile sensors

# Bhaskar dev
# 2101014030

# ■ Motivation –
# Human Activity Recognition (HAR) using mobile sensors is at the heart of many modern applications, from fitness tracking and healthcare monitoring to smart home automation and security.
# Smartphones, equipped with powerful sensors such as accelerometers and gyroscopes, make it possible to collect rich data about our daily movements.
# I was drawn to this topic because of its immediate relevance to real-world scenarios and its potential to improve quality of life, especially in elder care, rehabilitation, and personalized health.
# Additionally, HAR presents a fascinating challenge in machine learning: extracting meaningful patterns from noisy, high-dimensional sensor data.
# ■ How does it connect with past and current work done in multimodal learning (a short, recent historical perspective)?

# The field of HAR has evolved rapidly, paralleling advances in multimodal learning. Early HAR systems relied on vision-based approaches, using video and image data to classify activities.
# However, these methods often struggled with lighting conditions and raised privacy concerns. The development of wearable and mobile sensors shifted the focus to sensor-based HAR,
# leveraging data from accelerometers, gyroscopes, and magnetometers.
# ■ Explain your learning from this work.
# Working on this project deepened my understanding of the end-to-end HAR pipeline:
# 1) Data Preprocessing: I learned the importance of handling missing values and outliers to ensure model reliability. Using z-score filtering for outlier removal and mean imputation for missing values produced a cleaner dataset, crucial for downstream modeling.
# 2) Dimensionality Reduction: With 561 features, dimensionality reduction was essential. Principal Component Analysis (PCA) enabled me to retain 95% of the variance with just 89 components, reducing computational load while preserving signal integrity.
# 3) Model Selection and Tuning: I experimented with neural networks (MLPClassifier) and used randomized hyperparameter search to optimize model performance. This reinforced the value of systematic experimentation in machine learning.
# 4) Classification: Although averaging removes the temporal structure of the data, the multivariate classification approach proved effective for the task at hand.



In [None]:


# ■ Code / Notebook – include demos, experiments, or visualizations, if applicable
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.impute import SimpleImputer
from scipy import stats

# Load data
data = pd.read_csv("test.csv")  # assumes cleaned CSV with 561 features + label

# Separate features and target
X = data.drop("Activity", axis=1)
y = data["Activity"]

# Handle missing values
imputer = SimpleImputer(strategy="mean")
X = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

In [None]:
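# Keep only rows in which every feature lies within 3 standard deviations of its mean
z_scores = np.abs(stats.zscore(X))
X = X[(z_scores < 3).all(axis=1)]
y = y.loc[X.index]  # align labels with the surviving rows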
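
One caveat with the filter above: `scipy.stats.zscore` returns NaN for a column with zero variance, and `NaN < 3` evaluates to False, so a single constant feature would discard every row. The sketch below shows one possible guard. It is not part of the original notebook; the helper name `zscore_row_mask` and the toy frame are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def zscore_row_mask(X: pd.DataFrame, threshold: float = 3.0) -> pd.Series:
    """Boolean mask of rows whose varying features all lie within `threshold` std devs.

    Hypothetical helper: constant columns are skipped because their z-scores are undefined.
    """
    varying = X.loc[:, X.std(ddof=0) > 0]                  # ignore zero-variance columns
    z = np.abs(stats.zscore(varying.to_numpy(), axis=0))   # column-wise z-scores
    return pd.Series((z < threshold).all(axis=1), index=X.index)


# Toy data (an assumption, not the HAR features): two noisy columns and one constant column.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.normal(size=200),
    "const": np.ones(200),
})
toy.loc[0, "a"] = 50.0  # plant one obvious outlier

mask = zscore_row_mask(toy)
print(f"kept {mask.sum()} of {len(toy)} rows")
```

The mask can then be applied to features and labels together, for example `X[mask]` and `y[mask]`, which keeps the two aligned without relying on index lookups.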

In [None]:
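# Standardize features to zero mean and unit variance
# (note: fitted here on all rows, before the train/test split)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)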

In [None]:
# Fit PCA with all components to inspect the cumulative explained variance
pca = PCA()
X_pca = pca.fit_transform(X_scaled)
explained_variance = np.cumsum(pca.explained_variance_ratio_)

In [None]:
# Smallest number of components whose cumulative explained variance reaches 95%
n_components = np.argmax(explained_variance >= 0.95) + 1
print(f"Number of components explaining 95% variance: {n_components}")

Number of components explaining 95% variance: 89
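
The two-step selection above (fit all components, then threshold the cumulative ratio) can also be written in one step, because scikit-learn's `PCA` accepts a float between 0 and 1 as `n_components` and then keeps just enough components to exceed that share of variance. The sketch below compares both routes on synthetic data; the data, the variable names, and the explicit `svd_solver="full"` setting are assumptions for illustration, not the notebook's HAR features.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (an assumption): 20 observed features driven by
# 5 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 20))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(500, 20))

# Manual route, as in the notebook: fit everything, then threshold the cumulative ratio.
full = PCA(svd_solver="full").fit(X_demo)
cumulative = np.cumsum(full.explained_variance_ratio_)
k_manual = int(np.argmax(cumulative >= 0.95) + 1)

# One-step route: a float in (0, 1) lets PCA choose the component count itself.
auto = PCA(n_components=0.95, svd_solver="full").fit(X_demo)
k_auto = int(auto.n_components_)

print(f"manual: {k_manual} components, one-step: {k_auto} components")
```

Both routes should agree except in the edge case where the cumulative ratio lands exactly on the threshold; the one-step form simply avoids fitting PCA twice.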


In [None]:
# Refit PCA, keeping only the components needed for 95% of the variance
pca = PCA(n_components=n_components)
X_reduced = pca.fit_transform(X_scaled)

In [None]:
# Stratified 80/20 split so each activity keeps its share in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, random_state=42, stratify=y
)


In [None]:
# Base MLP and the hyperparameter space sampled by the randomized search
mlp = MLPClassifier(max_iter=1000, random_state=42)
param_dist = {
    'hidden_layer_sizes': [(50,), (100,), (100, 50), (50, 25)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'alpha': [1e-4, 1e-3, 1e-2],
    'learning_rate': ['constant', 'adaptive']
}

In [None]:
# Sample 20 configurations, each scored by 5-fold cross-validated accuracy
random_search = RandomizedSearchCV(
    mlp, param_distributions=param_dist,
    n_iter=20, cv=5, scoring='accuracy', random_state=42, n_jobs=-1
)
random_search.fit(X_train, y_train)
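
In the cells above, the scaler and PCA are fitted on the full dataset before the split, so the held-out rows influence the preprocessing. One way to avoid this is to wrap all steps in a scikit-learn `Pipeline` and run the search over the pipeline, so that each cross-validation fold refits the scaler and PCA on its own training portion only. The following is a minimal sketch under stated assumptions: the data comes from `make_classification` rather than the HAR CSV, the search space is trimmed to keep it fast, and the resulting numbers are not comparable to the results reported in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the HAR features (assumption: 3 classes, 40 features).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=40, n_informative=10,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scaling and PCA live inside the pipeline, so they are refit on each training fold only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),
    ("mlp", MLPClassifier(max_iter=500, random_state=42)),
])

# Step-prefixed names route each hyperparameter to the matching pipeline stage.
param_dist = {
    "mlp__hidden_layer_sizes": [(50,), (100,)],
    "mlp__alpha": [1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(
    pipe, param_distributions=param_dist,
    n_iter=4, cv=3, scoring="accuracy", random_state=42,
)
search.fit(X_tr, y_tr)
test_accuracy = search.score(X_te, y_te)
print(search.best_params_, round(test_accuracy, 3))
```

With this layout the test set is touched only once, at the final `score` call, which makes the reported accuracy a cleaner estimate of performance on unseen data.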

In [None]:
# Evaluate the best configuration (refit on the full training set) on the held-out test set
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)

print("\nBest Parameters:")
print(random_search.best_params_)
print("\nAccuracy on test set:", accuracy_score(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Best Parameters:
{'solver': 'adam', 'learning_rate': 'adaptive', 'hidden_layer_sizes': (100, 50), 'alpha': 0.01, 'activation': 'relu'}

Accuracy on test set: 0.9615384615384616

Classification Report:
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00        27
           SITTING       0.95      0.92      0.93        59
          STANDING       0.91      0.95      0.93        56
           WALKING       1.00      1.00      1.00        25
WALKING_DOWNSTAIRS       1.00      1.00      1.00        11
  WALKING_UPSTAIRS       1.00      1.00      1.00        30

          accuracy                           0.96       208
         macro avg       0.98      0.98      0.98       208
      weighted avg       0.96      0.96      0.96       208



In [None]:
from sklearn.metrics import confusion_matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))



Confusion Matrix:
[[27  0  0  0  0  0]
 [ 0 54  5  0  0  0]
 [ 0  3 53  0  0  0]
 [ 0  0  0 25  0  0]
 [ 0  0  0  0 11  0]
 [ 0  0  0  0  0 30]]
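
The matrix reads row by row as true labels and column by column as predicted labels, in the sorted label order used by the classification report. The only errors are between SITTING and STANDING: 5 sitting samples were predicted as standing and 3 standing samples as sitting. As a worked check, the sketch below recomputes precision, recall, and accuracy directly from the printed values; the variable names are illustrative.

```python
import numpy as np

# Values copied from the confusion matrix printed above.
labels = ["LAYING", "SITTING", "STANDING", "WALKING",
          "WALKING_DOWNSTAIRS", "WALKING_UPSTAIRS"]
cm = np.array([
    [27, 0, 0, 0, 0, 0],
    [0, 54, 5, 0, 0, 0],
    [0, 3, 53, 0, 0, 0],
    [0, 0, 0, 25, 0, 0],
    [0, 0, 0, 0, 11, 0],
    [0, 0, 0, 0, 0, 30],
])

correct = np.diag(cm)                 # correctly classified samples per class
recall = correct / cm.sum(axis=1)     # row sums = true samples per class
precision = correct / cm.sum(axis=0)  # column sums = predicted samples per class
accuracy = correct.sum() / cm.sum()   # 200 correct out of 208

for name, p, r in zip(labels, precision, recall):
    print(f"{name:<20} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

The recomputed values agree with the classification report: the row sums match the support column, and 200 of 208 correct predictions gives the reported accuracy of about 0.9615.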


In [None]:
■ Reflections –
 (a) What surprised you?
  What surprised me most was that PCA reduced the 561 features to just 89 components while still capturing 95% of the variance, making the model lighter, faster, and more stable than before.
 (b) What can be the scope for improvement?
 1) We can try non-linear dimensionality reduction methods such as t-SNE or autoencoders, which may capture more complex structure in the data.
 2) Instead of pre-extracted features, we can use the raw time-series data and train convolutional or recurrent neural networks, which are better suited to sequential inputs.
 3) Integrating this model into a real-time mobile app would allow real-world testing, and incorporating dynamic time modeling could further improve prediction accuracy for transitional activities.
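
As a possible starting point for item 2, raw sensor streams are typically cut into fixed-length, overlapping windows before being passed to a convolutional or recurrent network. The sketch below shows that segmentation step only, in plain NumPy. The window length of 128 samples, the step of 64 (50% overlap), the 6-channel layout, and the helper name `sliding_windows` are all assumptions for illustration, and the signal is synthetic.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, window: int = 128, step: int = 64) -> np.ndarray:
    """Cut a (time, channels) signal into (n_windows, window, channels) segments.

    Hypothetical helper; trailing samples that do not fill a whole window are dropped.
    """
    if signal.shape[0] < window:
        return np.empty((0, window, signal.shape[1]), dtype=signal.dtype)
    starts = range(0, signal.shape[0] - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])


# Synthetic 6-channel stream (for example 3 accelerometer + 3 gyroscope axes), 1000 time steps.
rng = np.random.default_rng(0)
stream = rng.normal(size=(1000, 6))
segments = sliding_windows(stream)
print(segments.shape)
```

Each segment keeps the ordering of its samples, which is the temporal information that window-level summary features discard.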
■ References – Any papers, repos, videos, LLM tools you used


Research paper on this topic: https://www.semanticscholar.org/paper/Deep-Analysis-for-Smartphone-based-Human-Activity-Shan-Han/8e8a040a937b83db33cf23071b4af9dbbca893e7
