#### Module 9: Supervised Learning - II

#### Case Study–2

Objective:

• Practice classification with the Naive Bayes algorithm.
• Identify, by experiment, which predictors influence the classification.

Questions:

1. Load the kinematics dataset as measured on mobile sensors from the file “run_or_walk.csv”. List out the columns in the dataset.
2. Let the target variable ‘y’ be the activity and assign all the columns after it to ‘x’.
3. Using scikit-learn, fit a Gaussian Naive Bayes model and observe the accuracy. Generate a classification report using scikit-learn.
4. Repeat the model twice: once using only the acceleration values as predictors, and once using only the gyro values as predictors. Comment on the difference in accuracy between the two models.

In [1]:
# Load the kinematics dataset as measured on mobile sensors from the file “run_or_walk.csv”. List out the columns in the dataset.

import pandas as pd

# Load the dataset
df = pd.read_csv("run_or_walk.csv")

# List out the columns
print("Columns in the dataset:")
print(df.columns.tolist())

Columns in the dataset:
['date', 'time', 'username', 'wrist', 'activity', 'acceleration_x', 'acceleration_y', 'acceleration_z', 'gyro_x', 'gyro_y', 'gyro_z']
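
The column list printed above already shows how the later steps can group the data: everything after `activity` is a sensor reading, and the sensor columns share either an `acceleration_` or a `gyro_` prefix. The following is a minimal sketch on a plain Python list (copied from the output above, so it runs without the CSV); the variable names are illustrative only.

```python
# Column names copied from the printed output above.
columns = ['date', 'time', 'username', 'wrist', 'activity',
           'acceleration_x', 'acceleration_y', 'acceleration_z',
           'gyro_x', 'gyro_y', 'gyro_z']

target = 'activity'

# Everything positioned after the target column is treated as a predictor.
feature_cols = columns[columns.index(target) + 1:]

# Group predictors by their sensor prefix.
acc_cols = [c for c in feature_cols if c.startswith('acceleration_')]
gyro_cols = [c for c in feature_cols if c.startswith('gyro_')]

print("Predictors:  ", feature_cols)
print("Acceleration:", acc_cols)
print("Gyro:        ", gyro_cols)
```

Matching on the full prefix rather than a short substring such as `'acc'` avoids accidentally picking up unrelated columns if the dataset ever gains more fields.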


In [2]:
# Let the target variable ‘y’ be the activity and assign all the columns after it to ‘x’.


# Define target variable 'y' as the activity column
y = df['activity']

# Assign all columns after 'activity' to X
# The index of 'activity' column
activity_index = df.columns.get_loc('activity')

# Select all columns after 'activity'
X = df.iloc[:, activity_index+1:]

print("Target variable (y):")
print(y.head())

print("\nFeature set (X):")
print(X.head())


Target variable (y):
0    0
1    0
2    0
3    0
4    0
Name: activity, dtype: int64

Feature set (X):
   acceleration_x  acceleration_y  acceleration_z  gyro_x  gyro_y  gyro_z
0          0.2650         -0.7814         -0.0076 -0.0590  0.0325 -2.9296
1          0.6722         -1.1233         -0.2344 -0.1757  0.0208  0.1269
2          0.4399         -1.4817          0.0722 -0.9105  0.1063 -2.4367
3          0.3031         -0.8125          0.0888  0.1199 -0.4099 -2.9336
4          0.4814         -0.9312          0.0359  0.0527  0.4379  2.4922


In [3]:
# Using scikit-learn, fit a Gaussian Naive Bayes model and observe the accuracy. Generate a classification report using scikit-learn.

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Define target and features (same definitions as in the previous cell)
y = df['activity']
activity_index = df.columns.get_loc('activity')
X = df.iloc[:, activity_index+1:]

# Train-test split (80-20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy and classification report
print("Gaussian Naive Bayes Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Gaussian Naive Bayes Accuracy: 0.9554125747827068

Classification Report:
               precision    recall  f1-score   support

           0       0.92      0.99      0.96      8845
           1       0.99      0.92      0.95      8873

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718
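
To make clearer what the fitted model is doing, here is a minimal from-scratch sketch of Gaussian Naive Bayes in plain Python. It assumes each feature is normally distributed within each class and that features are independent given the class. The function names, the toy data, and the fixed `1e-9` variance floor are illustrative assumptions; this is not scikit-learn's implementation (which, for instance, scales its variance smoothing by the largest feature variance).

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    n_total = len(rows)
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))  # transpose: one tuple per feature
        means = [sum(col) / n for col in columns]
        # Population variance plus a tiny floor to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior + summed Gaussian log-likelihood."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, made up for illustration (not taken from run_or_walk.csv).
train_rows = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
              [1.9, 2.1], [2.0, 1.8], [2.2, 2.0]]
train_labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(train_rows, train_labels)
pred_low = predict_gaussian_nb(model, [0.15, 0.25])
pred_high = predict_gaussian_nb(model, [2.1, 1.9])
print(pred_low, pred_high)
```

Working in log space keeps the product of many small probabilities from underflowing, which is why the scores are summed rather than multiplied.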



In [4]:
# Repeat the model once using only the acceleration values as predictors and then using only the gyro values as predictors. 
# Comment on the difference in accuracy between both the models.

# Separate predictors by sensor type
# Column names are acceleration_x/y/z and gyro_x/y/z (see the column list printed earlier)
acc_cols = [col for col in df.columns if col.startswith('acceleration_')]
gyro_cols = [col for col in df.columns if col.startswith('gyro_')]

X_acc = df[acc_cols]
X_gyro = df[gyro_cols]

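# Train-test split
# Both calls use the same y, test_size, random_state, and stratify settings, so they
# select the same rows; y_train and y_test therefore apply to both feature sets.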
X_acc_train, X_acc_test, y_train, y_test = train_test_split(
    X_acc, y, test_size=0.2, random_state=42, stratify=y
)

X_gyro_train, X_gyro_test, _, _ = train_test_split(
    X_gyro, y, test_size=0.2, random_state=42, stratify=y
)

# Fit Gaussian Naive Bayes on acceleration
gnb_acc = GaussianNB()
gnb_acc.fit(X_acc_train, y_train)
y_pred_acc = gnb_acc.predict(X_acc_test)

print("=== Acceleration Model ===")
print("Accuracy:", accuracy_score(y_test, y_pred_acc))
print("Classification Report:\n", classification_report(y_test, y_pred_acc))

# Fit Gaussian Naive Bayes on gyroscope
gnb_gyro = GaussianNB()
gnb_gyro.fit(X_gyro_train, y_train)
y_pred_gyro = gnb_gyro.predict(X_gyro_test)

print("\n=== Gyroscope Model ===")
print("Accuracy:", accuracy_score(y_test, y_pred_gyro))
print("Classification Report:\n", classification_report(y_test, y_pred_gyro))


=== Acceleration Model ===
Accuracy: 0.9578959250479738
Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.99      0.96      8845
           1       0.99      0.92      0.96      8873

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718


=== Gyroscope Model ===
Accuracy: 0.6488316965797494
Classification Report:
               precision    recall  f1-score   support

           0       0.62      0.74      0.68      8845
           1       0.68      0.55      0.61      8873

    accuracy                           0.65     17718
   macro avg       0.65      0.65      0.65     17718
weighted avg       0.65      0.65      0.65     17718
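
To comment on the difference, the three accuracies reported above can be placed side by side. The sketch below only reuses numbers already printed in this notebook (the accuracies and the test-set size of 17718); it does not refit anything, and the dictionary keys are illustrative labels.

```python
# Accuracies copied from the outputs above; 17718 is the test-set size (support).
n_test = 17718
accuracies = {
    "all six features": 0.9554125747827068,
    "acceleration only": 0.9578959250479738,
    "gyro only": 0.6488316965797494,
}

# Convert each accuracy back to a count of correctly classified test rows.
correct = {name: round(acc * n_test) for name, acc in accuracies.items()}

for name, acc in accuracies.items():
    print(f"{name:>18}: {acc:.4f} ({correct[name]} of {n_test} correct)")

gap = accuracies["acceleration only"] - accuracies["gyro only"]
print(f"acceleration minus gyro: {gap:.4f}")
```

On this split, the acceleration-only model (about 0.958) performs on par with, and marginally above, the model that uses all six features (about 0.955), while the gyro-only model reaches only about 0.649. This suggests that the acceleration readings carry most of the information this classifier uses to separate running from walking, and that the gyro readings add little for Gaussian Naive Bayes. Since the comparison rests on a single train-test split, the small gap between the first two models should not be over-interpreted.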

