# Assignment 2: Logistic Regression and Classification Error Metrics

# CAP 4630: Intro Artificial Intelligence

**Student's name:** Annaly Rocha

**Section:** 001

## Introduction

You will be using the Human Activity Recognition with Smartphones dataset, which was built from recordings of study participants performing activities of daily living (ADL) while carrying a smartphone with embedded inertial sensors. The objective is to classify each record into one of six activities: walking, walking upstairs, walking downstairs, sitting, standing, and laying.

The dataset is provided on Canvas, so you can download it from there. For more information about the features of the dataset, see the website: [Human Activity Recognition with Smartphones](https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones).

Each record in the dataset provides:

- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.

In [1]:
from __future__ import print_function
import os
# Set the data path to match the file location on your system
#data_path = ['..', 'data']
data_path = ['data']

# Suppress warnings by replacing warnings.warn with a no-op function
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

In [2]:
import pandas as pd
import numpy as np

from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score
from sklearn.preprocessing import label_binarize

## Question 1 (10 points)

Import the data and do the following:

* Read the dataset
* By looking at the dataset, do you think the floating point values need to be scaled? Just write your answer (no need to write code for this).
* Examine the data types. There are many columns, so it might be wise to use value counts.
* Split the dataset into X_data and y_data
* Determine the breakdown of each activity
* Encode the activity label as an integer

In [4]:
#write code to read the dataset
data = pd.read_csv('Human_Activity_Recognition_Using_Smartphones_Data.csv')
print("Dataset successfully loaded!")

print(data.head())

Dataset successfully loaded!
   tBodyAcc-mean()-X  tBodyAcc-mean()-Y  tBodyAcc-mean()-Z  tBodyAcc-std()-X  \
0           0.288585          -0.020294          -0.132905         -0.995279   
1           0.278419          -0.016411          -0.123520         -0.998245   
2           0.279653          -0.019467          -0.113462         -0.995380   
3           0.279174          -0.026201          -0.123283         -0.996091   
4           0.276629          -0.016570          -0.115362         -0.998139   

   tBodyAcc-std()-Y  tBodyAcc-std()-Z  tBodyAcc-mad()-X  tBodyAcc-mad()-Y  \
0         -0.983111         -0.913526         -0.995112         -0.983185   
1         -0.975300         -0.960322         -0.998807         -0.974914   
2         -0.967187         -0.978944         -0.996520         -0.963668   
3         -0.983403         -0.990675         -0.997099         -0.982750   
4         -0.980817         -0.990482         -0.998321         -0.979672   

   tBodyAcc-mad()-Z  tBodyA

By looking at the dataset, do you think the floating point values need to be scaled? Just write your answer (no need to write code for this).

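Your answer: Probably not. The feature values shown above all fall between -1 and 1, and the UCI description of this dataset states that the features are normalized and bounded within [-1, 1], so they already share a common scale.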
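Whether scaling is needed can also be checked directly instead of by eye: take the minimum and maximum of every float column and see whether they share a common range. The sketch below uses a tiny hypothetical DataFrame in place of the real CSV, and the helper name `float_column_ranges` is illustrative, not part of the assignment template.

```python
import pandas as pd

def float_column_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Return the min and max of every float64 column, one row per column."""
    floats = df.select_dtypes(include="float64")
    return pd.DataFrame({"min": floats.min(), "max": floats.max()})

# Hypothetical toy frame standing in for the HAR features plus the label column
toy = pd.DataFrame({
    "f1": [-1.0, 0.2, 1.0],
    "f2": [-0.5, 0.9, 0.1],
    "Activity": ["SITTING", "WALKING", "LAYING"],
})

ranges = float_column_ranges(toy)
print(ranges)

# On the real data, value counts of the mins and maxes summarise hundreds of columns:
# if they are all close to -1 and 1, the features already share one scale.
print(ranges["min"].value_counts())
print(ranges["max"].value_counts())
```

On the notebook's data the same call would be `float_column_ranges(data)`.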

In [5]:
# Examine the data types; with this many columns, value counts give a compact summary
for column in data.select_dtypes(include=['object', 'category']).columns:
    print(f"\nValue counts for {column}:")
    print(data[column].value_counts())

# The label column is named 'Activity' (capital A); fall back to the last column otherwise
if 'Activity' in data.columns:
    y_column = 'Activity'
else:
    y_column = data.columns[-1]


Value counts for Activity:
Activity
LAYING                1407
STANDING              1374
SITTING               1286
WALKING               1226
WALKING_UPSTAIRS      1073
WALKING_DOWNSTAIRS     986
Name: count, dtype: int64
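Another way to follow the "use value counts" hint is to apply `value_counts` to the column dtypes themselves, which reduces hundreds of columns to a few lines. A minimal sketch on a hypothetical three-column frame with the same kind of layout (float features plus one string label):

```python
import pandas as pd

# Hypothetical miniature frame: float feature columns plus one string label column
toy = pd.DataFrame({
    "tBodyAcc-mean()-X": [0.28, 0.27],
    "tBodyAcc-mean()-Y": [-0.02, -0.01],
    "Activity": ["STANDING", "WALKING"],
})

# Count how many columns share each dtype
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)
```

On the full dataset, `data.dtypes.value_counts()` would show how many columns are floats and how many hold text such as the activity label.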


In [6]:
# Write code to split the dataset into X_data and y_data
X_data = data.drop(y_column, axis=1)
y_data = data[y_column]

print("Features shape:", X_data.shape)
print("Target shape:", y_data.shape)

Features shape: (7352, 562)
Target shape: (7352,)


In [7]:
#Write code to examine the breakdown of activities.
print("\nBreakdown of activities:")
activity_counts = y_data.value_counts()
print(activity_counts)


Breakdown of activities:
Activity
LAYING                1407
STANDING              1374
SITTING               1286
WALKING               1226
WALKING_UPSTAIRS      1073
WALKING_DOWNSTAIRS     986
Name: count, dtype: int64


In [8]:
#Write code to Encode the activity label as an integer. Use `LabelEncoder` to fit_transform the "Activity" column.
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y_data)

# Display the mapping between original labels and encoded integers
print("\nActivity encoding mapping:")
for i, activity in enumerate(label_encoder.classes_):
    print(f"{activity} -> {i}")

print("\nFirst 10 encoded activity labels:")
print(y_encoded[:10])


Activity encoding mapping:
LAYING -> 0
SITTING -> 1
STANDING -> 2
WALKING -> 3
WALKING_DOWNSTAIRS -> 4
WALKING_UPSTAIRS -> 5

First 10 encoded activity labels:
[2 2 2 2 2 2 2 2 2 2]
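`LabelEncoder` assigns integers in sorted (alphabetical) order of the class names, which is why LAYING maps to 0 and WALKING_UPSTAIRS to 5 above. A minimal sketch on a hypothetical list of labels shows the mapping and the round trip back to names with `inverse_transform`:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels; the real notebook encodes y_data instead
labels = ["STANDING", "SITTING", "LAYING", "WALKING", "STANDING"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)   # classes_ is sorted alphabetically
print(list(encoder.classes_))
print(list(codes))

# inverse_transform maps integer codes (e.g. model predictions) back to activity names
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Scikit-learn classifiers also accept string labels directly, which is why the models later in this notebook can be fit on `y_train` without this encoding step.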


## Question 2 (5 points)

When we train a machine learning model, we usually divide the dataset into training and testing sets.
However, if our dataset contains multiple activity classes (for example, Walking, Sitting, Standing, Laying, Walking Upstairs, Walking Downstairs), it’s important that each class appears in the same proportion in both sets.

For instance, if 20% of your entire dataset represents Sitting, you don’t want a situation where only 5% of the test data represents Sitting; this would make your test results unreliable because the distribution of activities has changed.

That’s where Stratified Sampling comes in.
Scikit-learn’s StratifiedShuffleSplit helps us split the data while maintaining the same class distribution (ratio) in both training and test sets. It randomly shuffles the data, but ensures that each class keeps the same proportion as in the original dataset.

Why Use StratifiedShuffleSplit?
*   Prevents any class from being over- or under-represented in the training or test set by chance.
*   Ensures consistent and fair evaluation across all activity classes.
*   Especially useful for multi-class classification problems like Human Activity Recognition.
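The proportion-preserving behavior described above can be checked on a small hypothetical label array. This sketch uses a 50/30/20 class mix (an assumption chosen for round numbers) and compares class proportions before and after the split:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical labels: 50% class 0, 30% class 1, 20% class 2
y = np.array([0] * 50 + [1] * 30 + [2] * 20)
X = np.zeros((len(y), 1))  # feature values do not affect how the split is made

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

def class_proportions(labels):
    """Fraction of samples in each class, ordered by class id."""
    return np.bincount(labels) / len(labels)

print("overall:", class_proportions(y))
print("train:  ", class_proportions(y[train_idx]))
print("test:   ", class_proportions(y[test_idx]))
```

All three lines should show (approximately) the same 0.5 / 0.3 / 0.2 mix, mirroring what the two `value_counts(normalize=True)` cells below show for the activity classes.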

Your task is to split your Human Activity Recognition dataset into training and testing sets. You may use any method (e.g., train_test_split), but to ensure the same ratio of activity classes in both sets, apply Scikit-learn’s StratifiedShuffleSplit.



In [9]:
from sklearn.model_selection import StratifiedShuffleSplit

# Get the split indexes
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_index, test_index = next(sss.split(X_data, y_data))

#write code here
X_train = X_data.iloc[train_index]
y_train = y_data.iloc[train_index]

X_test = X_data.iloc[test_index]
y_test = y_data.iloc[test_index]

* Regardless of the method used to split the data, you can compare the ratio of classes in the train and test splits by running the cells below (you don't need to write any code; just observe how useful StratifiedShuffleSplit is).

In [10]:
# No need to modify anything in this cell; just run it and see the outcome
y_train.value_counts(normalize=True)

Activity
LAYING                0.191411
STANDING              0.186941
SITTING               0.174893
WALKING               0.166731
WALKING_UPSTAIRS      0.145939
WALKING_DOWNSTAIRS    0.134085
Name: proportion, dtype: float64

In [11]:
# No need to modify anything in this cell; just run it and see the outcome
y_test.value_counts(normalize=True)

Activity
LAYING                0.191296
STANDING              0.186763
SITTING               0.174977
WALKING               0.166818
WALKING_UPSTAIRS      0.145966
WALKING_DOWNSTAIRS    0.134180
Name: proportion, dtype: float64

## Question 3 (10 points)

* Fit a logistic regression model without any regularization using all of the features, and perform model prediction.
* Next, use cross validation to determine the hyperparameters, fit models using L2 regularization, and perform model prediction.

In [12]:
from sklearn.linear_model import LogisticRegression

# Write code to fit a logistic regression model without any regularization using all of the features and perform prediction.

model_no = LogisticRegression(penalty=None, solver='lbfgs', max_iter=1000, random_state=42)
model_no.fit(X_train, y_train)
y_pred_no = model_no.predict(X_test)

In [13]:
from sklearn.metrics import accuracy_score
#write code to calculate accuracy score for model without regularization

print("Accuracy:", accuracy_score(y_test, y_pred_no))

Accuracy: 0.9796010879419764


In [16]:
from sklearn.linear_model import LogisticRegressionCV

#Write code to use cross validation to determine the hyperparameters, fit models using L2 regularization and perform prediction.
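model_l2 = LogisticRegressionCV(
    Cs=5,            # 5 candidate values of C on a log scale from 1e-4 to 1e4
    cv=3,            # 3-fold (stratified) cross-validation for each candidate
    penalty='l2',
    solver='lbfgs',
    max_iter=500,
    random_state=42
)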

model_l2.fit(X_train, y_train)
y_pred_l2 = model_l2.predict(X_test)

print("Best C:", model_l2.C_[0])

Best C: 1.0
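When `Cs` is an integer, `LogisticRegressionCV` searches a logarithmic grid of that many values between 1e-4 and 1e4, so `Cs=5` only tries 1e-4, 1e-2, 1, 1e2 and 1e4. The reported best C of 1.0 is therefore the best of those five points, not necessarily the best C overall. A sketch on synthetic data (`make_classification` stands in for the HAR features) that inspects the grid through the fitted `Cs_` attribute:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Small synthetic 3-class problem standing in for the HAR training data
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

clf = LogisticRegressionCV(Cs=5, cv=3, max_iter=1000)
clf.fit(X, y)

# With an integer Cs, the candidate grid equals np.logspace(-4, 4, Cs)
print("grid searched:", clf.Cs_)
print("logspace     :", np.logspace(-4, 4, 5))
print("chosen C:", np.atleast_1d(clf.C_)[0])
```

Passing an explicit list instead, for example `Cs=[0.1, 0.3, 1, 3, 10]`, would search more finely around 1.0 at the cost of more fits.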


## Question 4 (5 points)

For each model (one without regularization and one with regularization), calculate the following error metrics:

* accuracy
* precision
* recall
* fscore
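All four metrics can be derived from the confusion matrix: precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them by hand on a tiny hypothetical set of predictions and checks them against `precision_recall_fscore_support`. It also shows the difference between the macro average (plain mean over classes) and the weighted average (mean weighted by each class's support) that appear at the bottom of `classification_report`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_fscore_support

# Hypothetical true and predicted labels
y_true = ["SIT", "SIT", "SIT", "STAND", "STAND", "LAY"]
y_pred = ["SIT", "STAND", "STAND", "STAND", "SIT", "LAY"]
labels = ["LAY", "SIT", "STAND"]

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted
tp = np.diag(cm)
precision = tp / cm.sum(axis=0)   # column totals = TP + FP
recall = tp / cm.sum(axis=1)      # row totals = TP + FN
f1 = 2 * precision * recall / (precision + recall)

# Reference values from scikit-learn
p, r, f, support = precision_recall_fscore_support(y_true, y_pred, labels=labels)

macro_f1 = f1.mean()                              # every class counts equally
weighted_f1 = np.average(f1, weights=cm.sum(axis=1))  # larger classes count more

print("precision:", np.round(precision, 3))
print("recall:   ", np.round(recall, 3))
print("f1:       ", np.round(f1, 3))
print("macro F1:", macro_f1, "weighted F1:", weighted_f1)
```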

In [20]:
#Write code
from sklearn.metrics import classification_report, accuracy_score

print("Model without Regularization:")
report_no = classification_report(y_test, y_pred_no, digits=4)
accuracy_no = accuracy_score(y_test, y_pred_no)
print(report_no)
print("Accuracy:", accuracy_no)

print("\nModel with L2 Regularization (CV):")
report_l2 = classification_report(y_test, y_pred_l2, digits=4)
accuracy_l2 = accuracy_score(y_test, y_pred_l2)
print(report_l2)
print("Accuracy:", accuracy_l2)

print("Best C for L2-regularized model:", model_l2.C_[0])

Model without Regularization:
                    precision    recall  f1-score   support

            LAYING     1.0000    1.0000    1.0000       422
           SITTING     0.9481    0.9456    0.9468       386
          STANDING     0.9469    0.9515    0.9492       412
           WALKING     1.0000    1.0000    1.0000       368
WALKING_DOWNSTAIRS     0.9933    0.9966    0.9949       296
  WALKING_UPSTAIRS     0.9969    0.9907    0.9938       322

          accuracy                         0.9796      2206
         macro avg     0.9808    0.9807    0.9808      2206
      weighted avg     0.9796    0.9796    0.9796      2206

Accuracy: 0.9796010879419764

Model with L2 Regularization (CV):
                    precision    recall  f1-score   support

            LAYING     1.0000    1.0000    1.0000       422
           SITTING     0.9635    0.9585    0.9610       386
          STANDING     0.9614    0.9660    0.9637       412
           WALKING     0.9973    1.0000    0.9986       368
W

## Question 5: Confusion Matrix Construction (5 points)

You trained a model to classify six human activities.

Write code to generate the confusion matrix for your trained model on the test set.

Identify which activities are most frequently confused. Provide one possible explanation.

In [22]:
#Write code here
from sklearn.metrics import confusion_matrix

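# Pass the label order explicitly so rows (true) and columns (predicted) match label_encoder.classes_
cm = confusion_matrix(y_test, y_pred_no, labels=label_encoder.classes_)
cm_df = pd.DataFrame(cm, index=label_encoder.classes_, columns=label_encoder.classes_)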

print("Confusion Matrix:")
print(cm_df)

Confusion Matrix:
                    LAYING  SITTING  STANDING  WALKING  WALKING_DOWNSTAIRS  \
LAYING                 422        0         0        0                   0   
SITTING                  0      365        21        0                   0   
STANDING                 0       20       392        0                   0   
WALKING                  0        0         0      368                   0   
WALKING_DOWNSTAIRS       0        0         0        0                 295   
WALKING_UPSTAIRS         0        0         1        0                   2   

                    WALKING_UPSTAIRS  
LAYING                             0  
SITTING                            0  
STANDING                           0  
WALKING                            0  
WALKING_DOWNSTAIRS                 1  
WALKING_UPSTAIRS                 319  


In [23]:
# Question 5b: zero the diagonal (correct predictions) and locate the largest off-diagonal count
cm_copy = cm.copy()

np.fill_diagonal(cm_copy, 0)
most_confused_index = np.unravel_index(np.argmax(cm_copy), cm_copy.shape)
true_activity = label_encoder.classes_[most_confused_index[0]]
predicted_activity = label_encoder.classes_[most_confused_index[1]]
print(f"\nThe model most often confused '{true_activity}' with '{predicted_activity}'.")


The model most often confused 'SITTING' with 'STANDING'.
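Raw counts depend on how many test samples each class has. Row-normalizing the matrix turns each row into the fraction of that true class assigned to each predicted class, so the diagonal becomes per-class recall. A sketch with hypothetical SITTING/STANDING labels, using the `normalize` parameter of `confusion_matrix`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 10 SITTING and 10 STANDING samples
y_true = ["SITTING"] * 10 + ["STANDING"] * 10
y_pred = (["SITTING"] * 8 + ["STANDING"] * 2     # predictions for true SITTING
          + ["STANDING"] * 9 + ["SITTING"] * 1)  # predictions for true STANDING
labels = ["SITTING", "STANDING"]

# normalize="true" divides each row by its total
cm_norm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(cm_norm)
```

On the notebook's data the equivalent call would be `confusion_matrix(y_test, y_pred_no, labels=label_encoder.classes_, normalize="true")`.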


c. If a class has high precision but low recall, the model is usually correct when it predicts that class, but it misses many true instances of it. If a class has high recall but low precision, the model finds most true instances of the class, but it also mislabels samples from other activities as that class.

## Question 6: Precision, Recall, and F1-Score (10 points)

a. Write code to compute precision, recall, and F1-score for each activity class.

b. Identify one class with high precision but low recall and one with high recall but low precision.

c. Interpret what these results mean in terms of your classifier’s behavior.

In [29]:
#write code
from sklearn.metrics import precision_recall_fscore_support

precision, recall, f1, support = precision_recall_fscore_support(
    y_test, 
    y_pred_no, 
    average=None, 
    labels=label_encoder.classes_
)
metrics_df = pd.DataFrame({
    'Activity': label_encoder.classes_,
    'Precision': precision,
    'Recall': recall,
    'F1-Score': f1,
    'Support': support
})
print(metrics_df)

             Activity  Precision    Recall  F1-Score  Support
0              LAYING   1.000000  1.000000  1.000000      422
1             SITTING   0.948052  0.945596  0.946822      386
2            STANDING   0.946860  0.951456  0.949153      412
3             WALKING   1.000000  1.000000  1.000000      368
4  WALKING_DOWNSTAIRS   0.993266  0.996622  0.994941      296
5    WALKING_UPSTAIRS   0.996875  0.990683  0.993769      322


In [33]:
# Fixed thresholds; because every class scores above 0.94, no class passes either filter
high_precision_low_recall = metrics_df.loc[
    (metrics_df['Precision'] > 0.98) & (metrics_df['Recall'] < 0.95)
]

print("\nClasses with HIGH PRECISION but LOW RECALL:")
print(high_precision_low_recall)

high_recall_low_precision = metrics_df.loc[
    (metrics_df['Recall'] > 0.98) & (metrics_df['Precision'] < 0.95)
]

print("\nClasses with HIGH RECALL but LOW PRECISION:")
print(high_recall_low_precision)


Classes with HIGH PRECISION but LOW RECALL:
Empty DataFrame
Columns: [Activity, Precision, Recall, F1-Score, Support]
Index: []

Classes with HIGH RECALL but LOW PRECISION:
Empty DataFrame
Columns: [Activity, Precision, Recall, F1-Score, Support]
Index: []
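Because every class scores above 0.94, fixed cut-offs such as 0.98 and 0.95 find no matches. An alternative, relative reading of part b compares each class's precision with its own recall. Both share the same TP, so precision exceeds recall exactly when the class has fewer false positives than false negatives. A sketch using the values copied from the per-class table above:

```python
import pandas as pd

# Per-class values copied from the precision/recall table printed above
metrics = pd.DataFrame({
    "Activity": ["LAYING", "SITTING", "STANDING", "WALKING",
                 "WALKING_DOWNSTAIRS", "WALKING_UPSTAIRS"],
    "Precision": [1.000000, 0.948052, 0.946860, 1.000000, 0.993266, 0.996875],
    "Recall":    [1.000000, 0.945596, 0.951456, 1.000000, 0.996622, 0.990683],
})

# Positive gap: fewer false positives than false negatives (class is missed more often).
# Negative gap: more false positives than false negatives (class absorbs other classes).
metrics["Gap"] = metrics["Precision"] - metrics["Recall"]

prec_over_recall = metrics.loc[metrics["Gap"].idxmax(), "Activity"]
recall_over_prec = metrics.loc[metrics["Gap"].idxmin(), "Activity"]
print("Precision most above recall:", prec_over_recall)
print("Recall most above precision:", recall_over_prec)
```

This picks WALKING_UPSTAIRS (precision above recall) and STANDING (recall above precision). The STANDING result agrees with the confusion matrix, where 22 samples from other classes are predicted as STANDING while 20 true STANDING samples are missed.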


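# **Write explanation for Question 5b and 5c.**

The confusion matrix shows that the model classifies most activities very accurately. The most frequent confusion is between SITTING and STANDING: 21 SITTING samples are predicted as STANDING and 20 STANDING samples are predicted as SITTING. A likely explanation is that both are static postures in which the phone barely moves, so their accelerometer and gyroscope signals are very similar and hard for the sensors to tell apart.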