
Import Libraries

These libraries are used for data handling, preprocessing, modeling, and evaluation.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

Load and Analyze the Dataset

This part loads the dataset and prints an overview of its structure and contents: the first rows, column types, missing-value counts, and the class distribution of the last column.

In [8]:
data_path = '/content/sleep_health_lifestyle_dataset.csv'
df = pd.read_csv(data_path)

print("Dataset Head:\n", df.head())
print("\nDataset Info:\n")
df.info()
print("\nMissing Values:\n", df.isnull().sum())
print("\nClass Distribution:\n", df.iloc[:, -1].value_counts())

Dataset Head:
    Person ID  Gender  Age     Occupation  Sleep Duration (hours)  \
0          1    Male   29   Manual Labor                     7.4   
1          2  Female   43        Retired                     4.2   
2          3    Male   44        Retired                     6.1   
3          4    Male   29  Office Worker                     8.3   
4          5    Male   67        Retired                     9.1   

   Quality of Sleep (scale: 1-10)  Physical Activity Level (minutes/day)  \
0                             7.0                                     41   
1                             4.9                                     41   
2                             6.0                                    107   
3                            10.0                                     20   
4                             9.5                                     19   

   Stress Level (scale: 1-10) BMI Category  \
0                           7        Obese   
1                          
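
The class-distribution printout is worth reading closely, because a heavily skewed target lets a model reach high accuracy by always predicting the most common label. The sketch below shows how the `value_counts` output can be turned into a majority-class baseline. It runs on a made-up series: the name `target`, the labels, and the 95/5 split are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical target column: 95 rows of one label, 5 of another.
toy_target = pd.Series(["Healthy"] * 95 + ["Disorder"] * 5, name="target")

# Relative frequency of each class (same call as in the cell above, normalized).
class_shares = toy_target.value_counts(normalize=True)
print(class_shares)

# Accuracy obtained by always predicting the most frequent class.
majority_share = class_shares.max()
print(f"Majority-class baseline accuracy: {majority_share:.2f}")
```

Any model trained later should beat this baseline to be considered useful.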

Data Preprocessing

This section handles missing values, encodes categorical features, scales numerical features, and splits the data.

In [9]:
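# Fill missing numeric values with each column's median.
df.fillna(df.median(numeric_only=True), inplace=True)
# Fill the remaining (categorical) gaps with each column's most frequent value.
df.fillna(df.mode().iloc[0], inplace=True)

# Encode every text column as integers, keeping the encoders for later decoding.
label_encoders = {}
for col in df.select_dtypes(include='object').columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

# Features are all columns except the last; the last column is the target.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Standardize the features to zero mean and unit variance.
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)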
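
Two details of the cell above are worth noting. The scaler is fitted on all rows before the split, so test-set statistics influence the scaling, and `X` still contains the `Person ID` column, which is only a row identifier. One possible alternative, sketched here on synthetic data, is to drop the identifier, split first (with `stratify` to preserve the class ratio), and let a scikit-learn pipeline fit the scaler on the training rows only. All column names and values in the sketch are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real table; every column name here is hypothetical.
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    "person_id": np.arange(1, n + 1),        # row identifier, carries no signal
    "sleep_hours": rng.normal(7.0, 1.2, n),
    "stress_level": rng.integers(1, 11, n),  # integers from 1 to 10
})
toy["target"] = (toy["stress_level"] > 7).astype(int)

# Drop the identifier so it is not treated as a feature.
X = toy.drop(columns=["person_id", "target"])
y = toy["target"]

# Split first; stratify keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on X_train only and reuses those statistics on X_test.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```

The same pipeline object can be passed to cross-validation utilities, which then refit the scaler inside each fold.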

Train Models

This part defines five classifiers with default settings, trains each one on the training split, and records weighted accuracy, precision, recall, and F1-score on the test split.

In [10]:
# Classifiers to compare, all with default hyperparameters.
models = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "SVC": SVC()
}

# Fit each model and collect its test-set metrics.
results = {}
for model_name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # 'weighted' averages the per-class scores by class support.
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    results[model_name] = {
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1
    }

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
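
The `_warn_prf` lines above are the tail of scikit-learn's `UndefinedMetricWarning`, which is raised when a class receives no predictions and its precision becomes 0/0. A minimal sketch of how this arises, using a small made-up label vector (the 8/2 split is an assumption for illustration), and of the `zero_division` parameter that makes the handling explicit:

```python
import warnings

import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import classification_report, precision_score

# Toy labels (hypothetical): 8 samples of class 0 and 2 of class 1.
y_true = np.array([0] * 8 + [1] * 2)
# Predictions that never contain class 1.
y_pred = np.zeros_like(y_true)

# Default behaviour: precision for class 1 is 0/0, so it is set to 0 and a warning is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    precision_default = precision_score(y_true, y_pred, average="weighted")
warned = any(issubclass(w.category, UndefinedMetricWarning) for w in caught)

# zero_division=0 gives the same number, states the choice explicitly, and silences the warning.
precision_explicit = precision_score(
    y_true, y_pred, average="weighted", zero_division=0
)

# The per-class report shows which label is never predicted.
print(classification_report(y_true, y_pred, zero_division=0))
```

Silencing the warning does not fix the underlying issue: a class that is never predicted usually points to class imbalance or to a model that has collapsed onto one label.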


Results

This part displays the performance metrics for each model in a readable format.

In [11]:
print("\nPerformance Metrics:\n")
for model_name, metrics in results.items():
    print(f"{model_name}:\n")
    for metric, value in metrics.items():
        print(f"  {metric}: {value:.4f}")
    print("\n")


Performance Metrics:

Logistic Regression:

  Accuracy: 0.9500
  Precision: 0.9025
  Recall: 0.9500
  F1-Score: 0.9256


Naive Bayes:

  Accuracy: 0.9500
  Precision: 0.9025
  Recall: 0.9500
  F1-Score: 0.9256


KNN:

  Accuracy: 0.9500
  Precision: 0.9025
  Recall: 0.9500
  F1-Score: 0.9256


Decision Tree:

  Accuracy: 0.8375
  Precision: 0.8965
  Recall: 0.8375
  F1-Score: 0.8660


SVC:

  Accuracy: 0.9500
  Precision: 0.9025
  Recall: 0.9500
  F1-Score: 0.9256
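
Four of the five models report identical numbers, and their precision equals accuracy squared (0.95 × 0.95 = 0.9025). That pattern is what weighted metrics look like when a model predicts the majority class for every test row. This is an inference from the printed values, not something the notebook checks directly; comparing `y_pred` against `y_test` per class would confirm it. The arithmetic can be sketched as follows, where `constant_majority_metrics` is a hypothetical helper written for this illustration:

```python
def constant_majority_metrics(p: float) -> dict:
    """Weighted metrics for a model that always predicts the majority class.

    p is the majority-class share of the test set (0 < p <= 1).
    """
    # Majority class: every prediction is this class, so precision = p and recall = 1.
    precision_major, recall_major = p, 1.0
    f1_major = 2 * precision_major * recall_major / (precision_major + recall_major)
    # Minority classes are never predicted, so each contributes 0 to every metric.
    # Weighted averaging multiplies the majority-class score by its share p.
    return {
        "Accuracy": p,
        "Precision": p * precision_major,
        "Recall": p * recall_major,
        "F1-Score": p * f1_major,
    }


metrics = constant_majority_metrics(0.95)
for name, value in metrics.items():
    print(f"  {name}: {value:.4f}")
```

With `p = 0.95` the helper reproduces the four values reported for Logistic Regression, Naive Bayes, KNN, and SVC. Under this reading, the Decision Tree is the only model that predicts minority-class rows at all, which would explain its lower accuracy.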


