# CS 5 - Spring 2025 - Machine Learning Final Project Proposal

**Project Title**: <span style="color: blue">Predicting Mental Health Treatment-Seeking Behavior Using Workplace and Lifestyle Factors</span>

**Project Member(s)**: <span style="color: blue">Lena Munad (individual)</span>

## Introduction
This project explores how workplace environment, personal history, and lifestyle habits influence mental health support-seeking behavior among tech employees. Using data from a mental health survey, we aim to build machine learning models to predict whether someone has sought mental health treatment.

This binary classification problem will be tackled using three supervised learning algorithms from this course: k-Nearest Neighbors (k-NN), Naïve Bayes (GaussianNB), and Support Vector Machines (SVM). The results will help us compare performance and interpret which factors are most predictive.


## Imports

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
```


## Domain
The dataset used in this project is from Kaggle: [Mental Health in Tech Survey](https://www.kaggle.com/datasets/osmi/mental-health-in-tech-survey). It contains self-reported responses from tech workers on:

- Mental health history and treatment
- Employer support and workplace culture
- Comfort discussing mental health
- Demographics like age, gender, and work setup

The target variable is: **“Have you sought treatment for a mental health condition?”** (Yes/No). We'll clean, encode, and preprocess the data to prepare it for modeling.
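As a rough illustration of the clean-and-encode step, here is a minimal sketch on a toy table. The column names, the example values, the 18-75 age filter, and the `preprocess` helper are all assumptions made for illustration; the real survey file may need different handling.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the survey; column names and values are illustrative assumptions.
toy = pd.DataFrame({
    "Age": [29, 41, -5, 35, 52],
    "family_history": ["Yes", "No", "No", "Yes", None],
    "remote_work": ["No", "Yes", "No", "No", "Yes"],
    "treatment": ["Yes", "No", "No", "Yes", "No"],
})

def preprocess(df, cat_cols, target="treatment"):
    """Hypothetical helper: return numeric features X and a 0/1 target y."""
    # Clean: drop rows with implausible ages (the 18-75 range is an arbitrary choice).
    df = df[df["Age"].between(18, 75)].copy()
    # Clean: give missing categorical answers their own "Unknown" category.
    df[cat_cols] = df[cat_cols].fillna("Unknown")
    # Encode: LabelEncoder sorts the classes, so "No" -> 0 and "Yes" -> 1.
    y = LabelEncoder().fit_transform(df[target].to_numpy())
    # Encode: one-hot encode the categorical features; Age stays numeric.
    X = pd.get_dummies(df.drop(columns=target), columns=cat_cols, dtype=int)
    return X, y

X, y = preprocess(toy, cat_cols=["family_history", "remote_work"])
print(X.columns.tolist())
print(y)
```

One-hot encoding is used here instead of integer codes so that distance-based models such as k-NN do not read an artificial ordering into the categories.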


## ML Algorithm #1
### Algorithm 1: k-Nearest Neighbors (k-NN)

k-NN will be used as a baseline model. It classifies a data point by the majority class among its `k` closest neighbors. Because k-NN is distance-based, it is sensitive to feature scale, so features will be standardized (e.g., with `StandardScaler`) before fitting. We'll explore different `k` values to optimize performance.
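The scaling and `k` search described above could look roughly like the following sketch. It runs on synthetic data from `make_classification` as a stand-in for the encoded survey features, and the candidate `k` values and 5-fold cross-validation are assumptions, not settled choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded survey features (an assumption, not real data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Score each candidate k with cross-validation on the training split only.
cv_scores = {}
for k in (1, 3, 5, 7, 9, 11, 15):
    # Scaling lives inside the pipeline so it is re-fit on each training fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(model, X_train, y_train, cv=5).mean()

# Refit the best k on the full training split and check it on held-out data.
best_k = max(cv_scores, key=cv_scores.get)
best_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
best_model.fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation folds into the training folds.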


## ML Algorithm #2
### Algorithm 2: Naïve Bayes (GaussianNB)

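Naïve Bayes applies Bayes' theorem under the assumption that features are conditionally independent given the class. We'll use the GaussianNB classifier for this dataset after encoding categorical variables. GaussianNB also models each feature as normally distributed within each class, which is only an approximation for encoded categorical answers. It is computationally efficient, and as a high-bias, low-variance model it tends to remain stable on small datasets.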


## ML Algorithm #3
### Algorithm 3: Support Vector Machine (SVM)

SVM will be used to build a model that separates the two classes with a maximum-margin hyperplane. After scaling the features, we'll train an SVM and experiment with kernel options (such as linear and RBF) and hyperparameters (such as `C` and `gamma`) to improve prediction accuracy.
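One possible way to run the kernel and parameter experiments is a small grid search, sketched below on synthetic data. The grid values, the pipeline step names (`scale`, `svm`), and the use of `GridSearchCV` are assumptions for illustration and would likely be adjusted for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded survey features (an assumption, not real data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale first, then fit the SVM; both steps are tuned and validated together.
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# gamma only matters for the RBF kernel, so it gets its own sub-grid.
param_grid = [
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10]},
    {"svm__kernel": ["rbf"], "svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.01, 0.1]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

svm_test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"test accuracy = {svm_test_accuracy:.3f}")
```

`C` controls how strongly margin violations are penalized, while `gamma` sets how far the influence of a single training point reaches under the RBF kernel.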


## Results and Analysis
We will evaluate the performance of each model using:

- Accuracy
- Precision, Recall, and F1 Score
- Confusion Matrix

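Visualizations will be provided where useful, and models will be compared based on performance and interpretability. We'll also examine which features contributed most to predictions. Since k-NN, GaussianNB, and non-linear SVMs do not report feature importances directly, this will require a model-agnostic approach.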
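The metrics listed above can be computed with `sklearn.metrics`, as in this small sketch. The label vectors are made up for illustration and are not results from any model in this project.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Made-up labels for illustration only (1 = sought treatment, 0 = did not).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f"accuracy={acc:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(cm)
print(classification_report(y_true, y_pred, target_names=["No", "Yes"]))
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all come out to 0.75. For judging feature contributions, `sklearn.inspection.permutation_importance` is one model-agnostic option that would work with all three classifiers, though the proposal does not commit to it.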


## Conclusion
The conclusion will summarize our findings from model evaluations. We'll discuss:

- Which model performed best
- The most important features influencing mental health treatment-seeking
- Ethical considerations and real-world implications for using ML in mental health contexts
