# Introduction

Mental health in the workplace is an increasingly important issue, especially in the technology industry where job demands are high and stigma often discourages employees from speaking openly. Many companies provide supports such as mental health resources, employer-led discussions, and medical coverage, but the effectiveness of these supports in improving employees’ comfort in discussing mental health is not always clear.

In this project, we analyze survey data from technology professionals to examine whether workplace supports influence employees’ comfort in discussing mental health. We focus on three main supports (workplace resources, employer discussion, and medical coverage) along with an engineered feature, combined support, which captures whether an employee has at least two supports available. We also consider demographic effects of gender and age.  
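As a minimal sketch of the combined support rule described above, the snippet below flags a respondent when at least two of the three supports are present. It assumes each support is coded as a 0/1 indicator. The function name, the field names, and the example respondents are hypothetical illustrations, not the survey's actual columns or data.

```python
def combined_support(resources, employer_discussion, medical_coverage):
    """Return 1 when at least two of the three supports are available, else 0.

    Assumes each argument is a 0/1 (or False/True) indicator.
    """
    available = sum(
        bool(flag) for flag in (resources, employer_discussion, medical_coverage)
    )
    return int(available >= 2)


# Hypothetical respondents, invented for illustration (not survey data).
respondents = [
    {"resources": 1, "employer_discussion": 1, "medical_coverage": 0},
    {"resources": 1, "employer_discussion": 0, "medical_coverage": 0},
    {"resources": 1, "employer_discussion": 1, "medical_coverage": 1},
    {"resources": 0, "employer_discussion": 0, "medical_coverage": 0},
]

flags = [combined_support(**row) for row in respondents]
print(flags)  # [1, 0, 1, 0]
```

On a DataFrame, the same rule would amount to summing the three indicator columns row-wise and comparing the total against 2.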

Our approach combines exploratory data analysis, hypothesis testing, and predictive modeling to provide both statistical evidence and machine learning insights into the role of workplace supports.

# Hypotheses

**Null Hypothesis (H₀):** Workplace supports (resources, employer discussion, medical coverage, and combined support) have no effect on employees’ comfort in discussing mental health.  

**Alternative Hypothesis (H₁):** Employees with workplace supports (resources, employer discussion, medical coverage, or multiple combined supports) report higher comfort in discussing mental health than those without supports.

We test these hypotheses using both traditional statistical methods (t-tests, confidence intervals) and predictive modeling (logistic regression, random forest, gradient boosting).
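To make the statistical side concrete, the sketch below compares mean comfort between a group with a support and a group without it. It assumes Welch's unequal-variance form of the independent-samples t-test, which may differ from what `ModelAnalysis.ttest_table()` does internally. The confidence interval uses a normal critical value, so it is only a large-sample approximation to the t-based interval. The function name `welch_summary` and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_summary(with_support, without_support, confidence=0.95):
    """Welch's t statistic, Welch-Satterthwaite df, and an approximate CI."""
    n1, n2 = len(with_support), len(without_support)
    diff = mean(with_support) - mean(without_support)
    # Squared standard error of each group mean (sample variance / n).
    se1_sq = variance(with_support) / n1
    se2_sq = variance(without_support) / n2
    se = sqrt(se1_sq + se2_sq)
    t_stat = diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))
    # Normal critical value: approximates the t quantile when df is large.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"diff": diff, "t": t_stat, "df": df, "ci": (diff - z * se, diff + z * se)}


# Toy comfort scores, invented for illustration only (not survey data).
supported = [7, 8, 6, 9, 7, 8]
unsupported = [5, 6, 4, 7, 5, 6]

result = welch_summary(supported, unsupported)
print(result)
```

For real analysis, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the same statistic with an exact t-based p-value. The hand-rolled version is shown only to expose the arithmetic.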

In [None]:
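# Library aliases (pd, np, sns, plt, pt) are imported via the project's DataProcessing module
from DataProcessing import pd, np, sns, plt, pt
from DataProcessing import DataProcessor
from DescriptiveAnalysis import DescriptiveAnalisis
from ModelAnalysis import ModelAnalysis
# Not called directly below; the split is done through ma.train_test_split()
from sklearn.model_selection import train_test_split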

# Data Cleanup and Processing
dp = DataProcessor("data.csv")
dp.load_datafile()
dp.keep_relevant_columns()
dp.filter_tech()
df = dp.preprocess_data()

print(f"Rows after filtering and cleaning data: {len(df):,}")

In [None]:
# Exploratory Data Analysis
da = DescriptiveAnalisis(df)
desc_table = da.descriptive_stats()
display(desc_table.style.set_caption("Table X. Descriptive Statistics")
        .format(precision=3))
da.descriptive_visualization()

In [None]:
# Model Analysis: T-test
ma = ModelAnalysis(df)
ttable = ma.ttest_table()
display(ttable.style.set_caption("Table Y. Independent Samples t-Tests for Mental Health Sharing (mh_share)"))

In [None]:
# Model Analysis: Logistic Regression
split = ma.train_test_split()
X_train, X_test, y_train, y_test = split.X_train, split.X_test, split.y_train, split.y_test
feature_names = list(X_train.columns)

# sklearn logistic regression: coefficients and odds ratios
log_imp = ma.fit_improved_logistic(X_train, y_train)
coef_or = ma.coef_table(log_imp, feature_names)
display(coef_or.style.set_caption("Table Z. Logistic Regression Coefficients and Odds Ratios (Sklearn)"))

# Optional: statsmodels table with standard errors, p-values, and confidence intervals
sm_table, sm_result = ma.statsmodels_logit(X_train, y_train)
display(sm_table.style.set_caption("Appendix Table. Statsmodels Logit Estimates (Train Set)"))

In [None]:
# Model Analysis: Comparing Models Used
perf = ma.compare_models(X_train, X_test, y_train, y_test)
display(perf.style.set_caption("Model Comparison Summary — Test Set")
        .format(precision=3))