In [1]:
import pandas as pd
import statsmodels.api as sm


In [3]:
mh_df = pd.read_csv('/content/Mental Health Dataset/mental_health_dataset.csv')
mh_df.head(5)

|   | age | gender | employment_status | work_environment | mental_health_history | seeks_treatment | stress_level | sleep_hours | physical_activity_days | depression_score | anxiety_score | social_support_score | productivity_score | mental_health_risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | Male | Employed | On-site | Yes | Yes | 6 | 6.2 | 3 | 28 | 17 | 54 | 59.7 | High |
| 1 | 46 | Female | Student | On-site | No | Yes | 10 | 9.0 | 4 | 30 | 11 | 85 | 54.9 | High |
| 2 | 32 | Female | Employed | On-site | Yes | No | 7 | 7.7 | 2 | 24 | 7 | 62 | 61.3 | Medium |
| 3 | 60 | Non-binary | Self-employed | On-site | No | No | 4 | 4.5 | 4 | 6 | 0 | 95 | 97.0 | Low |
| 4 | 25 | Female | Self-employed | On-site | Yes | Yes | 3 | 5.4 | 0 | 24 | 12 | 70 | 69.0 | High |


# **Logistic Regression**

> This model estimates how the predictors relate to the likelihood (odds) of employees seeking mental health treatment.
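
For reference, a logistic regression such as the statsmodels `Logit` fit below (with a constant added for the intercept) models the log-odds of the binary outcome as a linear function of the predictors. Here $p$ stands for the probability that `seeks_treatment` equals 1 and $x_1, \dots, x_7$ for the seven predictors used in the cell:

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_7 x_7
$$

Each coefficient $\beta_j$ is therefore the change in log-odds of seeking treatment per one-unit increase in $x_j$, holding the other predictors fixed.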

In [4]:
# Encoding binary target variable (0 and 1)
mh_df['seeks_treatment'] = mh_df['seeks_treatment'].map({'No': 0, 'Yes': 1})

# Defining predictors and target variables
X = mh_df[['stress_level', 'sleep_hours', 'physical_activity_days',
        'depression_score', 'anxiety_score', 'social_support_score', 'productivity_score']]
y = mh_df['seeks_treatment']

# Adding constant for intercept
X = sm.add_constant(X)

# Fitting logistic regression model
model = sm.Logit(y, X).fit()

# Summary output
print(model.summary())


Optimization terminated successfully.
         Current function value: 0.672375
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:        seeks_treatment   No. Observations:                10000
Model:                          Logit   Df Residuals:                     9992
Method:                           MLE   Df Model:                            7
Date:                Sun, 15 Jun 2025   Pseudo R-squ.:               0.0002191
Time:                        14:37:28   Log-Likelihood:                -6723.7
converged:                       True   LL-Null:                       -6725.2
Covariance Type:            nonrobust   LLR p-value:                    0.8899
                             coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                     -0.5809      0.438     -1.327      0.184      -1.439      

## Observation


### **Model**: Logistic regression predicting *"seeks_treatment"*

***Report***

> Coefficients and p-values were examined for each predictor.
>
> A p-value less than 0.05 indicates that the predictor has a statistically significant effect on the odds of seeking treatment.


**Result Interpretation**: ***ALL Predictors***

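> None of the predictors was a statistically significant predictor of seeking treatment (all p > 0.05). In other words, the data provide no evidence that increases or decreases in the analysed predictor variables are associated with the odds of seeking treatment. The model as a whole also adds almost nothing over an intercept-only model (Pseudo R² = 0.0002, LLR p-value = 0.8899).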


## General Insight

> The model found no evidence that the analysed predictor variables (stress_level, sleep_hours, physical_activity_days, depression_score, anxiety_score, social_support_score, and productivity_score) predict the likelihood of employees seeking treatment for their mental health problems.
>
> However, non-significant main effects do not rule out a role for these variables: they may act through interactions with one another or through non-linear effects, which could be tested in more advanced models.
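
One way to test the interaction idea is the statsmodels formula API, where `a * b` expands to both main effects plus their product term `a:b`. The sketch below uses synthetic data in which the outcome depends only on the product of two predictors; pairing `stress_level` with `social_support_score` is a hypothetical choice for illustration, not a finding from this dataset. A likelihood-ratio test then compares the main-effects model with the interaction model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic data (hypothetical): value ranges loosely mimic the real columns
rng = np.random.default_rng(1)
n = 3000
df = pd.DataFrame({
    'stress_level': rng.integers(1, 11, size=n),
    'social_support_score': rng.integers(0, 101, size=n),
})
# Only the (roughly standardized) interaction drives the outcome here
z_stress = (df['stress_level'] - 5.5) / 2.9
z_support = (df['social_support_score'] - 50) / 29
log_odds = 0.7 * z_stress * z_support
df['seeks_treatment'] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Main effects only vs. main effects plus interaction
main = smf.logit('seeks_treatment ~ stress_level + social_support_score', data=df).fit(disp=0)
inter = smf.logit('seeks_treatment ~ stress_level * social_support_score', data=df).fit(disp=0)

# Likelihood-ratio test: does adding the interaction term improve fit?
lr_stat = 2 * (inter.llf - main.llf)
df_diff = inter.df_model - main.df_model
lr_pvalue = stats.chi2.sf(lr_stat, df_diff)

print(inter.params)
print(f"LR statistic = {lr_stat:.2f}, p = {lr_pvalue:.3g}")
```

Because the main-effects model is nested inside the interaction model, the LR statistic follows a chi-squared distribution with one degree of freedom (one extra parameter) under the null hypothesis of no interaction.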


# **Simple Linear Regression**

> These models examine the social and lifestyle factors associated with mental health among employees in the workplace.

In [6]:
# Defining target (outcome) and predictor variables
target_vars = ['depression_score', 'anxiety_score', 'productivity_score']
predictor_vars = ['sleep_hours', 'physical_activity_days', 'social_support_score']

# Run simple linear regression for each pair
for target in target_vars:
    for predictor in predictor_vars:
        print(f"\nSimple Linear Regression: {target} ~ {predictor}")
        X = sm.add_constant(mh_df[predictor])
        y = mh_df[target]
        model = sm.OLS(y, X).fit()
        print(model.summary())



Simple Linear Regression: depression_score ~ sleep_hours
                            OLS Regression Results                            
Dep. Variable:       depression_score   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.2576
Date:                Sun, 15 Jun 2025   Prob (F-statistic):              0.612
Time:                        14:44:06   Log-Likelihood:                -36151.
No. Observations:               10000   AIC:                         7.231e+04
Df Residuals:                    9998   BIC:                         7.232e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------

## Observation


### **Model**: OLS - Simple linear regression

**Target Variables**: *depression_score, anxiety_score, and productivity_score*

**Predictor Variables**: *sleep_hours, physical_activity_days, social_support_score*

> The results showed no statistically significant association between any predictor and any target variable.

### **Interpretation**

> The models found that sleep hours, physical activity days, and social support score were not significant predictors of mental health outcomes such as depression, anxiety, and productivity.

## General Insight

> These results suggest that, in this dataset, lifestyle factors such as sleep and physical activity, and social factors such as support, do not explain variation in mental health status among employees in the workplace.


# **Simple Linear Regression Predicting Productivity Score**

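> Following the previously noted significant correlation between depression_score and productivity_score, we built this model to quantify that relationship. A regression on observational data describes an association; on its own it cannot establish causation.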


In [8]:
# depression_score (predictor) vs productivity_score (target)

X = sm.add_constant(mh_df['depression_score'])
y = mh_df['productivity_score']
model = sm.OLS(y, X).fit()
print(model.summary())



                            OLS Regression Results                            
Dep. Variable:     productivity_score   R-squared:                       0.882
Model:                            OLS   Adj. R-squared:                  0.882
Method:                 Least Squares   F-statistic:                 7.473e+04
Date:                Sun, 15 Jun 2025   Prob (F-statistic):               0.00
Time:                        15:29:54   Log-Likelihood:                -29938.
No. Observations:               10000   AIC:                         5.988e+04
Df Residuals:                    9998   BIC:                         5.989e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
const               99.4025      0.094  

## Observation


### **Model**: Simple linear regression

**Target Variable**: ***productivity_score***

**Predictor Variable**: ***depression_score***

> Coefficient (β) = -1.4688, p-value < 0.001 (reported as 0.000)
>
> R-squared = 0.882
>
> There is a statistically significant negative association between depression_score and productivity_score.


### **Interpretation**

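> The significant negative β coefficient for depression_score predicting productivity_score (β = -1.4688, p < 0.001, R² = 0.882) indicates that as depression increases, productivity decreases: each one-point increase in depression_score is associated with an average decrease of about 1.47 points in productivity_score. The high R² shows that depression_score accounts for about 88% of the variance in productivity_score, indicating a strong linear relationship.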
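
Written out with the intercept from the regression output (const = 99.4025) and the reported slope, the fitted line is shown below. The depression score of 20 in the second line is an arbitrary illustrative value, not an observation from the dataset.

$$
\widehat{\text{productivity\_score}} = 99.4025 - 1.4688 \times \text{depression\_score}
$$

$$
\text{depression\_score} = 20: \quad 99.4025 - 1.4688 \times 20 = 99.4025 - 29.376 = 70.0265
$$

By the same arithmetic, a 10-point higher depression score corresponds to a predicted productivity score that is about 14.69 points lower.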

## General Insight

> The results suggest that depression scores explain most of the variation in productivity among employees in the workplace.
