# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Tushar Garg


# **Project Summary -**

The dataset analyzed originates from a 2014 mental health survey conducted within the tech industry. It comprises responses from 1,259 individuals across various demographic and workplace backgrounds. The dataset explores topics like family history of mental illness, workplace mental health support, anonymity, treatment-seeking behavior, and perceived consequences of discussing mental health at work.

The goal of this exploratory data analysis (EDA) was to understand behavioral trends and workplace patterns that influence how employees address mental health concerns. The UBM (Univariate, Bivariate, Multivariate) method was used to extract insights.

During preprocessing, we cleaned gender responses into standard categories (Male, Female, Other). We also converted inconsistent or non-numeric age entries to missing values and removed unrealistic ages (below 18 or above 65).

In the univariate analysis phase, we found:

- The majority of respondents were Male and aged between 25–35.

- Many respondents reported that mental health issues interfere with their work.

- Awareness of employer-provided mental health benefits was limited.

- A considerable number of employees reported a lack of trust in anonymity when seeking help.

Bivariate analysis highlighted the relationships between treatment and features like gender, age, remote work status, and company size:

- Female respondents were slightly more likely to seek treatment than males.

- Employees in larger companies were more likely to be aware of mental health benefits.

- Remote workers showed treatment-seeking patterns slightly different from non-remote employees.

Multivariate analysis included correlation heatmaps and pair plots. Categorical variables were encoded for a correlation matrix, revealing weak to moderate relationships between family history, treatment, and work structure. The pair plot visually confirmed the relationships between age, gender, and treatment behavior.

From these analyses, we identified major themes:

- Lack of mental health support is still prevalent.

- Stigma prevents open conversations about mental health.

- There is inadequate training among supervisors to handle such discussions.

- Companies often fail to communicate policies clearly.

Based on the findings, here are strategic business recommendations:

- **Improve Awareness:** Employers must regularly communicate about mental health benefits and support resources.

- **Build Trust:** Ensure anonymity and confidentiality when employees seek mental health help.

- **Supervisor Training:** Equip leaders with the skills to respond to mental health concerns empathetically.

- **Tailored Wellness Programs:** Customize initiatives based on age, gender, remote work, and family history.

- **Equal Treatment:** Align mental health policies to match those for physical health in accessibility and importance.

This analysis emphasizes that mental health is not a one-size-fits-all issue. Proactive, inclusive, and well-communicated support systems are essential to create healthier tech workplaces. Implementing these strategies can reduce stigma, improve employee well-being, and contribute positively to productivity, engagement, and retention.

This project demonstrates the power of structured EDA in uncovering actionable insights from real-world behavioral datasets.


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Mental health is a critical yet often overlooked aspect of employee well-being, especially in the fast-paced and high-stress environment of the tech industry.

Despite growing awareness, many employees continue to face stigma, lack of support, and poor access to mental health resources at the workplace. Employers often lack the data-driven insights needed to design effective, inclusive, and proactive mental health strategies.

This project aims to analyze survey data collected from tech workers across various regions to identify patterns in mental health awareness, treatment-seeking behavior, and perceived workplace support.

By understanding these patterns through exploratory data analysis (EDA), organizations can be guided toward meaningful actions that improve employee well-being, productivity, and workplace inclusiveness.

#### **Define Your Business Objective?**

The objective of this project is to uncover actionable insights that can help tech organizations better understand the mental health landscape of their workforce.

By analyzing employee survey data, we aim to identify patterns in mental health awareness, treatment-seeking behavior, workplace support, and perceptions of stigma.

These insights will guide decision-makers in implementing effective mental health policies, promoting a culture of openness, improving resource accessibility, and designing targeted interventions for high-risk groups such as remote workers, younger professionals, and employees with a family history of mental illness.

Ultimately, the goal is to empower organizations to create inclusive, supportive, and stigma-free work environments that improve employee well-being, boost productivity, and reduce attrition.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

### Dataset Loading

In [None]:
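# Load Dataset (upload survey.csv in Colab; elsewhere it is read from the working directory)
try:
    from google.colab import files
    uploaded = files.upload()
except ImportError:
    uploaded = None  # not running in Colab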

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv('survey.csv')
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(),cbar=False)
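
The null counts and the heatmap show where values are missing, but a percentage per column is often easier to act on. The following is a minimal sketch, not part of the original notebook: `missing_summary` is a hypothetical helper, and `toy` is a made-up DataFrame standing in for `df`.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy stand-in for df (made-up values, not survey results)
toy = pd.DataFrame({
    "Age": [25, 31, np.nan, 40],
    "state": [None, "CA", None, None],
    "treatment": ["Yes", "No", "Yes", "No"],
})
result = missing_summary(toy)
print(result)
```

On the real data the call would be `missing_summary(df)`; columns with a high percentage are candidates for dropping or for an explicit "Unknown" category.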

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()
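
`df.nunique()` reports how many distinct values each column has, but not what they are. As a hedged sketch, the hypothetical helper `list_levels` below collects the actual values of low-cardinality columns, which makes inconsistent spellings easy to spot. The `toy` frame contains made-up values and only stands in for `df`.

```python
import pandas as pd

def list_levels(frame: pd.DataFrame, max_unique: int = 10) -> dict:
    """Return {column: sorted distinct values} for columns with few distinct values."""
    levels = {}
    for col in frame.columns:
        if frame[col].nunique(dropna=False) <= max_unique:
            # Cast to str so NaN and mixed types sort without errors
            levels[col] = sorted(frame[col].astype(str).unique())
    return levels

# Toy stand-in for df (made-up values, not survey results)
toy = pd.DataFrame({
    "Age": [25, 31, 40, 52],
    "treatment": ["Yes", "No", "Yes", "No"],
    "Gender": ["M", "male", "Female", "M"],
})
levels = list_levels(toy, max_unique=3)
for column, values in levels.items():
    print(column, values)
```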

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
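# Write your code to make your dataset analysis ready.
# Convert Age to numeric and set unrealistic values (<18 or >65) to NaN
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
df.loc[(df['Age'] < 18) | (df['Age'] > 65), 'Age'] = np.nan
print(df['Age'].describe())  # print so the summary shows even mid-cell

def clean_gender(gender):
    """Map a free-text gender response to Male, Female, or Other."""
    gender = str(gender).strip().lower()
    # Check female-like values first: 'female' contains 'male'.
    # Short answers such as 'f' or 'm' would otherwise fall into Other.
    if gender in ('f', 'woman') or 'female' in gender:
        return 'Female'
    elif gender in ('m', 'man') or 'male' in gender:
        return 'Male'
    else:
        return 'Other'

df['Gender'] = df['Gender'].apply(clean_gender)
df['Gender'].value_counts()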

### What all manipulations have you done and insights you found?

Answer Here.

Add exception handling:

In [None]:
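# Reload only if df is missing, so the cleaning done above is not overwritten
if 'df' not in globals():
    try:
        df = pd.read_csv('survey.csv')
    except (FileNotFoundError, pd.errors.ParserError) as e:
        print("Error loading dataset:", e)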
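
Printing the error and carrying on means later cells fail with a less helpful message when the file is missing. One possible alternative, sketched under assumptions, is a small loader that fails early with a clear error. `load_survey` is a hypothetical helper, and the temporary CSV below exists only to make the example runnable without the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_survey(path="survey.csv") -> pd.DataFrame:
    """Read the survey CSV, raising a clear error instead of continuing without data."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path.resolve()}")
    try:
        return pd.read_csv(csv_path)
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        raise ValueError(f"Could not parse {csv_path}: {exc}") from exc

# Demo with a throwaway file (made-up rows, not survey data)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "survey.csv"
    demo_path.write_text("Age,Gender\n30,Male\n")
    demo_df = load_survey(demo_path)
print(demo_df.shape)
```

Such a loader would normally be called once, at the dataset loading step, before any cleaning is applied.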

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Gender Distribution (Univariate)

# Sanity check: Gender should contain only the cleaned categories at this point
print("Unique gender values:", df['Gender'].unique())

# Plot only cleaned categories
sns.countplot(x='Gender', data=df[df['Gender'].isin(['Male', 'Female', 'Other'])])
plt.title('Distribution of Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To examine gender diversity in the dataset.


##### 2. What is/are the insight(s) found from the chart?

The majority of respondents are Male, with smaller shares of Female and Other.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The skew points to a gender diversity challenge in tech. Diversity affects inclusivity and the adoption of mental health policies.


#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Age Distribution (Univariate)

# Fix: Remove extreme outliers and use only valid age values
valid_age = df['Age'].dropna()
valid_age = valid_age[(valid_age >= 18) & (valid_age <= 65)]  # filter again for good measure

sns.histplot(valid_age, kde=True, bins=25, color='teal')
plt.title('Age Distribution of Respondents')
plt.xlabel('Age')
plt.ylabel('Count')
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

To understand which age groups are most represented in the tech workplace.


##### 2. What is/are the insight(s) found from the chart?

Respondents are largely aged 25–35, so a young workforce dominates.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Mental health strategies should be tailored to younger professionals, who may have different expectations.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Treatment Count (Univariate)

sns.countplot(x='treatment', data=df)
plt.title('Have You Sought Mental Health Treatment?')
plt.show()
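
A count plot shows absolute numbers; a percentage makes a statement such as "a significant number" concrete. As a minimal sketch, the hypothetical helper `percent_share` is applied here to a made-up series standing in for `df['treatment']`.

```python
import pandas as pd

def percent_share(series: pd.Series) -> pd.Series:
    """Percentage of non-missing responses falling in each category."""
    return series.value_counts(normalize=True).mul(100).round(1)

# Toy stand-in for df['treatment'] (made-up values, not survey results)
toy_treatment = pd.Series(["Yes", "No", "Yes", "Yes", "No"], name="treatment")
share = percent_share(toy_treatment)
print(share)
```

The same helper could be applied to any categorical column, for example `percent_share(df['remote_work'])`.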


##### 1. Why did you pick the specific chart?

To understand the proportion of respondents who have sought mental health treatment.

##### 2. What is/are the insight(s) found from the chart?

A significant number of respondents have sought treatment.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This indicates a demand for mental health support in tech organizations.


#### Chart - 4

In [None]:
# Chart - 4 visualization code

# Remote Work Distribution (Univariate)

sns.countplot(x='remote_work', data=df)
plt.title('Remote Work Status')
plt.show()

##### 1. Why did you pick the specific chart?

Remote work can affect isolation and mental health, so its prevalence is worth checking.

##### 2. What is/are the insight(s) found from the chart?

A good portion of respondents work remotely.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Remote work policies need to include mental health support mechanisms.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Company Size Distribution (Univariate)

sns.countplot(y='no_employees', data=df, order=df['no_employees'].value_counts().index)
plt.title('Company Size Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

To see how respondents are distributed across company sizes.

##### 2. What is/are the insight(s) found from the chart?

The majority work in mid-to-large organizations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Company size may influence the availability of mental health resources, which may be more accessible in large organizations.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Work Interference Frequency (Univariate)

sns.countplot(x='work_interfere', data=df)
plt.title('Work Interference Due to Mental Health')
plt.show()

##### 1. Why did you pick the specific chart?

To measure how often mental health interferes with work.

##### 2. What is/are the insight(s) found from the chart?

Many respondents report that mental health interferes with their work "Sometimes" or "Often".

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This suggests that workplace productivity could improve with better support systems.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Self-Employment Status (Univariate)

sns.countplot(x='self_employed', data=df)
plt.title('Self-Employment Status')
plt.show()

##### 1. Why did you pick the specific chart?

To see how many respondents are self-employed.

##### 2. What is/are the insight(s) found from the chart?

Most respondents are not self-employed.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Self-employed workers may have different access to support, which indicates a need to extend mental health policies to freelancers as well.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Family History of Mental Illness (Univariate)

sns.countplot(x='family_history', data=df)
plt.title('Family History of Mental Illness')
plt.show()


##### 1. Why did you pick the specific chart?

Family history is a predictor of mental health conditions.


##### 2. What is/are the insight(s) found from the chart?

A substantial number of respondents have a family history of mental illness.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Preventive mental health efforts may benefit this group.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# Mental Health Consequences Perceived (Univariate)

sns.countplot(x='mental_health_consequence', data=df)
plt.title('Perceived Mental Health Consequences at Work')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To see whether respondents perceive negative consequences of discussing mental health at work.

##### 2. What is/are the insight(s) found from the chart?

Many respondents fear consequences.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This indicates that stigma remains a significant barrier, since the perception of negative consequences influences help-seeking.

#### Chart - 10

In [None]:
# Chart - 10 visualization code

# Treatment vs Gender (Bivariate)
cleaned_gender_df = df[df['Gender'].isin(['Male', 'Female', 'Other'])]
sns.countplot(x='treatment', hue='Gender', data=cleaned_gender_df)
plt.title('Treatment Seeking by Cleaned Gender')
plt.xlabel('Sought Treatment')
plt.ylabel('Count')
plt.legend(title='Gender')
plt.show()
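
Because the gender groups differ greatly in size, raw counts can hide differences in treatment rates. A row-normalized cross-tabulation compares the groups on equal footing. This is a minimal sketch on a made-up `toy` frame standing in for `df`; the numbers are illustrative, not survey results.

```python
import pandas as pd

# Toy stand-in for df (made-up values, not survey results)
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female", "Other", "Other"],
    "treatment": ["Yes", "No", "No", "No", "Yes", "No", "Yes", "Yes"],
})

# normalize="index" turns each row into shares that sum to 1 within a gender
rates = (
    pd.crosstab(toy["Gender"], toy["treatment"], normalize="index")
    .mul(100)
    .round(1)
)
print(rates)
```

On the real data, `pd.crosstab(df['Gender'], df['treatment'], normalize='index')` gives the share of each gender that sought treatment.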


##### 1. Why did you pick the specific chart?

To analyze whether treatment-seeking differs by gender.

##### 2. What is/are the insight(s) found from the chart?

All genders report seeking treatment, with a slightly higher proportion among Female respondents.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This helps tailor outreach programs with gender sensitivity.

#### Chart - 11

In [None]:
# Chart - 11 visualization code

# Age vs Treatment (Bivariate)

# Use the Age column directly so x and y come from the same filtered rows
sns.boxplot(x='treatment', y='Age', data=df[df['Age'].between(18, 65)])
plt.title('Age vs Treatment Seeking')
plt.show()
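
The box plot can be backed by a small summary table, so that statements about median age rest on numbers rather than on visual impression. The sketch below uses a made-up `toy` frame in place of `df`; the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df (made-up values, not survey results)
toy = pd.DataFrame({
    "Age": [24, 30, 36, np.nan, 28, 41, 33],
    "treatment": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No"],
})

# Missing ages are skipped automatically by count, median, and mean
age_by_treatment = (
    toy.groupby("treatment")["Age"]
    .agg(["count", "median", "mean"])
    .round(1)
)
print(age_by_treatment)
```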

##### 1. Why did you pick the specific chart?

To check whether age affects the likelihood of seeking treatment.

##### 2. What is/are the insight(s) found from the chart?

The median age of both groups is similar, with a slight skew toward younger ages among those who sought treatment.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Younger employees may benefit more from early interventions.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Remote Work vs Treatment (Bivariate)

sns.countplot(x='remote_work', hue='treatment', data=df)
plt.title('Remote Work vs Mental Health Treatment')
plt.show()

##### 1. Why did you pick the specific chart?

To see whether remote workers seek treatment differently.


##### 2. What is/are the insight(s) found from the chart?

Only a minor difference is observed.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Remote work flexibility may improve access to care, or it may hide struggles.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Company Size vs Benefits (Bivariate)

sns.countplot(y='no_employees', hue='benefits', data=df)
plt.title('Company Size vs Mental Health Benefits')
plt.show()
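
Comparing company sizes is easier when the size bands appear in size order and each band is expressed as shares rather than counts. The sketch below is an illustration under assumptions: `SIZE_ORDER` is an assumed list of band labels that should be checked against `df['no_employees'].unique()`, and `toy` is a made-up frame standing in for `df`.

```python
import pandas as pd

# Assumed band labels, smallest to largest; verify against the real column
SIZE_ORDER = ["1-5", "6-25", "26-100", "100-500", "500-1000", "More than 1000"]

# Toy stand-in for df (made-up values, not survey results)
toy = pd.DataFrame({
    "no_employees": ["More than 1000", "1-5", "26-100", "1-5", "More than 1000", "26-100"],
    "benefits": ["Yes", "No", "Yes", "No", "Yes", "Don't know"],
})

benefit_share = (
    pd.crosstab(toy["no_employees"], toy["benefits"], normalize="index")
    .reindex(SIZE_ORDER)   # rows in size order; bands absent from the data become NaN rows
    .dropna(how="all")     # drop the bands that had no respondents
    .round(2)
)
print(benefit_share)
```

The same list can be passed as `order=SIZE_ORDER` to `sns.countplot` so that the bars follow company size.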

##### 1. Why did you pick the specific chart?

To check whether larger companies provide more mental health benefits.


##### 2. What is/are the insight(s) found from the chart?

Confirmed: respondents at large firms report mental health benefits more often.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Small and medium-sized companies may need policy guidance or partnership support.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
from sklearn.preprocessing import LabelEncoder

# Select and encode categorical variables for heatmap
encoded_df = df[['Age', 'Gender', 'treatment', 'remote_work', 'family_history']].dropna()
label_cols = ['Gender', 'treatment', 'remote_work', 'family_history']

le = LabelEncoder()
for col in label_cols:
    encoded_df[col] = le.fit_transform(encoded_df[col])

corr_matrix = encoded_df.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap (Encoded Features)')
plt.show()
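
Label encoding assigns arbitrary integers to categories, so Pearson correlations on encoded columns can be hard to interpret when a feature has more than two levels. Cramér's V is a commonly used alternative for pairs of categorical variables. It is not part of the original analysis; the sketch below assumes SciPy is installed, uses a hypothetical helper `cramers_v`, and runs on a made-up `toy` frame instead of `df`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y)
    # correction=False gives the plain chi-square statistic used in the V formula
    chi2, _, _, _ = chi2_contingency(table.to_numpy(), correction=False)
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    if min_dim == 0:
        return 0.0  # a constant column carries no association
    return float(np.sqrt(chi2 / (n * min_dim)))

# Toy stand-in for df (made-up values, not survey results)
toy = pd.DataFrame({
    "family_history": ["Yes", "Yes", "No", "No"],
    "treatment": ["Yes", "Yes", "No", "No"],
    "remote_work": ["Yes", "No", "Yes", "No"],
})
v_family = cramers_v(toy["family_history"], toy["treatment"])
v_remote = cramers_v(toy["remote_work"], toy["treatment"])
print(v_family, v_remote)
```

In the toy data, family history matches treatment exactly and remote work is unrelated to it, which is why the two values sit at the extremes of the scale.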

##### 1. Why did you pick the specific chart?

To understand how the numeric and label-encoded features relate to each other.

##### 2. What is/are the insight(s) found from the chart?

Age is the only natively numeric column, so the categorical columns were label-encoded before computing correlations. The resulting matrix is simple, and it becomes more useful as more numeric features are engineered.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The heatmap provides a framework that can be extended with other variables, such as mental health scores or leave duration.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Pairplot for Age, Gender and Treatment (Multivariate)

pairplot_df = df[(df['Age'].notna()) & (df['Age'] >= 18) & (df['Age'] <= 65)]
pairplot_df = pairplot_df[pairplot_df['Gender'].isin(['Male', 'Female', 'Other'])]

sns.pairplot(
    pairplot_df[['Age', 'Gender', 'treatment']],
    hue='treatment',
    palette='Set2',
    diag_kind='kde'
)
plt.suptitle('Pairplot of Age, Gender and Treatment (Cleaned)', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

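To explore interactions between numerical and categorical features visually.

##### 2. What is/are the insight(s) found from the chart?

Because Age is the only numeric column passed to the pair plot, the figure reduces to the age density split by treatment. It allows the age distributions of the two treatment groups to be compared and hints at potential patterns among treatment, age, and gender.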


#### Chart - 16

In [None]:
# Chart - 16 visualization code
# Care Options Known (Univariate)

sns.countplot(x='care_options', data=df)
plt.title('Awareness of Mental Health Care Options')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To measure employee awareness of the care options available to them.

##### 2. What is/are the insight(s) found from the chart?

Many respondents are not aware of their options.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Education on available support must be improved.

#### Chart - 17

In [None]:
# Chart - 17 visualization code
# Anonymity Protection (Univariate)

sns.countplot(x='anonymity', data=df)
plt.title('Anonymity of Mental Health Disclosure')
plt.show()

##### 1. Why did you pick the specific chart?

Anonymous disclosure increases the likelihood of help-seeking.

##### 2. What is/are the insight(s) found from the chart?

Most respondents do not trust that anonymity is preserved.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Stronger privacy policies are needed to reduce fear.

#### Chart - 18

In [None]:
# Chart - 18 visualization code
# Wellness Programs Discussion (Univariate)

sns.countplot(x='wellness_program', data=df)
plt.title('Wellness Programs in Organizations')
plt.show()

##### 1. Why did you pick the specific chart?

Wellness programs are proactive tools, so it is worth checking how often they are discussed.

##### 2. What is/are the insight(s) found from the chart?

Few organizations have discussed them.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This is a missed opportunity to prevent crises through proactive support.

#### Chart - 19

In [None]:
# Chart - 19 visualization code
# Supervisor Support (Univariate)

sns.countplot(x='supervisor', data=df)
plt.title('Willingness to Talk to Supervisor')
plt.show()

##### 1. Why did you pick the specific chart?

Supervisor openness reflects workplace culture.

##### 2. What is/are the insight(s) found from the chart?

The response is mixed: not all respondents are comfortable talking to their supervisor.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Supervisor training on mental health sensitivity is needed.

#### Chart - 20

In [None]:
# Chart - 20 visualization code
# Leave Policy Perception (Univariate)

sns.countplot(x='leave', data=df)
plt.title('Ease of Taking Mental Health Leave')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To examine the perceived accessibility of mental health leave.

##### 2. What is/are the insight(s) found from the chart?

Many respondents are unsure or report difficulty.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Policy clarity and ease of taking leave can significantly affect recovery.


#### Chart - 21

In [None]:
# Chart - 21 visualization code
# Mental vs Physical Health Perception (Univariate)

sns.countplot(x='mental_vs_physical', data=df)
plt.title('Mental vs Physical Health Perception')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Equal perception of mental and physical health is key to workplace parity.

##### 2. What is/are the insight(s) found from the chart?

Many respondents don't believe mental and physical health are equally valued.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Companies should communicate parity clearly.

#### Chart - 22

In [None]:
# Chart - 22 visualization code
# Observed Negative Consequences (Univariate)

sns.countplot(x='obs_consequence', data=df)
plt.title('Observed Negative Outcomes for Others')
plt.show()

##### 1. Why did you pick the specific chart?

Observing stigma directed at others discourages disclosure.

##### 2. What is/are the insight(s) found from the chart?

A substantial number of respondents answered "Yes".

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Culture change and peer support initiatives are crucial.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

# **Business Objective Recommendation**

To effectively address mental health challenges in the tech industry, the client should focus on the following strategic actions:

**1. Promote a Culture of Openness**

*   A significant number of employees fear negative consequences when discussing mental health.
*   Introduce anonymous feedback tools, peer support groups, and open dialogue campaigns.



**2. Prioritize Manager & Supervisor Training**


*   Not all employees are comfortable discussing issues with their supervisors.
*   Train managers in mental health first-aid and empathetic communication.

**3. Implement & Communicate Clear Leave Policies**


*   Many employees are unaware or unsure about mental health leave.
*   Simplify, document, and communicate leave policy procedures clearly.


**4. Expand Awareness of Benefits & Care Options**


*   Employees lack awareness of mental health benefits and care options.
*   Run internal wellness campaigns and mandatory onboarding sessions on available resources.

**5. Tailor Programs Based on Demographics**


*   Younger professionals (25–35) and those with family history are more likely to seek help.
*   Design targeted programs for these high-impact groups (e.g., mentorship, early-career burnout support).

**6. Normalize Mental and Physical Health Equity**


*   Many respondents feel mental health isn’t treated equally.
*   Standardize benefits, terminology, and recovery protocols for both types of conditions.







# **Conclusion**

The analysis highlights a clear need for improved mental health support systems in tech workplaces.

**Key areas for action include:**


*   Raising awareness about mental health benefits and care options.
*   Building trust through protected anonymity and supportive leave policies.
*   Training supervisors to handle mental health conversations sensitively.
*   Ensuring parity between physical and mental health support.
*   Targeting efforts toward younger employees and those with a family history of mental illness.

Implementing these strategies can foster a more inclusive, productive, and emotionally healthy work environment.


### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***