# **Project Name**    - **Mental Health Survey EDA**

##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

* Performed Exploratory Data Analysis (EDA) on the *Mental Health in Tech Survey (2014)* dataset to study mental health awareness and workplace support in the tech industry.
* Analyzed survey responses related to mental health treatment, family history, employer benefits, and openness in discussing mental health at work.
* Cleaned and preprocessed the data by handling missing values, standardizing gender categories, and removing unrealistic age values to improve data quality.
* Used visualizations and statistical analysis to identify key patterns and trends in mental health prevalence among tech professionals.
* Observed that a significant number of employees have experienced mental health issues and many have sought professional treatment.
* Found that individuals with a family history of mental illness are more likely to seek mental health treatment.
* Identified that mental health conditions often interfere with work performance and productivity.
* Noted that many employees are unaware of mental health benefits provided by their organizations.
* Discovered that mental health is not always treated with the same importance as physical health in workplaces.
* Highlighted hesitation among employees to discuss mental health concerns with supervisors, indicating the presence of workplace stigma.
* Concluded that organizations need better mental health awareness programs, clear communication of support resources, and a more open and supportive work culture.


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


* Mental health issues are increasingly common among professionals in the technology industry, yet they often remain unaddressed due to stigma, lack of awareness, and insufficient workplace support.
* Many organizations do not clearly communicate mental health benefits or provide an environment where employees feel comfortable discussing mental health concerns.
* As a result, mental health conditions can negatively impact employee well-being, productivity, and overall organizational performance.
* There is a need to analyze real survey data to understand employee perceptions, treatment patterns, and the effectiveness of workplace mental health initiatives in the tech sector.


#### **Define Your Business Objective?**

* To analyze mental health trends and treatment patterns among employees in the technology industry using survey data.
* To understand how workplace factors such as company size, benefits, wellness programs, and employer support influence mental health awareness.
* To identify gaps in organizational mental health policies and communication of available support resources.
* To assess employee willingness to discuss mental health issues with coworkers, supervisors, and potential employers.
* To generate data-driven insights that can help organizations improve mental health initiatives, reduce stigma, and create a healthier work environment.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
     [ Note: Deployment Ready Code means the whole .ipynb notebook can be executed in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load the dataset from Google Drive
dataset = pd.read_csv("/content/drive/MyDrive/datasets/survey.csv")

### Dataset First View

In [None]:
# Dataset First Look
pd.set_option('display.max_columns',None)
pd.set_option('display.max_colwidth',None)
dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
dataset.shape

### Dataset Information

In [None]:
# Dataset Info
dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
dataset.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
dataset.isnull().sum()

### What did you know about your dataset?

- The dataset contains survey responses from employees in the technology sector about their mental health, demographics, and workplace environment.
- It includes 27 variables covering personal attributes (age, gender, country), employment context (company size, tech or non-tech company, remote work), and mental health-related perceptions such as family history, treatment, interference with work, employer benefits, and openness to discussing issues.
- Age is the only numeric column; most other variables are categorical survey answers. The missing-value check shows nulls in state, self_employed, work_interfere, and comments, which need to be dropped or imputed before analysis.
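Expressing the null counts as percentages makes it easier to decide which columns to drop and which to impute. A minimal sketch, where the helper name `missing_summary` and the sample rows are hypothetical rather than part of the original notebook:

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have gaps, sorted by severity
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Tiny hypothetical sample mimicking a few survey columns
sample = pd.DataFrame({
    "Age": [25, 31, 44, 29],
    "state": ["CA", np.nan, np.nan, "NY"],
    "work_interfere": ["Often", np.nan, "Never", "Sometimes"],
    "comments": [np.nan, np.nan, np.nan, "n/a"],
})
result = missing_summary(sample)
print(result)
```

On the real data, `missing_summary(dataset)` could be run right after loading; where to draw the line between dropping and imputing remains a judgment call.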

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
dataset.columns

In [None]:
# Dataset Describe
dataset.describe()

### Variables Description

1. Timestamp: Date and time the survey was submitted.
2. Age: Age of the surveyed person.
3. Gender: Gender identity of the surveyed person.
4. Country: Country of residence.
5. state: If you live in the United States, which state or territory do you live in?
6. self_employed: Are you self-employed?
7. family_history: Do you have a family history of mental illness?
8. treatment: Have you sought treatment for a mental health condition?
9. work_interfere: If you have a mental health condition, do you feel that it interferes with your work?
10. no_employees: How many employees does your company or organization have?
11. remote_work: Do you work remotely (outside of an office) at least 50% of the time?
12. tech_company: Is your employer primarily a tech company/organization?
13. benefits: Does your employer provide mental health benefits?
14. care_options: Do you know the options for mental health care your employer provides?
15. wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?
16. seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?
17. anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?
18. leave: How easy is it for you to take medical leave for a mental health condition?
19. mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?
20. phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?
21. coworkers: Would you be willing to discuss a mental health issue with your coworkers?
22. supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?
23. mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?
24. phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?
25. mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?
26. obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
27. comments: Any additional notes or comments.


### Check Unique Values for each variable.

In [None]:
# Number of unique values per column
dataset.nunique()

In [None]:
# Inspect the raw Gender entries, which contain many inconsistent spellings
dataset['Gender'].unique()

In [None]:
dataset['Country'].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

#### Categorical Data Cleaning and Standardization

In [None]:
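# Normalize Gender text: lowercase and strip surrounding whitespace.
# The result must be assigned back, otherwise entries such as 'Male' never match the lowercase lists below.
dataset['Gender'] = dataset['Gender'].str.lower().str.strip()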

In [None]:
# Define four groups of normalized (lowercase) Gender values

# 1. Male
male=['m', 'male', 'male-ish', 'maile', 'mal', 'make', 'mail', 'msle',
'man', 'cis male', 'male (cis)', 'guy (-ish) ^_^',
'cis man', 'male ', 'malr',
'ostensibly male, unsure what that really means',
'something kinda male?',
'male leaning androgynous'
]

# 2. Female
female=['f', 'female', 'female ', 'femake', 'femail',
'woman', 'cis female', 'female (cis)',
'trans-female', 'trans woman', 'female (trans)',
'cis-female/femme'
]

# 3. Others (Non-binary / Gender-diverse)
others=['non-binary', 'enby', 'genderqueer', 'agender',
'androgyne', 'fluid', 'neuter',
'queer', 'queer/she/they', 'all'
]

# 4. Unknown / Invalid
unknown=['nah', 'p', 'a little about you']

In [None]:
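def clean_gender(g):
    """Map a normalized Gender entry to Male, Female, Others, or Unknown."""
    if g in male:
        return 'Male'
    elif g in female:
        return 'Female'
    elif g in others:
        return 'Others'
    else:
        return 'Unknown'

# Apply the mapping to every row and store the result in a new column
dataset['Gender_cleaned'] = dataset['Gender'].apply(clean_gender)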
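The same grouping can also be written as a single lookup table applied with pandas' vectorized string methods and `Series.map`, which keeps the case and whitespace normalization in one place. This is a sketch of an alternative, not the notebook's method; the abbreviated lists and the names `groups`, `lookup`, and `map_gender` are illustrative:

```python
import pandas as pd

# Hypothetical, abbreviated group lists; the notebook's full lists would be used in practice
groups = {
    "Male": ["m", "male", "man", "cis male", "male (cis)"],
    "Female": ["f", "female", "woman", "cis female", "trans woman"],
    "Others": ["non-binary", "enby", "genderqueer", "agender"],
}

# Invert the groups into a single raw-value -> category lookup
lookup = {raw: label for label, values in groups.items() for raw in values}

def map_gender(series: pd.Series) -> pd.Series:
    """Normalize case and whitespace, map via the lookup, default to 'Unknown'."""
    normalized = series.astype(str).str.strip().str.lower()
    return normalized.map(lookup).fillna("Unknown")

raw = pd.Series(["Male ", "F", "Woman", "Enby", "nah", "MALE (CIS)"])
mapped = map_gender(raw)
print(mapped.tolist())
```

Values from the unknown list need no entry in `groups`, because anything unmatched already falls back to 'Unknown'.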

In [None]:
dataset['Gender_cleaned'].value_counts()

In [None]:
dataset.shape

#### Age Analysis & Outlier Removal
- Some Age values are unrealistic (e.g., below 10 or above 100), so only respondents aged 18 to 65 are kept.

In [None]:
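# Keep working-age respondents; .copy() avoids SettingWithCopyWarning on the assignments below
dataset = dataset[(dataset['Age'] >= 18) & (dataset['Age'] <= 65)].copy()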

In [None]:
sns.histplot(dataset['Age'], bins=10)
plt.xlabel("Age")
plt.ylabel("Count")
plt.title("Age Distribution of Survey Respondents")
plt.show()

In [None]:
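# Handle missing values
# Assumption: a missing self_employed answer means the respondent is not self-employed
dataset['self_employed'] = dataset['self_employed'].fillna('No')
# Keep missing work_interfere answers as their own 'Unknown' category
dataset['work_interfere'] = dataset['work_interfere'].fillna('Unknown')
# Drop free-text, US-only, and timestamp columns that are not used in the analysis
dataset = dataset.drop(columns=['comments', 'state', 'Timestamp'])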

### What all manipulations have you done and insights you found?

- Dropped the high-missing-value free-text comments column, along with state and Timestamp, to keep the analysis focused and clean.

- Filled missing self_employed values with "No" and missing work_interfere values with "Unknown", so no respondents are lost to nulls in these columns.

- Cleaned and standardized the Gender column by merging multiple inconsistent entries into common categories (Male, Female, Others, Unknown).

- Identified and removed unrealistic Age values (extreme outliers) using domain knowledge to ensure accurate analysis.

- Filtered the dataset to include only valid working-age respondents (18–65 years).
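The Age filter and missing-value rules above can be collected into one re-runnable function. A minimal sketch, assuming the column names used in this notebook; `clean_survey` and the sample rows are hypothetical:

```python
import numpy as np
import pandas as pd

def clean_survey(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Age filter and missing-value rules described above."""
    # between() is inclusive on both ends by default; .copy() avoids SettingWithCopyWarning
    out = df[df["Age"].between(18, 65)].copy()
    out["self_employed"] = out["self_employed"].fillna("No")
    out["work_interfere"] = out["work_interfere"].fillna("Unknown")
    # errors="ignore" keeps the function re-runnable if the columns are already gone
    return out.drop(columns=["comments", "state", "Timestamp"], errors="ignore")

# Hypothetical rows, including out-of-range ages
sample = pd.DataFrame({
    "Timestamp": ["t1", "t2", "t3", "t4"],
    "Age": [-1, 29, 65, 329],
    "self_employed": [np.nan, np.nan, "Yes", "No"],
    "work_interfere": ["Often", np.nan, "Never", np.nan],
    "state": ["CA", "NY", np.nan, np.nan],
    "comments": [np.nan, "ok", np.nan, np.nan],
})
cleaned = clean_survey(sample)
print(cleaned)
```

Running it on a freshly loaded DataFrame, instead of mutating `dataset` cell by cell, makes the notebook safe to execute in one go.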

## ***4. Data Visualization, Storytelling & Experimenting with Charts: Understand the Relationships Between Variables***

#### Chart - 1 : Boxplot - Age

In [None]:
# Chart - 1 visualization code
sns.boxplot(y=dataset['Age'])
plt.title("Age Boxplot")
plt.show()

##### 1. Why did you pick the specific chart?

- A boxplot summarizes the spread of Age (median and quartiles) and highlights any remaining outliers.

##### 2. What is/are the insight(s) found from the chart?

- Most respondents are roughly 25 to 35 years old, so the sample is dominated by early- and mid-career professionals.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: the plot confirms that the invalid ages were removed, which makes the later age-based charts more reliable.

#### Chart - 2 : Gender Distribution

In [None]:
dataset['Gender_cleaned'].value_counts().plot.pie(autopct='%1.1f%%')
plt.title("Gender Distribution")
plt.ylabel("")
plt.show()


##### 1. Why did you pick the specific chart?

- A pie chart shows each gender group's share of all respondents.

##### 2. What is/are the insight(s) found from the chart?

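- Male respondents make up the large majority. If the "Unknown" slice dominates instead, the Gender text was not lowercased before mapping, so standard entries such as "Male" fell through to "Unknown".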

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Negative: the sample is heavily skewed toward one gender, so the insights may not generalize to underrepresented groups.

#### Chart - 3 : Treatment Taken or Not

In [None]:
# Chart - 3 visualization code
sns.countplot(x='treatment', data=dataset)
plt.xlabel("Treatment")
plt.ylabel("Count")
plt.title("Mental Health Treatment Status")
plt.show()

##### 1. Why did you pick the specific chart?

- Simple yes/no comparison

##### 2. What is/are the insight(s) found from the chart?

- Roughly equal numbers of respondents have and have not sought treatment.

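##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: about half of the respondents have sought treatment, which shows both awareness of mental health care and how common the need for it is.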

#### Chart - 4 : Family History of Mental Illness

In [None]:
# Chart - 4 visualization code
sns.countplot(x='family_history', data=dataset,color='yellow')
plt.title("Family History of Mental Illness")
plt.xlabel("Family History")
plt.ylabel("Count")
plt.show()

##### 1. Why did you pick the specific chart?

- A count plot shows how often each answer occurs for a categorical variable.

##### 2. What is/are the insight(s) found from the chart?

- A substantial share of respondents report a family history of mental illness.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: because family history is common, employers can plan early-intervention support for employees at higher risk.

#### Chart - 5 : Family History vs Treatment

In [None]:
# Chart - 5 visualization code
sns.countplot(x='family_history', hue='treatment', data=dataset)
plt.title("Family History vs Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

- A grouped count plot compares treatment status within each family-history group (categorical vs categorical).

##### 2. What is/are the insight(s) found from the chart?

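- Respondents with a family history of mental illness are more likely to have sought treatment than those without one (an association, not proof of cause).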

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: the association can inform risk-based support policies.
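The grouped count plot shows a pattern, but not whether it could be due to chance. One way to check is a chi-square test of independence on the contingency table, sketched here with `scipy.stats.chi2_contingency`. The counts below are made up for illustration and are not the survey's real numbers; `association_test` is a hypothetical helper:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def association_test(df: pd.DataFrame, row: str, col: str):
    """Chi-square test of independence between two categorical columns."""
    table = pd.crosstab(df[row], df[col])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return table, chi2, p_value, dof

# Hypothetical responses, NOT the survey's real counts
sample = pd.DataFrame({
    "family_history": ["Yes"] * 40 + ["No"] * 60,
    "treatment": ["Yes"] * 30 + ["No"] * 10 + ["Yes"] * 20 + ["No"] * 40,
})
table, chi2, p, dof = association_test(sample, "family_history", "treatment")
print(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
```

Applied as `association_test(dataset, 'family_history', 'treatment')`, a small p-value would indicate that the two answers are associated; it would still not show that family history causes treatment-seeking.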

#### Chart - 6 : Work Interference

In [None]:
# Chart - 6 visualization code
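# Order answers from 'Never' to 'Often'; 'Unknown' holds the filled-in missing values
sns.countplot(x='work_interfere', data=dataset,
              order=['Never', 'Rarely', 'Sometimes', 'Often', 'Unknown'])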
plt.title("Mental Health Impact on Work")
plt.show()

##### 1. Why did you pick the specific chart?

- Shows how often mental health conditions interfere with work, a proxy for productivity impact.

##### 2. What is/are the insight(s) found from the chart?

- "Sometimes" is the most frequent answer, so for many respondents mental health affects work performance at least occasionally.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Negative: regular interference points to productivity loss that persists unless the workplace offers support.
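To put a number on the work-interference finding, the share of respondents answering "Sometimes" or "Often" can be computed after setting aside the "Unknown" group created by the `fillna('Unknown')` step. A sketch; `interference_share` and the sample answers are hypothetical:

```python
import pandas as pd

def interference_share(series: pd.Series, levels=("Sometimes", "Often")) -> float:
    """Share of respondents (excluding 'Unknown') whose answer is in `levels`."""
    answered = series[series != "Unknown"]
    return answered.isin(levels).mean()

# Hypothetical answers after the fillna('Unknown') step
sample = pd.Series(["Sometimes", "Often", "Never", "Rarely", "Unknown", "Sometimes"])
share = interference_share(sample)
print(round(share, 2))
```

Excluding "Unknown" is a design choice: it treats the statistic as a share of those who answered, which is different from a share of all respondents.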

#### Chart - 7 : Company Size

In [None]:
# Chart - 7 visualization code
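# Order company-size bands from smallest to largest
sns.countplot(x='no_employees', data=dataset, color='silver',
              order=['1-5', '6-25', '26-100', '100-500', '500-1000', 'More than 1000'])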
plt.title("Company Size Distribution")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

- Segments respondents by organization size.

##### 2. What is/are the insight(s) found from the chart?

- Many respondents work at mid-size firms.

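##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: support programs designed for mid-size firms reach a large part of the workforce and can be scaled for smaller or larger companies.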
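Storing `no_employees` as an ordered categorical keeps the size bands in a natural order for sorting, comparisons, and plotting (seaborn generally follows the category order of a categorical column). A sketch, assuming the band labels are spelled exactly as listed in `SIZE_ORDER`, which is a hypothetical constant along with `to_ordered_size`:

```python
import pandas as pd

# Assumed spelling of the no_employees bands, smallest to largest
SIZE_ORDER = ["1-5", "6-25", "26-100", "100-500", "500-1000", "More than 1000"]
SIZE_DTYPE = pd.CategoricalDtype(categories=SIZE_ORDER, ordered=True)

def to_ordered_size(series: pd.Series) -> pd.Series:
    """Convert company-size labels to an ordered categorical; unknown labels become NaN."""
    return series.astype(SIZE_DTYPE)

sizes = to_ordered_size(pd.Series(["More than 1000", "6-25", "1-5", "100-500"]))
print(sizes.sort_values().tolist())  # sorted by size, not alphabetically
print((sizes > "26-100").tolist())   # ordered comparison against one band
```

In the notebook, `dataset['no_employees'] = to_ordered_size(dataset['no_employees'])` would make every later size-based chart use the same order.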

#### Chart - 8 : Remote Work Status

In [None]:
# Chart - 8 visualization code
sns.countplot(x='remote_work', data=dataset,color='pink')
plt.title("Remote Work Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

- Remote-work status matters when designing workplace mental health policies.

##### 2. What is/are the insight(s) found from the chart?

- Fewer respondents work remotely than in an office.

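##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Neutral to positive: this chart alone does not show that flexible work aids mental health; it shows that most respondents are office-based, so on-site support reaches the majority.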

#### Chart - 9 : Gender vs Treatment

In [None]:
# Chart - 9 visualization code
sns.countplot(x='Gender_cleaned', hue='treatment', data=dataset,color='r')
plt.title("Gender vs Treatment")
plt.show()

##### 1. Why did you pick the specific chart?

- Compare treatment across genders

##### 2. What is/are the insight(s) found from the chart?

- Treatment-seeking varies by gender

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: differences in treatment-seeking across genders can guide more inclusive support policies.

#### Chart - 10 : Benefits vs Treatment

In [None]:
# Chart - 10 visualization code
sns.countplot(x='benefits', hue='treatment', data=dataset)
plt.title("Benefits vs Treatment")
plt.show()

##### 1. Why did you pick the specific chart?

- To test policy effectiveness

##### 2. What is/are the insight(s) found from the chart?

- Respondents whose employers provide mental health benefits show a higher share of treatment-seeking than other respondents.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: the association supports HR investment in mental health benefits.
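Because the groups in a count plot differ in size, comparing shares within each group is fairer than comparing raw bar heights. A sketch using `pd.crosstab` with `normalize="index"`; `treatment_rate_by` and the sample rows are hypothetical:

```python
import pandas as pd

def treatment_rate_by(df: pd.DataFrame, group_col: str, target: str = "treatment") -> pd.DataFrame:
    """Row-normalized crosstab: share of each target answer within each group."""
    return pd.crosstab(df[group_col], df[target], normalize="index").round(2)

# Hypothetical rows
sample = pd.DataFrame({
    "benefits": ["Yes", "Yes", "Yes", "No", "No", "Don't know", "Don't know", "No"],
    "treatment": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})
rates = treatment_rate_by(sample, "benefits")
print(rates)
```

Each row sums to 1, so `rates['Yes']` reads directly as the treatment rate per benefits answer.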

#### Chart - 11 : Supervisor vs Mental Health Discussion

In [None]:
# Chart - 11 visualization code
sns.countplot(x='supervisor', data=dataset,color='indigo')
plt.title("Comfort Discussing Mental Health with Supervisor")
plt.show()

##### 1. Why did you pick the specific chart?

- Measures how open employees are with their supervisors.

##### 2. What is/are the insight(s) found from the chart?

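- Responses are split: many employees would talk to their supervisor, but a sizeable share answer "No" or "Some of them", which signals hesitation.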

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Negative: hesitation with supervisors points to stigma, so leadership training is needed.

#### Chart - 12 : Wellness Program vs Seek Help

In [None]:
# Chart - 12 visualization code
sns.countplot(x='wellness_program', hue='seek_help', data=dataset)
plt.title("Wellness Program vs Help Seeking")
plt.show()


##### 1. Why did you pick the specific chart?

- Evaluate program impact

##### 2. What is/are the insight(s) found from the chart?

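- Respondents whose employers run wellness programs more often also report resources on how to seek help, so the two forms of support tend to go together.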

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: pairing wellness programs with clear help-seeking resources can improve employee well-being.

#### Chart - 13 : Age vs Treatment

In [None]:
# Chart - 13 visualization code
sns.violinplot(x='treatment', y='Age', data=dataset)
plt.title("Age vs Treatment")
plt.show()

##### 1. Why did you pick the specific chart?

- A violin plot shows the full shape of the Age distribution for each treatment group.

##### 2. What is/are the insight(s) found from the chart?

- Mid-age employees appear more likely to seek treatment.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

- Positive: support can be targeted at the age groups where treatment-seeking is most common.

#### Chart - 14 : Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

# Select useful columns
corr_df = dataset[['Age', 'treatment', 'family_history', 'remote_work']].copy()

# Encode categorical values
corr_df['treatment'] = corr_df['treatment'].map({'Yes': 1, 'No': 0})
corr_df['family_history'] = corr_df['family_history'].map({'Yes': 1, 'No': 0})
corr_df['remote_work'] = corr_df['remote_work'].map({'Yes': 1, 'No': 0})

# Correlation heatmap
plt.figure(figsize=(6,4))
sns.heatmap(corr_df.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap – Mental Health Factors")
plt.show()

##### 1. Why did you pick the specific chart?

- The correlation heatmap helps identify relationships between mental health factors and treatment behavior

##### 2. What is/are the insight(s) found from the chart?

- Family history shows a positive correlation with treatment.
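Because treatment, family_history, and remote_work are encoded as 0/1, each Pearson coefficient between two of them in the heatmap is a phi coefficient, and the Age entries are point-biserial correlations. A sketch that computes phi from the 2x2 counts and checks it against `Series.corr`, using hypothetical encoded columns; `phi_coefficient` is an illustrative helper:

```python
import math
import pandas as pd

def phi_coefficient(x: pd.Series, y: pd.Series) -> float:
    """Phi coefficient for two 0/1 series, computed from the 2x2 table counts."""
    n11 = ((x == 1) & (y == 1)).sum()
    n10 = ((x == 1) & (y == 0)).sum()
    n01 = ((x == 0) & (y == 1)).sum()
    n00 = ((x == 0) & (y == 0)).sum()
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical encoded columns (1 = Yes, 0 = No)
family_history = pd.Series([1, 1, 1, 0, 0, 0, 0, 1])
treatment = pd.Series([1, 1, 0, 0, 1, 0, 0, 1])
phi = phi_coefficient(family_history, treatment)
pearson = family_history.corr(treatment)  # Pearson on 0/1 data gives the same value
print(round(phi, 3), round(pearson, 3))
```

Phi values tend to be modest even for clear associations, so a coefficient that looks small in the heatmap can still reflect a meaningful difference in treatment rates.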

#### Chart - 15 : Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(dataset[['Age', 'treatment']], hue='treatment')
plt.show()

##### 1. Why did you pick the specific chart?

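- A pair plot normally shows pairwise relationships between several features; with Age as the only numeric column here, it shows the Age distribution for each treatment group.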

##### 2. What is/are the insight(s) found from the chart?

- The two Age distributions overlap heavily, so Age alone does not decide whether someone seeks treatment.

#### Chart - 16 : Age vs Work Interfere

In [None]:
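# Chart - 16 visualization code
# Count respondents for each (Age, work_interfere) pair; bubble size shows the count
bubble = dataset.groupby(['Age', 'work_interfere']).size().reset_index(name='count')
sns.scatterplot(data=bubble, x='Age', y='work_interfere', size='count', sizes=(20, 400), color='g')
plt.title("Bubble Chart: Age vs Work Interference")
plt.show()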

##### 1. Why did you pick the specific chart?

- Bubble size makes it easy to compare how many respondents share each combination of age and interference level.

##### 2. What is/are the insight(s) found from the chart?
- Work interference is reported across the whole age range, so mental health affects all ages.

#### Chart - 17 : Company Size + Benefits + Treatment

In [None]:
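# Chart - 17 visualization code
# Count plot per company size, split by treatment, with one panel per benefits answer
g = sns.catplot(x='no_employees', hue='treatment', col='benefits', data=dataset, kind='count', color='red')
# Rotate x tick labels on every panel; plt.xticks() would only rotate the last one
for ax in g.axes.flat:
    ax.tick_params(axis='x', labelrotation=45)
plt.show()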


##### 1. Why did you pick the specific chart?
- Policy comparison across company sizes

##### 2. What is/are the insight(s) found from the chart?
- Respondents from larger firms more often report having mental health benefits, suggesting that larger firms offer better support.

#### Chart - 18 : Remote Work + Work Interference + Treatment

In [None]:
# Chart - 18 visualization code
sns.catplot(x='remote_work', hue='work_interfere', col='treatment', data=dataset, kind='count')
plt.show()

##### 1. Why did you pick the specific chart?
- Work mode impact analysis

##### 2. What is/are the insight(s) found from the chart?
- Remote workers still face interference

#### Chart - 19 : Benefits + Care Options + Treatment

In [None]:
# Chart - 19 visualization code
sns.catplot(x='benefits', hue='care_options', col='treatment', data=dataset, kind='count')
plt.show()

##### 1. Why did you pick the specific chart?
- Resource awareness analysis

##### 2. What is/are the insight(s) found from the chart?
- Respondents who know their employer's care options appear more likely to have sought treatment.
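The three-variable catplot can also be summarized as a table of treatment shares for every combination of benefits and care_options, by passing a list of columns to `pd.crosstab`. A sketch with hypothetical rows; `treatment_share_multi` is an illustrative name:

```python
import pandas as pd

def treatment_share_multi(df: pd.DataFrame, groups, target="treatment", answer="Yes") -> pd.Series:
    """Share of `answer` in `target` for every combination of the group columns."""
    table = pd.crosstab([df[g] for g in groups], df[target], normalize="index")
    return table[answer].round(2)

# Hypothetical rows
sample = pd.DataFrame({
    "benefits": ["Yes", "Yes", "Yes", "Yes", "No", "No"],
    "care_options": ["Yes", "Yes", "No", "No", "No", "No"],
    "treatment": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
})
shares = treatment_share_multi(sample, ["benefits", "care_options"])
print(shares)
```

The result has a two-level index, so a specific cell can be read as `shares.loc[('Yes', 'Yes')]`; combinations with very few respondents should be read with caution.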

#### Chart - 20 : Wellness Program + Mental Health Consequence + Treatment

In [None]:
# Chart - 20 visualization code
sns.catplot(x='wellness_program', hue='mental_health_consequence', col='treatment', data=dataset, kind='count')
plt.show()

##### 1. Why did you pick the specific chart?
- Compares fear of negative consequences with the presence of wellness programs.

##### 2. What is/are the insight(s) found from the chart?
- Respondents who expect negative consequences from discussing mental health appear less likely to have sought treatment.

#### Chart - 21 : Country + Mental Health Consequence

In [None]:
top_countries = dataset['Country'].value_counts().head(10).index

plt.figure(figsize=(12,6))
sns.countplot(
    data=dataset[dataset['Country'].isin(top_countries)],
    x='Country',
    hue='mental_health_consequence'
)

plt.xticks(rotation=45, ha='right')
plt.title('Expected Consequences of Discussing Mental Health, Top 10 Countries')
plt.tight_layout()
plt.show()

#### Questions worth exploring:

#### 1. How does the frequency of mental health illness and attitudes towards mental health vary by geographic location?

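- Countries with the most respondents (e.g., the United States and the United Kingdom) show the most "Yes" and "Maybe" responses. Raw counts mainly reflect sample size, however, so proportions within each country need to be compared before concluding that awareness and reporting are higher there.

- Countries with few respondents show few "Yes" responses, which may point to underreporting or stronger stigma, although the small samples make this uncertain.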

#### 2. What are the strongest predictors of mental health illness or certain attitudes towards mental health in the workplace?

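- Within this EDA, family history shows the clearest association with treatment (Charts 5 and 14), and employer support such as benefits and known care options is also associated with more treatment-seeking (Charts 10 and 19).

- Geographic location may reflect differences in workplace culture, awareness, and openness toward mental health. Regions with higher awareness may be more likely to report mental health consequences, which suggests that workplace culture and openness could also be strong predictors; a formal model would be needed to rank these factors.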
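Raw counts per country mostly reflect how many people responded, so comparing answer proportions within each country is a fairer test of the claims above. A sketch with hypothetical rows; the helper name `consequence_share_by_country` and the `min_n` cut-off for tiny samples are assumptions, and a real threshold would need to be chosen with care:

```python
import pandas as pd

def consequence_share_by_country(df: pd.DataFrame, min_n: int = 2) -> pd.DataFrame:
    """Row-normalized answer shares per country, keeping countries with >= min_n respondents."""
    counts = df["Country"].value_counts()
    keep = counts[counts >= min_n].index
    subset = df[df["Country"].isin(keep)]
    return pd.crosstab(subset["Country"], subset["mental_health_consequence"], normalize="index")

# Hypothetical rows
sample = pd.DataFrame({
    "Country": ["United States"] * 4 + ["India"] * 2 + ["Brazil"],
    "mental_health_consequence": ["Yes", "Maybe", "No", "No", "Yes", "No", "Maybe"],
})
shares = consequence_share_by_country(sample, min_n=2)
print(shares.round(2))
```

Plotting these shares as a stacked bar chart would compare countries on an equal footing, although small samples still produce noisy proportions.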

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the insights derived from the mental health EDA, the following actions are recommended to help the client achieve their business objectives:

1. Improve awareness of mental health benefits:
- Many employees are unaware of available mental health resources. The client should clearly communicate benefits and care options through internal portals, onboarding sessions, and regular reminders.

2. Normalize mental health discussions at the workplace:
- Employees show hesitation in discussing mental health with supervisors. Conduct leadership training and awareness programs to reduce stigma and encourage open conversations.

3. Strengthen wellness programs:
- Organizations with wellness programs also tend to offer resources for seeking help. The client should introduce or enhance structured mental health wellness initiatives such as counseling sessions, workshops, and stress management programs.

4. Provide anonymous and confidential support options:
- Fear of negative consequences discourages treatment-seeking. Ensuring anonymity in mental health services will increase employee trust and participation.

5. Adopt an inclusive mental health strategy:
- Mental health issues affect employees across all age groups and roles. Programs should be inclusive rather than age- or role-specific.

6. Monitor mental health impact on productivity:
- Since mental health interferes with work performance, the client should regularly assess employee well-being and intervene early to prevent productivity loss.

# **Conclusion**

The exploratory data analysis of the Mental Health in Tech Survey highlights that mental health concerns are prevalent across the tech workforce and substantially affect employee productivity and well-being. While awareness of mental health issues exists, gaps remain in workplace support, communication of benefits, and openness in discussing mental health concerns. The analysis shows that factors such as family history and organizational support play a stronger role in treatment-seeking behavior than age alone. Overall, the findings emphasize the need for inclusive mental health policies, improved awareness programs, and a supportive work culture to reduce stigma and promote employee well-being, ultimately contributing to better organizational performance.