# **Project Name**    -
Mental Health Survey EDA Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name-**   Sayali Sanjay Deshmukh

# **Project Summary -**

Mental health has become a critical concern in today’s fast-paced and high-pressure tech industry. Despite increasing awareness, many employees hesitate to seek help due to stigma, lack of organizational support, and fear of negative workplace consequences. This project focuses on analyzing mental health trends and workplace attitudes in the technology sector using the Mental Health in Tech Survey (2014) dataset.

Various univariate, bivariate, and multivariate analyses were performed using Python libraries such as Pandas, Matplotlib, and Seaborn. Visualizations including bar charts, histograms, pair plots, and correlation heatmaps were used to examine how factors such as family history, employer benefits, anonymity, and work interference influence mental health treatment decisions.

The analysis reveals that employees with a family history of mental illness and those working in organizations that provide mental health benefits and supportive policies are more likely to seek treatment. The study also highlights that mental health issues are often not treated with the same seriousness as physical health issues in workplaces, and fear of negative consequences remains a major barrier to open discussion.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Problem Statement:**
Mental health issues are increasingly common in the tech industry, yet stigma, lack of awareness, and insufficient workplace support prevent employees from seeking help. This project aims to analyze survey data to understand mental health conditions, workplace attitudes, employer support, and factors influencing treatment-seeking behavior among tech professionals.



#### **Define Your Business Objective?**

**Objectives of the Project**

1. Analyze mental health trends in the tech industry

2. Identify factors affecting treatment-seeking behavior

3. Understand employer support systems

4. Compare mental vs physical health perception

5. Create meaningful visualizations and a dashboard

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("/content/survey.csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
total_rows = len(df)
total_cols = len(df.columns)
print(f"Total rows = {total_rows}, Total columns = {total_cols}")


### Dataset Information

In [None]:
# Dataset Info
# df.info() prints its report directly and returns None, so it should not be wrapped in print()
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows_count = df.duplicated().sum()
print(f"Total count of duplicate rows (excluding first occurrences): {duplicate_rows_count}")

# keep=False flags every copy of a duplicated row, not only the repeats
all_copies = df[df.duplicated(keep=False)]
unique_duplicated_rows = len(all_copies.drop_duplicates())
print(f"Total number of distinct rows that appear more than once: {unique_duplicated_rows}")

# Show the duplicated rows themselves, if there are any
if unique_duplicated_rows > 0:
    print(all_copies)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
total_missing = df.isnull().sum().sum()
print(f"Total missing values: {total_missing}")
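A single total hides which columns the gaps are in. A minimal sketch of a per-column summary with counts and percentages; missing_summary is a hypothetical helper and the toy frame below is made up for illustration (in the notebook, call missing_summary(df)).

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing-value count and percentage, worst columns first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Made-up toy frame for illustration only
toy = pd.DataFrame({
    "Age": [25, 31, 40, 29],
    "self_employed": ["No", np.nan, "Yes", "No"],
    "comments": [np.nan, np.nan, "ok", np.nan],
})
print(missing_summary(toy))
```

Sorting by the count puts mostly-empty free-text columns first, which helps decide whether to fill a column or drop it.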

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6)) # Adjust figure size as needed
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?


**Dataset Description**

**Source:**  Mental Health in Tech Survey (2014)

**Records:** ~1200 respondents

**Type:** Categorical + Numerical

**Target Variable (optional ML):** treatment

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns.tolist())

In [None]:
# Dataset Describe
print(df.describe())

### Variables Description


**Timestamp:**	Date of survey

**Age:**	Age of respondent

**Gender:**	Gender

**Country:**	Country of respondent

**self_employed:**	Self-employed or not

**family_history:**	Family history of mental illness

**treatment:**	Sought mental health treatment

**work_interfere:**	Mental health interferes with work

**no_employees:**	Company size

**remote_work:**	Works remotely

**tech_company:**	Employer is tech company

**benefits:**	Mental health benefits

**care_options:**	Awareness of care options

**wellness_program:**	Wellness program availability

**seek_help:**	Employer provides resources on mental health and seeking help

**anonymity:**	Anonymity is protected when using mental health resources

**leave:**	Ease of medical leave

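**mental_health_consequence:**	Expects negative consequences from discussing a mental health issue with the employer

**phys_health_consequence:**	Expects negative consequences from discussing a physical health issue with the employer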

**coworkers:**	Comfortable discussing with coworkers

**supervisor:**	Comfortable discussing with supervisor

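**mental_health_interview:**	Would bring up a mental health issue in a job interview

**phys_health_interview:**	Would bring up a physical health issue in a job interview

**mental_vs_physical:**	Employer takes mental health as seriously as physical health

**obs_consequence:**	Observed negative consequences for coworkers with mental health conditions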

**comments:**	Additional notes

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_counts = df.nunique()
print("Count of unique values for each variable:")
print(unique_counts)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df.info()
df.describe(include='all')


In [None]:
# removal of duplicate rows
df.drop_duplicates(inplace=True)


In [None]:
# Remove unrealistic age values
# .copy() avoids SettingWithCopyWarning when later cells modify the filtered frame
df = df[(df['Age'] >= 18) & (df['Age'] <= 65)].copy()

In [None]:
# Standardize Gender Values

def clean_gender(gender):
    # Normalise case and surrounding whitespace (e.g. 'Male ') before matching
    gender = str(gender).strip().lower()
    if gender in ['male', 'm', 'man', 'cis male']:
        return 'Male'
    elif gender in ['female', 'f', 'woman', 'cis female']:
        return 'Female'
    else:
        return 'Other'

df['Gender'] = df['Gender'].apply(clean_gender)
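Because clean_gender sends every unlisted entry to 'Other', misspellings of 'male' or 'female' can inflate that category. A minimal sketch for auditing which raw strings land in 'Other'; values_mapped_to and toy_gender are hypothetical helpers and the sample is made up. In the notebook this check would have to run on the raw Gender column, before it is overwritten.

```python
import pandas as pd

def values_mapped_to(raw: pd.Series, mapper, target: str = "Other") -> pd.Series:
    """Count the raw strings that `mapper` sends to `target`."""
    mapped = raw.apply(mapper)
    return raw[mapped == target].value_counts()

# Hypothetical stand-in for clean_gender, for illustration only
def toy_gender(g: str) -> str:
    g = g.strip().lower()
    if g in ["male", "m", "man", "cis male"]:
        return "Male"
    if g in ["female", "f", "woman", "cis female"]:
        return "Female"
    return "Other"

raw = pd.Series(["Male", "female", "Male-ish", "M", "maile", "Male-ish"])
print(values_mapped_to(raw, toy_gender))
```

Any obvious misspelling that shows up in this audit can then be added to the matching list in clean_gender.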

In [None]:
# Replace missing categorical values with 'Unknown'
categorical_cols = df.select_dtypes(include='object').columns
df[categorical_cols] = df[categorical_cols].fillna('Unknown')

In [None]:
# Replace missing numerical values with median
# 'number' matches every numeric dtype (int and float), not only int64
numerical_cols = df.select_dtypes(include='number').columns
df[numerical_cols] = df[numerical_cols].fillna(df[numerical_cols].median())

In [None]:
# Convert Timestamp to datetime

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

In [None]:
# Age group classification
df['Age_Group'] = pd.cut(
    df['Age'],
    bins=[18, 25, 35, 45, 55, 65],
    labels=['18-25', '26-35', '36-45', '46-55', '56-65'],
    include_lowest=True  # otherwise age 18 falls outside the first bin (18, 25]
)
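pd.cut uses right-closed intervals by default, so the first bin is (18, 25] and a respondent aged exactly 18 gets no group unless include_lowest=True is passed. A quick check on a few made-up ages:

```python
import pandas as pd

ages = pd.Series([18, 25, 26, 65])
bins = [18, 25, 35, 45, 55, 65]
labels = ["18-25", "26-35", "36-45", "46-55", "56-65"]

# Default right-closed bins exclude the left edge, so 18 becomes NaN
default_groups = pd.cut(ages, bins=bins, labels=labels)
# include_lowest=True extends the first interval to cover 18
fixed_groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(default_groups.tolist())
print(fixed_groups.tolist())
```

Checking df['Age_Group'].isnull().sum() after binning is a cheap way to catch this kind of edge-case gap.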

In [None]:
# Encode target variable (for ML)
df['treatment'] = df['treatment'].map({'Yes': 1, 'No': 0})
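Series.map leaves any value missing from the dictionary as NaN, so an unexpected answer, or re-running the encoding cell on a column that already holds 1/0, silently turns entries into NaN. A minimal sketch of a stricter variant; encode_yes_no is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def encode_yes_no(series: pd.Series) -> pd.Series:
    """Map 'Yes'/'No' to 1/0 and fail loudly on anything else."""
    encoded = series.map({"Yes": 1, "No": 0})
    unexpected = series[encoded.isna()].unique()
    if len(unexpected) > 0:
        raise ValueError(f"Unexpected values: {list(unexpected)}")
    return encoded.astype(int)

print(encode_yes_no(pd.Series(["Yes", "No", "Yes"])).tolist())  # [1, 0, 1]
```

Calling it a second time on already-encoded values such as [1, 0] raises a ValueError instead of quietly producing NaN.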

In [None]:
# Final clean dataset check
print(df.isnull().sum())  # only a cell's last expression is displayed, so print this one
df.head()

### What all manipulations have you done and insights you found?

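1. Removed duplicate rows

2. Kept only ages between 18 and 65, dropping implausible values

3. Standardized free-text gender entries into Male, Female, and Other

4. Filled missing categorical values with 'Unknown' and missing numerical values with the column median

5. Converted Timestamp to datetime

6. Binned ages into five age groups (18-25 to 56-65)

7. Encoded the treatment target as 1 (Yes) / 0 (No) to prepare the data for EDA and ML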
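To see how much each cleaning step changes the dataset, the row count can be recorded after every step. A minimal sketch on a made-up frame; log_step is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def log_step(log: list, name: str, frame: pd.DataFrame) -> pd.DataFrame:
    """Record the row count after a cleaning step and pass the frame through."""
    log.append((name, len(frame)))
    return frame

# Made-up sample: one duplicate row and one implausible age
raw = pd.DataFrame({"Age": [30, 30, 45, 200], "treatment": ["Yes", "Yes", "No", "No"]})
log = []
clean = log_step(log, "raw", raw)
clean = log_step(log, "drop_duplicates", clean.drop_duplicates())
clean = log_step(log, "age 18-65", clean[clean["Age"].between(18, 65)].copy())

for name, rows in log:
    print(f"{name:<16}{rows} rows")
```

Reporting these counts alongside the list above makes it clear how many respondents each decision removed.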

## ***4. Data Visualization, Storytelling & Experimenting with Charts: Understand the Relationships Between Variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure()
sns.histplot(df['Age'], bins=30, kde=True)
plt.title("Age Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

To visualise the age distribution of respondents; a histogram with a KDE curve shows the shape and spread of a continuous variable.

##### 2. What is/are the insight(s) found from the chart?

The age distribution shows that most respondents belong to the 25–40 age group, indicating that young to mid-career professionals dominate the survey sample. Because the chart is drawn after ages were restricted to 18–65, it shows only the cleaned range; the decision to remove unrealistic ages rests on the values found in the raw Age column, not on this chart.
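The "most respondents are 25–40" reading can be made precise by computing the share of respondents in that range. A minimal sketch; share_between is a hypothetical helper and the ages are made up (in the notebook, pass df['Age']).

```python
import pandas as pd

def share_between(ages: pd.Series, low: int, high: int) -> float:
    """Fraction of respondents with low <= age <= high (inclusive)."""
    return ages.between(low, high).mean()

ages = pd.Series([22, 27, 31, 38, 44, 52])  # made-up ages for illustration
print(f"{share_between(ages, 25, 40):.0%} aged 25-40")
print(ages.quantile([0.25, 0.5, 0.75]))
```

The quartiles give a compact numeric summary to quote next to the histogram.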

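##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Since most respondents are early- to mid-career professionals, mental health initiatives should be designed primarily around this group. The age distribution alone does not point to any negative-growth risk.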

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure()
df['Gender'].value_counts().head(10).plot(kind='bar')
plt.title("Gender Distribution")
plt.show()


##### 1. Why did you pick the specific chart?

To visualise the gender distribution of respondents; a bar chart compares counts across categories.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that a majority of respondents identify as Male, followed by Female, with a smaller proportion identifying as Other. This gender imbalance in the sample likely mirrors the tech workforce and means that findings from this survey may be shaped by male-dominated participation.
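Percentages make "majority" concrete. A minimal sketch using value_counts(normalize=True) on a made-up sample; in the notebook, apply the same call to df['Gender'].

```python
import pandas as pd

genders = pd.Series(["Male", "Male", "Male", "Female", "Other"])  # made-up sample
# normalize=True returns shares instead of raw counts
shares = genders.value_counts(normalize=True).mul(100).round(1)
print(shares)
```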

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Mental health initiatives should consider inclusivity and gender diversity.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure()
sns.countplot(x='treatment', data=df)
plt.title("Mental Health Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

To visualise how many respondents have and have not sought mental health treatment.

##### 2. What is/are the insight(s) found from the chart?

The visualization shows that a substantial share of respondents have sought treatment, while a comparable share have not. Mental health issues therefore appear common, but treatment-seeking behavior is divided, possibly because of stigma or lack of awareness; the chart itself does not show the reasons.
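With treatment encoded as 1/0, the treatment rate is simply the column mean, and a confidence interval shows how precisely the survey estimates it. A minimal sketch of the Wilson score interval; wilson_interval is a hypothetical helper and the counts are made up (in the notebook, use df['treatment'].sum() and len(df)).

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

successes, n = 600, 1200  # hypothetical counts for illustration
low, high = wilson_interval(successes, n)
print(f"treatment rate {successes / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The Wilson interval is chosen over the simple normal approximation because it stays inside [0, 1] and behaves better for small groups.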

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

There is a need to encourage more employees to seek mental health treatment.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure()
sns.countplot(x='family_history', hue='treatment', data=df)
plt.title("Family History vs Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

To compare treatment-seeking between respondents with and without a family history of mental illness.

##### 2. What is/are the insight(s) found from the chart?

Respondents with a family history of mental illness are more likely to seek treatment than those without one. This suggests that awareness and prior exposure may strongly influence treatment decisions.
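Raw counts in a countplot are hard to compare when the groups differ in size; a row-normalised crosstab gives the treatment rate within each group. A minimal sketch on a made-up sample; in the notebook, pass df['family_history'] and df['treatment'].

```python
import pandas as pd

# Made-up sample; treatment is already encoded as 1/0
sample = pd.DataFrame({
    "family_history": ["Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "treatment":      [1, 1, 0, 1, 0, 0, 0],
})
# normalize="index" divides each row by its total, giving within-group shares
rates = pd.crosstab(sample["family_history"], sample["treatment"], normalize="index")
print(rates.round(2))
```

The column for 1 then reads directly as "share of this group that sought treatment".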

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Family background plays a major role in mental health awareness and help-seeking behavior.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure()
sns.countplot(x='work_interfere', hue='treatment', data=df)
plt.title("Work Interference vs Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

To compare treatment-seeking across levels of self-reported work interference.

##### 2. What is/are the insight(s) found from the chart?

Employees who reported that mental health frequently interferes with their work are more likely to seek treatment. Those who reported no interference are less likely to do so.
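The countplot shows string categories in order of appearance, which may not follow the natural Never-to-Often scale. A minimal sketch that computes the treatment rate per level in an explicit order with groupby and reindex; the ORDER list is an assumption about the survey's answer options plus the 'Unknown' fill value, and the sample is made up.

```python
import pandas as pd

# Assumed ordinal order of work_interfere answers, plus the 'Unknown' fill value
ORDER = ["Never", "Rarely", "Sometimes", "Often", "Unknown"]

sample = pd.DataFrame({  # made-up responses for illustration
    "work_interfere": ["Often", "Often", "Never", "Never", "Sometimes", "Rarely"],
    "treatment":      [1, 1, 0, 1, 1, 0],
})
# The mean of a 1/0 column is the treatment rate; reindex imposes the order
rate_by_level = sample.groupby("work_interfere")["treatment"].mean().reindex(ORDER)
print(rate_by_level)
```

The same ORDER list can be passed to sns.countplot through its order parameter so the chart reads left to right by severity.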

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The severity of mental health's impact on work is closely associated with treatment-seeking behavior.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure()
sns.countplot(x='benefits', hue='treatment', data=df)
plt.title("Mental Health Benefits vs Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

To compare treatment-seeking between respondents with and without employer-provided mental health benefits.

##### 2. What is/are the insight(s) found from the chart?

Respondents working in organizations that provide mental health benefits show a higher likelihood of having sought treatment, while employees without benefits are less likely to have done so.
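The chart suggests an association between benefits and treatment, but counts alone do not say how strong it is. A minimal sketch computing Pearson's chi-square statistic and Cramér's V (0 means no association, 1 a perfect one) directly from the contingency table; chi_square_and_cramers_v is a hypothetical helper and the sample is made up. For a p-value, scipy.stats.chi2_contingency is the usual tool; it applies Yates' continuity correction to 2×2 tables by default, so its statistic will differ from this uncorrected one on such tables.

```python
import numpy as np
import pandas as pd

def chi_square_and_cramers_v(x: pd.Series, y: pd.Series) -> tuple:
    """Uncorrected Pearson chi-square statistic and Cramér's V for two categorical series."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n
    expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0, keepdims=True) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return chi2, np.sqrt(chi2 / (n * k))

# Made-up sample; in the notebook pass df['benefits'] and df['treatment']
benefits = pd.Series(["Yes"] * 4 + ["No"] * 4)
treatment = pd.Series([1, 1, 1, 0, 1, 0, 0, 0])
chi2, v = chi_square_and_cramers_v(benefits, treatment)
print(f"chi2 = {chi2:.2f}, Cramér's V = {v:.2f}")
```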

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Employer-provided benefits appear to encourage mental health treatment.


#### Chart - 7

In [None]:
# Chart - 7 visualization code
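# pairplot
from sklearn.preprocessing import LabelEncoder

# Work on an explicit copy so the encoding does not modify df
encoded_df = df[['Age', 'treatment', 'family_history', 'benefits']].copy()

# Age is already numeric and treatment is already 1/0, so encode only the text columns
# (LabelEncoder assigns integer codes in alphabetical order of the categories)
le = LabelEncoder()
for col in ['family_history', 'benefits']:
    encoded_df[col] = le.fit_transform(encoded_df[col])

sns.pairplot(encoded_df, hue='treatment')
plt.show()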

##### 1. Why did you pick the specific chart?

To visualise pairwise relationships between age, family history, benefits, and treatment, with points coloured by treatment status.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows differences between the treatment groups for family history and benefits. Age alone does not strongly differentiate treatment behavior, but considering the factors together gives a fuller picture.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Mental health treatment is influenced by multiple interacting factors, not age alone.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10,6))
sns.heatmap(encoded_df.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()

##### 1. Why did you pick the specific chart?

To show the pairwise (Pearson) correlations between the encoded variables in a single view.

##### 2. What is/are the insight(s) found from the chart?

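The heatmap shows that treatment correlates most strongly with family history and more moderately with benefits. Age shows only a weak correlation, indicating it is not a major deciding factor. Because benefits is label-encoded with integer codes in alphabetical order, its coefficient depends on that arbitrary ordering and should be read with caution.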
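LabelEncoder codes benefits alphabetically ('Don't know' = 0, 'No' = 1, 'Yes' = 2), which imposes an order the answers do not have. A minimal alternative sketch, assuming one-hot encoding is acceptable here: build one 0/1 indicator per category with pd.get_dummies and correlate each with treatment. Pearson correlation between two 0/1 columns is the phi coefficient. The sample is made up; in the notebook, use df['benefits'] and df['treatment'].

```python
import pandas as pd

# Made-up sample for illustration
sample = pd.DataFrame({
    "benefits":  ["Yes", "Yes", "No", "No", "Don't know", "Don't know"],
    "treatment": [1, 1, 0, 0, 1, 0],
})
# One indicator column per category instead of arbitrary integer codes
dummies = pd.get_dummies(sample["benefits"], prefix="benefits", dtype=int)
# Correlation of each indicator with the 1/0 target (phi coefficient)
phi = dummies.corrwith(sample["treatment"])
print(phi.round(2))
```

Each coefficient then answers a clear question, such as "does having benefits go with having sought treatment?", independent of how the categories happen to sort.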

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Psychological and workplace support factors influence treatment more than demographic factors.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
**1. Introduce Comprehensive Mental Health Benefits**

Organizations should provide clear and accessible mental health benefits, including counseling services and therapy coverage. The analysis shows that employees with access to benefits are more likely to seek treatment, which can lead to better mental health and improved workplace performance.

**2. Ensure Anonymity and Confidentiality**

Fear of negative consequences is a major barrier to seeking help. Employers should implement anonymous mental health support systems and clearly communicate confidentiality policies to employees to build trust.

**3. Promote Mental Health Awareness Programs**

Regular workshops, seminars, and wellness programs should be conducted to educate employees about mental health issues. Awareness reduces stigma and encourages early intervention.

**4. Train Managers and Supervisors**

Supervisors play a critical role in employee well-being. Training managers to recognize early signs of mental health issues and respond empathetically can foster a supportive work environment.

**5. Normalize Mental Health Discussions**

Organizations should treat mental health on par with physical health by openly discussing it during wellness initiatives and internal communications. This helps remove stigma and encourages open dialogue.

**6. Implement Flexible Work Policies**

Providing options such as remote work, flexible hours, and mental health leave can reduce work-related stress and improve employee satisfaction and productivity.

**7. Use Data-Driven Monitoring**

Organizations should regularly collect and analyze employee wellness data to identify trends, measure the effectiveness of mental health initiatives, and continuously improve policies.


# **Conclusion**

This analysis highlights the need for better mental health policies, anonymity, and awareness programs in tech workplaces. Respondents whose employers provide mental health benefits were more likely to have sought treatment, suggesting that active employer support encourages treatment-seeking behavior.