# **Project Name**    - Mental Health Analysis in the Tech Industry



##### **Project Type**    - EDA-Mental Health Analysis in the Tech Industry
##### **Contribution**    - Individual


# **Project Summary -**

Mental health has become a critical concern in modern workplaces, especially within the technology industry, where high workloads, long working hours, and constant performance pressure are common. This project focuses on performing a comprehensive Exploratory Data Analysis (EDA) on a mental health survey dataset to understand the factors influencing mental health awareness and treatment-seeking behavior among tech professionals.

The dataset used in this project is survey-based and consists of 1259 responses with 27 variables. Each record represents an individual working in or related to the technology sector. The dataset includes demographic details such as age, gender, and location, along with workplace-related variables like company size, remote work availability, mental health benefits, anonymity, supervisor support, and the impact of mental health on work performance. The key target variable in the dataset is treatment, which indicates whether a respondent has sought mental health treatment.

The first phase of the project involved data understanding and data wrangling. Columns with limited analytical value or excessive missing data, such as timestamps and free-text comments, were removed. Missing values in important columns were handled using logical placeholders to avoid data loss. Inconsistent categorical values, especially in the gender variable, were standardized to improve data quality. Unrealistic age values were filtered out to ensure the dataset reflected a valid working population. These steps made the dataset clean, consistent, and ready for analysis.

The second phase focused on visual analysis following the UBM framework—Univariate, Bivariate, and Multivariate analysis.
In univariate analysis, individual variables were examined to understand their distributions. Charts such as age distribution, gender distribution, treatment frequency, work interference levels, and availability of mental health benefits provided an overview of the dataset. These visualizations revealed that mental health treatment is almost evenly divided between those who have sought help and those who have not, highlighting ongoing stigma or hesitation around mental health care.

Bivariate analysis explored relationships between two variables at a time. Numerical–categorical analysis, such as age versus treatment and age versus work interference, helped understand how mental health impact varies across age groups. Categorical–categorical analysis, including family history versus treatment and benefits versus treatment, revealed strong associations between workplace support and treatment-seeking behavior. Employees with a family history of mental illness or access to mental health benefits were more likely to seek treatment.

In the multivariate analysis, multiple variables were analyzed simultaneously using pair plots and correlation heatmaps. These analyses showed that mental health outcomes are influenced by a combination of personal, workplace, and organizational factors rather than a single variable. Work interference, mental health benefits, and family history emerged as influential factors when analyzed together.

The project concludes that workplace environment and organizational support play a crucial role in mental health treatment decisions. While demographic factors like age and gender provide context, policies such as mental health benefits, anonymity, and supportive leadership have a stronger impact on employee well-being. The insights gained from this analysis can help organizations design better mental health policies, promote awareness, and create a more supportive work culture.

Overall, this project demonstrates how structured data analysis and visualization can be used to uncover meaningful insights from real-world data, supporting informed decision-making in the area of employee mental health.

# **GitHub Link -**

https://github.com/Vivek9017/Mental-Health-Analysis-in-the-Tech-Industry

# **Problem Statement**


**Mental health issues are common in the tech industry, yet many employees hesitate to seek treatment due to workplace stigma and lack of support. This project aims to analyze mental health survey data to understand how demographic and workplace factors influence treatment-seeking behavior. The goal is to provide data-driven insights that can help organizations improve mental health policies and employee well-being.**

#### **Define Your Business Objective?**

The primary business objective of this project is to leverage data-driven analysis to understand mental health trends and treatment-seeking behavior among employees in the technology industry. By analyzing survey data related to demographic characteristics, workplace environment, and organizational support systems, this project aims to identify the key factors that influence employees’ mental health and their willingness to seek professional treatment.

Specifically, the project seeks to evaluate the impact of workplace policies such as mental health benefits, wellness programs, anonymity, remote work options, and supervisor support on employee mental well-being. Understanding these relationships enables organizations to assess whether their existing mental health initiatives are effective or require improvement. Additionally, the project aims to highlight how work-related stress and mental health interference affect productivity and employee engagement.

From a business perspective, poor mental health can lead to increased absenteeism, reduced performance, higher employee turnover, and long-term operational costs. This analysis helps organizations recognize early warning signs and areas of risk, allowing them to proactively design supportive policies that improve employee satisfaction and retention.

Ultimately, the objective is to provide actionable insights that help organizations create healthier, more inclusive, and sustainable work environments. These insights can support leadership and human resource teams in making informed decisions that balance employee well-being with organizational productivity, thereby creating long-term business value.


# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
url = "https://raw.githubusercontent.com/Vivek9017/Mental-Health-Analysis-in-the-Tech-Industry/main/survey.csv"
dataset = pd.read_csv(url)


### Dataset First View

In [None]:
dataset.head(5)

### Dataset Rows & Columns count

In [None]:
dataset.shape

### Dataset Information

In [None]:
dataset.info()

#### Duplicate Values

In [None]:
dataset.duplicated().sum()

#### Missing Values/Null Values

In [None]:
dataset.isnull().sum()

In [None]:
sns.heatmap(dataset.isnull(), cbar=False)
plt.title("Missing Values Heatmap")
plt.show()

### What did you know about your dataset?

My dataset is a survey-based dataset focused on mental health in the tech industry. It contains 1259 records and 27 columns, where each row represents one tech employee's response. The data includes demographic details, workplace conditions, and mental-health-related factors such as work interference, company support, and treatment-seeking behavior. The key target variable is treatment, which shows whether a person has sought mental health treatment. Some columns have missing values, especially comments and state, but important fields like Age and treatment are complete. Overall, the dataset is suitable for meaningful analysis and prediction.
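The missing-value observation above can be put into numbers by reporting the share of missing entries per column. This is a minimal sketch on a small made-up frame (`toy` and its values are illustrative assumptions, not the survey data); the same two lines should work on `dataset`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (values are made up for illustration).
toy = pd.DataFrame({
    "Age": [29, 35, 41, 23],
    "state": ["CA", np.nan, np.nan, "NY"],
    "comments": [np.nan, np.nan, np.nan, "free text"],
    "treatment": ["Yes", "No", "No", "Yes"],
})

# isnull().mean() gives the fraction of missing entries per column.
missing_pct = (toy.isnull().mean() * 100).round(1)

# Keep only the columns that actually have gaps, worst first.
missing_report = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_report)
```

A percentage view makes it easier to decide which columns to drop and which to fill than raw counts alone.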

## ***2. Understanding Your Variables***

In [None]:
dataset.columns

In [None]:
dataset.describe()

### Variables Description

- Timestamp: Records the date and time when the survey response was submitted.
- Age: Represents the age of the respondent. It helps in analyzing mental health patterns across different age groups.
- Gender: Indicates the gender identity of the respondent, useful for understanding gender-based mental health differences.
- Country: Shows the country where the respondent works, helping in identifying geographical trends.
- state: Specifies the state of the respondent (mainly applicable to U.S. participants).
- self_employed: Indicates whether the respondent is self-employed or works for an organization.
- family_history: Shows whether the respondent has a family history of mental health issues.
- treatment: Indicates whether the respondent has sought mental health treatment. This is the main target variable of the dataset.
- work_interfere: Describes how often mental health issues interfere with the respondent’s work.
- no_employees: Represents the size of the company in terms of number of employees.
- remote_work: Indicates whether the respondent works remotely or not.
- tech_company: Shows whether the respondent works in a technology-focused company.
- benefits: Indicates whether the employer provides mental health benefits.
- care_options: Shows whether the respondent is aware of mental health care options provided by the employer.
- wellness_program: Indicates whether the company offers mental wellness programs.
- seek_help: Describes how easy it is for the respondent to seek mental health help at work.
- anonymity: Indicates whether employee privacy is protected when seeking mental health treatment.
- leave: Shows how easy it is for employees to take medical leave for mental health reasons.
- mental_health_consequence: Indicates whether discussing mental health issues may have negative consequences at work.
- phys_health_consequence: Indicates whether discussing physical health issues may have negative consequences at work.
- coworkers: Describes the comfort level of discussing mental health issues with coworkers.
- supervisor: Describes the comfort level of discussing mental health issues with a supervisor.
- mental_health_interview: Indicates whether the respondent would discuss mental health issues in a job interview.
- phys_health_interview: Indicates whether the respondent would discuss physical health issues in a job interview.
- mental_vs_physical: Compares how seriously mental health is treated compared to physical health at the workplace.
- obs_consequence: Indicates whether the respondent has observed negative consequences for others who discussed mental health issues.
- comments: Contains optional open-ended feedback from respondents. Many values are missing.

### Check Unique Values for each variable.

In [None]:
for column in dataset.columns:
    print(f"\nColumn Name: {column}")
    print(f"Number of Unique Values: {dataset[column].nunique()}")


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
dataset_clean = dataset.copy()

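# Drop columns with little analytical value or mostly missing data.
dataset_clean.drop(columns=['comments', 'Timestamp'], inplace=True)

# Assign the filled column back instead of calling fillna(inplace=True) on a
# column selection; that chained form is deprecated in recent pandas versions.
dataset_clean['state'] = dataset_clean['state'].fillna('Not Available')
dataset_clean['work_interfere'] = dataset_clean['work_interfere'].fillna('Unknown')
dataset_clean['self_employed'] = dataset_clean['self_employed'].fillna('Unknown')

# Trim whitespace and lower-case first so that 'Male ' and 'M' both match below.
dataset_clean['Gender'] = dataset_clean['Gender'].str.strip().str.lower()

dataset_clean['Gender'] = dataset_clean['Gender'].replace({
    'male': 'Male',
    'm': 'Male',
    'female': 'Female',
    'f': 'Female'
})

# Any other gender values are grouped as 'Other'
dataset_clean['Gender'] = dataset_clean['Gender'].apply(
    lambda x: x if x in ['Male', 'Female'] else 'Other'
)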

dataset_clean = dataset_clean[(dataset_clean['Age'] >= 18) & (dataset_clean['Age'] <= 65)]

dataset_clean.reset_index(drop=True, inplace=True)
dataset_clean.head(5)

### What all manipulations have you done and insights you found?

I cleaned the dataset by removing unnecessary columns like Timestamp and comments, handled missing values in important fields, standardized inconsistent Gender values, and removed unrealistic Age entries. These steps made the data consistent and analysis-ready.

From the cleaned data, I observed that mental health treatment is almost equally split between those who sought help and those who did not. Workplace factors such as mental health benefits, anonymity, and supervisor support strongly influence whether employees seek treatment, while hesitation to discuss mental health at work remains a major concern.
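The cleaning steps described above can also be wrapped in a function and checked with a few assertions. This is a sketch only: `clean_survey` is a hypothetical helper name, and `toy` is a made-up frame that contains just the columns needed for the checks.

```python
import pandas as pd

def clean_survey(df):
    """Apply the wrangling steps described above to a copy of df."""
    out = df.copy()
    # Fill gaps with explicit placeholders instead of dropping rows.
    out["state"] = out["state"].fillna("Not Available")
    out["work_interfere"] = out["work_interfere"].fillna("Unknown")
    # Normalize Gender: trim, lower-case, map known spellings, else 'Other'.
    gender = out["Gender"].astype(str).str.strip().str.lower()
    out["Gender"] = gender.map({"male": "Male", "m": "Male",
                                "female": "Female", "f": "Female"}).fillna("Other")
    # Keep only plausible working ages (both bounds inclusive).
    out = out[out["Age"].between(18, 65)]
    return out.reset_index(drop=True)

# Made-up rows covering the problem cases: bad ages, spelling variants, gaps.
toy = pd.DataFrame({
    "Age": [29, -1, 35, 329, 41],
    "Gender": ["M", "male", "Female ", "f", "non-binary"],
    "state": ["CA", None, None, "TX", None],
    "work_interfere": ["Often", None, "Never", "Rarely", None],
})

cleaned = clean_survey(toy)

# Sanity checks: no gaps left, only three gender labels, ages in range.
assert cleaned.isnull().sum().sum() == 0
assert set(cleaned["Gender"]) <= {"Male", "Female", "Other"}
assert cleaned["Age"].between(18, 65).all()
print(cleaned)
```

Checks like these make it obvious when a later change to the cleaning logic silently lets bad rows through.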

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1  Age Distribution of Respondents

In [None]:
# Age Distribution
plt.figure(figsize=(8,5))
sns.histplot(dataset_clean['Age'], bins=40, kde=True)
plt.title("Age Distribution of Respondents")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()


##### 1. Why did you pick the specific chart?

I chose a histogram because it is suitable for showing the distribution of a numerical variable like age. It helps in understanding how respondents are spread across different age groups and also shows the overall trend clearly.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals the age range where most respondents belong and shows whether the data is concentrated around certain age groups. It also highlights age groups with very low representation.
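To name the dominant age range explicitly instead of reading it off the histogram, ages can be grouped into bands. The sketch below uses made-up ages, and the band edges are an assumption chosen to match the 18 to 65 filter used during cleaning.

```python
import pandas as pd

# Made-up ages for illustration; on the real data use dataset_clean['Age'].
ages = pd.Series([22, 27, 31, 34, 38, 45, 52, 61], name="Age")

# Band edges are an assumption; intervals are right-inclusive, e.g. (25, 35].
bins = [18, 25, 35, 45, 55, 65]
labels = ["18-25", "26-35", "36-45", "46-55", "56-65"]
age_band = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

# Count respondents per band, keeping the bands in their natural order.
band_counts = age_band.value_counts().sort_index()
print(band_counts)
```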

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help businesses focus on the most active age group and design products or marketing strategies accordingly. However, low participation from some age groups (for example, older or very young respondents) may indicate missed segments, which could limit growth if not addressed.

#### Chart - 2  Gender Distribution

In [None]:

plt.figure(figsize=(6,4))
sns.countplot(x='Gender', data=dataset_clean)
plt.title("Gender Distribution")
plt.xlabel("Gender")
plt.ylabel("Count")
plt.show()


##### 1. Why did you pick the specific chart?

I selected a count plot because gender is a categorical variable, and this chart is the best way to compare the number of respondents in each gender category. It gives a simple and clear view of how the data is distributed between different genders.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the gender-wise distribution of respondents and highlights which gender has higher participation. It helps identify whether the dataset is balanced or skewed toward a particular gender.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help businesses design gender-specific strategies, such as targeted marketing, personalized services, or inclusive product features. Understanding the dominant gender group helps improve engagement and customer satisfaction.

However, if one gender is underrepresented, it may indicate limited outreach or lack of inclusivity. Ignoring this imbalance could result in missed opportunities and restricted growth in certain market segments.

#### Chart - 3   Treatment Distribution

In [None]:

plt.figure(figsize=(6,4))
sns.countplot(x='treatment', data=dataset_clean)
plt.title("Treatment Distribution")
plt.xlabel("Treatment Taken")
plt.ylabel("Count")
plt.show()


##### 1. Why did you pick the specific chart?

I chose a count plot because the treatment variable is categorical (Yes/No). This chart clearly shows how many respondents have taken treatment versus those who have not, making comparison simple and easy to understand.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals the proportion of respondents who have taken mental health treatment compared to those who have not. It highlights whether treatment acceptance is high or low among respondents and helps understand overall awareness or willingness to seek treatment.
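The proportion mentioned above can be computed directly rather than estimated from bar heights. A minimal sketch with made-up answers (not the survey's actual split):

```python
import pandas as pd

# Made-up answers for illustration; on the real data use dataset_clean['treatment'].
treatment = pd.Series(
    ["Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No"], name="treatment"
)

# normalize=True returns fractions instead of counts; convert to percent.
share = treatment.value_counts(normalize=True).mul(100).round(1)
print(share)
```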

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help organizations and businesses understand mental health support needs. If many respondents have taken treatment, it indicates awareness and openness, allowing businesses to invest more in support programs or wellness services.

However, if a large portion of respondents have not taken treatment, it may indicate stigma, lack of access, or poor awareness. Ignoring this insight could lead to reduced employee well-being and productivity, which may negatively impact long-term growth.

#### Chart - 4  Family History Distribution

In [None]:

plt.figure(figsize=(6,4))
sns.countplot(x='family_history', data=dataset_clean)
plt.title("Family History of Mental Illness")
plt.xlabel("Family History")
plt.ylabel("Count")
plt.show()


##### 1. Why did you pick the specific chart?

I selected a count plot because family_history is a categorical variable. This chart is effective for comparing the number of respondents with and without a family history of mental illness, making the distribution easy to understand.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how many respondents have a family history of mental illness versus those who do not. It helps identify whether mental health issues are common within families and how significant this factor is among respondents.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help organizations recognize the importance of preventive mental health support. Employees with a family history may be more vulnerable, so businesses can introduce awareness programs, counseling services, and early-intervention initiatives.

On the other hand, if a large number of respondents report a family history and the organization fails to provide proper support, it could lead to higher stress, absenteeism, and reduced productivity, which may negatively affect business growth.

#### Chart - 5  Company Size Distribution

In [None]:

plt.figure(figsize=(10,5))
sns.countplot(y='no_employees', data=dataset_clean)
plt.title("Company Size Distribution")
plt.xlabel("Count")
plt.ylabel("Company Size")
plt.show()


##### 1. Why did you pick the specific chart?

I chose a horizontal count plot because company size is a categorical variable with many levels. A horizontal layout makes the category labels easier to read and allows clear comparison of how many respondents belong to each company size group.

##### 2. What is/are the insight(s) found from the chart?

The chart shows which company size categories have the highest and lowest number of respondents. It helps understand whether the dataset is dominated by employees from small, medium, or large organizations and highlights any imbalance across company sizes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help businesses customize mental health policies and workplace programs based on company size. For example, larger companies may need structured mental health programs, while smaller companies may benefit from flexible support initiatives.

However, if certain company sizes are underrepresented, decisions based only on this data may not apply well to those organizations. Ignoring this imbalance could lead to ineffective strategies and limited impact across different business scales.

#### Chart -6  Age vs Treatment

In [None]:


plt.figure(figsize=(8,5))
sns.boxplot(x='treatment', y='Age', data=dataset_clean)
plt.title("Age vs Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

I chose a box plot because it is suitable for comparing a numerical variable (Age) across a categorical variable (Treatment: Yes/No). This chart helps visualize differences in age distribution, median age, and variability between people who have taken treatment and those who have not.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how age varies between respondents who have taken treatment and those who have not. It highlights differences in median age, spread of age values, and the presence of any outliers. This helps understand whether certain age groups are more likely to seek treatment.
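The medians and spreads that the box plot displays can also be tabulated, which makes small differences between the two groups easier to compare. This sketch uses a made-up frame; the column names follow the dataset, the values do not.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Age": [24, 30, 36, 28, 33, 41, 45, 29],
    "treatment": ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
})

# Quartiles per treatment group: the same numbers a box plot draws.
quartiles = toy.groupby("treatment")["Age"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["q1", "median", "q3"]

# Interquartile range = height of the box.
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
print(quartiles)
```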

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help organizations design age-specific mental health initiatives. If treatment is more common in certain age groups, awareness and support programs can be targeted more effectively.

However, if younger or older age groups are less likely to seek treatment, it may indicate lack of awareness or accessibility. Ignoring this insight could result in untreated mental health issues, leading to reduced productivity and higher long-term business costs.

#### Chart - 7    Age vs Work Interfere

In [None]:


plt.figure(figsize=(8,5))
sns.boxplot(x='work_interfere', y='Age', data=dataset_clean)
plt.title("Age vs Work Interference")
plt.show()



##### 1. Why did you pick the specific chart?

I chose a box plot because it is ideal for comparing a numerical variable (Age) across multiple categories of work interference. This chart clearly shows differences in age distribution, medians, and variability for each interference level.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights how work interference due to mental health varies across different age groups. It shows whether certain age ranges experience higher or lower levels of work interference and helps identify any age-related patterns or outliers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help organizations design age-appropriate workplace support strategies, such as flexible schedules or targeted wellness programs, to reduce work interference and improve productivity.

However, if high work interference is observed consistently across certain age groups and remains unaddressed, it could lead to burnout, absenteeism, and reduced performance, which may negatively affect overall business growth.

#### Chart - 8  Gender vs Treatment

In [None]:


plt.figure(figsize=(6,4))
sns.countplot(x='Gender', hue='treatment', data=dataset_clean)
plt.title("Gender vs Treatment")
plt.show()



##### 1. Why did you pick the specific chart?

I chose a count plot with a hue because both Gender and Treatment are categorical variables. Using color (hue) allows a clear comparison of how treatment status differs across genders in a single, easy-to-read chart.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how treatment-seeking behavior varies by gender. It highlights which gender is more likely to take treatment and whether there is a noticeable gap in treatment uptake between genders.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help organizations design gender-sensitive mental health initiatives. Understanding differences in treatment uptake allows businesses to tailor awareness programs and support systems to encourage treatment where it is less common.

However, if one gender shows consistently lower treatment rates and the issue is ignored, it could lead to untreated mental health problems, reduced employee well-being, and long-term productivity losses, negatively impacting business growth.

#### Chart - 9   Family History vs Treatment

In [None]:

plt.figure(figsize=(6,4))
sns.countplot(x='family_history', hue='treatment', data=dataset_clean)
plt.title("Family History vs Treatment")
plt.show()



##### 1. Why did you pick the specific chart?

I chose a count plot with a hue because both family_history and treatment are categorical variables. Using color (hue) makes it easy to compare treatment status between respondents with and without a family history of mental illness in a single visualization.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how treatment-seeking behavior differs based on family history. It highlights whether individuals with a family history of mental illness are more likely to seek treatment compared to those without such a history.
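Because the two family-history groups may differ in size, raw counts can mislead; a row-normalized cross-tabulation compares treatment rates within each group. A sketch on made-up rows (the percentages below are not survey results):

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "family_history": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"],
    "treatment":      ["Yes", "Yes", "Yes", "No",  "Yes", "No", "No", "No", "No", "Yes"],
})

# normalize='index' makes each row sum to 1, i.e. rates within each group.
rate = pd.crosstab(
    toy["family_history"], toy["treatment"], normalize="index"
).mul(100).round(1)
print(rate)
```

Reading across a row gives the percentage of that group who did and did not seek treatment.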

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help organizations understand the role of genetic or familial factors in mental health awareness. Businesses can use this information to design proactive mental health programs and early support systems for employees who may be at higher risk.

However, if employees with a family history are not adequately supported or encouraged to seek treatment, it may result in higher stress levels, increased absenteeism, and lower productivity, which can negatively affect business performance and growth.

#### Chart - 10  Age + Gender + Treatment

In [None]:

plt.figure(figsize=(8,5))
sns.violinplot(x='Gender', y='Age', hue='treatment', data=dataset_clean, split=True)
plt.title("Age, Gender and Treatment")
plt.show()


##### 1. Why did you pick the specific chart?

I chose a violin plot because it allows comparison of a numerical variable (Age) across two categorical variables (Gender and Treatment) at the same time. This chart not only shows central values but also the distribution and density of age for each group, making it more informative than a simple box plot.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how age distribution differs by gender and treatment status. It highlights which age ranges within each gender are more likely to have taken treatment and where the age density is higher. This helps in identifying patterns such as whether treatment is more common among certain age groups for males or females.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help organizations design targeted mental health programs by considering both age and gender differences. Tailored interventions can improve treatment awareness, employee well-being, and overall productivity.

However, if the chart shows that certain age-gender groups are less likely to seek treatment and this gap is ignored, it could result in unaddressed mental health issues, leading to reduced performance and higher long-term business costs.

#### Chart - 11   Work Interfere vs Treatment

In [None]:

plt.figure(figsize=(8,4))
sns.countplot(x='work_interfere', hue='treatment', data=dataset_clean)
plt.title("Work Interference vs Treatment")
plt.show()



##### 1. Why did you pick the specific chart?

I chose a count plot with hue because both work_interfere and treatment are categorical variables. Using hue allows me to compare how treatment status differs across various levels of work interference in a single, clear visualization.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how treatment-seeking behavior changes with the level of work interference. It highlights whether people whose mental health interferes more with work are more likely to seek treatment, and which interference levels have lower or higher treatment uptake.
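To check whether treatment uptake rises with interference, the levels need a meaningful order rather than the order they happen to appear in. The sketch below assumes the answer scale Never < Rarely < Sometimes < Often and uses made-up rows.

```python
import pandas as pd

# Assumed ordering of the survey answers, from least to most interference.
levels = ["Never", "Rarely", "Sometimes", "Often"]

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "work_interfere": ["Often", "Never", "Sometimes", "Rarely",
                       "Often", "Never", "Sometimes", "Sometimes"],
    "treatment":      ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes"],
})
toy["work_interfere"] = pd.Categorical(
    toy["work_interfere"], categories=levels, ordered=True
)

# The mean of a boolean column is the share of True values, here the treatment rate.
treated = toy["treatment"].eq("Yes")
rate_by_level = (
    treated.groupby(toy["work_interfere"], observed=False).mean().mul(100).round(1)
)
print(rate_by_level)
```

The same `levels` list can be passed to `sns.countplot(..., order=levels)` so the bars follow the scale.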

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help organizations identify employees whose work performance is affected by mental health issues and encourage timely treatment or support. Providing early intervention can improve productivity and employee well-being.

However, if high work interference is observed but treatment rates remain low, it indicates a gap in awareness or support systems. Ignoring this issue may lead to continued productivity loss, burnout, and higher absenteeism, negatively impacting business growth.

#### Chart - 12   Company Size vs Benefits

In [None]:


plt.figure(figsize=(10,5))
sns.countplot(y='no_employees', hue='benefits', data=dataset_clean)
plt.title("Company Size vs Benefits")
plt.show()



##### 1. Why did you pick the specific chart?

I chose a horizontal count plot with hue because company size (no_employees) and benefits are categorical variables, and there are multiple company size categories. A horizontal layout makes the labels easier to read, while the hue helps compare benefit availability across different company sizes.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how mental health benefits availability varies across different company sizes. It highlights whether larger organizations are more likely to provide benefits compared to smaller ones and identifies company sizes where benefits are limited or inconsistent.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help businesses understand how organizational scale affects employee support policies. Companies can use this information to improve benefit coverage, especially in company sizes where it is lacking, leading to better employee well-being and retention.

However, if smaller or mid-sized companies consistently lack mental health benefits and this issue is not addressed, it may result in lower employee satisfaction, higher turnover, and difficulty attracting talent, which can negatively affect business growth.

#### Chart - 13  Country vs Benefits

In [None]:


top_countries = dataset_clean['Country'].value_counts().head(10).index
country_df = dataset_clean[dataset_clean['Country'].isin(top_countries)]

plt.figure(figsize=(10,5))
sns.countplot(x='Country', hue='benefits', data=country_df)
plt.title("Country vs Benefits (Top 10 Countries)")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a count plot with hue because Country and benefits are categorical variables, and the goal is to compare mental health benefit availability across countries. Limiting the visualization to the top 10 countries keeps the chart readable and focused on the most represented regions.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how mental health benefits availability differs across countries. It highlights which countries provide better mental health support and which countries have a lower availability of benefits among respondents. Differences across countries suggest variation in workplace policies and cultural or regulatory approaches.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help multinational organizations benchmark their mental health policies against country-level trends. Businesses can strengthen benefits in regions where support is limited, improving employee well-being and global consistency.

However, countries with consistently low benefit availability may face higher employee stress and lower engagement. If companies ignore these regional gaps, it can lead to reduced productivity, higher turnover, and challenges in attracting talent, negatively affecting long-term growth.

#### Chart - 14 -   Correlation Heatmap

In [None]:

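corr_df = dataset_clean.copy()

# Encode every text column as integer category codes so .corr() can process it.
# Codes follow alphabetical category order, so coefficients for unordered
# columns with more than two categories (e.g. Country) are not meaningful.
for col in corr_df.select_dtypes(include='object').columns:
    corr_df[col] = corr_df[col].astype('category').cat.codes

# Pairwise Pearson correlation between all encoded columns
correlation_matrix = corr_df.corr()

plt.figure(figsize=(16, 10))
sns.heatmap(
    correlation_matrix,
    cmap='coolwarm',
    vmin=-1, vmax=1, center=0,  # symmetric scale so color reflects sign
    linewidths=0.5,
    annot=False
)

plt.title("Correlation Heatmap of All Variables", fontsize=16)
plt.tight_layout()

plt.show()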


##### 1. Why did you pick the specific chart?

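I chose a correlation heatmap because it summarizes the pairwise relationships between many variables in a single view. Since the dataset contains both numerical and categorical variables, the categorical columns are converted into integer codes so that a correlation coefficient can be computed for every pair. These coefficients are most reliable for numerical, binary, and ordered variables; for unordered categories with more than two levels, such as Country, the codes are arbitrary and the resulting values should be read with caution. The heatmap uses color to make the stronger relationships easy to spot.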
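For pairs of unordered categorical columns, one commonly used alternative to correlating integer codes is Cramér's V, which is derived from the chi-squared statistic of the contingency table and ranges from 0 (no association) to 1 (perfect association). The sketch below is not part of the original analysis: `cramers_v` is a hypothetical helper, the sample frame is made-up data, and the function omits bias and continuity corrections.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Uncorrected Cramér's V between two categorical series."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    if min_dim == 0:
        # A column with a single category carries no association information
        return float("nan")
    return float(np.sqrt(chi2 / (n * min_dim)))


# Made-up toy data: treatment mirrors family_history, remote_work is unrelated.
demo = pd.DataFrame({
    "family_history": ["Yes", "Yes", "No", "No"],
    "treatment":      ["Yes", "Yes", "No", "No"],
    "remote_work":    ["Yes", "No", "Yes", "No"],
})

v_related = cramers_v(demo["family_history"], demo["treatment"])
v_unrelated = cramers_v(demo["family_history"], demo["remote_work"])
print(v_related, v_unrelated)
```

Unlike a correlation coefficient, Cramér's V has no sign, so it reports only the strength of an association, not its direction.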

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows the strength and direction of the relationships between all variables. Strong positive or negative correlations indicate variables that tend to change together, while weak correlations suggest little or no linear relationship. This helps identify which factors are associated with each other, such as age, treatment, work interference, benefits, and family history, although correlation alone does not show that one factor causes another.
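With `annot=False`, the strongest relationships have to be judged by color alone. A short helper can list the top pairs from a correlation matrix explicitly. This is a sketch under stated assumptions: `strongest_pairs` is a hypothetical helper, and the matrix below uses placeholder variable names and made-up coefficients, not results from the survey.

```python
import numpy as np
import pandas as pd


def strongest_pairs(corr: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Return the top_n variable pairs ranked by absolute correlation."""
    # Keep the strict upper triangle so each pair appears exactly once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)


# Made-up correlation matrix with placeholder names, for illustration only.
cols = ["var_a", "var_b", "var_c"]
toy_corr = pd.DataFrame(
    [[1.0, 0.2, -0.7],
     [0.2, 1.0, 0.4],
     [-0.7, 0.4, 1.0]],
    index=cols,
    columns=cols,
)

top = strongest_pairs(toy_corr, top_n=2)
print(top)
```

In the notebook, the same call would take `correlation_matrix` as its argument.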

#### Chart - 15 - Pair Plot

In [None]:

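pairplot_cols = [
    'Age',
    'treatment',
    'work_interfere',
    'benefits',
    'family_history'
]

# sns.pairplot only draws numeric columns, so encode the categorical
# factors as integer codes; 'treatment' stays as text for the hue.
pair_df = dataset_clean[pairplot_cols].copy()
for col in ['work_interfere', 'benefits', 'family_history']:
    pair_df[col] = pair_df[col].astype('category').cat.codes

sns.pairplot(
    pair_df,
    hue='treatment',
    diag_kind='kde'
)

plt.show()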



##### 1. Why did you pick the specific chart?

I chose a pair plot because it shows several pairwise relationships in one figure. The selected variables include one numerical variable (Age) and key categorical factors related to mental health (treatment, work_interfere, benefits, family_history). Because a pair plot only draws numeric columns, work_interfere, benefits, and family_history have to be integer-encoded to appear in the grid, while treatment is used as the hue to compare respondents who have sought treatment with those who have not.

##### 2. What is/are the insight(s) found from the chart?

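The pair plot shows how age relates to work interference, benefits, and family history, with treatment shown by color. It helps identify visible patterns or clustering between treated and non-treated groups. The diagonal KDE plots show the distribution of each variable for the two groups, while the off-diagonal scatter plots show whether any clear relationships or separations exist among variables. Because the categorical factors take only a few discrete values, many points overlap in the scatter panels, so the diagonal distributions are often the more informative part of the figure.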
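Where scatter panels are hard to read, the relationship between age and treatment can also be summarized numerically by grouping respondents into age bands. The sketch below is illustrative only: `treatment_rate_by_age_band` is a hypothetical helper, the band edges are an assumption rather than something defined in the notebook, the sample frame is made-up data, and `treatment` is assumed to be coded as "Yes"/"No".

```python
import pandas as pd

# Made-up stand-in for dataset_clean with the two relevant columns.
sample = pd.DataFrame({
    "Age":       [22, 27, 31, 38, 44, 52],
    "treatment": ["No", "Yes", "No", "Yes", "Yes", "Yes"],
})


def treatment_rate_by_age_band(
    df: pd.DataFrame,
    bins=(17, 29, 39, 49, 100),                   # assumed band edges
    labels=("18-29", "30-39", "40-49", "50+"),
) -> pd.Series:
    """Share of respondents with treatment == 'Yes' in each age band."""
    # pd.cut uses right-closed intervals: (17, 29], (29, 39], ...
    bands = pd.cut(df["Age"], bins=list(bins), labels=list(labels))
    sought = df["treatment"].eq("Yes")
    return sought.groupby(bands, observed=False).mean()


age_rates = treatment_rate_by_age_band(sample)
print(age_rates)
```

Bands with very few respondents produce unstable rates, so the band sizes are worth checking alongside the rates.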

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the analysis and visualizations, the client should focus on improving mental health awareness, accessibility, and support within the organization to achieve the business objective of better employee well-being and productivity.

The survey responses suggest that mental health conditions often interfere with work performance, and factors such as lack of benefits, limited treatment uptake, and high work interference can negatively impact productivity and employee retention. Therefore, the client should introduce or strengthen mental health benefits, promote a supportive and stigma-free work culture, and provide easy access to counseling or treatment services.

Additionally, targeted interventions should be designed by considering age, gender, company size, family history, and regional differences. This data-driven approach will help reduce work interference, improve employee satisfaction, and ultimately lead to higher productivity, lower attrition, and sustainable business growth.

# **Conclusion**

This analysis shows how mental health factors relate to employees' experiences and workplace performance. The visualizations indicate that variables such as age, gender, family history, work interference, treatment, and availability of benefits are linked to employee well-being. Mental health issues that interfere with work can reduce productivity, while access to treatment and benefits plays an important role in improving employee outcomes.

Overall, the study highlights the importance of proactive mental health support in organizations. By addressing gaps in awareness, improving access to benefits, and creating a supportive work environment, businesses can enhance employee satisfaction, reduce productivity loss, and achieve long-term sustainable growth.
