# **EDA_Mental Health Analysis**, **By Amit Kharche**
**Follow me** on [LinkedIn](https://www.linkedin.com/in/amit-kharche) and [Medium](https://medium.com/@amitkharche14) for more insights on **Data Science** and **AI**.

---
# **Table of Contents**
---

**1.** [**Introduction**](#Section1)<br>
**2.** [**Problem Statement**](#Section2)<br>
**3.** [**Installing & Importing Libraries**](#Section3)<br>
**4.** [**Data Acquisition & Description**](#Section4)<br>
**5.** [**Data Pre-processing**](#Section5)<br>
**6.** [**Exploratory Data Analysis**](#Section6)<br>
  - **6.1** [**In what ways does age influence individuals' behaviors and their understanding of their employer’s stance on mental health?**](#Section61)
  - **6.2** [**What does the age feature's distribution look like in terms of density?**](#Section62)
  - **6.3** [**How does the ratio of individuals receiving treatment vary across different genders?**](#Section63)
  - **6.4** [**What is the proportional relationship between receiving treatment and experiencing work interference?**](#Section64)
  - **6.5** [**How is the likelihood of seeking treatment connected to age?**](#Section65)
  - **6.6** [**Is there a link between an employee’s family history of mental health issues and their decision to seek treatment?**](#Section66)
  - **6.7** [**How does the number of employees in a company relate to the treatment-seeking behavior?**](#Section67)
  - **6.8** [**Which countries are the most prominent contributors to mental health data?**](#Section68)
  - **6.9** [**Which U.S. states have the highest representation in mental health-related responses?**](#Section69)
  - **6.10** [**Among all countries, which three have the highest contribution to mental health data?**](#Section610)
  - **6.11** [**What is the distribution of work interference levels among employees in the top three contributing countries?**](#Section611)
  - **6.12** [**How many individuals from the top three countries have reported undergoing treatment?**](#Section612)
  - **6.13** [**What is the gender-wise breakdown of individuals seeking treatment in the top three countries?**](#Section613)
  - **6.14** [**How are perceived consequences of mental health issues related to general attitudes toward them?**](#Section614)

**7.** [**Summarization**](#Section7)<br>

---
<a name = Section1></a>
# **1. Introduction**
---

- For many of us, **work** is a **major part** of our **lives** where we spend much of our time to **make income** and **friends**.

- Having a **fulfilling job** can be **good** for your **mental health** and general wellbeing.

<center><img width=30% src="https://aihms.in/blog/wp-content/uploads/2020/08/mental1.jpg"></center>

- Mental health is the way we **think** and **feel** and our ability to **deal** with **ups and downs** and it is something we all have.

- When we enjoy good mental health, we have a **sense of purpose** and **direction**, the energy to do the things we want to do.

- If you enjoy good mental health, you can:
  - make the most of your potential

  - cope with what life throws at you
  
  - play a full part in your relationships, your workplace, and your community.

- Your **mental health always fluctuates** as circumstances change as you move through different stages in your life.

- **Distress** is a word used to describe times when a **person isn’t coping** – for whatever reason.

- It could be something at home, the pressure of work, or the start of a mental health problem like depression.

- When we feel distressed, we **need a compassionate**, human response.

---
<a name = Section2></a>
# **2. Problem Statement**
---

- We all have times when we feel down, stressed or frightened.

- Most of the time those feelings pass, but **sometimes** they develop into a mental health problem like **anxiety** or **depression**.

- It **impacts our daily lives** but for some people, mental health problems become complex and **require support** and **treatment** for life.

- Factors like poverty, childhood trauma, discrimination, etc. make it more likely that we will develop mental health problems.

- Different mental health problems **affect** people **in different ways** and it’s **key** to **understand** an **individual’s experience**.

<center><img width=60% src="https://www.uni-jena.de/unijenamedia/371096/241111-mhad-2024.avif?height=428&width=760"></center>


**<h4>Scenario</h4>**

- <a href="https://osmihelp.org/">**OSMI**</a> is an organization working to **help people** **identify** and **overcome mental health disorders** while working in the tech space.

- They **perform surveys** to **measure attitudes** towards mental health in the tech workplace.

- They **create detailed guides** on how to make the **workplace safe** and **supportive** for mentally stressed people.

- Using these surveys they **examine** the **frequency** of **mental health disorders** among tech workers.

- Also, they help workspaces to **identify** the **best resources** to **support** their **employees**.

- Check out <a href="https://www.youtube.com/watch?v=NHulgcO_16U&list=PL1MEC8mwrpaIdzYKRidvNB5eYSwWrqFZ3">**Talks at Google**</a> for more clarity about mental health in the tech industry.


---
<a name = Section3></a>
# **3. Installing & Importing Libraries**
---

<a name = Section31></a>
### **3.1 Installing Libraries**

In [None]:
!pip install -q datascience            # Package that is required by pandas profiling
!pip install -q pandas-profiling       # Library to generate basic statistics about data

<a name = Section32></a>
### **3.2 Upgrading Libraries**

- **After upgrading** the libraries, **restart the runtime** so that the upgraded versions are loaded.

- After restarting the runtime, do not re-run the installation cell (3.1) or the upgrade cell (3.2).

In [None]:
!pip install -q --upgrade pandas-profiling
!pip install -q --upgrade yellowbrick

<a name = Section33></a>
### **3.3 Importing Libraries**

In [None]:
#-------------------------------------------------------------------------------------------------------------------------------
import pandas as pd # Importing for panel data analysis
import numpy as np
pd.set_option('display.max_columns', None) # Show all columns when a frame has many features
pd.set_option('display.max_colwidth', None)# Show full cell contents for better clarity
pd.set_option('display.max_rows', None)# Show all rows instead of truncating long outputs
pd.set_option('mode.chained_assignment', None)# Silence the chained-assignment warning
# Suppress scientific notation by showing floats with 5 decimal places
pd.set_option('display.float_format', lambda x: '%.5f' % x)
#-------------------------------------------------------------------------------------------------------------------------------
from collections import Counter    # For counting hashable objects
#-------------------------------------------------------------------------------------------------------------------------------
import matplotlib.pyplot as plt   # Importing pyplot interface using matplotlib
import plotly.graph_objs as go    # For Plotly interfaced graphs
import seaborn as sns             # Importing seaborn library for statistical visualization
%matplotlib inline
#-------------------------------------------------------------------------------------------------------------------------------
import warnings                     # Importing warnings to control runtime warnings
warnings.filterwarnings("ignore")   # Suppress all warnings

---
<a name = Section4></a>
# **4. Data Acquisition & Description**
---

- This dataset is obtained from a survey in 2014.

- It describes the attitudes towards mental health and frequency of mental health disorders in the tech workplace.

| Records | Features | Dataset Size |
| :-- | :-- | :-- |
| 1259 | 27 | 296 KB|


| Id | Features | Description |
| :-- | :--| :--|
|01|**Timestamp**|Time the survey was submitted.|
|02|**Age**|The age of the person.|
|03|**Gender**|The gender of the person.|
|04|**Country**|The country the person belongs to.|
|05|**state**|The state the person belongs to.|
|06|**self_employed**|Whether the person is self-employed.|
|07|**family_history**|Whether the person has a family history of mental illness.|
|08|**treatment**|Have you sought treatment for a mental health condition?|
|09|**work_interfere**|If you have a mental health condition, do you feel that it interferes with your work?|
|10|**no_employees**|How many employees does your company or organization have?|
|11|**remote_work**|Do you work remotely (outside of an office) at least 50% of the time?|
|12|**tech_company**|Is your employer primarily a tech company/organization?|
|13|**benefits**|Does your employer provide mental health benefits?|
|14|**care_options**|Do you know the options for mental health care your employer provides?|
|15|**wellness_program**|Has your employer ever discussed mental health as part of an employee wellness program?|
|16|**seek_help**|Does your employer provide resources to learn more about mental health issues and how to seek help?|
|17|**anonymity**|Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?|
|18|**leave**|How easy is it for you to take medical leave for a mental health condition?|
|19|**mental_health_consequence**|Do you think that discussing a mental health issue with your employer would have negative consequences?|
|20|**phys_health_consequence**|Do you think that discussing a physical health issue with your employer would have negative consequences?|
|21|**coworkers**|Would you be willing to discuss a mental health issue with your coworkers?|
|22|**supervisor**|Would you be willing to discuss a mental health issue with your direct supervisor(s)?|
|23|**mental_health_interview**|Would you bring up a mental health issue with a potential employer in an interview?|
|24|**phys_health_interview**|Would you bring up a physical health issue with a potential employer in an interview?|
|25|**mental_vs_physical**|Do you feel that your employer takes mental health as seriously as physical health?|
|26|**obs_consequence**|Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?|
|27|**comments**|Any additional notes or comments.|

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/amitkharche/exploratory_data_analysis_projects_amit_kharche/refs/heads/main/01.EDA_mental_health_analysis_amit_kharche/survey_mental_health.csv')
print('Data Shape:', data.shape)
data.head()

<a name = Section41></a>
### **4.1 Data Description**

- In this section we will get **information about the data** and see some observations.

In [None]:
data.describe()

**Observations:**

- The **average age** comes out to **79428148 years**, which is clearly **absurd**.

- Around **25%** of people have an **age** less than or equal to **27 years**.

- Around **50%** of people have an **age** less than or equal to **31 years**.

- Around **75%** of people have an **age** less than or equal to **36 years**.

- The **minimum** and **maximum ages** are a **negative** number and a **very large number**, respectively.

- It implies that there is **something wrong** with our **data**.
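
To illustrate why the mean above is distorted while the quartiles look reasonable, here is a minimal sketch on made-up ages (the values are hypothetical, not taken from the survey). A single corrupted entry is enough to pull the mean far from the median.

```python
import pandas as pd

# Hypothetical ages: five plausible values plus one corrupted entry
ages = pd.Series([27, 31, 36, 29, 33, 99999999999])

mean_age = ages.mean()      # dragged far upward by the single bad value
median_age = ages.median()  # stays within the plausible range

print("Mean:  ", mean_age)
print("Median:", median_age)
```

Because the median depends only on the ordering of values, it is a safer first summary when a feature may contain data-entry errors.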

<a name = Section42></a>
### **4.2 Data Information**

- In this section we will see the **information about the types of features**.

In [None]:
data.info(verbose=True, memory_usage='deep')

**Observations:**

- At a high level, we can observe that there is **missing data** in our dataset.

- The **Timestamp** feature has an **inconsistent data type** (Object instead of datetime) and requires rectification.

<a name = Section5></a>

---
# **5. Data Pre-Processing**
---

<a name = Section51></a>
### **5.1 Handling of Missing Data**

- In this section, we will identify missing data and check the proportion of it and take appropriate measures.

In [None]:
null_frame = pd.DataFrame(index = data.columns.values)
null_frame['Null Frequency'] = data.isnull().sum().values
percent = data.isnull().sum().values/data.shape[0]
null_frame['Missing %age'] = np.round(percent, decimals = 4) * 100
null_frame.transpose()

**Observations:**

- We can observe that following features are found to have missing values along with the proportions:

|Feature|Object Type|Missing Proportion|Solution|
|:--:|:--:|:--:|:--|
|state|Object|40%|Replace with mode.|
|self_employed|Object|1.43%|Replace with mode.|
|work_interfere|Object|20.97%|Replace with mode.|
|comments|Object|86.97%|Drop the feature.|
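
The plan in the table above can be sketched on a toy frame. This is only an illustration under assumed values: the frame `toy` and its contents are hypothetical, and `fillna` is used here as one common way to perform mode imputation.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values, standing in for the survey data
toy = pd.DataFrame({
    "state": ["CA", np.nan, "CA", "NY", np.nan],
    "comments": [np.nan, np.nan, "n/a", np.nan, np.nan],
})

# Share of missing values per column
missing_share = toy.isnull().mean()

# Replace missing states with the most frequent state
toy["state"] = toy["state"].fillna(toy["state"].mode()[0])

# Drop the mostly-empty free-text column
toy = toy.drop(columns="comments")

print(missing_share)
print(toy)
```

Keep in mind that mode imputation on a feature with a large missing share inflates the count of the most frequent category, which can affect later frequency plots.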

**Performing Operations**

In [None]:
data['state'] = data['state'].replace(np.nan, data['state'].mode()[0])
data['self_employed'] = data['self_employed'].replace(np.nan, data['self_employed'].mode()[0])
data['work_interfere'] = data['work_interfere'].replace(np.nan, data['work_interfere'].mode()[0])
data.drop(labels='comments', axis=1, inplace=True)

- Let's verify the integrity of null values again.

In [None]:
null_frame = pd.DataFrame(index = data.columns.values)
null_frame['Null Frequency'] = data.isnull().sum().values
percent = data.isnull().sum().values/data.shape[0]
null_frame['Missing %age'] = np.round(percent, decimals = 4) * 100
null_frame.transpose()

<a name = Section52></a>
### **5.2 Handling of Redundant Data**

- In this section, we will identify redundant data and check the proportion of it and take appropriate measures.

In [None]:
print('Contains Duplicate Rows?', data.duplicated().any())
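
If the check above reports duplicates, one possible follow-up is to count and drop them. The sketch below uses a hypothetical toy frame rather than the survey data.

```python
import pandas as pd

# Hypothetical frame in which the second row repeats the first
toy = pd.DataFrame({
    "Age": [30, 30, 41],
    "Country": ["Canada", "Canada", "India"],
})

# Number of rows that are identical to an earlier row
n_duplicates = toy.duplicated().sum()

# Keep the first occurrence of each row and rebuild the index
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print("Duplicate rows:", n_duplicates)
print(deduplicated)
```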

<a name = Section53></a>
### **5.3 Handling of Inconsistent Data**

- In this section, we will **identify inconsistencies** in the data and then **take appropriate measures**.

- Previously, we observed that the **Timestamp** feature was **incorrectly identified** as Object, so we will rectify it.

- Additionally, some features, such as Gender, contain the same value written in different formats.

In [None]:
data['Timestamp'].unique()

In [None]:
# Transforming Timestamp's object type to datetime
data['Timestamp'] = pd.to_datetime(data['Timestamp'])

In [None]:
# Learn more about the variable Gender, which appears not to be standardized with 49 distinct responses.
print("Distinct responses for Gender (Frequency):", len(set(data['Gender'])))
print("Distinct responses for Gender:", set(data['Gender']))

In [None]:
data['Gender'].str.lower().unique()

In [None]:
# Lowercase every response once so the lookup lists only need lowercase entries
gender_lower = data['Gender'].str.lower()

# Strata of the Gender category
male_str = ["male", "m", "male-ish", "maile", "mal", "male (cis)", "make",
            "male ", "man", "msle", "mail", "malr", "cis man", "cis male"]
trans_str = ["trans-female", "something kinda male?", "queer/she/they",
             "non-binary","nah", "all", "enby", "fluid", "genderqueer",
             "androgyne", "agender", "male leaning androgynous", "guy (-ish) ^_^",
             "trans woman", "neuter", "female (trans)", "queer",
             "ostensibly male, unsure what that really means"]
female_str = ["cis female", "f", "female", "woman",  "femake", "female ",
              "cis-female/femme", "female (cis)", "femail"]

# Build a lookup from each raw response to its standardized label
gender_map = {g: 'male' for g in male_str}
gender_map.update({g: 'female' for g in female_str})
gender_map.update({g: 'trans' for g in trans_str})

# Standardize known responses and keep the original text for anything unmapped
data['Gender'] = gender_lower.map(gender_map).fillna(data['Gender'])

# Remove rest of the unnecessary text
stk_list = ['A little about you', 'p']
data = data[~data['Gender'].isin(stk_list)]

# Display the unique value of Gender feature
print(data['Gender'].unique())

**Observation:**

- We **handled the inconsistency** in the data **manually** for **one feature**, but this becomes **impractical** when you have **hundreds of features**.

- In that case, we can **use interactive plots**, such as Plotly figures, to see all the possible values in each feature.

- Next, we will **identify** all the categorical features and render a bar plot to inspect the **values present** in each.

- If we find any inconsistency in the feature, then we will take appropriate measures.

**Note:**

- The **approach followed below** for basic data analysis is **not mandatory**.

- You can **also go feature by feature** and analyze the data to understand its underlying structure.

- To **make our life easier**, we will be **utilizing a small hack**.
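
As a non-interactive alternative to the plot-based check described above, the distinct values of every text feature can also be printed directly. This is a minimal sketch on a hypothetical toy frame; the column names mirror the survey, but the values are made up.

```python
import pandas as pd

# Hypothetical frame with one inconsistently cased Gender value
toy = pd.DataFrame({
    "Gender": ["male", "Male", "female", "male"],
    "treatment": ["Yes", "No", "Yes", "Yes"],
    "Age": [30, 25, 41, 37],
})

# Select every non-numeric column as a candidate categorical feature
categorical_cols = toy.select_dtypes(exclude="number").columns

# Collect the frequency of each distinct value per feature
value_summary = {col: toy[col].value_counts() for col in categorical_cols}

for col, counts in value_summary.items():
    print(f"--- {col} ---")
    print(counts.to_string())
```

Inconsistent spellings such as `male` and `Male` show up as separate rows in the printed counts, which makes them easy to spot.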


In [None]:
# Initializing empty list
cat_features = []

# Appending all  the categorical features to empty list
for i in data.columns:
  if data[i].dtype == 'object':
    cat_features.append(i)

# Display the categorical features
print(cat_features)

In [None]:
# Initiating a plotly figure
fig = go.Figure()

# Adding the first graph: value frequencies of the first categorical feature
first_counts = data[cat_features[0]].value_counts()
fig.add_bar(x=first_counts.index.tolist(), y=first_counts.tolist())

# Adding one dropdown entry per feature that swaps in that feature's value frequencies
button = [dict(method='restyle',
               args=[{'x': [data[col].value_counts().index.tolist()],
                      'y': [data[col].value_counts().tolist()]}],
               label=col) for col in cat_features]

# Updating the layout of the graph
fig.update_layout(title_text='Frequency Distribution of Feature Values',
                  title_x=0.4,
                  width=1000,
                  height=450,
                  updatemenus=[dict(active=0,
                                    buttons=button,
                                    x=1.15,
                                    y=1,
                                    xanchor='left',
                                    yanchor='top')])

# Adding an extra annotation alongside the button
fig.add_annotation(x=1.03,
                   y=0.97,
                   xref='paper',
                   yref='paper',
                   showarrow=False,
                   xanchor='left',
                   yanchor = 'top',
                   text='Feature')

# Display the graph
fig.show()

**Observations:**

- By interacting with the above figure, we can conclude that all the **other features** **have consistent values**.

<a name = Section54></a>
### **5.4 Handling of Outliers**

- Earlier, the **age** feature showed some **absurd values** such as 329, 999999999999, and -1729.

- These are **outliers**, and we will **cap** them as follows:
  - All values above 75 will be capped to 75 (retirement is typically around **65**, so this leaves some buffer).
  - All values below 14 will be capped to 14 (taken here as the **minimum age** for **employment**).

- As a general rule, the FLSA sets 14 as the minimum age for employment, with limits on the number of hours worked.
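
The capping rule described above can be sketched with `Series.clip` on a small made-up series. The three outliers are the ones quoted in the text; the remaining ages are hypothetical.

```python
import pandas as pd

# Three outliers quoted above plus two plausible, made-up ages
ages = pd.Series([329, 999999999999, -1729, 32, 18])

# Values below 14 become 14 and values above 75 become 75
capped = ages.clip(lower=14, upper=75)

print(capped.tolist())  # [75, 75, 14, 32, 18]
```

Values already inside the 14 to 75 range pass through unchanged, so only the outliers are affected.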

In [None]:
# Cap ages to the 14-75 range in one step, avoiding chained assignment
data['Age'] = data['Age'].clip(lower=14, upper=75)

In [None]:
data['Age'].describe()

**Observation:**

- Now that we have successfully cleansed our data we are good to go with exploring our data and finding insights.

<a name = Section6></a>

---
# **6. Exploratory Data Analysis**
---

In [None]:
data.head()

<a name = Section61></a>
**<h4>Question:**  In what ways does age influence individuals' behaviors and their understanding of their employer’s stance on mental health?</h4>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Initialize figure of size 15 X 7
fig = plt.figure(figsize=(15, 7))

# Plot countplot of age concerning treatment
sns.countplot(x='Age', hue='treatment', data=data)

# Add some cosmetics
plt.title(label='Age vs Treatment', size=16)
plt.xlabel(xlabel='Age', size=14)
plt.ylabel(ylabel='Frequency', size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.grid(True)

# Display the plot
plt.show()


**Observation:**

- We can observe that respondents aged **14 to 22** are **mildly conscious** of **treatment**, while those aged **23 to 37** are **less conscious** of it.

- Respondents aged **38 and above** are **highly conscious** and are **open to treatment**.

<a name = Section62></a>
**<h4>Question:** What does the age feature's distribution look like in terms of density?</h4>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Initialize figure of size 15 x 7
plt.figure(figsize=(15, 7))

# Plot density distribution of age using histplot with KDE (stat='density' matches the y-axis label)
sns.histplot(data['Age'], kde=True, bins=30, color='skyblue', stat='density')


# Add cosmetics
plt.title('Density Distribution of Age', fontsize=16)
plt.xlabel('Age', fontsize=14)
plt.ylabel('Density', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True)

# Display the plot
plt.show()

**Observation:**

- We can **observe a peak** between **mid-20s to about mid-30s**.

- This implies that **majority of people** are from **mid 20s to mid 30s**.

<a name = Section63></a>
**<h4>Question:** How does the ratio of individuals receiving treatment vary across different genders?</h4>

In [None]:
# Instantiate figure and axes object
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 8))

# Initiate a list of gender values and index
gender = list(data['Gender'].unique())
custom_index = [0, 1, 2]

# Plot donut chart for each gender concerning treatment
for i, j in zip(gender, custom_index):
  data['treatment'][data['Gender']==i].value_counts() \
                                         .plot \
                                         .pie(explode=[0, 0.2],
                                              autopct='%1.1f%%',
                                              wedgeprops=dict(width=0.15),
                                              ax=ax[j], shadow=True)
  # Add some cosmetics
  ax[j].set_title(label= i.capitalize() +' Treatment', size=16)
  ax[j].set_ylabel(ylabel='Treatment', size=14)


# Display the graph
plt.show()

**Observation:**

- We can observe that a larger share of **female** and **trans** respondents have sought treatment than **male** respondents.
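
The ratios shown in the donut charts can also be read off a single table. The following is a minimal sketch using `pd.crosstab` with row normalisation on hypothetical responses, not the survey data.

```python
import pandas as pd

# Hypothetical responses, not taken from the survey
toy = pd.DataFrame({
    "Gender": ["male", "male", "male", "male", "female", "female", "trans"],
    "treatment": ["Yes", "No", "No", "No", "Yes", "No", "Yes"],
})

# Row-normalised table: each row sums to 1, so cells are within-gender shares
treatment_share = pd.crosstab(toy["Gender"], toy["treatment"], normalize="index")

print(treatment_share.round(2))
```

Normalising by row removes the effect of group size, which matters when one gender has far more respondents than the others.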

<a name = Section64></a>
**<h4>Question:** What is the proportional relationship between receiving treatment and experiencing work interference?</h4>

In [None]:
# Instantiate figure and axes object
f, ax = plt.subplots(nrows=1, ncols=4, figsize=(20, 8))

# Initiate a list of work interference values and index
work_interfere = list(data['work_interfere'].unique())
custom_index = [0, 1, 2, 3]

# Plot donut chart for each work interference level concerning treatment
for i, j in zip(work_interfere, custom_index):
  data['treatment'][data['work_interfere']==i].value_counts() \
                                              .plot \
                                              .pie(explode=[0, 0.2],
                                                   autopct='%1.1f%%',
                                                   wedgeprops=dict(width=0.15),
                                                   ax=ax[j], shadow=True)
  # Add some cosmetics
  ax[j].set_title(label='Work Interference - '+i, size=16)
  ax[j].set_ylabel(ylabel='Treatment', size=14)


# Display the graph
plt.show()

**Observation:**

- We can observe that **employees** whose work is "**Often**" or "**Rarely**" interfered with are **more likely** to **have mental health issues**.

- Thus, they tend to seek treatment.

<a name = Section65></a>
**<h4>Question:** How is the likelihood of seeking treatment connected to age?</h4>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Initiate an empty figure of 15 x 7
plt.figure(figsize=(15, 7))

# Plot the violinplot for treatment vs age
sns.violinplot(x="treatment", y="Age", palette="gnuplot", data=data)

# Add cosmetics
plt.title('Age vs Treatment', fontsize=16)
plt.xlabel("Treatment", fontsize=14)
plt.ylabel("Age", fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True)

# Display the figure
plt.show()


**Observation:**

- We can't draw a strong conclusion from the above plot because the two distributions are almost the same.

- Still, we can see that as **age increases**, the **chances** of **seeking** **treatment** also **increase**.
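
One way to make the age trend easier to read than the violin plot is to bin ages and compare the treatment rate per bin. This is a sketch only: the bin edges, labels, and data below are hypothetical choices, not values from the survey.

```python
import pandas as pd

# Hypothetical respondents
toy = pd.DataFrame({
    "Age": [19, 24, 28, 33, 36, 41, 47, 52],
    "treatment": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# Assumed bin edges; intervals are right-closed, e.g. (30, 40]
bins = [13, 30, 40, 75]
labels = ["14-30", "31-40", "41-75"]
toy["age_group"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# Share of respondents in each age group who answered 'Yes'
sought = toy["treatment"].eq("Yes")
rate_by_group = sought.groupby(toy["age_group"], observed=True).mean()

print(rate_by_group)
```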

<a name = Section66></a>
**<h4>Question:** Is there a link between an employee’s family history of mental health issues and their decision to seek treatment?</h4>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Initiate an empty figure of 15 x 7
plt.figure(figsize=(15, 7))

# Plot the countplot
sns.countplot(x='family_history', hue='treatment', data=data)

# Add cosmetics
plt.title('Does family history affect mental health?', fontsize=16)
plt.xlabel('Do they have a family mental health problem?', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True)

# Display the figure
plt.show()

**Observation:**

- We can observe that **employees** who have a **family history** of mental illness are much more **likely** to **seek treatment**.

<a name = Section67></a>
**<h4>Question:** How does the number of employees in a company relate to the treatment-seeking behavior?</h4>

In [None]:
# Initiate an empty figure
figure = plt.figure(figsize=(15, 7))

# Plot the countplot figure
sns.countplot(x='no_employees', hue='treatment', data=data)

# Add some cosmetics
plt.title("Employee count vs Treatment", size=16)
plt.xlabel("Company Size", size=14)
plt.ylabel("Frequency", size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.grid(True)

# Output the figure
plt.show()

**Observation:**

- The **largest** number of **people** who have **gone for treatment** belong to the **company size 26-100**.

- On the other hand, most of those who **haven't gone** for treatment belong to **company size 6-25**.
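
Company-size labels are text, so they do not sort from smallest to largest by default. A minimal sketch of one way to impose that order follows; the label list `size_order` is an assumption about how the categories are written and should be checked against the actual values in `no_employees`.

```python
import pandas as pd

# Assumed category labels, listed from smallest to largest company size
size_order = ["1-5", "6-25", "26-100", "100-500", "500-1000", "More than 1000"]

# Hypothetical responses
toy = pd.DataFrame({
    "no_employees": ["26-100", "1-5", "More than 1000", "6-25", "26-100"],
})

# Convert to an ordered categorical so sorting follows company size
toy["no_employees"] = pd.Categorical(
    toy["no_employees"], categories=size_order, ordered=True
)

# Counts per size band, including bands with zero respondents, in size order
size_counts = toy["no_employees"].value_counts().sort_index()

print(size_counts)
```

The same list can be passed to seaborn's `countplot` through its `order` parameter so that the bars appear from the smallest to the largest company size.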

In [None]:
data.groupby(by=['treatment', 'Gender'])['no_employees'].agg('count')

In [None]:
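# Count respondents for every (treatment, gender) pair from the cleaned data
pair_counts = data.groupby(by=['treatment', 'Gender']).size()
genders = ['male', 'female', 'trans']

subgroup_names=['Treatment.Male', 'Treatment.Female', 'Treatment.trans',
                'No Treatment.Male','No Treatment.Female','No Treatment.trans']
subgroup_size = [pair_counts.get((t, g), 0) for t in ['Yes', 'No'] for g in genders]

# Derive the outer ring from the inner ring so both rings always line up
group_names=['Treatment', 'No Treatment']
group_size=[sum(subgroup_size[:3]), sum(subgroup_size[3:])]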

# Initiating a color palette
a, b, c = [plt.cm.Blues, plt.cm.Reds, plt.cm.Greens]

# Initiate a figure and axes instance
fig, ax = plt.subplots(figsize=(12, 7))
ax.axis('equal')

# Plot the outer ring
mypie, _ = ax.pie(group_size, radius=1.3, labels=group_names,
                  colors=['yellowgreen', 'gold'])
plt.setp(obj=mypie, width=0.3, edgecolor='white')

# Plot the inner ring
mypie2, _ = ax.pie(subgroup_size, radius=1.3-0.3, labels=subgroup_names,
                   labeldistance=0.7, colors=[a(0.5), b(0.4), c(0.5),
                                              a(0.5), b(0.4), c(0.5)])
plt.setp(obj=mypie2, width=0.4, edgecolor='white')

plt.title('Donut plot')

# Output the graph
plt.show()

<a name = Section68></a>
**<h4>Question:** Which countries are the most prominent contributors to mental health data?</h4>

In [None]:
# Get top 10 common countries name and frequency
country_count = Counter(data['Country'].dropna().tolist()).most_common(10)
country_idx = [country[0] for country in country_count]
country_val = [country[1] for country in country_count]

# Initiate an empty figure
fig = plt.figure(figsize=[15, 7])

# Plot the barplot figure
sns.barplot(x=country_val, y=country_idx)

# Add some cosmetics
plt.title(label='Top 10 Countries by Number of Survey Respondents', size=16)
plt.xlabel(xlabel='Frequency', size=14)
plt.ylabel(ylabel='Country', size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.grid(True)

# Output the figure
plt.show()

**Observation:**

- We can observe that the **US contributed the most**, with **751 respondents**, so next we explore the US states.

<a name = Section69></a>
**<h4>Question:** Which U.S. states have the highest representation in mental health-related responses?</h4>

In [None]:
# Extract states data of US
usa_data = data[data['Country']=='United States']
frequency = usa_data['state'].value_counts()[0:10].values
labels = usa_data['state'].value_counts()[0:10].index

# Initiate an empty figure
fig = plt.figure(figsize=[15, 7])

# Plot the barplot figure
sns.barplot(x=frequency, y=labels)

# Add some cosmetics
plt.title(label='Top 10 US States by Number of Survey Respondents', size=16)
plt.xlabel(xlabel='Frequency', size=14)
plt.ylabel(ylabel='State', size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.grid(True)

# Output the figure
plt.show()

**Observation:**

- **California** is the state that **contributed** the **most** to the survey, with 138 respondents.

<a name = Section610></a>
**<h4>Question:** Among all countries, which three have the highest contribution to mental health data?</h4>

In [None]:
# Create a new dataframe based on top 3 countries
countries = pd.concat([data.loc[data['Country']=='Canada'],
                       data.loc[data['Country']=='United States'],
                       data.loc[data['Country']=='United Kingdom']]).reset_index(drop=True)

# Display the results
print('The number of people from the top 3 countries is: ', countries.shape[0])
print('Their proportion from total people surveyed is ',
      np.round(countries.shape[0]/data.shape[0], decimals=2))
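
The cell above hard-codes the three country names. As a hedged alternative, the top three can be derived from the data itself; the sketch below uses a hypothetical toy frame with made-up counts.

```python
import pandas as pd

# Hypothetical respondents per country
toy = pd.DataFrame({
    "Country": ["United States"] * 4 + ["United Kingdom"] * 3
               + ["Canada"] * 2 + ["Germany"],
})

# Names of the three most frequent countries
top3 = toy["Country"].value_counts().nlargest(3).index.tolist()

# Keep only respondents from those countries
subset = toy[toy["Country"].isin(top3)].reset_index(drop=True)

# Share of all respondents covered by the top three
share = subset.shape[0] / toy.shape[0]

print(top3, round(share, 2))
```

Deriving the list this way keeps the subset correct if the data is refreshed or filtered differently.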

<a name = Section611></a>
**<h4>Question:** What is the distribution of work interference levels among employees in the top three contributing countries?</h4>

In [None]:
# Initiate an empty figure
fig = plt.figure(figsize=[15, 7])

# Plot the countplot figure of top 3 countries
sns.countplot(x='work_interfere', data=countries)

# Add some cosmetics
plt.title(label='Frequency Distribution of Work Interference Among Employees for Top 3 Countries', size=16)
plt.xlabel(xlabel='Work Interference', size=14)
plt.ylabel(ylabel='Frequency', size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.grid(True)

# Output the figure
plt.show()

In [None]:
work_sum = countries['work_interfere'].value_counts().reset_index()
work_sum.columns = ['interference', 'count']  # Rename columns for clarity

# Convert counts to integers just in case
work_sum['count'] = work_sum['count'].astype(int)

# Extract counts based on their labels
more_than_never = work_sum.loc[work_sum['interference'].isin(['Rarely', 'Sometimes', 'Often']), 'count'].sum()

often_count = work_sum.loc[work_sum['interference'] == 'Often', 'count'].values[0]

print('{:.1%} believe that their mental health condition interferes with their work'.format(
    more_than_never / countries.shape[0]))

print('With {} ({:.1%}) people saying it interferes often'.format(
    often_count, often_count / countries.shape[0]))



<a name = Section612></a>
**<h4>Question:** How many individuals from the top three countries have reported undergoing treatment?</h4>

In [None]:
# Count 'Yes' answers explicitly instead of relying on the order of value_counts
sought_count = (countries['treatment'] == 'Yes').sum()

print('Luckily {} ({:.1%}) have sought treatment for their mental health issues'.format(
    sought_count, sought_count / countries.shape[0]))

<a name = Section613></a>
**<h4>Question:** What is the gender-wise breakdown of individuals seeking treatment in the top three countries?</h4>

In [None]:
male = countries.loc[countries['Gender'] == 'male']
male_treatment = male.loc[male['treatment'] == 'Yes'].reset_index(drop=True)

female = countries.loc[countries['Gender'] == 'female']
female_treatment = female.loc[female['treatment'] == 'Yes'].reset_index(drop=True)

trans = countries.loc[countries['Gender'] == 'trans']
trans_treatment = trans.loc[trans['treatment'] == 'Yes'].reset_index(drop=True)

print('Out of {} males surveyed, {} ({:.1%}) sought treatment'
      .format(male.shape[0],
              male_treatment.shape[0],
              male_treatment.shape[0]/male.shape[0]))
print('Out of {} females surveyed, {} ({:.1%}) sought treatment'
      .format(female.shape[0],
              female_treatment.shape[0],
              female_treatment.shape[0]/female.shape[0]))
print('Out of {} people who identify as anything other than male or female surveyed, {} ({:.1%}) sought treatment'
      .format(trans.shape[0],
              trans_treatment.shape[0],
              trans_treatment.shape[0]/trans.shape[0]))


<a name = Section614></a>
**<h4>Question:** How are perceived consequences of mental health issues related to general attitudes toward them?</h4>

In [None]:
def attitude(x):
  """A custom function to map values in a feature."""

  if x == 'No':
    return 'Positive'
  elif x == 'Yes':
    return 'Negative'
  elif x == 'Maybe':
    return 'Moderate'
  else:
    return x
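
The same mapping can be expressed with a dictionary lookup, which some readers may find easier to extend. This is a sketch on a hypothetical series; `attitude_map` is a name introduced here for illustration.

```python
import pandas as pd

# Dictionary version of the mapping defined in the function above
attitude_map = {"No": "Positive", "Yes": "Negative", "Maybe": "Moderate"}

# Hypothetical responses, including one value outside the mapping
responses = pd.Series(["No", "Yes", "Maybe", "Unsure"])

# Look up each value and fall back to the original when it is not in the map
attitudes = responses.map(lambda x: attitude_map.get(x, x))

print(attitudes.tolist())  # ['Positive', 'Negative', 'Moderate', 'Unsure']
```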

In [None]:
# Creating a new feature from mental health consequences
data['attitudes'] = data['mental_health_consequence'].apply(attitude)

# Initialize a figure of size 15 X 7
figure = plt.figure(figsize=[15, 7])

# Plot frequency of each category in Attitude
sns.countplot(y='attitudes', data=data)

# Add some cosmetics
plt.title(label='Attitude Concerning Mental Health Consequences', size=16)
plt.xlabel(xlabel='Frequency', size=14)
plt.ylabel(ylabel='Attitude', size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.grid(True)

# Display the plot
plt.show()

<a name = Section7></a>

---
# **7. Summarization**
---

- **<h4>Conclusion</h4>**

  - The mental health survey has **helped** us to **understand** the **mental condition of employees** working in tech firms across countries.

  - A total of **1259 entries were recorded** during the survey out of which **1007 were recorded** from the **top 3 countries**.

  - The **United States leads the chart** in terms of participation in the survey **followed by** the **United Kingdom** and **Canada**.

  - From a **state point of view**, **California leads the chart** in the analysis.

  - **48.1%** of **males**, **70%** of **females**, and **88%** of **trans** respondents were found to have **sought treatment** across the overall survey.

  - The following **parameters** were found to be most closely **related to mental health** and treatment-seeking:
    - Age
    - Family history
    - Work interference
    - Number of employees working in a company


-  **<h4>Actionable Insights</h4>**

  - There should be an **awareness program** about mental health and its effects.

  - **Managers** **should be supportive** and offer the right guidance to their employees.

  - Managers should be **unbiased** concerning the work and the employees.

  - There should be **appropriate measures** and **support** for employees suffering from mental health issues.

  - It is **good to give** **appreciation** at work **regularly**.