In [1]:
# Ensure your environment is activated and MySQL server is running

import pandas as pd
import pymysql
import getpass
from sqlalchemy import create_engine
from dotenv import dotenv_values


In [2]:
# Load the MySQL password from a local .env file
# (the file must define a key named "password"; keep .env out of version control)
config = dotenv_values(".env")


# SQLAlchemy connection
engine = create_engine(f"mysql+pymysql://root:{config['password']}@127.0.0.1/remote_work_db")

try:
    # Sample query to retrieve data
    query = "SELECT * FROM remote_work_mental_health LIMIT 10;"
    df = pd.read_sql(query, engine)
    display(df)
except Exception as e:
    print(f"Error: {e}")


Unnamed: 0,Employee_ID,Age,Gender,Job_Role,Industry,Years_of_Experience,Work_Location,Hours_Worked_Per_Week,Number_of_Virtual_Meetings,Work_Life_Balance_Rating,...,Access_to_Mental_Health_Resources,Productivity_Change,Social_Isolation_Rating,Satisfaction_with_Remote_Work,Company_Support_for_Remote_Work,Physical_Activity,Sleep_Quality,Region,Job_Role_ID,Department
0,EMP0001,32,Non-binary,HR,Healthcare,13,Hybrid,47,7,2,...,No,Decrease,1,Unsatisfied,1,Weekly,Good,Unknown,1,Unknown
1,EMP0002,40,Female,Data Scientist,IT,3,Remote,52,4,1,...,No,Increase,3,Satisfied,2,Weekly,Good,Unknown,2,Unknown
2,EMP0003,59,Non-binary,Software Engineer,Education,22,Hybrid,46,11,5,...,No,No Change,4,Unsatisfied,5,,Poor,Unknown,3,Unknown
3,EMP0004,27,Male,Software Engineer,Finance,20,Onsite,32,8,4,...,Yes,Increase,3,Unsatisfied,3,,Poor,Unknown,3,Unknown
4,EMP0005,49,Male,Sales,Consulting,32,Onsite,35,12,2,...,Yes,Decrease,3,Unsatisfied,3,Weekly,Average,Unknown,4,Unknown
5,EMP0006,59,Non-binary,Sales,IT,31,Hybrid,39,3,4,...,No,Increase,5,Unsatisfied,1,,Average,Unknown,4,Unknown
6,EMP0007,31,Prefer not to say,Sales,IT,24,Remote,51,7,3,...,Yes,Decrease,5,Neutral,3,Daily,Poor,Unknown,4,Unknown
7,EMP0008,42,Non-binary,Data Scientist,Manufacturing,6,Onsite,54,7,3,...,No,Decrease,5,Satisfied,4,,Average,Unknown,2,Unknown
8,EMP0009,56,Prefer not to say,Data Scientist,Healthcare,9,Hybrid,24,4,2,...,Yes,Decrease,2,Unsatisfied,4,Daily,Poor,Unknown,2,Unknown
9,EMP0010,30,Female,HR,IT,28,Hybrid,57,6,1,...,Yes,Decrease,2,Neutral,1,Weekly,Poor,Unknown,1,Unknown


1. Sample Data (First 10 Rows)

The first 10 rows show what each record looks like: employee attributes such as Age, Gender, Job Role, Industry, and Work Location, along with key indicators like Stress Level and Mental Health Condition (these two fall in the columns truncated from the display above).

Key observations:

The sample includes diverse job roles like HR, Data Scientist, Software Engineer, and Sales.
Employees are from multiple industries: IT, Healthcare, Education, Finance, etc.
Work locations vary between Hybrid, Remote, and Onsite.
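
The connection cell above interpolates the password straight into the URL, which can break when the password contains characters such as `@`, `:`, or `/`. Below is a minimal sketch of one way to guard against that, using `quote_plus` from the standard library. `build_mysql_url` is a hypothetical helper and the password is made up; the user, host, and database name are the ones used above.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Build a SQLAlchemy-style MySQL URL with the credentials percent-encoded."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


# Made-up password containing characters that are special inside a URL
url = build_mysql_url("root", "p@ss:word/1", "127.0.0.1", "remote_work_db")
print(url)
```

The resulting string can be passed to `create_engine` in place of the f-string used in the notebook.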


In [3]:
# Query to find null values in key columns
query_2 = """
SELECT
    SUM(CASE WHEN Age IS NULL THEN 1 ELSE 0 END) AS Age_Null,
    SUM(CASE WHEN Gender IS NULL THEN 1 ELSE 0 END) AS Gender_Null,
    SUM(CASE WHEN Job_Role IS NULL THEN 1 ELSE 0 END) AS Job_Role_Null,
    SUM(CASE WHEN Industry IS NULL THEN 1 ELSE 0 END) AS Industry_Null,
    SUM(CASE WHEN Work_Location IS NULL THEN 1 ELSE 0 END) AS Work_Location_Null,
    SUM(CASE WHEN Stress_Level IS NULL THEN 1 ELSE 0 END) AS Stress_Level_Null,
    SUM(CASE WHEN Mental_Health_Condition IS NULL THEN 1 ELSE 0 END) AS Mental_Health_Null
FROM remote_work_mental_health;
"""
df_2 = pd.read_sql(query_2, engine)

# Results
display(df_2)


Unnamed: 0,Age_Null,Gender_Null,Job_Role_Null,Industry_Null,Work_Location_Null,Stress_Level_Null,Mental_Health_Null
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


2. Null Value Analysis

We checked the key columns to see if there were any missing values in Age, Gender, Job Role, Industry, Work Location, Stress Level, and Mental Health Condition.

Result:

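No null values were found in the seven selected columns, so these columns can be used directly in the analysis that follows. The other columns were not checked, and an IS NULL test does not catch empty strings or placeholder values such as Unknown, which appear in the sample rows above (Region and Department, plus blank Physical_Activity cells).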
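
An `IS NULL` check counts only true SQL NULLs. The sketch below shows how a blank string slips past it, using an in-memory SQLite table as a stand-in for the MySQL database. The table name `demo` and its four rows are made up for illustration and are not taken from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Employee_ID TEXT, Physical_Activity TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("A", "Weekly"),
        ("B", None),   # a true NULL
        ("C", ""),     # an empty string, which IS NULL does not match
        ("D", "Daily"),
    ],
)

null_count, blank_count = conn.execute(
    """
    SELECT
        SUM(CASE WHEN Physical_Activity IS NULL THEN 1 ELSE 0 END),
        SUM(CASE WHEN TRIM(Physical_Activity) = '' THEN 1 ELSE 0 END)
    FROM demo
    """
).fetchone()

print(f"NULL values: {null_count}, blank strings: {blank_count}")
```

Adding a second `CASE` expression like this to each column in the null check would show whether the blanks seen in the sample rows are NULLs or empty strings.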


In [4]:
# Query to calculate the average hours worked
query_3 = "SELECT AVG(Hours_Worked_Per_Week) AS Average_Hours_Worked FROM remote_work_mental_health;"
df_3 = pd.read_sql(query_3, engine)

# The average hours worked
display(df_3)


Unnamed: 0,Average_Hours_Worked
0,39.6146


3. Average Hours Worked per Week

The average number of hours worked per week across all employees is 39.6 hours.

Key Insight:

The average work week aligns closely with a standard 40-hour week, but variations in hours worked might still impact employee mental health, which we will analyze further.
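
Since variation in hours may matter more than the mean, it helps to look at the spread as well. The sketch below uses only the `Hours_Worked_Per_Week` values from the ten sample rows displayed in section 1, so its numbers describe that small sample and not the full table.

```python
import statistics

# Hours_Worked_Per_Week for EMP0001 through EMP0010, copied from the sample output
sample_hours = [47, 52, 46, 32, 35, 39, 51, 54, 24, 57]

mean_hours = statistics.mean(sample_hours)
median_hours = statistics.median(sample_hours)
shortest, longest = min(sample_hours), max(sample_hours)

print(f"mean={mean_hours}, median={median_hours}, range={shortest}-{longest}")
```

On the full table, the same idea translates to adding `MIN`, `MAX`, and `STDDEV` of `Hours_Worked_Per_Week` next to `AVG` in the query.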

In [5]:
# Query to group by work location
query_4 = """
SELECT Work_Location, COUNT(*) AS Count
FROM remote_work_mental_health
GROUP BY Work_Location;
"""
df_4 = pd.read_sql(query_4, engine)

# Display the work location count
display(df_4)


Unnamed: 0,Work_Location,Count
0,Hybrid,1649
1,Remote,1714
2,Onsite,1637


4. Distribution of Employees by Work Location

The 5,000 employees are split almost evenly among Remote (1,714 employees), Hybrid (1,649 employees), and Onsite (1,637 employees).

Key Insight:

There’s a balanced distribution across work locations, allowing for a fair comparison of mental health and productivity metrics across these groups.


In [6]:
# Query to group by work location and mental health condition
query_5 = """
SELECT Work_Location, Mental_Health_Condition, COUNT(*) AS Count
FROM remote_work_mental_health
GROUP BY Work_Location, Mental_Health_Condition;
"""
df_5 = pd.read_sql(query_5, engine)

# Results
display(df_5)


Unnamed: 0,Work_Location,Mental_Health_Condition,Count
0,Hybrid,Depression,421
1,Remote,Anxiety,443
2,Hybrid,Anxiety,428
3,Onsite,Depression,412
4,Onsite,,376
5,Hybrid,,400
6,Remote,,420
7,Remote,Depression,413
8,Onsite,Burnout,442
9,Hybrid,Burnout,400


5. Work Location and Mental Health Condition

We explored how work location correlates with mental health conditions such as Depression, Anxiety, Burnout, and None.

Key Results:

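Hybrid workers show similar numbers for depression (421), anxiety (428), and burnout (400).
Remote workers have the highest anxiety count shown (443) and a slightly lower depression count (413).
Onsite workers have the highest burnout count shown (442), while their depression count (412) is the lowest of the three locations.
Key Insight:

Mental health outcomes differ only modestly by work location. Hybrid workers have the highest depression count, remote workers the highest anxiety count, and onsite workers the highest burnout count among the rows shown. These are raw counts from groups of slightly different sizes, so they describe an association at most and do not show that work location causes these outcomes.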


In [7]:
# Query to group by stress level and work location
query_6 = """
SELECT Work_Location, Stress_Level, COUNT(*) AS Count
FROM remote_work_mental_health
GROUP BY Work_Location, Stress_Level;
"""
df_6 = pd.read_sql(query_6, engine)

# The results
display(df_6)


Unnamed: 0,Work_Location,Stress_Level,Count
0,Hybrid,Medium,545
1,Remote,Medium,577
2,Onsite,High,535
3,Hybrid,High,561
4,Remote,Low,547
5,Onsite,Medium,547
6,Hybrid,Low,543
7,Remote,High,590
8,Onsite,Low,555


6. Work Location and Stress Level

We grouped employees by their Stress Level across work locations.

Key Results:

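Low stress is most common among onsite workers (555 of 1,637, about 33.9%), ahead of hybrid workers (543 of 1,649, about 32.9%) and remote workers (547 of 1,714, about 31.9%).
Remote workers have the largest high-stress group (590, about 34.4%), followed by hybrid (561, about 34.0%) and onsite (535, about 32.7%) workers.
Key Insight:

These counts do not show remote work going with lower stress: remote workers have the largest high-stress share and the smallest low-stress share. All differences are within about two percentage points, so stress levels look broadly similar across work locations.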
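
Because the three locations differ in size, raw counts are easier to compare once converted to within-location percentages. The sketch below rebuilds `df_6` from the output above instead of querying the database, then pivots it so that each row is a location and each column a stress level.

```python
import pandas as pd

# Counts copied from the df_6 output above
df_6 = pd.DataFrame(
    {
        "Work_Location": ["Hybrid", "Remote", "Onsite", "Hybrid", "Remote",
                          "Onsite", "Hybrid", "Remote", "Onsite"],
        "Stress_Level": ["Medium", "Medium", "High", "High", "Low",
                         "Medium", "Low", "High", "Low"],
        "Count": [545, 577, 535, 561, 547, 547, 543, 590, 555],
    }
)

# One row per location, one column per stress level
counts = df_6.pivot(index="Work_Location", columns="Stress_Level", values="Count")

# Divide each row by its total to get the share of each stress level per location
shares = counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)

print(shares[["Low", "Medium", "High"]])
```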

In [8]:
# Query to group by productivity change and work location
query_7 = """
SELECT Work_Location, Productivity_Change, COUNT(*) AS Count
FROM remote_work_mental_health
GROUP BY Work_Location, Productivity_Change;
"""
df_7 = pd.read_sql(query_7, engine)

# The results
display(df_7)


Unnamed: 0,Work_Location,Productivity_Change,Count
0,Hybrid,Decrease,591
1,Remote,Increase,558
2,Hybrid,No Change,544
3,Onsite,Increase,514
4,Onsite,Decrease,558
5,Hybrid,Increase,514
6,Remote,Decrease,588
7,Remote,No Change,568
8,Onsite,No Change,565


7. Work Location and Productivity Change

We looked at how work location relates to Productivity Change.

Key Results:

Remote workers show the most cases of increased productivity (558, about 32.6%), compared with 514 each for onsite (about 31.4%) and hybrid (about 31.2%) workers.
Among onsite workers, no change is the most common response (565), while hybrid workers have the most cases of a decrease in productivity (591, about 35.8%).
Key Insight:

Remote workers report increases slightly more often than the other groups, but decreases (588) still outnumber increases (558) among them, as they do in every location. Hybrid workers show the largest gap between decreases and increases.
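
One compact way to summarise this table is a net balance per location: the share reporting an increase minus the share reporting a decrease. This metric is our own illustration and not part of the original analysis; the counts are copied from the output above.

```python
# Counts copied from the df_7 output above
counts = {
    "Hybrid": {"Increase": 514, "No Change": 544, "Decrease": 591},
    "Remote": {"Increase": 558, "No Change": 568, "Decrease": 588},
    "Onsite": {"Increase": 514, "No Change": 565, "Decrease": 558},
}

net_balance = {}
for location, by_change in counts.items():
    total = sum(by_change.values())
    # Percentage-point gap between "Increase" and "Decrease" within the location
    net_balance[location] = (by_change["Increase"] - by_change["Decrease"]) / total * 100

for location, value in sorted(net_balance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{location}: {value:+.1f} percentage points")
```

A negative value means decreases outnumber increases in that location.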


In [10]:
# Stress Levels by Job Role
stress_levels = pd.read_sql("""
    SELECT jr.Job_Role_Name, rw.Stress_Level, COUNT(*) AS num_employees
    FROM remote_work_mental_health rw
    JOIN job_roles jr ON rw.Job_Role_ID = jr.Job_Role_ID
    GROUP BY jr.Job_Role_Name, rw.Stress_Level;
""", engine)
display(stress_levels)

Unnamed: 0,Job_Role_Name,Stress_Level,num_employees
0,HR,Medium,248
1,HR,Low,241
2,HR,High,227
3,Data Scientist,Medium,224
4,Data Scientist,High,242
5,Data Scientist,Low,230
6,Software Engineer,Medium,252
7,Software Engineer,High,234
8,Software Engineer,Low,225
9,Sales,High,253


8. Stress Levels by Job Role:

Key Result: Only the first 10 result rows are displayed. Among them, Sales has the largest high-stress count (253), followed by Data Scientist (242), Software Engineer (234), and HR (227). Within HR, medium (248) and low (241) stress are both more common than high stress (227).

Key Insight: The displayed counts do not single out HR as a high-stress role, and the gaps between roles are small. Sales stands out slightly, possibly due to the nature of its responsibilities, but the remaining Sales rows and the rows for Marketing, Designer, and Project Manager are cut off here. Before targeting stress management programs at specific roles, the share of high-stress employees per role should be compared across the full result.
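
Comparing roles is easier with one row per role and the share of high-stress employees, which conditional aggregation can produce in a single query. The sketch below runs against an in-memory SQLite database as a stand-in for MySQL. The table and column names follow the notebook, but the eight employee rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_roles (Job_Role_ID INTEGER PRIMARY KEY, Job_Role_Name TEXT);
    CREATE TABLE remote_work_mental_health (
        Employee_ID TEXT, Job_Role_ID INTEGER, Stress_Level TEXT
    );
    INSERT INTO job_roles VALUES (1, 'HR'), (4, 'Sales');
    -- Hypothetical rows, not taken from the dataset
    INSERT INTO remote_work_mental_health VALUES
        ('X1', 1, 'High'), ('X2', 1, 'Low'), ('X3', 1, 'Medium'), ('X4', 1, 'Low'),
        ('X5', 4, 'High'), ('X6', 4, 'High'), ('X7', 4, 'Low'), ('X8', 4, 'Medium');
    """
)

rows = conn.execute(
    """
    SELECT jr.Job_Role_Name,
           COUNT(*) AS num_employees,
           AVG(CASE WHEN rw.Stress_Level = 'High' THEN 1.0 ELSE 0.0 END) AS high_share
    FROM remote_work_mental_health rw
    JOIN job_roles jr ON rw.Job_Role_ID = jr.Job_Role_ID
    GROUP BY jr.Job_Role_Name
    ORDER BY high_share DESC
    """
).fetchall()

for role, n, high_share in rows:
    print(f"{role}: {n} employees, {high_share:.0%} high stress")
```

The same `SELECT` should work unchanged through `pd.read_sql` against the MySQL engine, since it uses only standard SQL.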


In [11]:
# Productivity Change by Work Location
productivity_change = pd.read_sql("""
    SELECT rw.Work_Location, rw.Productivity_Change, COUNT(*) AS num_employees
    FROM remote_work_mental_health rw
    GROUP BY rw.Work_Location, rw.Productivity_Change;
""", engine)

display(productivity_change)

Unnamed: 0,Work_Location,Productivity_Change,num_employees
0,Hybrid,Decrease,591
1,Remote,Increase,558
2,Hybrid,No Change,544
3,Onsite,Increase,514
4,Onsite,Decrease,558
5,Hybrid,Increase,514
6,Remote,Decrease,588
7,Remote,No Change,568
8,Onsite,No Change,565


9. Productivity Change by Work Location:
    
Key Result: Remote workers report the most cases of increased productivity (558), while hybrid workers have the most cases of decreased productivity (591). Among onsite workers, no change is the most common response (565).

Key Insight: Remote workers report productivity gains slightly more often than the other groups, although decreases outnumber increases in every location, including remote (588 vs. 558). Hybrid work shows the widest gap (591 decreases vs. 514 increases), so businesses may need to explore ways to better support hybrid workers to prevent a decline in productivity.


In [12]:
# Work-Life Balance by Work Location and Job Role
work_life_balance = pd.read_sql("""
    SELECT rw.Work_Location, rw.Work_Life_Balance_Rating, jr.Job_Role_Name, COUNT(*) AS num_employees
    FROM remote_work_mental_health rw
    JOIN job_roles jr ON rw.Job_Role_ID = jr.Job_Role_ID
    GROUP BY rw.Work_Location, rw.Work_Life_Balance_Rating, jr.Job_Role_Name;
""", engine)

display(work_life_balance)

Unnamed: 0,Work_Location,Work_Life_Balance_Rating,Job_Role_Name,num_employees
0,Hybrid,2,HR,41
1,Hybrid,1,HR,53
2,Onsite,2,HR,50
3,Remote,3,HR,46
4,Onsite,4,HR,42
...,...,...,...,...
100,Remote,5,Project Manager,39
101,Hybrid,2,Project Manager,48
102,Hybrid,1,Project Manager,48
103,Remote,4,Project Manager,55


10. Work-Life Balance by Work Location and Job Role:
    
Key Result: Work-life balance ratings vary with work location and job role. In the full result, remote workers tend to have better work-life balance, while onsite and hybrid workers, especially those in HR and Project Management, report lower ratings. The display above shows only 9 of the 104 result rows, so this pattern cannot be confirmed from the visible counts alone.

Key Insight: The flexibility of remote work seems to contribute to better work-life balance, while roles that require more in-person presence or hybrid arrangements (like HR and Project Managers) may experience greater challenges in maintaining a good work-life balance.
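
A result with one row per location, rating, and role is hard to read at a glance. Averaging the rating per location gives a more direct comparison. The sketch below uses an in-memory SQLite table (name `sample` is our own) holding only the ten sample rows shown in section 1, so the averages describe that tiny sample and not the dataset.

```python
import sqlite3

# (Employee_ID, Work_Location, Work_Life_Balance_Rating) from the ten sample rows
sample_rows = [
    ("EMP0001", "Hybrid", 2), ("EMP0002", "Remote", 1), ("EMP0003", "Hybrid", 5),
    ("EMP0004", "Onsite", 4), ("EMP0005", "Onsite", 2), ("EMP0006", "Hybrid", 4),
    ("EMP0007", "Remote", 3), ("EMP0008", "Onsite", 3), ("EMP0009", "Hybrid", 2),
    ("EMP0010", "Hybrid", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (Employee_ID TEXT, Work_Location TEXT, "
    "Work_Life_Balance_Rating INTEGER)"
)
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", sample_rows)

avg_rating = conn.execute(
    """
    SELECT Work_Location, COUNT(*) AS n, AVG(Work_Life_Balance_Rating) AS avg_rating
    FROM sample
    GROUP BY Work_Location
    ORDER BY Work_Location
    """
).fetchall()

for location, n, rating in avg_rating:
    print(f"{location}: n={n}, average rating={rating:.1f}")
```

Running the same `AVG ... GROUP BY` against `remote_work_mental_health` (optionally grouped by job role as well) would test the remote-versus-onsite claim on all 5,000 employees.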


In [13]:
# Mental Health Conditions by Job Role and Work Location
mental_health_conditions = pd.read_sql("""
    SELECT rw.Work_Location, rw.Mental_Health_Condition, jr.Job_Role_Name, COUNT(*) AS num_employees
    FROM remote_work_mental_health rw
    JOIN job_roles jr ON rw.Job_Role_ID = jr.Job_Role_ID
    GROUP BY rw.Work_Location, rw.Mental_Health_Condition, jr.Job_Role_Name;
""", engine)

display(mental_health_conditions)

Unnamed: 0,Work_Location,Mental_Health_Condition,Job_Role_Name,num_employees
0,Hybrid,Depression,HR,65
1,Onsite,,HR,45
2,Remote,Anxiety,HR,73
3,Onsite,Anxiety,HR,65
4,Onsite,Depression,HR,53
...,...,...,...,...
79,Hybrid,Burnout,Project Manager,45
80,Onsite,Anxiety,Project Manager,62
81,Remote,Burnout,Project Manager,69
82,Hybrid,,Project Manager,58


11. Mental Health Conditions by Work Location and Job Role:

Key Result: In the location totals from section 5, remote workers have the highest anxiety count (443) and hybrid workers the highest depression count (421), although anxiety (428) is still slightly more common than depression among hybrid workers. In the rows displayed here, HR counts range from 45 to 73 and Project Manager counts from 45 to 69.

Key Insight: The distribution of mental health conditions varies slightly with the work environment, with remote workers reporting somewhat more anxiety and hybrid workers somewhat more depression. Whether HR professionals, who are responsible for employee well-being, are at higher risk than other roles cannot be judged from the truncated output; if the full result confirms it, that would highlight the need for tailored support in their roles.


In [14]:
# Average Hours Worked per Week by Job Role
avg_hours = pd.read_sql("""
    SELECT jr.Job_Role_Name, AVG(rw.Hours_Worked_Per_Week) AS avg_hours
    FROM remote_work_mental_health rw
    JOIN job_roles jr ON rw.Job_Role_ID = jr.Job_Role_ID
    GROUP BY jr.Job_Role_Name;
""", engine)

display(avg_hours)

Unnamed: 0,Job_Role_Name,avg_hours
0,HR,39.6606
1,Data Scientist,38.954
2,Software Engineer,40.2714
3,Sales,39.8608
4,Marketing,39.735
5,Designer,38.8811
6,Project Manager,39.9228


12. Average Hours Worked per Week by Job Role:

Key Result: Software Engineers (40.27) and Project Managers (39.92) have the highest average working hours per week, while Designers (38.88) and Data Scientists (38.95) work the least.

Key Insight: All roles average close to 40 hours per week, and the gap between the highest and lowest role is only about 1.4 hours. Software Engineers work slightly more hours on average, but the difference is too small to indicate higher demands without further analysis.
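
To check how far apart the roles really are, the averages can be ranked and the spread computed directly. The values below are copied from the `avg_hours` output above.

```python
# Average weekly hours per role, copied from the avg_hours output above
avg_hours = {
    "HR": 39.6606,
    "Data Scientist": 38.954,
    "Software Engineer": 40.2714,
    "Sales": 39.8608,
    "Marketing": 39.735,
    "Designer": 38.8811,
    "Project Manager": 39.9228,
}

# Rank roles from most to fewest average hours
ranked = sorted(avg_hours.items(), key=lambda item: item[1], reverse=True)
highest, lowest = ranked[0], ranked[-1]
spread = highest[1] - lowest[1]

print(f"Highest: {highest[0]} ({highest[1]:.2f} h)")
print(f"Lowest:  {lowest[0]} ({lowest[1]:.2f} h)")
print(f"Spread:  {spread:.2f} h")
```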


In [15]:
# Stress Level Counts by Work Location
stress_counts = pd.read_sql("""
    SELECT rw.Work_Location, rw.Stress_Level, COUNT(*) AS num_employees
    FROM remote_work_mental_health rw
    GROUP BY rw.Work_Location, rw.Stress_Level;
""", engine)

display(stress_counts)

Unnamed: 0,Work_Location,Stress_Level,num_employees
0,Hybrid,Medium,545
1,Remote,Medium,577
2,Onsite,High,535
3,Hybrid,High,561
4,Remote,Low,547
5,Onsite,Medium,547
6,Hybrid,Low,543
7,Remote,High,590
8,Onsite,Low,555


13. Stress Level Counts by Work Location
    
Key Result:
Onsite workers have the highest number of employees reporting low stress levels (555), followed by remote (547) and hybrid (543) workers.
Remote workers have the highest number reporting high stress levels (590), followed by hybrid workers (561), and also the most medium-stress reports (577).
Onsite workers have a relatively even distribution of stress levels, with low stress being slightly more common (555) than medium (547) or high (535).

Key Insight:
The counts do not show remote work to be associated with lower stress: remote workers have the largest high-stress group, both in count (590) and as a share of their location (about 34.4%). Hybrid workers are spread almost evenly across low, medium, and high stress.

Onsite workers have the smallest high-stress group (535) and the largest low-stress group (555).
The differences between locations are small, so work location alone appears to explain little of the variation in stress, and companies might consider offering stress relief options to employees in all three arrangements.
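
Whether counts this close reflect a real association can be checked with a chi-square test of independence. The sketch below computes the statistic by hand from the nine counts above and compares it with 9.488, the standard 5% critical value for 4 degrees of freedom. The test is our addition and was not part of the original analysis.

```python
# Observed counts copied from the stress_counts output above
observed = {
    "Hybrid": {"Low": 543, "Medium": 545, "High": 561},
    "Remote": {"Low": 547, "Medium": 577, "High": 590},
    "Onsite": {"Low": 555, "Medium": 547, "High": 535},
}
levels = ["Low", "Medium", "High"]

row_totals = {loc: sum(by_level.values()) for loc, by_level in observed.items()}
col_totals = {lvl: sum(observed[loc][lvl] for loc in observed) for lvl in levels}
grand_total = sum(row_totals.values())

chi2 = 0.0
for loc in observed:
    for lvl in levels:
        # Count expected in this cell if stress level were independent of location
        expected = row_totals[loc] * col_totals[lvl] / grand_total
        chi2 += (observed[loc][lvl] - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(levels) - 1)
CRITICAL_5PCT_DF4 = 9.488  # chi-square critical value for alpha = 0.05, 4 dof
significant = chi2 > CRITICAL_5PCT_DF4

print(f"chi2 = {chi2:.2f}, dof = {dof}, significant at 5%: {significant}")
```

The statistic falls well below the critical value, which is consistent with stress level being roughly independent of work location in this dataset.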

# Conclusion

## The SQL analysis reinforces several key points:

Mental Health and Work Location: Mental health outcomes differ only modestly by work location.
Remote workers have the highest anxiety count (443), hybrid workers the highest depression count (421), and onsite workers the highest burnout count among the rows shown (442).

Stress Levels:
Stress levels are similar across locations, with each level accounting for roughly a third of employees in every group.
Remote workers have the largest high-stress group (590), while onsite workers have the largest low-stress group (555).
Hybrid workers are almost evenly split across low (543), medium (545), and high (561) stress, which may reflect the mixed experience of balancing remote and onsite work.

Productivity:
Remote workers report increased productivity slightly more often (558) than hybrid or onsite workers (514 each), but decreases outnumber increases in all three groups.
Hybrid work, while offering flexibility, shows the most reports of decreased productivity (591).

Work-Life Balance:
Hybrid workers reported a wide range of work-life balance ratings, with some experiencing lower satisfaction. Onsite workers also showed more mixed work-life balance results, while remote workers showed more consistency in maintaining a balance.


This analysis addresses the project’s objectives by describing how mental health, stress, work-life balance, and productivity vary with work location. Because it is based on grouped counts, it shows associations and not causes. These findings can still help guide companies in refining their policies to enhance employee well-being and support, especially in adapting work arrangements to individual needs and reducing stress across different work environments.