# Playground Series - Season 4, Episode 11: Exploring Mental Health Data

![Playground Series - Season 4, Episode 11: Exploring Mental Health Data](images/s4e11.png)

Dataset Description

The dataset for this competition (both train and test) was generated from a deep learning model trained on the [Depression Survey/Dataset](https://www.kaggle.com/datasets/sumansharmadataworld/depression-surveydataset-for-analysis) dataset. Feature distributions are close to, but not exactly the same, as the original. Feel free to use the original dataset as part of this competition, both to explore differences as well as to see whether incorporating the original in training improves model performance.

This dataset was collected as part of a comprehensive survey aimed at understanding the factors contributing to depression risk among adults. It was collected during an anonymous survey conducted between January and June 2023. The survey was conducted across various cities, targeting individuals from diverse backgrounds and professions. Participants, ranging from 18 to 60 years old, voluntarily provided inputs on factors such as age, gender, city, degree, job satisfaction, study satisfaction, study/work hours, and family history among others. Participants were asked to provide inputs without requiring any professional mental health assessments or diagnostic test scores.

The target variable, 'Depression', represents whether the individual is at risk of depression, marked as 'Yes' or 'No', based on their responses to lifestyle and demographic factors. The dataset has been curated to provide insights into how everyday factors might correlate with mental health risks, making it a useful resource for machine learning models aimed at mental health prediction.

This dataset can be used for predictive modeling in mental health research, particularly in identifying key contributors to mental health challenges in a non-clinical setting.

Evaluation 

The evaluation metric for this competition is Accuracy Score.


##  Exploratory Data Analysis (EDA) 🔍

### Import Libraries 📚

In [9]:
import os
import kaggle_config
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Applying a custom color scheme to the plots
import urllib.request
url = "https://raw.githubusercontent.com/h4pZ/rose-pine-matplotlib/main/themes/rose-pine-dawn.mplstyle"
if not os.path.exists("/Users/danipopov/Projects/Kaggle_PlayGround/data/rose-pine-dawn.mplstyle"):
    save_path = "/Users/danipopov/Projects/Kaggle_PlayGround/data/rose-pine-dawn.mplstyle"  # Include the file name
    urllib.request.urlretrieve(url, save_path)
plt.style.use(save_path)

### Load Data 🔄

In [12]:
# Set Kaggle API credentials as environment variables
os.environ['KAGGLE_USERNAME'] = kaggle_config.KAGGLE_USERNAME
os.environ['KAGGLE_KEY'] = kaggle_config.KAGGLE_KEY

# Download the data from Kaggle
base_path = "/Users/danipopov/Projects/Kaggle_PlayGround"
kaggle_data_path = f"{base_path}/s4e11.zip"  # Changed path
data_path = f"{base_path}/data/s4e11"

# Create directory if it doesn't exist
! mkdir -p {data_path}

# Download, making sure we're in the correct directory
! cd {base_path} && kaggle competitions download -c playground-series-s4e11 -p .

# Unzip the data
! unzip -o {kaggle_data_path} -d {data_path}

# Remove the zip file
! rm {kaggle_data_path}

# Load the data
train_df = pd.read_csv(f"{data_path}/train.csv")
test_df = pd.read_csv(f"{data_path}/test.csv")
sample_submission_df = pd.read_csv(f"{data_path}/sample_submission.csv")

playground-series-s4e11.zip: Skipping, found more recently modified local copy (use --force to force download)
unzip:  cannot find or open /Users/danipopov/Projects/Kaggle_PlayGround/s4e11.zip, /Users/danipopov/Projects/Kaggle_PlayGround/s4e11.zip.zip or /Users/danipopov/Projects/Kaggle_PlayGround/s4e11.zip.ZIP.
rm: /Users/danipopov/Projects/Kaggle_PlayGround/s4e11.zip: No such file or directory


FileNotFoundError: [Errno 2] No such file or directory: '/Users/danipopov/Projects/Kaggle_PlayGround/data/s4e11/train.csv'