# Bivariate Analysis

In this notebook, we will perform bivariate analysis to explore the relationships between the target variable (stress level) and other features in the dataset. We will create visualizations to help understand these relationships.

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the cleaned dataset
data = pd.read_csv('../data/processed/sleep_cleaned.csv')

# Display the first few rows of the dataset
data.head()

## Relationship between Stress Level and Other Features
We will visualize the relationship between the target variable (stress level) and other features in the dataset.

In [2]:
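# Plot every feature against the target variable
def plot_bivariate_relationships(data, target_variable):
    features = data.columns.drop(target_variable)
    for feature in features:
        plt.figure(figsize=(10, 6))
        if pd.api.types.is_numeric_dtype(data[feature]):
            # Numeric feature: compare its distribution at each target level
            sns.boxplot(x=target_variable, y=feature, data=data)
            plt.ylabel(feature)
        else:
            # Non-numeric feature: a box plot needs a numeric axis, so show counts per category
            sns.countplot(x=target_variable, hue=feature, data=data)
            plt.ylabel('Count')
        plt.title(f'Relationship between {target_variable} and {feature}')
        plt.xlabel(target_variable)
        plt.show()

# Plot bivariate relationships
plot_bivariate_relationships(data, 'stress_level')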
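
The plots can be complemented with a numeric summary, since a box plot essentially draws the median and interquartile range of each group. The following is a minimal sketch, not part of the original analysis: `summarize_by_target` is a hypothetical helper name, and it assumes the dataset contains at least one numeric feature besides the target.

```python
import pandas as pd

def summarize_by_target(data: pd.DataFrame, target_variable: str) -> pd.DataFrame:
    """Return the median and IQR of each numeric feature per target level."""
    # Keep numeric columns only; drop the target itself if it is numeric
    numeric = data.select_dtypes(include='number').columns.drop(
        target_variable, errors='ignore'
    )
    grouped = data.groupby(target_variable)[list(numeric)]
    median = grouped.median()
    # Interquartile range: the height of the box in a box plot
    iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
    # Columns become a two-level index: ('median', feature) and ('iqr', feature)
    return pd.concat({'median': median, 'iqr': iqr}, axis=1)

# Example call, with `data` as loaded in the first cell:
# summarize_by_target(data, 'stress_level')
```

Medians that move steadily from one stress level to the next would support what the box plots show, while large IQRs relative to those shifts would suggest heavy overlap between groups.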

## Observations
After visualizing the relationships, we can make the following observations:
- The distribution of sleep duration shifts across stress levels, suggesting that shorter sleep may be associated with higher stress. This is a visual impression; the plots alone do not establish statistical significance.
- Other lifestyle factors, such as exercise frequency and caffeine consumption, also show visible differences across stress levels.
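
One way to put a number on observations like these is a Spearman rank correlation between each numeric feature and the target. The sketch below is an illustration rather than the notebook's own method: `spearman_with_target` is a hypothetical helper, and it assumes the target column is stored as a number. It relies on the fact that Spearman's rho equals the Pearson correlation of the ranked values.

```python
import pandas as pd

def spearman_with_target(data: pd.DataFrame, target_variable: str) -> pd.Series:
    """Spearman rank correlation of each numeric feature with the target."""
    # Assumption: the target column is numeric, so it survives this selection
    numeric = data.select_dtypes(include='number')
    # Rank every column (ties get their average rank)
    ranks = numeric.rank()
    # Pearson correlation on ranks is Spearman's rho
    rho = ranks.drop(columns=target_variable).corrwith(ranks[target_variable])
    # Most negative associations first, most positive last
    return rho.sort_values()

# Example call, with `data` as loaded in the first cell:
# spearman_with_target(data, 'stress_level')
```

A clearly negative value for the sleep duration column would be consistent with the first observation. A value near zero would suggest the visual difference is weaker than it looks.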

These insights can guide further analysis and model development.