# Univariate Analysis

In this notebook, we will conduct a univariate analysis of the sleep patterns dataset. We will review the distribution of both categorical and numerical variables individually to understand their characteristics.

In [1]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualisation style
sns.set_theme(style='whitegrid')

In [2]:
# Load the cleaned dataset
data = pd.read_csv('../data/processed/sleep_cleaned.csv')

# Display the first few rows of the dataset
data.head()
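The next two cells split the columns by dtype, and the dtype list decides which variables get plotted at all. Below is a minimal sketch using a small hypothetical DataFrame. The column names are illustrative only, not the real dataset's schema. It shows that the hard-coded `['float64', 'int64']` selector silently skips an `int32` column, while `include='number'` matches every integer and float width.

```python
import pandas as pd

# Hypothetical stand-in for the sleep dataset; column names are illustrative only.
df = pd.DataFrame({
    "gender": pd.Series(["Male", "Female", "Female"], dtype="object"),
    "bmi_category": pd.Categorical(["Normal", "Overweight", "Normal"]),
    "age": pd.Series([27, 34, 45], dtype="int32"),
    "sleep_duration": pd.Series([6.1, 7.4, 8.0], dtype="float64"),
})

# Categorical candidates: plain object columns plus pandas 'category' columns.
categorical_cols = list(df.select_dtypes(include=["object", "category"]).columns)

# Numerical columns: 'number' matches every integer and float width.
numerical_cols = list(df.select_dtypes(include="number").columns)

# The narrower selector only matches the exact 64-bit dtypes, so 'age' (int32) is dropped.
narrow_numerical_cols = list(df.select_dtypes(include=["float64", "int64"]).columns)

print(categorical_cols)       # ['gender', 'bmi_category']
print(numerical_cols)         # ['age', 'sleep_duration']
print(narrow_numerical_cols)  # ['sleep_duration']
```

Checking `data.dtypes` on the real file is a quick way to confirm that no column falls through both selectors.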

In [3]:
# Univariate analysis of categorical variables
categorical_vars = data.select_dtypes(include=['object', 'category']).columns

for var in categorical_vars:
    plt.figure(figsize=(10, 5))
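    # Pass the column explicitly as x; recent seaborn versions read the first positional argument as `data`
    sns.countplot(data=data, x=var)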
    plt.title(f'Distribution of {var}')
    plt.xticks(rotation=45)
    plt.show()
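A count plot is easier to interpret alongside the exact numbers behind it. The sketch below builds those numbers with `Series.value_counts`. It reports the count and proportion of each level, and `dropna=False` keeps missing values visible. The helper `frequency_table` is hypothetical, not part of pandas or seaborn, and the example categories are made up.

```python
import pandas as pd


def frequency_table(series: pd.Series) -> pd.DataFrame:
    """Count and proportion of each level of a categorical series, including missing values."""
    counts = series.value_counts(dropna=False)  # sorted from most to least frequent
    # Proportions share the same index as the counts, so no realignment is needed.
    return counts.to_frame("count").assign(proportion=counts / counts.sum())


# Hypothetical example column; the real dataset's categories may differ.
bmi = pd.Series(["Normal", "Normal", "Overweight", "Obese", None], name="bmi_category")
table = frequency_table(bmi)
print(table)
```

In this notebook it could be called as `frequency_table(data[var])` inside the loop above, one table per categorical variable.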

In [4]:
# Univariate analysis of numerical variables
# 'number' matches every integer and float width (e.g. int32), not only float64/int64
numerical_vars = data.select_dtypes(include='number').columns

for var in numerical_vars:
    plt.figure(figsize=(10, 5))
    sns.histplot(data[var], bins=30, kde=True)
    plt.title(f'Distribution of {var}')
    plt.xlabel(var)
    plt.ylabel('Frequency')
    plt.show()
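The histograms show each variable's shape visually. A small numeric table makes centre, spread and skewness comparable across variables. The sketch below is a hypothetical helper, `numeric_summary`, applied to made-up columns chosen so the contrast is obvious. It relies on pandas' `DataFrame.skew`, which returns the bias-adjusted sample skewness. A positive value means a longer right tail, and a mean well above the median points the same way.

```python
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per numerical column: centre, spread, skewness and missing count."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "std": numeric.std(),    # sample standard deviation (ddof=1)
        "skew": numeric.skew(),  # bias-adjusted sample skewness; > 0 means a longer right tail
        "missing": numeric.isna().sum(),
    })


# Hypothetical values, not taken from the sleep dataset.
example = pd.DataFrame({
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "right_skewed": [1.0, 2.0, 3.0, 4.0, 100.0],
})
summary = numeric_summary(example)
print(summary)
```

On the real data this would be `numeric_summary(data)`.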

## Conclusions

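In this univariate analysis, we have visualised the distribution of each categorical and numerical variable on its own.
The count plots show how balanced the categories of each variable are. The histograms show the range, central tendency and shape of each numerical variable, which helps flag potential issues such as skewness or outliers for further investigation. Because each variable is examined in isolation, these plots do not reveal relationships between variables.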
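Outliers can also be counted rather than judged by eye from a histogram. The sketch below uses Tukey's 1.5 × IQR rule, a common convention swapped in here as an assumption; this notebook does not specify an outlier criterion. A value is flagged if it lies below Q1 − k·IQR or above Q3 + k·IQR, with k = 1.5 by default. The helper `iqr_outlier_counts` and the example data are hypothetical.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numerical column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)  # linear interpolation between order statistics
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # lower/upper are indexed by column name, so each comparison applies per column;
    # missing values compare as False and are never flagged.
    outside = (numeric < lower) | (numeric > upper)
    return outside.sum()


# Hypothetical values, not taken from the sleep dataset.
example = pd.DataFrame({
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "right_skewed": [1.0, 2.0, 3.0, 4.0, 100.0],
})
counts = iqr_outlier_counts(example)
print(counts)
```

Here both columns have Q1 = 2 and Q3 = 4, so the fences are −1 and 7, and only the value 100 is flagged. A flagged value is a candidate for inspection, not necessarily an error.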