# 💤 SleepSense — India Edition  
### Predicting Sleep Quality Using Lifestyle and Environmental Factors

---

## Introduction
Sleep quality plays an important role in human productivity, mental health, and overall well-being.  
This project focuses on predicting the **Sleep Quality Score (0–100)** of individuals using measurable features such as:
- Screen time hours  
- Tea/Coffee consumption  
- Physical activity  
- Stress level  
- Environmental conditions (noise, air quality, etc.)

The dataset is specific to **Indian lifestyle patterns and urban environmental factors**, with records from cities such as Thrissur, Kolkata, Varanasi, and Jaipur.


## Problem Statement
The goal is to develop a supervised machine learning model that predicts an individual's **sleep quality score** from lifestyle and environmental variables.

This is treated as a **regression problem** because the target variable `sleep_quality_score` is continuous, ranging from 0 to 100.

## Learning Objectives
- Understand and apply the **Supervised Learning** approach  
- Implement **Linear Regression** using the `sklearn` library  
- Evaluate model performance using **R²** and **Adjusted R²**  
- Apply **Feature Engineering** to improve accuracy  
- Perform **Regularization** using Ridge and Lasso methods  
- Interpret model coefficients to derive meaningful insights  
- Build a **Streamlit app** for real-time prediction
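
Since Adjusted R² is one of the evaluation metrics named above and `sklearn` does not ship a dedicated function for it, a small helper is usually written by hand. The sketch below uses the standard formula, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of features. The function names and the toy values are illustrative assumptions, not taken from the project.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Toy values, made up for illustration only (not from the SleepSense data).
y_true = [40.0, 50.0, 60.0, 70.0, 80.0]
y_pred = [42.0, 48.0, 61.0, 69.0, 80.0]

r2 = r2_score_manual(y_true, y_pred)
adj = adjusted_r2(r2, n_samples=len(y_true), n_features=2)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")
```

Adjusted R² penalizes every extra feature, so it only rises when a new variable improves the fit by more than chance would. That makes it a more honest comparison than plain R² once feature engineering starts adding columns.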


## Project Workflow

1. Data Collection and Understanding  
2. Data Preprocessing and Cleaning  
3. Exploratory Data Analysis (EDA)  
4. Feature Engineering  
5. Model Building using Linear Regression  
6. Model Evaluation (R², Adjusted R², RMSE)  
7. Model Tuning and Regularization (Ridge, Lasso)  
8. Final Model Interpretation and Visualization  
9. Streamlit App Development  
10. Summary and Report Generation
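
Steps 5 to 7 of the workflow above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the three column names are borrowed from the dataset, but the generated values, the coefficients, and the `alpha` settings are assumptions chosen only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (values are made up).
rng = np.random.default_rng(42)
n = 500
demo = pd.DataFrame({
    "screen_time_hours": rng.uniform(0, 8, n),
    "stress_level": rng.uniform(0, 10, n),
    "physical_activity_min": rng.uniform(0, 120, n),
})
# Assumed linear relationship plus noise, clipped to the 0-100 score range.
demo["sleep_quality_score"] = (
    70
    - 2.0 * demo["screen_time_hours"]
    - 1.5 * demo["stress_level"]
    + 0.1 * demo["physical_activity_min"]
    + rng.normal(0, 3, n)
).clip(0, 100)

X = demo.drop(columns="sleep_quality_score")
y = demo["sleep_quality_score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Plain least squares plus the two regularized variants.
# The alpha values are arbitrary; in practice they would be tuned.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }
    print(f"{name:>6}: R2 = {results[name]['r2']:.3f}, "
          f"RMSE = {results[name]['rmse']:.3f}")
```

With real data, features are usually standardized before Ridge or Lasso, because the penalty depends on the scale of each coefficient.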


## Mapping with Learned Concepts

| Concept | Implementation Area |
|----------|--------------------|
| Supervised Learning | Complete project (Regression task) |
| Linear Regression | Core model for prediction |
| R², Adjusted R² | Model evaluation metrics |
| Residual Analysis | Diagnostic check |
| Ridge, Lasso | Regularization and overfitting control |
| Feature Engineering | Creating new meaningful variables |
| Model Deployment | Streamlit-based UI |
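
The residual analysis row in the table above could look something like the following sketch. It fits a linear model on synthetic data (the values and coefficients are assumptions for illustration) and checks two properties that hold for ordinary least squares with an intercept on the training set: the residuals average to zero and are uncorrelated with the fitted values. Systematic patterns in a real residual plot would suggest a missing feature or a non-linear effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(0)
n = 300
stress_level = rng.uniform(0, 10, n)
screen_time_hours = rng.uniform(0, 8, n)
X = np.column_stack([stress_level, screen_time_hours])
y = 75 - 2.0 * stress_level - 1.5 * screen_time_hours + rng.normal(0, 4, n)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted  # observed minus predicted

# Diagnostics on the training data.
resid_mean = residuals.mean()
resid_fitted_corr = np.corrcoef(residuals, fitted)[0, 1]
print(f"mean residual: {resid_mean:.2e}")
print(f"corr(residuals, fitted): {resid_fitted_corr:.2e}")
print(f"residual std: {residuals.std(ddof=1):.2f}")
```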


In [5]:
# Import core libraries (faculty-approved)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print("Basic libraries loaded successfully")


Basic libraries loaded successfully


In [2]:
# Load dataset
df = pd.read_csv('../data/raw/SleepSense_India_Full.csv')
print("Dataset loaded successfully. Shape:", df.shape)
df.head()


Dataset loaded successfully. Shape: (12000, 23)


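| | city | age | sex | family_size | work_hours | avg_sleep_hours | screen_time_hours | tea_cups | coffee_cups | late_snack | ... | physical_activity_min | bedtime_variability | stress_level | city_noise_dB | light_pollution_index | temperature_night | humidity_night | air_quality_index | screen_brightness_behavior | sleep_quality_score |
|---|------|-----|-----|-------------|------------|-----------------|-------------------|----------|-------------|------------|-----|-----------------------|---------------------|--------------|---------------|-----------------------|-------------------|----------------|-------------------|----------------------------|---------------------|
| 0 | Thrissur | 25.3 | Female | 4 | 11.6 | 4.79 | 3.69 | 3 | 1 | 0 | ... | 0.9 | 0.25 | 5.37 | 65.97 | 46.18 | 27.73 | 53.42 | 179.7 | 0.29 | 38.42 |
| 1 | Kolkata | 49.0 | Female | 2 | 12.3 | 5.82 | 2.71 | 0 | 2 | 0 | ... | 0.0 | 0.43 | 3.41 | 57.56 | 57.09 | 21.27 | 50.0 | 144.7 | 0.31 | 48.63 |
| 2 | Varanasi | 16.4 | Female | 4 | 9.0 | 6.15 | 1.21 | 0 | 0 | 1 | ... | 78.9 | 1.57 | 3.99 | 61.37 | 31.77 | 28.02 | 73.11 | 69.4 | 0.73 | 52.9 |
| 3 | Jaipur | 18.9 | Female | 4 | 10.4 | 6.81 | 2.33 | 1 | 3 | 1 | ... | 51.1 | 1.48 | 6.35 | 66.66 | 69.69 | 27.73 | 34.59 | 171.3 | 0.15 | 45.61 |
| 4 | Nanded | 47.1 | Male | 5 | 9.3 | 6.73 | 1.91 | 4 | 0 | 0 | ... | 57.5 | 0.84 | 3.39 | 39.77 | 53.15 | 32.31 | 47.88 | 52.2 | 0.82 | 63.37 |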


## Initial Observation
The dataset contains 12,000 records and 23 columns covering personal, lifestyle, and environmental factors, including the target `sleep_quality_score`.
It will be used to train a regression model that predicts the sleep quality score.
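
A first look at the data would typically check for missing values and separate numeric from categorical columns before preprocessing. The sketch below does this on a small frame rebuilt from five columns of the preview rows shown earlier. The name `sample` and the choice of columns are illustrative assumptions. On the full dataset, the same calls would be made on `df`.

```python
import pandas as pd

# Five preview rows, restricted to a handful of columns for brevity.
sample = pd.DataFrame({
    "city": ["Thrissur", "Kolkata", "Varanasi", "Jaipur", "Nanded"],
    "sex": ["Female", "Female", "Female", "Female", "Male"],
    "screen_time_hours": [3.69, 2.71, 1.21, 2.33, 1.91],
    "stress_level": [5.37, 3.41, 3.99, 6.35, 3.39],
    "sleep_quality_score": [38.42, 48.63, 52.90, 45.61, 63.37],
})

# Count missing values per column.
missing_per_column = sample.isna().sum()

# Split columns by type: categorical ones need encoding before regression.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Summary statistics for the regression target.
target_summary = sample["sleep_quality_score"].describe()

print("Missing values:\n", missing_per_column)
print("Numeric columns:", numeric_cols)
print("Categorical columns:", categorical_cols)
print(target_summary)
```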


In [6]:
# Verify that scikit-learn and the other dependencies import without errors
import sklearn
import pandas, numpy, seaborn, matplotlib
print("All libraries loaded successfully!")


All libraries loaded successfully!
