<a href="https://colab.research.google.com/github/MOHITRAJDEO12345/Early-prediction-of-lifestyle-diseases/blob/main/Early_prediction_of_lifestyle_diseases.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Below are the features most commonly tracked by smartwatches and fitness trackers:

1. Heart Rate (HR)
Description: Continuous monitoring of heart rate in beats per minute (BPM).

Use Case: Detecting anomalies, stress levels, or fitness improvements.

2. Steps Count
Description: Total number of steps taken during the day.

Use Case: Activity level assessment, goal setting, and calorie estimation.

3. Distance Traveled
Description: Total distance covered (in kilometers or miles) based on steps or GPS data.

Use Case: Tracking movement and exercise performance.

4. Calories Burned
Description: Estimated calories burned based on activity level, heart rate, and user profile (age, weight, height).

Use Case: Weight management and fitness tracking.

5. Active Minutes
Description: Time spent in moderate to vigorous physical activity.

Use Case: Measuring exercise intensity and adherence to fitness goals.

6. Sleep Duration
Description: Total time spent sleeping, often broken into light, deep, and REM sleep stages.

Use Case: Sleep quality analysis and health monitoring.

7. Activity Type
Description: Classification of activities (e.g., walking, running, cycling) based on motion and heart rate data.

Use Case: Exercise tracking and performance analysis.

8. Heart Rate Variability (HRV)
Description: Variation in time between heartbeats, often used to measure stress and recovery.

Use Case: Stress monitoring and recovery assessment.

9. Resting Heart Rate (RHR)
Description: Heart rate measured during periods of rest or inactivity.

Use Case: Cardiovascular health monitoring.

10. GPS Data (Location and Route)
Description: Tracks outdoor activities like running or cycling, providing route maps and speed.

Use Case: Performance analysis and route optimization.

11. Floors Climbed
Description: Number of floors climbed, typically estimated from elevation changes measured by a barometric altimeter.

Use Case: Tracking elevation gain and activity diversity.

12. Sedentary Alerts
Description: Notifications to move after prolonged periods of inactivity.

Use Case: Encouraging physical activity and reducing sedentary behavior.

13. Workout Metrics
Description: Data specific to workouts, such as duration, pace, speed, and heart rate zones.

Use Case: Exercise performance analysis and goal tracking.

14. Step Cadence
Description: Number of steps per minute, often used for running or walking analysis.

Use Case: Improving running efficiency and form.

15. Stand Hours
Description: Number of hours in which the user stood up and moved for at least a minute.

Use Case: Encouraging movement throughout the day.
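Several of the features above are derived quantities rather than raw sensor readings. For example, HRV (item 8) is often summarized with the time-domain statistic RMSSD, the root mean square of successive differences between beat-to-beat (RR) intervals. The sketch below assumes the device exports RR intervals in milliseconds, which not every tracker exposes; the function names and sample values are hypothetical.

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """RMSSD: root mean square of successive RR-interval differences, in ms."""
    rr = np.asarray(rr_intervals_ms, dtype=float)
    if rr.size < 2:
        raise ValueError("need at least two RR intervals")
    successive_diffs = np.diff(rr)  # difference between consecutive beats
    return float(np.sqrt(np.mean(successive_diffs ** 2)))

def mean_heart_rate_bpm(rr_intervals_ms):
    """Average heart rate implied by the RR intervals (60,000 ms per minute)."""
    return 60_000.0 / float(np.mean(rr_intervals_ms))

# Hypothetical RR intervals for five consecutive beats
rr = [800, 810, 790, 805, 795]
print(round(rmssd(rr), 2))      # 14.36
print(mean_heart_rate_bpm(rr))  # 75.0
```

Higher RMSSD generally indicates more beat-to-beat variation; trackers usually report it over a fixed period (for example during sleep) so values are comparable across days.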

List 1: Common Lifestyle Inputs
These are additional inputs you can collect from users to complement smartwatch data:

Age: A critical factor in many health conditions.

Gender: Biological sex can influence disease risk.

Weight: Body weight, often used in conjunction with height to calculate BMI.

Height: Used to calculate BMI and assess growth or development.

Body Mass Index (BMI): Derived from weight and height.

Smoking Status: Whether the user smokes, and if so, how frequently.

Alcohol Consumption: Frequency and quantity of alcohol intake.

Dietary Habits: Types of food consumed (e.g., high sugar, high fat, vegetarian).

Physical Activity Level: Self-reported activity level, complementing smartwatch data.

Sleep Habits: Self-reported sleep duration and quality.

Stress Levels: Self-reported stress or anxiety levels.

Medical History: Past or existing conditions (e.g., diabetes, hypertension).

Family History: Genetic predisposition to certain diseases.

Medication Use: Current medications and supplements.

Water Intake: Daily water consumption.

Occupation: Sedentary vs. active jobs.

Mental Health: Self-reported mental health status (e.g., depression, anxiety).
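BMI is weight in kilograms divided by the square of height in metres. A minimal sketch, using the WHO adult cut-offs for the categories (these thresholds do not apply to children; the function names are hypothetical):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

def bmi_category(value: float) -> str:
    """WHO adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"

value = bmi(70, 1.75)
print(round(value, 1), bmi_category(value))  # 22.9 normal
```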

Here’s a structured breakdown of lifestyle diseases divided into **three categories** based on the data sources required for prediction:  

---

### **1. Diseases Predicted Using Smartwatch Data Only**  
*(Using the 15 common features listed earlier: heart rate, steps, sleep, activity levels, etc.)*  
- **Sleep Disorders**  
  - Insomnia, irregular sleep patterns.  
  - *Key Smartwatch Data*: Sleep duration, sleep stages (light/deep/REM), nighttime heart rate.  
- **Cardiovascular Abnormalities**  
  - Arrhythmias, elevated resting heart rate.  
  - *Key Smartwatch Data*: Heart rate variability (HRV), ECG (if available), resting heart rate.  
- **Overtraining Syndrome**  
  - Physical burnout from excessive exercise.  
  - *Key Smartwatch Data*: Activity intensity, recovery metrics, HRV.  
- **Stress and Anxiety (Basic Detection)**  
  - Acute stress episodes.  
  - *Key Smartwatch Data*: HRV, elevated heart rate, activity/sleep correlation.  
- **Sedentary Behavior Risks**  
  - Poor circulation, muscle atrophy.  
  - *Key Smartwatch Data*: Step count, sedentary alerts, stand hours.  
- **Dehydration Risk (Indirect)**  
  - Abnormal heart rate spikes during low activity.  
  - *Key Smartwatch Data*: Heart rate trends, activity levels.  
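Most of the smartwatch-only detectors above compare a user's readings against that user's own baseline. A minimal sketch of the idea for the "elevated resting heart rate" case, assuming one resting-heart-rate value per day: each day is compared with the mean and standard deviation of the preceding `window` days and flagged when its z-score exceeds a threshold. The window length, the 2.0 threshold and the sample values are illustrative assumptions, not clinical criteria.

```python
import pandas as pd

def flag_elevated_rhr(daily_rhr: pd.Series, window: int = 7, z_thresh: float = 2.0) -> pd.Series:
    """Flag days whose resting heart rate is unusually high vs. the preceding `window` days."""
    rolling = daily_rhr.rolling(window, min_periods=window)
    # shift(1) so each day is compared only with the days before it
    baseline = rolling.mean().shift(1)
    spread = rolling.std().shift(1)
    # If the baseline days are identical, spread is 0 and any higher value
    # gives an infinite z-score, which is flagged
    z = (daily_rhr - baseline) / spread
    # Days without a full baseline yield NaN, and NaN > threshold is False
    return z > z_thresh

# Hypothetical daily resting heart rate (bpm) over eight days
rhr = pd.Series([60, 61, 59, 60, 62, 61, 60, 72],
                index=pd.date_range("2024-01-01", periods=8, freq="D"))
print(flag_elevated_rhr(rhr).tolist())
# [False, False, False, False, False, False, False, True]
```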

---

### **2. Diseases Predicted Using Interface Inputs Only**  
*(Requires user-reported lifestyle data: smoking, alcohol, diet, weight, age, etc.)*  
- **Gout**  
  - *Key Inputs*: Alcohol consumption, diet (high-purine foods), weight.  
- **Osteoporosis**  
  - *Key Inputs*: Age, gender, calcium/vitamin D intake, smoking.  
- **Liver Disease (e.g., Fatty Liver)**  
  - *Key Inputs*: Alcohol consumption, diet, BMI.  
- **Certain Cancers (e.g., Lung Cancer)**  
  - *Key Inputs*: Smoking history, family history, occupational hazards.  
- **Eating Disorders**  
  - *Key Inputs*: Self-reported dietary habits, weight fluctuations, mental health history.  
- **Chronic Kidney Disease (CKD)**  
  - *Key Inputs*: Age, smoking, hypertension history, diet.  
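Interface-only models work on tabular questionnaire data, so categorical answers must be turned into numbers before training. A minimal pandas sketch, assuming hypothetical column names and answer options: ordered answers (smoking frequency) get an ordinal code, and unordered ones (sex) are one-hot encoded.

```python
import pandas as pd

# Hypothetical questionnaire answers collected through the app interface
responses = pd.DataFrame({
    "age": [46, 63, 35],
    "sex": ["female", "male", "female"],
    "smoking": ["never", "daily", "former"],
    "alcohol_drinks_per_week": [0, 10, 3],
    "bmi": [25.7, 31.2, 22.4],
})

# Assumed answer options, ordered from least to most exposure
SMOKING_LEVELS = {"never": 0, "former": 1, "occasional": 2, "daily": 3}

def to_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Ordinal encoding keeps the natural order of the smoking answers
    out["smoking"] = out["smoking"].map(SMOKING_LEVELS)
    # One-hot encoding for a nominal answer with no inherent order
    return pd.get_dummies(out, columns=["sex"], dtype=int)

features = to_features(responses)
print(features.columns.tolist())
# ['age', 'smoking', 'alcohol_drinks_per_week', 'bmi', 'sex_female', 'sex_male']
```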

---

### **3. Diseases Requiring Both Smartwatch Data + Interface Inputs**  
*(Combines biometric and lifestyle data for more accurate prediction)*  
- **Type 2 Diabetes**  
  - *Key Inputs*: Weight, BMI, diet, family history.  
  - *Smartwatch Data*: Activity levels, sleep quality, heart rate trends.  
- **Hypertension**  
  - *Key Inputs*: Age, smoking, alcohol, stress levels.  
  - *Smartwatch Data*: Resting heart rate, HRV, activity intensity.  
- **Obesity & Metabolic Syndrome**  
  - *Key Inputs*: BMI, diet, physical activity level.  
  - *Smartwatch Data*: Step count, calories burned, sedentary time.  
- **Depression/Anxiety**  
  - *Key Inputs*: Self-reported stress, mental health history.  
  - *Smartwatch Data*: Sleep disruption, HRV, activity reduction.  
- **COPD (Chronic Obstructive Pulmonary Disease)**  
  - *Key Inputs*: Smoking history, occupational exposure.  
  - *Smartwatch Data*: Respiratory rate, SpO2 (if available).  
- **Stroke Risk**  
  - *Key Inputs*: Age, smoking, hypertension history.  
  - *Smartwatch Data*: Irregular heart rhythms (e.g., atrial fibrillation via ECG).  
- **Sleep Apnea**  
  - *Key Inputs*: Weight, alcohol consumption.  
  - *Smartwatch Data*: Sleep disruptions, SpO2 drops (if available).  
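For the hybrid diseases the two data sources have to be joined per user. One simple approach, sketched below with hypothetical column names and values, is to aggregate the smartwatch log into per-user summaries and merge them with the interface profile; a real pipeline would also have to choose a time window and handle users with missing device data.

```python
import pandas as pd

# Hypothetical daily smartwatch aggregates (one row per user per day)
watch = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2, 2],
    "steps":       [8000, 6000, 7000, 3000, 4000, 2000],
    "resting_hr":  [60, 62, 61, 75, 78, 72],
    "sleep_hours": [7.5, 7.0, 8.0, 5.5, 6.0, 5.0],
})

# Hypothetical interface inputs (one row per user)
profile = pd.DataFrame({
    "user_id": [1, 2],
    "age": [46, 63],
    "smoker": [0, 1],
})

# Summarize the biometric time series into per-user features
watch_summary = (
    watch.groupby("user_id")
    .agg(mean_steps=("steps", "mean"),
         mean_resting_hr=("resting_hr", "mean"),
         mean_sleep_hours=("sleep_hours", "mean"))
    .reset_index()
)

# validate="one_to_one" raises if either side has duplicate user_ids
hybrid = profile.merge(watch_summary, on="user_id", how="inner", validate="one_to_one")
print(hybrid)
```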

---

### **Key Takeaways for Your ML Model**  
1. **Smartwatch-Only Models**: Focus on **real-time biometric patterns** (e.g., heart rate anomalies, sleep disruptions).  
2. **Interface-Only Models**: Prioritize **demographic and behavioral correlations** (e.g., smoking → lung disease).  
3. **Hybrid Models**: Combine **biometric trends** with **lifestyle context** for complex predictions (e.g., diabetes, hypertension).  

This categorization helps you leverage the strengths of each data type while addressing the limitations of standalone datasets.
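Smartwatch-only models usually start from fixed-length windows of the sensor stream rather than single readings, and sequence models such as LSTMs expect input shaped `(samples, timesteps, features)`. A minimal NumPy sketch of that windowing step; the function name, window length, step size and the two-feature example are illustrative assumptions.

```python
import numpy as np

def sliding_windows(series, window: int, step: int = 1) -> np.ndarray:
    """Cut a (timesteps, features) series into overlapping windows.

    Returns an array of shape (num_windows, window, features).
    """
    x = np.asarray(series, dtype=float)
    if x.ndim == 1:              # a single signal, e.g. heart rate only
        x = x[:, None]
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    num_windows = (len(x) - window) // step + 1
    if num_windows <= 0:
        raise ValueError("series is shorter than one window")
    return np.stack([x[i * step : i * step + window] for i in range(num_windows)])

# Hypothetical minute-level readings: columns are heart rate and step count
readings = np.column_stack([np.arange(60, 70), np.arange(0, 100, 10)])
windows = sliding_windows(readings, window=4, step=2)
print(windows.shape)  # (4, 4, 2)
```

NumPy 1.20 and later also provide `numpy.lib.stride_tricks.sliding_window_view`, which returns windows as a view without copying; the explicit loop above is kept for readability.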

### **Why This Order?**  
1. **Interface-Only**: Build foundational ML skills (classification, regression) on tabular data.  
2. **Smartwatch-Only**: Introduce time-series analysis and sensor data.  
3. **Hybrid**: Combine both skill sets to solve more complex problems.  

Start with the interface-only models to gain confidence, then gradually tackle more complex projects. For example, begin by predicting obesity with logistic regression, then progress to detecting sleep disorders with LSTMs. This approach minimizes frustration and ensures steady skill development.
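The suggested first project, a logistic-regression classifier, can be sketched in plain NumPy to show what the model learns. Everything here is hypothetical: the three features, the synthetic data and the rule that generates the "obese" label are invented for illustration, and in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic lifestyle features (hypothetical distributions)
steps = rng.normal(7000, 2500, n)   # daily steps
junk = rng.integers(0, 15, n)       # junk-food meals per week
sleep = rng.normal(7, 1, n)         # hours of sleep per night
X = np.column_stack([steps, junk, sleep]).astype(float)

# Hypothetical label rule: fewer steps, more junk food, less sleep -> higher risk
score = -1.5 * (steps - 7000) / 2500 + (junk - 7) / 4.3 - 0.5 * (sleep - 7)
y = (score + rng.normal(0, 0.5, n) > 0).astype(float)

# Train/test split; standardize with training statistics only
idx = rng.permutation(n)
train, test = idx[:800], idx[800:]
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sigma

def add_bias(A):
    return np.column_stack([np.ones(len(A)), A])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent on the mean log-loss
w = np.zeros(Xs.shape[1] + 1)
A_train = add_bias(Xs[train])
for _ in range(2000):
    p = sigmoid(A_train @ w)
    w -= 0.1 * A_train.T @ (p - y[train]) / len(train)

pred = (sigmoid(add_bias(Xs[test]) @ w) >= 0.5).astype(float)
accuracy = (pred == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

The learned weights `w[1:]` should carry the same signs as the hypothetical rule (negative for steps, positive for junk food), which is a quick sanity check before moving to real data.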


In [None]:
df = pd.read_csv('/content/d_sih.csv')

https://github.com/ritvikbhatia/Early-Prediction-of-Lifestyle-Diseases

In [None]:
df.head()

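|   | Age | Bmi | Drinking | Excercise | Gender | Junk | Sleep | Smoking | Diabetes | Hypertension | Depression | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46 | 15 | 1 | 1 | 1 | 2 | 1 | 0 | 88.12 | 77.05 | 44.04 | 2 |
| 1 | 8 | 34 | 1 | 2 | 0 | 1 | 2 | 0 | 21.26 | 45.6 | 5.35 | 3 |
| 2 | 63 | 15 | 0 | 3 | 0 | 3 | 1 | 0 | 56.64 | 37.7 | 49.0 | 2 |
| 3 | 42 | 32 | 0 | 2 | 0 | 2 | 2 | 1 | 42.72 | 47.4 | 52.1 | 1 |
| 4 | 45 | 30 | 0 | 2 | 1 | 1 | 1 | 1 | 44.76 | 20.06 | 38.68 | 2 |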


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 996 entries, 0 to 995
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Age           996 non-null    int64  
 1   Bmi           996 non-null    int64  
 2   Drinking      996 non-null    int64  
 3   Excercise     996 non-null    int64  
 4   Gender        996 non-null    int64  
 5   Junk          996 non-null    int64  
 6   Sleep         996 non-null    int64  
 7   Smoking       996 non-null    int64  
 8   Diabetes      996 non-null    float64
 9   Hypertension  996 non-null    float64
 10  Depression    996 non-null    float64
 11  output        996 non-null    int64  
dtypes: float64(3), int64(9)
memory usage: 93.5 KB
