# **Health and Lifestyle Analysis**

## **Project Overview**

This project analyzes a dataset of health and lifestyle metrics for individuals. The dataset includes demographic information, sleep patterns, physical activity levels, and health indicators. The analysis explores relationships between these variables and identifies trends.


### **Dataset Description**
The dataset contains the following columns:
- Person ID: Unique identifier for each individual.
- Gender: Gender of the person (Male/Female).
- Age: Age of the person in years.
- Occupation: Occupation or profession of the person.
- Sleep Duration (hours): Daily sleep duration in hours.
- Quality of Sleep (scale: 1-10): Subjective rating of sleep quality.
- Physical Activity Level (minutes/day): Daily physical activity in minutes.
- Stress Level (scale: 1-10): Subjective rating of stress level.
- BMI Category: BMI classification (Underweight, Normal, Overweight).
- Blood Pressure (systolic/diastolic): Blood pressure measurement.
- Heart Rate (bpm): Resting heart rate in beats per minute.
- Daily Steps: Number of steps taken per day.
- Sleep Disorder: Indicates whether the individual has a sleep disorder.

---



## **Task 1: Data Exploration**
### **Objective**: Familiarize yourself with the dataset.
Actions:
- Load the dataset into a Pandas DataFrame.
- Display the first few rows of the dataset.
- Summarize the dataset using `.describe()` to understand basic statistics (mean, median, etc.).
- Convert column names to a standard format.
- Give your observations.
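
As a minimal sketch of these actions, the snippet below builds a small in-memory DataFrame instead of reading the real file, since the dataset's filename is not given here. The sample rows are hypothetical, and lowercase snake_case is only one possible reading of "standard format".

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset. With the actual
# file you would call pd.read_csv() on its path instead.
data = pd.DataFrame({
    "Person ID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [27, 31, 45, 38],
    "Sleep Duration": [6.1, 7.2, 8.0, 6.5],
    "Quality of Sleep": [6, 8, 9, 7],
})

print(data.head())      # first few rows
print(data.describe())  # count, mean, std, min, quartiles (50% is the median), max

# Assumed convention: trim whitespace, lowercase, and join words with underscores.
data.columns = (
    data.columns.str.strip()
    .str.lower()
    .str.replace(r"\s+", "_", regex=True)
)
print(data.columns.tolist())
```

Note that `.describe()` covers only numerical columns by default; passing `include="all"` also summarizes the categorical ones.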

In [86]:
# Import libraries for data manipulation


# Import libraries for data visualization

In [87]:
# Load dataset


In [117]:
# Display the first few rows


In [116]:
# Summary Statistics


**Observations**:

-

In [118]:
# Convert column names to standard format

### **Question 1:** How many rows and columns are present in the data?

### **Question 2:** What are the data types of the different columns in the dataset? (The `info()` method can be used.)
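
One way to approach Questions 1 and 2 is sketched below, assuming the DataFrame is named `data` (as in the hints elsewhere in this notebook). The three-row frame is hypothetical and only keeps the snippet self-contained.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset.
data = pd.DataFrame({
    "person_id": [1, 2, 3],
    "gender": ["Male", "Female", "Female"],
    "sleep_duration": [6.1, 7.2, 8.0],
})

# .shape is a (rows, columns) tuple.
n_rows, n_cols = data.shape
print(f"{n_rows} rows, {n_cols} columns")

# .info() prints each column's dtype and non-null count; .dtypes returns
# the dtypes alone as a Series.
data.info()
print(data.dtypes)
```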

### **Question 3**: Classify each column into the following categories:

**Qualitative (Categorical)**:
- Nominal: Variables with no inherent order
- Ordinal: Variables with a meaningful order

**Quantitative (Numerical):**
- Discrete: Countable, integer values
- Continuous: Measurable, can take any value within a range

## **Task 2: Data Cleaning and Preprocessing**
### **Objective**: Prepare the dataset for analysis.
Actions:
- Check for missing values and, if there are any, handle them appropriately (e.g., fill with the mean/median or drop rows/columns).
  - Use `data.isnull().sum()` or `data.isna().sum()` to identify columns with missing values.
- Ensure that all data types are correct (e.g., convert categorical variables to category type).
- Create new columns if necessary (e.g., extract systolic and diastolic blood pressure from a single column).
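
The actions above can be sketched as follows. The sample values, the snake_case column names, and the choice of a median fill are all assumptions for illustration; the right way to handle missing values depends on what the real data looks like.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one missing sleep duration.
data = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "sleep_duration": [6.1, np.nan, 8.0, 6.5],
    "blood_pressure": ["126/83", "125/80", "140/90", "120/80"],
})

# Count missing values per column.
missing = data.isna().sum()
print(missing)

# One option among several: fill numerical gaps with the column median.
data["sleep_duration"] = data["sleep_duration"].fillna(data["sleep_duration"].median())

# Store a categorical variable with the category dtype.
data["gender"] = data["gender"].astype("category")

# Split "systolic/diastolic" strings into two integer columns.
bp = data["blood_pressure"].str.split("/", expand=True).astype(int)
data["systolic"] = bp[0]
data["diastolic"] = bp[1]
print(data)
```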

In [93]:
# Checking for missing values


### **Question 1**: Are there any duplicates?
- Use `data.duplicated().sum()` to identify the number of duplicate rows.
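
A small illustration of the duplicate check, using hypothetical rows in which one record is repeated:

```python
import pandas as pd

# Hypothetical data: the second and third rows are identical.
data = pd.DataFrame({
    "person_id": [1, 2, 2, 3],
    "age": [27, 31, 31, 45],
})

# duplicated() flags every row that repeats an earlier one.
n_duplicates = data.duplicated().sum()
print(n_duplicates)

# If duplicates should be dropped, keep the first occurrence of each row.
deduplicated = data.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```

Because `Person ID` is unique per individual, full-row duplicates may never appear in the real data. Passing `subset=[...]` to `duplicated()` restricts the comparison to chosen columns, which can reveal records that repeat everything except the identifier.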

In [94]:
# Check for duplicates

## **Task 3: Exploratory Data Analysis (EDA)**
Exploratory Data Analysis (EDA) is a critical step in the data analysis process that involves summarizing the main characteristics of a dataset, often using visual methods.

### **Univariate Analysis**
Univariate Analysis is the simplest form of data analysis, where "uni" means "one." This type of analysis examines one variable independently without considering relationships with other variables.

**Objective**: Analyze individual variables to understand their distributions and characteristics.

Actions:
- Perform univariate analysis on key numerical variables such as `Age`, `Sleep Duration`, `Physical Activity Level`, `Daily Steps`, `Blood Pressure` (after splitting it into systolic and diastolic values), and `Heart Rate`.
- Visualize distributions of these numerical variables using histograms and box plots to identify patterns, central tendencies, and outliers.
- Conduct univariate analysis on categorical variables such as `Gender`, `Occupation`, `BMI Category`, `Quality of Sleep`, `Stress Level`, and `Sleep Disorder`. Use frequency counts and bar charts to summarize these variables.
- Identify any outliers in numerical data that may require further investigation or cleaning.
- Give your observations.
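
A rough sketch of the plots described above, assuming matplotlib is available and using hypothetical sample values. The column names follow a snake_case convention that the real notebook may not share.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample values.
data = pd.DataFrame({
    "sleep_duration": [6.1, 7.2, 8.0, 6.5, 7.0, 5.9, 7.8, 6.8],
    "bmi_category": ["Normal", "Overweight", "Normal", "Normal",
                     "Underweight", "Overweight", "Normal", "Normal"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: shape of a numerical distribution.
axes[0].hist(data["sleep_duration"].to_numpy(), bins=5)
axes[0].set_title("Sleep duration (histogram)")

# Box plot: median, quartiles, and potential outliers.
axes[1].boxplot(data["sleep_duration"].to_numpy())
axes[1].set_title("Sleep duration (box plot)")

# Bar chart of frequency counts for a categorical variable.
counts = data["bmi_category"].value_counts()
axes[2].bar(list(counts.index), counts.to_numpy())
axes[2].set_title("BMI category (counts)")

fig.tight_layout()
```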

**Question 1**: What is the distribution of gender in the dataset (percentage of males vs. females)?

**Question 2**: What is the average age of individuals in the dataset?

**Question 3**: What are the most common occupations represented in the dataset?

**Question 4**: What is the average sleep duration reported by participants?
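
The four questions above map onto a handful of pandas calls. The sketch below runs on hypothetical rows, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample rows.
data = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female"],
    "age": [27, 31, 45, 38, 29],
    "occupation": ["Nurse", "Engineer", "Nurse", "Teacher", "Nurse"],
    "sleep_duration": [6.0, 7.0, 8.0, 6.5, 7.5],
})

# Question 1: share of each gender, as a percentage.
gender_pct = data["gender"].value_counts(normalize=True) * 100

# Question 2: average age.
mean_age = data["age"].mean()

# Question 3: most common occupations (value_counts sorts by frequency).
top_occupations = data["occupation"].value_counts().head(3)

# Question 4: average sleep duration.
mean_sleep = data["sleep_duration"].mean()

print(gender_pct, mean_age, top_occupations, mean_sleep, sep="\n")
```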

## **Task 4: Anomaly and Outlier Detection**
### **Objective**: Identify and manage anomalies in the dataset.
Actions:
- Visualize distributions of key numerical variables using boxplots or histograms to identify outliers.
- Use the Interquartile Range (IQR) method to detect outliers in at least two numerical columns (e.g., Sleep Duration, Daily Steps).
  - Calculate the first quartile (Q1) and the third quartile (Q3).
  - Compute $IQR = Q3 - Q1$.
  - Determine the bounds:
    - Lower bound: $Q1 - 1.5 \times IQR$
    - Upper bound: $Q3 + 1.5 \times IQR$
  - Identify outliers: values below the lower bound or above the upper bound.
- Decide whether to remove or adjust outliers based on your analysis.

- Give your observations.
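
The IQR steps listed above can be sketched as a small helper. The function name `iqr_bounds` and the sample step counts are hypothetical; the quartiles use pandas' default linear interpolation.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) outlier bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical step counts with one implausibly large value.
data = pd.DataFrame({"daily_steps": [4000, 5000, 6000, 7000, 8000, 30000]})

lower, upper = iqr_bounds(data["daily_steps"])
is_outlier = (data["daily_steps"] < lower) | (data["daily_steps"] > upper)
outliers = data[is_outlier]

print(f"bounds: [{lower}, {upper}]")
print(outliers)
```

Whether to drop, cap, or keep flagged rows is a judgment call. A value outside the bounds is unusual, not necessarily wrong.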

## **Task 5: Data Normalization**
### **Objective**: Prepare data for comparative analysis.

Actions:
- Select numerical columns that require normalization (e.g., Age, Sleep Duration, Physical Activity Level).
- Apply Min-Max scaling or Z-score normalization to these columns.
- Create a new DataFrame to store normalized values alongside original data for comparison.

- Give your observations.
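
A minimal sketch of both scaling options using plain pandas arithmetic, on hypothetical values. The `_minmax` and `_z` suffixes are naming choices made here for illustration. The Z-score uses the population standard deviation (`ddof=0`); pandas defaults to the sample version (`ddof=1`), so state which one you pick.

```python
import pandas as pd

# Hypothetical sample values.
data = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "sleep_duration": [6.0, 7.0, 8.0, 9.0],
})
cols = ["age", "sleep_duration"]

# Min-Max scaling: (x - min) / (max - min), mapping each column onto [0, 1].
min_max = (data[cols] - data[cols].min()) / (data[cols].max() - data[cols].min())

# Z-score normalization: (x - mean) / std, giving mean 0 and unit variance.
z_score = (data[cols] - data[cols].mean()) / data[cols].std(ddof=0)

# Keep the originals and the scaled versions side by side for comparison.
normalized = data.join(min_max.add_suffix("_minmax")).join(z_score.add_suffix("_z"))
print(normalized)
```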