# Analysis of Weight Categories Among Adults in the United States

## Introduction
In this project, we explore a dataset on the weight categories of adults aged 20 and over in the United States. The dataset spans a range of years and reports the percentage of the adult population classified as normal weight, overweight, or obese. Understanding the trends and patterns in these categories matters for public health planning and interventions.

## Dataset Overview
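The dataset, titled "Normal weight, overweight, and obesity among adults aged 20 and over, by selected characteristics, United States," has several key features, including the year (`YEAR`), age group (`AGE`), weight category (`PANEL`), and the estimated population percentage in each category (`ESTIMATE`) with its standard error (`SE`). This makes it possible to examine changes over time and across demographic groups.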

## Research Questions
The analysis is based on these research questions:
1. How have the percentages of adults in normal weight, overweight, and obese categories changed over the years?
2. Does the data show differences in weight categories across demographic characteristics such as age group?
3. What is the trend in standard errors (SE) over the years for each weight category, and what does it suggest about data reliability or population health patterns?
4. How does the distribution of weight categories vary over different time periods?
5. What are the top 3 periods with the highest average percentages for obesity?
6. What is the proportion of each weight category in the most recent year available in the dataset?

## Skills and Tools Used
We will employ various data analysis skills and tools, primarily focusing on:
- **Python Programming**: For all aspects of data handling and processing.
- **Pandas Library**: For data manipulation and analysis.
- **Matplotlib and Seaborn Libraries**: For data visualization to aid in interpreting the data and presenting our findings.
- **Exploratory Data Analysis Techniques**: To uncover underlying patterns and insights in the dataset.




# Data Loading

The dataset is loaded using Pandas. The initial few rows are displayed to understand the structure and content of the data.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('weightData.csv')

# Display the first few rows to understand the structure
print(data.head())


# Data Cleaning and Preprocessing

First, we check for missing values in each column. Depending on how much data is missing and where, rows can be dropped or values filled, and columns can be converted to the data types the analysis needs. The cell below reports the number of missing values per column.


In [None]:

# Checking for missing values
print(data.isnull().sum())
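
The cleaning steps described above (type conversion and dropping incomplete rows) could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's actual cleaning code: the `sample` frame and its values are made up to stand in for `weightData.csv`, and `clean_weight_data` is a hypothetical helper name.

```python
import pandas as pd

# Toy rows standing in for weightData.csv; the values are made up for illustration.
sample = pd.DataFrame({
    "YEAR": ["1999-2002", "1999-2002", "2015-2018", "2015-2018"],
    "PANEL": ["Normal weight", "Obesity", "Normal weight", "Obesity"],
    "ESTIMATE": ["33.0", "30.4", None, "40.5"],  # stored as text, one value missing
    "SE": ["0.9", "0.8", None, "1.1"],
})

def clean_weight_data(df):
    """Convert measurement columns to numbers and drop rows without an estimate."""
    cleaned = df.copy()
    for column in ["ESTIMATE", "SE"]:
        # errors="coerce" turns unparseable entries into NaN instead of raising
        cleaned[column] = pd.to_numeric(cleaned[column], errors="coerce")
    # Rows with no ESTIMATE cannot contribute to any average, so drop them
    return cleaned.dropna(subset=["ESTIMATE"]).reset_index(drop=True)

cleaned = clean_weight_data(sample)
print(cleaned.dtypes)
print(len(sample) - len(cleaned), "row(s) dropped")
```

Dropping is only one option. If missing estimates turn out to be concentrated in particular years or groups, it may be better to report that gap than to remove the rows silently.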


# Research Question 1: How has the percentage of adults in the different weight categories changed over the years?
We will group the data by 'YEAR' and 'PANEL' (which represents weight categories) and calculate the average percentage for each category over the years.



In [None]:
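# Group by 'YEAR' and 'PANEL' and average the ESTIMATE column only;
# a bare .mean() also tries to average text columns and raises in recent pandas
trends_over_years = data.groupby(['YEAR', 'PANEL'])['ESTIMATE'].mean().reset_index()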

# Plotting
plt.figure(figsize=(12, 6))
# Draw one line per weight category
for category, group in trends_over_years.groupby('PANEL'):
    plt.plot(group['YEAR'], group['ESTIMATE'], label=category)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.title('Trends in Weight Categories Over Years')
plt.legend()
plt.show()

The plot shows that the share of adults at normal weight keeps declining, while the other categories rise over the years.
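
To put a number on the direction of each trend, one option is to compare the first and last periods for every category. The sketch below assumes a frame shaped like `trends_over_years` (columns `YEAR`, `PANEL`, `ESTIMATE`). The values are made up for illustration, the panel labels are shortened, and `change_between_first_and_last` is a hypothetical helper name.

```python
import pandas as pd

# Made-up yearly averages in the same shape as trends_over_years.
trends = pd.DataFrame({
    "YEAR": ["1999-2002", "1999-2002", "2015-2018", "2015-2018"],
    "PANEL": ["Normal weight", "Obesity", "Normal weight", "Obesity"],
    "ESTIMATE": [33.0, 30.0, 27.0, 40.0],
})

def change_between_first_and_last(trends):
    """Return last-period minus first-period ESTIMATE for each PANEL."""
    # Period labels such as '1999-2002' sort chronologically as strings
    ordered = trends.sort_values("YEAR")
    by_panel = ordered.groupby("PANEL")["ESTIMATE"]
    # Result is in percentage points: negative means the share fell
    return by_panel.last() - by_panel.first()

change = change_between_first_and_last(trends)
print(change)
```

A negative value means the category's share fell between the earliest and latest periods. This ignores what happens in between, so it complements the plot and does not replace it.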


# Research Question 2: Are there significant differences in weight categories across different age groups?
To analyze the differences in weight categories across different age groups, we'll use a function to calculate the average percentage for each weight category within each age group.

In [None]:
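def analyze_weight_by_age_group(data):
    """Average ESTIMATE per age group (rows) and weight category (columns)."""
    # Select ESTIMATE before aggregating so text columns are not averaged
    age_group_analysis = data.groupby(['AGE', 'PANEL'])['ESTIMATE'].mean().unstack()
    return age_group_analysis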

# Applying the function
age_group_analysis = analyze_weight_by_age_group(data)
print(age_group_analysis)



# Research Question 3: What is the trend in standard errors (SE) over the years for each weight category?
This question focuses on the reliability of data over time. We'll examine how the standard error for each weight category has changed over the years.

In [None]:
# Group by 'YEAR' and 'PANEL' for SE
se_trends_over_years = data.groupby(['YEAR', 'PANEL'])['SE'].mean().reset_index()

# Plotting SE trends
plt.figure(figsize=(12, 6))
# Draw one line per weight category
for category, group in se_trends_over_years.groupby('PANEL'):
    plt.plot(group['YEAR'], group['SE'], label=category)
plt.xlabel('Year')
plt.ylabel('Standard Error')
plt.title('Trends in Standard Errors Over Years by Weight Category')
plt.legend()
plt.show()
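
One way to read the SE values is to turn them into approximate 95% confidence intervals (estimate ± 1.96 × SE, assuming a normal approximation) and into a relative standard error. The sketch below uses made-up rows, and the helper name `add_confidence_interval` and the new column names are hypothetical.

```python
import pandas as pd

# Made-up rows with the ESTIMATE and SE columns used in the analysis above.
rows = pd.DataFrame({
    "YEAR": ["1999-2002", "2015-2018"],
    "PANEL": ["Obesity", "Obesity"],
    "ESTIMATE": [30.0, 40.0],
    "SE": [1.0, 0.5],
})

def add_confidence_interval(df, z=1.96):
    """Add approximate confidence bounds and relative standard error columns."""
    result = df.copy()
    margin = z * result["SE"]  # half-width of the interval
    result["CI_LOWER"] = result["ESTIMATE"] - margin
    result["CI_UPPER"] = result["ESTIMATE"] + margin
    # Relative standard error: SE as a percentage of the estimate
    result["RSE_PCT"] = 100 * result["SE"] / result["ESTIMATE"]
    return result

with_ci = add_confidence_interval(rows)
print(with_ci[["YEAR", "CI_LOWER", "CI_UPPER", "RSE_PCT"]])
```

A smaller standard error gives a narrower interval, so a downward SE trend would suggest more precise estimates in later periods. Whether that reflects larger samples or something else cannot be told from the SE alone.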


# Research Question 4: How does the distribution of weight categories vary over different time periods?
We will create a function to calculate the mean and standard deviation of the estimates for each weight category in each time period. This gives insight into the distribution and variability of weight categories over time.

In [None]:
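def weight_distribution_over_time(data):
    """Mean and standard deviation of ESTIMATE per period and weight category."""
    # Select ESTIMATE before aggregating so text columns are not aggregated
    distribution_over_time = data.groupby(['YEAR', 'PANEL'])['ESTIMATE'].agg(['mean', 'std'])
    return distribution_over_time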

# Applying the function
time_distribution_analysis = weight_distribution_over_time(data)
print(time_distribution_analysis)

# Research Question 5: What are the top 3 periods with the highest average percentages for obesity?
This analysis aims to identify the time periods with the highest average obesity rates. We will calculate the average percentage of obesity for each period in the 'YEAR' column and then select the three periods with the highest rates.

In [None]:
def top_years_for_obesity(data):
    obesity_data = data[data['PANEL'] == 'Obesity (BMI greater than or equal to 30.0)']
    top_years = obesity_data.groupby('YEAR')['ESTIMATE'].mean().sort_values(ascending=False).head(3)
    return top_years

# Applying the function
top_obesity_years = top_years_for_obesity(data)
print("Top 3 years with the highest average obesity rates:\n", top_obesity_years)


The result indicates that for the time periods 2015-2018, 2013-2016, and 2009-2012, the average obesity rates were approximately 40.53%, 38.51%, and 38.47% respectively. These are the highest among all the time periods included in the dataset.

# Research Question 6: What is the proportion of each weight category in the most recent year available in the dataset?
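To visualize the current distribution of weight categories, we will use a pie chart. The chart shows the relative size of three selected panels (normal weight, overweight or obese, and obesity) in the most recent period of data. Because the overweight-or-obese panel (BMI of 25.0 or more) already includes the obesity panel (BMI of 30.0 or more), the slices overlap and should be read as relative magnitudes, not as shares of the population.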

In [None]:
def latest_year_proportions_filtered(data):
    latest_year = data['YEAR'].max()
    # Filter for the specific categories
    filtered_data = data[data['PANEL'].isin(['Obesity (BMI greater than or equal to 30.0)',
                                             'Overweight or obese (BMI greater than or equal to 25.0)',
                                             'Normal weight (BMI from 18.5 to 24.9)'])]
    latest_data = filtered_data[filtered_data['YEAR'] == latest_year]
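    # Average ESTIMATE per category (selecting the column first avoids aggregating
    # text columns); the pie chart rescales the values, so only their ratios matter
    proportions = latest_data.groupby('PANEL')['ESTIMATE'].mean()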
    return proportions, latest_year

# Applying the function
proportions, latest_year = latest_year_proportions_filtered(data)

# Plotting
plt.figure(figsize=(8, 8))
plt.pie(proportions, labels=proportions.index, autopct='%1.1f%%', startangle=140)
plt.title(f'Proportion of Weight Categories in {latest_year}')
plt.show()
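
Since the "Overweight or obese" panel (BMI of 25.0 or more) contains the "Obesity" panel (BMI of 30.0 or more), one way to get slices that do not overlap is to subtract one from the other. The sketch below illustrates that idea and is not part of the original analysis: the estimates are made up, and `non_overlapping_slices` and the short slice labels are hypothetical names.

```python
import pandas as pd

NORMAL = "Normal weight (BMI from 18.5 to 24.9)"
OVERWEIGHT_OR_OBESE = "Overweight or obese (BMI greater than or equal to 25.0)"
OBESE = "Obesity (BMI greater than or equal to 30.0)"

# Made-up latest-period estimates, indexed by the PANEL labels used above.
latest = pd.Series({NORMAL: 27.0, OVERWEIGHT_OR_OBESE: 71.0, OBESE: 40.0})

def non_overlapping_slices(estimates):
    """Split the combined panel so that each adult falls in exactly one slice."""
    return pd.Series({
        "Normal weight": estimates[NORMAL],
        # BMI from 25.0 up to (but not including) 30.0
        "Overweight, not obese": estimates[OVERWEIGHT_OR_OBESE] - estimates[OBESE],
        "Obesity": estimates[OBESE],
    })

slices = non_overlapping_slices(latest)
print(slices)
print("Total covered:", slices.sum())
```

The resulting series can be passed to `plt.pie` in place of `proportions`. The slices need not sum to 100, because adults with a BMI below 18.5 are not covered by these three panels.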


# Conclusion
The analysis of the dataset "Normal weight, overweight, and obesity among adults aged 20 and over, by selected characteristics, United States" shows two main patterns: the share of adults at normal weight declined over the periods covered while the other categories rose, and average obesity rates were highest in 2015-2018, at approximately 40.53%. These findings can help public health officials, policymakers, and healthcare providers understand and address weight management challenges in the adult population. The data underscores the need for targeted health interventions and continued monitoring of weight trends to counter the rising prevalence of overweight and obesity, which are known risk factors for various chronic diseases.

In summary, the dataset not only highlights the current state of adult weight categories in the United States but also serves as a call to action for tailored health initiatives and policies to improve the overall health and well-being of the population.