## Alzheimer's Disease and Healthy Aging in the United States: Project Documentation
Author: Jose Pantaleon Hernandez-Rodriguez

Date: 06-11-2024

## Project Overview
This project examines the Alzheimer’s Disease and Healthy Aging Data in the United States, with a particular focus on trends in disease prevalence, caregiving demands, and economic costs. The primary objective is to uncover patterns, compare key metrics across different states, and highlight possible health disparities and challenges related to Alzheimer’s disease.

The dataset, sourced from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS), includes data on demographics, health conditions, and caregiving. The analysis also incorporates statistics on prevalence, caregiving, and economic impacts for 2019-2024, retrieved through Perplexity’s Search Tool.

## Data Preparation and Cleaning
The initial dataset comprised 214,462 records across 29 columns, detailing yearly statistics, health conditions, geographic locations, and caregiver demographics.

## Steps Taken:

- Data Inspection and Validation: Upon loading the data, a thorough inspection was performed to assess completeness, judge column relevance, and identify inconsistencies. Essential columns such as LocationAbbr, Data_Value, YearStart, and the stratification categories were preserved for analysis.
- Handling Missing Values: Columns with a high percentage of missing data, such as Sample_Size, were removed. For key numerical columns like Low_Confidence_Limit and High_Confidence_Limit, values were converted to numeric with the errors='coerce' option, which turns any non-numeric or erroneous entry into NaN.
- Data Type Conversion: Columns stored as non-numeric types were converted where numerical analysis required it. This step was crucial for means, groupings, and statistical visualizations, particularly for Data_Value, the core measure across classes.
- Data Integration: To enrich the dataset, recent data from Perplexity was used to create a table of Alzheimer’s disease statistics from 2019 to 2024. This external dataset provided an updated view of key Alzheimer’s metrics, including prevalence, mortality, caregiving, and associated economic costs.
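
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the toy rows are invented, the 0.5 missing-share cutoff is an assumed threshold, and `pd.to_numeric` is assumed to be the call that received `errors='coerce'`.

```python
import pandas as pd

# Toy stand-in for the BRFSS extract; every value here is invented for illustration.
raw = pd.DataFrame({
    "YearStart": [2019, 2020, 2021],
    "LocationAbbr": ["AL", "MS", "NY"],
    "Data_Value": ["12.5", "n/a", "9.1"],
    "Low_Confidence_Limit": ["10.1", "", "7.8"],
    "High_Confidence_Limit": ["14.9", "13.0", "bad"],
    "Sample_Size": [None, None, None],
})

# Drop columns whose share of missing values exceeds a cutoff (0.5 is an assumption).
missing_share = raw.isna().mean()
cleaned = raw.drop(columns=missing_share[missing_share > 0.5].index)

# Convert text columns to numbers; entries that cannot be parsed become NaN.
numeric_cols = ["Data_Value", "Low_Confidence_Limit", "High_Confidence_Limit"]
for col in numeric_cols:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

print(cleaned.dtypes)
print(cleaned.isna().sum())
```

Coercing instead of raising keeps the load step from failing on a single malformed entry, at the cost of having to check afterwards how many values were turned into NaN.
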
## Analytical Techniques
To reveal trends and gain insights from the data, several exploratory data analysis (EDA) techniques and visualizations were employed using Matplotlib, Seaborn, and Plotly. Key analyses included:

## 1. Exploratory Data Analysis (EDA)
- Descriptive Statistics: Summary statistics, including means, medians, and ranges, were generated for Data_Value across the dataset’s categories. This allowed us to evaluate the variability within each category, such as mental health or caregiving needs, across different U.S. states.
- Categorical Analysis: The dataset’s categorical variables (Class, StratificationCategory1, etc.) were analyzed to determine how each demographic segment was affected by Alzheimer’s. Value counts and frequency distributions provided insight into the data’s structure and identified the most commonly observed data points.
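
As a rough sketch of these two steps, the snippet below computes per-category summary statistics and value counts with pandas. The rows are invented placeholders, and only the column names Class, LocationAbbr, and Data_Value come from the dataset description.

```python
import pandas as pd

# Invented rows; only the column names follow the dataset description.
df = pd.DataFrame({
    "Class": ["Caregiving", "Caregiving", "Mental Health",
              "Mental Health", "Cognitive Decline"],
    "LocationAbbr": ["AL", "NY", "AL", "NY", "MS"],
    "Data_Value": [20.0, 10.0, 14.0, 16.0, 12.0],
})

# Mean, median, and range of Data_Value within each Class.
summary = df.groupby("Class")["Data_Value"].agg(["mean", "median", "min", "max"])
summary["range"] = summary["max"] - summary["min"]

# Frequency of each Class, most common first.
class_counts = df["Class"].value_counts()

print(summary)
print(class_counts)
```
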
## 2. Data Visualization Techniques
Visualizations were essential for understanding trends, comparing categories, and identifying geographic disparities:

- Distribution Analysis: Histograms were created to analyze Data_Value distributions across categories like Caregiving and Cognitive Decline. Box plots were also used to reveal outliers and the spread within categories across states.
- Correlation Analysis: A heatmap of the correlation matrix was generated to investigate relationships among numerical variables, such as caregiving hours, cognitive decline scores, and health status. This highlighted potential connections between physical health and Alzheimer’s risk factors.
- Time-Series Visualization: Trends over time were illustrated using line plots for key metrics, such as total care costs and the number of unpaid caregivers, revealing gradual increases over the years.
- Grouped Bar Plots and Comparison by Location: Data was segmented by state and demographic class (e.g., age, gender), enabling us to compare states with the highest and lowest scores in caregiving, cognitive decline, and mental health indicators.
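
The numbers behind the correlation heatmap and the state comparison can be sketched as below. The metric column names (`caregiving_hours`, `cognitive_decline`, `good_health_pct`) and all values are hypothetical stand-ins, not fields or results from the project. The resulting `corr` matrix is the kind of object one would typically pass to a heatmap function such as Seaborn's `heatmap`.

```python
import pandas as pd

# Hypothetical state-level metrics; column names and values are invented.
metrics = pd.DataFrame({
    "LocationAbbr": ["AL", "MS", "NY", "CA"],
    "caregiving_hours": [30.0, 28.0, 18.0, 16.0],
    "cognitive_decline": [14.0, 13.0, 9.0, 8.0],
    "good_health_pct": [70.0, 72.0, 84.0, 86.0],
}).set_index("LocationAbbr")

# Pairwise Pearson correlations among the numeric columns.
corr = metrics.corr()

# Rank states on one metric to pick the highest and lowest for comparison.
ranked = metrics["cognitive_decline"].sort_values(ascending=False)
highest = ranked.head(2)
lowest = ranked.tail(2)

print(corr.round(2))
print(highest)
print(lowest)
```
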
## Challenges Encountered
Several challenges required careful handling to ensure accurate and meaningful results:

## Data Completeness and Consistency:
Missing values across several key metrics posed a significant challenge. Some data points, such as mortality rates or unpaid caregiving hours, were not available for all years, requiring adjustments to ensure consistent analysis.
## Complexity of Multi-Dimensional Data:
The dataset included multiple levels of stratification (e.g., age, gender, state, and category), which made it challenging to streamline analyses. Group-by operations and data pivoting techniques helped manage this complexity, yielding more manageable and interpretable results.
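
A minimal sketch of the group-by and pivoting approach is shown here, assuming a long-format table with one row per state and stratum. The `Stratification1` column name and all values are assumptions made for illustration. Only LocationAbbr, StratificationCategory1, and Data_Value are named in this document.

```python
import pandas as pd

# Invented long-format rows: one row per state, stratum, and observation.
long_df = pd.DataFrame({
    "LocationAbbr": ["AL", "AL", "AL", "NY", "NY", "NY"],
    "StratificationCategory1": ["Age Group"] * 6,
    "Stratification1": ["50-64 years", "65 years or older", "65 years or older",
                        "50-64 years", "65 years or older", "65 years or older"],
    "Data_Value": [10.0, 18.0, 22.0, 8.0, 12.0, 14.0],
})

# Group-by: mean Data_Value for each state and stratum combination.
grouped = long_df.groupby(["LocationAbbr", "Stratification1"])["Data_Value"].mean()

# Pivot: one row per state, one column per stratum, averaging duplicates.
wide = long_df.pivot_table(
    index="LocationAbbr",
    columns="Stratification1",
    values="Data_Value",
    aggfunc="mean",
)

print(grouped)
print(wide)
```

The wide table puts each stratum in its own column, which makes side-by-side state comparisons and grouped bar plots easier to build than from the long format.
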
## Data Visualization and Presentation Quality: 
To ensure clarity, various visual parameters like figure size, label orientation, and color palette needed adjustment. This iterative process was essential to present the findings effectively, making it easy for stakeholders to interpret data insights.
## Data Sources

This project used two primary sources of data:

- CDC’s Behavioral Risk Factor Surveillance System (BRFSS): The core dataset, containing demographic and health-related data for the U.S. population. The data is organized by year, state, and health-related categories, allowing for comprehensive analysis.
- Perplexity Data (2019-2024): Additional, up-to-date statistics on Alzheimer’s prevalence, mortality, caregiving, and economic costs, enriching the analysis with current information.

## Summary of Findings
## Increasing Alzheimer’s Prevalence:

The number of Americans aged 65 and older living with Alzheimer’s has steadily increased, rising from 5.8 million in 2020 to 6.9 million in 2024. This trend is likely driven by the aging population and longer life expectancy.
## Economic and Caregiving Burden: 

Both the economic costs and the demands on unpaid caregivers have risen significantly. In 2023, for example, the estimated value of unpaid caregiving hours reached $346.6 billion, reflecting the substantial support provided by family and friends.
## Geographic Disparities in Health Metrics: 

Analysis highlighted notable geographic disparities in Alzheimer’s-related metrics. Southern states like Alabama and Mississippi exhibited higher average scores in cognitive decline and caregiving needs, indicating regional disparities in Alzheimer’s risk factors.

## Correlations with Lifestyle Factors:

Lifestyle-related indicators, such as physical activity and smoking, showed a relationship with Alzheimer’s metrics. States with lower activity rates and higher smoking rates tended to show increased Alzheimer’s-related issues, suggesting areas for potential public health intervention.

## Conclusion

This analysis underscores the pressing impact of Alzheimer’s disease on health, caregiving demands, and economic costs across the United States. The findings provide insights that can inform public health strategies, helping prioritize areas with the highest needs and develop preventive initiatives to alleviate Alzheimer’s impact on families and the healthcare system.

This document supplements the presentation with detailed descriptions of data sources, analytical techniques, and visualization strategies, reflecting the comprehensive effort invested in the project. For further exploration of findings, interactive visualizations, and additional analyses, please refer to the project repository and presentation slides.

