# Healthy Aging

Aging and death pervade conversations and societies alike. Our relationship with death combines acceptance with a desire to postpone it.

The Department of Health and Human Services (HHS) has provided the public with a dataset on human activity and health collected from 2015 to 2021. I will analyze this data to look for interesting patterns and to answer questions along the way.

In [1]:
from matplotlib import pyplot as plt
from scipy.stats import chi2_contingency

import pandas as pd
import numpy as np
import seaborn as sns
import itertools

In [4]:
aging_data = pd.read_csv('Alzheimer_s_Disease_and_Healthy_Aging_Data.csv', encoding='utf-8', low_memory=False)

In [5]:
aging_data.head()

Unnamed: 0,RowId,YearStart,YearEnd,LocationAbbr,LocationDesc,Datasource,Class,Topic,Question,Response,...,QuestionID,ResponseID,LocationID,StratificationCategoryID1,StratificationID1,StratificationCategoryID2,StratificationID2,StratificationCategoryID3,StratificationID3,Report
0,BRFSS~2015~2015~9002~Q43~TOC11~AGE~RACE,2015,2015,MDW,Midwest,BRFSS,Overall Health,Arthritis among older adults,Percentage of older adults ever told they have...,,...,Q43,,9002,AGE,65PLUS,RACE,NAA,,,
1,BRFSS~2015~2015~66~Q43~TOC11~AGE~GENDER,2015,2015,GU,Guam,BRFSS,Overall Health,Arthritis among older adults,Percentage of older adults ever told they have...,,...,Q43,,66,AGE,5064,GENDER,FEMALE,,,
2,BRFSS~2015~2015~9002~Q43~TOC11~AGE~RACE,2015,2015,MDW,Midwest,BRFSS,Overall Health,Arthritis among older adults,Percentage of older adults ever told they have...,,...,Q43,,9002,AGE,AGE_OVERALL,RACE,BLK,,,
3,BRFSS~2015~2015~16~Q27~TMC03~AGE~GENDER,2015,2015,ID,Idaho,BRFSS,Mental Health,Lifetime diagnosis of depression,Percentage of older adults with a lifetime dia...,,...,Q27,,16,AGE,5064,GENDER,MALE,,,
4,BRFSS~2015~2015~18~Q43~TOC11~AGE~OVERALL,2015,2015,IN,Indiana,BRFSS,Overall Health,Arthritis among older adults,Percentage of older adults ever told they have...,,...,Q43,,18,AGE,AGE_OVERALL,OVERALL,OVERALL,,,


This dataset looks daunting at first, but it is manageable once we know which columns matter. The important columns to note are the following:

1) YearStart, YearEnd - These columns indicate the first and last year of the period in which that row of data was collected.
2) Class, Topic, Question - These three columns specify, in increasing detail, what a given row of data is measuring.
3) DataValue columns - These columns provide numeric answers (and units) to the questions identified by the columns above.
4) ConfidenceLimit columns - These columns tell us the upper and lower bounds that HHS has set on the "true" answers to the questions asked. This may be confusing to some readers, so I explain the statistics behind it below.
5) Stratification columns - These columns give demographic details (such as age group, gender, or race) about who is being sampled for the question in the row.
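
As a rough sketch of how these column groups could be located programmatically, the helper below matches column names against a keyword while ignoring case and underscores. The helper name `columns_matching` and the toy column names are assumptions made for illustration. The preview above truncates the middle columns, so the real names should be checked against `aging_data.columns` before relying on them.

```python
def columns_matching(names, keyword):
    """Return the names that contain `keyword`, ignoring case and underscores."""
    needle = keyword.lower().replace("_", "")
    return [name for name in names if needle in name.lower().replace("_", "")]


# HYPOTHETICAL column names standing in for the real dataset's columns.
toy_columns = [
    "YearStart", "YearEnd", "Class", "Topic", "Question",
    "Data_Value", "Data_Value_Unit",
    "Low_Confidence_Limit", "High_Confidence_Limit",
    "StratificationCategoryID1", "StratificationID1",
]

value_cols = columns_matching(toy_columns, "DataValue")
limit_cols = columns_matching(toy_columns, "ConfidenceLimit")
strat_cols = columns_matching(toy_columns, "Stratification")

print(value_cols)
print(limit_cols)
print(strat_cols)
```

With the real data, the same helper could be called as `columns_matching(aging_data.columns, "Stratification")`, since it only needs an iterable of strings.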

**Statistical Explanation for 4)**

In statistics, when we work with a sample of answers, we often describe upper and lower bounds for the "actual" numeric answer to a question. This is useful for studying the US population, because it is impractical to ask every single person in such a large population. Together, the two bounds form what is known as a confidence interval.

Confidence intervals usually come with a percentage, the confidence level, which tells us how *confident* we are that the actual answer falls in that *interval*. When the percentage is not given, as in this dataset, it is customary to assume a value of 95%. Thus, we will treat the confidence intervals provided as 95% confidence intervals (CIs). Strictly speaking, this means that if the sampling were repeated many times, about 95% of the intervals built this way would contain the true answer; informally, we say we are 95% confident that the true answer falls in the given interval.

The true answer here is the answer for the entire population being studied, as opposed to the answer computed from the collected data, which applies only to the sample that was studied.
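
To make the idea concrete, here is a minimal sketch of a 95% confidence interval for a sample percentage, using the textbook normal-approximation formula. This is an illustration only: the function names and the numbers (300 of 1000 respondents) are hypothetical, and the intervals in the dataset were presumably computed with survey-specific methods that this sketch does not reproduce. The second function simulates many samples from a population with a known true percentage and reports how often the interval contains it.

```python
import math
import random


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation interval for a sample proportion.

    z=1.96 corresponds to a 95% confidence level.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 300 of 1000 respondents answer "yes".
low, high = proportion_ci(300, 1000)
print(f"estimate 30.0%, 95% CI {low:.1%} to {high:.1%}")


def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n yes/no answers from the population.
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = proportion_ci(successes, n)
        hits += lo <= true_p <= hi
    return hits / trials


# Should land close to 0.95 if the intervals behave as advertised.
print(coverage(true_p=0.3, n=500, trials=1000))
```

The hypothetical sample gives an interval of roughly 27.2% to 32.8% around the 30% estimate. The simulation shows the repeated-sampling reading of "95% confidence": most, but not all, of the simulated intervals capture the true percentage.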

## -First Question-