# Research Question

## How does the prevalence of asthma and Chronic Obstructive Pulmonary Disease (COPD) vary by state in the United States, and are there any significant geographic patterns or disparities?

# Problem Statement

The prevalence of asthma and Chronic Obstructive Pulmonary Disease (COPD) presents significant public health concerns in the United States, with variations observed across different states. Understanding the geographic disparities in these respiratory conditions can provide insights into environmental, socioeconomic, and healthcare access factors that contribute to their prevalence. Given the increasing burden of respiratory diseases on health systems, it is essential to identify patterns that could inform targeted interventions and policy changes. This analysis hypothesizes that certain states will exhibit higher rates of asthma and COPD due to factors such as air quality, socioeconomic status, and access to healthcare services.

# Data Description

United States Chronic Disease Indicators (CDI)
Last Updated: March 9, 2024
Source: CDC's Division of Population Health

The U.S. Chronic Disease Indicators dataset provides a set of 115 public health indicators developed through a collaborative effort involving the CDC, the Council of State and Territorial Epidemiologists, and the National Association of Chronic Disease Directors. These indicators enable consistent collection, reporting, and analysis of chronic disease data at the state and territorial levels. The dataset is designed to support public health practice, offering state-specific data while serving as a gateway to additional health-related data and resources.

The dataset is publicly accessible in multiple formats, including CSV, RDF, JSON, and XML. I'll be using the CSV version.

# Code
## Examine the dataset
Methodology: load the dataset, check the data types, identify missing values, drop columns that are entirely empty, and carry out an initial analysis of the data.
### Import libraries

In [None]:
# Import the libraries
import numpy as np                  # Numerical Python
import pandas as pd                 # Data Analysis
import matplotlib.pyplot as plt     # Plotting
import seaborn as sns               # Statistical Data Visualization

# Let's make sure pandas returns all the rows and columns for the dataframe
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Force pandas to display full numbers instead of scientific notation
#pd.options.display.float_format = '{:.0f}'.format

# Library to suppress warnings
import warnings
warnings.filterwarnings('ignore')

## Read the dataset

In [None]:
# Read the dataset
df = pd.read_csv('U.S._Chronic_Disease_Indicators.csv')

## Understanding the dataset

In [None]:
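# Checking first elements of the DataFrame with `.head()` method
# (a notebook cell only renders its last expression, so display() is used explicitly)
display(df.head())

# Checking how many columns are in the dataframe, and listing their names
print(f'There are {len(df.columns)} columns: {list(df.columns)}')

# Checking last elements of the DataFrame with `.tail()` method
display(df.tail())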

In [None]:
# display the dimensions of the data
# This is the number of rows and columns in the data
# Syntax: DataFrame.shape
df.shape

In [None]:
# Let's check the basic information about the dataset
# Syntax: DataFrame.info()
df.info()
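
The methodology also calls for checking missing values and dropping empty columns. A minimal sketch of those two steps is shown below on a small toy DataFrame. The column names (`State`, `Value`, `Footnote`) are illustrative assumptions and are not taken from the CDI file; the same calls could be applied to `df` once its actual columns are known.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real table; column names here are hypothetical
toy = pd.DataFrame({
    'State': ['Alabama', 'Alaska', 'Arizona'],
    'Value': [9.8, np.nan, 8.7],
    'Footnote': [np.nan, np.nan, np.nan],   # entirely empty column
})

# Count missing values per column, most incomplete first
missing_counts = toy.isna().sum().sort_values(ascending=False)
print(missing_counts)

# Drop only the columns in which every value is missing
toy_clean = toy.dropna(axis=1, how='all')
print(toy_clean.columns.tolist())
```

Using `how='all'` removes only columns with no data at all, so partially filled columns such as `Value` are kept for later inspection.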

## Observations of the Data Set
