In [8]:
import pandas as pd
import plotly.express as px
import numpy as np

data = pd.read_csv('Big_Cities_Health_Data_Inventory.csv')


In [9]:
import re
def extract_state(place):
    match = re.search(r'\b[A-Z]{2}\b', place)
    return match.group(0) if match else None

data['State'] = data['Place'].apply(extract_state)

# Keep only rows with a parsed state code, excluding the national 'U.S. Total' row
data1 = data[data['State'].notna() & (data['Place'] != 'U.S. Total')]


Here we define a function, extract_state, that uses a regular expression to pull the two-letter state abbreviation from the Place field of our dataset. The pattern matches the first standalone run of exactly two uppercase letters, and the function returns None when no such run exists. The resulting state codes are added to the dataset as a new column named State. This cell also drops rows without a state code and rows labeled 'U.S. Total', so that data1 contains only entries that can be attributed to an individual state.
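
As a quick check of the extraction logic, the sketch below applies the same pattern to a few Place strings. The sample values are hypothetical and only assume the "City, ST" style; they are not rows read from the CSV.

```python
import re

import pandas as pd

# Hypothetical Place strings in the "City, ST" style the pattern expects.
sample = pd.DataFrame(
    {"Place": ["Seattle, WA", "Las Vegas (Clark County), NV", "U.S. Total"]}
)

def extract_state(place):
    # First standalone run of exactly two uppercase letters, or None if absent.
    match = re.search(r"\b[A-Z]{2}\b", place)
    return match.group(0) if match else None

sample["State"] = sample["Place"].apply(extract_state)
states = sample["State"].tolist()
has_state = sample["State"].notna().tolist()
print(sample)
```

Because the pattern returns the first match, a place name that contains another standalone two-letter uppercase token before the state code would be misread, so it is worth scanning the distinct State values once after extraction.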

In [10]:
# Filter the dataset to extract the states for each of the three indicators and find the common states
indicator_1 = "Percent of Adults Over Age 65 Who Received Pneumonia Vaccine"
indicator_2 = "Percent of Adults Who Received Seasonal Flu Shot"
indicator_3 = "Percent of Children Who Received Seasonal Flu Shot"

# Extracting states for each indicator
states_1 = set(data1[data1['Indicator'] == indicator_1]['State'].unique())
states_2 = set(data1[data1['Indicator'] == indicator_2]['State'].unique())
states_3 = set(data1[data1['Indicator'] == indicator_3]['State'].unique())

# Finding common states across all three indicators
common_states = states_1.intersection(states_2, states_3)
common_states


{'AZ', 'CA', 'CO', 'IL', 'NY', 'WA'}

Here we narrow the analysis to three vaccination indicators: "Percent of Adults Over Age 65 Who Received Pneumonia Vaccine", "Percent of Adults Who Received Seasonal Flu Shot", and "Percent of Children Who Received Seasonal Flu Shot". For each indicator we collect the set of states that have at least one row, then intersect the three sets. The result, stored in common_states, is the six states for which all three indicators are available.
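
The set-intersection step can be sketched on a toy table. The rows and the indicator names "A", "B", and "C" below are made-up placeholders, not values from the dataset.

```python
import pandas as pd

# Made-up rows: which states report which (placeholder) indicator.
toy = pd.DataFrame({
    "State": ["CA", "NY", "CA", "WA", "CA", "NY", "WA"],
    "Indicator": ["A", "A", "B", "B", "C", "C", "C"],
})

# One set of states per indicator.
states_by_indicator = [
    set(toy.loc[toy["Indicator"] == name, "State"])
    for name in ["A", "B", "C"]
]

# A state survives only if it appears in every set.
common = set.intersection(*states_by_indicator)
print(sorted(common))
```

Only "CA" appears under all three placeholder indicators, so it is the only state left after the intersection.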

In [11]:
# Filtering data for the required indicators
indicators = [
    "Percent of Adults Over Age 65 Who Received Pneumonia Vaccine",
    "Percent of Adults Who Received Seasonal Flu Shot",
    "Percent of Children Who Received Seasonal Flu Shot"
]

# Prepare the data
data_filtered = data1[data1['Indicator'].isin(indicators)]
data_filtered = data_filtered[data_filtered['State'].isin(common_states)]
pivot_data = data_filtered.pivot_table(index='State', columns='Indicator', values='Value', aggfunc='mean')


Here we restrict the dataset to the three target indicators and the common states identified previously. We then pivot the result into a table with one row per state and one column per indicator. Each cell holds the mean of Value over every matching row for that state, which may span several cities and years. This layout makes it easy to compare vaccination rates across the selected states and indicators.
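
To make the aggregation concrete, here is a minimal sketch of pivot_table with aggfunc='mean' on made-up numbers. The short indicator names "Flu" and "Pneumonia" are placeholders for illustration.

```python
import pandas as pd

# Made-up values; two "CA" rows share an indicator and get averaged.
toy = pd.DataFrame({
    "State": ["CA", "CA", "NY", "NY"],
    "Indicator": ["Flu", "Flu", "Flu", "Pneumonia"],
    "Value": [40.0, 50.0, 30.0, 60.0],
})

pivot = toy.pivot_table(
    index="State", columns="Indicator", values="Value", aggfunc="mean"
)
print(pivot)
print(list(pivot.columns))  # columns come out in sorted order
```

A state with no rows for an indicator gets NaN in that cell, and the columns are sorted alphabetically. The sort order matters here because the heatmap cell relabels the columns by position.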

In [12]:
import plotly.express as px

# Create the heatmap
fig = px.imshow(pivot_data,
                labels=dict(x="Indicator", y="State", color="Vaccination Rate"),
                x=['Pneumonia Vaccine (65+)',
                   'Flu Shot (Adults)',
                   'Flu Shot (Children)'],
                title="Vaccination Rates by State and Indicator"
               )

fig.update_xaxes(side="bottom")
fig.show()



The heatmap shows average vaccination rates across the six selected states for three indicators: pneumonia vaccination among adults over 65, seasonal flu shots among adults, and seasonal flu shots among children. The color gradient runs from purple (lower rates) to yellow (higher rates). The over-65 pneumonia column generally shows the highest rates, which is consistent with health initiatives that target this more vulnerable group, although the heatmap alone cannot establish the cause. Note also that this column measures a different vaccine from the two flu-shot columns, so comparisons across columns should be read with care.

Darker cells mark lower rates. Colorado and Arizona show darker shades in some cells, for example children in Colorado, which points to possible areas for public health intervention. The differences between states may reflect variation in health policy, healthcare access, and public health priorities, but these values are averages over the cities reported for each state rather than statewide figures. With that caveat, the comparison can help policymakers and health officials decide where efforts to improve coverage are most needed.

In [13]:

# Keep pneumonia and influenza mortality rows for the common states.
# 'State' was already populated by extract_state above, so it is not
# re-derived here; assigning it on a filtered slice is what triggered
# pandas' SettingWithCopyWarning.
data_filtered = data1[
    (data1['State'].isin(common_states)) &
    (data1['Indicator Category'] == 'Infectious Disease') &
    (data1['Indicator'] == 'Pneumonia and Influenza Mortality Rate (Age-Adjusted; Per 100,000 people)')
]

# Aggregate data by state
state_data = data_filtered.groupby('State')['Value'].mean().reset_index()

# Create the bubble chart
bubble_chart = px.scatter(
    state_data,
    x='State',
    y='Value',
    color='State',
    size='Value',
    hover_name='State',
    title='Pneumonia and Influenza Mortality Rate by State (Per 100,000 people)'
)
bubble_chart.update_layout(xaxis_title="State", yaxis_title="Average Mortality Rate")
bubble_chart.show()



This bubble chart shows the average age-adjusted mortality rate from pneumonia and influenza, per 100,000 people, for the selected U.S. states. Each state is a distinctly colored bubble whose height on the y-axis and size both encode its average rate. Washington (WA), the light blue bubble near the bottom of the graph, has the lowest rate, which might reflect effective health interventions or higher population immunity. Colorado (CO), the green bubble, has the highest rate, which suggests room for improvement in public health strategy or healthcare access. Arizona (AZ) and Illinois (IL) also show substantial rates, though lower than Colorado's.

The chart highlights geographical differences in health outcomes, which could be influenced by factors such as state-specific healthcare policies, vaccination rates, and the prevalence of underlying health conditions. The chart itself does not isolate any of these factors, so the explanations above are hypotheses rather than findings. Data of this kind still helps health officials and policymakers identify trends, allocate resources, and tailor public health initiatives to reduce mortality from these diseases.
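
The lowest and highest states can also be read off programmatically rather than from the chart. The sketch below uses made-up state codes and values, not the notebook's results, and mirrors the groupby-mean step with idxmin and idxmax added.

```python
import pandas as pd

# Made-up codes and rates; "AA" has two rows that are averaged to 15.0.
toy = pd.DataFrame({
    "State": ["AA", "AA", "BB", "CC"],
    "Value": [10.0, 20.0, 12.0, 30.0],
})

# Same aggregation as the mortality cell: mean Value per state.
state_means = toy.groupby("State")["Value"].mean().reset_index()

# Row labels of the smallest and largest means, mapped back to state codes.
lowest = state_means.loc[state_means["Value"].idxmin(), "State"]
highest = state_means.loc[state_means["Value"].idxmax(), "State"]
print(lowest, highest)
```

Applied to state_data, the same two lines would return the states that the chart places at the bottom and top.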