

# 0 Introduction
### Scenario
HailMary Roofing Company, LLC is looking to expand their business into new states. Repair and replacement from hail damage is their specialty and makes up a significant portion of their business. They've hired a data analyst to identify the best US states for their new roofing teams.

### Summary
The NOAA Storm Events Database is a comprehensive dataset maintained by the National Oceanic and Atmospheric Administration (NOAA). It records severe weather events across the United States, including hail, tornadoes, floods, and other significant storms. Each record includes the date, location (state and county), event magnitude (e.g., hail size), and reported damages. Because the data spans multiple years, it is a valuable resource for analyzing historical weather patterns and assessing the frequency and impact of severe storms.

This dataset was chosen because hail damage is a major driver of roofing repairs and replacements, which makes hail event records directly relevant to identifying expansion opportunities for the roofing company. The state- and county-level geographic data allows for targeted analysis of high-risk areas, and the historical coverage enables trend analysis to identify regions with consistent hail activity, helping the company make informed decisions about where to expand.

The American Community Survey (ACS) is a nationwide survey conducted by the U.S. Census Bureau that provides detailed demographic, social, economic, and housing data every year. The ACS housing data includes statistics on housing occupancy, types of housing units, home values, mortgage status, rent, and housing costs. These statistics describe housing conditions and trends at the local level, such as counties and cities, and can be used to select locations that would support business expansion. For counties with small populations, the data may be suppressed or not reported because the sample size is too small to produce statistically reliable results. Generally, only geographies that meet a population threshold (e.g., at least 65,000 people) are included in the detailed ACS reporting, while smaller areas may have limited or no data. For this reason, this analysis looks at only selected counties. Additional information about the dataset can be found here: https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html

The Texas county boundaries GeoJSON was obtained from https://gis-txdot.opendata.arcgis.com/datasets/texas-county-boundaries/explore.


# 1 Setup

### Import libraries

In [None]:
# Import libraries
import pandas as pd          # Data manipulation
import numpy as np           # Numerical operations
import matplotlib.pyplot as plt  # Plotting
import seaborn as sns        # Data visualization
import os                    # File path operations
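import sweetviz as sv        # Automated exploratory data analysis reports
import folium                # Interactive maps
import json                  # GeoJSON parsing
from folium import Choropleth  # Choropleth map layer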

In [None]:
# Define path to data folder
data_path = r"C:\Users\nsmith\OneDrive - Georgia Poultry Laboratory Network\CareerFoundry\02 - Data Immersion\Achievement 6\01 Data"

In [None]:
county_geo = os.path.join(data_path, "Texas_County_Boundaries.geojson")

In [None]:
df_path = os.path.join(data_path, "merged_data.csv")

In [None]:
# Load dataframe
df = pd.read_csv(df_path)

# 2 Examine Data

In [None]:
df.columns

In [None]:
# Load the Texas county boundaries GeoJSON
with open(county_geo, "r", encoding="utf-8") as f:
    texas_geo = json.load(f)

In [None]:
# Print county names from GeoJSON
for feature in texas_geo['features']:
    print(feature['properties']['CNTY_NM'])

# 4 Wrangle Data

#### Clean the county name column in the housing/storm data

In [None]:
# Clean county names in df (housing and storm data)
df['County'] = df['County'].str.strip().str.upper()

In [None]:
# Sorted list of unique counties (missing values dropped so the sort does not fail)
print(sorted(df['County'].dropna().unique()))

#### Clean the county name in the geojson

In [None]:
# Clean county names in the GeoJSON
for feature in texas_geo['features']:
    feature['properties']['CNTY_NM'] = feature['properties']['CNTY_NM'].strip().upper()

In [None]:
# Print county names from GeoJSON
for feature in texas_geo['features']:
    print(feature['properties']['CNTY_NM'])
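
Before mapping, it can help to confirm that the cleaned names in `df['County']` and the GeoJSON `CNTY_NM` property actually line up, because the choropleth is keyed on this name. The following is a minimal sketch, not part of the original notebook; the helper name `compare_county_names` and the sample values are hypothetical.

```python
def compare_county_names(data_names, geo):
    """Return (names only in the data, names only in the GeoJSON), both sorted."""
    # Normalize both sides the same way as the cleaning cells above
    data_set = {str(n).strip().upper() for n in data_names}
    geo_set = {
        f["properties"]["CNTY_NM"].strip().upper() for f in geo["features"]
    }
    return sorted(data_set - geo_set), sorted(geo_set - data_set)


# Hypothetical sample standing in for df['County'] and texas_geo
sample_names = ["HARRIS", " dallas ", "DE WITT"]
sample_geo = {
    "features": [
        {"properties": {"CNTY_NM": "Harris"}},
        {"properties": {"CNTY_NM": "Dallas"}},
        {"properties": {"CNTY_NM": "DeWitt"}},
        {"properties": {"CNTY_NM": "Travis"}},
    ]
}

only_in_data, only_in_geo = compare_county_names(sample_names, sample_geo)
print("In data but not in GeoJSON:", only_in_data)
print("In GeoJSON but not in data:", only_in_geo)
```

In the notebook this would be called as `compare_county_names(df['County'], texas_geo)`. Names that appear only in the GeoJSON are expected for counties absent from the merged data; names that appear only in the data usually point to a spelling or spacing difference that would leave the county unshaded.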

# 5 Create a Map

#### Create a choropleth map with the joined hail storm counts

In [None]:
# Create a folium map centered on Texas
m = folium.Map(location=[31.0, -99.0], zoom_start=6)

# Add the choropleth layer
Choropleth(
    geo_data=texas_geo,
    name="choropleth",
    data=df,
    columns=["County", "Total_Storm_Events"],  # Key column, then the value column used for shading
    key_on="feature.properties.CNTY_NM",
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Number of hail storm events",
).add_to(m)

# Add layer control
folium.LayerControl().add_to(m)

# Show the map
m

The choropleth map above shows the total number of reported hail storm events in each county over five years (2020-2024). Many Texas counties show no reported hail events (some may simply be absent from the merged data, which covers only selected counties), while others form clusters with high counts. These clusters may correspond to heavily populated counties, such as those surrounding Houston and Dallas. The housing map below helps test whether that is the case.
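
One way to check whether hail reports track housing density is a rank correlation between the two mapped columns. The sketch below is illustrative only: the rows in `sample` are made-up numbers, not values from `merged_data.csv`, and the helper `rank_correlation` is hypothetical. It computes a Spearman-style coefficient by ranking each column and taking the Pearson correlation of the ranks.

```python
import pandas as pd

EVENTS_COL = "Total_Storm_Events"
HOUSING_COL = "HOUSING OCCUPANCY Total housing units"


def rank_correlation(frame, x_col, y_col):
    """Spearman-style correlation: Pearson correlation of the column ranks."""
    pair = frame[[x_col, y_col]].dropna()
    return pair[x_col].rank().corr(pair[y_col].rank())


# Made-up illustrative values, NOT taken from merged_data.csv
sample = pd.DataFrame({
    "County": ["A", "B", "C", "D", "E"],
    EVENTS_COL: [2, 5, 9, 14, 30],
    HOUSING_COL: [10_000, 25_000, 40_000, 90_000, 400_000],
})

rho = rank_correlation(sample, EVENTS_COL, HOUSING_COL)
print(f"Rank correlation: {rho:.2f}")
```

A coefficient near 1 on the real data would suggest that reported events rise with housing units, which may reflect more people available to report hail rather than more hail.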

In [None]:
# Create a folium map centered on Texas
m = folium.Map(location=[31.0, -99.0], zoom_start=6)

# Add the choropleth layer
Choropleth(
    geo_data=texas_geo,
    name="choropleth",
    data=df,
    columns=["County", "HOUSING OCCUPANCY Total housing units"],  # Key column, then the value column used for shading
    key_on="feature.properties.CNTY_NM",
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Number of housing units",
).add_to(m)

# Add layer control
folium.LayerControl().add_to(m)

# Show the map
m

The housing units map above addresses the previous research question: whether counties with many reported hail events also have many housing units. The next research question is which Texas counties combine a high number of hail events with a high number of single-family homes.
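
A minimal sketch of that next step, using the two columns mapped above: flag counties that fall in the top portion of both measures. The 0.75 quantile cutoff, the helper name `high_on_both`, and the sample rows are assumptions for illustration; a single-family-home column from the ACS data would replace the total housing units column once identified.

```python
import pandas as pd

EVENTS = "Total_Storm_Events"
HOUSING = "HOUSING OCCUPANCY Total housing units"


def high_on_both(frame, col_a, col_b, q=0.75):
    """Return rows at or above the q-th quantile of both columns."""
    cut_a = frame[col_a].quantile(q)
    cut_b = frame[col_b].quantile(q)
    mask = (frame[col_a] >= cut_a) & (frame[col_b] >= cut_b)
    return frame.loc[mask]


# Made-up illustrative values, NOT taken from merged_data.csv
sample = pd.DataFrame({
    "County": ["A", "B", "C", "D", "E"],
    EVENTS: [1, 4, 12, 20, 3],
    HOUSING: [5_000, 300_000, 150_000, 400_000, 8_000],
})

candidates = high_on_both(sample, EVENTS, HOUSING)
print(candidates["County"].tolist())
```

The quantile cutoff is a design choice rather than a rule; a stricter or looser value changes how many counties make the shortlist.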