In [18]:
import pandas as pd

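This notebook combines three FEMA datasets (disaster declarations, housing assistance for owners, and housing assistance for renters) into a single dataset that will later be merged with the other project data.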

In [19]:
dec_df = pd.read_csv('DisasterDeclarationsSummaries 5.26.2025.csv', low_memory=False)
owners_df = pd.read_csv('HousingAssistanceOwners.csv', low_memory=False)
renters_df = pd.read_csv('HousingAssistanceRenters.csv', low_memory=False)

In [20]:
#obtain the incident date and year to merge with the other FEMA datasets and the hurricane dataset
dec_df['Date']=pd.to_datetime(dec_df['incidentBeginDate'],yearfirst=True)
dec_df['year']=dec_df['Date'].dt.year
dec_df.drop(columns=['incidentBeginDate'],inplace=True)
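
The date handling above can be tried on a few toy rows. This is a minimal sketch, not the notebook's own data: the ISO 8601 timestamp format and the `toy` dataframe are assumptions made for illustration, and `errors='coerce'` is an optional choice that turns unparseable or missing dates into `NaT` instead of raising.

```python
import pandas as pd

# Toy rows; the timestamp format here is an assumption about the raw file.
toy = pd.DataFrame({
    'incidentBeginDate': ['2005-08-29T00:00:00.000Z',
                          '2017-09-04T00:00:00.000Z',
                          None],
})

# Parse to datetimes; missing or malformed values become NaT.
toy['Date'] = pd.to_datetime(toy['incidentBeginDate'], errors='coerce')

# Extract the calendar year; rows with NaT get NaN.
toy['year'] = toy['Date'].dt.year
print(toy[['Date', 'year']])
```

Because a missing date yields `NaN`, the `year` column becomes floating point whenever any date is missing, which is worth checking before filtering on it.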

In [21]:
#keep only major disaster declarations (declarationType 'DR') for hurricanes from 2002 onward
hur_df = dec_df[(dec_df['incidentType']=='Hurricane')&(dec_df['year']>=2002)&(dec_df['declarationType']=='DR')]
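
The same three-condition filter can be written with named boolean masks, which some readers find easier to audit. A small sketch on made-up rows (the `toy` dataframe and mask names are illustrative, not part of the notebook):

```python
import pandas as pd

# Made-up declarations covering each way a row can fail the filter.
toy = pd.DataFrame({
    'incidentType': ['Hurricane', 'Flood', 'Hurricane', 'Hurricane'],
    'declarationType': ['DR', 'DR', 'EM', 'DR'],
    'year': [2005, 2010, 2017, 1999],
})

# One mask per condition, combined with & (element-wise AND).
is_hurricane = toy['incidentType'] == 'Hurricane'
is_major = toy['declarationType'] == 'DR'
in_window = toy['year'] >= 2002

# .copy() makes the result independent of the original dataframe.
toy_hur = toy[is_hurricane & is_major & in_window].copy()
print(toy_hur)
```

Only the first row passes all three conditions; the others fail on incident type, declaration type, and year respectively.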

In [22]:
#count hurricane declaration rows by state; each row is one designated area, so this reflects the total number of counties declared for hurricanes during the timeframe
hur_count = hur_df.groupby(by='state').count()

In [23]:
#obtain a list of the ten states with the most hurricane declarations
top = hur_count['incidentType'].sort_values(ascending=False).head(10)
top_ten = top.index.tolist()
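
As a possible shortcut, `value_counts` counts rows per state and sorts in descending order in one step. A sketch on toy data (the `toy` dataframe is hypothetical, and `head(2)` stands in for `head(10)`):

```python
import pandas as pd

# Toy declaration rows: one row per declared county.
toy = pd.DataFrame({'state': ['FL', 'FL', 'LA', 'FL', 'TX', 'LA']})

# value_counts returns counts sorted from most to least frequent.
toy_top = toy['state'].value_counts().head(2)
toy_top_states = toy_top.index.tolist()
print(toy_top_states)
```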

In [24]:
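#keep only the declaration columns needed downstream; .copy() makes this an independent dataframe
hur_df_final = hur_df[['disasterNumber','declarationTitle','state','Date','fipsStateCode','fipsCountyCode','placeCode','year','designatedArea']].copy()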

In [25]:
#select FEMA-inspected damage counts, by dollar bracket, for owner-occupied homes
owners_df_final=owners_df[['disasterNumber','state','county','validRegistrations',
                           'femaInspectedDamageBetween1And10000','femaInspectedDamageBetween10001And20000',
                           'femaInspectedDamageBetween20001And30000','femaInspectedDamageGreaterThan30000']]

In [26]:
#select damage counts, by severity level, for renter-occupied homes
renters_df_final=renters_df[['disasterNumber','state','county','validRegistrations','totalWithModerateDamage',
                             'totalWithMajorDamage','totalWithSubstantialDamage']]

In [27]:
#sum number of owner-occupied houses with damage by county and damage level
#output column names match the dollar brackets of the source columns
owners_group = owners_df_final.groupby(by=['disasterNumber','state','county']).agg(validReg_own=('validRegistrations','sum'),
                                                                                  DamageBetween1and10000=('femaInspectedDamageBetween1And10000','sum'),
                                                                                  DamageBetween10001and20000=('femaInspectedDamageBetween10001And20000','sum'),
                                                                                  DamageBetween20001and30000=('femaInspectedDamageBetween20001And30000','sum'),
                                                                                  DamageGreaterThan30000=('femaInspectedDamageGreaterThan30000','sum'))

In [28]:
#sum number of rental units by county and damage level
renters_group=renters_df_final.groupby(by=['disasterNumber','state','county']).agg(validReg_rent=('validRegistrations','sum'),
                                                                                  ModerateDamage_rent=('totalWithModerateDamage','sum'),
                                                                                  MajorDamage_rent=('totalWithMajorDamage','sum'),
                                                                                  SubstDamage_rent=('totalWithSubstantialDamage','sum'))
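
Named aggregation, used in the two cells above, can be illustrated on a small example. This sketch assumes a county can appear on several rows for the same disaster; the `toy` values are invented. Passing `as_index=False` keeps the grouping keys as ordinary columns, which is one way to avoid a separate `reset_index` call.

```python
import pandas as pd

# Invented rows: Orleans appears twice for the same disaster.
toy = pd.DataFrame({
    'disasterNumber': [1603, 1603, 1603],
    'state': ['LA', 'LA', 'LA'],
    'county': ['Orleans (Parish)', 'Orleans (Parish)', 'Jefferson (Parish)'],
    'validRegistrations': [10, 5, 7],
    'totalWithMajorDamage': [2, 1, 0],
})

# Each keyword is new_column=(source_column, aggregation function).
toy_group = (
    toy.groupby(['disasterNumber', 'state', 'county'], as_index=False)
       .agg(validReg_rent=('validRegistrations', 'sum'),
            MajorDamage_rent=('totalWithMajorDamage', 'sum'))
)
print(toy_group)
```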

In [29]:
renters_group.reset_index(inplace=True)
owners_group.reset_index(inplace=True)

In [30]:
#merge owners and renters dataframes
assess_combined = pd.merge(owners_group, renters_group, how='outer', on=['disasterNumber','state','county'])
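
An outer merge keeps counties that appear in only one of the two tables and fills the missing side with NaN. One way to see how many rows fall into each case is the `indicator` option of `pd.merge`. A sketch with hypothetical `owners` and `renters` frames:

```python
import pandas as pd

keys = ['disasterNumber', 'state', 'county']

# Hypothetical aggregated tables; county B is the only one present in both.
owners = pd.DataFrame({'disasterNumber': [1, 1], 'state': ['FL', 'FL'],
                       'county': ['A', 'B'], 'validReg_own': [3, 4]})
renters = pd.DataFrame({'disasterNumber': [1, 1], 'state': ['FL', 'FL'],
                        'county': ['B', 'C'], 'validReg_rent': [5, 6]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
combined = pd.merge(owners, renters, how='outer', on=keys, indicator=True)
counts = combined['_merge'].value_counts()
print(counts)
```

Rows marked `left_only` or `right_only` carry NaN in the columns from the other table, so downstream sums may need `fillna(0)` depending on how missing counts should be interpreted.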

In [31]:
#change 'designatedArea' to 'county' to merge with owner and renter damage dataframes
#assigning the result (instead of inplace=True) avoids modifying a slice of hur_df
hur_df_final = hur_df_final.rename(columns={'designatedArea':'county'})


In [32]:
#merge based on disaster number to ensure damage is associated with the correct storm
fema_df = pd.merge(hur_df_final, assess_combined, how='inner', on=['disasterNumber','state','county'])
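
An inner merge silently drops declarations that have no matching damage rows, and duplicated keys on either side would multiply rows. The sketch below shows one possible pre-merge check, using invented `decl` and `damage` frames; `validate='one_to_one'` asks pandas to raise an error if the keys are not unique on both sides.

```python
import pandas as pd

keys = ['disasterNumber', 'state', 'county']

# Invented data: county B has a declaration but no damage record.
decl = pd.DataFrame({'disasterNumber': [1, 1, 2], 'state': ['FL', 'FL', 'LA'],
                     'county': ['A', 'B', 'C']})
damage = pd.DataFrame({'disasterNumber': [1, 2], 'state': ['FL', 'LA'],
                       'county': ['A', 'C'], 'validReg_own': [3, 9]})

# Confirm the keys are unique before merging.
has_duplicates = decl.duplicated(subset=keys).any()

# validate raises pandas.errors.MergeError if either side has repeated keys.
merged = pd.merge(decl, damage, how='inner', on=keys, validate='one_to_one')

# Count declarations lost because no damage record matched.
dropped = len(decl) - len(merged)
print(f'{dropped} declaration row(s) had no damage record')
```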

In [33]:
#create 'fipsCode' to merge with other dataframes later
#zero-pad the state code to 2 digits and the county code to 3 digits, then concatenate
fema_df['fipsCode'] = (fema_df['fipsStateCode'].astype(str).str.zfill(2)
                       + fema_df['fipsCountyCode'].astype(str).str.zfill(3))
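
Since the FIPS code is the merge key for later steps, a quick format check can catch padding problems early. The following is a sketch on toy codes, assuming the state and county codes are stored as integers; it builds the five-character code with `str.zfill` and confirms every value is exactly five digits.

```python
import pandas as pd

# Toy integer codes (assumed storage type).
toy = pd.DataFrame({'fipsStateCode': [1, 12, 48],
                    'fipsCountyCode': [1, 86, 201]})

# Pad state to 2 digits and county to 3 digits, then concatenate.
toy['fipsCode'] = (toy['fipsStateCode'].astype(str).str.zfill(2)
                   + toy['fipsCountyCode'].astype(str).str.zfill(3))

# Every code should consist of exactly five digits.
is_valid = toy['fipsCode'].str.fullmatch(r'\d{5}')
print(toy['fipsCode'].tolist(), bool(is_valid.all()))
```

If the source codes were read as floats (for example because of missing values), `astype(str)` would produce strings such as `'1.0'` and this check would flag them.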

In [34]:
#obtain latitude and longitude for center of population for later use with determining distance from hurricane
geo_df=pd.read_csv('geo_data.txt')
geo_df.head()

Unnamed: 0,STATEFP,COUNTYFP,COUNAME,STNAME,POPULATION,LATITUDE_county,LONGITUDE_county
0,1,1,Autauga,Alabama,58805,32.500194,-86.487813
1,1,3,Baldwin,Alabama,231767,30.537396,-87.761478
2,1,5,Barbour,Alabama,25223,31.843981,-85.301306
3,1,7,Bibb,Alabama,22293,33.032236,-87.136826
4,1,9,Blount,Alabama,59134,33.954604,-86.592667


In [35]:
#create a fipsCode variable to merge the geographic information with FEMA data
#zero-pad the state code to 2 digits and the county code to 3 digits, then concatenate
geo_df['fipsCode'] = (geo_df['STATEFP'].astype(str).str.zfill(2)
                      + geo_df['COUNTYFP'].astype(str).str.zfill(3))
geo_df.head()

Unnamed: 0,STATEFP,COUNTYFP,COUNAME,STNAME,POPULATION,LATITUDE_county,LONGITUDE_county,fipsCode
0,1,1,Autauga,Alabama,58805,32.500194,-86.487813,01001
1,1,3,Baldwin,Alabama,231767,30.537396,-87.761478,01003
2,1,5,Barbour,Alabama,25223,31.843981,-85.301306,01005
3,1,7,Bibb,Alabama,22293,33.032236,-87.136826,01007
4,1,9,Blount,Alabama,59134,33.954604,-86.592667,01009


In [36]:
#create final df to export for combination with other variables for supervised learning
export_df=pd.merge(fema_df,geo_df,how='inner',on=['fipsCode'])

In [37]:
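#drop columns not needed in the export, including the leftover index column from geo_data.txt
export_df.drop(columns=['placeCode','Unnamed: 0','STATEFP','COUNTYFP','COUNAME','STNAME','POPULATION'],inplace=True)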

In [38]:
#create a csv of combined FEMA data
export_df.to_csv('fema_final.csv', index=False)
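
One thing to watch when this CSV is read back in a later notebook: `pd.read_csv` infers numeric types by default, so a code like `01001` would come back as the integer `1001`. A sketch of the round trip using an in-memory buffer in place of `fema_final.csv` (the `toy` frame is illustrative):

```python
import io
import pandas as pd

# Toy export with zero-padded FIPS codes stored as strings.
toy = pd.DataFrame({'fipsCode': ['01001', '12086'], 'validReg_own': [3, 4]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default type inference turns the codes into integers.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Forcing the column to str preserves the leading zeros.
buffer.seek(0)
safe = pd.read_csv(buffer, dtype={'fipsCode': str})

print(naive['fipsCode'].tolist(), safe['fipsCode'].tolist())
```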

In [39]:
#census_df=pd.read_excel("census_data.xlsx")