# Galveston Testbed Notebook

# Background:
Galveston Island is a barrier island located southeast of Houston, TX. The region has been repeatedly impacted by coastal storms and flood hazards, and has a population that is racially and ethnically diverse, with a wide income distribution. This testbed was created to provide an opportunity to: 

a)	Investigate multiple coastal hazards, including surge, waves, inundation, and wind.

b)	Consider interdependent infrastructure systems including buildings, transportation, and power.

c)	Leverage historical social-scientific data to inform population dislocation and recovery modeling.

d)	Evaluate hybrid metrics of community resilience, such as those that require coupled modeling between social and physical systems.


<img src="Figures for Pyincore Notebook/Galveston.png">
<h1><center>Galveston Island, Texas, USA</center></h1> 

The current notebook is a **WORK-IN-PROGRESS** that consists of the following modules:

a)	Flood Surge, Wave, and Inundation Models 

b)	Galveston Building Damage Analysis 

c)	Galveston Household Unit Allocation

d)	Galveston Population Dislocation Model based on Hurricane Ike

Other modules, such as road and bridge damage analysis, power system analysis, and network analysis of the connectivity of building clusters to emergency services and power, will be added to the notebook as the associated models are deployed in IN-CORE.


## Galveston Building Damage

In [1]:
import pandas as pd
import geopandas as gpd # For reading in shapefiles
import numpy as np
import sys # For displaying package versions
import os # For managing directories and file paths if drive is mounted

from pyincore import IncoreClient, Dataset, FragilityService, MappingSet, DataService
from pyincore.analyses.buildingdamage.buildingdamage import BuildingDamage

In [2]:
# Check package versions - good practice for replication
print("Python Version ",sys.version)
print("pandas version: ", pd.__version__)
print("numpy version: ", np.__version__)

Python Version  3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:33:30) 
[Clang 9.0.1 ]
pandas version:  1.0.5
numpy version:  1.19.0


In [3]:
# Check working directory - good practice for relative path access
os.getcwd()

'/Users/mo/dev/pyincore_app'

In [4]:
client = IncoreClient()

Connection successful to IN-CORE services. pyIncore version detected: 0.9.0


# Flood Surge, Wave, and Inundation Models:

Galveston Island was struck by Hurricane Ike in September 2008, with maximum wind speeds of 49 m/s (95 kt) and storm surge elevations reaching at least +3.5 m (NAVD88) on the island. A full hindcast of Hurricane Ike's water levels and wave conditions, along with hazard scenarios for the 2%, 1%, and 0.2% Annual Exceedance Probabilities (AEP), was created using the dynamically coupled Advanced Circulation (ADCIRC) and Simulating Waves Nearshore (SWAN) models. The hindcast simulation used a high-resolution unstructured mesh of the Texas coast covering the entire Gulf of Mexico basin, with more than 3.3 million nodes and 6.6 million mesh elements. The data are available in IN-CORE as 100-m rasterized files with the following IDs:

- Hurricane Ike hindcast: ID: 5fa5a228b6429615aeea4410

- 2% Annual Exceedance Probability (50-yr return period): ID: 5fa5a83c7e5cdf51ebf1adae

- 1% Annual Exceedance Probability (100-yr return period): ID: 5fa5a9497e5cdf51ebf1add2

- 0.2% Annual Exceedance Probability (500-yr return period): ID: 5fa5aa19b6429615aeea4476

There is also a hazard scenario generated based on historical data with the following ID: 5f15cd627db08c2ccc4e3bab
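The return periods quoted in the list above are the reciprocal of the AEP (T = 1/AEP). A minimal sketch that keeps the hazard IDs listed above in one lookup table and derives the return period from the AEP; the dictionary keys and the helper names `return_period_years` and `select_hazard_id` are illustrative and not part of pyincore.

```python
# Hazard dataset IDs listed above, keyed by an illustrative scenario label.
GALVESTON_SURGE_WAVE_HAZARDS = {
    "ike_hindcast": "5fa5a228b6429615aeea4410",
    "aep_2pct": "5fa5a83c7e5cdf51ebf1adae",
    "aep_1pct": "5fa5a9497e5cdf51ebf1add2",
    "aep_0.2pct": "5fa5aa19b6429615aeea4476",
    "historical_scenario": "5f15cd627db08c2ccc4e3bab",
}


def return_period_years(aep):
    """Return period T in years for an annual exceedance probability given as a fraction."""
    if not 0 < aep <= 1:
        raise ValueError("AEP must be in (0, 1]")
    return 1.0 / aep


def select_hazard_id(label):
    """Look up a hazard dataset ID by its (illustrative) scenario label."""
    return GALVESTON_SURGE_WAVE_HAZARDS[label]


for label, aep in [("aep_2pct", 0.02), ("aep_1pct", 0.01), ("aep_0.2pct", 0.002)]:
    print(f"{label}: {return_period_years(aep):.0f}-yr return period, ID {select_hazard_id(label)}")
```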


# Building Inventory

The building inventory for Galveston consists of 18962 individual residential buildings. This inventory can be linked to a housing unit inventory of 32501 housing units, described later in this notebook. The two counts differ because a single building can contain several housing units (households). The building inventory contains three key parameters that are used to estimate building fragility, as explained later in this notebook:

a)	Elevation of the lowest horizontal structural member (LHSM)

b)	Age group of the building (1, 2, 3, and 4 representing pre-1974, 1974–1987, 1987–1995, and 1995–2008, respectively)

c)	Ground elevation at the building
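If an inventory stores the year built rather than the age group, the mapping above can be sketched as follows. The published ranges share their boundary years (1987 and 1995), so this sketch assumes each range is closed at its start and open at its end (for example, 1987 maps to group 3); the function name `age_group` is hypothetical.

```python
def age_group(year_built):
    """Map a year built to the fragility age groups 1-4.

    ASSUMPTION: the published ranges (pre-1974, 1974-1987, 1987-1995,
    1995-2008) share boundary years, so each range is treated as
    [start, end), except that 2008 is included in group 4.
    """
    if year_built < 1974:
        return 1
    if year_built < 1987:
        return 2
    if year_built < 1995:
        return 3
    if year_built <= 2008:
        return 4
    raise ValueError("year_built after 2008 is outside the published age groups")


print([age_group(y) for y in (1960, 1980, 1990, 2005)])
```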


# Building Fragility

The fragility model used to estimate failure probability during storm surge events is extracted from:

Tomiczek, T., Kennedy, A., and Rogers, S., 2013. Collapse limit state fragilities of wood-framed residences from storm surge and waves during Hurricane Ike. Journal of Waterway, Port, Coastal, and Ocean Engineering, 140(1), pp. 43–55.

This empirical fragility model was developed from Hurricane Ike surveys of almost 2000 individual wood-frame buildings, coupled with a high-resolution hindcast of the hurricane. The study considered two damage states, "Collapse" and "Survival".
________________________________________
The input parameters to the fragility model are:

1) Surge: surge level (m) coming from hazard data

2) Hs: Significant wave height (m) coming from hazard data

3) LHSM: Elevation of the lowest horizontal structural member (ft) coming from building inventory

4) age_group: Age group of the building (1, 2, 3, and 4 representing pre-1974, 1974–1987, 1987–1995, and 1995–2008, respectively) coming from building inventory

5) G_elev: Ground elevation at the building (m) coming from building inventory
________________________________________
Output: Pf, the probability of failure
________________________________________
To calculate the probability of failure, first estimate the surge depth relative to the ground level, $d_s$:

$$d_s = \mathrm{Surge} - G_{\mathrm{elev}}$$

Next, calculate the following parameter:

$$FB_{hs} = -\left(d_s + 0.7\,H_s - 0.3048\,\mathrm{LHSM}\right)$$

The factor 0.3048 converts LHSM from feet to meters, because the inventory stores LHSM in feet.

Then, for $FB_{hs} \ge -2.79\,H_s$, the probability of failure is calculated as:

$$P_f = \Phi\left(-3.56 + 1.52\,H_s - 1.73\,H_s\,FB_{hs} - 0.31\,FB_{hs}^2 - 0.141\,\mathrm{age\_group}^2\right)$$

and for $FB_{hs} < -2.79\,H_s$:

$$P_f = \Phi\left(-3.56 + 1.52\,H_s + 2.42\,FB_{hs}^2 - 0.141\,\mathrm{age\_group}^2\right)$$

where $\Phi$ denotes the cumulative distribution function (CDF) of the standard normal distribution.
________________________________________
Example: if Surge = 3 m, Hs = 2 m, LHSM = 9 ft, age_group = 4, and G_elev = 1 m, then Pf = 0.2620.
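The example can be checked by hand. The intermediate values below follow directly from the equations above, with $z$ denoting the argument of $\Phi$:

$$
\begin{aligned}
d_s &= 3 - 1 = 2\ \text{m} \\
FB_{hs} &= -\left(2 + 0.7 \times 2 - 0.3048 \times 9\right) = -(3.4 - 2.7432) = -0.6568\ \text{m} \\
-2.79\,H_s &= -5.58 \le FB_{hs} \quad \Rightarrow \quad \text{the first branch applies} \\
z &= -3.56 + 1.52 \times 2 - 1.73 \times 2 \times (-0.6568) - 0.31 \times (-0.6568)^2 - 0.141 \times 4^2 \\
&= -3.56 + 3.04 + 2.2725 - 0.1337 - 2.256 = -0.6372 \\
P_f &= \Phi(-0.6372) \approx 0.2620
\end{aligned}
$$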


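The piecewise fragility equations can be evaluated for a single building with a short function. This is a minimal sketch, not the pyincore implementation: the function names `collapse_probability` and `std_normal_cdf` are hypothetical, Φ is computed from `math.erf` instead of a statistics library, and the second branch transcribes the equation exactly as written in this notebook.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def collapse_probability(surge, hs, lhsm_ft, age_group, g_elev):
    """Probability of failure from the piecewise fragility equations.

    surge, hs and g_elev are in meters; lhsm_ft is in feet, as stored in
    the building inventory.
    """
    ds = surge - g_elev                           # surge depth above ground (m)
    fb_hs = -(ds + 0.7 * hs - lhsm_ft * 0.3048)   # FB_hs parameter (m)
    if fb_hs >= -2.79 * hs:
        z = (-3.56 + 1.52 * hs - 1.73 * hs * fb_hs
             - 0.31 * fb_hs ** 2 - 0.141 * age_group ** 2)
    else:
        z = -3.56 + 1.52 * hs + 2.42 * fb_hs ** 2 - 0.141 * age_group ** 2
    return std_normal_cdf(z)


# Worked example from the text: Surge = 3 m, Hs = 2 m, LHSM = 9 ft, age_group = 4, G_elev = 1 m
print(round(collapse_probability(surge=3, hs=2, lhsm_ft=9, age_group=4, g_elev=1), 4))
```

Because it uses only the standard library, the sketch has no SciPy dependency; `scipy.stats.norm.cdf` would return the same values of Φ.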
In [5]:
hazard_type = "hurricane"
hazard_id = "5f15cd627db08c2ccc4e3bab"

bldg_dataset_id = "60354b6c123b4036e6837ef7"
# Hurricane Building Fragility Mapping
#mapping_id = "602c381a1d85547cdc9f0675" # prod

#fragility_service = FragilityService(client)  # loading fragility mapping
#mapping_set = MappingSet(fragility_service.get_mapping(mapping_id))

In [6]:
bldg_dmg = BuildingDamage(client)

bldg_dmg.load_remote_input_dataset("buildings", bldg_dataset_id)
#bldg_dmg.set_input_dataset("dfr3_mapping_set", mapping_set)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...


In [7]:
result_name = "Galveston_bldg_dmg_result"

bldg_dmg.set_parameter("result_name", result_name)
bldg_dmg.set_parameter("hazard_type", hazard_type)
bldg_dmg.set_parameter("hazard_id", hazard_id)
bldg_dmg.set_parameter("num_cpu", 4)

True

### Run Building Damage

In [8]:
#bldg_dmg.run_analysis()

In [9]:
# Retrieve result dataset
#building_dmg_result = bldg_dmg.get_output_dataset('ds_result')

In [10]:
# Convert dataset to Pandas DataFrame
#bdmg_df = building_dmg_result.get_dataframe_from_csv(low_memory=False)

# Display top 5 rows of output data
#bdmg_df.head()

## Galveston Housing Unit Allocation (HUA)

Housing Unit Allocation using the Galveston, Texas Housing Unit Inventory

Here we link high-resolution spatial data on the characteristics of 32501 individual households and housing units to residential buildings. This link is critical for connecting socio-economic data within IN-CORE. For example, in an evacuation analysis the HUA is required to identify the people who may not evacuate after an event.
The method comes from: Rosenheim, Nathanael, Roberto Guidotti, Paolo Gardoni & Walter Gillis Peacock. (2019). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. Sustainable and Resilient Infrastructure. https://doi.org/10.1080/23789689.2019.1681821


The Housing Unit Allocation algorithm can be reviewed in the IN-CORE pyincore repository on GitHub: https://github.com/IN-CORE/pyincore/tree/develop/pyincore/analyses/housingunitallocation


In [11]:
from pyincore.analyses.housingunitallocation import HousingUnitAllocation

## Initial Interdependent Community Description - Galveston, Texas

Explore the building inventory and social systems, specifically how the building inventory connects with the housing unit inventory through the housing unit allocation.
The housing unit allocation method provides detailed demographic characteristics of the community, allocated to each structure.

To run the HUA algorithm, three inventory files are required:

1.	Housing Unit Inventory - Based on 2010 US Census Block Level Data

2.	Address Point Inventory - A list of all possible residential/business address points in a community. Address points are the link between buildings and housing units.

3.	Building Inventory - A list of all buildings within a community.
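Address points are the link between buildings and housing units. A simplified sketch of that linking step, loosely following the approach described by Rosenheim et al. (2019): within each census block, housing units and address points are put in random order and paired by rank, and each address point carries the structure ID (`strctid`) of its building. This is not the pyincore implementation; the `blockid` column, the helper name `toy_housing_unit_allocation`, and the toy records are hypothetical. The pandas merge indicator produces the same `both` / `left_only` / `right_only` labels that appear in the `aphumerge` column of the HUA output.

```python
import numpy as np
import pandas as pd


def toy_housing_unit_allocation(housing_units, address_points, seed=1238):
    """Pair housing units with address points at random within each block.

    'left_only' = address point with no housing unit,
    'right_only' = housing unit with no address point.
    """
    rng = np.random.default_rng(seed)
    hu = housing_units.assign(randomhu=rng.random(len(housing_units)))
    ap = address_points.assign(randomap=rng.random(len(address_points)))

    # Random order within each block, then number rows 0, 1, 2, ... per block
    hu = hu.sort_values(["blockid", "randomhu"])
    ap = ap.sort_values(["blockid", "randomap"])
    hu["rank"] = hu.groupby("blockid").cumcount()
    ap["rank"] = ap.groupby("blockid").cumcount()

    return pd.merge(ap, hu, on=["blockid", "rank"], how="outer",
                    indicator="aphumerge")


# Tiny illustrative inventories (made-up IDs, not Galveston data)
address_points = pd.DataFrame({
    "addrptid": ["AP1", "AP2", "AP3"],
    "strctid": ["S1", "S1", "S2"],
    "blockid": ["A", "A", "B"],
})
housing_units = pd.DataFrame({
    "huid": ["H1", "H2", "H3"],
    "blockid": ["A", "A", "A"],
    "numprec": [2, 3, 1],
})

allocated = toy_housing_unit_allocation(housing_units, address_points)
print(allocated[["addrptid", "strctid", "huid", "aphumerge"]])
```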


In [12]:
# Galveston, TX Housing unit inventory
housing_unit_inv = "5fc6ab1cd2066956f49e7a03"

# Galveston, TX Address point inventory
address_point_inv = "5fc6aadcc38a0722f563392e"

# Galveston, TX Building inventory
building_inv = "60354b6c123b4036e6837ef7"

### Run Housing Unit Allocation
See also the IN-CORE housing unit allocation example notebook: https://github.com/IN-CORE/incore-docs/blob/master/notebooks/housingunitallocation.ipynb


In [13]:
# Create housing allocation 
hua = HousingUnitAllocation(client)

# Load input dataset
hua.load_remote_input_dataset("housing_unit_inventory", housing_unit_inv)
hua.load_remote_input_dataset("address_point_inventory", address_point_inv)
hua.load_remote_input_dataset("buildings", building_inv)

# Specify the result name
result_name = "Galveston_HUA"

seed = 1238
iterations = 1

# Set analysis parameters
hua.set_parameter("result_name", result_name)
hua.set_parameter("seed", seed)
hua.set_parameter("iterations", iterations)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...


True

In [14]:
# Run Housing unit allocation analysis
hua.run_analysis()

True

In [15]:
# Retrieve result dataset
hua_result = hua.get_output_dataset("result")

# Convert dataset to Pandas DataFrame
hua_df = hua_result.get_dataframe_from_csv(low_memory=False)

# Display top 5 rows of output data
hua_df.head()

| Unnamed: 0 | addrptid | strctid | archetype | struct_typ | year_built | no_stories | a_stories | b_stories | bsmt_type | sq_foot | ... | race | hispan | hispan_flag | vacancy | gqtype | incomegroup | randincome | randomhu | aphumerge | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | XREF0628-0065-0000-000AP014 | XREF0628-0065-0000-000 |  | W1 |  |  |  |  |  |  | ... | 2.0 | 0.0 | 2.0 | 0 | 0 | 15.0 | 142028.8 | 0.391309 | both | POINT (-94.79252 29.3092) |
| 1 | XREF0628-0065-0000-000AP012 | XREF0628-0065-0000-000 |  | W1 |  |  |  |  |  |  | ... | 1.0 | 0.0 | 1.0 | 0 | 0 | 17.0 | 325475.7 | 0.414422 | both | POINT (-94.79252 29.3092) |
| 2 | XREF0628-0065-0000-000AP004 | XREF0628-0065-0000-000 |  | W1 |  |  |  |  |  |  | ... | 1.0 | 0.0 | 1.0 | 0 | 0 | 17.0 | 260189.9 | 0.927559 | both | POINT (-94.79252 29.3092) |
| 3 | XREF0628-0065-0000-000AP005 | XREF0628-0065-0000-000 |  | W1 |  |  |  |  |  |  | ... | 1.0 | 0.0 | 1.0 | 0 | 0 | 17.0 | 347593.4 | 0.090391 | both | POINT (-94.79252 29.3092) |
| 4 | XREF0628-0065-0000-000AP010 | XREF0628-0065-0000-000 |  | W1 |  |  |  |  |  |  | ... | 1.0 | 0.0 | 1.0 | 0 | 0 | 17.0 | 558643.5 | 0.284164 | both | POINT (-94.79252 29.3092) |


### Explore results from Housing Unit Allocation

Keep observations that are matched to a building.

In [16]:
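# .copy() makes the filtered frame independent, so new columns can be added without a SettingWithCopyWarning
hua_df = hua_df.loc[hua_df['aphumerge'] == 'both'].copy()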
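The building inventory section notes that one building can contain several housing units. Once unmatched records are dropped, the allocation can be checked against that by counting housing units per structure ID (`strctid`, a column in the HUA output). A minimal sketch; the helper name `housing_units_per_building` is hypothetical and the example rows are made up.

```python
import pandas as pd


def housing_units_per_building(hua_matched):
    """Return how many buildings hold 1, 2, 3, ... allocated housing units."""
    units = hua_matched.groupby("strctid").size()   # housing units per building
    return units.value_counts().sort_index()        # buildings per unit count


# Illustrative rows only; in the notebook, pass the filtered hua_df instead.
example = pd.DataFrame({"strctid": ["S1", "S1", "S1", "S2", "S3", "S3"]})
print(housing_units_per_building(example))
```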

In [17]:
# Identify race and ethnicity housing unit characteristics.
# Default label: vacant housing units have no race or ethnicity data.
hua_df['Race Ethnicity'] = "0 Vacant HU No Race Ethnicity Data"

hua_df.loc[(hua_df['race'] == 1) & (hua_df['hispan'] == 0),'Race Ethnicity'] = "1 White alone, Not Hispanic"
hua_df.loc[(hua_df['race'] == 2) & (hua_df['hispan'] == 0),'Race Ethnicity'] = "2 Black alone, Not Hispanic"
hua_df.loc[(hua_df['race'].isin([3,4,5,6,7])) & (hua_df['hispan'] == 0),'Race Ethnicity'] = "3 Other Race, Not Hispanic"
hua_df.loc[(hua_df['hispan'] == 1),'Race Ethnicity'] = "4 Any Race, Hispanic"
hua_df.loc[(hua_df['gqtype'] >= 1),'Race Ethnicity'] = "5 Group Quarters no Race Ethnicity Data"

# Check new variable
table_title = "Confirm housing unit characteristic by Race and Ethnicity."
pd.crosstab(hua_df['Race Ethnicity'], hua_df['race'], 
            margins=True, margins_name="Total").style.set_caption(table_title)

| Race Ethnicity / race | 1.0 | 2.0 | 3.0 | 4.0 | 6.0 | 7.0 | Total |
|---|---|---|---|---|---|---|---|
| 1 White alone, Not Hispanic | 10912 | 0 | 0 | 0 | 0 | 0 | 10912 |
| 2 Black alone, Not Hispanic | 0 | 3607 | 0 | 0 | 0 | 0 | 3607 |
| 3 Other Race, Not Hispanic | 0 | 0 | 26 | 607 | 4 | 145 | 782 |
| 4 Any Race, Hispanic | 2738 | 25 | 57 | 6 | 1520 | 184 | 4530 |
| Total | 13650 | 3632 | 83 | 613 | 1524 | 329 | 19831 |


In [18]:
# Check new variable
table_title = "Confirm housing unit characteristic by Race and Ethnicity."
pd.crosstab(hua_df['Race Ethnicity'], hua_df['hispan'], 
            margins=True, margins_name="Total").style.set_caption(table_title)

| Race Ethnicity / hispan | 0.0 | 1.0 | Total |
|---|---|---|---|
| 1 White alone, Not Hispanic | 10912 | 0 | 10912 |
| 2 Black alone, Not Hispanic | 3607 | 0 | 3607 |
| 3 Other Race, Not Hispanic | 782 | 0 | 782 |
| 4 Any Race, Hispanic | 0 | 4530 | 4530 |
| Total | 15301 | 4530 | 19831 |


In [19]:
table_title = "Table 1. Housing Unit Characteristics by Race and Ethnicity"
table1 = pd.pivot_table(hua_df, values='numprec', index=['Race Ethnicity'],
                              margins = True, margins_name = 'Total',
                              aggfunc=[len, np.sum], 
                              fill_value=0).reset_index().rename(
                                                            columns={'len': 'Housing Unit',
                                                                     'sum' : 'Population',
                                                                     'numprec': 'Count'})

varformat = {('Housing Unit','Count'): "{:,}", ('Population','Count'): "{:,}"}

In [20]:
table1.style.set_caption(table_title).format(varformat).set_table_styles([
    dict(selector='th', props=[('text-align', 'center')]),])

|  | Race Ethnicity | Housing Unit Count | Population Count |
|---|---|---|---|
| 0 | 0 Vacant HU No Race Ethnicity Data | 12657 | 0 |
| 1 | 1 White alone, Not Hispanic | 10912 | 21220 |
| 2 | 2 Black alone, Not Hispanic | 3607 | 8302 |
| 3 | 3 Other Race, Not Hispanic | 782 | 1698 |
| 4 | 4 Any Race, Hispanic | 4530 | 13424 |
| 5 | 5 Group Quarters no Race Ethnicity Data | 13 | 240 |
| 6 | Total | 32501 | 44884 |


### Validate the Housing Unit Allocation
The population totals for the community should closely match the counts collected for the 2010 Decennial Census. This can be confirmed at data.census.gov:

https://data.census.gov/cedsci/table?q=DECENNIALPL2010.P1&g=1600000US4828068,4837252&tid=DECENNIALSF12010.P1

Differences between the housing unit allocation and the Census count may be due to differences between political boundaries and the extent of the building inventory. See Rosenheim et al. (2019) for more details.
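The comparison above can be expressed as a percent difference between the allocated population and the Census count. A minimal sketch: the Census total is not reproduced in this notebook, so `census_total` is a placeholder to be filled from data.census.gov, and the helper name `percent_difference` is hypothetical.

```python
def percent_difference(allocated_population, census_population):
    """Percent difference of the HUA population total relative to a Census count."""
    if census_population <= 0:
        raise ValueError("census_population must be positive")
    return 100.0 * (allocated_population - census_population) / census_population


# census_total is a placeholder: look up the 2010 P1 count at data.census.gov.
# allocated_total = hua_df["numprec"].sum()
# print(f"{percent_difference(allocated_total, census_total):+.1f}%")
```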

The housing unit allocation results, together with the building damage results, are the inputs to the population dislocation model.

In [21]:
# Save cleaned HUA file as CSV
hua_df.to_csv(result_name+str(seed)+'_cleaned.csv')

## Galveston Population Dislocation

In [22]:
from pyincore.analyses.populationdislocation import PopulationDislocation

In [23]:
# housing_unit_alloc = "602d5279b1db9c28aeede1ca" # dev
bg_data = "603545f2dcda03378087e708"  # IN-CORE_BGMAP_2021-01-19_GalvestonTX
value_loss = "60354810e379f22e16560dbd"

In [24]:
pop_dis = PopulationDislocation(client)

In [25]:
pop_dis.load_remote_input_dataset("block_group_data", bg_data)
pop_dis.load_remote_input_dataset("value_poss_param", value_loss)
#pop_dis.load_remote_input_dataset("housing_unit_allocation", housing_unit_alloc)

#pop_dis.set_input_dataset("building_dmg", building_dmg_result)
pop_dis.set_input_dataset("housing_unit_allocation", hua_result)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...


True

In [26]:
result_name = "galveston-pop-disl-results"
seed = 1111

In [27]:
pop_dis.set_parameter("result_name", result_name)
pop_dis.set_parameter("seed", seed)

True

### Run Population Dislocation

In [28]:
#pop_dis.run_analysis()

In [29]:
# Retrieve result dataset
#result = pop_dis.get_output_dataset("result")

# Convert dataset to Pandas DataFrame
#pd_df = result.get_dataframe_from_csv(low_memory=False)

# Display top 5 rows of output data
#pd_df.head()