# BayesFrag - Tutorial 3: Computation of GMM estimates using OpenQuake

<a target="_blank" href="https://colab.research.google.com/github/bodlukas/BayesFrag/blob/main/Tutorial3.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

BayesFrag is a tool to perform Bayesian parameter estimation for empirical seismic fragility models. The tool accounts for uncertainty in the ground motion intensity measures (IMs) which caused the observed damage. The methodology is presented in

> Bodenmann L., Baker J. , Stojadinovic B. (2023): "Accounting for ground motion uncertainty in empirical seismic fragility modeling". INCLUDE LINK

This notebook and further supporting code are published on [GitHub](https://github.com/bodlukas/BayesFrag) and Zenodo (doi: LINK).

To avoid any additional dependency on a specific ground motion model (GMM) library, the GMM estimates for the IM of interest are computed outside of BayesFrag and prior to the actual fragility model estimation. Conditional on earthquake rupture characteristics, $\mathbf{rup}$, we consider the log-transformed IM at site $i$ to be normally distributed, i.e., 

$p(\ln im_i|\mathbf{rup}) = \mathcal{N}\left(\mu_i\, , \, \sqrt{\tau_i^2 + \phi_i^2}\right)$ ,

where $\mu_i$ is the mean, $\tau_i$ is the standard deviation of the between-event residuals, and $\phi_i$ is the standard deviation of the within-event residuals. To perform fragility model estimation, BayesFrag requires $\mu_i$, $\tau_i$, and $\phi_i$ at two sets of sites: (1) the seismic network stations, and (2) the surveyed buildings. This tutorial explains how to perform these computations with [OpenQuake](https://github.com/gem/oq-engine#openquake-engine) as the GMM library, using the 2009 L'Aquila (Italy) earthquake as an example.
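
As a minimal sketch of how the three parameters fit together, the snippet below combines $\tau_i$ and $\phi_i$ into the total standard deviation of $\ln im_i$ and derives the median and the approximate 16th and 84th percentiles of the IM. All numerical values are hypothetical and chosen for illustration only; they are not taken from the L'Aquila data.

```python
import numpy as np

# Hypothetical GMM outputs at three sites (illustrative values only)
mu = np.array([-1.2, -2.0, -2.9])    # mean of ln(IM)
tau = np.array([0.30, 0.30, 0.30])   # between-event standard deviation
phi = np.array([0.55, 0.55, 0.55])   # within-event standard deviation

# Total standard deviation of ln(IM): second parameter of the normal distribution
sigma = np.sqrt(tau**2 + phi**2)

# Median and approximate 16th/84th percentiles of the IM itself
median = np.exp(mu)
p16 = np.exp(mu - sigma)
p84 = np.exp(mu + sigma)

print(sigma, median, p16, p84)
```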

Note that OpenQuake - or any other GMM library - is not a part of BayesFrag and has to be installed separately, for example in a different virtual environment. Alternatively, users can open this tutorial on a hosted Jupyter notebook service (e.g., Google Colab). 

## Import Packages

If the notebook is opened on Google Colab, we install [OpenQuake](https://github.com/gem/oq-engine#openquake-engine) and clone the [BayesFrag repository](https://github.com/bodlukas/BayesFrag) for data access. This may take a few minutes. 

In [1]:
%%capture
import os
if os.getenv("COLAB_RELEASE_TAG"): # Check whether notebook runs on colab.
  # Quote the requirement so the shell does not treat '>' as an output redirection.
  !pip install "openquake.engine>=3.15.0"
  !git clone https://github.com/bodlukas/BayesFrag.git
  %cd BayesFrag

In [2]:
import numpy as np
import pandas as pd
import json
import openquake.hazardlib as oq

## Specify settings

Specify the IM of interest (for PGA use 'PGA', for SA(T=0.3s) use 'SAT0_300'), and the GMM. This tutorial covers the two GMMs used in the manuscript: 'BindiEtAl2011' and 'ChiouYoungs2014Italy'. Analysts can use any other GMM that is available in OpenQuake, with slight changes to this notebook that will be explained throughout. 
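
The IM naming convention can be sketched with a small helper that converts such a string into a spectral period. The function name `parse_im_string` is hypothetical and not part of BayesFrag or OpenQuake; it only mirrors the string handling used later in this notebook.

```python
def parse_im_string(im):
    """Return None for 'PGA', otherwise the spectral period in seconds.

    Hypothetical helper for illustration: 'SAT0_300' -> 0.3
    """
    if im == 'PGA':
        return None
    if not im.startswith('SAT'):
        raise ValueError('Unsupported IM string: ' + im)
    # 'SAT0_300' -> '0_300' -> '0.300' -> 0.3
    return float(im[3:].replace('_', '.'))


for im in ['PGA', 'SAT0_300', 'SAT1_000']:
    print(im, parse_im_string(im))
```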

In [8]:
path_data = os.path.join('data', 'tutorial2', '')

args = {
    'IM': 'SAT0_300',
    'GMM': 'BindiEtAl2011',
        }

## Specify earthquake rupture 

Here we specify the rupture characteristics from the 2009 L'Aquila earthquake as obtained from the Engineering Strong Motion database, [ESM](https://esm-db.eu/#/event/IT-2009-0009). 

In [9]:
Point = oq.geo.point.Point
PlanarSurface = oq.geo.surface.planar.PlanarSurface
MultiSurface = oq.geo.surface.multi.MultiSurface
BaseRupture = oq.source.rupture.BaseRupture

# Read the rupture geometry; the context manager closes the file automatically.
with open(path_data + 'rupture.json') as f:
    rup_temp = json.load(f)
rup_geom_json = rup_temp['features'][0]['geometry']
rup_geom = np.array(rup_geom_json['coordinates'][0][0])[:-1,:]

rupture_surface = PlanarSurface.from_corner_points(
    top_left = Point(rup_geom[0, 0], rup_geom[0, 1], rup_geom[0, 2]),
    top_right = Point(rup_geom[1, 0], rup_geom[1, 1], rup_geom[1, 2]),
    bottom_right = Point(rup_geom[2, 0], rup_geom[2, 1], rup_geom[2, 2]),
    bottom_left = Point(rup_geom[3, 0], rup_geom[3, 1], rup_geom[3, 2]),
)
rupture = BaseRupture(mag = 6.1, rake = -90.0, 
                    tectonic_region_type = 'Active Shallow Crust', 
                    hypocenter = Point(longitude = 13.380, 
                                        latitude = 42.342,
                                        depth = 8.3),
                    surface = rupture_surface)
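
The indexing used above to extract the corner points can be illustrated without OpenQuake. The sketch below assumes that the geometry is stored as a closed polygon ring whose first vertex is repeated at the end, and that the vertices are ordered top left, top right, bottom right, bottom left, as the cell above implies. The geometry type and all coordinates are made up for illustration.

```python
import numpy as np

# Hypothetical closed ring of (longitude, latitude, depth) vertices
ring = [
    [13.40, 42.45, 1.0],    # top left
    [13.55, 42.30, 1.0],    # top right
    [13.45, 42.25, 12.0],   # bottom right
    [13.30, 42.40, 12.0],   # bottom left
    [13.40, 42.45, 1.0],    # closing vertex, identical to the first
]
# Assumed nesting: the ring is reached via coordinates[0][0]
geometry = {'type': 'MultiPolygon', 'coordinates': [[ring]]}

# Drop the closing vertex to keep the four unique corner points
corners = np.array(geometry['coordinates'][0][0])[:-1, :]
print(corners.shape)
```

Each row of `corners` then holds the longitude, latitude, and depth of one corner.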

## Import site information

**Station data**

Below, we import the station data file and print the available attributes, which are:
- id: Station identifier
- Longitude, Latitude in decimal degrees
- vs30: time-averaged shear wave velocity in the upper-most 30 meters of soil in m/s
- vs30measured: A boolean flag that indicates whether vs30 was measured or inferred from other information

Besides this information, the station data also contains observed intensity measures (IMs) as processed from the ground motion recordings. In this example, we have data for four IMs: PGA, SA(0.2s), SA(0.3s), and SA(0.6s). Each of these IMs was processed according to two IM definitions: the geometric mean and the RotD50. Each GMM is derived for a specific IM definition, so the station observations must be taken from the columns that match the definition of the employed GMM. This is handled further below.
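
The choice of the observation column can be sketched as a simple lookup. The two component labels are the ones this notebook checks for further below, and the column prefixes follow the station data file. The helper name `observation_column` is hypothetical.

```python
# Column prefix in stations.csv for each IM definition handled in this notebook
PREFIX = {
    'Average Horizontal': 'geoM_log',
    'Average Horizontal (RotD50)': 'rotD50_log',
}


def observation_column(im_definition, im):
    """Return the station data column that matches the GMM's IM definition."""
    if im_definition not in PREFIX:
        raise ValueError('No observations for IM definition: ' + im_definition)
    return PREFIX[im_definition] + im


print(observation_column('Average Horizontal (RotD50)', 'SAT0_300'))
```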

In [10]:
dfstations = pd.read_csv(path_data + 'stations.csv')
print(dfstations.columns.values)

['id' 'Longitude' 'Latitude' 'vs30' 'vs30measured' 'rotD50_logPGA'
 'geoM_logPGA' 'rotD50_logSAT0_200' 'geoM_logSAT0_200'
 'rotD50_logSAT0_300' 'geoM_logSAT0_300' 'rotD50_logSAT0_600'
 'geoM_logSAT0_600']


**Damage survey data**

Below, we import the damage survey data file and print the available attributes, which are:
- id: Building identifier
- Longitude, Latitude in decimal degrees
- vs30: time-averaged shear wave velocity in the upper-most 30 meters of soil in m/s
- BuildingClass: Used for fragility function estimation (-> see Tutorials 1 and 2)
- DamageState: Used for fragility function estimation (-> see Tutorials 1 and 2)

In [11]:
dfsurvey = pd.read_csv(path_data + 'survey.csv')
print(dfsurvey.columns.values)

['id' 'Longitude' 'Latitude' 'vs30' 'BuildingClass' 'DamageState']


## Compute GMM estimates

In [12]:
# Import OpenQuake GMMs: Modify this to include another GMM.
if args['GMM'] == 'BindiEtAl2011':
    gmm = oq.gsim.bindi_2011.BindiEtAl2011()
elif args['GMM'] == 'ChiouYoungs2014Italy':
    gmm = oq.gsim.chiou_youngs_2014.ChiouYoungs2014Italy()
else:
    raise ValueError('GMM not covered in this tutorial: ' + args['GMM'])

# Extract whether IM is defined for the geometric mean or RotD50
im_definition = gmm.DEFINED_FOR_INTENSITY_MEASURE_COMPONENT.value
if im_definition == 'Average Horizontal':
    obs_str = 'geoM_log' + args['IM']
elif im_definition == 'Average Horizontal (RotD50)':
    obs_str = 'rotD50_log' + args['IM']

# Import OpenQuake IMs
if args['IM'] == 'PGA':
    im_list = [oq.imt.PGA()]
else:
    T = float('.'.join( args['IM'][3:].split('_') )) 
    im_list = [oq.imt.SA(T)]

**Wrapper functions**

In [13]:
def get_epiazimuth(rupture, sites_mesh):
    '''
    Computes epicentral azimuth which is required for the spatial 
    correlation model of BodenmannEtAl2023. See also the corresponding
    documentation in bayesfrag/spatialcorrelation.py 
    '''
    lon, lat = rupture.hypocenter.longitude, rupture.hypocenter.latitude
    lons, lats = sites_mesh.lons, sites_mesh.lats    
    return oq.geo.geodetic.fast_azimuth(lon, lat, lons, lats)

def get_RuptureContext(rupture, sites_mesh, sites_vs30, 
                sites_vs30measured=None, sites_z1pt0=None):
    '''
    Computes the source and site inputs required by the specified GMMs,
    including the source-to-site distances.
    Modify this function to support GMMs that need additional inputs.
    '''
    rctx = oq.contexts.RuptureContext()
    rctx.rjb = rupture.surface.get_joyner_boore_distance(sites_mesh)
    rctx.rrup = rupture.surface.get_min_distance(sites_mesh)
    rctx.vs30 = sites_vs30
    rctx.mag = rupture.mag * np.ones_like(rctx.rjb)
    rctx.rake = rupture.rake * np.ones_like(rctx.rjb)
    rctx.rx = rupture.surface.get_rx_distance(sites_mesh)
    rctx.ztor = rupture.surface.get_top_edge_depth() * np.ones_like(rctx.rjb)
    rctx.dip = rupture.surface.get_dip() * np.ones_like(rctx.rjb)
    if sites_z1pt0 is None:
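        # The Chiou and Youngs (2014) relation gives ln(z1.0) as a function of vs30,
        # so the exponential is needed to obtain z1.0 itself.
        rctx.z1pt0 = np.exp(-7.15/4 * np.log( (sites_vs30**4 + 571**4) / (1360**4 + 571**4) ))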
    else: 
        rctx.z1pt0 = sites_z1pt0
    if sites_vs30measured is None:
        rctx.vs30measured = False
    else:
        rctx.vs30measured = sites_vs30measured
    return rctx    

**Main function**

In [18]:
def get_gmm_estimates(gmm, im_list, rupture, df, stations=False):
    n = len(df) # Number of sites
    nim = len(im_list) # Number of IMs: Here 1!

    sites_mesh = oq.geo.mesh.Mesh(df['Longitude'].values, 
                                df['Latitude'].values, depths=None)
    
    if 'vs30measured' not in df.columns.values: df['vs30measured'] = False
    
    rupture_context = get_RuptureContext(rupture, sites_mesh, 
                                sites_vs30 = df['vs30'].values, 
                                sites_vs30measured = df['vs30measured'].values)
    
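    # Allocate four separate output arrays. A chained assignment would bind all
    # four names to one array, so the in-place results would overwrite each other.
    mean = np.zeros([nim, n])
    sigma = np.zeros([nim, n])
    tau = np.zeros([nim, n])
    phi = np.zeros([nim, n])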
    gmm.compute(rupture_context, im_list, mean, sigma, tau, phi)
    res = {'mu_logIM': mean.squeeze(), 'tau_logIM': tau.squeeze(), 'phi_logIM': phi.squeeze()}

    if args['GMM'] == 'ChiouYoungs2014Italy':
        # Epicentral azimuth is required for spatial correlation model of BodenmannEtAl2023.
        # This correlation model is used together with the GMM of ChiouYoungs2014Italy.
        res['epiazimuth'] = get_epiazimuth(rupture, sites_mesh).squeeze()

    if stations: res['obs_logIM'] = df[obs_str].values
    return res

**Station and survey data sites**

In [20]:
# GMM estimates at sites of seismic network stations
res = get_gmm_estimates(gmm, im_list, rupture, dfstations, stations = True)
filepath = 'stations_im_' + args['IM'] + '_gmm_' + args['GMM'] + '.npz'
np.savez(path_data + filepath, **res)

# GMM estimates at sites of surveyed buildings
res = get_gmm_estimates(gmm, im_list, rupture, dfsurvey, stations = False)
filepath = 'survey_im_' + args['IM'] + '_gmm_' + args['GMM'] + '.npz'
np.savez(path_data + filepath, **res)
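
The stored files can be read back with `np.load`. The following minimal sketch writes a results dictionary with the same keys as above to a temporary directory and loads it again. The numerical values are hypothetical; in practice the file would be one of the `.npz` files written by this notebook.

```python
import os
import tempfile

import numpy as np

# Hypothetical results with the same keys as produced by get_gmm_estimates
res = {
    'mu_logIM': np.array([-1.2, -2.0]),
    'tau_logIM': np.array([0.30, 0.30]),
    'phi_logIM': np.array([0.55, 0.55]),
}

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = os.path.join(tmpdir, 'stations_im_SAT0_300_gmm_BindiEtAl2011.npz')
    np.savez(filepath, **res)
    # np.load reads arrays lazily, so copy them out before the file is closed
    with np.load(filepath) as data:
        loaded = {key: data[key] for key in data.files}

print(sorted(loaded))
```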

**Gridded sites for map visualizations**

For visualization purposes we also compute the GMM estimates at gridded sites over a specified region of interest.

In [21]:
dfgridmap = pd.read_csv(path_data + 'gridmap.csv')
res = get_gmm_estimates(gmm, im_list, rupture, dfgridmap, stations = False)
filepath = 'gridmap_im_' + args['IM'] + '_gmm_' + args['GMM'] + '.npz'
np.savez(path_data + filepath, **res)
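
For a different region, a grid of sites could be generated along the following lines. This is only a sketch: the bounding box, the grid resolution, and the constant vs30 value are hypothetical placeholders, and it assumes that the grid needs the same `Longitude`, `Latitude`, and `vs30` attributes that `get_gmm_estimates` reads from the other site files.

```python
import numpy as np

# Hypothetical bounding box (decimal degrees) and grid resolution
lon_min, lon_max, lat_min, lat_max = 13.2, 13.6, 42.2, 42.5
n_lon, n_lat = 5, 4

lons, lats = np.meshgrid(np.linspace(lon_min, lon_max, n_lon),
                         np.linspace(lat_min, lat_max, n_lat))

grid = {
    'Longitude': lons.ravel(),
    'Latitude': lats.ravel(),
    # Placeholder only: real applications need site-specific vs30 values
    'vs30': np.full(lons.size, 600.0),
}
print(grid['Longitude'].size)
```

Passing `grid` to `pd.DataFrame` would yield a table with one row per grid point.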

## Licence information

The OpenQuake Engine is released under the [GNU Affero General Public License v3](https://github.com/gem/oq-engine/blob/master/LICENSE). Neither this tutorial nor OpenQuake is distributed with BayesFrag.