# Inspector

The `Inspector` class uses the [Pandas](https://pandas.pydata.org/) library to aggregate information from the ASDFDataSets produced during a seismic inversion. A few predefined functions give quick access to commonly needed inversion information, while the underlying DataFrame object exposes the full Pandas library for assessing misfit and time windows.

## Initializing
The `Inspector` requires existing ASDFDataSets to read from. These are usually created during a seismic inversion by larger workflow tools. Here we manually fill a dataset to illustrate the `Inspector` initialization routine.

In [1]:
import os
import obspy
from pyasdf import ASDFDataSet
from pyatoa import Config, Manager, Inspector, logger
logger.setLevel("INFO")

# Read in test data
inv = obspy.read_inventory("../tests/test_data/test_dataless_NZ_BFZ.xml")
cat = obspy.read_events("../tests/test_data/test_catalog_2018p130600.xml")
event = cat[0]
st_obs = obspy.read("../tests/test_data/test_obs_data_NZ_BFZ_2018p130600.ascii")
st_syn = obspy.read("../tests/test_data/test_syn_data_NZ_BFZ_2018p130600.ascii")

# Fill up the test dataset with data from a single source-receiver pair
ds_fid = "../tests/test_data/test_ASDFDataSet.h5"
# Remove any dataset left over from a previous run; skip if none exists
if os.path.exists(ds_fid):
    os.remove(ds_fid)
with ASDFDataSet(ds_fid) as ds:
    cfg = Config(iteration=1, step_count=0)
    mgmt = Manager(ds=ds, config=cfg, inv=inv, event=event, st_obs=st_obs, st_syn=st_syn)
    mgmt.write()
    mgmt.flow()

[2020-08-11 17:45:43] - pyatoa - INFO: standardizing streams
[2020-08-11 17:45:43] - pyatoa - INFO: preprocessing observation data
[2020-08-11 17:45:43] - pyatoa - INFO: adjusting taper to cover time offset
[2020-08-11 17:45:43] - pyatoa - INFO: preprocessing synthetic data
[2020-08-11 17:45:43] - pyatoa - INFO: adjusting taper to cover time offset
[2020-08-11 17:45:43] - pyatoa - INFO: running Pyflex w/ map: default
[2020-08-11 17:45:43] - pyatoa - INFO: 1 window(s) selected for comp Z
[2020-08-11 17:45:43] - pyatoa - INFO: 1 window(s) selected for comp N
[2020-08-11 17:45:43] - pyatoa - INFO: 1 window(s) selected for comp E
[2020-08-11 17:45:43] - pyatoa - INFO: 3 window(s) total found
[2020-08-11 17:45:43] - pyatoa - INFO: 0.007 misfit for comp Z
[2020-08-11 17:45:43] - pyatoa - INFO: 1.786 misfit for comp N
[2020-08-11 17:45:43] - pyatoa - INFO: 0.389 misfit for comp E
[2020-08-11 17:45:43] - pyatoa - INFO: total misfit 2.182


The `Inspector` class will automatically search for data with the `discover` function. An optional `tag` is used for output filenames.

In [2]:
insp = Inspector(tag="test_inspector", verbose=True)
insp.discover(path="../tests/test_data")
print(insp)

test_ASDFDataSet.h5       000/001...done
1    event(s)
1    station(s)
1    iteration(s)
1    evaluation(s)


## Accessing the Inspector

We can access event and station metadata, as well as time windows, using the attributes of the `Inspector`.

### Source and receiver metadata

A list of event ids and station names can be accessed through the `events` and `stations` attributes. Metadata, including locations and source information such as magnitude and origin time, are accessible through the `sources` and `receivers` attributes.

The `calculate_srcrcv` function calculates the great-circle distance and backazimuth between each source-receiver pair and returns a new DataFrame.

In [3]:
insp.events

array(['test_ASDFDataSet'], dtype=object)

In [4]:
insp.stations

array(['BFZ'], dtype=object)

In [5]:
insp.sources

| event_id | time | magnitude | depth_km | latitude | longitude |
|---|---|---|---|---|---|
| test_ASDFDataSet | 2018-02-18T07:43:48.127644Z | 5.156706 | 20.594599 | -39.948975 | 176.299515 |


In [6]:
insp.receivers

| network | station | latitude | longitude |
|---|---|---|---|
| NZ | BFZ | -40.679647 | 176.246245 |


In [7]:
insp.calculate_srcrcv()

|   | event | network | station | distance_km | backazimuth |
|---|---|---|---|---|---|
| 0 | test_ASDFDataSet | NZ | BFZ | 81.260637 | 3.211526 |
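
For intuition, the two quantities returned by `calculate_srcrcv` can be approximated with spherical trigonometry. The sketch below is not how the `Inspector` computes them: the helper name `great_circle` and the spherical-Earth radius are assumptions made for illustration. It applies the haversine and forward-bearing formulas to the source and receiver coordinates listed in the tables above. Because it assumes a sphere, expect values close to, but not identical to, the `distance_km` and `backazimuth` columns.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def great_circle(src_lat, src_lon, rcv_lat, rcv_lon):
    """Return (distance_km, backazimuth_deg) for a source-receiver pair.

    Hypothetical helper for illustration only. Backazimuth is the bearing
    from the receiver toward the source, clockwise from north.
    """
    phi_s, phi_r = math.radians(src_lat), math.radians(rcv_lat)
    dphi = phi_s - phi_r
    dlam = math.radians(src_lon - rcv_lon)

    # Haversine formula for the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi_r) * math.cos(phi_s) * math.sin(dlam / 2) ** 2)
    distance_km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

    # Forward bearing from the receiver to the source
    y = math.sin(dlam) * math.cos(phi_s)
    x = (math.cos(phi_r) * math.sin(phi_s)
         - math.sin(phi_r) * math.cos(phi_s) * math.cos(dlam))
    backazimuth_deg = math.degrees(math.atan2(y, x)) % 360

    return distance_km, backazimuth_deg


# Coordinates copied from the `sources` and `receivers` tables
dist_km, baz = great_circle(-39.948975, 176.299515, -40.679647, 176.246245)
print(f"{dist_km:.2f} km, backazimuth {baz:.2f} deg")
```

The small difference from the tabulated values is expected if the library uses an ellipsoidal Earth model rather than a sphere.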


### Time windows

Misfit windows can be accessed using the `windows` attribute.

In [8]:
insp.windows

|   | event | iteration | step | network | station | channel | component | misfit | length_s | dlnA | window_weight | max_cc_value | relative_endtime | relative_starttime | cc_shift_in_seconds | absolute_starttime | absolute_endtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | test_ASDFDataSet | i01 | s00 | NZ | BFZ | HHE | E | 0.0072 | 45.9 | -0.711305 | 3.997104 | 0.870829 | 77.07 | 31.17 | 1.11 | 2018-02-18T07:43:59.297644Z | 2018-02-18T07:44:45.197644Z |
| 1 | test_ASDFDataSet | i01 | s00 | NZ | BFZ | HHN | N | 0.0072 | 39.21 | -0.830848 | 3.883489 | 0.990433 | 77.07 | 37.86 | 1.92 | 2018-02-18T07:44:05.987644Z | 2018-02-18T07:44:45.197644Z |
| 2 | test_ASDFDataSet | i01 | s00 | NZ | BFZ | HHZ | Z | 0.0072 | 23.4 | -0.886651 | 2.320872 | 0.991825 | 42.96 | 19.56 | 0.0 | 2018-02-18T07:43:47.687644Z | 2018-02-18T07:44:11.087644Z |


## Isolating categories

It is often useful to isolate certain categories, e.g. retrieving time windows for only the 'Z' component. Although this is possible directly with Pandas syntax, the `Inspector` provides an `isolate` function to simplify these calls.

In [9]:
insp.isolate(comp="Z")

|   | event | iteration | step | network | station | channel | component | misfit | length_s | dlnA | window_weight | max_cc_value | relative_endtime | relative_starttime | cc_shift_in_seconds | absolute_starttime | absolute_endtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | test_ASDFDataSet | i01 | s00 | NZ | BFZ | HHZ | Z | 0.0072 | 23.4 | -0.886651 | 2.320872 | 0.991825 | 42.96 | 19.56 | 0.0 | 2018-02-18T07:43:47.687644Z | 2018-02-18T07:44:11.087644Z |
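
For comparison, the equivalent selection in plain Pandas is a boolean mask on the `component` column. The sketch below builds a small stand-in DataFrame (a hypothetical subset of the columns shown above, not the real `insp.windows` object) so that it runs without a dataset.

```python
import pandas as pd

# Stand-in for `insp.windows`, using a few columns and values from the table above
windows = pd.DataFrame({
    "station": ["BFZ", "BFZ", "BFZ"],
    "channel": ["HHE", "HHN", "HHZ"],
    "component": ["E", "N", "Z"],
    "length_s": [45.9, 39.21, 23.4],
})

# Keep only rows whose component is 'Z'; the original index label is preserved
z_windows = windows[windows["component"] == "Z"]
print(z_windows)
```

As in the `isolate` output, the surviving row keeps its original index label rather than being renumbered from zero.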


## Misfit information

The `Inspector` also has two useful functions: `misfits` calculates the misfit at various levels (per function evaluation, per station, per event), and `nwin` calculates the number of misfit windows at the same levels.

### Misfits

The misfit for a single earthquake and the misfit for each iteration are defined by Equations 6 and 7, respectively, of [Tape et al. (2010)](https://academic.oup.com/gji/article/180/1/433/600143).

In [10]:
insp.misfits()

| iteration | step | n_event | summed_misfit | misfit |
|---|---|---|---|---|
| i01 | s00 | 1 | 0.0036 | 0.0036 |


In [11]:
insp.misfits(level="station")

| iteration | step | event | station | unscaled_misfit | n_win | misfit |
|---|---|---|---|---|---|---|
| i01 | s00 | test_ASDFDataSet | BFZ | 0.0216 | 3 | 0.0072 |


In [12]:
insp.misfits(level="event")

| iteration | step | event | unscaled_misfit | n_win | misfit |
|---|---|---|---|---|---|
| i01 | s00 | test_ASDFDataSet | 0.0216 | 3 | 0.0036 |
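
The values in the three misfit tables above are consistent with the scaling worked through below, using numbers copied from the `windows` table. This is a reading of the tabulated output under the assumption that it follows Equations 6 and 7 of Tape et al. (2010). It is not a copy of the `Inspector` implementation, and the variable names are illustrative.

```python
# Per-window misfits for the single event and station in the `windows` table
window_misfits = [0.0072, 0.0072, 0.0072]
n_win = len(window_misfits)

# Sum over windows, reported as `unscaled_misfit` in the tables
unscaled_misfit = sum(window_misfits)

# level="station": the tabulated value matches the mean misfit per window
station_misfit = unscaled_misfit / n_win

# level="event": assumed form of Eq. 6, the sum scaled by 1 / (2 * n_win)
event_misfit = unscaled_misfit / (2 * n_win)

# Default level: assumed form of Eq. 7, the average over all events
event_misfits = [event_misfit]
total_misfit = sum(event_misfits) / len(event_misfits)

print(round(station_misfit, 4), round(event_misfit, 4), round(total_misfit, 4))
```

With only one event, the event-level and iteration-level misfits coincide, which matches the `misfit` columns of the first and third tables.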


### Number of measurements

The `nwin` function returns both the number of measurement windows and their total length in seconds.

> **__NOTE__:** Because we only have one source-receiver pair, these values are all the same.

In [13]:
insp.nwin()

| iteration | step | n_win | length_s |
|---|---|---|---|
| i01 | s00 | 3 | 108.51 |


In [14]:
insp.nwin(level="station")

| iteration | step | station | n_win | length_s |
|---|---|---|---|---|
| i01 | s00 | BFZ | 3 | 108.51 |


In [15]:
insp.nwin(level="event")

| iteration | step | event | n_win | length_s |
|---|---|---|---|---|
| i01 | s00 | test_ASDFDataSet | 3 | 108.51 |
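
The same counts can be reproduced with a Pandas `groupby` over the window table. The sketch below uses a stand-in DataFrame with values copied from the `windows` table, and the helper `count_windows` is hypothetical. It illustrates the kind of aggregation involved and is not the `Inspector` source code.

```python
import pandas as pd

# Stand-in for `insp.windows`, using a few columns and values from the table above
windows = pd.DataFrame({
    "iteration": ["i01", "i01", "i01"],
    "step": ["s00", "s00", "s00"],
    "event": ["test_ASDFDataSet"] * 3,
    "station": ["BFZ", "BFZ", "BFZ"],
    "length_s": [45.9, 39.21, 23.4],
})


def count_windows(df, extra_level=None):
    """Count windows and sum their lengths per evaluation (hypothetical helper).

    `extra_level` optionally adds a grouping column such as "station" or "event".
    """
    keys = ["iteration", "step"] + ([extra_level] if extra_level else [])
    return df.groupby(keys).agg(
        n_win=("length_s", "count"),   # number of windows in each group
        length_s=("length_s", "sum"),  # total window length in seconds
    )


per_evaluation = count_windows(windows)
per_station = count_windows(windows, "station")
print(per_station)
```

Passing `"event"` instead of `"station"` groups by event, mirroring the `level` argument used in the cells above.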


## Plotting