## LQTMOMENT Tutorial 1: Creating LQTMOMENT Format Catalog with Catalog Builder

Like many domain-specific data processing tools, `lqtmoment` requires input in its native format before it can process your data with your chosen parameters. This tutorial covers how to prepare the `lqtmoment`-formatted catalog, which the program uses for both moment magnitude calculation and data analysis.

> **ℹ️ INFO ℹ️**
> 
> Before using `lqtmoment` for magnitude calculation, you need to prepare a catalog in the accepted input format, for example with lqtmoment's `catalog_builder` module. You can also generate the lqtmoment catalog with your own method, as long as it contains all required columns and follows the required structure.
>
> When creating the lqtmoment catalog, make sure the `source_id` values are consistent: each `source_id` should be a unique, sequential number with no duplicates. Wave/seismogram processing is indexed by `source_id`, so manage it carefully.
> 
>
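
The `source_id` rule can be checked before building the catalog. The following is a minimal sketch, assuming the hypocenter table is already loaded into a pandas DataFrame. `check_source_ids` is a hypothetical helper (not part of `lqtmoment`), and "sequential" is interpreted here as increasing values in row order.

```python
import pandas as pd


def check_source_ids(hypo_df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in the source_id column (empty if none).

    Hypothetical helper for illustration; not part of lqtmoment.
    """
    problems = []
    ids = hypo_df["source_id"]

    if ids.isna().any():
        problems.append("missing source_id values")

    duplicated = ids[ids.duplicated()].unique().tolist()
    if duplicated:
        problems.append(f"duplicated source_id values: {duplicated}")

    # Assumption: "sequential" means increasing in row order
    # (repeated values are already reported by the duplicate check above).
    if not ids.dropna().is_monotonic_increasing:
        problems.append("source_id values are not in increasing order")

    return problems


# Small in-memory example (made-up IDs, for illustration only)
example = pd.DataFrame({"source_id": [1001, 1002, 1002, 1004]})
print(check_source_ids(example))
```

An empty list means the column passed all three checks.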

### 1. Programmatic Approach

#### A. Import Catalog Builder Function

To build the lqtmoment catalog, we can use the `build_catalog` function from the `lqtmoment` package.

In [1]:
from lqtmoment import build_catalog
import pandas as pd

#### B. Initialize Input/Output File/Dir

In [2]:
# Initialize directories object
dirs = {
    "hypo_dir": r"F:..\tests\sample_tests_data\data\catalog\hypo_catalog.xlsx",
    "pick_dir": r"F:..\tests\sample_tests_data\data\catalog\picking_catalog.xlsx",
    "station_dir": r"..\tests\sample_tests_data\data\station\station.xlsx",
    "output_dir": r"..\tests\sample_tests_data\results\lqt_catalog"
}
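
A wrong path usually surfaces only later as a read error, so it can help to verify the inputs first. The following is a minimal sketch using only the standard library; `missing_inputs` and `prepare_output_dir` are hypothetical helpers, and the key names are those of the `dirs` dictionary above.

```python
from pathlib import Path


def missing_inputs(dirs: dict) -> list[str]:
    """Return the keys of input files that do not exist on disk.

    Hypothetical helper for illustration; not part of lqtmoment.
    """
    input_keys = ("hypo_dir", "pick_dir", "station_dir")
    return [key for key in input_keys if not Path(dirs[key]).is_file()]


def prepare_output_dir(dirs: dict) -> Path:
    """Create the output directory if needed and return it as a Path."""
    out = Path(dirs["output_dir"])
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling `missing_inputs(dirs)` before loading any file returns an empty list when all three inputs are present.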

#### C. Input Format

To generate an lqtmoment catalog, you need to prepare your `hypocenter catalog`, `picking catalog`, and `station` data in the following formats:

**1. Hypocenter Catalog Format**

In [3]:
# Load hypocenter catalog
hypo_df = pd.read_excel(dirs['hypo_dir'])
hypo_df.head()

|  | source_id | lat | lon | depth_m | year | month | day | hour | minute | t_0 | source_err_rms_s | n_phases | gap_degree | x_horizontal_err_m | y_horizontal_err_m | z_depth_err_m | remarks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 38.088368 | 126.596433 | 1252.26 | 2024 | 5 | 11 | 15 | 30 | 35.91 | 0.009189 | 12 | 334.187 | 1081.373654 | 830.062076 | 526.348287 |  |
| 1 | 1002 | 38.085685 | 126.59125 | 1035.99 | 2024 | 5 | 11 | 16 | 33 | 28.42 | 0.01757 | 8 | 336.028 | 1036.87045 | 1110.157682 | 572.148446 |  |
| 2 | 1003 | 38.084107 | 126.597537 | 705.16 | 2024 | 5 | 27 | 1 | 19 | 6.78 | 0.013334 | 22 | 311.137 | 632.833706 | 581.182105 | 573.935185 |  |
| 3 | 1004 | 38.084155 | 126.602059 | 770.66 | 2024 | 5 | 27 | 1 | 20 | 4.05 | 0.011108 | 26 | 336.717 | 1061.478766 | 1026.88642 | 602.837872 |  |
| 4 | 1005 | 38.088481 | 126.597389 | 1004.5 | 2024 | 5 | 27 | 1 | 21 | 12.38 | 0.013789 | 18 | 341.582 | 1126.28733 | 1141.011113 | 693.784318 |  |


>
> **⚠️ CAUTION ⚠️**
>
> In practice, some fields in the `hypocenter catalog` format may not be available in your own data, and those fields can be omitted. However, the following fields are mandatory:
>
> List of Required Data Fields in `hypocenter catalog` Format:
> - **source_id**
> - **lat**
> - **lon**
> - **depth_m**
> - **year**
> - **month**
> - **day**
> - **hour**
> - **minute**
> - **t_0**
>
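
The required-field list can be turned into a quick check on a loaded table. This is a minimal sketch, assuming the catalog is held in a pandas DataFrame; `missing_columns` is a hypothetical helper, not an `lqtmoment` function, and the column names are copied from the list above.

```python
import pandas as pd

# Required hypocenter fields, copied from the list above
REQUIRED_HYPO_COLUMNS = [
    "source_id", "lat", "lon", "depth_m",
    "year", "month", "day", "hour", "minute", "t_0",
]


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [name for name in required if name not in df.columns]


# Made-up one-row table that lacks the t_0 column
example = pd.DataFrame(
    {"source_id": [1], "lat": [0.0], "lon": [0.0], "depth_m": [0.0],
     "year": [2024], "month": [1], "day": [1], "hour": [0], "minute": [0]}
)
print(missing_columns(example, REQUIRED_HYPO_COLUMNS))
```

The same helper works for the other tables if it is given their own list of required names.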

**2. Picking Catalog**

In [4]:
# Load picking catalog
picking_df = pd.read_excel(dirs['pick_dir'])
picking_df.head(10)

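|  | source_id | station_code | year | month | day | hour_p | minute_p | p_arr_sec | p_polarity | p_onset | hour_s | minute_s | s_arr_sec | hour_coda | minute_coda | sec_coda |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | KJ06 | 2024 | 5 | 11 | 15 | 30 | 36.652054 | + | I | 15 | 30 | 37.180698 | 15 | 30 | 44.180698 |
| 1 | 1001 | KJ14 | 2024 | 5 | 11 | 15 | 30 | 36.706322 | + | E | 15 | 30 | 37.257805 | 15 | 30 | 44.257805 |
| 2 | 1001 | KJ11 | 2024 | 5 | 11 | 15 | 30 | 36.727074 | + | I | 15 | 30 | 37.323666 | 15 | 30 | 44.323666 |
| 3 | 1001 | KJ04 | 2024 | 5 | 11 | 15 | 30 | 36.809074 | - | E | 15 | 30 | 37.316196 | 15 | 30 | 44.316196 |
| 4 | 1001 | KJ10 | 2024 | 5 | 11 | 15 | 30 | 36.997971 | - | E | 15 | 30 | 37.776321 | 15 | 30 | 44.776321 |
| 5 | 1001 | KJ05 | 2024 | 5 | 11 | 15 | 30 | 37.322792 | + | I | 15 | 30 | 38.323971 | 15 | 30 | 45.323971 |
| 6 | 1002 | KJ06 | 2024 | 5 | 11 | 16 | 33 | 29.055487 | + | I | 16 | 33 | 29.58934 | 16 | 33 | 36.58934 |
| 7 | 1002 | KJ11 | 2024 | 5 | 11 | 16 | 33 | 29.148645 | + | I | 16 | 33 | 29.712334 | 16 | 33 | 36.712334 |
| 8 | 1002 | KJ14 | 2024 | 5 | 11 | 16 | 33 | 29.15432 | + | I | 16 | 33 | 29.654936 | 16 | 33 | 36.654936 |
| 9 | 1002 | KJ04 | 2024 | 5 | 11 | 16 | 33 | 29.165812 | - | E | 16 | 33 | 29.672575 | 16 | 33 | 36.672575 |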


> **⚠️ CAUTION ⚠️**
>
> In practice, some fields in the `picking catalog` format may not be available in your own data, and those fields can be omitted. However, the following fields are mandatory:
>
> List of Required Data Fields in `picking catalog` Format:
> - **source_id**
> - **station_code**
> - **year**
> - **month**
> - **day**
> - **hour_p**
> - **minute_p**
> - **p_arr_sec**
> - **hour_s**
> - **minute_s**
> - **s_arr_sec**
> - **hour_coda** (**)
> - **minute_coda** (**)
> - **sec_coda** (**)
>
> ** *If you do not have data for `hour_coda`, `minute_coda`, and `sec_coda`, you can leave these columns blank, but the column headers must still be present in the picking catalog.*
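
When a picking table has no coda information at all, the three coda headers still need to exist. One way to add them is sketched below, assuming the picks are in a pandas DataFrame; `ensure_coda_columns` is a hypothetical helper, and using NaN for "blank" is an assumption about what the program accepts.

```python
import pandas as pd

CODA_COLUMNS = ["hour_coda", "minute_coda", "sec_coda"]


def ensure_coda_columns(picking_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of picking_df with any absent coda column added as blank (NaN).

    Hypothetical helper for illustration; not part of lqtmoment.
    """
    result = picking_df.copy()
    for name in CODA_COLUMNS:
        if name not in result.columns:
            result[name] = float("nan")  # header present, values left blank
    return result


# Made-up pick without coda information
picks = pd.DataFrame({"source_id": [1], "station_code": ["ST01"], "p_arr_sec": [1.5]})
print(ensure_coda_columns(picks).columns.tolist())
```

NaN values are written as empty cells by both `to_excel` and `to_csv`, so the saved file keeps the headers with blank entries.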

**3. Station Data**

In [5]:
# Load station data
station_df = pd.read_excel(dirs['station_dir'])
station_df.head()

|  | network_code | station_code | lat | lon | elev_m |
|---|---|---|---|---|---|
| 0 | KJ | KJ01 | 38.125223 | 126.563253 | 1120 |
| 1 | KJ | KJ02 | 38.097281 | 126.566326 | 1496 |
| 2 | KJ | KJ03 | 38.110387 | 126.569118 | 1335 |
| 3 | KJ | KJ04 | 38.096023 | 126.572559 | 1571 |
| 4 | KJ | KJ05 | 38.121308 | 126.561752 | 1150 |


#### D. Build LQT Format Catalog

In [6]:
catalog_df = build_catalog(dirs["hypo_dir"],
                           dirs["pick_dir"],
                           dirs["station_dir"])

catalog_df.head(10)

|  | source_id | source_lat | source_lon | source_depth_m | network_code | station_code | station_lat | station_lon | station_elev_m | source_origin_time | ... | s_p_lag_time_sec | coda_time | source_err_rms_s | n_phases | gap_degree | x_horizontal_err_m | y_horizontal_err_m | z_depth_err_m | earthquake_type | remarks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 38.088368 | 126.596433 | 1252.26 | KJ | KJ06 | 38.095082 | 126.585931 | 1396 | 2024-05-11 15:30:35.909999 | ... | 0.528645 | 2024-05-11 15:30:44.180698 | 0.009189 | 12.0 | 334.187 | 1081.373654 | 830.062076 | 526.348287 | very_local_earthquake |  |
| 1 | 1001 | 38.088368 | 126.596433 | 1252.26 | KJ | KJ14 | 38.102954 | 126.577039 | 1398 | 2024-05-11 15:30:35.909999 | ... | 0.551483 | 2024-05-11 15:30:44.257804 | 0.009189 | 12.0 | 334.187 | 1081.373654 | 830.062076 | 526.348287 | very_local_earthquake |  |
| 2 | 1001 | 38.088368 | 126.596433 | 1252.26 | KJ | KJ11 | 38.107482 | 126.587313 | 1312 | 2024-05-11 15:30:35.909999 | ... | 0.596592 | 2024-05-11 15:30:44.323665 | 0.009189 | 12.0 | 334.187 | 1081.373654 | 830.062076 | 526.348287 | very_local_earthquake |  |
| 3 | 1001 | 38.088368 | 126.596433 | 1252.26 | KJ | KJ04 | 38.096023 | 126.572559 | 1571 | 2024-05-11 15:30:35.909999 | ... | 0.507122 | 2024-05-11 15:30:44.316196 | 0.009189 | 12.0 | 334.187 | 1081.373654 | 830.062076 | 526.348287 | very_local_earthquake |  |
| 4 | 1001 | 38.088368 | 126.596433 | 1252.26 | KJ | KJ10 | 38.11491 | 126.565193 | 1220 | 2024-05-11 15:30:35.909999 | ... | 0.778351 | 2024-05-11 15:30:44.776321 | 0.009189 | 12.0 | 334.187 | 1081.373654 | 830.062076 | 526.348287 | very_local_earthquake |  |
| 5 | 1001 | 38.088368 | 126.596433 | 1252.26 | KJ | KJ05 | 38.121308 | 126.561752 | 1150 | 2024-05-11 15:30:35.909999 | ... | 1.001178 | 2024-05-11 15:30:45.323970 | 0.009189 | 12.0 | 334.187 | 1081.373654 | 830.062076 | 526.348287 | very_local_earthquake |  |
| 6 | 1002 | 38.085685 | 126.59125 | 1035.99 | KJ | KJ06 | 38.095082 | 126.585931 | 1396 | 2024-05-11 16:33:28.420000 | ... | 0.533854 | 2024-05-11 16:33:36.589340 | 0.01757 | 8.0 | 336.028 | 1036.87045 | 1110.157682 | 572.148446 | very_local_earthquake |  |
| 7 | 1002 | 38.085685 | 126.59125 | 1035.99 | KJ | KJ11 | 38.107482 | 126.587313 | 1312 | 2024-05-11 16:33:28.420000 | ... | 0.563688 | 2024-05-11 16:33:36.712333 | 0.01757 | 8.0 | 336.028 | 1036.87045 | 1110.157682 | 572.148446 | very_local_earthquake |  |
| 8 | 1002 | 38.085685 | 126.59125 | 1035.99 | KJ | KJ14 | 38.102954 | 126.577039 | 1398 | 2024-05-11 16:33:28.420000 | ... | 0.500616 | 2024-05-11 16:33:36.654935 | 0.01757 | 8.0 | 336.028 | 1036.87045 | 1110.157682 | 572.148446 | very_local_earthquake |  |
| 9 | 1002 | 38.085685 | 126.59125 | 1035.99 | KJ | KJ04 | 38.096023 | 126.572559 | 1571 | 2024-05-11 16:33:28.420000 | ... | 0.506763 | 2024-05-11 16:33:36.672574 | 0.01757 | 8.0 | 336.028 | 1036.87045 | 1110.157682 | 572.148446 | very_local_earthquake |  |


> **⚠️ CAUTION ⚠️**
>
> Again, some fields in the `lqtmoment` catalog format may not be available in your own data, and those fields can be omitted. However, the following fields are mandatory:
>
> List of Required Data Fields in `lqtmoment` Format Catalog:
> - **source_id**
> - **source_lat**
> - **source_lon**
> - **source_depth_m**
> - **network_code**
> - **station_code**
> - **station_lat**
> - **station_lon**
> - **station_elev_m**
> - **source_origin_time**
> - **p_arr_time**
> - **s_arr_time**
> - **s_p_lag_time_sec**
> - **coda_time** (**)
> - **earthquake_type**
>
> ** *If you do not have data for `coda_time`, you can leave the column blank, but the column header must still be present in the catalog.*
>
> As you can see from the dataframe above, the `build_catalog` function automatically creates all the new catalog columns that lqtmoment needs to calculate moment magnitude. For more details, see the full format [here](https://github.com/bgjx/lqt-moment-magnitude/blob/main/tests/sample_tests_data/results/lqt_catalog/lqt_catalog.xlsx).
>
> The `source_origin_time`, `p_arr_time`, `s_arr_time`, and `coda_time` fields must be in datetime format; otherwise `lqtmoment` will be unable to parse the data. The `coda_time` field is needed if you want to apply `dynamic` trimming (see lqt_tutor_3 for more details) with `coda_time` as the primary reference, but since `coda_time` is not always provided in a catalog, you can leave this field blank.
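
If you assemble the catalog with your own method, the datetime fields and the S-P lag have to be derived from the separate date and time columns. The following is a minimal sketch of one way to do that with pandas; it is not necessarily how `build_catalog` works internally, and `assemble_time` is a hypothetical helper. The single pick is copied from the first row of the picking catalog sample above.

```python
import pandas as pd


def assemble_time(df: pd.DataFrame, hour: str, minute: str, second: str) -> pd.Series:
    """Combine the year/month/day columns with the given time columns into datetimes."""
    parts = pd.DataFrame({
        "year": df["year"], "month": df["month"], "day": df["day"],
        "hour": df[hour], "minute": df[minute], "second": df[second],
    })
    # pandas assembles datetimes from columns named year, month, day, hour, minute, second
    return pd.to_datetime(parts)


# One pick copied from the picking catalog sample (source 1001, station KJ06)
picks = pd.DataFrame({
    "year": [2024], "month": [5], "day": [11],
    "hour_p": [15], "minute_p": [30], "p_arr_sec": [36.652054],
    "hour_s": [15], "minute_s": [30], "s_arr_sec": [37.180698],
})

picks["p_arr_time"] = assemble_time(picks, "hour_p", "minute_p", "p_arr_sec")
picks["s_arr_time"] = assemble_time(picks, "hour_s", "minute_s", "s_arr_sec")

# S-P lag in seconds, from the difference of the two datetime columns
picks["s_p_lag_time_sec"] = (picks["s_arr_time"] - picks["p_arr_time"]).dt.total_seconds()
print(picks[["p_arr_time", "s_arr_time", "s_p_lag_time_sec"]])
```

The lag for this pick should come out close to the `s_p_lag_time_sec` value listed for the same station in the catalog sample.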

#### E. Save the Catalog to Results Dir

This formatted catalog will be used throughout the calculation process.

In [7]:
from pathlib import Path

# Save the catalog as CSV, leaving out the DataFrame index
catalog_df.to_csv(Path(dirs["output_dir"]) / "lqt_catalog_test.csv", index=False)

> **⚠️ CAUTION ⚠️**
>
> `lqtmoment` accepts both `.xlsx` and `.csv` files as input. However, saving the catalog as `.csv` is highly recommended for processing pipelines, because it keeps the data in plain text.
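
One consequence of plain-text storage is that datetime columns are written as strings. If you read the CSV back with pandas for your own inspection, you may want to re-parse them. The sketch below uses a made-up one-row catalog and an in-memory buffer in place of a file; how `lqtmoment` itself reads the file is not covered here.

```python
import io

import pandas as pd

# Time columns named in the catalog format above
TIME_COLUMNS = ["source_origin_time", "p_arr_time", "s_arr_time", "coda_time"]

# Made-up one-row catalog holding only the time columns
catalog = pd.DataFrame({
    "source_origin_time": pd.to_datetime(["2024-05-11 15:30:35.91"]),
    "p_arr_time": pd.to_datetime(["2024-05-11 15:30:36.65"]),
    "s_arr_time": pd.to_datetime(["2024-05-11 15:30:37.18"]),
    "coda_time": pd.to_datetime(["2024-05-11 15:30:44.18"]),
})

# Round-trip through CSV text; the buffer stands in for a file on disk
buffer = io.StringIO()
catalog.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates turns the listed text columns back into datetime columns
reloaded = pd.read_csv(buffer, parse_dates=TIME_COLUMNS)
print(reloaded.dtypes)
```

Without `parse_dates`, the same columns would come back as plain strings.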




### 2. Command-Line Interface Approach

The `lqtmoment` package also includes **Command-Line Interface (CLI)** capabilities. If your input files are well defined (following the formats above), you can build the lqtmoment catalog by entering a single command in the terminal, as shown below (make sure the `lqtmoment` package is correctly installed in your working environment beforehand):

> `$ lqtcatalog --hypo-file dir/hypo_catalog.xlsx --pick-file dir/picking_catalog.xlsx --station-file dir/station.xlsx --output-format csv`

> **ℹ️ INFO ℹ️**
>
> Type `$ lqtcatalog --help` for more details.
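
For scripted pipelines, the same command can be launched from Python. This is a minimal sketch using only the standard library and the four flags shown in the command above; `lqtcatalog_command` and `run_lqtcatalog` are hypothetical helpers, and the `csv` default is a choice made for this example.

```python
import shlex
import shutil
import subprocess


def lqtcatalog_command(hypo_file: str, pick_file: str, station_file: str,
                       output_format: str = "csv") -> list[str]:
    """Build the argument list for the lqtcatalog call (hypothetical helper)."""
    return [
        "lqtcatalog",
        "--hypo-file", hypo_file,
        "--pick-file", pick_file,
        "--station-file", station_file,
        "--output-format", output_format,
    ]


def run_lqtcatalog(command: list[str]) -> int:
    """Run the command if the executable is on PATH; return its exit code, or -1 if not found."""
    if shutil.which(command[0]) is None:
        return -1
    return subprocess.run(command).returncode


command = lqtcatalog_command(
    "dir/hypo_catalog.xlsx", "dir/picking_catalog.xlsx", "dir/station.xlsx"
)
# Show the shell form of the command without running it
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Call `run_lqtcatalog(command)` once the paths point at real files.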


