# Template Extraction from Continuous Seismic Data

## Overview
This script extracts templates from continuous seismic waveform data based on selected events from a short event catalog. It processes continuous data files to isolate signals corresponding to the largest magnitude events, creating templates that can be used for further analysis, such as matched filtering.

## Methodology
1. **Load Data**: 
   - The script reads continuous waveform data from a specified directory and loads a short event catalog from an Excel file.

2. **Select Events**: 
   - It selects the largest 20 magnitude events from the catalog, sorting the events by their magnitude.

3. **Process Continuous Data**:
   - For each continuous data file, the script reads the data stream, applies a band-pass filter (1-200 Hz), and extracts templates based on the event details in the catalog.
   - For each event, it creates a time window corresponding to the events origin time, trims the stream to this window, normalizes the extracted data, and appends the data as a template.

4. **Save Templates**: 
   - The extracted templates are saved as numpy (.npy) files in a specified directory structure based on the event details.

## Key Parameters
- **NUMBER_OF_TEMPLATES**: Number of templates to extract (default: 20).
- **TEMPLATE_DURATION**: Duration of each template in seconds (default: 4 seconds).

## Dependencies
This script requires the following Python libraries:
- `obspy` for seismic data processing
- `numpy` for numerical operations
- `pandas` for data handling
- `glob` and `os` for file operations

Make sure to have these libraries installed in your Python environment before running the script.


In [1]:
import obspy
import glob
import pandas as pd
import os
import numpy as np

# Constants
NUMBER_OF_TEMPLATES = 20  # Number of templates to extract
TEMPLATE_DURATION = 4      # Duration of each template (in seconds)

# Read continuous waveform data from the specified directory
data_paths = glob.glob('data/*.mseed')

# Load the short event catalog from an Excel file
catalog = pd.read_excel('Catalog_JGR_OneDay.xlsx')

# Select the largest 20 magnitude events from the catalog
catalog = catalog.sort_values(by=['Magnitude Mw'], ascending=False).head(NUMBER_OF_TEMPLATES)

# Loop over each continuous data file (currently set to process the first file only)
for data_file in data_paths:  # Adjust to iterate over multiple files if necessary
    # Initialize a list to store station templates
    station_templates = []
    
    # Print the current data file being processed and the number of events to process
    print(f"Processing Data File: {data_file}, Number of Events to Process: {len(catalog)}")

    # Loop through the selected events in the catalog
    for index in range(len(catalog)):
        # Read the continuous data stream
        st = obspy.read(data_file)
        
        # Apply a band-pass filter to the data (1-200 Hz)
        st.filter('bandpass', freqmin=1, freqmax=200, corners=4, zerophase=False)
        
        # Extract the station name from the stream
        station_name = st[0].stats.station
        
        

        # Create the extraction time based on the event details in the catalog
        event_time_str = f"2016-{int(catalog.iloc[index]['Month']):02d}-{int(catalog.iloc[index]['Day']):02d}T{int(catalog.iloc[index]['Hour']):02d}:{int(catalog.iloc[index]['Minute']):02d}:{int(catalog.iloc[index]['Second']):02d}.000000Z"
        event_time = obspy.UTCDateTime(event_time_str)

        # Formulate the file name for storing the templates
        file_name = f"EV{int(catalog.iloc[index]['Month']):02d}{int(catalog.iloc[index]['Day']):02d}_{int(catalog.iloc[index]['Hour']):02d}{int(catalog.iloc[index]['Minute']):02d}{int(catalog.iloc[index]['Second']):02d}"

        # Create a directory to save the templates if it doesn't exist
        output_dir = f"database/{file_name}"
        os.makedirs(output_dir, exist_ok=True)

        # Trim the stream to the specified time window for the template
        st_trimmed = st.trim(event_time, event_time + TEMPLATE_DURATION)

        # Extract data from the first channel of the stream
        data_array = st_trimmed[0].data
        
        # Normalize the data to have a maximum absolute value of 1
        data_array /= np.max(np.abs(data_array))
        
        # Append the station data to the list of templates
        station_templates.append(data_array)

    # Save the array of station templates to a .npy file
    np.save(os.path.join(output_dir, station_name), station_templates)


Processing Data File: data/1177.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1114.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1158.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1130.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1178.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1113.5B.mseed, Number of Events to Process: 20


  data_array /= np.max(np.abs(data_array))


Processing Data File: data/1155.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1159.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1171.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1187.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1188.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1215.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1138.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1127.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1210.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1175.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1192.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1167.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1131.5B.mseed, Number of Events to Process: 20
Processing Data File: data/1122.5B.mse