### Cleaning/Processing Steps
The data will be demeaned and detrended, similar to Zali et al., but with a more conservative downsampling rate of 20 Hz, given that volcanic tremor typically falls between 1 and 9 Hz.
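
The link between sampling rate and the tremor band can be checked with a little arithmetic. The sketch below is illustrative only: `nyquist_hz` and `resolved_band` are hypothetical helper names, not part of ObsPy or of this notebook, and the 1-9 Hz band is taken from the sentence above.

```python
def nyquist_hz(sampling_rate_hz):
    """Highest frequency representable at a given sampling rate."""
    return sampling_rate_hz / 2.0


def resolved_band(band_hz, sampling_rate_hz):
    """Return the part of band_hz that lies below the Nyquist frequency, or None."""
    low, high = band_hz
    top = min(high, nyquist_hz(sampling_rate_hz))
    return (low, top) if top > low else None


tremor_band = (1.0, 9.0)  # Hz, as quoted in the text
for rate in (20.0, 8.0):  # the two rates mentioned in this section
    print(rate, nyquist_hz(rate), resolved_band(tremor_band, rate))
```

At 20 Hz the Nyquist frequency is 10 Hz, so the full 1-9 Hz band is retained. At 8 Hz it is 4 Hz, so only the 1-4 Hz part of the band survives.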

* Read in mseed files
* Check all are the same length
* Demean and detrend
* Anti-alias filter and downsample data from 100 Hz to 8 Hz
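
The steps above can be sketched on a synthetic trace without ObsPy. This is a minimal NumPy-only illustration under stated assumptions, not the notebook's method: `preprocess` is a hypothetical helper, and truncating the spectrum at the new Nyquist frequency stands in for the separate low-pass filter and resample calls. The final normalization mirrors what the processing code does to each trace.

```python
import numpy as np


def preprocess(data, fs_in, fs_out):
    """Demean, detrend, Fourier-resample, and normalize a 1-D trace (illustrative)."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean()                          # remove mean
    idx = np.arange(x.size)
    slope, intercept = np.polyfit(idx, x, 1)  # least-squares straight line
    x = x - (slope * idx + intercept)         # remove linear trend

    n_out = int(round(x.size * fs_out / fs_in))
    spec = np.fft.rfft(x)
    # Keeping only the bins up to the new Nyquist frequency acts as the
    # anti-alias filter; the inverse FFT at length n_out does the resampling.
    y = np.fft.irfft(spec[: n_out // 2 + 1], n=n_out) * (n_out / x.size)

    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y        # scale to [-1, 1]


# Synthetic 60 s trace at 100 Hz: offset + drift + 2 Hz signal + 20 Hz signal
fs_in, fs_out = 100.0, 8.0
t = np.arange(int(60 * fs_in)) / fs_in
raw = 5.0 + 0.01 * t + np.sin(2 * np.pi * 2.0 * t) + np.sin(2 * np.pi * 20.0 * t)

processed = preprocess(raw, fs_in, fs_out)
freqs = np.fft.rfftfreq(processed.size, d=1.0 / fs_out)
dominant = freqs[np.argmax(np.abs(np.fft.rfft(processed)))]
print(processed.size, dominant)
```

The 20 Hz component lies above the 4 Hz Nyquist frequency of the 8 Hz output, so it is discarded rather than aliased, and the 2 Hz component remains as the dominant frequency.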

In [10]:
import warnings

# Suppress the specific warning from obspy
warnings.filterwarnings(
    "ignore",
    message="The encoding specified in trace.stats.mseed.encoding does not match the dtype of the data.*",
    category=UserWarning,
    module="obspy.io.mseed.core"
)

In [11]:
import os
from obspy import read
import glob
import numpy as np

# Define the folder paths
input_folder = os.path.join(os.getcwd(), 'data', 'raw')
processed_folder = os.path.join(os.getcwd(), 'data', 'processed')

# Number of files in data/raw, used for the progress messages
files_to_process = len(glob.glob(f'{input_folder}/*.mseed'))
count = 1

# Display folder paths for confirmation
# display(input_folder)
# display(processed_folder)

os.makedirs(processed_folder, exist_ok=True)  # Create output directory if it doesn't exist

# Target sampling rate after downsampling
target_sampling_rate = 8  # Hz

# Initialize a list to store trace lengths
lengths = []

# Process each mseed file in the folder
for file_path in glob.glob(f"{input_folder}/*.mseed"):
    try:
        # Read the file
        st = read(file_path)
        st_check = st.copy()
        
        # for tr in st_check:
        #     if np.isnan(tr.data).any():
        #         print(f"NaNs detected after merging in trace {tr.id}. Filling NaNs with zeros.")
        #         tr.data = np.nan_to_num(tr.data, nan=0.0)  # Replace NaNs with zeros
        
        # Process each trace
        for tr in st:
            # Preprocessing steps
            tr.detrend("demean")  # Remove mean
            tr.detrend("linear")  # Remove linear trend
            tr.filter("lowpass", freq=target_sampling_rate / 2)  # Anti-aliasing low-pass at the new Nyquist frequency (4 Hz)
            tr.resample(target_sampling_rate)  # Downsample to 8 Hz
            
            tr.normalize()  # Normalize the trace to between -1 and 1
            
            # Record trace length
            lengths.append(len(tr.data))
        
        # Save the processed data
        output_file = os.path.join(processed_folder, os.path.basename(file_path))
        st.write(output_file, format="MSEED")
        print(f'Processed file {count}/{files_to_process}: {os.path.basename(file_path)}')
    
    except Exception as e:
        print(f"Error processing file {file_path}: {e}")
        continue

    count += 1

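# Convert lengths to a NumPy array and check that all traces are the same length
lengths = np.array(lengths)
print(f"Lengths of processed traces: {lengths}")
if np.unique(lengths).size > 1:
    print(f"Warning: trace lengths differ: {np.unique(lengths)}")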


Processed file 1/28: 24_20210402_9fnuph.mseed
Processed file 2/28: 14_20210323_9fnuph.mseed
Processed file 3/28: 15_20210324_9fnuph.mseed
Processed file 4/28: 23_20210401_9fnuph.mseed
Processed file 5/28: 21_20210330_9fnuph.mseed
Processed file 6/28: 11_20210320_9fnuph.mseed
Processed file 7/28: 10_20210319_9fnuph.mseed
Processed file 8/28: 07_20210317_9fnuph.mseed
Processed file 9/28: 05_20210315_9fnuph.mseed
Processed file 10/28: 17_20210326_9fnuph.mseed
Processed file 11/28: 22_20210331_9fnuph.mseed
Processed file 12/28: 08_20210317_9fnuph.mseed
Processed file 13/28: 29_20210407_9fnuph.mseed
Processed file 14/28: 18_20210327_9fnuph.mseed
Processed file 15/28: 02_20210313_9fnuph.mseed
Processed file 16/28: 06_20210316_9fnuph.mseed
Processed file 17/28: 28_20210406_9fnuph.mseed
Processed file 18/28: 25_20210403_9fnuph.mseed
Processed file 19/28: 09_20210318_9fnuph.mseed
Processed file 20/28: 30_20210408_9fnuph.mseed
Processed file 21/28: 03_20210314_9fnuph.mseed
Processed file 22/28: 

~~This is the correct number of points (+1) for 24 hours of data sampled at 25 Hz. The single extra point is a product of the decimation and will be removed in the "preparing AI-ready data" notebook.~~
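
The expected number of points for a full day of data follows directly from the duration and the sampling rate. The sketch below is a hedged illustration: `expected_samples` and `trim_to_expected` are hypothetical helpers, and dropping trailing samples is only an assumption about how an extra point might be removed, since the follow-up notebook is not shown here.

```python
import numpy as np


def expected_samples(duration_s, sampling_rate_hz):
    """Number of samples in duration_s seconds at sampling_rate_hz."""
    return int(round(duration_s * sampling_rate_hz))


def trim_to_expected(data, duration_s, sampling_rate_hz):
    """Drop any trailing samples beyond the expected count (assumed approach)."""
    n = expected_samples(duration_s, sampling_rate_hz)
    return np.asarray(data)[:n]


day_s = 24 * 60 * 60  # seconds in 24 hours
for rate in (25, 8):  # the rate in the note above and the target rate in the code
    print(rate, expected_samples(day_s, rate))

# A trace with one extra point at 25 Hz is trimmed back to the expected length
trimmed = trim_to_expected(np.zeros(expected_samples(day_s, 25) + 1), day_s, 25)
print(trimmed.size)
```

For 24 hours this gives 2,160,000 samples at 25 Hz and 691,200 samples at 8 Hz.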