# Make an MTH5 from LEMI data

This notebook shows how to read LEMI (.TXT) files into an MTH5 file.

In [1]:
from mth5.mth5 import MTH5
from mth5.io.lemi import LEMICollection
from mth5.clients import MakeMTH5

from mth5_test_data import get_test_data_path

lemi_path = get_test_data_path("lemi")

### LEMI Collection

We will use `LEMICollection` to assemble the *.txt* files into a logical order by schedule action or run. Each LEMI output file contains the data for all channels.

**IMPORTANT:** `LEMICollection` assumes the given file path is for a single station. 

*Metadata:* we need to input the `station_id` and the `survey_id` to provide minimal metadata when making an MTH5 file.

The `LEMICollection.get_runs()` will return a two level ordered dictionary (`OrderedDict`).  The first level is keyed by station ID.  These objects are in turn ordered dictionaries by run ID.  Therefore you can loop over stations and runs.  
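
The nested loop can be sketched with a stand-in structure. The example below uses plain lists of dictionaries in place of the pandas DataFrames that `get_runs()` returns, and the `fn` values are hypothetical file names. Only the two-level layout (station ID, then run ID) is taken from the description above.

```python
from collections import OrderedDict

# Stand-in for the object returned by LEMICollection.get_runs().
# The real values are pandas DataFrames; lists of dicts are used here so
# the sketch runs without mth5 installed. File names are hypothetical.
runs = OrderedDict(
    mt001=OrderedDict(
        sr1_0001=[{"fn": "file_a.TXT", "sample_rate": 1.0}],
        sr1_0002=[{"fn": "file_b.TXT", "sample_rate": 1.0}],
    )
)

visited = []
for station_id, station_runs in runs.items():  # first level: station ID
    for run_id, run_entries in station_runs.items():  # second level: run ID
        visited.append((station_id, run_id, len(run_entries)))
        print(f"{station_id} / {run_id}: {len(run_entries)} file(s)")
```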

**Note**: `n_samples` is an estimate based on the file size, not on the data. To get an accurate number, read in the full file.
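
To see why a size-based count is only an estimate, the sketch below compares it with an actual line count on two synthetic files. The 152 bytes per record is an assumption inferred from the test data in this notebook (18240 bytes for 120 samples), not a documented constant. `estimate_n_samples` and `count_samples` are hypothetical helpers, not part of mth5.

```python
import tempfile
from pathlib import Path

BYTES_PER_RECORD = 152  # assumption: inferred from 18240 bytes / 120 samples


def estimate_n_samples(path, bytes_per_record=BYTES_PER_RECORD):
    """Estimate the record count from the file size alone (fast, approximate)."""
    return round(Path(path).stat().st_size / bytes_per_record)


def count_samples(path):
    """Count records by reading every line of the file (slower, exact)."""
    with open(path, "rb") as fid:
        return sum(1 for _ in fid)


results = {}
with tempfile.TemporaryDirectory() as tmp:
    # Two synthetic files with 120 records each but different record widths.
    for name, width in (("full_width.TXT", 152), ("short_lines.TXT", 100)):
        fn = Path(tmp) / name
        fn.write_bytes((b"0" * (width - 1) + b"\n") * 120)
        results[name] = (estimate_n_samples(fn), count_samples(fn))

for name, (estimated, counted) in results.items():
    print(f"{name}: estimated={estimated}, counted={counted}")
```

The estimate matches the true count only when every record has the assumed width. If the records are shorter, the size-based figure undercounts.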

In [2]:
zc = LEMICollection(lemi_path)
zc.station_id = "mt001"
zc.survey_id = "test"
runs = zc.get_runs(sample_rates=[1])
print(f"Found {len(runs)} station with {len(runs[list(runs.keys())[0]])} runs")

Found 1 station with 2 runs


In [3]:
for run_id, run_df in runs[zc.station_id].items():
    display(run_df)

Unnamed: 0,survey,station,run,start,end,channel_id,component,fn,sample_rate,file_size,n_samples,sequence_number,dipole,coil_number,latitude,longitude,elevation,instrument_id,calibration_fn
0,test,mt001,sr1_0001,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,1,"bx,by,bz,e1,e2,temperature_e,temperature_h",C:\Users\peaco\OneDrive\Documents\GitHub\mth5_...,1.0,18238,120,0,,,,,,LEMI424,


Unnamed: 0,survey,station,run,start,end,channel_id,component,fn,sample_rate,file_size,n_samples,sequence_number,dipole,coil_number,latitude,longitude,elevation,instrument_id,calibration_fn
1,test,mt001,sr1_0002,2020-10-02 00:00:00+00:00,2020-10-02 00:01:59+00:00,1,"bx,by,bz,e1,e2,temperature_e,temperature_h",C:\Users\peaco\OneDrive\Documents\GitHub\mth5_...,1.0,18240,120,0,,,,,,LEMI424,


## Build MTH5

Now that we have a logical collection of files, let's load them into an MTH5. We will simply loop over the stations, runs, and channels in the ordered dictionary; the `MakeMTH5.from_lemi424` call below does this for us.

There are a few things to keep in mind:

- The LEMI raw files come with very little metadata, so as a user you will have to input most of it manually.
- The output files from a LEMI are already calibrated into units of nT and mV/km (I think), so there are no filters to apply to calibrate the data.
- Since this is an MTH5 file version 0.2.0, the filters are stored in the `survey_group`, so add any filters there.

In [4]:
mth5_path = MakeMTH5.from_lemi424(lemi_path, "test", "mt01")

2026-01-05T21:39:52.726196-0800 | INFO | mth5.mth5 | _initialize_file | line: 678 | Initialized MTH5 0.2.0 file c:\Users\peaco\OneDrive\Documents\GitHub\mth5_tutorial\src\make_mth5\from_lemi424.h5 in mode w
2026-01-05T21:39:57.983802-0800 | INFO | mth5.mth5 | close_mth5 | line: 772 | Flushing and closing c:\Users\peaco\OneDrive\Documents\GitHub\mth5_tutorial\src\make_mth5\from_lemi424.h5


### MTH5 Structure

Have a look at the MTH5 structure and make sure it looks correct.

In [5]:
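with MTH5() as m:
    # open_mth5 opens the file on the existing object, so no reassignment is needed
    m.open_mth5(mth5_path)
    print(m)
    # Pull the summaries and metadata into memory before the file closes
    channel_df = m.channel_summary.to_dataframe()
    run_df = m.run_summary
    experiment = m.to_experiment()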

/:
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
            |- Group: test
            --------------
                |- Group: Filters
                -----------------
                    |- Group: coefficient
                    ---------------------
                    |- Group: fap
                    -------------
                    |- Group: fir
                    -------------
                    |- Group: time_delay
                    --------------------
                    |- Group: zpk
                    -------------
                |- Group: Reports
                -----------------
                |- Group: Standards
                -------------------
                    --> Dataset: summary
                    ......................
   

### Channel Summary

Have a look at the channel summary and make sure everything looks good.

In [7]:
channel_df


Unnamed: 0,survey,station,run,latitude,longitude,elevation,component,start,end,n_samples,sample_rate,measurement_type,azimuth,tilt,units,has_data,hdf5_reference,run_hdf5_reference,station_hdf5_reference
0,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,bx,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,magnetic,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
1,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,by,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,magnetic,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
2,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,bz,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,magnetic,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
3,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,e1,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,electric,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
4,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,e2,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,electric,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
5,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,temperature_e,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,auxiliary,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
6,test,mt01,sr1_0001,34.080647,-107.214075,2202.4,temperature_h,2020-10-01 00:00:00+00:00,2020-10-01 00:01:59+00:00,120,1.0,auxiliary,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
7,test,mt01,sr1_0002,34.080647,-107.214075,2202.4,bx,2020-10-02 00:00:00+00:00,2020-10-02 00:01:59+00:00,120,1.0,magnetic,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
8,test,mt01,sr1_0002,34.080647,-107.214075,2202.4,by,2020-10-02 00:00:00+00:00,2020-10-02 00:01:59+00:00,120,1.0,magnetic,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
9,test,mt01,sr1_0002,34.080647,-107.214075,2202.4,bz,2020-10-02 00:00:00+00:00,2020-10-02 00:01:59+00:00,120,1.0,magnetic,0.0,0.0,,True,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>


## Run Summary

In [8]:
run_df

Unnamed: 0,channel_scale_factors,duration,end,has_data,input_channels,mth5_path,n_samples,output_channels,run,sample_rate,start,station,survey,run_hdf5_reference,station_hdf5_reference
0,"{'bx': 1.0, 'by': 1.0, 'bz': 1.0, 'e1': 1.0, '...",119.0,2020-10-01 00:01:59+00:00,True,"[bx, by]",c:/Users/peaco/OneDrive/Documents/GitHub/mth5_...,120,"[bz, e1, e2]",sr1_0001,1.0,2020-10-01 00:00:00+00:00,mt01,test,<HDF5 object reference>,<HDF5 object reference>
1,"{'bx': 1.0, 'by': 1.0, 'bz': 1.0, 'e1': 1.0, '...",119.0,2020-10-02 00:01:59+00:00,True,"[bx, by]",c:/Users/peaco/OneDrive/Documents/GitHub/mth5_...,120,"[bz, e1, e2]",sr1_0002,1.0,2020-10-02 00:00:00+00:00,mt01,test,<HDF5 object reference>,<HDF5 object reference>
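
A `duration` of 119.0 s alongside 120 samples is consistent if the start and end times are both sample times, which is an assumption here rather than something the summary states. Under that assumption the count is the duration times the sample rate, plus one. The sketch below checks this with the values from the first row (`expected_n_samples` is a hypothetical helper):

```python
from datetime import datetime


def expected_n_samples(start, end, sample_rate):
    """Sample count for a run whose start and end are both sample times."""
    duration = (
        datetime.fromisoformat(end) - datetime.fromisoformat(start)
    ).total_seconds()
    return int(round(duration * sample_rate)) + 1


n = expected_n_samples(
    "2020-10-01 00:00:00+00:00", "2020-10-01 00:01:59+00:00", 1.0
)
print(n)  # 120
```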


## Experiment Metadata

The `experiment` object holds all the metadata stored in the MTH5 file.

In [6]:
experiment

Experiment Contents
--------------------
Number of Surveys: 1
  Survey ID: test
  Number of Stations: 1
  Number of Filters: 0
  --------------------
    Station ID: mt01
    Number of Runs: 2
    --------------------
      Run ID: sr1_0001
      Number of Channels: 7
      Recorded Channels: bx, by, bz, e1, e2, temperature_e, temperature_h
      Start: 2020-10-01T00:00:00+00:00
      End:   2020-10-01T00:01:59+00:00
      --------------------
      Run ID: sr1_0002
      Number of Channels: 7
      Recorded Channels: bx, by, bz, e1, e2, temperature_e, temperature_h
      Start: 2020-10-02T00:00:00+00:00
      End:   2020-10-02T00:01:59+00:00
      --------------------

In [11]:
experiment.surveys[0].stations[0].runs[0]

{
    "run": {
        "acquired_by.author": "",
        "channels_recorded_auxiliary": [
            "temperature_e",
            "temperature_h"
        ],
        "channels_recorded_electric": [
            "e1",
            "e2"
        ],
        "channels_recorded_magnetic": [
            "bx",
            "by",
            "bz"
        ],
        "data_logger.firmware.author": "",
        "data_logger.firmware.name": "",
        "data_logger.firmware.version": "",
        "data_logger.manufacturer": "LEMI",
        "data_logger.model": "LEMI424",
        "data_logger.power_source.voltage.end": 12.98,
        "data_logger.power_source.voltage.start": 12.99,
        "data_logger.timing_system.drift": 0.0,
        "data_logger.timing_system.type": "GPS",
        "data_logger.timing_system.uncertainty": 0.0,
        "data_type": "BBMT",
        "id": "sr1_0001",
        "metadata_by.author": "",
        "provenance.archive.name": "",
        "provenance.creation_time": "1980-01-01T0