# Example Usage

Given a .ghg file, this script parses all relevant metadata and can extract the raw data for further processing.

## Inputs

filepath:

* path to the .ghg file

mode:

* 1 - parse metadata only, returned as a nested dictionary
* 2 - parse metadata **and** load the raw data into time-stamped pandas DataFrame(s)
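In mode 1 the metadata come back as a nested dictionary. As a rough illustration of that structure, and assuming the metadata text inside a .ghg file uses an INI-style `[Section]` / `key=value` layout, the standard-library `configparser` can turn such text into a `{section: {key: value}}` dict. The helper `metadata_to_dict` and the sample values are hypothetical, and this is a sketch rather than how `parseGHG` is necessarily implemented.

```python
import configparser


def metadata_to_dict(text: str) -> dict:
    """Hypothetical helper: parse INI-style metadata text into {section: {key: value}}."""
    # interpolation=None so literal '%' characters in values are not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key case instead of lower-casing
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Made-up metadata text, for illustration only
example = """\
[Site]
site_name=example_site
altitude=100
[Timing]
acquisition_frequency=20
"""

print(metadata_to_dict(example))
```

Note that every value stays a string; any numeric conversion would be a separate step.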


depth:

* base - default; parse only the files in the root of the .ghg file. This is sufficient for most use cases.
* full - parse all files, including those in nested subdirectories. This gives access to configuration files and EddyPro files where present.
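The difference between `base` and `full` comes down to which files inside the .ghg file are read. A minimal sketch, assuming the .ghg file is a ZIP archive, uses the standard-library `zipfile` module to separate root-level files from those in nested subdirectories. The helper `list_ghg_members` and the member names are hypothetical.

```python
import io
import zipfile


def list_ghg_members(path_or_file, depth="base"):
    """Hypothetical helper: list files in the archive; 'base' keeps only root-level files."""
    with zipfile.ZipFile(path_or_file) as zf:
        # Skip directory entries, which end with '/'
        names = [n for n in zf.namelist() if not n.endswith("/")]
    if depth == "base":
        return [n for n in names if "/" not in n]
    if depth == "full":
        return names
    raise ValueError(f"unknown depth: {depth!r}")


# Build a small in-memory archive as a stand-in for a real .ghg file (made-up names)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.data", "DATAH\tSeconds\n")
    zf.writestr("example.metadata", "[Site]\n")
    zf.writestr("subdir/example.conf", "")

buf.seek(0)
print(list_ghg_members(buf, "base"))  # ['example.data', 'example.metadata']
buf.seek(0)
print(list_ghg_members(buf, "full"))  # ['example.data', 'example.metadata', 'subdir/example.conf']
```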

## Example 1: Parse all metadata

```python
import importlib
import os
import time

import yaml

import parseGHG

# Reload the module so edits to parseGHG.py are picked up without restarting the kernel
importlib.reload(parseGHG)

T1 = time.time()
# Path to a .ghg file
filepath = r"example_Data\2022-09-04T080000_smart3-00495.ghg"
print('Parsing: ', filepath)
# Create an instance of the parser class
pGHG = parseGHG.parseGHG()
# mode=1 extracts metadata only; depth='full' also includes files in nested subdirectories
pGHG.parse(filepath, mode=1, depth='full')

print('Time to complete: ', time.time() - T1)
print()

# Write the Metadata dict to a human-readable YAML file next to the .ghg file
outName = os.path.splitext(filepath)[0] + '.yml'
with open(outName, 'w') as outFile:
    print('Saving metadata as: ', outName)
    yaml.dump(pGHG.Metadata, outFile, sort_keys=False)
```

```
Parsing:  example_Data\2022-09-04T080000_smart3-00495.ghg
Time to complete:  0.05124402046203613

Saving metadata as:  example_Data\2022-09-04T080000_smart3-00495.yml
```
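The saved .yml file can be read back into a nested dict later without re-parsing the .ghg file. A minimal sketch with PyYAML's `yaml.safe_load`, using a small made-up dict as a stand-in for `pGHG.Metadata`:

```python
import yaml

# Made-up stand-in for pGHG.Metadata
metadata = {
    "Site": {"site_name": "example_site", "altitude": "100"},
    "Timing": {"acquisition_frequency": "20"},
}

# Same dump call as above, but to a string instead of a file
text = yaml.dump(metadata, sort_keys=False)

# safe_load rebuilds plain Python types (dicts, lists, strings, numbers)
restored = yaml.safe_load(text)
print(restored == metadata)  # True
```

To reload a saved file instead, pass an open file handle: `with open(outName) as f: md = yaml.safe_load(f)`.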


## Example 2: Read the essential metadata and raw data files

* Raw high-frequency data and biomet data (where present)
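Each high-frequency record carries `Seconds` and `Nanoseconds` columns (visible in the output below). Assuming these count time since the Unix epoch in UTC, a record's time can be reconstructed with the standard library. The helper `record_time_utc` is hypothetical; `datetime` only resolves microseconds, so any sub-microsecond part is truncated.

```python
from datetime import datetime, timedelta, timezone


def record_time_utc(seconds: int, nanoseconds: int) -> datetime:
    """Hypothetical helper: combine epoch seconds and nanoseconds into a UTC datetime."""
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime has microsecond resolution, so drop the last three nanosecond digits
    return base + timedelta(microseconds=nanoseconds // 1000)


t = record_time_utc(1722542400, 50000000)
print(t.isoformat())  # 2024-08-01T20:00:00.050000+00:00
```

For the second row of the output below this gives 20:00:00.050 UTC, while the DataFrame index reads 12:00:00.050, which suggests the index follows a local clock 8 hours behind UTC. It is worth confirming the time reference before merging with UTC-based data.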

```python
import importlib
import os
import time

import yaml

import parseGHG

# Reload the module so edits to parseGHG.py are picked up without restarting the kernel
importlib.reload(parseGHG)

T1 = time.time()
# Path to a .ghg file
filepath = r"example_Data\2024-08-01T120000_AIU-2264.ghg"
print('Parsing: ', filepath)
# Create an instance of the parser class
pGHG = parseGHG.parseGHG()
# mode=2 extracts the metadata and loads the raw data; depth='base' only parses files in the root of the .ghg file
pGHG.parse(filepath, mode=2, depth='base')
print('Time to complete: ', time.time() - T1)

# Write the Metadata dict to a human-readable YAML file
outName = os.path.splitext(filepath)[0] + '.yml'
with open(outName, 'w') as outFile:
    print('Saving metadata as: ', outName)
    yaml.dump(pGHG.Metadata, outFile, sort_keys=False)

# The raw data are stored in a dict called "Data", which holds up to 3 DataFrames:
# 1) pGHG.Data['data'] > raw high-frequency data
# 2) pGHG.Data['biometdata'] > raw biomet data (if present)
# 3) pGHG.Data['li7700status'] > high-frequency status data from the LI-7700 (if present)
for name, data in pGHG.Data.items():
    print(name, ' file:')
    print()
    print(data.head())
```

Parsing:  example_Data\2024-08-01T120000_AIU-2264.ghg
Time to complete:  0.4376838207244873
Saving metadata as:  example_Data\2024-08-01T120000_AIU-2264.yml
data  file:

                        DATAH     Seconds  Nanoseconds  Sequence Number  \
2024-08-01 12:00:00.000  DATA  1722542400            0         23103297   
2024-08-01 12:00:00.050  DATA  1722542400     50000000         23103312   
2024-08-01 12:00:00.100  DATA  1722542400    100000000         23103327   
2024-08-01 12:00:00.150  DATA  1722542400    150000000         23103342   
2024-08-01 12:00:00.200  DATA  1722542400    200000000         23103357   

                         Diagnostic Value  Diagnostic Value 2        Date  \
2024-08-01 12:00:00.000              8191                   1  2024-08-01   
2024-08-01 12:00:00.050              8191                   1  2024-08-01   
2024-08-01 12:00:00.100              8191                   1  2024-08-01   
2024-08-01 12:00:00.150              8191                   1  2024-08-