# Example Usage

Given a .ghg file, this script parses all relevant metadata and extracts the raw data for further processing.

## Inputs

filepath: 

* path to the .ghg file

mode: 

* 1 - only parse metadata, returned as a nested dictionary
* 2 - parse metadata **and** dump raw data to time-stamped pandas data frame(s)


depth:

* base - default behavior; only parses files in the root of the GHG file, which is sufficient for most use cases
* full - parses all files, including those in nested subdirectories, which gives access to config files and EddyPro files where present
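
The difference between the two `depth` settings can be pictured with the standard-library `zipfile` module, assuming the .ghg file is a ZIP archive (the usual packaging for LI-COR GHG files). This is only a sketch of the idea and not how `parseGHG` is implemented; the helper `split_members` and the member names in the stand-in archive are hypothetical.

```python
import io
import zipfile

def split_members(archive_bytes):
    """Return (root_files, nested_files) for a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        # skip explicit directory entries, keep only files
        names = [n for n in zf.namelist() if not n.endswith('/')]
    root = [n for n in names if '/' not in n]
    nested = [n for n in names if '/' in n]
    return root, nested

# Build a small stand-in archive; these member names are hypothetical
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('example.data', 'placeholder')
    zf.writestr('example.metadata', 'placeholder')
    zf.writestr('system_config/co2app.conf', 'placeholder')

root_files, nested_files = split_members(buf.getvalue())
print('base would see:', root_files)
print('full would see:', root_files + nested_files)
```

Under this picture, `base` corresponds to the files without a directory component, and `full` additionally includes everything below a subdirectory.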

## Example 1: Parse all metadata




import importlib
import parseGHG
import yaml
import time

importlib.reload(parseGHG)

T1 = time.time()
# path to a .ghg file
filepath = r"example_Data\2022-09-04T080000_smart3-00495.ghg"
print('Parsing: ',filepath)
# declare the class instance
pGHG = parseGHG.parseGHG()
# call the parse function (mode = 1, depth='full') to just extract metadata for **all** files
pGHG.parse(filepath,mode=1,depth='full')

print('Time to complete: ',time.time()-T1)
print()

# write the Metadata dict to a human-readable YAML file
outName = filepath.rsplit('.',1)[0]+'.yml'
with open(outName,'w') as outFile:
    print('Saving metadata as: ',outName)
    yaml.dump(pGHG.Metadata,outFile,sort_keys=False)

Parsing:  example_Data\2022-09-04T080000_smart3-00495.ghg
Time to complete:  0.10768914222717285

Saving metadata as:  example_Data\2022-09-04T080000_smart3-00495.yml
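
Because the metadata come back as a nested dictionary, it can help to flatten them into dotted keys when searching for a particular setting. The sketch below uses only plain Python; the `flatten` helper and the `sample` dictionary are hypothetical stand-ins and do not reflect the actual keys in `pGHG.Metadata`.

```python
def flatten(nested, prefix=''):
    """Flatten a nested dict into {'outer.inner': value} pairs."""
    flat = {}
    for key, value in nested.items():
        path = f'{prefix}.{key}' if prefix else str(key)
        if isinstance(value, dict):
            # recurse into sub-dictionaries, carrying the dotted path along
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# Hypothetical stand-in for a nested metadata dict
sample = {
    'Site': {'name': 'demo', 'altitude': 10},
    'Station': {'logger': {'id': 'A1'}},
}

for dotted_key, value in flatten(sample).items():
    print(dotted_key, '=', value)
```

The same call could be applied to `pGHG.Metadata` after parsing, provided it contains only dictionaries and scalar values.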


## Example 2: Read the essential metadata + raw data files 

* Raw high frequency data and biomet data (where present)


importlib.reload(parseGHG)

T1 = time.time()
filepath = r"example_Data\2024-08-01T120000_AIU-2264.ghg"
print('Parsing: ',filepath)
# declare the class instance
pGHG = parseGHG.parseGHG()
# call the parse function (mode = 2, depth='base') to extract metadata and raw data from just the base files
pGHG.parse(filepath,mode=2,depth='base')
print('Time to complete: ',time.time()-T1)

# write the Metadata dict to a human-readable YAML file
outName = filepath.rsplit('.',1)[0]+'.yml'
with open(outName,'w') as outFile:
    print('Saving metadata as: ',outName)
    yaml.dump(pGHG.Metadata,outFile,sort_keys=False)

# The raw data are saved to a dict called "Data"
# This will include up to 3 dataframes
# 1) pGHG.Data['data'] > raw high-frequency data
# 2) pGHG.Data['biometdata'] > raw biomet data (if present)
# 3) pGHG.Data['li7700status'] > high-frequency status data from LI7700 (if present)
print(pGHG.Data.keys())

pGHG.Data['data'].head()
    

Parsing:  example_Data\2024-08-01T120000_AIU-2264.ghg
Time to complete:  0.7627401351928711
Saving metadata as:  example_Data\2024-08-01T120000_AIU-2264.yml
dict_keys(['data', 'biometdata', 'li7700status'])


| | DATAH | Seconds | Nanoseconds | Sequence Number | Diagnostic Value | Diagnostic Value 2 | Date | Time | CO2 Absorptance | H2O Absorptance | ... | CH4 (mmol/m^3) | CH4 Temperature | CH4 Pressure | CH4 Signal Strength | CH4 Thermocouple Input 1 | CH4 Thermocouple Input 2 | CH4 Thermocouple Input 3 | CH4 Diagnostic Value | CH4 Drop Rate (%) | CHK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-08-01 12:00:00.000 | DATA | 1722542400 | 0 | 23103297 | 8191 | 1 | 2024-08-01 | 12:00:00:000 | 0.126657 | 0.086289 | ... | 0.082148 | 28.0731 | 101.833 | 64.1732 | 9999.99 | 9999.99 | 9999.99 | 143 | 0 | 72 |
| 2024-08-01 12:00:00.050 | DATA | 1722542400 | 50000000 | 23103312 | 8191 | 1 | 2024-08-01 | 12:00:00:050 | 0.12656 | 0.086205 | ... | 0.081934 | 28.0629 | 101.83 | 64.0824 | 9999.99 | 9999.99 | 9999.99 | 143 | 0 | 222 |
| 2024-08-01 12:00:00.100 | DATA | 1722542400 | 100000000 | 23103327 | 8191 | 1 | 2024-08-01 | 12:00:00:100 | 0.12652 | 0.086165 | ... | 0.08199 | 28.0651 | 101.827 | 65.7794 | 9999.99 | 9999.99 | 9999.99 | 143 | 0 | 61 |
| 2024-08-01 12:00:00.150 | DATA | 1722542400 | 150000000 | 23103342 | 8191 | 1 | 2024-08-01 | 12:00:00:150 | 0.126529 | 0.086097 | ... | 0.082136 | 28.0664 | 101.828 | 65.0839 | 9999.99 | 9999.99 | 9999.99 | 143 | 0 | 6 |
| 2024-08-01 12:00:00.200 | DATA | 1722542400 | 200000000 | 23103357 | 8191 | 1 | 2024-08-01 | 12:00:00:200 | 0.126527 | 0.086117 | ... | 0.081938 | 28.0688 | 101.83 | 65.1074 | 9999.99 | 9999.99 | 9999.99 | 143 | 0 | 47 |
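
The `Seconds` and `Nanoseconds` columns in the output above can be combined into a single timestamp. The sketch below assumes, based only on the sample values, that `Seconds` is a Unix epoch time and `Nanoseconds` is the sub-second remainder; the helper `to_utc` is hypothetical and not part of `parseGHG`.

```python
from datetime import datetime, timedelta, timezone

def to_utc(seconds, nanoseconds):
    """Combine epoch seconds and a nanosecond remainder into a UTC datetime."""
    # datetime resolves to microseconds, so sub-microsecond digits are dropped
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanoseconds // 1000)

# (Seconds, Nanoseconds) pairs copied from the first three rows shown above
rows = [(1722542400, 0), (1722542400, 50000000), (1722542400, 100000000)]
stamps = [to_utc(s, ns) for s, ns in rows]

# spacing between consecutive records gives the sampling frequency
step = (stamps[1] - stamps[0]).total_seconds()
print(stamps[0].isoformat())
print('sampling frequency (Hz):', round(1 / step))
```

Under that assumption the first record decodes to 20:00 UTC, while the data frame index reads 12:00. That would be consistent with the index being in the logger's local time at UTC-8, but this is an inference from the sample rather than documented behavior. The 50 ms spacing between rows corresponds to 20 Hz sampling.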
