# Converted Fast File Standardization

This step is to be completed after following the CardConvert of the protocol outlined in Protocol.md

The first thing that this notebook does is take the output files from CardConvert and process them into a more uniform format for EddyPro.

In EddyPro, the following things need to be consistent across time:
* Data column order
* Number of header lines
* Number of "text" columns to be ignored
* Seamless boundaries between files

However, EddyPro can handle the following, which will be addressed later:
* Changes in instrument location and orientation
* Changes in instrument type
* Changes in acquisiton frequency and file size

The program will also combine data from the same site. For example, at the BB-NF site, "Fast" data is collected at both a 3m tower and at a 17m tower. This data will have to be combined.

In [30]:
from pathlib import Path
import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## User inputs

Please input the following information in the next cell:
* path to the directory containing the "Converted" files of interest
* the time-span represented by each file, in minutes (usually 30)
* the acquisition frequency of the files, in Hz (usually 10)
* the start date of the timeseries, in yyyy-mm-dd HH:MM format. This should be the timestamp of the **first** line of the **first** file of interest, rounded **down** to the nearest half-hour.
* the end date of the timeseries, in yyyy-mm-dd HH:MM format. This should be the timestamp of the **first** line of the **last** file of interest, rounded **down** to the nearest half-hour.
* The name of the site (either BB-SF, BB-NF or BB-UF, or something else that the system recognizes). This will point the program towards the correct metadata file containing information on instruments and history for the site.

The program will use this information to correctly format the "Fast" files.

In [82]:
# User input: Change me!

converted_dir = "/Volumes/TempData/Bretfeld Mario/Chimney-Park-Reprocessing-Sandbox/Alex Work/Bad/Chimney/EC Processing/BB-NF/Converted"
file_length = 30  # minutes
acq_freq = 10  # Hz
start_time = "2021-03-06 15:30"
end_time = "2021-03-30 11:00"
site_name = "BB-NF"

In [83]:
# Processing step: don't edit me!

# find files
converted_path = Path(converted_dir)
fns_3m = converted_path.glob("*3m*.dat")
fns_17m = converted_path.glob("*17m*.dat")

In [87]:
for fn in fns_3m: print(str((fn))

SyntaxError: unexpected EOF while parsing (<ipython-input-87-854c0c1d2913>, line 1)

In [88]:
# compute file size
file_rows = file_length * acq_freq * 60
file_start_times = pd.date_range(file_start_time, end_time, freq=f"{file_length}min")
n_files = len(file_start_times)
print(n_files)

rawfile_metadata_3m = pd.DataFrame(data=np.zeros((len(list(fns_3m)), 9)), columns=['File_name', 'Encoding', 'Station_name', 'Datalogger_model', 'Datalogger_serial_number', 'Datalogger_OS_version', 'Datalogger_program_name', 'Datalogger_program_signature', 'Table_name'])

for fn in fns_3m: print(fn)
for i, t_start, fn in zip(range(n_files), file_start_times, fns_3m):
    print(i)
    # example:
    # TOA5_10365.CP_EC_BBNF3m_10Hz665_2021_03_25_0900.dat
    local_name = fn.name
    
    # get file id
    # [TOA5_10365.CP_EC_BBNF3m_10, 665_2021_03_25_0900.dat]
    file_id = local_name.split("Hz")[1]  # second half of file name gives
    
    # get timestamp
    # [665, 2021, 03, 25, 0900, dat] ---> 202103250900
    timestamp_str = "".join(re.split("_|\.", file_id)[1:-1])
    
    # lines are as follows:
    # file metadata
    # colnames
    # colunits
    # record type (avg, sample,  total, etc...)
    dat = pd.read_csv(fn, ',', header=[1,2], skiprows=[3])
    file_metadata = list(pd.read_csv(fn, ',',  header=[0], nrows=0))
    print(i)
    print(rawfile_metadata_3m.iloc[i])
    rawfile_metadata_3m.iloc[i] = [fn] + file_metadata
    print(rawfile_metadata_3m.iloc[i])
    
                            
    
    
    

1144


In [89]:
[fn] + file_metadata

[PosixPath('/Volumes/TempData/Bretfeld Mario/Chimney-Park-Reprocessing-Sandbox/Alex Work/Bad/Chimney/EC Processing/BB-NF/Converted/._TOA5_10365.CP_EC_BBNF3m_10Hz646_2021_03_06_1559.dat'),
 'TOA5',
 '10365',
 'CR3000',
 '10365.1',
 'CR3000.Std.32.03',
 'CPU:CPk_BBNF_EC3m_20190519.CR3',
 '43204',
 'TimeSeriesData']

In [85]:

rawfile_metadata_3m.iloc[0] = [fn] + file_metadata

In [71]:
rawfile_metadata_3m

Unnamed: 0,File_name,Encoding,Station_name,Datalogger_model,Datalogger_serial_number,Datalogger_OS_version,Datalogger_program_name,Datalogger_program_signature,Table_name
0,/Volumes/TempData/Bretfeld Mario/Chimney-Park-...,TOA5,10365,CR3000,10365.1,CR3000.Std.32.03,CPU:CPk_BBNF_EC3m_20190519.CR3,43204,TimeSeriesData
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...
1138,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1139,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1140,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1141,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [53]:
pd.read_csv?

[0;31mSignature:[0m
[0mpd[0m[0;34m.[0m[0mread_csv[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0mfilepath_or_buffer[0m[0;34m:[0m [0mUnion[0m[0;34m[[0m[0mForwardRef[0m[0;34m([0m[0;34m'PathLike[str]'[0m[0;34m)[0m[0;34m,[0m [0mstr[0m[0;34m,[0m [0mIO[0m[0;34m[[0m[0;34m~[0m[0mT[0m[0;34m][0m[0;34m,[0m [0mio[0m[0;34m.[0m[0mRawIOBase[0m[0;34m,[0m [0mio[0m[0;34m.[0m[0mBufferedIOBase[0m[0;34m,[0m [0mio[0m[0;34m.[0m[0mTextIOBase[0m[0;34m,[0m [0m_io[0m[0;34m.[0m[0mTextIOWrapper[0m[0;34m,[0m [0mmmap[0m[0;34m.[0m[0mmmap[0m[0;34m][0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0msep[0m[0;34m=[0m[0;34m<[0m[0mobject[0m [0mobject[0m [0mat[0m [0;36m0x7fedad5420d0[0m[0;34m>[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mdelimiter[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mheader[0m[0;34m=[0m[0;34m'infer'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mnames[0m[0;34m=[0m[0;32mNone[0m

In [27]:
for fn in fns_3m:
    print(fn.name)

TOA5_10365.CP_EC_BBNF3m_10Hz665_2021_03_25_0900.dat
TOA5_10365.CP_EC_BBNF3m_10Hz650_2021_03_10_2100.dat
TOA5_10365.CP_EC_BBNF3m_10Hz662_2021_03_22_0230.dat
TOA5_10365.CP_EC_BBNF3m_10Hz663_2021_03_23_1630.dat
TOA5_10365.CP_EC_BBNF3m_10Hz654_2021_03_14_2100.dat
TOA5_10365.CP_EC_BBNF3m_10Hz654_2021_03_14_1700.dat
TOA5_10365.CP_EC_BBNF3m_10Hz668_2021_03_28_0600.dat
TOA5_10365.CP_EC_BBNF3m_10Hz652_2021_03_12_2000.dat
TOA5_10365.CP_EC_BBNF3m_10Hz656_2021_03_16_1230.dat
TOA5_10365.CP_EC_BBNF3m_10Hz650_2021_03_10_2200.dat
TOA5_10365.CP_EC_BBNF3m_10Hz651_2021_03_11_1130.dat
TOA5_10365.CP_EC_BBNF3m_10Hz665_2021_03_25_0300.dat
TOA5_10365.CP_EC_BBNF3m_10Hz649_2021_03_09_1730.dat
TOA5_10365.CP_EC_BBNF3m_10Hz660_2021_03_20_0530.dat
TOA5_10365.CP_EC_BBNF3m_10Hz658_2021_03_18_2200.dat
TOA5_10365.CP_EC_BBNF3m_10Hz668_2021_03_28_2330.dat
TOA5_10365.CP_EC_BBNF3m_10Hz655_2021_03_15_0230.dat
TOA5_10365.CP_EC_BBNF3m_10Hz662_2021_03_22_0200.dat
TOA5_10365.CP_EC_BBNF3m_10Hz647_2021_03_07_1330.dat
TOA5_10365.C