# Example of Reading an EFDC Output Binary File

This notebook provides an example of reading an EFDC output binary file, "DYECON.bin". It uses the Python module "binary_reader.py" from LimnoTech's proprietary Python package "LTPy". When the "EFDCBinaryReader" object is initialized, it requires a CSV file named "bin_config.csv". If no directory for the config file is provided, the file must be in a directory named "data". For this example, "bin_config.csv" has been placed in the "data" directory.

In [1]:
from datetime import datetime
from binary_reader import EFDCBinaryReader
from dask.dataframe import from_pandas

## EFDC Model File Names
The name of the EFDC main input deck needs to be provided. In almost all cases this will be "efdc.inp". The name of the EFDC output binary file to be read also needs to be provided. In this example, the dye concentration file "DYECON.bin" is read.

In [2]:
# The leading backslash lets each name be appended directly to efdc_root.
EFDC_INP_FILE = '\\efdc.inp'
EFDC_DYE_FILE = '\\DYECON.bin'
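
The file names above carry a leading separator so that they can be appended to a root directory string. As a sketch of an alternative, `pathlib` can join the pieces without hand-written separators. The root directory below is hypothetical, and whether `EFDCBinaryReader` accepts path objects is not confirmed here, so the results are converted to plain strings.

```python
from pathlib import PureWindowsPath

# Hypothetical simulation directory, used only for illustration.
example_root = PureWindowsPath(r"C:\models\efdc_run")

# The "/" operator inserts the separator, so the file names
# need no leading backslash.
path_efdcinp = str(example_root / "efdc.inp")
path_dye = str(example_root / "DYECON.bin")

print(path_efdcinp)
print(path_dye)
```

`PureWindowsPath` is used so the sketch behaves the same on any operating system; on a Windows machine, `pathlib.Path` would be the usual choice.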

## Additional EFDC Simulation Specific Items
The EFDC model files do not store the actual date and time at which the simulation begins; the model zero date/time (date_begin) is user-defined, separate from any EFDC input files. As a side note, LimnoTech's proprietary post-processing software "WinModel" does have an associated database where the zero date/time is stored, but this "*_EFDC.mdb" database may not be present in a given EFDC simulation directory. In addition to date_begin, the EFDC simulation directory (efdc_root) also needs to be provided.

The DYECON.bin file for this example (along with several other output files from this simulation) is over 100 MB, which exceeds the GitHub upload limit. These EFDC binary output files are excluded from GitHub, but they can be found on the LimnoTech network at: I:\2ERDC12\Task5_ClearWater\EFDC_Example\20110125_2007Baseline_1992

In [3]:
date_begin = datetime(1991, 12, 20)
efdc_root = r'C:\Users\jrutyna\githubDesktop\ClearWater-riverine\examples\Read_EFDC_Example\20110125_2007Baseline_1992'
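
To illustrate why a zero date/time is needed, the sketch below converts elapsed model times into calendar timestamps. It assumes the elapsed times are expressed in days since the zero date/time; the values are hypothetical, and this is not necessarily how `EFDCBinaryReader` performs the conversion internally.

```python
from datetime import datetime

import pandas as pd

date_begin = datetime(1991, 12, 20)

# Hypothetical elapsed model times, assumed to be in days since date_begin.
elapsed_days = pd.Series([0.0, 0.5, 1.0, 377.0])

# Add the elapsed time to the zero date/time. Rounding to whole seconds
# guards against floating-point drift in fractional days.
offsets = pd.to_timedelta(elapsed_days, unit="D")
timestamps = (pd.Timestamp(date_begin) + offsets).dt.round("s")

print(timestamps)
```

With a different `date_begin`, the same elapsed times would map to different calendar dates, which is why the value has to be supplied by the user.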

## Use EFDC Binary Reader
A pandas dataframe is created when the EFDC binary reader processes the binary file.

In [4]:
reader = EFDCBinaryReader(date_begin=date_begin, path_efdcinp=efdc_root + EFDC_INP_FILE)
df_dye = reader.process_bin_file(efdc_root + EFDC_DYE_FILE)

Processing file {0} DYECON.bin
Field name: DYE_mgL
Assumed binary file units: mg/L
Output units: mg/L
Applying conversion factor: 1 mg/L = 1 mg/L


## Check if the dataframe was created
The reader provides the model results by EFDC "grid_no" and then by output timestep, with each timestep converted to an actual date/time based on "date_begin". These results can be mapped by "grid_no" using the "EFDC.shp" shapefile provided in the "shapefiles" directory.
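
As a sketch of how this long layout (one row per "grid_no" and timestep) can be used, the example below pulls the time series for one cell and joins a per-cell summary to an attribute table by "grid_no". The data is a tiny synthetic stand-in that only reuses the column names of the reader output; the `cells` table and its `cell_name` column are hypothetical placeholders for the shapefile attributes.

```python
import pandas as pd

# Tiny synthetic stand-in for the reader output (same column names).
df = pd.DataFrame({
    "grid_no": [1, 1, 2, 2],
    "datetime": pd.to_datetime(["1991-12-20 00:00", "1991-12-20 01:00"] * 2),
    "DYE_mgL": [0.0, 0.2, 0.1, 0.4],
})

# Time series for a single cell, indexed by timestamp.
cell_2 = df.loc[df["grid_no"] == 2].set_index("datetime")["DYE_mgL"]

# Hypothetical attribute table standing in for the shapefile attributes.
cells = pd.DataFrame({"grid_no": [1, 2], "cell_name": ["A", "B"]})

# Peak concentration per cell, joined to the attributes by grid_no.
peak = df.groupby("grid_no", as_index=False)["DYE_mgL"].max()
mapped = cells.merge(peak, on="grid_no", how="left")

print(cell_2)
print(mapped)
```

The same `merge` on "grid_no" would apply to a table loaded from the shapefile, provided it carries a matching "grid_no" column.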

In [5]:
df_dye

          grid_no            datetime  DYE_mgL
0               1 1991-12-20 00:00:00      0.0
1               1 1991-12-20 01:00:00      0.0
2               1 1991-12-20 02:00:00      0.0
3               1 1991-12-20 03:00:00      0.0
4               1 1991-12-20 04:00:00      0.0
...           ...                 ...      ...
28023403     3089 1992-12-31 19:00:00      0.0
28023404     3089 1992-12-31 20:00:00      0.0
28023405     3089 1992-12-31 21:00:00      0.0
28023406     3089 1992-12-31 22:00:00      0.0
28023407     3089 1992-12-31 23:00:00      0.0

[28023408 rows x 3 columns]
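
The row count in the output above can be cross-checked with a little arithmetic. The sketch below assumes that every cell from 1 to 3089 is present and that output is hourly with no gaps between the first and last timestamps shown; under those assumptions the expected count is cells times timesteps.

```python
import pandas as pd

# Values taken from the printed output above.
n_cells = 3089
first, last = "1991-12-20 00:00", "1992-12-31 23:00"

# Hourly output timesteps between the first and last timestamps, inclusive.
timesteps = pd.date_range(first, last, freq=pd.Timedelta(hours=1))

expected_rows = n_cells * len(timesteps)
print(len(timesteps), expected_rows)  # 9072 28023408
```

The result, 3089 cells times 9072 hourly timesteps, equals the 28023408 rows reported for the dataframe.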

## Export pandas dataframe as parquet data files
Since the DYECON.bin file exceeds the GitHub upload limit, the pandas dataframe is exported as several parquet data files. There are over 3,000 EFDC model cells in this example, and the parquet partitions are created by rounding the EFDC grid number to the nearest thousand. The pandas dataframe is first converted to a dask dataframe, since the dask "to_parquet" method has a "partition_on" parameter that is easy to use.
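
To show what the rounding does to the grid numbers, the sketch below applies it to a few sample values and compares it with flooring to the thousand below, which is offered only as an alternative and is not what this notebook uses. Values exactly halfway between two thousands are left out on purpose, since their result depends on the tie-breaking rule.

```python
import pandas as pd

# A few sample grid numbers spanning the range in this example.
grid_no = pd.Series([1, 499, 501, 1499, 1501, 3089])

# Rounding to the nearest thousand, as described above.
by_round = grid_no.round(decimals=-3)

# Alternative: floor to the thousand below, so every group
# covers a fixed range such as 1000-1999.
by_floor = (grid_no // 1000) * 1000

print(pd.DataFrame({"grid_no": grid_no, "round": by_round, "floor": by_floor}))
```

With rounding, the groups for this example are 0, 1000, 2000 and 3000, and the first and last groups hold fewer cells than the middle ones.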

In [6]:
df_dye['CellGroup'] = df_dye.grid_no.round(decimals=-3)
dd_dye = from_pandas(df_dye, npartitions=1)
dd_dye.to_parquet(
   path='C:/Users/jrutyna/githubDesktop/ClearWater-riverine/examples/Read_EFDC_Example/daskExport',
   engine='pyarrow',
   compression='gzip',
   partition_on=['CellGroup']
)
dd_dye

Dask DataFrame Structure:
              grid_no        datetime  DYE_mgL CellGroup
npartitions=1                                           
0               int32  datetime64[ns]  float32     int32
28023407          ...             ...      ...       ...
Dask Name: from_pandas, 1 tasks