# Overview
The **objective** of this notebook is to present the workflow for conducting data assimilation on [PFLOTRAN](https://www.pflotran.org/) using [DART](https://www.image.ucar.edu/DAReS/DART/). Briefly, the procedure is as follows:
- [x] [Configuration](#parameter): define directories, file locations, and other parameters
- [x] [PFLOTRAN preparation](#pflotran_prepare): generate PFLOTRAN input files
- [x] [PFLOTRAN model spin-up](#pflotran_spinup): conduct model spin-up
- [x] [DART files preparation](#dart_prepare): add new DART quantities, prepare DART input namelists, prepare DART prior data, prepare observations in DART format, and check ```model_mod``` interface
- [x] [Generate all the executable files](#dart_executables): generate all the executables, convert observations to DART format, check ```model_mod``` interface, and test the filter
- [ ] [Run DART and PFLOTRAN](#run_dart_pflotran): run the shell script for integrating DART filter and PFLOTRAN model

Here, we perform inverse modeling on a 1D thermal model for illustration. The model assimilates temperature observations to update its parameters (i.e., flow flux, porosity, and thermal conductivity). For now, the ensemble Kalman filter (EnKF) is used for assimilation.

<a id='parameter'></a>
# Configuration

In [1]:
import os
import re
import shutil
import pickle
from math import floor
from datetime import datetime, timedelta

In [2]:
# MPI settings
mpi_exe  = '/usr/local/bin/mpirun'  # The location of mpirun
ncore_da = 1                        # The number of MPI cores used for DART
ncore_pf = 1                        # The number of MPI cores used for PFLOTRAN

# PFLOTRAN executable
pflotran_exe  = '/Users/jian449/Codes/pflotran/src/pflotran/pflotran'

# Main directory names
temp_app_dir = os.path.abspath("../template" )          # The template for application folder
app_dir      = os.path.abspath("../1dthermal")          # The application folder name
dart_dir     = os.path.abspath("../../../../")
dart_pf_dir  = os.path.join(dart_dir, "models/pflotran")     # The DART PFLOTRAN utility folder name

directories = {"APP_DIR": app_dir, "DART_DIR": dart_dir}

****************
**Prepare the application directory**


In [3]:
# Create the application directory if it does not exist
if not os.path.isdir(app_dir):
    shutil.copytree(temp_app_dir, app_dir)

****************
**Define subdirectories under the application and DART directories**

In [4]:
# Application directories
obs_type_dir    = os.path.join(app_dir, 'obs_type')        # Directory for defining default DART generic quantity  
app_work_dir    = os.path.join(app_dir, 'work')            # Directory for the compiled shell scripts
pflotran_in_dir = os.path.join(app_dir, 'pflotran_input')  # Directory for saving PFLOTRAN input files
pflotran_out_dir= os.path.join(app_dir, 'pflotran_output') # Directory for saving PFLOTRAN output files
dart_data_dir   = os.path.join(app_dir, 'dart_inout')      # Directory for saving DART in-out files

# DART directories
obs_kind_dir    = os.path.join(dart_pf_dir, 'obs_kind')       # Directory for mapping observation variable with DART generic quantity
utils_dir       = os.path.join(dart_pf_dir, 'utils')          # Directory for utility files
dart_work_dir   = os.path.join(dart_pf_dir, 'work')           # Directory for compiling and running DART files


****************
**Define the required file names**

In [5]:
# DART file names
def_obs_kind   = os.path.join(obs_kind_dir, 'DEFAULT_obs_kind_mod.f90')   # The default DART generic quantity file
obs_type       = os.path.join(obs_type_dir, 'obs_def_pflotran_mod.f90')   # The map between DART generic quantities and observation variables
input_nml      = os.path.join(app_work_dir, 'input.nml')                  # The input namelists used by DART programs
input_nml_template= os.path.join(dart_work_dir, 'input.nml.template')                  # The template of the input namelists used by DART programs
config_pickle  = os.path.join(app_work_dir, 'config.p')                 # The pickle file for saving input namelists in dictionary format

# PFLOTRAN file names
pflotran_sh   = os.path.join(utils_dir, 'pflotran.sh')                 # Script for running PFLOTRAN
pflotran_in   = os.path.join(pflotran_in_dir, 'pflotran.in')           # PFLOTRAN input deck file
pflotran_para = os.path.join(pflotran_in_dir, 'parameter_prior.h5')    # PFLOTRAN parameter HDF file
pflotran_out  = os.path.join(pflotran_out_dir, 'pflotranR[ENS].h5')    # PFLOTRAN output HDF filename template for each ensemble [ENS]

# Data file names, including observations and model input/output
obs_original = os.path.join(pflotran_in_dir, 'temperature.csv')          # The original observation file in CSV format
obs_nc       = os.path.join(pflotran_in_dir, 'obs_pflotran.nc')          # The converted observation file in NetCDF format
obs_dart     = os.path.join(dart_data_dir, 'obs_seq_pflotran.out')       # The converted observation file in DART format
dart_prior_nc= os.path.join(dart_data_dir, 'prior_R[ENS].nc')            # The NetCDF filename template for DART's prior data of each ensemble [ENS]
dart_posterior_nc= os.path.join(dart_data_dir, 'posterior_R[ENS].nc')            # The NetCDF filename template for DART's posterior data of each ensemble [ENS]
dart_prior_template = re.sub(r"R\[ENS\]",'template',dart_prior_nc)  # The template prior NetCDF filename used by DART (prior_template.nc)
dart_input_list     = os.path.join(dart_data_dir, "filter_input_list.txt")   # The list of DART prior files for all ensembles
dart_output_list    = os.path.join(dart_data_dir, "filter_output_list.txt") # The list of DART posterior files for all ensembles
convert_nc_file     = os.path.join(utils_dir, 'convert_nc.f90')              # Fortran script for converting NetCDF observation file to DART format
convert_nc_template = os.path.join(utils_dir, 'convert_nc_template.f90')    # Template Fortran script for converting NetCDF observation file to DART format

# Some shell scripts or executable files
convert_nc              = os.path.join(app_work_dir , 'convert_nc')           # The executable file to convert observation from NetCDF to DART formats
model_mod_check         = os.path.join(app_work_dir , 'model_mod_check')      # The executable file to check model_mod.f90
filter_exe              = os.path.join(app_work_dir , 'filter')               # The executable file for filtering
quickbuild              = os.path.join(dart_work_dir , 'quickbuild.csh')          # Shell script for generating several executable files
# compile_convert_nc      = os.path.join(dart_work_dir, 'dart_seq_convert.csh') # Shell script for generating the executable file to convert observation from NetCDF to DART formats
# compile_model_check_mod = os.path.join(dart_work_dir, 'check_model_mod.csh')  # Shell script for checking model_mod.F90 file
run_filter              = os.path.join(dart_work_dir, 'run_filter.csh')       # Shell script for running DART filter and PFLOTRAN
advance_model           = os.path.join(dart_work_dir, 'advance_model.csh')    # Shell script for forward simulation of PFLOTRAN

# Utility file names
csv_to_nc          = os.path.join(utils_dir, 'csv2nc.py')                  # Python script for converting raw observation to NetCDF file
to_dartqty         = os.path.join(utils_dir, 'list2dartqty.py')            # Python script for reading a list of variable names and adding new DART quantities      
prep_pflotran_in   = os.path.join(utils_dir, 'prepare_pflotran_input.py') # Python script for preparing PFLOTRAN.in and parameter.h5 files
prep_convert_nc    = os.path.join(utils_dir, 'prepare_convert_nc.py')      # Python script for preparing convert_nc.f90 script
prep_prior_nc      = os.path.join(utils_dir, 'prepare_prior_nc.py')        # Python script for preparing the prior NetCDF files for DART
prep_inputnml      = os.path.join(utils_dir, 'prepare_input_nml.py')       # Python script for preparing the input.nml file

****************
**Specify the observation data to be assimilated and the PFLOTRAN parameters to be analyzed in DART**

In [6]:
# Observation data to be assimilated
obs_var_set = ['TEMPERATURE']

# The PFLOTRAN parameters to be analyzed
para_set    = ['FLOW_FLUX','POROSITY','THERMAL_CONDUCTIVITY']

pflotran_parastate_set = obs_var_set + para_set

**Specify the temporal information**
- model spin-up time
- the mapping between the start of observation assimilation and the model start time (i.e., the end of spin-up)

**Note that** the model start time refers to the time right after the spin-up.

In [7]:
# Model spinup length
spinup_length   = 0.5    # spinup time (day)

# Map between assimilation time and model start time/spinup length
assim_duration  = 600./86400  # day
assim_start     = datetime(2017,4,1,0,0,0)
assim_start_str = assim_start.strftime("%Y-%m-%d %H:%M:%S")

assim_start_day    = int(floor(spinup_length))
assim_start_second = int((spinup_length - assim_start_day)*86400)
assim_end_day      = int(floor(spinup_length+assim_duration))
assim_end_second   = int((spinup_length+assim_duration - assim_end_day)*86400)
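
The day/second split above truncates with `int`, which may lose one second whenever the product of a fractional day and 86400 lands just below a whole number in floating point. The following is a minimal sketch of a rounding-based alternative, not part of the DART-PFLOTRAN utilities; `split_days` is a hypothetical helper name introduced only for illustration.

```python
from math import floor

SECONDS_PER_DAY = 86400


def split_days(days):
    """Split a time in fractional days into (whole days, whole seconds)."""
    whole_days = int(floor(days))
    # round() instead of int() guards against values such as 43799.999999
    seconds = int(round((days - whole_days) * SECONDS_PER_DAY))
    # Carry over if rounding produced a full day of seconds
    if seconds == SECONDS_PER_DAY:
        whole_days += 1
        seconds = 0
    return whole_days, seconds


spinup_length = 0.5             # day, same value as in the cell above
assim_duration = 600. / 86400   # day, same value as in the cell above

print(split_days(spinup_length))
print(split_days(spinup_length + assim_duration))
```

With the settings used in this notebook, the two calls give the start and end of the assimilation period as (day, second) pairs.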


****************
**Define the data assimilation configurations**

In [8]:
# Data assimilation configurations
# More settings need to be added...
# Later on, these DA settings can be saved in a text or pickle file for later loading
obs_resolution= 300.0  # second
obs_error     = 0.1    # observation error
nens          = 30     # number of ensemble members

# Assimilation time window time_step_days+time_step_seconds
obs_window_days    = 0   # assimilation time window/step (day)
obs_window_seconds = 1200 # assimilation time window/step  (second)
first_obs_days    = floor(spinup_length)
first_obs_seconds = (spinup_length-first_obs_days)*86400
last_obs_days     = first_obs_days+obs_window_days
last_obs_seconds  = first_obs_seconds+obs_window_seconds*2

In [9]:
# Save the above parameters in pickle???
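
One possible way to fill in this placeholder is sketched below, assuming the settings are collected in a plain dictionary and written with `pickle`. The dictionary keys and the file name `da_settings.p` are hypothetical, and a temporary directory stands in for `app_work_dir` so the sketch runs anywhere.

```python
import os
import pickle
import tempfile

# Values copied from the configuration cells above
da_settings = {
    "obs_resolution": 300.0,     # second
    "obs_error": 0.1,
    "nens": 30,
    "obs_window_days": 0,
    "obs_window_seconds": 1200,
    "spinup_length": 0.5,        # day
}

# Hypothetical file name; a temporary directory replaces app_work_dir here
settings_file = os.path.join(tempfile.mkdtemp(), "da_settings.p")

# Save the settings
with open(settings_file, "wb") as f:
    pickle.dump(da_settings, f)

# Reload them, e.g., in a later session
with open(settings_file, "rb") as f:
    restored = pickle.load(f)

print(restored == da_settings)
```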

<a id='pflotran_prepare'></a>
# PFLOTRAN preparation
*Here, we use Kewei's 1D thermal model as an example for generating PFLOTRAN input card and parameter.h5.*

In this section, the following procedures are performed:
- generate the PFLOTRAN input deck file ```pflotran.in```
- generate the parameter file in HDF5 format, ```parameter_prior.h5```, used by the PFLOTRAN input deck file

**Note that**
- ```pflotran.in``` for each DA scenario should be prepared by users.

**Run code**
- Run: ```prepare_pflotran_input.py```
- Code input arguments:
    - <span style="background-color:yellow">pflotran_in</span>: filename for ```pflotran.in```
    - <span style="background-color:yellow">pflotran_para</span>: filename for ```parameter_prior.h5```
    - <span style="background-color:yellow">obs_resolution, obs_error, nens, spinup_length, spinup</span>: data assimilation settings (i.e., observation time step, observation error, number of ensemble members, spin-up length, and whether it is a spin-up run; **to be revised**)
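
The utility scripts in this notebook receive their inputs as positional arguments through `%%script bash` cells. Outside Jupyter, an equivalent call could be assembled with the standard library, as in the minimal sketch below. `build_command` is a hypothetical helper, and the short file names are placeholders for the full paths defined in the configuration cells.

```python
import sys


def build_command(script, *args):
    """Return the argument list for running a Python utility script."""
    # Each argument is converted with str(), roughly what happens when
    # "$variable" is expanded in a %%script cell
    return [sys.executable, script] + [str(a) for a in args]


cmd = build_command("prepare_pflotran_input.py",  # stands in for prep_pflotran_in
                    "pflotran.in",                # stands in for pflotran_in
                    "parameter_prior.h5",         # stands in for pflotran_para
                    300.0, 0.1, 30, 0.5, 1)       # obs_resolution, obs_error, nens, spinup_length, spinup
print(cmd[1:])
# To execute the command: subprocess.run(cmd, check=True)
```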

In [10]:
%%script bash -s "$prep_pflotran_in" "$pflotran_in" "$pflotran_para" "$obs_resolution" "$obs_error" "$nens" "$spinup_length" 1
python $1 $2 $3 $4 $5 $6 $7 $8

Finished generating the input card for PFLOTRAN...
Finished generating the DBASE for PFLOTRAN...


In [11]:
%%script bash -s "$pflotran_in"
head $1

#Description: 1D thermal

SIMULATION
  SIMULATION_TYPE SUBSURFACE
  PROCESS_MODELS
    SUBSURFACE_FLOW FLOW
      MODE TH
#      OPTIONS
#	REVERT_PARAMETERS_ON_RESTART
#      /


<a id='pflotran_spinup'></a>
# PFLOTRAN model spin-up
Take the ```pflotran.in``` and ```parameter_prior.h5``` files as input and conduct the model spin-up by running ```pflotran.sh```, a simple shell script that executes the PFLOTRAN ensemble simulation with MPI.

**Run the code**
- Run: ```pflotran.sh```
- Code input arguments:
    - <span style="background-color:yellow">pflotran_exe</span>: location of the PFLOTRAN executable
    - <span style="background-color:yellow">pflotran_in</span>: filename for ```pflotran.in```
    - <span style="background-color:yellow">pflotran_in_dir</span>: directory of PFLOTRAN input
    - <span style="background-color:yellow">pflotran_out_dir</span>: directory of PFLOTRAN output
    - <span style="background-color:yellow">nens</span>: number of ensemble members
    - <span style="background-color:yellow">mpi_exe, ncore_pf</span>: location of mpirun and number of CPU cores used for PFLOTRAN

In [12]:
%%script bash -s "$pflotran_sh" "$pflotran_exe" "$pflotran_in" "$pflotran_in_dir" "$pflotran_out_dir" "$nens" "$mpi_exe" "$ncore_pf"
$1 $2 $3 $4 $5 $6 $7 $8

 here
          30
------------------------------ Provenance --------------------------------------
pflotran_compile_date_time = unknown
pflotran_compile_user = unknown
pflotran_compile_hostname = unknown
pflotran_changeset = unknown
pflotran_status = unknown
petsc_changeset = unknown
petsc_status = unknown
--------------------------------------------------------------------------------
 "grid_structured_type" set to default value.
 Opening hdf5 file: /Users/jian449/Codes/DART/manhattan/models/pflotran/applications/1dthermal/pflotran_input/parameter_prior.h5
 pflotran card:: TIMESTEPPER
 pflotran card:: TIMESTEPPER
 pflotran card:: NEWTON_SOLVER
 pflotran card:: LINEAR_SOLVER
 pflotran card:: NEWTON_SOLVER
 pflotran card:: LINEAR_SOLVER
 pflotran card:: GRID
 pflotran card:: FLUID_PROPERTY
 "FLUID_PROPERTY,diffusion_coeffient units" set to default value.
 pflotran card:: MATERIAL_PROPERTY
   Name :: Alluvium
 "MATERIAL_PROPERTY,rock density units" set to default value.
 "MATERIAL_PROPE

In [13]:
%%script bash -s "$pflotran_out_dir"
cd $1
ls *.h5

pflotranR1.h5
pflotranR10.h5
pflotranR11.h5
pflotranR12.h5
pflotranR13.h5
pflotranR14.h5
pflotranR15.h5
pflotranR16.h5
pflotranR17.h5
pflotranR18.h5
pflotranR19.h5
pflotranR2.h5
pflotranR20.h5
pflotranR21.h5
pflotranR22.h5
pflotranR23.h5
pflotranR24.h5
pflotranR25.h5
pflotranR26.h5
pflotranR27.h5
pflotranR28.h5
pflotranR29.h5
pflotranR3.h5
pflotranR30.h5
pflotranR4.h5
pflotranR5.h5
pflotranR6.h5
pflotranR7.h5
pflotranR8.h5
pflotranR9.h5
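
Because `ls` sorts the names lexicographically, it is not obvious at a glance whether every ensemble member produced an output file. A minimal sketch of a completeness check follows. `missing_members` is a hypothetical helper, and the short listing is made up for the demonstration; in practice it could come from `os.listdir(pflotran_out_dir)`.

```python
import re


def missing_members(filenames, nens, pattern=r"pflotranR(\d+)\.h5$"):
    """Return the ensemble member numbers (1..nens) without an output file."""
    found = set()
    for name in filenames:
        match = re.search(pattern, name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(1, nens + 1)) - found)


# Made-up listing for illustration only
listing = ["pflotranR1.h5", "pflotranR10.h5", "pflotranR2.h5"]
print(missing_members(listing, nens=3))
print(missing_members(listing, nens=4))
```

An empty list means that all expected members are present.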


<a id='dart_prepare'></a>
# DART files preparation
In this section, the following procedures are performed:
- generate the templates for the DART generic variable quantity files (i.e., ```DEFAULT_obs_kind_mod.f90``` and ```obs_def_pflotran_mod.f90```);
- generate the DART input namelists;
- generate DART prior NetCDF data ```prior_R[ENS].nc``` from PFLOTRAN's parameter and outputs;
- generate DART posterior NetCDF files (*sharing the same variable names and dimensions as the prior NetCDF files but without the data values*);
- convert the observation file to DART observation format;
- check ```model_mod.F90``` based on the current settings by using the ```model_mod_check``` program provided by DART.

<a id='dart_generic_prepare'></a>
## Generate the templates for DART generic variable quantity files
- Run: ```list2dartqty.py``` to sequentially generate
    - a mapping between PFLOTRAN variables and DART generic quantities in ```obs_def_pflotran_mod.f90```
    - the default DART generic quantity definition file ```DEFAULT_obs_kind_mod.f90```
- Code input arguments:
    - <span style="background-color:yellow">obs_type</span>: filename for ```obs_def_pflotran_mod.f90```
    - <span style="background-color:yellow">def_obs_kind</span>: filename for ```DEFAULT_obs_kind_mod.f90```
    - <span style="background-color:yellow">pflotran_parastate_set</span>: a list of variables required to be assimilated
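
Note that `pflotran_parastate_set` is a Python list, so a `%%script bash` cell passes it to the script as a single string (its `str()` form). How ```list2dartqty.py``` actually parses this string is not shown in this notebook; the sketch below is one possible approach using the standard library, with `parse_list_argument` as a hypothetical helper.

```python
import ast


def parse_list_argument(text):
    """Turn the string form of a Python list back into a list of names."""
    value = ast.literal_eval(text)  # safely evaluates Python literals only
    if not isinstance(value, (list, tuple)):
        raise ValueError("expected a list, got {!r}".format(value))
    return [str(v) for v in value]


pflotran_parastate_set = ['TEMPERATURE', 'FLOW_FLUX', 'POROSITY', 'THERMAL_CONDUCTIVITY']
argument = str(pflotran_parastate_set)  # the string a script would receive
parsed = parse_list_argument(argument)
print(parsed)
```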

In [14]:
%%script bash -s "$to_dartqty" "$obs_type" "$def_obs_kind" "$pflotran_parastate_set"
python $1 $2 $3 $4

No new DART variable quantity is added...
Finished generating the /Users/jian449/Codes/DART/manhattan/models/pflotran/obs_kind/DEFAULT_obs_kind_mod.f90...
Finished generating the /Users/jian449/Codes/DART/manhattan/models/pflotran/applications/1dthermal/obs_type/obs_def_pflotran_mod.f90...


In [15]:
%%script bash -s "$obs_type"
cat $1

! BEGIN DART PREPROCESS KIND LIST
!TEMPERATURE,  QTY_PFLOTRAN_TEMPERATURE, COMMON_CODE
!FLOW_FLUX,  QTY_PFLOTRAN_FLOW_FLUX, COMMON_CODE
!POROSITY,  QTY_PFLOTRAN_POROSITY, COMMON_CODE
!THERMAL_CONDUCTIVITY,  QTY_PFLOTRAN_THERMAL_CONDUCTIVITY, COMMON_CODE
! END DART PREPROCESS KIND LIST
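
The kind list above follows a regular pattern: each variable name is mapped to a quantity named `QTY_PFLOTRAN_` plus the variable name. The sketch below reproduces that pattern from a list of variable names. It is only an illustration of the output format shown above, not the actual code of ```list2dartqty.py```, and `preprocess_kind_list` is a hypothetical name.

```python
def preprocess_kind_list(variables, prefix="QTY_PFLOTRAN_"):
    """Build the DART preprocess kind list lines for the given variables."""
    lines = ["! BEGIN DART PREPROCESS KIND LIST"]
    for name in variables:
        # Same layout as the file printed above: name, quantity, COMMON_CODE
        lines.append("!{0},  {1}{0}, COMMON_CODE".format(name, prefix))
    lines.append("! END DART PREPROCESS KIND LIST")
    return lines


kind_lines = preprocess_kind_list(
    ['TEMPERATURE', 'FLOW_FLUX', 'POROSITY', 'THERMAL_CONDUCTIVITY'])
print("\n".join(kind_lines))
```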


## Generate DART input namelists in ```input.nml```

The ```input.nml``` file is generated based on a template ```input.nml.template``` by modifying the following namelist entries:

```input.nml.template``` $\rightarrow$ ```input.nml```

|filter_nml|obs_kind_nml|preprocess_nml|model_nml|convertnc_nml|
|:--:|:--:|:--:|:--:|:--:|
| input_state_file_list, output_state_file_list, ens_size, async, adv_ens_command, obs_sequence_in_name | assimilate_these_obs_types | input_files, input_obs_kind_mod_file | time_step_days, time_step_seconds, nvar, var_names, template_file, var_qtynames | netcdf_file, out_file |

**Namelists from DART**
- [filter_nml](https://www.image.ucar.edu/DAReS/DART/Manhattan/assimilation_code/modules/assimilation/filter_mod.html): namelist of the main module for driving ensemble filter assimilations
- [obs_kind_nml](https://www.image.ucar.edu/DAReS/DART/Manhattan/assimilation_code/modules/observations/obs_kind_mod.html#Namelist): namelist for controlling what observation types are to be assimilated
- [preprocess_nml](https://www.image.ucar.edu/DAReS/Codes/DART/manhattan/assimilation_code/programs/preprocess/preprocess): namelist of the DART-supplied preprocessor program which creates observation kind and observation definition modules from a set of other specially formatted Fortran 90 files

**Self-defined namelists**
- model_nml: a self-defined namelist providing basic information about the model
    - time_step_days, time_step_seconds: the assimilation time window
    - template_file: the template prior NetCDF file for ```model_mod.F90``` to digest the spatial information of the model
    - var_names: the original variable names
    - var_qtynames: the corresponding DART variable quantities
    - nvar: the number of variables
- convertnc_nml: a self-defined namelist for providing the NetCDF observation file name and the DART observation file name used in ```convert_nc.f90```
    - netcdf_file: the location of the NetCDF file containing the observation data
    - out_file: the location of the DART observation file

**Note that**
- ```input.nml.template``` contains more namelists, and more entries in the namelists above, than those listed here. Users can edit the Python dictionary ```inputnml``` below to include their own modifications.
- Users can also include more namelists provided by DART by modifying ```inputnml```.

***************
**Assemble all the namelists in input.nml**

In [31]:
# Parameters for different namelists in input.nml
filter_nml = {"input_state_file_list":dart_input_list,
              "output_state_file_list":dart_output_list,
              "ens_size":nens,
              "num_output_state_members":nens,
#               "async":2,
              "adv_ens_command":advance_model,
              "obs_sequence_in_name":obs_dart}
#               "obs_window_days":obs_window_days,
#               "obs_window_seconds":obs_window_seconds}
obs_kind_nml = {"assimilate_these_obs_types":obs_var_set}
model_nml = {"time_step_days":obs_window_days,
             "time_step_seconds":obs_window_seconds,
             "nvar":len(pflotran_parastate_set),
             "var_names":pflotran_parastate_set,
             "template_file":dart_prior_template,
             "var_qtynames":['QTY_PFLOTRAN_'+v for v in pflotran_parastate_set]}
preprocess_nml = {"input_files":obs_type,
                  "input_obs_kind_mod_file":def_obs_kind}
convertnc_nml = {"netcdf_file": obs_nc,
                 "out_file": obs_dart,
                 "obs_start_day": assim_start_day,
                 "obs_start_second": assim_start_second,
                 "obs_end_day": assim_end_day,
                 "obs_end_second":assim_end_second}
modelmodcheck_nml = {"input_state_files": dart_prior_template}
inputnml = {"filter_nml":filter_nml,
            "obs_kind_nml":obs_kind_nml,
            "model_nml":model_nml,
            "preprocess_nml":preprocess_nml,
            "convert_nc_nml":convertnc_nml,
            "model_mod_check_nml":modelmodcheck_nml}


configurations = {"inputnml":inputnml, "directories": directories}

# Save it in a temporary pickle file
with open(config_pickle, 'wb') as f:
    pickle.dump(configurations, f)
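
To illustrate how a dictionary such as `model_nml` relates to a Fortran namelist group, the sketch below renders a dictionary as namelist text. It is a simplified illustration under the assumption of scalar, string, and list values only, not the implementation of ```prepare_input_nml.py```; `format_value` and `render_namelist` are hypothetical helpers.

```python
def format_value(value):
    """Format a Python value as a Fortran namelist value."""
    if isinstance(value, bool):          # checked first: bool is a subclass of int
        return ".true." if value else ".false."
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return ", ".join(format_value(v) for v in value)
    return "'{}'".format(value)          # strings are quoted


def render_namelist(name, entries):
    """Render one namelist group as text."""
    lines = ["&" + name]
    for key, value in entries.items():
        lines.append("   {} = {}".format(key, format_value(value)))
    lines.append("/")
    return "\n".join(lines)


example = {"time_step_days": 0,
           "time_step_seconds": 1200,
           "nvar": 2,
           "var_names": ["TEMPERATURE", "POROSITY"]}
text = render_namelist("model_nml", example)
print(text)
```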

***************
**Run the code**
- Run: ```prepare_input_nml.py```
- Code input arguments:
    - <span style="background-color:yellow">input_nml</span>: the ```input.nml``` namelist file
    - <span style="background-color:yellow">input_nml_template</span>: the ```input.nml.template``` template file
    - <span style="background-color:yellow">config_pickle</span>: the ```config.p``` pickle file

In [32]:
%%script bash -s  "$prep_inputnml" "$input_nml" "$input_nml_template" "$config_pickle"
python $1 $2 $3 $4

Finished generating the input namelist file...


## Convert the model output to DART prior NetCDF and generate the preliminary DART posterior NetCDF file
- The structure of the ```prior_R[ENS].nc``` and ```posterior_R[ENS].nc``` files (```[ENS]``` refers to the ensemble member number):

| NetCDF dimensions |                      NetCDF variables                      |
|:-----------------:|:----------------------------------------------------------:|
| time: 1           | time: shape(time)                                          |
| x_location: nx    | x_location: shape(x_location)                              |
| y_location: ny    | y_location: shape(y_location)                              |
| z_location: nz    | z_location: shape(z_location)                              |
| member: 1         | member: shape(member)                                      |
|                   | physical variable: shape(x_location,y_location,z_location) |

**Note that** 
- As required by DART, each ```prior_R[ENS].nc``` file includes the state/parameter values of only one ensemble member at one given time. 
- The initial time is set to 0, with the time unit converted to *day* (required by DART's ```read_model_time``` subroutine). 
- The structure differs from that of the [observation NetCDF](#observationconvertion) file, because ```prior_R[ENS].nc``` targets structured Cartesian grids while the observation NetCDF file targets the general case.

**Run the code**
- Run: ```prepare_prior_nc.py``` to generate 
    - the DART prior input files ```prior_R[ENS].nc```
    - the DART posterior output files ```posterior_R[ENS].nc``` (*sharing the same variable names and dimensions as the prior files but without the variable values*)
    - the prior template file (copied from ```prior_R1.nc```) used by ```input.nml```
    - the dart_input_list and dart_output_list used by DART
- Code input arguments:
    - <span style="background-color:yellow">pflotran_out</span>: filename ```pflotranR[ENS].h5``` from PFLOTRAN model output
    - <span style="background-color:yellow">pflotran_para</span>: PFLOTRAN parameter HDF file ```parameter_prior.h5```
    - <span style="background-color:yellow">dart_prior_nc</span>: filename ```prior_R[ENS].nc``` for the prior input file for DART
    - <span style="background-color:yellow">dart_input_list</span>: filename for recording the list of dart_prior_nc
    - <span style="background-color:yellow">dart_posterior_nc</span>: filename ```posterior_R[ENS].nc``` for the posterior output file for DART
    - <span style="background-color:yellow">dart_output_list</span>: filename for recording the list of dart_posterior_nc
    - <span style="background-color:yellow">dart_prior_template</span>: filename for the prior template file
    - <span style="background-color:yellow">nens</span>: number of ensemble members
    - <span style="background-color:yellow">spinup</span>: whether it is spinup (if yes, the time is set to zero; otherwise, the time is read from ```pflotranR[ENS].h5```)
    - <span style="background-color:yellow">pflotran_parastate_set</span>: a list of variables to be assimilated
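
The file lists read by DART contain one file name per line. The sketch below shows how such a list could be produced from the ```[ENS]``` filename template. It assumes that members are numbered from 1, as in the PFLOTRAN output files; `expand_template` and `write_file_list` are hypothetical helpers, and a temporary directory stands in for `dart_data_dir`.

```python
import os
import re
import tempfile


def expand_template(template, nens):
    """Replace the [ENS] placeholder with member numbers 1..nens."""
    return [re.sub(r"\[ENS\]", str(i), template) for i in range(1, nens + 1)]


def write_file_list(list_file, filenames):
    """Write one file name per line."""
    with open(list_file, "w") as f:
        f.write("\n".join(filenames) + "\n")


work_dir = tempfile.mkdtemp()  # stands in for dart_data_dir
prior_files = expand_template(os.path.join(work_dir, "prior_R[ENS].nc"), nens=3)
list_file = os.path.join(work_dir, "filter_input_list.txt")
write_file_list(list_file, prior_files)

with open(list_file) as f:
    print(f.read())
```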

In [71]:
%%script bash -s "$prep_prior_nc" "$pflotran_out" "$pflotran_para" "$dart_prior_nc" "$dart_input_list" "$dart_posterior_nc" "$dart_output_list" "$dart_prior_template" "$nens" "$pflotran_parastate_set"
python $1 $2 $3 $4 $5 $6 $7 $8 $9 ${10}

Converting state/parameter into NetCDF file for ensemble 1...
Converting state/parameter into NetCDF file for ensemble 2...
Converting state/parameter into NetCDF file for ensemble 3...
Converting state/parameter into NetCDF file for ensemble 4...
Converting state/parameter into NetCDF file for ensemble 5...
Converting state/parameter into NetCDF file for ensemble 6...
Converting state/parameter into NetCDF file for ensemble 7...
Converting state/parameter into NetCDF file for ensemble 8...
Converting state/parameter into NetCDF file for ensemble 9...
Converting state/parameter into NetCDF file for ensemble 10...
Converting state/parameter into NetCDF file for ensemble 11...
Converting state/parameter into NetCDF file for ensemble 12...
Converting state/parameter into NetCDF file for ensemble 13...
Converting state/parameter into NetCDF file for ensemble 14...
Converting state/parameter into NetCDF file for ensemble 15...
Converting state/parameter into NetCDF file for ensemble 16...
C

In [19]:
%%script bash -s "$dart_prior_template"
ncdump -h $1

netcdf prior_template {
dimensions:
	x_location = 1 ;
	y_location = 1 ;
	z_location = 64 ;
	time = 1 ;
	member = 1 ;
variables:
	double time(time) ;
		time:units = "day" ;
		time:calendar = "none" ;
		time:type = "dimension_value" ;
	double member(member) ;
		member:type = "dimension_value" ;
	double x_location(x_location) ;
		x_location:units = "m" ;
		x_location:type = "dimension_value" ;
	double y_location(y_location) ;
		y_location:units = "m" ;
		y_location:type = "dimension_value" ;
	double z_location(z_location) ;
		z_location:units = "m" ;
		z_location:type = "dimension_value" ;
	double TEMPERATURE(z_location, y_location, x_location) ;
		TEMPERATURE:type = "observation_value" ;
		TEMPERATURE:unit = "[C]" ;
	double FLOW_FLUX(z_location, y_location, x_location) ;
		FLOW_FLUX:type = "observation_value" ;
		FLOW_FLUX:unit = "" ;
	double POROSITY(z_location, y_location, x_location) ;
		POROSITY:type = "observation_value" ;
		POROSITY:unit = "" ;
	double THERMAL_CONDUCTIVITY(z_locati

<a id='observationconvertion'></a>
## Prepare the observation conversion to DART observation format
In this section, we prepare the conversion of the observation data to DART format. We first convert the raw observation data into NetCDF format. Then, a Fortran program is prepared for the conversion from NetCDF to DART format. The structure of the NetCDF file recording the observations is as follows:

| NetCDF dimensions |           NetCDF variables          |
|:-----------------:|:-----------------------------------:|
| time: 1           | time: shape(time)                   |
| location: nloc    | location: shape(location)           |
|                   | physical variable: shape(time,nloc) |

**Note that** 
- If the time calendar is *gregorian*, the time unit should be entered as ```seconds since YYYY-MM-DD HH:MM:SS```. Otherwise, set the time calendar to *None* and the time unit to ```second``` (make sure to convert your measurement times to seconds).

***************
**Convert the raw csv temperature observations to NetCDF file**
- Run: ```csv2nc.py```
- Code input arguments:
    - <span style="background-color:yellow">obs_original</span>: filename for the original observed temperature file
    - <span style="background-color:yellow">obs_nc</span>: filename for the observation NetCDF file
    - <span style="background-color:yellow">spinup_length</span>: the model spin-up length (day)
    - <span style="background-color:yellow">assim_start_str</span>: the reference time that is mapped to the end of the spin-up
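
The sketch below illustrates one way a measurement timestamp could be mapped to model time: the time elapsed since the assimilation start is added to the spin-up length. This mapping is an assumption inferred from the settings in this notebook (a spin-up of 0.5 day and a 300-second observation resolution), not the code of ```csv2nc.py```; `to_model_days` is a hypothetical helper.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400.0


def to_model_days(timestamp, assim_start, spinup_length):
    """Convert a timestamp to model time in days (spin-up included)."""
    elapsed_days = (timestamp - assim_start).total_seconds() / SECONDS_PER_DAY
    return spinup_length + elapsed_days


assim_start = datetime(2017, 4, 1, 0, 0, 0)
spinup_length = 0.5  # day

t0 = to_model_days(assim_start, assim_start, spinup_length)
t1 = to_model_days(datetime(2017, 4, 1, 0, 5, 0), assim_start, spinup_length)
print(t0, t1)
```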

In [20]:
%%script bash -s "$csv_to_nc" "$obs_original" "$obs_nc" "$spinup_length" "$assim_start_str"
python $1 $2 $3 $4 $5

Finished converting raw observation in NetCDF format...


In [21]:
%%script bash -s "$obs_nc"
ncdump -v time $1

netcdf obs_pflotran {
dimensions:
	time = 8641 ;
	location = 5 ;
variables:
	double time(time) ;
		time:calendar = "None" ;
		time:units = "days" ;
		time:type = "dimension_value" ;
	double x_location(location) ;
		x_location:units = "m" ;
		x_location:type = "dimension_value" ;
	double y_location(location) ;
		y_location:units = "m" ;
		y_location:type = "dimension_value" ;
	double z_location(location) ;
		z_location:units = "m" ;
		z_location:type = "dimension_value" ;
	double TEMPERATURE(location, time) ;
		TEMPERATURE:_FillValue = -99999. ;
		TEMPERATURE:unit = "C" ;
		TEMPERATURE:type = "observation_value" ;
data:

 time = 0.5, 0.503472222222222, 0.506944444444444, 0.510416666666667, 
    0.513888888888889, 0.517361111111111, 0.520833333333333, 
    0.524305555555556, 0.527777777777778, 0.53125, 0.534722222222222, 
    0.538194444444444, 0.541666666666667, 0.545138888888889, 
    0.548611111111111, 0.552083333333333, 0.555555555555556, 
    0.559027777777778, 0.5625, 0.56597222222

***************
**Prepare the ```convert_nc.f90``` based on the list of observation variables**
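- Run: ```prepare_convert_nc.py```
- Code input arguments:
    - <span style="background-color:yellow">obs_nc</span>: filename for the observation NetCDF file
    - <span style="background-color:yellow">convert_nc_file</span>: filename for the generated ```convert_nc.f90```
    - <span style="background-color:yellow">convert_nc_template</span>: filename for the template ```convert_nc_template.f90```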

In [22]:
%%script bash -s "$prep_convert_nc" "$obs_nc" "$convert_nc_file" "$convert_nc_template"
python $1 $2 $3 $4

In [23]:
%%script bash -s "$convert_nc_file"
head $1

! DART software - Copyright UCAR. This open source software is provided
! by UCAR, "as is", without charge, subject to all terms of use at
! http://www.image.ucar.edu/DAReS/DART/DART_download
!
! Revised from convert_madis_profiler.f90 written by Nancy Colin
! $Id: convert_nc.f90 2019-09-09 15:48:00Z peishi.jiang@pnnl.gov $

program convert_nc

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


<a id='dart_executables'></a>
# Generate all the executable files
Now, we compile all the executables from ```mkmf_*```. The following executables are generated here:
- ```preprocess```: for preprocessing the [prepared DART generic variable quantity files](#dart_generic_prepare)
- ```convert_nc```: for [converting the observations from NetCDF to DART format](#observationconvertion)
- ```model_mod_check```: for checking ```model_mod.F90``` interface file
- ```filter```: for conducting the [DART data assimilation](https://www.image.ucar.edu/DAReS/DART/Manhattan/assimilation_code/programs/filter/filter.html)

In [46]:
compile_model_check_mod = os.path.join(dart_work_dir, 'check_model_mod.csh')  # Shell script for checking model_mod.F90 file

In [48]:
%%script bash -s "$dart_work_dir" "$compile_model_check_mod" "$app_work_dir"
cd $1
csh $2 $3

---------------------------------------------------------------
building model_mod_check
 Makefile is ready.
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/types_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/utilities_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/time_manager_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/null_mpi_utilities_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/sort_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/ensemble_manager_mod.f90

............................................

## Generate the executables
- Run: ```quickbuild.csh```
- Code input arguments:
    - <span style="background-color:yellow">app_work_dir</span>: location of the application work folder

In [62]:
%%script bash -s "$dart_work_dir" "$quickbuild" "$app_work_dir"
cd $1
csh $2 $3

---------------------------------------------------------------
Removing *.o *.mod files


---------------------------------------------------------------
PFLOTRAN build number 1 is preprocess
 Makefile is ready.
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/types_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/utilities_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/time_manager_mod.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/programs/preprocess/preprocess.f90
gfortran -O2 -ffree-line-length-none -I/usr/local/Cellar/netcdf/4.6.3_1/include  -c	../../../assimilation_code/modules/utilities/null_mpi_utilities_mod.f90
gfortran utilities_mod.o time_manager_mod.o p

rm: No match.
rm: No match.
.......rm: No match.
..................................rm: No match.
......................................................................................
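
Before moving on, it can help to confirm that the build produced the executables used in the next steps. The following is a minimal sketch, not part of ```quickbuild.csh```: the helper name `find_missing_executables` is hypothetical, and it assumes that ```convert_nc```, ```model_mod_check```, and ```filter``` all sit directly in a single folder, which may not match the actual layout.

```python
import os

def find_missing_executables(folder, names=("convert_nc", "model_mod_check", "filter")):
    """Return the names that are not executable regular files in `folder`.

    The default names are the programs run later in this notebook; the
    single-folder layout is an assumption made for illustration.
    """
    missing = []
    for name in names:
        path = os.path.join(folder, name)
        # A usable executable must exist as a file and carry execute permission
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

An empty list returned for the folder that holds the executables would suggest that the build completed for all three programs.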

## Convert the NetCDF observation file to DART format
- Run: ```convert_nc```

In [25]:
%%script bash -s "$app_work_dir" "$convert_nc"
# $1: application work folder, $2: convert_nc executable
cd "$1"
"$2"


 --------------------------------------
 Starting ... at YYYY MM DD HH MM SS = 
                 2019  9 27  9  3 34
 Program convert_nc
 --------------------------------------

  set_nml_output No echo of NML values
  write_obs_seq  opening formatted observation sequence file "/Users/jian449/Codes/DART/manhattan/models/pflotran/applications/1dthermal/dart_inout/obs_seq_pflotran.out"

 --------------------------------------------------------
 -------------- ASSIMILATE_THESE_OBS_TYPES --------------
    TEMPERATURE
 --------------------------------------------------------
 -------------- EVALUATE_THESE_OBS_TYPES   --------------
    none
 --------------------------------------------------------
 ---------- USE_PRECOMPUTED_FO_OBS_TYPES   --------------
    none
 --------------------------------------------------------

  write_obs_seq  closed observation sequence file "/Users/jian449/Codes/DART/manhattan/models/pflotran/applications/1dthermal/dart_inout/obs_seq_pflotran.out"

 ---------

In [26]:
%%script bash -s "$obs_dart"
# Print the first 20 lines of the converted observation sequence file
head -n 20 "$1"

 obs_sequence
obs_kind_definitions
           1
           1 TEMPERATURE                                                     
  num_copies:            1  num_qc:            1
  num_obs:           15  max_num_obs:           15
observation                                                     
Data QC                                                         
  first:            1  last:           15
 OBS            1
   5.3399999999999999     
   1.0000000000000000     
          -1           2          -1
obdef
loc3Dxyz
     0.000000000000000         0.000000000000000       -0.1000000000000000E-01
kind
           1
 43200          0
   1.0000000000000000     
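
The header printed above can also be checked programmatically. The sketch below pulls the integer counters out of the header text with regular expressions; the function name `read_obs_seq_counts` is hypothetical, and the parsing relies only on the `key: value` layout visible in this output rather than on a full specification of the observation sequence format.

```python
import re

def read_obs_seq_counts(header_text):
    """Extract integer counters from the header of a formatted obs_seq file.

    Only the `key: value` pairs visible in the header are parsed; keys that
    are absent from the text are left out of the result.
    """
    counts = {}
    for key in ("num_copies", "num_qc", "num_obs", "max_num_obs"):
        # \b keeps "num_obs" from matching inside "max_num_obs"
        match = re.search(r"\b" + key + r":\s*(\d+)", header_text)
        if match:
            counts[key] = int(match.group(1))
    return counts

sample_header = """ obs_sequence
obs_kind_definitions
           1
           1 TEMPERATURE
  num_copies:            1  num_qc:            1
  num_obs:           15  max_num_obs:           15
"""
print(read_obs_seq_counts(sample_header))
```

Applied to the header printed above, the returned dictionary reports 15 for both `num_obs` and `max_num_obs`, which can be compared against the number of observations expected from the NetCDF file.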


## Check the ```model_mod.F90``` interface file
- Run: ```model_mod_check```

In [64]:
%%script bash -s "$app_work_dir" "$model_mod_check"
# $1: application work folder, $2: model_mod_check executable
cd "$1"
"$2"


 --------------------------------------
 Starting ... at YYYY MM DD HH MM SS = 
                 2019  9 27 15 56  9
 Program model_mod_check
 --------------------------------------

  set_nml_output No echo of NML values
  initialize_mpi_utilities: Running single process


***************** RUNNING    TEST 0    ***********************
 -- Reading the model_mod namelist and implicitly running static_init_model
**************************************************************

 --------------------------------------------------------
 -------------- ASSIMILATE_THESE_OBS_TYPES --------------
    TEMPERATURE
 --------------------------------------------------------
 -------------- EVALUATE_THESE_OBS_TYPES   --------------
    none
 --------------------------------------------------------
 ---------- USE_PRECOMPUTED_FO_OBS_TYPES   --------------
    none
 --------------------------------------------------------



***************** FINISHED   TEST 0    ***********************
xxxxxxxxxxxxxxx
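
Because each test in the output is bracketed by a `RUNNING TEST n` and a `FINISHED TEST n` banner, a saved copy of the output can be scanned for tests that started but never finished. This is a minimal sketch based only on the banner text shown above; the function name `unfinished_tests` is hypothetical, and capturing the output into a string is assumed to be handled elsewhere.

```python
import re

def unfinished_tests(log_text):
    """Return the sorted test numbers with a RUNNING banner but no FINISHED banner."""
    started = {int(n) for n in re.findall(r"RUNNING\s+TEST\s+(\d+)", log_text)}
    finished = {int(n) for n in re.findall(r"FINISHED\s+TEST\s+(\d+)", log_text)}
    # Any test that started without finishing likely stopped with an error
    return sorted(started - finished)
```

An empty list means every test that started also printed its closing banner.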

## Test ```filter```
- Run: ```filter```

In [70]:
%%script bash -s "$app_work_dir" "$filter_exe"
# $1: application work folder, $2: filter executable
cd "$1"
"$2"


 --------------------------------------
 Starting ... at YYYY MM DD HH MM SS = 
                 2019  9 27 16 37 26
 Program Filter
 --------------------------------------

  set_nml_output No echo of NML values
  initialize_mpi_utilities: Running single process

 --------------------------------------------------------
 -------------- ASSIMILATE_THESE_OBS_TYPES --------------
    TEMPERATURE
 --------------------------------------------------------
 -------------- EVALUATE_THESE_OBS_TYPES   --------------
    none
 --------------------------------------------------------
 ---------- USE_PRECOMPUTED_FO_OBS_TYPES   --------------
    none
 --------------------------------------------------------

  quality_control_mod: Will reject obs with Data QC larger than    3
  quality_control_mod: No observation outlier threshold rejection will be done
  assim_tools_init: Selected filter type is Ensemble Kalman Filter (ENKF)
  assim_tools_init: The cutoff namelist value is     1000000.000000
  a

<a id='run_dart_pflotran'></a>
# (TODO) Run DART and PFLOTRAN
In this section, run the shell script that couples the DART filter with the PFLOTRAN model.

In [None]:
# %%script bash -s "$work_dir" "$run_filter"
# cd $1
# csh $2
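
As an alternative to the cell magic, the same step could be launched from Python. The sketch below is only an illustration of that idea: the helper name `run_csh_script` and its `dry_run` switch are hypothetical, and the script path and work folder are assumed to be supplied by the configuration section.

```python
import subprocess

def run_csh_script(script, cwd, dry_run=False):
    """Run `csh <script>` inside `cwd`; with dry_run=True only return the command."""
    command = ["csh", script]
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(command, cwd=cwd, check=True)
    return command
```

Calling the function with `dry_run=True` first shows the command that would be executed without starting the coupled run.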