Notebooks 01a and 01b are used together to compute residence-time distributions (RTDs) for the entire aquifer from an existing MODFLOW model. RTDs can also be computed for any zone or group of zones defined in a 3D array. The approach is to
* read an existing model
* create flux-weighted particle starting locations in every cell
* run MODPATH and read endpoints
* fit parametric distributions to endpoints

Notebook 01a (this notebook) creates flux-weighted particles and runs MODPATH.  Notebook 01b fits parametric distributions.

In [None]:
__author__ = 'Jeff Starn'
%matplotlib notebook

from IPython.display import set_matplotlib_formats
set_matplotlib_formats('png', 'pdf')
from IPython.display import Image
from IPython.display import Math
from ipywidgets import interact, Dropdown
from IPython.display import display

import os
import sys
import shutil
import pickle
import numpy as np
import datetime as dt
import flopy as fp
import imeth
import pandas as pd

# Preliminary stuff

## User-defined variables

**IMPORTANT** 

The name of the directory that contains the model must be the same as the base name of the MODFLOW name file.

**Time specification**

MODFLOW and MODPATH use elapsed time and are not aware of calendar time. To place MODFLOW/MODPATH elapsed time on the calendar, two calendar dates are specified at the top of the notebook: the beginning of the first stress period (`mf_start_date`) and when particles are to be released (`mp_release_date`). The latter date could be used in many ways, for example to represent a sampling date, or it could be looped over to create a time-lapse set of ages. 

There are several time-related definitions used in MODPATH.
* `simulation time` is the elapsed time in model time units from the beginning of the first stress period
* `reference time` is an arbitrary value of `simulation time` that is between the beginning and ending of `simulation time`
* `tracking time` is the elapsed time relative to `reference time`. It is always positive regardless of whether particles are tracked forward or backward
* `release time` is when a particle is released and is specified in `tracking time`

Particles will be released on the date `mp_release_date_str` relative to the starting time `mf_start_date_str`. The latter date is arbitrary; it sets the calendar date for the beginning of the first stress period. The length of the first stress period, even though it is usually steady state, should correspond to the length of time between `mf_start_date_str` and the start of the first transient stress period. The notebook calculates `release time` in units of `tracking time` for `mp_release_date`.
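The date arithmetic described above can be sketched in plain Python. This is a minimal sketch, not the notebook's code. It assumes model time units of days and a reference time at the end of the simulation. The helper name `release_times` and the simulation end date passed to it (`sim_end_date_str`) are hypothetical.

```python
import datetime as dt


def release_times(mf_start_date_str, mp_release_date_str, sim_end_date_str,
                  fmt='%m/%d/%Y'):
    """Return (release_time_sim, release_time_trk) in days.

    Assumes model time units are days and that the MODPATH reference time
    is the end of the simulation (sim_end_date_str).
    """
    mf_start = dt.datetime.strptime(mf_start_date_str, fmt)
    release = dt.datetime.strptime(mp_release_date_str, fmt)
    ref = dt.datetime.strptime(sim_end_date_str, fmt)
    if not mf_start < release < ref:
        raise ValueError('release date must fall between the simulation '
                         'start and the reference date')
    # simulation time: elapsed time since the start of the first stress period
    release_time_sim = (release - mf_start).days
    # tracking time: elapsed time measured back from the reference time,
    # which is positive for backward tracking
    release_time_trk = (ref - release).days
    return release_time_sim, release_time_trk


# hypothetical simulation that ends on 01/01/2020
sim, trk = release_times('01/01/1900', '01/01/2018', '01/01/2020')
print(sim, trk)  # prints 43099 730
```

Both values come from the same two subtractions. Only the origin differs: the start of the first stress period for simulation time, and the reference time for tracking time.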

`layer_for_flow_calc` is an arbitrary layer number on which to divide the model domain for calculating RTDs. For example, in glacial aquifers it could represent the layer number of the bottom of unconsolidated deposits. In that case, anything below this layer could be considered bedrock.



## Loop through home directory to get list of name files

User input

In [None]:
# The directories in the following list will be searched for models
homes = ['../Models']
fig_dir = '../Figures'

mfpth = '../executables/MODFLOW-NWT_1.0.9/bin/MODFLOW-NWT_64.exe'
mp_exe_name = '../executables/modpath.6_0/bin/mp6x64.exe' 

mf_start_date_str = '01/01/1900' 
mp_release_date_str = '01/01/2018' 

# Read the zone array:
# use 'None' if there is no zone array
# the zone array should be either a comma-delimited file (.csv) with dimensions
# (nrow * ncol, nlay) or a compressed numpy array (.npz) with dimensions (nlay, nrow, ncol).
# The file with this name has to be in the model workspace directory defined later
zone_array_file = 'fz_zones_v2.csv'

# The total net flow across the layer boundary specified below will be calculated and reported.
layer_for_flow_calc = 3

# The zone number will be encoded in the MODPATH endpoint file under the "Label" variable.
# The RTD for all zones will be calculated if use_all_zones is True.
# You can also calculate RTDs groups of zones.
use_all_zones = True
use_groups_of_zones = True
number_of_particles_per_group = 1.0E+06

# If zones are to be grouped, put the zone numbers in a list inside a tuple.
# You can have more than one group.
# Zones can be part of more than one group and not all zones have to be used,
# e.g. zones_to_group = ([4, 5, 6], [41]).  If only one group is used, put a comma 
# after it, e.g. zones_to_group = ([41],)
zones_to_group = ([15],)

num_cells2budchk = 10

# weighting scheme
weight_scheme = 'flow'
# weight_scheme = 'volume'

por = 0.20

dir_list = []
mod_list = []
i = 0

for home in homes:
    if os.path.exists(home):
        for dirpath, dirnames, filenames in os.walk(home):
            for f in filenames:
                if os.path.splitext(f)[-1] == '.nam':
                    mod = os.path.splitext(f)[0]
                    mod_list.append(mod)
                    dir_list.append(dirpath)
                    i += 1
print('    {} models read'.format(i))

model_area = Dropdown(
    options=mod_list,
    description='Model:',
    background_color='cyan',
    border_color='black',
    border_width=2)
display(model_area)

with open('dir_list.txt', 'w') as f:
    for i in dir_list:
        f.write('{}\n'.format(i))

## Create names and path for model workspace

The procedures in this notebook can be run from the notebook or from a batch file. To run them in batch mode, download the notebook as a Python script, uncomment the loop in the first code cell below, and comment out the second code cell. The remainder of the script then has to be indented so that it is included in the loop, which may require some familiarity with Python.

In [None]:
# for pth in dir_list:
#     model = os.path.normpath(pth).split(os.sep)[2]
#     model_ws = [item for item in dir_list if model in item][0]
#     nam_file = '{}.nam'.format(model)
#     print("working model is {}".format(model_ws))

In [None]:
model = model_area.value
# look up the directory by position; a substring match could pick the wrong directory
model_ws = dir_list[mod_list.index(model)]
nam_file = '{}.nam'.format(model)
print("working model is {}".format(model_ws))

# Load an existing model

In [None]:
print ('Reading model information')

fpmg = fp.modflow.Modflow.load(nam_file, model_ws=model_ws, exe_name=mfpth, version='mfnwt', 
                               load_only=['DIS', 'BAS6', 'UPW', 'OC'], check=False)

dis = fpmg.get_package('DIS')
bas = fpmg.get_package('BAS6')
upw = fpmg.get_package('UPW')
oc = fpmg.get_package('OC')

# get the cell widths as numpy arrays rather than FloPy Util2d objects
delr = dis.delr.array
delc = dis.delc.array
nlay = dis.nlay
nrow = dis.nrow
ncol = dis.ncol
bot = dis.getbotm()
top = dis.gettop()

hnoflo = bas.hnoflo
ibound = np.asarray(bas.ibound.get_value())
hdry = upw.hdry

print ('   ... done') 

The names of the head and budget output files are looked up in the MODFLOW name file by their unit numbers, which are taken from the OC package.

In [None]:
src = os.path.join(model_ws, fpmg.namefile)
# sep=r'\s+' replaces the deprecated delim_whitespace=True option
name_file_df = pd.read_csv(src, header=None, comment='#', sep=r'\s+',
                           names=['package', 'unit', 'filename', 'type'])

name_file_df['package'] = name_file_df.package.str.lower()
name_file_df.set_index('unit', inplace=True)

head_file_name = name_file_df.loc[oc.iuhead, 'filename']
bud_file_name = name_file_df.loc[oc.get_budgetunit(), 'filename']

## Read head and budget file headers
The head file is used to limit particle placement to the saturated part of each cell and to identify dry cells. The budget file records are used to define the time steps in the simulation.

In [None]:
src = os.path.join(model_ws, head_file_name)
hd_obj = fp.utils.HeadFile(src)
head_df = pd.DataFrame(hd_obj.recordarray)

src = os.path.join(model_ws, bud_file_name)
bud_obj = fp.utils.CellBudgetFile(src)
all_bud_df = pd.DataFrame(bud_obj.recordarray)

# convert to zero base
all_bud_df['kper'] -= 1
all_bud_df['kstp'] -= 1
head_df['kper'] -= 1
head_df['kstp'] -= 1

## Calculation of time in MODFLOW/MODPATH

## Identify time step and stress period for particle release

* read all stress periods and time steps that were preserved in the budget file
* find the stress period and time step that contain the mp_release_date
* make a subset of all the budget records from the specified period and step
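The selection rule in the list above can be sketched without pandas. This sketch assumes the time steps are given as a list of `(kper, kstp, totim)` tuples sorted by `totim`. The helper name `find_release_step` and the example steps are hypothetical. The notebook does the same search on its `kdf` dataframe with `idxmax`.

```python
def find_release_step(steps, release_time_sim):
    """Return (kstp, kper) of the step that contains release_time_sim.

    steps is a list of (kper, kstp, totim) tuples sorted by totim, with
    zero-based kper and kstp and totim equal to the elapsed simulation time
    at the end of each step. The first step that ends at or after the
    release time is the one that contains it.
    """
    for kper, kstp, totim in steps:
        if totim >= release_time_sim:
            # FloPy expects the (kstp, kper) order
            return kstp, kper
    raise ValueError('release time is after the end of the simulation')


# hypothetical 10-day steady-state period followed by two 5-day steps
steps = [(0, 0, 10.0), (1, 0, 15.0), (1, 1, 20.0)]
print(find_release_step(steps, 12.0))  # prints (0, 1)
```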

In [None]:
# Number of model time units per day, keyed by the MODFLOW DIS time-unit code ITMUNI
# (0 undefined, 1 seconds, 2 minutes, 3 hours, 4 days, 5 years).
# Divide a time in model units by this value to convert it to days.
time_dict = dict()
time_dict[0] = 1.0  # undefined; assume days
time_dict[1] = 24 * 60 * 60
time_dict[2] = 24 * 60
time_dict[3] = 24
time_dict[4] = 1.0
time_dict[5] = 1.0 / 365.25
units_per_day = time_dict[dis.itmuni]

# convert string representation of dates into Python datetime objects
mf_start_date = dt.datetime.strptime(mf_start_date_str, '%m/%d/%Y')
mp_release_date = dt.datetime.strptime(mp_release_date_str, '%m/%d/%Y')

# check to make sure they are valid
assert mf_start_date < mp_release_date, \
    'The particle release date has to be after the start of the MODFLOW simulation'

# group by period and step (select the numeric time columns before taking the median)
kdf = all_bud_df.groupby(['kper', 'kstp'])[['pertim', 'totim']].median()

# make a datetime series of time-step boundaries starting with 0
# totim is elapsed time in simulation time, converted here from model units to days
end_date = mf_start_date + pd.to_timedelta(np.append(0, kdf.totim) / units_per_day, unit='D')

# reformat the dates to get rid of seconds
kdf['start_date'] = end_date[0:-1].map(lambda t: t.strftime('%Y-%m-%d %H:%M'))
kdf['end_date'] = end_date[1:].map(lambda t: t.strftime('%Y-%m-%d %H:%M'))

# reference time and date are set to the end of the last stress period
ref_time = kdf.totim.max()
ref_date = end_date.max()

assert ref_date > mp_release_date, 'The reference date has to be after the particle release'

# release time is calculated in tracking time (for particle release) and
# in simulation time (for identifying head and budget components), both in model time units
release_time_trk = abs((ref_date - mp_release_date).days) * units_per_day
release_time_sim = (mp_release_date - mf_start_date).days * units_per_day

# find the first time step that ends at or after the release time,
# i.e., the time step that contains the release date
idx = (kdf.totim >= release_time_sim).idxmax()
kdf.loc[idx, 'particle_release'] = True

# switch period and step to the (kstp, kper) order used by FloPy
kstpkper = (idx[1], idx[0])

dst = os.path.join(model_ws, 'time_summary.csv')
kdf.to_csv(dst)

## Read heads

In [None]:
heads = hd_obj.get_data(kstpkper=kstpkper)

## Calculate saturated thickness and volume for each cell.
* create 3D model cell boundary grid
* saturated top in cell is minimum of head or cell top
* saturated thickness is the distance between the saturated top and cell bottom
* if the cell is dry or inactive, the saturated thickness is zero


In [None]:
# create a 3D array of layer boundaries
grid = np.zeros((nlay+1, nrow, ncol))
grid[0, :, :] = top
grid[1:, :, :] = bot

# tmp is the minimum of the head and cell top 
tmp = np.minimum(heads, grid[:-1, :, :])

# the saturated thickness is first estimated to be the difference between tmp and the cell bottom
sat_thk_cell = (tmp - grid[1:, :, :]) 

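# sat_thk_cell < 0 means the head is below the bottom of the cell; these are set to zero
sat_thk_cell[sat_thk_cell < 0] = 0

# dry cells (head == hdry) and inactive cells (ibound == 0 or head == hnoflo) have no saturated thickness
sat_thk_cell[(ibound == 0) | np.isclose(heads, hdry) | np.isclose(heads, hnoflo)] = 0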

## Calculate the mean exponential age by zone

This calculation uses the simulated volumetric recharge rate and the simulated saturated aquifer volume.
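For an exponential RTD, the mean age is the mobile water volume divided by the volumetric flow rate through it: $\tau = n V_{sat} / Q$. This is the same expression as the `tau` column computed later (`sat_vol * por / recharge / 365.25`). The sketch below assumes model time units of days and uses a hypothetical helper name, `mean_exponential_age`. The example numbers are illustrative, not from any model.

```python
def mean_exponential_age(sat_vol, por, recharge, days_per_year=365.25):
    """Mean age in years of an exponential RTD, tau = por * sat_vol / recharge.

    sat_vol: saturated aquifer volume (L^3)
    por: porosity (dimensionless)
    recharge: volumetric recharge rate (L^3 per day; days are assumed)
    """
    if recharge <= 0:
        raise ValueError('recharge must be positive')
    # volume of mobile water divided by the rate at which it is replaced, in days
    tau_days = por * sat_vol / recharge
    return tau_days / days_per_year


# hypothetical aquifer: 1e9 m^3 saturated volume, porosity 0.2, 1e5 m^3/d recharge
print(round(mean_exponential_age(1.0e9, 0.20, 1.0e5), 2))  # prints 5.48
```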

## Read zone array to use as particle label

In [None]:
zone_array_src = os.path.join(model_ws, zone_array_file)

if 'None' not in zone_array_src:
    ext = os.path.splitext(zone_array_src)[1].lower()
    if ext == '.csv':
        zones = pd.read_csv(zone_array_src, header=0)
        zones = zones.unstack().values.reshape(nlay, nrow, ncol)
        print('Zones read from csv file')
    elif ext in ('.zon', '.zone'):
        # reading zones from a MODFLOW zone package file is not implemented
        raise NotImplementedError('Reading zones from a MODFLOW zone file is not implemented')
    elif ext == '.npz':
        d = np.load(zone_array_src)
        if len(d.files) != 1:
            raise ValueError('There should only be one item in the npz file and '
                             'it should be an array dimensioned as (nlay, nrow, ncol)')
        zones = d[d.files[0]]
        print('Zones read from compressed numpy array object')
    else:
        raise ValueError('Unsupported zone array file type: {}'.format(ext))
else:
    print('No zone information read')
    zones = np.ones((nlay * nrow * ncol))

# make a data frame to store group id 
zones = zones.ravel()
zone_df = pd.DataFrame(index=np.arange(zones.shape[0]))

group_column_list = list()
zone_list_arr = np.unique(zones)

if use_all_zones:
    group_column_list.append('all_zones')
    zone_df.loc[:, 'all_zones'] = True
    num_of_zones = zone_list_arr.shape[0]
else:
    num_of_zones = 0

if use_groups_of_zones:
    for zone_group in zones_to_group:
        gp_label = 'zones{}'.format(zone_group)
        gp_label = gp_label.replace(', ', '_')
        group_column_list.append(gp_label)
        zone_df.loc[:, gp_label] = False
        for zon in zone_group:
            zone_df.loc[zones.ravel() == zon, gp_label] = True
    num_of_groups = len(zones_to_group)
else:
    num_of_groups = 0
    
dst = os.path.join(model_ws, 'zone_df.csv')
zone_df.to_csv(dst)
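A small table shows the csv layout that the zone-reading cell above expects. This sketch uses plain Python lists instead of pandas and a hypothetical helper, `csv_table_to_zones`. It assumes the csv rows are cells in row-major order and that column k holds layer k, which is what the `unstack().values.reshape(nlay, nrow, ncol)` call implies.

```python
def csv_table_to_zones(table, nlay, nrow, ncol):
    """Rearrange a (nrow * ncol, nlay) table into a nested [lay][row][col] list.

    Assumes the table rows are cells in row-major order
    (row 0 col 0, row 0 col 1, ...) and column k holds layer k.
    """
    if len(table) != nrow * ncol or any(len(r) != nlay for r in table):
        raise ValueError('table must have nrow * ncol rows and nlay columns')
    return [[[table[i * ncol + j][k] for j in range(ncol)]
             for i in range(nrow)]
            for k in range(nlay)]


# 2 layers on a 2 x 3 grid; layer-1 zones are 1..6, layer-2 zones are 11..16
table = [[1, 11], [2, 12], [3, 13], [4, 14], [5, 15], [6, 16]]
zones = csv_table_to_zones(table, nlay=2, nrow=2, ncol=3)
print(zones[1][1][0])  # prints 14
```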

# Make MODPATH input files and run MODPATH

## Calculate inflow into each cell
The number of particles in each cell is proportional to the flow into the cell. The total number of particles in each group is approximately `number_of_particles_per_group`, 1 million in this example; it is approximate because the per-cell counts are rounded to integers. Particle locations within each cell are generated randomly.
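The flux-weighted allocation can be sketched in plain Python. The hypothetical helpers `particles_per_cell` and `repeat_cells` mirror the notebook's `np.rint(weight * f)` and `np.repeat` steps. Python's `round`, like `np.rint`, rounds halves to even. The example weights are made up.

```python
def particles_per_cell(weights, total_particles):
    """Allocate particles to cells in proportion to weights (e.g., cell inflow).

    Each cell gets its proportional share rounded to the nearest integer,
    so the grand total is only approximately total_particles.
    """
    wsum = sum(weights)
    if wsum <= 0:
        raise ValueError('weights must sum to a positive value')
    f = total_particles / wsum
    return [int(round(w * f)) for w in weights]


def repeat_cells(cells, counts):
    """List each cell once per particle, like np.repeat(cells, counts)."""
    return [cell for cell, n in zip(cells, counts) for _ in range(n)]


weights = [0.0, 1.0, 2.5, 6.5]  # hypothetical cell inflows
counts = particles_per_cell(weights, 100)
print(counts)  # prints [0, 10, 25, 65]
starts = repeat_cells(['a', 'b', 'c', 'd'], counts)
```

With equal weights of 1 and a target of 10 particles, each cell gets `round(3.33) = 3`, so only 9 particles are placed. This is why the total per group is approximate.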

MODFLOW writes a variable called `imeth` to the budget file for each budget record. `imeth` specifies the format in which the budget data are stored. Functions for reading each of these data formats are defined in the module imeth.py.
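One of these formats can be illustrated by expanding list-format records into a per-cell array. This sketch is not the imeth.py implementation, and the helper name `list_to_dense` is hypothetical. It assumes, per the MODFLOW-2005 budget-file documentation, that IMETH = 2 records are pairs of a one-based cell number and a flow value.

```python
def list_to_dense(records, nlay, nrow, ncol):
    """Expand list-format budget records (assumed IMETH = 2) into a flat per-cell list.

    records is a sequence of (icell, q) pairs with one-based cell numbers,
    icell = (lay * nrow + row) * ncol + col + 1 for zero-based lay, row, col.
    Several records can refer to the same cell (e.g., two wells), so values
    are accumulated rather than assigned.
    """
    dense = [0.0] * (nlay * nrow * ncol)
    for icell, q in records:
        dense[icell - 1] += q
    return dense


# one layer, 2 x 2 grid; cell 4 has two records
print(list_to_dense([(1, -5.0), (4, 2.0), (4, 1.5)], 1, 2, 2))  # prints [-5.0, 0.0, 0.0, 3.5]
```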

In [None]:
# extract the budget records for the specified period and step
bud_df = all_bud_df.query('kstp=={} and kper=={}'.format(*kstpkper)).copy()

bud_df.loc[:, 'per_num'] = bud_df.totim.factorize()[0]
num_rec = bud_df.shape[0]

flow_times = bud_df.totim.unique()
nt = bud_df.per_num.nunique()

rxc = dis.nrow * dis.ncol
nn = dis.nlay * rxc

im = imeth.imeth(nlay, nrow, ncol)

qx1 = np.zeros((nt, nn))
qx2 = np.zeros_like(qx1)
qy1 = np.zeros_like(qx1)
qy2 = np.zeros_like(qx1)
qz1 = np.zeros_like(qx1)
qz2 = np.zeros_like(qx1)
storage = np.zeros_like(qx1)

bound_flow = np.zeros((nn, 7))
int_flow_right = np.zeros((nn))
int_flow_left = np.zeros((nn))
int_flow_front = np.zeros((nn))
int_flow_back = np.zeros((nn))
int_flow_lower = np.zeros((nn))
int_flow_top = np.zeros((nn))

for i, rec in bud_df.iterrows():

    BUFF = bud_obj.get_record(i)
    
    internal_flow_list = [b'   CONSTANT HEAD', b'FLOW RIGHT FACE ', b'FLOW FRONT FACE ', b'FLOW LOWER FACE ', b'STORAGE']

    if rec.text in internal_flow_list:
        if b'   CONSTANT HEAD' in rec.text:
            bound_flow += im.imeth2(BUFF)
        elif b'FLOW RIGHT FACE ' in rec.text:
            int_flow_right = im.imeth1(BUFF)
            int_flow_left = np.roll(int_flow_right, 1)
        elif b'FLOW FRONT FACE ' in rec.text:
            int_flow_front = im.imeth1(BUFF)
            int_flow_back = np.roll(int_flow_front, ncol)
        elif b'FLOW LOWER FACE ' in rec.text:
            int_flow_lower = im.imeth1(BUFF)
            int_flow_top = np.roll(int_flow_lower, rxc)
        elif b'STORAGE' in rec.text:
            bound_flow[: , 0] += im.imeth1(BUFF)
        else:
            print('Unrecognized budget type')

    if rec.text not in internal_flow_list:
        if rec.imeth == 1:
            bound_flow[:, 0] += im.imeth1(BUFF)
        elif rec.imeth == 2:
            bound_flow[:, 0] += im.imeth2(BUFF)
        elif rec.imeth == 3:
            bound_flow += im.imeth3(BUFF)
        elif rec.imeth == 4:
            bound_flow += im.imeth4(BUFF)
        elif rec.imeth == 5:
            bound_flow += im.imeth5(BUFF)
        else:
            print('Unrecognized budget type')

    storage[rec.per_num , :] += bound_flow[:, 0]

    qx1[rec.per_num , :] = int_flow_left + bound_flow[:, 1]
    qx2[rec.per_num , :] = int_flow_right - bound_flow[:, 2]

    qy1[rec.per_num , :] = -int_flow_front + bound_flow[:, 3]
    qy2[rec.per_num , :] = -int_flow_back - bound_flow[:, 4]

    qz1[rec.per_num , :] = -int_flow_lower + bound_flow[:, 5]
    qz2[rec.per_num , :] = -int_flow_top - bound_flow[:, 6]

qin1 = np.where(qx1 > 0, qx1, 0)
qin2 = np.where(qx2 < 0, -qx2, 0)
qin3 = np.where(qy1 > 0, qy1, 0)
qin4 = np.where(qy2 < 0, -qy2, 0)
qin5 = np.where(qz1 > 0, qz1, 0)
qin6 = np.where(qz2 < 0, -qz2, 0)

flow_sum = np.sum((qin1, qin2, qin3, qin4, qin5, qin6), axis=0)

# set the flow to zero for cells that went dry during the simulation and also for isolated cells
flow_sum[0, heads.ravel() == hdry] = 0
# flow_sum[heads.ravel() > 1.E+29] = 0

print ('   ... done'  )

## Extract specified flows from MODFLOW budget

Get recharge at the top model surface (recharge package only at this time) and flow across the bottom of the layer specified by `layer_for_flow_calc`.
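The flow-across-a-layer calculation can be sketched with nested lists. The sketch assumes MODFLOW's convention that a positive `FLOW LOWER FACE` value is downward flow into the layer below. It also assumes that `layer_for_flow_calc` is one-based, as in the notebook. The helper name and the example values are hypothetical.

```python
def downward_flow_across_layer_bottom(flf, layer_for_flow_calc):
    """Sum the downward (positive) FLOW LOWER FACE values across the bottom of a layer.

    flf is a nested [lay][row][col] list of FLOW LOWER FACE values and
    layer_for_flow_calc is one-based. Positive values are taken to be flow
    from the layer into the one below it; upward flow is ignored.
    """
    if not 1 <= layer_for_flow_calc <= len(flf):
        raise ValueError('invalid layer_for_flow_calc')
    layer = flf[layer_for_flow_calc - 1]
    return sum(q for row in layer for q in row if q > 0)


flf = [[[1.0, -2.0], [3.0, 0.5]],
       [[0.0, 0.0], [0.0, 0.0]]]
total = downward_flow_across_layer_bottom(flf, 1)
recharge_vol = 9.0  # hypothetical total recharge
fraction = total / recharge_vol
print(total, fraction)  # prints 4.5 0.5
```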

In [None]:
if b'        RECHARGE' in bud_df.text.values:
    # probably should make this so that recharge is summed from the highest active cell
    rch = bud_obj.get_data(text='RECHARGE', kstpkper=kstpkper, full3D=True)
    recharge_vol = rch[0].sum()
else:
    print('no recharge')
    recharge_vol = np.nan
# This assumes all recharge is from the RCH package. A check for the UZF package should be added and the two combined.

The next cell writes all the budget components for the whole model into a dataframe that is stored as a csv file. The file may be quite large for big models; comment out the cell if the file is not needed.

In [None]:
tmp_df = pd.DataFrame()

for i, j in bud_df.iterrows():
    tmp_df[j.text] = bud_obj.get_data(text=j.text, kstpkper=kstpkper, full3D=True)[0].ravel()
    
dst = os.path.join(model_ws, 'all_cells_budget.csv')
tmp_df.to_csv(dst)

In [None]:
if 1 <= layer_for_flow_calc <= nlay:
    flf = bud_obj.get_data(text='FLOW LOWER FACE', kstpkper=kstpkper, full3D=True)
    flow_below_layer = flf[0][layer_for_flow_calc - 1, :, :]
    # positive FLOW LOWER FACE values are downward flow out of the bottom of the layer
    total_bedrock_recharge_vol = flow_below_layer[flow_below_layer > 0].sum()
    total_bedrock_recharge_frac = total_bedrock_recharge_vol / recharge_vol
else:
    print('invalid layer_for_flow_calc')
    total_bedrock_recharge_vol = np.nan
    total_bedrock_recharge_frac = np.nan

In [None]:
# create grid cell dimension arrays
delc_ar, dum, delr_ar = np.meshgrid(delc, np.arange(nlay), delr)

# saturated volume of each cell in the model
sat_vol_cell = sat_thk_cell * delc_ar * delr_ar

tau_df = pd.DataFrame(index=group_column_list)
dst = os.path.join(model_ws, 'tau.txt')
for group in zone_df:
    
    sat_vol = sat_vol_cell.ravel()[zone_df[group]].sum()
    recharge = flow_sum[0, zone_df[group]].sum()

    tau_df.loc[group, 'sat_vol'] = sat_vol
    tau_df.loc[group, 'recharge'] = recharge
    tau_df.loc[group, 'tau'] = sat_vol * por / recharge /  365.25
    tau_df.loc[group, 'rech_volume'] = total_bedrock_recharge_vol
    tau_df.loc[group, 'rech_fraction'] = total_bedrock_recharge_frac
    
with open(dst, 'w') as f:
    tau_df.to_csv(f)

## Repeat cell coordinates for the number of particles

## Generate random particle placement in the saturated part of each cell

MODPATH requires particle locations as the layer, row, and column (which we now have) plus the relative cell coordinates within each cell, over (0, 1). In this application the relative cell coordinates are generated randomly. In convertible cells (i.e., in a water-table layer), MODPATH treats the water table as the top of the cell; in other words, the water table is at 1.0 in relative cell coordinates. In non-convertible cells, the top of the cell is at 1.0. Thus, in partially saturated cells, the random particle location is scaled to the saturated thickness.
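The scaling described above can be sketched for a single cell. The notebook itself writes only the random local coordinates and leaves the conversion to MODPATH. This sketch converts a local z to an elevation to show why particles stay in the saturated part of a convertible cell. The helper name `random_elevation` and the cell geometry are hypothetical.

```python
import random


def random_elevation(cell_top, cell_bot, head, rng):
    """Draw a random local z in [0, 1) and convert it to an elevation.

    For a convertible (water-table) cell, local z = 1 is the water table,
    so the elevation is scaled by the saturated thickness,
    min(head, top) - bot, rather than the full cell thickness.
    """
    sat_top = min(head, cell_top)
    if sat_top <= cell_bot:
        raise ValueError('cell is dry')
    zloc = rng.random()
    return zloc, cell_bot + zloc * (sat_top - cell_bot)


rng = random.Random(2909591)  # fixed seed, as in the notebook, for repeatability
# hypothetical cell from 0 m to 10 m with the water table at 4 m
zloc, z = random_elevation(10.0, 0.0, 4.0, rng)
```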

In [None]:
# specify seed for random number generator--could be any integer value
prng = np.random.RandomState(2909591)
bud_chk_dict = dict()

for group in zone_df:
    
    # weight_scheme (set in the user-input cell) selects flow- or volume-weighted particle placement
    if weight_scheme == 'flow':
        weight = flow_sum[0, zone_df[group]]
        weight_label = 'flux'
    elif weight_scheme == 'volume':
        weight = sat_vol_cell.ravel()[zone_df[group]]
        weight_label = 'volume'
    else:
        raise ValueError("weight_scheme must be 'flow' or 'volume'")

    f = number_of_particles_per_group / weight.sum()

    parts_per_cell = np.rint( weight * f ).astype( np.int32 )

    l, r, c = np.indices(( nlay, nrow, ncol ))
    l= l.ravel()[zone_df[group]]
    r= r.ravel()[zone_df[group]]
    c= c.ravel()[zone_df[group]]
    label = zones[zone_df[group]]

    lrep = np.repeat( l, parts_per_cell.ravel() )
    rrep = np.repeat( r, parts_per_cell.ravel() )
    crep = np.repeat( c, parts_per_cell.ravel() )
    label = np.repeat( label, parts_per_cell.ravel() )
    num_parts = lrep.shape[0]
    
    # generate random relative coordinates within a cell in 3D
    cell_coords = prng.rand( num_parts, 3 )
    
    grp = 1

    particles = np.zeros( ( num_parts, 11 ) )
    particles[:, 0] = np.arange( 1, num_parts + 1 )
    particles[:, 1] = grp
    particles[:, 2] = 1
    particles[:, 3] = lrep + 1
    particles[:, 4] = rrep + 1
    particles[:, 5] = crep + 1
    particles[:, 6:9] = cell_coords
    particles[:, 9] = release_time_trk
    particles[:, 10] = label
    
    print('Write starting locations for {}'.format(group))

    line = '{:5d}\n{:5d}\n'.format(1, 1)
    line = line + 'group_{}\n'.format(1)
    npart = particles.shape[0]
    line = line + '{:6d}'.format(npart)
    dst_pth = os.path.join(model_ws, '{}_{}_{}.loc'.format(fpmg.name, weight_label, group))
    form = '%6d %6d %3d %3d %3d %3d %12.9f %12.9f %12.9f %12.9e %15.3f'
    np.savetxt(dst_pth, particles, delimiter=' ', fmt=form, header=line, comments='')

    print('   Total particles in "{}" is {}'.format(group, num_parts))
    print('   Min particles per cell in {} = {:10.0f}'.format(group, parts_per_cell.min()))
    print('   Mean particles per cell in {} = {:10.0f}'.format(group, parts_per_cell.mean()))
    print('   Max particles per cell in {} = {:10.0f}'.format(group, parts_per_cell.max()))
    a, b = np.histogram(particles[:, 3], bins= np.arange(1, nlay+2))
    tdf = pd.DataFrame({'Number of particles' : a}, index=pd.Index(b[:-1], name='Layer') )
    print(tdf)
    
    print ('   ... done') 
    
    A = (particles[:, 3:6] - 1)
    A = A[prng.choice(A.shape[0], num_cells2budchk, replace=False), :]
    budchk = np.ones((num_cells2budchk, 4))
    budchk[:, 1:] = A
    budchk = budchk.astype(np.int32)
        
    def seq(item):
        return item[1] * nrow * ncol + item[2] * ncol + item[3] 
    t_df = pd.DataFrame(budchk, columns=('Grid', 'Layer', 'Row', 'Column'))
    t_df['seqnum'] = np.array([seq(item) for item in budchk])
    t_df['flow_sum'] = flow_sum[0, t_df['seqnum']]
    t_df['qz2'] = qz2[0, t_df['seqnum']]

    bud_chk_dict[group] = t_df

# Run MODPATH and read endpoint information

## Get random cells to check budget computations
`num_cells2budchk` (10 by default) randomly selected cells that contain particles are used to check the cell budget.

## Run MODPATH

In [None]:
print('   Write and run MODPATH')

# prepare Modpath files   
SimulationType = 1              # 1 endpoint; 2 pathline; 3 timeseries
TrackingDirection = 2           # 1 forward; 2 backward
WeakSinkOption = 1              # 1 pass; 2 stop
WeakSourceOption = 1            # 1 pass; 2 stop
ReferenceTimeOption = 1         # 1 time value; 2 stress period, time step, relative offset
StopOption = 2                  # 1 stop with simulation; 2 extend if steady state; 3 specify time
ParticleGenerationOption = 2    # 1 automatic; 2 external file
TimePointOption = 1             # 1 none; 2 number at fixed intervals; 3 array
BudgetOutputOption = 3          # 1 none; 2 summary; 3 list of cells; 4 trace mode
ZoneArrayOption = 1             # 1 none; 2 read zone array(s)
RetardationOption = 1           # 1 none; 2 read array(s)
AdvectiveObservationsOption = 1 # 1 none; 2 saved for all time pts; 3 saved for final time pt

options = [SimulationType, TrackingDirection, WeakSinkOption, WeakSourceOption, ReferenceTimeOption,
           StopOption, ParticleGenerationOption, TimePointOption, BudgetOutputOption, ZoneArrayOption,
           RetardationOption, AdvectiveObservationsOption]

for group in zone_df:
    print('   Write and run MODPATH for {}'.format(group))
    
    mpname = '{}_{}_{}'.format(fpmg.name, weight_label, group)
    mpnf = '{}_{}_{}.mpnam'.format(fpmg.name, weight_label, group)
    mplf = '{}_{}_{}.mplst'.format(fpmg.name, weight_label, group)

    mp = fp.modpath.Modpath(modelname=mpname, modflowmodel=fpmg, dis_file=dis.file_name[0], exe_name=mp_exe_name,
                            model_ws=model_ws, simfile_ext='mpsim', dis_unit=dis.unit_number[0])

    mpsim = fp.modpath.ModpathSim(mp, mp_name_file=mpnf, 
                                  mp_list_file=mplf, 
                                  option_flags=options,
                                  ref_time=ref_time,
                                  cell_bd_ct=num_cells2budchk,
                                  bud_loc=bud_chk_dict[group].loc[:, ('Grid', 'Layer', 'Row', 'Column')].values.tolist(),
                                  extension='mpsim')

    mpbas = fp.modpath.ModpathBas(mp, hnoflo=hnoflo, hdry=hdry, 
                                  def_face_ct=1, bud_label=['RECHARGE'], def_iface=[6], 
                                  laytyp=upw.laytyp.get_value(), ibound=ibound, 
                                  prsity=por, prsityCB=0.20)    

    mp.write_input()
    success, msg = mp.run_model(silent=False, report=True)

    # delete starting locations to save space--this information is now in the endpoint file
#     if success:
#         dst_pth = os.path.join(model_ws, '{}_{}_{}.loc'.format(fpmg.name, weight_label, group))
#         os.remove(dst_pth)

print ('   ... done')

## Check budget
Compare the calculated composite budget in the notebook to the cell budget output from MODPATH. We assume MODPATH computes the budget items correctly.  If the notebook and MODPATH don't match, it is likely that the model uses a stress package that has not been tested with the notebook and that applies stress in a unique way.  

In [None]:
for group in zone_df:
    
    print('Checking budget for {}'.format(group))
    budchk_df = bud_chk_dict[group]
    
    with open(os.path.join(model_ws, '{}_{}_{}.mplst'.format(fpmg.name, weight_label, group)), 'r') as f:
        lines = f.readlines()

    for n, item in enumerate(lines):
        if (('Processing Time Step' in item) & (str(kstpkper[1] + 1) in item[34:40])):
            begin = n
            end = len(lines)
        elif (('Processing Time Step' in item) & (str(kstpkper[1]) in item[34:40])):
            end = n

    fl = []
    re = []
    for i in lines[begin:end]:
        if 'FLOW IN' in i:
            fl.append(np.float32(i[33:52]))
        if  'QZ2' in i:
            re.append(np.float32(i[48:62]))

    for i in range(num_cells2budchk):
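        # use this group's budget-check cells, not the leftover budchk array from the last group
        print('budget comparison for zero-based cell {}'.format(budchk_df.loc[i, ['Layer', 'Row', 'Column']].values))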

        print('   total in from notebook = {:10.4f}'.format(budchk_df.loc[i, 'flow_sum']))
        print('   total in from modflow  = {:10.4f}'.format(fl[i+1]))
        print('   net notebook upper boundary flow = {:10.4f}'.format(budchk_df.loc[i, 'qz2']))
        print('   net modflow  upper boundary flow = {:10.4f}'.format(re[i+1]))
        print()