# Run and process the prior Monte Carlo and pick a "truth" realization

A great advantage of exploring a synthetic model is that we can enforce a "truth" and then evaluate how well our various attempts to estimate it perform. One way to do this is to run a Monte Carlo ensemble of parameter realizations and then choose one of them to represent the "truth". That is what we do in this notebook.

In [None]:
import os
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.rcParams['font.size']=12
import flopy
import pyemu
%matplotlib inline

## SUPER IMPORTANT: SET HOW MANY PARALLEL WORKERS TO USE

In [None]:
num_workers = 20

### set the `t_d` or "template directory" variable to point at the template folder and read in the PEST control file

In [None]:
t_d = "template"
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))

In [None]:
pst.npar_adj

Load the previously generated parameter ensemble and inspect (again!)...

In [None]:
pe = pyemu.ParameterEnsemble.from_binary(pst=pst,filename=os.path.join(t_d,"prior.jcb"))
#pe.loc[:,should_fix] = 1.0
pe.to_csv(os.path.join(t_d,"sweep_in.csv"))
pe.shape

In [None]:
pe.loc[:,"hk031"]

In [None]:
pe.loc[:,"hk031"].plot.hist(bins=50)

Look! `hk` is log-normal-ish.
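
A small sketch of why the histogram looks this way, assuming the parameter is sampled as normal in log10 space (worth confirming via the `partrans` column of `pst.parameter_data`). The mean and standard deviation below are made-up values, not taken from `prior.jcb`, and `skewness` is a helper written only for this illustration.

```python
import numpy as np

# Hypothetical stand-in for one column of the parameter ensemble.
rng = np.random.default_rng(0)
log10_hk = rng.normal(loc=0.5, scale=0.4, size=5000)  # normal in log10 space (assumed values)
hk = 10.0 ** log10_hk  # back-transform to native space


def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(((x - x.mean()) / x.std()) ** 3))


native_skew = skewness(hk)          # long right tail expected
log_skew = skewness(np.log10(hk))   # roughly symmetric expected
print(f"skewness in native space: {native_skew:.2f}")
print(f"skewness in log10 space:  {log_skew:.2f}")
```

A strongly positive skewness in native space that nearly vanishes after taking `log10` is the signature of a log-normal quantity.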

Let's run the first realization through the PEST interface as a test:

In [None]:
# replace the par vals with the first row in the par ensemble
pst.parameter_data.loc[pe.columns,"parval1"] = pe.iloc[0,:]
pst.control_data.noptmax = 0
pst.write(os.path.join(t_d,"test.pst"))
pyemu.os_utils.run("pestpp-ies test.pst",cwd=t_d)
res = pyemu.pst_utils.read_resfile(os.path.join(t_d,"test.base.rei"))
res.loc[pst.nnz_obs_names,:]

### run the prior ensemble in parallel locally
This takes advantage of the program `pestpp-swp`, which runs the model once for each realization in a parameter ensemble. By default, `pestpp-swp` reads the ensemble from a file called `sweep_in.csv`, which in this case we wrote just above.
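
Conceptually, a sweep is just "one forward run per row of the ensemble, outputs collected into a table". The sketch below illustrates that idea only; `toy_model`, the parameter names, and the values are hypothetical stand-ins, and nothing here reflects how `pestpp-swp` is implemented or parallelized.

```python
import pandas as pd

# Hypothetical two-parameter ensemble: one row per realization.
par_ens = pd.DataFrame(
    {"k": [1.0, 2.0, 4.0], "rch": [0.1, 0.2, 0.1]},
    index=["r0", "r1", "r2"],
)


def toy_model(k, rch):
    """Stand-in forward model (not MODFLOW): returns two made-up outputs."""
    return {"head": rch / k * 100.0, "flux": rch * 50.0}


# Run the model once per realization and collect the outputs by realization name.
rows = {}
for real_name, pars in par_ens.iterrows():
    rows[real_name] = toy_model(pars["k"], pars["rch"])

obs_ens = pd.DataFrame.from_dict(rows, orient="index")
print(obs_ens)
```

The resulting table has the same row index as the input ensemble, which is what later lets us look up the parameters and the outputs of a single realization by the same label.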

In [None]:
m_d = "master_prior_sweep"
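# note: recent pyemu releases rename this helper to start_workers (with num_workers/worker_root keywords)
pyemu.os_utils.start_slaves(t_d,"pestpp-swp","freyberg.pst",num_slaves=num_workers,slave_root=".",master_dir=m_d)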

### Load the output ensemble and plot a few things



In [None]:
obs_df = pd.read_csv(os.path.join(m_d,"sweep_out.csv"),index_col=0)
print('number of realizations in the ensemble before dropping: ' + str(obs_df.shape[0]))

### drop any failed runs 

In [None]:
obs_df = obs_df.loc[obs_df.failed_flag==0,:]
print('number of realizations in the ensemble **after** dropping: ' + str(obs_df.shape[0]))

In [None]:
obs_df.iloc[0,:]

### confirm which quantities were identified as forecasts

In [None]:
fnames = pst.pestpp_options["forecasts"].split(',')
fnames

### now we can plot the distributions of each forecast

In [None]:
for forecast in fnames:
    plt.figure()
    ax = obs_df.loc[:,forecast].plot(kind="hist")
    ax.set_title(forecast)

We see that under scenario conditions, many more realizations of the flow to the aquifer in the headwaters are positive (as expected). Let's difference these two:

In [None]:
sfnames = [f for f in fnames if "1980" in f or "_001" in f]
hfnames = [f for f in fnames if "1979" in f or "_000" in f]
diff = obs_df.loc[:,hfnames].values - obs_df.loc[:,sfnames].values
diff = pd.DataFrame(diff,columns=sfnames)
diff.hist(figsize=(10,10))

We now see that the most extreme scenario yields a large decrease in flow from the aquifer to the headwaters (the most negative value).  
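
Note that subtracting `.values` pairs the columns by position, so the differencing relies on `hfnames` and `sfnames` listing the same quantities in the same order. A minimal sketch of a more defensive version is below; the column names and numbers are hypothetical, chosen only to match the `1979`/`1980` and `_000`/`_001` naming patterns used in the filter.

```python
import pandas as pd

# Hypothetical forecast outputs for three realizations (made-up values).
obs_toy = pd.DataFrame({
    "fa_hw_19791230": [-100.0, -50.0, -200.0],
    "fa_hw_19801229": [20.0, -10.0, -150.0],
    "hds_00_000": [35.0, 36.0, 34.0],
    "hds_00_001": [33.5, 35.0, 33.0],
})

# Sort both lists so that matching quantities line up by position.
hist_cols = sorted(c for c in obs_toy.columns if "1979" in c or "_000" in c)
scen_cols = sorted(c for c in obs_toy.columns if "1980" in c or "_001" in c)
assert len(hist_cols) == len(scen_cols), "every historic column needs a scenario partner"

# Historic minus scenario, labelled with both names so the pairing is visible.
diff_toy = pd.DataFrame(
    obs_toy[hist_cols].to_numpy() - obs_toy[scen_cols].to_numpy(),
    index=obs_toy.index,
    columns=[f"{h} - {s}" for h, s in zip(hist_cols, scen_cols)],
)
print(diff_toy)
```

Labelling each difference with both source names makes a mismatched pairing easy to spot before plotting.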

### Many modeling analyses could stop right here to avoid the ill-effects of history matching...

### setting the "truth"

We just need to replace the observed values (`obsval`) in the control file with the outputs from one of the realizations in `obs_df` that we take to be the ``truth``. In this way, we have values for the nonzero-weighted observations used in history matching, and also the ``truth`` values for comparing how we are doing with other unobserved quantities. I'm going to pick a realization that yields an "average" variability of the observed groundwater levels. In practice, the cells below sort the realizations on the `fa_tw_19791230` output and take one by rank:

In [None]:
fnames

In [None]:
sorted_vals = obs_df.loc[:,"fa_tw_19791230"].sort_values()
sorted_vals

In [None]:
idx = sorted_vals.index[100]
idx  # candidate truth realization index
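
Where rank 100 falls in the sorted list depends on how many realizations survived the sweep. A minimal sketch of choosing the rank from the ensemble size instead is shown below, using a made-up five-member series; the values and labels are hypothetical.

```python
import pandas as pd

# Hypothetical forecast values for five realizations.
vals = pd.Series([3.2, -1.0, 0.5, 7.1, 2.2], index=["r0", "r1", "r2", "r3", "r4"])

sorted_toy = vals.sort_values()   # ascending: r1, r2, r4, r0, r3
mid_rank = len(sorted_toy) // 2   # middle position for this ensemble size
idx_mid = sorted_toy.index[mid_rank]
print(idx_mid, sorted_toy.loc[idx_mid])
```

Replacing `mid_rank` with another fraction of `len(sorted_toy)` selects a more extreme "truth" in the same way.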

What do the outputs corresponding to available observations and forecasts for this realization look like?

In [None]:
obs_df.loc[idx,pst.nnz_obs_names]

Let's see how our selected truth does with the sw/gw forecasts:

In [None]:
obs_df.loc[idx,fnames]

### Weights!!!
Assign some initial weights. It is customary to add noise to the observed values. We will use classic Gaussian noise with zero mean and a standard deviation of 1 over the weight (which we will now specify). We will say more about noise and its sources shortly...

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
obs = pst.observation_data
obs.loc[:,"obsval"] = obs_df.loc[idx,pst.obs_names]
obs.loc[obs.obgnme=="calhead","weight"] = 5.0  # this corresponds to an (expected) noise standard deviation of 20 cm...
obs.loc[obs.obgnme=="calflux","weight"] = 0.01  # corresponding to an (expected) noise standard deviation of 100 m^3/d...
obs.loc[pst.nnz_obs_names,"weight"]

Here we just draw samples from a standard normal distribution (mean=0 and std=1).
The argument indicates how many samples we want; we choose `pst.nnz_obs`, which is
the number of nonzero-weighted observations in the PST file. Dividing each draw by the corresponding weight scales it to the intended noise standard deviation.

In [None]:
np.random.seed(seed=0)
snd = np.random.randn(pst.nnz_obs)
noise = snd * 1./obs.loc[pst.nnz_obs_names,"weight"]
pst.observation_data.loc[noise.index,"obsval"] += noise
noise
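
As a quick check of the scaling, the sketch below draws many standard normal samples, divides by the weights, and compares the empirical standard deviations with `1/weight`. The observation names are hypothetical; the weights are the two values assigned above.

```python
import numpy as np
import pandas as pd

# Weights as assigned above; the observation names are made up.
weights = pd.Series({"head_1": 5.0, "head_2": 5.0, "flux_1": 0.01})

rng = np.random.default_rng(0)
n_draws = 20000
snd = rng.standard_normal((n_draws, len(weights)))  # mean 0, std 1

# Scale each column by 1/weight to get noise with std = 1/weight.
noise_toy = snd / weights.to_numpy()
emp_std = pd.Series(noise_toy.std(axis=0), index=weights.index)

print(pd.DataFrame({"expected_std": 1.0 / weights, "empirical_std": emp_std}))
```

The head noise comes out near 0.2 (20 cm) and the flux noise near 100, matching the comments on the weight assignments.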

Then we write this out to a new file and run `pestpp-ies` to see how the objective function looks

In [None]:
pst.write(os.path.join(t_d,"freyberg.pst"))
pyemu.os_utils.run("pestpp-ies freyberg.pst",cwd=t_d)

Now we can read in the results and make some figures showing residuals and the balance of the objective function

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
print(pst.phi)
plt.figure()
pst.plot(kind='phi_pie');
print('Here are the non-zero weighted observation contributions to phi')

figs = pst.plot(kind="1to1");
plt.show()
# leave the residual table as the last expression so the notebook displays it
pst.res.loc[pst.nnz_obs_names,:]

### run the "truth" model once and inspect...

In [None]:
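par_df = pd.read_csv(os.path.join(m_d,"sweep_in.csv"),index_col=0)
# use the parameter values of the chosen "truth" realization
pst.parameter_data.loc[:,"parval1"] = par_df.loc[idx,pst.par_names]
pst.control_data.noptmax = 0  # single forward run, as described below
pst.write(os.path.join(m_d,"test.pst"))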

We will run this with `noptmax=0` to perform a single run.

In [None]:
pyemu.os_utils.run("pestpp-ies test.pst",cwd=m_d)
pst = pyemu.Pst(os.path.join(m_d,"test.pst"))
print(pst.phi)
pst.res.loc[pst.nnz_obs_names,:]
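
A toy check of what this run should show, assuming the PEST convention that a residual is the observed value minus the simulated value and that phi is the sum of squared weighted residuals. The "truth" outputs below are made-up numbers; the weights are the two values used above.

```python
import numpy as np

weights = np.array([5.0, 5.0, 0.01])       # two heads and one flux
truth = np.array([34.2, 31.7, -1500.0])    # hypothetical model outputs at the truth

rng = np.random.default_rng(1)
noise_toy = rng.standard_normal(3) / weights  # std = 1/weight

observed = truth + noise_toy   # what goes into obsval
simulated = truth.copy()       # a run with the true parameters reproduces the truth

residual = observed - simulated                  # equals the noise
phi = float(np.sum((weights * residual) ** 2))   # sum of squared weighted residuals

print(residual)
print(phi)
```

Because each weighted residual is a standard normal draw, phi for such a run should be on the order of the number of nonzero-weighted observations.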

The residuals should be exactly the noise values from above. Let's load the model (which was just run using the true parameters) and check some things:

In [None]:
m = flopy.modflow.Modflow.load("freyberg.nam",model_ws=m_d)

In [None]:
a = m.rch.rech[0].array
a = np.ma.masked_where(m.bas6.ibound[0].array==0,a)  # mask inactive cells
print(a.min(),a.max())
c = plt.imshow(a)
plt.colorbar()

In [None]:
lst = flopy.utils.MfListBudget(os.path.join(m_d,"freyberg.list"))
df = lst.get_dataframes(diff=True)[0]
ax = df.plot(kind="bar",figsize=(10,10), grid=True)
a = ax.set_xticklabels(["historic","scenario"],rotation=90)

### see how our existing observation ensemble compares to the truth

forecasts:

In [None]:
obs = pst.observation_data
plt.figure()
for forecast in fnames:
    ax = plt.subplot(111)
    obs_df.loc[:,forecast].hist(ax=ax,color="0.5",alpha=0.5)
    ax.plot([obs.loc[forecast,"obsval"],obs.loc[forecast,"obsval"]],ax.get_ylim(),"r")
    ax.set_title(forecast)
    plt.show()

observations:

In [None]:
for oname in pst.nnz_obs_names:
    ax = plt.subplot(111)
    obs_df.loc[:,oname].hist(ax=ax,color="0.5",alpha=0.5)
    ax.plot([obs.loc[oname,"obsval"],obs.loc[oname,"obsval"]],ax.get_ylim(),"r")
    ax.set_title(oname)
    plt.show()