# pyEMU basics

In this exercise, we will explore some of pyemu's capabilities for working with the PEST file formats (.pst, .jco/.jcb, .unc, .cov, .mat, etc.), as well as for generating PEST interface elements.

In [None]:
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import pyemu

In [None]:
pyemu.__path__  # check that we're pointing to the provided snapshot of pyemu (and flopy) repos

We will use some pre-cooked files in this notebook:

In [None]:
f_d = "handling_files"
os.listdir(f_d)

### Control files and the `Pst` class

pyEMU encapsulates the PEST control file in the `Pst` class:

In [None]:
pst = pyemu.Pst(os.path.join(f_d,"freyberg_pp.pst"))

In [None]:
pst

The "*" sections of the control file are stored as attributes of the `Pst` instance (the PEST variable names are used for consistency)

In [None]:
pst.parameter_data.head()

In [None]:
pst.observation_data.head()

Control data is handled by a special class that tries to guard against invalid entries:

In [None]:
pst.control_data.noptmax = 0

PEST++ options are stored in a dict:

In [None]:
pst.pestpp_options

### Writing a control file

In [None]:
pst.write(os.path.join(f_d,"test.pst"))

A preview of things to come...

In [None]:
pst.write(os.path.join(f_d,"test.pst"),version=2)

### Constructing a control file from template and instruction files

### DIY: get a new control file from a template file (or files) and an instruction file (or files).  You can use the files in the `f_d` directory or you can write your own.  Change the parameter bounds and observation weights, then write the control file

In [None]:
# your code here
#print(os.listdir(f_d))
# a template file
tpl_files = [os.path.join(f_d,"WEL_0001.dat.temp.tpl")]
# and the corresponding model input file
in_files = [os.path.join(f_d,"WEL_0001.dat")]

# an instruction file
ins_files = [os.path.join(f_d,"freyberg.travel.ins")]
# and the corresponding model output file
out_files = [os.path.join(f_d,"freyberg.travel")]


pst_new = pyemu.Pst.from_io_files(tpl_files,in_files,
                                  ins_files,out_files)
# adjust upper and lower parameter bounds
pst_new.parameter_data.loc[:,"parubnd"] = 100.0
pst_new.parameter_data.loc[:,"parlbnd"] = 0.1
# set all weights to 0.0
pst_new.observation_data.loc[:,"weight"] = 0.0


pst_new.write("temp.pst")
pst_new.par_names
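To see where those parameter names come from, here is a minimal sketch of reading names out of a PEST template file. It assumes the usual layout: a first line of the form `ptf <marker>`, with each parameter name sitting between a pair of markers. The function name and the template text are made up for illustration (they are not part of pyemu or of the files in `f_d`).

```python
import re

def parse_tpl_parameter_names(tpl_text):
    """Return the unique parameter names found in PEST template text (hypothetical helper)."""
    lines = tpl_text.strip().splitlines()
    header = lines[0].split()
    # the first line is expected to be "ptf <marker>", with a single-character marker
    if len(header) != 2 or header[0].lower() != "ptf" or len(header[1]) != 1:
        raise ValueError("not a PEST template file")
    marker = re.escape(header[1])
    pattern = marker + r"([^" + marker + r"]+)" + marker
    names = []
    for line in lines[1:]:
        # each parameter space looks like "~   name   ~"
        for raw in re.findall(pattern, line):
            name = raw.strip().lower()
            if name not in names:
                names.append(name)
    return names

# hypothetical template content, not one of the files in f_d
example_tpl = """ptf ~
 1  10  5  ~   wel_a   ~
 1  12  7  ~   wel_b   ~
 1  14  9  ~   wel_a   ~
"""
tpl_par_names = parse_tpl_parameter_names(example_tpl)
print(tpl_par_names)
```

A parameter that appears several times in the template is still a single parameter, which is why the sketch de-duplicates the names.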

# Matrices

pyEMU implements a labeled matrix class and overloads the standard operators to make linear algebra easier.  Let's start with covariance matrices:

In [None]:
cov = pyemu.Cov.from_parameter_data(pst)
cov
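The prior above is built from the parameter bounds in the control file. A rough sketch of the idea, assuming the bounds span a fixed number of standard deviations (`sigma_range`, taken here to be 4) and that log-transformed parameters are handled in log10 space. The helper name and the bound values are illustrative only, so check the pyemu docs for the exact behavior in your version.

```python
import math

def prior_std_from_bounds(lower, upper, log_transformed=True, sigma_range=4.0):
    """Standard deviation implied by bounds that span `sigma_range` standard deviations."""
    if log_transformed:
        # log-transformed parameters are treated in log10 space
        lower, upper = math.log10(lower), math.log10(upper)
    return (upper - lower) / sigma_range

# hypothetical bounds: two orders of magnitude for a log-transformed parameter
std = prior_std_from_bounds(0.1, 10.0)   # about 0.5 in log10 units
variance = std ** 2                      # the corresponding diagonal entry, about 0.25
print(std, variance)
```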

In [None]:
cov.row_names[:5]

In [None]:
cov.col_names[:5]

In [None]:
cov.isdiagonal

The `Cov` class has some nice built-in properties and methods:

In [None]:
cov.inv

In [None]:
cov.s #singular values

In [None]:
cov.v #right singular vectors

The actual array of values in the `.x` attribute:

In [None]:
cov.x[0:5]

In [None]:
type(cov),type(cov.x)

Why is this 1-D?

In [None]:
c = plt.imshow(cov.as_2d)
plt.colorbar(c)
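A small numpy-only sketch of why compact storage is attractive for a diagonal matrix: only the diagonal entries need to be kept, and the inverse is just an element-wise reciprocal. The values are made up, and the column-vector layout is an assumption about how a diagonal matrix might be stored, not a statement about pyemu internals.

```python
import numpy as np

variances = np.array([0.25, 0.25, 1.0])   # made-up diagonal entries

x_diag = variances.reshape(-1, 1)   # compact storage: one column of diagonal entries
x_full = np.diag(variances)         # dense equivalent: 3 x 3 with zeros off the diagonal

# inverting a diagonal matrix is an element-wise reciprocal
inv_diag = 1.0 / x_diag
inv_full = np.linalg.inv(x_full)

print(x_diag.shape, x_full.shape)
print(np.allclose(np.diag(inv_diag.ravel()), inv_full))
```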

In [None]:
post_cov = pyemu.Cov.from_ascii(os.path.join(f_d,"freyberg_pp.post.cov"))
post_cov.isdiagonal

In [None]:
c = plt.imshow(post_cov.x)
plt.colorbar(c)

### DIY: plot the singular spectrum of the posterior covariance matrix.  Then convert the posterior covariance matrix to correlation matrix, mask the diagonal and plot

In [None]:
#hint: Cov.to_pearson()
plt.plot(post_cov.s.x)
plt.show()
post_cov.s.isdiagonal
cc = post_cov.to_pearson()
x = cc.x
indices = np.arange(x.shape[0])
x[indices,indices] = np.nan
cb = plt.imshow(cc.x)
plt.colorbar(cb)
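For reference, converting a covariance matrix to a correlation matrix is just a scaling by the standard deviations. Here is a minimal numpy sketch with a made-up 2 x 2 matrix (the function name is illustrative):

```python
import numpy as np

def cov_to_corr(cov_arr):
    """Scale a covariance array by the outer product of its standard deviations."""
    std = np.sqrt(np.diag(cov_arr))
    return cov_arr / np.outer(std, std)

toy_cov = np.array([[4.0, 1.2],
                    [1.2, 1.0]])
toy_corr = cov_to_corr(toy_cov)   # off-diagonal is 1.2 / (2.0 * 1.0)

# mask the diagonal (always 1.0) so the off-diagonals control the color scale
masked = toy_corr.copy()
np.fill_diagonal(masked, np.nan)
print(toy_corr)
```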

### Residual handling

The `Pst` class tries to load a residuals file in the constructor.  If that file is found, you can access some pretty cool stuff (you can also pass the name of a residual file to the `Pst` constructor...).  The `res` attribute is stored as a `pd.DataFrame`

In [None]:
pst.phi

In [None]:
pst.phi_components

In [None]:
pst.res.head()
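As a reminder of what those numbers mean, the objective function is the sum of squared weighted residuals, and the components are the same sum taken group by group. A toy numpy sketch with made-up residuals, weights and group names:

```python
import numpy as np

# made-up residuals, weights and group labels for four observations
residuals = np.array([0.5, -1.0, 2.0, 0.3])
weights = np.array([2.0, 1.0, 0.0, 10.0])
groups = np.array(["head", "head", "flux", "flux"])

# phi is the sum of squared weighted residuals; zero-weighted obs drop out
phi = float(np.sum((weights * residuals) ** 2))

# the same sum, split by observation group
phi_components = {
    str(g): float(np.sum((weights[groups == g] * residuals[groups == g]) ** 2))
    for g in np.unique(groups)
}
print(phi, phi_components)
```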

### DIY: plot a bar chart of residuals for non-zero weighted obs

In [None]:
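# bar chart of the residuals for the non-zero weighted observations
pst.res.loc[pst.nnz_obs_names,"residual"].plot(kind="bar")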

### The Jacobian matrix

A derived `pyemu.Matrix` type...

In [None]:
jco = pyemu.Jco.from_binary(os.path.join(f_d,"freyberg_pp.jcb"))

In [None]:
df = jco.to_dataframe()
df.head()

### Some sweet plotting sugar:

In [None]:
pst.plot(kind="phi_pie")

In [None]:
pst.plot(kind='prior')

In [None]:
pst.plot(kind="1to1")

In [None]:
pst.write_par_summary_table(filename="par.tex")

### DIY: Adjust the weights so that both non-zero obs groups contribute equally to the objective function (and plot!) - no model runs required...

In [None]:
# hint: pst.adjust_weights
obsgrp_dict = {}
for nnz_obsgrp in pst.nnz_obs_groups:
    obsgrp_dict[nnz_obsgrp] = 500.0
pst.adjust_weights(obsgrp_dict=obsgrp_dict)
pst.plot(kind="phi_pie")
pst.write("temp_reweight.pst")
pst.observation_data.loc[pst.nnz_obs_names,"weight"] = 10.0
pst.plot(kind="phi_pie")
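The arithmetic behind this kind of rebalancing is simple: since phi scales with the square of the weights, multiplying every weight in a group by sqrt(target / current) moves that group's contribution to the target. The sketch below uses made-up numbers and is my reading of the idea, not a copy of the `adjust_weights` implementation.

```python
import numpy as np

# made-up residuals, weights and group labels
residuals = np.array([0.5, -1.0, 2.0, 0.3])
weights = np.array([2.0, 1.0, 3.0, 10.0])
groups = np.array(["head", "head", "flux", "flux"])
target_phi = {"head": 500.0, "flux": 500.0}

new_weights = weights.copy()
for grp, target in target_phi.items():
    in_grp = groups == grp
    current = np.sum((weights[in_grp] * residuals[in_grp]) ** 2)
    # one multiplier per group; this breaks down if the group's current phi is zero
    new_weights[in_grp] *= np.sqrt(target / current)

new_phi = {
    g: float(np.sum((new_weights[groups == g] * residuals[groups == g]) ** 2))
    for g in target_phi
}
print(new_phi)
```

No model runs are needed because the residuals do not change; only the weights do.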

### DIY: form the normal matrix (XtQX) with non-zero weight obs and plot

Q is the inverse of the obs noise cov matrix

In [None]:
# hint: Cov.from_observation_data()
obs_cov = pyemu.Cov.from_observation_data(pst)
xtqx = jco.T * obs_cov.inv * jco
plt.imshow(xtqx.x)

### now invert XtQX:

In [None]:
xtqx.inv
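A tiny numpy-only version of the same calculation, with a made-up 3 x 2 Jacobian and a diagonal Q holding squared weights. It also shows that zero-weighted rows contribute nothing, so dropping them gives the same normal matrix.

```python
import numpy as np

# made-up Jacobian: 3 observations (rows) by 2 parameters (columns)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
weights = np.array([1.0, 2.0, 0.0])

# Q is the inverse of a diagonal noise covariance, i.e. squared weights
Q = np.diag(weights ** 2)
xtqx_toy = X.T @ Q @ X

# same result using only the non-zero weighted rows
keep = weights > 0
xtqx_nnz = X[keep].T @ np.diag(weights[keep] ** 2) @ X[keep]

xtqx_inv = np.linalg.inv(xtqx_toy)
print(xtqx_toy)
```

This toy matrix is invertible because there are as many informative observations as parameters. With more adjustable parameters than non-zero weighted observations, XtQX is rank deficient and a plain inverse is not available.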

### Geostats in pyemu

These are pure Python, so they aren't super fast...

In [None]:
v_contribution = 1.0 # variance
v_range = 1000
exp_vario = pyemu.geostats.ExpVario(v_contribution,v_range)
exp_vario.plot()
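For intuition, here is a minimal sketch of an exponential model, assuming the covariance has the form C(h) = c * exp(-h / a), with the variogram as its complement, c - C(h). Some packages scale the range differently (for example with a factor of 3), so treat the exact form as an assumption and check the pyemu source for what `ExpVario` does.

```python
import math

def exp_covariance(h, contribution, a):
    """Covariance at separation distance h (assumed form: c * exp(-h / a))."""
    return contribution * math.exp(-h / a)

def exp_variogram(h, contribution, a):
    """Variogram value at distance h: the sill minus the covariance."""
    return contribution - exp_covariance(h, contribution, a)

for h in (0.0, 500.0, 1000.0, 3000.0):
    print(h, exp_covariance(h, 1.0, 1000.0), exp_variogram(h, 1.0, 1000.0))
```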

Now let's build a covariance matrix from x-y points.  We can generate these randomly or just use the pilot points template file:

In [None]:
df = pyemu.pp_utils.pp_tpl_to_dataframe(os.path.join(f_d,"hkpp.dat.tpl"))
df.head()

In [None]:
plt.imshow(pyemu.geostats.ExpVario(0.1,5000).covariance_matrix(df.x,df.y,df.name).x)

Here we will just use a 1-D sequence to get a cov matrix (think "time series")

In [None]:
times = np.arange(0,365,1)
y = np.ones_like(times)
names = ["t_"+str(t) for t in times]

In [None]:
v_contribution = 1.0 # variance
v_range = 5 # days
exp_vario = pyemu.geostats.ExpVario(v_contribution,v_range)
exp_vario.plot()

In [None]:
cov = exp_vario.covariance_matrix(times,y,names)
plt.imshow(cov.x)
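The banded look of that matrix follows directly from the distances between the points. A numpy-only sketch for a short 1-D sequence, again assuming a covariance of the form c * exp(-h / a) (variable names are illustrative):

```python
import numpy as np

t = np.arange(0, 10)      # ten "days"
contribution = 1.0        # variance
a = 5.0                   # range parameter, in days

# pairwise separation distances, then the covariance for each pair
h = np.abs(t[:, None] - t[None, :])
toy_cov_1d = contribution * np.exp(-h / a)

# a valid covariance matrix is symmetric with positive eigenvalues
print(np.allclose(toy_cov_1d, toy_cov_1d.T), np.linalg.eigvalsh(toy_cov_1d).min() > 0)
```

Entries decay away from the diagonal, so values close together in time are strongly correlated and values far apart are nearly independent.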

### Ensembles

The pyemu ensemble classes inherit from the pandas `DataFrame`, so all that nice stuff is included for free

In [None]:
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst=pst,cov=pyemu.Cov.from_parameter_data(pst),num_reals=1000)
pe.head()

Check your understanding: where did the first (mean vector) and second (covariance matrix) moments come from in that ensemble generation?  

In [None]:
pe.iloc[:,0].hist()


In [None]:
pe.iloc[:,0].apply(np.log10).hist()
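The second histogram looks Gaussian because log-transformed parameters are drawn in log10 space and then back-transformed. A numpy-only sketch for a single parameter, with a made-up initial value and bounds, and assuming the bounds span four standard deviations:

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up initial value and bounds for one log-transformed parameter
parval1, lower, upper = 5.0, 0.5, 50.0

mean_log = np.log10(parval1)                          # mean: the initial value, in log10 space
std_log = (np.log10(upper) - np.log10(lower)) / 4.0   # assumed: bounds span 4 standard deviations

draws_log = rng.normal(mean_log, std_log, size=10000)  # Gaussian in log10 space
draws = 10.0 ** draws_log                              # skewed (log-normal) in native space

print(draws_log.mean(), draws_log.std(), draws.min() > 0)
```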

So that was really easy...but what if we want to express spatial/temporal correlation in the prior?  That means we need to form a mixed block-diagonal/diagonal cov matrix and then draw from it. In this case, we have spatially correlated pilot point parameters:

In [None]:
df = pyemu.pp_utils.pp_tpl_to_dataframe(os.path.join(f_d,"hkpp.dat.tpl"))
df.head()

Let's build a combined, block diagonal matrix:

In [None]:
ev = pyemu.geostats.ExpVario(1.0,5000)
cov = pyemu.helpers.geostatistical_prior_builder(pst=pst,struct_dict={ev:df})
x = cov.x.copy()
x[x<1.0e-3] = np.nan
plt.imshow(x)

This is the same call as the earlier draw (with fewer realizations), except here the `cov` includes some off-diagonals for the pilot points

In [None]:
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst=pst,cov=cov,num_reals=100)
pe.head()
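One common way to draw correlated Gaussian realizations is to factor the covariance matrix and multiply the factor by independent standard normal draws. The sketch below uses a Cholesky factor on a made-up 4-point covariance. pyemu may use a different factorization internally, so this only illustrates the technique.

```python
import numpy as np

rng = np.random.default_rng(42)

# made-up 1-D coordinates and an exponential covariance (assumed form: exp(-h / a))
coords = np.array([0.0, 1000.0, 2000.0, 8000.0])
h = np.abs(coords[:, None] - coords[None, :])
toy_cov = np.exp(-h / 5000.0)

# factor the covariance: toy_cov = chol @ chol.T
chol = np.linalg.cholesky(toy_cov)

# correlate independent standard normal draws; rows of `reals` are realizations
z = rng.standard_normal((4, 20000))
reals = (chol @ z).T

# the sample covariance of the realizations should be close to toy_cov
sample_cov = np.cov(reals, rowvar=False)
print(np.abs(sample_cov - toy_cov).max())
```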

Let's plot the values of the pilot points in space to see their correlation (or lack thereof)

In [None]:
df.index = df.parnme
df.loc[:,"parval1"] = pe.loc[0,df.parnme]
fig = plt.figure(figsize=(10,10))
ax = plt.subplot(111,aspect="equal")
plt.scatter(df.x,df.y,c=df.parval1,s=500)

You can "kind of" see that correlation, but if we krige these values to the model grid, we can really see it...

In [None]:
df.loc[:,"parval1"] = pe.loc[0,df.parnme]
df.index = np.arange(df.shape[0])
arr = pyemu.geostats.fac2real(df,factors_file=os.path.join(f_d,"hkpp.dat.fac"),out_file=None)

In [None]:
plt.imshow(np.log10(arr))

### DIY: experiment with changing the variogram range and seeing how it changes the resulting parameter fields

FORESHADOWING: we can also form an empirical covariance matrix from this par ensemble!

In [None]:
emp_cov = pe.covariance_matrix()
x = emp_cov.x.copy()
x[x<1.0e-3] = np.nan
plt.imshow(x)
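An empirical covariance matrix is the averaged outer product of the deviations from the ensemble mean. A numpy-only sketch with a made-up ensemble of three truly uncorrelated parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# made-up ensemble: 100 realizations (rows) of 3 independent parameters (columns)
ens = rng.standard_normal((100, 3))

# deviations from the ensemble mean, then the unbiased (N - 1) estimate
dev = ens - ens.mean(axis=0)
emp_cov_toy = dev.T @ dev / (ens.shape[0] - 1)

# same thing via numpy's built-in estimator
print(np.allclose(emp_cov_toy, np.cov(ens, rowvar=False)))
print(emp_cov_toy)
```

Even though the parameters here are independent, the off-diagonals are not exactly zero: a finite ensemble always carries some sampling noise.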