# 4. Pastas Project

In this notebook you will learn:
- what a pastas project is.
- how observation series, stresses and models are stored in a pastas project.
- how to do bulk operations on pastas models.
- how to write and read a pastas project from a file.


To do bulk operations on time series models you can use a Pastas Project. A Project is a Python class that contains the observations, stresses and models of multiple locations. The class has convenient methods to store time series data, create models, add stressmodels and summarize the results. This notebook demonstrates what is currently possible.

In [None]:
# First perform the necessary imports
import os
import pandas as pd
import matplotlib.pyplot as plt
import pastas as ps
%matplotlib inline

In [None]:
## Starting a new Project
pr = ps.Project(name='project for notebook')

## Add observations
We can add observation series (oseries) to the project with `add_series`. Set `kind` to 'oseries' to add a series as observations. Observation series are stored in `pr.oseries`, which is a Pandas DataFrame. The measurement TimeSeries is in the 'series' column. Metadata provided to `add_series` is shown in the other columns of `pr.oseries`.

In [None]:
# add the observations in all the files in the data-directory that end with _1.csv
# build the path with os.path.join so it also works outside Windows
datapath = os.path.join('data', 'nb4')
files = [x for x in os.listdir(datapath) if x.endswith('_1.csv')]
for file in files:
    fname = os.path.join(datapath, file)
    series = ps.read_dino(fname)
    pr.add_series(series, kind='oseries')
    
# show the contents of pr.oseries
pr.oseries
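
The file selection in the loop above relies on `str.endswith`. The sketch below shows that filter in isolation on a hypothetical directory listing (the file names are made up for illustration and do not refer to the actual contents of the data directory). It assumes that the `_1` suffix marks the first filter of a well. Because `os.listdir` returns names in arbitrary order, sorting the result keeps the order in which series are added reproducible.

```python
# Hypothetical file names, standing in for the output of os.listdir(datapath).
listing = ['B58C0698_1.csv', 'B58C0698_2.csv', 'B52D0502_1.csv', 'KNMI_Bilt.txt']

# Keep only the files whose name ends with '_1.csv' and sort them by name.
selected = sorted(name for name in listing if name.endswith('_1.csv'))
print(selected)
```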

#### Exercise 1 <a name="ex1"></a>
Create a pastas project. Add the observations in the `data\nb4\ex1` directory to the project. Which measurement series has the lowest filter?

<a href="#ans1">Answer Exercise 1</a>

## Stresses
We can also add stresses. To make `pr.add_recharge` work later, we have to add the precipitation series as kind='prec' and the evaporation series as kind='evap'. Stress series are stored in `pr.stresses`, which is a Pandas DataFrame (just like `pr.oseries`). The stress TimeSeries is in the 'series' column. Metadata provided to `add_series` is shown in the other columns of `pr.stresses`.

In [None]:
# add evaporation
fname = os.path.join(datapath,'KNMI_Bilt.txt')
series = ps.read_knmi(fname, variables='EV24')
pr.add_series(series, kind='evap', settings='evap')

# add precipitation
fname = os.path.join(datapath,'KNMI_Akkrum.txt')
series = ps.read_knmi(fname, variables='RD')
pr.add_series(series, kind='prec', settings='prec')


# show the contents of pr.stresses
pr.stresses
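
The precipitation and evaporation series are kept separate because a recharge stress is built from both of them. As a minimal sketch, assume recharge is the linear combination R = P - f * E, where the factor `f` is a hypothetical evaporation factor. Whether `add_recharge` uses exactly this form depends on the pastas version, and the function name and all numbers below are illustrative only.

```python
def linear_recharge(prec, evap, f=1.0):
    """Return R = P - f * E for paired precipitation and evaporation values."""
    return [p - f * e for p, e in zip(prec, evap)]

# Hypothetical daily values in the same units (not taken from the KNMI files).
prec = [0.0, 5.0, 2.0]
evap = [1.0, 2.0, 0.5]

recharge = linear_recharge(prec, evap, f=0.8)
print(recharge)
```

Negative values simply mean that evaporation exceeded precipitation on that day.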

#### Exercise 2 <a name="ex2"></a>
Add the evaporation from De Bilt and the precipitation from Akkrum to the project you made in Exercise 1.

<a href="#ans2">Answer Exercise 2</a>

## Make models
We can make models and add recharge. Models are added to pr.models, which is a dictionary with the model-names as the keys, and the models as the values. The add_recharge method finds the closest precipitation- and evaporation-series to the measurement location that the model describes.

The file that we used for precipitation did not contain any coordinates, so they default to 0.0. The evaporation file contains coordinates in epsg:4326, while our observation files contain coordinates in epsg:28992. Right now we do not transform coordinates, so finding the closest precipitation and evaporation series will normally give wrong results. Because we have only one precipitation series and one evaporation series, however, this is not a problem here.

In the code-section below, we make three models with recharge and solve them.

In [None]:
for name in pr.oseries.index:
    ml = pr.add_model(name)
    pr.add_recharge(ml)
    ml.solve(report=False)
    
# show the contents of pr.models
pr.models
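
The coordinate caveat described above can be made concrete with a small sketch of a nearest-series lookup based on plain Euclidean distance. This is an assumption about how such a lookup might work, not the actual implementation of `add_recharge`. The function name, station names and coordinates are hypothetical.

```python
import math

def closest_series(target_xy, candidates):
    """Return the name of the candidate with the smallest Euclidean distance."""
    def distance(name):
        x, y = candidates[name]
        return math.hypot(x - target_xy[0], y - target_xy[1])
    return min(candidates, key=distance)

# Hypothetical observation well with coordinates in epsg:28992 (metres).
well = (190000.0, 380000.0)

# Hypothetical stations with coordinates in mixed reference systems.
stations = {
    'station_lonlat': (5.9, 51.5),          # degrees (epsg:4326)
    'station_no_coords': (0.0, 0.0),        # missing coordinates default to 0.0
    'station_rd': (140000.0, 455000.0),     # metres (epsg:28992)
}

print(closest_series(well, stations))
```

Seen from a well in epsg:28992, the station given in degrees looks almost as far away as the one without coordinates, so the lookup picks `station_rd` regardless of where the other stations really are.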

#### Exercise 3 <a name="ex3"></a>
Create models of your pastas project from exercise 2 and solve them.

<a href="#ans3">Answer Exercise 3</a>

## Plot individual results
Take one of the models and plot the decomposition. As we can see, the precipitation series does not contain the first few years of the simulation. The start- and end-dates of the model (tmin and tmax) are solely determined by the observation-series.

In [None]:
name = 'B58C0698_1'
ml = pr.models[name]
ml.plots.decomposition()

## Get some statistics / parameters of all models
Make a table with some statistics of the models.

In [None]:
pr.get_statistics(['evp','aic'])

Make a table with some parameters of the models.

In [None]:
pr.get_parameters(['recharge_A','constant_d','noise_alpha'])

Get the EVP from the models. What do you think?

#### Exercise 4 <a name="ex4"></a>
Get the EVP from the models in your pastas project from exercise 3. What do you think?

<a href="#ans4">Answer Exercise 4</a>

#### Exercise 5 <a name="ex5"></a>

Improve the models in your project from exercise 4 by replacing the precipitation from Akkrum with measurements from IJsselstein (download them from https://www.knmi.nl/nederland-nu/klimatologie/monv/reeksen). Also replace the evaporation from De Bilt with the evaporation from Arcen (download it from https://www.knmi.nl/nederland-nu/klimatologie/daggegevens). Does this improve the EVP? Plot the results of model B52D0502_1. What do you see?

<a href="#ans5">Answer Exercise 5</a>

#### Exercise 6 <a name="ex6"></a>
There are no evaporation measurements at Arcen before 1991. Change the calibration period of the models using tmin in such a way that a more realistic model is created.

<a href="#ans6">Answer Exercise 6</a>

#### Exercise 7 <a name="ex7"></a>
Add a step trend to the models in January 2010. Solve the models and explore the results. What happens?

<a href="#ans7">Answer Exercise 7</a>

## Make a map
We can make a map of the locations of the oseries. The mapping functionality of a Pastas Project still needs to be expanded.

In [None]:
f, ax = plt.subplots()
ax.axis('equal')
pr.maps.series(kind='oseries')

## Saving and loading a project
We can save an entire project, with all its series and models, to a file.

In [None]:
pr.to_file('pastas_project.pas')

Later we can reload this project.

In [None]:
pr = ps.io.load('pastas_project.pas')

Check that everything went well by plotting the decomposition of B58C0698_1 again. The figure should be identical to the earlier one.

In [None]:
name = 'B58C0698_1'
ml = pr.models[name]
ml.plots.decomposition()
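
Saving and reloading is a round trip: whatever is written to the file should come back unchanged. The sketch below illustrates that idea with the standard `json` module and a nested dictionary. It is only an analogy under that assumption and does not reproduce the actual `.pas` file format or the logic of `pr.to_file` and `ps.io.load`. The dictionary layout, the file name and all values are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical, heavily simplified stand-in for the contents of a project.
project = {
    'name': 'project for notebook',
    'oseries': {'B58C0698_1': {'2000-01-01': 1.25, '2000-01-15': 1.5}},
    'models': {'B58C0698_1': {'parameters': {'recharge_A': 500.0}}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'project_sketch.json')
    # Write the project to disk ...
    with open(path, 'w') as f:
        json.dump(project, f)
    # ... and read it back.
    with open(path) as f:
        restored = json.load(f)

print(restored == project)
```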


## Answers

#### <a href="#ex1">Answer exercise 1</a> <a name="ans1"></a>

Measurement point B52D0192_2 has the lowest filter. This can be seen in the column `Onderkant filter (cm t.o.v. NAP)` (bottom of the filter, in cm relative to NAP) of the dataframe `pr_q.oseries`. This is a hard question if you don't know Dutch (sorry!).

In [None]:
# Starting a new Project
pr_q = ps.Project(name='exercise1')

# add the observations in all the files in the data-directory that end with _1.csv
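# build the path with os.path.join so it also works outside Windows
datapath_ex1 = os.path.join('data', 'nb4', 'ex1')
files = [x for x in os.listdir(datapath_ex1) if x.endswith('_1.csv')]
for file in files:
    fname = os.path.join(datapath_ex1, file)
    series = ps.read_dino(fname)
    pr_q.add_series(series, kind='oseries')

# extract a single measurement series ("meetreeks" is Dutch for measurement series)
meetreeks = pr_q.oseries.loc['B52C2089_1','series'].series
# show the contents of pr_q.oseries
pr_q.oseries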

#### <a href="#ex2">Answer exercise 2</a> <a name="ans2"></a>

In [None]:
# add evaporation
fname = os.path.join(datapath,'KNMI_Bilt.txt')
series = ps.read_knmi(fname, variables='EV24')
pr_q.add_series(series, kind='evap', settings='evap')

# add precipitation
fname = os.path.join(datapath,'KNMI_Akkrum.txt')
series = ps.read_knmi(fname, variables='RD')
pr_q.add_series(series, kind='prec', settings='prec')


# show the contents of pr_q.stresses
pr_q.stresses

#### <a href="#ex3">Answer exercise 3</a> <a name="ans3"></a>

In [None]:
# Exercise 3: create all models, add recharge and solve them in bulk
pr_q.add_models()
pr_q.add_recharge()
pr_q.solve_models()

In [None]:
# alternative with a for-loop
for name in pr_q.oseries.index:
    ml = pr_q.add_model(name)
    pr_q.add_recharge(ml)
    ml.solve(report=False)

#### <a href="#ex4">Answer exercise 4</a> <a name="ans4"></a>

See the explained variance (evp) in the cell below. The evp is rather low. As a rule of thumb, an evp above 70-80% is considered a reasonable fit.

In [None]:
pr_q.get_statistics(['evp'])
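
To make the statistic concrete, here is a minimal sketch of an explained variance percentage, assuming the common definition 100 * (1 - var(residuals) / var(observations)), clipped at zero. The exact formula used by pastas may differ between versions. The function below is a hypothetical helper and the numbers are made up.

```python
from statistics import pvariance

def explained_variance(observations, residuals):
    """Return 100 * (1 - var(residuals) / var(observations)), clipped at 0."""
    var_obs = pvariance(observations)
    if var_obs == 0:
        # A constant observation series has no variance to explain.
        return 0.0
    return max(0.0, 100.0 * (1.0 - pvariance(residuals) / var_obs))

# Hypothetical observed and simulated heads.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.5, 1.5, 3.0, 4.5, 4.5]
res = [o - s for o, s in zip(obs, sim)]

print(round(explained_variance(obs, res), 1))
```

A model whose residuals vary as much as the observations themselves scores 0, and a perfect fit scores 100.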

#### <a href="#ex5">Answer exercise 5</a> <a name="ans5"></a>

The evaporation time series of Arcen has no values before 1991. As a result, the model of B52D0502_1 and several other models have a poor fit (low evp).

In [None]:
# add evaporation
fname = os.path.join(datapath,'etmgeg_391.txt')
series = ps.read_knmi(fname, variables='EV24')
pr_q.add_series(series, kind='evap', settings='evap')

# add precipitation
fname = os.path.join(datapath,'neerslaggeg_IJSSELSTEYN-L_913.txt')
series = ps.read_knmi(fname, variables='RD')
pr_q.add_series(series, kind='prec', settings='prec')

# delete existing stresses
pr_q.del_stress('EV24 DE BILT')
pr_q.del_stress('RD 89')

# create and solve the models
for name in pr_q.oseries.index:
    ml = pr_q.add_model(name)
    pr_q.add_recharge(ml)
    ml.solve(report=False)
    
# get the statistics
print(pr_q.get_statistics(['evp']))

# results of individual model
name = 'B52D0502_1'
ml = pr_q.models[name]
ml.plots.decomposition();

#### <a href="#ex6">Answer exercise 6</a> <a name="ans6"></a>

In [None]:
for name in pr_q.oseries.index:
    ml = pr_q.add_model(name)
    pr_q.add_recharge(ml)
    ml.solve(tmin='1993', report=False)
    
print(pr_q.get_statistics(['evp']))

# results of individual model
name = 'B52D0502_1'
ml = pr_q.models[name]
ml.plots.decomposition();
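
Setting `tmin` limits the calibration to the period in which all stresses have data. As a rough sketch of that idea, assume the calibration simply ignores observations before `tmin`. The helper name, dates and head values below are hypothetical, and the real handling inside `ml.solve` is more involved.

```python
from datetime import date

def calibration_window(series, tmin):
    """Keep only the observations measured on or after tmin."""
    return {t: value for t, value in series.items() if t >= tmin}

# Hypothetical head observations, keyed by measurement date.
heads = {
    date(1989, 6, 1): 20.1,
    date(1992, 6, 1): 20.4,
    date(1993, 6, 1): 20.2,
    date(1995, 6, 1): 20.6,
}

# tmin='1993' is interpreted by pandas as 1 January 1993.
used = calibration_window(heads, date(1993, 1, 1))
print(sorted(used))
```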

#### <a href="#ex7">Answer exercise 7</a> <a name="ans7"></a>

The step trend seems to have little effect on the model results.

In [None]:
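# step trend starting in 2010
sm = ps.stressmodels.StepModel('2010', name='step', up=True)
# optional second step in 2013, only used if the line in the loop is uncommented
sm2 = ps.stressmodels.StepModel('2013', name='step2', up=True)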

for name in pr_q.oseries.index:
    ml = pr_q.add_model(name)
    ml.add_stressmodel(sm)
    #ml.add_stressmodel(sm2)
    pr_q.add_recharge(ml)
    ml.solve(tmin='1993', report=False)
    
print(pr_q.get_statistics(['evp']))

# results of individual model
name = 'B52D0502_1'
ml = pr_q.models[name]
ml.plots.decomposition();
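
For intuition, a step trend can be thought of as a contribution that is zero before the start date and a constant `d` from that date onwards. The sketch below assumes this instantaneous form, which is a simplification that may not match the response used by `StepModel` in every pastas version. The helper name and all values are hypothetical.

```python
from datetime import date

def step_contribution(dates, tstart, d):
    """Return 0.0 before tstart and the step size d on or after tstart."""
    return [d if t >= tstart else 0.0 for t in dates]

# Hypothetical simulation dates and base simulation without the step.
dates = [date(2009, 1, 1), date(2009, 12, 31), date(2010, 1, 1), date(2012, 1, 1)]
base = [20.0, 20.1, 20.2, 20.3]

step = step_contribution(dates, tstart=date(2010, 1, 1), d=0.5)
simulation = [b + s for b, s in zip(base, step)]
print(step)
```

If the calibrated step size is close to zero, the simulation with and without the step is nearly identical.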