# Introduction & guide to pastastore for `pastas.Project` users<a id="top"></a>

This notebook is a guide to help users coming from `pastas.Project` to transition to `pastastore`. The implementation has remained quite similar, despite the new (more flexible) design of the pastastore module.

## Content
1. [Introduction](#1)
2. [Comparing pastas.Project and PastaStore](#2)
  1. [Adding and accessing timeseries (oseries and stresses)](#2.1)
  2. [Creating, adding and accessing models](#2.2)
  3. [Overview of oseries, stresses and models](#2.3)
  4. [Deleting items](#2.4)
  5. [Bulk methods](#2.5)

<hr>


## [1. Introduction](#top)<a id="1"></a>

Let's start with a brief comparison between the original pastas.Project and the `pastastore` module. 

The `pastas.Project` implementation is relatively simple. The project consists of three "libraries" (for lack of a better name) containing the oseries, stresses and models. The oseries and stresses libraries are accessible in the form of pandas.DataFrames. The models are stored in a dictionary, with the model name used as key to access a particular model. 

The `pastastore` was designed based on these same principles. The main difference is the way the three libraries (for oseries, stresses and models) are implemented. Instead of using pandas.DataFrames and dictionaries, different implementations (called Connectors in `pastastore`) are available for storing timeseries and models. Each option has its own pros and cons:
- `DictConnector`: Very simple to use, but everything is stored in-memory.
- `PystoreConnector`: Requires a dependency that is somewhat harder to install, but uses the hard disk as a "database", which is easy to understand and use.
- `ArcticConnector`: Requires external software (MongoDB) and is a bit more complex, but is the fastest and uses an actual database to store data (more robust).

When using `pastastore` the user essentially has to make a choice as to which storage-type (Connector) they wish to use. In this notebook, we'll only be looking at the DictConnector, as this implementation looks a lot like the original `pastas.Project`.
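
To make the Connector idea concrete, here is a minimal sketch of how a store can delegate storage to an interchangeable backend. All class and method names below (`ToyConnector`, `ToyDictConnector`, `ToyStore`, `save`, `load`, `names`) are hypothetical and chosen for illustration only; they are not the `pastastore` API, and the real Connectors do considerably more.

```python
# Toy illustration only: hypothetical classes, NOT the pastastore API.
from abc import ABC, abstractmethod


class ToyConnector(ABC):
    """Minimal storage interface: save and load items by library and name."""

    @abstractmethod
    def save(self, library, name, item):
        ...

    @abstractmethod
    def load(self, library, name):
        ...

    @abstractmethod
    def names(self, library):
        ...


class ToyDictConnector(ToyConnector):
    """Keeps everything in nested dictionaries, i.e. in memory only."""

    def __init__(self):
        self._libs = {"oseries": {}, "stresses": {}, "models": {}}

    def save(self, library, name, item):
        self._libs[library][name] = item

    def load(self, library, name):
        return self._libs[library][name]

    def names(self, library):
        return sorted(self._libs[library])


class ToyStore:
    """User-facing object that delegates all storage to its connector."""

    def __init__(self, connector):
        self.conn = connector

    def add_oseries(self, series, name):
        self.conn.save("oseries", name, series)

    def get_oseries(self, name):
        return self.conn.load("oseries", name)


toy_store = ToyStore(ToyDictConnector())
toy_store.add_oseries([27.61, 27.73, 27.91], "oseries1")
print(toy_store.get_oseries("oseries1"))
print(toy_store.conn.names("oseries"))
```

In this sketch, swapping the storage type would only mean passing a different `ToyConnector` subclass to `ToyStore`; the user-facing calls stay the same.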

<hr>

First let's import the necessary modules.

In [1]:
import pastastore as pst
import os
import pandas as pd
import pastas as ps

import sys
sys.path.insert(1, "../..")

## [2. Comparing pastas.Project and PastaStore](#top)<a id="2"></a>

In this section we'll be comparing `pastas.Project` to `pastastore`. For each common operation with pastas.Project the corresponding operation will be shown for the `pastastore`. The first step is to initialize an empty `pastas.Project` and an empty `PastaStore`.

First the pastas.Project:

In [2]:
prj = ps.Project("pastas_project")
prj



<pastas.project.project.Project at 0x7f17145c3250>

Next, the empty `PastaStore`. Recall that for the `PastaStore` we need to pick a storage method (Connector) to manage the storage and retrieval of data. In this case we'll be using the `DictConnector`.

In [3]:
conn = pst.DictConnector("pastastore")
store = pst.PastaStore("pastastore", conn)
store

<PastaStore> pastastore: 
 - <DictConnector object> 'pastastore': 0 oseries, 0 stresses, 0 models

Now we can get started on comparing the two.

### [2.1 Adding and accessing timeseries (oseries and stresses)](#top)<a id="2.1"></a>

#### Adding oseries

Load some groundwater level data and define metadata.

In [4]:
datadir = "../../tests/data/"  # relative path to data directory
oseries1 = pd.read_csv(os.path.join(
    datadir, "head_nb1.csv"), index_col=0, parse_dates=True)
ometa = {"x": 100300, "y": 400400}
oseries1.head()

| date | head |
| --- | --- |
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |


Adding observation timeseries to a pastas.Project or PastaStore works in basically the same way:

In [5]:
# pastas.Project:
prj.add_oseries(oseries1, name="oseries1", metadata=ometa)

INFO: Cannot determine frequency of series oseries1: freq=None. The time series is irregular.


In [6]:
# pastastore:
store.add_oseries(oseries1, "oseries1", metadata=ometa)

#### Adding stresses
Load precipitation and evaporation data.

In [7]:
# prec
p = pd.read_csv(os.path.join(datadir, "rain_nb1.csv"),
                index_col=0, parse_dates=True)
p.columns = ['value']
pmeta = {"x": 100300, "y": 400400}

# evap
e = pd.read_csv(os.path.join(datadir, "evap_nb1.csv"),
                index_col=0, parse_dates=True)
e.columns = ["value"]
emeta = {"x": 100300, "y": 400400}

Adding stresses to a pastas.Project or PastaStore is also similar:

In [8]:
# pastas.Project
prj.add_stress(p, "prec1", kind="prec", metadata=pmeta)
prj.add_stress(e, "evap1", kind="evap", metadata=emeta)

INFO: Inferred frequency for time series prec1: freq=D
INFO: Inferred frequency for time series evap1: freq=D


In [9]:
# pastastore
store.add_stress(p, "prec1", kind="prec", metadata=pmeta)
store.add_stress(e, "evap1", kind="evap", metadata=emeta)

#### Accessing timeseries

Accessing timeseries is one point where the two implementations differ. In pastas.Project the oseries and stresses are stored in a pandas.DataFrame and can be accessed from there. The timeseries are also converted into pastas.TimeSeries objects, which contain extra information about how to up- or downscale the timeseries.

In the PastaStore, when using a DictConnector, the timeseries are stored in dictionaries but are obtained using get-methods. The timeseries are stored as pandas.Series or pandas.DataFrames.

In [10]:
# pastas.Project
prj.oseries.loc["oseries1", "series"]

TimeSeries(name=oseries1, freq=None, freq_original=None, tmin=1985-11-14 00:00:00, tmax=2015-06-28 00:00:00)

In [11]:
# pastastore
store.get_oseries("oseries1")

| date | oseries1 |
| --- | --- |
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |
| ... | ... |
| 2015-04-28 | 28.23 |
| 2015-05-14 | 28.08 |
| 2015-05-28 | 27.82 |
| 2015-06-14 | 27.75 |


For stresses this works the same way with the `store.get_stresses()` command.

### [2.2 Creating, adding and accessing models](#top)<a id="2.2"></a>

#### Creating a model (and storing it)
In pastas.Project a model is automatically added to the models dictionary when it is created. Adding recharge is a separate command.

In the PastaStore, a model is created but not automatically added to the store. A separate command is used to actually store the model. 

In [12]:
# pastas.Project
ml = prj.add_model("oseries1")
prj.add_recharge()
ml

Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)

In [13]:
# pastastore
ml = store.create_model("oseries1", add_recharge=True)
store.add_model(ml)
ml

INFO: Cannot determine frequency of series oseries1: freq=None. The time series is irregular.
INFO: Inferred frequency for time series prec1: freq=D
INFO: Inferred frequency for time series evap1: freq=D


Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)

Obtaining a model is slightly different in the two implementations:

In [14]:
# pastas.Project
ml = prj.models["oseries1"]
ml

Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)

In [15]:
# pastastore
ml = store.get_models("oseries1")
ml

INFO: Cannot determine frequency of series oseries1: freq=None. The time series is irregular.
INFO: User provided frequency for time series prec1: freq=D
INFO: User provided frequency for time series evap1: freq=D


Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
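
The difference between the two workflows (create-and-store in one step versus create, then store explicitly) can be sketched with two toy classes. These classes are hypothetical stand-ins written for this illustration; they only mimic the method names used in the cells above and do not reproduce the behaviour of pastas or `pastastore` beyond that.

```python
# Toy illustration only: hypothetical classes, not the pastas or pastastore API.
class ToyProject:
    """Registers a model in its dictionary as soon as it is created."""

    def __init__(self):
        self.models = {}

    def add_model(self, name):
        model = {"name": name}
        self.models[name] = model  # stored immediately on creation
        return model


class ToyStore:
    """Creating a model and storing it are two separate steps."""

    def __init__(self):
        self._models = {}

    def create_model(self, name):
        return {"name": name}  # created, but not stored yet

    def add_model(self, model):
        self._models[model["name"]] = model

    def get_models(self, name):
        return self._models[name]

    @property
    def models(self):
        # Only the names are exposed, not the model objects themselves.
        return list(self._models)


toy_prj = ToyProject()
toy_prj.add_model("oseries1")

toy_store = ToyStore()
toy_ml = toy_store.create_model("oseries1")
names_before_add = toy_store.models  # still empty at this point
toy_store.add_model(toy_ml)
names_after_add = toy_store.models
print(names_before_add, names_after_add)
```

The explicit second step gives the user a chance to modify or inspect a model before it ends up in storage.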

### [2.3 Overview of oseries, stresses and models](#top)<a id="2.3"></a>

Getting an overview of oseries, stresses and models is similar in both implementations. In the PastaStore, however, the overview of the timeseries contains only the metadata (not the timeseries themselves), and the models attribute returns a list of model names rather than the dictionary containing the models.

#### oseries

In [16]:
# pastas.Project
prj.oseries

|  | name | series | kind | x | y | z | projection |
| --- | --- | --- | --- | --- | --- | --- | --- |
| oseries1 | oseries1 | TimeSeries(name=oseries1, freq=None, freq_orig... | oseries | 100300 | 400400 | 0 |  |


In [17]:
# pastastore
store.oseries

| name | x | y |
| --- | --- | --- |
| oseries1 | 100300 | 400400 |


#### stresses

In [18]:
# pastas.Project
prj.stresses

|  | name | series | kind | x | y | z | projection |
| --- | --- | --- | --- | --- | --- | --- | --- |
| prec1 | prec1 | TimeSeries(name=prec1, freq=None, freq_origina... | prec | 100300 | 400400 | 0 |  |
| evap1 | evap1 | TimeSeries(name=evap1, freq=None, freq_origina... | evap | 100300 | 400400 | 0 |  |


In [19]:
# pastastore
store.stresses

| name | x | y | kind |
| --- | --- | --- | --- |
| prec1 | 100300.0 | 400400.0 | prec |
| evap1 | 100300.0 | 400400.0 | evap |


#### models

In [20]:
# pastas.Project
prj.models

{'oseries1': Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)}

In [21]:
# pastastore
store.models

['oseries1']

### [2.4 Deleting items](#top)<a id="2.4"></a>

Deleting items is similar in both implementations. Here we delete a model; oseries and stresses can be deleted with the `del_oseries()` or `del_stresses()` methods.

In [22]:
# pastas.Project
prj.del_model("oseries1")

INFO: Model with name Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True) deleted from the database.


In [23]:
# pastastore
store.del_models("oseries1")

### [2.5 Bulk methods](#top)<a id="2.5"></a>

Both implementations include several methods to perform actions in bulk (e.g. creating models for all oseries). To showcase these methods, we load some more data and add it to the project and the store.

In [24]:
# oseries 2
o2 = pd.read_csv(os.path.join(datadir, "obs.csv"),
                 index_col=0, parse_dates=True)
o2.index.name = "oseries2"
ometa2 = {"x": 100000, "y": 400000}

# prec 2
p2 = pd.read_csv(os.path.join(datadir, "rain.csv"),
                 index_col=0, parse_dates=True)
pmeta2 = {"x": 100000, "y": 400000}

# evap 2
e2 = pd.read_csv(os.path.join(datadir, "evap.csv"),
                 index_col=0, parse_dates=True)
emeta2 = {"x": 100000, "y": 400000}

We set the pastas logger to be more quiet to reduce the number of messages it prints to the screen.

In [25]:
ps.logger.setLevel("ERROR")

Add the data to the project and the store:

In [26]:
# pastas.Project
prj.add_oseries(o2, "oseries2", metadata=ometa2)
prj.add_stress(p2, "prec2", kind="prec", metadata=pmeta2)
prj.add_stress(e2, "evap2", kind="evap", metadata=emeta2)

In [27]:
# pastastore
store.add_oseries(o2, "oseries2", metadata=ometa2)
store.add_stress(p2, "prec2", kind="prec", metadata=pmeta2)
store.add_stress(e2, "evap2", kind="evap", metadata=emeta2)
store

<PastaStore> pastastore: 
 - <DictConnector object> 'pastastore': 2 oseries, 4 stresses, 0 models

#### Get nearest stresses

Obtain the nearest stress of a particular kind for each oseries:

In [28]:
# pastas.Project
prj.get_nearest_stresses(kind="prec")

|  | 0 |
| --- | --- |
| oseries1 | prec1 |
| oseries2 | prec2 |


In [29]:
# pastastore
store.get_nearest_stresses(kind="prec")

|  | 0 |
| --- | --- |
| oseries1 | prec1 |
| oseries2 | prec2 |
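
As a rough sketch of what such a lookup involves, the snippet below picks, for each oseries, the stress of a given kind with the smallest straight-line distance between the `x` and `y` metadata values. This is an assumption about what "nearest" means here, written in plain Python; the helper `nearest_stress` is hypothetical and is not part of pastas or `pastastore`. The coordinates are copied from the metadata dictionaries defined earlier in this notebook.

```python
import math

# Coordinates copied from the metadata dictionaries used in this notebook.
oseries_xy = {"oseries1": (100300, 400400), "oseries2": (100000, 400000)}
stresses_meta = {
    "prec1": {"xy": (100300, 400400), "kind": "prec"},
    "evap1": {"xy": (100300, 400400), "kind": "evap"},
    "prec2": {"xy": (100000, 400000), "kind": "prec"},
    "evap2": {"xy": (100000, 400000), "kind": "evap"},
}


def nearest_stress(xy, stresses, kind):
    """Return the name of the closest stress of the requested kind.

    Hypothetical helper: assumes 'nearest' means smallest Euclidean distance.
    """
    candidates = {n: s["xy"] for n, s in stresses.items() if s["kind"] == kind}
    return min(
        candidates,
        key=lambda n: math.hypot(xy[0] - candidates[n][0], xy[1] - candidates[n][1]),
    )


nearest = {
    name: nearest_stress(xy, stresses_meta, "prec")
    for name, xy in oseries_xy.items()
}
print(nearest)  # {'oseries1': 'prec1', 'oseries2': 'prec2'}
```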


#### Creating and solving multiple models

Create and solve several models at once:

In [30]:
# pastas.Project
prj.add_models()
prj.add_recharge()
prj.solve_models(verbose=True)

solving model -> oseries1
solving model -> oseries2


In [31]:
# pastastore
store.create_models_bulk(add_recharge=True, store=True)
store.solve_models()

# This can also be done in one command:
# store.create_models(add_recharge=True, store=True, solve=True, report=False)

store

Bulk creation models: 100%|██████████| 2/2 [00:00<00:00, 13.98it/s]
Solving models: 100%|██████████| 2/2 [00:01<00:00,  1.17it/s]


<PastaStore> pastastore: 
 - <DictConnector object> 'pastastore': 2 oseries, 4 stresses, 2 models

#### Parameters

Obtain parameter values for all models:

In [32]:
# pastas.Project
prj.get_parameters(["recharge_A", "recharge_a", "recharge_n",
                   "recharge_f", "constant_d", "noise_alpha"])

|  | recharge_A | recharge_a | recharge_n | recharge_f | constant_d | noise_alpha |
| --- | --- | --- | --- | --- | --- | --- |
| oseries1 | 682.464954 | 150.38184 | 1.018206 | -1.271062 | 27.882297 | 50.095143 |
| oseries2 | 601.962925 | 143.386742 | 1.019942 | -1.373329 | 28.043446 | 69.749088 |


In [33]:
# pastastore
store.get_parameters()

|  | recharge_A | recharge_n | recharge_a | recharge_f | constant_d | noise_alpha |
| --- | --- | --- | --- | --- | --- | --- |
| oseries1 | 682.464954 | 1.018206 | 150.38184 | -1.271062 | 27.882297 | 50.095143 |
| oseries2 | 601.962925 | 1.019942 | 143.386742 | -1.373329 | 28.043446 | 69.749088 |


#### Statistics

Get the fit statistics for each model:

In [34]:
# pastas.Project
prj.get_statistics(["evp", "aic", "rmse"])

|  | evp | aic | rmse |
| --- | --- | --- | --- |
| oseries1 | 92.915123 | 916.144521 | 0.114455 |
| oseries2 | 88.478824 | -23.443866 | 0.126184 |


In [35]:
# pastastore
store.get_statistics(["evp", "aic", "rmse"])

|  | evp | aic | rmse |
| --- | --- | --- | --- |
| oseries1 | 92.915123 | 916.144521 | 0.114455 |
| oseries2 | 88.478824 | -23.443866 | 0.126184 |
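
For readers unfamiliar with two of these statistics, the sketch below computes the root mean squared error and the explained variance percentage from a short, made-up pair of observed and simulated series. It uses the common textbook definitions; the functions and sample values are illustrative assumptions, and the pastas implementation may differ in details such as weighting or the handling of missing values.

```python
import math


def _variance(values):
    """Population variance (divides by the number of values)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def rmse(obs, sim):
    """Root mean squared error of the residuals (obs - sim)."""
    res = [o - s for o, s in zip(obs, sim)]
    return math.sqrt(sum(r * r for r in res) / len(res))


def evp(obs, sim):
    """Explained variance percentage, floored at zero."""
    res = [o - s for o, s in zip(obs, sim)]
    return max(0.0, 100.0 * (1.0 - _variance(res) / _variance(obs)))


# Made-up sample values, not taken from the notebook's data.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(obs, sim), 4), round(evp(obs, sim), 2))
```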
