# Merging observations

This notebook shows how to merge observations and observation collections. Merging observations can be useful if:
- you have data from multiple sources measuring at the same location
- you get new measurements that you want to add to the old measurements

## <a id=top></a>Notebook contents

1. [Simple merge](#simplemerge)
2. [Merge options](#mergeoptions)
3. [Merging observation collections](#mergeoc)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import display

import hydropandas as hpd

hpd.util.get_color_logger("INFO");

## Simple merge<a id=simplemerge></a>

In [None]:
# observation 1
df = pd.DataFrame(
    {"measurements": np.random.randint(0, 10, 5)},
    index=pd.date_range("2020-1-1", "2020-1-5"),
)
o1 = hpd.Obs(df, name="obs", x=0, y=0)
print(o1)

In [None]:
# observation 2
df = pd.DataFrame(
    {"measurements": np.random.randint(0, 10, 5)},
    index=pd.date_range("2020-1-6", "2020-1-10"),
)
o2 = hpd.Obs(df, name="obs", x=0, y=0)
print(o2)

In [None]:
o_merged = o1.merge_observation(o2)
o_merged
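
For intuition: when two series do not overlap in time, the merged result resembles a plain pandas concatenation. The sketch below uses only pandas and is not how hydropandas implements `merge_observation`; the frames `left` and `right` are made-up stand-ins for `o1` and `o2`, with fixed values instead of random ones.

```python
import pandas as pd

# hypothetical stand-ins for o1 and o2 (fixed values, not random)
left = pd.DataFrame(
    {"measurements": [1, 2, 3, 4, 5]},
    index=pd.date_range("2020-1-1", "2020-1-5"),
)
right = pd.DataFrame(
    {"measurements": [6, 7, 8, 9, 10]},
    index=pd.date_range("2020-1-6", "2020-1-10"),
)

# the two indexes share no timestamps, so there is nothing to reconcile
assert left.index.intersection(right.index).empty

# stack the frames and sort by time to get one continuous series
merged = pd.concat([left, right]).sort_index()
print(merged)
```

Unlike this sketch, `merge_observation` returns an observation object, so the metadata (name, x, y) travels with the time series.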

In [None]:
f, axes = plt.subplots(figsize=(9, 7), nrows=3, sharex=True, sharey=True)
o1["measurements"].plot(ax=axes[0], marker="o", label="observation 1").legend(loc=1)
o2["measurements"].plot(ax=axes[1], marker="o", label="observation 2").legend(loc=1)
o_merged["measurements"].plot(ax=axes[2], marker="o", label="merged").legend(loc=1)

## Merge options<a id=mergeoptions></a>

#### overlapping timeseries

In [None]:
# create a partly overlapping dataframe
df = pd.DataFrame(
    {
        "measurements": np.concatenate(
            [o1["measurements"].values[-2:], np.random.randint(0, 10, 3)]
        )
    },
    index=pd.date_range("2020-1-4", "2020-1-8"),
)
o3 = hpd.Obs(df, name="obs", x=0, y=0)
print(o3)

In [None]:
o_merged = o1.merge_observation(o3)

In [None]:
f, axes = plt.subplots(figsize=(9, 7), nrows=3, sharex=True, sharey=True)
o1["measurements"].plot(ax=axes[0], marker="o", label="observation 1").legend(loc=1)
o3["measurements"].plot(ax=axes[1], marker="o", label="observation 3").legend(loc=1)
o_merged["measurements"].plot(ax=axes[2], marker="o", label="merged").legend(loc=1)

In [None]:
# create a partly overlapping dataframe with different values
df = pd.DataFrame(
    {"measurements": np.random.randint(0, 10, 5)},
    index=pd.date_range("2020-1-4", "2020-1-8"),
)
o4 = hpd.Obs(df, name="obs", x=0, y=0)
print(o4)

By default, an error is raised if the overlapping time series have different values.

In [None]:
o1.merge_observation(o4)
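
Before choosing how to resolve such a conflict, it can help to see which timestamps disagree. The following is a minimal pandas-only sketch of that check, assuming two series with made-up values; it is an illustration and not part of the hydropandas API.

```python
import pandas as pd

# hypothetical series that overlap on 2020-01-04 and 2020-01-05
left = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2020-1-1", "2020-1-5"))
right = pd.Series([4, 9, 6, 7, 8], index=pd.date_range("2020-1-4", "2020-1-8"))

# timestamps present in both series
shared = left.index.intersection(right.index)

# keep only the shared timestamps where the values differ
differs = left.loc[shared].ne(right.loc[shared]).to_numpy()
conflicts = shared[differs]
print(conflicts)
```

Here the two series agree on 2020-01-04 and disagree on 2020-01-05, so only the latter ends up in `conflicts`.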

With the `overlap` argument you can specify whether the left or the right observation is used where the two overlap. See the example below.

In [None]:
print("use left")
merged_left = o1.merge_observation(o4, overlap="use_left")
display(merged_left)  # overlapping values come from the existing observation (o1)
print("use right")
merged_right = o1.merge_observation(o4, overlap="use_right")
display(merged_right)  # overlapping values come from the new observation (o4)
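
The left/right choice can be mimicked in plain pandas with `DataFrame.combine_first`, where the calling frame takes priority on shared timestamps. This is a sketch of the idea with made-up values, assuming nothing about how hydropandas resolves the overlap internally.

```python
import pandas as pd

# hypothetical frames that overlap on 2020-01-03 with different values
left = pd.DataFrame(
    {"measurements": [1.0, 2.0, 3.0]},
    index=pd.date_range("2020-1-1", "2020-1-3"),
)
right = pd.DataFrame(
    {"measurements": [30.0, 40.0, 50.0]},
    index=pd.date_range("2020-1-3", "2020-1-5"),
)

# the calling frame wins on shared timestamps; the other fills the gaps
keep_left = left.combine_first(right)   # comparable to overlap="use_left"
keep_right = right.combine_first(left)  # comparable to overlap="use_right"

overlap_day = pd.Timestamp("2020-01-03")
print(keep_left.loc[overlap_day, "measurements"])
print(keep_right.loc[overlap_day, "measurements"])
```

Both results cover the union of the two indexes; they differ only on the overlapping day.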

In [None]:
f, axes = plt.subplots(figsize=(9, 7), nrows=4, sharex=True, sharey=True)
o1["measurements"].plot(ax=axes[0], marker="o", label="observation 1").legend(loc=2)
o4["measurements"].plot(ax=axes[1], marker="o", label="observation 4").legend(loc=2)
merged_left["measurements"].plot(ax=axes[2], marker="o", label="merged left").legend(
    loc=2
)
merged_right["measurements"].plot(ax=axes[3], marker="o", label="merged right").legend(
    loc=2
)

#### metadata
The `merge_observation` method checks by default if the metadata of the two observations is the same.

In [None]:
# observation 5
df = pd.DataFrame(
    {"measurements": np.random.randint(0, 10, 5)},
    index=pd.date_range("2020-1-6", "2020-1-10"),
)
o5 = hpd.Obs(df, name="obs5", x=0, y=0)
o5

When the metadata differs, a `ValueError` is raised.

In [None]:
o1.merge_observation(o5)

If you set the `merge_metadata` argument to `False`, the metadata is not merged and only the timeseries of the observations are merged.

In [None]:
o1.merge_observation(o5, merge_metadata=False)

Just as with overlapping timeseries, the `overlap` argument can also be used for overlapping metadata values.

In [None]:
o_merged = o1.merge_observation(o5, overlap="use_left", merge_metadata=True)
print('observation name when overlap="use_left":', o_merged.name)
o_merged = o1.merge_observation(o5, overlap="use_right", merge_metadata=True)
print('observation name when overlap="use_right":', o_merged.name)
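
The same rule can be pictured with ordinary dictionaries. The helper `merge_meta` below is hypothetical and written only for this illustration; its `"error"` default is an assumption that mirrors the behaviour shown above, and it is not the hydropandas implementation.

```python
def merge_meta(left, right, overlap="error"):
    """Merge two metadata dicts (hypothetical helper, for illustration only)."""
    merged = dict(left)
    for key, value in right.items():
        if key not in merged or merged[key] == value:
            # new key, or both sides agree: nothing to resolve
            merged[key] = value
        elif overlap == "use_right":
            merged[key] = value
        elif overlap == "use_left":
            continue  # keep the value that is already in `merged`
        else:
            raise ValueError(f"conflicting metadata for {key!r}")
    return merged


# made-up metadata resembling o1 and o5
meta1 = {"name": "obs", "x": 0, "y": 0}
meta5 = {"name": "obs5", "x": 0, "y": 0}

print(merge_meta(meta1, meta5, overlap="use_left")["name"])   # obs
print(merge_meta(meta1, meta5, overlap="use_right")["name"])  # obs5
```

Calling `merge_meta(meta1, meta5)` without an `overlap` choice raises a `ValueError` in this sketch, because the two names differ.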

#### all combinations

In [None]:
# observation 6
df = pd.DataFrame(
    {"measurements": np.random.randint(0, 10, 5), "filter": np.ones(5)},
    index=pd.date_range("2020-1-1", "2020-1-5"),
)
o6 = hpd.Obs(df, name="obs6", x=100, y=0)
o6

In [None]:
# observation 7
df = pd.DataFrame(
    {
        "measurements": np.concatenate(
            [o5["measurements"].values[-1:], np.random.randint(0, 10, 4)]
        ),
        "remarks": ["", "", "", "unreliable", ""],
    },
    index=pd.date_range("2020-1-4", "2020-1-8"),
)
o7 = hpd.Obs(df, name="obs7", x=0, y=100)
o7

In [None]:
merged_right = o6.merge_observation(o7, overlap="use_right")
merged_right

In [None]:
f, axes = plt.subplots(figsize=(9, 7), nrows=3, sharex=True, sharey=True)
o6["measurements"].plot(ax=axes[0], marker="o", label="observation 6").legend(loc=2)
o7["measurements"].plot(ax=axes[1], marker="o", legend=True, label="observation 7")
merged_right["measurements"].plot(
    ax=axes[2], marker="o", legend=True, label="merged right"
)

## Merge observation collections<a id=mergeoc></a>

In [None]:
# create an observation collection from a single observation
oc1 = hpd.ObsCollection(o1)

We can add a single observation to this collection using the `add_observation` method.

In [None]:
oc1.add_observation(o2)
oc1

We can also combine two observation collections.

In [None]:
# create another observation collection from a list of observations
oc2 = hpd.ObsCollection([o5, o6])
oc2

# add the collection to the previous one
oc1.add_obs_collection(oc2, inplace=True)
oc1

There is an automatic check for overlap based on the names of the observations. If the observations in both collections are exactly the same, they are merged.

In [None]:
# add o2 to observation collection 1
oc1.add_observation(o2)

If the observation you want to add has the same name but not the same timeseries, an error is raised.

In [None]:
o1_mod = o1.copy()
o1_mod.loc["2020-01-02", "measurements"] = 100
oc1.add_observation(o1_mod)

To avoid errors, we can use the `overlap` argument to specify which observation we want to use.

In [None]:
oc1.add_observation(o1_mod, overlap="use_left")
oc1