# Merging observations

This notebook shows how observations and observation collections can be merged.

## <a id=top></a>Notebook contents

1. [Simple merge](#simplemerge)
2. [Merge options](#mergeoptions)
3. [Merging observation collections](#mergeoc)

In [1]:
import numpy as np
import pandas as pd
import hydropandas as hpd
from IPython.display import display

import logging
logging.basicConfig(level=logging.INFO)

## Simple merge<a id=simplemerge></a>

In [2]:
# observation 1
df = pd.DataFrame({'measurements':np.random.randint(0,10,5)}, index=pd.date_range('2020-1-1', '2020-1-5'))
o1 = hpd.Obs(df, name='obs1',x=0, y=0)
o1 

Unnamed: 0,measurements
2020-01-01,4
2020-01-02,9
2020-01-03,4
2020-01-04,4
2020-01-05,3


In [3]:
# observation 2
df = pd.DataFrame({'measurements':np.random.randint(0,10,5)}, index=pd.date_range('2020-1-6', '2020-1-10'))
o2 = hpd.Obs(df, name='obs2',x=0, y=0)
o2

Unnamed: 0,measurements
2020-01-06,2
2020-01-07,2
2020-01-08,3
2020-01-09,9
2020-01-10,5


In [4]:
o1.merge_observation(o2)

INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


Unnamed: 0,measurements
2020-01-01,4
2020-01-02,9
2020-01-03,4
2020-01-04,4
2020-01-05,3
2020-01-06,2
2020-01-07,2
2020-01-08,3
2020-01-09,9
2020-01-10,5


## Merge options<a id=mergeoptions></a>

#### overlapping timeseries
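When two observations share time steps, the merge checks whether the measurements at those time steps are the same. If they are, the time series are merged without further input.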

In [5]:
o1

Unnamed: 0,measurements
2020-01-01,4
2020-01-02,9
2020-01-03,4
2020-01-04,4
2020-01-05,3


In [6]:
# create a partly overlapping dataframe
df = pd.DataFrame({'measurements':np.concatenate([o1['measurements'].values[-2:],np.random.randint(0,10,3)])}, index=pd.date_range('2020-1-4', '2020-1-8'))
o3 = hpd.Obs(df, name='obs3', x=0, y=0)
o3

Unnamed: 0,measurements
2020-01-04,4
2020-01-05,3
2020-01-06,3
2020-01-07,7
2020-01-08,6


In [7]:
o1.merge_observation(o3)

INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


Unnamed: 0,measurements
2020-01-01,4
2020-01-02,9
2020-01-03,4
2020-01-04,4
2020-01-05,3
2020-01-06,3
2020-01-07,7
2020-01-08,6


In [8]:
# create a partly overlapping dataframe with different values
df = pd.DataFrame({'measurements':np.random.randint(0,10,5)}, index=pd.date_range('2020-1-4', '2020-1-8'))
o4 = hpd.Obs(df, name='obs4', x=0, y=0)
o4

Unnamed: 0,measurements
2020-01-04,0
2020-01-05,6
2020-01-06,0
2020-01-07,6
2020-01-08,9


By default, an error is raised if the overlapping time series have different values.

In [9]:
o1.merge_observation(o4)

INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


ValueError: observations have different values for same time steps
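
The check behind this error can be illustrated with plain pandas. The sketch below is not hydropandas' implementation; `find_conflicts` is a hypothetical helper, and the values are copied from the `o1`, `o3` and `o4` outputs shown above. It only shows the idea: find the shared time steps, then compare the values there.

```python
import pandas as pd


def find_conflicts(left: pd.DataFrame, right: pd.DataFrame, column: str) -> pd.Index:
    """Return the shared time steps at which `column` differs (hypothetical helper)."""
    shared = left.index.intersection(right.index)
    differs = left.loc[shared, column] != right.loc[shared, column]
    return shared[differs.to_numpy()]


# values copied from the o1, o3 and o4 outputs in this notebook
left = pd.DataFrame(
    {"measurements": [4, 9, 4, 4, 3]},
    index=pd.date_range("2020-01-01", "2020-01-05"),
)
same = pd.DataFrame(
    {"measurements": [4, 3, 3, 7, 6]},
    index=pd.date_range("2020-01-04", "2020-01-08"),
)
different = pd.DataFrame(
    {"measurements": [0, 6, 0, 6, 9]},
    index=pd.date_range("2020-01-04", "2020-01-08"),
)

print(find_conflicts(left, same, "measurements"))       # no conflicting time steps
print(find_conflicts(left, different, "measurements"))  # 2020-01-04 and 2020-01-05
```

An empty result means the overlap is consistent, so a merge needs no further decision. A non-empty result is the situation in which the default merge raises the `ValueError` above.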

With the 'overlap' argument you can choose which values are kept at overlapping time steps: 'use_left' keeps the values of the existing (left) observation, 'use_right' keeps the values of the new (right) observation.

In [10]:
print('use left')
display(o1.merge_observation(o4, overlap='use_left')) # use the existing observation
print('use right')
display(o1.merge_observation(o4, overlap='use_right')) # use the new observation


INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


use left


Unnamed: 0,measurements
2020-01-01,4
2020-01-02,9
2020-01-03,4
2020-01-04,4
2020-01-05,3
2020-01-06,0
2020-01-07,6
2020-01-08,9


INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


use right


Unnamed: 0,measurements
2020-01-01,4
2020-01-02,9
2020-01-03,4
2020-01-04,0
2020-01-05,6
2020-01-06,0
2020-01-07,6
2020-01-08,9
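
The two options can be mimicked with plain pandas to make the rule explicit. This is a minimal sketch and not how hydropandas implements the merge; `merge_frames` is a hypothetical helper, and the values are copied from the `o1` and `o4` outputs above.

```python
import pandas as pd


def merge_frames(left: pd.DataFrame, right: pd.DataFrame, overlap: str = "use_left") -> pd.DataFrame:
    """Combine two time-indexed frames; `overlap` decides who wins on shared time steps."""
    if overlap == "use_left":
        keep, fill = left, right
    elif overlap == "use_right":
        keep, fill = right, left
    else:
        raise ValueError(f"unknown overlap option: {overlap!r}")
    # drop the rows of `fill` that `keep` already covers, then stack and sort
    extra = fill[~fill.index.isin(keep.index)]
    return pd.concat([keep, extra]).sort_index()


# values copied from the o1 and o4 outputs in this notebook
left = pd.DataFrame(
    {"measurements": [4, 9, 4, 4, 3]},
    index=pd.date_range("2020-01-01", "2020-01-05"),
)
right = pd.DataFrame(
    {"measurements": [0, 6, 0, 6, 9]},
    index=pd.date_range("2020-01-04", "2020-01-08"),
)

print(merge_frames(left, right, "use_left")["measurements"].tolist())
# [4, 9, 4, 4, 3, 0, 6, 9]
print(merge_frames(left, right, "use_right")["measurements"].tolist())
# [4, 9, 4, 0, 6, 0, 6, 9]
```

Both results match the tables above: only the values on 2020-01-04 and 2020-01-05 depend on the chosen option.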


#### overlapping metadata
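With `check_metadata=True` the merge also checks whether the metadata of the two observations is the same. By default an error is raised if a metadata value, such as the name, differs.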

In [11]:
o1.merge_observation(o2, check_metadata=True)



ValueError: existing observation name differs from new observation

Just as with overlapping time series, the 'overlap' argument determines which value is kept when metadata values differ.

In [12]:
o_merged = o1.merge_observation(o2, overlap='use_left', check_metadata=True)
print('observation name when overlap = "use_left":', o_merged.name)
o_merged = o1.merge_observation(o2, overlap='use_right', check_metadata=True)
print('observation name when overlap = "use_right":', o_merged.name)

INFO:hydropandas.observation:existing observation name differs from new observation, use existing
INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series
INFO:hydropandas.observation:existing observation name differs from new observation, use new
INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


observation name when overlap = "use_left": obs1
observation name when overlap = "use_right": obs2
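
The same left/right rule can be written out for metadata using ordinary dictionaries. This is only a sketch of the idea, assuming metadata can be treated as key/value pairs; `resolve_metadata` is a hypothetical helper and not part of hydropandas.

```python
def resolve_metadata(left: dict, right: dict, overlap: str = "error") -> dict:
    """Merge two metadata dicts; `overlap` decides which value wins on a conflict."""
    merged = dict(left)
    for key, new_value in right.items():
        if key not in merged or merged[key] == new_value:
            merged[key] = new_value
        elif overlap == "use_right":
            merged[key] = new_value  # take the value of the new observation
        elif overlap == "use_left":
            pass  # keep the value of the existing observation
        else:
            raise ValueError(f"existing observation {key} differs from new observation")
    return merged


# metadata of obs1 and obs2 as defined in this notebook
meta1 = {"name": "obs1", "x": 0, "y": 0}
meta2 = {"name": "obs2", "x": 0, "y": 0}

print(resolve_metadata(meta1, meta2, overlap="use_left")["name"])   # obs1
print(resolve_metadata(meta1, meta2, overlap="use_right")["name"])  # obs2
```

Calling `resolve_metadata(meta1, meta2)` without an `overlap` option raises a `ValueError`, mirroring the default behaviour shown earlier in this section.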


#### all combinations

In [24]:
# observation 5
df = pd.DataFrame({'measurements':np.random.randint(0,10,5),
                   'filter':np.ones(5)}, index=pd.date_range('2020-1-1', '2020-1-5'))
o5 = hpd.Obs(df, name='obs5',x=100, y=0)
o5

Unnamed: 0,measurements,filter
2020-01-01,9,1.0
2020-01-02,8,1.0
2020-01-03,4,1.0
2020-01-04,5,1.0
2020-01-05,3,1.0


In [25]:
# observation 6
df = pd.DataFrame({'measurements':np.concatenate([o5['measurements'].values[-1:],np.random.randint(0,10,4)]),
                   'remarks':['', '', '', 'unreliable', '']}, index=pd.date_range('2020-1-4', '2020-1-8'))
o6 = hpd.Obs(df, name='obs6',x=0, y=100)
o6

Unnamed: 0,measurements,remarks
2020-01-04,3,
2020-01-05,5,
2020-01-06,7,
2020-01-07,9,unreliable
2020-01-08,5,


In [30]:
pd.concat([o5,o6], axis=1)

Unnamed: 0,measurements,filter,measurements.1,remarks
2020-01-01,9.0,1.0,,
2020-01-02,8.0,1.0,,
2020-01-03,4.0,1.0,,
2020-01-04,5.0,1.0,3.0,
2020-01-05,3.0,1.0,5.0,
2020-01-06,,,7.0,
2020-01-07,,,9.0,unreliable
2020-01-08,,,5.0,


In [28]:
o5.merge_observation(o6, overlap='use_right')

INFO:hydropandas.observation:new observation has a different time series
INFO:hydropandas.observation:merge time series


Unnamed: 0,measurements,remarks,filter
2020-01-01,9,,1.0
2020-01-02,8,,1.0
2020-01-03,4,,1.0
2020-01-04,3,,1.0
2020-01-05,5,,1.0
2020-01-06,7,,
2020-01-07,9,unreliable,
2020-01-08,5,,
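
The result above contains the union of the columns of both observations. A comparable outcome can be reproduced with the pandas method `DataFrame.combine_first`, shown here as an illustration only; it is an assumption that this resembles the behaviour of `merge_observation`, not a description of its implementation. The values are copied from the `o5` and `o6` outputs above.

```python
import pandas as pd

# values copied from the o5 and o6 outputs in this notebook
left = pd.DataFrame(
    {"measurements": [9, 8, 4, 5, 3], "filter": [1.0] * 5},
    index=pd.date_range("2020-01-01", "2020-01-05"),
)
right = pd.DataFrame(
    {"measurements": [3, 5, 7, 9, 5], "remarks": ["", "", "", "unreliable", ""]},
    index=pd.date_range("2020-01-04", "2020-01-08"),
)

# values of `right` take priority; gaps are filled from `left`
merged = right.combine_first(left)
print(merged)
```

Columns that exist in only one of the frames are kept, with missing values at the time steps the other frame contributes. The column order may differ from the hydropandas output.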


## Merging observation collections<a id=mergeoc></a>