# Merging processed data
This notebook uses the data produced by the previous notebook, but you do not need to run the previous notebook for this one to work.

In [8]:
import gnssvod as gv
import pandas as pd
import xarray as xr
import numpy as np
import glob
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
import matplotlib.dates as mdates

## Merge
In the previous notebook, we processed raw RINEX observation files individually for each receiver and saved the results in corresponding NetCDF files.

In a GNSS-VOD setup, receivers are analysed in pairs. One receiver is placed above the forest canopy and provides a clear-sky reference. The other is placed below the canopy and measures the attenuation caused by the forest.

Here we merge the data from these two receivers before making any plots. We also save the merged data in fixed temporal chunks (for example, daily chunks). This makes the data easier to handle and removes any dependence on the temporal chunks in which the data was originally logged (here, hourly log files spanning xx:07 to xx+1:06).
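To make the re-chunking idea concrete, here is a small pandas-only sketch. It is not gnssvod's internal code, and the epochs and signal values are made up. Two hypothetical hourly log files span xx:07 to xx+1:06. Grouping with `pd.cut` over left-closed daily intervals assigns every epoch to its calendar day, even when a log file straddles midnight.

```python
import pandas as pd

# Made-up epochs from two hourly log files, each spanning xx:07 to xx+1:06
file_a = pd.date_range("2024-01-10 22:07", "2024-01-10 23:06", freq="5min")
file_b = pd.date_range("2024-01-10 23:07", "2024-01-11 00:06", freq="5min")  # straddles midnight
epochs = file_a.append(file_b)
logged = pd.DataFrame({"S1C": 40.0}, index=epochs)  # toy signal values

# Fixed daily chunks, closed on the left as in this notebook
days = pd.interval_range(start=pd.Timestamp("2024-01-10"), periods=2, freq="D", closed="left")

# Assign each epoch to its day and split the data into one chunk per day
chunks = dict(list(logged.groupby(pd.cut(logged.index, days), observed=True)))
for interval, chunk in chunks.items():
    print(interval, len(chunk))
```

The second log file contributes 11 epochs to 2024-01-10 and 1 epoch to 2024-01-11. That split is exactly what fixed chunks are meant to take care of.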

### gv.gather_stations()
This function does several things:
- It reads processed observation files saved in NetCDF format (the output of "preprocess").
- It combines data from the various receivers/stations according to user-specified pairing rules.
- It only processes data falling within the requested time intervals.
- It returns and/or saves the paired data in the temporal chunks defined by the time intervals.
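The pairing and time-filtering steps in this list can be sketched with plain pandas. `pair_receivers` and `toy_receiver` are hypothetical names, not gnssvod functions. The sketch works on in-memory DataFrames indexed by `(Epoch, SV)`, skips the NetCDF reading step, and assumes left-closed intervals as used in this notebook.

```python
import pandas as pd

def pair_receivers(frames, pairings, interval):
    """Hypothetical helper: stack each (reference, below-canopy) pair under a
    'Station' index level and keep only epochs inside a left-closed interval."""
    paired = {}
    for case, (ref, grd) in pairings.items():
        # dict keys become the new outer 'Station' level; Epoch/SV names are kept
        df = pd.concat({ref: frames[ref], grd: frames[grd]}, names=["Station"])
        epochs = df.index.get_level_values("Epoch")
        inside = (epochs >= interval.left) & (epochs < interval.right)
        paired[case] = df[inside]
    return paired

def toy_receiver(values):
    # Hypothetical toy data: two epochs x two satellites, indexed by (Epoch, SV)
    idx = pd.MultiIndex.from_product(
        [pd.to_datetime(["2024-01-10 14:17:15", "2024-01-11 09:00:00"]), ["E05", "R07"]],
        names=["Epoch", "SV"])
    return pd.DataFrame({"S1C": values}, index=idx)

frames = {"MACROCOSM-5": toy_receiver([45.0, 46.0, 44.5, 45.5]),
          "MACROCOSM-2": toy_receiver([38.0, 40.0, 37.5, 39.0])}
pairings = {"MACROCOSM": ("MACROCOSM-5", "MACROCOSM-2")}
day = pd.Interval(pd.Timestamp("2024-01-10"), pd.Timestamp("2024-01-11"), closed="left")

result = pair_receivers(frames, pairings, day)
print(result["MACROCOSM"])
```

Only the epoch on 2024-01-10 survives the filter. The result carries the same `Station`/`Epoch`/`SV` index levels as gnssvod's output.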

#### Specifying input files

In [2]:
# first, indicate where to find the data for each receiver
pattern = {'MACROCOSM-5': 'data_pr/nc/MACROCOSM-5*.nc',
           'MACROCOSM-2': 'data_pr/nc/MACROCOSM-2*.nc'}
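The values in `pattern` contain a `*` wildcard, so they presumably behave like standard glob patterns. This is an assumption; we have not checked how `gather_stations` expands them. The sketch below creates dummy files with hypothetical names in a temporary directory and shows which files each pattern would match under `glob.glob`.

```python
import glob
import os
import tempfile

# Throwaway directory with dummy files (hypothetical names) so the example runs anywhere
with tempfile.TemporaryDirectory() as root:
    nc_dir = os.path.join(root, "data_pr", "nc")
    os.makedirs(nc_dir)
    for name in ["MACROCOSM-5_hour14.nc", "MACROCOSM-2_hour14.nc", "MACROCOSM-2_hour15.nc"]:
        open(os.path.join(nc_dir, name), "w").close()

    # Same wildcard patterns as above, rooted in the temporary directory
    pattern = {"MACROCOSM-5": os.path.join(nc_dir, "MACROCOSM-5*.nc"),
               "MACROCOSM-2": os.path.join(nc_dir, "MACROCOSM-2*.nc")}
    found = {station: sorted(os.path.basename(f) for f in glob.glob(p))
             for station, p in pattern.items()}

print(found)
```

A pattern like `MACROCOSM-2*.nc` would also match files from a hypothetical `MACROCOSM-20` receiver. Station names should therefore not be prefixes of one another.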

#### Specifying time interval
Next, we define the time interval to process and the temporal chunks we want for the output data.

Here we process all data from '10-01-2024' to '11-01-2024', i.e. 2 days starting on '10-01-2024' (dates written as day-month-year).

In [3]:
startday = pd.to_datetime('10-01-2024', format='%d-%m-%Y')
timeintervals=pd.interval_range(start=startday, periods=2, freq='D', closed='left')
timeintervals

IntervalIndex([[2024-01-10 00:00:00, 2024-01-11 00:00:00), [2024-01-11 00:00:00, 2024-01-12 00:00:00)], dtype='interval[datetime64[ns], left]')

Using the timeintervals above will save/return the results in chunks of 1 day. If we wanted the results in hourly chunks, we could have written instead:

`timeintervals=pd.interval_range(start=startday, periods=48, freq='H', closed='left')`
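To see how epochs are assigned to chunks, the sketch below uses `pd.cut` on a few made-up epochs. It compares the daily intervals above with the hourly alternative. For the hourly case it passes `pd.offsets.Hour(1)` instead of the `'H'` string, which sidesteps the alias deprecation in recent pandas versions. Because the intervals are closed on the left, midnight belongs to the following day. An epoch outside every interval gets code -1.

```python
import pandas as pd

startday = pd.to_datetime("10-01-2024", format="%d-%m-%Y")
daily = pd.interval_range(start=startday, periods=2, freq="D", closed="left")
# A DateOffset avoids relying on the 'H' (older) vs 'h' (newer) frequency alias
hourly = pd.interval_range(start=startday, periods=48, freq=pd.offsets.Hour(1), closed="left")

# Made-up epochs, including two exactly on interval boundaries
epochs = pd.to_datetime(["2024-01-10 14:17:15", "2024-01-10 23:59:59",
                         "2024-01-11 00:00:00", "2024-01-12 00:00:00"])

# Position of the chunk each epoch falls in; -1 means outside every interval
day_codes = pd.cut(epochs, daily).codes
hour_codes = pd.cut(epochs, hourly).codes
print(day_codes, hour_codes)
```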

Now the only thing left is to define how to combine the stations, using the same dictionary keys as in 'pattern'.

In [4]:
# define how to make pairs, always give reference station first, matching the dictionary keys of 'pattern'
pairings={'MACROCOSM':('MACROCOSM-5','MACROCOSM-2')}

# run function
out = gv.gather_stations(pattern,pairings,timeintervals)

Processing MACROCOSM
Listing the files matching with the interval
Found 1 files for MACROCOSM-5
Reading
Found 1 files for MACROCOSM-2
Reading
Concatenating




The result is a dictionary with one entry per pairing key. Each entry is a list of `(pd.Interval, pd.DataFrame)` tuples, one per time interval:

`out = {key: [(pd.Interval, pd.DataFrame), (pd.Interval, pd.DataFrame), ...]}`

In our case, something like:

`out = {'MACROCOSM': [(Interval('2024-01-10', '2024-01-11', closed='left'), dataframe), (Interval('2024-01-11', '2024-01-12', closed='left'), dataframe)]}`
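Because each list element is an `(interval, DataFrame)` tuple, `out['MACROCOSM'][0][1]` is the DataFrame of the first day. The toy sketch below rebuilds that layout with plain pandas so the indexing can be tried without any data files. It uses boolean masks, not necessarily what gnssvod does internally, and the values are made up. In this sketch, an interval without observations still gets an entry holding an empty DataFrame.

```python
import pandas as pd

intervals = pd.interval_range(start=pd.Timestamp("2024-01-10"), periods=2, freq="D", closed="left")

# Toy paired data with the same index levels as the output shown below (values made up)
idx = pd.MultiIndex.from_tuples(
    [("MACROCOSM-5", pd.Timestamp("2024-01-10 14:17:15"), "E05"),
     ("MACROCOSM-2", pd.Timestamp("2024-01-10 16:04:30"), "R07")],
    names=["Station", "Epoch", "SV"])
paired = pd.DataFrame({"S1C": [45.0, 44.0]}, index=idx)

# Rebuild the {key: [(Interval, DataFrame), ...]} layout, one tuple per interval
epochs = paired.index.get_level_values("Epoch")
out = {"MACROCOSM": [(iv, paired[(epochs >= iv.left) & (epochs < iv.right)])
                     for iv in intervals]}

# out["MACROCOSM"][0][1] is the DataFrame of the first interval
for interval, df in out["MACROCOSM"]:
    print(interval, len(df))
```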

In [5]:
out['MACROCOSM'][0][1]

| Station | Epoch | SV | S1C | S1X | S2C | S2X | Azimuth | Elevation |
|---|---|---|---|---|---|---|---|---|
| MACROCOSM-5 | 2024-01-10 14:17:15 | C20 | | | | | -34.7 | 47.4 |
| MACROCOSM-5 | 2024-01-10 14:17:15 | C32 | | | | | 139.6 | 72.3 |
| MACROCOSM-5 | 2024-01-10 14:17:15 | C37 | | | | | -10.6 | 59.5 |
| MACROCOSM-5 | 2024-01-10 14:17:15 | E05 | | 43.1 | | | -27.2 | 35.3 |
| MACROCOSM-5 | 2024-01-10 14:17:15 | E09 | | 39.0 | | | 54.3 | 59.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| MACROCOSM-2 | 2024-01-10 16:04:30 | R07 | 44.0 | | 35.0 | | -178.5 | 62.2 |
| MACROCOSM-2 | 2024-01-10 16:04:30 | R08 | 41.9 | | 31.7 | | -44.8 | 58.3 |
| MACROCOSM-2 | 2024-01-10 16:04:30 | R09 | 31.3 | | | | 35.3 | 13.2 |
| MACROCOSM-2 | 2024-01-10 16:04:30 | R10 | 42.0 | | | | | |
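Both receivers share the `Epoch` and `SV` index levels. Selecting each `Station` with `xs` therefore lines up the reference and below-canopy observations of the same satellite at the same epoch. The sketch below uses made-up values to compute a simple per-satellite S1C difference. It only illustrates the alignment and is not gnssvod's VOD calculation.

```python
import pandas as pd

t = pd.Timestamp("2024-01-10 16:04:30")
idx = pd.MultiIndex.from_tuples(
    [("MACROCOSM-5", t, "R07"), ("MACROCOSM-5", t, "R08"), ("MACROCOSM-5", t, "R09"),
     ("MACROCOSM-2", t, "R07"), ("MACROCOSM-2", t, "R08")],
    names=["Station", "Epoch", "SV"])
paired = pd.DataFrame({"S1C": [48.0, 47.5, 40.0, 44.0, 41.9]}, index=idx)  # toy values

ref = paired.xs("MACROCOSM-5", level="Station")  # above canopy (reference)
grd = paired.xs("MACROCOSM-2", level="Station")  # below canopy

# Subtraction aligns on (Epoch, SV); R09 has no below-canopy match, so it becomes NaN
diff = ref["S1C"] - grd["S1C"]
print(diff)
```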


#### Specifying output destination
Instead of only returning the result, we can also specify where to save it. It can also be useful to keep only the variables we need, to reduce file size.
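The `keepvars` list passed to `gather_stations` in the next cell contains names such as `'S1'`, while the columns in the paired output are `S1C`, `S1X`, and so on. We have not checked how gnssvod matches these names. The sketch below only shows one way to keep columns by prefix in pandas. The row values and the extra `S5X` column are hypothetical.

```python
import re
import pandas as pd

# One made-up row with the column names from the paired output, plus a hypothetical S5X
df = pd.DataFrame([[41.9, 43.1, 31.7, 39.0, 45.2, -44.8, 58.3]],
                  columns=["S1C", "S1X", "S2C", "S2X", "S5X", "Azimuth", "Elevation"])
keepvars = ["S1", "S2", "Azimuth", "Elevation"]

# Keep every column whose name starts with one of the requested names
prefix_regex = "^(?:" + "|".join(map(re.escape, keepvars)) + ")"
slim = df.filter(regex=prefix_regex)
print(list(slim.columns))
```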

In [12]:
# define where to save output data, matching the dictionary keys in 'pairings'
outputdir = {'MACROCOSM':'data_pr/MACROCOSM_paired/'}
# define which variables to keep
keepvars = ['S1','S2','Azimuth','Elevation']

# run function
out = gv.gather_stations(pattern,pairings,timeintervals,keepvars=keepvars,outputdir=outputdir)

Processing MACROCOSM
Listing the files matching with the interval
Found 1 files for MACROCOSM-5
Reading
Found 1 files for MACROCOSM-2
Reading
Concatenating
Saving files for MACROCOSM in data_pr/MACROCOSM_paired/
Saved 3982 obs in MACROCOSM_20240110000000_20240111000000.nc
No data for timestep 20240111000000_20240112000000, no file saved




As requested, the results were saved as daily files, regardless of how the input files were split in time. The file names are generated from the key of the `pairings` argument (here 'MACROCOSM') and the start and end of each time interval. No file was written for 2024-01-11 because that interval contained no data.
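When reloading the paired data later, it helps to know which file holds a given day. The naming pattern visible in the log above can be reproduced with a short helper. `chunk_filename` is a hypothetical name inferred from the log output, not a gnssvod function.

```python
import pandas as pd

def chunk_filename(case, interval, fmt="%Y%m%d%H%M%S"):
    """Hypothetical helper mirroring the file names printed in the log above."""
    return f"{case}_{interval.left.strftime(fmt)}_{interval.right.strftime(fmt)}.nc"

startday = pd.to_datetime("10-01-2024", format="%d-%m-%Y")
timeintervals = pd.interval_range(start=startday, periods=2, freq="D", closed="left")
names = [chunk_filename("MACROCOSM", iv) for iv in timeintervals]
print(names)
```

Only the first name corresponds to a file on disk in this run, since the second interval had no data.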