# Demonstration notebook for processing raw RINEX data
In this notebook, we process some example RINEX files to demonstrate gnssvod.

In [1]:
import gnssvod as gv



## gv.preprocess()
The main pre-processing function is preprocess(). This function does several things:
- It reads RINEX observation files into pandas data frames.
- It can aggregate the raw data to a lower temporal rate if specified.
- By default, it downloads orbit and clock files for the corresponding days from the ESA GSSC server.
- From the orbit and clock files, it calculates the azimuth and elevation for each measurement.
- It can save each processed file as a NetCDF file in the outputdir folder or return the results as a dictionary.

### Specifying input files
The function reads RINEX observation files only. Such files typically end with the extension '.yyO', where yy is the last two digits of the year. The function can process a single file, a group of files, or several groups of files corresponding to several receivers, as shown in the examples below. In every case, the files are selected by passing a pattern as the first argument to the function.
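
As a rough illustration of the naming convention, the sketch below derives the expected extension for a given year and checks whether a filename looks like an observation file. The helper names `obs_extension` and `looks_like_obs_file` are hypothetical and are not part of gnssvod.

```python
import re

def obs_extension(year: int) -> str:
    """Return the RINEX observation extension for a year, e.g. 2024 -> '.24O'."""
    return f".{year % 100:02d}O"

# A dot, two digits, then 'O' (upper or lower case) at the end of the name.
_OBS_SUFFIX = re.compile(r"\.\d{2}[oO]$")

def looks_like_obs_file(filename: str) -> bool:
    """True if the filename ends with a '.yyO' style extension."""
    return _OBS_SUFFIX.search(filename) is not None

print(obs_extension(2024))
print(looks_like_obs_file("data_pr/MACROCOSM-5_raw_202401101416.24O"))
```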

### Specifying output destinations
Results are saved to a NetCDF file when an output directory is specified and/or returned as a dictionary when "outputresult=True" is passed.

To begin with, let's read a single file from the example data.

In [2]:
pattern = {'MACROCOSM-5':'data_pr/MACROCOSM-5_raw_202401101416.24O'}
result = gv.preprocess(pattern,outputresult=True)

data_pr/MACROCOSM-5_raw_202401101416.24O exists | Reading...
Observation file  data_pr/MACROCOSM-5_raw_202401101416.24O  is read in 3.33 seconds.
Processing 173896 individual observations
0.2
Calculating Azimuth and Elevation
GFZ0MGXRAP_20240100000_01D_05M_ORB.SP3 exists | Reading...
GFZ0MGXRAP_20240100000_01D_05M_ORB.SP3 file is read in 0.24 seconds
GFZ0MGXRAP_20240100000_01D_30S_CLK.CLK exists | Reading...
GFZ0MGXRAP_20240100000_01D_30S_CLK.CLK file is read in 1.57 seconds
SP3 interpolation is done in 35.05 seconds


The default logs indicate how many observations were read from the file. The first time you run the script, they also show that orbit and clock files were downloaded.

If you process very recent data (less than 3 days old), the orbit and clock files may not be available on the ESA server yet, in which case the function fails with an error.
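
The orbit and clock filenames in the log encode the day as a four-digit year followed by the day of year. The sketch below rebuilds those names for a given date. The pattern is inferred only from the log lines above (the `GFZ0MGXRAP` product), and `product_names` is a hypothetical helper, not a gnssvod function.

```python
from datetime import date

def product_names(day: date) -> tuple[str, str]:
    """Build orbit (SP3) and clock (CLK) filenames following the pattern seen in the logs."""
    # Four-digit year, zero-padded day of year, then hour and minute (0000).
    stamp = f"{day:%Y%j}0000"
    orbit = f"GFZ0MGXRAP_{stamp}_01D_05M_ORB.SP3"
    clock = f"GFZ0MGXRAP_{stamp}_01D_30S_CLK.CLK"
    return orbit, clock

orbit, clock = product_names(date(2024, 1, 10))
print(orbit)  # GFZ0MGXRAP_20240100000_01D_05M_ORB.SP3
print(clock)  # GFZ0MGXRAP_20240100000_01D_30S_CLK.CLK
```

Checking the log for these names is a quick way to confirm which days of orbit data were used for a given observation file.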

The result returned by the function is a dictionary that maps each name given in the pattern to a list of Observation objects.

In [3]:
result

{'MACROCOSM-5': [<gnssvod.io.io.Observation at 0x153334d8d780>]}

Since we processed one file, there is only one Observation object in the list. Let us access this first and unique item.

In [4]:
obs = result['MACROCOSM-5'][0]
obs

<gnssvod.io.io.Observation at 0x153334d8d780>

Observation objects are custom classes introduced in the `gnsspy` package by Mustafa Serkan Işık and Volkan Özbey. A significant number of base functions in `gnssvod` are based on gnsspy.

Observation objects contain the following properties:
- obs.filename          = the name of the source file
- obs.epoch             = a datetime indicating the day at the start of the record
- obs.observation       = a pandas data frame containing all measurements
- obs.approx_position   = the approximate receiver position as provided in the RINEX file [X,Y,Z]
- obs.receiver_type     = the receiver type if provided in the RINEX file
- obs.antenna_type      = the antenna type if provided in the RINEX file
- obs.interval          = the measurement interval in seconds
- obs.receiver_clock    = the receiver clock if provided in the RINEX file
- obs.version           = the version of the RINEX file
- obs.observation_types = the observation types reported as columns in obs.observation

Let's look at the data.

In [7]:
obs.observation

Epoch,SV,C1C,C1X,C2C,C2I,C2X,C7I,C7X,D1C,D1X,D2C,...,L7X,S1C,S1X,S2C,S2I,S2X,S7I,S7X,Azimuth,Elevation
2024-01-10 14:17:20.199999,G10,2.179389e+07,,,,2.179390e+07,,,-4167.363,,,...,,49.0,,,,40.0,,,,
2024-01-10 14:17:20.199999,G26,2.081376e+07,,,,2.081377e+07,,,746.197,,,...,,48.0,,,,41.0,,,,
2024-01-10 14:17:20.199999,G28,2.173687e+07,,,,2.173688e+07,,,-1773.103,,,...,,45.0,,,,41.0,,,,
2024-01-10 14:17:20.199999,G31,2.166509e+07,,,,2.166509e+07,,,-1046.497,,,...,,47.0,,,,38.0,,,,
2024-01-10 14:17:20.199999,G32,2.115887e+07,,,,2.115887e+07,,,-826.085,,,...,,47.0,,,,39.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-01-10 15:04:57.801999,E25,,2.527410e+07,,,,,2.527412e+07,,247.015,,...,1.017684e+08,,41.0,,,,,36.0,,
2024-01-10 15:04:57.801999,E31,,2.740893e+07,,,,,2.740894e+07,,-3780.169,,...,,,35.0,,,,,29.0,,
2024-01-10 15:04:57.801999,C20,,,,2.276216e+07,,,,,,,...,,,,,46.0,,,,,
2024-01-10 15:04:57.801999,C32,,,,2.352408e+07,,,,,,,...,,,,,46.0,,,,,


The pandas data frame has a MultiIndex that contains both Epoch and SV as indices. The columns correspond to:
- C# = Pseudorange from the receiver to the satellite, in meters
- L# = Carrier phase, in cycles
- D# = Doppler, in Hz
- S# = Carrier-to-noise density C/N$_0$, in dB-Hz (receiver-dependent)

The digit in each name (S1, S2, etc.) indicates the corresponding GNSS frequency band.
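
In RINEX 3 files, each three-character column name combines an observation type, a band number, and a tracking attribute. The sketch below splits a name into those parts. `split_obs_code` is a hypothetical helper written for illustration, and the descriptions in `OBS_TYPES` simply restate the list above.

```python
OBS_TYPES = {
    "C": "pseudorange (m)",
    "L": "carrier phase (cycles)",
    "D": "Doppler (Hz)",
    "S": "carrier-to-noise density",
}

def split_obs_code(code: str) -> dict:
    """Split a code such as 'S1C' (or a two-character code such as 'S1') into its parts."""
    if len(code) not in (2, 3) or code[0] not in OBS_TYPES or not code[1].isdigit():
        raise ValueError(f"not an observation code: {code!r}")
    return {
        "type": OBS_TYPES[code[0]],
        "band": int(code[1]),
        # The attribute (tracking mode or channel) is absent in two-character codes.
        "attribute": code[2] if len(code) == 3 else None,
    }

print(split_obs_code("S1C"))
print(split_obs_code("C2I"))
```

Note that the physical frequency behind a band number depends on the constellation, so the band digit alone does not identify a frequency.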

The azimuth and elevation of the satellite with respect to the receiver are expressed in degrees. Computation speed for the azimuth and elevation can vary according to your hardware. Most of the time is spent interpolating the orbit parameters to the time stamps of each measurement. This is why it is sometimes useful to aggregate high-frequency data (here one measurement per second) to, for instance, one measurement every 15 seconds.
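
For intuition about the geometry, here is a minimal sketch of how azimuth and elevation can be derived once the receiver and satellite positions are both known as ECEF coordinates in meters. It is not the gnssvod implementation, which also has to interpolate the orbit files to each time stamp. The function names and the WGS84 fixed-point iteration are choices made for this illustration.

```python
import math

# WGS84 ellipsoid constants
A = 6378137.0            # semi-major axis (m)
F = 1 / 298.257223563    # flattening
E2 = F * (2 - F)         # first eccentricity squared

def ecef_to_geodetic(x, y, z):
    """Return geodetic latitude and longitude in radians (fixed-point iteration)."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - E2))  # initial guess, exact for zero height
    for _ in range(10):
        n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
        lat = math.atan2(z + E2 * n * math.sin(lat), p)
    return lat, lon

def azimuth_elevation(receiver, satellite):
    """Azimuth (clockwise from north, in (-180, 180]) and elevation, in degrees."""
    lat, lon = ecef_to_geodetic(*receiver)
    dx, dy, dz = (s - r for s, r in zip(satellite, receiver))
    # Rotate the line-of-sight vector into the local east-north-up frame.
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    azimuth = math.degrees(math.atan2(east, north))
    elevation = math.degrees(math.atan2(up, math.hypot(east, north)))
    return azimuth, elevation

receiver = (A, 0.0, 0.0)  # on the equator at longitude 0
print(azimuth_elevation(receiver, (A, 1.0e7, 0.0)))   # satellite due east, on the horizon
print(azimuth_elevation(receiver, (A, -1.0e7, 0.0)))  # satellite due west, on the horizon
```

With this convention, westward directions come out as negative azimuths, which is also the range seen in the Azimuth column of the processed data.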

### Resampling

We can pass "interval='15S'" to resample the data during the preprocessing. The returned data will be smaller and the calculation of the azimuths and elevations (reported as "SP3 interpolation") will be faster.

In [8]:
pattern = {'MACROCOSM-5':'data_pr/MACROCOSM-5_raw_202401101416.24O'}
result = gv.preprocess(pattern,interval='15S',outputresult=True)
# and show data frame
result['MACROCOSM-5'][0].observation

data_pr/MACROCOSM-5_raw_202401101416.24O exists | Reading...
Something awry with interval calculations
0 0.2
Observation file  data_pr/MACROCOSM-5_raw_202401101416.24O  is read in 3.37 seconds.
Processing 173896 individual observations
0.2


  obs.observation = obs.observation[subset].groupby([pd.Grouper(freq=interval, level='Epoch'),pd.Grouper(level='SV')]).mean()
  obs.interval = pd.Timedelta(interval).seconds


Calculating Azimuth and Elevation
15
GFZ0MGXRAP_20240100000_01D_05M_ORB.SP3 exists | Reading...
GFZ0MGXRAP_20240100000_01D_05M_ORB.SP3 file is read in 0.23 seconds
GFZ0MGXRAP_20240100000_01D_30S_CLK.CLK exists | Reading...
GFZ0MGXRAP_20240100000_01D_30S_CLK.CLK file is read in 1.61 seconds


  sp3_temp_resampled = sp3_temp.resample(f"{interval}S")

SP3 interpolation is done in 1.11 seconds


Epoch,SV,C1C,C1X,C2C,C2I,C2X,C7I,C7X,D1C,D1X,D2C,...,L7X,S1C,S1X,S2C,S2I,S2X,S7I,S7X,Azimuth,Elevation
2024-01-10 14:17:15,C20,,,,2.297873e+07,,,,,,,...,,,,,47.000000,,,,-34.739211,47.399714
2024-01-10 14:17:15,C32,,,,2.200606e+07,,,,,,,...,,,,,47.953488,,,,139.563362,72.284940
2024-01-10 14:17:15,C37,,,,2.257064e+07,,,,,,,...,,,,,48.465116,,,,-10.645488,59.519913
2024-01-10 14:17:15,E05,,2.412110e+07,,,,,2.412081e+07,,760.307673,,...,9.712434e+07,,43.142857,,,,,38.607143,-27.209961,35.252094
2024-01-10 14:17:15,E09,,2.410676e+07,,,,,,,-787.507102,,...,,,38.959184,,,,,,54.342066,59.184168
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-01-10 15:04:45,R07,1.989512e+07,,,,,,,-379.621071,,,...,,42.928571,,,,,,,-0.082711,79.961121
2024-01-10 15:04:45,R08,2.256099e+07,,2.256011e+07,,,,,2516.342393,,1956.755556,...,,36.892857,,28.000000,,,,,-24.759251,29.324852
2024-01-10 15:04:45,R09,2.223900e+07,,2.223981e+07,,,,,-3952.130071,,-3073.648412,...,,35.642857,,33.941176,,,,,12.628861,32.179252
2024-01-10 15:04:45,R10,2.095269e+07,,,,,,,-2196.010600,,,...,,37.840000,,,,,,,,


Orbit and clock files are not downloaded again if they already exist. There are now fewer rows in the data frame.
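
The reduction in rows comes from averaging all samples that fall into the same 15-second bin for the same satellite. The warning text in the log suggests gnssvod does this with a pandas groupby followed by a mean. The sketch below reproduces the idea in plain Python on made-up samples. `floor_epoch` and `bin_average` are illustrative names only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def floor_epoch(t: datetime, interval: timedelta) -> datetime:
    """Round a timestamp down to the start of its bin.

    Bins line up with midnight for intervals that divide a day evenly.
    """
    return t - (t - datetime.min) % interval

def bin_average(records, interval=timedelta(seconds=15)):
    """Average values per (bin start, satellite).

    records: iterable of (epoch, sv, value) tuples; value is None when missing.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for epoch, sv, value in records:
        if value is None:
            continue  # ignore missing samples, as a NaN-aware mean would
        acc = sums[(floor_epoch(epoch, interval), sv)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

# Made-up samples for illustration only.
records = [
    (datetime(2024, 1, 10, 14, 17, 20), "G10", 49.0),
    (datetime(2024, 1, 10, 14, 17, 25), "G10", 47.0),
    (datetime(2024, 1, 10, 14, 17, 31), "G10", 45.0),
    (datetime(2024, 1, 10, 14, 17, 20), "G26", 48.0),
]
averaged = bin_average(records)
for (epoch, sv), value in averaged.items():
    print(epoch, sv, value)
```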

## Batch processing
We now use the preprocessing function to process many files and save the outputs as NetCDF files (instead of returning them as objects). If we were to process several hundred files, the system would likely not have sufficient memory to hold all of the outputs, so it makes sense to save the processed data as NetCDF files.

### Specifying several groups of files
Instead of specifying just one file, we use the dictionary to specify a pattern. All files matching the pattern will be processed. We can process several groups of files by specifying different matching patterns (see below).
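
To preview which files a pattern dictionary would select before launching a long batch run, the patterns can be expanded with the standard `glob` module. This is a sketch under the assumption that the patterns follow ordinary UNIX-style globbing. `expand_patterns` is a hypothetical helper, and the directory and file names in the demo are made up.

```python
import glob
import os
import tempfile

def expand_patterns(patterns: dict) -> dict:
    """Map each station name to the sorted list of files matching its pattern."""
    return {name: sorted(glob.glob(pat)) for name, pat in patterns.items()}

# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    for station, fname in [("tower", "a_0001.21O"), ("tower", "a_0002.21O"),
                           ("tower", "notes.txt"), ("ground", "b_0001.21O")]:
        os.makedirs(os.path.join(root, station), exist_ok=True)
        open(os.path.join(root, station, fname), "w").close()

    matched = expand_patterns({
        "tower": os.path.join(root, "tower", "*.*O"),
        "ground": os.path.join(root, "ground", "*.*O"),
    })
    counts = {name: len(files) for name, files in matched.items()}

print(counts)  # {'tower': 2, 'ground': 1}
```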

### Specifying where to save data
As with the inputs, we use a dictionary to indicate where to save data. The function will create the destination folder if it does not exist.

### Specifying a list of variables to save
For calculating GNSS-VOD, we only need the "S" variables. We can reduce the size of the saved NetCDF files by discarding the other variables. This is done with the 'keepvars' argument, which keeps only the variables matching the passed list. This argument supports UNIX-style pattern matching (e.g. 'S*' will match all variables starting with 'S').
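
To see what a given list of patterns would select, the matching can be tried out with the standard `fnmatch` module. This is only a sketch of UNIX-style matching on a list of column names. `select_variables` is a hypothetical helper, and whether gnssvod uses `fnmatch` internally is an assumption.

```python
from fnmatch import fnmatchcase

def select_variables(names, keepvars):
    """Keep the names matching at least one UNIX-style pattern (case-sensitive)."""
    return [n for n in names if any(fnmatchcase(n, pat) for pat in keepvars)]

columns = ["C1C", "L1C", "D1C", "S1C", "S2X", "S1", "S2", "Azimuth", "Elevation"]

# 'S?' matches two-character names, 'S??' matches three-character names.
print(select_variables(columns, ["S?", "S??"]))  # ['S1C', 'S2X', 'S1', 'S2']
print(select_variables(columns, ["S*"]))         # ['S1C', 'S2X', 'S1', 'S2']
```

Using `'S?'` and `'S??'` rather than `'S*'` restricts the match to two- and three-character names, which avoids accidentally keeping any longer variable that happens to start with 'S'.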

### Compression
Unless `compress=False` is passed as an argument, `gv.preprocess()` will compress all S* variables, as well as Azimuth and Elevation, when saving to NetCDF. These variables are encoded as Int16 with a scale factor of 0.1. The decoding is automatically applied when reading the data with xarray.
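
The effect of this encoding on precision can be sketched in a few lines: each value is divided by the scale factor, rounded to an integer, and multiplied back on reading, so stored values are accurate to within 0.05. The fill value used for missing data below is an assumption made for the illustration and is not taken from gnssvod.

```python
SCALE = 0.1     # scale factor stated in the text
FILL = -32768   # assumed marker for missing data (not taken from gnssvod)

def encode(value):
    """Pack a float (or None for missing) into a 16-bit integer."""
    if value is None:
        return FILL
    packed = round(value / SCALE)
    if not -32767 <= packed <= 32767:
        raise OverflowError(f"{value} does not fit in Int16 at scale {SCALE}")
    return packed

def decode(packed):
    """Recover the (rounded) float, or None for the fill value."""
    return None if packed == FILL else packed * SCALE

for original in (47.953488, -34.739211, 72.28494, None):
    print(original, "->", encode(original), "->", decode(encode(original)))
```

With a scale factor of 0.1, the representable range is roughly ±3276.7, which comfortably covers C/N$_0$ values and angles in degrees.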

In [7]:
# use gnssvod to batch process the RINEX observation files
# (files with extension .yyO for each station)
# pattern = {'choice_of_name_for_station1':'pattern to match (UNIX-style)',
#            'choice_of_name_for_station2':'pattern to match (UNIX-style)',
#            ...}
#
pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/*.*O',
           'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/rinex/*.*O'}
outputdir = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/nc/',
             'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/nc/'}
# which variables should be kept
keepvars = ['S?','S??']

gv.preprocess(pattern,interval='15S',keepvars=keepvars,outputdir=outputdir)



data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104290006.21O exists | Reading...
Observation file  data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104290006.21O  is read in 2.42 seconds.
Processing 109702 individual observations
Calculating Azimuth and Elevation
GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 exists | Reading...
GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 file is read in 0.27 seconds
GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 exists | Reading...
GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 file is read in 0.28 seconds
GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK exists | Reading...
GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK file is read in 1.77 seconds
GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK exists | Reading...
GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK file is read in 2.06 seconds
SP3 interpolation is done in 5.37 seconds
Saved 7370 individual observations in Reach_Dav2_Twr-raw_202104290006.nc
data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104290106.21O exists | Reading...
Observation file  d



Observation file  data_RINEX2.11/Dav1_Grnd/rinex/Reach_Dav1_Grnd-raw_202104290006.21O  is read in 2.44 seconds.
Processing 110793 individual observations
Calculating Azimuth and Elevation
GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 exists | Reading...
GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 file is read in 0.43 seconds
GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 exists | Reading...
GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 file is read in 0.28 seconds
GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK exists | Reading...
GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK file is read in 1.74 seconds
GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK exists | Reading...
GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK file is read in 1.72 seconds
SP3 interpolation is done in 5.25 seconds
Saved 7493 individual observations in Reach_Dav1_Grnd-raw_202104290006.nc
data_RINEX2.11/Dav1_Grnd/rinex/Reach_Dav1_Grnd-raw_202104290106.21O exists | Reading...
Observation file  data_RINEX2.11/Dav1_Grnd/rinex/Reach_Dav1_Grnd-raw_202104290106.21O  is read in 2.

### Skipping existing files by default
The preprocess function will scan the destination folder for existing NetCDF files. Input files that have already been processed are skipped unless overwrite=True is passed.
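
The skipping logic can be pictured as a comparison of file stems: in the log above, each input such as `Reach_Dav2_Twr-raw_202104290006.21O` is saved as `Reach_Dav2_Twr-raw_202104290006.nc`. The sketch below applies that idea to lists of paths. `pending_inputs` is a hypothetical helper, and matching by stem is an assumption drawn from the log, not a description of gnssvod's internals.

```python
from pathlib import PurePosixPath

def pending_inputs(input_files, existing_outputs, overwrite=False):
    """Return the input files that still need processing."""
    if overwrite:
        return list(input_files)
    # Compare names without directory and without the final extension.
    done = {PurePosixPath(p).stem for p in existing_outputs}
    return [f for f in input_files if PurePosixPath(f).stem not in done]

inputs = [
    "rinex/Reach_Dav2_Twr-raw_202104290006.21O",
    "rinex/Reach_Dav2_Twr-raw_202104290106.21O",
]
outputs = ["nc/Reach_Dav2_Twr-raw_202104290006.nc"]

print(pending_inputs(inputs, outputs))
print(pending_inputs(inputs, outputs, overwrite=True))
```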

Here, because the destination folder was empty, a user warning appears in the log above but can be ignored ("Could not find any files matching the pattern data_RINEX2.11/Dav2_Twr/nc/*.nc").