# Processing raw RINEX data
In this example, we process a few sample RINEX files to demonstrate gnssvod.

## Running the tutorials
Once gnssvod is installed in your environment, you can download the tutorials from the command line:
```bash
git clone --depth 1 --filter=blob:none --sparse https://github.com/vincenthumphrey/gnssvod.git
cd gnssvod
git sparse-checkout set docs/source/examples
```

Alternatively, [download the repository as a ZIP](https://github.com/vincenthumphrey/gnssvod/archive/refs/heads/main.zip), unzip it, and navigate to `docs/source/examples`.

## Preprocessing
The main preprocessing function is {py:func}`gnssvod.preprocess`. This function can do several things:
- It reads RINEX observation files into pandas data frames.
- It can aggregate the raw data to a lower temporal rate, if specified.
- It automatically downloads the orbit and clock files for the corresponding days from the ESA GSSC server.
- From the orbit and clock files, it calculates the azimuth and elevation of the satellite for each measurement.
- It can save each processed file as a NetCDF file in the output directory (`outputdir`) and/or return the results as a dictionary.
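The azimuth/elevation step relies on standard geometry. The receiver-to-satellite vector is rotated from Earth-centred Earth-fixed (ECEF) coordinates into a local east-north-up (ENU) frame at the receiver. The sketch below makes three assumptions. First, the satellite's ECEF position at the measurement epoch is already known (in gnssvod it comes from interpolating the orbit files). Second, the receiver position is given in ECEF, like the approximate position stored in the RINEX header. Third, it uses the WGS84 ellipsoid. It ignores refinements such as Earth rotation during signal travel, so it illustrates the principle and is not gnssvod's actual code. The function names are hypothetical.

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                  # semi-major axis, m
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def ecef_to_geodetic_lat_lon(x, y, z, iterations=5):
    """Return geodetic latitude and longitude (radians) of an ECEF point.

    Simple fixed-point iteration; not valid exactly at the poles.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - WGS84_E2))  # initial guess (height = 0)
    for _ in range(iterations):
        n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - WGS84_E2 * n / (n + h)))
    return lat, lon


def azimuth_elevation(receiver_xyz, satellite_xyz):
    """Azimuth (degrees clockwise from north, 0-360) and elevation (degrees)."""
    lat, lon = ecef_to_geodetic_lat_lon(*receiver_xyz)
    dx, dy, dz = (s - r for s, r in zip(satellite_xyz, receiver_xyz))
    # Rotate the line-of-sight vector into the local east-north-up frame
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    azimuth = math.degrees(math.atan2(east, north)) % 360
    elevation = math.degrees(math.atan2(up, math.hypot(east, north)))
    return azimuth, elevation


receiver = (WGS84_A, 0.0, 0.0)  # on the equator at longitude 0
print(azimuth_elevation(receiver, (WGS84_A, 1.0e7, 0.0)))  # due east, on the horizon
```

Because the orbit interpolation has to produce a satellite position for every row, the cost of this step grows with the number of measurements.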

### Specifying input files
The function exclusively reads RINEX observation files. Such files typically end with the extension `.yyO`, where `yy` is the last two digits of the year. The function can process a single file, a group of files, or several groups of files corresponding to several receivers, as shown in the examples below. All of this is controlled by the first argument: a dictionary that maps a station name of your choice to a file pattern.
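To illustrate the `.yyO` naming convention, the sketch below extracts the year from the extension. For RINEX 2 "short names" of the form `ssssdddf.yyO` (4-character station, 3-digit day of year, session character), it also recovers the calendar date. The function `describe_obs_file` is hypothetical and not part of gnssvod. Mapping two-digit years 80-99 to the 1900s and 00-79 to the 2000s follows the usual RINEX 2 convention.

```python
import os
import re
from datetime import date, timedelta
from typing import Optional

# ".yyO" or ".yyo": two-digit year followed by the observation-file letter
OBS_EXTENSION = re.compile(r"\.(\d{2})[oO]$")
# RINEX 2 short name "ssssdddf.yyO": station, day of year, session character
SHORT_NAME = re.compile(r"^(\w{4})(\d{3})(\w)\.\d{2}[oO]$")


def describe_obs_file(filename: str) -> Optional[dict]:
    """Return year, station and date parsed from a RINEX observation filename.

    Returns None if the name does not end in .yyO / .yyo.
    """
    name = os.path.basename(filename)
    ext = OBS_EXTENSION.search(name)
    if ext is None:
        return None
    yy = int(ext.group(1))
    year = 1900 + yy if yy >= 80 else 2000 + yy
    info = {"year": year, "station": None, "date": None}
    short = SHORT_NAME.match(name)
    if short is not None:  # only short names carry the day of year
        station, doy, _session = short.groups()
        info["station"] = station
        info["date"] = date(year, 1, 1) + timedelta(days=int(doy) - 1)
    return info


print(describe_obs_file("data_RINEX2.11/MtB_Twr/rinex/SEPT101a.25o"))
```

For the example file `SEPT101a.25o`, this gives day 101 of 2025, i.e. 11 April 2025. Long receiver-generated names such as `Reach_Dav2_Twr-raw_202104282106.21O` only yield the year.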

### Specifying output destinations
Results are saved as NetCDF files when an output directory is specified with `outputdir`, and/or returned as a dictionary when `outputresult=True` is passed.

To begin with, let's read a single file from the example data.

```python
import gnssvod as gv
```

```python
# Other example files (uncomment one line to use it instead):
#pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O'}
#pattern = {'Happy':'data_RINEX3.03/Aspen_happy/rinex/RM2Happy_raw_20250820212818.25O'}
pattern = {'MtB_Twr':'data_RINEX2.11/MtB_Twr/rinex/SEPT101a.25o'}
result = gv.preprocess(pattern, outputresult=True)
```

The logs should indicate how many observations were read from the file.

The logs also show that some orbit files were downloaded. Orbit files are necessary to calculate the azimuth and elevation of the satellites. A temporary folder is automatically created to store and process those orbit files, and it is deleted afterwards.

If you process very recent data (less than 3 days old), the orbit and clock files may not yet be available on the ESA server, in which case processing will fail with an error.
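When scheduling automatic runs, a simple date check before calling `gv.preprocess` avoids this failure. The sketch below is a hypothetical helper that applies the 3-day rule of thumb quoted above. Actual product availability depends on the product type and the server, so treat the threshold as an assumption.

```python
from datetime import date, datetime, timezone
from typing import Optional

# Rule of thumb quoted in this tutorial, not an official product latency
MIN_AGE_DAYS = 3


def orbit_files_probably_available(obs_day: date, today: Optional[date] = None) -> bool:
    """Return True if obs_day is at least MIN_AGE_DAYS days old (UTC)."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    return (today - obs_day).days >= MIN_AGE_DAYS


# e.g. skip files recorded in the last few days and retry them later
print(orbit_files_probably_available(date(2025, 4, 11)))
```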

The function returns a dictionary that maps each station name (the keys of `pattern`) to a list of `Observation` objects, one per processed file.
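With several files per station, it is often handy to loop over this dictionary or to stack all files of a station into one data frame. The sketch below builds a toy dictionary with the same nesting. It uses a hypothetical stand-in (`fake_observation`) that only carries the `filename` and `observation` attributes, so it runs without gnssvod or RINEX files.

```python
from types import SimpleNamespace

import pandas as pd


def fake_observation(filename, start, n_epochs):
    """Hypothetical stand-in for a gnssvod Observation (two attributes only)."""
    epochs = pd.date_range(start, periods=n_epochs, freq="15s")
    index = pd.MultiIndex.from_product([epochs, ["G01"]], names=["Epoch", "SV"])
    frame = pd.DataFrame({"S1": [40.0] * n_epochs}, index=index)
    return SimpleNamespace(filename=filename, observation=frame)


# Same nesting as the returned dictionary: station name -> list of observations
result = {
    "MtB_Twr": [
        fake_observation("file_a.25o", "2025-04-11 00:00:00", 3),
        fake_observation("file_b.25o", "2025-04-12 00:00:00", 2),
    ]
}

# Loop over stations and files
for station, obs_list in result.items():
    for obs in obs_list:
        print(station, obs.filename, len(obs.observation))

# Stack all files of a station into a single data frame
combined = {
    station: pd.concat([obs.observation for obs in obs_list])
    for station, obs_list in result.items()
}
```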

```python
result
```

Since we processed one file, the list contains only one `Observation` object. Let us access this single item.

```python
# Use the key that matches the pattern defined above
#obs = result['Dav2_Twr'][0]
#obs = result['Happy'][0]
obs = result['MtB_Twr'][0]
obs
```

`Observation` objects are instances of a custom class introduced in the [`gnsspy`](https://github.com/GNSSpy-Project/gnsspy) package by Mustafa Serkan Işık and Volkan Özbey. Many of the base functions in `gnssvod` are derived from gnsspy.


| Attribute | Description |
|-----------|-------------|
| `obs.filename` | Name of the source file |
| `obs.epoch` | Datetime indicating the day at the start of the record |
| `obs.observation` | Pandas DataFrame containing all measurements |
| `obs.approx_position` | Approximate receiver position from the RINEX file `[X, Y, Z]` |
| `obs.receiver_type` | Receiver type (if provided in the RINEX file) |
| `obs.antenna_type` | Antenna type (if provided in the RINEX file) |
| `obs.interval` | Sampling interval, in seconds |
| `obs.receiver_clock` | Receiver clock information (if provided) |
| `obs.version` | RINEX file version |
| `obs.observation_types` | Observation types reported as columns in `obs.observation` |

---

Let's look at the data.

```python
obs.observation
```

The pandas data frame has a MultiIndex with two levels, Epoch and SV. Epoch is the timestamp of the measurement as recorded in the RINEX file. SV is the satellite identifier: a constellation letter followed by the satellite's PRN number (e.g. G01).
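Because Epoch and SV are index levels rather than columns, selections go through the index. Below are a few common pandas idioms, shown on a toy frame with the same index layout. The values are made up and are not gnssvod output.

```python
import pandas as pd

# Toy data with the same (Epoch, SV) index layout; the values are made up
epochs = pd.to_datetime(["2025-04-11 00:00:00", "2025-04-11 00:00:00", "2025-04-11 00:00:15"])
index = pd.MultiIndex.from_arrays([epochs, ["G01", "E11", "G01"]], names=["Epoch", "SV"])
df = pd.DataFrame({"S1": [42.0, 38.5, 43.0]}, index=index)

g01 = df.xs("G01", level="SV")                          # time series of one satellite
mean_by_sv = df.groupby(level="SV")["S1"].mean()        # average per satellite
n_by_epoch = df.groupby(level="Epoch").size()           # satellites tracked per epoch
constellation = df.index.get_level_values("SV").str[0]  # 'G' = GPS, 'E' = Galileo, ...

print(g01)
print(mean_by_sv)
```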

The columns correspond to:
- C# = Pseudorange from the receiver to the satellite, in meters
- L# = Carrier phase, in cycles
- D# = Doppler, in Hz
- S# = Signal strength, reported as the carrier-to-noise density ratio C/N$_0$, typically in dB-Hz (the exact definition is receiver-dependent)

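The number following the letter (e.g. S1, S2) indicates the corresponding GNSS frequency band. In RINEX 3 files, a third character indicates the tracking mode or channel of the signal (e.g. S1C).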
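The column names follow a regular structure, so they can be decoded programmatically. The sketch below uses a hypothetical helper `parse_obs_code` that splits a code into measurement type, frequency band and optional RINEX 3 attribute. It only handles the C/L/D/S codes listed above. Other RINEX 2 codes such as P1 or P2 are deliberately not covered and return `None`.

```python
import re

# <type letter><band digit>[<attribute letter>], e.g. "S1", "L2", "S1C"
OBS_CODE = re.compile(r"^(?P<kind>[CLDS])(?P<band>\d)(?P<attribute>[A-Z])?$")
KINDS = {"C": "pseudorange", "L": "carrier phase", "D": "Doppler", "S": "signal strength"}


def parse_obs_code(code):
    """Return (measurement type, band number, attribute or None), or None."""
    m = OBS_CODE.match(code)
    if m is None:
        return None
    return KINDS[m["kind"]], int(m["band"]), m["attribute"]


for code in ["S1", "S2W", "L1C", "Azimuth"]:
    print(code, parse_obs_code(code))
```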

The azimuth and elevation of each satellite relative to the receiver are expressed in degrees. The time needed to compute them depends on your hardware. Most of it is spent interpolating the orbit parameters to the timestamp of each measurement. This is why it can be useful to resample high-frequency data (here, one measurement per second) to a coarser rate, for instance one measurement every 15 seconds.

### Resampling

We can pass `interval='15s'` to resample the data during preprocessing. The returned data will be smaller, and the calculation of the azimuths and elevations (reported as "SP3 interpolation" in the logs) will be faster.
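The speed-up is easy to estimate. At 1 s there are 86 400 epochs per satellite per day, while at 15 s there are only 5 760, i.e. 15 times fewer rows to interpolate orbits for. The pandas sketch below averages a toy 1 s record into 15 s bins, separately for each satellite. Averaging is an assumption made for illustration; gnssvod's own aggregation method may differ.

```python
import pandas as pd

# Toy 1-second record for two satellites over one minute (values are made up)
epochs = pd.date_range("2025-04-11 00:00:00", periods=60, freq="1s")
index = pd.MultiIndex.from_product([epochs, ["G01", "E11"]], names=["Epoch", "SV"])
df = pd.DataFrame({"S1": [float(i) for i in range(len(index))]}, index=index)

# Average into 15 s bins, separately for each satellite
resampled = df.groupby(
    [pd.Grouper(level="Epoch", freq="15s"), pd.Grouper(level="SV")]
).mean()

print(len(df), "->", len(resampled))
```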

```python
# Other example files (uncomment the matching lines to use them instead):
#pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O'}
#pattern = {'Happy':'data_RINEX3.03/Aspen_happy/rinex/RM2Happy_raw_20250820212818.25O'}
pattern = {'MtB_Twr':'data_RINEX2.11/MtB_Twr/rinex/SEPT101a.25o'}
result = gv.preprocess(pattern, interval='15s', outputresult=True)
# and show the data frame
#result['Dav2_Twr'][0].observation
#result['Happy'][0].observation
result['MtB_Twr'][0].observation
```

The data frame now has fewer rows, since we resampled the data to one value every 15 seconds.

## Batch processing
We now demonstrate how to use the preprocessing function to process many files at once and save the outputs as NetCDF files, instead of returning them as objects. When processing hundreds of files, your computer may not have enough memory to hold all of the outputs, so it makes sense to save the processed data to disk.

### Specifying several groups of files
Instead of specifying a single file, like `data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O`, we specify a UNIX-style pattern like `data_RINEX2.11/Dav2_Twr/rinex/*.*O`. All files matching that pattern (found with {py:func}`glob.glob`) will be processed. Several groups of files can be processed by specifying one pattern per group (see below).
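A quick way to check a pattern before a long batch run is to call `glob.glob` on it yourself. The sketch below does this in a throw-away folder. Note that on case-sensitive platforms such as Linux, `*.*O` does not match a lowercase extension like the `.25o` of the single-file example. A character class such as `[oO]` accepts both cases.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    rinex_dir = os.path.join(root, "Dav2_Twr", "rinex")
    os.makedirs(rinex_dir)
    for name in ["a.21O", "b.21O", "c.25o", "notes.txt"]:
        open(os.path.join(rinex_dir, name), "w").close()  # empty dummy files

    # '*.*O': any name containing a dot and ending in 'O'
    upper_only = sorted(os.path.basename(p) for p in glob.glob(os.path.join(rinex_dir, "*.*O")))
    # '*.*[oO]': accept both upper- and lowercase observation extensions
    both_cases = sorted(os.path.basename(p) for p in glob.glob(os.path.join(rinex_dir, "*.*[oO]")))

print(upper_only)
print(both_cases)
```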

### Specifying where to save data
As with the inputs, we use a dictionary to indicate where to save the data for each station (see the example call below). The destination folder will be created if it does not exist.
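The sketch below shows the idea behind the output dictionary. It is a hypothetical helper, `prepare_output_dirs`, not gnssvod's code. It assumes that every station key in `pattern` needs a matching key in `outputdir`, as in the example call, and it creates missing folders with `os.makedirs`.

```python
import os
import tempfile


def prepare_output_dirs(pattern: dict, outputdir: dict) -> None:
    """Hypothetical check: every station needs an output folder; create missing ones."""
    missing = set(pattern) - set(outputdir)
    if missing:
        raise KeyError(f"no output folder for station(s): {sorted(missing)}")
    for station in pattern:
        os.makedirs(outputdir[station], exist_ok=True)  # no error if it already exists


# Example with throw-away folders; in practice these would be your real paths
with tempfile.TemporaryDirectory() as root:
    pattern = {"Dav2_Twr": "data_RINEX2.11/Dav2_Twr/rinex/*.*O"}
    outputdir = {"Dav2_Twr": os.path.join(root, "Dav2_Twr", "nc")}
    prepare_output_dirs(pattern, outputdir)
    print(os.path.isdir(outputdir["Dav2_Twr"]))
```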

### Specifying a list of variables to save
To calculate GNSS-VOD, we only need the "S" variables. We can reduce the size of the saved NetCDF files by discarding the other variables with the `keepvars` argument. Only the variables matching the passed list are kept. This argument supports UNIX-style pattern matching (e.g. `'S*'` matches all variables starting with S).
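To see which columns a list of patterns selects, the Python standard library's `fnmatch` module implements the same shell-style wildcards. In the sketch below, `?` matches exactly one character, so `'S?'` picks the two-character RINEX 2 codes and `'S??'` the three-character RINEX 3 codes. Whether gnssvod uses `fnmatch` internally is an assumption. The sketch only illustrates the matching rules and does not model how gnssvod handles the Azimuth and Elevation columns.

```python
from fnmatch import fnmatchcase

columns = ["C1", "L1", "D1", "S1", "S2", "C1C", "S1C", "S2W", "Azimuth", "Elevation"]
keepvars = ["S?", "S??"]

# Keep a column if it matches at least one pattern (case-sensitive matching)
kept = [c for c in columns if any(fnmatchcase(c, pat) for pat in keepvars)]
print(kept)
```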

### Compression
Unless `encoding=None` is passed as an argument, `gv.preprocess()` compresses all S* variables, as well as Azimuth and Elevation, when saving to NetCDF. These variables are encoded as int16 with a scale factor of 0.1, i.e. they are stored with a resolution of 0.1 in their own units. Decoding is applied automatically when the data are read with xarray.
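The packing arithmetic works as follows. A value is divided by the scale factor and rounded to an integer on write, then multiplied back on read. The rounding error is therefore at most half the scale factor (0.05). The int16 range (-32768 to 32767) limits stored values to about ±3276.7, which comfortably covers angles in degrees and typical signal strengths. The sketch reproduces this in plain Python. In xarray, this kind of packing is requested through a variable's `encoding` (keys such as `dtype` and `scale_factor`, plus `_FillValue` for missing values). The exact settings gnssvod uses are not shown here.

```python
import math

SCALE = 0.1
INT16_MIN, INT16_MAX = -32768, 32767


def encode(value: float) -> int:
    """Pack a float into a scaled int16 value (CF-style scale_factor packing)."""
    packed = round(value / SCALE)
    if not INT16_MIN <= packed <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in int16 with scale factor {SCALE}")
    return packed


def decode(packed: int) -> float:
    """Unpack: multiply the stored integer by the scale factor."""
    return packed * SCALE


stored = encode(45.37)          # integer written to the file
print(stored, decode(stored))   # decoded value is rounded to 0.1
```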

```python
# use gnssvod to batch process the observation RINEX files 
# (files with extension .yyO for each station)
# pattern = {'choice_of_name_for_station1':'pattern to match (UNIX-style)',
#            'choice_of_name_for_station2':'pattern to match (UNIX-style)',
#             ...}
#
pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/*.*O',
          'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/rinex/*.*O'}
outputdir = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/nc/',
            'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/nc/'}
# what variables should be kept
keepvars = ['S?','S??']

gv.preprocess(pattern, interval='15s', keepvars=keepvars, outputdir=outputdir)
```

### Skipping existing files by default
The preprocess function scans the destination folder for existing NetCDF files. Input files that have already been processed are skipped unless `overwrite=True` is passed.
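The skip logic can be sketched as a simple existence check. The helper `files_to_process` is hypothetical. It assumes that each output file is named after its input file with a `.nc` extension. gnssvod's actual output naming may differ, so this only illustrates the principle.

```python
import os
import tempfile


def files_to_process(input_files, output_folder, overwrite=False):
    """Hypothetical skip logic; assumes the output is '<input stem>.nc'."""
    todo = []
    for path in input_files:
        stem = os.path.splitext(os.path.basename(path))[0]
        target = os.path.join(output_folder, stem + ".nc")
        if overwrite or not os.path.exists(target):
            todo.append(path)  # not processed yet, or reprocessing forced
    return todo


with tempfile.TemporaryDirectory() as out:
    open(os.path.join(out, "a.nc"), "w").close()  # pretend 'a' was processed earlier
    inputs = ["rinex/a.21O", "rinex/b.21O"]
    print(files_to_process(inputs, out))                  # 'a' is skipped
    print(files_to_process(inputs, out, overwrite=True))  # both are processed
```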

Here, because the destination folder was initially empty, a UserWarning appears in the log above. It can safely be ignored ("Could not find any files matching the pattern data_RINEX2.11/Dav2_Twr/nc/*.nc").
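If this warning clutters your logs, you can silence it with Python's standard `warnings` module. This assumes it is emitted through `warnings` as a `UserWarning`, as the log suggests. The message prefix below is copied from the log. The `message` argument is matched against the start of the warning text, so other warnings are still shown.

```python
import warnings


def silence_missing_nc_warning():
    """Ignore only the 'no existing NetCDF files' UserWarning."""
    warnings.filterwarnings(
        "ignore",
        message="Could not find any files matching the pattern",
        category=UserWarning,
    )


silence_missing_nc_warning()  # call once, before running gv.preprocess(...)
```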