# Perform statistical analyses of GNSS station locations and tropospheric zenith delays

**Author**: Simran Sangha, David Bekaert - Jet Propulsion Laboratory

This notebook provides an overview of the functionality included in the **`raiderStats.py`** program. Specifically, we outline examples of how to perform basic statistical analyses of GNSS station location and tropospheric zenith delay information over a user-defined area of interest, span of time, and seasonal interval. In this notebook, we query GNSS stations spanning northern California between 2018 and 2019.

We will outline the following statistical analysis and filtering options:
- Restrict analyses to range of years
- Restrict analyses to range of months (i.e. seasonal interval)
- Illustrate station distribution and tropospheric zenith delay mean/standard deviation
- Illustrate gridded distribution and tropospheric zenith delay mean/standard deviation
- Generate variogram plots across specified time periods

<div class="alert alert-info">
    <b>Terminology:</b>
    
- *GNSS*: Stands for Global Navigation Satellite System. Describes any satellite constellation providing global or regional positioning, navigation, and timing services.
- *tropospheric zenith delay*: The precise atmospheric delay satellite signals experience when propagating through the troposphere.
- *variogram*: Characterization of the difference between field values at two locations.
- *empirical variogram*: Provides a description of how the data are correlated with distance.
- *experimental variogram*: A discrete function calculated using a measure of variability between pairs of points at various distances.
- *sill*: Limit of the variogram as the lag distance tends to infinity.
- *range*: The distance at which the difference between the variogram and the sill becomes negligible, such that the data are no longer autocorrelated.
    
    </div>
    

## Table of Contents:
<a id='example_TOC'></a>

[**Overview of the raiderStats.py program**](#overview)
- [1. Basic user input options](#overview_1)
- [2. Run parameters](#overview_2)
- [3. Optional controls for spatiotemporal subsetting](#overview_3)
- [4. Supported types of individual station scatter-plots](#overview_4)
- [5. Supported types of gridded station plots](#overview_5)
- [6. Supported types of variogram plots](#overview_6)
- [7. Optional controls for plotting](#overview_7)

[**Download prerequisite GNSS station location and tropospheric zenith delay information with the raiderDownloadGNSS.py program**](#downloads)

[**Examples of the raiderStats.py program**](#examples)
- [Example 1. Generate all individual station scatter-plots, as listed under section #4](#example_1)
- [Example 2. Generate all gridded station plots, as listed under section #5 ](#example_2)
- [Example 3. Generate gridded mean tropospheric zenith delay plot, with stations superimposed](#example_3)
- [Example 4. Generate variogram plots](#example_4)

## Prep: Initial setup of the notebook

Below we set up the directory structure for this notebook exercise. In addition, we load the required modules into our Python environment using the **`import`** command.

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt

## Defining the home and data directories
tutorial_home_dir = os.path.abspath(os.getcwd())
work_dir = os.path.abspath(os.getcwd())
print("Tutorial directory: ", tutorial_home_dir)
print("Work directory: ", work_dir)

# Verify that RAiDER is installed correctly
try:
    from RAiDER import statsPlot
except ImportError as err:
    # Catch only import failures and keep the original traceback attached
    raise ImportError('RAiDER is missing from your PYTHONPATH') from err

os.chdir(work_dir)

## Overview of the raiderStats.py program
<a id='overview'></a>

The **`raiderStats.py`** program provides a suite of convenient statistical analyses of GNSS station locations and tropospheric zenith delays.

Running **`raiderStats.py`** with the **`-h`** option will show the parameter options and outline several basic, practical examples. 

Let us explore these options:

In [None]:
!raiderStats.py -h

### 1. Basic user input options
<a id='overview_1'></a>

#### Input CSV file (**`--file FNAME`**)

**REQUIRED** argument. Provide a valid CSV file as input through **`--file`** which lists the GNSS station IDs (ID), lat/lon coordinates (Lat,Lon), dates in YYYY-MM-DD format (Date), and the desired data field in units of meters.

Note that the complementary **`raiderDownloadGNSS.py`** program generates such a CSV file, named **`CombinedGPS_ztd.csv`**, which contains all of these fields and is already formatted as expected by **`raiderStats.py`**. Please refer to the accompanying **`raiderDownloadGNSS/raiderDownloadGNSS_tutorial.ipynb`** for more details and practical examples.
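As a quick sanity check before launching **`raiderStats.py`**, the header of the input CSV can be inspected. The sketch below is illustrative only: the column names `ID`, `Lat`, `Lon`, `Date`, and `ZTD` follow the description above, while the helper name `missing_columns` and the sample row are made up for this example and are not part of RAiDER.

```python
import csv
import io

# Columns described above; ZTD is the default data column name
REQUIRED_COLUMNS = ("ID", "Lat", "Lon", "Date", "ZTD")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Made-up sample content, for illustration only
sample = "ID,Lat,Lon,Date,ZTD\nSTA1,37.5,-122.0,2018-01-01,2.35\n"
print(missing_columns(sample))  # an empty list means every field is present
```

For a file on disk, the same check can be applied to the text returned by `open(path).read()`.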

#### Data column name (**`--column_name COL_NAME`**)

Specify through **`--column_name`** the name of the data column in the input CSV file on which you wish to perform statistical analyses. The input is assumed to be in units of meters.

The default input column name is **`ZTD`**, the name assigned to the tropospheric zenith delays in the **`CombinedGPS_ztd.csv`** file generated through **`raiderDownloadGNSS.py`**.

#### Data column unit (**`--unit UNIT`**)

Specify the unit of the input data column through **`--unit`**. The analyses assume units of meters, so input provided in another unit is converted into meters.
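The unit handling can be pictured as a simple scale factor applied to the data column. This is only a sketch: the unit labels and the lookup table `TO_METERS` below are assumptions made for illustration, and the units actually accepted by **`--unit`** should be checked in the program help.

```python
# Hypothetical conversion factors to meters (assumed labels, not RAiDER's list)
TO_METERS = {"m": 1.0, "cm": 1e-2, "mm": 1e-3}


def to_meters(values, unit):
    """Scale a list of values expressed in `unit` to meters."""
    factor = TO_METERS[unit]  # raises KeyError for an unknown unit
    return [value * factor for value in values]


# Made-up delays in millimeters
print(to_meters([2350.0, 2410.0], "mm"))
```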

### 2. Run parameters
<a id='overview_2'></a>

#### Output directory (**`--workdir WORKDIR`**)

Specify directory to deposit all outputs into with **`--workdir`**. Absolute and relative paths are both supported.

By default, outputs will be deposited into the current working directory where the program is launched.

#### Number of CPUs to be used (**`--cpus NUMCPUS`**)

Specify the number of CPUs to be used for multiprocessing with **`--cpus`**. For most cases, multiprocessing is essential in order to access data and perform statistical analyses within a reasonable amount of time.

You may specify **`--cpus all`** at your own discretion in order to leverage all available CPUs on your system.

By default 8 CPUs will be used.
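To see how a value such as `all` or `8` might translate into a worker count on your machine, consider the sketch below. The helper `resolve_cpus` is hypothetical, and capping the request at the number of available CPUs is an assumption of this example rather than documented behavior of **`raiderStats.py`**.

```python
import os


def resolve_cpus(requested="8"):
    """Translate a --cpus style value into a usable worker count."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    if str(requested).lower() == "all":
        return available
    # Assumption: never use fewer than 1 or more than the available CPUs
    return max(1, min(int(requested), available))


print("CPUs available:", os.cpu_count())
print("Workers for --cpus all:", resolve_cpus("all"))
print("Workers for the default of 8:", resolve_cpus("8"))
```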

#### Verbose mode (**`--verbose`**)

Specify **`--verbose`** to print all statements throughout the entire routine.

Additionally, verbose mode will generate variogram plots per gridded station AND time-slice.

### 3. Optional controls for spatiotemporal subsetting
<a id='overview_3'></a>

#### Geographic bounding box (**`--bounding_box BOUNDING_BOX`**)

An area of interest may be specified as `SNWE` coordinates using the **`--bounding_box`** option. Coordinates should be specified as a space-delimited string surrounded by quotes. The analysis then uses the common intersection between the user-specified spatial bounds and the spatial bounds computed from the station locations in the input file. The example below restricts the analysis to stations over northern California:
**`--bounding_box '36 40 -124 -119'`**

If no area of interest is specified, the spatial bounds computed from the station locations in the input file are used by default.
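The intersection between a user-specified box and the extent of the stations can be sketched as follows. The function `intersect_snwe` and the station extent `data_box` are made up for this illustration and do not reproduce RAiDER's internal code.

```python
def intersect_snwe(user_box, data_box):
    """Return the common SNWE intersection of two boxes, or None if disjoint."""
    south = max(user_box[0], data_box[0])
    north = min(user_box[1], data_box[1])
    west = max(user_box[2], data_box[2])
    east = min(user_box[3], data_box[3])
    if south >= north or west >= east:
        return None
    return (south, north, west, east)


# Parse the space-delimited SNWE string used throughout this notebook
user_box = tuple(float(v) for v in "36 40 -124 -119".split())
data_box = (35.0, 39.5, -123.0, -118.0)  # made-up extent of station locations
print(intersect_snwe(user_box, data_box))  # (36.0, 39.5, -123.0, -119.0)
```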

#### Gridcell spacing (**`--spacing SPACING`**)

Specify the spacing of the grid-cells, in degrees, for statistical analyses through **`--spacing`**.

By default, the grid-cell spacing is set to 1°. If the spatial bounds of the dataset are not evenly divisible by the specified grid-cell spacing, the spacing falls back to the default of 1°.
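One possible reading of the fallback rule for the grid-cell spacing is sketched below: the requested spacing is kept only if both the latitude and longitude extents divide evenly by it. The helper `resolve_spacing` and its tolerance are assumptions of this example, not RAiDER's actual implementation.

```python
import math


def resolve_spacing(snwe, spacing, default=1.0):
    """Keep `spacing` if it evenly divides both extents, else use `default`."""
    south, north, west, east = snwe
    for extent in (north - south, east - west):
        ratio = extent / spacing
        # Allow for floating-point noise when testing for a whole number
        if not math.isclose(ratio, round(ratio), abs_tol=1e-9):
            return default
    return spacing


bounds = (36.0, 40.0, -124.0, -119.0)  # bounding box used in this notebook
print(resolve_spacing(bounds, 0.5))  # 4 and 5 degrees both divide by 0.5
print(resolve_spacing(bounds, 2.0))  # 5 degrees does not divide by 2
```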

#### Subset in time (**`-ti TIMEINTERVAL`**)

Define temporal bounds with **`-ti TIMEINTERVAL`** by specifying the earliest YYYY-MM-DD date followed by the latest YYYY-MM-DD date, as a space-delimited string surrounded by quotes. For example: **`-ti '2018-01-01 2019-01-01'`**

By default, the bounds are set to the earliest and latest dates found in the input file.

#### Seasonal interval (**`-si SEASONALINTERVAL`**)

Define subset in time by a specific interval for each year (i.e. seasonal interval) with **`-si SEASONALINTERVAL`** by specifying earliest MM-DD time followed by latest MM-DD time. For example: **`-si '03-21 06-21'`**
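The combined effect of the time interval and the seasonal interval can be sketched with the standard `datetime` module. The helper names below are hypothetical, the sample dates are made up, and the handling of an interval that wraps around the new year is an assumption of this example.

```python
from datetime import date


def parse_mmdd(text):
    """Convert an 'MM-DD' string into a (month, day) tuple."""
    month, day = text.split("-")
    return int(month), int(day)


def in_seasonal_interval(day, start_mmdd, end_mmdd):
    """Check whether a date falls inside the MM-DD interval of its year."""
    mmdd = (day.month, day.day)
    if start_mmdd <= end_mmdd:
        return start_mmdd <= mmdd <= end_mmdd
    # Assumed behavior for intervals that wrap around the new year
    return mmdd >= start_mmdd or mmdd <= end_mmdd


start, end = (parse_mmdd(t) for t in "03-21 06-21".split())
t_min, t_max = date(2018, 1, 1), date(2019, 12, 31)

# Made-up acquisition dates
dates = [date(2018, 1, 15), date(2018, 4, 1), date(2019, 5, 30), date(2019, 7, 4)]
kept = [d for d in dates
        if t_min <= d <= t_max and in_seasonal_interval(d, start, end)]
print(kept)  # only the spring dates of 2018 and 2019 remain
```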

### 4. Supported types of individual station scatter-plots
<a id='overview_4'></a>

#### Plot station distribution (**`--station_distribution`**)

Illustrate each individual station with black markers.

#### Plot mean tropospheric zenith delay by station (**`--station_delay_mean`**)

Illustrate the tropospheric zenith delay mean for each station with a **`hot`** colorbar.

#### Plot standard deviation of tropospheric zenith delay by station  (**`--station_delay_stdev`**)

Illustrate the tropospheric zenith delay standard deviation for each station with a **`hot`** colorbar.
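The per-station statistics behind these plots amount to grouping the delays by station ID. A minimal sketch follows, assuming a population standard deviation (whether RAiDER uses the population or sample form is not stated here); the function `station_stats` and the sample records are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def station_stats(records):
    """Map each station ID to the (mean, standard deviation) of its delays."""
    by_station = defaultdict(list)
    for station_id, ztd in records:
        by_station[station_id].append(ztd)
    return {sid: (mean(vals), pstdev(vals)) for sid, vals in by_station.items()}


# Made-up (station ID, delay in meters) pairs
records = [("STA1", 2.30), ("STA1", 2.40), ("STA2", 2.10)]
for sid, (avg, std) in station_stats(records).items():
    print(f"{sid}: mean={avg:.3f} m, stdev={std:.3f} m")
```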

### 5. Supported types of gridded station plots
<a id='overview_5'></a>

#### Plot gridded station heatmap (**`--grid_heatmap`**)

Illustrate heatmap of gridded station array with a **`hot`** colorbar.

#### Plot gridded mean tropospheric zenith delay (**`--grid_delay_mean`**)

Illustrate gridded tropospheric zenith delay mean with a **`hot`** colorbar.

#### Plot gridded standard deviation of tropospheric zenith delay  (**`--grid_delay_stdev`**)

Illustrate gridded tropospheric zenith delay standard deviation with a **`hot`** colorbar.
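Conceptually, the gridded products assign each station to a grid-cell and then aggregate per cell. The sketch below shows one way to do this for the mean; the indexing convention (rows counted northward from the southern edge, columns eastward from the western edge) and the helper names are assumptions of this example.

```python
import math
from collections import defaultdict


def grid_index(lat, lon, south, west, spacing):
    """Return the (row, col) of the grid-cell containing a station."""
    row = math.floor((lat - south) / spacing)
    col = math.floor((lon - west) / spacing)
    return row, col


def gridded_mean(stations, south, west, spacing=1.0):
    """Average the delays of all stations that fall in the same grid-cell."""
    cells = defaultdict(list)
    for lat, lon, ztd in stations:
        cells[grid_index(lat, lon, south, west, spacing)].append(ztd)
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}


# Made-up (lat, lon, mean delay in meters) triplets
stations = [(36.2, -123.5, 2.30), (36.8, -123.1, 2.40), (38.5, -120.2, 2.00)]
print(gridded_mean(stations, south=36.0, west=-124.0))
```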

### 6. Supported types of variogram plots
<a id='overview_6'></a>

#### Plot variogram (**`--variogramplot`**)

Passing **`--variogramplot`** toggles plotting of gridded station variogram, where gridded sill and range values for the experimental variogram fits are illustrated.

#### Apply experimental fit to binned variogram (**`--binnedvariogram`**)

Pass **`--binnedvariogram`** to apply experimental variogram fit to total binned empirical variograms for each time slice. 

By default, the experimental fit is applied to the total unbinned empirical variogram.
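To make the notion of a binned empirical variogram concrete, the sketch below averages the semivariance, 0.5 * (z_i - z_j)^2, of all point pairs whose separation falls within the same lag bin. It uses planar coordinates and made-up values for simplicity, and it is not the routine used by **`raiderStats.py`**.

```python
import math
from itertools import combinations


def binned_empirical_variogram(coords, values, bin_width):
    """Return {bin index: mean semivariance}; bin k spans [k*w, (k+1)*w)."""
    sums, counts = {}, {}
    for i, j in combinations(range(len(values)), 2):
        lag = math.dist(coords[i], coords[j])  # separation of the pair
        k = int(lag // bin_width)
        sums[k] = sums.get(k, 0.0) + 0.5 * (values[i] - values[j]) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Made-up planar coordinates and field values
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
values = [1.0, 2.0, 2.0, 4.0]
for k, gamma in binned_empirical_variogram(coords, values, 1.5).items():
    print(f"lag bin {k}: semivariance {gamma:.3f}")
```

An experimental fit would then adjust a model curve to these binned values, from which the sill and range are read.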

#### Save variogram figures per time-slice (**`--variogram_per_timeslice`**)

Specify **`--variogram_per_timeslice`** to generate variogram plots per gridded station AND time-slice.

If this option is not toggled, variogram plots are generated only per gridded station, spanning the entire time-span.

### 7. Optional controls for plotting
<a id='overview_7'></a>

#### Plot format (**`--plot_format PLOT_FMT`**)

File format for saving plots. Default is PNG.

#### Colorbar bounds (**`--color_bounds CBOUNDS`**)

Set lower and upper-bounds for plot colorbars. For example: **`--color_bounds '0 100'`**

By default set to the dynamic range of the data.

#### Colorbar percentile limits (**`--colorpercentile COLORPERCENTILE COLORPERCENTILE`**)

Set lower and upper percentile for plot colorbars, passed as two separate values. For example: **`--colorpercentile 30 100`**

By default set to 25% and 95%, respectively.
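Percentile limits like these can be previewed on your own data with `numpy.percentile`. The helper `percentile_bounds` below is hypothetical, and the input array is synthetic; the defaults simply mirror the 25% and 95% values quoted above.

```python
import numpy as np


def percentile_bounds(values, lower=25.0, upper=95.0):
    """Return the (lower, upper) percentile values of the data."""
    data = np.asarray(values, dtype=float)
    vmin, vmax = np.percentile(data, [lower, upper])
    return float(vmin), float(vmax)


# Synthetic data: the integers 0 through 100
print(percentile_bounds(range(101)))          # (25.0, 95.0)
print(percentile_bounds(range(101), 30, 100))  # (30.0, 100.0)
```

The two returned values could then be passed as `vmin` and `vmax` to a matplotlib plotting call.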

#### Variogram density threshold (**`--densitythreshold DENSITYTHRESHOLD`**)

For variogram plots, a given grid-cell is valid only if the number of stations it contains meets this specified threshold.

By default set to 10 stations.
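The density filter can be sketched by counting stations per grid-cell and keeping the cells that reach the threshold. Treating the threshold as "at least this many stations" is an assumption of this example, and the cell labels and helper `valid_cells` are made up.

```python
from collections import Counter


def valid_cells(cell_of_station, density_threshold=10):
    """Return the grid-cells holding at least `density_threshold` stations."""
    counts = Counter(cell_of_station)
    return sorted(cell for cell, n in counts.items() if n >= density_threshold)


# Made-up cell assignment: one (row, col) entry per station
cells = [(0, 0)] * 12 + [(0, 1)] * 3 + [(1, 1)] * 10
print(valid_cells(cells))  # [(0, 0), (1, 1)]
```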

#### Superimpose individual stations over gridded array (**`--stationsongrids`**)

In gridded plots, superimpose your gridded array with a scatterplot of station locations.

#### Draw gridlines (**`--drawgridlines`**)

In gridded plots, draw gridlines.

#### Generate all supported plots (**`--plotall`**)

Generate all supported plots, as outlined under sections #4, #5, and #6 above.

## Download prerequisite GNSS station location and tropospheric zenith delay information with the **`raiderDownloadGNSS.py`** program
<a id='downloads'></a>

Virtually access GNSS station location and zenith delay information for the years '2018,2019', at a UTC time of day 'HH:MM:SS' of '00:00:00', and across a geographic bounding box '36 40 -124 -119' spanning over Northern California.

The footprint of the specified geographic bounding box is depicted in **Fig. 1**.

In addition to querying for multiple years, we will also experiment with using the maximum number of allowed CPUs to save some time! Recall that the default number of CPUs used for parallelization is 8.

Note that these features and similar examples are outlined in more detail in the companion notebook **`raiderDownloadGNSS/raiderDownloadGNSS_tutorial.ipynb`**.

In [None]:
!raiderDownloadGNSS.py --out products --years '2018,2019' --returntime '00:00:00' --bounding_box '36 40 -124 -119' --cpus all

All of the extracted tropospheric zenith delay information stored under **`GPS_delays`** is concatenated with the GNSS station location information stored under **`gnssStationList_overbbox.csv`** into a primary comprehensive file **`CombinedGPS_ztd.csv`**

**`CombinedGPS_ztd.csv`** may in turn be directly used to perform basic statistical analyses using **`raiderStats.py`**.

<img src="support_docs/bbox_footprint.png" alt="footprint" width="700">
<center><b>Fig. 1</b> Footprint of geographic bounding box used in examples 1 and 2. </center>

## Examples of the **`raiderStats.py`** program
<a id='examples'></a>

### Example 1. Generate all individual station scatter-plots, as listed under [section #4](#overview_4) <a id='example_1'></a>

Using the file **`CombinedGPS_ztd.csv`** generated by **`raiderDownloadGNSS.py`** as input, produce plots illustrating station distribution, mean tropospheric zenith delay by station, and standard deviation of tropospheric zenith delay by station.

Restrict the temporal span of the analyses to all data acquired between 2018-01-01 and 2019-12-31, and restrict the spatial extent to a geographic bounding box '36 40 -124 -119' spanning over Northern California.

The footprint of the specified geographic bounding box is depicted in **Fig. 1**.

These basic spatiotemporal constraints will be inherited by all successive examples.

In [None]:
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --station_distribution --station_delay_mean --station_delay_stdev

Now we can take a look at the generated products:

In [None]:
!ls maps/figures

Here we visualize the spatial distribution of stations (*ZTD_station_distribution.png*) as black markers.

<img src="maps/figures/ZTD_station_distribution.png" alt="ZTD_station_distribution" width="700">

To generate this figure alone, run:
```
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --station_distribution
```

Here we visualize the mean tropospheric zenith delay by station (*ZTD_station_delay_mean.png*) with a **`hot`** colorbar. 

<img src="maps/figures/ZTD_station_delay_mean.png" alt="ZTD_station_delay_mean" width="700">

To generate this figure alone, run:
```
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --station_delay_mean
```

Here we visualize the standard deviation of tropospheric zenith delay by station (*ZTD_station_delay_stdev.png*) with a **`hot`** colorbar.  

<img src="maps/figures/ZTD_station_delay_stdev.png" alt="ZTD_station_delay_stdev" width="700">

To generate this figure alone, run:
```
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --station_delay_stdev
```

### Example 2. Generate all gridded station plots, as listed under [section #5](#overview_5) <a id='example_2'></a>

Produce plots illustrating gridded station distribution, gridded mean tropospheric zenith delay, and gridded standard deviation of tropospheric zenith delay.

In [None]:
!rm -rf maps
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --grid_heatmap --grid_delay_mean --grid_delay_stdev

Now we can take a look at the generated products:

In [None]:
!ls maps/figures

Here we visualize the heatmap of gridded station array (*ZTD_grid_heatmap.png*) with a **`hot`** colorbar.

Note that the colorbar bounds are saturated, which demonstrates the utility of the plotting options outlined under section #7, such as **`--color_bounds`** and **`--colorpercentile`**.

<img src="maps/figures/ZTD_grid_heatmap.png" alt="ZTD_grid_heatmap" width="700">

To generate this figure alone, run:
```
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --grid_heatmap
```

Here we visualize the gridded mean tropospheric zenith delay (*ZTD_grid_delay_mean.png*) with a **`hot`** colorbar.

<img src="maps/figures/ZTD_grid_delay_mean.png" alt="ZTD_grid_delay_mean" width="700">

To generate this figure alone, run:
```
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --grid_delay_mean
```

Here we visualize the gridded standard deviation of tropospheric zenith delay (*ZTD_grid_delay_stdev.png*) with a **`hot`** colorbar.

<img src="maps/figures/ZTD_grid_delay_stdev.png" alt="ZTD_grid_delay_stdev" width="700">

To generate this figure alone, run:
```
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --grid_delay_stdev
```

### Example 3. Generate gridded mean tropospheric zenith delay plot, with stations superimposed <a id='example_3'></a>

Produce plot illustrating gridded mean tropospheric zenith delay, superimposed with individual station locations.

Additionally, subset the data in time to spring, i.e. **`'03-21 06-21'`**.

In [None]:
!rm -rf maps
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --seasonalinterval '03-21 06-21' --grid_delay_mean --stationsongrids

Now we can take a look at the generated product:

In [None]:
!ls maps/figures

Here we visualize the gridded mean tropospheric zenith delay (*ZTD_grid_delay_mean.png*) with a **`hot`** colorbar, with superimposed station locations denoted by blue markers.

<img src="maps/figures/ZTD_grid_delay_mean.png" alt="ZTD_grid_delay_mean" width="700">

### Example 4. Generate variogram plots <a id='example_4'></a>

Produce plots illustrating empirical/experimental variogram fits per gridded station and time-slice (**`--variogram_per_timeslice`**) and also spanning the entire time-span. Plots of the gridded station experimental variogram-derived sill and range values are also generated.

Additionally, subset the data in time to spring, i.e. **`'03-21 06-21'`**.

Finally, use the maximum number of allowed CPUs to save some time.

In [None]:
!rm -rf maps
!raiderStats.py --file products/CombinedGPS_ztd.csv -w maps -b '36 40 -124 -119' -ti '2018-01-01 2019-12-31' --seasonalinterval '03-21 06-21' --variogramplot --variogram_per_timeslice --cpus all

Now we can take a look at the generated variograms:

In [None]:
!ls maps/variograms

There are several subdirectories, one per grid-cell, each containing the empirical and experimental variograms generated for each time-slice (e.g. **`grid6_timeslice20180321_justEMPvariogram.eps`** and **`grid6_timeslice20180321_justEXPvariogram.eps`**, respectively) and across the entire sampled time period (**`grid6_timeslice20180321–20190621_justEMPvariogram.eps`** and **`grid6_timeslice20180321–20190621_justEXPvariogram.eps`**, respectively).

Recall that the former pair of empirical/experimental variograms per time-slice is generated only if the **`--variogram_per_timeslice`** option is toggled. By default, only the latter pair of empirical/experimental variograms spanning the entire time-span is generated.

Here we visualize the total empirical variogram corresponding to the entire sampled time period for grid-cell 6 in the array (*grid6_timeslice20180321–20190621_justEMPvariogram.eps*). 

<img src="maps/variograms/grid6_timeslice20180321–20190621_justEMPvariogram.eps" alt="justEMPvariogram" width="700">

Here we visualize the total experimental variogram corresponding to the entire sampled time period for grid-cell 6 in the array (*grid6_timeslice20180321–20190621_justEXPvariogram.eps*). 

<img src="maps/variograms/grid6_timeslice20180321–20190621_justEXPvariogram.eps" alt="justEXPvariogram" width="700">

The central coordinates for all grid-nodes that satisfy the specified station density threshold (**`--densitythreshold`**, by default 10 stations per grid-cell) for variogram plots are stored in a lookup table:

In [None]:
!head maps/variograms/gridlocation_lookup.txt

Now we can take a look at the other generated figures:

In [None]:
!ls maps/figures

Here we visualize the gridded experimental variogram range (*ZTD_range_heatmap.png*) with a **`hot`** colorbar.

<img src="maps/figures/ZTD_range_heatmap.png" alt="ZTD_range_heatmap" width="700">

Here we visualize the gridded experimental variogram sill (*ZTD_sill_heatmap.png*) with a **`hot`** colorbar.

<img src="maps/figures/ZTD_sill_heatmap.png" alt="ZTD_sill_heatmap" width="700">