# NavData Operations

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Stanford-NavLab/gnss_lib_py/blob/main/notebooks/tutorials/navdata/operations.ipynb)

This tutorial shows how to perform functional operations on instances of
the `NavData` class, such as looping across a time row in a `NavData` instance,
concatenating multiple instances together, sorting a `NavData` instance
based on the values in a particular row, interpolating any NaN values,
and finding row names that match a wildcard pattern.

In [1]:
import gnss_lib_py as glp

In [2]:
# Get data path of example file
glp.make_dir("../data")
!wget https://raw.githubusercontent.com/Stanford-NavLab/gnss_lib_py/main/notebooks/tutorials/data/myreceiver.csv --quiet -nc -O "../data/myreceiver.csv"
data_path = "../data/myreceiver.csv"

Create a `NavData` instance from a csv file

In [3]:
navdata = glp.NavData(csv_path=data_path)
print(navdata)

   myTimestamp  mySatId  myPseudorange
0           10       10      270000001
1           10       14      270000007
2           10        7      270000004
3           10        3      270000005
4           11       10      270000002
5           11       14      270000008
6           11        7      270000003
7           11        3      270000004


## Looping across a Time Row

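You can use the `glp.loop_time()` function to loop over groups of data that share the same timestamp. Each iteration yields the timestamp, a time difference `delta_t`, and the `NavData` subset for that timestamp.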

In [4]:
for timestamp, delta_t, navdata_subset in glp.loop_time(navdata,'myTimestamp'):
    print('Current timestamp: ', timestamp)
    print('Difference between current and previous time step', delta_t)
    print('Current group of data')
    print(navdata_subset)

Current timestamp:  10
Difference between current and previous time step 0
Current group of data
   myTimestamp  mySatId  myPseudorange
0           10       10      270000001
1           10       14      270000007
2           10        7      270000004
3           10        3      270000005
Current timestamp:  11
Difference between current and previous time step 1
Current group of data
   myTimestamp  mySatId  myPseudorange
0           11       10      270000002
1           11       14      270000008
2           11        7      270000003
3           11        3      270000004
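
To make the grouping logic concrete, here is a minimal plain-Python sketch of the same idea. It is not how `gnss_lib_py` implements `loop_time`; the helper `iter_time_groups` is hypothetical, and it assumes the measurements are already ordered by time and that the time difference is taken with respect to the preceding group (0 for the first group, as in the output above).

```python
from itertools import groupby

# Same values as myreceiver.csv: (myTimestamp, mySatId, myPseudorange)
measurements = [
    (10, 10, 270000001),
    (10, 14, 270000007),
    (10, 7, 270000004),
    (10, 3, 270000005),
    (11, 10, 270000002),
    (11, 14, 270000008),
    (11, 7, 270000003),
    (11, 3, 270000004),
]


def iter_time_groups(rows):
    """Yield (timestamp, delta_t, group) for rows ordered by time.

    Hypothetical helper: delta_t is the difference from the preceding
    timestamp and is 0 for the first group.
    """
    previous = None
    # groupby only merges consecutive entries, hence the ordering assumption
    for timestamp, group in groupby(rows, key=lambda row: row[0]):
        delta_t = 0 if previous is None else timestamp - previous
        yield timestamp, delta_t, list(group)
        previous = timestamp


groups = list(iter_time_groups(measurements))
for timestamp, delta_t, group in groups:
    sat_ids = [sat_id for _, sat_id, _ in group]
    print(timestamp, delta_t, sat_ids)
```

Each yielded group plays the role of `navdata_subset`: it contains only the measurements recorded at one timestamp.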


## Concatenating NavData Instances

Use the `glp.concat()` function to concatenate two or more `NavData` instances. Each type of data is stored in a row, so concatenating with ``axis=0`` adds new rows, that is, new types of data. With ``axis=0``, the new `NavData` must have the same length as the existing `NavData`. Row concatenation assumes that both instances use the same ordering within rows (e.g. sorted by timestamp) and does not perform any matching or sorting itself. If the concatenated instances share a row name with ``axis=0``, `concat` adds a suffix to create a unique row name.

In [5]:
double_navdata = glp.concat(navdata, navdata, axis=0)
double_navdata

   myTimestamp  mySatId  myPseudorange  myTimestamp_0  mySatId_0  \
0           10       10      270000001             10         10   
1           10       14      270000007             10         14   
2           10        7      270000004             10          7   
3           10        3      270000005             10          3   
4           11       10      270000002             11         10   
5           11       14      270000008             11         14   
6           11        7      270000003             11          7   
7           11        3      270000004             11          3   

   myPseudorange_0  
0        270000001  
1        270000007  
2        270000004  
3        270000005  
4        270000002  
5        270000008  
6        270000003  
7        270000004  
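
The renaming behavior can be illustrated with a small dictionary-based sketch. The helper `concat_rows` is hypothetical and only mimics what is visible in the output above; in particular, the numeric suffix scheme (`_0`, `_1`, ...) is an assumption inferred from the `_0` names, not a statement about the library's internals.

```python
def concat_rows(first, second):
    """Add the rows of `second` to `first` (tables are dicts of lists).

    Hypothetical helper. Row names that already exist get a numeric
    suffix (_0, _1, ...); this scheme is assumed from the printed output.
    """
    lengths = {len(values) for table in (first, second)
               for values in table.values()}
    if len(lengths) > 1:
        # New rows can only be added when every row has the same length
        raise ValueError("all rows must have the same length")

    combined = {name: list(values) for name, values in first.items()}
    for name, values in second.items():
        new_name, suffix = name, 0
        while new_name in combined:
            new_name = f"{name}_{suffix}"
            suffix += 1
        combined[new_name] = list(values)
    return combined


table = {"myTimestamp": [10, 10, 11], "mySatId": [10, 14, 10]}
doubled = concat_rows(table, table)
print(list(doubled))
# → ['myTimestamp', 'mySatId', 'myTimestamp_0', 'mySatId_0']
```

Note that the values are copied position by position, which is why both inputs need the same ordering.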

You can also concatenate new data to existing rows with ``axis=1``. If the row names of the new `NavData` instance don't match the row names of the existing `NavData` instance, the missing values are filled with `np.nan`.

In [6]:
glp.concat(double_navdata, navdata, axis=1)

    myTimestamp  mySatId  myPseudorange  myTimestamp_0  mySatId_0  \
0            10       10      270000001           10.0       10.0   
1            10       14      270000007           10.0       14.0   
2            10        7      270000004           10.0        7.0   
3            10        3      270000005           10.0        3.0   
4            11       10      270000002           11.0       10.0   
5            11       14      270000008           11.0       14.0   
6            11        7      270000003           11.0        7.0   
7            11        3      270000004           11.0        3.0   
8            10       10      270000001            NaN        NaN   
9            10       14      270000007            NaN        NaN   
10           10        7      270000004            NaN        NaN   
11           10        3      270000005            NaN        NaN   
12           11       10      270000002            NaN        NaN   
13           11       14      2700
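
The NaN padding can be sketched in plain Python as follows. The helper `concat_columns` is hypothetical and works on dictionaries of lists rather than `NavData`; it only illustrates the padding rule described above.

```python
import math


def concat_columns(first, second):
    """Append the measurements of `second` to the rows of `first`.

    Hypothetical helper. A row name missing from either table is padded
    with NaN so that every row keeps the same length.
    """
    len_first = len(next(iter(first.values()), []))
    len_second = len(next(iter(second.values()), []))

    # Keep the row order of `first`, then any rows only found in `second`
    names = list(first) + [name for name in second if name not in first]
    combined = {}
    for name in names:
        head = first.get(name, [math.nan] * len_first)
        tail = second.get(name, [math.nan] * len_second)
        combined[name] = list(head) + list(tail)
    return combined


wide = {"mySatId": [10, 14], "mySatId_0": [10, 14]}
narrow = {"mySatId": [7, 3]}
stacked = concat_columns(wide, narrow)
print(stacked["mySatId"])    # → [10, 14, 7, 3]
print(stacked["mySatId_0"])  # → [10, 14, nan, nan]
```

In the tutorial output the padded rows are printed as floats (`10.0` instead of `10`) because NaN is a floating-point value.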

## Sorting a NavData Instance based on Row Values

An entire `NavData` instance can be sorted based on the values in a specified
row or to match a previously determined order.

This operation can be performed in place using the argument `inplace=True`.
In this case, the `sort` function returns `None` and modifies the input `NavData`
in its existing memory. If `inplace=False`, a new sorted `NavData` is returned.

In [7]:
# Generate a new row with random numbers
import numpy as np
new_row = np.arange(len(navdata))
np.random.shuffle(new_row)
# Add a new row with random numbers to the existing NavData
navdata['random_row'] = new_row
print('New NavData \n', navdata)


New NavData 
    myTimestamp  mySatId  myPseudorange  random_row
0           10       10      270000001           5
1           10       14      270000007           7
2           10        7      270000004           4
3           10        3      270000005           0
4           11       10      270000002           2
5           11       14      270000008           6
6           11        7      270000003           3
7           11        3      270000004           1


Sort in ascending order

In [8]:
print('Ascending order sorted NavData')
print(glp.sort(navdata, order='random_row'))

Ascending order sorted NavData
   myTimestamp  mySatId  myPseudorange  random_row
0           10        3      270000005           0
1           11        3      270000004           1
2           11       10      270000002           2
3           11        7      270000003           3
4           10        7      270000004           4
5           10       10      270000001           5
6           11       14      270000008           6
7           10       14      270000007           7


Sort in descending order

In [9]:
print('Descending order sorted NavData')
print(glp.sort(navdata, order='random_row', ascending=False))

Descending order sorted NavData
   myTimestamp  mySatId  myPseudorange  random_row
0           10       14      270000007           7
1           11       14      270000008           6
2           10       10      270000001           5
3           10        7      270000004           4
4           11        7      270000003           3
5           11       10      270000002           2
6           11        3      270000004           1
7           10        3      270000005           0


Sort using indices given by an externally determined order

In [10]:
# Find indices corresponding to an externally determined sort order
sort_order = np.argsort(new_row)
print('Sorted using externally determined indices')
print(glp.sort(navdata, ind=sort_order))

Sorted using externally determined indices
   myTimestamp  mySatId  myPseudorange  random_row
0           10        3      270000005           0
1           11        3      270000004           1
2           11       10      270000002           2
3           11        7      270000003           3
4           10        7      270000004           4
5           10       10      270000001           5
6           11       14      270000008           6
7           10       14      270000007           7
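
The reason the externally determined indices give the same result as sorting by `random_row` is that a single index order is applied to every row. A minimal standard-library sketch of that idea is shown below; `argsort` and `reorder` are hypothetical stand-ins written for illustration, not `gnss_lib_py` functions.

```python
def argsort(values, ascending=True):
    """Return the indices that would sort `values` (stand-in for np.argsort)."""
    return sorted(range(len(values)), key=values.__getitem__,
                  reverse=not ascending)


def reorder(table, indices):
    """Apply one index order to every row so measurements stay aligned."""
    return {name: [values[i] for i in indices]
            for name, values in table.items()}


table = {
    "mySatId": [10, 14, 7, 3],
    "random_row": [2, 3, 1, 0],
}

order = argsort(table["random_row"])
sorted_table = reorder(table, order)
print(order)                       # → [3, 2, 0, 1]
print(sorted_table["mySatId"])     # → [3, 7, 10, 14]
print(sorted_table["random_row"])  # → [0, 1, 2, 3]

descending = reorder(table, argsort(table["random_row"], ascending=False))
print(descending["mySatId"])       # → [14, 10, 7, 3]
```

Because `reorder` builds a new dictionary, this sketch corresponds to the `inplace=False` behavior.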


## Interpolate NaN values in a NavData Row

Some algorithms might not return results for some time instances or samples, leaving `NaN` values that have to be replaced with values interpolated between known points.
In this case, we can use the `glp.interpolate` function, passing the names of the rows that hold the x-axis and y-axis values, to interpolate and replace the `NaN` values.

In [11]:
# Create x-axis and y-axis rows, with some NaN values in the y-axis row
nan_row_x = np.arange(len(navdata)).astype(np.float64)
nan_row_y = np.arange(len(navdata)).astype(np.float64)
nan_indices = [1, 3, 4, 6]
nan_row_y[nan_indices] = np.nan
# Set these rows in the navdata
navdata['nan_row_x'] = nan_row_x
navdata['nan_row_y'] = nan_row_y
print('NavData with NaN values \n', navdata)

# Interpolate the NaN values
glp.interpolate(navdata, 'nan_row_x', 'nan_row_y', inplace=True)
print('NavData values with interpolated values \n', navdata)



NavData with NaN values 
    myTimestamp  mySatId  myPseudorange  random_row  nan_row_x  nan_row_y
0           10       10      270000001           5        0.0        0.0
1           10       14      270000007           7        1.0        NaN
2           10        7      270000004           4        2.0        2.0
3           10        3      270000005           0        3.0        NaN
4           11       10      270000002           2        4.0        NaN
5           11       14      270000008           6        5.0        5.0
6           11        7      270000003           3        6.0        NaN
7           11        3      270000004           1        7.0        7.0
NavData values with interpolated values 
    myTimestamp  mySatId  myPseudorange  random_row  nan_row_x  nan_row_y
0           10       10      270000001           5        0.0        0.0
1           10       14      270000007           7        1.0        1.0
2           10        7      270000004           4      
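
For intuition, the sketch below fills interior NaN values by linear interpolation between the nearest known neighbors, using the same x and y values as the cell above. It is an illustration under stated assumptions, not the library's implementation: `interpolate_nans` is a hypothetical helper, x is assumed to be strictly increasing, and the first and last y values are assumed to be known. NumPy's `np.interp` performs the same kind of linear interpolation.

```python
import math


def interpolate_nans(x, y):
    """Return a copy of `y` with interior NaNs linearly interpolated over `x`.

    Hypothetical helper. Assumes `x` is strictly increasing and that the
    first and last entries of `y` are not NaN.
    """
    known = [i for i, value in enumerate(y) if not math.isnan(value)]
    filled = list(y)
    # Walk over consecutive pairs of known samples and fill the gap between
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (x[i] - x[left]) / (x[right] - x[left])
            filled[i] = y[left] + weight * (y[right] - y[left])
    return filled


nan = math.nan
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.0, nan, 2.0, nan, nan, 5.0, nan, 7.0]

filled = interpolate_nans(x, y)
print(filled)
```

Since the known points here lie on the line y = x, every gap is filled with the corresponding x value. The input list is left untouched, which mirrors calling the function without `inplace=True`.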

## Find row names that correspond to a particular pattern

In some cases, row names follow a pattern but the exact row name might be unknown.
This can happen when the row name also mentions the algorithm used to obtain the estimate but downstream processing does not care about this distinction.
In this case, we can use the `find_wildcard_indexes` function to extract the relevant row names.

This function returns a dictionary where the wildcard query is the key and
the list of matching row names is the value.

In [12]:
print(glp.find_wildcard_indexes(navdata, 'nan_*_x', max_allow=1))
navdata['nan_col_x'] = 0
print(glp.find_wildcard_indexes(navdata, 'nan_*_x'))

{'nan_*_x': ['nan_row_x']}
{'nan_*_x': ['nan_row_x', 'nan_col_x']}


This function can also be used to exclude certain wildcards or allow only
one row name using function arguments. If more row names match the query
than `max_allow` permits, the function raises a `KeyError`.

In [14]:
try:
    glp.find_wildcard_indexes(navdata, 'nan_*_x', max_allow=1)
except KeyError as excp:
    print('Error:', excp)

Error: 'More than 1 possible row indexes for nan_*_x'
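
The matching and the `max_allow` check can be approximated with the standard library's `fnmatch` module. This is a sketch under assumptions: `find_matching_rows` is a hypothetical helper, and `fnmatch` also gives special meaning to `?` and `[...]`, which may differ from the wildcard rules used by `find_wildcard_indexes`.

```python
from fnmatch import fnmatchcase


def find_matching_rows(row_names, pattern, max_allow=None):
    """Return {pattern: [matching row names]} in the original row order.

    Hypothetical helper. Raises KeyError when more than `max_allow`
    names match, mirroring the error message printed above.
    """
    matches = [name for name in row_names if fnmatchcase(name, pattern)]
    if max_allow is not None and len(matches) > max_allow:
        raise KeyError(
            f"More than {max_allow} possible row indexes for {pattern}")
    return {pattern: matches}


rows = ["myTimestamp", "nan_row_x", "nan_row_y", "nan_col_x"]

found = find_matching_rows(rows, "nan_*_x")
print(found)  # → {'nan_*_x': ['nan_row_x', 'nan_col_x']}

try:
    find_matching_rows(rows, "nan_*_x", max_allow=1)
except KeyError as excp:
    print("Error:", excp)
```

Limiting the number of matches is useful when downstream code expects exactly one row, since an ambiguous query then fails loudly instead of silently picking one of several candidates.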
