**INTRODUCTION**

RTKLIB is an open source software library used for (among other things) calculating GNSS solutions from raw observation data.  It was originally written by Tomoji Takasu of the Tokyo University of Marine Science and Technology, but there are now multiple forks available, including the demo5 fork which I maintain.  When used to generate PPK (post-processing kinematic) solutions, it has two advantages over the baseline solutions provided by Google.  First, it uses the carrier phase observations (ADR) as well as the pseudorange observations.  The carrier phase observations are more difficult to use but have smaller errors than the pseudorange observations.  Second, the PPK solutions are differential, relative to a nearby known base location, rather than absolute like the Google solutions.  The differential solution allows us to difference raw observations between the rover and base, which effectively cancels most of the satellite orbital, clock, and atmospheric errors, resulting in more accurate solutions.
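
To make the differencing idea concrete, here is a small toy sketch with made-up numbers (not real observations, and not code from RTKLIB).  It assumes each satellite contributes an error that is identical at the rover and the base, which is only approximately true for a short baseline.  The single difference (rover minus base) removes those shared errors, and differencing again against a reference satellite removes the receiver clock terms.

```python
import numpy as np

# Toy numbers only: geometric ranges (m) from two receivers to three satellites.
true_range_rover = np.array([20_000_000.0, 21_500_000.0, 23_000_000.0])
true_range_base = np.array([20_000_450.0, 21_499_700.0, 23_000_120.0])

# Assumed errors shared by both receivers (satellite clock, orbit, atmosphere).
common_sat_error = np.array([3.2, -1.7, 5.4])
# Receiver clock errors (m): one per receiver, the same for every satellite.
rover_clock, base_clock = 150.0, -42.0

obs_rover = true_range_rover + common_sat_error + rover_clock
obs_base = true_range_base + common_sat_error + base_clock


def single_difference(rover, base):
    """Rover minus base per satellite: cancels errors common to both receivers."""
    return rover - base


def double_difference(sd, ref=0):
    """Single differences minus the reference satellite's: cancels receiver clocks."""
    return np.delete(sd - sd[ref], ref)


sd = single_difference(obs_rover, obs_base)
dd = double_difference(sd)
dd_truth = double_difference(single_difference(true_range_rover, true_range_base))
print(dd - dd_truth)  # remaining error after double differencing
```

In real data the cancellation is not perfect, because the atmospheric and orbital errors differ slightly between the two locations, and the difference grows with the distance to the base.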

I describe this process in more detail in [this blog post](https://rtklibexplorer.wordpress.com/2022/01/10/google-smartphone-decimeter-challenge/) in which I share my experience working with last year's GSDC data after the competition was over.  It includes a link to download the code I used to generate RTKLIB solutions for last year's data.  Submitting these solutions to Kaggle (after the competition was over) resulted in a score of 2.15 meters which put it into fifth place on the final Private Leaderboard.

I would like to encourage use of RTKLIB in this year's competition and so I am sharing an updated version of the previous code to duplicate my results with this year's data and the most recent RTKLIB code.  This code gives a set of solutions that score 3.135 on this year's Public Leaderboard.  The new code is only slightly modified from the previous version, so anyone interested in using RTKLIB for this challenge can start by becoming familiar with the previous code on the previous data.  I would strongly recommend starting with the older code and data because it comes as a complete package and is easier to get results with than what I am presenting here.

My hope is to provide a platform which will allow competitors to jump right into extending the existing GNSS theory rather than having to build a solution from scratch. In addition to the C version of RTKLIB I described in the above post, I have recently created an all python subset of RTKLIB for PPK solutions. This runs somewhat slower than the C code but does make an easier development platform. In this notebook I will provide code and guidelines for working with the C code version of RTKLIB.  I will also describe working with the Python version in a separate notebook after I complete this one.

In the interest of getting this out sooner rather than later, this will not be the "push the button and you get an answer" kind of notebook.  It will be more like a set of handwritten notes scribbled down quickly while actually running the experiment.  In its current form you will need to download the pieces of code to your own system, put them together, and run locally since it relies on open source compiled code.  In the future I hope to make this more user friendly, but for now it is assumed that you are fairly familiar with Python and can debug simple issues that may crop up when trying to follow these instructions.

I ran this exercise on a Windows PC but you should be able to run it on Linux as well.  In general, the top of each file will have a set of input parameters.  Unless your folder names and paths are identical to mine, you will often need to update these before running.

The folder structure I use in this solution is:

GSDC_2022

    config
    
    data
    
      test
    
      train
      
    python
   
      android_rinex
      
      rtklib-py
      
    rtklib

It will be easier to follow these instructions if you use the same folder structure.
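
If you would rather create the empty folders from Python than by hand, a minimal sketch is shown here.  The function name `make_tree` is just a name chosen for this example, and the root path in the usage comment is the one used in the scripts in this notebook; change it to suit your system.

```python
from pathlib import Path

# Sub-folders from the structure listed above, relative to the GSDC_2022 root
SUBFOLDERS = [
    "config",
    "data/test",
    "data/train",
    "python/android_rinex",
    "python/rtklib-py",
    "rtklib",
]


def make_tree(root):
    """Create the folder structure under root, leaving existing folders alone."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Example usage (uncomment and edit the path):
# make_tree(r"C:\gps\GSDC_2022")
```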

**Step 1: Retrieve base observation and satellite navigation files**

Since these are differential solutions, we will need raw observation measurements from a nearby base station for each data set.  Fortunately, these are available from the National Geodetic Survey (NGS) website.  We will also need satellite orbital data for each data set for the GPS, GLONASS, and Galileo constellations.  These are available from multiple websites.  I chose to retrieve them from the UNAVCO site, in part because these files include Galileo navigation data as well as GPS and GLONASS.  Some of the other sources include only GPS and GLONASS.

Last year it was sufficient to use a single base station for every data set since they were all located in a small geographic area.  This year there are data sets from the Bay Area and from the LA area, so we will need to select the appropriate base station for each data set and also use the correct location for that base station in the solution.
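
As a rough sketch of the selection logic, the two helpers here pick a station from the data set name and build the daily CORS file name.  The helper names are made up for this example, the rule of looking for `-LAX-` in the folder name mirrors the retrieval script in this step, and the data set name in the usage line is a hypothetical one.

```python
from datetime import date


def select_base(dataset, bay_area="slac", la="vdcy"):
    """Pick a base station ID from the data set folder name."""
    return la if "-LAX-" in dataset else bay_area


def cors_obs_filename(station, day):
    """Daily compressed RINEX name: station + day of year + '0.' + 2-digit year + 'd.gz'."""
    doy = day.timetuple().tm_yday
    return "%s%03d0.%02dd.gz" % (station, doy, day.year % 100)


# Hypothetical data set name, used only to show the call pattern
name = "2022-01-26-US-LAX-1"
y, m, d = (int(s) for s in name.split("-")[:3])
print(select_base(name), cors_obs_filename(select_base(name), date(y, m, d)))
```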

The code below simply retrieves the base and navigation data for the full day corresponding to the starting time of each data set.  This works fine for the test data set since all data sets start and end on the same UTC day, but in the training set there are multiple data sets that start in one (UTC) day and finish in the next day.  I will be demonstrating this exercise on the test data so will not worry about this issue, but if you are trying to retrieve this data for the training data you will either need to eliminate the multi-day sets or manually download a file with the correct starting and stopping times.  This is most easily done from the "User Friendly CORS" page on the NGS site.
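
One way to find the multi-day sets ahead of time is to compare the UTC dates of the first and last time stamps of each trip.  This is only a sketch: the function names are invented for this example, and it assumes you have already pulled the first and last `UnixTimeMillis` values for a trip from one of its files.

```python
from datetime import datetime, timezone


def utc_day(unix_ms):
    """UTC calendar date for a Unix time stamp in milliseconds."""
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc).date()


def spans_utc_midnight(start_ms, end_ms):
    """True if a trip starts on one UTC day and ends on another."""
    return utc_day(start_ms) != utc_day(end_ms)


# Toy example: a 20 minute trip starting at 23:50 UTC crosses midnight
start = int(datetime(2021, 1, 6, 23, 50, tzinfo=timezone.utc).timestamp() * 1000)
print(spans_utc_midnight(start, start + 20 * 60 * 1000))
```

Trips that are flagged this way are the ones that need a manually downloaded base file or should be skipped.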

The observation files are doubly compressed.  They first need to be decompressed with gzip and then with crx2rnx.  This second step translates from compressed rinex to uncompressed and requires an executable file from RTKLIB.
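
The two decompression steps can be sketched as follows.  The helper names are made up for this example, and the `crx2rnx` call simply passes the compressed file as the only argument, which is how the script in this step invokes it.

```python
import gzip
import subprocess


def gunzip_to_file(gz_bytes, out_path):
    """First step: gzip-decompress downloaded bytes and write them to out_path."""
    data = gzip.decompress(gz_bytes)
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


def crx2rnx_command(crx2rnx_bin, crx_path):
    """Second step: command that expands a compressed rinex (*.??d) file."""
    return [crx2rnx_bin, crx_path]


def run_crx2rnx(crx2rnx_bin, crx_path):
    """Run crx2rnx and return its exit code (requires the executable to be installed)."""
    return subprocess.run(crx2rnx_command(crx2rnx_bin, crx_path)).returncode
```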

You can download the RTKLIB executables for Windows at https://github.com/rtklibexplorer/RTKLIB/releases.  Put them in the GSDC_2022/rtklib folder.  Note that these are for the demo5 fork of RTKLIB which I maintain.  You can use any fork of RTKLIB for this step, but later when we are calculating the solutions, you will need to use the demo5 fork.  If you are running in Linux, you will need to build your own executables from the source code.  This is described at https://rtklibexplorer.wordpress.com/2020/12/18/building-rtklib-code-in-linux/



In [None]:
""" 
get_base_data.py - retrieve base observation and navigation data for the
    2022 GSDC competition 
"""

import os
from datetime import datetime
import requests
import gzip
from glob import glob


# Input parameters

datadir = r'C:\gps\GSDC_2022\data\test'
stas = ['slac', 'vdcy', 'p222']  # Bay Area, LA, backup for Bay Area
obs_url_base = 'https://geodesy.noaa.gov/corsdata/rinex'
nav_url_base = 'https://data.unavco.org/archive/gnss/rinex3/nav' 
nav_file_base = 'AC0300USA_R_'  # 20210060000_01D_MN.rnx.gz

# Make sure you have downloaded this executable before running this code
crx2rnx_bin = r'C:\gps\GSDC_2022\rtklib\crx2rnx.exe'


os.chdir(datadir)

for dataset in os.listdir():
    if not os.path.isdir(dataset):
        continue
    print(dataset)
    ymd = dataset.split('-')
    doy = datetime(int(ymd[0]), int(ymd[1]), int(ymd[2])).timetuple().tm_yday # get day of year
    doy = str(doy).zfill(3)
    
    if len(glob(os.path.join(dataset,'*.*o'))) == 0:
        # get obs data
        i = 1 if '-LAX-' in dataset else 0  # use different base for LA
        fname = stas[i] + doy + '0.' + ymd[0][2:4] + 'd.gz'
        url = '/'.join([obs_url_base, ymd[0], doy, stas[i], fname])
        try:
            obs = gzip.decompress(requests.get(url).content) # get obs and decompress
            # write obs data
            open(os.path.join(dataset, fname[:-3]), "wb").write(obs)
        except Exception:
            # try backup CORS station (only the Bay Area has a backup entry in stas)
            i += 2
            if i < len(stas):
                fname = stas[i] + doy + '0.' + ymd[0][2:4] + 'd.gz'
                url = '/'.join([obs_url_base, ymd[0], doy, stas[i], fname])
                try:
                    obs = gzip.decompress(requests.get(url).content) # get obs and decompress
                    # write obs data
                    open(os.path.join(dataset, fname[:-3]), "wb").write(obs)
                except Exception:
                    print('Fail obs: %s' % dataset)
            else:
                print('Fail obs: %s' % dataset)
            
        # convert crx to rnx
        crx_files = glob(os.path.join(dataset,'*.*d'))
        if len(crx_files) > 0:
            os.system(crx2rnx_bin + ' ' + crx_files[0])
    
    # get nav data
    if len(glob(os.path.join(dataset,'*.rnx'))) > 0:
        continue  # file already exists
    fname = nav_file_base + ymd[0] + doy + '0000_01D_MN' + '.rnx.gz'
    url = '/'.join([nav_url_base, ymd[0], doy, fname])
    try:
        nav = gzip.decompress(requests.get(url).content) # get nav and decompress
        # write nav data
        open(os.path.join(dataset, fname[:-3]), "wb").write(nav)
    except Exception:
        print('Fail nav: %s' % dataset)




**Step 2: Download android_rinex library and create RTKLIB config file**

You will need the android_rinex library for converting the raw Android observation files to rinex format.  RTKLIB post processing solutions require that the input files be in the rinex format.  You will need to use my fork of this library which is available at the address shown in the code below.  

Put this in the GSDC_2022/python/android_rinex folder.  (Temporary workaround: There is currently a path issue when running in multi-processing mode.  For now, you will need to copy the files from the android_rinex/src folder into the GSDC_2022/python folder)

You will also need a configuration file for the solution. Copy the config file below to the GSDC_2022/config folder with a file name of ppk_phone_0510.conf

In [None]:

!git clone https://github.com/rtklibexplorer/android_rinex.git


In [None]:
# ppk_phone_0510.conf - config file for RTKLIB PPK solution

pos1-posmode       =kinematic  # (0:single,1:dgps,2:kinematic,3:static,4:static-start,5:movingbase,6:fixed,7:ppp-kine,8:ppp-static,9:ppp-fixed)
pos1-frequency     =l1+l2+l5   # (1:l1,2:l1+l2,3:l1+l2+l5,4:l1+l2+l5+l6)
pos1-soltype       =combined-nophasereset # (0:forward,1:backward,2:combined,3:combined-nophasereset)
pos1-elmask        =15         # (deg)
pos1-snrmask_r     =on         # (0:off,1:on)
pos1-snrmask_b     =on         # (0:off,1:on)
pos1-snrmask_L1    =24,24,24,24,24,24,24,24,24
pos1-snrmask_L2    =34,34,34,34,34,34,34,34,34
pos1-snrmask_L5    =24,24,24,24,24,24,24,24,24
pos1-dynamics      =on         # (0:off,1:on)
pos1-tidecorr      =off        # (0:off,1:on,2:otl)
pos1-ionoopt       =brdc       # (0:off,1:brdc,2:sbas,3:dual-freq,4:est-stec,5:ionex-tec,6:qzs-brdc)
pos1-tropopt       =saas       # (0:off,1:saas,2:sbas,3:est-ztd,4:est-ztdgrad)
pos1-sateph        =brdc       # (0:brdc,1:precise,2:brdc+sbas,3:brdc+ssrapc,4:brdc+ssrcom)
pos1-posopt1       =off        # (0:off,1:on)
pos1-posopt2       =off        # (0:off,1:on)
pos1-posopt3       =off        # (0:off,1:on,2:precise)
pos1-posopt4       =off        # (0:off,1:on)
pos1-posopt5       =off        # (0:off,1:on)
pos1-posopt6       =off        # (0:off,1:on)
pos1-exclsats      =           # (prn ...)
pos1-navsys        =13         # (1:gps+2:sbas+4:glo+8:gal+16:qzs+32:bds+64:navic)
pos2-armode        =off        # (0:off,1:continuous,2:instantaneous,3:fix-and-hold)
pos2-gloarmode     =off        # (0:off,1:on,2:autocal,3:fix-and-hold)
pos2-bdsarmode     =on         # (0:off,1:on)
pos2-arfilter      =on         # (0:off,1:on)
pos2-arthres       =3
pos2-arthresmin    =3
pos2-arthresmax    =3
pos2-arthres1      =0.05
pos2-arthres2      =0
pos2-arthres3      =1e-09
pos2-arthres4      =1e-05
pos2-varholdamb    =0.1        # (cyc^2)
pos2-gainholdamb   =0.01
pos2-arlockcnt     =5
pos2-minfixsats    =4
pos2-minholdsats   =5
pos2-mindropsats   =10
pos2-arelmask      =15         # (deg)
pos2-arminfix      =10
pos2-armaxiter     =1
pos2-elmaskhold    =15         # (deg)
pos2-aroutcnt      =4
pos2-maxage        =30         # (s)
pos2-syncsol       =off        # (0:off,1:on)
pos2-slipthres     =0.1        # (m)
pos2-dopthres      =5          # (m)
pos2-rejionno      =1          # (m)
pos2-rejgdop       =30
pos2-niter         =1
pos2-baselen       =0          # (m)
pos2-basesig       =0          # (m)
out-solformat      =llh        # (0:llh,1:xyz,2:enu,3:nmea)
out-outhead        =on         # (0:off,1:on)
out-outopt         =on         # (0:off,1:on)
out-outvel         =off        # (0:off,1:on)
out-timesys        =gpst       # (0:gpst,1:utc,2:jst)
out-timeform       =tow        # (0:tow,1:hms)
out-timendec       =3
out-degform        =deg        # (0:deg,1:dms)
out-fieldsep       =
out-outsingle      =off        # (0:off,1:on)
out-maxsolstd      =0          # (m)
out-height         =ellipsoidal # (0:ellipsoidal,1:geodetic)
out-geoid          =internal   # (0:internal,1:egm96,2:egm08_2.5,3:egm08_1,4:gsi2000)
out-solstatic      =all        # (0:all,1:single)
out-nmeaintv1      =0          # (s)
out-nmeaintv2      =0          # (s)
out-outstat        =residual   # (0:off,1:state,2:residual)
stats-eratio1      =300
stats-eratio2      =300
stats-eratio5      =100
stats-errphase     =0.003      # (m)
stats-errphaseel   =0.003      # (m)
stats-errphasebl   =0          # (m/10km)
stats-errdoppler   =1          # (Hz)
stats-snrmax       =52         # (dB.Hz)
stats-errsnr       =0          # (m)
stats-errrcv       =0          # ( )
stats-stdbias      =30         # (m)
stats-stdiono      =0.03       # (m)
stats-stdtrop      =0.3        # (m)
stats-prnaccelh    =3          # (m/s^2)
stats-prnaccelv    =1          # (m/s^2)
stats-prnbias      =0.01       # (m)
stats-prniono      =0.001      # (m)
stats-prntrop      =0.0001     # (m)
stats-prnpos       =0          # (m)
stats-clkstab      =5e-12      # (s/s)
ant1-postype       =llh        # (0:llh,1:xyz,2:single,3:posfile,4:rinexhead,5:rtcm,6:raw)
ant1-pos1          =0          # (deg|m)
ant1-pos2          =0          # (deg|m)
ant1-pos3          =0          # (m|m)
ant1-anttype       =
ant1-antdele       =0          # (m)
ant1-antdeln       =0          # (m)
ant1-antdelu       =0          # (m)
ant2-postype       =posfile    # (0:llh,1:xyz,2:single,3:posfile,4:rinexhead,5:rtcm,6:raw)
ant2-pos1          =0          # (deg|m)
ant2-pos2          =0          # (deg|m)
ant2-pos3          =0          # (m|m)
ant2-anttype       =
ant2-antdele       =0          # (m)
ant2-antdeln       =0          # (m)
ant2-antdelu       =0          # (m)
ant2-maxaveep      =1
ant2-initrst       =on         # (0:off,1:on)
misc-timeinterp    =on         # (0:off,1:on)
misc-sbasatsel     =0          # (0:all)
misc-rnxopt1       =
misc-rnxopt2       =
misc-pppopt        =
file-satantfile    =
file-rcvantfile    =
file-staposfile    =C:\gps\GSDC_2022\config\bases.sta
file-geoidfile     =
file-ionofile      =
file-dcbfile       =
file-eopfile       =
file-blqfile       =
file-tempdir       =
file-geexefile     =
file-solstatfile   =
file-tracefile     =
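
Since several later steps depend on individual settings in this file (for example `file-staposfile`), it can be handy to read the file back and check a value.  The sketch here is a simple reader written for this notebook, not RTKLIB's own parser; it assumes every option is a `key =value` line with an optional `#` comment, as in the file above.

```python
def read_rtklib_conf(lines):
    """Parse 'key =value  # comment' lines into a dict of strings."""
    opts = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


sample = [
    "# ppk_phone_0510.conf - config file for RTKLIB PPK solution",
    "pos1-posmode       =kinematic  # (0:single,1:dgps,2:kinematic)",
    "ant2-postype       =posfile    # (0:llh,1:xyz,2:single,3:posfile)",
    "file-staposfile    =C:\\gps\\GSDC_2022\\config\\bases.sta",
]
conf = read_rtklib_conf(sample)
print(conf["file-staposfile"])
```

To check your own copy, pass the lines of the real file, for example `read_rtklib_conf(open(cfgfile_path))`, where `cfgfile_path` is whatever path you saved the config under.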


**Step 3: Set up base station locations**

Since we are dealing with multiple base stations this year, we need a separate file containing the different base locations.  Create a file named bases.sta in the C:\gps\GSDC_2022\config folder and copy the lines below into this file.  RTKLIB will use the first four characters of the base station file name to select the correct location from this list.  Note that if you don't use the exact same file name and folder name as I used, you will need to modify the "file-staposfile" parameter in the config file above.

In [None]:
%  LATITUDE(DEG) LONGITUDE(DEG) HEIGHT(M)     NAME
  37.4165169989997  -122.204262    63.69      SLAC
  34.1785656499996  -118.220000564  318.16    VDCY
  37.539239559      -122.083264600  53.52     P222
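
The same base locations also appear as XYZ coordinates in the `BASE_POS` table of the Step 4 script.  If you change or add a base station, a quick sanity check is to convert the latitude, longitude, and height to XYZ and compare.  The sketch here uses the standard WGS84 ellipsoid constants; the function name is just for this example, and I am assuming the two tables refer to the same reference frame, so expect agreement only to within a small tolerance.

```python
import math

WGS84_A = 6378137.0              # semi-major axis (m)
WGS84_F = 1.0 / 298.257223563    # flattening


def llh_to_ecef(lat_deg, lon_deg, h):
    """Convert geodetic latitude/longitude (deg) and ellipsoidal height (m) to XYZ (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = WGS84_F * (2.0 - WGS84_F)                        # eccentricity squared
    n = WGS84_A / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


# SLAC entry from bases.sta, to compare against BASE_POS['slac']
print(llh_to_ecef(37.4165169989997, -122.204262, 63.69))
```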

**Step 4: Convert raw files and run PPK solutions**

As configured in the header below, this code will convert the raw Android files to rinex and run the RTKLIB PPK solutions for the test set.

In the main code at the bottom of the file, the execution can be set up as either sequential or multiprocessing by commenting or uncommenting the appropriate lines, both for the file conversion, and for running the solutions.  It is easier to debug when run sequentially but is much slower.  I recommend running each step sequentially until you are convinced it's working, then switch it to multiprocessing.  
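
If you get tired of commenting and uncommenting lines, one alternative is a small helper with a switch.  This is a sketch of that idea rather than what the script in this step does; `run_tasks` is a name invented here.  It uses the same `Pool().starmap` call as the script for the parallel case.

```python
from multiprocessing import Pool


def run_tasks(func, tasks, parallel=False):
    """Call func(*task) for every task, sequentially or in a process pool.

    func must be defined at module level so the pool can pickle it.
    """
    if parallel:
        with Pool() as pool:  # defaults to cpu_count processes
            return pool.starmap(func, tasks)
    return [func(*task) for task in tasks]


if __name__ == "__main__":
    # Sequential run with a stand-in function; swap in convert_rnx or run_rtklib
    print(run_tasks(pow, [(2, 3), (3, 2)], parallel=False))
```

As with the original code, keep the parallel call inside the `if __name__ == '__main__':` guard when running on Windows.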

The solution files will all be tagged with the "soltag_rtklib" parameter defined in the header so you can use this to keep separate the results from multiple runs.  They will be in the "supplemental" folders inside each phone folder.

This code is set up to run either the C version of RTKLIB or the Python version or both.  In this notebook, I am only addressing the C version.  Please see my other notebook if you would like to run the Python code.

Debugging hint:  If the code runs without error but does not produce any solution files then the error is very likely occurring during the call to the rtklib executable since any errors that occur in that code are not fed back to the python code.  The easiest way to debug this is to place a breakpoint in the "run_rtklib" function, open a console window, change the directory to the contents of the "folder" variable, then copy and paste the contents of the "rtkcmd" variable into the console window and run it.  Most likely you will find that one of the input files is missing or in the wrong location.
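
An alternative to the breakpoint approach is to temporarily capture the output of the command instead of discarding it.  The sketch here is a generic wrapper around `subprocess.run`; the function name is made up, and I have not checked exactly what rnx2rtkp writes to stderr or which exit codes it uses, so treat whatever it prints as a starting point for debugging.

```python
import subprocess


def run_and_report(cmd, cwd=None):
    """Run a command, capture its output, and print details if it looks like it failed."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0 or result.stderr:
        print("command:", cmd)
        print("return code:", result.returncode)
        print("stderr:", result.stderr.strip())
    return result
```

To try it, replace the `subprocess.run(...)` line in `run_rtklib` with `run_and_report(rtkcmd, cwd=folder)` while running sequentially.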

In [None]:
"""
run_ppk_multi.py - convert raw android files to rinex and run PPK solutions for GSDC_2022
data set with RTKLIB and/or rtklib-py.   
"""

import sys
if 'rtklib-py/src' not in sys.path:
    sys.path.append('rtklib-py/src')
if 'android_rinex/src' not in sys.path:
    sys.path.append('android_rinex/src')

import os, shutil
from os.path import join, isdir, isfile
from glob import glob
from multiprocessing import Pool
import subprocess
import gnsslogger_to_rnx as rnx
from time import time

# set run parameters
maxepoch = None # max number of epochs, used for debug, None = no limit

# Set solution choices
ENABLE_PY = False        # Use RTKLIB-PY to generate solutions 
ENABLE_RTKLIB = True     # Use RTKLIB to generate solutions
OVERWRITE_RINEX = False  # overwrite existing rinex files
OVERWRITE_SOL = False    # overwrite existing solution files

# specify location of input folder and files
datadir = r'C:\gps\GSDC_2022\data\test'
basefiles = '../*0.2*o' # rinex2, use this for rtklib only
#basefiles = '../base.obs' # rinex3, use this for python only
navfiles = '../*MN.rnx' # navigation files with wild cards

# Setup for RTKLIB 
binpath_rtklib  = r'C:\gps\GSDC_2022\rtklib\rnx2rtkp.exe'
cfgfile_rtklib = r'C:\gps\GSDC_2022\config\ppk_phone_0510.conf'
soltag_rtklib = '_rtklib' # postfix for solution file names

# Setup for rtklib-py
cfgfile = r'C:\gps\GSDC_2022\config\ppk_phone_0510.py' # cfgfile must be absolute path
soltag_py = '_py0510'  # postfix for solution file names

PHONES = ['GooglePixel4', 'GooglePixel4XL', 'Pixel4Modded', 'GooglePixel5', 'GooglePixel6Pro', 'XiaomiMi8', 'SamsungGalaxyS20Ultra']
BASE_POS = {'slac' : [-2703115.9184, -4291767.2037, 3854247.9027],  # WGS84 XYZ coordinates
            'vdcy' : [-2497836.5139, -4654543.2609, 3563028.9379],
            'p222' : [-2689640.2891, -4290437.3671, 3865050.9313]}


# input structure for rinex conversion
class args:
    def __init__(self):
        # Input parameters for conversion to rinex
        self.slip_mask = 0 # overwritten below
        self.fix_bias = True
        self.timeadj = 1e-7
        self.pseudorange_bias = 0
        self.filter_mode = 'sync'
        # Optional header values for rinex files
        self.marker_name = ''
        self.observer = ''
        self.agency = ''
        self.receiver_number = ''
        self.receiver_type = ''
        self.receiver_version = ''
        self.antenna_number = ''
        self.antenna_type = ''

# Copy and read config file
if ENABLE_PY:
    shutil.copyfile(cfgfile, '__ppk_config.py')
    import __ppk_config as cfg
    import rinex as rn
    import rtkcmn as gn
    from rtkpos import rtkinit
    from postpos import procpos, savesol

# function to convert single rinex file
def convert_rnx(folder, rawFile, rovFile, slipMask):
    os.chdir(folder)
    argsIn = args()
    argsIn.input_log = rawFile
    argsIn.output = os.path.basename(rovFile)
    argsIn.slip_mask = slipMask
    rnx.convert2rnx(argsIn)

# function to run single RTKLIB-Py solution
def run_ppk(folder, rovfile, basefile, navfile, solfile):
    # init solution
    os.chdir(folder)
    gn.tracelevel(0)
    nav = rtkinit(cfg)
    nav.maxepoch = maxepoch
    print(folder)

    # load rover obs
    rov = rn.rnx_decode(cfg)
    print('    Reading rover obs...')
    if nav.filtertype == 'backward':
        maxobs = None   # load all obs for backwards
    else:
        maxobs = maxepoch
    rov.decode_obsfile(nav, rovfile, maxobs)

    # load base obs and location
    base = rn.rnx_decode(cfg)
    print('   Reading base obs...')
    base.decode_obsfile(nav, basefile, None)
    
    # determine base location from original base obs file name
    if len(BASE_POS) > 1:
        baseName = glob('../*.2*o')[0][-12:-8]
        nav.rb[0:3]  = BASE_POS[baseName]
    elif nav.rb[0] == 0:
        nav.rb = base.pos # from obs file
        
    # load nav data from rover obs
    print('   Reading nav data...')
    rov.decode_nav(navfile, nav)

    # calculate solution
    print('    Calculating solution...')
    sol = procpos(nav, rov, base)

    # save solution to file
    savesol(sol, solfile)
    return rovfile

# function to run single RTKLIB solution
def run_rtklib(folder, rovfile, basefile, navfile, solfile):
    # create command to run solution
    rtkcmd='%s -x 3 -y 2 -k %s -o %s %s %s %s' % \
        (binpath_rtklib, cfgfile_rtklib, solfile, rovfile, basefile, navfile)
    
    # run command
    os.chdir(folder)
    subprocess.run(rtkcmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)   

####### Start of main code ##########################

def main():

    # get list of data sets in data path
    datasets = os.listdir(datadir)

    # loop through data set folders
    rinexIn = []
    ppkIn = []
    rtklibIn = []
    for dataset in datasets:
        for phone in PHONES:
            # skip if no folder for this phone
            folder = join(datadir, dataset, phone)
            if not isdir(folder):  
                continue
            os.chdir(folder)
            rawFile = join('supplemental', 'gnss_log.txt')
            rovFile = join('supplemental', 'gnss_log.obs')

            rinex = False
            # check if need rinex conversion
            if OVERWRITE_RINEX or not isfile(rovFile):
                # generate list of input parameters for each rinex conversion
                if phone == 'SamsungGalaxyS20Ultra': # Use cycle slip flags for Samsung phones
                    slipMask = 0 # 1 to unmask receiver cycle slips
                else:
                    slipMask = 0 
                rinexIn.append((folder, rawFile, rovFile, slipMask))
                print(rawFile, '->', rovFile) 
                rinex = True
            
            # check if need to create PPK solution
            try:
                baseFile = glob(basefiles)[0]
                navFile = glob(navfiles)[0]
                solFile = rovFile[:-4] + soltag_py + '.pos'
                solFile_rtklib = rovFile[:-4] + soltag_rtklib + '.pos'
            except:
                print(folder,'  Error: Missing file')
                continue
            if ENABLE_PY and (OVERWRITE_SOL == True or len(glob(solFile)) == 0 
                              or rinex == True):
                # generate list of input/output files for each python ppk solution
                print('PY: ', join(dataset, phone))
                ppkIn.append((folder, rovFile, baseFile, navFile, solFile))
            if ENABLE_RTKLIB and (OVERWRITE_SOL == True or 
                        len(glob(solFile_rtklib)) == 0 or rinex == True):
                # generate list of input/output files for each rtklib ppk solution
                print('RTKLIB: ', join(dataset, phone))
                rtklibIn.append((folder, rovFile, baseFile, navFile, solFile_rtklib))

    if len(rinexIn) > 0:
        print('\nConvert rinex files...')
        # generate rinex obs files in parallel, does not give error messages
        #with Pool() as pool: # defaults to using cpu_count for number of processes
        #    res = pool.starmap(convert_rnx, rinexIn)
        # run sequentially, use for debug
        for input in rinexIn:
            convert_rnx(input[0],input[1],input[2],input[3])

    if ENABLE_PY and len(ppkIn) > 0:
        print('Calculate PPK solutions...')
        # run PPK solutions in parallel, does not give error messages
        # with Pool() as pool: # defaults to using cpu_count for number of processes
        #     res = pool.starmap(run_ppk, ppkIn)
        # run sequentially, use for debug
        for input in ppkIn:
            run_ppk(input[0],input[1],input[2],input[3],input[4])

    if ENABLE_RTKLIB and len(rtklibIn) > 0:
        print('Calculate RTKLIB solutions...')
        # run PPK solutions in parallel, does not give error messages
        # with Pool() as pool: # defaults to using cpu_count for number of processes
        #     res = pool.starmap(run_rtklib, rtklibIn)
        # run sequentially, use for debug
        for input in rtklibIn:
            run_rtklib(input[0],input[1],input[2],input[3],input[4])

if __name__ == '__main__':
    t0 = time()
    main()
    print('Runtime=%.1f' % (time() - t0))

**Step 5: Combine RTKLIB solutions into a single .csv file**

The code below will read in all the individual RTKLIB solution files and create a single .csv file in the correct format for submitting to Kaggle.  The time stamps in the RTKLIB solutions may not exactly match the time stamps in the original raw data and may be missing some data points, so the RTKLIB solution points are interpolated onto the time stamps in the sample submission file that was provided by Google.  Make sure this file (sample_submission.csv) is in the data folder.
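
The two key operations are converting the GPS week and time of week in the RTKLIB output to Unix milliseconds and then interpolating onto the reference time stamps.  Here is a toy sketch of both.  The function names are invented for this example, `GPS_TO_UTC` is the same constant used in the script in this step, and the rounding before the integer cast is my own choice to avoid being off by one millisecond from floating point truncation.

```python
import numpy as np

GPS_TO_UTC = 315964782  # seconds, same constant as in the script in this step


def gps_to_unix_ms(week, tow):
    """Convert GPS week and time of week (s) to Unix time in milliseconds."""
    secs = np.asarray(week) * 7 * 24 * 3600 + np.asarray(tow)
    return np.round(1000.0 * secs).astype(np.int64) + GPS_TO_UTC * 1000


def interp_to_reference(ref_ms, sol_ms, values):
    """Linearly interpolate solution values onto the reference time stamps."""
    return np.interp(ref_ms, sol_ms, values)


# Toy data: solution epochs at 0, 1 and 3 s, with the 2 s epoch missing
sol_ms = np.array([0, 1000, 3000])
lat = np.array([1.0, 2.0, 4.0])
print(interp_to_reference([500, 2000], sol_ms, lat))
```

Note that `np.interp` holds the first or last value constant for reference time stamps that fall outside the range of the solution, so a solution that starts late or ends early will repeat its end points.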

This will work for the test data, but for the training data you will need to generate a reference file for the correct timestamps from the ground truth data.  The second code block below will do this.

Only solutions with the same tag will be included so make sure you use the same tag (SOL_TAG) here as you did when creating the solutions in the previous step.

The output file name will include the test set and date and will be in the datapath folder.  This will be in the correct format for submission to Kaggle although we are not quite ready to submit it yet.

In [None]:
""" create_baseline_csv_from_pos.py -  Create csv file from PPK solution files using timestamps in reference file
"""

import os
from os.path import join, isfile
import numpy as np
from datetime import date

########### Input parameters ###############################

DATA_SET = 'test'
SOL_TAG = '_rtklib'
datapath = r'C:\gps\GSDC_2022\data'
hdrlen = 25    # 25 for RTKLIB, 1 for rtklib-py

# Also make sure the appropriate reference file is in the datapath
#  test: sample_submission.csv - provided in Google data
# train: ground_truths_train.csv - created with create_groundtruth_csv.py

############################################################

GPS_TO_UTC = 315964782  # second


# get timestamps from existing baseline file
os.chdir(datapath)
if DATA_SET == 'train':
    baseline_file = 'ground_truths_train.csv'
else: # 'test'
    baseline_file = 'sample_submission.csv'
base_txt = np.genfromtxt(baseline_file, delimiter=',',invalid_raise=False, 
                         skip_header=1, dtype=str)
msecs_base = base_txt[:,1].astype(np.int64)
phones_base = base_txt[:,0]

# open output file
fout =open('baseline_locations_' + DATA_SET + '_' + date.today().strftime("%m_%d") + '.csv','w')
fout.write('tripId,UnixTimeMillis,LatitudeDegrees,LongitudeDegrees\n')

# get list of data sets in data path
os.chdir(join(datapath, DATA_SET))
trips = os.listdir()

# loop through data set folders
ix_b = 0
for trip in trips:
    if isfile(trip):
        continue
    phones = os.listdir(trip)
    # loop through phone folders
    for phone in phones:
        # check for valid folder and file
        folder = join(trip, phone)
        if isfile(folder):
            continue
        trip_phone = trip + '/' + phone
        print(trip_phone)

        ix_b = np.where(phones_base == trip_phone)[0]
        sol_path = join(folder, 'supplemental', 'gnss_log' + SOL_TAG + '.pos')
        if isfile(sol_path):
            # parse solution file
            fields = np.genfromtxt(sol_path, invalid_raise=False, skip_header=hdrlen)
            if int(fields[0,1]) > int(fields[-1,1]): # invert if backwards solution
                fields = fields[::-1]
            pos = fields[:,2:5]
            # qs = fields[:,5].astype(int)
            # nss = fields[:,6].astype(int)
            # acc = fields[:,7:13]
            msecs = (1000 * (fields[:,0] * 7 * 24 * 3600 + fields[:,1])).astype(np.int64)
            msecs += GPS_TO_UTC * 1000
        else:
            print('File not found: ', sol_path)
            msecs = msecs_base.copy()
            pos = acc = np.zeros((len(msecs), 3))
            qs = nss = np.zeros(len(msecs))
            
           
        # interpolate to baseline timestamps to fill in missing samples
        llhs = []; stds = []
        for j in range(6):
            if j < 3:
                llhs.append(np.interp(msecs_base[ix_b], msecs, pos[:,j]))
        #     stds.append(np.interp(msecs_b, msecs, acc[:,j],
        #                         left=1000, right=1000))
        # qsi = np.interp(msecs_b, msecs, qs)
        # nssi = np.interp(msecs_b, msecs, nss)

        # # write results to combined file
        for i in range(len(ix_b)):
            fout.write('%s,%d,%s,%s\n' % 
                    (trip_phone, msecs_base[ix_b[i]], llhs[0][i], llhs[1][i]))

fout.close()

In [None]:
"""
create_groundtruth_csv.py - create csv file from all training set ground truth files
"""

import os
from os.path import join, isfile


datapath = r'C:\gps\GSDC_2022\data\train'
GPS_TO_UTC = 315964782  # second

# open output file
os.chdir(datapath)
fout =open('../ground_truths_train.csv','w')
fout.write('tripId,UnixTimeMillis,LatitudeDegrees,LongitudeDegrees\n')

# get list of data sets in data path
datasets = os.listdir(datapath)

# loop through data set folders
for dataset in datasets:
    if isfile(dataset):
        continue
    phones = os.listdir(join(datapath,dataset))
    for phone in phones:
        folder = join(datapath, dataset, phone)
        if isfile(folder):
            continue
        
        csv_file = join(folder, 'ground_truth.csv')
        if not isfile(csv_file):
            continue

        # parse ground truth file
        with open(csv_file) as f:
            lines = f.readlines()[1:]
        flag = 0
        for line in lines:
            d = line.split(',')
            t = float(d[8]) # get time stamp
            if flag == 0:            
                print('%20s,%16s' % (dataset, phone))
                flag = 1
            # write results to combined file
            fout.write('%s/%s,%.0f,%s,%s\n' % ((dataset, phone, t, d[2], d[3])))
        
fout.close()

**Step 6:  Filtering out RTKLIB solutions with hardware clock discontinuities**

Unfortunately, there are a couple of data sets in the test data and several more in the training data that have corrupted carrier phase measurements.  The RTKLIB solutions are quite poor for these data sets and are typically worse than the included Google baseline solutions, which do not use the carrier phase measurements.

These data sets can be identified by the HardwareClockDiscontinuityCount field in the raw Android log files.  If the final value in this field is larger than the initial value, then the carrier phase measurements will be corrupted by the clock discontinuities.  The code block below will scan the log files and list any with more than one discontinuity.
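
The core of the check can be sketched on a few lines of text.  The function name is made up for this example; the field position (index 10 of each `Raw` line) is the one used by the script in this step.

```python
def clock_discontinuity_change(lines):
    """Return last minus first HardwareClockDiscontinuityCount over the 'Raw' lines."""
    counts = []
    for line in lines:
        fields = line.split(",")
        if fields[0] == "Raw" and len(fields) > 10:
            counts.append(int(fields[10]))
    if not counts:
        return 0  # no Raw measurements found
    return counts[-1] - counts[0]
```

A result of zero means the counter never changed during the log, which is the case we want.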

For my initial submission, I simply replaced the RTKLIB solutions for the two affected test data sets with the Google baseline solutions, but I hope to eventually find a better solution.

In [None]:
"""
count_clock_errs.py - count hardware clock discontinuities in raw logs
"""

import os
from os.path import join, isfile


########### Input parameters ###############################

DATA_SET = 'test'
datapath = r'C:\gps\GSDC_2022\data'

############################################################



# get list of data sets in data path
os.chdir(join(datapath, DATA_SET))
trips = os.listdir()

# loop through data set folders
for trip in trips:
    if isfile(trip):
        continue
    phones = os.listdir(trip)
    # loop through phone folders
    for phone in phones:
        # check for valid folder and file
        folder = join(trip, phone)
        if isfile(folder):
            continue
        trip_phone = trip + '_' + phone

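        infile = join(folder, 'supplemental', 'gnss_log.txt')
        if not isfile(infile):
            continue

        # parse raw log file; the readlines() size hint limits the scan
        # to roughly the first 10 MB of the log
        clks, secs = [], []
        with open(infile, 'r') as fid:
            lines = fid.readlines(10000000)
        for line in lines:
            x = line.split(',')
            if x[0] == 'Raw':
                clks.append(int(x[10]))  # HardwareClockDiscontinuityCount
                secs.append(int(x[1]))   # time stamp in milliseconds
        if not clks:
            continue  # no raw measurement records found

        # change in discontinuity count and elapsed time in seconds
        dclks = clks[-1] - clks[0]
        dsecs = (secs[-1] - secs[0]) / 1000

        if dclks > 1:
            print('%3d/%.0f: %s' % (dclks, dsecs, trip_phone))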
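
The substitution described above, swapping the RTKLIB rows of the flagged trips for the baseline rows, could be sketched roughly as follows. This is only an illustration, not the code I used: the function name `replace_trips` is hypothetical, and it assumes that both solution files have been read into lists of dictionaries (for example with `csv.DictReader`) that share the `tripId` and `UnixTimeMillis` columns, with `tripId` in the `trip/phone` form.

```python
def replace_trips(rtklib_rows, baseline_rows, flagged_trips):
    """Return a copy of rtklib_rows in which every row belonging to a
    flagged trip is replaced by the baseline row with the same time stamp.

    Hypothetical helper: rows are dicts with at least the keys
    'tripId' and 'UnixTimeMillis' (an assumption about the file layout).
    """
    flagged = set(flagged_trips)
    # index the baseline rows by (trip, time stamp) for fast lookup
    baseline = {(r['tripId'], r['UnixTimeMillis']): r for r in baseline_rows}
    out = []
    for row in rtklib_rows:
        key = (row['tripId'], row['UnixTimeMillis'])
        if row['tripId'] in flagged and key in baseline:
            out.append(baseline[key])   # use the baseline solution
        else:
            out.append(row)             # keep the RTKLIB solution
    return out
```

Rows of a flagged trip that have no matching baseline time stamp are left unchanged, so the output always has the same number of rows as the RTKLIB input.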

**Step 7: Submit CSV file to Kaggle**

You can now submit the modified csv file to Kaggle.  This should give you a score close to 3.135.  However, the latest released RTKLIB demo5 executables are missing a couple of robustness improvements, so if you use them the score will be a little worse.  If you want to duplicate this score exactly, you will need to use the most recent source code available at https://github.com/rtklibexplorer/RTKLIB and compile the rnx2rtkp.exe app yourself.  There are instructions for compiling the code in [Windows](https://rtklibexplorer.wordpress.com/2020/12/04/building-rtklib-code-in-windows/) or [Linux](https://rtklibexplorer.wordpress.com/2020/12/18/building-rtklib-code-in-linux/) on my blog.  Note that the Windows instructions use the Embarcadero compiler, which is required for the GUI apps, but if you are just compiling the rnx2rtkp app, you can compile it with the Visual Studio compiler using the project file in the \app\consapp\rnx2rtkp\msc folder.

I put this explanation together fairly quickly so as to make it available while the competition is still in its early phase. I suspect it has some errors and omissions, so please let me know if you find any issues and I will make any necessary updates.

**Final thoughts**

The intent of this notebook is not to provide a fully optimized solution, but only to get you started with RTKLIB and demonstrate some of its capability.  

Following these instructions will provide an improved baseline solution file which can be post-processed with filtering, map-matching, etc. to give you a good jump on the competition. This alone, however, will probably not be enough to win the competition.  To do that, I believe that you will need to improve the RTKLIB solution itself.  Some of this can be done by modifying the configuration file.  More dramatic changes will require modifying the code.  More information on the configuration file and the code algorithms is available in the [demo5 RTKLIB Users Manual](https://rtkexplorer.com/pdfs/manual_demo5.pdf), particularly section 3.5 and Appendix F for information on configuration, and Appendix E for information on the core algorithms.  More involved changes to the code can probably be done more easily in rtklib-py, the Python version of RTKLIB.

Here are a few suggestions to get you started.  Most of them do not require any modifications to the RTKLIB code, just the configuration file.

1) **Raw measurement weighting:**  This solution uses only satellite elevation for weighting the input observations.  The code supports weighting with arbitrary combinations of elevation, signal strength, and receiver quality estimates.  Other research has shown that signal strength weighting should outperform elevation weighting for smartphone receivers.  The receiver accuracy estimates are also a potentially valuable source of information that is currently being discarded.

2) **Solution ensembles:**  In many cases you will find that adjusting the RTKLIB configuration parameters will improve some solutions and degrade others.  Weighted combinations of multiple solutions will likely outperform any single solution.  I would recommend the variance-based weighting used in my phone merge code example from last year (see link in the introduction).

3) **Hardware clock discontinuities:**  In this solution, the two data sets with clock discontinuities were simply replaced with the Google baseline solutions.  It should be possible to use RTKLIB to find an improvement over these.

4) **Cycle slip detection/mitigation:**  This solution entirely ignores flags from the receiver indicating possible cycle slips.  Instead it relies on consistency checks to detect the cycle slips.  While ignoring the slips entirely works better than relying on them completely, there is valuable information being discarded here which could potentially improve the solution.

5) **Tectonic plate movement:**  Base station movement due to tectonic plate movement is not fully accounted for in this solution and there may be some opportunity for improvement with more careful accounting.

6) **IMU measurements:**  IMU measurements could be incorporated into the RTKLIB solution.  The solutions noticeably degrade when the vehicles are stationary due to increased multipath effects, so the simplest solution would be to detect this state and respond accordingly.  More sophisticated approaches could take more advantage of this information.
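
To make the solution ensemble suggestion (item 2) a little more concrete, here is a minimal sketch of generic inverse-variance weighting for a single epoch and a single coordinate. It is not taken from my phone merge code and may differ from it in detail. The function name `combine_inverse_variance` is hypothetical, and the sketch assumes each candidate solution comes with a variance estimate for that coordinate (for example, the square of a reported standard deviation) and that the errors of the candidate solutions are independent.

```python
def combine_inverse_variance(estimates, variances):
    """Combine several estimates of the same quantity, weighting each
    by the inverse of its variance.

    Returns (combined_estimate, combined_variance). Hypothetical helper;
    assumes independent errors and strictly positive variances.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError('need equally sized, non-empty inputs')
    if any(v <= 0 for v in variances):
        raise ValueError('variances must be positive')
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total
    return combined, 1.0 / total
```

The same function would be applied separately to each coordinate at each time stamp, for example to the latitudes produced by several RTKLIB configurations. A solution with a small variance dominates the result, while equal variances reduce it to a plain average.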

I'm happy to answer any questions regarding RTKLIB.  I just ask that, to follow the rules of the competition, you ask your questions in the discussion group here so that the answers are available to all of the competitors.

