<a href="https://colab.research.google.com/github/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/nmea.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NMEA file processing

NMEA stands for National Marine Electronics Association. Today, NMEA is a standard data format supported by virtually all GPS manufacturers. Take a look at this [NMEA file](https://github.com/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/code/nmea1.txt) containing NMEA data. You can record your own NMEA file on your smartphone with a wide range of applications, such as [AndroSensor](http://www.fivasim.com/androsensor.html).

In this lesson we will read an NMEA file, compute statistics about the different sentence types, and extract the coordinates and some further data needed to visualize the trajectory.

First off, you need to upload the file into your Colab virtual machine to be able to read it. For that, please execute the next code block.

In [2]:
!wget -q -O sample_data/nmea1.txt https://raw.githubusercontent.com/OSGeoLabBp/tutorials/master/english/data_processing/lessons/code/nmea1.txt

The following code block reads the file line by line and counts the lines:

In [3]:
with open('sample_data/nmea1.txt') as fp:
  i = 0
  for line in fp:
    i += 1
print(i)

63


In [4]:
with open('sample_data/nmea1.txt') as fp:
  print(len(fp.readlines()))    # readlines function reads the whole file into a list line by line

63


**Statistics of NMEA sentences**

A typical NMEA file contains several types of sentences. The following code block counts the various types using the Python dictionary data structure. The sentence type is given by the fourth to sixth characters of each line (the slice `line[3:6]`), such as GGA, GSA, GLL. If you are unfamiliar with NMEA sentences, look up their definitions online.

In [5]:
tokens = {} # init tokens dictionary
fi = open('sample_data/nmea1.txt', 'r') # open input file
for line in fi: #loop over lines in the file
  token = line[3:6] #get the sentence type
  if token not in tokens:
    tokens[token] = 0   # create new item in dictionary
  tokens[token] += 1
fi.close()

Print out the dictionary. 

In [6]:
print(tokens)

{'GGA': 19, 'GLL': 27, 'RMC': 17}


Sort the items according to their occurrence.

In [8]:
for t in sorted(tokens.items(), key=lambda x: x[1], reverse=True):
    print(f"{t[0]}: {t[1]}")

GLL: 27
GGA: 19
RMC: 17
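
As an alternative sketch, the counting and sorting steps above can be combined with `collections.Counter` from the standard library. The `sample_lines` list below is a hypothetical, shortened stand-in for the lines of the file, not data taken from nmea1.txt.

```python
from collections import Counter

# Hypothetical, shortened sentences for illustration; only the type field is used
sample_lines = [
    '$GPGGA,160023.69',
    '$GPGLL,1130.832,N',
    '$GPRMC,160023.69',
    '$GPGLL,1130.833,N',
]

# Count the sentence types (characters 3..5 of each line)
counts = Counter(line[3:6] for line in sample_lines)
for token, n in counts.most_common():   # most frequent type first
    print(f"{token}: {n}")
```

To use it on the real file, replace `sample_lines` with the open file object.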


**Coordinate list from GGA**

Let's create a coordinate list from GGA sentences. NMEA sentences store positions as geographic coordinates. Latitude and longitude are angular values, but they are written in a rather unusual format: degrees and decimal minutes packed into a single number.

In the first GGA sentence of nmea1.txt the latitude is *1130.832,N*, which means 11 degrees and 30.832 minutes North. The longitude is *04344.045,E*, which means 43 degrees and 44.045 minutes East. Note that the degrees of longitude are given by three characters, while the degrees of latitude are given by two. To use the coordinates in further computations, we need a function that converts them to decimal degrees, as follows:

In [9]:
def nmea2deg(nmea):
    """ convert nmea angle (dddmm.mm) to degree """
    w = nmea.split('.')
    return int(w[0][:-2]) + float(w[0][-2:] + '.' + w[1]) / 60.0

print(nmea2deg('1130.832623'))
print(nmea2deg('04344.045'))

11.51387705
43.73408333333333


A clearer solution uses the string `find` method to locate the decimal point.

In [10]:
def nmea2deg(nmea):
    """ convert nmea angle (dddmm.mm) to degree """
    pos = nmea.find('.')
    return int(nmea[:pos-2]) + float(nmea[pos-2:]) / 60.0

print(nmea2deg('1130.832623'))
print(nmea2deg('04344.045'))

11.51387705
43.73408333333333


*Is there any other way to convert values in NMEA format to degrees, using only arithmetic operations?*

Open the input file again, loop over the lines, and use a regular expression to select the sentences starting with *$..GGA*. Do not forget the command **import re** to import the module. If you are unfamiliar with regular expressions, see the [Python documentation](https://docs.python.org/3/library/re.html).
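
One possible answer, as a minimal sketch: convert the whole string to a number, then split it into degrees and minutes with integer division by 100. The function name `nmea2deg_arith` is made up for this illustration.

```python
def nmea2deg_arith(nmea):
    """ convert nmea angle (dddmm.mm) to degree using arithmetic only """
    value = float(nmea)
    deg = int(value // 100)        # whole degrees: everything above the mm.mm part
    minutes = value - deg * 100    # the remaining mm.mm part
    return deg + minutes / 60.0

print(nmea2deg_arith('1130.832623'))
print(nmea2deg_arith('04344.045'))
```

The results can differ from the string-based versions in the last digits because of floating-point rounding.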

In [12]:
import re
with open('sample_data/nmea1.txt', 'r') as fi:  # open input file
    for line in fi: #loop over lines in the file
        line = line.strip()
        if re.match(r'\$..GGA', line):  # raw string keeps the backslash for the regex
            gga = line.split(',')       # list of the comma-separated fields

The following code block converts the values to degrees. Pay attention to the sign of the coordinates (i.e., points in the southern hemisphere have negative latitudes).

In [15]:
with open('sample_data/nmea1.txt', 'r') as fi:   # open input file
    for line in fi: #loop over lines in the file
        line = line.strip()
        if re.match(r'\$..GGA', line):
            gga = line.split(',')
            if gga[6] > '0':  # skip invalid positions (fix quality 0 means no fix)
                lat = nmea2deg(gga[2])
                if gga[3].upper() == 'S':  # southern latitudes are negative
                    lat *= -1

The code block above handles only the latitudes. As a practical exercise, extend it to get the longitudes as well, and print out a coordinate list. Do not forget the sign of the longitudes!
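
One way to approach the exercise is sketched below. The helper name `gga_position` is hypothetical; it assumes the field layout used above (latitude and hemisphere in fields 2 and 3, longitude and hemisphere in fields 4 and 5) and is tried on the first GGA sentence of nmea1.txt.

```python
def nmea2deg(nmea):
    """ convert nmea angle (dddmm.mm) to degree """
    pos = nmea.find('.')
    return int(nmea[:pos-2]) + float(nmea[pos-2:]) / 60.0

def gga_position(line):
    """ return (lat, lon) in signed degrees from a GGA sentence, None if no fix """
    gga = line.strip().split(',')
    if not gga[6] > '0':           # skip invalid positions
        return None
    lat = nmea2deg(gga[2])
    if gga[3].upper() == 'S':      # southern latitudes are negative
        lat *= -1
    lon = nmea2deg(gga[4])
    if gga[5].upper() == 'W':      # western longitudes are negative
        lon *= -1
    return lat, lon

buf = '$GPGGA,160023.69,1130.832,N,04344.045,E,1,04,2.6,100.00,M,-33.9,M,,0000*7C'
print(gga_position(buf))
```

Calling the function for each GGA line of the file and appending the results to a list gives the coordinate list.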

**Checksum**

Every NMEA sentence ends with a checksum, which is used to detect errors introduced during data transmission. The checksum is the XOR of all bytes between the $ and the * (excluding the delimiters themselves), written as two hexadecimal digits after the *.
XOR stands for e**X**clusive **OR**, a Boolean logic operation that compares two input bits and produces one output bit: if the bits are the same, the result is 0; if they differ, the result is 1. For more on bitwise operators, visit [this link](https://realpython.com/python-bitwise-operators/).

Let's calculate the bitwise XOR of two characters, '9' and '3'. The ASCII code of '9' is 57 and that of '3' is 51. In binary, 57 is 00111001 and 51 is 00110011. Now we can apply the XOR operation to the corresponding bits:

```
00111001 ('9', ASCII 57)
00110011 ('3', ASCII 51)
--------
00001010
```

In [16]:
code1 = ord('9')
code2 = ord('3')
res = code1 ^ code2
print(bin(code1)[2:])
print(bin(code2)[2:])
print('-'*6)
print(f'{bin(res)[2:]:>6}')

111001
110011
------
  1010


A Python function to compute the *checksum* of an NMEA sentence is shown in the following code block:

In [18]:
def checksum(buf):
  """ compute nmea checksum of a line, skipping the leading $ and trailing *hh """
  cs = ord(buf[1])
  for ch in buf[2:-3]:
    cs ^= ord(ch)
  return f"{cs:02X}"

Try it on the first line of the [file](https://github.com/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/code/nmea1.txt):

In [19]:
buf = '$GPGGA,160023.69,1130.832,N,04344.045,E,1,04,2.6,100.00,M,-33.9,M,,0000*7C'
checksum(buf)

'7C'

What happens if you change any of the characters in the line above?
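
A small experiment, as a sketch: the block below repeats the `checksum` function so that it runs on its own, alters a single digit of the latitude, and compares the computed checksum with the transmitted one.

```python
def checksum(buf):
    """ compute nmea checksum of a line, skipping the leading $ and trailing *hh """
    cs = ord(buf[1])
    for ch in buf[2:-3]:
        cs ^= ord(ch)
    return f"{cs:02X}"

buf = '$GPGGA,160023.69,1130.832,N,04344.045,E,1,04,2.6,100.00,M,-33.9,M,,0000*7C'
corrupted = buf.replace('1130.832', '1130.833')   # simulate a single-character error

print(checksum(buf), buf[-2:])                    # computed vs. transmitted, original line
print(checksum(corrupted), corrupted[-2:])        # computed vs. transmitted, altered line
```

Changing any single character between the $ and the * always changes the computed checksum. Because XOR does not depend on the order of the characters, swapping two characters leaves the checksum unchanged, so this kind of error goes undetected.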

To filter out corrupted sentences, adjust your code to compare the computed checksum of each line with the last two characters of the line, as follows:

In [20]:
with open('sample_data/nmea1.txt', 'r') as fi:  # open input file
    for line in fi:
        line = line.strip()
        if checksum(line) != line[-2:].upper():  # checksum() returns uppercase hex
            print("Checksum error: " + line)
            continue
        # the sentence is valid here; process it further

**Tasks for practice**

Write a simple program that lists only the different types of NMEA sentences in a file (use a set and set operations).
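
A possible starting point is sketched here. The lists `file_a` and `file_b` are hypothetical, shortened stand-ins for the lines of two NMEA files.

```python
# Hypothetical, shortened sentences standing in for the lines of two files
file_a = ['$GPGGA,160023.69', '$GPGLL,1130.832,N', '$GPGGA,160024.69']
file_b = ['$GPGGA,160023.69', '$GPRMC,160023.69']

# A set keeps each sentence type only once
types_a = {line[3:6] for line in file_a}
types_b = {line[3:6] for line in file_b}

print(sorted(types_a))              # types found in the first file
print(sorted(types_a | types_b))    # union: types found in either file
print(sorted(types_a - types_b))    # difference: types only in the first file
```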

Adjust the code to add:

*   time in *hh:mm:ss* format
*   number of tracked satellites
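
Both values can be read from a GGA sentence, as the following sketch shows. The function name `gga_time_nsat` is made up for this example; it assumes the usual GGA layout, with the UTC time (hhmmss.ss) in field 1 and the number of satellites in field 7.

```python
def gga_time_nsat(line):
    """ return (time as hh:mm:ss, number of satellites) from a GGA sentence """
    gga = line.strip().split(',')
    t = gga[1]                               # UTC time as hhmmss.ss
    hhmmss = f"{t[0:2]}:{t[2:4]}:{t[4:6]}"   # fractional seconds are dropped
    return hhmmss, int(gga[7])

buf = '$GPGGA,160023.69,1130.832,N,04344.045,E,1,04,2.6,100.00,M,-33.9,M,,0000*7C'
print(gga_time_nsat(buf))
```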

Prepare a plot of time vs. number of satellites ([an example](https://github.com/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/images/nmea_nsat.png)).

Load the coordinate list into QGIS and visualize the track on OSM in the background ([an example](https://github.com/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/images/nmea_map.png)).

The examples above were generated from [this file](https://github.com/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/code/nmea3.txt). For this task, collect your own data and create the plot and the map.