AIS I/O with Python, dictionaries, and the GPSd AIVDM schema.
This project is still evolving but will calm down once it hits v0.1
. We
would love to hear from you're using, or would like to use, this project, both
so we don't make any unexpected changes, and to get outside opinions on AIS
processing.
Vessels use a ship-to-ship Automated Identification System (AIS) to avoid collisions by broadcasting information about who they are, where they are, and what they are doing. These messages are broadcast as NMEA 0183 or NMEA 2000 sentences and are constantly being collected by terrestrial and satellite receivers.
NMEA is very large and difficult to work with natively, so the solution is to
parse it and store as another format. Rather than spend time developing our
own schema we chose to adopt the GPSd AIVDM
schema, which clearly defines all message types. Messages map well to Python
dictionaries due to how fields vary type-to-type, so that's what gpsdio
uses.
This project aims to make AIS data easier to work with by providing I/O and a small set of useful transforms, and was built with large-scale data processing pipelines in mind.
Here's a small example of how to read data stored as newline delimited JSON,
add a field, and write as GZIP compressed MsgPack.
The driver and compression are explicitly given but can also be detected from the file path.
For more information on what gpsdio.open()
returns, see the section on messages.
import gpsdio
with gpsdio.open('tests/data/types.json') as src:
with gpsdio.open('with-num-fields.msg.gz', 'w', driver='MsgPack', compression='GZIP') as dst:
for msg in src:
dst.write(msg)
gpsdio
does not yet support reading NMEA directly, although it will hopefully
in the near future. In the meantime, libais
has an aisdecode
utility with an optional --gpsd
format that produces data
readable by this library.
This project also offers a gpsdio
commandline utility for common tasks like
inspecting and transforming data. See the CLI docs
for more information.
NOTE: Message validation and transformation has not quite settled and will
change for v0.1
. The description below is currently mostly relevant, although
some fields may be placed into an __invalid__
key on read or write.
gpsdio.open()
returns a file-like object called Stream()
that is
responsible for taking a dictionary from the underlying driver and transforming
it into a well formed message.
Normally I/O libraries perform some strict validation while reading and before
writing data, but working with AIS usually involves adding some custom fields.
Rather than telling gpsdio.open()
what additional fields it may encounter
every time a file is opened, message validation only happens when requested.
This may seem backwards, but the idea is that validation really only needs to
happen as data parsed and brought into the gpsdio
world. After that the
user knows far more about their data and is likely adding additional fields
during processing.
See tests/data
for some data gpsdio
can immediately read and write.
Developers can create their own gpsdio
commands with click-plugins
.
gpsdio
loads plugins from a setuptools entry point
called gpsdio.cli_plugins
, so in your setup.py
:
setup(
entry_points='''
[gpsdio.cli_plugins]
name=package.module:click_command
'''
)
For a more in-depth description see the click-plugins documentation. Additionally, see gpsdio-density for an example, or one of the other plugins listed in the plugin registry.
External drivers should be registered to the entry-point gpsdio.driver_plugins
and
must subclass gpsdio.base.BaseDriver
or gpsdio.base.BaseCompressionDriver
.
See the docstrings on those two objects for subclassing information.
v0.1
We will have reached v0.1
and can confidently share this project with others. To do that,
we need to support the syntax below. The schema building is a little rough, but the assumption
is that users only need to do this to being non-GPSd data into gpsdio, which they should only
need to do once. We will make this easier in the near future with a gpsdio-nmea-driver
.
Either way, there need to be some functions to assemble the schema. This example brings
points with limited information into the GPSd
schema:
import gpsdio
from gpsdio.schema import build_schema
# Define which fields we will see for each type
fields_by_type = {
1: ('mmsi', 'lat', 'lon', 'course', 'speed', 'shipname')
}
fields_by_type[2] = fields_by_type[1]
fields_by_type[3] = fields_by_type[1]
fields = {
'type': int,
'mmsi': int,
'lat': float,
'lon': float,
'course': float,
'speed': float,
'shipname': str
}
schema = build_schema(fields_by_type, fields)
with gpsdio.open('infile.json', schema=schema) as src, gpsdio.open('outfile.json', 'w') as dst:
for msg in src:
# Fill in fields not included in the message to match the destination schema
msg = dst.fill_msg(msg)
# Write to output file
dst.write(msg)
- Functions to easily assemble a custom schema.
- Included schema should only define AIVDM types and _maybe_ a
timestamp
. - Take another pass at the driver and compression class structure, but don't let that hold up a release.
- Support experimental driver plugin registration.
- Support experimental schema extension registration.
v1.0
- Try pretty hard not to change the schema but if it needs to change, change it.
- Probably no dependencies that require a C compiler, except maybe an optional speedups module that re-implements some of the core I/O objects and transforms in Cython.
- Move
MsgPack
driver to an external project. gpsdio-nmea-driver
for reading raw NMEA.- GeoJSON support as an interface to GDAL. Maybe a
gpsdio-vector-driver
.
With pip:
$ pip install gpsdio
From source:
$ git clone https://github.com/SkyTruth/gpsdio
$ cd gpsdio
$ python setup.py install
$ git clone https://github.com/SkyTruth/gpsdio.git
$ cd gpsdio
$ virtualenv venv
$ source venv/bin/activate
$ pip install -e .[dev]
$ py.test tests --cov gpsdio --cov-report term-missing
See CHANGES.md
See LICENSE.txt