Plane-crash locations from NTSB reports
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore
README.org
csv-to-vnl.pl
joint.vnl
make-map.pl
parse-airport-data.pl
wrecks.png

README.org

Synopsis

Check this out!

wrecks.png

Interactive map at

https://caltopo.com/m/N13E

WARNING: non-tiny amount of data, so this is SLOW

Overview

This is a set of scripts to massage NTSB incident reports into a visualization of crash sites. After an aviation incident occurs, the NTSB investigates each event, and publishes a report. These often contain information about the crash site location. Sometimes this exists explicitly: as a latitude, longitude pair. At other times, the report lists the bearing and range to an “observing facility”. This set of tools extracts both of these into a visualize-able form.

Caveats

The data isn’t particularly accurate: most of the locations I know of are in the right ballpark (within a few miles), but some are completely off. It is, however, useful as a jumping-off point for further research.

If you hike to the exact coordinates given, you’re unlikely to find anything

The data I’m visualizing here comes from the post-1982 incident database. A database of older incidents exists also, but it doesn’t appear to contain any location information. Finally, while civilian incidents are well-covered by this dataset, I don’t know if military ones are covered as well.

Details

We download some data, massage it, and produce a single vnlog that contains everything we care about. Below, I describe how I made this file, but a pre-cooked copy is available in this repo: joint.vnl. This can be used to visualize specific areas, as described below.

This all works today (January 2019) on my up-to-date Debian/sid box. The URLs and tools and such can change at any time, so interpret all this as a set of loose guidelines. Make adjustments as required.

NTSB incident database

The main website that describes the reports is this: https://www.ntsb.gov/_layouts/ntsb.aviation/index.aspx

It points to a repository of databases: https://app.ntsb.gov/avdata/

We grab the database of events we care about. Today the post-1982 data can be obtained and extracted thusly:

$ wget https://app.ntsb.gov/avdata/Access/avall.zip

$ unzip avall.zip

We now have a Microsoft Access database file: avall.mdb with all the data. The things I care about live in the events and aircraft tables. I extract those into an sqlite database, query the columns I want as csv, and convert to a form I can work with: vnlog:

$ ( mdb-schema avall.mdb sqlite -T aircraft;
    echo "BEGIN;";
    mdb-export -I sqlite avall.mdb aircraft;
    echo "COMMIT;";
  ) | sqlite3 data.sqlite

$ ( mdb-schema avall.mdb sqlite -T events;
    echo "BEGIN;";
    mdb-export -I sqlite avall.mdb events;
    echo "COMMIT;";
  ) | sqlite3 data.sqlite

$ sqlite3 -header -csv data.sqlite \
    'select events.ev_id,ev_year,acft_make,acft_model,acft_series,acft_serial_no,latitude,longitude,wx_obs_fac_id,wx_obs_dist,wx_obs_dir from events,aircraft where events.ev_id==aircraft.ev_id;' |
    ./csv-to-vnl.pl > data.vnl

Yes, vnlog isn’t a real database, so lookups will be much slower than if I continued with sqlite. But many things become simple, and I’m going to be doing very few lookups, so this is ok.

Airport codes

Alright. If we’re getting the location in reference to some observing airport, we need to know where each airport is. There’re a number of lists on the internet of airport codes. For this project I’d like to know the location of a bunch of tiny, obscure airports, so most lists were insufficient. This list here was sufficiently complete. The data is licensed under the Public Domain Dedication and License (PDDL). I download the dataset, and convert it to a nice vnlog containing just the data I need

$ wget https://datahub.io/core/airport-codes/r/airport-codes.csv

$ ./parse-airport-data.pl > airports.vnl

And then I add these location fields to the main dataset

$ vnl-join -a1 --vnl-sort - -j wx_obs_fac_id                                               \
    data.vnl                                                                               \
    <(< airports.vnl vnl-filter -p wx_obs_fac_id=code,lat-observing=lat,lon-observing=lon) \
  > joint.vnl

Putting it all together

Now we have all the information, and we run another script to generate a GeoJSON file that mapping tools can ingest. I cut it down to a small region because web browsers are glacially slow. To see all the incidents in the San Gabriel Mountains:

$ ./make-map.pl 34.08 -118.52 34.54 -117.38 joint.vnl > wrecks.json

This file is visualized in that caltopo link above.

License

All code Copyright 2019 Dima Kogan, released under the terms of the Lesser GNU Public License (any version)