Skip to content

dkogan/ntsb-wrecks

Repository files navigation

Synopsis

Check this out!

wrecks.png

Interactive map at

http://ntsb.secretsauce.net/map/

Overview

This is a set of scripts to massage NTSB incident reports into a visualization of crash sites. After an aviation incident occurs, the NTSB investigates each event, and publishes a report. These often contain information about the crash site location. Sometimes this exists explicitly: as a latitude, longitude pair. At other times, the report lists the bearing and range to an “observing facility”. This set of tools extracts both of these into a visualize-able form.

Caveats

The data isn’t particularly accurate: most of the locations I know of are in the right ballpark (within a few miles), but some are completely off. It is, however, useful as a jumping-off point for further research.

If you hike to the exact coordinates given, you’re unlikely to find anything

The data I’m visualizing here comes from the post-1982 incident database. A database of older incidents exists also, but it doesn’t appear to contain any location information. Finally, while civilian incidents are well-covered by this dataset, I don’t know if military ones are covered as well.

Details

We download some data, massage it, and produce a single data-store that contains everything we care about. Below, I describe how I made this file, but a pre-cooked copy is available in this repo: joint.vnl (plain text) or joint.sqlite (sqlite database). This can be used to visualize specific areas, as described below.

This all works today (January 2019) on my up-to-date Debian/sid box. The URLs and tools and such can change at any time, so interpret all this as a set of loose guidelines. Make adjustments as required.

NTSB incident database

The main website that describes the reports is this: https://www.ntsb.gov/_layouts/ntsb.aviation/index.aspx

It points to a repository of databases: https://app.ntsb.gov/avdata/

We grab the database of events we care about. Today the post-1982 data can be obtained and extracted thusly:

$ wget https://app.ntsb.gov/avdata/Access/avall.zip

$ unzip avall.zip

We now have a Microsoft Access database file: avall.mdb with all the data. The things I care about live in the events and aircraft tables. I

  • extract those into an sqlite database
  • query the columns I want as csv
  • make the latitude and longitude fields numeric (i.e.”360148N 1190347W” becomes “36.03 -119.063055555556”)
  • convert to a form I can work with: vnlog
$ ( mdb-schema avall.mdb sqlite -T aircraft;
    echo "BEGIN;";
    mdb-export -I sqlite avall.mdb aircraft;
    echo "COMMIT;";
  ) | sqlite3 data.sqlite

$ ( mdb-schema avall.mdb sqlite -T events;
    echo "BEGIN;";
    mdb-export -I sqlite avall.mdb events;
    echo "COMMIT;";
  ) | sqlite3 data.sqlite

$ sqlite3 -header -csv data.sqlite \
    'select events.ev_id,ev_year,acft_make,acft_model,acft_series,acft_serial_no,latitude,longitude,wx_obs_fac_id,wx_obs_dist,wx_obs_dir from events,aircraft where events.ev_id==aircraft.ev_id;' |
    ./csv-to-vnl.pl |
    vnl-filter --perl \
      --sub 'conv{return undef unless defined $_[0]; return $1 + ($2 + $3/60.)/60. if $_[0] =~ /^([0-9]+)([0-9][0-9])([0-9][0-9])[NE]$/i; return -($1 + ($2 + $3/60.)/60.) if $_[0] =~ /^([0-9]+)([0-9][0-9])([0-9][0-9])[SW]$/i; return undef;}' \
       -p '!(latitude|longitude),latitude=conv(latitude),longitude=conv(longitude)' \
    > data.vnl

Yes, vnlog isn’t a real database, so lookups will be much slower than if I continued with sqlite. But many things become simple, and I’m going to be doing very few lookups, so this is ok.

Airport codes

Alright. If we’re getting the location in reference to some observing airport, we need to know where each airport is. There’re a number of lists on the internet of airport codes. For this project I’d like to know the location of a bunch of tiny, obscure airports, so most lists were insufficient. This list here was sufficiently complete. The data is licensed under the Public Domain Dedication and License (PDDL). I download the dataset, and convert it to a nice vnlog containing just the data I need

$ wget https://datahub.io/core/airport-codes/r/airport-codes.csv

$ ./parse-airport-data.pl > airports.vnl

Putting it all together

And then I add these location fields to the main dataset

$ vnl-join -a1 --vnl-sort - -j wx_obs_fac_id                                               \
    data.vnl                                                                               \
    <(< airports.vnl vnl-filter -p wx_obs_fac_id=code,lat_observing=lat,lon_observing=lon) \
  > joint.vnl

I later decided to make a website that can select subsets of the data, so I really did need fast lookups. So I can convert this vnlog to an sqlite db:

./vnl-to-sql.pl --real  latitude,longitude,lat_observing,lon_observing \
                --index latitude,longitude,lat_observing,lon_observing \
    < joint.vnl | sqlite3 joint.sqlite

Visualize-able data output

Now we have all the information, and we run another script to generate a GeoJSON file that mapping tools can ingest. I cut it down to a small region because web browsers are glacially slow. To see all the incidents in the San Gabriel Mountains:

$ ./make-map.pl 34.08 -118.52 34.54 -117.38 joint.sqlite > wrecks.json

This file is visualized in that interactive map linked above.

Acknowledgement

Huge thanks to Erik Price and Chris McKenzie for help in getting the interactive map working. I’m not a web dev.

License

All code Copyright 2019 Dima Kogan, released under the terms of the Lesser GNU Public License (any version)

About

Plane-crash locations from NTSB reports

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published