The script `sortCOLMA.py` is a demonstration of how to sort data from the Colorado Lightning Mapping Array. It is easily adapted to data from other networks.

The flash sorting infrastructure is modular. This script uses the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN">DBSCAN algorithm</a> as implemented in the <a href="http://scikit-learn.org">scikit-learn</a> machine-learning library. To manage the $N^2$ scaling of the underlying DBSCAN implementation, data are clustered in pairs of chunks, each `thresh_duration` long.
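
The pairing of chunks can be sketched in plain Python. This is only an illustration of the idea, not the lmatools implementation: the function name `chunk_pairs` and the fixed-width binning from the earliest time are assumptions. Each pair of adjacent chunks would then be handed to the clustering step, so the clustering cost depends on the chunk size rather than on the full dataset, and a flash that straddles a chunk boundary can still be kept together.

```python
def chunk_pairs(times, thresh_duration):
    """Yield (first, second) index lists for each pair of adjacent time chunks.

    Hypothetical helper: bins source times (seconds) into fixed-width chunks
    of length thresh_duration, starting at the earliest time.
    """
    if not times:
        return
    t0 = min(times)
    n_chunks = int((max(times) - t0) // thresh_duration) + 1
    chunks = [[] for _ in range(n_chunks)]
    for i, t in enumerate(times):
        chunks[int((t - t0) // thresh_duration)].append(i)
    # Adjacent chunks overlap by one chunk: (0, 1), (1, 2), (2, 3), ...
    for first, second in zip(chunks, chunks[1:]):
        yield first, second


times = [0.0, 0.5, 3.2, 4.0, 6.1, 8.9]
for first, second in chunk_pairs(times, thresh_duration=3.0):
    print(len(first), len(second))
```

Because every chunk appears in two consecutive pairs, a flash shorter than `thresh_duration` always falls entirely within at least one pair.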

The script is configurable in a few places. 
- `base_sort_dir` sets the path where 
- `center_ID` chooses a network center. The centers are defined in the `centers` dictionary. The ID is used later when constructing output filenames, too.
- The `params` dictionary configures the flash sorting algorithm. Of particular importance are the following.
  - `stations`: sets the (min, max) number of stations that must participate in each solution for it to count. Max should be larger than the number of stations. Min should be six or seven, depending on the number of stations.
  - `chi2`: sets the (min, max) chi-squared value. The minimum should be zero, while a good maximum to start with is 1.0.
  - `distance`: maximum distance between a source and its closest neighbor before a new flash is started.
  - `thresh_critical_time`: maximum temporal separation between a source and its closest neighbor before a new flash is started.
  - `thresh_duration`: All flashes should last less than or equal to this number of seconds. All flashes of duration < `thresh_duration` are guaranteed to remain clustered. An occasional lucky flash of duration = 2 \* `thresh_duration` is possible.
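
As a rough sketch, a `params` dictionary with the keys listed above might look like the following. The numeric values, the units (meters for `distance`, seconds for the two time thresholds), and the inclusive treatment of the (min, max) bounds are assumptions made for illustration; check them against `sortCOLMA.py` itself. The helper `passes_quality` is hypothetical and only shows how the `stations` and `chi2` ranges could be applied to a single source.

```python
# Illustrative values only; the real script may use different numbers.
params = {
    'stations': (6, 13),           # (min, max) stations per solution
    'chi2': (0, 1.0),              # (min, max) chi-squared
    'distance': 3000.0,            # assumed to be meters
    'thresh_critical_time': 0.15,  # assumed to be seconds
    'thresh_duration': 3.0,        # seconds
}


def passes_quality(n_stations, chi2, params):
    """Return True if a source falls inside both (min, max) ranges.

    Hypothetical helper; bounds are treated as inclusive here.
    """
    min_stations, max_stations = params['stations']
    min_chi2, max_chi2 = params['chi2']
    return (min_stations <= n_stations <= max_stations
            and min_chi2 <= chi2 <= max_chi2)


print(passes_quality(7, 0.5, params))
print(passes_quality(5, 0.5, params))
```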

The script is broken into three sections.
- Run the flash sorting, which creates HDF5 data files with VHF source data, their flash IDs, and a matching flash data table.
- Grab the flash-sorted files and create CF-compliant NetCDF grids
- Grab the grids and create PDF images of each grid

The grid spacing, boundaries, and frame intervals are configured at the beginning of the gridding section of the script. This script creates regularly-spaced lat/lon grids, with the center grid cell size calculated to match the specified `dx_km` and `dy_km`. It is also possible to grid directly in a map projection of choice by changing `proj_name`, as well as `x_name` and `y_name` in the call to `make_plot`.
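
To give a feel for how a lat/lon spacing can be matched to `dx_km` and `dy_km` at the grid center, here is a minimal sketch that assumes a spherical Earth of radius 6371 km. The script may well compute this differently (for example, through a map projection), so treat the function `latlon_spacing` and the constant as assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def latlon_spacing(dx_km, dy_km, center_lat_deg):
    """Return (dlon, dlat) in degrees matching dx_km, dy_km at the center latitude."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    dlat = dy_km / km_per_deg
    # A degree of longitude shrinks with the cosine of latitude.
    dlon = dx_km / (km_per_deg * math.cos(math.radians(center_lat_deg)))
    return dlon, dlat


dlon, dlat = latlon_spacing(dx_km=1.0, dy_km=1.0, center_lat_deg=40.0)
print(dlon, dlat)
```

Because the spacing is fixed in degrees, cells away from the center latitude are slightly wider or narrower in kilometers than `dx_km`.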

The PDF images are created as small-multiple plots, with the number of columns given by `n_cols` at the beginning of the plotting section.
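
The small-multiple layout follows from `n_cols` and the number of frames. The sketch below assumes panels are filled row by row; `panel_layout` is a hypothetical helper, not a function from the script.

```python
import math


def panel_layout(n_frames, n_cols):
    """Return (n_rows, positions), with positions as (row, col) per frame."""
    n_rows = math.ceil(n_frames / n_cols)
    positions = [(i // n_cols, i % n_cols) for i in range(n_frames)]
    return n_rows, positions


n_rows, positions = panel_layout(n_frames=10, n_cols=4)
print(n_rows, positions[-1])
```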

The script is run from the terminal as you see in the cell below. It accepts a list of filenames matching the standard `LYLOUT_YYMMDD_HHMMSS_duration.dat[.gz]` file naming convention. It does not handle day boundaries, so run it on at most a single day's worth of data at a time.

In [1]:
%%bash

python sortCOLMA.py /data/DC3/20120602/LMA/LYLOUT_120602_2[0-1]*.dat.gz

sorting 804017 total points
(3331,) (2503,)
(2503,) (8105,)
(8105,) (2222,)
(2220,) (2140,)
(2136,) (1609,)
(1608,) (3976,)
(3579,) (6662,)
(6595,) (1640,)
(1532,) (4300,)
(4259,) (4008,)
(4007,) (6544,)
(6509,) (3759,)
(3538,) (3032,)
(2715,) (2441,)
(2337,) (1748,)
(1438,) (3568,)
(3489,) (5775,)
(5686,) (2601,)
(2084,) (4658,)
(4584,) (6302,)
(5869,) (4782,)
(4746,) (2457,)
(1721,) (3505,)
(3475,) (2173,)
(2161,) (4470,)
(4437,) (5296,)
(3864,) (2968,)
(2705,) (1745,)
(1744,) (3823,)
(3768,) (2817,)
(2817,) (5775,)
(3687,) (5067,)
(4306,) (2499,)
(2053,) (1347,)
(1347,) (1404,)
(1366,) (2370,)
(2369,) (3619,)
(3550,) (1610,)
(1293,) (4479,)
(4399,) (6382,)
(6318,) (5552,)
(4643,) (3726,)
(3600,) (2402,)
(2118,) (2122,)
(2097,) (2816,)
(2733,) (2683,)
(2587,) (2876,)
(2874,) (3776,)
(3674,) (3254,)
(2899,) (1705,)
(1656,) (3378,)
(2946,) (3658,)
(3639,) (2254,)
(2195,) (6191,)
(6027,) (2601,)
(2425,) (2167,)
(2167,) (7445,)
(7441,) (4894,)
(4211,) (5008,)
(4750,) (1609,)
(1605,) (449

gunzip: error writing to output: Broken pipe
gunzip: /data/DC3/20120602/LMA/LYLOUT_120602_200001_0600.dat.gz: uncompress failed
  h5file = T.openFile(outfile, mode='w', title='Flash-sorted New Mexico Tech LMA Data')
  group  = h5file.createGroup('/', 'events', 'Analyzed detected events')
  table  = h5file.createTable(group, time_code, Event, time_code)
  fl_group = h5file.createGroup('/', 'flashes', 'Sorted LMA flash data')
  fl_table  = h5file.createTable(fl_group, time_code, Flash, time_code)
gunzip: error writing to output: Broken pipe
gunzip: /data/DC3/20120602/LMA/LYLOUT_120602_201001_0600.dat.gz: uncompress failed
  h5file = T.openFile(outfile, mode='w', title='Flash-sorted New Mexico Tech LMA Data')
  group  = h5file.createGroup('/', 'events', 'Analyzed detected events')
  table  = h5file.createTable(group, time_code, Event, time_code)
  fl_group = h5file.createGroup('/', 'flashes', 'Sorted LMA flash data')
  fl_table  = h5file.createTable(fl_group, time_code, Flash, time_code)
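
Since the script does not handle day boundaries, it can be useful to check a list of input files before running it. The following sketch parses the `LYLOUT_YYMMDD_HHMMSS_duration.dat[.gz]` convention described above and verifies that all files start on the same date. The helper names are hypothetical, and the duration field is assumed to be in seconds.

```python
import os
import re
from datetime import datetime

LYLOUT_RE = re.compile(r'^LYLOUT_(\d{6})_(\d{6})_(\d+)\.dat(\.gz)?$')


def parse_lylout(path):
    """Return (start_time, duration) parsed from an LYLOUT filename."""
    match = LYLOUT_RE.match(os.path.basename(path))
    if match is None:
        raise ValueError('not an LYLOUT filename: %s' % path)
    date_part, time_part, duration = match.group(1, 2, 3)
    start = datetime.strptime(date_part + time_part, '%y%m%d%H%M%S')
    return start, int(duration)  # duration assumed to be seconds


def same_day(paths):
    """Return True if every file starts on the same calendar date."""
    return len({parse_lylout(p)[0].date() for p in paths}) <= 1


files = ['/data/DC3/20120602/LMA/LYLOUT_120602_200001_0600.dat.gz',
         '/data/DC3/20120602/LMA/LYLOUT_120602_201001_0600.dat.gz']
print(parse_lylout(files[0]), same_day(files))
```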


Tour of the HDF5 data, including the data format. See lmatools example notebook.

Activity: modify the script to require a minimum of 5, 6, and 7 stations in turn, and set the `chi2` maximum to 5 for the 6-station case. Change `base_sort_dir` each time.
Examine the flash output results for the different `chi2` and `stations` settings.


There is an example of animating the grids in `lmatools/examples/Plot LMA NetCDF.ipynb`.

Another notebook for vis and interesting checkpoints. 
2100-2200 has big surge and high rates; see PRK email. Southern storm 2000-2100 has updraft on forward side - outflow dominant.

Soundings: weak upper level flow. What were lower level winds ahead of storm? http://catalog.eol.ucar.edu/cgi-bin/dc3_2012/research/index
Surface: shows SSE 15G20 http://catalog.eol.ucar.edu/cgi-bin/dc3_2012/imagewrap.nonav?file_url=/dc3_2012/ops/gts_station_plot/20120602/ops.GTS_Station_Plot.201206022030.CO_regional.gif