See environment_setup.README (below) for instructions about the use of the DC3_plots_NALMA script. It is a version of the script used to process the DC3 dataset as in Barth et al. (2015, BAMS) and Bruning and Thomas (2015, JGR).

The flash sorting infrastructure is modular. This script uses the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN">DBSCAN algorithm</a> as implemented in the <a href="http://scikit-learn.org">scikit-learn</a> machine-learning library. To manage the $N^2$ scaling of the underlying DBSCAN implementation, data are clustered in pairs of `thresh_duration` chunks.
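
The pairing of `thresh_duration` chunks can be pictured with a minimal sketch. This is not the code used by the script; the function names `chunk_index` and `paired_windows` are hypothetical, and the sketch only groups source times, leaving the actual clustering to DBSCAN. It assumes the source times are sorted and given in seconds.

```python
import math

def chunk_index(t, t0, thresh_duration):
    """Index of the thresh_duration-long chunk that contains time t."""
    return int(math.floor((t - t0) / thresh_duration))

def paired_windows(times, thresh_duration):
    """Group sorted source times into windows spanning two consecutive chunks.

    Hypothetical helper: each window would be handed to the clustering step,
    so no single clustering call sees more than two chunks of sources.
    """
    if not times:
        return []
    t0 = times[0]
    chunks = {}
    for t in times:
        chunks.setdefault(chunk_index(t, t0, thresh_duration), []).append(t)
    last = max(chunks)
    windows = [chunks.get(k, []) + chunks.get(k + 1, []) for k in range(last)]
    # With only one chunk there is nothing to pair it with.
    return windows or [chunks[0]]

times = [0.0, 0.5, 2.9, 3.1, 5.9, 6.2]
windows = paired_windows(times, thresh_duration=3.0)
```

A flash shorter than `thresh_duration` can cross at most one chunk boundary, so it always falls entirely inside one of these paired windows. In the example, the sources at 2.9 s and 3.1 s sit in different chunks but share the first window.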

The script is configurable in a few places. 
- `base_sort_dir` sets the path where 
- `center_ID` chooses a network center. The centers are defined in the `centers` dictionary. The ID is used later when constructing output filenames, too.
- The `params` dictionary configures the flash sorting algorithm. Of particular importance are the following.
  - `stations`: sets the (min, max) number of stations that must participate in each solution for it to count. Max should be larger than the number of stations. Min should be six or seven, depending on the number of stations.
  - `chi2`: sets the (min, max) chi-squared value. The minimum should be zero, while a good maximum to start with is 1.0.
  - `distance`: maximum distance between a source and its closest neighbor before a new flash is started.
  - `thresh_critical_time`: maximum temporal separation between a source and its closest neighbor before a new flash is started.
  - `thresh_duration`: All flashes should last less than or equal to this number of seconds. All flashes of duration < `thresh_duration` are guaranteed to remain clustered. An occasional lucky flash of duration = 2 \* `thresh_duration` is possible.
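
As a rough illustration of the settings above, a `params` dictionary might look like the following. Every value is a placeholder chosen for illustration, not a recommendation from the script. The units of `distance` and `thresh_critical_time`, the inclusive bounds, the per-source record layout, and the helper `passes_quality` are all assumptions.

```python
# Illustrative values only; adjust them for the network being processed.
params = {
    "stations": (6, 13),           # (min, max) stations per solution; max exceeds the station count
    "chi2": (0.0, 1.0),            # (min, max) chi-squared
    "distance": 3000.0,            # assumed value and units
    "thresh_critical_time": 0.15,  # assumed value, in seconds
    "thresh_duration": 3.0,        # seconds
}

def passes_quality(source, params):
    """Hypothetical check of one source against the station and chi2 ranges.

    `source` is assumed to be a dict with 'stations' and 'chi2' keys, and the
    (min, max) bounds are assumed to be inclusive.
    """
    min_sta, max_sta = params["stations"]
    min_chi2, max_chi2 = params["chi2"]
    return (min_sta <= source["stations"] <= max_sta
            and min_chi2 <= source["chi2"] <= max_chi2)

good = passes_quality({"stations": 7, "chi2": 0.4}, params)
too_few_stations = passes_quality({"stations": 5, "chi2": 0.4}, params)
poor_fit = passes_quality({"stations": 7, "chi2": 2.0}, params)
```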

The script is broken into three sections.
- Run the flash sorting, which creates HDF5 data files with VHF source data, their flash IDs, and a matching flash data table.
- Grab the flash-sorted files and create CF-compliant NetCDF grids
- Grab the grids and create PDF images of each grid

The grid spacing, boundaries, and frame intervals are configured at the beginning of the gridding section of the script. This script creates regularly-spaced lat/lon grids, with the center grid cell size calculated to match the specified `dx_km` and `dy_km`. It is also possible to grid directly in a map projection of choice by changing `proj_name`, as well as `x_name` and `y_name` in the call to `make_plot`. For instance, a geostationary projection can be obtained with `proj='geos'` as described in the [documentation for the proj4 coordinate system library](http://trac.osgeo.org/proj/wiki/proj%3Dgeos).
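
To make the link between `dx_km`, `dy_km` and the lat/lon spacing concrete, here is a minimal sketch using a spherical-Earth approximation. The script itself may compute this differently (for example with a geodetic library); the function `latlon_spacing`, the parameter `ctr_lat`, and the Earth radius constant are assumptions introduced for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def latlon_spacing(ctr_lat, dx_km, dy_km):
    """Return (dlon, dlat) in degrees so that the cell at latitude ctr_lat
    measures roughly dx_km by dy_km."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    dlat = dy_km / km_per_deg
    # Meridians converge toward the poles, so a degree of longitude
    # covers fewer kilometers at higher latitude.
    dlon = dx_km / (km_per_deg * math.cos(math.radians(ctr_lat)))
    return dlon, dlat

dlon, dlat = latlon_spacing(ctr_lat=60.0, dx_km=1.0, dy_km=1.0)
```

Because the spacing is matched only at the grid center, cells away from the center latitude are not exactly `dx_km` wide.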

The PDF images are created as small-multiple plots, with the number of columns given by `n_cols` at the beginning of the plotting section.

An example of reading and working with the resulting data files is found in the "Reading the flash-sorted files.ipynb" notebook.

As described below, additional scripts perform follow-on analysis.
- Assigning NLDN strokes to the best-matching flash
- Using a storm cell or storm region polygon to subset some flashes from the data files.
    - Creating time series plots of moments of the flash size distribution
    - Creating ASCII files of flash size and rate statistics


The IOP bounding box file included here is a rectangular lat/lon box, but the underlying code works with arbitrary polygons. Adapting the existing code to polygons is mostly a matter of reading in polygon vertices and sending its vertices instead of those for a rectangle.
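
The rectangle-to-polygon idea can be sketched with a standard ray-casting test. This is not the routine used by the underlying code; `point_in_polygon` and `rectangle_vertices` are hypothetical helpers, and the coordinates are made up for illustration.

```python
def rectangle_vertices(lon_min, lon_max, lat_min, lat_max):
    """Vertices of a lat/lon box, listed as (lon, lat) pairs."""
    return [(lon_min, lat_min), (lon_max, lat_min),
            (lon_max, lat_max), (lon_min, lat_max)]

def point_in_polygon(lon, lat, vertices):
    """Ray-casting test: count polygon edges crossed by a ray heading east."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

box = rectangle_vertices(-88.0, -86.0, 34.0, 36.0)
triangle = [(-88.0, 34.0), (-86.0, 34.0), (-87.0, 36.0)]
in_box = point_in_polygon(-87.0, 35.0, box)
in_triangle = point_in_polygon(-87.0, 35.0, triangle)
```

The same test serves both shapes: a rectangle is just a four-vertex polygon, so swapping in an arbitrary vertex list requires no change to the subsetting logic.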




In [3]:
%%bash

cat /data/GLM-wkshp/flashsort/environment_setup.README

8 September 2015 
Eric Bruning eric.bruning@ttu.edu

This document provides details about running a flash-sorting analysis, including
producing the flash time series statistics. These scripts are largely those used
for processing the DC3 dataset at TTU, and are suitable for (re)processing a large number of cases.

Python setup
------------

This analysis is run using the Anaconda Python distribution, with the primary
needs being numpy, scipy, matplotlib, pupynere, and pyproj.

After installing anaconda to your home directory, go to your home directory,
without anaconda already in $PATH

./anaconda/bin/conda create -n LMA --clone root

cd anaconda/envs/LMA/
source ./anaconda/bin/activate LMA

This sets up a Python environment with just the necessary pieces for this LMA analysis.

then pip install git+http://github.com/deeplycloudy/lmatools
git+http://github.com/deeplycloudy/stormdrain

If you have to pause and return later then simply cd home and

source ./anaconda/b