Major updates to plotting, new dependencies, several new helper scrip…

…ts, and now functional in Python 2.7 and 3.5.
ahotovec · Mar 8, 2018 · 0bce911 · 0bce911
1 parent 9b8b029
commit 0bce911
Show file tree

Hide file tree

Showing 24 changed files with 1,160 additions and 304 deletions.
diff --git a/README.md b/README.md
@@ -1,23 +1,34 @@
 <img src="https://raw.githubusercontent.com/ahotovec/REDPy/master/img/logo.png" width=800 alt="REDPy Logo" />
 
 ## Overview
-REDPy (Repeating Earthquake Detector in Python) is a tool for automated detection and analysis of repeating earthquakes in continuous data. It works without any previous assumptions of what repeating seismicity looks like (that is, does not require a template event). Repeating earthquakes are clustered into "families" via the OPTICS ([Ordering Points To Identify the Clustering Structure](https://en.wikipedia.org/wiki/OPTICS_algorithm)) algorithm, with distance defined by cross-correlation. All data, including waveforms, are stored in an HDF5 table using [PyTables](http://www.pytables.org/).
+REDPy (Repeating Earthquake Detector in Python) is a tool for automated detection and analysis of repeating earthquakes in continuous data. It works without any previous assumptions of what repeating seismicity looks like (that is, does not require a template event). Repeating earthquakes are clustered into "families" based on cross-correlation across multiple stations. All data, including waveforms, are stored in an HDF5 table using [PyTables](http://www.pytables.org/).
 
 ## Installation
-REDPy runs on Python 2.7, and currently has the following major package dependencies:  
-[numpy](http://www.numpy.org/) | [scipy](http://www.scipy.org/) | [matplotlib with basemap add-on](http://www.matplotlib.org/) | [obspy](http://www.obspy.org/) | [pytables](http://www.pytables.org/) | [pandas](http://pandas.pydata.org/) | [bokeh](http://bokeh.pydata.org/) 
+REDPy runs on Python 2.7 and Python 3.5, with the following major package dependencies:  
+[numpy](http://www.numpy.org/) | [scipy](http://www.scipy.org/) | [matplotlib](http://www.matplotlib.org/) | [obspy](http://www.obspy.org/) | [pytables](http://www.pytables.org/) | [pandas](http://pandas.pydata.org/) | [bokeh](http://bokeh.pydata.org/) | [cartopy](http://scitools.org.uk/cartopy/)
 
-These dependencies can be easily installed via [Anaconda](https://www.continuum.io/) on the command line:
+These dependencies can be easily installed via [Anaconda](https://www.continuum.io/) on the command line. I *highly* recommend using a virtual environment so that your REDPy environment does not conflict with any other Python packages you may be using. This can be done with the following commands:
 ```
->> conda install -c obspy obspy pytables pandas basemap bokeh=0.9.3 mock nose PIL
+>> conda create -n redpy python=3.5
+>> source activate redpy
+>> conda install -c obspy obspy
+>> conda install -c conda-forge cartopy shapely=1.5.17
+>> conda install pytables bokeh pandas
 ```
+You may either use `python=3.5` or `python=2.7`, but other versions are not supported. Whenever you intend to run REDPy, be sure to `source activate redpy` and then `source deactivate` when you are done.
 
 ## Usage
-Once dependencies are installed and REDPy is downloaded, REDPy can be run out of the box with the following commands to test if the code is working on your computer:  
+Once dependencies are installed and REDPy is downloaded, REDPy can be run out of the box with the following commands to test if the code is working on your computer. If it completes without error, it will produce files in a folder named `default` after several minutes.
 ```
 >> python initialize.py
->> python catfill.py mshcat.csv
->> python backfill.py -s 2004-09-15 -e 2004-09-24
+>> python catfill.py -v mshcat.csv
+>> python backfill.py -v -s 2004-09-15 -e 2004-09-24
 ```
 
 Check out the [Wiki](https://github.com/ahotovec/REDPy/wiki) for more detailed usage!
+
+## Reference
+
+If you would like to reference REDPy in your paper, please cite the following abstract until I finish writing the *Electronic Seismologist* paper for it:
+
+> Hotovec-Ellis, A.J., and Jeffries, C., 2016. Near Real-time Detection, Clustering, and Analysis of Repeating Earthquakes: Application to Mount St. Helens and Redoubt Volcanoes – *Invited*, presented at Seismological Society of America Annual Meeting, Reno, Nevada, 20 Apr.
diff --git a/backfill.py b/backfill.py
@@ -1,5 +1,5 @@
 # REDPy - Repeating Earthquake Detector in Python
-# Copyright (C) 2016  Alicia Hotovec-Ellis (ahotovec@gmail.com)
+# Copyright (C) 2016-2018  Alicia Hotovec-Ellis (ahotovec@gmail.com)
 # Licensed under GNU GPLv3 (see LICENSE.txt)
 
 import argparse
@@ -9,21 +9,16 @@
 from obspy import UTCDateTime
 import time
 
-# Added this to remove the slew of warnings obspy/numpy was throwing at me
-import warnings
-warnings.filterwarnings("ignore")
-
 """
 Run this script to fill the table with data from the past. If a start time is not
 specified, it will check the attributes of the repeater table to pick up where it left
 off. Additionally, if this is the first run and a start time is not specified, it will
 assume one time chunk prior to the end time. If an end time is not specified, "now" is
 assumed. The end time updates at the end of each time chunk processed (default: by hour,
 set in configuration). This script can be run as a cron job that will pick up where it
-left off if a chunk is missed, but will not run until a full chunk of time has elapsed
-since the last trigger. Use -n if you are backfilling with a large amount of time; it will
-consume less time downloading the data in small chunks if NSEC is an hour or a day instead
-of a few minutes.
+left off if a chunk is missed. Use -n if you are backfilling with a large amount of time;
+it will consume less time downloading the data in small chunks if NSEC is an hour or a day
+instead of a few minutes, but at the cost of keeping orphans for longer.
  
 usage: backfill.py [-h] [-v] [-s STARTTIME] [-e ENDTIME] [-c CONFIGFILE] [-n NSEC]
 
@@ -103,16 +98,20 @@
             endtime = tend
         st, stC = redpy.trigger.getData(tstart+n*opt.nsec-opt.atrig, endtime, opt)
         alltrigs = redpy.trigger.trigger(st, stC, rtable, opt)
-    except (TypeError, obspy.fdsn.header.FDSNException, Exception):
+    except (TypeError, obspy.clients.fdsn.header.FDSNException, Exception):
         print('Could not download or trigger data... moving on')
         alltrigs = []
 
 	# Clean out data spikes etc.
-    trigs, junk = redpy.trigger.dataclean(alltrigs, opt, flag=1)
+    trigs, junk, junkFI, junkKurt = redpy.trigger.dataClean(alltrigs, opt, flag=1)
 
 	# Save junk triggers in separate table for quality checking purposes
     for i in range(len(junk)):
-        redpy.table.populateJunk(jtable, junk[i], 0, opt)
+        redpy.table.populateJunk(jtable, junk[i], 2, opt) # Both types of junk
+    for i in range(len(junkKurt)):
+        redpy.table.populateJunk(jtable, junkKurt[i], 1, opt) # Just kurtosis junk
+    for i in range(len(junkFI)):
+        redpy.table.populateJunk(jtable, junkFI[i], 0, opt) # Just 'teleseisms'
 
     # Append times of triggers to ttable to compare total seismicity later
     redpy.table.populateTriggers(ttable, trigs, ttimes, opt)
@@ -168,7 +167,7 @@
 print("Caught up to: {}".format(endtime-opt.atrig))
 
 if args.verbose: print("Updating plots...")
-redpy.plotting.createPlots(rtable, ftable, ttable, ctable, otable, jtable, opt)
+redpy.plotting.createPlots(rtable, ftable, ttable, ctable, otable, opt)
 
 if args.verbose: print("Closing table...")
 h5file.close()

diff --git a/catfill.py b/catfill.py
@@ -1,5 +1,5 @@
 # REDPy - Repeating Earthquake Detector in Python
-# Copyright (C) 2016  Alicia Hotovec-Ellis (ahotovec@gmail.com)
+# Copyright (C) 2016-2018  Alicia Hotovec-Ellis (ahotovec@gmail.com)
 # Licensed under GNU GPLv3 (see LICENSE.txt)
 
 import argparse
@@ -10,10 +10,6 @@
 import time
 import pandas as pd
 
-# Added this to remove the slew of warnings obspy/numpy was throwing at me
-import warnings
-warnings.filterwarnings("ignore")
-
 """
 Run this script to fill the table with data from the past using a catalog of events.
  
@@ -80,11 +76,15 @@
         alltrigs = []
 
 	# Clean out data spikes etc.
-    trigs, junk = redpy.trigger.dataclean(alltrigs, opt, flag=1)
+    trigs, junk, junkFI, junkKurt = redpy.trigger.dataClean(alltrigs, opt, flag=1)
 
 	# Save junk triggers in separate table for quality checking purposes
     for i in range(len(junk)):
-        redpy.table.populateJunk(jtable,junk[i],0,opt)
+        redpy.table.populateJunk(jtable, junk[i], 2, opt)
+    for i in range(len(junkKurt)):
+        redpy.table.populateJunk(jtable, junkKurt[i], 1, opt)
+    for i in range(len(junkFI)):
+        redpy.table.populateJunk(jtable, junkFI[i], 0, opt)
 
     # Append times of triggers to ttable to compare total seismicity later
     redpy.table.populateTriggers(ttable, trigs, ttimes, opt)
@@ -130,7 +130,7 @@
 
 if len(rtable) > 1:
     if args.verbose: print("Creating plots...")
-    redpy.plotting.createPlots(rtable, ftable, ttable, ctable, otable, jtable, opt)
+    redpy.plotting.createPlots(rtable, ftable, ttable, ctable, otable, opt)
 else:
     print("No repeaters to plot.")
 

diff --git a/clearJunk.py b/clearJunk.py
@@ -0,0 +1,55 @@
+# REDPy - Repeating Earthquake Detector in Python
+# Copyright (C) 2016-2018  Alicia Hotovec-Ellis (ahotovec@gmail.com)
+# Licensed under GNU GPLv3 (see LICENSE.txt)
+
+import redpy.config
+import redpy.table
+import redpy.plotting
+import argparse
+import numpy as np
+import os
+
+"""
+Run this script to clear the contents of the junk table.
+
+usage: clearJunk.py [-h] [-v] [-c CONFIGFILE]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -v, --verbose         increase written print statements
+  -c CONFIGFILE, --configfile CONFIGFILE
+                        use configuration file named CONFIGFILE instead of
+                        default settings.cfg
+"""
+
+parser = argparse.ArgumentParser(description=
+    "Run this script to clear the contents of the junk table.")
+parser.add_argument("-v", "--verbose", action="count", default=0,
+    help="increase written print statements")
+parser.add_argument("-c", "--configfile",
+    help="use configuration file named CONFIGFILE instead of default settings.cfg")
+args = parser.parse_args()
+
+if args.configfile:
+    opt = redpy.config.Options(args.configfile)
+    if args.verbose: print("Using config file: {0}".format(args.configfile))
+else:
+    opt = redpy.config.Options("settings.cfg")
+    if args.verbose: print("Using config file: settings.cfg")
+
+if args.verbose: print("Opening hdf5 table: {0}".format(opt.filename))
+h5file, rtable, otable, ttable, ctable, jtable, dtable, ftable = redpy.table.openTable(opt)
+
+if args.verbose: print("Removing junk...")
+
+if len(jtable) > 1:
+# This will remove all but the last row (have to leave one)
+    for n in range(len(jtable)-1,0,-1):
+        jtable.remove_row(n)
+    jtable.flush()
+else:
+    if args.verbose: print("No junk to remove!")
+
+if args.verbose: print("Closing table...")
+h5file.close()
+if args.verbose: print("Done")
diff --git a/compareCatalog.py b/compareCatalog.py
@@ -0,0 +1,124 @@
+# REDPy - Repeating Earthquake Detector in Python
+# Copyright (C) 2016-2018  Alicia Hotovec-Ellis (ahotovec@gmail.com)
+# Licensed under GNU GPLv3 (see LICENSE.txt)
+
+import argparse
+import redpy
+import numpy as np
+import obspy
+from obspy import UTCDateTime
+import time
+import pandas as pd
+from matplotlib.dates import num2date
+
+"""
+Run this script to compare table with a specific catalog of events for agreement.
+ 
+usage: compareCatalog.py [-h] [-v] [-c CONFIGFILE] csvfile
+
+positional arguments:
+  csvfile               catalog csv file with required column 'Time UTC'
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -v, --verbose         increase written print statements
+  -c CONFIGFILE, --configfile CONFIGFILE
+                        use configuration file named CONFIGFILE instead of
+                        default settings.cfg
+"""
+
+parser = argparse.ArgumentParser(description=
+    "Compares REDPy catalog with csv catalog")
+parser.add_argument("csvfile",
+    help="catalog csv file with required column 'Time UTC'")
+parser.add_argument("-v", "--verbose", action="count", default=0,
+    help="increase written print statements")
+parser.add_argument("-c", "--configfile",
+    help="use configuration file named CONFIGFILE instead of default settings.cfg")
+args = parser.parse_args()
+
+if args.configfile:
+    opt = redpy.config.Options(args.configfile)
+    if args.verbose: print("Using config file: {0}".format(args.configfile))
+else:
+    opt = redpy.config.Options("settings.cfg")
+    if args.verbose: print("Using config file: settings.cfg")
+
+if args.verbose: print("Opening hdf5 table: {0}".format(opt.filename))
+h5file, rtable, otable, ttable, ctable, jtable, dtable, ftable = redpy.table.openTable(opt)
+
+# Read in csv file using pandas
+df = pd.read_csv(args.csvfile)
+
+# First off, a huge assumption here is that the event times are simply within some number
+# of seconds of the trigger times. Unless otherwise necessary, I'm going to start off with
+# the assumption that they're within the length of time used for the correlation window.
+terr = opt.winlen/opt.samprate
+
+# I'll append the best candidate family and the number of seconds the event times differ,
+# so that the user can identify questionable matches. I may add the ability to assume a
+# location and do some simple ray-tracing like in checkComCat(), but at this point I'm
+# not convinced it's necessary. Column dt here is the REDPy trigger time - csv catalog
+# time; if it is negative, REDPy either triggered early or it may not be a match.
+df['Cluster'] = ''
+df['dt'] = ''
+
+# Set up times to compare
+evtimes = [UTCDateTime(i) for i in pd.to_datetime(df['Time UTC']).tolist()]
+rtimes = [UTCDateTime(i) for i in rtable.cols.startTime[:]]
+rtimes += rtable.cols.windowStart[:]/opt.samprate
+otimes = [UTCDateTime(i) for i in otable.cols.startTime[:]]
+otimes += otable.cols.windowStart[:]/opt.samprate
+jtimes = [UTCDateTime(i) for i in jtable.cols.startTime[:]]
+jtimes += jtable.cols.windowStart[:]/opt.samprate
+ttimes = [UTCDateTime(num2date(i)) for i in ttable.cols.startTimeMPL[:]]
+
+# Flatten families to list
+famlist = np.zeros((len(rtimes),)).astype(int)
+for fnum in range(len(ftable)):
+    members = np.fromstring(ftable[fnum]['members'], dtype=int, sep=' ')
+    famlist[members] = fnum
+
+for i in range(len(df)):
+
+    # See if there's junk that matches
+    if len(jtimes) > 0:
+        dtimesj = jtimes-evtimes[i]
+        bestjunk = dtimesj[np.argmin(np.abs(dtimesj))]
+        if np.abs(bestjunk) < terr:
+            df['Cluster'][i] = 'junk'
+            df['dt'][i] = bestjunk
+
+    # See if there are any expired orphans that match
+    if len(ttimes) > 0:
+        dtimest = np.array(ttimes)-evtimes[i]
+        besttrig = dtimest[np.argmin(np.abs(dtimest))]
+        if np.abs(besttrig) < terr:
+            df['Cluster'][i] = 'expired'
+            df['dt'][i] = besttrig
+
+    # See if there's an orphan that matches
+    if len(otimes) > 0:
+        dtimeso = otimes-evtimes[i]
+        bestorph = dtimeso[np.argmin(np.abs(dtimeso))]
+        if np.abs(bestorph) < terr:
+            df['Cluster'][i] = 'orphan'
+            df['dt'][i] = bestorph
+
+    # See if there's a repeater that matches
+    if len(rtimes) > 0:
+        dtimesr = rtimes-evtimes[i]
+        bestr = dtimesr[np.argmin(np.abs(dtimesr))]
+        if np.abs(bestr) < terr:
+            df['Cluster'][i] = famlist[np.argmin(np.abs(dtimesr))]
+            df['dt'][i] = bestr
+
+
+# Write to matches.csv
+if args.verbose: print("Saving to matches_{}.csv".format(opt.groupName))
+df.to_csv('matches_{}.csv'.format(opt.groupName), index=False)   
+
+if args.verbose: print("Closing table...")
+h5file.close()
+
+if args.verbose: print("Done")
diff --git a/createReport.py b/createReport.py
@@ -0,0 +1,55 @@
+# REDPy - Repeating Earthquake Detector in Python
+# Copyright (C) 2016-2018  Alicia Hotovec-Ellis (ahotovec@gmail.com)
+# Licensed under GNU GPLv3 (see LICENSE.txt)
+
+import redpy.config
+import redpy.table
+import argparse
+
+"""
+Run this script to manually produce a more detailed "report" page for a given family
+(or families)
+
+usage: createReport.py [-h] [-v] [-o] [-c CONFIGFILE] N [N ...]
+
+positional arguments:
+  N                     family number(s) to be reported on
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -v, --verbose         increase written print statements
+  -o, --ordered         order plots by OPTICS
+  -c CONFIGFILE, --configfile CONFIGFILE
+                        use configuration file named CONFIGFILE instead of
+                        default settings.cfg
+"""
+
+parser = argparse.ArgumentParser(description=
+    "Run this script to manually remove families/clusters")
+parser.add_argument('famnum', metavar='N', type=int, nargs='+',
+    help="family number(s) to be reported on")
+parser.add_argument("-v", "--verbose", action="count", default=0,
+    help="increase written print statements")
+parser.add_argument("-o", "--ordered", action="count", default=0,
+    help="order plots by OPTICS")
+parser.add_argument("-c", "--configfile",
+    help="use configuration file named CONFIGFILE instead of default settings.cfg")
+args = parser.parse_args()
+
+if args.configfile:
+    opt = redpy.config.Options(args.configfile)
+    if args.verbose: print("Using config file: {0}".format(args.configfile))
+else:
+    opt = redpy.config.Options("settings.cfg")
+    if args.verbose: print("Using config file: settings.cfg")
+
+if args.verbose: print("Opening hdf5 table: {0}".format(opt.filename))
+h5file, rtable, otable, ttable, ctable, jtable, dtable, ftable = redpy.table.openTable(opt)
+
+for fnum in args.famnum:
+    if args.verbose: print("Creating report for family {}...".format(fnum))
+    redpy.plotting.plotReport(rtable, ftable, ctable, opt, fnum, args.ordered)
+
+if args.verbose: print("Closing table...")
+h5file.close()
+if args.verbose: print("Done")