Earthquake Catalog
==================

The following cells contain production code to:

(1) convert raw CSV files downloaded from the ANSS website into CSV files that carry both the original date-time string and a Unix timestamp.  
(2) filter earthquakes by:  
  * Magnitude
  * Latitude and longitude (a bounding box given by its upper-left and lower-right corners)
  * Time

## Earthquake Catalog Extraction

You need to go to the ANSS website and run queries to generate the CSV files:  
http://quake.geo.berkeley.edu/anss/catalog-search.html  

The code assumes the following naming convention:

alleq_mag5.csv - all earthquakes worldwide of magnitude > 5.0  
alleq_mag4.csv - all earthquakes worldwide of magnitude > 4.0  
alleq_mag3.csv - all earthquakes worldwide of magnitude > 3.0 

In [1]:
# As usual, a bit of setup

import time, os, json
import numpy as np
import matplotlib.pyplot as plt
import csv

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 6.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Data Conversion

The code below takes in earthquake data from a CSV file and does the following:  
* Converts the date-time into a Unix timestamp and adds it as an extra column
* Stores each row of earthquake data in a dictionary with 6 fields - (1) date-time, (2) timestamp, (3) long, (4) lat, (5) magnitude, and (6) depth  
* Writes the dictionaries to a new CSV file

The new CSV files follow this naming convention:

mag3_eq_dict.csv  
mag4_eq_dict.csv  
mag5_eq_dict.csv  
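
The two naming conventions can be tied together with a small helper. This is only a sketch: `catalog_paths` and `DATA_DIR` are hypothetical names that are not part of this notebook or of `data_util`, and the `earthquake_data` directory is assumed from the paths used in the conversion cell.

```python
DATA_DIR = "earthquake_data"  # assumed: the directory the conversion cell reads from


def catalog_paths(min_magnitude, data_dir=DATA_DIR):
    """Return (raw ANSS export path, converted file path) for a magnitude threshold."""
    raw = "{0}/alleq_mag{1}.csv".format(data_dir, min_magnitude)
    converted = "{0}/mag{1}_eq_dict.csv".format(data_dir, min_magnitude)
    return raw, converted


# Hypothetical usage: the pair of paths for the magnitude-5 catalog
input_filename, output_filename = catalog_paths(5)
```

Deriving both names from the magnitude threshold keeps the input and output files in step when switching between the three catalogs.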

Note that we will need to make some **manual** corrections because the ANSS dataset contains a few bad date-time values (for example, an hour value of 32 or a seconds value of 60). For earthquakes of magnitude 3.0 and above there are only 5 such errors, so it is not worth writing code to catch and clean them.
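
The date-time conversion, and a way to list the rows that need manual correction, might look like the sketch below. It does not reproduce the helper in `data_analyze.py`, which is not shown here: `parse_utc_timestamp` is a hypothetical name, the format string is the one used in the conversion cell, and keeping the fractional second in the timestamp is an assumption.

```python
import calendar
from datetime import datetime

ANSS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"  # same format string as the conversion cell


def parse_utc_timestamp(text, fmt=ANSS_FORMAT):
    """Return the Unix timestamp for a UTC date-time string, or None if it is malformed."""
    try:
        dt = datetime.strptime(text, fmt)
    except ValueError:
        return None
    # timegm interprets the time tuple as UTC and drops sub-second precision,
    # so the fractional second is added back (an assumption about the desired output)
    return calendar.timegm(dt.timetuple()) + dt.microsecond / 1e6


# Made-up sample strings: one valid, one with an impossible hour
samples = ["2014/10/25 06:45:00.00", "2014/10/25 32:45:00.00"]
needs_manual_fix = [s for s in samples if parse_utc_timestamp(s) is None]
```

Running a check like this over the `DateTime` column before the conversion gives the short list of rows to correct by hand.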


In [2]:
from datetime import datetime
from data_util.data_analyze import *
import calendar

earthquakes = []
earthquake = {}

input_filename = 'earthquake_data/alleq_mag5.csv'
output_filename = 'earthquake_data/mag5_eq_dict.csv'

# Read in earthquake data from CSV file
with open(input_filename, 'rb') as f:
    reader = csv.DictReader(f, delimiter=',')
    fieldnames = reader.fieldnames
    
    for row in reader:
        # Convert UTC datetime to UNIX timestamp
        try:
            dt = datetime.strptime(row['DateTime'], "%Y/%m/%d %H:%M:%S.%f") 
        except ValueError:
            # Malformed date-time (e.g. hour 32): report it for manual correction and skip the row
            print row['DateTime']
            continue
        timestamp = datatime_2_timestamp(dt,utc=True)  # I wrote this in data_analyze.py
        
        # Store data into a list of dictionaries with 6 fields
        earthquake["timestamp"] = timestamp
        earthquake["datetime"] = row['DateTime']        
        earthquake["lat"] = float(row['Latitude'])
        earthquake["long"] = float(row['Longitude'])
        try:
            earthquake["depth"] = float(row['Depth'])
        except ValueError:
            earthquake["depth"] = np.nan
            
        earthquake["magnitude"] = float(row['Magnitude']) 
        earthquakes.append(earthquake.copy())
               
# The with-block above has already closed the input file.

# Write to a CSV file, mapping the dictionaries to output rows.
with open(output_filename, 'wb') as csvfile:  # binary mode, as the Python 2 csv module expects
    fieldnames = ['timestamp', 'datetime', 'long', 'lat', 'magnitude', 'depth']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    # Each dictionary already has exactly the keys listed in fieldnames,
    # and the with-block closes the file, so no explicit close() is needed
    writer.writerows(earthquakes)

## Filter by Time Frame

I have written a function in ['data_util/eq_data_util.py'](data_util/eq_data_util.py) that extracts earthquake data by time window and latitude/longitude box.

This is very useful for generating truth labels for evaluating the efficacy of earthquake precursors or for supervised learning.
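
The kind of test such a filter applies to each row can be sketched as follows. This is not the code in `eq_data_util.py`: `in_window` and `filter_quakes` are hypothetical names, the sample rows are made up, inclusive bounds are an assumption, and the box is read as ((upper-left lat, upper-left long), (lower-right lat, lower-right long)).

```python
def in_window(quake, timewindow, latlong):
    """True if the quake lies inside the time window and the lat/long box (bounds inclusive)."""
    begin, end = timewindow
    (lat_top, lon_left), (lat_bottom, lon_right) = latlong
    return (begin <= quake["timestamp"] <= end
            and lat_bottom <= quake["lat"] <= lat_top
            and lon_left <= quake["long"] <= lon_right)


def filter_quakes(quakes, timewindow, latlong):
    """Return the quakes that fall inside both the time window and the box."""
    return [q for q in quakes if in_window(q, timewindow, latlong)]


kodiak_box = ((62.2, -158.3), (52.2, -148.3))
window = (1414219500, 1469923200)
# Made-up rows for illustration only
quakes = [
    {"timestamp": 1420000000, "lat": 57.2, "long": -153.3, "magnitude": 5.1},  # inside
    {"timestamp": 1420000000, "lat": 35.0, "long": -120.0, "magnitude": 5.4},  # outside the box
    {"timestamp": 1400000000, "lat": 57.2, "long": -153.3, "magnitude": 5.0},  # before the window
]
selected = filter_quakes(quakes, window, kodiak_box)
```

A simple comparison like this does not handle a box that crosses the 180° meridian; that case would need the longitude test split in two.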

In [5]:
from datetime import datetime
from data_util.eq_data_util import *


# Named input_filename/output_filename to avoid shadowing the built-in input()
input_filename = 'earthquake_data/mag5_eq_dict.csv'
output_filename = 'earthquake_data/mag5_eq_kodiak_10-25-2014_07-31-2016.csv'

begin = 1414219500  # 2014-10-25 06:45:00 UTC
end = 1469923200  # 2016-07-31 00:00:00 UTC
# Box corners are (lat, long): upper left, then lower right. Kodiak, Alaska is at 57.2 lat, -153.3 long
Alaska_Kodiak_Window = ((62.2,-158.3),(52.2,-148.3))

num_eq = extract_eq_data(input_filename, output_filename, timewindow=(begin,end), latlong=Alaska_Kodiak_Window)
print datetime.utcfromtimestamp(begin)
print datetime.utcfromtimestamp(end)
print num_eq

2014-10-25 06:45:00
2016-07-31 00:00:00
8
