Earthquake Catalog
==================

I queried the ANSS catalog search page to generate three CSV files:  
http://quake.geo.berkeley.edu/anss/catalog-search.html  

alleq_mag5.csv - all earthquakes worldwide of magnitude > 5.0  
alleq_mag4.csv - all earthquakes worldwide of magnitude > 4.0  
alleq_mag3.csv - all earthquakes worldwide of magnitude > 3.0  

In this Notebook, we will develop a couple of functions to do the following:
* Print out some sample data
* Convert time (from datetime to UNIX timestamp)
* Filter earthquake data by:
  * Magnitude
  * Latitudes and Longitudes (box coordinate - upper left; lower right)
  * Time


In [1]:
# As usual, a bit of setup

import time, os, json
import numpy as np
import matplotlib.pyplot as plt
import csv

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 6.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Count earthquakes
I have written a function in data_analyze.py, `num_datapoints`, that counts the number of rows in a CSV file.

* alleq_mag5.csv - 84,838 earthquake datapoints
* alleq_mag4.csv - 420,078 earthquake datapoints
* alleq_mag3.csv - 627,484 earthquake datapoints

In [2]:
from data_util.data_analyze import *

num = num_datapoints('earthquake_data/alleq_mag3.csv', dict=True)  # The data has header
print num

627484
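
`num_datapoints` lives in data_analyze.py and its body is not shown in this notebook. As a rough idea of what such a counter might look like, here is a minimal sketch. The name `count_rows` and the `has_header` flag are hypothetical stand-ins (assuming `dict=True` means "the first line is a header, do not count it"), and the sketch is written for Python 3 rather than the Python 2 used in the cells.

```python
import csv
import io


def count_rows(csv_file, has_header=True):
    """Count data rows in an open CSV file, optionally skipping one header line."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # discard the header row if there is one
    return sum(1 for _ in reader)


# Tiny in-memory sample built from rows that appear in this notebook
sample = io.StringIO(
    "DateTime,Latitude,Longitude,Depth,Magnitude\n"
    "1898/06/29 18:36:00.00,52.0000,172.0000,0.00,7.60\n"
    "1898/10/11 16:37:32.70,50.7100,-179.5000,0.00,6.90\n"
)
print(count_rows(sample))
```

Streaming through the reader keeps memory use flat, which matters for a file with several hundred thousand rows.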


## Print some data

In [1]:
# Run some setup code for this notebook.
from data_util.data_analyze import *
import random
import numpy as np
import matplotlib.pyplot as plt
import csv

file = 'earthquake_data/alleq_mag4.csv'
size = num_datapoints(file, dict=True)  # The data has header

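with open(file, 'rb') as f:
    # NOTE: delimiter=' ' splits on spaces rather than commas, so the header
    # comes back as a single item (see the output below). Use delimiter=','
    # to get the 12 individual fields.
    reader = csv.reader(f, delimiter=' ')
    i = 0
    for row in reader:
        i += 1
        if i == 1:
            print "There are %d items per row of data" % len(row)

        if i < 10:
            print row   # Print the first 9 rows (header included)

        if i > size-10:
            print row   # Print the rows at the end of the file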

There are 1 items per row of data
['DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID']
['1898/06/29', '18:36:00.00,52.0000,172.0000,0.00,7.60,ML,0,,,,AK,']
['1898/10/11', '16:37:32.70,50.7100,-179.5000,0.00,6.90,ML,0,,,,AK,']
['1899/07/14', '13:32:00.00,60.0000,-150.0000,0.00,7.20,ML,0,,,,AK,']
['1899/09/04', '00:22:00.00,60.0000,-142.0000,25.00,8.30,ML,0,,,,AK,']
['1899/09/04', '04:40:00.00,60.0000,-142.0000,0.00,6.90,ML,0,,,,AK,']
['1899/09/10', '17:25:00.00,60.0000,-140.0000,25.00,7.80,ML,0,,,,AK,']
['1899/09/10', '21:40:00.00,60.0000,-140.0000,25.00,8.60,ML,0,,,,AK,']
['1899/09/17', '12:50:00.00,59.0000,-136.0000,0.00,6.90,ML,0,,,,AK,']
['2016/12/30', '11:23:07.91,16.2416,-87.9808,21.25,4.40,Mb,,40,2,1.40,us,201612302024']
['2016/12/30', '12:56:13.69,-30.5957,-177.7777,25.12,5.40,Mb,,59,8,0.96,us,201612302026']
['2016/12/30', '13:11:20.78,-32.7096,-179.6688,10.00,4.80,Mb,,188,5,0.95,us,201612302027']
['2016/12/30', '14:51:58.37,-30.6792
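
The header shows up as a single item above because the rows were split on spaces, while the file itself is comma-separated. The following sketch, written for Python 3 and using only the header and first data row copied from that output, shows what splitting on commas gives instead. The in-memory `raw` string is an illustration and stands in for the real file.

```python
import csv
import io

# Header and first data row copied from the output above
raw = (
    "DateTime,Latitude,Longitude,Depth,Magnitude,MagType,"
    "NbStations,Gap,Distance,RMS,Source,EventID\n"
    "1898/06/29 18:36:00.00,52.0000,172.0000,0.00,7.60,ML,0,,,,AK,\n"
)

rows = list(csv.reader(io.StringIO(raw), delimiter=','))
header, first = rows[0], rows[1]

print(len(header))   # number of fields once we split on commas
print(first[0])      # the date and time stay together in one field
print(first[4])      # the magnitude column
```

Empty columns such as Gap and RMS come back as empty strings, so they need a guard before any `float()` conversion.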

## Describe earthquakes
The cell below describes the first three earthquake events in natural language.

In [4]:
with open('earthquake_data/alleq_mag4.csv', 'rb') as f:
    reader = csv.DictReader(f, delimiter=',')
    fieldnames = reader.fieldnames

    print 'ANSS Mag 4+ Earthquakes'
    print fieldnames

    i = 0
    for row in reader:
        i += 1
        if i > 3:
            break

        print "The earthquake started on %s" % row['DateTime']
        print "Epicenter is located at %.2f lat, %.2f long" %(float(row['Latitude']), float(row['Longitude']))
        print "Magnitude is %.2f, with depth of %.2f \n" %(float(row['Magnitude']), float(row['Depth']))



ANSS Mag 4+ Earthquakes
['DateTime', 'Latitude', 'Longitude', 'Depth', 'Magnitude', 'MagType', 'NbStations', 'Gap', 'Distance', 'RMS', 'Source', 'EventID']
The earthquake started on 1898/06/29 18:36:00.00
Epicenter is located at 52.00 lat, 172.00 long
Magnitude is 7.60, with depth of 0.00 

The earthquake started on 1898/10/11 16:37:32.70
Epicenter is located at 50.71 lat, -179.50 long
Magnitude is 6.90, with depth of 0.00 

The earthquake started on 1899/07/14 13:32:00.00
Epicenter is located at 60.00 lat, -150.00 long
Magnitude is 7.20, with depth of 0.00 



## Number of earthquakes based on magnitudes:

Mag 8+ -   
Mag 7+ -  
Mag 6+ -   
Mag 5+ - 84,838  
Mag 4+ - 420,078  
Mag 3+ - 627,484  

In [5]:
from data_util.data_conversion import *

num = num_datapoints('earthquake_data/alleq_mag3.csv', dict=True)  # The data has header
print num

627484
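
The Mag 6+, 7+ and 8+ entries in the list above are still blank. One possible way to fill them in with a single pass over the magnitudes is sketched below. `count_at_or_above` is a hypothetical helper, not part of data_util, and it assumes "Mag N+" means magnitude >= N. The sample values are the magnitudes printed earlier in this notebook, not the full catalog.

```python
def count_at_or_above(magnitudes, thresholds):
    """Return {threshold: number of magnitudes >= threshold}."""
    counts = {t: 0 for t in thresholds}
    for mag in magnitudes:
        for t in thresholds:
            if mag >= t:
                counts[t] += 1
    return counts


# Magnitudes taken from the sample rows printed earlier (not the whole file)
sample_mags = [7.60, 6.90, 7.20, 8.30, 6.90, 7.80, 8.60, 6.90, 4.40, 5.40, 4.80]
counts = count_at_or_above(sample_mags, thresholds=(8, 7, 6, 5))
for threshold in (8, 7, 6, 5):
    print("Mag %d+ - %d" % (threshold, counts[threshold]))
```

In practice the magnitudes would come from the `Magnitude` column of alleq_mag5.csv, read with `csv.DictReader`.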


## UTC Datetime --> UNIX Timestamp

It is cumbersome to work with six separate values for year, month, day, hour, minute and second, so we convert each date-time to a single Unix timestamp.

In [6]:
import datetime
import calendar

dt = datetime.datetime(2017, 5, 1, 0, 0, 0)
timestamp = calendar.timegm(dt.timetuple())
print timestamp

1493596800
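
The same conversion can start from the text form used in the ANSS files (for example `1898/06/29 18:36:00.00`). Here is a minimal sketch for Python 3. The helper name `anss_datetime_to_timestamp` is hypothetical, and it assumes the DateTime column is UTC and that whole seconds are enough.

```python
import calendar
from datetime import datetime


def anss_datetime_to_timestamp(text):
    """Parse 'YYYY/MM/DD HH:MM:SS.ff' as UTC and return whole Unix seconds."""
    dt = datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f")
    # timegm treats the time tuple as UTC; fractional seconds are dropped
    return calendar.timegm(dt.timetuple())


print(anss_datetime_to_timestamp("2017/05/01 00:00:00.00"))  # 1493596800, as in the cell above
print(anss_datetime_to_timestamp("1898/06/29 18:36:00.00"))  # negative: before 1970-01-01
```

Using `calendar.timegm` rather than `time.mktime` avoids any dependence on the local time zone of the machine running the notebook.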


## Convert to Dictionary
The code below takes earthquake data from a CSV file and does the following:  
* Converts the date-time into a Unix timestamp and adds it as an extra column
* Stores each row of earthquake data in a dictionary with 6 fields - (1) date-time, (2) timestamp, (3) long, (4) lat, (5) magnitude, and (6) depth  
* Writes the dictionaries to a new CSV file

The CSV files generated are mag3_eq_dict.csv, mag4_eq_dict.csv and mag5_eq_dict.csv. Note that we need to perform some **manual** corrections due to bad data in the ANSS dataset.

In [8]:
from datetime import datetime
from data_util.data_analyze import *
import calendar

earthquakes = []
earthquake = {}

# Read in earthquake data from CSV file
with open('earthquake_data/alleq_mag3.csv', 'rb') as f:
    reader = csv.DictReader(f, delimiter=',')
    fieldnames = reader.fieldnames
    
    for row in reader:
        # Convert UTC datetime to UNIX timestamp
        try:
            dt = datetime.strptime(row['DateTime'], "%Y/%m/%d %H:%M:%S.%f") 
        except ValueError:
            # Bad date-time in the ANSS data: report it and skip the row, so that
            # a stale dt from the previous row is not reused. Correct it manually.
            print row['DateTime']
            continue
        timestamp = datatime_2_timestamp(dt,utc=True)  # I wrote this in data_analyze.py
        
        # Store data into a list of dictionaries with 6 fields
        earthquake["timestamp"] = timestamp
        earthquake["datetime"] = row['DateTime']        
        earthquake["lat"] = float(row['Latitude'])
        earthquake["long"] = float(row['Longitude'])
        try:
            earthquake["depth"] = float(row['Depth'])
        except ValueError:
            earthquake["depth"] = np.nan   # Depth is missing for some events
            
        earthquake["magnitude"] = float(row['Magnitude']) 
        earthquakes.append(earthquake.copy())

# Write to a CSV file, mapping the dictionaries to output rows.
# 'wb' is the mode the Python 2 csv module expects for output files.
with open('earthquake_data/mag3_eq_dict.csv', 'wb') as csvfile:
    fieldnames = ['timestamp', 'datetime', 'long', 'lat', 'magnitude', 'depth']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for quake in earthquakes:
        writer.writerow({'timestamp':quake['timestamp'], 'datetime':quake['datetime'], 'long':quake["long"], 
                         'lat':quake["lat"], 'magnitude':quake['magnitude'], 'depth':quake['depth']})

## Filter by Time Frame

I have written a function in ['data_util/eq_data_util.py'](data_util/eq_data_util.py) that extracts earthquake data by time window and latitude/longitude box.

In [8]:
from datetime import datetime
from data_util.eq_data_util import *

input = 'earthquake_data/mag5_eq_dict.csv'
output = 'earthquake_data/mag5_eq_timeframe.csv'

begin = 1414219500  # 2014-10-25 06:45:00 UTC
end = 1469923200  # 2016-07-31 00:00:00 UTC

num_eq = extract_eq_data(input, output, timewindow=(begin, end), latlong=((90,180),(-90,-180)))

print num_eq



2714
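
The body of `extract_eq_data` is in eq_data_util.py and is not reproduced here. The core test it has to apply to each row can be sketched as follows, for Python 3. `in_window_and_box` is a hypothetical helper and the sample rows are made up for illustration. The sketch assumes both ends of the time window are inclusive, accepts the two box corners in either order, and does not handle boxes that cross the 180° meridian.

```python
def in_window_and_box(quake, timewindow, latlong):
    """True if a quake dict falls inside the time window and the lat/long box.

    timewindow: (begin, end) Unix timestamps, both ends inclusive (assumed).
    latlong: ((lat, long), (lat, long)) for two opposite corners of the box.
    """
    begin, end = timewindow
    (lat_a, lon_a), (lat_b, lon_b) = latlong
    # Sort each pair so the corners can be given in either order
    lat_min, lat_max = sorted((lat_a, lat_b))
    lon_min, lon_max = sorted((lon_a, lon_b))
    return (begin <= quake["timestamp"] <= end
            and lat_min <= quake["lat"] <= lat_max
            and lon_min <= quake["long"] <= lon_max)


# Made-up rows using the field names of the *_eq_dict.csv files
quakes = [
    {"timestamp": 1449706208, "lat": 57.0, "long": -153.0},   # inside window
    {"timestamp": 1400000000, "lat": 57.0, "long": -153.0},   # before the window
]
window = (1414219500, 1469923200)
whole_world = ((90, 180), (-90, -180))
selected = [q for q in quakes if in_window_and_box(q, window, whole_world)]
print(len(selected))
```

Keeping the per-row test separate from the file reading and writing makes it easy to check on a handful of rows before running it over the full catalog.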


## Extract Earthquakes for Alaska Kodiak station 7

Using the same `extract_eq_data` function from ['data_util/eq_data_util.py'](data_util/eq_data_util.py), we extract earthquakes near Kodiak, Alaska, within the lat/long box defined by ((59.2,-155.3),(55.2,-151.3)) from 2014-12-03 to 2016-07-31:

* Magnitude 4+: 11 events.

* Magnitude 5+: zero events.


In [9]:
from datetime import datetime
from data_util.eq_data_util import *

input = 'earthquake_data/mag5_eq_dict.csv'
output = 'earthquake_data/mag5_eq_Kodiak_20141203_20160731.csv'

Alaska_Kodiak = ((59.2,-155.3),(55.2,-151.3))  # Kodiak Alaska is at 57.2 lat, -153.3 long
num_eq = extract_eq_data(input, output, timewindow=(1417585200,1469923200), latlong=Alaska_Kodiak)

print datetime.utcfromtimestamp(1417585200)
print datetime.utcfromtimestamp(1469923200)
print num_eq

2014-12-03 05:40:00
2016-07-31 00:00:00
0


## Mag 5+ Quakes - Wider Window

We then extract earthquakes occurring in a wider lat/long box defined by ((62.2,-158.3),(52.2,-148.3)) from 2014-12-03 to 2016-07-31.

* Magnitude 4+: 100 events.

* Magnitude 5+: 8 events.

The cell below shows the magnitude 4+ run (input file mag4_eq_dict.csv).


In [11]:
from datetime import datetime
from data_util.eq_data_util import *

input = 'earthquake_data/mag4_eq_dict.csv'
output = 'earthquake_data/mag4_eq_Kodiak_20141203_20160731_wider2.csv'

Alaska_Kodiak = ((62.2,-158.3),(52.2,-148.3))  # Kodiak Alaska is at 57.2 lat, -153.3 long
num_eq = extract_eq_data(input, output, timewindow=(1417585200,1469923200), latlong=Alaska_Kodiak)

print datetime.utcfromtimestamp(1417585200)
print datetime.utcfromtimestamp(1469923200)
print num_eq

2014-12-03 05:40:00
2016-07-31 00:00:00
100


## Mag 4+ Quakes - Wider Window

We then extract all magnitude 4+ earthquakes occurring in the lat/long box defined by ((62.2,-158.3),(52.2,-148.3)) from 2014-12-03 to 2016-07-31. There are 100 such events.

In [12]:
from datetime import datetime
from data_util.eq_data_util import *

input = 'earthquake_data/mag4_eq_dict.csv'
output = 'earthquake_data/mag4_eq_Kodiak_20141203_20160731_wider2.csv'

Alaska_Kodiak = ((62.2,-158.3),(52.2,-148.3))  # Kodiak Alaska is at 57.2 lat, -153.3 long
num_eq = extract_eq_data(input, output, timewindow=(1417585200,1469923200), latlong=Alaska_Kodiak)

print datetime.utcfromtimestamp(1417585200)
print datetime.utcfromtimestamp(1469923200)
print num_eq

2014-12-03 05:40:00
2016-07-31 00:00:00
100
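
Both Kodiak boxes used above are centred on 57.2 lat, -153.3 long: the first extends 2 degrees in each direction and the wider one 5 degrees. A small helper can build such boxes instead of typing the corners by hand. This is a sketch for Python 3; `box_around` is a hypothetical name, and it does no clamping at the poles or wrapping at the 180° meridian.

```python
def box_around(lat, lon, half_width):
    """Return ((upper-left lat, long), (lower-right lat, long)) around a point.

    half_width is in degrees and is applied to both latitude and longitude.
    """
    # Round to 4 decimals (the precision of the ANSS coordinates) to hide float noise
    upper_left = (round(lat + half_width, 4), round(lon - half_width, 4))
    lower_right = (round(lat - half_width, 4), round(lon + half_width, 4))
    return (upper_left, lower_right)


narrow = box_around(57.2, -153.3, 2)
wide = box_around(57.2, -153.3, 5)
print(narrow)
print(wide)
```

The result has the same shape as the `latlong` argument passed to `extract_eq_data` in the cells above.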


## UNIX Timestamp --> Datetime

In [5]:
from datetime import datetime
from data_util.eq_data_util import *

print datetime.utcfromtimestamp(1449706208)


2015-12-10 00:10:08
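
`datetime.utcfromtimestamp` returns a naive datetime and is deprecated as of Python 3.12. If this notebook is ever ported to Python 3, a timezone-aware equivalent could look like the sketch below. `timestamp_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def timestamp_to_utc(ts):
    """Convert Unix seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


dt = timestamp_to_utc(1449706208)
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # same instant as the cell above
```

The aware datetime carries its UTC offset with it, so it cannot be mistaken for local time later in the pipeline.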
