## Data mining project 2019-1: phase 2 - Retrieving earthquakes and cross-matching them with tsunamis

- Installing the COMCAT library: https://github.com/usgs/libcomcat

- Documentation: http://usgs.github.io/libcomcat/apidoc/libcomcat.search.html

```python
from libcomcat.search import search, count, get_event_by_id
import datetime
import os.path
import pandas as pd
import numpy as np
```


```python
data = pd.read_csv('tsunamidf.csv').astype({'MONTH': 'int64', 'DAY': 'int64', 'HOUR': 'int64'})
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 537 entries, 0 to 536
Data columns (total 13 columns):
YEAR                 537 non-null int64
MONTH                537 non-null int64
DAY                  537 non-null int64
HOUR                 537 non-null int64
MINUTE               536 non-null float64
SECOND               498 non-null float64
CAUSE_CODE           537 non-null float64
FOCAL_DEPTH          537 non-null float64
LATITUDE             537 non-null object
LONGITUDE            537 non-null object
PRIMARY_MAGNITUDE    537 non-null float64
COUNTRY              537 non-null object
EVENT_VALIDITY       537 non-null int64
dtypes: float64(5), int64(5), object(3)
memory usage: 54.7+ KB
```
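`data.info()` reports `LATITUDE` and `LONGITUDE` as `object` columns, which suggests that some entries are not plain numbers. A minimal sketch, assuming numeric coordinates are wanted and that unparseable values may simply become `NaN`, coerces them with `pd.to_numeric`; the helper `coerce_coordinates` and the toy values are hypothetical, not taken from `tsunamidf.csv`.

```python
import pandas as pd


def coerce_coordinates(df, columns=("LATITUDE", "LONGITUDE")):
    """Return a copy of df with the given columns converted to float.

    Hypothetical helper: values that cannot be parsed become NaN
    (errors="coerce"), so they can be counted or dropped afterwards.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Toy frame standing in for the real tsunami data
toy = pd.DataFrame({"LATITUDE": ["-9.89", "9.755", "n/a"],
                    "LONGITUDE": ["160.348", "124.694", "130.373"]})
clean = coerce_coordinates(toy)
print(clean.dtypes)
print(clean["LATITUDE"].isna().sum())  # number of unparseable latitudes
```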


Search criteria:
- The search is based on the tsunami's date, down to the hour in which it occurred, with a margin of ±1 hour. This can return more than one earthquake for a single tsunami, which is worth analyzing in its own right.

- The tsunami's `PRIMARY_MAGNITUDE` is used as the minimum magnitude (`minmagnitude`) of the search.
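The ±1 hour criterion comes down to building a time window around the tsunami's recorded date and hour. A minimal sketch of that step in isolation, assuming (as the matching function does) that minutes and seconds are ignored, so the window is centred on the start of the recorded hour; `search_window` is a hypothetical helper whose two values would be passed as `starttime` and `endtime` to `libcomcat.search.search`.

```python
import datetime


def search_window(year, month, day, hour, margin_hours=1):
    """Return (start, end) datetimes spanning +-margin_hours around the
    given date and hour; minutes and seconds are ignored."""
    center = datetime.datetime(int(year), int(month), int(day), int(hour))
    margin = datetime.timedelta(hours=margin_hours)
    return center - margin, center + margin


start, end = search_window(1977, 4, 20, 23)
print(start, end)  # 1977-04-20 22:00:00 1977-04-21 00:00:00
```

Using `timedelta` arithmetic means windows that cross midnight (or a month or year boundary) roll over correctly without special cases.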

```python
def X_matching(data):
    """For each tsunami, query ComCat for earthquakes within +-1 hour of its
    recorded date and hour whose magnitude is at least PRIMARY_MAGNITUDE.

    Returns a DataFrame with one row per (earthquake, tsunami) match; IDX is
    the tsunami's row index in `data`. If nothing matches, the DataFrame is
    empty (instead of the sentinel -1).
    """
    records = []
    for idx, row in data.iterrows():
        date = datetime.datetime(int(row['YEAR']), int(row['MONTH']),
                                 int(row['DAY']), int(row['HOUR']))
        # Network query to the USGS ComCat service
        eventlist = search(starttime=date - datetime.timedelta(hours=1),
                           endtime=date + datetime.timedelta(hours=1),
                           minmagnitude=row['PRIMARY_MAGNITUDE'])
        for event in eventlist:
            records.append([event.time, event.latitude, event.longitude,
                            event.depth, event['gap'], event['mag'],
                            event['magType'], event['dmin'], event['rms'],
                            event['place'], 1, idx])
    columns = ['DATE', 'LATITUDE', 'LONGITUDE', 'DEPTH', 'GAP', 'MAGNITUDE',
               'MAGTYPE', 'DMIN', 'RMS', 'PLACE', 'TSUNAMI', 'IDX']
    # Building the DataFrame once avoids repeated np.append calls
    return pd.DataFrame(records, columns=columns)

matched = X_matching(data)
```

```
224
294
298
322
352
410
437
442
```
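The indices printed above appear to be the tsunami indices (`IDX`) that were matched to more than one earthquake. A minimal sketch of how such a list can be obtained with pandas, shown on a toy frame rather than the real query results; `multi_quake_tsunamis` is a hypothetical helper.

```python
import pandas as pd


def multi_quake_tsunamis(matched):
    """Return the sorted tsunami indices (IDX) that were matched to more
    than one earthquake."""
    counts = matched.groupby("IDX").size()
    return sorted(counts[counts > 1].index.tolist())


# Toy data: tsunami 224 has two candidate earthquakes, 300 has only one
toy = pd.DataFrame({"IDX": [224, 224, 300],
                    "MAGNITUDE": [7.5, 7.5, 6.0]})
print(multi_quake_tsunamis(toy))  # [224]
```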

#### Tsunamis with more than one earthquake within the hour

| | DATE | LATITUDE | LONGITUDE | DEPTH | GAP | MAGNITUDE | MAGTYPE | DMIN | RMS | PLACE | TSUNAMI | IDX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | 1977-04-20 23:42:50.500 | -9.89 | 160.348 | 19.0 | | 7.5 | ms | | | Solomon Islands | 1 | 224 |
| 126 | 1977-04-20 23:49:13.100 | -9.844 | 160.822 | 33.0 | | 7.5 | ms | | | Solomon Islands | 1 | 224 |
| 174 | 1990-02-08 07:15:32.230 | 9.755 | 124.694 | 25.9 | | 6.8 | mw | | 1.4 | Bohol, Philippines | 1 | 294 |
| 175 | 1990-02-08 07:46:59.780 | 9.725 | 124.625 | 30.3 | | 6.6 | mw | | 1.2 | Bohol, Philippines | 1 | 294 |
| 179 | 1990-09-23 20:33:49.730 | -6.726 | 130.373 | 33.0 | | 6.5 | ms | | 1.1 | Banda Sea | 1 | 298 |
| 180 | 1990-09-23 21:13:07.460 | 33.267 | 138.643 | 10.0 | | 6.5 | mw | | 1.3 | Izu Islands, Japan region | 1 | 298 |
| 202 | 1994-06-05 01:09:30.150 | 24.511 | 121.905 | 11.4 | | 6.4 | mwb | | 1.2 | Taiwan | 1 | 322 |
| 203 | 1994-06-05 01:45:02.160 | -10.349 | 113.398 | 25.9 | | 6.1 | mw | | 1.3 | south of Java, Indonesia | 1 | 322 |
| 232 | 1996-10-19 14:44:40.790 | 31.885 | 131.468 | 22.0 | | 6.7 | mwc | | 1.02 | Kyushu, Japan | 1 | 352 |
| 233 | 1996-10-19 14:53:48.780 | -20.412 | -178.51 | 590.8 | | 6.9 | mwc | | 0.82 | Fiji region | 1 | 352 |


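For the tsunamis matched to two or more earthquakes, we check whether the tsunami's exact time (its recorded minute) identifies one of them, and discard from the list the earthquakes that did not produce the tsunami. When no candidate matches exactly, all of them are kept, since the right one cannot be determined.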

```python
# Rows whose tsunami (IDX) was matched to more than one earthquake
dupl = matched[matched.duplicated(subset=['IDX'], keep=False)]

# An earthquake is an exact match when its minute equals the tsunami's
# recorded MINUTE; a missing MINUTE (NaN) never matches.
quake_minute = pd.to_datetime(dupl['DATE']).dt.minute.to_numpy()
tsunami_minute = data.loc[dupl['IDX'].astype(int), 'MINUTE'].to_numpy()
exact = pd.Series(quake_minute == tsunami_minute, index=dupl.index)

# Tsunamis for which the exact earthquake was found
ready = dupl.loc[exact, 'IDX'].unique()

# For those tsunamis, discard the candidates that are not the exact match.
# Tsunamis with no exact match keep all their candidates.
to_drop = dupl[~exact & dupl['IDX'].isin(ready)].index
matched = matched.drop(index=to_drop)
```
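To make the rule described above concrete, here is a self-contained sketch on toy data, written as a hypothetical function `resolve_duplicates`: for each tsunami with several candidate earthquakes, keep only those whose minute equals the tsunami's recorded `MINUTE` if at least one does, and otherwise keep all of them. The toy dates and minutes are invented for illustration, and the variables are prefixed with `toy_` so they do not overwrite the notebook's `matched`.

```python
import datetime

import pandas as pd


def resolve_duplicates(matched, tsunamis):
    """Drop non-exact candidates for tsunamis that have an exact-minute match.

    matched: one row per (earthquake, tsunami) pair, with DATE and IDX columns.
    tsunamis: indexed by IDX, with a MINUTE column that may contain NaN.
    """
    dupl = matched[matched.duplicated(subset=["IDX"], keep=False)]
    quake_minute = pd.to_datetime(dupl["DATE"]).dt.minute.to_numpy()
    tsunami_minute = tsunamis.loc[dupl["IDX"], "MINUTE"].to_numpy()
    exact = pd.Series(quake_minute == tsunami_minute, index=dupl.index)
    ready = dupl.loc[exact, "IDX"].unique()
    to_drop = dupl[~exact & dupl["IDX"].isin(ready)].index
    return matched.drop(index=to_drop)


toy_tsunamis = pd.DataFrame({"MINUTE": [15.0, float("nan"), 0.0]},
                            index=[1, 2, 3])
toy_matched = pd.DataFrame({
    "DATE": [datetime.datetime(2000, 1, 1, 10, 15),
             datetime.datetime(2000, 1, 1, 10, 40),
             datetime.datetime(2001, 6, 1, 5, 5),
             datetime.datetime(2001, 6, 1, 5, 30),
             datetime.datetime(2002, 3, 3, 12, 0)],
    "IDX": [1, 1, 2, 2, 3],
})
print(resolve_duplicates(toy_matched, toy_tsunamis).index.tolist())  # [0, 2, 3, 4]
```

Tsunami 1 keeps only its 10:15 candidate; tsunami 2 has no recorded minute, so both of its candidates are kept; tsunami 3 had a single candidate and is untouched.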

Exporting the final set of matched earthquakes:

```python
matched = matched.drop(columns=['IDX'])
matched.to_csv('matched.csv', index=False)
```
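Because CSV stores dates as text, the `DATE` column comes back as strings when the file is read again, unless it is parsed explicitly. A minimal round-trip sketch, using a toy frame and a temporary file instead of the real `matched.csv`, shows `parse_dates` restoring a datetime column.

```python
import datetime
import os
import tempfile

import pandas as pd

toy_quakes = pd.DataFrame({
    "DATE": [datetime.datetime(2000, 1, 1, 10, 15, 30)],
    "MAGNITUDE": [7.5],
    "PLACE": ["Toy place"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_matched.csv")
    toy_quakes.to_csv(path, index=False)
    as_text = pd.read_csv(path)                          # DATE read as text
    as_dates = pd.read_csv(path, parse_dates=["DATE"])   # DATE parsed to datetime

print(as_text["DATE"].dtype)
print(as_dates["DATE"].dtype)
```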