IMT 2200 - Introducción a Ciencia de Datos<br>
**Pontificia Universidad Católica de Chile**<br>
**Instituto de Ingeniería Matemática y Computacional**<br>
**Semester 2025-S2**<br>
**Instructor:** Rodrigo A. Carrasco <br>

---

## Research Project: Earthquakes in Chile





## 1. Objectives

**- Obtain earthquake data for Chile**<br>
**- Identify the largest earthquakes in each region**<br>
**- Identify the zones of highest earthquake risk by comparing the Mercalli scale with the Richter scale**<br>
**- Build a model that, based on the statistics, can predict the likely risk areas**<br>
**- Estimate the average time between earthquakes, to predict how often one may occur**<br>


## Data

The data will be extracted from https://www.usgs.gov/programs/earthquake-hazards/earthquakes, mainly through the search page at https://earthquake.usgs.gov/earthquakes/search/, where earthquakes can be filtered by zone, year, and magnitude to generate a .csv file with the information.
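As a rough sketch of how the same kind of download could be scripted, the snippet below builds a query URL for the USGS FDSN event web service, which serves results in CSV format. The bounding box, date range, and minimum magnitude are illustrative assumptions, not the filters actually used for `query.csv`, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public USGS event service endpoint
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(start, end, min_magnitude, bbox):
    """Return a CSV query URL. bbox = (min_lat, max_lat, min_lon, max_lon)."""
    params = {
        "format": "csv",
        "starttime": start,
        "endtime": end,
        "minmagnitude": min_magnitude,
        "minlatitude": bbox[0],
        "maxlatitude": bbox[1],
        "minlongitude": bbox[2],
        "maxlongitude": bbox[3],
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Assumed values: a rough box around Chile and a 4.5 magnitude floor
url = build_query_url("2020-01-01", "2025-09-01", 4.5, (-56.0, -17.0, -76.0, -66.0))
print(url)
```

The resulting URL could then be passed to `pd.read_csv(url)` instead of a local file path, provided the query stays within the limits of the service.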

In [38]:
import pandas as pd
from pathlib import Path

## Data cleaning

Opening the downloaded file, we can see that the resulting DataFrame has more than 10,000 rows (10,407) and 22 columns, with information on time, location, and magnitude, among others.

In [39]:
ruta = Path("query.csv")
df = pd.read_csv(ruta) 
df.columns


Index(['time', 'latitude', 'longitude', 'depth', 'mag', 'magType', 'nst',
       'gap', 'dmin', 'rms', 'net', 'id', 'updated', 'place', 'type',
       'horizontalError', 'depthError', 'magError', 'magNst', 'status',
       'locationSource', 'magSource'],
      dtype='object')

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10407 entries, 0 to 10406
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   time             10407 non-null  object 
 1   latitude         10407 non-null  float64
 2   longitude        10407 non-null  float64
 3   depth            10407 non-null  float64
 4   mag              10407 non-null  float64
 5   magType          10407 non-null  object 
 6   nst              4033 non-null   float64
 7   gap              5882 non-null   float64
 8   dmin             2799 non-null   float64
 9   rms              8082 non-null   float64
 10  net              10407 non-null  object 
 11  id               10407 non-null  object 
 12  updated          10407 non-null  object 
 13  place            10407 non-null  object 
 14  type             10407 non-null  object 
 15  horizontalError  2563 non-null   float64
 16  depthError       5446 non-null   float64
 17  magError    

In [41]:
df.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2025-09-01T13:48:01.759Z,-30.6105,-71.2446,61.209,4.6,mb,50.0,71.0,0.342,0.81,...,2025-09-01T17:26:31.440Z,"4 km WSW of Ovalle, Chile",earthquake,5.13,5.452,0.109,25.0,reviewed,us,us
1,2025-08-29T21:48:43.387Z,-18.9492,-69.4232,114.517,4.6,mb,53.0,103.0,0.616,0.98,...,2025-08-30T03:24:45.040Z,"40 km N of Camiña, Chile",earthquake,5.16,5.411,0.079,49.0,reviewed,us,us
2,2025-08-25T18:01:03.119Z,-21.7142,-68.4897,118.465,4.7,mb,44.0,63.0,0.704,1.19,...,2025-08-25T18:16:37.040Z,"59 km SSW of Ollagüe, Chile",earthquake,4.54,5.964,0.06,85.0,reviewed,us,us
3,2025-08-24T03:44:08.460Z,-19.3888,-69.2674,100.534,5.0,mb,59.0,104.0,0.287,1.04,...,2025-08-24T15:32:52.214Z,"18 km ESE of Camiña, Chile",earthquake,6.76,5.716,0.029,391.0,reviewed,us,us
4,2025-08-23T09:07:59.103Z,-32.5045,-71.5221,53.111,4.7,mb,49.0,133.0,0.527,0.84,...,2025-08-24T15:09:12.040Z,"27 km WSW of La Ligua, Chile",earthquake,2.93,5.576,0.097,32.0,reviewed,us,us


Looking at the data, we start by splitting the "time" column into two new columns: "date", with the date of the earthquake, and "hour", with the time of day. Both values are parsed with `pd.to_datetime` and stored as formatted strings.

In [114]:
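# Work on a copy so the raw DataFrame stays untouched
df_copy = df.copy()
# Split the ISO 8601 timestamp at "T" into [date, time-of-day] parts
df_copy["dates"] = df_copy["time"].apply(lambda x: x.split("T"))
# Store the date as a "YYYY-MM-DD" string
df_copy["date"] = df_copy["dates"].apply(lambda x: pd.to_datetime(x[0]).strftime("%Y-%m-%d"))
# Drop the trailing "Z" and store the time as an "HH:MM:SS" string
df_copy["hour"] = df_copy["dates"].apply(lambda x: pd.to_datetime(x[1][:-1]).strftime("%H:%M:%S"))
df_copy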

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,horizontalError,depthError,magError,magNst,status,locationSource,magSource,dates,date,hour
0,2025-09-01T13:48:01.759Z,-30.6105,-71.2446,61.209,4.60,mb,50.0,71.0,0.342,0.81,...,5.13,5.452,0.109,25.0,reviewed,us,us,"[2025-09-01, 13:48:01.759Z]",2025-09-01,13:48:01
1,2025-08-29T21:48:43.387Z,-18.9492,-69.4232,114.517,4.60,mb,53.0,103.0,0.616,0.98,...,5.16,5.411,0.079,49.0,reviewed,us,us,"[2025-08-29, 21:48:43.387Z]",2025-08-29,21:48:43
2,2025-08-25T18:01:03.119Z,-21.7142,-68.4897,118.465,4.70,mb,44.0,63.0,0.704,1.19,...,4.54,5.964,0.060,85.0,reviewed,us,us,"[2025-08-25, 18:01:03.119Z]",2025-08-25,18:01:03
3,2025-08-24T03:44:08.460Z,-19.3888,-69.2674,100.534,5.00,mb,59.0,104.0,0.287,1.04,...,6.76,5.716,0.029,391.0,reviewed,us,us,"[2025-08-24, 03:44:08.460Z]",2025-08-24,03:44:08
4,2025-08-23T09:07:59.103Z,-32.5045,-71.5221,53.111,4.70,mb,49.0,133.0,0.527,0.84,...,2.93,5.576,0.097,32.0,reviewed,us,us,"[2025-08-23, 09:07:59.103Z]",2025-08-23,09:07:59
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10402,1906-10-02T14:33:45.430Z,-32.1750,-71.7630,15.000,6.35,mw,,,,,...,,13.100,0.200,,reviewed,iscgem,iscgem,"[1906-10-02, 14:33:45.430Z]",1906-10-02,14:33:45
10403,1906-08-19T09:34:08.360Z,-39.2280,-72.7210,15.000,6.74,mw,,,,,...,,25.000,0.200,,reviewed,iscgemsup,iscgemsup,"[1906-08-19, 09:34:08.360Z]",1906-08-19,09:34:08
10404,1906-08-17T00:40:04.250Z,-32.4000,-71.4000,35.000,8.20,mw,,,,,...,,11.100,0.200,,reviewed,iscgemsup,iscgemsup,"[1906-08-17, 00:40:04.250Z]",1906-08-17,00:40:04
10405,1904-12-11T17:05:42.720Z,-32.3120,-73.7060,10.000,6.72,mw,,,,,...,,25.000,0.230,,reviewed,iscgemsup,iscgemsup,"[1904-12-11, 17:05:42.720Z]",1904-12-11,17:05:42
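The row-by-row split above could also be written as a single vectorized parse. The following is a minimal sketch on a two-row toy frame (`sample` is a made-up name) that reuses two timestamps from the output; it assumes every value in "time" follows the same ISO 8601 layout.

```python
import pandas as pd

# Toy frame with the same timestamp layout as the USGS export
sample = pd.DataFrame(
    {"time": ["2025-09-01T13:48:01.759Z", "1906-08-17T00:40:04.250Z"]}
)

# Parse the whole column once as timezone-aware UTC timestamps
parsed = pd.to_datetime(sample["time"], utc=True)

# Format the date and time-of-day parts as strings, as in the cell above
sample["date"] = parsed.dt.strftime("%Y-%m-%d")
sample["hour"] = parsed.dt.strftime("%H:%M:%S")
print(sample)
```

Keeping `parsed` as a datetime column, rather than only the formatted strings, would also make later steps such as computing the time between events simpler.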


In [115]:
# List the distinct values in the column
df_copy["status"].unique()

array(['reviewed', 'automatic'], dtype=object)

In [116]:
# List the distinct values in the column
df_copy["locationSource"].unique()

array(['us', 'guc', 'sja', 'us_guc', 'us_sja', 'iscgem', 'iscgemsup'],
      dtype=object)

In [117]:
# List the distinct values in the column
df_copy["net"].unique()

array(['us', 'iscgem', 'official', 'iscgemsup'], dtype=object)

In [118]:
# List the distinct values in the column
df_copy["magSource"].unique()

array(['us', 'guc', 'gcmt', 'us_guc', 'sja', 'iscgem', 'official', 'hrv',
       'san', 'nc', 'iscgemsup'], dtype=object)

In [119]:
# List the distinct values in the column
df_copy["type"].unique()

array(['earthquake'], dtype=object)

In [120]:
df_copy[["time", "updated"]]

Unnamed: 0,time,updated
0,2025-09-01T13:48:01.759Z,2025-09-01T17:26:31.440Z
1,2025-08-29T21:48:43.387Z,2025-08-30T03:24:45.040Z
2,2025-08-25T18:01:03.119Z,2025-08-25T18:16:37.040Z
3,2025-08-24T03:44:08.460Z,2025-08-24T15:32:52.214Z
4,2025-08-23T09:07:59.103Z,2025-08-24T15:09:12.040Z
...,...,...
10402,1906-10-02T14:33:45.430Z,2022-04-25T20:40:47.117Z
10403,1906-08-19T09:34:08.360Z,2022-05-09T15:12:00.619Z
10404,1906-08-17T00:40:04.250Z,2024-12-13T06:53:22.309Z
10405,1904-12-11T17:05:42.720Z,2022-05-09T15:22:32.530Z


Looking at the data, we find five columns that add nothing to our investigation. "locationSource", "net", and "magSource" only describe where the information came from. The "type" column contains a single category, earthquake. Finally, "updated" is just the date on which the record was last updated. Since these columns are not useful for our investigation, we drop them, together with "status" and the now-redundant "time" and "dates" columns.

In [121]:
df_copy = df_copy.drop(columns= ["time", "dates", "status", "locationSource", "net", "updated", "magSource", "type"])
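One way to confirm that a column such as "type" carries no information is to count its distinct values. The sketch below does this on a small made-up frame (`toy` and its values are illustrative, not the real data).

```python
import pandas as pd

# Made-up rows that mimic three of the USGS columns
toy = pd.DataFrame(
    {
        "mag": [4.6, 5.0, 8.2],
        "type": ["earthquake", "earthquake", "earthquake"],
        "net": ["us", "us", "iscgem"],
    }
)

# A column with a single distinct value cannot distinguish any rows
constant_cols = [col for col in toy.columns if toy[col].nunique(dropna=False) == 1]
print(constant_cols)

# Drop the constant columns and keep the rest
trimmed = toy.drop(columns=constant_cols)
print(trimmed.columns.tolist())
```

Columns like "net" have more than one value, so this check would not flag them; dropping those remains a judgment about relevance to the research questions.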

In [122]:
df_copy

Unnamed: 0,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,id,place,horizontalError,depthError,magError,magNst,date,hour
0,-30.6105,-71.2446,61.209,4.60,mb,50.0,71.0,0.342,0.81,us7000qt0p,"4 km WSW of Ovalle, Chile",5.13,5.452,0.109,25.0,2025-09-01,13:48:01
1,-18.9492,-69.4232,114.517,4.60,mb,53.0,103.0,0.616,0.98,us7000qs9z,"40 km N of Camiña, Chile",5.16,5.411,0.079,49.0,2025-08-29,21:48:43
2,-21.7142,-68.4897,118.465,4.70,mb,44.0,63.0,0.704,1.19,us7000qqvr,"59 km SSW of Ollagüe, Chile",4.54,5.964,0.060,85.0,2025-08-25,18:01:03
3,-19.3888,-69.2674,100.534,5.00,mb,59.0,104.0,0.287,1.04,us6000r3fa,"18 km ESE of Camiña, Chile",6.76,5.716,0.029,391.0,2025-08-24,03:44:08
4,-32.5045,-71.5221,53.111,4.70,mb,49.0,133.0,0.527,0.84,us6000r3as,"27 km WSW of La Ligua, Chile",2.93,5.576,0.097,32.0,2025-08-23,09:07:59
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10402,-32.1750,-71.7630,15.000,6.35,mw,,,,,iscgem610548661,"58 km WNW of La Ligua, Chile",,13.100,0.200,,1906-10-02,14:33:45
10403,-39.2280,-72.7210,15.000,6.74,mw,,,,,iscgemsup16957914,"17 km NNW of Loncoche, Chile",,25.000,0.200,,1906-08-19,09:34:08
10404,-32.4000,-71.4000,35.000,8.20,mw,,,,,iscgemsup16957911,"The 1906 Valparaiso, Chile Earthquake",,11.100,0.200,,1906-08-17,00:40:04
10405,-32.3120,-73.7060,10.000,6.72,mw,,,,,iscgemsup610548593,"210 km WNW of Valparaíso, Chile",,25.000,0.230,,1904-12-11,17:05:42


Since our investigation covers only Chile, we now filter on the "place" column, keeping only the rows whose value contains the name Chile.

In [123]:
df_copy = df_copy[df_copy["place"].str.contains("Chile", na=False)].reset_index()
df_copy

Unnamed: 0,index,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,id,place,horizontalError,depthError,magError,magNst,date,hour
0,0,-30.6105,-71.2446,61.209,4.60,mb,50.0,71.0,0.342,0.81,us7000qt0p,"4 km WSW of Ovalle, Chile",5.13,5.452,0.109,25.0,2025-09-01,13:48:01
1,1,-18.9492,-69.4232,114.517,4.60,mb,53.0,103.0,0.616,0.98,us7000qs9z,"40 km N of Camiña, Chile",5.16,5.411,0.079,49.0,2025-08-29,21:48:43
2,2,-21.7142,-68.4897,118.465,4.70,mb,44.0,63.0,0.704,1.19,us7000qqvr,"59 km SSW of Ollagüe, Chile",4.54,5.964,0.060,85.0,2025-08-25,18:01:03
3,3,-19.3888,-69.2674,100.534,5.00,mb,59.0,104.0,0.287,1.04,us6000r3fa,"18 km ESE of Camiña, Chile",6.76,5.716,0.029,391.0,2025-08-24,03:44:08
4,4,-32.5045,-71.5221,53.111,4.70,mb,49.0,133.0,0.527,0.84,us6000r3as,"27 km WSW of La Ligua, Chile",2.93,5.576,0.097,32.0,2025-08-23,09:07:59
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9479,10402,-32.1750,-71.7630,15.000,6.35,mw,,,,,iscgem610548661,"58 km WNW of La Ligua, Chile",,13.100,0.200,,1906-10-02,14:33:45
9480,10403,-39.2280,-72.7210,15.000,6.74,mw,,,,,iscgemsup16957914,"17 km NNW of Loncoche, Chile",,25.000,0.200,,1906-08-19,09:34:08
9481,10404,-32.4000,-71.4000,35.000,8.20,mw,,,,,iscgemsup16957911,"The 1906 Valparaiso, Chile Earthquake",,11.100,0.200,,1906-08-17,00:40:04
9482,10405,-32.3120,-73.7060,10.000,6.72,mw,,,,,iscgemsup610548593,"210 km WNW of Valparaíso, Chile",,25.000,0.230,,1904-12-11,17:05:42
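The filter above can be tried on a small made-up frame to see how it treats missing values and the index. In this sketch the Peru row and the missing value are invented for illustration. Passing `drop=True` to `reset_index` discards the old index instead of keeping it as the extra "index" column seen in the output above.

```python
import pandas as pd

# Invented rows: two in Chile, one elsewhere, one with a missing place
toy = pd.DataFrame(
    {
        "place": [
            "4 km WSW of Ovalle, Chile",
            "10 km N of Tacna, Peru",
            None,
            "The 1906 Valparaiso, Chile Earthquake",
        ],
        "mag": [4.6, 5.1, 4.9, 8.2],
    }
)

# na=False treats missing places as non-matches instead of propagating NaN
mask = toy["place"].str.contains("Chile", na=False)

# drop=True renumbers the rows without adding an "index" column
chile = toy[mask].reset_index(drop=True)
print(chile)
```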


In [124]:
df_copy["place"] = df_copy["place"].apply(lambda x: x.split(",")[0])

In [125]:
df_copy

Unnamed: 0,index,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,id,place,horizontalError,depthError,magError,magNst,date,hour
0,0,-30.6105,-71.2446,61.209,4.60,mb,50.0,71.0,0.342,0.81,us7000qt0p,4 km WSW of Ovalle,5.13,5.452,0.109,25.0,2025-09-01,13:48:01
1,1,-18.9492,-69.4232,114.517,4.60,mb,53.0,103.0,0.616,0.98,us7000qs9z,40 km N of Camiña,5.16,5.411,0.079,49.0,2025-08-29,21:48:43
2,2,-21.7142,-68.4897,118.465,4.70,mb,44.0,63.0,0.704,1.19,us7000qqvr,59 km SSW of Ollagüe,4.54,5.964,0.060,85.0,2025-08-25,18:01:03
3,3,-19.3888,-69.2674,100.534,5.00,mb,59.0,104.0,0.287,1.04,us6000r3fa,18 km ESE of Camiña,6.76,5.716,0.029,391.0,2025-08-24,03:44:08
4,4,-32.5045,-71.5221,53.111,4.70,mb,49.0,133.0,0.527,0.84,us6000r3as,27 km WSW of La Ligua,2.93,5.576,0.097,32.0,2025-08-23,09:07:59
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9479,10402,-32.1750,-71.7630,15.000,6.35,mw,,,,,iscgem610548661,58 km WNW of La Ligua,,13.100,0.200,,1906-10-02,14:33:45
9480,10403,-39.2280,-72.7210,15.000,6.74,mw,,,,,iscgemsup16957914,17 km NNW of Loncoche,,25.000,0.200,,1906-08-19,09:34:08
9481,10404,-32.4000,-71.4000,35.000,8.20,mw,,,,,iscgemsup16957911,The 1906 Valparaiso,,11.100,0.200,,1906-08-17,00:40:04
9482,10405,-32.3120,-73.7060,10.000,6.72,mw,,,,,iscgemsup610548593,210 km WNW of Valparaíso,,25.000,0.230,,1904-12-11,17:05:42
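If the locality itself is needed later, for example to group events by area, the trimmed "place" strings could be split further. The following is only a sketch: the regular expression and the column names `distance_km`, `direction`, and `locality` are assumptions, and rows that do not follow the "N km DIR of Town" pattern, such as "The 1906 Valparaiso", are left as missing.

```python
import pandas as pd

# Three place strings copied from the output above
toy = pd.DataFrame(
    {"place": ["4 km WSW of Ovalle", "210 km WNW of Valparaíso", "The 1906 Valparaiso"]}
)

# Named groups become the column names of the extracted frame
pattern = r"^(?P<distance_km>\d+) km (?P<direction>[NSEW]+) of (?P<locality>.+)$"
parts = toy["place"].str.extract(pattern)
print(parts)
```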


Several columns contain null values, but we will not discard them yet because we do not want to lose information. Instead, depending on the question, we will drop some of the columns with null values.
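A quick way to decide which columns to drop for a given question is to measure the share of missing values per column. The sketch below uses a made-up three-row frame; the column names match the dataset, but the values are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up values with gaps, mimicking the sparsely populated columns
toy = pd.DataFrame(
    {
        "mag": [4.6, 5.0, 8.2],
        "nst": [50.0, np.nan, np.nan],
        "gap": [71.0, 104.0, np.nan],
    }
)

# Fraction of missing values per column, most incomplete first
null_share = toy.isna().mean().sort_values(ascending=False)
print(null_share)

# For one question, keep only the columns it needs and drop incomplete rows
subset = toy[["mag", "gap"]].dropna()
print(len(subset))
```

Selecting the columns before calling `dropna` keeps as many rows as possible, since gaps in unrelated columns no longer remove a row.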

## Question 1:
