This project looks at where earthquakes and other events occurred around the world.

Data taken from: https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php




In [None]:
import pandas as pd
import requests
from io import StringIO
import geopandas as gpd
import matplotlib.pyplot as plt
import glob
from shapely.geometry import Point, Polygon
from pyproj import CRS
import numpy as np

!python --version
print('Pandas Version:', pd.__version__)
print('Requests Version:', requests.__version__)
print('Geopandas Version:', gpd.__version__)

Imports and versions:


In [None]:
# https://stackoverflow.com/questions/56611698/pandas-how-to-read-csv-file-from-google-drive-public
def import_data(url):
    file_id = url.split('/')[-2]
    download = 'https://drive.google.com/uc?export=download&id=' + file_id
    url = requests.get(download).text
    raw = StringIO(url)
    return pd.read_csv(raw)


file_1 = import_data('https://drive.google.com/file/d/1UZnCkoibG6G9c8Txn36T-09zi7q8qnj1/view?usp=sharing')
file_2 = import_data('https://drive.google.com/file/d/10zxNYvxIpQhkasEXUEQBVd9PlmIxZUuE/view?usp=sharing')
file_3 = import_data('https://drive.google.com/file/d/1X4Np6OX-73oohyJ7jz2v2vM1xzbb3m6P/view?usp=sharing')

Retrieving the files from Google Drive and converting them to dataframes. The source only allows downloading a month's worth of data at a time, so I uploaded files I had downloaded previously to have more data to work with.
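
To make the link handling concrete, here is a minimal sketch of how a share link is turned into a direct-download URL, without any network access. The helper name `drive_download_url` is hypothetical and the file ID is a placeholder; it only mirrors the string handling inside `import_data`.

```python
def drive_download_url(share_url):
    """Build a direct-download URL from a Google Drive share link (hypothetical helper)."""
    # A share link looks like .../file/d/<file_id>/view?usp=sharing,
    # so the ID is the second-to-last segment when splitting on '/'.
    file_id = share_url.split('/')[-2]
    return 'https://drive.google.com/uc?export=download&id=' + file_id


# 'EXAMPLE_FILE_ID' is a placeholder, not a real file.
example = drive_download_url('https://drive.google.com/file/d/EXAMPLE_FILE_ID/view?usp=sharing')
print(example)
```

This relies on the link keeping the `/file/d/<id>/view` shape; a link in a different format would need different parsing.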

In [None]:
frames = [file_1, file_2, file_3]
df = pd.concat(frames)

Combining the files into one dataframe.

In [None]:
df.info()

There are 31,742 entries, but because the files were not downloaded at exact intervals, their time windows overlap and I believe there are duplicate entries.

In [None]:
df.nunique()

Looking at the number of unique values in each column confirms this. I know from the source that id is unique to each event, so I'll drop any rows with a duplicate id.
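
The same check can be reproduced on toy data. This is a small sketch with made-up ids and magnitudes, assuming only that `id` identifies an event: when two downloads overlap, the combined frame has more rows than unique ids.

```python
import pandas as pd

# Two toy "downloads" whose time windows overlap: event 'b' appears in both.
first = pd.DataFrame({'id': ['a', 'b'], 'mag': [1.2, 4.5]})
second = pd.DataFrame({'id': ['b', 'c'], 'mag': [4.5, 2.0]})

combined = pd.concat([first, second], ignore_index=True)

# Fewer unique ids than rows means at least one event is repeated.
n_rows = len(combined)
n_ids = combined['id'].nunique()
print(n_rows, n_ids)

# Keep the first occurrence of each id.
deduped = combined.drop_duplicates(subset='id')
print(len(deduped))
```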

In [None]:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
df.drop_duplicates(subset='id', inplace=True)

In [None]:
df.info()

In [None]:
# https://www.w3resource.com/python-exercises/pandas/datetime/pandas-datetime-exercise-3.php
df['time'] = pd.to_datetime(df['time'], utc=True)  # parse the ISO 8601 strings as UTC timestamps
print('First:', df.time.min())
print('Last:', df.time.max())
print('Days:', (df.time.max()-df.time.min()).days)

Now there are only 25,449 entries, ranging from Jan 23 to Apr 1: 68 days in total.
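
As a sanity check on the day count, here is a small sketch with made-up timestamps written in the ISO 8601 style of the feed. The year and the times of day are assumptions for illustration, not values from the data.

```python
import pandas as pd

# Made-up timestamps; the year is an assumption.
times = pd.Series([
    '2021-01-23T00:10:00.000Z',
    '2021-03-01T12:00:00.000Z',
    '2021-04-01T23:00:00.000Z'])
parsed = pd.to_datetime(times, utc=True)

span = parsed.max() - parsed.min()
# .days keeps only whole days and drops the leftover hours.
print(span.days)
```

Because `.days` truncates, a span of 68 days and 22 hours is still reported as 68.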

In [None]:
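# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; newer versions need a local Natural Earth file.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))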

I use a built-in world map from geopandas for plotting.

In [None]:
# https://towardsdatascience.com/plotting-maps-with-geopandas-428c97295a73
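# The coordinates are longitude/latitude in degrees, so declare them as WGS 84 (EPSG:4326).
# ESRI:54009 (Mollweide) is a projected CRS in metres and would mislabel this data.
crs = CRS("EPSG:4326")
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]
geo_df = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)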

In [None]:
geo_df.head()

Creating a new GeoDataFrame with the longitude and latitude combined into a single geometry column.
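
A minimal sketch of the difference between declaring a CRS and reprojecting to one, assuming the longitude/latitude values are WGS 84 degrees (EPSG:4326). The coordinates are made up, and `toy_geo` and `toy_moll` are illustrative names.

```python
import geopandas as gpd
import pandas as pd

# Toy events; the coordinates are made up for illustration.
toy = pd.DataFrame({'longitude': [0.0, -122.4], 'latitude': [0.0, 37.8]})

# Declare the CRS the numbers are already in: WGS 84 longitude/latitude.
toy_geo = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy['longitude'], toy['latitude']),
    crs='EPSG:4326')

# Reproject to Mollweide (metres) only when a plot or calculation needs it.
toy_moll = toy_geo.to_crs('ESRI:54009')
print(toy_geo.geometry.iloc[1], toy_moll.geometry.iloc[1])
```

Passing `crs=` to the constructor only labels the coordinates, while `to_crs` actually transforms them. Layers drawn on the same axes should share one CRS.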

In [None]:
# https://towardsdatascience.com/plotting-maps-with-geopandas-428c97295a73
fig, ax = plt.subplots(figsize = (20, 20))
world.to_crs(epsg=4326).plot(ax=ax, color='lightgrey')
geo_df.plot(
    ax=ax, 
    alpha = .5,
    markersize = 2)
ax.set_title('Earthquakes and Events')

The bulk of the reported events occurred in the United States, which makes sense since this data is from the USGS (United States Geological Survey). I would presume that most of the detection equipment is located in the US. Almost all the events in the rest of the world lie on a tectonic plate boundary, as expected.

In [None]:
# https://towardsdatascience.com/plotting-maps-with-geopandas-428c97295a73
fig, ax = plt.subplots(figsize = (30, 20))
world.to_crs(epsg=4326).plot(ax=ax, color='lightgrey')
geo_df.plot(
    column = 'mag',
    ax=ax,
    cmap = 'rainbow',
    legend = True,
    legend_kwds={'shrink': 0.3},
    alpha = 1,
    markersize = 30)
ax.set_title('Earthquake and Event Magnitudes')

Looking at the magnitudes for events around the world, most of the events in the US have a lower magnitude, 2 and below, while most events elsewhere are around 4 and above. If most of the USGS detection equipment is in the US, it makes sense that it would detect lower magnitudes in the US and only higher ones elsewhere.
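
One way to back this reading with numbers instead of colors is to summarize magnitude per region. This is a sketch on made-up data; the `in_us` flag is a hypothetical column standing in for a real inside/outside test.

```python
import pandas as pd

# Toy data; `in_us` is a hypothetical flag and the magnitudes are made up.
events = pd.DataFrame({
    'mag':   [0.8, 1.5, 1.9, 2.4, 4.3, 4.6, 5.1],
    'in_us': [True, True, True, True, False, False, False]})

# Median, minimum and count of magnitudes per region.
summary = events.groupby('in_us')['mag'].agg(['median', 'min', 'count'])
print(summary)
```

If the detection theory holds, the minimum outside the US should sit well above the minimum inside it, since small events far from the sensors would go unrecorded.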

In [None]:
fig, ax = plt.subplots(figsize = (20, 20))
world.to_crs(epsg=4326).plot(ax=ax, color='lightgrey')
geo_df.plot(
    column = 'type',
    ax=ax,
    cmap = 'rainbow',
    legend = True,
    # legend_kwds={'shrink': 0.3},
    alpha = 1,
    markersize = 5)
ax.set_title('Event Types')

It's a little hard to see, but it looks like events other than earthquakes were detected only in the US. The map is cluttered, but quarry blasts and explosions appear to be the most common non-earthquake events.

In [None]:
df.groupby('type')['mag'].value_counts(bins=1).sort_index().to_frame()

This confirms it: the most common non-earthquake events are quarry blasts, followed by explosions. They have relatively low magnitudes, so going by my theory they would not be detected outside the US.
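
An alternative way to get the same two facts (how many events of each type, and how large they are) is sketched below on made-up data. It uses `value_counts` and `agg` instead of the binned count in the cell above.

```python
import pandas as pd

# Toy data with made-up values, using the same column names as the feed.
toy = pd.DataFrame({
    'type': ['earthquake', 'earthquake', 'quarry blast',
             'quarry blast', 'explosion', 'earthquake'],
    'mag':  [4.5, 1.1, 1.6, 1.3, 1.8, 2.2]})

non_quakes = toy[toy['type'] != 'earthquake']

# Events per type, most common first.
counts = non_quakes['type'].value_counts()
# Magnitude range per type.
mag_range = non_quakes.groupby('type')['mag'].agg(['min', 'max'])
print(counts)
print(mag_range)
```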

In [None]:
fig, ax = plt.subplots(figsize = (30, 20))
world.to_crs(epsg=4326).plot(ax=ax, color='lightgrey')
geo_df.plot(
    column = 'depth',
    ax=ax,
    cmap = 'rainbow',
    legend = True,
    legend_kwds={'shrink': 0.3},
    alpha = 1,
    markersize = 5)
ax.set_title('Depth')

The source states that depth is the value most prone to error and guesswork. It is recorded in kilometers, so 600 means 600 km down. Most events are recorded as shallow, with only quakes in Indonesia and the Pacific generally being deeper than 400 km.
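
The depth distribution can also be read off as counts per depth band. This is a sketch on made-up depths; the bin edges and labels are chosen for illustration and do not come from the source.

```python
import pandas as pd

# Made-up depths in km, including one negative value.
depth = pd.Series([-1.2, 3.0, 9.8, 35.0, 120.0, 410.0, 580.0])

# Bin edges are an illustrative choice; intervals are closed on the right.
bins = [-10, 0, 70, 300, 700]
labels = ['negative', '0-70 km', '70-300 km', '300-700 km']
binned = pd.cut(depth, bins=bins, labels=labels)

band_counts = binned.value_counts().sort_index()
deep = (depth > 400).sum()
print(band_counts)
print('Deeper than 400 km:', deep)
```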

In [None]:
USA = world[world.name == 'United States of America']
usa_shape = USA['geometry'].values[0]

# https://stackoverflow.com/questions/63369715/filter-a-geopandas-dataframe-within-a-polygon-and-remove-from-the-dataframe-the
# Flag every event whose point lies inside the US outline (vectorized, no Python loop)
geo_df['within'] = geo_df.geometry.within(usa_shape)


In [None]:
USA_quakes = geo_df[geo_df['within'] == True]

Creating a new GeoDataFrame of just the US outline and keeping only the events that fall inside it.
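
The point-in-polygon test behind this filter can be seen in isolation with shapely. In this sketch a made-up rectangle stands in for the country outline, and both points are illustrative.

```python
from shapely.geometry import Point, box

# A toy rectangle standing in for a country outline (bounds are made up).
region = box(-125.0, 25.0, -65.0, 50.0)

# Points are built as (longitude, latitude), matching the x/y order of the polygon.
points = [Point(-118.2, 34.1), Point(139.7, 35.7)]
flags = [pt.within(region) for pt in points]
print(flags)
```

`within` is true only for points in the interior, so a point lying exactly on the boundary is not counted. A low-resolution outline can also miss events just off the coast.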

In [None]:
fig, ax = plt.subplots(figsize = (30, 20))
USA.to_crs(epsg=4326).plot(ax=ax, facecolor='none', edgecolor='black')
USA_quakes.plot(
    ax=ax,
    alpha = .1,
    markersize = 20)
ax.set_title('Earthquakes in the US')

Going into more detail in the US, most events happen in California, Alaska, and Hawaii. This makes sense, since they are the most geologically active states.

In [None]:
fig, ax = plt.subplots(figsize = (30, 20))
USA.to_crs(epsg=4326).plot(ax=ax, facecolor='none', edgecolor='black')
USA_quakes.plot(
    ax=ax,
    column = ('depth'),
    cmap = 'rainbow',
    legend = True,
    legend_kwds={'shrink': 0.3},
    alpha = .1,
    markersize = 20)
ax.set_title('Depth in the US')

Like the previous depth map, most events are recorded as being close to the surface with only Alaska generally having more than 100 km depth.

In [None]:
neg_dep = USA_quakes[USA_quakes['depth'] < 0]
fig, ax = plt.subplots(figsize = (30, 20))
USA.to_crs(epsg=4326).plot(ax=ax, facecolor='none', edgecolor='black')
neg_dep.plot(
    ax=ax,
    column = ('depth'),
    cmap = 'rainbow',
    legend = True,
    legend_kwds={'shrink': 0.3},
    markersize = 20)
ax.set_title('Negative Depth in the US')

In the previous project I noted that there were negative depths for some events and speculated that they corresponded to events in the mountains. Going by this map, that seems to have been correct, as events with a negative depth occur in the Rockies, the Alaskan mountains, and Hawaii's mountains.

In [None]:
no_USA_quakes = USA_quakes[USA_quakes['type'] != 'earthquake']
fig, ax = plt.subplots(figsize = (30, 20))
USA.to_crs(epsg=4326).plot(ax=ax, facecolor="none", edgecolor="black")
no_USA_quakes.plot(
    column = 'type',
    ax=ax,
    cmap = 'rainbow',
    legend = True,
    # legend_kwds={'shrink': 0.3},
    alpha = .8,
    markersize = 50)
ax.set_title('Event Types')

Most of the explosions occur in the Pacific Northwest. There are a lot of quarry blasts in California, Arizona, North Texas and Oklahoma, and Montana. There was a mine collapse in West Virginia/Kentucky and a mining explosion in Arizona. Ice quakes occur only in Alaska, which also has explosions and other events.

It's interesting that the Pacific Northwest has mostly just explosions. I don't know why. It makes sense that only Alaska has ice quakes. I'm curious as to what the other events in Alaska are.