# Preparing data on earthquakes

This is one of the Jupyter notebooks I used in my preparation of *Probably Overthinking It: How to Use Data to Answer Questions, Avoid Statistical Traps, and Make Better Decisions*.

The book is scheduled to be published by University of Chicago Press in 2023.
If you would like to get infrequent email announcements about the book, please
[sign up for my mailing list](http://eepurl.com/h0nfbX).



[Click here to run this notebook on Colab](https://colab.research.google.com/github/AllenDowney/ProbablyOverthinkingIt/blob/book/notebooks/clean_quake.ipynb).

In [2]:
from os.path import basename, exists
from urllib.request import urlretrieve

def download(url):
    """Download url into the current directory unless the file already exists."""
    filename = basename(url)
    if not exists(filename):
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

In [3]:
download("https://scedc.caltech.edu/ftp/catalogs/SCSN/SCSN_catalogs.tar.gz")

In [4]:
!ls -lh SCSN_catalogs.tar.gz

-rw-rw-r-- 1 downey downey 19M Mar  1 19:53 SCSN_catalogs.tar.gz


In [5]:
!tar -xzf SCSN_catalogs.tar.gz
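
The `!tar` command depends on a shell that provides `tar`. Where that is not available, the standard-library `tarfile` module can do the same job. The following is a minimal sketch, not part of the original notebook: `extract_archive` is a hypothetical helper name, and the demonstration builds a tiny placeholder archive in a temporary directory instead of touching the real download.

```python
import os
import tarfile
import tempfile

def extract_archive(path, dest='.'):
    """Extract a gzip-compressed tar archive into dest; return the member names."""
    with tarfile.open(path, 'r:gz') as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names

# Demonstration on a throwaway archive (placeholder content, not real catalog data)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'example.catalog')
    with open(src, 'w') as f:
        f.write('placeholder\n')

    archive = os.path.join(tmp, 'example.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(src, arcname='SCSN/example.catalog')

    out = os.path.join(tmp, 'out')
    os.makedirs(out)
    names = extract_archive(archive, out)
    extracted_ok = os.path.exists(os.path.join(out, 'SCSN', 'example.catalog'))
```

For the archive used here, the equivalent call would be `extract_archive('SCSN_catalogs.tar.gz')`.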

In [1]:
import pandas as pd

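quake_dfs = []
for i in range(1981, 2023):
    # one fixed-width catalog file per year, 1981 through 2022
    filename = f'SCSN/{i}.catalog'
    # skip the first 9 lines of each file and infer the column boundaries
    df = pd.read_fwf(filename, colspecs='infer', skiprows=9)

    # drop the last row
    n = len(df)
    df.drop(n-1, inplace=True)

    # year, row count before the drop, missing magnitudes, smallest magnitude
    print(i, n, df['MAG'].isna().sum(), df['MAG'].min())
    quake_dfs.append(df)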
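
To see what the loop does to a single file, here is a small sketch that applies the same two steps (`read_fwf` with `skiprows`, then dropping the last row) to an in-memory sample. The sample text is made up for illustration: its column names, its two header lines, and its trailing row are assumptions and do not reproduce the real SCSN catalog layout. `read_catalog` is a hypothetical helper name.

```python
import io
import pandas as pd

# Synthetic fixed-width text: two header lines to skip, a column header,
# three data rows, and a made-up trailing row that should be discarded.
sample = """\
# header line 1
# header line 2
YEAR MM DD  MAG
1981 01 01 2.10
1981 01 02 3.45
1981 01 03 1.80
9999 99 99 0.00
"""

def read_catalog(buffer, skiprows):
    """Read fixed-width text, skipping leading lines and dropping the last row."""
    df = pd.read_fwf(buffer, colspecs='infer', skiprows=skiprows)
    # with the default integer index, the last row has label len(df) - 1
    return df.drop(len(df) - 1)

df = read_catalog(io.StringIO(sample), skiprows=2)
print(df)
```

With the real files, the same helper would be called with a filename and `skiprows=9`, matching the loop above.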

In [7]:
quake = pd.concat(quake_dfs)
quake.shape

(803451, 14)

In [8]:
# keep only the date fields and the magnitude
columns = ['#YYY', 'MM', 'DD', 'MAG']
quake[columns].to_csv('quake.csv', index=False)

In [9]:
!ls -lh quake.csv

-rw-rw-r-- 1 downey downey 12M Mar  1 20:00 quake.csv
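
A file with these four columns can be read back and given a single date column. The sketch below is one possible approach and not part of the original notebook. It uses a two-row in-memory sample with invented values in place of `quake.csv`, and it assumes the year, month, and day columns parse as integers. `load_quakes` is a hypothetical helper name.

```python
import io
import pandas as pd

# Stand-in for quake.csv: same column names, invented values
csv_text = """#YYY,MM,DD,MAG
1981,1,1,2.1
1981,1,2,3.45
"""

def load_quakes(source):
    """Read the CSV and add a 'date' column built from year, month, and day."""
    df = pd.read_csv(source)
    # pd.to_datetime assembles dates from columns named year, month, day
    parts = df.rename(columns={'#YYY': 'year', 'MM': 'month', 'DD': 'day'})
    df['date'] = pd.to_datetime(parts[['year', 'month', 'day']])
    return df

quakes = load_quakes(io.StringIO(csv_text))
print(quakes)
```

To load the real file, pass `'quake.csv'` instead of the `StringIO` object.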


Probably Overthinking It

Copyright 2022 Allen Downey 

The code in this notebook and `utils.py` is under the [MIT license](https://mit-license.org/).