Gmoog/data512_finalproject

Data 512 A-6 Final Project - Predicting Earthquakes

Abstract

The overall goal of this project is to analyze publicly available earthquake data and apply various machine learning algorithms to try to predict future occurrences.

All the code used for cleaning, prepping and transforming the data, as well as for creating the visualizations and running the algorithms, is written in Python and is contained in the Jupyter notebook 'Data512_Final_Project.ipynb'.

This work was performed as the final project for the Data 512 course at the University of Washington in Fall 2018.
The course wiki page can be found at the link below:
Course Wiki
Let's jump in.

License

The code is released under the MIT license and is intended to be fully reproducible. Additional details can be found at the link below:
License

Reproducibility

All the code and analysis in this repository are intended to be fully reproducible.
To clone this repository, use the command:

git clone https://github.com/Gmoog/data512_finalproject.git

If you would rather not clone the entire repository, the data can be downloaded as a tab-separated file and imported into Excel or Python for analysis.
The data can be downloaded from the link below:
National Geophysical Data Center / World Data Service (NGDC/WDS): Significant Earthquake Database. National Geophysical Data Center, NOAA

Data

The data comes from the Significant Earthquake Database and contains historical records of earthquakes from all over the world, from 2150 BC to the present day. The database is updated continually to reflect the latest events.

As per the description from the 'data.nodc.noaa.gov' site where the data is hosted:

The Significant Earthquake Database is a global listing of over 5,700 earthquakes from 2150 BC to the present. A significant earthquake is classified as one that meets at least one of the following criteria:

caused deaths, caused moderate damage (approximately 1 million dollars or more), magnitude 7.5 or greater, Modified Mercalli Intensity (MMI) X or greater, or the earthquake generated a tsunami.

The database provides information on the date and time of occurrence, latitude and longitude, focal depth, magnitude, maximum MMI intensity, and socio-economic data such as the total number of casualties, injuries, houses destroyed, and houses damaged, and dollar damage estimates. References, political geography, and additional comments are also provided for each earthquake. If the earthquake was associated with a tsunami or volcanic eruption, it is flagged and linked to the related tsunami event or significant volcanic eruption.
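The inclusion criteria quoted above can be expressed as a simple predicate. The following is a minimal sketch; the `Event` class and all of its field names are hypothetical and chosen only for illustration, and they do not correspond to the database's actual column names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # All field names here are hypothetical, chosen for this illustration.
    deaths: Optional[int] = None
    damage_musd: Optional[float] = None  # damage in millions of US dollars
    magnitude: Optional[float] = None
    mmi: Optional[int] = None            # Modified Mercalli Intensity, 1-12
    tsunami: bool = False


def is_significant(e: Event) -> bool:
    """Return True if the event meets at least one of the quoted criteria."""
    return any([
        e.deaths is not None and e.deaths > 0,
        e.damage_musd is not None and e.damage_musd >= 1.0,
        e.magnitude is not None and e.magnitude >= 7.5,
        e.mmi is not None and e.mmi >= 10,  # MMI X corresponds to 10
        e.tsunami,
    ])


print(is_significant(Event(magnitude=7.8)))  # True
print(is_significant(Event(magnitude=5.1)))  # False
```

Missing values are treated as "criterion not met", which seems a reasonable reading for historical records where many fields are blank.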

The fields that are key to our analysis are detailed in the table below:

| Field Name | Datatype | Description |
| --- | --- | --- |
| Year | Integer | The year of occurrence |
| Focal Depth | Integer | Depth of the earthquake's focus (hypocenter), in km |
| Eq_Primary | Float | Magnitude of the earthquake |
| Country | String | Name of the country |
| Location_Name | String | Specific location in the country |
| Latitude | Float | Latitude of the event location |
| Longitude | Float | Longitude of the event location |

All the data used in this analysis can be found in the file 'earthquake_historical.tsv'
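As a rough illustration of how the tab-separated file might be read with Pandas, here is a minimal sketch. The upper-case column names, the `load_earthquakes` helper and the two sample rows are assumptions made for this example; the headers in 'earthquake_historical.tsv' may differ, and the notebook remains the authoritative version of the loading code.

```python
import io

import pandas as pd

# Column names are assumptions modelled on the table above; the real file's
# headers may differ in case or spelling.
KEY_COLUMNS = ["YEAR", "FOCAL_DEPTH", "EQ_PRIMARY", "COUNTRY",
               "LOCATION_NAME", "LATITUDE", "LONGITUDE"]


def load_earthquakes(source):
    """Read a tab-separated earthquake file and keep the key columns."""
    df = pd.read_csv(source, sep="\t")
    # Normalise headers so 'Eq_Primary' and 'EQ_PRIMARY' are treated alike.
    df.columns = [c.strip().upper().replace(" ", "_") for c in df.columns]
    df = df[KEY_COLUMNS]
    # Rows without a magnitude cannot serve as prediction targets.
    return df.dropna(subset=["EQ_PRIMARY"]).reset_index(drop=True)


# Tiny made-up sample standing in for 'earthquake_historical.tsv'.
SAMPLE_TSV = (
    "Year\tFocal Depth\tEq_Primary\tCountry\tLocation_Name\tLatitude\tLongitude\n"
    "2001\t10\t7.6\tCOUNTRY_A\tPLACE_A\t10.5\t-70.25\n"
    "2002\t33\t\tCOUNTRY_B\tPLACE_B\t-5.0\t120.0\n"
)

quakes = load_earthquakes(io.StringIO(SAMPLE_TSV))
print(quakes)
```

To read the real file, pass its path instead of the in-memory sample, for example `load_earthquakes("earthquake_historical.tsv")`.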

For more information about the data and downloading it, please access the link below
National Geophysical Data Center / World Data Service (NGDC/WDS): Significant Earthquake Database. National Geophysical Data Center, NOAA

License

The data is hosted on a public information website and can be freely distributed and copied with appropriate credit.

The relevant text from the website is quoted below for reference:

The National Centers for Environmental Information (formerly the National Geophysical Data Center) website is provided as a public service by the U.S. Department of Commerce, National Oceanic and Atmospheric Administration (NOAA), National Environmental Satellite, Data and Information Service (NESDIS). Information presented on these web pages is considered public information and may be distributed or copied.

More details about NOAA's privacy policy can be found at the link below:
NOAA privacy

Software and Tools used

All of the analysis was performed in Python, using libraries such as Pandas and Numpy, with Matplotlib for the visualizations. The dataset is small enough that all model training and testing ran locally, without cloud computing resources such as Amazon EC2.

The Python packages used in the analysis are listed below:

  • Numpy
  • Pandas
  • Matplotlib
  • Basemap
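To give a feel for the kind of visualization these packages support, the sketch below plots epicenters on plain longitude/latitude axes with Matplotlib alone. It deliberately leaves out Basemap, uses randomly generated points rather than the real data, and writes to a hypothetical output file named `epicenters.png`; none of this is taken from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated stand-in data; not real earthquake records.
rng = np.random.default_rng(1)
lon = rng.uniform(-180, 180, 200)
lat = rng.uniform(-90, 90, 200)
mag = rng.uniform(5.0, 9.0, 200)

fig, ax = plt.subplots(figsize=(8, 4))
# Colour and marker size both grow with magnitude.
points = ax.scatter(lon, lat, c=mag, s=(mag - 4) ** 2 * 3,
                    cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax, label="Magnitude")
ax.set_xlim(-180, 180)
ax.set_ylim(-90, 90)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Epicenters (synthetic sample)")
fig.savefig("epicenters.png", dpi=100)
```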

The algorithms tested on the data are all part of the scikit-learn Python package:

  • Linear Regression
  • Decision tree
  • Random forest
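The three algorithms above could be compared along the following lines. This is a sketch only: it assumes a regression setup (predicting magnitude from latitude, longitude and focal depth), uses synthetic data, and picks mean absolute error as the metric. The features, target and evaluation actually used are defined in the notebook and may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in features: latitude, longitude, focal depth.
X = np.column_stack([
    rng.uniform(-90, 90, n),
    rng.uniform(-180, 180, n),
    rng.uniform(0, 700, n),
])
# Synthetic magnitudes; not related to real seismicity.
y = 6.0 + 0.001 * X[:, 2] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Lower mean absolute error on the held-out set is better.
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {scores[name]:.3f}")
```

Holding out a test split keeps the comparison fair, since tree-based models can fit the training data almost perfectly and would otherwise look better than they are.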

Resources

Machine learning predicts earthquakes in a lab
Hindukush earthquake magnitude prediction
Analysis of soil radon data using decision trees
Using Neural Networks to predict earthquake magnitude
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6033417/
https://earthquake.usgs.gov/data/data.php#eq
https://arxiv.org/pdf/1702.05774.pdf
https://www.scientificamerican.com/article/can-artificial-intelligence-predict-earthquakes/
