The goal of this project is to analyze publicly available earthquake data and apply several machine learning algorithms to try to predict future occurrences.
All the code used for cleaning, preparing, and transforming the data, as well as for creating the visualizations and running the algorithms, is written in Python and can be found in the Jupyter notebook 'Data512_Final_Project.ipynb'.
This work was performed as the final project for the Data 512 course at the University of Washington in Fall 2018.
The course wiki page can be found at the link below
Course Wiki
Let's jump in.
The code is released under the MIT license, and is intended to be fully reproducible.
Additional details can be found at the link below
License
All the code and analysis in this repository are intended to be fully reproducible.
To clone this repository use the command
If you would rather not clone the entire repository, the data can be downloaded as a tab-separated file and imported into Excel or Python for analysis.
The data can be downloaded from the link below
National Geophysical Data Center / World Data Service (NGDC/WDS): Significant Earthquake Database. National Geophysical Data Center, NOAA
The data comes from the Significant Earthquake Database, which contains historical records of earthquakes from all over the world, from 2150 BC to the present day. The database is updated continually to reflect the latest events.
As per the description from the 'data.nodc.noaa.gov' site where the data is hosted:
The Significant Earthquake Database is a global listing of over 5,700 earthquakes from 2150 BC to the present. A significant earthquake is classified as one that meets at least one of the following criteria:
caused deaths, caused moderate damage (approximately 1 million dollars or more), magnitude 7.5 or greater, Modified Mercalli Intensity (MMI) X or greater, or the earthquake generated a tsunami.
The database provides information on the date and time of occurrence, latitude and longitude, focal depth, magnitude, maximum MMI intensity, and socio-economic data such as the total number of casualties, injuries, houses destroyed, and houses damaged, and dollar damage estimates. References, political geography, and additional comments are also provided for each earthquake. If the earthquake was associated with a tsunami or volcanic eruption, it is flagged and linked to the related tsunami event or significant volcanic eruption.
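The inclusion criteria quoted above can be read as a simple "at least one of" rule. The sketch below is only an illustration of that rule: the function name `is_significant` and its parameter names are hypothetical, the damage threshold treats "approximately 1 million dollars" as exactly 1.0, and MMI X is taken as the numeric value 10.

```python
def is_significant(deaths=None, damage_million_usd=None, magnitude=None,
                   mmi=None, tsunami=False):
    """Return True if an event meets at least one of the quoted criteria.

    All parameter names are hypothetical; None means "value not reported".
    """
    criteria = [
        deaths is not None and deaths > 0,                               # caused deaths
        damage_million_usd is not None and damage_million_usd >= 1.0,    # moderate damage (assumed cutoff)
        magnitude is not None and magnitude >= 7.5,                      # magnitude 7.5 or greater
        mmi is not None and mmi >= 10,                                   # MMI X or greater
        bool(tsunami),                                                   # generated a tsunami
    ]
    return any(criteria)
```

For example, `is_significant(magnitude=6.0, deaths=3)` qualifies through the deaths criterion even though the magnitude is below 7.5.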
The fields that are key from the perspective of our analysis are detailed in the table below
Field Name | Datatype | Description |
---|---|---|
Year | Integer | The year of occurrence |
Focal Depth | Integer | Depth of the earthquake's focus (hypocenter), in km |
Eq_Primary | Float | Magnitude of the earthquake |
Country | String | Name of the country |
Location_Name | String | Specific location in the country |
Latitude | Float | Latitude of the earthquake location |
Longitude | Float | Longitude of the earthquake location |
The data can be downloaded as a tab-separated file and imported into Excel or Python for analysis.
All the data used in this analysis can be found in the file 'earthquake_historical.tsv'
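As a rough sketch of how the tab-separated file might be loaded with pandas and reduced to the key fields listed in the table: the column names below are assumed to be upper-case, underscore-separated versions of the table's field names, and the helper `load_key_fields` is hypothetical rather than taken from the notebook. The in-memory rows are made-up placeholders so the example runs without the real file.

```python
import io

import pandas as pd

# Assumed header names; check them against the actual file before relying on this.
KEY_COLUMNS = ["YEAR", "FOCAL_DEPTH", "EQ_PRIMARY", "COUNTRY",
               "LOCATION_NAME", "LATITUDE", "LONGITUDE"]
NUMERIC_COLUMNS = ["YEAR", "FOCAL_DEPTH", "EQ_PRIMARY", "LATITUDE", "LONGITUDE"]


def load_key_fields(source):
    """Read a tab-separated file (path or file-like) and keep the key fields."""
    df = pd.read_csv(source, sep="\t")
    # Normalize headers so "Focal Depth" and "FOCAL_DEPTH" are treated alike.
    df.columns = [c.strip().upper().replace(" ", "_") for c in df.columns]
    df = df[KEY_COLUMNS].copy()
    # Coerce unparseable values to NaN instead of raising.
    df[NUMERIC_COLUMNS] = df[NUMERIC_COLUMNS].apply(pd.to_numeric, errors="coerce")
    # Drop rows that lack a magnitude or a location.
    return df.dropna(subset=["EQ_PRIMARY", "LATITUDE", "LONGITUDE"]).reset_index(drop=True)


# Made-up placeholder rows, not real records from the database.
sample = io.StringIO(
    "I_D\tYEAR\tFOCAL_DEPTH\tEQ_PRIMARY\tCOUNTRY\tLOCATION_NAME\tLATITUDE\tLONGITUDE\n"
    "1\t1900\t10\t7.1\tCOUNTRY_A\tPLACE_A\t10.5\t20.5\n"
    "2\t1950\t\t\tCOUNTRY_B\tPLACE_B\t-5.0\t100.0\n"
    "3\t2000\t33\t6.4\tCOUNTRY_C\tPLACE_C\t35.0\t-120.0\n"
)
demo = load_key_fields(sample)
```

With the real data, the call would presumably be `load_key_fields("earthquake_historical.tsv")`. Whether rows with a missing magnitude should be dropped is a modelling choice, not something this README specifies.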
For more information about the data and downloading it, please access the link below
National Geophysical Data Center / World Data Service (NGDC/WDS): Significant Earthquake Database. National Geophysical Data Center, NOAA
License
The data is hosted on a public information website, and can be freely distributed and copied, with the appropriate credits.
A copy of the text from the website is quoted below for reference:
The National Centers for Environmental Information (formerly the National Geophysical Data Center) website is provided as a public service by the U.S. Department of Commerce, National Oceanic and Atmospheric Administration (NOAA), National Environmental Satellite, Data and Information Service (NESDIS). Information presented on these web pages is considered public information and may be distributed or copied.
More details about their privacy policy can be found at the link below
NOAA privacy
All of the analysis was performed in Python using libraries such as pandas and NumPy, with Matplotlib for the visualizations. The dataset is small, so all model training and testing was done locally, without cloud computing resources such as Amazon EC2.
The Python packages used in the analysis are listed below
- NumPy
- pandas
- Matplotlib
- Basemap
The algorithms tested on the data are all part of the scikit-learn Python package
- Linear Regression
- Decision tree
- Random forest
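A minimal sketch of how these three scikit-learn models could be compared side by side. This README does not state the features or the target, so the example assumes a regression framing (predicting magnitude from latitude, longitude, and focal depth) and runs on synthetic data rather than the real dataset. The function name `compare_regressors`, the hyperparameters, and the use of mean absolute error are illustrative assumptions, not the notebook's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_regressors(X, y, seed=0):
    """Fit each model on a train split and return its test-set MAE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: columns mimic latitude, longitude, focal depth.
rng = np.random.default_rng(0)
n = 400
X_demo = np.column_stack([
    rng.uniform(-90, 90, n),
    rng.uniform(-180, 180, n),
    rng.uniform(0, 700, n),
])
# Invented relationship, used only so the models have something to fit.
y_demo = 6.0 + 0.001 * X_demo[:, 2] + rng.normal(0, 0.3, n)

demo_scores = compare_regressors(X_demo, y_demo)
```

Holding out a test split, as done here, keeps the comparison from rewarding models that merely memorize the training data, which matters most for the tree-based models.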
Machine learning predicts earthquakes in a lab
Hindukush earthquake magnitude prediction
Analysis of soil radon data using decision trees
Using Neural Networks to predict earthquake magnitude
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6033417/
https://earthquake.usgs.gov/data/data.php#eq
https://arxiv.org/pdf/1702.05774.pdf
https://www.scientificamerican.com/article/can-artificial-intelligence-predict-earthquakes/