# Earthquake Magnitude Prediction using Neural Network

# Data Preprocessing

# Dependencies:
This project requires the following dependencies to be installed:

* Python (>=3.6) 
* NumPy
* Pandas
* Matplotlib
* Scikit-learn (for data preprocessing)
* Keras (with TensorFlow backend)

In [12]:
import platform
version = platform.python_version()
version

'3.10.6'
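
The same kind of check can be extended to the libraries listed above. The following is a minimal sketch, not part of the original notebook: the `REQUIRED` list and the `installed_versions` helper are hypothetical names, the package names are assumed to be the usual PyPI distribution names, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata

# Distribution names as usually published on PyPI (an assumption; adjust as needed).
REQUIRED = ["numpy", "pandas", "matplotlib", "scikit-learn", "tensorflow"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```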

# Load Dataset
The pandas library can load a dataset into the Python program.
To load a CSV file, call pandas.read_csv("path of file"), which returns a DataFrame.

In [63]:
import numpy as np
import pandas as pd
import datetime
import time

df=pd.read_csv("database.csv")
df

Unnamed: 0,Date,Time,Latitude,Longitude,Type,Depth,Depth Error,Depth Seismic Stations,Magnitude,Magnitude Type,...,Magnitude Seismic Stations,Azimuthal Gap,Horizontal Distance,Horizontal Error,Root Mean Square,ID,Source,Location Source,Magnitude Source,Status
0,01/02/1965,13:44:18,19.2460,145.6160,Earthquake,131.60,,,6.0,MW,...,,,,,,ISCGEM860706,ISCGEM,ISCGEM,ISCGEM,Automatic
1,01/04/1965,11:29:49,1.8630,127.3520,Earthquake,80.00,,,5.8,MW,...,,,,,,ISCGEM860737,ISCGEM,ISCGEM,ISCGEM,Automatic
2,01/05/1965,18:05:58,-20.5790,-173.9720,Earthquake,20.00,,,6.2,MW,...,,,,,,ISCGEM860762,ISCGEM,ISCGEM,ISCGEM,Automatic
3,01/08/1965,18:49:43,-59.0760,-23.5570,Earthquake,15.00,,,5.8,MW,...,,,,,,ISCGEM860856,ISCGEM,ISCGEM,ISCGEM,Automatic
4,01/09/1965,13:32:50,11.9380,126.4270,Earthquake,15.00,,,5.8,MW,...,,,,,,ISCGEM860890,ISCGEM,ISCGEM,ISCGEM,Automatic
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23406,12/28/2016,8:22:12,38.3917,-118.8941,Earthquake,12.30,1.2,40.0,5.6,ML,...,18.0,42.47,0.120,,0.1898,NN00570710,NN,NN,NN,Reviewed
23407,12/28/2016,9:13:47,38.3777,-118.8957,Earthquake,8.80,2.0,33.0,5.5,ML,...,18.0,48.58,0.129,,0.2187,NN00570744,NN,NN,NN,Reviewed
23408,12/28/2016,12:38:51,36.9179,140.4262,Earthquake,10.00,1.8,,5.9,MWW,...,,91.00,0.992,4.8,1.5200,US10007NAF,US,US,US,Reviewed
23409,12/29/2016,22:30:19,-9.0283,118.6639,Earthquake,79.00,1.8,,6.3,MWW,...,,26.00,3.553,6.0,1.4300,US10007NL0,US,US,US,Reviewed


In [64]:
df.shape

(23411, 21)

# Select the relevant columns from the dataset
* Remove the non-relevant columns from the dataset. They do not help the model learn and only make learning more complex.


In [65]:
df = pd.DataFrame(df[['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude']])
df

Unnamed: 0,Date,Time,Latitude,Longitude,Depth,Magnitude
0,01/02/1965,13:44:18,19.2460,145.6160,131.60,6.0
1,01/04/1965,11:29:49,1.8630,127.3520,80.00,5.8
2,01/05/1965,18:05:58,-20.5790,-173.9720,20.00,6.2
3,01/08/1965,18:49:43,-59.0760,-23.5570,15.00,5.8
4,01/09/1965,13:32:50,11.9380,126.4270,15.00,5.8
...,...,...,...,...,...,...
23406,12/28/2016,8:22:12,38.3917,-118.8941,12.30,5.6
23407,12/28/2016,9:13:47,38.3777,-118.8957,8.80,5.5
23408,12/28/2016,12:38:51,36.9179,140.4262,10.00,5.9
23409,12/29/2016,22:30:19,-9.0283,118.6639,79.00,6.3
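
As an alternative to selecting columns after loading, the selection could also happen while reading the file. This is a sketch under assumptions: the in-memory `sample_csv` is a tiny stand-in for database.csv built from two rows shown above, and `wanted` and `subset` are illustrative names.

```python
import io

import pandas as pd

# Tiny stand-in for database.csv: two rows from the output above plus the 'Type' column.
sample_csv = io.StringIO(
    "Date,Time,Latitude,Longitude,Type,Depth,Magnitude\n"
    "01/02/1965,13:44:18,19.246,145.616,Earthquake,131.6,6.0\n"
    "01/04/1965,11:29:49,1.863,127.352,Earthquake,80.0,5.8\n"
)

wanted = ['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude']

# usecols tells read_csv to parse only the listed columns.
subset = pd.read_csv(sample_csv, usecols=wanted)
print(subset.columns.tolist())
```

With a real file, the first argument would be the path instead of the in-memory buffer.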


# Checking for maximum and minimum of Latitude and Longitude

* Latitude and longitude always lie within a fixed range.
* Latitude must be between -90 and 90 degrees.
* Longitude must be between -180 and 180 degrees.

The .between(-90, 90) method checks whether each latitude value falls
within the range. It returns a Boolean Series of True or False values.

Indexing the dataset with that result retains only the rows where the condition
is True. In other words, it keeps only the rows with valid latitude and longitude values.

In [66]:
import pandas as pd

# Keep only rows with latitude between -90 and 90
df = df[df['Latitude'].between(-90, 90)]

# Keep only rows with longitude between -180 and 180
df = df[df['Longitude'].between(-180, 180)]

df.tail()


Unnamed: 0,Date,Time,Latitude,Longitude,Depth,Magnitude
23406,12/28/2016,8:22:12,38.3917,-118.8941,12.3,5.6
23407,12/28/2016,9:13:47,38.3777,-118.8957,8.8,5.5
23408,12/28/2016,12:38:51,36.9179,140.4262,10.0,5.9
23409,12/29/2016,22:30:19,-9.0283,118.6639,79.0,6.3
23410,12/30/2016,20:08:28,37.3973,141.4103,11.94,5.5
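
To make the behaviour of between() concrete, here is a small sketch on made-up coordinates (the `coords` frame and its out-of-range values are invented for illustration). It shows that the filter only takes effect when its result is assigned.

```python
import pandas as pd

# Toy coordinates: the second and third rows are deliberately out of range.
coords = pd.DataFrame({
    'Latitude':  [19.246, 95.0, -20.579],
    'Longitude': [145.616, 127.352, -200.0],
})

# between() is inclusive on both ends by default and returns a boolean Series.
lat_ok = coords['Latitude'].between(-90, 90)
lon_ok = coords['Longitude'].between(-180, 180)

# Combine both masks and assign the result; indexing alone does not modify coords.
valid = coords[lat_ok & lon_ok].reset_index(drop=True)
print(valid)
```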


# Remove NaN rows from the dataset

* Check the dataset for NaN (Not a Number) values. If any row contains a NaN,
remove that row from the dataset.
* isna() returns a Boolean result marking each missing value, and sum() counts them per column.
* To remove rows with NaN values, use dropna() from pandas. It returns a new DataFrame without those rows.

In [67]:
# Count the number of missing (NaN) values in each column
nan_counts = df.isna().sum()

nan_counts

Date         0
Time         0
Latitude     0
Longitude    0
Depth        0
Magnitude    0
dtype: int64
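
The selected columns contain no NaN values here, so nothing has to be removed. For a dataset that does have gaps, the count-then-drop pattern might look like the following sketch (the `toy` frame is invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value in each column.
toy = pd.DataFrame({
    'Depth':     [131.6, np.nan, 20.0],
    'Magnitude': [6.0, 5.8, np.nan],
})

# isna() gives a boolean mask; summing it counts missing values per column.
print(toy.isna().sum())

# dropna() returns a new DataFrame without the rows that contain any NaN.
cleaned = toy.dropna().reset_index(drop=True)
print(cleaned)
```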

# Convert date and time into the same format

The dataset contains dates in different formats, such as dd-mm-yyyy and dd/mm/yyyy.
So, convert every date into the same dd/mm/yyyy format using pd.to_datetime() and strftime().

In [68]:
import pandas as pd

df = pd.read_csv('database.csv')

def formatconvert(date_str):
    # Parse the date string and re-emit it as dd/mm/yyyy
    return pd.to_datetime(date_str).strftime('%d/%m/%Y')

df['Date'] = df['Date'].apply(formatconvert)


df['Date']


0        02/01/1965
1        04/01/1965
2        05/01/1965
3        08/01/1965
4        09/01/1965
            ...    
23406    28/12/2016
23407    28/12/2016
23408    28/12/2016
23409    29/12/2016
23410    30/12/2016
Name: Date, Length: 23411, dtype: object
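
The conversion can also be done on the whole column at once. The sketch below assumes the input is in mm/dd/yyyy, as in the rows displayed earlier. The `raw` series and its malformed last entry are made up to show how unparseable values are handled.

```python
import pandas as pd

# Toy dates in mm/dd/yyyy; the last entry is deliberately malformed.
raw = pd.Series(['01/02/1965', '12/28/2016', 'not a date'])

# An explicit format avoids day/month guessing; errors='coerce' turns failures into NaT.
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

# Series.dt.strftime formats the whole column in one call.
formatted = parsed.dt.strftime('%d/%m/%Y')
print(formatted.tolist())
```

Rows that come back as NaT can then be inspected or dropped explicitly, rather than failing partway through the column.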

# Convert date and time into TimeStamp

Timestamps are numerical values representing a specific point in time, 
usually in seconds or milliseconds. They provide a consistent and 
standardized way to represent time across different systems and 
programming languages.

The timestamps represent the number of seconds elapsed since the Unix epoch 
(January 1, 1970), so drop the rows dated before it.

In [70]:
df=pd.DataFrame(df)

df = df.drop(df.index[0:1457])
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,Date,Time,Latitude,Longitude,Type,Depth,Depth Error,Depth Seismic Stations,Magnitude,Magnitude Type,...,Magnitude Seismic Stations,Azimuthal Gap,Horizontal Distance,Horizontal Error,Root Mean Square,ID,Source,Location Source,Magnitude Source,Status
0,11/11/1973,7:14:52,30.5730,52.8870,Earthquake,11.00,,,5.5,MB,...,,,,,,USP0000447,US,US,US,Reviewed
1,12/11/1973,3:53:44,-6.1530,154.4590,Earthquake,50.00,,,5.9,MS,...,,,,,,USP000044M,US,US,US,Reviewed
2,13/11/1973,1:12:12,38.6220,142.1500,Earthquake,78.00,,,5.5,MB,...,,,,,,USP0000451,US,US,US,Reviewed
3,13/11/1973,16:10:59,-18.2750,-178.1330,Earthquake,571.00,,,5.6,MB,...,,,,,,USP0000456,US,US,US,Reviewed
4,15/11/1973,15:06:36,-1.3740,-15.7910,Earthquake,33.00,,,5.5,MB,...,,,,,,USP000045R,US,US,US,Reviewed
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20492,28/12/2016,8:22:12,38.3917,-118.8941,Earthquake,12.30,1.2,40.0,5.6,ML,...,18.0,42.47,0.120,,0.1898,NN00570710,NN,NN,NN,Reviewed
20493,28/12/2016,9:13:47,38.3777,-118.8957,Earthquake,8.80,2.0,33.0,5.5,ML,...,18.0,48.58,0.129,,0.2187,NN00570744,NN,NN,NN,Reviewed
20494,28/12/2016,12:38:51,36.9179,140.4262,Earthquake,10.00,1.8,,5.9,MWW,...,,91.00,0.992,4.8,1.5200,US10007NAF,US,US,US,Reviewed
20495,29/12/2016,22:30:19,-9.0283,118.6639,Earthquake,79.00,1.8,,6.3,MWW,...,,26.00,3.553,6.0,1.4300,US10007NL0,US,US,US,Reviewed
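
Dropping by row position relies on knowing how many rows precede the cutoff. A date-based filter avoids that. The following sketch uses an invented `quakes` frame and assumes the dates are already in dd/mm/yyyy.

```python
import pandas as pd

# Toy rows in dd/mm/yyyy, the format produced by the conversion step.
quakes = pd.DataFrame({
    'Date': ['02/01/1965', '31/12/1969', '01/01/1970', '11/11/1973'],
    'Magnitude': [6.0, 5.8, 5.6, 5.5],
})

epoch = pd.Timestamp('1970-01-01')
parsed = pd.to_datetime(quakes['Date'], format='%d/%m/%Y')

# Keep rows on or after the epoch, whatever their position in the file.
recent = quakes[parsed >= epoch].reset_index(drop=True)
print(recent)
```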


* Combine the 'Date' and 'Time' columns into a single datetime value
using datetime.datetime.strptime().
* Using time.mktime(), convert that datetime into a timestamp. mktime() interprets the value as local time, so the result depends on the machine's time zone.
* Finally, remove the Date and Time columns from the dataset.

In [71]:
import datetime
import time

timestamp = []
for d, t in zip(df['Date'], df['Time']):
    ts = datetime.datetime.strptime(d+' '+t, '%d/%m/%Y %H:%M:%S')
    timestamp.append(time.mktime(ts.timetuple()))

timeStamp = pd.Series(timestamp)
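
Because time.mktime() uses the local time zone, the same row can produce different timestamps on different machines. If the recorded times are taken to be UTC (an assumption, not something the notebook states), a time-zone-independent variant could look like this sketch, where `to_utc_timestamp` is a hypothetical helper:

```python
import calendar
import datetime


def to_utc_timestamp(date_str, time_str):
    """Seconds since the Unix epoch, treating the inputs as UTC (hypothetical helper)."""
    ts = datetime.datetime.strptime(date_str + ' ' + time_str, '%d/%m/%Y %H:%M:%S')
    # calendar.timegm is the UTC counterpart of time.mktime.
    return calendar.timegm(ts.timetuple())


print(to_utc_timestamp('01/01/1970', '00:00:00'))  # 0
print(to_utc_timestamp('02/01/1970', '7:14:52'))   # 112492
```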


In [72]:
df['Timestamp'] = timeStamp.values
data = df.drop(['Date', 'Time'], axis=1)
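# Leftover guard: it would drop rows marked 'ValueError', but the loop above never adds that marker
data = data[data.Timestamp != 'ValueError']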
data

Unnamed: 0,Latitude,Longitude,Type,Depth,Depth Error,Depth Seismic Stations,Magnitude,Magnitude Type,Magnitude Error,Magnitude Seismic Stations,Azimuthal Gap,Horizontal Distance,Horizontal Error,Root Mean Square,ID,Source,Location Source,Magnitude Source,Status,Timestamp
0,30.5730,52.8870,Earthquake,11.00,,,5.5,MB,,,,,,,USP0000447,US,US,US,Reviewed,1.218303e+08
1,-6.1530,154.4590,Earthquake,50.00,,,5.9,MS,,,,,,,USP000044M,US,US,US,Reviewed,1.219046e+08
2,38.6220,142.1500,Earthquake,78.00,,,5.5,MB,,,,,,,USP0000451,US,US,US,Reviewed,1.219813e+08
3,-18.2750,-178.1330,Earthquake,571.00,,,5.6,MB,,,,,,,USP0000456,US,US,US,Reviewed,1.220353e+08
4,-1.3740,-15.7910,Earthquake,33.00,,,5.5,MB,,,,,,,USP000045R,US,US,US,Reviewed,1.222042e+08
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20492,38.3917,-118.8941,Earthquake,12.30,1.2,40.0,5.6,ML,0.320,18.0,42.47,0.120,,0.1898,NN00570710,NN,NN,NN,Reviewed,1.482894e+09
20493,38.3777,-118.8957,Earthquake,8.80,2.0,33.0,5.5,ML,0.260,18.0,48.58,0.129,,0.2187,NN00570744,NN,NN,NN,Reviewed,1.482897e+09
20494,36.9179,140.4262,Earthquake,10.00,1.8,,5.9,MWW,,,91.00,0.992,4.8,1.5200,US10007NAF,US,US,US,Reviewed,1.482909e+09
20495,-9.0283,118.6639,Earthquake,79.00,1.8,,6.3,MWW,,,26.00,3.553,6.0,1.4300,US10007NL0,US,US,US,Reviewed,1.483031e+09


# Check for magnitude max and min (5.5 to 9.1)

* Find the maximum and minimum magnitude, because a magnitude value 
cannot be negative or more than 10.
* Check that every magnitude lies within the expected range (5.5 to 9.5 in the cell below); 
if any value falls outside it, delete that row from the dataset.


In [61]:
df['Magnitude'].max()

9.1

In [60]:
df['Magnitude'].min()

5.5

In [59]:
# Keep only rows whose magnitude lies in the expected range
df = df[df['Magnitude'].between(5.5, 9.5)]
df

Unnamed: 0,Latitude,Longitude,Depth,Magnitude,Year,Month,Day,Hour,Minutes,Seconds
0,19.2460,145.6160,131.60,6.0,1965,1,2,13,44,18
1,1.8630,127.3520,80.00,5.8,1965,1,4,11,29,49
2,-20.5790,-173.9720,20.00,6.2,1965,1,5,18,5,58
3,-59.0760,-23.5570,15.00,5.8,1965,1,8,18,49,43
4,11.9380,126.4270,15.00,5.8,1965,1,9,13,32,50
...,...,...,...,...,...,...,...,...,...,...
23406,38.3917,-118.8941,12.30,5.6,2016,12,28,8,22,12
23407,38.3777,-118.8957,8.80,5.5,2016,12,28,9,13,47
23408,36.9179,140.4262,10.00,5.9,2016,12,28,12,38,51
23409,-9.0283,118.6639,79.00,6.3,2016,12,29,22,30,19
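
Before deleting out-of-range rows it can help to look at what would be removed. A minimal sketch on invented magnitudes (the `mags` frame is illustrative only):

```python
import pandas as pd

# Toy magnitudes: the last two are deliberately implausible.
mags = pd.DataFrame({'Magnitude': [5.5, 9.1, -1.0, 11.2]})

in_range = mags['Magnitude'].between(5.5, 9.5)

# Inspect the rows that would be deleted before actually deleting them.
rejected = mags[~in_range]
kept = mags[in_range].reset_index(drop=True)
print(len(kept), "kept,", len(rejected), "rejected")
```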


# Display the data after preprocessing

Preprocessing is now complete. Save the dataset in CSV format using to_csv().
The preprocessed dataset is used to train the model.


In [None]:
df.to_csv("resultdata.csv", index=False)
df
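
The effect of the index argument shows up when the file is read back. The sketch below writes to an in-memory buffer instead of resultdata.csv, and `round_trip` is a hypothetical helper.

```python
import io

import pandas as pd

toy = pd.DataFrame({'Depth': [131.6, 80.0], 'Magnitude': [6.0, 5.8]})


def round_trip(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV and read it back (hypothetical helper)."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(toy)                  # default: index=True
without_index = round_trip(toy, index=False)  # index column is not written

print(with_index.columns.tolist())     # ['Unnamed: 0', 'Depth', 'Magnitude']
print(without_index.columns.tolist())  # ['Depth', 'Magnitude']
```

Writing with the index and reading the file back adds an extra "Unnamed: 0" column, which is usually not wanted as a model input.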

# Conclusion:

* The preprocessing steps are foundational in preparing the 
earthquake dataset for further analysis and model development.
* By cleaning, organizing, and standardizing the data, we create a 
solid foundation for accurate and meaningful predictions 
regarding earthquake occurrences and magnitude.
* This dataset can now be used for exploratory data analysis, 
feature engineering, and the development of machine learning 
models.
* As we progress further, we will dive into more advanced aspects 
of machine learning and develop a model that can potentially 
contribute to earthquake forecasting and risk reduction.