# Earthquake Analysis

## Project Description

* The primary objective of this project is to conduct a simple-view Exploratory Data Analysis (EDA) of earthquake data to gain valuable insights into the characteristics, distribution, and patterns of seismic events worldwide.
* It aims to explore and visualize key factors, including earthquake magnitude, geographical distribution, alert levels, and the 'Sig' (Significance) parameter, to uncover trends and patterns.
* The choice of earthquakes as the subject of analysis stems from the desire to venture beyond the comfort zone and tackle a complex and real-world dataset.
* EDA techniques, including data cleaning, descriptive statistics, and visualizations, will be employed to understand the dataset's characteristics thoroughly.
* Discovering patterns and trends within the earthquake data can contribute to improving earthquake prediction, preparedness, and response strategies.

In [82]:
import pandas as pd
import streamlit as st
import plotly.express as px

In [83]:
# Load the dataset into a DataFrame
df=pd.read_csv(r"C:\Users\oscar\practicum-project-6\tripleten_project6\earthquake_data.csv")

In [84]:
df

Unnamed: 0,title,magnitude,date_time,cdi,mmi,alert,tsunami,sig,net,nst,dmin,gap,magType,depth,latitude,longitude,location,continent,country
0,"M 7.0 - 18 km SW of Malango, Solomon Islands",7.0,22-11-2022 02:03,8,7,green,1,768,us,117,0.509,17.0,mww,14.000,-9.7963,159.596,"Malango, Solomon Islands",Oceania,Solomon Islands
1,"M 6.9 - 204 km SW of Bengkulu, Indonesia",6.9,18-11-2022 13:37,4,4,green,0,735,us,99,2.229,34.0,mww,25.000,-4.9559,100.738,"Bengkulu, Indonesia",,
2,M 7.0 -,7.0,12-11-2022 07:09,3,3,green,1,755,us,147,3.125,18.0,mww,579.000,-20.0508,-178.346,,Oceania,Fiji
3,"M 7.3 - 205 km ESE of Neiafu, Tonga",7.3,11-11-2022 10:48,5,5,green,1,833,us,149,1.865,21.0,mww,37.000,-19.2918,-172.129,"Neiafu, Tonga",,
4,M 6.6 -,6.6,09-11-2022 10:14,0,2,green,1,670,us,131,4.998,27.0,mww,624.464,-25.5948,178.278,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
777,"M 7.7 - 28 km SSW of Puerto El Triunfo, El Sal...",7.7,13-01-2001 17:33,0,8,,0,912,us,427,0.000,0.0,mwc,60.000,13.0490,-88.660,"Puerto El Triunfo, El Salvador",,
778,"M 6.9 - 47 km S of Old Harbor, Alaska",6.9,10-01-2001 16:02,5,7,,0,745,ak,0,0.000,0.0,mw,36.400,56.7744,-153.281,"Old Harbor, Alaska",North America,
779,"M 7.1 - 16 km NE of Port-Olry, Vanuatu",7.1,09-01-2001 16:49,0,7,,0,776,us,372,0.000,0.0,mwb,103.000,-14.9280,167.170,"Port-Olry, Vanuatu",,Vanuatu
780,"M 6.8 - Mindanao, Philippines",6.8,01-01-2001 08:54,0,5,,0,711,us,64,0.000,0.0,mwc,33.000,6.6310,126.899,"Mindanao, Philippines",,


In [85]:
#df.shape

In [86]:
#df.info()

In [87]:
#there are null values
df.isnull().values.any()

True

In [88]:
#Replacing nan values with 'not specified'
df.fillna('not specified', inplace=True)

In [89]:
#checking for nulls
df.isnull().values.any()

False

### When looking at the info from the graph, the date_time column has nanoseconds for precise measurement data.

This is great but for the purpose of this analysis, we want only the year in which these eartquakes occurred. So, we'll be changing the type to a datetime type and apply the .apply function to swtich it back to a string for year respectively.

In [90]:
#since column "date_time" has nanoseconds, we want to change it to to just years for a slider effect
df["date_time"]=df["date_time"].apply(pd.to_datetime)

In [91]:
df['date_time'] = df['date_time'].apply(lambda x: x.strftime('%Y'))

In [92]:
#df.info()

### This data contains oceanic earthquake information, therefore creating tsunamis. We'll led the user choose whether to include tsunami occurrence in the data.

In [93]:
#creating header with an option to filter the data and the checkbox:
#data set includes 2 options for tsunami: 1=yes an 0=no
#let users decide whether they want to see

st.header('Earthquake effects.')
st.write("""
##### Filter the data below to see how earthquake data is affected by whether a tsunami occured
""")
show_tsunami = st.checkbox('Include tsunami occurence')

In [94]:
show_tsunami

False

In [97]:
if not show_tsunami:
    df = df[df.tsunami!='1']

In [57]:
#creating options for filter from all servers
country_choice = df['country'].unique()
country_choice_man = st.selectbox('Select country:', country_choice)

In [58]:
country_choice_man

'Solomon Islands'

In [59]:
#next let's create a slider for years, so that users can filter earthquakes 
#creating min and max years as limits fro sliders
min_year, max_year=int(df['date_time'].min()), int(df['date_time'].max())

#creating slider
year_range = st.slider(
    "Choose years",
    value=(min_year,max_year),min_value=min_year,max_value=max_year)

In [60]:
#year_range

(2001, 2022)

In [64]:
st.header('Earthquake analysis')
st.write("""
###### Let's analyze what influences earthquakes the most. We will check how distribution of earthquakes varies depending on the alert and continent
""")

# Will create 2 histograms with the choice: color and alert, color and continent

#creating list of options to choose from
list_for_hist=['alert', 'continent']

#creating selectbox
choice_for_hist = st.selectbox('Split for magnitude distribution', list_for_hist)

#plotly histogram, where magnitude is split by alert level
fig1 = px.histogram(df, x="magnitude", color=choice_for_hist,
                    color_discrete_map={
                        "green": "green",
                        "yellow": "yellow",
                        "orange": "orange",
                        "red": "red",
                        "not specified": "gray"})
#adding title
fig1.update_layout(title="<b> Split of magnitude by {}</b>".format(choice_for_hist))

#embedding into streamlit
st.plotly_chart(fig1)

DeltaGenerator()

In [65]:
fig1.show()

In [66]:
#count_all = df['alert'].value_counts()

In [67]:
#count_all

not specified    367
green            325
yellow            56
orange            22
red               12
Name: alert, dtype: int64

In [69]:
st.write("""
###### Alert Level refers to an assessment of the potential population exposure to earthquakes in proximity to specific ares.

• Green = Litle to no
• Yellow = Limited
• Orange = Significant
• Red = Extensive
""")

### Alert Level Meaning

Alert Level refers to an assessment of the potential population exposure to earthquakes in proximity to specific ares.

</p>• Green = Ltle to no</p>
• Yellow = Limited</p>
• Orange = Significant</p>
• Red = Extensive

In [70]:
df['age']=2023-df['date_time'].values.astype(float)
    
def age_category(x):
    if x<5: return '<5'
    elif x>=5 and x<10: return '5-10'
    elif x>=10 and x<20: return '10-20'
    else: return '>20'
    
df['age_category']= df['age'].apply(age_category)

In [71]:
#df['age_category']

0       <5
1       <5
2       <5
3       <5
4       <5
      ... 
777    >20
778    >20
779    >20
780    >20
781    >20
Name: age_category, Length: 782, dtype: object

In [72]:
st.write("""
###### Now let's check if Earthquake occurrence has increased over the years and if they're becoming more "significant"
""")

fig2 = px.scatter(df, x="magnitude", y="sig", color="age_category", hover_data=['date_time'])

st.plotly_chart(fig2)

DeltaGenerator()

In [73]:
fig2.show()

In [74]:
st.write("""
###### The "sig" numbers tend to rise with increasing magnitude numbers. Moreover, the year span doesn't appear to reveal any hidden insights in this graph.
""")

### SIG

Sig - A number describing how significant the event is. Larger numbers indicate a more significant event. This value is determined on a number of factors, including: magnitude, maximum MMI, felt reports, and estimated impact

## Conclusion

* Through analysis and visualization, this project seeks to shed light on the dynamics of seismic events, ultimately contributing to our understanding of earthquakes' impact and mitigation strategies.
* The "sig" numbers tend to rise with increasing magnitude numbers. Moreover, the year span doesn't appear to reveal any hidden insights in this graph.
* Additionally, a great majority of these Earquakes occur at a Green Alert Level (325 total), this comes at 42% of earthquakes occurring at Green Alert Lever for this dataset (Total of 782 data, including not specified).
* We can further analyze other insights such as mapping in which Continent produces higher Red Alert Level for further precautionary steps in those areas.

In [75]:
#streamlit run earthquake_eda.py