# DATA ANALYSIS

## Loading the dataset

First, we will import the required libraries.

In [8]:
# import pandas
import pandas as pd

Now, we will load the cleaned dataset into the notebook.

In [11]:
# Read the dataset 'quakes-cleaned.csv' into a pandas DataFrame.
# Use the values in the first column as the index for the DataFrame.
df = pd.read_csv('quakes-cleaned.csv', index_col=0)
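
Because the first column holds timestamps, it may help to convert the index to a `DatetimeIndex` right after loading, so the period covered by the data can be checked directly. The sketch below is not part of the original notebook. It uses two rows copied from the `df.head()` output in this section, reduced to a few columns, in place of the full 'quakes-cleaned.csv' file; the names `sample_csv` and `sample` are made up for illustration.

```python
import io

import pandas as pd

# Two rows copied from this notebook's df.head() output, reduced to a few
# columns; a stand-in for 'quakes-cleaned.csv' so the sketch runs on its own.
sample_csv = io.StringIO(
    "time,latitude,longitude,depth,mag\n"
    "2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7\n"
    "2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87\n"
)

# Same call pattern as the notebook: the first column becomes the index.
sample = pd.read_csv(sample_csv, index_col=0)

# Convert the string index to timezone-aware timestamps (UTC).
sample.index = pd.to_datetime(sample.index, utc=True)

# Earliest and latest event times in the sample.
print(sample.index.min(), sample.index.max())
```

Applied to the full DataFrame, the same two index calls would show the first and last event times in the dataset.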

Let us see the first few rows in the dataset.

In [12]:
df.head()

time,time.1,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,state_country
2023-11-15 09:05:07.304000+00:00,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,0,0.0,0.0,0.22,...,"5 km SSW of Houston, Alaska",earthquake,0.0,0.2,0.0,0,automatic,ak,ak,"Beavertail Drive, Matanuska-Susitna, Alaska, U..."
2023-11-15 08:53:06.688000+00:00,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0,ml,0,0.0,0.0,0.8,...,"55 km NE of Whittier, Alaska",earthquake,0.0,0.3,0.0,0,automatic,ak,ak,"Chugach, Alaska, United States"
2023-11-15 08:41:52.480000+00:00,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.32,1.73,md,15,153.0,0.0,0.2,...,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.59,15,automatic,hv,hv,"Volcano, Volcano CDP, Hawaiʻi County, Hawaii, ..."
2023-11-15 07:44:53.035000+00:00,2023-11-15 07:44:53.035000+00:00,61.6382,-149.7828,32.9,1.9,ml,0,0.0,0.0,0.31,...,Others,earthquake,0.0,0.2,0.0,0,automatic,ak,ak,"Houston, Matanuska-Susitna, Alaska, 99694, Uni..."
2023-11-15 07:19:44.540000+00:00,2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87,md,37,236.0,0.0,0.12,...,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.88,5,automatic,hv,hv,United States


## Insights

### 1. What is the most common seismic event type during the specified time period?

We will investigate which type of seismic event was predominant during the period under consideration.

In [13]:
df.type.value_counts()

earthquake                18581
quarry blast                270
ice quake                   230
explosion                   134
other event                  17
volcanic eruption             9
experimental explosion        1
landslide                     1
mining explosion              1
Name: type, dtype: int64

**During the two-and-a-half-month period, seismic events were predominantly earthquakes, which accounted for about 96.6% of all occurrences (18,581 of 19,244).**
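
The earthquake share can be reproduced from the counts printed above. The following is a minimal sketch rather than a cell from the original notebook: it rebuilds the counts as a standalone Series (the names `type_counts` and `type_share` are illustrative) and converts them to percentages.

```python
import pandas as pd

# Counts copied from the df.type.value_counts() output above.
type_counts = pd.Series(
    {
        "earthquake": 18581,
        "quarry blast": 270,
        "ice quake": 230,
        "explosion": 134,
        "other event": 17,
        "volcanic eruption": 9,
        "experimental explosion": 1,
        "landslide": 1,
        "mining explosion": 1,
    },
    name="type",
)

# Share of each event type as a percentage of all events.
type_share = (type_counts / type_counts.sum() * 100).round(2)
print(type_share.head(3))
```

On the full DataFrame, `df.type.value_counts(normalize=True) * 100` should give the same shares in one step.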

### 2. What magnitudes characterize the seismic events observed during the specified time period?

In [14]:
df.mag.describe()

count    19244.000000
mean         2.156412
std          1.141293
min          1.000000
25%          1.320000
50%          1.800000
75%          2.470000
max          7.100000
Name: mag, dtype: float64

**Between September 1st, 2023, and 9:30 am on November 15th, 2023, 19,244 seismic events were recorded. The mean magnitude was approximately 2.16, which is small rather than moderate: events below about magnitude 2.5 are usually not felt. The variability around this mean is captured by a standard deviation of 1.14. The quartiles show that the central 50% of events fell between magnitudes 1.32 and 2.47, with a median of 1.80. Magnitudes ranged from a minimum of 1.00 to a maximum of 7.10.**
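
To put the spread in context, the interquartile range and the full range can be worked out from the figures reported above. This is a small arithmetic sketch, not a cell from the original notebook; the variable names are illustrative.

```python
# Summary statistics copied from the df.mag.describe() output above.
mag_q1, mag_q3 = 1.32, 2.47
mag_min, mag_max = 1.00, 7.10

# Interquartile range: width of the central 50% of magnitudes.
mag_iqr = round(mag_q3 - mag_q1, 2)

# Full range: distance between the weakest and strongest event.
mag_range = round(mag_max - mag_min, 2)

print(f"IQR: {mag_iqr}, range: {mag_range}")
```

The central half of the events spans a much narrower band than the full range, which suggests that most events cluster at low magnitudes and only a few are large.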

### 3. What depth range characterizes the seismic events observed within the specified time frame?

In [15]:
df.depth.describe()

count    19244.000000
mean        33.060217
std         64.335900
min         -3.430000
25%          5.290000
50%         10.000000
75%         35.000000
max        675.265000
Name: depth, dtype: float64

**Across the 19,244 seismic events in the specified period, the mean depth was approximately 33.06 units (kilometers, assuming the dataset keeps the USGS catalog convention), with a large standard deviation of about 64.34 units. The median depth was only 10.00 units, well below the mean, which points to a right-skewed distribution: most events were shallow, and a small number of very deep events pull the mean upward. The deepest event in the dataset was recorded at 675.27 units.**

**A minimum depth of -3.43 units was recorded. According to the U.S. Geological Survey (2018), positive depths indicate downward from sea level, and negative depths indicate upward from sea level.**

Now, let us see the proportion of the seismic events that had negative depths.

In [16]:
# proportion of seismic events with negative depths
df[df.depth < 0].shape[0]/df.shape[0] * 100

2.4007482851797963

**About 2.4% of the seismic events during this period occurred above sea level.**
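
An equivalent way to get this kind of proportion is to take the mean of a boolean mask, since `True` counts as 1 and `False` as 0. The sketch below is an illustration on four depth values taken from the outputs in this section, not on the full dataset, so its result does not match the 2.4% figure; `depths` and `negative_share` are illustrative names.

```python
import pandas as pd

# Illustrative depths only: the first three values come from the df.head()
# output and -3.43 is the minimum reported by df.depth.describe().
depths = pd.Series([32.8, 14.8, 0.32, -3.43], name="depth")

# The mean of a boolean mask is the fraction of True values.
negative_share = (depths < 0).mean() * 100
print(negative_share)
```

On the full DataFrame, `(df.depth < 0).mean() * 100` should return the same value as the shape-based expression used in the notebook.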

### 4. Which seismic detection and location method prevailed during the specified period?

**According to the University of Alaska Fairbanks (2018), seismic event locations can be described as automatic, reviewed, or revised. "Automatic" denotes events detected and reported by seismic monitoring systems without human intervention, whereas "reviewed" indicates events that have undergone manual inspection and validation by seismologists or experts. "Revised" refers to events that have been updated or adjusted after an initial report.**

**Therefore, using the information in the status column, we will investigate the predominant seismic detection method during the specified time period.**

In [17]:
df.status.value_counts()

reviewed     16801
automatic     2443
Name: status, dtype: int64

**A large majority of the seismic events were manually inspected and validated.**

Let us examine the proportion of events that were manually inspected.

In [18]:
df[df.status == 'reviewed'].shape[0]/ df.shape[0] * 100

87.30513406776139

**About 87% of the seismic events were manually validated.**

### 5. Which geographical area is associated with the most seismic activity during the study timeframe?

To explore which geographical area is associated with the highest seismic activity during the study timeframe, we will conduct a count of seismic events per geographical location. This analysis aims to identify countries with a higher incidence of seismic occurrences compared to others.

To do this, we will split the values in the 'state_country' column on commas and keep the last part, which holds the name of the country where the seismic event occurred.

In [19]:
# split the values in the state_country column and keep the country name
df['country'] = df['state_country'].apply(lambda x: x.split(',')[-1].strip() if ',' in x else x.strip())
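
The lambda in the cell above calls `split` on every value, so it would raise an error if the column ever contained a missing value. As a hedged alternative, not taken from the original notebook, the same extraction can be sketched with pandas string methods, which pass missing values through. The two strings are full values visible in the output of this section, the `None` entry is added here purely for illustration, and `places` and `countries` are illustrative names.

```python
import pandas as pd

# Two full strings that appear in the state_country column, plus a missing
# value added here to show how it is handled.
places = pd.Series(
    ["Chugach, Alaska, United States", "United States", None],
    name="state_country",
)

# Split once from the right, keep the last piece, and trim whitespace.
countries = places.str.rsplit(",", n=1).str[-1].str.strip()
print(countries.tolist())
```

Values without a comma are returned unchanged apart from trimming, which matches the behavior of the `else` branch in the lambda.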

In [20]:
df.head()

time,time.1,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,state_country,country
2023-11-15 09:05:07.304000+00:00,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,0,0.0,0.0,0.22,...,earthquake,0.0,0.2,0.0,0,automatic,ak,ak,"Beavertail Drive, Matanuska-Susitna, Alaska, U...",United States
2023-11-15 08:53:06.688000+00:00,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0,ml,0,0.0,0.0,0.8,...,earthquake,0.0,0.3,0.0,0,automatic,ak,ak,"Chugach, Alaska, United States",United States
2023-11-15 08:41:52.480000+00:00,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.32,1.73,md,15,153.0,0.0,0.2,...,earthquake,0.33,0.38,0.59,15,automatic,hv,hv,"Volcano, Volcano CDP, Hawaiʻi County, Hawaii, ...",United States
2023-11-15 07:44:53.035000+00:00,2023-11-15 07:44:53.035000+00:00,61.6382,-149.7828,32.9,1.9,ml,0,0.0,0.0,0.31,...,earthquake,0.0,0.2,0.0,0,automatic,ak,ak,"Houston, Matanuska-Susitna, Alaska, 99694, Uni...",United States
2023-11-15 07:19:44.540000+00:00,2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87,md,37,236.0,0.0,0.12,...,earthquake,0.71,0.89,0.88,5,automatic,hv,hv,United States,United States


Now, we will count the events per country to see which country is most associated with seismic activity during the timeframe.

In [21]:
df.country.value_counts()

United States                     15158
No location found                  2324
Indonesia                           302
Canada                              189
Papua New Guinea                    165
                                  ...  
South Sudan                           1
Federated States of Micronesia        1
Oman                                  1
Zambia                                1
Uzbekistan                            1
Name: country, Length: 79, dtype: int64

**The United States recorded the highest number of seismic events, 15,158, or approximately 79% of the total. The second-largest group, 2,324 events (about 12%), had no location found. Among identified countries, Indonesia (302) and Canada (189) followed.**
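
The United States share can be recomputed from the counts above, both against all events and against only those with a resolved location. This is an arithmetic sketch rather than a cell from the original notebook, and it assumes that every one of the 19,244 rows has a value in the country column; the variable names are illustrative.

```python
# Counts copied from the outputs above.
total_events = 19244  # count reported by df.mag.describe()
us_events = 15158     # "United States" in df.country.value_counts()
no_location = 2324    # "No location found" in df.country.value_counts()

# Share of all events, including those without a resolved location.
us_share_all = us_events / total_events * 100

# Share among events whose location could be resolved.
located_events = total_events - no_location
us_share_located = us_events / located_events * 100

print(f"{us_share_all:.1f}% of all events")
print(f"{us_share_located:.1f}% of located events")
```

Which of the two figures is more appropriate depends on whether the events with no location found are treated as part of the comparison.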

# REFERENCES

U.S. Geological Survey. (2018). Volcano Watch — Why do some earthquakes have negative depths? [Online], available: https://www.usgs.gov/observatories/hvo/news/volcano-watch-why-do-some-earthquakes-have-negative-depths [accessed 25 November 2023]

University of Alaska Fairbanks. (2018). Three stages of earthquake locations: automatic, reviewed, revised. [Online], available: https://earthquake.alaska.edu/three-stages-earthquake-locations-automatic-reviewed-revised [accessed 25 November 2023]