## 🧭 Data Loading — USGS Global Earthquake Dataset (1900–Present)

This dataset covers significant global earthquakes (magnitude ≥ 5.0) recorded 
since 1900. It is refreshed weekly from the United States Geological Survey (USGS) 
catalog and includes, for each event, the timestamp, location (latitude and 
longitude), magnitude, depth, and details of the reporting network.

The recorded events stem from tectonic activity in highly seismic regions, such as 
the Pacific Ring of Fire, as well as in less active zones like Europe and Africa. 
The dataset is a useful resource for analyzing long-term seismic patterns, studying 
geological behavior, and building predictive models.

### Key Columns Explained:
- **time**: Event origin time in UTC, given as milliseconds since the Unix epoch (USGS CSV exports may instead store it as an ISO 8601 string, so check the column's dtype after loading).
- **latitude / longitude**: Coordinates of the earthquake's epicenter (in decimal degrees).
- **depth**: Depth of the event in kilometers.
- **mag**: Reported earthquake magnitude.
- **magType**: Type of magnitude measurement (e.g., mb, ml, mw).
- **nst**: Number of stations used to compute the earthquake solution.
- **gap**: Largest azimuthal gap between stations, in degrees.
- **dmin**: Distance to the nearest recording station (in degrees).
- **rms**: Root-mean-square travel-time residual of the location solution, in seconds.
- **net / id**: Network and unique event identifiers.
- **updated**: Last update timestamp for the event record.
- **place**: Human-readable location description.
- **type**: Event type (e.g., “earthquake”, “explosion”, “quarry blast”).
- **horizontalError / depthError / magError**: Reported uncertainties of the horizontal location (km), depth (km), and magnitude.
- **status**: Event status within USGS catalog (e.g., “reviewed”, “automatic”).
- **locationSource / magSource**: Networks providing the location and magnitude data.
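
Because the `time` column may arrive either as epoch milliseconds or as text timestamps depending on the export, it can help to normalize it right after loading. The following is a minimal sketch, not part of the original notebook: `parse_event_time` is a hypothetical helper, and the sample values are synthetic and used only to show that both representations resolve to the same UTC instants.

```python
import pandas as pd


def parse_event_time(values: pd.Series) -> pd.Series:
    """Convert a raw `time` column to timezone-aware UTC datetimes (hypothetical helper)."""
    if pd.api.types.is_numeric_dtype(values):
        # Numeric input is treated as milliseconds since the Unix epoch.
        return pd.to_datetime(values, unit="ms", utc=True)
    # Anything else is treated as text timestamps; unparseable entries become NaT.
    return pd.to_datetime(values, utc=True, errors="coerce")


# Synthetic values for illustration only (not taken from the dataset).
epoch_ms = pd.Series([0, 86_400_000])
iso_text = pd.Series(["1970-01-01T00:00:00.000Z", "1970-01-02T00:00:00.000Z"])

parsed_from_ms = parse_event_time(epoch_ms)
parsed_from_text = parse_event_time(iso_text)
print(parsed_from_ms)
print(parsed_from_text)
```

With the real data this would be applied as `earthquake_df["time"] = parse_event_time(earthquake_df["time"])`, assuming the column is named `time` as listed above.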

This dataset supports exploratory analysis, time-series pattern detection, and 
predictive modeling of seismic events worldwide, which makes it well suited to 
earthquake research and risk assessment projects.

Source Link: https://www.kaggle.com/datasets/usamabuttar/significant-earthquakes?resource=download


In [None]:

# Example: Load USGS Earthquake Data
import pandas as pd

# Replace this path or URL with your actual dataset location
data_path = "earthquakes_data.csv"
earthquake_df = pd.read_csv(data_path)

# Display basic information and preview
earthquake_df.info()
earthquake_df.head()

## 🔍 Data Inspection — Exploring Earthquake Dataset Structure

In this stage, we perform an initial inspection of the earthquake data 
to understand its overall structure, completeness, and summary statistics.

Typical inspection steps include:
1. Viewing the dataset shape and column names.
2. Identifying data types and non-null counts.
3. Checking for missing values.
4. Reviewing unique entries in categorical columns.
5. Exploring basic descriptive statistics to detect scale variations or anomalies.
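
The five steps above can also be bundled into a single report, which is convenient when the same checks are rerun after each cleaning pass. This is only a sketch under stated assumptions: `inspect_frame` is a hypothetical helper, and `sample_df` is a tiny synthetic frame standing in for `earthquake_df`.

```python
import pandas as pd


def inspect_frame(df: pd.DataFrame, categorical_cols: list[str]) -> dict:
    """Collect the inspection steps into one dictionary (hypothetical helper)."""
    # Only inspect categorical columns that actually exist in the frame.
    present = [col for col in categorical_cols if col in df.columns]
    return {
        "shape": df.shape,                                   # step 1
        "columns": list(df.columns),                         # step 1
        "dtypes": df.dtypes.astype(str).to_dict(),           # step 2
        "non_null": df.notna().sum().to_dict(),              # step 2
        "missing": df.isna().sum().to_dict(),                # step 3
        "unique": {col: sorted(df[col].dropna().unique().tolist())
                   for col in present},                      # step 4
        "describe": df.describe(),                           # step 5
    }


# Synthetic rows for illustration only (not taken from the dataset).
sample_df = pd.DataFrame({
    "mag": [5.1, 6.3, None],
    "depth": [10.0, 35.0, 600.0],
    "type": ["earthquake", "earthquake", "explosion"],
})

report = inspect_frame(sample_df, ["type", "magType"])
print(report["shape"], report["missing"], report["unique"])
```

On the real data the call would look like `inspect_frame(earthquake_df, ["type", "magType", "status"])`; columns missing from the frame are skipped rather than raising an error.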

In [None]:
# View the dataset shape (rows × columns)
print("Dataset shape:", earthquake_df.shape)

In [None]:
# Display the first few records
display(earthquake_df.head())

In [None]:
# Display column names and their data types
print("\n--- Column Details ---")
earthquake_df.info()

In [None]:
# Check for missing values across all columns
print("\n--- Missing Values Summary ---")
missing_summary = earthquake_df.isnull().sum().sort_values(ascending=False)
display(missing_summary[missing_summary > 0])
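
Raw counts are easier to judge next to the share of rows they represent. The sketch below extends the missing-value summary with a percentage column; `missing_report` is a hypothetical helper and `sample_df` is synthetic data used purely for illustration.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages for columns with gaps (hypothetical helper)."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that have at least one missing value, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Synthetic rows for illustration only (not taken from the dataset).
sample_df = pd.DataFrame({
    "mag": [5.0, 5.4, 6.1, 7.2],
    "nst": [10.0, None, None, None],
    "gap": [None, 40.0, 50.0, 60.0],
})

missing_table = missing_report(sample_df)
print(missing_table)
```

Columns such as `nst`, `gap`, and the error fields are often sparse for older events, so a percentage view helps decide whether to impute a column or drop it from a model.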

In [None]:
# View unique event types
print("\n--- Unique Event Types ---")
print(earthquake_df["type"].unique())

In [None]:
# View unique magnitude types
print("\n--- Unique Magnitude Types ---")
print(earthquake_df["magType"].unique())

In [None]:
# Get statistical overview for numerical columns
print("\n--- Descriptive Statistics ---")
display(earthquake_df.describe())
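
Descriptive statistics show anomalies only indirectly, through the min and max rows. A complementary check is to count values outside their expected ranges. The sketch below assumes latitude lies in [-90, 90], longitude in [-180, 180], and magnitude is at least 5.0, as stated in the dataset description. `out_of_range_counts` is a hypothetical helper and the sample rows are synthetic.

```python
import pandas as pd


def out_of_range_counts(df: pd.DataFrame, min_magnitude: float = 5.0) -> pd.Series:
    """Count non-missing values that fall outside expected ranges (hypothetical helper)."""
    checks = {
        # Missing values are excluded here; they are handled by the missing-value check.
        "latitude": df["latitude"].notna() & ~df["latitude"].between(-90, 90),
        "longitude": df["longitude"].notna() & ~df["longitude"].between(-180, 180),
        # Comparisons with NaN evaluate to False, so missing magnitudes are not counted.
        "mag": df["mag"] < min_magnitude,
    }
    return pd.Series({name: int(mask.sum()) for name, mask in checks.items()})


# Synthetic rows for illustration only (not taken from the dataset).
sample_df = pd.DataFrame({
    "latitude": [10.0, 95.0, None],
    "longitude": [120.0, -181.0, 200.0],
    "mag": [5.2, 4.9, 7.0],
})

violations = out_of_range_counts(sample_df)
print(violations)
```

Any non-zero count is a prompt to look at the offending rows, not proof of an error; the thresholds are assumptions that can be adjusted through `min_magnitude` or by editing the checks.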

In [None]:
# Check for duplicate records using the unique event ID
duplicate_count = earthquake_df.duplicated(subset=['id']).sum()
print(f"\nDuplicate Records Based on 'id': {duplicate_count}")
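
If the count above is non-zero, one reasonable follow-up is to keep only the most recently updated record for each `id`. This is a sketch of one possible policy, not the notebook's established method: `drop_duplicate_events` is a hypothetical helper, it assumes `updated` has already been parsed into a sortable type, and the sample rows are synthetic.

```python
import pandas as pd


def drop_duplicate_events(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the latest-updated row per event id (hypothetical helper)."""
    # A stable sort preserves the original order among rows with equal `updated` values.
    ordered = df.sort_values("updated", kind="stable")
    # After sorting, the last row for each id is the most recently updated one.
    return ordered.drop_duplicates(subset="id", keep="last").sort_index()


# Synthetic rows for illustration only (not taken from the dataset).
sample_df = pd.DataFrame({
    "id": ["a", "b", "a"],
    "updated": pd.to_datetime(["2020-01-01", "2020-01-02", "2021-06-01"]),
    "mag": [5.0, 6.1, 5.2],
})

deduplicated = drop_duplicate_events(sample_df)
print(deduplicated)
```

Keeping the latest revision reflects the idea that later catalog updates supersede earlier ones; if the earliest record is preferred instead, `keep="first"` would be the corresponding change.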