Title Page / Header Cell 

Module: CT7201 Python Notebooks and Scripting 

Assignment Title: Earthquake Data Analysis Using Python Scripting (2023 Global Dataset) 

Student Names & IDs 

Date of submission 

Tutor Name 


Executive Summary (Short Overview) 

A single short paragraph that explains: 

What the project does 

What the dataset contains 

What analyses and models you will build 

The purpose of your Python scripting/OOP 

The key findings (a preview) 

Introduction 

Explain: 

Why we analyse earthquakes 

Why Python scripting is appropriate 

The importance of visualisation, functions, and clean coding 

A short explanation of what will be done in the notebook 

Dataset Description 

Cover: 

Source: USGS Earthquake Hazards Program 

Scope: Global events in 2023 

Number of records and variables 

Key fields (time, magnitude, depth, location, errors, network) 

Why this dataset is suitable for scripting and analysis 

Project Objectives 

Write them cleanly and academically: 

Load, clean, and prepare the earthquake dataset using Python scripting. 

Implement functions and modular code to automate analysis steps. 

Perform univariate, bivariate, and multivariate analysis. 

Produce clear and readable visualisations using matplotlib/seaborn. 

Implement a 3D visualisation using Python libraries. 

Build a simple machine learning model (classification or clustering). 

Demonstrate good programming practice, clarity, modularity, and documentation. 

Methodology 

A clear step-by-step description of the workflow: 

Import libraries 

Load raw CSV 

Clean data and handle missing values 

Engineer additional features 

Perform exploratory analysis (EDA) 

Build visualisations 

Train and evaluate a simple ML model 

Interpret outputs 

Conclude findings 

Python Scripting & Functions Section 

CT7201 marking focuses heavily on scripting. 

You MUST: 

✔ Create multiple custom Python functions: 

load_data() 

clean_data() 

engineer_features() 

plot_magnitude_distribution() 

plot_depth_boxplot() 

calculate_correlations() 

build_classifier() 

plot_3D_scatter() 

✔ Use: 

docstrings 

comments 

parameters 

return values 

✔ Avoid: 

long messy code cells 

repeating the same code 
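
As an illustration of the checklist above, here is a minimal sketch of how load_data() and clean_data() might look with docstrings, parameters, and return values. The `required` column list and the decision to drop (rather than impute) incomplete rows are assumptions made for the example, not requirements of the brief, and the in-memory rows are illustrative so the sketch runs without the real file.

```python
import io

import pandas as pd


def load_data(source):
    """Load the raw earthquake CSV into a DataFrame.

    Parameters
    ----------
    source : str or file-like
        Path to the CSV file, or an open text buffer.

    Returns
    -------
    pandas.DataFrame
        The raw, unmodified data.
    """
    return pd.read_csv(source)


def clean_data(df, required=("time", "latitude", "longitude", "depth", "mag")):
    """Return a cleaned copy of the data.

    Parses timestamps, removes exact duplicate rows, and drops rows that
    are missing any column listed in `required` (an assumed policy).
    """
    out = df.copy()
    out["time"] = pd.to_datetime(out["time"], utc=True, errors="coerce")
    out = out.drop_duplicates()
    out = out.dropna(subset=list(required))
    return out.reset_index(drop=True)


# Illustrative rows only: one duplicate and one record with a missing magnitude.
sample_csv = io.StringIO(
    "time,latitude,longitude,depth,mag\n"
    "2023-01-01T00:49:25.294Z,52.0999,178.5218,82.77,3.1\n"
    "2023-01-01T00:49:25.294Z,52.0999,178.5218,82.77,3.1\n"
    "2023-01-01T01:41:43.755Z,7.1397,126.738,79.194,\n"
)

cleaned = clean_data(load_data(sample_csv))
print(cleaned)
```

Each function takes its input as a parameter and returns a new DataFrame, so the steps can be chained and re-run without relying on notebook-level variables.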

In [8]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [10]:
# Load the raw CSV file
eq_df = pd.read_csv('earthquake_dataset.csv')
eq_df.head()

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2023-01-01T00:49:25.294Z,52.0999,178.5218,82.77,3.1,ml,14.0,139.0,0.87,0.18,...,2023-03-11T22:51:52.040Z,"Rat Islands, Aleutian Islands, Alaska",earthquake,8.46,21.213,0.097,14.0,reviewed,us,us
1,2023-01-01T01:41:43.755Z,7.1397,126.738,79.194,4.5,mb,32.0,104.0,1.152,0.47,...,2023-03-11T22:51:45.040Z,"23 km ESE of Manay, Philippines",earthquake,5.51,7.445,0.083,43.0,reviewed,us,us
2,2023-01-01T03:29:31.070Z,19.1631,-66.5251,24.0,3.93,md,23.0,246.0,0.8479,0.22,...,2023-03-11T22:51:29.040Z,Puerto Rico region,earthquake,0.91,15.95,0.09,16.0,reviewed,pr,pr
3,2023-01-01T04:09:32.814Z,-4.7803,102.7675,63.787,4.3,mb,17.0,187.0,0.457,0.51,...,2023-03-11T22:51:45.040Z,"99 km SSW of Pagar Alam, Indonesia",earthquake,10.25,6.579,0.238,5.0,reviewed,us,us
4,2023-01-01T04:29:13.793Z,53.3965,-166.9417,10.0,3.0,ml,19.0,190.0,0.4,0.31,...,2023-03-11T22:51:38.040Z,"59 km SSW of Unalaska, Alaska",earthquake,1.41,1.999,0.085,18.0,reviewed,us,us


In [11]:
eq_df.describe()

,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst
count,26642.0,26642.0,26642.0,26642.0,25227.0,25225.0,24776.0,26642.0,25093.0,26642.0,24970.0,25065.0
mean,16.852798,-11.487497,67.491224,4.007395,42.571332,124.930971,2.692908,0.581575,7.017267,4.475056,0.122735,33.315939
std,30.3892,130.053399,116.762456,0.794423,37.662352,67.430145,4.043568,0.256276,4.072365,4.451649,0.102271,48.022567
min,-65.8497,-179.9987,-3.37,2.6,0.0,8.0,0.0,0.01,0.0,0.0,0.0,0.0
25%,-6.415275,-149.60865,10.0,3.22,19.0,73.0,0.612,0.41,4.14,1.848,0.08,10.0
50%,18.884167,-64.811833,21.998,4.3,30.0,111.0,1.579,0.59,7.06,2.019,0.111,18.0
75%,41.82795,126.9651,66.833,4.5,52.0,165.0,3.172,0.75,9.73,6.669,0.15,36.0
max,86.5939,179.9994,681.238,7.8,423.0,350.0,50.82,1.88,99.0,60.67,4.49,884.0


In [12]:
eq_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26642 entries, 0 to 26641
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   time             26642 non-null  object 
 1   latitude         26642 non-null  float64
 2   longitude        26642 non-null  float64
 3   depth            26642 non-null  float64
 4   mag              26642 non-null  float64
 5   magType          26642 non-null  object 
 6   nst              25227 non-null  float64
 7   gap              25225 non-null  float64
 8   dmin             24776 non-null  float64
 9   rms              26642 non-null  float64
 10  net              26642 non-null  object 
 11  id               26642 non-null  object 
 12  updated          26642 non-null  object 
 13  place            25034 non-null  object 
 14  type             26642 non-null  object 
 15  horizontalError  25093 non-null  float64
 16  depthError       26642 non-null  float64
 17  magError    

In [None]:
# show individual functions in these blocks

load_data()



In [None]:
clean_data()
# handle missing values, with justification
# remove duplicate records, with justification

In [None]:
engineer_features() 

In [None]:
plot_magnitude_distribution()

In [None]:
plot_depth_boxplot()

Object-Oriented Programming Section  

Create at least two classes: 

Class 1: EarthquakeDataset 

Handles: 

Loading 

Cleaning 

Feature engineering 

Class 2: EarthquakeVisualizer 

Handles: 

All plots (KDE, boxplots, scatter, violin, heatmap, 3D) 

Optional Class: EarthquakeModel 

Handles: 

Classification 

Clustering 

You must show good encapsulation, methods, attributes, and documentation. 

A-grade notebooks always demonstrate clear OOP. 
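
A minimal sketch of what Class 1 could look like is given below. The method names (`from_csv`, `clean`), the private attributes, and the read-only `data` property are assumptions chosen to illustrate encapsulation; the brief does not prescribe them. The toy rows are illustrative.

```python
import pandas as pd


class EarthquakeDataset:
    """Owns the earthquake table and the steps that prepare it."""

    def __init__(self, frame):
        self._raw = frame    # untouched input, kept private
        self._clean = None   # populated by clean()

    @classmethod
    def from_csv(cls, path):
        """Alternative constructor: build the dataset from a CSV path."""
        return cls(pd.read_csv(path))

    def clean(self):
        """Drop duplicates and rows without depth or magnitude; parse time."""
        df = self._raw.drop_duplicates().dropna(subset=["depth", "mag"]).copy()
        df["time"] = pd.to_datetime(df["time"], utc=True, errors="coerce")
        self._clean = df.reset_index(drop=True)
        return self  # returning self allows method chaining

    @property
    def data(self):
        """Read-only copy of the cleaned table."""
        if self._clean is None:
            raise RuntimeError("call clean() before accessing data")
        return self._clean.copy()

    def __len__(self):
        return len(self._raw) if self._clean is None else len(self._clean)


# Illustrative rows: one duplicate and one record with a missing magnitude.
toy = pd.DataFrame({
    "time": [
        "2023-01-01T00:49:25.294Z",
        "2023-01-01T00:49:25.294Z",
        "2023-01-01T01:41:43.755Z",
        "2023-01-01T03:29:31.070Z",
    ],
    "depth": [82.77, 82.77, 79.194, 24.0],
    "mag": [3.1, 3.1, 4.5, None],
})

dataset = EarthquakeDataset(toy).clean()
print(len(dataset), "rows after cleaning")
```

An EarthquakeVisualizer could then accept an EarthquakeDataset in its constructor and read only from `data`, which keeps plotting code separate from preparation code.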

Data Preparation Section 

Using your functions & classes: 

Load dataset 

Convert timestamps 

Handle missing values 

Clean column types 

Feature engineering (month, depth class, strong quake flag) 
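
The three engineered features named above could be sketched as follows. The depth bands (shallow up to 70 km, intermediate up to 300 km, deep beyond) follow the commonly used seismological classification; the magnitude 6.0 cut-off for a "strong" quake and the column names `month`, `depth_class`, and `is_strong` are assumptions for illustration. The three rows are toy values.

```python
import pandas as pd


def engineer_features(df, strong_threshold=6.0):
    """Return a copy of df with month, depth_class and is_strong columns.

    strong_threshold is an assumed cut-off; justify your own choice.
    """
    out = df.copy()
    out["time"] = pd.to_datetime(out["time"], utc=True)
    out["month"] = out["time"].dt.month
    # Open-ended outer bins so negative depths and very deep events are kept.
    out["depth_class"] = pd.cut(
        out["depth"],
        bins=[-float("inf"), 70, 300, float("inf")],
        labels=["shallow", "intermediate", "deep"],
    )
    out["is_strong"] = out["mag"] >= strong_threshold
    return out


# Toy rows, one per depth class.
toy = pd.DataFrame({
    "time": ["2023-01-15T00:00:00Z", "2023-06-15T00:00:00Z", "2023-12-15T00:00:00Z"],
    "depth": [10.0, 150.0, 600.0],
    "mag": [3.1, 6.2, 7.8],
})

features = engineer_features(toy)
print(features[["month", "depth_class", "is_strong"]])
```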

Show: 

head() 

info() 

describe() 

Include short commentary below each output, not above. 

Univariate Analysis Section 

Follow A-grade style: 

For each variable: 

One sentence explaining why this variable matters 

One function call to create the plot 

Short interpretation paragraph 

Variables: 

Magnitude (KDE) 

Depth (boxplot + histogram) 

nst or magNst 

dmin 

Monthly or daily counts 

Keep code clean and modular. 
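
One way to keep each univariate plot to a single function call is sketched below for magnitude. It overlays a KDE (computed with `scipy.stats.gaussian_kde`) on a density histogram; the bin count, labels, and the synthetic magnitudes are assumptions for the example, and seaborn's `kdeplot` would be an equally valid choice.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def plot_magnitude_distribution(df, column="mag", ax=None):
    """Draw a density histogram with a KDE overlay and return the Axes."""
    values = df[column].dropna().to_numpy()
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 4))
    ax.hist(values, bins=30, density=True, alpha=0.4, label="histogram")
    grid = np.linspace(values.min(), values.max(), 200)
    ax.plot(grid, gaussian_kde(values)(grid), label="KDE")
    ax.set_xlabel("Magnitude")
    ax.set_ylabel("Density")
    ax.set_title("Distribution of earthquake magnitude")
    ax.legend()
    return ax


# Synthetic magnitudes, only to exercise the function.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"mag": rng.normal(4.0, 0.8, size=500)})
ax = plot_magnitude_distribution(toy)
```

Returning the Axes lets the calling cell adjust titles or save the figure without touching the function body.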

Bivariate Analysis Section 

Use the same pattern: 

explain → code → visual → interpretation 

Analyses: 

Magnitude vs Depth + correlation 

Depth vs depthError 

Net vs Depth (violin) + ANOVA 

Time vs Magnitude 

magType vs net (heatmap)
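
The numerical side of two of these analyses (the magnitude vs depth correlation and the ANOVA of depth across networks) could be sketched as below. The helper name `anova_by_group`, the choice of Pearson correlation, and the synthetic data are assumptions for illustration; a one-way ANOVA also assumes roughly normal groups with similar variance, which should be checked on the real data.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway


def calculate_correlations(df, x="mag", y="depth", method="pearson"):
    """Return the correlation coefficient between two numeric columns."""
    return df[x].corr(df[y], method=method)


def anova_by_group(df, value="depth", group="net"):
    """One-way ANOVA of `value` across the levels of `group`.

    Returns the F statistic and the p-value.
    """
    samples = [g[value].dropna().to_numpy() for _, g in df.groupby(group)]
    f_stat, p_value = f_oneway(*samples)
    return f_stat, p_value


# Synthetic data: three networks with deliberately different mean depths.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "net": np.repeat(["us", "pr", "ak"], 100),
    "depth": np.concatenate([
        rng.normal(50, 5, 100),
        rng.normal(20, 5, 100),
        rng.normal(35, 5, 100),
    ]),
    "mag": rng.normal(4.0, 0.8, 300),
})

r = calculate_correlations(toy)
f_stat, p_value = anova_by_group(toy)
print(f"r = {r:.3f}, F = {f_stat:.1f}, p = {p_value:.3g}")
```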

3D Visualisation Section (CT7201 Bonus Marks) 

Implement at least one 3D visual: 

3D scatter plot 

(lat, lon, depth coloured by magnitude) 

Optional: 

3D clustering visual 

3D classification decision boundary 

This significantly boosts grade potential because it shows advanced scripting.
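
A minimal sketch of the 3D scatter plot using matplotlib's built-in 3D projection follows. The axis labels, colour map, inverted depth axis, and synthetic coordinates are assumptions for the example; an interactive library such as plotly would also meet the requirement.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_3D_scatter(df):
    """Plot longitude, latitude and depth in 3D, coloured by magnitude."""
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(projection="3d")
    points = ax.scatter(
        df["longitude"].to_numpy(),
        df["latitude"].to_numpy(),
        df["depth"].to_numpy(),
        c=df["mag"].to_numpy(),
        cmap="viridis",
        s=10,
    )
    ax.set_xlabel("Longitude (deg)")
    ax.set_ylabel("Latitude (deg)")
    ax.set_zlabel("Depth (km)")
    ax.invert_zaxis()  # deeper events appear lower in the plot
    fig.colorbar(points, ax=ax, label="Magnitude")
    return ax


# Synthetic events, only to exercise the function.
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "longitude": rng.uniform(-180, 180, 200),
    "latitude": rng.uniform(-65, 85, 200),
    "depth": rng.uniform(0, 680, 200),
    "mag": rng.uniform(2.6, 7.8, 200),
})
ax3d = plot_3D_scatter(toy)
```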

Machine Learning Section 

The ML part doesn’t need to be huge: 

Recommended simple model (choose one): 

Logistic Regression 

Decision Tree 

K-means clustering 

Must include: 

Train/test split (for the supervised models) 

Confusion matrix or cluster summary 

Short interpretation 

This satisfies the “application of algorithms” requirement. 
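
The classification option could be sketched with scikit-learn as below. The target column `is_strong`, the feature list, the 75/25 split, and the synthetic data are all assumptions for the example. If the target is derived from magnitude, `mag` itself should be left out of the features to avoid leaking the answer into the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def build_classifier(df, features, target, test_size=0.25, seed=42):
    """Fit a logistic regression and evaluate it on a held-out test set.

    Returns the fitted model, the confusion matrix, and the accuracy.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        df[features],
        df[target],
        test_size=test_size,
        random_state=seed,
        stratify=df[target],  # keep class proportions equal in both splits
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return model, confusion_matrix(y_test, predictions), accuracy_score(y_test, predictions)


# Synthetic data: the label depends on depth plus noise, purely for illustration.
rng = np.random.default_rng(3)
depth = rng.uniform(0, 300, 400)
toy = pd.DataFrame({
    "depth": depth,
    "gap": rng.uniform(10, 350, 400),
    "is_strong": (depth / 100 + rng.normal(0, 0.5, 400)) > 1.5,
})

model, cm, accuracy = build_classifier(toy, ["depth", "gap"], "is_strong")
print(cm)
print(f"accuracy = {accuracy:.2f}")
```

The confusion matrix rows are the true classes and the columns are the predicted classes, which is the layout to refer to in the short interpretation.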

Discussion Section 

Summarise: 

global earthquake patterns 

spatial trends 

magnitude–depth relationships 

model performance 

limitations 

strengths of scripting approach 

Write academically. 

Conclusion Section 

Short, direct summary: 

what was achieved 

key insights 

usefulness of Python scripting 

References 

USGS 

Python libraries 

Any academic papers if used 