<VSCode.Cell language="markdown"># Reproducible Analysis & Reporting

This notebook reproduces the key analysis steps: loading the cleaned database, running anomaly detection, producing figures, and saving artifacts. Follow the instructions in the repository `README.md` to install dependencies and run the scripts.
</VSCode.Cell>
<VSCode.Cell language="python"># Section 1: Environment & Dependencies
import sys
import platform
print('Python:', sys.version)
print('Platform:', platform.platform())

# Show a minimal set of installed packages useful for this project
try:
    import pandas as pd, numpy as np, matplotlib
    print('pandas', pd.__version__)
    print('numpy', np.__version__)
    print('matplotlib', matplotlib.__version__)
except ImportError as e:
    print('Some plotting/data packages are not installed:', e)
</VSCode.Cell>
<VSCode.Cell language="markdown">## Section 2: Project Files & Data
We expect `backend/instance/earthquakes.db` to exist. The script `backend/generate_report.py` generates figures in `reports/`.
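
A quick pre-flight check can confirm these paths before the rest of the notebook runs. The sketch below assumes the notebook is started from the repository root; the helper name `missing_paths` is hypothetical and not part of the project.

```python
from pathlib import Path

# Paths named in this notebook, relative to the repository root
# (assumed to be the current working directory).
EXPECTED = [
    Path('backend') / 'instance' / 'earthquakes.db',
    Path('backend') / 'generate_report.py',
]

def missing_paths(paths, root='.'):
    """Return the subset of `paths` that does not exist under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]

missing = missing_paths(EXPECTED)
if missing:
    print('Missing:', ', '.join(str(p) for p in missing))
else:
    print('All expected project files found.')
```

If the notebook lives in a subfolder, passing `root='..'` should point the check at the repository root instead.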
</VSCode.Cell>
<VSCode.Cell language="python"># Section 3: Import Libraries & Set Random Seeds
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
</VSCode.Cell>
<VSCode.Cell language="python"># Section 4: Load and Inspect Data
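import os
import sqlite3
from contextlib import closing

# Try the path relative to a subfolder first, then relative to the repository root
candidates = [
    os.path.join('..', 'backend', 'instance', 'earthquakes.db'),
    os.path.join('backend', 'instance', 'earthquakes.db'),
]
DB = next((p for p in candidates if os.path.exists(p)), None)
if DB is None:
    # Fail early: sqlite3.connect() would silently create an empty database file
    raise FileNotFoundError(f'earthquakes.db not found in any of: {candidates}')
print('Using DB:', DB)
# closing() guarantees the connection is released even if the query fails
with closing(sqlite3.connect(DB)) as conn:
    df = pd.read_sql_query('SELECT date, time, lat, lng, depth, mag, location FROM Earthquake', conn)
print('Rows:', len(df))
df.head()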
</VSCode.Cell>
<VSCode.Cell language="markdown">## Section 5: Data Cleaning & Preprocessing
The project's cleaning steps are implemented in the backend scraping scripts; this notebook only performs minimal type conversions and missing-value checks.
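
Beyond type conversion, a range check can catch values that parse as numbers but are physically implausible. The following is a minimal sketch on toy data. The latitude and longitude limits are geographic facts, but the magnitude bounds and the helper name `out_of_range_counts` are assumptions made for illustration, not project rules.

```python
import pandas as pd

# Latitude/longitude limits are fixed by geography; the magnitude range
# is an assumed sanity bound, not something defined by the project.
BOUNDS = {'lat': (-90.0, 90.0), 'lng': (-180.0, 180.0), 'mag': (-2.0, 10.0)}

def out_of_range_counts(frame, bounds=BOUNDS):
    """Count non-missing values outside [low, high] for each bounded column."""
    counts = {}
    for col, (low, high) in bounds.items():
        values = pd.to_numeric(frame[col], errors='coerce')
        # Comparisons with NaN are False, so missing values are not counted
        counts[col] = int(((values < low) | (values > high)).sum())
    return counts

# Toy rows standing in for the Earthquake table
sample = pd.DataFrame({
    'lat': ['38.5', '95.0', None],
    'lng': ['27.1', '-181', '30.0'],
    'mag': ['4.2', 'n/a', '3.1'],
})
print(out_of_range_counts(sample))  # {'lat': 1, 'lng': 1, 'mag': 0}
```

Missing values are reported separately by the `isna().sum()` check in the next cell, so the two counts do not overlap.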
</VSCode.Cell>
<VSCode.Cell language="python"># Convert types and check missing
df['lat'] = pd.to_numeric(df['lat'], errors='coerce')
df['lng'] = pd.to_numeric(df['lng'], errors='coerce')
df['mag'] = pd.to_numeric(df['mag'], errors='coerce')
print(df[['lat','lng','mag']].isna().sum())
</VSCode.Cell>
<VSCode.Cell language="markdown">## Section 6: Exploratory Data Analysis (EDA)</VSCode.Cell>
<VSCode.Cell language="python"># Simple EDA plots
plt.figure(figsize=(8,4))
sns.histplot(df['mag'].dropna(), bins=30)
plt.title('Magnitude distribution')
plt.show()
</VSCode.Cell>
<VSCode.Cell language="markdown">## Section 7: Feature Engineering
Create a `depth_km` column and a combined `datetime` column for later models.</VSCode.Cell>
<VSCode.Cell language="python"># Depth is taken to be in km already; abs() normalizes any negative-signed values
df['depth_km'] = pd.to_numeric(df['depth'], errors='coerce').abs()
# Merge date + time into one datetime column; unparseable values become NaT
df['datetime'] = pd.to_datetime(
    df['date'].astype(str) + ' ' + df['time'].astype(str), errors='coerce'
)

print(df[['depth_km','datetime']].head())
</VSCode.Cell>
<VSCode.Cell language="markdown">## Sections 8-12: Modeling & Evaluation (Anomaly Detection)
We use the project's `analysis.detect_anomalies` implementation (IsolationForest). This notebook demonstrates usage and basic evaluation.
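
The project's `detect_anomalies` is not reproduced in this notebook. As a rough illustration of how an IsolationForest-based detector can produce an `is_anomaly` column, here is a sketch on synthetic data. The function name `detect_anomalies_sketch`, the feature list, and the `contamination` value are all assumptions and may differ from what `backend/analysis.py` actually does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def detect_anomalies_sketch(frame, features=('lat', 'lng', 'depth_km', 'mag'),
                            contamination=0.05, seed=42):
    """Hypothetical stand-in: flag outliers with scikit-learn's IsolationForest."""
    out = frame.copy()
    out['is_anomaly'] = False
    # Fit only on complete rows so missing values never reach the model
    complete = out[list(features)].dropna()
    if complete.empty:
        return out
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(complete.to_numpy())  # -1 = outlier, 1 = inlier
    out.loc[complete.index, 'is_anomaly'] = labels == -1
    return out

# Synthetic events, for illustration only
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'lat': rng.normal(39.0, 1.0, 200),
    'lng': rng.normal(35.0, 2.0, 200),
    'depth_km': rng.uniform(1.0, 30.0, 200),
    'mag': rng.normal(3.0, 0.5, 200),
})
demo.loc[0, 'mag'] = 8.5  # one deliberately unusual event
flagged = detect_anomalies_sketch(demo)
print('Anomalies:', int(flagged['is_anomaly'].sum()))
```

Fixing `random_state` keeps the flagged set stable across runs, which matters for a notebook meant to be reproducible.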
</VSCode.Cell>
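<VSCode.Cell language="python"># Make the repository root importable when the notebook runs from a subfolder
if not os.path.isdir('backend'):
    sys.path.insert(0, os.path.abspath('..'))
from backend.analysis import detect_anomalies

res = detect_anomalies(df.copy())
print('Anomalies:', res['is_anomaly'].sum())
res.head()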
</VSCode.Cell>
<VSCode.Cell language="python"># Plot anomalies on scatter
plt.figure(figsize=(8,6))
is_anom = res['is_anomaly'].eq(True)  # boolean mask, same test as == True
normal = res[~is_anom]
anom = res[is_anom]
plt.scatter(normal['lng'], normal['lat'], s=8, c='C0', label='normal')
plt.scatter(anom['lng'], anom['lat'], s=20, c='red', label='anomaly')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend()
plt.title('Event locations (anomalies in red)')
plt.show()
</VSCode.Cell>
<VSCode.Cell language="markdown">## Section 13: Save Model / Artifacts
The next cell runs `backend/generate_report.py`, which saves the figures; check `reports/` once it finishes.
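
How `generate_report.py` writes its figures is not shown in this notebook. As a minimal sketch of saving a single figure as a file artifact, the helper below creates the output folder if needed and writes a PNG. The name `save_figure`, the file name, and the `dpi` value are assumptions made for illustration.

```python
import os
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for scripts; omit in a notebook to keep inline plots
import matplotlib.pyplot as plt

def save_figure(fig, name, out_dir='reports', dpi=150):
    """Save `fig` as a PNG under `out_dir` and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f'{name}.png')
    fig.savefig(path, dpi=dpi, bbox_inches='tight')
    plt.close(fig)  # release the figure once it is on disk
    return path

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist([2.1, 2.4, 3.0, 3.2, 3.3, 4.1, 4.8], bins=5)  # placeholder magnitudes
ax.set_title('Magnitude distribution')
print('Saved:', save_figure(fig, 'magnitude_hist_example'))
```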
</VSCode.Cell>
<VSCode.Cell language="python"># Run the report script if available, using the same interpreter as the kernel
import subprocess, sys
script = os.path.join('backend', 'generate_report.py')
if os.path.exists(script):
    print('Running report script...')
    subprocess.run([sys.executable, script], check=True)
else:
    print(f'Script {script} not found.')
</VSCode.Cell>
<VSCode.Cell language="markdown">## Sections 14-16: Tests, Profiling, Quality
Include short unit tests and timing checks in `tests/` for reproducibility; linting can be run with `flake8`.
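
As an illustration of what such a test file might contain, the sketch below checks that a seeded computation is repeatable and finishes within a time budget. The function `seeded_sample` is a hypothetical stand-in for a project function, and the one-second budget is an assumed, deliberately generous limit.

```python
import random
import time
import unittest

SEED = 42

def seeded_sample(n, seed=SEED):
    """Draw n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

class ReproducibilityTest(unittest.TestCase):
    def test_same_seed_same_output(self):
        self.assertEqual(seeded_sample(100), seeded_sample(100))

    def test_different_seed_differs(self):
        self.assertNotEqual(seeded_sample(100, seed=1), seeded_sample(100, seed=2))

    def test_runs_within_budget(self):
        start = time.perf_counter()
        seeded_sample(10_000)
        elapsed = time.perf_counter() - start
        self.assertLess(elapsed, 1.0)  # assumed time budget in seconds

if __name__ == '__main__':
    # argv and exit=False keep this runnable inside a notebook kernel too
    unittest.main(argv=['ignored'], exit=False)
```

Using a local `random.Random(seed)` instead of the global generator keeps each test independent of whatever ran before it.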
</VSCode.Cell>