# Earthquake Data Analysis - Country Summary

This notebook loads the processed earthquake data from the gold layer, computes the average magnitude and average depth for each country and significance class, and saves the resulting summary in Parquet format.

In [None]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg  # only avg is used in this notebook

## Initialize SparkSession
Get or create a SparkSession.

In [None]:
spark = SparkSession.builder.appName("EarthquakeCountrySummary").getOrCreate()

## Define Input/Output Paths
Specify the paths for the gold input data and the summary output.
**Note:** Update these paths to match the actual Databricks environment, for example a different mount point or a direct ADLS path such as `abfss://<container>@<storageaccount>.dfs.core.windows.net/...`.

In [None]:
gold_data_path = "/mnt/datalake/gold/earthquake_events_gold/"
output_path = "/mnt/datalake/gold/country_summary/"
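If the lake is accessed directly rather than through a mount, the ADLS URI pattern from the note above can be assembled from its parts. The following is a minimal sketch only: the helper `abfss_path` and the container and storage account names are hypothetical placeholders, not values from this environment.

```python
def abfss_path(container: str, storage_account: str, relative_path: str) -> str:
    """Build a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>/."""
    # Strip leading/trailing slashes so the result has exactly one separator
    # after the host and keeps a trailing slash, like the mount paths above.
    cleaned = relative_path.strip("/")
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{cleaned}/"


# Placeholder names for illustration only; substitute your own.
example_gold_path = abfss_path("datalake", "examplestorage", "gold/earthquake_events_gold/")
```

Building the path in one place keeps the input and output locations consistent if the storage account changes.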

## Load Gold Data
Read the processed earthquake data from the gold layer (Parquet format).

In [None]:
gold_df = spark.read.parquet(gold_data_path)
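The aggregation later in this notebook relies on the columns `country_code`, `sig_class`, `mag`, and `depth`. A small check right after loading can surface a schema mismatch early. This is a sketch under the assumption that those four columns are the only ones required; the helper names `missing_columns` and `assert_required_columns` are hypothetical and not part of PySpark.

```python
REQUIRED_COLUMNS = ("country_code", "sig_class", "mag", "depth")


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in required order."""
    present = set(available)
    return [name for name in required if name not in present]


def assert_required_columns(available, required=REQUIRED_COLUMNS):
    """Raise ValueError listing any required columns that are missing."""
    missing = missing_columns(available, required)
    if missing:
        raise ValueError(f"Gold data is missing required columns: {missing}")


# In this notebook the check would be called with the DataFrame's column list:
#     assert_required_columns(gold_df.columns)
```

The helpers take a plain list of names, so they can be exercised without a running Spark session.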

### Optional: Display Schema and Sample Data
Uncomment the lines below to display the schema and a sample of the loaded data to verify.

In [None]:
# gold_df.printSchema()
# gold_df.show(5, truncate=False)

## Perform Aggregations
Group the data by `country_code` and `sig_class` (significance class), then calculate the average magnitude and average depth for each group.

In [None]:
# Wrapping the chain in parentheses avoids backslash line continuations.
summary_df = (
    gold_df.groupBy("country_code", "sig_class")
    .agg(
        avg("mag").alias("avg_magnitude"),
        avg("depth").alias("avg_depth"),
    )
    .orderBy("country_code", "sig_class")
)
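To make the grouping semantics concrete, the same calculation can be sketched in plain Python on a few rows. This is an illustration only and not how Spark executes the query: the helper `average_by_group` is hypothetical, and the sample rows, including the `sig_class` labels, are made up. Like Spark's `avg`, the sketch skips null values and yields a null average for a group that has no non-null values.

```python
from collections import defaultdict


def average_by_group(rows, keys, value):
    """Average `value` per distinct combination of `keys`, skipping None values."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        group = tuple(row[k] for k in keys)
        entry = totals[group]  # registers the group even if the value is None
        if row[value] is not None:
            entry[0] += row[value]
            entry[1] += 1
    # A group with no non-null values averages to None.
    return {g: (s / n if n else None) for g, (s, n) in sorted(totals.items())}


# Made-up rows for illustration only; not taken from the gold dataset.
sample_rows = [
    {"country_code": "JP", "sig_class": "High", "mag": 6.0, "depth": 30.0},
    {"country_code": "JP", "sig_class": "High", "mag": 7.0, "depth": None},
    {"country_code": "CL", "sig_class": "Low", "mag": 3.5, "depth": 10.0},
]

group_keys = ("country_code", "sig_class")
avg_magnitude = average_by_group(sample_rows, group_keys, "mag")
avg_depth = average_by_group(sample_rows, group_keys, "depth")
```

Here the two `JP`/`High` rows collapse into one group whose average magnitude uses both rows, while its average depth uses only the row with a non-null depth.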

### Optional: Display Aggregated Data
Uncomment the line below to display a sample of the aggregated data.

In [None]:
# summary_df.show(10, truncate=False)

## Save Output
Write the aggregated summary DataFrame to the output path in Parquet format. The `overwrite` mode replaces any data that already exists at that path.

In [None]:
summary_df.write.mode("overwrite").parquet(output_path)

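## Final Confirmation
Print a confirmation message; this cell is only reached if the write above completed without raising an error.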

In [None]:
print(f"Country summary data saved successfully to {output_path}")