# Gold Top Item Analytics Notebook

This notebook builds a gold-layer analytics mart of the most-viewed items per year, together with the platform most often used to view each item. The workflow includes:
* Importing required libraries
* Loading the silver event data from [workspace.silver_layer.silver_event](#table) and dropping its technical columns
* Filtering for item view events (`view_item`) and counting views per item and year
* Identifying the most used platform for each item and year
* Ranking items by popularity within each year
* Saving the final results to [workspace.gold_layer.gold_top_item](#table)

Use this notebook to analyze item popularity trends and platform usage over time.
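
To make the shape of the final mart concrete, the sketch below replays the workflow steps in plain Python on a handful of made-up events. The event tuples, item ids, and platform names are invented for illustration, and the alphabetical tie-break for the top platform is an assumption of this sketch. The notebook itself does the same work in PySpark against the silver table.

```python
from collections import Counter

# Hypothetical sample events: (event_name, item_id, event_year, event_platform)
events = [
    ("view_item", "A", 2023, "web"),
    ("view_item", "A", 2023, "ios"),
    ("view_item", "A", 2023, "web"),
    ("view_item", "B", 2023, "android"),
    ("add_to_cart", "A", 2023, "web"),  # dropped by the view_item filter
    ("view_item", "B", 2024, "ios"),
]

# Keep item views only
views = [e for e in events if e[0] == "view_item"]

# Views per (item, year) and per (item, year, platform)
total_views = Counter((item, year) for _, item, year, _ in views)
platform_views = Counter((item, year, plat) for _, item, year, plat in views)


def top_platform(item, year):
    """Platform with the most views for one item/year; ties go to the
    alphabetically first platform (an assumption of this sketch)."""
    candidates = [
        (count, plat)
        for (i, y, plat), count in platform_views.items()
        if (i, y) == (item, year)
    ]
    return min(candidates, key=lambda c: (-c[0], c[1]))[1]


def build_mart():
    """One row per item/year with its total views, rank within the year,
    and most used platform."""
    rows = []
    for (item, year), count in total_views.items():
        # SQL-style RANK(): 1 + number of items in the same year with more views
        higher = sum(
            1 for (_, y), other in total_views.items() if y == year and other > count
        )
        rows.append(
            {
                "item_id": item,
                "event_year": year,
                "total_views": count,
                "item_rank": higher + 1,
                "event_platform": top_platform(item, year),
            }
        )
    return sorted(rows, key=lambda r: (r["event_year"], r["item_rank"], r["item_id"]))


mart = build_mart()
for row in mart:
    print(row)
```

Each output row carries the same five columns the gold table ends up with: `item_id`, `event_year`, `total_views`, `item_rank`, and `event_platform`.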

### Import Libraries

In [0]:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

### Load Data

In [0]:
# Load silver_event table from workspace.silver_layer without technical columns
technical_cols = ["tc_ingestion_timestamp", "tc_source_file", "tc_bronze_id"]
silver_event_df = spark.read.table("workspace.silver_layer.silver_event").drop(*technical_cols)

# Display the first 10 rows for verification
print("Silver Event Table Sample:")
display(silver_event_df.limit(10))

### Build Top Item Mart

In [0]:
# Top Item Analytics from silver_event_df
# 1️⃣ Filter for item views only (event_name == 'view_item')
df_silver_view = silver_event_df.filter(F.col("event_name") == "view_item")

# 2️⃣ Total views per item per year
views_df = (
    df_silver_view.groupBy("event_parameter_value", "event_year")
      .agg(F.count("*").alias("total_views"))
      .withColumnRenamed("event_parameter_value", "item_id")
)

# 3️⃣ Most used platform per item/year
platform_counts = (
    df_silver_view.groupBy("event_parameter_value", "event_year", "event_platform")
      .agg(F.count("*").alias("platform_views"))
)

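# Break ties on platform name so the chosen platform is deterministic across runs
platform_window = (
    Window.partitionBy("event_parameter_value", "event_year")
          .orderBy(F.desc("platform_views"), F.asc("event_platform"))
)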

top_platform_df = (
    platform_counts
        .withColumn("rn", F.row_number().over(platform_window))
        .filter("rn = 1")
        .select(
            F.col("event_parameter_value").alias("item_id"),
            "event_year",
            "event_platform"
        )
)

# 4️⃣ Rank items within each year based on total views
rank_window = Window.partitionBy("event_year").orderBy(F.desc("total_views"))

ranked_df = views_df.withColumn("item_rank", F.rank().over(rank_window))

# 5️⃣ Join into final mart
top_item_df = (
    ranked_df.join(top_platform_df, ["item_id", "event_year"], "left")
)
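
A note on step 4: `F.rank()` gives tied items the same rank and then skips the following ranks, whereas `F.dense_rank()` would not leave gaps. The plain-Python sketch below mimics both behaviours on a made-up list of view counts, in case the gap-free variant turns out to suit the mart better. The helper names and the numbers are hypothetical.

```python
def rank_with_gaps(counts):
    """Mimic SQL RANK(): 1 + number of strictly larger values."""
    return [1 + sum(1 for other in counts if other > c) for c in counts]


def dense_rank(counts):
    """Mimic SQL DENSE_RANK(): position among the distinct values, descending."""
    distinct = sorted(set(counts), reverse=True)
    return [distinct.index(c) + 1 for c in counts]


# Hypothetical total_views for four items in one year
total_views = [50, 40, 40, 10]

print(rank_with_gaps(total_views))  # [1, 2, 2, 4]
print(dense_rank(total_views))      # [1, 2, 2, 3]
```

With `rank`, a filter such as `item_rank <= 3` would return three items here; with `dense_rank` it would return all four.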

### Save Table

In [0]:
# Save Gold table
top_item_df.write.mode("overwrite").saveAsTable("workspace.gold_layer.gold_top_item")